The idea that variables are less efficient to access if they're on a stack than if they're global is just ... wrong.
Consider "i += 1; // increment variable."
On AVR:
;; globals
lds r16, i ; load from global: 4 bytes, 2 cycles
subi r16, -1 ; increment: 2 bytes, 1 cycle
sts i, r16 ; store to global: 4 bytes, 2 cycles
;; locals
ldd r16, Y+i ; load from stack: 2 bytes, 2 cycles
subi r16, -1
std Y+i, r16 ; store to stack: 2 bytes, 2 cycles
That's 10 bytes of code for the global against 6 for the local: 67% bigger to access i as a global. Z80, 68xx, 6502 all have some form of indexed addressing and should end up using similar code.
It's 100% bigger on an ARM, because there's no LDS equivalent:
;; global
ldr r3, [pc, #8] ; get address of i (6 bytes!)
ldrb r2, [r3, #0] ; get i
adds r2, #6 ; add
strb r2, [r3, #0] ; store
;; local
ldr r3, [sp, #4]
adds r3, #6
str r3, [sp, #4]
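If you want to reproduce this kind of comparison yourself, a test file along these lines should do. This is only a sketch, not the source behind the listings above: the names g_count, bump_global and bump_local are made up, and the local is marked volatile purely as an assumption that (with gcc, at least) this keeps it in a stack slot instead of letting the optimizer hold it in a register.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
volatile uint8_t g_count;            /* global: lives at a fixed address */

void bump_global(void)
{
    g_count += 1;                    /* load, add, store via absolute address */
}

uint8_t bump_local(uint8_t start)
{
    /* Assumption: volatile keeps this in memory, so the compiler has to
       emit a real load and store relative to the frame/stack pointer. */
    volatile uint8_t count = start;
    count += 1;                      /* load, add, store via Y+offset or [sp, #n] */
    return count;
}
```

Compile to assembly with something like "avr-gcc -Os -S" or "arm-none-eabi-gcc -mthumb -Os -S" and compare the two functions. The exact instructions will vary with compiler version and options.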
PIC16F might be able to do the global in 2 instructions, but the compiler I tried implemented the "local variable" version as a global anyway.
PIC18 and "enhanced PIC16F15xx" implement some form of indexed addressing, so they get better at handling a stack, even though there is no "hardware stack" (we remember that stacks are just index registers with some autoincrement/decrement modes, and "stack frames" don't actually need those...)
MSP430 might be able to add immediate values directly to memory, but the instructions start to get long and slow...
And that's without worrying that you need a separate "i" for each subroutine, if you're using globals.
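To make the "separate i" problem concrete, here's a small sketch of what goes wrong when two subroutines share one global loop counter. All the names are invented for the example.

```c
/* Hypothetical example: every name here is made up for illustration. */

static int i;            /* one global counter shared by both loops */
static int work_done;    /* just something for the inner loops to do */

static void inner_global(void)
{
    for (i = 0; i < 3; i++)      /* leaves the shared i at 3 */
        work_done++;
}

int outer_global(void)
{
    int passes = 0;
    for (i = 0; i < 3; i++) {
        inner_global();          /* clobbers the outer loop's counter */
        passes++;
    }
    return passes;               /* 1, not 3 */
}

static void inner_local(void)
{
    int i;                       /* private counter on the stack */
    for (i = 0; i < 3; i++)
        work_done++;
}

int outer_local(void)
{
    int i;                       /* private counter on the stack */
    int passes = 0;
    for (i = 0; i < 3; i++) {
        inner_local();
        passes++;
    }
    return passes;               /* 3, as intended */
}
```

The only way to fix the global version is to give each subroutine its own uniquely named counter, which costs RAM for every one of them all the time, not just while the subroutine is running.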
Argument-wise, if your arguments aren't passed in registers (which is common with modern processors) or on the stack, you eventually end up with the (old) BASIC-like syntax where you move your arguments into the correct globals before each call:
sub1_param1 = thisval;
sub1_param2 = anotherval;
sub1();
(assuming you re-use your subroutines more than once, which ... is part of the idea, you know.) This is obviously no better than moving the values to the stack (and might be worse, as per the above instructions.)
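Spelled out in C, the two calling styles look something like the sketch below. What sub1 actually computes is invented here (the original doesn't say); the point is only that the caller does the same amount of copying either way.

```c
/* Hypothetical subroutine: the arithmetic is made up for illustration. */

/* Style 1: parameters and result passed through dedicated globals. */
static int sub1_param1;
static int sub1_param2;
static int sub1_result;

static void sub1(void)
{
    sub1_result = sub1_param1 * 10 + sub1_param2;
}

int call_with_globals(int thisval, int anotherval)
{
    sub1_param1 = thisval;       /* store to a fixed address */
    sub1_param2 = anotherval;    /* store to a fixed address */
    sub1();
    return sub1_result;          /* load from a fixed address */
}

/* Style 2: ordinary arguments, passed in registers or on the stack. */
static int sub1_args(int param1, int param2)
{
    return param1 * 10 + param2;
}

int call_with_args(int thisval, int anotherval)
{
    return sub1_args(thisval, anotherval);
}
```

Both versions move two values to wherever the subroutine expects them. The global version just does it to fixed addresses that stay allocated forever.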
So, in the absence of an actual example where using globals is more efficient, I claim your entire premise is incorrect!