In the old days of the Z80 and similar CPUs, statics produced much faster code than stack-based variables, but I don't think that's true anymore.
z80 is just SO ANNOYING to program. It shouldn't be worse than 6502, but it pretty much is, because it's just so inconsistent.
On z80 some things you can only do with 8 bit registers. Some things you can only do with register pairs. It's super fast to push any register pair onto the stack (11 cycles) or pop it (10 cycles) (+4 for IX/IY in both cases, as usual, because of the extra byte of opcode). Spilling a register pair to static memory takes significantly longer, at 16 cycles each for load and store with HL and 20 cycles each with the other pairs. Of the 8 bit registers only A can be loaded/stored to a static location, and that takes 13 cycles -- plus 4 more to get it to/from where you really want it. For sequential accesses you can load/store A using (HL), (BC), or (DE) in 7 cycles, then increment/decrement the pointer in 6 cycles -- 13 in total, so no advantage over a static location. The same goes for load/store of B,C,D,E,H,L, but using (HL) only. The indirect load/store plus inc/dec is only 2 bytes of code vs 3 for load/store to a static location (A only, remember), so there is that. But in general you should push/pop pairs whenever possible, and load/store pairs to static locations when not.
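To put the spill options side by side, here is a minimal Python sketch that adds up the save-plus-restore cost of each one. The cycle counts are the ones quoted above; the table and function names are made up for illustration, and the "8-bit via A" row moves only one byte where the others move two.

```python
# Save/restore cycle counts as quoted in the paragraph above.
# The dictionary keys and names are illustrative, not any standard terminology.
SPILL_COSTS = {
    "push/pop pair":       {"save": 11, "restore": 10},
    "static, HL":          {"save": 16, "restore": 16},
    "static, other pairs": {"save": 20, "restore": 20},
    # One byte only: 13 for LD (nn),A or LD A,(nn), plus 4 for a register move.
    "static, 8-bit via A": {"save": 13 + 4, "restore": 13 + 4},
}


def round_trip(kind):
    """Cycles to save a value and later get it back."""
    cost = SPILL_COSTS[kind]
    return cost["save"] + cost["restore"]


if __name__ == "__main__":
    for kind in SPILL_COSTS:
        print(f"{kind:22s} {round_trip(kind):3d} cycles")
```

On these numbers a push/pop round trip is roughly two thirds the cost of spilling HL to a static location, and about half the cost of spilling any other pair.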
Access to something in the middle of the stack is just awful!! First of all, it's definitely 1 byte at a time. But there is no (SP+offset) addressing. There is (IX+$nn) and (IY+$nn) load/store addressing for all of A,B,C,D,E,H,L, but they're 19 cycles per byte! And you need to somehow get SP into IX or IY first. You can move HL,IX,IY *into* SP, but not the reverse. You can add SP to IX or IY in 15 cycles (11 for HL) but that means you need to zero them or get some other constant offset into them first. You can do LD IX,$nnnn in 14 cycles (or IY, or 10 for HL,BC,DE). You can do "XOR A;LD IXL,A;LD IXH,A" (using the undocumented IXL/IXH half-register loads) in 4+8+8=20, so that's a non-starter.
So to load BC with bytes from offsets 10 and 11 from SP you have a choice of "LD HL,$000A;ADD HL,SP;LD B,(HL);INC HL;LD C,(HL)" for 7 bytes and 41 cycles or "LD IX,$0000;ADD IX,SP;LD B,(IX+$0A);LD C,(IX+$0B)" for 12 bytes and 67 cycles.
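Those totals are easy to check with a small Python tally. This is just a sketch: the cycle counts are the ones quoted above, the byte sizes are the usual Z80 encodings, and the list and function names are made up for illustration.

```python
# Each entry is (mnemonic, size in bytes, cycles).
VIA_HL = [
    ("LD HL,$000A", 3, 10),
    ("ADD HL,SP",   1, 11),
    ("LD B,(HL)",   1, 7),
    ("INC HL",      1, 6),
    ("LD C,(HL)",   1, 7),
]

VIA_IX = [
    ("LD IX,$0000",   4, 14),
    ("ADD IX,SP",     2, 15),
    ("LD B,(IX+$0A)", 3, 19),
    ("LD C,(IX+$0B)", 3, 19),
]


def tally(sequence):
    """Return (total bytes, total cycles) for an instruction sequence."""
    total_bytes = sum(size for _, size, _ in sequence)
    total_cycles = sum(cycles for _, _, cycles in sequence)
    return total_bytes, total_cycles


if __name__ == "__main__":
    for name, seq in (("via HL", VIA_HL), ("via IX", VIA_IX)):
        size, cycles = tally(seq)
        print(f"{name}: {size} bytes, {cycles} cycles")
```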
On 6502 you can do the equivalent thing, transferring two bytes from offsets 10 and 11 in the (256 byte) hardware stack to two Zero Page locations (let's say 6&7) using "TSX;LDA $010A,X;STA $06;LDA $010B,X;STA $07" which is 11 bytes and 16 clock cycles -- and I didn't have to think at all about what is the best way to do it ... it's basically the obvious, only way.
If you're not using the very limited hardware stack, but making your own using a pair of Zero Page locations (let's say 8&9) as the stack pointer, then you'd have "LDY #$0A;LDA ($08),Y;STA $06;INY;LDA ($08),Y;STA $07" for (again) 11 bytes but this time 20 clock cycles (21 or 22 in the somewhat unlucky event that one or both LDAs cross a page boundary with the indexing).
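The same kind of Python tally works for the two 6502 sequences. Again only a sketch with made-up names: cycle counts are the ones quoted above, and the page-crossing penalty is modelled simply as one extra cycle per LDA that crosses.

```python
# Each entry is (mnemonic, size in bytes, base cycles).
HW_STACK = [
    ("TSX",         1, 2),
    ("LDA $010A,X", 3, 4),
    ("STA $06",     2, 3),
    ("LDA $010B,X", 3, 4),
    ("STA $07",     2, 3),
]

SOFT_STACK = [
    ("LDY #$0A",    2, 2),
    ("LDA ($08),Y", 2, 5),  # +1 cycle if the indexing crosses a page
    ("STA $06",     2, 3),
    ("INY",         1, 2),
    ("LDA ($08),Y", 2, 5),  # +1 cycle if the indexing crosses a page
    ("STA $07",     2, 3),
]


def tally_6502(sequence, page_crossings=0):
    """Return (total bytes, total cycles); each page crossing costs 1 cycle."""
    total_bytes = sum(size for _, size, _ in sequence)
    total_cycles = sum(cycles for _, _, cycles in sequence) + page_crossings
    return total_bytes, total_cycles


if __name__ == "__main__":
    print("hardware stack:", tally_6502(HW_STACK))
    for crossings in (0, 1, 2):
        print(f"soft stack, {crossings} crossing(s):",
              tally_6502(SOFT_STACK, crossings))
```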
The z80 code has the advantage it can do up to 64k offset into the stack while the 6502 code only does up to a 255 byte offset. That would seldom be a factor.
The 6502 code has the advantage that you effectively have 256 8-bit registers, or 128 16-bit/pointer registers vs the z80's 11 8-bit registers or 5 16-bit/pointer registers.
Another example: add two 8 bit quantities and put the result in a 3rd:
6502: "CLC;LDA $05;ADC $06;STA $07" 7 bytes and 11 cycles
z80 #1: "LD A,B;ADD A,C;LD D,A" 3 bytes and 12 cycles. Very good!
z80 #2: "LD A,($0005);LD B,A;LD A,($0006);ADD A,B;LD ($0007),A" 11 bytes and 47 clock cycles. Ugh!
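For anyone who wants to check the three versions against each other, here is a rough Python comparison. The cycle counts are the ones quoted above, the byte sizes are the usual encodings, and the variant labels and function name are just for illustration.

```python
# variant label -> list of (mnemonic, size in bytes, cycles)
ADD_VARIANTS = {
    "6502, zero page": [
        ("CLC",     1, 2),
        ("LDA $05", 2, 3),
        ("ADC $06", 2, 3),
        ("STA $07", 2, 3),
    ],
    "z80, registers": [
        ("LD A,B",  1, 4),
        ("ADD A,C", 1, 4),
        ("LD D,A",  1, 4),
    ],
    "z80, static memory": [
        ("LD A,($0005)", 3, 13),
        ("LD B,A",       1, 4),
        ("LD A,($0006)", 3, 13),
        ("ADD A,B",      1, 4),
        ("LD ($0007),A", 3, 13),
    ],
}


def totals(variant):
    """Return (total bytes, total cycles) for one variant."""
    seq = ADD_VARIANTS[variant]
    return sum(s for _, s, _ in seq), sum(c for _, _, c in seq)


if __name__ == "__main__":
    for variant in ADD_VARIANTS:
        size, cycles = totals(variant)
        print(f"{variant:20s} {size:2d} bytes {cycles:2d} cycles")
```

On these numbers the memory-to-memory z80 version costs roughly four times the cycles of either the register-only z80 version or the 6502 zero-page version.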
The z80 can have really fast and compact code if you manage to keep everything in its very limited register set. But if you run out and start having to load and store things to RAM then it gets pretty awful pretty quickly.