Update... it wasn't the FIFO after all. It was the SPI interface, which, ironically, runs much slower. The FPGA sets up MISO on falling edges of the clock and the CPU samples it on rising edges, so it should 'just work' provided the clock isn't too fast. Except it was.
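As a rough sketch of why "too fast" bites, assuming a 50% duty cycle and ignoring hold time and clock jitter, the whole read path has to fit into half an SCLK period. The falling edge travels from the CPU to the FPGA, the FPGA launches MISO, and MISO travels back and must meet the CPU's setup time before the next rising edge. The symbol names are illustrative labels, not taken from any datasheet:

```latex
\begin{aligned}
\Sigma t &= t_{\mathrm{pcb,SCLK}} + t_{\mathrm{in}} + t_{\mathrm{co}} + t_{\mathrm{out}} + t_{\mathrm{pcb,MISO}} + t_{\mathrm{su,CPU}} \;\le\; \frac{T_{\mathrm{SCLK}}}{2} \\
f_{\mathrm{SCLK,max}} &= \frac{1}{2\,\Sigma t}
\end{aligned}
```

Here t_pcb are the board trace delays, t_in is the FPGA's path from the SCLK pin to the MISO register's clock, t_co is that register's clock-to-output delay, t_out is the path from the register to the MISO pin, and t_su,CPU is the CPU's setup requirement. Every nanosecond removed from the on-chip terms (t_in, t_co, t_out) is a nanosecond of extra setup margin at the CPU.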
It turns out that the delays getting signals on and off the chip vary significantly depending on which pin carries the clock and/or where the output signal is registered.
I gained about 5 ns of additional setup time at the CPU by registering MISO in the I/O ring of the Efinix FPGA and moving the SPI clock to one of the pins that supports the GCLK input type. The clock move isn't optional: the registers in the I/O ring can't accept a clock from just anywhere, so the clock has to come in on a GCLK-capable pin before they can use it.
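To put numbers on the half-period budget, here's a small Python sketch. Every delay figure in it is a hypothetical placeholder, chosen only so that the two cases differ by roughly the 5 ns mentioned above. They are not measured values or Efinix datasheet numbers, and the split of the saving between the clock path and the MISO path is invented. The function names are illustrative too.

```python
# Rough SPI read-path timing budget: the FPGA drives MISO on the falling edge
# of SCLK and the CPU samples it on the next rising edge, so the whole round
# trip must fit in half an SCLK period (50% duty cycle assumed).
# All delay figures are HYPOTHETICAL placeholders, not datasheet values.


def round_trip_ns(delays_ns: dict[str, float]) -> float:
    """Total delay from the SCLK falling edge at the CPU pin to MISO meeting CPU setup."""
    return sum(delays_ns.values())


def slack_ns(sclk_mhz: float, delays_ns: dict[str, float]) -> float:
    """Positive slack means MISO arrives in time for the CPU's rising-edge sample."""
    half_period_ns = 1000.0 / sclk_mhz / 2.0
    return half_period_ns - round_trip_ns(delays_ns)


def max_sclk_mhz(delays_ns: dict[str, float]) -> float:
    """SCLK frequency at which the slack is exactly zero."""
    return 1000.0 / (2.0 * round_trip_ns(delays_ns))


# Hypothetical "before": SCLK on an ordinary pin, MISO registered in the core
# and passed through an I/O buffer configured as transparent.
before = {
    "board_sclk": 1.0,  # CPU pin -> FPGA pin
    "sclk_in": 5.0,     # FPGA SCLK pin -> MISO register clock
    "clk_to_out": 1.0,  # register clock-to-output
    "miso_out": 9.0,    # register -> FPGA MISO pin
    "board_miso": 1.0,  # FPGA pin -> CPU pin
    "cpu_setup": 3.0,   # CPU setup requirement on MISO
}

# Hypothetical "after": SCLK on a GCLK-capable pin, MISO registered in the
# I/O ring; 5 ns saved in total, split arbitrarily between the two paths.
after = dict(before, sclk_in=3.0, miso_out=6.0)

for name, d in (("before", before), ("after", after)):
    print(f"{name}: round trip {round_trip_ns(d):.1f} ns, "
          f"max SCLK {max_sclk_mhz(d):.1f} MHz, "
          f"slack at 30 MHz {slack_ns(30.0, d):+.1f} ns")
```

With these made-up figures, the 5 ns saving lifts the maximum SCLK from 25 MHz to about 33 MHz, and at 30 MHz the "before" case has negative slack while the "after" case passes. The same arithmetic applies to real numbers taken from the FPGA tool's timing report and the CPU's datasheet.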
I know it's early days and I'm still learning, but it's also quite apparent that there are hardware limitations in the Efinix architecture compared to the Altera parts I've been using for many years. The separate I/O ring is a PITA, frankly, and I can't help but feel that such a new device architecture should have fewer limitations, not more, compared to one that's been around since 2009 (Cyclone IV).
I/O pins can be configured as transparent, but there's significant propagation delay through the ring, which means it may be necessary to modify the core logic and use the registers in the I/O ring instead, just to meet timing. Portability definitely takes a hit: my VHDL code is now logically different from before and, with the MISO register moved out into the I/O ring, no longer fully describes the behaviour of the physical device.
It's not entirely clear whether the sharp distinction between the core logic and the I/O ring is truly architectural (i.e. the chip itself is physically different from other FPGAs in this respect) or just a software thing. Why should the synthesis tool NOT be able to see that a logic output is registered and use the register physically located in the I/O ring instead? Why does the option even exist to register it in the core and then add the delay of an I/O buffer configured in transparent mode?
Waiting patiently (ish) for a new version of the Efinity toolchain with a tick box for "configure I/O clocks & registers automatically to make best use of them given the design of the core logic".