SPI peripherals with FIFOs are indeed the best thing since sliced bread!
Not only do they increase performance, they also make code simpler to write and easier to read. Of course, the downside is that if you write code assuming you have those FIFOs, you pretty much have to rewrite it when you port it to something without them.
Example:
Reading three 16-bit values from an accelerometer:
In an ISR, write 7 bytes to the SPI data register (DR): the command byte and 6 dummy bytes. That takes only 3 memory accesses: a word, a halfword and a byte.
Set up a timer or an SPI interrupt (either works) to give you another interrupt once all 7 bytes are in the RX FIFO.
In the second ISR, read the 7 bytes from the DR (again 3 memory accesses) and do whatever you want with the data.
With DMA, you would still need an interrupt to process the data, and you would still have to read it from memory. So in this case, accessing the SPI directly through the FIFO is just as efficient.
Without a FIFO, you have to take an interrupt for every damn byte or use DMA, and even DMA chews up quite a bit of memory bandwidth because every byte is a separate DMA transfer. With a FIFO, both DMA and interrupt-driven approaches are viable, and having more options never hurts.