I second SiliconWizard's suggestion to utilize the FIFOs whenever this is possible. I have written some interrupt-driven state machines which are very efficient but also very robust.
For example, when the timing is deterministic and being late is a fatal error (indicating that something is broken and requires a shutdown, or at least complex recovery attempts), you can turn the classic idea of polling a completion flag with a timeout upside down: use a timer to trigger the next state instead, and then in the timer interrupt own_assert() the availability (and correct number!) of data and the absence of any error flags. Let's say with SPI, one state needs to write 5 bytes; it would do two FIFO write accesses, one 32-bit and one 8-bit, then set a timer to trigger the next interrupt. The next ISR would then look at the FIFO fill level (and error flags) in the status register, and error out if the level is anything other than 5 bytes (on STM32 SPI, for example, this requires checking a strange combination of flags, but is doable). Then it would do two FIFO read operations, again 32-bit and 8-bit. DMA would win only when the data doesn't fit in the FIFOs, or when the length starts going over some 16 bytes.
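To make the idea concrete, here is a rough host-side sketch of that two-state sequence. Everything in it is made up for illustration: `sim_spi_t` is a simulated loopback peripheral rather than a real register map, and `xfer_start()`, `xfer_timer_isr()` and `sim_spi_clock()` are hypothetical names. On real hardware the fill level and error check would come from the status register instead of struct fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define XFER_LEN 5u

/* Hypothetical simulated peripheral: loopback SPI with small FIFOs. */
typedef struct {
    uint8_t  tx_fifo[8], rx_fifo[8];
    unsigned tx_level, rx_level;   /* bytes currently held in each FIFO */
    bool     error_flag;           /* all error conditions lumped together */
    bool     timer_armed;          /* stand-in for a one-shot timer */
} sim_spi_t;

typedef enum { ST_IDLE, ST_WAIT_REPLY, ST_DONE, ST_FAULT } xfer_state_t;

typedef struct {
    sim_spi_t   *spi;
    xfer_state_t state;
    uint8_t      reply[XFER_LEN];
} xfer_t;

/* One FIFO write access of n bytes (n = 4 or 1), lowest byte first. */
static void fifo_write(sim_spi_t *s, uint32_t v, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        s->tx_fifo[s->tx_level++] = (uint8_t)(v >> (8u * i));
}

/* One FIFO read access of n bytes, lowest byte first. */
static uint32_t fifo_read(sim_spi_t *s, unsigned n)
{
    assert(n <= s->rx_level);
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint32_t)s->rx_fifo[i] << (8u * i);
    memmove(s->rx_fifo, s->rx_fifo + n, s->rx_level - n);
    s->rx_level -= n;
    return v;
}

/* Simulation only: clock out up to n bytes, looping TX back into RX. */
void sim_spi_clock(sim_spi_t *s, unsigned n)
{
    if (n > s->tx_level)
        n = s->tx_level;
    memcpy(s->rx_fifo + s->rx_level, s->tx_fifo, n);
    s->rx_level += n;
    memmove(s->tx_fifo, s->tx_fifo + n, s->tx_level - n);
    s->tx_level -= n;
}

/* State 1: two FIFO writes (32-bit + 8-bit), then arm the timer. */
void xfer_start(xfer_t *x, sim_spi_t *s, uint32_t first4, uint8_t last)
{
    memset(s, 0, sizeof *s);
    x->spi = s;
    fifo_write(s, first4, 4);
    fifo_write(s, last, 1);
    s->timer_armed = true;
    x->state = ST_WAIT_REPLY;
}

/* State 2, run from the timer interrupt: no polling, just verify. */
void xfer_timer_isr(xfer_t *x)
{
    sim_spi_t *s = x->spi;
    s->timer_armed = false;
    /* Being late or short is treated as fatal, not as "wait longer". */
    if (x->state != ST_WAIT_REPLY || s->error_flag || s->rx_level != XFER_LEN) {
        x->state = ST_FAULT;
        return;
    }
    uint32_t w = fifo_read(s, 4);            /* 32-bit FIFO read */
    for (unsigned i = 0; i < 4; i++)
        x->reply[i] = (uint8_t)(w >> (8u * i));
    x->reply[4] = (uint8_t)fifo_read(s, 1);  /* 8-bit FIFO read */
    x->state = ST_DONE;
}
```

In the simulation you would call `xfer_start()`, then `sim_spi_clock(&spi, 5)` to pretend the bus ran, then `xfer_timer_isr()`. If only 4 bytes were clocked, the ISR ends in `ST_FAULT` instead of waiting around, which is the whole point.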
Depending on a certain FIFO size can be a portability problem though. For example, on many STM32 devices, different instances of a peripheral on the same device (SPI1 vs. SPI2 and so on) have different-sized FIFOs, and the differences get even bigger when migrating between devices.
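One way to keep that assumption from biting silently is to write the FIFO depth down per instance and check the transfer length against it, at compile time where possible. A minimal sketch follows; the bus names and depth values are placeholders I made up, not figures from any datasheet, so fill them in from the reference manual of the actual part.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder depths in bytes: NOT real device data, look them up. */
#define BUS_A_FIFO_BYTES 8u
#define BUS_B_FIFO_BYTES 4u

#define XFER_LEN 5u

/* Compile-time guard for the instance this driver is built for (C11). */
_Static_assert(XFER_LEN <= BUS_A_FIFO_BYTES,
               "5-byte transfer does not fit the FIFO of this instance");

typedef enum { BUS_A, BUS_B, BUS_COUNT } bus_id_t;

static const unsigned fifo_depth_bytes[BUS_COUNT] = {
    [BUS_A] = BUS_A_FIFO_BYTES,
    [BUS_B] = BUS_B_FIFO_BYTES,
};

/* Run-time guard for code that picks the instance dynamically. */
bool xfer_fits_fifo(bus_id_t bus, unsigned len)
{
    return bus < BUS_COUNT && len <= fifo_depth_bytes[bus];
}
```

With these made-up numbers, `xfer_fits_fifo(BUS_A, 5)` is true and `xfer_fits_fifo(BUS_B, 5)` is false, so moving the same state machine to the smaller instance fails loudly instead of losing bytes.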