I had a similar problem on an STM32F0, using DMA to drive a DAC that needed exactly what you've just described.
The easiest solution was to have the DMA do a single write of one 24-bit DAC word (3 bytes), then have the transfer-complete interrupt run a tuned routine that toggled the GPIO pin used for _CS. The key was to write this ISR as a naked function in assembly. It then took only a few CPU cycles to write the GPIO set and reset registers with a literal, re-enable the DMA for another 3-byte transfer, and return.
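For illustration, here is a minimal C sketch of the sequence that handler performs. It is not the original assembly ISR. It assumes the DAC is fed by a hypothetical DMA channel 3 and that _CS sits on pin 4 of its port. `dac_dma_tc_isr`, `cs_port`, `dac_dma`, `CS_PIN` and the `*_model_t` structs are hypothetical host-side stand-ins for the memory-mapped `BSRR`, `IFCR`, `CCR` and `CNDTR` registers, so the order of operations can be compiled and checked off-target. Two things are deliberately left out: how the next sample's address gets into the channel's memory-address register, and any wait for the SPI shift register to drain before _CS goes high. The sketch assumes the interrupt fires late enough that the last byte has already left the shift register.

```c
#include <stdint.h>

/* Hypothetical host-side models of the registers the ISR touches. On the
   STM32F0 these are memory-mapped GPIO and DMA registers. */
typedef struct {
    uint32_t odr;      /* pin output state, as the hardware would update it */
    uint32_t log[4];   /* values written to BSRR, in order */
    int nlog;
} gpio_model_t;

typedef struct {
    uint32_t ccr;      /* channel control; bit 0 = EN */
    uint32_t cndtr;    /* number of data items left to transfer */
    uint32_t ifcr;     /* last value written to the flag-clear register */
} dma_model_t;

#define CS_PIN          (1u << 4)  /* hypothetical: _CS on pin 4 of its port */
#define DMA_CCR_EN      (1u << 0)  /* channel enable bit in DMA_CCRx */
#define DMA_IFCR_CTCIF3 (1u << 9)  /* clear transfer-complete flag, channel 3 */
#define DAC_FRAME_BYTES 3u         /* one 24-bit DAC word = three bytes */

static gpio_model_t cs_port;
static dma_model_t  dac_dma;

/* BSRR semantics: low half sets pins, high half resets them; set wins if both. */
static void bsrr_write(gpio_model_t *g, uint32_t v)
{
    if (g->nlog < 4) g->log[g->nlog++] = v;
    g->odr = (g->odr & ~(v >> 16)) | (v & 0xFFFFu);
}

/* CNDTR can only be written while the channel is disabled; the model
   ignores writes made with EN set. */
static void cndtr_write(dma_model_t *d, uint32_t n)
{
    if (!(d->ccr & DMA_CCR_EN)) d->cndtr = n;
}

/* Transfer-complete handler: a short, fixed sequence of register writes. */
void dac_dma_tc_isr(void)
{
    dac_dma.ifcr = DMA_IFCR_CTCIF3;      /* acknowledge the interrupt */
    bsrr_write(&cs_port, CS_PIN);        /* _CS high: end of the 24-bit frame */
    bsrr_write(&cs_port, CS_PIN << 16);  /* _CS low: select DAC for next word */
    dac_dma.ccr &= ~DMA_CCR_EN;          /* disable so CNDTR can be reloaded */
    cndtr_write(&dac_dma, DAC_FRAME_BYTES);
    dac_dma.ccr |= DMA_CCR_EN;           /* start the next 3-byte transfer */
}
```

Every write is a constant, which is why the assembly version can load literals and store them directly without touching the stack.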