I think CIC could be faster, but it depends whether you implement a clean-up FIR at the end or not. Also, it requires fixed point (the efficiency derives in part from the overflows happening the right way). Definitely less coef storage though!
Not necessarily... Following what Richard Lyons writes in his very good book, I have implemented a polyphase CIC in my ARM Radio project, all in floating point. The following is the comment at the beginning of the CIC block:
//-------------------------------------------------------------------------
// Now we decimate by 16 the input samples, using the CIC polyphase decomposition
// technique, which has the advantage of eliminating the recursive
// component, allowing the use of floating point, rather fast on a Cortex M4F
//
// A decimate-by-16, order-4 CIC is used. Then a 4096-entry buffer is filled and passed
// to the baseband interrupt routine, where it is additionally filtered with a
// sinc-compensating FIR, which also adds further stopband rejection and a decimation by 4
//-------------------------------------------------------------------------
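For readers who have not met the technique, here is a minimal sketch of a non-recursive order-4, decimate-by-16 CIC in floating point. It is not the ARM Radio source. It assumes the usual factorization of the CIC transfer function into four cascaded decimate-by-2 stages, each with taps 1, 4, 6, 4, 1. All names (`cic16_t`, `cic16_push`, and so on) are hypothetical, and the 1/16 scaling per stage is an illustrative choice that gives unity DC gain.

```c
#include <stddef.h>

#define CIC_STAGES 4            /* 2^4 = 16 total decimation (assumed layout) */

/* One decimate-by-2 stage implementing (1 + z^-1)^4 without any integrator. */
typedef struct {
    float d[4];                 /* previous four inputs, d[0] is the newest */
    int   phase;                /* toggles 0/1; an output is produced on 1  */
} half_stage_t;

typedef struct {
    half_stage_t st[CIC_STAGES];
} cic16_t;

/* Push one sample into a stage. Returns 1 and writes *y on every second call. */
static int half_stage_push(half_stage_t *s, float x, float *y)
{
    int ready = 0;
    if (s->phase) {
        /* (1 + z^-1)^4 = 1 + 4z^-1 + 6z^-2 + 4z^-3 + z^-4, scaled by 1/16 */
        *y = (x + 4.0f * s->d[0] + 6.0f * s->d[1]
                + 4.0f * s->d[2] + s->d[3]) * (1.0f / 16.0f);
        ready = 1;
    }
    /* Shift the delay line; the skipped phase costs no multiplies. */
    s->d[3] = s->d[2];
    s->d[2] = s->d[1];
    s->d[1] = s->d[0];
    s->d[0] = x;
    s->phase ^= 1;
    return ready;
}

/* Push one input sample into the cascade.
   Returns 1 and writes *y once every 16 input samples, otherwise returns 0. */
int cic16_push(cic16_t *c, float x, float *y)
{
    float v = x;
    for (size_t i = 0; i < CIC_STAGES; i++) {
        if (!half_stage_push(&c->st[i], v, &v))
            return 0;           /* this stage has no output yet */
    }
    *y = v;
    return 1;
}
```

A zero-initialized `cic16_t` is a valid starting state. Because stage i runs at the input rate divided by 2^i, the cascade is equivalent to (1 + z^-1 + ... + z^-15)^4 at the input rate, which is the order-4, R = 16 CIC response with no recursive part and therefore no reliance on fixed-point wrap-around.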
I cannot publish the source code (yet), as this project is my entry in the Keil/ARM design contest (see http://www2.keil.com/mdk5/contest), but after the contest's end date it will be published as open source.
The STM32F429ZIT chip samples with two of its ADCs in interleaved mode. The samples are then complex-multiplied with an LO signal generated by a quadrature complex oscillator (again, thanks to Richard Lyons...), which brings the signal to zero IF. It is then downsampled first with the CIC, then with an FIR that also compensates for the droop of the CIC response curve (actually very small...).
At this point the downsampled complex signal, at a sampling rate of 27901.786 Hz (plus or minus the tolerance of the 8 MHz quartz of the STM32F4 clock...), is bandpass filtered with the fast convolution method (overlap-and-discard) with a selectable bandwidth, then applied to the AM, SSB, or CW demodulator, with a selectable AGC time constant. The output of the demodulator is sent to the on-chip DAC, using DMA with two flip-flop buffers, and the output of the DAC is sent to an external hardware reconstruction filter. That's all.
The project is finished and all working, and I am now busy writing the documentation, the least pleasant step, but one required by the rules of the contest.
This is a bad photo of the on-board TFT screen, also driven by the STM32F429ZIT processor, with the ARM Radio tuned to the DCF77 time/frequency standard in Mainflingen, Germany:
I also intend to shoot a short YouTube video, but haven't found the time yet...
Alberto