I'd love to see your sound camera in action. Do you have a link to your project? I didn't know the Arduino boards could push 1 Msps! Is this per channel? Through the Arduino IDE, or are you poking at registers directly?
I don't know how to calculate sps, but maybe you're talking about the sampling frequency. In that case I'm sampling at 10 kHz.
As for the FFT timing on the ESP32: one 512-bin FFT takes 14 ms, and the capture takes 10 ms at 10 kHz. 40 kHz is the sampling limit using a delayed loop of analogRead.
Someone bypassed the register safety lock and managed a 25 MHz output, so fast input sampling must be possible. I must say I don't yet understand how it's done, or what the impact would be on using a second core for capturing (which is what I'm doing).
I haven't used DMA, but I read that the ESP32 has DMA as well. I don't know if it's the same kind of access as on your Due, though.
I don't have a video, it's lost. Visualization was done on an Android tablet; I used BT to transfer the processed, filtered and sorted data. 1 megasample per second via direct register programming; the Arduino IDE is sloppy. One ADC, 1 Msps across all channels.
A stream sample rate only makes sense with real-time data processing, when ADC conversion runs continuously in the background using DMA or interrupts. The FFT should be computed faster than the samples arrive: 14 ms for a 512-point FFT means you could theoretically sustain 512 / 0.014 s ≈ 36571 Hz sampling, not counting overhead for data management.
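To make that arithmetic easy to replay with other numbers, here is a minimal Python sketch. It assumes one FFT per non-overlapping 512-sample block and ignores all data-management overhead; the function names are made up for illustration and do not come from any ESP32 or Arduino library.

```python
def max_sample_rate_hz(fft_size, fft_time_s):
    """Highest sample rate at which one FFT still finishes before the
    next block of fft_size samples has been captured (no overhead)."""
    return fft_size / fft_time_s


def keeps_up(fft_size, sample_rate_hz, fft_time_s):
    """True if the FFT takes no longer than capturing one block."""
    block_time_s = fft_size / sample_rate_hz
    return fft_time_s <= block_time_s


# Numbers quoted in this thread: 512-point FFT, 14 ms per FFT.
limit = max_sample_rate_hz(512, 14e-3)
print(f"theoretical limit: {limit:.0f} Hz")
print("keeps up at 10 kHz:", keeps_up(512, 10_000, 14e-3))
print("keeps up at 40 kHz:", keeps_up(512, 40_000, 14e-3))
```

With these figures the 10 kHz stream leaves plenty of headroom, while the 40 kHz analogRead limit would already outrun a 14 ms FFT.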
What does that mean, real-FFT? Does a non-real part exist as well? And is "combine signal" a new mathematics term? Sorry for my limited vocabulary, I know + - * /, never heard of combine.
Most FFT implementations are complex<->complex; what we are generally interested in for DSP is a real->complex FFT and a complex->real iFFT. You can use a complex FFT to do two real FFTs, but it's a headache.
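For anyone curious what the "two real FFTs from one complex FFT" trick looks like, here is a rough Python sketch. It is not taken from any particular library: `fft` is a plain recursive radix-2 transform written only for this illustration, and `two_real_ffts` is a hypothetical helper name. The idea is to pack one real signal into the real part and the other into the imaginary part, transform once, then separate the two spectra using conjugate symmetry.

```python
import cmath


def fft(x):
    """Recursive radix-2 complex FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def two_real_ffts(a, b):
    """Transform two equal-length real sequences with one complex FFT."""
    n = len(a)
    # Pack a into the real part and b into the imaginary part.
    z = fft([complex(ar, br) for ar, br in zip(a, b)])
    fa, fb = [], []
    for k in range(n):
        zc = z[(n - k) % n].conjugate()
        fa.append((z[k] + zc) / 2)    # spectrum of a
        fb.append((z[k] - zc) / 2j)   # spectrum of b
    return fa, fb


a = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
b = [0.5, 0.0, -0.5, 1.0, 2.0, 0.0, 1.5, -1.0]
fa, fb = two_real_ffts(a, b)
# Largest deviation from transforming each signal separately.
print(max(abs(x - y) for x, y in zip(fa, fft(a))))
print(max(abs(x - y) for x, y in zip(fb, fft(b))))
```

Each result has the same number of bins as a direct FFT of that signal; the unpacking step is the "headache" part.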
It's an old myth that we can save time by not doing math on the empty imaginary part of the input data. I did some research, and it turns out that the CPU clock savings come with only half the frequency resolution, so all this theory defining a real-FFT is complete BS. There are many proven optimization techniques that don't sacrifice frequency resolution, like using a higher radix or split-radix, but it all depends on the specific uCPU and its instruction set: availability of MAC, MULT vs ADD performance, etc.