While we're on the subject, can an IO line really toggle at 1 MHz to start the samples?
Absolutely. Even basic STM32F0s can do this. If you move up to the F4/F7/H7 range, you can probably expect to toggle an I/O at around 40 MHz.
What is the limit of the uC SPI clock?
Quite a lot of simple micros can do 15 MHz or more, and a lot of mid-range ones can do 40 MHz.
A single-chip solution like the STM32H7 series, with two built-in 3.6 MSPS 16-bit ADCs with up to 20 channels each, should be quite adequate as a simpler one-chip-does-all solution.
(I had earlier thought there was only one ADC, but I now realise there are two that can both operate at 3.6 MSPS. That is 7.2 MSPS in total, so you should be able to read 4 channels at up to 1.8 MSPS each at 16-bit.)
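To make the arithmetic behind that figure explicit, here is a minimal sketch. The helper name `per_channel_sps` is made up for illustration, and it assumes the aggregate rate is shared evenly across channels with no sampling-time or channel-switching overhead, which a real part will not quite achieve.

```c
#include <stdint.h>

/* Hypothetical helper: ideal per-channel sample rate when `channels`
 * inputs are spread evenly over `adc_count` converters that each run
 * at `adc_sps` samples per second.
 * Assumption: no mux settling or sampling-time overhead. */
static uint32_t per_channel_sps(uint32_t adc_count, uint32_t adc_sps,
                                uint32_t channels)
{
    if (channels == 0) {
        return 0; /* avoid division by zero */
    }
    /* 64-bit intermediate so the product cannot overflow */
    return (uint32_t)(((uint64_t)adc_count * adc_sps) / channels);
}
```

With two ADCs at 3,600,000 SPS and four channels, this works out to 1,800,000 SPS per channel, which is the ideal upper bound rather than a guaranteed figure.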
With the ADCs in DMA reception mode, all the processor has to do is read the raw conversion data out of memory and do the math on it, which the H7 should be more than capable of: it has a 400 MHz core with hardware single- and double-precision floating-point units.
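As a rough idea of what "do the math" could look like once the DMA has filled a buffer, here is a minimal sketch in plain C. It deliberately uses no vendor HAL calls. The function names, the reference voltage parameter and the right-aligned unsigned 16-bit sample format are all assumptions for illustration, not something taken from a datasheet.

```c
#include <stddef.h>
#include <stdint.h>

#define ADC_FULL_SCALE 65535.0f /* largest 16-bit code (assumed unsigned) */

/* Convert one raw 16-bit code to volts for a given reference voltage. */
static float adc_code_to_volts(uint16_t code, float vref)
{
    return ((float)code / ADC_FULL_SCALE) * vref;
}

/* Average a block of raw samples that the DMA has already written to
 * memory and return the result in volts. `buf` is volatile because
 * hardware, not the CPU, writes to it. */
static float adc_block_mean_volts(const volatile uint16_t *buf, size_t n,
                                  float vref)
{
    uint64_t sum = 0;

    if (n == 0) {
        return 0.0f; /* nothing captured yet */
    }
    for (size_t i = 0; i < n; ++i) {
        sum += buf[i];
    }
    /* Divide once at the end so the loop stays integer-only. */
    return ((float)sum / (float)n / ADC_FULL_SCALE) * vref;
}
```

In a real project you would typically call something like this from the DMA half-transfer and transfer-complete callbacks, working on whichever half of the buffer is not currently being written.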
Then, depending on the data rate required to a host, either use the built-in USB FS device in virtual COM port mode or, if higher data rates are required, hook up an external USB HS PHY.
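To get a feel for which side of that decision you land on, here is a small sketch that works out the raw stream rate and compares it with the USB signalling rates (12 Mbit/s for full speed, 480 Mbit/s for high speed). The function names are made up for illustration, and the comparison is against the raw bus rate only. Usable throughput after protocol overhead is lower, so treat a "fits" result as optimistic.

```c
#include <stdint.h>

#define USB_FS_BITS_PER_SEC 12000000ULL  /* full-speed signalling rate */
#define USB_HS_BITS_PER_SEC 480000000ULL /* high-speed signalling rate */

/* Raw payload rate, in bits per second, for streaming every sample. */
static uint64_t stream_bits_per_sec(uint32_t channels,
                                    uint32_t sps_per_channel,
                                    uint32_t bits_per_sample)
{
    return (uint64_t)channels * sps_per_channel * bits_per_sample;
}

/* Returns 1 if the stream is no faster than the raw bus rate.
 * Assumption: ignores USB protocol overhead, so this is a best case. */
static int fits_on_bus(uint64_t required_bps, uint64_t bus_bps)
{
    return required_bps <= bus_bps;
}
```

For example, 4 channels at 1.8 MSPS and 16 bits per sample is 115.2 Mbit/s, which is far beyond full speed but within high speed. If you only send processed results to the host rather than every raw sample, the requirement drops accordingly.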
Now, so that I don't seem biased towards ST products (they are just what I use day to day!), here are a couple of other options:
PIC24FJ128GC006 - 2x differential 16-bit ADCs @ 1 MSPS, and a 10 MSPS 12-bit 50-channel ADC. Quite a bit slower processor speed (32 MHz).
ATSAM4E16C - 2x 16-bit ADCs @ 1 MSPS.
LPC55S6x - new, not much info yet, but it seems to have a 16-bit 1 MSPS ADC. Not sure on channel count etc.