The problem I have now is the large computational power needed to multiply two streams...
The multiplication is not the problem. AFAIK, the SMLAL instruction takes only one cycle. And gcc does recognize the pattern
int64_t c;
int32_t a, b;
c += (int64_t)a * b;
and emits a SMLAL instruction. The problem is more the necessary glue code to shuffle the data around.
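To make that pattern concrete, here is a minimal sketch of it applied to two whole sample buffers. The function name mac_streams and the buffer layout are my own placeholders, not code from this thread. Whether the compiler really emits SMLAL for it depends on the target and optimization flags, so check the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply-accumulate two sample streams into one 64-bit sum.
 * mac_streams is a hypothetical name used only for this sketch. */
int64_t mac_streams(const int32_t *a, const int32_t *b, size_t n)
{
    assert(a != NULL && b != NULL);

    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen one operand first so the product is computed in 64 bits;
         * this is the 32x32+64 pattern quoted above. */
        acc += (int64_t)a[i] * b[i];
    }
    return acc;
}
```

The loads, pointer increments and loop counter around that single multiply-accumulate are the glue code that dominates the cycle count.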
Note that the clock frequency is only 5x higher than that of your AVR, but you want a 500x higher data rate. And now you expect the more efficient instruction set of the ARM processor to make up for the remaining factor of 100? No, it cannot.
Nevertheless I think that it may be feasible to run the algorithm of your Milliohm meter (single channel) at 4 MSa/s, if you do it smartly. Collect the samples with double-buffered DMA, and process all samples in a received DMA buffer at once. At that speed it is no longer possible to service an interrupt for each sample. That's way too much overhead per sample.
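A rough sketch of the double-buffer idea, written as plain C without any vendor library so it can be tried on a PC. The names adc_buf, HALF_LEN, process_block and the two on_dma_... hooks are hypothetical, and the summing in process_block is only a stand-in for the real algorithm. The assumption is a circular DMA transfer into one array, with one interrupt when the first half is full and one when the second half is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_LEN 256u  /* samples per half buffer (assumed value) */

/* Circular DMA target: the hardware fills one half while the CPU
 * works on the other half. */
static volatile uint16_t adc_buf[2u * HALF_LEN];

static int64_t  block_sum;    /* placeholder result of the processing */
static uint32_t blocks_done;  /* number of half buffers processed */

/* Process one complete half buffer in a single pass. */
static void process_block(const volatile uint16_t *src, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += src[i];
    }
    block_sum += acc;
    blocks_done++;
}

/* Hypothetical hooks, to be called from the DMA half-transfer and
 * transfer-complete interrupts: two interrupts per 512 samples
 * instead of one interrupt per sample. */
void on_dma_half_transfer(void)     { process_block(&adc_buf[0], HALF_LEN); }
void on_dma_transfer_complete(void) { process_block(&adc_buf[HALF_LEN], HALF_LEN); }
```

Each block must be finished before the DMA wraps around and starts overwriting that half again, so the per-block processing time has to stay below the time it takes to fill one half.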
If you look here, you can see that the innermost loop (label .L2, which is executed for each sample) has 11 instructions, and I think that's 14...17 clock cycles. The code outside the loop (which uses significantly more cycles) runs only once per 256 samples, so let's assume ~1 extra cycle per sample. At 80 MHz clock and 4 MSa/s, you can spend 20 cycles/sample. So I think that's in the ballpark.
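The budget estimate above, written out as a small check. The helper names cycles_per_sample and fits_budget are made up for this sketch, and the loop cost you feed in is still only an estimate until it has been measured on the real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles available for each incoming sample. */
uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz != 0u);
    return cpu_hz / sample_rate_hz;
}

/* Returns nonzero if the per-sample loop cost plus the amortized
 * per-block overhead fits into the available budget. */
int fits_budget(uint32_t cpu_hz, uint32_t sample_rate_hz,
                uint32_t loop_cycles, uint32_t overhead_cycles)
{
    return loop_cycles + overhead_cycles
           <= cycles_per_sample(cpu_hz, sample_rate_hz);
}
```

With 80 MHz and 4 MSa/s this gives 20 cycles per sample, so the pessimistic 17 + 1 cycles still fit, but without much margin.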
And if you decimate before demodulation (see here), as Kleinstein suggested, then the overhead per sample becomes even smaller. Here, the loop at label .L2 now processes 8 samples at each iteration.
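As an illustration of decimating first, here is a sketch that sums each group of 8 samples and only then does one multiply-accumulate with the reference. It is not the linked code; the name demod_decimated, the plain boxcar sum as decimation filter, and a reference table stored at the decimated rate are all my assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DECIM 8u  /* decimation factor, matching the 8 samples per iteration */

/* Sum each group of DECIM raw samples, then multiply the group sum by
 * one reference value and accumulate in 64 bits.
 * ref must hold n / DECIM entries. */
int64_t demod_decimated(const uint16_t *samples, size_t n, const int32_t *ref)
{
    assert(n % DECIM == 0u);

    int64_t acc = 0;
    for (size_t i = 0; i < n / DECIM; i++) {
        const uint16_t *s = &samples[i * DECIM];
        int32_t sum = 0;
        for (size_t k = 0; k < DECIM; k++) {
            sum += s[k];  /* cheap additions at the full sample rate */
        }
        acc += (int64_t)sum * ref[i];  /* one multiply per 8 samples */
    }
    return acc;
}
```

This only makes sense if the reference changes slowly compared to 8 sample periods; otherwise the boxcar sum attenuates the signal you want to demodulate.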
I have placed an order for another board, a NUCLEO-G431KB, which runs at more than twice the speed (170MHz) and has 2 ADCs and 2 DACs.
I'm unsure if the amount of RAM is sufficient. It depends of course on the exact algorithms you want to run at the end. It may be possible to put constant tables in flash, but I'm not sure if data access to flash also has zero wait states.
Btw, see also:
https://wiki.st.com/stm32mcu/wiki/Getting_started_with_ADC
https://www.st.com/resource/en/application_note/an2834-how-to-optimize-the-adc-accuracy-in-the-stm32-mcus-stmicroelectronics.pdf