I haven't worked with the RP series MCUs, but I read in their advertising that they have a single-cycle 32-bit MAC and two-cycle float64 operations. So such a chip can write at least 32 bits per cycle. At 150 MHz it can produce up to 4800 Mbps (600 MB/s) with simple bit-banging... It could even do something more complicated and useful, like applying a filter to a realtime stream on the fly.
But with no high-speed communication interface, all that speed is almost useless...
What is the reason for two-cycle float64 operations or a single-cycle MAC if there is no way to get data in at more than 12 Mbps? And no way to get processed data out of the chip at more than 12 Mbps either...
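For reference, the arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: the clock speed, the 32 bits per cycle and the 12 Mbps link rate are the numbers quoted in the post, not values taken from a datasheet, and the function name is made up for illustration.

```python
# Figures as quoted in the post (assumptions, not datasheet values).
CLOCK_HZ = 150_000_000           # 150 MHz core clock
BITS_PER_CYCLE = 32              # best case: one 32-bit write per cycle
LINK_BITS_PER_S = 12_000_000     # 12 Mbps link

def peak_bits_per_s(clock_hz, bits_per_cycle):
    """Best-case raw output rate in bits per second (hypothetical helper)."""
    return clock_hz * bits_per_cycle

peak = peak_bits_per_s(CLOCK_HZ, BITS_PER_CYCLE)
ratio = peak / LINK_BITS_PER_S

print(f"peak: {peak / 1e6:.0f} Mbps = {peak / 8e6:.0f} MB/s")
print(f"that is {ratio:.0f}x the 12 Mbps link")
```

So the best-case figure is 4800 megabits per second (600 megabytes per second), about 400 times what a 12 Mbps link can carry.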
Lots and lots of things.
For starters, MicroPython (which the Pi Foundation encourages people to use) is an interpreter, so it is not going to set any speed records.
Some tasks, even in C/C++, need quite a lot of code, maybe hundreds or thousands of lines or more, to run the algorithms and do all the calculations.
In that case, they probably would not handle, or need, that much data throughput going in and out.
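To put rough numbers on that, here is a small Python sketch of how the sustained data rate drops as the work per 32-bit word grows. The 150 MHz clock and 12 Mbps link are the figures from this thread; the cycle counts are purely illustrative guesses, not measurements of any real algorithm, and the function name is hypothetical.

```python
# Figures from the thread (assumptions, not datasheet values).
CLOCK_HZ = 150_000_000         # 150 MHz core clock
WORD_BITS = 32                 # one 32-bit word per processing step
LINK_BITS_PER_S = 12_000_000   # 12 Mbps link

def sustained_bits_per_s(cycles_per_word):
    """Data rate if each 32-bit word costs this many CPU cycles (hypothetical helper)."""
    return CLOCK_HZ * WORD_BITS / cycles_per_word

# Illustrative cycle counts only: 1 = plain copy, larger = heavier processing.
for cycles in (1, 100, 400, 1000):
    rate = sustained_bits_per_s(cycles)
    verdict = "exceeds" if rate > LINK_BITS_PER_S else "fits within"
    print(f"{cycles:>5} cycles/word -> {rate / 1e6:7.1f} Mbps ({verdict} the link)")
```

Under these assumptions, once the processing costs around 400 cycles per word or more, the CPU cannot produce data faster than the 12 Mbps link can move it anyway.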
I agree there is the odd edge case where the task is so simple, like moving data between devices, that it could easily overload the older, slower USB interfaces.
But that is just one of tens of thousands (or more) of possible application areas.
Basically, with MCUs, you get what you pay for.
If you want a quad-core, 900 MHz, dual-issue, out-of-order MCU with 276 pins, a full range of possible I/O devices, USB3 built in, 16MB of on-chip SRAM and 32MB of built-in flash, all on the same die, plus quad 10 MHz 16-bit analogue converters, quad 14-bit DACs, 8 spare very high speed op-amps, analogue comparators, etc.,
then the MCU usually costs more than the $0.80 I originally mentioned.
Even ten times as much, $8 (I'm not sure, but definitely a lot more than $0.80 each), probably wouldn't be nearly enough, unless the quantities involved were massive, perhaps 100,000 or even millions.