The high bandwidth need is from the X86 to the micro.
What's the typical packet size? (I.e. bytes from host to device, between responses.) Or is it just a continuous stream, with device-to-host responses completely separate from the host-to-device transfers?
Do you happen to know if they guarantee availability of the part for X number of years?
No, but they still support Teensy 2.0 (ATmega32u4), introduced in 2009. For technical questions, see the PJRC forum.
I am sure there are other high-speed USB microcontrollers out there, but Teensys (and ATmega32u4) are the ones I personally am familiar with, and they are programmable in the Arduino and PlatformIO environments. For one-off (or very small runs of) interfacing boards, which is how I use them, they're a good fit for me.
You can develop your own NXP MIMXRT1062-based microcontroller board, although the Teensy design is proprietary. Another interesting microcontroller family (that I have looked at, but not used myself) is the PIC32MZ family: they also have a native high-speed USB 2.0 interface (480 Mbit/s max. theoretical bandwidth). There is at least the mikroE Flip&Click PIC32MZ development board, and likely others, already supported in Arduino and PlatformIO. I've looked at the PIC32MZ family datasheet and application notes, and it does look like a particularly easy MCU to create a board for. Also, Olimex has an Open Hardware PIC32-HMZ144 dev board with a PIC32MZ that has high-speed USB, if you prefer the MPLABX development environment; if you are making your own board, then looking at the board files for that one cannot hurt.
The key here is that these microcontrollers have a native USB interface on the microcontroller itself. All of the USB full-speed (FS, 12 Mbit/s) microcontrollers I've used, including the 8-bit ATmega32U4, can reach about a megabyte per second (8 Mbit/s) using the USB Serial protocol, and a trivial userspace program can keep up with that even on a slow machine (any language will do; I've tested C, and Python using the built-in termios module). With USB bulk transfers and 64-byte packets, you can reach slightly higher bandwidth still.
For high-speed USB (HS, 480 Mbit/s) devices such as Teensy 4 and PIC32MZ, the kernel driver side of the USB Serial protocol can slow things down a bit, so you won't reach the ~48 Mbytes/sec bandwidth even if the hardware is capable of it. A simple test shows that on my Linux machine, the kernel tty layer cannot get much above 30 Mbytes/sec, simply because it was not designed for high bandwidth. USB bulk transfers (64 or 256 byte packets) bypass the tty layer completely, have much less overhead, and are much more efficient on both the device and the host computer end.
(The nice thing about USB Serial in Linux is that you need absolutely zero drivers (other than the kernel built-in USB ACM driver), or extra libraries, to write efficient code. All you need is a bit of termios glue code, to set the tty layer properties for the USB Serial character device. Trivial in C and Python3.)
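A minimal sketch of what that termios glue could look like in Python 3, using only the standard library. The function names and the default device path /dev/ttyACM0 are assumptions for illustration; the actual device node depends on your system. It switches the tty layer to raw mode so that bytes pass through without echo, line editing, or character translation.

```python
import os
import termios

def set_raw(fd):
    """Put the tty behind fd into raw 8-bit mode (no echo, no translation)."""
    iflag, oflag, cflag, lflag, ispeed, ospeed, cc = termios.tcgetattr(fd)
    # Input: no break handling, no parity marking, no CR/NL mapping, no XON/XOFF.
    iflag &= ~(termios.IGNBRK | termios.BRKINT | termios.PARMRK | termios.ISTRIP |
               termios.INLCR | termios.IGNCR | termios.ICRNL |
               termios.IXON | termios.IXOFF)
    # Output: no post-processing (e.g. no NL -> CR NL).
    oflag &= ~termios.OPOST
    # Local: no echo, no canonical (line) mode, no signal characters.
    lflag &= ~(termios.ECHO | termios.ECHONL | termios.ICANON |
               termios.ISIG | termios.IEXTEN)
    # Control: 8 data bits, no parity, receiver on, ignore modem lines.
    cflag &= ~(termios.CSIZE | termios.PARENB)
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL
    cc = list(cc)
    cc[termios.VMIN] = 1   # a blocking read returns once at least 1 byte is there
    cc[termios.VTIME] = 0  # no inter-byte timeout
    termios.tcsetattr(fd, termios.TCSANOW,
                      [iflag, oflag, cflag, lflag, ispeed, ospeed, cc])

def open_usb_serial(path="/dev/ttyACM0"):
    """Open a USB Serial character device (path is an assumed example)."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    set_raw(fd)
    return fd
```

After that, plain os.read(fd, n) and os.write(fd, data) are all that is needed; using large read sizes keeps the per-call overhead low.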
I do expect microcontrollers like PIC32MZ and Kinetis K26 (with high-speed USB interfaces) to also be able to reach around 20-25 Mbytes/s (150-200 Mbit/s) using USB Serial, and somewhat more if using USB bulk transfers. (Again, from Teensy to host I've measured a reliable, sustained data rate of 25+ Mbytes/s = 200+ Mbits/s, by transferring 100,000,000 bytes from a Xorshift* PRNG to the host, with the host verifying that each byte matches the byte it generated using the same PRNG.)
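To illustrate the host-side verification idea, here is a hedged Python sketch. It assumes the xorshift64* variant with the usual 12/25/27 shifts and multiplier, and assumes one byte (the top byte) is taken from each 64-bit output; the variant and byte selection in the actual benchmark may differ, and the names below are made up for this example. The device would run the same generator from the same seed.

```python
MASK64 = (1 << 64) - 1

def xorshift64star(state):
    """Advance a nonzero 64-bit xorshift64* state; return (new_state, output)."""
    state ^= state >> 12
    state ^= (state << 25) & MASK64
    state ^= state >> 27
    return state, (state * 0x2545F4914F6CDD1D) & MASK64

def prng_bytes(seed, count):
    """Generate count bytes, using the top byte of each output (assumption)."""
    out = bytearray()
    state = seed
    for _ in range(count):
        state, value = xorshift64star(state)
        out.append(value >> 56)
    return bytes(out)

class StreamVerifier:
    """Checks received chunks against the expected PRNG byte sequence."""

    def __init__(self, seed):
        self.state = seed
        self.offset = 0  # number of bytes verified so far

    def check(self, chunk):
        """Return the stream offset of the first wrong byte, or None if all match."""
        for byte in chunk:
            self.state, value = xorshift64star(self.state)
            if byte != value >> 56:
                return self.offset
            self.offset += 1
        return None
```

Feeding every chunk returned by read() into check() catches both corrupted and dropped bytes, since any lost byte desynchronizes the stream from the generator. A byte-by-byte Python loop is slow, so for an actual throughput measurement the comparison would have to be done in C or vectorized.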
Because of how the USB bus interfaces to the host computer processor, the host processor will not be a bottleneck if it is capable of running a standard Linux distribution (Debian, Armbian, etc.). A very simple test is to use a USB 2.0-based external hard disk, and measure read bandwidth using e.g.
dd bs=524288 if=raw-unmounted-device of=/dev/null count=200 (i.e., read 100 megabytes off the raw disk in 512k blocks). If it is less than 40 Mbytes/sec, then you have a problem (or a really slow external hard disk).
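The same measurement can be sketched in Python, which is handy if you want to log the numbers. This is only an illustrative equivalent of the dd test: the function name and defaults are assumptions, and the device path in the usage note is a placeholder.

```python
import time

def measure_read_rate(path, block_size=524288, count=200):
    """Read up to count blocks of block_size bytes; return (bytes_read, seconds)."""
    total = 0
    # buffering=0 gives raw, unbuffered reads, one read() call per block.
    with open(path, "rb", buffering=0) as src:
        start = time.perf_counter()
        for _ in range(count):
            block = src.read(block_size)
            if not block:  # end of device or file
                break
            total += len(block)
        elapsed = time.perf_counter() - start
    return total, elapsed
```

For example, total, seconds = measure_read_rate("/dev/sdX") followed by print(total / seconds / 1e6) reports Mbytes/sec. Reading a raw block device normally needs root privileges, and a repeated run may be served from the page cache, so only the first run after plugging in the disk is meaningful.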
The reason I haven't yet measured the data rate from host to Teensy 4 is that if I do it, I want to do it in an actually relevant manner, matching the real-world requirements as closely as possible. As before, I don't expect anyone to just take my word for it: I've published the whole benchmarking code before, so that anyone can verify the results for themselves.