I should be clear that I don't need the full SS 5Gbps, but neither is HS adequate for my high speed data acquisition application. There's almost no processing done by the MCU other than byte packing. However, the benefit of using an MCU is that I can also integrate other features such as Ethernet (although lack of 1Gbps on a microcontroller is another problem I'd like to see addressed!).
Yes, I see. Just realize that even for just pumping data in and out, you'll quickly be pushing the MCU to the edge unless it's a very fast one, even if the throughput you need is only something like 100MBytes/s. (Admittedly, anything above ~30MBytes/s would require USB SS instead of HS.)
As I understand it, the MCU would mostly act as a bridge, something that's often naturally done with an FPGA. I can understand that using an MCU may be easier and more cost-effective though. (But 1Gbps Ethernet with an FPGA is definitely possible - not saying it's a picnic, but I think others on the forum have done it.)
I do think it's reasonable to expect that, since the FT60x already has its own FIFOs, it should be able to accept a generic external master without having to add an additional FIFO solution. It's certainly not too hard for an MCU to be able to generate 1Gbps data, or a 31MHz 32 bit word aggregate throughput.
I don't know why FTDI decided not to implement a true dual-port FIFO on their interface ICs. As I said, I suspect it would have made them more complex and more expensive. All of the internal logic is synchronous to the USB clock. I know true dual-port FIFOs (with separate clock domains) are not rocket science, but they certainly add a bit of complexity, and there's probably a good reason they didn't implement that. If someone from FTDI is reading the thread, it would definitely be interesting to have their point of view.
The attempt they made with the FT1248 mode (on the FT232H for instance) was pretty clunky as I remember, and not as good as their master synchronous mode.
I personally have used FT2232H and FT232H chips in asynchronous mode with MCUs (which, again, would get me about 8MBytes/s max, which is documented!), and when I needed more throughput, I used FPGAs in synchronous mode. I'd use the clock provided by the FTDI chip to clock most of my logic. No big deal. But yes, that wouldn't work with most MCUs.
As to the point I made earlier regarding handshaking, it's definitely not something trivial IMO (I may be missing something obvious though). You can't just pump data in and out of the FTDI FIFO blindly. You have to take the Empty and Full flags into account, and of course they are completely dependent on the USB transfers, and thus basically unpredictable. You can't just issue fixed-size DMA transfers; that wouldn't work. Some MCUs may have a parallel interface that can handle this "handshaking" transparently, so that would work, but I have just never used any that could.
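To illustrate what I mean, here is a rough toy model in Python. It is not the FT60x protocol or any real chip's timing: the ExternalFifo class, the run() helper, the FIFO depth and the drain pattern are all made up for the sake of the example. It just compares a "blind" writer (like a fixed-size DMA burst) with one that stalls whenever the Full flag is set, while the host empties the FIFO at irregular moments.

```python
from collections import deque

class ExternalFifo:
    """Toy model of a bridge-chip FIFO (hypothetical, not a real FTDI model)."""
    def __init__(self, depth):
        self.depth = depth
        self.data = deque()
        self.dropped = 0

    @property
    def full(self):
        return len(self.data) >= self.depth

    def write(self, word):
        if self.full:
            self.dropped += 1      # a blind write to a full FIFO loses the word
        else:
            self.data.append(word)

    def host_drain(self, n):
        """USB host reads up to n words; returns what it actually got."""
        return [self.data.popleft() for _ in range(min(n, len(self.data)))]

def run(words, drain_pattern, depth, respect_full):
    """MCU offers one word per tick; host drains per drain_pattern (cyclic).

    drain_pattern must contain at least one non-zero entry, otherwise the
    FIFO never empties and the loop would not terminate.
    """
    fifo = ExternalFifo(depth)
    pending = deque(words)
    received = []
    tick = 0
    while pending or fifo.data:
        received += fifo.host_drain(drain_pattern[tick % len(drain_pattern)])
        if pending and not (respect_full and fifo.full):
            fifo.write(pending.popleft())   # blind mode writes even when full
        tick += 1
    return received, fifo.dropped

words = list(range(8))
pattern = [0, 0, 0, 4]   # host only services the FIFO every 4th tick

blind_rx, blind_dropped = run(words, pattern, depth=2, respect_full=False)
gated_rx, gated_dropped = run(words, pattern, depth=2, respect_full=True)

print("blind:", blind_rx, "dropped", blind_dropped)
print("gated:", gated_rx, "dropped", gated_dropped)
```

The blind writer loses words whenever the host is slow to drain, while the gated one delivers everything in order but has to stall. That stall is exactly what a plain fixed-size DMA transfer can't do on its own.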
It's certainly not too hard for an MCU to be able to generate 1Gbps data, or a 31MHz 32 bit word aggregate throughput.
If 124 MBytes/s of throughput, which I infer from the above, is what you're after, true. (It still requires a fast MCU, but that's doable without anything too fancy.) And true, you obviously couldn't get that with USB HS.
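For what it's worth, here is the arithmetic behind that figure as a quick Python sketch. The function name is just something I made up for illustration, and the 30 MBytes/s constant is only the rough practical HS limit mentioned earlier in the thread, not a spec value.

```python
def word_stream_throughput(word_rate_hz, word_bits):
    """Return (bits per second, bytes per second) for a parallel word stream."""
    bits_per_s = word_rate_hz * word_bits
    return bits_per_s, bits_per_s // 8

# 32-bit words at 31 MHz, as in the quoted post
bits_per_s, bytes_per_s = word_stream_throughput(31_000_000, 32)
print(bits_per_s / 1e6, "Mbit/s")     # 992.0 Mbit/s, i.e. roughly 1Gbps
print(bytes_per_s / 1e6, "MBytes/s")  # 124.0 MBytes/s

# Rough practical USB HS limit quoted earlier in the thread (assumption)
USB_HS_PRACTICAL = 30_000_000
print(bytes_per_s > USB_HS_PRACTICAL)  # True: HS is not enough
```

So 31MHz x 32 bits is 992 Mbit/s, or 124 MBytes/s, about four times what HS can realistically carry.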
The point above concerning handshaking still puzzles me though. Could you give me an example of an MCU (it's not a trick or challenge question, but a real one) that has a parallel interface peripheral that can transparently handle handshaking (compatible with "empty" and "full" states of an external FIFO) in DMA mode? (Because if you have to handle those by polling, it'll make for pretty inefficient transfers IMO, as the DMA would not directly be usable).