There is a huge difference in speed between USB 2 and USB 3. USB 2 solutions are abundant, but USB 3 options are very scarce. All I could find were the Cypress FX3 and the FT601. The FX3 is expensive, complex, and poorly documented, so I made a prototype board with the FT601. It worked well. It took me less than a day to begin communicating with the FPGA through FTDI's library.
The FT601 FPGA FIFO interface is rather straightforward, but the timing requirements in their documentation are very strange. They require a huge hold time (4.8 ns), which is absolutely impossible to meet. They have an XDC example where they "forgot" to put the minus sign: they have "set_output_delay -min 4.8" instead of "set_output_delay -min -4.8". This certainly passes timing analysis every time, but if you put the minus in, it's utterly impossible. Even though I had some help from the trace delay (400 ps), I could only meet 2.3 ns of hold. This worked well at 100 MHz, but failed at 66 MHz. From this I concluded that 4.8 ns is either bogus or reflects only the worst case (66 MHz at 1.8 V).
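To see why the sign matters, here is a small sketch of the usual source-synchronous arithmetic: the "-min" output delay is the minimum board delay minus the receiver's hold requirement, so a positive hold time gives a negative number. The function name and its parameters are made up for illustration; only the 4.8 ns and 400 ps figures come from the numbers above.

```python
def output_delay_min_ns(hold_ns: float, board_delay_min_ns: float = 0.0) -> float:
    """Value to pass to `set_output_delay -min`, in nanoseconds.

    Conventional formula: minimum board (trace) delay minus the hold time
    required at the receiving device. A positive hold requirement therefore
    yields a negative -min value.
    """
    return board_delay_min_ns - hold_ns


# Documented hold requirement with no help from the board: -4.8 ns.
no_trace = output_delay_min_ns(4.8)

# Same requirement with roughly 0.4 ns of trace delay: about -4.4 ns.
with_trace = output_delay_min_ns(4.8, 0.4)

print(f"-min {no_trace:.3f}")
print(f"-min {with_trace:.3f}")
```

Writing "+4.8" instead tells the tool that the data may change 4.8 ns before the clock edge as far as hold is concerned, which is why that constraint always passes.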
The DLL works, but it is cumbersome, loads slowly (300 ms), and is somewhat buggy. The FT601 doesn't really do FIFO; rather, it does block transfers with a FIFO interface to the FPGA. If you do simple block transfers with fixed-size blocks, everything works. With variable block sizes it becomes less reliable. Sometimes the read function returns with less data than requested (contrary to the docs). If the application quits without reading all the requested data, the next time the DLL loads it can read leftover data from the previous run. Sometimes it simply hangs and starts returning errors on all reads.
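One way to cope with the short reads is to wrap the read call in a loop that keeps asking for the remainder until the whole block has arrived. The sketch below is generic and does not use the real DLL: "read_some" is a hypothetical callable standing in for whatever read function you have, and the fake reader exists only to make the example run.

```python
from typing import Callable


def read_exact(read_some: Callable[[int], bytes], size: int,
               max_empty_reads: int = 3) -> bytes:
    """Collect exactly `size` bytes from a reader that may return fewer.

    `read_some(n)` is a hypothetical wrapper that returns at most n bytes.
    Gives up with TimeoutError after `max_empty_reads` consecutive empty
    reads, so a hung device does not spin forever.
    """
    buf = bytearray()
    empty_reads = 0
    while len(buf) < size:
        chunk = read_some(size - len(buf))
        if not chunk:
            empty_reads += 1
            if empty_reads >= max_empty_reads:
                raise TimeoutError(f"got {len(buf)} of {size} bytes")
            continue
        empty_reads = 0
        buf += chunk
    return bytes(buf)


def make_fake_reader(data: bytes, max_chunk: int) -> Callable[[int], bytes]:
    """Test stand-in: hands out `data` in pieces of at most `max_chunk` bytes."""
    pos = 0

    def read_some(n: int) -> bytes:
        nonlocal pos
        chunk = data[pos:pos + min(n, max_chunk)]
        pos += len(chunk)
        return chunk

    return read_some


payload = bytes(range(10))
block = read_exact(make_fake_reader(payload, 3), len(payload))
print(block == payload)
```

This only papers over short reads. It does nothing about leftover data from a previous run, which would still need some kind of flush or resynchronization at startup.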
I couldn't get full speed from the DLL (381 MB/s), even when reading one long block. No matter what you do, there are delays between blocks, so you realistically get 320-340 MB/s. While analyzing the delays on the FPGA side, I noticed that the gaps between blocks are typically small, but sometimes there are longer delays as well. The nature of these delays is unknown. They may be FT601 limitations, or they may be DLL problems.
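For a sense of scale, the arithmetic below assumes that the 381 MB/s figure is the FT601's 32-bit bus running at 100 MHz, expressed in binary megabytes (400,000,000 bytes/s divided by 2^20). That reading is an assumption; the constant and function names are just for illustration.

```python
BUS_CLOCK_HZ = 100_000_000   # FIFO clock, 100 MHz
BUS_WIDTH_BYTES = 4          # 32-bit data bus
MIB = 1 << 20                # binary megabyte (assumed unit of "MB")

# Raw bus rate if a word moves on every clock: 400e6 bytes/s, about 381.5 MiB/s.
raw_mib_per_s = BUS_CLOCK_HZ * BUS_WIDTH_BYTES / MIB


def bus_utilization(measured_mib_per_s: float) -> float:
    """Fraction of the raw bus rate actually achieved."""
    return measured_mib_per_s / raw_mib_per_s


for measured in (320, 340):
    used = bus_utilization(measured)
    print(f"{measured} MB/s: {used:.1%} busy, {1 - used:.1%} idle")
```

Under that assumption, 320-340 MB/s means the bus sits idle roughly 11% to 16% of the time, which is the total budget taken up by the gaps between blocks.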
It should be possible to figure out the USB protocol directly and get rid of the DLL, fixing the bugs and possibly improving performance. I was going to explore this, but by that time the cheap Xilinx FPGAs had turned into unobtainium and the project was scrapped.