I managed 42MB/s device->host at one point with the FT2232H, under Linux with a USB 3 (xHCI) host controller. I got 40MB/s with a USB 2.0 (EHCI) host controller and 38MB/s with an older kernel, so newer drivers and newer host hardware do seem to help squeeze more throughput out of USB 2.0.
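To put those figures in perspective, here is the back-of-the-envelope arithmetic in Python. The limit of 13 bulk packets of 512 bytes per 125 µs microframe comes from the USB 2.0 spec. Treating the measured figures as 10^6 bytes per second is my assumption; if they were MiB/s, the percentages come out a few points higher. By this count, 42MB/s is roughly 79% of the ~53.2MB/s ceiling for a single high-speed bulk endpoint.

```python
# USB 2.0 high-speed: 125 us microframes, and the spec allows at most
# 13 bulk packets of 512 bytes per microframe on an otherwise idle bus.
MICROFRAME_S = 125e-6
MAX_BULK_PACKETS_PER_UFRAME = 13
BULK_MAX_PACKET = 512


def bulk_ceiling_bytes_per_s() -> float:
    """Theoretical upper bound for one high-speed bulk endpoint."""
    return MAX_BULK_PACKETS_PER_UFRAME * BULK_MAX_PACKET / MICROFRAME_S


def efficiency(measured_mb_s: float) -> float:
    """Fraction of the ceiling reached, assuming MB means 10**6 bytes."""
    return measured_mb_s * 1e6 / bulk_ceiling_bytes_per_s()


if __name__ == "__main__":
    print(f"ceiling: {bulk_ceiling_bytes_per_s() / 1e6:.1f} MB/s")
    for label, mb_s in [("xHCI", 42), ("EHCI", 40), ("older kernel", 38)]:
        print(f"{label}: {mb_s} MB/s = {efficiency(mb_s):.0%} of ceiling")
```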
The device does have a hardware FIFO, but the interface itself also looks like a FIFO. It's very similar to the interface you'd expect from an ordinary producer/consumer block RAM FIFO module in an FPGA, except that tx and rx share the same data bus, so some arbitration between the two directions is involved. Whenever I work with the 2232H I usually add my own FIFOs on the FPGA side anyway, unless whatever I'm interfacing to can already tolerate pauses in data consumption.
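As a rough illustration of that producer/consumer interface, here is a minimal Python model of a block-RAM-style FIFO. It's a sketch, not HDL: the class name `RingFifo`, the power-of-two depth and the extra wrap bit on each pointer (a common way to tell "full" from "empty") are my assumptions, not anything from FTDI. `push()` returning False is the point where a producer has to pause, which a source that can't tolerate pauses can't do, hence the extra FIFO.

```python
class RingFifo:
    """Software model of a block-RAM style FIFO (hypothetical helper).

    Depth is a power of two (assumed, as is typical in FPGAs), so each
    pointer carries one extra wrap bit to distinguish full from empty.
    """

    def __init__(self, addr_bits: int = 4):
        self.depth = 1 << addr_bits
        self.mem = [0] * self.depth
        self.mask = (self.depth << 1) - 1  # pointers are addr_bits + 1 wide
        self.wptr = 0
        self.rptr = 0

    @property
    def empty(self) -> bool:
        return self.wptr == self.rptr

    @property
    def full(self) -> bool:
        # Same RAM address, different wrap bit.
        return (self.wptr ^ self.rptr) == self.depth

    def push(self, value) -> bool:
        """Write one word; returns False (writing nothing) when full,
        i.e. the producer has to pause."""
        if self.full:
            return False
        self.mem[self.wptr & (self.depth - 1)] = value
        self.wptr = (self.wptr + 1) & self.mask
        return True

    def pop(self):
        """Read one word; returns None when empty."""
        if self.empty:
            return None
        value = self.mem[self.rptr & (self.depth - 1)]
        self.rptr = (self.rptr + 1) & self.mask
        return value


if __name__ == "__main__":
    fifo = RingFifo(addr_bits=2)  # 4 entries
    accepted = [fifo.push(i) for i in range(6)]
    print(accepted, [fifo.pop() for _ in range(5)])
```

In the demo, the last two pushes are refused because the 4-entry FIFO is full, and the fifth pop returns None once it has drained.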
The easiest way to look at it is as a chain of FIFOs. The FTDI chip has its own FIFO, and you will probably implement another one in your FPGA. The FTDI will accept data from your FIFO over its synchronous interface as long as its own FIFO has space. The host has its own buffers (as part of the URBs/buffer descriptors maintained by the USB driver), and it will poll the FTDI chip with IN tokens whenever there are free buffers in the queue. The FTDI will answer one of those polls once it has a full USB packet's worth of data (512 bytes for high-speed bulk), or after a timeout if less data is available. The FTDI datasheet documents this timeout, but you can mostly ignore it unless your application is very latency-sensitive; the data will get to the host eventually.
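To make the full-packet/timeout behaviour concrete, here is a simplified Python simulation of the FTDI end of that chain. It assumes the host always has a free buffer and polls once per tick, that full 512-byte packets go out immediately, and that a short packet goes out once leftover data has waited `timeout_ticks`. The function name and the tick-based timing are hypothetical; the real timeout rules are in FTDI's documentation.

```python
PACKET = 512  # high-speed bulk max packet size in bytes


def packetize(arrivals, timeout_ticks):
    """Simulate which USB packets leave the FTDI FIFO (hypothetical model).

    arrivals[t] is the number of bytes the FPGA writes into the FTDI FIFO
    during tick t. Returns a list of (tick, packet_length) tuples.

    Simplified assumptions: the host always has a free buffer and polls
    every tick; full packets are sent as soon as they exist; a short
    packet is sent once leftover data has waited timeout_ticks.
    """
    packets = []
    buffered = 0
    pending_since = None  # tick when the current partial packet started waiting
    for t, n in enumerate(arrivals):
        buffered += n
        sent_full = False
        while buffered >= PACKET:
            packets.append((t, PACKET))
            buffered -= PACKET
            sent_full = True
        if buffered == 0:
            pending_since = None
        elif pending_since is None or sent_full:
            # Bytes left over after a full packet start a fresh wait.
            pending_since = t
        if pending_since is not None and t - pending_since >= timeout_ticks:
            packets.append((t, buffered))  # short packet after the timeout
            buffered = 0
            pending_since = None
    return packets


if __name__ == "__main__":
    print(packetize([1000, 0, 0], timeout_ticks=2))
```

For example, a 1000-byte burst with a 2-tick timeout produces one full 512-byte packet right away and a 488-byte short packet two ticks later, while a sustained stream never waits on the timeout at all.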
Keep in mind that the FTDI chip is the clock master of the interface, so you can either run your FPGA core logic off the FTDI's 60MHz interface clock to avoid crossing clock domains, or put a clock-domain-crossing (dual-clock) FIFO in the design and run your core clock off whatever oscillator you want.
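If you go the clock-domain-crossing route, the usual trick inside a dual-clock FIFO is to pass the read and write pointers between domains in Gray code. Only one bit changes per increment, so a synchronizer that samples mid-transition sees either the old or the new pointer value, never a corrupted one. Vendor FIFO generators normally handle this for you; the Python below is only a sketch of the conversion (the function names are made up) plus a check of the one-bit-change property, including the wraparound of a power-of-two pointer range.

```python
def bin_to_gray(b: int) -> int:
    """Binary count -> Gray code: adjacent values differ in exactly one bit."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Gray code -> binary, by XOR-folding all the higher bits down."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def one_bit_change(a: int, b: int) -> bool:
    """True if a and b differ in exactly one bit position."""
    x = a ^ b
    return x != 0 and (x & (x - 1)) == 0


if __name__ == "__main__":
    width = 5  # e.g. 4 address bits plus a wrap bit (assumed pointer width)
    size = 1 << width
    for i in range(size):
        nxt = (i + 1) % size  # includes the wrap from size-1 back to 0
        assert one_bit_change(bin_to_gray(i), bin_to_gray(nxt))
        assert gray_to_bin(bin_to_gray(i)) == i
    print("gray pointer sequence OK for", size, "values")
```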