You don't say whether you want your device to act as a USB audio device, or whether accessing it in bulk mode is OK.
In the former case, I'd also suggest using an XMOS-based solution or a specialized IC. Implementing USB audio on a general-purpose MCU is never a picnic. There are some example projects out there, but in my experience it's never as pain-free as it looks. Also, in this case FTDI devices can't be used, as they don't support the USB audio class.
Now, if bulk mode is OK, you have quite a few options: FTDI devices, MCUs with a USB core, and so on. You'll just have to make sure your throughput requirements (and maybe latency, if that matters) can be met.
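As a rough way to check those requirements, here's a small sketch that computes the raw PCM payload rate of an audio stream and the latency added by a buffer of a given size. It assumes uncompressed PCM with samples padded to whole bytes and ignores USB protocol overhead. The `link_bytes_per_second` value and the 50% headroom factor in `fits()` are arbitrary illustration values, not figures from any datasheet; plug in the throughput you actually measure on your chosen device.

```python
def audio_bytes_per_second(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Raw PCM payload rate in bytes per second (no USB/protocol overhead)."""
    bytes_per_sample = (bits_per_sample + 7) // 8  # round up to whole bytes
    return sample_rate_hz * channels * bytes_per_sample


def buffer_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Latency contributed by a buffer holding `frames` sample frames."""
    return 1000.0 * frames / sample_rate_hz


def fits(required_bps: int, link_bytes_per_second: int, headroom: float = 0.5) -> bool:
    """True if the stream uses at most `headroom` of the link's usable rate.

    The 0.5 default is an arbitrary safety margin chosen for illustration.
    """
    return required_bps <= link_bytes_per_second * headroom


if __name__ == "__main__":
    need = audio_bytes_per_second(48_000, 2, 24)  # stereo, 24-bit, 48 kHz
    print(need)                                   # 288000
    print(buffer_latency_ms(480, 48_000))         # 10.0
    print(fits(need, 1_000_000))                  # hypothetical 1 MB/s link -> True
```

For a stereo 24-bit stream at 48 kHz that's 288,000 bytes/s of payload, which most bulk-capable options handle easily; latency is usually driven more by buffering on both ends than by raw bandwidth.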
I don't see the point of using SPI between the FPGA and the FTDI device (if you go that route). At the cost of a few more data lines, just use a parallel mode: it will give you better throughput and will also be easier to implement.
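To put rough numbers on the SPI vs. parallel argument, here's a sketch of the peak raw rate of a synchronous bus that moves `data_lines` bits per transfer. The 30 MHz clock is a hypothetical value chosen only for comparison; real FTDI throughput is further limited by the chip's interface mode and the USB link itself, so check the datasheet for the part you pick.

```python
def bus_bytes_per_second(clock_hz: int, data_lines: int, clocks_per_transfer: int = 1) -> float:
    """Peak raw payload rate of a synchronous bus, ignoring protocol overhead."""
    return clock_hz * data_lines / clocks_per_transfer / 8


if __name__ == "__main__":
    clock = 30_000_000  # hypothetical bus clock, same for both cases
    spi = bus_bytes_per_second(clock, data_lines=1)        # single-data-line SPI
    parallel = bus_bytes_per_second(clock, data_lines=8)   # 8-bit parallel bus
    print(spi, parallel, parallel / spi)  # 3750000.0 30000000.0 8.0
```

At the same clock, an 8-bit parallel bus moves eight times as much data as single-line SPI, and on the FPGA side it avoids the shift register and bit-level framing that SPI needs.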