Do you need to crunch data? If your workload is dominated by multiply-accumulate operations, consider the Analog Devices ADSP-BF70x family.
It has a DSP library built into ROM, a large on-chip L2 RAM, L1 memory plus scratchpad, and a built-in high-speed (HS) USB controller.
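To make "multiply-accumulate" concrete, here is a minimal portable C sketch of the kind of kernel that workload means: a Q15 dot product and one FIR filter output sample. This is plain C for illustration only. It is not the ROM DSP library's API, and the function names are hypothetical. The wide accumulator and the saturation step loosely mimic what DSP hardware does for you in a single instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of two Q15 vectors, sum of a[i]*b[i].
 * Each 16x16 product fits in 32 bits; accumulating in 64 bits avoids
 * overflow for any practical length. The result is in Q30. */
int64_t mac_dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i]; /* one multiply-accumulate */
    }
    return acc;
}

/* Hypothetical helper: one FIR output sample, y = sum h[j] * x[n-j],
 * where history[] holds x[n], x[n-1], ... (newest first). */
int16_t fir_q15(const int16_t *h, const int16_t *history, size_t taps)
{
    int64_t acc = mac_dot_q15(h, history, taps); /* Q30 */
    acc >>= 15;                                  /* back to Q15 (arithmetic shift on common compilers) */
    if (acc > INT16_MAX) acc = INT16_MAX;        /* saturate instead of wrapping */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

With two taps of 0.5 (16384 in Q15) and inputs 1000 and 2000, `fir_q15` returns their average, 1500. On the real chip you would call the optimized library routines instead of this loop.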
The IDE and compiler will set you back $1000, and a free 30-day trial license is available. You can hack the license generation server into issuing a 3-year license.
They don't provide a free USB library unless you pay $8000 for uC-OSiii. They do, however, ship a HAL-level USB driver for free with their IDE, but without documentation.
I was able to crack the uC-OSiii installer, extract the code, reverse engineer how it talks to the HAL driver, and document the ADI USB HAL driver.
Based on that API documentation, I wrote a simple USB stack for the BF70x with help from forum member Alex, a few books, and the USB spec.
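A big part of any minimal device-side USB stack is handling control transfers on endpoint 0. The sketch below shows only the generic first step: decoding the 8-byte SETUP packet as laid out in chapter 9 of the USB 2.0 spec. It is not the code from the stack described above. It does not use the ADI HAL driver, whose API is not publicly documented. All type and function names are hypothetical. How the raw bytes arrive from the controller is hardware-specific and is left out.

```c
#include <stdint.h>

/* A few standard request codes (USB 2.0 spec, Table 9-4). */
enum {
    USB_REQ_GET_STATUS        = 0,
    USB_REQ_CLEAR_FEATURE     = 1,
    USB_REQ_SET_FEATURE       = 3,
    USB_REQ_SET_ADDRESS       = 5,
    USB_REQ_GET_DESCRIPTOR    = 6,
    USB_REQ_SET_CONFIGURATION = 9
};

/* Decoded 8-byte SETUP packet (USB 2.0 spec, section 9.3). */
typedef struct {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
} usb_setup_t;

/* Multi-byte fields are little-endian on the wire. Assemble them
 * explicitly rather than casting the buffer, which avoids alignment
 * and host-endianness assumptions. */
usb_setup_t usb_parse_setup(const uint8_t raw[8])
{
    usb_setup_t s;
    s.bmRequestType = raw[0];
    s.bRequest      = raw[1];
    s.wValue  = (uint16_t)(raw[2] | (raw[3] << 8));
    s.wIndex  = (uint16_t)(raw[4] | (raw[5] << 8));
    s.wLength = (uint16_t)(raw[6] | (raw[7] << 8));
    return s;
}

/* bmRequestType bit 7: 1 = device-to-host (data stage is IN). */
int usb_setup_is_in(const usb_setup_t *s) { return (s->bmRequestType & 0x80) != 0; }

/* bmRequestType bits 6..5: 0 = standard, 1 = class, 2 = vendor. */
int usb_setup_type(const usb_setup_t *s) { return (s->bmRequestType >> 5) & 0x03; }

/* bmRequestType bits 4..0: 0 = device, 1 = interface, 2 = endpoint. */
int usb_setup_recipient(const usb_setup_t *s) { return s->bmRequestType & 0x1F; }

/* For GET_DESCRIPTOR, the high byte of wValue is the descriptor type
 * (1 = device, 2 = configuration, 3 = string) and the low byte is the index. */
uint8_t usb_descriptor_type(const usb_setup_t *s) { return (uint8_t)(s->wValue >> 8); }
```

For example, the bytes `80 06 00 01 00 00 40 00` that a host typically sends first decode to a standard, device-to-host GET_DESCRIPTOR request for the device descriptor, with `wLength` = 64. A stack would then dispatch on `bRequest` and reply through whatever endpoint-0 send routine the driver provides.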
If you decide to use the chip, I can post the code on GitHub.