I'm not sure what you'd want to do with 32-bit, 768 kHz audio at the moment. Even 192 kHz is considered overkill by many. A good 24-bit, 96 kHz ADC will be better than a lousy 32-bit, 768 kHz one.
The AK5538, with a 103 dB 'S/(N+D)' (which I suppose is the inverse of the more common THD+N, i.e. SINAD), is good but by no means exceptional. So I'm not too sure about your attraction to this sampling rate inflation.

Apparently, this is the "VELVET SOUND" range of AKM ADCs. Granted, digital filters benefit from higher sampling rates, but again, the rated performance is not really better than that of other high-end ADCs (from TI, for instance) running at lower sampling rates.
Anyway, back to your question. In theory, starting with USB Audio Class 2.0 (UAC2), sampling rates higher than 192 kHz should be supported, but I'm not 100% certain that all OSs support this fully.
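To put rough numbers on it, here is a quick sketch comparing the raw PCM data rate with what a single high-speed isochronous endpoint can carry (at most 3 x 1024 bytes per 125 µs microframe under USB 2.0). The function names and the channel counts are my own assumptions for illustration, and this only covers bus bandwidth, not what a given OS driver will accept.

```python
# Rough USB 2.0 high-speed bandwidth check for uncompressed PCM audio.
# Function names and example channel counts are illustrative assumptions.

MICROFRAMES_PER_SECOND = 8000            # one microframe every 125 us
MAX_ISO_BYTES_PER_MICROFRAME = 3 * 1024  # high-bandwidth isochronous endpoint


def stream_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM payload rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def bytes_per_microframe(sample_rate_hz, bits_per_sample, channels):
    """Average payload that must be moved in each 125 us microframe."""
    total = stream_bytes_per_second(sample_rate_hz, bits_per_sample, channels)
    return total / MICROFRAMES_PER_SECOND


def fits_one_iso_endpoint(sample_rate_hz, bits_per_sample, channels):
    """True if the average payload fits in one isochronous endpoint."""
    needed = bytes_per_microframe(sample_rate_hz, bits_per_sample, channels)
    return needed <= MAX_ISO_BYTES_PER_MICROFRAME


if __name__ == "__main__":
    for rate, bits, ch in [(192_000, 24, 2), (768_000, 32, 2), (768_000, 32, 8)]:
        per_uframe = bytes_per_microframe(rate, bits, ch)
        print(rate, bits, ch, per_uframe, fits_one_iso_endpoint(rate, bits, ch))
```

So stereo at 32-bit, 768 kHz averages 768 bytes per microframe, which is comfortable on paper, while 8 channels at that rate lands exactly on the 3072-byte ceiling with no headroom at all.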
XMOS claims up to 384 kHz apparently?
http://www.xmos.com/support/software/uac2
There is a difference between hardware and software support...
Yes, the usual solution is not to use the USB audio class. You can stream audio with USB bulk transfers instead. Getting low latency with bulk transfers is a bit challenging, but doable. The end result will still depend on the OS and the software implementation, more so than with the USB audio class, where you get guaranteed timings, so you will have to provide sufficient buffering.
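As a rough guide to how much buffering that means, here is a small sketch that sizes a buffer for a given amount of scheduling jitter and rounds it up to whole 512-byte packets (the high-speed bulk maximum packet size). The helper names and the 10 ms figure are assumptions of mine, not something measured on a real system.

```python
# Size a streaming buffer for bulk-transfer audio.
# Helper names and the example jitter budget are illustrative assumptions.

BULK_MAX_PACKET = 512  # max packet size of a high-speed bulk endpoint


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def buffer_bytes(sample_rate_hz, bits_per_sample, channels, buffer_ms):
    """Bytes needed to hold buffer_ms of audio, rounded up to whole packets."""
    frame_bytes = (bits_per_sample // 8) * channels
    frames = ceil_div(sample_rate_hz * buffer_ms, 1000)
    raw = frames * frame_bytes
    return ceil_div(raw, BULK_MAX_PACKET) * BULK_MAX_PACKET


if __name__ == "__main__":
    # Stereo, 32-bit, 768 kHz, with 10 ms of slack for OS scheduling jitter.
    print(buffer_bytes(768_000, 32, 2, 10))
```

The more jitter the host software can introduce, the larger `buffer_ms` has to be, and that translates directly into added latency.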
As for off-the-shelf solutions, CMedia has chips that are widely used. The latest is:
https://www.cmedia.com.tw/products/USB20_HIGH_SPEED/CM6632A
It doesn't support anything above 192 kHz, though. So XMOS may be your best bet at the moment, up to 384 kHz.