I ran some tests on an STM32F469 @ 180MHz to compare HS-USB against a 4-bit SD card interface running at 24MHz:
SD read test...
reading 1 bytes...1 bytes done in 0.000s ( 2.224kByte/s)
reading 2 bytes...2 bytes done in 0.000s ( 5.710kByte/s)
reading 4 bytes...4 bytes done in 0.000s (11.322kByte/s)
reading 8 bytes...8 bytes done in 0.000s (22.843kByte/s)
reading 16 bytes...16 bytes done in 0.000s (45.687kByte/s)
reading 32 bytes...32 bytes done in 0.000s (91.374kByte/s)
reading 64 bytes...64 bytes done in 0.000s (182.215kByte/s)
reading 128 bytes...128 bytes done in 0.000s (363.372kByte/s)
reading 256 bytes...256 bytes done in 0.000s (716.332kByte/s)
reading 512 bytes...512 bytes done in 0.000s (1457.726kByte/s)
reading 1024 bytes...1024 bytes done in 0.000s (2506.266kByte/s)
reading 2048 bytes...2048 bytes done in 0.000s (4065.042kByte/s)
reading 4096 bytes...4096 bytes done in 0.001s (5882.355kByte/s)
reading 8192 bytes...8192 bytes done in 0.001s (7561.440kByte/s)
reading 16384 bytes...16384 bytes done in 0.002s (8834.902kByte/s)
reading 32768 bytes...32768 bytes done in 0.004s (7540.060kByte/s)
reading 65536 bytes...65536 bytes done in 0.008s (8323.583kByte/s)
reading 131072 bytes...131072 bytes done in 0.015s (8772.535kByte/s)
reading 262144 bytes...262144 bytes done in 0.028s (9023.303kByte/s)
reading 524288 bytes...524288 bytes done in 0.057s (9036.202kByte/s)
reading 1048576 bytes...1048576 bytes done in 0.112s (9160.118kByte/s)
reading 2097152 bytes...2097152 bytes done in 0.222s (9223.775kByte/s)
reading 4194304 bytes...4194304 bytes done in 0.442s (9257.087kByte/s)
reading 8388608 bytes...8388608 bytes done in 0.884s (9268.472kByte/s)
reading 16777216 bytes...16777216 bytes done in 1.766s (9275.356kByte/s)
USB read test...
reading 1 bytes...1 bytes done in 0.001s ( 1.247kByte/s)
reading 2 bytes...2 bytes done in 0.001s ( 2.488kByte/s)
reading 4 bytes...4 bytes done in 0.001s ( 4.901kByte/s)
reading 8 bytes...8 bytes done in 0.001s ( 9.990kByte/s)
reading 16 bytes...16 bytes done in 0.001s (20.006kByte/s)
reading 32 bytes...32 bytes done in 0.001s (40.064kByte/s)
reading 64 bytes...64 bytes done in 0.001s (79.617kByte/s)
reading 128 bytes...128 bytes done in 0.001s (159.235kByte/s)
reading 256 bytes...256 bytes done in 0.001s (315.258kByte/s)
reading 512 bytes...512 bytes done in 0.001s (636.132kByte/s)
reading 1024 bytes...1024 bytes done in 0.001s (1269.036kByte/s)
reading 2048 bytes...2048 bytes done in 0.001s (2395.210kByte/s)
reading 4096 bytes...4096 bytes done in 0.001s (3960.398kByte/s)
reading 8192 bytes...8192 bytes done in 0.001s (5869.408kByte/s)
reading 16384 bytes...16384 bytes done in 0.002s (8815.431kByte/s)
reading 32768 bytes...32768 bytes done in 0.003s (12061.822kByte/s)
reading 65536 bytes...65536 bytes done in 0.006s (10555.835kByte/s)
reading 131072 bytes...131072 bytes done in 0.011s (11769.038kByte/s)
reading 262144 bytes...262144 bytes done in 0.021s (12391.100kByte/s)
reading 524288 bytes...524288 bytes done in 0.040s (12781.792kByte/s)
reading 1048576 bytes...1048576 bytes done in 0.079s (12970.898kByte/s)
reading 2097152 bytes...2097152 bytes done in 0.157s (13033.883kByte/s)
reading 4194304 bytes...4194304 bytes done in 0.312s (13107.332kByte/s)
reading 8388608 bytes...8388608 bytes done in 0.624s (13121.314kByte/s)
reading 16777216 bytes...16777216 bytes done in 1.248s (13131.705kByte/s)
USB write test...
writing 1 bytes...1 bytes done in 0.001s ( 1.856kByte/s)
writing 2 bytes...2 bytes done in 0.001s ( 3.727kByte/s)
writing 4 bytes...4 bytes done in 0.001s ( 7.440kByte/s)
writing 8 bytes...8 bytes done in 0.001s (14.880kByte/s)
writing 16 bytes...16 bytes done in 0.001s (30.339kByte/s)
writing 32 bytes...32 bytes done in 0.001s (59.523kByte/s)
writing 64 bytes...64 bytes done in 0.001s (119.047kByte/s)
writing 128 bytes...128 bytes done in 0.001s (237.642kByte/s)
writing 256 bytes...256 bytes done in 0.001s (472.590kByte/s)
writing 512 bytes...512 bytes done in 0.001s (452.898kByte/s)
writing 1024 bytes...1024 bytes done in 0.001s (903.342kByte/s)
writing 2048 bytes...2048 bytes done in 0.001s (1792.115kByte/s)
writing 4096 bytes...4096 bytes done in 0.001s (3246.754kByte/s)
writing 8192 bytes...8192 bytes done in 0.001s (5449.594kByte/s)
writing 16384 bytes...16384 bytes done in 0.004s (4539.009kByte/s)
writing 32768 bytes...32768 bytes done in 0.004s (7191.014kByte/s)
writing 65536 bytes...65536 bytes done in 0.015s (4258.152kByte/s)
writing 131072 bytes...131072 bytes done in 0.016s (8192.004kByte/s)
writing 262144 bytes...262144 bytes done in 0.030s (8404.745kByte/s)
writing 524288 bytes...524288 bytes done in 0.060s (8464.493kByte/s)
writing 1048576 bytes...1048576 bytes done in 0.120s (8547.369kByte/s)
writing 2097152 bytes...2097152 bytes done in 0.241s (8502.092kByte/s)
writing 4194304 bytes...4194304 bytes done in 0.509s (8052.676kByte/s)
writing 8388608 bytes...8388608 bytes done in 0.995s (8235.363kByte/s)
writing 16777216 bytes...16777216 bytes done in 1.991s (8229.559kByte/s)
SD write test...
writing 1 bytes...1 bytes done in 0.001s ( 1.547kByte/s)
writing 2 bytes...2 bytes done in 0.001s ( 3.066kByte/s)
writing 4 bytes...4 bytes done in 0.001s ( 6.132kByte/s)
writing 8 bytes...8 bytes done in 0.001s (12.131kByte/s)
writing 16 bytes...16 bytes done in 0.001s (24.801kByte/s)
writing 32 bytes...32 bytes done in 0.001s (48.981kByte/s)
writing 64 bytes...64 bytes done in 0.001s (97.809kByte/s)
writing 128 bytes...128 bytes done in 0.001s (193.798kByte/s)
writing 256 bytes...256 bytes done in 0.001s (393.700kByte/s)
writing 512 bytes...512 bytes done in 0.190s ( 2.633kByte/s)
writing 1024 bytes...1024 bytes done in 0.130s ( 7.706kByte/s)
writing 2048 bytes...2048 bytes done in 0.006s (357.653kByte/s)
writing 4096 bytes...4096 bytes done in 0.006s (699.423kByte/s)
writing 8192 bytes...8192 bytes done in 0.007s (1175.261kByte/s)
writing 16384 bytes...16384 bytes done in 0.014s (1158.497kByte/s)
writing 32768 bytes...32768 bytes done in 0.007s (4348.419kByte/s)
writing 65536 bytes...65536 bytes done in 0.013s (4863.963kByte/s)
writing 131072 bytes...131072 bytes done in 0.069s (1859.141kByte/s)
writing 262144 bytes...262144 bytes done in 0.047s (5484.737kByte/s)
writing 524288 bytes...524288 bytes done in 0.284s (1800.256kByte/s)
writing 1048576 bytes...1048576 bytes done in 0.187s (5467.634kByte/s)
writing 2097152 bytes...2097152 bytes done in 0.372s (5505.334kByte/s)
writing 4194304 bytes...4194304 bytes done in 0.999s (4101.966kByte/s)
writing 8388608 bytes...8388608 bytes done in 1.779s (4605.802kByte/s)
writing 16777216 bytes...16777216 bytes done in 3.090s (5302.262kByte/s)
Only a single run is shown here, but the results were fairly consistent when I repeated the test. The irregularities during writing are probably caused by wear leveling.
The speed varies a lot between different SD cards and USB drives, but the most important parameter is the block size: you need to write many sectors at once, otherwise it will be really slow. Modern flash devices in particular have large write pages, so you need to read/write at least 4-32kBytes per request to get close to the maximum possible speed.
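To illustrate the block-size point, here is a minimal sketch of a writer that collects small records in RAM and only hands full chunks to the storage driver. Everything named here is hypothetical: `chunk_writer`, `block_write_fn`, and the 32 KiB `CHUNK_SIZE` are assumptions for illustration, and `demo_count_write` is just a stub standing in for a real multi-sector write routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed chunk size; tune it for the actual card or drive. */
#define CHUNK_SIZE (32u * 1024u)

/* Hypothetical driver hook: writes len bytes in one request, returns 0 on success. */
typedef int (*block_write_fn)(const uint8_t *data, size_t len, void *ctx);

typedef struct {
    uint8_t        buf[CHUNK_SIZE]; /* staging buffer for one chunk */
    size_t         fill;            /* bytes currently staged */
    block_write_fn write;           /* low-level write routine */
    void          *ctx;             /* passed through to write() */
} chunk_writer;

void chunk_writer_init(chunk_writer *w, block_write_fn fn, void *ctx)
{
    w->fill  = 0;
    w->write = fn;
    w->ctx   = ctx;
}

/* Stage data; the driver is only called when a full chunk is available. */
int chunk_writer_put(chunk_writer *w, const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;

    while (len > 0) {
        size_t room = CHUNK_SIZE - w->fill;
        size_t n    = (len < room) ? len : room;

        memcpy(w->buf + w->fill, p, n);
        w->fill += n;
        p       += n;
        len     -= n;

        if (w->fill == CHUNK_SIZE) {
            int rc = w->write(w->buf, CHUNK_SIZE, w->ctx);
            if (rc != 0)
                return rc;
            w->fill = 0;
        }
    }
    return 0;
}

/* Write out whatever is still staged (a partial chunk). */
int chunk_writer_flush(chunk_writer *w)
{
    int rc = 0;

    if (w->fill > 0) {
        rc = w->write(w->buf, w->fill, w->ctx);
        if (rc == 0)
            w->fill = 0;
    }
    return rc;
}

/* Demo stub in place of a real driver: only counts calls and bytes. */
typedef struct {
    unsigned calls;
    size_t   bytes;
} demo_stats;

int demo_count_write(const uint8_t *data, size_t len, void *ctx)
{
    demo_stats *st = (demo_stats *)ctx;
    (void)data;
    st->calls++;
    st->bytes += len;
    return 0;
}
```

With this stub, feeding 1000 records of 100 bytes each results in three full-chunk writes plus one final partial write on flush, instead of 1000 small write requests.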
I haven't used the recent versions of Microchip's MSD code, but the older versions didn't use the multiple-sector read commands efficiently.
Another catch when writing data to USB drives: you need to send the SYNCHRONIZE CACHE command before unplugging. Otherwise the most recently written data may be lost, because it can still be sitting in the thumb drive's write buffer. Neither the code provided by Microchip nor the code provided by ST implemented this correctly.
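For reference, a rough sketch of what the command looks like on the wire, assuming the drive uses the usual Bulk-Only Transport: a 31-byte Command Block Wrapper carrying a SYNCHRONIZE CACHE (10) CDB (opcode 0x35) with no data stage. The function name `build_sync_cache_cbw` and the helper `put_le32` are made up for this example; sending the CBW and reading back the status wrapper is left to whatever USB host stack you use.

```c
#include <stdint.h>
#include <string.h>

#define CBW_LENGTH                31u
#define CBW_SIGNATURE             0x43425355u /* "USBC" when stored little-endian */
#define SCSI_SYNCHRONIZE_CACHE_10 0x35u

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v & 0xFFu);
    dst[1] = (uint8_t)((v >> 8) & 0xFFu);
    dst[2] = (uint8_t)((v >> 16) & 0xFFu);
    dst[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Fill cbw[] with a Bulk-Only Transport CBW for SYNCHRONIZE CACHE (10). */
void build_sync_cache_cbw(uint8_t cbw[CBW_LENGTH], uint32_t tag, uint8_t lun)
{
    memset(cbw, 0, CBW_LENGTH);

    put_le32(&cbw[0], CBW_SIGNATURE);  /* dCBWSignature */
    put_le32(&cbw[4], tag);            /* dCBWTag, echoed back in the CSW */
    /* bytes 8..11: dCBWDataTransferLength = 0, no data stage */
    /* byte 12:     bmCBWFlags = 0, direction is irrelevant without data */
    cbw[13] = (uint8_t)(lun & 0x0Fu);  /* bCBWLUN */
    cbw[14] = 10;                      /* bCBWCBLength: 10-byte CDB */

    cbw[15] = SCSI_SYNCHRONIZE_CACHE_10; /* CDB byte 0: opcode */
    /* CDB bytes 2..5 (LBA) and 7..8 (number of blocks) stay 0,
       which asks the device to flush its whole cache. */
}
```

The natural place to send this is after the last write and before telling the user that the drive can be removed. Some drives may not support the command and will report an error status for it, so the status should be checked but a failure not necessarily treated as fatal.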
If you need to write a continuous stream of data to flash devices, don't use the newest, largest devices, but go for older, smaller ones. I had many issues with 8 and 16GB SD cards. They often paused writing for a couple of seconds (!) to do wear leveling. 1-4GB cards seem to be the best choice for embedded systems with only a limited amount of memory for the write buffer.
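To see why such pauses hurt on a small microcontroller, here is a back-of-the-envelope calculation of how much RAM is needed to ride out a stall without dropping data: incoming data rate times worst-case pause. The function name `stall_buffer_bytes` and the numbers in the usage note are only illustrative assumptions, not measurements.

```c
#include <stdint.h>

/* Bytes of buffer needed to absorb a write stall of stall_ms milliseconds
   while data keeps arriving at rate_bytes_per_s. Rounded up. */
uint64_t stall_buffer_bytes(uint32_t rate_bytes_per_s, uint32_t stall_ms)
{
    uint64_t product = (uint64_t)rate_bytes_per_s * (uint64_t)stall_ms;
    return (product + 999u) / 1000u;
}
```

For example, a stream of 100000 bytes/s and a 2 second pause gives `stall_buffer_bytes(100000, 2000)` = 200000 bytes, which is already a large share of the internal RAM of a typical microcontroller. The card's sustained write speed also has to be comfortably above the stream rate, otherwise the buffer never drains again after a pause.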
The separately selectable data width for DMA is a nice feature of the STM32 for transferring unaligned data packets to a 32-bit target. Otherwise I prefer PIC24/PIC32, because their DMA is much easier to understand, yet still very powerful thanks to the many trigger sources.