Just write your own in-application programming (IAP) code, which is sometimes wrongly called a "bootloader". It is often something you will want later anyway. That way you can reprogram the device however you like, through whatever interface you already use, without special devices, cables, or even boot pins. You can then add convenient "remote update" features, even for end users.
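As a rough illustration of what such in-application programming code might look like, here is a minimal sketch that runs on a PC against a simulated flash array. Everything in it is an assumption made for illustration: the `iap_` names, the 64 KiB size, the 1 KiB pages, and the location of the protected updater region. On a real part, the two `flash_` stand-ins would be replaced by the vendor's flash driver, and the updater would have to execute from the protected region (or from RAM) while it erases and writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; every number here is an assumption. */
#define IAP_FLASH_SIZE    (64u * 1024u)
#define IAP_PAGE_SIZE     1024u
#define IAP_UPDATER_START (IAP_FLASH_SIZE - 4u * 1024u) /* updater code lives here */
#define IAP_UPDATER_END   IAP_FLASH_SIZE

/* Simulated flash so the sketch runs on a PC. */
static uint8_t sim_flash[IAP_FLASH_SIZE];

/* Stand-ins for a real flash driver (hypothetical names). */
static void flash_erase_page(uint32_t offset)
{
    memset(&sim_flash[offset], 0xFF, IAP_PAGE_SIZE);
}

static void flash_program(uint32_t offset, const uint8_t *data, size_t len)
{
    memcpy(&sim_flash[offset], data, len);
}

/* True if [offset, offset + len) touches the updater's own region. */
static bool iap_overlaps_updater(uint32_t offset, size_t len)
{
    return offset < IAP_UPDATER_END && offset + len > IAP_UPDATER_START;
}

/*
 * Erase one page and program up to one page of data into it.
 * Offsets are relative to the start of flash. Returns false, without
 * touching flash, if the request is misaligned, out of range, or would
 * overwrite the updater itself.
 */
bool iap_write_page(uint32_t offset, const uint8_t *data, size_t len)
{
    assert(data != NULL);

    if (offset % IAP_PAGE_SIZE != 0 || len == 0 || len > IAP_PAGE_SIZE)
        return false;
    if (offset >= IAP_FLASH_SIZE)
        return false;
    if (iap_overlaps_updater(offset, len))
        return false;

    flash_erase_page(offset);
    flash_program(offset, data, len);
    return true;
}

/* After the last page, real firmware would trigger a reset, for example
 * with CMSIS NVIC_SystemReset() on a Cortex-M part. */
```

The receive loop is deliberately left out: the pages can arrive over USART, USB, a radio link, or anything else the application already talks to, and each one is simply handed to `iap_write_page`.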
Flash speed is not necessarily the bottleneck. For example, my own USART programmer at 115200 baud is approximately 5 times faster than the factory STM32 USART bootloader running at the same 115200 baud. (I say "bootloader", but it has nothing to do with booting or loading: it is entered during normal operation, rewrites the flash except for its own memory area, and ends with a reset.)
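To put a number on the serial link itself, the raw wire time gives a floor that no loader at a given baud rate can beat. The helper below is a small sketch with a hypothetical name; the 10 bits per byte figure assumes 8N1 framing (start bit, 8 data bits, stop bit), and the image size is chosen only for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Minimum time, in seconds, to move image_bytes over a UART,
 * ignoring all protocol overhead (commands, acknowledgements,
 * waiting for erase and program operations).
 * bits_per_byte is 10 for 8N1 framing, 11 if a parity bit is added.
 */
double uart_wire_seconds(uint32_t image_bytes, uint32_t baud,
                         uint32_t bits_per_byte)
{
    assert(baud != 0);
    return (double)image_bytes * (double)bits_per_byte / (double)baud;
}
```

For a 64 KiB image at 115200 baud and 10 bits per byte this comes to about 5.7 s. Two loaders running at the same baud rate share this floor, so any difference between them has to come from what happens on top of it, such as how often the protocol stops to wait for an acknowledgement.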