Part of the fun of programming on these devices and retro computers is learning and proving what can be done with limited resources. (Look at some of the amazing games people are still making for the Atari 2600, nearly 50 years after its release, putting the original titles to shame.) Too much code and too many compilers today just assume unlimited storage, bandwidth and CPU power, and thus we get very bloated software on phones and computers that constantly requires hardware upgrades to keep up.
The ability to get something done and make it "fit" within the confines of a particular device is definitely something many can't do. I wish you the best in your efforts to make it all work!
Sure, it is fun to reach and work around the limits of the hardware, and a 100 MSPS dual-channel oscilloscope is at the limit of what an MCU can do without the help of an FPGA.
The CubeMX HAL code is very bloated and eats up a lot of flash, but I am not willing to go the bare-metal route just to rewrite everything for a different MCU later. So bloated CubeMX it is. Most of the flash space is eaten up by the fonts.

My solution to the problem is to use the external SPI flash as firmware storage and copy everything from there into RAM at boot time. This way I have about 900 kB of firmware space, which should be plenty.
This is a compromise: the bootloader is only a sort-of bootloader, with the limitation that it no longer works if a firmware update gets messed up (e.g. a power loss while updating).
In the unlikely case of a messed-up firmware update, only recovery via JTAG is possible. This should not be a huge problem, because installing the firmware for the first time requires a JTAG adapter (at least an ST-Link) anyway.
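For anyone curious what the copy-to-RAM step boils down to, here is a minimal sketch that runs on a PC. It assumes a hypothetical image layout in the external flash (a 4-byte little-endian length followed by the firmware bytes) and a hypothetical `spi_flash_read()` helper; the SPI flash is simulated with an array, and the real driver calls, RAM address and linker setup are not part of the original post. It also shows why a botched update is fatal: the loader only has one image to copy, so it can at best refuse an obviously invalid header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_SIZE       4u     /* hypothetical header: uint32_t image length */
#define COPY_CHUNK     256u   /* bytes per SPI read (arbitrary choice) */
#define EXT_FLASH_SIZE (HDR_SIZE + 4096u)  /* small simulated flash for the host */

/* Host stand-in for the external SPI flash. */
static uint8_t spi_flash_sim[EXT_FLASH_SIZE];

/* Hypothetical driver call; on the target this would wrap the HAL SPI/QSPI read. */
static void spi_flash_read(uint32_t addr, uint8_t *dst, size_t len)
{
    memcpy(dst, &spi_flash_sim[addr], len);
}

/* Copy the firmware image from external flash into RAM in chunks.
 * Returns the number of bytes copied, or -1 if the header is implausible
 * (erased flash reads as 0xFFFFFFFF, for example). */
static long copy_firmware_to_ram(uint8_t *ram, size_t ram_size)
{
    uint8_t hdr[HDR_SIZE];
    spi_flash_read(0, hdr, HDR_SIZE);
    uint32_t len = (uint32_t)hdr[0] | ((uint32_t)hdr[1] << 8) |
                   ((uint32_t)hdr[2] << 16) | ((uint32_t)hdr[3] << 24);

    if (len == 0 || len > ram_size || len > EXT_FLASH_SIZE - HDR_SIZE)
        return -1;  /* nothing sane to boot: only JTAG recovery helps now */

    for (uint32_t off = 0; off < len; off += COPY_CHUNK) {
        size_t n = (len - off < COPY_CHUNK) ? (size_t)(len - off) : COPY_CHUNK;
        assert(off + n <= ram_size);
        spi_flash_read(HDR_SIZE + off, ram + off, n);
    }

    /* On a real Cortex-M the loader would now point SCB->VTOR at the RAM copy,
     * load the main stack pointer from its first word and branch to the reset
     * handler in its second word. That part cannot run on a PC, so it is omitted. */
    return (long)len;
}
```

With real hardware, `ram` would be a linker-defined RAM region sized for the ~900 kB image rather than a host buffer.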
The bigger MCU is still very tempting because of its 480 MHz core f_max. I would just love to tell everyone that my DMM's CPU is running at 400 MHz.

My first computer (C64) was running 400x slower....