STM32 libraries make everything difficult since they don't abstract the implementation details out as they should, but only add an extra layer of burden and uncertainty.
Doing it without the library structs and functions produces much more readable code, quicker development, a deeper understanding and more robust implementations than copy-pasting library code from Stack Overflow. This has nothing to do with the concept of using libraries in general, but everything to do with the STM32 libraries, which, unfortunately, are totally unusable and completely doomed. This is a trap for young players, who will find a lot of confusing examples on the web, while simple examples that don't use the libraries are hard to find.
You could just as easily argue the other way around.
CubeMX in combination with SW4STM32 works. It is definitely not great, and as you say there are a lot of layers, but it works, there is a forum with thousands of people giving you support and, as a last resort, there is a company that has its own people on that forum providing support.
Compare that to your own bare-metal implementation, or anyone else's: there is no support. It might work, but it could just as easily fail at some point if the young player wants to do something else, e.g. add a peripheral.
A third-party bare-metal engineer has his or her own way of thinking and working and wrote the code with that mindset, which is perfectly fine for him or her, but another person needs to grasp that way of thinking and invest time in understanding the code. Often there is little or no documentation, few comments in the code and no support; considering that, it is equally bad in my eyes.
I started with CubeMX and SW4STM32 on a modern Nucleo board with a decent processor (the STM32L4xx or STM32F4xx are good for starters; don't use the F1, since it is old and has lots of quirks) and had a blinky up and running in one day. It took another hour to get a UART running that did not do what I wanted.
That is where I hit my first issue: the library only had interrupt functionality to receive a fixed number of bytes. When I called the function with 1 byte, half the data was lost. So, looking at those huge functions with all the extra code around them to make them fairly robust, I hacked and slashed them back so that the IRQ handler would not call three layers deep into other code; instead, when there was no error, it stored the byte in my own queue module and was done with it. So in about a day I had a UART working, not bad.
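The post does not show the modified handler, so the following is only a minimal sketch of the "store the byte in my own queue" idea: a single-producer/single-consumer ring buffer where the UART interrupt is the only writer of head and the main loop is the only writer of tail. All names (rx_queue_t, rx_queue_push, rx_queue_pop, uart_rx_queue, RX_QUEUE_SIZE) are hypothetical. It assumes a single-core Cortex-M, where aligned 16-bit stores are atomic and volatile indices are enough for this one-writer-per-index pattern. The commented-out handler assumes the STM32L4 CMSIS register names; check the reference manual for your part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical queue size; must be a power of two so indices wrap with a mask.
   One slot is kept free to tell "full" from "empty", so capacity is size - 1. */
#define RX_QUEUE_SIZE 64u

typedef struct {
    volatile uint8_t  buf[RX_QUEUE_SIZE];
    volatile uint16_t head; /* written only by the ISR (producer) */
    volatile uint16_t tail; /* written only by the main loop (consumer) */
} rx_queue_t;

/* Hypothetical global queue shared by the UART ISR and the main loop. */
rx_queue_t uart_rx_queue;

/* Called from the UART IRQ handler for each received byte.
   Returns false and drops the byte when the queue is full. */
bool rx_queue_push(rx_queue_t *q, uint8_t byte)
{
    uint16_t next = (uint16_t)((q->head + 1u) & (RX_QUEUE_SIZE - 1u));
    if (next == q->tail) {
        return false; /* full */
    }
    q->buf[q->head] = byte;
    q->head = next; /* publish the byte only after it is stored */
    return true;
}

/* Called from the main loop; returns false when no byte is waiting. */
bool rx_queue_pop(rx_queue_t *q, uint8_t *byte)
{
    if (q->tail == q->head) {
        return false; /* empty */
    }
    *byte = q->buf[q->tail];
    q->tail = (uint16_t)((q->tail + 1u) & (RX_QUEUE_SIZE - 1u));
    return true;
}

/* Sketch of a slimmed-down handler (assumed STM32L4 CMSIS names; error flags
   such as overrun still need handling elsewhere, as in the original setup):

   void USART2_IRQHandler(void)
   {
       if (USART2->ISR & USART_ISR_RXNE) {
           (void)rx_queue_push(&uart_rx_queue, (uint8_t)USART2->RDR);
       }
   }
*/
```

The main loop then simply drains the queue with `while (rx_queue_pop(&uart_rx_queue, &b)) { ... }`, so the interrupt does one store and returns instead of walking through several library layers.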
A youngster might need a week or more, with help from the forum, to get things going, but that is still way better than starting from scratch.
Then there is another argument: code size. I know CubeMX-generated code eats flash like we drink water, but my project above took 12 kB and I still have 988 kB left, so what is the problem? For mass production, yes, don't go that way, but for hobbyists, amateurs and one-of-a-kind projects, come on: unless you are a student or unemployed, your time costs more than the ROM on the chip.
IMO, CubeMX can be used as a starting point to get things going: I/O properly defined, clocks and peripherals configured and working. Then the hard part starts: tweaking and adjusting it to your own needs.
For young players: stick to the HAL, because then you have support.