I don't understand the emphasis on "bitness" in this discussion. 8-bit, 32-bit, what does it matter? These chips all have a CPU plus a bunch of peripherals whose registers sit at memory addresses. The increase in complexity, if there is any, lies in the number and complexity of the peripherals themselves. That seems to have almost nothing to do with the core.
My problem with the ST HAL is that it does a poor job of abstracting anything. To give a minor example, you need to turn on the clock to each peripheral before you can use it. But the peripherals are spread across two separate APB buses. So far, so good; I can think of good reasons to do that. But the HAL requires the programmer to know which bus a given peripheral is on, and you have to use a separate call, RCC_APB2PeriphClockCmd or RCC_APB1PeriphClockCmd, to turn the peripheral on. Moreover, those calls take peripheral-name constants that are defined as plain macros. You can pass a peripheral to the wrong function with nary a peep of complaint at compile time.
This is a case where ST could have implemented a single call taking a properly typed enum, and the HAL could have hidden from the programmer which bus's clock is being turned on. In fact, with a bit of effort, they could probably have built a nice abstraction that handles ALL the clock configuration on the chip in a programmer-friendly manner.
Most of the ST HAL functions also have the flavor of taking a huge struct that corresponds very closely to the registers themselves, and then using it to set up and control things for you. OK, but that doesn't provide much abstraction either. It's just sort of half-baked. It's one thing to abstract away the hardware and provide a SW interface that is attuned to how the programmer would /use/ the peripheral; it's quite another to just put a layer of gloss on register poking.
Finally, I want to give a shout-out to the auto-generated boilerplate code put out by some of these configuration tools. There is nothing wrong with the concept, but you have to admit that the code is often of low quality, has all the hallmarks of being copied and pasted, usually comes with voluminous "comments" that don't tell you much of anything, and often includes the full monty of register configuration when, for a given task, only a few settings matter. This makes such code a pretty crappy learning tool, which is a shame, since it seems to be the main way people learn how ST parts work, more popular even than the docs.
ALL THAT SAID, they're nice chips, they do all sorts of cool stuff and for cheap, and it's not like you can't make it all work with a little effort. But I think it's weird to say that "this is the price of complexity, so deal with it." Complexity is the sh*ts, but that does not mean there cannot be better and worse ways of supporting it.