Well it is pretty difficult to make a universal peripheral library work across all chips. New chips keep coming out that might do something in a slightly different way that the existing library is not made for and so it needs changing.
Some things are actually done pretty well in HAL, like having functions with the word IT in front for things that are interrupt driven so you can clearly see what to use. It provides built in mutexes and locks to prevent problems with multithreading and interrupts. Even sets up DMA for you if you use appropriate functions.
But then there are a lot of little things that make me facepalm. Like flipping a GPIO pin between input and output requires you to allocate a whole gpio pin init structure, fill that structure and then pass it in to the init function to have your pin reconfigured. Some peripherals also hold on to this init structure in RAM as part of the hal peripheral struct that holds the state of the peripheral. While i do applaud them for using simply a C struct to store this (Rather than making a C++ class for the same thing) that init structure can sometimes be over a kilobyte in size and has no use anymore after the init function finishes setting up the peripheral according to your wishes. I would much prefer the init would only do the very basic things to get the peripheral running and then have you call separate things like HAL_GPIO_SetMode(GPIOC, 10, GPIO_INPUT_PULLUP)
Then there are things where the HAL library is built to work well with a very specific use case in mind, yet this use case is something really niche and rare, while the most common use case is not supported at all. One such thing is you can ask to receive a certain number of characters from UART, depending on how you call it you can have the function wait as a blocking call, interrupt you or DMA them somewhere, notifying you when done and stopping receiving. This is great if you expect exactly 10 characters to arrive just after this call. But if you want to constantly receive? Well you could always keep asking for 1 character but the UART stops receiving when its done with 1 character and only starts receiving again once you call the receive function again so you will loose any characters that arrive in that time. Are you kidding me?!
Then you get things that are implemented without any thought as to there use case at all. Like having I2C with DMA. Sounds great since I2C is awfully slow so you don't want to wait around for data. Well turns out this is only really useful if you want to transfer a big block read of data as it will only do one operation at a time. It won't let you automate the usual I2C dance that happens before the read transfer (Adress the slave, write to address, restart, adress slave, read, get data...). My solution was about a 100 line long function that takes in a structure of instructions for the I2C to do and executes them in interrupts by direct poking of bits in hardware registers. Works great and will read data from multiple I2C chips, doing it from a single function call all in the background. Versus the many thousands of lines of code of that crappy HAL I2C library.
All feels like the HAL library was written by a bunch of high level programmers used to writing C code for Windows, following a checklist of implementing something for every peripheral in blocking, interrupt and DMA mode before moving on. None of those programmers having actually used any MCU peripheral libraries, having no idea what embeded programmers actually want a device driver to look like.
Still HAL is a great way of getting something up and running quickly and to get you familiar with the peripheral. Then you can slowly rewrite parts of the HAL library to actually do it in a proper way.