Again. Optimize where necessary. Avoid everywhere else.
Do I care that I'm three stack frames deep to toggle a GPIO? 99% of the time, no, not at all. When I do care, I worry about it then and instruct the compiler to be more aggressive.
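One way to "instruct the compiler" for a single hot path, sketched under assumptions: GCC and Clang accept `__attribute__((always_inline))`, which requests inlining even at low optimization levels. Everything else here (`gpio_port_t`, `LED_PIN`, the three wrapper functions) is hypothetical and only stands in for a layered HAL. The port is a plain struct so the sketch runs on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical GPIO port: a plain struct standing in for memory-mapped registers. */
typedef struct {
    volatile uint32_t odr; /* output data register, one bit per pin */
} gpio_port_t;

#define LED_PIN 5u /* hypothetical pin number */

/* GCC/Clang-specific: request inlining regardless of the -O level. */
#define FORCE_INLINE static inline __attribute__((always_inline))

/* Layer 1: raw register access. */
FORCE_INLINE void gpio_write(gpio_port_t *port, unsigned pin, int level)
{
    assert(pin < 32u);
    if (level) {
        port->odr |= (1u << pin);
    } else {
        port->odr &= ~(1u << pin);
    }
}

/* Layer 2: board-level name for the pin. */
FORCE_INLINE void led_write(gpio_port_t *port, int on)
{
    gpio_write(port, LED_PIN, on);
}

/* Layer 3: what the application actually calls. */
FORCE_INLINE void status_led_toggle(gpio_port_t *port)
{
    int is_on = (int)((port->odr >> LED_PIN) & 1u);
    led_write(port, !is_on);
}
```

The point of the macro is that it can be applied to the one call chain that turned out to matter, while the rest of the code stays on the normal build settings.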
Just a quick test on a codebase I had handy: arm-none-eabi-gcc 6.2.1 with -Og doesn't inline automatically and produces less-than-spectacular code (which is expected, since I told it I want to be able to debug without losing my mind). With -O3, all my HAL GPIO calls were inlined and the set/clear ones resolved to 3 Thumb instructions each. The HAL_GPIO_Init() ones were a little goofier, ending up at about ten instructions each, but this is harder to tease out since gcc is doing a good job of reordering and reusing values.
Interestingly, with -Os, the GPIO functions are still inlined, presumably because they're small and a call would take more space than the inlined body once you count setting up and tearing down a stack frame. The call is also probably a far call, since the codebase is about 200kB.
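To illustrate why such functions are cheap to inline, here is a minimal sketch, not the HAL code from the test above. It assumes a port with separate write-to-set and write-to-clear registers; `gpio_regs_t`, `pin_set` and `pin_clear` are hypothetical names, and the registers are ordinary struct fields so the code runs on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical port with separate set and clear registers.
   On real hardware these would be memory-mapped; here they are plain fields. */
typedef struct {
    volatile uint32_t set;   /* writing 1 to bit n would drive pin n high */
    volatile uint32_t clear; /* writing 1 to bit n would drive pin n low */
} gpio_regs_t;

/* The whole body is one store, with no read-modify-write. */
static inline void pin_set(gpio_regs_t *regs, unsigned pin)
{
    assert(pin < 32u);
    regs->set = 1u << pin;
}

static inline void pin_clear(gpio_regs_t *regs, unsigned pin)
{
    assert(pin < 32u);
    regs->clear = 1u << pin;
}
```

When the pin is a compile-time constant, the mask folds to a constant too, so the inlined body can be no larger than the argument setup and branch that an out-of-line call would need. That is consistent with the compiler inlining even when asked to optimize for size, though the exact instruction count depends on the target and compiler version.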
First rule of optimization: don't optimize.
Second rule of optimization: don't optimize yet.