A disadvantage is that it has runtime code overhead
For a 32-bit mcu, it really doesn't matter. So you took a few extra cycles to toggle your led, and used a few extra bytes of ram/flash. Make code that is readable and reliable, and let the compiler handle the details. There is nothing wrong with pursuing optimal code, it's just not a great use of time when in most cases you gain nothing.
Start at the usage side, so you can write code the way you would like it to 'look and feel', then work out the details to get there.
I have c++ code for avr, stm32, nrf52, efr32, ra, etc., and it all looks basically the same. There are hardware differences so there will be differences, but for the most part the usage can be similar for each mcu.
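A minimal sketch of the 'usage first' idea, written so it runs on a PC. Everything here is made up for illustration (FakePort, the one-bit-per-pin mode register, the on/off/LOWISON names) and none of it is taken from a real driver. The point is only that you decide what usage() should look like first, then fill in the class to make it work.

```cpp
#include <cstdint>
#include <cassert>

// Stand-in for one port's registers so the sketch runs on a PC.
// On a real mcu this would be the memory-mapped register block.
struct FakePort {
    uint32_t mode; // simplified: 1 bit per pin, 1 = output
    uint32_t out;  // output latch, 1 bit per pin
};

static FakePort portA{};

class GpioPin {
    FakePort& port_;
    uint32_t  mask_;
    bool      invert_; // true = pin low means 'on' (led wired to Vcc)
public:
    enum Invert : bool { HIGHISON = false, LOWISON = true };

    GpioPin(FakePort& port, uint8_t pin, Invert inv = HIGHISON)
        : port_(port), mask_(1u << pin), invert_(inv) {}

    // return *this so calls can be chained
    GpioPin& output() { port_.mode |= mask_; return *this; }
    GpioPin& high()   { port_.out |= mask_;  return *this; }
    GpioPin& low()    { port_.out &= ~mask_; return *this; }
    GpioPin& on()     { return invert_ ? low() : high(); }
    GpioPin& off()    { return invert_ ? high() : low(); }
    GpioPin& toggle() { port_.out ^= mask_;  return *this; }

    bool isHigh() const { return (port_.out & mask_) != 0; }
    bool isOn()   const { return isHigh() != invert_; }
};

// The 'look and feel' that was decided on first.
void usage() {
    GpioPin led{ portA, 5, GpioPin::LOWISON };
    led.output().on();
    assert(led.isOn());   // pin is low, led is on
    led.toggle();
    assert(!led.isOn());  // pin is high, led is off
}
```

The same usage() could sit on top of a different mcu by changing only the private parts of the class, which is how the code can look basically the same across parts.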
Sample GpioPin class for stm32, G031 in this case (no pin irq stuff in this version, although that is not much more code)-
https://github.com/cv007/NUCLEO32_G031K8_chrono/blob/main/include/GpioPin.hpp
This repository was meant to test the chrono library, among other things like ownership and atomic types. Also note that the GpioPin class in this version uses (my) Atomic types, so if pin properties happen to be manipulated in any irq code the operation will be atomic (pins share registers such as AFR, MODE, etc). I doubt anyone does this, and there may be little need to, but it is the correct thing to do unless you warn the 'user' not to manipulate pin properties in irq's.
You can use 'normal' classes and get the compiler to produce optimal code by keeping the object inside a local scope, but again it's not worth the trouble, and you end up with dual usage plus an additional decision each time (do I want fast or normal). What is worth the trouble is to use const/constexpr/static whenever you can, just like in any other C/C++ code. Templates are useful, but if they are used to always get optimal code, such as with a gpio class, you will end up with templates everywhere because each pin will be its own type (want to pass a pin to a function? the function will need to be a function template to handle any pin). Eventually you get better at seeing when to use and when not to use templates, and will tend to avoid them if possible.
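A small sketch of the 'templates everywhere' problem, using made-up names (FakePort, GpioPin, GpioPinT, enable, enableT) and a fake register so it runs on a PC. With the normal class every pin is the same type, so a plain function can take any pin. With the pin number as a template parameter every pin is a different type, so the function has to become a template too.

```cpp
#include <cstdint>
#include <cassert>
#include <type_traits>

struct FakePort { uint32_t out; }; // stand-in for a real port register
static FakePort portA{};

// 'Normal' class: the pin is a runtime value, all pins share one type.
class GpioPin {
    FakePort& port_;
    uint32_t  mask_;
public:
    GpioPin(FakePort& port, uint8_t pin) : port_(port), mask_(1u << pin) {}
    void high() const { port_.out |= mask_; }
    void low()  const { port_.out &= ~mask_; }
};

// Template version: the pin number is part of the type.
template<uint8_t Pin>
struct GpioPinT {
    static void high() { portA.out |= (1u << Pin); }
    static void low()  { portA.out &= ~(1u << Pin); }
};

// Pin 3 and pin 4 are unrelated types as far as the compiler is concerned.
static_assert(!std::is_same<GpioPinT<3>, GpioPinT<4>>::value,
              "each template pin is its own type");

// A plain function handles any GpioPin.
void enable(const GpioPin& pin) { pin.high(); }

// To accept 'any' GpioPinT, the function must be a template as well,
// and so must anything that stores or forwards a pin.
template<typename PinT>
void enableT(PinT pin) { pin.high(); }

void usage() {
    const GpioPin cs{ portA, 3 };
    enable(cs);               // one function, any pin
    enableT(GpioPinT<4>{});   // a separate instantiation per pin type
    assert(portA.out == ((1u << 3) | (1u << 4)));
}
```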
When I was learning C++ on the pic32mm, I knew a lot less than I know now, but that code was reliable and everything I tried usually worked the first time. I wrote a driver for every peripheral including usb (a driver for each datasheet chapter, 1 hpp + 1 cpp file). I knew nothing about templates and could not really read mips, so I did not spend much time worrying about what the compiler produced (and the pic32mm was 256k). It was straightforward, easy to read and use, and in some ways probably as good as what I do now (more knowledge = more clever = usually worse, realize it's getting worse, work your way back to more simple, better again). This is also when I had a goal to rid myself of defines, as they have a tendency to take over and you end up with a preprocessor 'language' as large as the language you are actually writing the mcu code in. That complete pic32mm driver code had only a few defines in the usb header, as I could find no good way to handle string creation other than with macros. I still avoid defines if possible because I just dislike them for various reasons (and it's easier to do without them in C++).
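As a rough example of doing without defines (the Uart struct, its bit names and the baud formula are all hypothetical, not from the pic32mm code): bit masks can live in an enum inside the class that owns them, and a function-like macro can become a constexpr function, so the names are scoped and typed and still cost nothing at runtime.

```cpp
#include <cstdint>
#include <cassert>

// C style, for comparison:
//   #define UART_CTRL_EN   (1u << 0)
//   #define UART_CTRL_TXEN (1u << 3)
//   #define UART_BAUD_DIV(clk, baud) ((clk) / (16u * (baud)))

struct Uart {
    // scoped, typed names instead of global macros
    enum Ctrl : uint32_t { EN = 1u << 0, TXEN = 1u << 3 };

    // a constexpr function replaces the function-like macro
    static constexpr uint32_t baudDiv(uint32_t clkHz, uint32_t baud) {
        return clkHz / (16u * baud);
    }
};

// usable in constant expressions, so it is checked at compile time
static_assert(Uart::baudDiv(16000000, 115200) == 8, "integer divide");
```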
You can also use the online compiler to try out ideas-
https://godbolt.org/z/768nEcGjr
I copied/pasted some code, modified to remove atomic types, etc. I also added the ability to add an always_inline option to the GpioPin functions, to see what difference it makes, although you do then start down another path of user optimization. You can also see the code produced for a local object vs a global object, and also try out an array of pins. It is a great way to test out code, except you will not have mcu headers available, so you have to deal with that in some way (typically you are working on limited pieces of code, so you can add enough info to get the code to work, as in the above examples).