When working with the HAL libraries on an STM32 using the OpenSTM32 workbench, I noticed a substantial performance hit when calling a function from an interrupt. It seemed to be on the order of 800 clock cycles.
If you want to know it more accurately, you could set up an internal timer in your main code so it is ready to go. Then clear and start it when you enter the ISR and stop it when you leave. The timer value then holds the number of cycles taken (or that number divided by the timer's prescaler), which you can read while running your main code.
Alternatively, you could set/clear a GPIO upon entering/leaving the ISR and measure its timing with a scope or logic analyzer.
You could obviously also narrow this down around particular parts of the code.
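The timer approach above could be sketched roughly as follows. The names `fake_timer_count`, `timer_count`, `isr_ticks`, `isr_timing_begin`, `isr_timing_end` and `example_isr` are all made up for illustration, and the counter is a plain variable so the snippet compiles on a PC. On a real part you would point `timer_count` at the count register of a timer that is already running, which is a small simplification of "start/stop".

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware timer's count register.
 * Assumption: on target this pointer would refer to the memory-mapped
 * count register of a timer that main() has already configured. */
static volatile uint32_t fake_timer_count;
static volatile uint32_t *const timer_count = &fake_timer_count;

/* Last measured ISR duration in timer ticks; read this from main code. */
static volatile uint32_t isr_ticks;

/* Clear the counter on ISR entry. */
static inline void isr_timing_begin(void)
{
    *timer_count = 0u;
}

/* Capture the counter on ISR exit. */
static inline void isr_timing_end(void)
{
    isr_ticks = *timer_count;
}

/* Hypothetical ISR showing where the two calls go. */
void example_isr(void)
{
    isr_timing_begin();
    /* ... the work you want to measure ... */
    isr_timing_end();
}
```

Note that this only measures the code between the two calls. The hardware's interrupt entry and exit overhead falls outside the measured window.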
Now, on another project I'm working with the CMSIS libraries on an STM32F4. In the current revision of the firmware, functions are called to set up a DAC. I'm looking at changing this so that a macro is called instead, in an attempt to reduce the number of clock cycles it takes to complete the interrupt.
Is it correct to say that the compiler simply takes the code behind the macro and substitutes it where it's called in the interrupt, so that no additional stack work is done, as happens when you call a function?
Timing in this situation is critical because the interrupt is driven at a maximum of 320 kHz. This is indeed required by the specific application.
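To put the 320 kHz figure in perspective, here is a rough cycle-budget calculation. The 168 MHz core clock is purely an assumption for the sake of the arithmetic (the actual clock is not stated here), and `cycles_per_interrupt` is a made-up helper name.

```c
#include <stdint.h>

/* Number of core clock cycles available between two consecutive
 * interrupts, ignoring everything else the CPU has to do. */
static uint32_t cycles_per_interrupt(uint32_t core_hz, uint32_t isr_hz)
{
    return core_hz / isr_hz;
}

/* Assumed example: cycles_per_interrupt(168000000u, 320000u) gives 525,
 * i.e. 168 MHz / 320 kHz = 525 cycles per interrupt period. */
```

Under that assumption, an overhead anywhere near the 800 cycles mentioned earlier would not fit in a single interrupt period at all.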
My initial thought is that the assertions of ST's CMSIS library are very slow. ST's libraries assert everything, even in the simplest GPIO_SetBits functions, in order to check whether the port specified is a GPIO port, etc.
In general I consider assertions to be a good thing, but they can be a bit annoying when they get in the way of performance. You could try disabling assertions (which is often useful in e.g. release mode) and see if that speeds things up.
If so, I would rewrite the functions in your ISR so you can keep assertions turned on in the rest of your project.
If not then I would probably try to profile (manually) what code takes so many cycles to execute and reconsider the approach taken.
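As a sketch of what "disabling assertions" means at the code level: parameter checks of this kind are usually a macro that compiles to nothing unless a switch is defined. The snippet below is modeled loosely on the `assert_param` / `USE_FULL_ASSERT` pattern in ST's headers, but everything in it (`fake_port`, `set_bits_checked`, the simplified `assert_failed` signature, the failure counter) is illustrative, so check your own library's configuration header for the real definitions.

```c
#include <stdint.h>

/* Illustration only: counts how many parameter checks have failed. */
unsigned assert_failures;

/* Simplified failure hook (assumed signature, not ST's exact one). */
void assert_failed(const char *file, int line)
{
    (void)file;
    (void)line;
    assert_failures++;
}

/* Leave USE_FULL_ASSERT undefined to compile the checks out entirely. */
#ifdef USE_FULL_ASSERT
#define assert_param(expr) ((expr) ? (void)0 : assert_failed(__FILE__, __LINE__))
#else
#define assert_param(expr) ((void)0)
#endif

/* Hypothetical stand-in for a GPIO output register. */
static volatile uint32_t fake_port;

/* A "library style" function: check the argument, then do the work. */
void set_bits_checked(uint16_t pins)
{
    assert_param(pins != 0u); /* costs nothing when checks are disabled */
    fake_port |= pins;
}
```

With the switch undefined, the check disappears from the generated code, so comparing ISR timing with and without it tells you how much the assertions actually cost.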
In terms of the choice between macros and functions: always prefer functions when you can. Macros are horrible to debug and a very error-prone way of writing code. As demonstrated, a single semicolon, brackets inside macro arguments, or forgotten braces can introduce 'invisible bugs' because the actual compiled code is not transparent anymore. Debugging that via the C preprocessor output is not my definition of fun. This makes existing macros almost a "work of magic" that you never want to change again, because you need to be so careful when using and writing them.
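A small illustration of the kind of 'invisible bug' meant here, next to a function that does the same job. `SCALE_BAD`, `MAX_BAD`, `scale` and `max_u32` are made-up names for this sketch.

```c
#include <stdint.h>

/* Macro without parentheses around its argument: the preprocessor
 * pastes the argument text in as-is, so operator precedence can change
 * the meaning. SCALE_BAD(1 + 2) becomes 1 + 2 * 2. */
#define SCALE_BAD(x) x * 2

/* Fully parenthesized, but the winning argument is still evaluated
 * twice, so side effects such as i++ happen twice. */
#define MAX_BAD(a, b) ((a) > (b) ? (a) : (b))

/* Function versions: arguments are evaluated exactly once and are
 * type-checked. With optimization enabled the compiler will typically
 * inline functions this small, so there is usually no call overhead. */
static inline uint32_t scale(uint32_t x)
{
    return x * 2u;
}

static inline uint32_t max_u32(uint32_t a, uint32_t b)
{
    return (a > b) ? a : b;
}
```

For example, `SCALE_BAD(1 + 2)` evaluates to 5 while `scale(1 + 2)` evaluates to 6, and `MAX_BAD(i++, 0)` increments `i` twice where `max_u32(i++, 0)` increments it once.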
If you're using GCC, you could use -O3 and write it with a few short explicit functions instead. If you want to keep debuggability in the rest of your project, you could consider using a few pragmas (or better: int foo(int i) __attribute__((optimize("-O3"))) ) around the ISR functions. This should considerably 'streamline' the assembly for speed.
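A minimal sketch of the per-function attribute, assuming GCC. `HOT_PATH` and `scale_sample` are made-up names, and the guard is only there so the snippet still compiles on compilers that don't know the attribute.

```c
#include <stdint.h>

/* Apply -O3 to individual functions only; the rest of the file keeps
 * whatever optimization level was given on the command line.
 * Assumption: GCC. Other compilers get an empty definition. */
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_PATH __attribute__((optimize("O3")))
#else
#define HOT_PATH
#endif

/* Hypothetical helper called from the ISR: scales a raw sample by a
 * gain given in Q8 fixed point (256 == 1.0). */
HOT_PATH uint16_t scale_sample(uint16_t raw, uint16_t gain_q8)
{
    return (uint16_t)(((uint32_t)raw * gain_q8) >> 8);
}
```

It is worth checking the generated assembly afterwards to confirm it has the intended effect. GCC's own documentation describes this attribute as a debugging aid, so putting the ISR code in a separate source file compiled with -O3 may be the more robust long-term option.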