When working with the HAL libraries on an STM32 using the OpenSTM32 workbench, I noticed a substantial performance hit when calling a function from an interrupt. It seemed to be on the order of 800 clock cycles.
Now, on another project, I'm working with the CMSIS libraries on an STM32F4. In the current revision of the firmware, functions are called to set up a DAC. I'm looking at changing this so that a macro is called instead, in an attempt to reduce the number of clock cycles it takes to complete the interrupt.
Is it correct to say that the compiler simply takes the code behind the macro and substitutes it where it's called in the interrupt, so no additional stack work is done, as happens when you call a function?
Timing in this situation is critical because the interrupt is being driven at a maximum of 320 kHz. This is indeed required by the specific application.
> When working with the HAL libraries on an STM32 using the OpenSTM32 workbench, I noticed a substantial performance hit when calling a function from an interrupt. It seemed to be on the order of 800 clock cycles.

Calling a function doesn't cost 800 cycles. The function you called may have cost 800 cycles. There is a big difference.
The function call overhead on a Cortex-M should be on the order of ten cycles. Any decent compiler will inline static functions that are defined above the call site, and that are called only once.
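For illustration (this is a sketch, not code from the thread, and the register is a stand-in variable rather than a real STM32 register), a small static inline function defined above its call site typically compiles to the same instructions a macro expansion would, with no branch and no stack traffic:

```c
#include <stdint.h>

/* Stand-in variable for a memory-mapped GPIO/DAC data register; on real
 * hardware this would be something like GPIOA->ODR from the device header. */
static volatile uint16_t fake_odr;

/* Small static function defined above the call site: any decent compiler
 * at -O1 or higher will inline it, substituting the body in place just
 * like a macro, with no call and no stack frame. */
static inline void dac_write(uint16_t value)
{
    fake_odr = value;
}

void isr_body(void)
{
    dac_write(0x0FFFu); /* inlined: effectively a single store */
}
```

With GCC you can additionally mark the function `__attribute__((always_inline))` if you want the substitution guaranteed even at low optimization levels.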
The called function is only ~12 lines of code setting up GPIO pins. When using HAL, this seemed to cause collisions in the interrupt; at the time it looked like it would take a couple hundred clock cycles for that to happen. This time it was the same exercise using the CMSIS libraries. I know I won't be able to get up to the 260 kHz that I want on the F4, but I was able to squeeze out 140 kHz, which is enough for now until I get around to laying out the F7 board.
Using the direct code on the HAL F7 substantially improved performance. I was hoping for the same on the F4, but using macros instead of copying and pasting the same 12 lines of code throughout the interrupt. The same code is executed at about 8 different locations, depending on the situation.
What are those 12 lines of code? Setting up I/O with any of the usual libraries could (will) do a crap ton of checking, and will possibly call further functions to do so.
Direct port manipulation: setting up a uint16_t variable based on an input value and control bits, then pushing it into a GPIO register, clearing, and repeating.
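As a sketch only (the mask is hypothetical and the register is a stand-in variable, not the actual code from this project), that kind of sequence can live in a pair of static inline functions rather than a macro:

```c
#include <stdint.h>

/* Stand-in for the GPIO output data register driving the parallel DAC;
 * on the real board this would be a volatile memory-mapped register. */
static volatile uint16_t fake_gpio_odr;

#define DAC_CTRL_BITS 0xF000u  /* hypothetical control-bit mask */

/* Compose the output word from a 12-bit input value plus control bits. */
static inline uint16_t dac_word(uint16_t value)
{
    return (uint16_t)((value & 0x0FFFu) | DAC_CTRL_BITS);
}

/* Push the word onto the port, then clear it, ready to repeat. */
static inline void dac_push(uint16_t value)
{
    fake_gpio_odr = dac_word(value);
    fake_gpio_odr = 0u;
}
```

Calling `dac_push()` from the 8 sites in the ISR keeps the code in one place, and an optimizing compiler will expand it in place at each site.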
The contents of the function aren't the issue; it's as streamlined as possible. The issue is the best way to write the code so it exists once but is used in multiple places. With functions there will always be stack work. I wanted to be certain how the compiler treats a macro, in order to minimize the number of clock cycles it takes to complete the interrupt. I am going to be triggering it as fast as possible, right up to where it starts to break down.
Sometimes a function (in particular a static function) will be more efficient than a macro, because it allows the compiler to make optimizations that it would otherwise not make.
Don't know about HAL, but there is a small gotcha with macros: they paste text, so watch out for formatting like brackets etc. if you're using complex constructs. Maybe wrap them in do { } while (0).
#define do_something( a ) do { do_this( a ); do_that( a ); } while (0)

uint16_t mustIncrementOnce = 0;
do_something( mustIncrementOnce++ ); // Oops! 'a' is pasted twice, so this increments twice.
> Sometimes a function (in particular a static function) will be more efficient than a macro, because it allows the compiler to make optimizations that it would otherwise not make.

Why is that? Macros are expanded by the preprocessor (cpp in the GNU tools), which is nothing more than a sophisticated text search & replace. After cpp is done, the resulting text is fed into the C compiler. All in all, an equally well-written inline function or macro shouldn't make any difference.
Why would anyone ever write:

    do { ... } while (0)

What's the point? (In other words, why not just write:

    { ... }

without the redundant do and while bits?)
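For what it's worth, the usual reason (a minimal sketch, with a made-up counter-bumping macro standing in for real statements) is that a bare { ... } block plus the semicolon the caller naturally writes breaks if/else, while the do { } while (0) form behaves like a single ordinary statement:

```c
static int calls;

/* The do { } while (0) wrapper swallows the caller's trailing
 * semicolon, so the macro behaves like one ordinary statement. */
#define DO_BOTH(x) do { calls += (x); calls += (x); } while (0)

/* With a bare { calls += (x); calls += (x); } instead, this function
 * would not compile: "if (cond) BARE(x); else ..." expands to a block
 * followed by an empty statement ';', and the 'else' then has no 'if'
 * to bind to. */
static int demo(int cond)
{
    calls = 0;
    if (cond)
        DO_BOTH(1);
    else
        DO_BOTH(2);
    return calls;
}
```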
> an equally well written inline function or macro

Before optimizing, using inlines, etc., it would be better to understand what's actually happening.
> The contents of the function isn't an issue. It's as streamlined as possible.
If you want to know it more accurately, you could set up an internal timer, ready to go, in your main code. Then clear & start it when you enter the ISR and stop it when you leave. The timer value then contains the number of cycles taken (or a multiple of it), which can be read while running your main code.
Alternatively, you could set/clear a GPIO upon entering/leaving the ISR and measure its timing with a scope or LA.
You could obviously also narrow this down around particular parts of the code.
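On a Cortex-M3/M4 the DWT cycle counter (DWT->CYCCNT in CMSIS, enabled via CoreDebug->DEMCR and DWT->CTRL) is a convenient "timer" for exactly this. A rough harness, with a plain variable standing in for the counter so the sketch is self-contained off-target:

```c
#include <stdint.h>

/* Stand-in for DWT->CYCCNT; on the target, read the real register after
 * enabling it once at startup:
 *   CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
 *   DWT->CTRL        |= DWT_CTRL_CYCCNTENA_Msk;  */
static uint32_t fake_cyccnt;

static uint32_t read_cycles(void)
{
    return fake_cyccnt; /* on hardware: return DWT->CYCCNT; */
}

static uint32_t start_cycles;

static void measure_start(void) { start_cycles = read_cycles(); }

/* Unsigned subtraction stays correct even if the 32-bit counter wraps
 * once between start and stop. */
static uint32_t measure_stop(void) { return read_cycles() - start_cycles; }
```

Bracket the ISR body (or any narrower region) with `measure_start()`/`measure_stop()` and stash the result in a global you can inspect from the main loop or a debugger.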