[ARM] optimization and inline functions

Simon:
I'm playing. So I wrote a test project for a SAMC. I am toggling a port pin in a loop - the hello world project.

So on a 4MHz clock with a static inline function:
With no optimization I get a symmetrical 50% duty output at 45kHz
With optimization 1 I get a 666kHz 50% duty output
With optimization 2 I get 1MHz, but now it's a 25% duty
With optimization 3 I get the same

With a non-inlined function:
With no optimization I get a symmetrical 50% duty output at 45kHz
With optimization 1 I get a 111kHz 50% duty output
With optimization 2 I get a 111kHz 50% duty output
With optimization 3 I get a 111kHz 50% duty output

So actually forcing the inlining makes the biggest gain. But surely if the compiler is optimizing hard, it is looking at code size, execution time and the cost of calling the function, and it would inline automatically?
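One possible explanation, offered as an assumption since the project source is not shown: if the non-inlined function is defined in a different source file from the loop, the compiler cannot inline it at any optimization level unless link-time optimization is enabled. A minimal sketch of a helper that GCC and Clang will inline at every call site; the names pin_toggle and toggle_reg are hypothetical and not taken from the project:

```c
#include <stdint.h>

/* Hypothetical helper: writes 'mask' to a GPIO toggle register.
   'static inline' keeps the definition visible wherever it is called;
   the GCC/Clang always_inline attribute requests inlining even when
   optimization is turned off. */
__attribute__((always_inline))
static inline void pin_toggle(volatile uint32_t *toggle_reg, uint32_t mask)
{
    *toggle_reg = mask;
}
```

Placing a definition like this in a header means every file that includes it sees the function body, so the call overhead can disappear without relying on the optimizer's own inlining heuristics.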

I'm a little confused about the 25% duty. The instructions to go low and high are identical. It's as if the instructions are prefetched in twos into some sort of cache? It's only an M0+.

eutectique:
Would you share the generated assembly code?

Nominal Animal:

--- Quote from: Simon on January 18, 2022, 09:36:27 am ---I'm a little confused about the 25% duty, the instructions to go low and high are identical
--- End quote ---
Including the loop jump?

If you use something like
    while (keep_generating) {
        GPIOn_PSOR = 1 << GPIOn_BIT;  /* Set output high */
        GPIOn_PCOR = 1 << GPIOn_BIT;  /* Set output low */
    }
then the pin stays high for the duration of the single instruction, but low for the duration of the loop jump (at least one clock cycle on Cortex-M0+) and whatever the keep_generating test takes, if it is not always true.

If your code is
    while (pulses-->0) {
        GPIOn_PSOR = 1 << GPIOn_BIT;  /* Set output high */
        GPIOn_PCOR = 1 << GPIOn_BIT;  /* Set output low */
    }
then in the optimized Thumb code for Cortex-M0+, the output would be high for one clock cycle, but low for three (one for the assignment/store, one for decrementing the pulse count in a register variable, and one for the branch), giving a 25% duty cycle.
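The same cycle counts also predict the output frequency. A minimal sketch of the arithmetic, taking the one-cycle-high, three-cycles-low split above at face value and assuming the 4MHz clock from the first post; the function names are made up for illustration:

```c
#include <stdint.h>

/* Duty cycle in percent, given how many core clock cycles the pin
   spends high and low during one pass through the loop. */
uint32_t duty_percent(uint32_t high_cycles, uint32_t low_cycles)
{
    return (100u * high_cycles) / (high_cycles + low_cycles);
}

/* Output frequency in Hz: one full output period per loop pass. */
uint32_t output_hz(uint32_t clock_hz, uint32_t high_cycles, uint32_t low_cycles)
{
    return clock_hz / (high_cycles + low_cycles);
}
```

With these numbers, duty_percent(1, 3) is 25 and output_hz(4000000, 1, 3) is 1000000, which is consistent with the 1MHz, 25% duty output reported at optimization levels 2 and 3.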

The tightest GPIO toggle loop with 50% duty without unrolling uses the GPIO TOGGLE register (GPIOn_PTOR in NXP KL26 family Cortex-M0+ controllers), using
    while (keep_generating) {
        GPIOn_PTOR = 1 << GPIOn_BIT;
    }
which, if keep_generating always evaluates to true and GPIOn_BIT is 0, compiles to the following Thumb assembly on Cortex-M0+:
        ldr     rA, .address
        movs    rB, 1
    .loop:
        str     rB, [rA]
        b       .loop
    .address:
        .long   2684354628
where rA and rB are some general registers, often r0 and r1.

The TOGGLE register is useful in that the bit mask on the right side can toggle any set of bits in the same GPIO port, and if they are initialized to opposite output states, they will also toggle in opposite states.  For more complex sequences, you can use an array of toggle masks, say
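The opposite-state behaviour can be checked on a PC with a tiny model of the register. This is only a sketch, assuming the usual toggle-register semantics where each 1 bit written flips the matching output; toggle_write is a hypothetical name, not a vendor API:

```c
#include <stdint.h>

/* Host-side model of a GPIO toggle register: every 1 bit in 'mask'
   flips the corresponding bit of the current output state. */
uint32_t toggle_write(uint32_t output_state, uint32_t mask)
{
    return output_state ^ mask;
}
```

Starting from output state 0x1 (pin 0 high, pin 1 low), toggle_write(0x1, 0x3) gives 0x2 and a second write of the same mask gives 0x1 again, so the two pins stay in opposite states.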

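--- Code: ---#include <stdint.h>

/* Write each of the 'masks' toggle masks to the toggle register 'reg' in order,
   and repeat the whole sequence 'pulses' times. */
void toggle32(volatile uint32_t *reg, const uint32_t mask[], uint32_t masks, uint32_t pulses)
{
    const uint32_t *const ends = mask + masks;  /* One past the last mask */

    while (pulses-->0) {
        const uint32_t *curr = mask;

        while (curr < ends) {
            *reg = *(curr++);  /* Each write flips the pins set in the mask */
        }
    }
}

--- End code ---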
where masks is usually even, and (mask[0] ^ mask[1] ^ ... ^ mask[masks-2] ^ mask[masks-1]) == 0, i.e. the full toggle sequence returns the GPIO outputs to their original states.  That sequence is then repeated pulses times, but note that there is a "delay" (due to the outer loop and inner loop setup) after the last toggle.  If you have the memory, and especially if you can limit to a single byte within the toggle register (at least NXP KL26 allows 8-bit, 16-bit and 32-bit accesses to the register), you can do pretty complicated digital waveform patterns on the output at a pretty darned high rate.
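The XOR condition on the mask array is easy to verify before handing the array to the toggle loop. A minimal sketch; sequence_is_closed is a hypothetical helper name, not part of any library:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if writing all 'masks' toggle masks in order leaves every
   output in its original state, i.e. the XOR of all masks is zero;
   returns 0 otherwise. */
int sequence_is_closed(const uint32_t mask[], size_t masks)
{
    uint32_t combined = 0;

    for (size_t i = 0; i < masks; i++) {
        combined ^= mask[i];
    }
    return combined == 0;
}
```

For example, the three masks 1, 2, 3 pass the check because 1 ^ 2 ^ 3 is 0, while the pair 1, 2 does not and would leave pins 0 and 1 inverted after each repetition.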

Simon:
I am using toggle, so it's the same instructions all round. I guess the initial state is 0, so it looks like it is faster to flip low to high than it is to flip high to low.

MK14:
The peripheral clock might be running slower than the main CPU clock, so aliasing effects can make it do strange things. I'm not sure of the MCU part number, so I can't check the datasheet for a possible divider ratio.
