Can I increment an integer faster if 'step' is a power of 2?

741:
This kind of thing - of course the compiler is free to lose any 'help' I am trying to give it with these powers of two, but anyway...

#define INCREMENT 64
uint16_t u;

for(u = 0; u < LIMIT; u += INCREMENT)
{
//Do whatever
}

tggzzz:
With questions like this, the answer will depend on what's inside the loop, the target processor, the compiler, the compiler version, the compilation flags - and what is/isn't in any cache.

The contents of the loop would have to be small for it to make a significant difference - if any. Don't microoptimise!

If you want to play around, insert your code into https://godbolt.org/

T3sl4co1l:
I suppose an increment of 1 might be slightly faster on some platforms.  The difference is that there's usually an INC instruction for a step of 1, whereas other steps need LDI reg, INCREMENT / ADD accum, reg, or ADD imm: more or longer instructions in either case.

I would not worry about it, at all.  There are far more important things to think about.  Loop optimization is a late, late stage of development.

The loop check may also vary.  For example, an increasing loop uses CMP accum, LIMIT, which may be longer than the TST accum (or no test at all, when the ADD/SUB instruction sets the zero flag) used when decreasing to zero.  Mind that a count-down only lands exactly on zero when the initial value is a multiple of the increment.
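As a rough sketch of the two loop shapes described above (the function names are made up for illustration, and whether the count-down form actually compiles to fewer instructions depends on the target and compiler, so check the listing):

```c
#include <stdint.h>

/* Up-counting loop: the counter is compared against 'limit' on every pass.
 * Assumes step != 0 and that 'limit' is small enough that u never wraps
 * past 65535 before reaching it. */
uint16_t count_up(uint16_t limit, uint16_t step)
{
    uint16_t n = 0;
    for (uint16_t u = 0; u < limit; u += step) {
        n++;                      /* stand-in for the loop body */
    }
    return n;
}

/* Down-counting loop: the exit test is "counter == 0".
 * Only terminates as intended when 'start' is a multiple of 'step';
 * otherwise the unsigned counter skips over zero and wraps around. */
uint16_t count_down(uint16_t start, uint16_t step)
{
    uint16_t n = 0;
    for (uint16_t u = start; u != 0; u -= step) {
        n++;                      /* stand-in for the loop body */
    }
    return n;
}
```

With a limit of 640 and a step of 64, both forms run the body ten times; the difference, if any, is only in the instructions the compiler emits for the loop test.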

Loops can be on pointers as well.  Instead of

--- Code: ---for (i = 0; i < size; i++) {
q[i] = p[i];
}

--- End code ---

you might have something like,

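--- Code: ---/* Assumes p and q point to byte buffers; arithmetic on void* is not standard C. */
unsigned char* qq = q;
unsigned char* pp = p;
unsigned char* ppend = pp + size;
for (; pp < ppend; ) {
*qq++ = *pp++;
}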

--- End code ---

In certain (simple) cases, the compiler may do this transformation for you -- check the output listings.  The pointer form tends to be more compact, but is not necessarily faster.  For example, on 8-bit platforms the 16-bit pointer compare requires at least two instructions (compare, compare with carry), usually four (load immediate for each byte of the comparison).

And in such cases, you could go even further and think about page-aligned accesses, for example copying a buffer aligned to 256-byte start and length, so that the low-byte compare can be trivial (0x00).  The compiler won't know about this; this only applies once you're deep in the assembler, desperate for CPU cycles, pushing around object allocations for lucky breaks like this.
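One way to get a similar effect from C, without going to assembler, is to copy exactly 256 bytes with an 8-bit index so that the end-of-loop test is a compare against zero. This is only a sketch of the idea, not the hand-tuned version described above: copy_page is a hypothetical name, and it assumes the buffers are at least 256 bytes long.

```c
#include <stdint.h>

/* Copies exactly 256 bytes from src to dst.
 * The 8-bit index wraps from 255 back to 0, so the loop ends on a
 * simple "index == 0" test instead of a 16-bit pointer compare. */
void copy_page(uint8_t *dst, const uint8_t *src)
{
    uint8_t i = 0;
    do {
        dst[i] = src[i];
        i++;                      /* 255 + 1 wraps to 0 for uint8_t */
    } while (i != 0);
}
```

Whether this beats the pointer loop depends on the addressing modes the target offers, so it is worth comparing the generated listings.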

Tim

gbaddeley:
Integer Increment is basically the same low level logic complexity as add, whether it’s a power of 2 or other integer. Some CPUs have a shorter instruction size for increment than for add (1), which gives a small program size saving.

Were you hoping to make the C for loop run with lower overhead?

741:

--- Quote from: gbaddeley on July 12, 2021, 12:34:16 pm ---Integer Increment is basically the same low level logic complexity as add

--- End quote ---

Thanks, that is what I suspected. This is on a PIC, and it is ages since I used assembly. The loop speed matters, since I want low overhead to be sure that N fixed delays add up to what I intended  (N * fixed delay). I have to take the ADC sample at the correct time relative to initiating the readings.
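To put numbers on that concern, here is a small sketch of the arithmetic, assuming the per-pass loop overhead is a fixed number of instruction cycles read off the compiler listing. The function names and the example figures are hypothetical, not taken from the PIC code.

```c
#include <stdint.h>

/* Total elapsed cycles for n passes, where each pass is one fixed
 * delay plus a constant loop overhead (increment, compare, branch). */
uint32_t total_cycles(uint32_t n, uint32_t delay_cycles, uint32_t overhead_cycles)
{
    return n * (delay_cycles + overhead_cycles);
}

/* Delay to request per pass so that delay + overhead equals the
 * intended period. Returns 0 if the overhead already exceeds it. */
uint32_t compensated_delay(uint32_t target_cycles, uint32_t overhead_cycles)
{
    return (target_cycles > overhead_cycles) ? target_cycles - overhead_cycles : 0;
}
```

For example, 10 passes of a 200-cycle delay with 5 cycles of overhead take 2050 cycles rather than 2000; shortening each delay to 195 cycles brings the total back to 2000.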

TM4 screen dumps show Microchip's RE46C200 smoke detector in a test mode. Yellow is integrator, blue is DAC.

Integrator slope is proportional to illumination. First the chip takes DARK readings, then LIT readings.

Once the fixed-time (here 200us) integration completes, the DAC ramps up at a fixed rate until its change in value matches the preceding integration (in its day job it is part of an on-chip ADC).

---> The time taken by the DAC slope varies with illumination.

Note 1: DAC limit is 5.3V.
Note 2:  I can't easily use the datum of the small dip in IR LED voltage (start of LIT integration, IR LED is on) with the hardware I have in place. I'd like to use timings from initial start-test only.