I suppose an increment of 1 might be slightly faster on some platforms. The difference is that there's usually a dedicated INC instruction, whereas other step sizes need LDI reg, INCREMENT / ADD accum, reg, or an ADD imm; more, or longer, instructions either way.
I would not worry about it, at all. There are far more important things to think about. Loop optimization is a late, late stage of development.
The loop check may also vary. An increasing loop needs CMP accum, LIMIT, which may be longer than the TST accum of a loop decreasing to zero (or no test at all, when the ADD/SUB instruction itself sets the zero flag). Mind that the counter only lands exactly on zero when the initial value is a multiple of the decrement.
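As a sketch of the two forms in C (function names are mine), where a compiler targeting a small CPU can often turn the second into a bare decrement-and-branch-if-not-zero:

```c
#include <stdint.h>

/* Up-counting: each pass needs a compare of i against size. */
void fill_up(uint8_t *buf, uint8_t size, uint8_t val)
{
    for (uint8_t i = 0; i < size; i++)
        buf[i] = val;
}

/* Down-counting to zero: on many CPUs the decrement itself sets
   the zero flag, so no separate compare is emitted. */
void fill_down(uint8_t *buf, uint8_t size, uint8_t val)
{
    for (uint8_t i = size; i != 0; i--)
        buf[i - 1] = val;
}
```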
Loops can be on pointers as well. Instead of
for (i = 0; i < size; i++) {
q[i] = p[i];
}
you might have something like,
char* qq = q;
char* pp = p;
char* ppend = pp + size;
for (; pp < ppend; ) {
*qq++ = *pp++;
}
(char* rather than void*, since you can't do arithmetic on, or dereference, a void pointer in standard C.)
In certain (simple) cases, the compiler may make this transformation for you -- check the output listings. This tends to be more compact, but isn't necessarily faster; for example, on 8-bit platforms the 16-bit pointer compare requires at least two instructions (compare, compare with carry), usually four (load immediate for each byte of the comparison).
And in such cases, you could go even further and think about page-aligned accesses, for example copying a buffer aligned to 256-byte start and length, so that the low-byte compare can be trivial (0x00). The compiler won't know about this; this only applies once you're deep in the assembler, desperate for CPU cycles, pushing around object allocations for lucky breaks like this.
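A C-level cousin of that trick (function name is mine): when the buffer is exactly 256 bytes, an 8-bit index that wraps back to 0 gives the loop test for free, with no limit constant to load:

```c
#include <stdint.h>

/* Copy exactly 256 bytes. The 8-bit index wraps 255 -> 0 after the
   last byte, so the increment's own wrap-around is the exit test. */
void copy256(uint8_t *dst, const uint8_t *src)
{
    uint8_t i = 0;
    do {
        dst[i] = src[i];
        i++;
    } while (i != 0);
}
```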
Tim
I suppose an increment of 1 might be slightly faster on some platforms. Difference being there's usually an INC instruction, whereas for others it's LDI reg, INCREMENT / ADD accum, reg, or ADD imm, more or longer instructions in either case.
A number of machines allow fast incrementing by 1, 2, 4 and perhaps 8, as these are the increments needed for incrementing the addresses of the various sizes of variable supported by the machine.
Ah, neat. I haven't seen any like that (that I know of). Though I do know of an analogous feature. On 80386+, the extended addressing modes allow a base register, an immediate displacement, and an index register scaled by 1, 2, 4 or 8 (and effective multiplies by 3, 5 or 9 when the same register serves as both base and index), all done in separate address-generation hardware at nearly no expense. This is one case where the loop,
char *p;
for (i = 0; i < MAX; i++) {
p[i * STEP]++;
}
is one of the better ways to do it (assuming the compiler follows it literally, and doesn't optimize it into another form like the one mentioned above). Mind, this multiplication is exactly what's implicit in non-void pointer arithmetic: "*(p+1)", aka "p[1]", is the next element in the array, of whatever size p's pointed-to type is; it's not the raw address plus one! I use char above to force an element size of 1, so the spacing of the array accesses is obvious. I could've written it with any sized object, but it may not be as apparent unless one is aware of this. Usual caveats: mind alignment and padding, depending on platform, if you need to save memory or optimize for accesses like these.
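To make the implicit scaling concrete (function name is mine):

```c
#include <stdint.h>
#include <stddef.h>

/* p + 1 points at the next uint32_t, i.e. sizeof(uint32_t) bytes
   further on, not 1 byte: the "multiply by element size" is built
   into the pointer arithmetic itself. */
ptrdiff_t element_step_in_bytes(const uint32_t *p)
{
    return (const char *)(p + 1) - (const char *)p;
}
```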
Tim
As to optimizing increments of larger-than-native integers: as I mentioned, a decent compiler will do it on its own, as long as the increment is known statically (i.e., it's a constant).
As an example with GCC and RV32:
#include <stdint.h>
uint64_t Inc1(uint64_t n)
{
return n + 1;
}
uint64_t Inc2(uint64_t n)
{
return n + 0x100000000; // 2^32
}
yields: (parameter in {a1, a0}; return value in {a1, a0})
Inc1:
mv a5,a0
addi a0,a0,1
sltu a5,a0,a5
add a1,a5,a1
ret
Inc2:
addi a1,a1,1
ret
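For reference, the carry propagation Inc1 performs (the sltu computes the carry out of the low word) looks like this in portable C; the function name is mine:

```c
#include <stdint.h>

/* Increment a 64-bit value using only 32-bit halves, propagating
   the carry by hand, as the RV32 listing above does. */
uint64_t inc1_by_halves(uint64_t n)
{
    uint32_t lo = (uint32_t)n;
    uint32_t hi = (uint32_t)(n >> 32);
    uint32_t lo2 = lo + 1;
    hi += (lo2 < lo);   /* carry out of the low word, like sltu */
    return ((uint64_t)hi << 32) | lo2;
}
```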