I think my earlier point was still not fully understood.
Finite loops and infinite loops have nothing to do with each other.
An infinite loop (such as 'while (1)' with no exit path inside its body, or equivalently 'for (;;)') can't be optimized out: it blocks the execution flow, so removing it would change the program's behavior. The C standard is explicit about this - a compiler may assume termination only for loops whose controlling expression is NOT a constant expression (C11 6.8.5p6) - so eliminating 'while (1)' would be a severe compiler bug.
As long as a loop has an exit path, it is NOT an infinite loop. But judging from experience and from this thread, "finite" and "infinite" are commonly misused here: many people seem to call a loop with a simple condition based on a counter with a fixed range a "finite" loop, and anything else an "infinite" loop. That isn't correct IMO, and it leads to confusion ("unfortunate shortcuts") regarding optimizations.
'while (1) {}' is an infinite loop: it has no exit condition whatsoever and can never be optimized out, as it blocks the execution flow. What happens inside the loop body doesn't matter - execution can't get out.
'for (int i = 0; i < 100; i++) {}' can definitely be optimized out: its exit condition can be determined statically and it has no effect.
What confuses some developers is that execution time itself is NEVER an observable effect in the C language specification, nor in many (if not most) other languages above the assembly level. (Well, some assemblers can even optimize instructions, so...) This often sounds confusing to embedded developers in particular, since timing is key in embedded development. But C and other similar or higher-level languages have no notion of timing whatsoever.
The only "right" way of writing a delay loop in C is either to use a volatile-qualified counter, such as 'for (volatile int i = 0; i < 100; i++) {}', or to do something in the loop body that counts as an effect - which may not be trivial. When coding for MCUs, a relatively common approach is some kind of 'nop' (usually defined as inline assembly with a 'nop' instruction for the given target, qualified volatile so that it can't itself be optimized out - something like 'asm volatile ("nop")'). That would look like: 'for (int i = 0; i < 100; i++) { nop(); }'. Either way it will still be a 'hack' when it comes to obtaining a particular delay, but it will spend some execution time for sure. For the record, the volatile-counter version is usually the more expensive one: the counter is most often placed on the stack and read and written at each iteration, which is typically costlier than a 'nop' instruction, whereas without the volatile qualifier the counter will usually live in a register.
As for busy loops reading some MCU "register" (not to be confused with the CPU's own registers), yes: as long as said register is declared volatile, the loop will never be optimized out either. That's guaranteed to work, and it's the reason register definitions are all marked volatile in well-written C: every access is guaranteed to be honored by the compiler.
Note that with some compilers, dereferencing a pointer cast from an integer value (which can usually be taken as a direct "address" on many targets) acts as though it were volatile-qualified even when it isn't. That may be why some developers have observed that "register definitions" work without volatile, and concluded that volatile can be omitted. I wouldn't recommend that: there is no such guarantee in any standard revision I've seen, so you'd be relying on the particular behavior of some compiler. What I mean is, for instance:
'*(uint32_t *) 0x10000' would never be optimized out, and would thus be equivalent to '*(volatile uint32_t *) 0x10000'. But that is a particular case with a particular compiler and should not be taken as a rule.
The typical case where a loop is very likely to be optimized out is the following:
uint32_t n;
void Foo(void)
{
while (n > 0) {}
}
The reason is that the compiler can assume here that n never changes while Foo() is executing.
Adding the volatile qualifier to the declaration of n prevents this optimization: the loop will then compile to code that reads and compares 'n' at each iteration.
I think that does sum it up reasonably (let me know if I missed some case) and there is no black magic involved.