
While {you're down there ...}


westfw:
Empty loops make me a bit nervous, especially from a readability point of view.
(See also: https://stackoverflow.com/questions/3592557/optimizing-away-a-while1-in-c0x, which I think is not directly applicable, but adds to the nervousness.)


--- Code: ---while (!button);
--- End code ---
is particularly bad.  "Is that semicolon really supposed to be there?"
Comments are good, like the "// Fall through" comment in case statements.

--- Code: ---while (!button)  ; /* spin, waiting */
while (!button)
    ; // spin
while (!button) {
   // wait
}

--- End code ---
are all better.
And I'll put in a new suggestion

--- Code: ---while (1) {
   if (button)
      break;
}
--- End code ---
which I like for indeterminate and long loops.  (if you're checking something like a UART status flag, it's a loop that you expect to terminate "soon."  Whereas with a button, it could be a very long time indeed.  The less likely the event is to occur, the more I like to emphasize the "infinite-ness" of the loop.  Or something like that.  YMMV, local styles override, etc...)

rstofer:

--- Quote from: SiliconWizard on September 26, 2019, 08:00:37 pm ---
On modern MCUs with caches and branch prediction and whatnot, those delay loops are virtually useless, even when you make sure they don't get optimized out (which is still a common trap for young players). Just adapting them to the clock freq won't work to get any sensible idea of the real delay.

On ARM Cortex ones, you can use special counter registers for that instead. I think PIC32 also have CPU counters which you can use for that as well.

--- End quote ---

This is a HUGE trap when optimization is turned on.  The program ran great, everything is ready to ship, let's optimize the code and go out for Pizza.

In theory, and I wouldn't want to guarantee it, if the variable in the while statement is declared volatile, the loop shouldn't be optimized away.  It also helps to make it a global variable, reachable from the outside world.  The idea is that an interrupt routine, in another module, could change the value, so the compiler can't assume it never changes and therefore leaves the code alone.

Spin loops are problematic!

westfw:
Here's an improved delay mechanism for Cortex-M4, using the 32-bit cycle counter that is part of the "Data Watchpoint and Trace" (DWT) unit present in (most?) CM4 chips...

https://github.com/adafruit/ArduinoCore-samd/pull/77/commits/faf28a90ca4c97229736bd5fbbbbba5b1bbcc808

(replaces the instruction-counting loop in the Adafruit SAMD51 Arduino core that (surprise!) didn't work right with the cache enabled.)


All ARM Cortex chips have SysTick, which can be used similarly (but it's only 24 bits, is usually set to reload periodically, and counts down, which makes it more awkward to use: https://github.com/WestfW/Duino-hacks/blob/master/systick_delay/systick_delay.ino )

SiliconWizard:
Yup, I actually gave some code for STM32 (HAL, but the HAL part only to get the clock frequency) here:
https://www.eevblog.com/forum/microcontrollers/ad7685-reading-data-problem/msg2687904/#msg2687904

Siwastaja:

--- Quote from: SiliconWizard on September 26, 2019, 08:00:37 pm ---On modern MCUs with caches and branch prediction and whatnot, those delay loops are virtually useless...

--- End quote ---

I don't understand this. I have done quite a bit of work with modern MCUs with caches; they mostly come with a core-coupled instruction scratchpad, and I simply put all even remotely timing-critical code there. It has the same performance as having everything 100% in cache with no misses, and is predictable, including the first iteration.

Branch prediction in a simple delay loop should be predictable as well.

IMO, caches are there to increase the average performance of large routines when you run out of small core-coupled RAM and have to run "directly" out of FLASH, or, worse, out of external SD card or similar. But this doesn't matter much for small timing critical routines (which the delay loops obviously are) - just run them out of instruction RAM.
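For what that placement looks like in practice, here is a toolchain-dependent sketch (GCC, assuming your linker script defines an `.itcm` output section mapped to instruction TCM; the section name is an assumption and varies between vendors' scripts):

```c
/* Place this function's code in core-coupled instruction RAM instead of
   flash: no flash wait states, no cache misses, and the timing is the
   same on the first iteration as on every later one. The startup code
   must copy the section from flash to TCM before it is called. */
__attribute__((section(".itcm"), noinline))
void timing_critical_routine(void)
{
    /* delay loops, bit-banged I/O, etc. go here */
}
```

Some vendor headers wrap this in a convenience macro; check your device's linker script and startup file for the exact section name and the copy loop.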

I have never turned caches on in an MCU project; turns out, I can always fit all timing-critical processing in tightly coupled RAM, and the rest can be "slow" from flash.

But even if you "have" to run it from cache - it's going to be a small offset at the start, depending on whether it produces a miss or not in the first cycle. Assuming you still run from the internal flash, the difference is a flash cycle or two, or maybe about 20ns.

And I'm using delay loops extensively. Of course they aren't good for precision timing in presence of interrupts, but neither are timer-based busy loops, or interrupt handlers. Doing it accurately in a system with existing interrupts requires a big picture understanding, i.e., setting your interrupt priorities right while making sure nothing breaks in edge cases.

Using a simple delay loop to implement something at least won't disturb the existing interrupts, and it's very obvious that the delays will run longer than specified depending on interrupt load. This is much easier than adding a new interrupt handler, configuring the pre-emptive priorities correctly, and still getting jitter in the less important task.
