Author Topic: Why is my microsecond delay function suddenly malfunctioning on my STM32?  (Read 507 times)


Online TC2

  • Contributor
  • Posts: 9
  • Country: us
Hi all! Hope everything is going well. I'm honestly at a loss here, hoping you guys can help. I'm returning to a project I was working on a few months ago and wanted to get some video of it. For example, I have an ultrasonic sensor that I wanted to show working in my video. It wasn't working, and I had no idea why, so I did some debugging. Keep in mind this used to work fine for the past year; I don't get what's wrong.

I narrowed it down to what seems to be my microsecond delay function. It produces a delay that I didn't specify. I tried reconfiguring the clocks, but that didn't help, so I opened an older test project and it's still acting up.

Some info:
My board is an STM32F746ZG
I'm using TIM3 for the delay function
My code for the delay function:
Code: [Select]
void delay_us(uint16_t us)
{
    __HAL_TIM_SET_COUNTER(&htim3, 0);             // Reset the counter to 0
    while (__HAL_TIM_GET_COUNTER(&htim3) < us);   // Busy-wait until 'us' ticks have elapsed
}

According to the datasheet, TIM3 is connected to APB1. Here is what it's set to:

TIM3 itself has a prescaler of 4.5-1, because 4.5 MHz / 4.5 = 1 MHz, and 1 / 1 MHz = 1 µs per tick. The period is 0xFFFF-1.

Below is some example test code
Code: [Select]
HAL_GPIO_TogglePin(GPIOD, GPIO_PIN_11);
delay_us(10);

Here is the result on the oscilloscope, using cursors for measurements (sorry, they're not in this photo). Instead of 10 µs, I get around 14.7 µs. Or if I do something like 100 µs, I get 94 µs instead.


Am I missing something? I don't know if I've forgotten something after taking a break, or if it's something else. Thanks
 

Online abyrvalg

  • Frequent Contributor
  • **
  • Posts: 576
  • Country: ru
Re: Why is my microsecond delay function suddenly malfunctioning on my STM32?
« Reply #1 on: September 16, 2021, 11:27:10 pm »
The TIMx prescaler is an integer, so your 4.5 gets truncated to 4, resulting in an 88 µs delay for 100 counts and 8.8 µs for 10. The additional ~6 µs in both cases must be coming from bloated HAL inefficiency and slow APB1 register accesses at such a low bus frequency.
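For reference, a rough back-of-the-envelope check of that truncation (a host-side sketch, assuming TIM3 really is fed 4.5 MHz from APB1 and the PSC field was written as 4.5 - 1):

Code: [Select]
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    double   tim_clk_hz = 4.5e6;                  /* assumed TIM3 kernel clock       */
    uint16_t psc        = (uint16_t)(4.5 - 1.0);  /* 3.5 truncated to 3 by the cast  */
    double   tick_hz    = tim_clk_hz / (psc + 1); /* 4.5 MHz / 4 = 1.125 MHz         */

    printf("10 counts  -> %.1f us\n",  10.0 * 1e6 / tick_hz);  /* ~8.9 us  */
    printf("100 counts -> %.1f us\n", 100.0 * 1e6 / tick_hz);  /* ~88.9 us */
    return 0;
}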
« Last Edit: September 16, 2021, 11:31:08 pm by abyrvalg »
 

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 2828
  • Country: ca
They are macro defines, not bloated:

Code: [Select]
__HAL_TIM_SET_COUNTER(__HANDLE__, __COUNTER__)   ((__HANDLE__)->Instance->CNT = (__COUNTER__))
  Sets the TIM Counter Register value on runtime.

Code: [Select]
__HAL_TIM_GET_COUNTER(__HANDLE__)   ((__HANDLE__)->Instance->CNT)
  Gets the TIM Counter Register value on runtime.
 
The following users thanked this post: newbrain

Offline gnif

  • Administrator
  • *****
  • Posts: 1278
  • Country: au
Quote
They are macro defines, not bloated:

Code: [Select]
__HAL_TIM_SET_COUNTER(__HANDLE__, __COUNTER__)   ((__HANDLE__)->Instance->CNT = (__COUNTER__))
  Sets the TIM Counter Register value on runtime.

Code: [Select]
__HAL_TIM_GET_COUNTER(__HANDLE__)   ((__HANDLE__)->Instance->CNT)
  Gets the TIM Counter Register value on runtime.

`HAL_GPIO_TogglePin` is also in use... that said, it's not a complex function, so it shouldn't add much latency.

Since you're running at a fixed frequency, wouldn't it be better to simply use a timed nop loop for such short delays? This is what I have done in the past:

Code: [Select]
#define _NOP_LOOP(x) \
  __asm__( \
      "mov r0,#" #x "\n" \
      "1:\n" \
      "sub r0,#1\n" \
      "cmp r0,#0\n" \
      "bge 1b\n" \
      ::: "r0" \
  )

#define NOP() __asm__("nop")
#define NOP_LOOP(x)  _NOP_LOOP(((x) / 6) - 1)
#define DELAY_NS(ns) NOP_LOOP(((ns) * 1000) / (1000000000 / 72000000))

Note: I am not a professional firmware developer, the above may be terrible :)
« Last Edit: Yesterday at 01:53:35 am by gnif »
HostFission - Full Server Monitoring and Management Solutions.
https://hostfission.com/
https://twitter.com/HostFission
https://twitter.com/Geoffrey_McRae
 

Offline lucazader

  • Regular Contributor
  • *
  • Posts: 196
  • Country: nz
There is another topic around here that looked at using the CYCCNT register (which is present in your F7) as a free-running counter and doing the timing with that.

I generally don't go that route and do something similar to you.
However, instead of setting up the timer like you do, I start a free-running timer with a 1 µs tick rate that just counts up forever (I usually use TIM6).
Functionally it does much the same thing, with a while loop that waits until the timer count is above the time I want.

As others have said, I would suggest clocking the timer peripheral at an integer frequency (say 4 MHz, 8 MHz or something like that) and then setting the prescaler appropriately. This works really well in all my projects.
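A minimal sketch of that free-running variant (assuming a CubeMX-generated TIM6 handle named htim6, configured for a 1 µs tick with a 0xFFFF period and started with HAL_TIM_Base_Start()):

Code: [Select]
/* Busy-wait for 'us' microseconds against a free-running 1 MHz timer.
 * The unsigned 16-bit subtraction makes the comparison wrap-safe. */
static void delay_us(uint16_t us)
{
    uint16_t start = (uint16_t)__HAL_TIM_GET_COUNTER(&htim6);
    while ((uint16_t)((uint16_t)__HAL_TIM_GET_COUNTER(&htim6) - start) < us)
        ;
}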

Another quick way to sanity-check that the timer is giving you the correct time is to compare it against the HAL tick from HAL_GetTick().
If you delay for, say, 16000 µs, then use a debugger or printf to read the HAL tick before and after your custom delay, and check that the difference makes sense.
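Something like this, for example (a sketch; assumes the default 1 ms SysTick is running and printf is retargeted somewhere useful):

Code: [Select]
uint32_t t0 = HAL_GetTick();   /* HAL tick in ms */
delay_us(16000);               /* should take ~16 ms */
uint32_t t1 = HAL_GetTick();
printf("delay_us(16000) took ~%lu ms\n", (unsigned long)(t1 - t0));  /* expect ~16 */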
 

Online TC2

  • Contributor
  • Posts: 9
  • Country: us
Quote
The TIMx prescaler is an integer, so your 4.5 gets truncated to 4,
Sorry, I should have been more specific; this is just one of many examples I tried. I've used 36 MHz, for example, with a prescaler of 36. Still the same issue.

Quote
Another quick way to sanity-check that the timer is giving you the correct time is to compare it against the HAL tick from HAL_GetTick().
If you delay for, say, 16000 µs, then use a debugger or printf to read the HAL tick before and after your custom delay, and check that the difference makes sense.
I did some sanity checks like you said. If I do 16000 µs, for example, the number reported from the debugger is 16 ms; 20000 µs is 20 ms, etc.  :-\
 

Offline bson

  • Supporter
  • ****
  • Posts: 1959
  • Country: us
Are you optimizing the code with inlining enabled?  If not, you might have significant function call overhead.  At what speed is the core running?

Regardless, consider moving the delay_us() function to a header file and defining it there as:
Code: [Select]
static inline void __attribute__((always_inline, optimize("O2"))) delay_us(uint16_t us)
{
    __HAL_TIM_SET_COUNTER(&htim3, 0);   // Set counter to 0
    while (__HAL_TIM_GET_COUNTER(&htim3) < us)
        ;
}
It should only produce a few inline instructions. The drawback of getting more accurate timing in debug builds is that you won't be able to step through it with the debugger, although that's not particularly useful here.
« Last Edit: Yesterday at 10:57:02 pm by bson »
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 3485
  • Country: us
Quote
wouldn't it be better to simply use a timed nop loop for such short delays?
It gets very tricky on the newer, faster ARM chips with caches, "flash accelerators", shared buses and such.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 7168
  • Country: fr
Using the CYCCNT register on ARM is really the easiest way of doing this. It has the benefit of being accessed at the core clock's frequency, so there is no extra synchronization latency compared to accessing a peripheral register over an APB bus.

All you need to do is enable it (as was already shown), which is even easier than setting up a peripheral timer.

In any case, getting µs accuracy with waiting loops obviously implies running the core at a frequency significantly higher than 1 MHz, to make up for the waiting-loop code overhead, even if it's just a couple of instructions.

It may still be hard to get accurate, very short delays, depending on cache behaviour, branch prediction and such, so you should also consider the underlying architecture.

Thus using a timer or cycle counter will get you a delay of the form a·n + b (with a, the resolution, being much more accurate than with any ugly nop-based hack), but there will always be an offset b (software overhead), which becomes significant for very small a and n.
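For example, something along these lines (a sketch for a Cortex-M7 part like the F746; assumes the CMSIS device header is available and that nothing else is using the DWT):

Code: [Select]
#include "stm32f7xx.h"   /* CMSIS device header: DWT, CoreDebug, SystemCoreClock */

static void cyccnt_init(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  /* enable the trace/DWT block   */
    DWT->LAR = 0xC5ACCE55;                           /* unlock the DWT on Cortex-M7  */
    DWT->CYCCNT = 0;                                 /* reset the cycle counter      */
    DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;            /* start counting core cycles   */
}

static void delay_us(uint32_t us)
{
    uint32_t start  = DWT->CYCCNT;
    uint32_t cycles = us * (SystemCoreClock / 1000000U);  /* µs -> core cycles  */
    while ((DWT->CYCCNT - start) < cycles)                /* wrap-safe compare  */
        ;
}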

 

Online abyrvalg

  • Frequent Contributor
  • **
  • Posts: 576
  • Country: ru
And you can compensate for that b by reducing the count limit by some fixed offset. Or just use that offset as the initial timer value instead of 0 - this way you avoid the subtraction (which could roll over if the input value is smaller than the offset).
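A sketch of that second variant, continuing with the OP's TIM3 setup (the offset here is a hypothetical value you would calibrate against a scope):

Code: [Select]
#define DELAY_US_OVERHEAD 6U   /* hypothetical calibration value, in timer ticks */

void delay_us(uint16_t us)
{
    /* Start the counter at the measured overhead instead of 0, so the loop
     * effectively waits (us - overhead) ticks without a subtraction that
     * could roll over for very small 'us' values. */
    __HAL_TIM_SET_COUNTER(&htim3, DELAY_US_OVERHEAD);
    while (__HAL_TIM_GET_COUNTER(&htim3) < us)
        ;
}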
 

