Author Topic: Clock cycles, timing, and the cost of things on an STM32  (Read 7853 times)

Offline Pack34Topic starter

  • Frequent Contributor
  • **
  • Posts: 753
Clock cycles, timing, and the cost of things on an STM32
« on: March 18, 2018, 12:57:25 am »
I have doubts that the internal clock on my STM32 is configured properly. Is there a way to output the system clock to a pin so I can probe it?

Simple pin toggling like the code below takes an order of magnitude longer than it should. This should only take a single clock cycle to execute, right?

TEST_PORT->BSRRL = TEST_PIN;
TEST_PORT->BSRRH = TEST_PIN;

Then if I run a large number of NOPs, it takes substantially longer than it seems it should. Take, for example, the code below. The 335999999 NOPs should take about 2 seconds at a 168MHz internal clock, but the loop takes about 28-30 seconds to complete. How many cycles does it take to decrement an unsigned 32-bit int by one?

uint32_t duration = 335999999 ;
while (duration-- > 0)
      asm("NOP");

 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1198
  • Country: fi
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #1 on: March 18, 2018, 01:51:28 am »
Execution time depends on, amongst other things, the core clock, the bus clocks, the flash speed, and the compiler's code generation. Subtracting one from an int takes one cycle only if the value is already in a register, the result doesn't need to be stored, and there is no delay in fetching the subtraction instruction itself.

If the device you're using doesn't allow outputting the system clock directly, try configuring a regular timer so its clock is derived from the core clock.

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #2 on: March 18, 2018, 02:11:27 am »
Quote from: Pack34
I have doubts that the internal clock on my STM32 is configured properly. Is there a way to output the system clock to a pin so I can probe it?

Yes. Look through the reference manual and/or datasheet for the specific STM32 you are using and see if it supports the MCO pin (MCO stands for Microcontroller Clock Output). Most, if not all, of the STM32 chips support this. You'll have to check the exact registers you need to set, but you will probably need to select the 'alternate function' for the relevant pin and set up the output in the clock configuration register.
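
A minimal sketch of the clock-configuration half of that, assuming an STM32F405/407-class part (the thread never names the exact chip), where MCO1 comes out on PA8 and is controlled by the MCO1 and MCO1PRE fields of RCC_CFGR. The helper only computes the new register value so the field positions are easy to check against the reference manual; `mco1_cfgr` and the two macros are made-up names, not part of any ST library.

```c
#include <stdint.h>

/* Field positions in RCC_CFGR, assumed from the STM32F4 reference manual;
 * verify them for your exact part. */
#define MCO1_SRC_SHIFT 21u  /* MCO1[1:0]:    0=HSI, 1=LSE, 2=HSE, 3=PLL */
#define MCO1_PRE_SHIFT 24u  /* MCO1PRE[2:0]: 0=/1, 4=/2, 5=/3, 6=/4, 7=/5 */

/* Hypothetical helper: return cfgr with the MCO1 source and divider replaced. */
uint32_t mco1_cfgr(uint32_t cfgr, uint32_t src, uint32_t div)
{
    /* Dividers 2..5 are encoded as 4..7; anything else means "no division". */
    uint32_t pre = (div >= 2u && div <= 5u) ? (div + 2u) : 0u;

    cfgr &= ~((3u << MCO1_SRC_SHIFT) | (7u << MCO1_PRE_SHIFT));
    cfgr |= ((src & 3u) << MCO1_SRC_SHIFT) | (pre << MCO1_PRE_SHIFT);
    return cfgr;
}
```

On hardware this would be used roughly as `RCC->CFGR = mco1_cfgr(RCC->CFGR, 3, 4);` after PA8 has been put into alternate-function mode (AF0). If the PLL really is producing 168 MHz, the pin should then show about 42 MHz.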
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #3 on: March 18, 2018, 07:19:22 am »
Quote
This should only take a single clock cycle to execute, right?
TEST_PORT->BSRRL = TEST_PIN;

No.

It usually takes a minimum of 3 instructions for an ARM to move a bit to a memory location.  Something like:
Code: [Select]
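    ldr   r1, =TEST_PORT          @ load the address of the GPIO port
    mov   r2, #(1<<TEST_PIN)      @ build the bit mask for the pin
    strh  r2, [r1, #BSRRL]        @ store the mask to the "set" half of BSRR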
If you're lucky, the compiler might hoist the first two instructions out of your pin-toggle loop, but you can't really count on that unless you look at the code produced.
On top of that, GPIO is frequently located on the other side of a peripheral bus that runs at a lower speed than the main CPU (and has to be configured, too). An STM32f403 datasheet I had handy said that it has two APBs, one "high speed" that runs at a max of 84MHz, and one lower speed that runs up to 42MHz. "Flash wait states" and/or "accelerator" issues show up as well. So you might be getting close to the "order of magnitude slow" that you're seeing, even if you have the CPU clock running at full speed. You shouldn't even be theorizing without at least looking at the assembly language you end up with. (You don't have to KNOW assembly language to be able to get clues from LOOKING at it...)

(and on some chips, each peripheral might have yet another clock that needs to be configured for maximum speed.)


Looking into the MCO looks like a good idea, but it also might have limitations.  At least one datasheet I have says:
Quote
The selected clock to output onto MCO must not exceed 100 MHz (the maximum I/O speed).
(and then there's the depressing thought that you might misconfigure MCO...)
 

Offline dgtl

  • Regular Contributor
  • *
  • Posts: 183
  • Country: ee
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #4 on: March 18, 2018, 12:19:10 pm »
MCO has divider options on most parts. Send out a divided-down clock and check that. Even if you violate the maximum timings and output a clock that is too fast, it will probably be good enough to check with a scope.
Alternatively, set up the SPI with a known divider and transmit some garbage data in an infinite loop. Check the SCK pin with a scope.
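
Either way, the check comes down to multiplying the scope reading by the divider you programmed and comparing the result with what you expected. A small sketch of that arithmetic follows; both function names are hypothetical. Note that for the SPI case the result is the peripheral bus clock feeding the SPI, which is itself a divided-down copy of the core clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Source clock implied by a scope reading taken on a divided-down output. */
uint64_t implied_clock_hz(uint32_t measured_hz, uint32_t divider)
{
    return (uint64_t)measured_hz * divider;
}

/* True if the implied clock is within tolerance_pct percent of expected_hz. */
bool clock_is_plausible(uint32_t measured_hz, uint32_t divider,
                        uint32_t expected_hz, uint32_t tolerance_pct)
{
    uint64_t implied = implied_clock_hz(measured_hz, divider);
    uint64_t margin  = (uint64_t)expected_hz * tolerance_pct / 100u;

    return implied + margin >= expected_hz &&
           implied <= (uint64_t)expected_hz + margin;
}
```

For example, 42 MHz measured on an output divided by 4 is consistent with a 168 MHz source, while 4 MHz on the same output would point to a 16 MHz source instead.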
 

Offline donotdespisethesnake

  • Super Contributor
  • ***
  • Posts: 1093
  • Country: gb
  • Embedded stuff
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #5 on: March 18, 2018, 01:09:04 pm »
Quote from: Pack34
I have doubts that the internal clock on my STM32 is configured properly. Is there a way to output the system clock to a pin so I can probe it?
Simple pin toggling like the code below takes an order of magnitude longer than it should. This should only take a single clock cycle to execute, right?

A common error by newbies and indeed not so newbies is to think that ARM executes one instruction per cycle, regardless. I've no idea where that idea comes from. Presumably because it is described as RISC, and people think RISC means one instruction per cycle?

The other common error is to think you can scale all features of an MCU linearly with CPU clock speed; that doesn't work. In practice, CPU speed quickly outstrips the performance of RAM, flash, and the peripheral bus.

Your loop takes about 14 cycles per iteration, which sounds about right.
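
The figure follows from the numbers in the first post, assuming the core really is running at 168 MHz: elapsed time multiplied by the clock frequency, divided by the number of iterations. As a sketch (the function name is made up for illustration):

```c
#include <stdint.h>

/* Average CPU cycles spent per loop iteration, given the wall-clock time. */
uint32_t cycles_per_iteration(uint32_t seconds, uint32_t cpu_hz,
                              uint32_t iterations)
{
    /* Widen to 64 bits: seconds * cpu_hz overflows 32 bits very quickly. */
    return (uint32_t)(((uint64_t)seconds * cpu_hz) / iterations);
}
```

With 336,000,000 iterations at 168 MHz, 28 seconds works out to 14 cycles per iteration and 30 seconds to 15.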

But I suspect this is an X-Y problem. What is it that you are actually trying to do?


Bob
"All you said is just a bunch of opinions."
 

Offline Pack34Topic starter

  • Frequent Contributor
  • **
  • Posts: 753
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #6 on: March 18, 2018, 04:32:25 pm »
Quote from: Cerebus
Quote from: Pack34
I have doubts that the internal clock on my STM32 is configured properly. Is there a way to output the system clock to a pin so I can probe it?
Yes. Look through the reference manual and/or datasheet for the specific STM32 you are using and see if it supports the MCO pin (MCO stands for Microcontroller Clock Output). Most, if not all, of the STM32 chips support this. You'll have to check the exact registers you need to set, but you will probably need to select the 'alternate function' for the relevant pin and set up the output in the clock configuration register.

Good idea... Thanks!



Quote from: donotdespisethesnake
Quote from: Pack34
I have doubts that the internal clock on my STM32 is configured properly. Is there a way to output the system clock to a pin so I can probe it?
Simple pin toggling like the code below takes an order of magnitude longer than it should. This should only take a single clock cycle to execute, right?
A common error by newbies and indeed not so newbies is to think that ARM executes one instruction per cycle, regardless. I've no idea where that idea comes from. Presumably because it is described as RISC, and people think RISC means one instruction per cycle?
The other common error is to think you can scale all features of an MCU linearly with CPU clock speed; that doesn't work. In practice, CPU speed quickly outstrips the performance of RAM, flash, and the peripheral bus.
Your loop takes about 14 cycles per iteration, which sounds about right.
But I suspect this is an X-Y problem. What is it that you are actually trying to do?

I'm trying to get some accurate delays in there. I don't want to overwhelm a sensor that I'm trying to configure. It takes three bytes to configure a register. For this to be reliable I need about 50ms between each byte, 150ms between each triplet, and then a second between each chunk. I noticed this when piping everything down using a USB to TTL cable. I figured that just stuffing a bunch of delays in there would be cleaner, since this happens before external IO is enabled, so the end-user can't talk to the device until it has at least attempted to configure itself properly.
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #7 on: March 18, 2018, 09:05:24 pm »
Quote from: Pack34
I'm trying to get some accurate delays in there. I don't want to overwhelm a sensor that I'm trying to configure. It takes three bytes to configure a register. For this to be reliable I need about 50ms between each byte, 150ms between each triplet, and then a second between each chunk. I noticed this when piping everything down using a USB to TTL cable. I figured that just stuffing a bunch of delays in there would be cleaner, since this happens before external IO is enabled, so the end-user can't talk to the device until it has at least attempted to configure itself properly.

Use timers, if accurate delays are needed.
The CPU is then free to do other work, while the timer is running.
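
One common way to do this is to let a timer interrupt increment a millisecond counter and compare against it, rather than counting instructions. A minimal sketch of the comparison, assuming such a counter exists; the function name and the `tick_ms` variable mentioned below are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* True once at least wait_ms milliseconds have passed since start_ms.
 * The unsigned subtraction keeps the result correct when the 32-bit
 * counter wraps around between start_ms and now_ms. */
bool delay_elapsed(uint32_t now_ms, uint32_t start_ms, uint32_t wait_ms)
{
    return (uint32_t)(now_ms - start_ms) >= wait_ms;
}
```

For the 50ms gap between bytes this could be polled as `while (!delay_elapsed(tick_ms, start, 50)) { /* do other work */ }`, where `tick_ms` is a `volatile uint32_t` incremented by a 1 ms timer interrupt.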

As for the CPU clock setup, if you are still in doubt, post your clock initialization code.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline iMo

  • Super Contributor
  • ***
  • Posts: 4766
  • Country: nr
  • It's important to try new things..
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #8 on: March 18, 2018, 10:42:18 pm »
On the STM32F1 and F4 you can use this when you want clock-cycle precision:

Code: [Select]
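// Cortex-M debug registers: DEMCR (0xE000EDFC), DWT_CTRL (0xE0001000), DWT_CYCCNT (0xE0001004)
#define DWTEn()         ((*(volatile uint32_t*)0xE000EDFC) |= (1u<<24))    // set TRCENA to enable the DWT
#define CpuTicksEn()    ((*(volatile uint32_t*)0xE0001000) = 0x40000001)   // set CYCCNTENA to start the counter
#define CpuTicksDis()   ((*(volatile uint32_t*)0xE0001000) = 0x40000000)   // clear CYCCNTENA to stop it
#define CpuGetTicks()   (*(volatile uint32_t*)0xE0001004)                  // read the cycle counter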
..
  DWTEn();
  CpuTicksEn();
..
  uint32_t elapsed = CpuGetTicks();      // Measure the 1us delay
  delay_usecs(1);
  elapsed = CpuGetTicks()- elapsed;      // How many CPUTicks?

CpuGetTicks() returns the core's cycle counter (the DWT's CYCCNT register). The resolution is 1 CPU clock.
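
Two details are worth sketching when using the counter this way. It is 32 bits wide, so the subtraction has to be unsigned to survive a wraparound, and at 168 MHz it wraps roughly every 25.6 seconds, so an interval as long as the 28-30 second NOP loop cannot be captured in a single reading. The helper names below are made up for illustration:

```c
#include <stdint.h>

/* Cycles between two counter readings; valid only if fewer than
 * 2^32 cycles elapsed between them. */
uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return end - start;  /* unsigned arithmetic wraps modulo 2^32 */
}

/* Convert a cycle count to nanoseconds for a given core clock. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    return ((uint64_t)cycles * 1000000000u) / cpu_hz;
}
```

For example, 168 cycles at 168 MHz is 1000 ns, which is the order of result the `delay_usecs(1)` measurement above should produce if both the delay routine and the clock setup are correct.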
« Last Edit: March 18, 2018, 10:54:39 pm by imo »
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #9 on: March 19, 2018, 08:22:32 am »
Huh. Those are part of the "Data Watchpoint and Trace Unit".
Nice!

   
 

Offline iMo

  • Super Contributor
  • ***
  • Posts: 4766
  • Country: nr
  • It's important to try new things..
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #10 on: March 19, 2018, 08:49:36 pm »
For example, this measures "8 CPU clocks" on an STM32F1:

Code: [Select]
// Fast IO
#define FIO_SET(port, pins)         ((port)->regs->BSRR = (pins))
#define FIO_CLR(port, pins)         ((port)->regs->BRR = (pins))
..
  uint32_t elapsed = CpuGetTicks();
  FIO_SET(GPIOB, 1u << 12);         // pins is a bit mask; assuming pin 12 (PB12) was meant
  FIO_CLR(GPIOB, 1u << 12);
  elapsed = CpuGetTicks()- elapsed;
« Last Edit: March 19, 2018, 08:56:47 pm by imo »
 

Offline MT

  • Super Contributor
  • ***
  • Posts: 1616
  • Country: aq
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #11 on: March 20, 2018, 12:02:47 am »
Quote from: Pack34
I have doubts that the internal clock on my STM32 is configured properly. Is there a way to output the system clock to a pin so I can probe it?

Yes, some devices have MCO; others have MCO1 and MCO2. Otherwise use timers, which make it easier to set high division ratios.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Clock cycles, timing, and the cost of things on an STM32
« Reply #12 on: March 20, 2018, 09:12:25 am »
(Note that all the ARM Cortex chips that I've seen also contain a "SysTick" timer that usually runs at the CPU frequency.  It's normally "in use" for system clock-tick type purposes, but you can use it for measuring short delays as well.)
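
As a rough sketch of the one calculation SysTick needs: it is a 24-bit down-counter, so the reload value for a given tick rate is the number of core cycles per tick minus one, and it has to fit in 24 bits. The function name is hypothetical; CMSIS's `SysTick_Config()` takes the cycle count and programs the reload register for you.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* SysTick is a 24-bit down-counter */

/* Reload value for ticks_per_second ticks at cpu_hz, or 0 if no usable
 * value exists (rate of zero, under 2 cycles per tick, or over 24 bits). */
uint32_t systick_reload(uint32_t cpu_hz, uint32_t ticks_per_second)
{
    if (ticks_per_second == 0u) {
        return 0u;
    }

    uint32_t cycles = cpu_hz / ticks_per_second;
    if (cycles < 2u || cycles - 1u > SYSTICK_MAX_RELOAD) {
        return 0u;
    }
    return cycles - 1u;
}
```

At 168 MHz a 1 ms tick needs a reload of 167999, which fits comfortably, while anything slower than about 10 ticks per second no longer fits in 24 bits at that clock.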
 

