Author Topic: 32F4 - a timer without interrupts - CYCCNT  (Read 3675 times)


Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3697
  • Country: gb
  • Doing electronics since the 1960s...
32F4 - a timer without interrupts - CYCCNT
« on: August 31, 2021, 07:44:14 am »
This seems to be a free running 32 bit counter, which can be polled at any time. So it could be used to implement a timeout on some loop.

A search of my project, including all the HAL stuff, finds no reference to it except definitions in various .h files. So it doesn't appear to get started anywhere.

This https://stackoverflow.com/questions/42246125/how-to-get-time-intervals-on-stm32 shows a simple usage example but how do I guard against the 32 bit rollover? The timeouts I want are only milliseconds, but the counter will still sometimes roll over during that. I don't want to be resetting the counter in case another bit of code is using it.

Also searching (e.g. https://developer.arm.com/documentation/ddi0337/e/System-Debug/DWT/Summary-and-description-of-the-DWT-registers?lang=en#BABFDDBA) suggests that debuggers may be resetting it to zero, etc.

Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline lucazader

  • Regular Contributor
  • *
  • Posts: 221
  • Country: au
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #1 on: August 31, 2021, 08:03:02 am »
As you may have seen in the ARM documentation, "When the counter overflows it wraps to zero, transparently".
I would suggest this makes it not ideal for your purposes.

A better method would be to either use the systick and take timestamps, or timeout times based on this.
If you need a faster timer than this then I would suggest creating a 1us tick timer based on one of the other timers in the system.
I do this quite often. Set the timer to count at a 1us tick rate. Interrupt on overflow and increment an overflow counter.
Then implement a get_usec() function that returns the time since program start in us. (I tend to use a uint64_t for this as a 32 bit counter would overflow at ~71 Mins)

Code: [Select]
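volatile uint64_t hrt_overflows = 0;
void TIMER_IRQN_HANDLER(void) // correct irq handler for your timer
{
    __HAL_TIM_CLEAR_IT(&htim6, TIM_IT_UPDATE);
    ++hrt_overflows;
}

uint64_t get_tick_usec(void)
{
    uint64_t overflows;
    uint32_t cnt;

    do {                                  // re-read if the overflow interrupt fired mid-read
        overflows = hrt_overflows;
        cnt = htim6.Instance->CNT;
    } while (overflows != hrt_overflows);

    return (overflows << 16) + cnt;
}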

Very low overhead, as it will only interrupt once every 65ms
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3697
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #2 on: August 31, 2021, 08:19:33 am »
In my project, initially HAL based, systick is heavily referenced within the STM code (and is used by FreeRTOS), and HAL_GetTick is also heavily referenced. Systick is a 1kHz interrupt. HAL_GetTick is

Code: [Select]
/**
  * @brief Provides a tick value in millisecond.
  * @note This function is declared as __weak to be overwritten in case of other
  *       implementations in user file.
  * @retval tick value
  */
__weak uint32_t HAL_GetTick(void)
{
  return uwTick;
}

and uwTick seems to point to nothing meaningful

Code: [Select]
/* Private macro -------------------------------------------------------------*/
/* Private variables ---------------------------------------------------------*/
/** @addtogroup HAL_Private_Variables
  * @{
  */
__IO uint32_t uwTick;
uint32_t uwTickPrio   = (1UL << __NVIC_PRIO_BITS); /* Invalid PRIO */
HAL_TickFreqTypeDef uwTickFreq = HAL_TICK_FREQ_DEFAULT;  /* 1KHz */

so it may be doing nothing.

AIUI, regarding 32 bit rollover, if you read the reg into uint32_t, then subtraction always works even if it has overflowed.

Yes one could also read some timer. I am using TIM6 to generate a 1kHz interrupt for various things. And a debugger would not mess with that.

« Last Edit: August 31, 2021, 08:22:36 am by peter-h »
 

Offline lucazader

  • Regular Contributor
  • *
  • Posts: 221
  • Country: au
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #3 on: August 31, 2021, 09:23:03 am »
You mention FreeRTOS in your reply.
If you are using freertos then you should probably be using the freertos based timers, delays and tick counts.
E.G. xTaskGetTickCount() (https://www.freertos.org/a00021.html#xTaskGetTickCount)
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3697
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #4 on: August 31, 2021, 09:35:46 am »
FreeRTOS needs interrupts (obviously) and using any of that would make the code non-portable to a different scenario.

Hence reading the CPU cycle count seems a good option. It just seems curious that this counter was never started...
 

Offline DavidAlfa

  • Super Contributor
  • ***
  • Posts: 5907
  • Country: es
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #5 on: August 31, 2021, 09:44:00 am »
As long as you read it at least twice per overflow period, and also check whether it is smaller than the last reading (meaning it rolled over), it will be OK.
For 48MHz it's every 89 seconds, for 72MHz is 59 seconds and so on, you have plenty of time to check it.
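
Those figures come from dividing 2^32 by the core clock. A minimal sketch of the arithmetic, assuming a helper name (`cyccnt_wrap_seconds`) that is made up for illustration and is not part of any library:

```c
#include <stdint.h>

/* Seconds until a free-running 32-bit cycle counter wraps at a given core clock.
   Hypothetical helper for illustration; integer division truncates the result. */
uint32_t cyccnt_wrap_seconds(uint32_t core_hz)
{
    return (uint32_t)((UINT64_C(1) << 32) / core_hz);
}
```

At 168 MHz the same calculation gives about 25 seconds.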

Being free running with no interrupts, you have to make a handler and call it from the main loop, so it checks CYCCNT.
You must store the last CYCCNT value, compare it with the new one, and increase the millisecond counter every FCY/1000 cycles.
Normally you would find the difference doing new-last, but if CYCCNT is lower than the previous value, you know it overflowed, so in this case you must do (2^32-old)+CYCCNT.
And you only update the old value when increasing the millisecond counter.
« Last Edit: August 31, 2021, 09:54:39 am by DavidAlfa »
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #6 on: August 31, 2021, 09:48:06 am »
https://github.com/adafruit/ArduinoCore-samd/blob/master/cores/arduino/delay.c#L64


dwt cyccnt timer used for delayMicroseconds() in arduino since 2019 on Adafruit samd51 boards

 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4427
  • Country: dk
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #7 on: August 31, 2021, 10:02:19 am »
if you use signed numbers and only have a maximum of one overflow between two timestamps, the difference between two timestamps still comes out correct
 

Offline DavidAlfa

  • Super Contributor
  • ***
  • Posts: 5907
  • Country: es
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #8 on: August 31, 2021, 10:18:32 am »
It's perfectly valid as long as your main loop doesn't get blocked somewhere for too long.
I think this would do ok, I might have missed something.


Code: [Select]
#define SYS_FCY  48000000                         // System clock in Hz
uint32_t sys_ms;                                  // Variable for storing system milliseconds
uint32_t last_cnt;                                // Variable for storing previous CYCCNT value

uint32_t get_ms(void){                            // Read current time
  return sys_ms;
}

void reset_timer(void){                           // Reset timer
  sys_ms = 0;
  last_cnt = 0;
  DWT->CYCCNT = 0;
}

void handle_Timer(void){                          // Update system time
  uint32_t diff, current = DWT->CYCCNT;

  if(current>=last_cnt){                          // No rollover
    diff=current-last_cnt;
  }
  else{                                           // Rollover: (2^32-last_cnt)+current
    diff=current + (0xFFFFFFFF-last_cnt) + 1;
  }

  if(diff>=(SYS_FCY/1000)){                       // If diff >= FCY/1000 (1ms)
    sys_ms++;                                     // Increase ms counter
    last_cnt = current - (diff-(SYS_FCY/1000));   // update last_cnt, not to current, but current-(diff-(FCY/1000)) to keep the remainder for the next millisecond step
  }
}


int main(void){
  // ... init, etc ...
  while(1){
    // ... other main loop work ...
    handle_Timer();
  }
}
« Last Edit: August 31, 2021, 10:30:16 am by DavidAlfa »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3697
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #9 on: August 31, 2021, 10:57:10 am »
Looking at the above, and the Github link further up, it looks like the simplest way one could do a timeout is like this:

CYCCNT  initialisation (at CPU startup only):

Code: [Select]
#define CoreDebug_DEMCR_TRCENA_Msk         (1UL << 24U)
#define DWT_CTRL_CYCCNTENA_Msk             0x1UL

CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
DWT->CYCCNT = 0;
DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

Code: [Select]
int32_t start_time = DWT->CYCCNT;
int32_t max_count = 10000;  // probably this counts CPU clocks / 4... need to look it up

do
{
   whatever stuff could take too long
   ...
   ...
   ...
}
while ( (DWT->CYCCNT < (start_time+max_count)) || (some other condition) );

and using signed int32 the overflow condition is not an issue.

I recall a post here recently saying that using unsigned avoids the overflow condition but I can't work that out.


« Last Edit: August 31, 2021, 10:58:44 am by peter-h »
 

Offline DavidAlfa

  • Super Contributor
  • ***
  • Posts: 5907
  • Country: es
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #10 on: August 31, 2021, 11:07:18 am »
Unless "whatever stuff could take too long" gets stuck and never reaches the while check :-DD
And yes it counts CPU clock, you need to convert time to clocks.
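
The conversion from time to clocks could look something like the following. This is only a sketch: `CORE_HZ` and `ms_to_cycles` are names assumed here for illustration, and the 168 MHz value is an assumption to be replaced with the actual core clock.

```c
#include <stdint.h>

#define CORE_HZ 168000000u  /* assumed core clock; adjust to your system */

/* Convert a timeout in milliseconds to CPU cycles.
   The result must fit in 32 bits, so at 168 MHz the timeout has to stay
   at or below 25565 ms (2^32 / 168000 cycles per ms, rounded down). */
uint32_t ms_to_cycles(uint32_t ms)
{
    return ms * (CORE_HZ / 1000u);
}
```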
« Last Edit: August 31, 2021, 11:09:07 am by DavidAlfa »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3697
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #11 on: August 31, 2021, 11:20:15 am »
Yes :)

Just compiled this

Code: [Select]
CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
DWT->CYCCNT = 0;
DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

int32_t start_time = DWT->CYCCNT;
int32_t max_count = 1000000000;

do
{
debug_puts("hello\n");  // ITM data console version of "puts()"
hang_around(1000);  // dumb wait for 1000ms (uses inline asm)
}
while (DWT->CYCCNT < max_count);

and I see six "hello"s, which is spot on for 168MHz :)

But this does not work

Code: [Select]

CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
DWT->CYCCNT = 0;
DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

int32_t start_time = DWT->CYCCNT;
max_count = 168000000;  // 168M = 1 sec

do
{
    debug_puts("counting...\n");
    start_time = DWT->CYCCNT;
    while ( DWT->CYCCNT < (start_time+max_count) ) {}
}
while (true);


I get one "counting..." and then nothing. The comparison clearly doesn't work.

It works with uint32_t but also falls over after some tens of seconds. So what is the secret of the code which handles overflow inherently?
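
The two ways of writing the comparison can be tried on a PC with plain numbers standing in for DWT->CYCCNT. A sketch with made-up function names (neither is from HAL or CMSIS), assuming all values are uint32_t:

```c
#include <stdbool.h>
#include <stdint.h>

/* Deadline form: compares the counter against start + max_count.
   Misbehaves when start + max_count wraps past 2^32. */
bool still_waiting_deadline(uint32_t now, uint32_t start, uint32_t max_count)
{
    return now < (uint32_t)(start + max_count);
}

/* Elapsed form: compares the cycles elapsed since start against max_count.
   The unsigned subtraction wraps modulo 2^32, so a rollover cancels out. */
bool still_waiting_elapsed(uint32_t now, uint32_t start, uint32_t max_count)
{
    return (uint32_t)(now - start) < max_count;
}
```

With the deadline form, a start value within max_count of the wrap point makes the wait end immediately; the elapsed form gives the same answer wherever start happens to fall.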
« Last Edit: August 31, 2021, 11:45:24 am by peter-h »
 

Offline DavidAlfa

  • Super Contributor
  • ***
  • Posts: 5907
  • Country: es
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #12 on: August 31, 2021, 03:40:40 pm »
langwadt, where's that int magic? I don't see anything else than problems.
The function might fail because the int32_t becomes negative (2147483646,  2147483647 -> -2147483648, -2147483647...)

The check would no longer be valid if CYCCNT is still positive but adding max_count causes int overflow, becoming negative.

ex:
CYCCNT = 0x7FFFD8EF (2147473647)
2147473647+168000000 = 0x8A0352EF (-1979493649)

while ( 2147473647 <  -1979493649 )


So in the end it's the same thing as with plain unsigned, but adding another headache.

Add more checks, or just don't run a while loop like that, clearing the counter after each loop:
Code: [Select]
max_count = 168000000;  // 168M = 1 sec
do
{
    DWT->CYCCNT=0;
    debug_puts("counting...\n");
    while ( DWT->CYCCNT < max_count);
}
while (true);
« Last Edit: August 31, 2021, 04:01:01 pm by DavidAlfa »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3697
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #13 on: August 31, 2021, 04:40:10 pm »
This post says otherwise
https://www.eevblog.com/forum/microcontrollers/32f417-spi-master-mode-1-byte-case-mystery/msg3637762/#msg3637762

and this seems to be the ST HAL implementation - example

Code: [Select]
  tickstart = HAL_GetTick();

  while(__HAL_FLASH_GET_FLAG(FLASH_FLAG_BSY) != RESET)
  {
    if(Timeout != HAL_MAX_DELAY)
    {
      if((Timeout == 0U)||((HAL_GetTick() - tickstart ) > Timeout))
      {
        return HAL_TIMEOUT;
      }
    }
  }
« Last Edit: August 31, 2021, 04:45:45 pm by peter-h »
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4427
  • Country: dk
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #14 on: August 31, 2021, 04:45:11 pm »
Quote
langwadt, where's that int magic? I don't see anything else than problems.
The function might fail because the int32_t becomes negative (2147483646,  2147483647 -> -2147483648, -2147483647...)

The check would no longer be valid if CYCCNT is still positive but adding max_count causes int overflow, becoming negative.

ex:
CYCCNT = 0x7FFFD8EF (2147473647)
2147473647+168000000 = 0x8A0352EF (-1979493649)

while ( 2147473647 <  -1979493649 )


it's a bit of a mind bender, but when things get truncated is important

try

uint32_t start = CYCCNT;
while((CYCCNT-start) < max_count) ;




 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #15 on: August 31, 2021, 11:49:26 pm »
Quote
uint32_t start = CYCCNT;
while((CYCCNT-start) < max_count) ;
Handling overflow of counters, whether they are cycle counters or millisecond counters, is pretty well understood, and the above is the correct way to do it.


Consider an 8bit counter where you want to time 10 counts, and start at t=250 (an unsigned 8bit quantity, of course.)
After 5 ticks, 255-250 = 5, which is exactly what you expect.  At 6 ticks the counter has overflowed to 0, and you have 0-250 = 6, once you truncate to 8bits.  After 10 ticks, t = 4, and 4-250=10, which is just right.  Make sure your comparison happens in the same word size as your variables, which may not be the case in languages like C with integer promotion:
Code: [Select]
(gdb) print (unsigned char)(4 - 250)
$3 = 10 '\n'
(gdb) print (unsigned char)6 - (unsigned char)250
$4 = -244
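
The same effect can be reproduced in C itself. A small sketch, with function names invented for illustration: the uint8_t operands are promoted to int before the subtraction, so it is the cast on the result that brings it back to 8 bits.

```c
#include <stdint.h>

/* Elapsed ticks on an 8-bit counter. The operands are promoted to int for the
   subtraction, so the cast back to uint8_t is what truncates the result. */
uint8_t elapsed8(uint8_t now, uint8_t start)
{
    return (uint8_t)(now - start);
}

/* Same subtraction without the cast: the int result can be negative. */
int elapsed8_promoted(uint8_t now, uint8_t start)
{
    return now - start;
}
```

With uint32_t on a 32-bit target no promotion takes place, which is why the plain `CYCCNT - start` form works there without an explicit cast.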

 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #16 on: September 01, 2021, 12:43:57 am »
This is basic modular arithmetic. Of course you need to use unsigned integers for the operations to be correct. Subtracting works as expected even when there's an overflow - due to C (as would be done in pure assembly) doing roll-over arithmetic, which is just modulo 2^N, where N is the integer bit width.

Of course, the subtraction stops being correct though if the counter has rolled over more than once.
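
To illustrate that limit, here is a sketch that models what a 32-bit counter would report for a given true elapsed time. `observed_elapsed` is a hypothetical name, and the 64-bit "true" value exists only in this model, not on the hardware.

```c
#include <stdint.h>

/* Elapsed cycles as seen through a 32-bit counter, given the true (64-bit)
   number of cycles that passed. Only the low 32 bits survive. */
uint32_t observed_elapsed(uint32_t start, uint64_t true_elapsed)
{
    uint32_t now = (uint32_t)(start + true_elapsed);  /* what the counter would read */
    return (uint32_t)(now - start);
}
```

A true interval of 2^32 + 5 cycles is reported as 5, so intervals have to stay below one full wrap of the counter.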
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4427
  • Country: dk
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #17 on: September 01, 2021, 01:10:59 am »
Quote
This is basic modular arithmetic. Of course you need to use unsigned integers for the operations to be correct. Subtracting works as expected even when there's an overflow - due to C (as would be done in pure assembly) doing roll-over arithmetic, which is just modulo 2^N, where N is the integer bit width.

Of course, the subtraction stops being correct though if the counter has rolled over more than once.

I'll have to think about it but I don't think it needs to be unsigned
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #18 on: September 01, 2021, 03:01:48 am »
Quote
due to C (as would be done in pure assembly) doing roll-over arithmetic
I think it's more the CPUs themselves than the language.  Though I suppose that you could somehow use saturating arithmetic or cause exceptions...
Quote
I'll have to think about it but I don't think it needs to be unsigned
The math doesn't care, but the comparison needs to know whether the result is signed or not.  Otherwise -1 is less than 1.  This is more obvious on CPUs that have different conditional branch instructions for signed vs unsigned.  Like BRLO vs BRLT on AVR.  One of them checks the carry bit, and one checks (HIGHBIT xor OVERFLOW) bits.
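
In C terms, the subtraction produces the same bit pattern either way and only the comparison changes. A sketch with hypothetical function names; the conversion of an out-of-range value to int32_t is implementation-defined in C, and the comment assumes the wrapping behaviour that GCC and Clang document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned view: the difference is a count in 0 .. 2^32-1. */
bool below_unsigned(uint32_t now, uint32_t start, uint32_t max_count)
{
    return (uint32_t)(now - start) < max_count;
}

/* Signed view of the same bits: differences with the top bit set look negative.
   Converting an out-of-range value to int32_t is implementation-defined in C;
   GCC and Clang wrap modulo 2^32. */
bool below_signed(uint32_t now, uint32_t start, int32_t max_count)
{
    return (int32_t)(now - start) < max_count;
}
```

For now = 5 and start = 10 the difference is 0xFFFFFFFB: the unsigned comparison sees a very large count, the signed one sees -5.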
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #19 on: September 01, 2021, 10:15:27 am »
Note that the ARM cycle counter (CYCCNT) must be enabled in the DWT; it may not be enabled by default.

In C, and on all 32-bit ARM arches, given uint32_t  before, after;
    after - before
is equivalent to
    (uint32_t)(after - before)
and is nonnegative.  In the case where before > after, the value follows modulo 2³² arithmetic, and is exactly 2³²-before+after.

No tricks needed, no ifs needed.  If you know before occurred before after, and the difference between the two was less than 2³² cycles, then after - before always yields the correct number of cycles in between, even when before > after, due to modulo arithmetic rules.  And those are enforced both by GCC/Clang, and the ARM hardware.

Signed arithmetic differs: in standard C, signed overflow is undefined behaviour.  GCC and Clang on 32-bit arches produce twos complement modulo arithmetic (with representable values ranging from -2³¹ to 2³¹-1, inclusive) when built with -fwrapv; without it, the optimiser is allowed to assume signed overflow never happens.  You can think of the wraparound as adding or subtracting 2³² to/from the result, until it is within the representable range.  In reality, the overflow is either lost, or saved in a flag.  GCC and Clang do provide integer overflow built-ins as extensions to C and C++, if you do want to detect signed or unsigned overflow.

If you want to use 64-bit cycle counter, you can use for example
Code: [Select]
static volatile word64  cycle_counter;

static inline uint64_t  cycles(void)
{
    word64  old_state, new_state;

    old_state = cycle_counter;

    new_state.u32_lo = ARM_DWT_CYCCNT;  // or DWT->CYCCNT, dunno
    new_state.u32_hi = old_state.u32_hi + (new_state.u32_lo < old_state.u32_lo);

    cycle_counter = new_state;  // TODO: Make this atomic!

    return new_state.u64;
}
where the word64 type is a union,
Code: [Select]
#include <stdint.h>

typedef union {
    uint64_t  u64;
    int64_t   i64;
#if  __BYTE_ORDER__-0 == __ORDER_LITTLE_ENDIAN__-0
    struct {
        uint32_t  u32_lo;
        uint32_t  u32_hi;
    };
    struct {
        uint32_t  i32_lo;
        uint32_t  i32_hi;
    };
#elif __BYTE_ORDER__-0 == __ORDER_BIG_ENDIAN__-0
    struct {
        uint32_t  u32_hi;
        uint32_t  u32_lo;
    };
    struct {
        uint32_t  i32_hi;
        uint32_t  i32_lo;
    };
#else
#error Unknown byte order
#endif
}  word64;
Type punning via an union like this is explicitly allowed in C.  The __BYTE_ORDER__ check is needed (this is written for GCC and Clang, other compilers may export other macros) because the byte order (endianness) of the target architecture determines whether the high or low 32 bits of the combined 64-bit value is stored first.  You could use bit shifts for portable code, but cycle counting (CYCCNT and similar, for example __builtin_ia32_rdtsc() on x86 and x86-64) itself is already something very hardware-dependent; I believe that in this particular case, the type punning approach is better/warranted.
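
For comparison, the bit-shift alternative mentioned above might look roughly like this. It is only a sketch: `cycles_extend` and `cycle_total` are names assumed here, and the raw counter reading is passed in as a parameter (on the target it would be DWT->CYCCNT) so the logic can be tried on a PC.

```c
#include <stdint.h>

static uint64_t cycle_total;  /* last extended value; not interrupt-safe as written */

/* Extend a raw 32-bit counter reading to 64 bits using shifts and masks
   instead of a union. Must be called at least once per 2^32 cycles. */
uint64_t cycles_extend(uint32_t raw)
{
    uint32_t old_lo = (uint32_t)cycle_total;          /* low word of previous value */
    uint32_t hi     = (uint32_t)(cycle_total >> 32);  /* high word of previous value */

    hi += (raw < old_lo);                             /* low word wrapped: carry into high */
    cycle_total = ((uint64_t)hi << 32) | raw;
    return cycle_total;
}
```

This version does not depend on byte order, at the cost of a few extra shifts; the same atomicity caveat applies to the 64-bit store.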

If the cycle_counter = new_state; assignment was atomic, or done while interrupts disabled on a single-core system, then it would be safe to call cycles() at any point, even in an interrupt context.  As it is written now, it is not reliable if it is ever called in an interrupt context, because there is a tiny possibility of the interrupt occurring exactly in the middle of the 64-bit assignment, since it is done using two 32-bit assignments.
If you cannot make it atomic (noting that it is not enough to unconditionally disable then enable interrupts, because it might be called with interrupts already disabled), then there are ways to rewrite the function to do it anyway (because the cycle counter grows upward, and we can more or less assume that any interrupts take less than 2³⁰ or so cycles), but it gets a bit complicated and much less robust.

The "odd" part in it, + (new_state.u32_lo < old_state.u32_lo), simply adds 1 if the low 32 bits wrap around.  In C, logical expressions evaluate to 1 if true, and to zero if false.  We just need to ensure the order of operations using the parentheses, and the compiler will handle the details for us.

Finally, note that you need to call cycles() at least once per every 2³² cycles, so that CYCCNT wraparound is correctly detected.  Perhaps in SysTick, or some other periodic timer you use.
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4427
  • Country: dk
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #20 on: September 01, 2021, 10:56:06 am »
and it also  works for safely extending a counter (e.g. quadrature) that can count both ways, at regular intervals shorter than the wraparound
add the signed difference to a larger counter variable
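
A sketch of that idea, assuming a hypothetical 16-bit hardware counter whose raw reading is passed in (the names `position_update`, `last_raw` and `position` are made up for illustration):

```c
#include <stdint.h>

static uint16_t last_raw;   /* previous reading of the 16-bit hardware counter */
static int32_t  position;   /* extended position */

/* Fold a new 16-bit reading into the 32-bit position. The counter may have
   moved either way; the result is correct as long as it moved by fewer than
   32768 counts since the previous call. The cast to int16_t relies on the
   two's complement conversion that GCC and Clang implement. */
int32_t position_update(uint16_t raw)
{
    int16_t delta = (int16_t)(uint16_t)(raw - last_raw);
    last_raw = raw;
    position += delta;
    return position;
}
```

Calling it from a periodic tick that is comfortably shorter than half the counter range keeps the signed difference unambiguous.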
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3697
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #21 on: September 01, 2021, 12:41:53 pm »
I remember, from decades ago, a shaft encoder front end using a Z80 and the Z80-CTC http://www.z80.info/zip/z80ctc.pdf which fed the two pulse trains to two 8-bit counters, and never lost track of the shaft position no matter when the CPU read the counters, provided it read them before they saw 256 pulses. I think we extracted the two values on a 10ms interrupt.

It came out from some appnote and neither myself nor anybody else understood how it worked, but it worked perfectly, and my then company made a fair bit of money out of selling it :)

I reckon this gets used in inertial nav systems too; they use optical encoders for zero friction.
« Last Edit: September 01, 2021, 12:45:41 pm by peter-h »
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #22 on: September 01, 2021, 02:29:14 pm »
If the pulse counters are directional as opposed to quadrature – i.e., counter A counts pulses clockwise, counter B counterclockwise, both counters increasing, then
Code: [Select]
static uint8_t  rot_inc_lo, rot_dec_lo, rot_hi;

static inline uint16_t  rotation(void)
{
    uint8_t  old_inc_lo = rot_inc_lo;
    uint8_t  old_dec_lo = rot_dec_lo;
    rot_inc_lo = read_counter_a();
    rot_dec_lo = read_counter_b();
    rot_hi += (rot_inc_lo < old_inc_lo);
    rot_hi -= (rot_dec_lo < old_dec_lo);
    return (uint16_t)(((uint16_t)rot_hi << 8) + rot_inc_lo - rot_dec_lo);  /* add rather than OR, so a negative low-byte difference borrows from the high byte */
}
which can look really odd unless you understand how it works.  Both A and B counters are increasing 8-bit counters, but their difference is the value sought for.  The high byte is incremented whenever the positive counter wraps around, and decremented whenever the negative counter wraps around.  (While the result is 16-bit, you can treat it either as unsigned or signed on a Z80, by just changing the function result type above.)

The logic here is exactly the same as my previous post, except that here we have two opposing counters whose difference is the desired state.

In Z80 assembly, you can leverage the fact that subtracting the previous 8-bit counter from the new value yields the difference (that can be added to the previous counter value), but also sets the carry flag (to indicate whether the high byte needs to be incremented/decremented).  I'm not sure if you'd need to disable interrupts for a couple of instructions (definitely not for more than a few instructions), but the code implementing similar logic as above would be a rather funky mix of IN, SUB, ADC, SBC, and LD Z80 instructions.  Without a comment describing the logic (perhaps in pseudocode), I'm pretty sure the assembly would be write-only code.

Several variants exist for quadrature logic, although I personally prefer the see-saw approach (each rising transition switches the interrupt between the two pins, and the direction depends on whether the other pin is high or low), since that gives the most robust reading (and no contact bounce issues).  That does generate an interrupt per pulse pair, and at varying time intervals when direction changes, but in my use cases that has not mattered at all.  (I do like tactile EC11 rotary encoders, especially the kind with the extra push switch, for human interfaces.)
I do believe you only need some Schmitt trigger inputs and simple logic to derive the separate increment/decrement pulses like above from quadrature logic.  For mechanical encoders I think contact bounce would be a bit problematic, but I guess that's how you did it back then for optical encoders?  :-//
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3697
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #23 on: September 01, 2021, 03:07:11 pm »
It was all in assembler (c. 1980) but the bit I could never understand is how it worked given that it was obviously impossible to read the two counters atomically. There could easily be pulses happening between reading one and reading the other.

I don't have the circuit but yes there were a few gates before the CTC. The up/down direction was determined from the phase relationship of the two square waves.

The resemblance of this scheme to the CYCCNT topic is that it didn't matter if the counters wrapped around. Modular arithmetic. It was really elegant. But none of us ever understood it :)
« Last Edit: September 01, 2021, 04:01:46 pm by peter-h »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #24 on: September 01, 2021, 05:04:17 pm »
Quote
I'll have to think about it but I don't think it needs to be unsigned
The math doesn't care, but the comparison needs to know whether the result is signed or not.  Otherwise -1 is less than 1.  This is more obvious on CPUs that have different conditional branch instructions for signed vs unsigned.  Like BRLO vs BRLT on AVR.  One of them checks the carry bit, and one checks (HIGHBIT xor OVERFLOW) bits.

Exactly.
Note, as I said, that it works if the language you use allows it. Many, such as C, do, because they just implement 2^N modular arithmetic, which is cheapest on most common CPUs.
Of course, with languages that do not implement overflow as a simple "rollover" - either by saturating the result or generating an exception - this simple scheme won't work.

With a language such as Ada, you can do this using a "mod" (modular) type. Basic integers won't do.
For instance, for a 32-bit counter, you could define the following type:
Code: [Select]
type Counter32_t is mod 2**32;
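
For comparison, a small C sketch of the same point (the function names are made up for illustration). The subtraction is identical in both versions; only the comparison differs. The signed variant relies on the usual two's complement conversion, which is implementation-defined in C, and it goes wrong once the interval reaches 2^31 counts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned comparison: valid for any interval shorter than 2^32 counts. */
bool timed_out_unsigned(uint32_t start, uint32_t now, uint32_t limit)
{
    return (uint32_t)(now - start) >= limit;
}

/* Signed comparison: same subtraction, but the difference is treated as
 * negative once it reaches 2^31 counts, so the test then fails. */
bool timed_out_signed(uint32_t start, uint32_t now, int32_t limit)
{
    return (int32_t)(now - start) >= limit;
}
```

With start = 0xFFFFFF00 and now = 0x00000010 (the counter has wrapped) both variants see an elapsed count of 0x110. With start = 0 and now = 0x90000000, only the unsigned variant reports a timeout against a limit of 0x10000000.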
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #25 on: September 01, 2021, 10:15:42 pm »
On the Z80, you do have 16-bit atomic loads and stores as long as the address is in one of the register pairs (BC, DE, HL), and even an atomic 16-bit swap (but only to the top of the stack), and also 16-bit addition and subtraction, so there are several different ways one could do it atomically.  Expressing them in C is difficult, though.

Cortex-M4 (see PM0214 at st.com) implements load-linked, store-conditional primitives.  Essentially, you have a short interval, a few instructions, to do some operations, and then you can do a store-conditional write to that variable.  If that variable was not accessed in between (by anything in the system), the store succeeds; otherwise, the store fails (the value is not updated!), with the success/failure in a (chosen) register.  CMSIS provides these as __LDREXB()/__LDREXH()/__LDREXW() and __STREXB()/__STREXH()/__STREXW() macros or functions.

For the 64-bit cycle counter, you can make it "lockless" by using two copies of the 64-bit counter, and an 8-bit LL/SC index or offset indicating which copy is valid:
Code: [Select]
static volatile word64   cycles_count[2];
static volatile uint8_t  cycles_index;  /* 0 if cycles_count[0] is valid, 8 if cycles_count[1] is valid */

uint64_t  cycles(void)
{
    word64    count;
    uint32_t  i;

    /* Load most recently fully updated value */
    do {
        i = __LDREXB(&cycles_index) & 8;
        count = cycles_count[i >> 3];
    } while (__STREXB(i, &cycles_index));

    const uint32_t  cyccnt = ARM_DWT_CYCCNT; /* Or DWT->CYCCNT */

    /* Update cycle count */
    count.u32_hi += (cyccnt < count.u32_lo);
    count.u32_lo  =  cyccnt;

    /* Store the cycle count, and if valid, the index too. */
    do {
        i = (__LDREXB(&cycles_index) ^ 8) & 8;
        cycles_count[i >> 3] = count;
    } while (__STREXB(i, &cycles_index));

    return count.u64;
}
The cycles_index contains the byte offset to the currently valid cycles_count entry.

The first do { .. } while loop obtains the currently valid 64-bit counter state.  It almost always only does one iteration, as it only loops if the operation is interrupted (or some other code accesses cycles_index).  Because we cannot read the 64-bit counter state atomically, we do need a loop here.

We then read the cycle counter, and update the local copy of the 64-bit counter state.  (We cannot read the cycle counter before we know the old counter state, because otherwise we might see time flowing backwards.)

Finally, in the second do { .. } while loop, we store the updated 64-bit counter value to the not-currently-valid entry.  If the store is not interrupted and no other code accesses cycles_index while we do the store, cycles_index is updated to reflect this new entry is now valid.  We do need a loop here, because there are only two slots we can use.  If there were more slots than it were possible to have nested cycles() calls (consider interrupts with different priorities!), then we could just do one iteration.  In practice, the loop is so rarely repeated, it is not worth it to use more than two slots.

Because cycles_index is only modified when the corresponding entry has been updated, the currently valid entry is never trashed.

If a cycles() call is interrupted by something that also calls cycles(), then afterwards the valid entry reflects the 64-bit cycle counter value from the outermost call (which is obviously a bit earlier than the value the innermost cycles() call obtained).  The return values from cycles() are monotonic; it is only the internal state that concurrent cycles() calls can cause to change in a non-monotonic manner.  That only matters if you examine the cycle counter state variables while debugging: with one call interrupted by a function that also calls cycles(), you can then see the internal state updated in a non-monotonic order.

The above probably needs the following definitions:
Code: [Select]
#include <stdint.h>

typedef union {
    uint64_t        u64;
    int64_t         i64;
    uint32_t        u32[2];
    int32_t         i32[2];
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && __BYTE_ORDER__-0 == __ORDER_LITTLE_ENDIAN__-0
    struct {
        uint32_t    u32_lo;
        uint32_t    u32_hi;
    };
    struct {
        int32_t     i32_lo;
        int32_t     i32_hi;
    };
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && __BYTE_ORDER__-0 == __ORDER_BIG_ENDIAN__-0
    struct {
        uint32_t    u32_hi;
        uint32_t    u32_lo;
    };
    struct {
        int32_t     i32_hi;
        int32_t     i32_lo;
    };
#else
#error Unknown byte order.  Define __BYTE_ORDER__ to __ORDER_LITTLE_ENDIAN__ or to __ORDER_BIG_ENDIAN__.
#endif
} word64;

__attribute__((always_inline))
static inline uint8_t  __LDREXB(const volatile uint8_t *addr)
{
    uint32_t  result;
    asm volatile ("ldrexb\t%0, [%1]\n\t"
                 : "=r" (result)
                 : "r" (addr)
                 );
    return result;
}

__attribute__((always_inline))
static inline uint32_t  __STREXB(uint8_t val, volatile uint8_t *addr)
{
    uint32_t  result;
    asm volatile ("strexb\t%0, %1, [%2]\n\t"
                 : "=&r" (result)
                 : "r" (val), "r" (addr)
                 );
    return result;
}

__attribute__((always_inline))
static inline uint16_t  __LDREXH(const volatile uint16_t *addr)
{
    uint32_t  result;
    asm volatile ("ldrexh\t%0, [%1]\n\t"
                 : "=r" (result)
                 : "r" (addr)
                 );
    return result;
}

__attribute__((always_inline))
static inline uint32_t  __STREXH(uint16_t val, volatile uint16_t *addr)
{
    uint32_t  result;
    asm volatile ("strexh\t%0, %1, [%2]\n\t"
                 : "=&r" (result)
                 : "r" (val), "r" (addr)
                 );
    return result;
}

__attribute__((always_inline))
static inline uint32_t  __LDREXW(const volatile uint32_t *addr)
{
    uint32_t  result;
    asm volatile ("ldrex\t%0, [%1]\n\t"
                 : "=r" (result)
                 : "r" (addr)
                 );
    return result;
}

__attribute__((always_inline))
static inline uint32_t  __STREXW(uint32_t val, volatile uint32_t *addr)
{
    uint32_t  result;
    asm volatile ("strex\t%0, %1, [%2]\n\t"
                 : "=&r" (result)
                 : "r" (val), "r" (addr)
                 );
    return result;
}

Note that the same approach works no matter how large each slot (a cycles_count entry above) is; it could very well be a structure.  Each such set of slots does need its own index or offset variable, of course.  Because the load-linked store-conditional is done only for the duration of copying the structure (and incrementing the index in the store case), even a larger structure does not increase the retry count, because the window for interruption is just a few clock cycles each time.  Again, there is no race window per se, because even if the interrupt occurs at the worst possible moment, that only means the loop does another iteration, and each iteration only takes a few clock cycles anyway.
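
The carry step in the middle of cycles() can be tried on a host machine in isolation. The sketch below leaves out the LL/SC protection entirely, so it is only suitable where nothing can interrupt the update; cycles64_state and cycles64_update are hypothetical names. As with cycles(), it assumes the function is called at least once per 32-bit wrap period (roughly 25.6 seconds at a 168 MHz core clock), otherwise a carry is silently missed.

```c
#include <stdint.h>

/* Hypothetical software extension of a 32-bit counter to 64 bits. */
typedef struct {
    uint32_t lo;   /* last raw counter value seen     */
    uint32_t hi;   /* number of wraps observed so far */
} cycles64_state;

/* Feed in the current raw counter value; returns the extended count. */
uint64_t cycles64_update(cycles64_state *s, uint32_t cyccnt)
{
    /* If the raw value went "backwards", the counter wrapped once. */
    s->hi += (cyccnt < s->lo);
    s->lo = cyccnt;
    return ((uint64_t)s->hi << 32) | s->lo;
}
```

On the target, the second argument would be DWT->CYCCNT; on a host, any sequence of test values will do.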
 

Offline wek

  • Frequent Contributor
  • **
  • Posts: 495
  • Country: sk
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #26 on: September 01, 2021, 11:51:28 pm »
Type punning via an union like this is explicitly allowed in C. 
Please give chapter and verse.

JW
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #27 on: September 02, 2021, 07:18:13 am »
Type punning via an union like this is explicitly allowed in C.
Please give chapter and verse.
C99 6.5.2.3, footnote 82:
Quote
If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
C99 6.2.6 describes the rules for representations of values, like padding bits et cetera.

The exact same wording is in C11 (6.5.2.3, footnote 95), and in C17 (6.5.2.3, footnote 97).

If you are one of the language lawyers who insist that footnotes are not normative so I shouldn't say "explicitly allows", just fuck off.  The footnotes exist for a purpose, they're to clarify the intent of the text of the standard.  Any C compiler that fails to allow type punning via an union as described in that footnote will break a lot of existing code, and therefore be useless to most people (all except language lawyer wanks).  Reality beats language lawyers (with a cluebat) every time.
 
The following users thanked this post: SiliconWizard

Offline wek

  • Frequent Contributor
  • **
  • Posts: 495
  • Country: sk
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #28 on: September 02, 2021, 05:42:50 pm »
C99 6.5.2.3, footnote 82:
Thanks.

This obviously boils down to definition of "explicitly allowed".

I just wondered if I overlooked something more substantial - and by that I don't mean the fact that it's a footnote (otherwise I would just fuck off as per your recommendation), but that it refers to "[re]interpretation as an object representation", which are the traditional murky waters of C.

In this regard, this won't make type punning through unions any more or less "allowed" than type punning through pointer casting (C99 6.3.2.3 in its entire beauty). Both ultimately result in what C99 calls "unspecified behaviour", which can be translated as "whatever" ("The value of a union member other than the last one stored into" in Annex J.1). Arguably, with unions, it's easier for compiler makers to get it "right".

JW

PS. Apologies to OP for the OT.
PS2. Full disclosure: I do use type punning in its both forms extensively.
« Last Edit: September 02, 2021, 05:48:11 pm by wek »
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #29 on: September 03, 2021, 06:14:17 am »
Full disclosure: I do use type punning in its both forms extensively.
Yep: that's exactly why the compilers do have to support them.

(I do recommend the union form, simply because the dereferencing a pointer form can run afoul of some of the seemingly unrelated pet rules compiler developers have; especially strict aliasing and what that means when determining whether an object value has changed or not.  It is one of the cases where "volatile" can seem to "fix" the situation, but it really only does so because it slightly changes the rules applied... In other words, the union form is less likely to conflict with other stuff.)
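
A minimal sketch of what I mean, with made-up names (pun64, first_word_union, first_word_memcpy). The first function uses the union form; the second uses memcpy(), which copies the object representation and is the other form that does not depend on aliasing rules. Which half of the 64-bit value you get depends on the byte order, so the two are only compared against each other here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical union for reinterpreting a 64-bit value as two words. */
typedef union {
    uint64_t u64;
    uint32_t u32[2];
} pun64;

/* Union form: store through one member, read through another. */
uint32_t first_word_union(uint64_t value)
{
    pun64 p;
    p.u64 = value;
    return p.u32[0];
}

/* memcpy form: copies the first four bytes of the representation. */
uint32_t first_word_memcpy(uint64_t value)
{
    uint32_t word;
    memcpy(&word, &value, sizeof word);
    return word;
}
```

The form to be careful with is `*(uint32_t *)&value`, which is where strict aliasing comes into play.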

C is extensively used for hardware interfacing and interchange formats, and these often require storage representations to be reinterpreted as a different type.  It would be nicer to have an explicit keyword for such, but the amount of existing C code that does type punning these ways is just too big to make such a change worthwhile.  It would just be another C11 Annex K that almost nobody ever uses.

It is easy to forget that it is the C compilers and libraries that the users use, not the text of the standard.  The standard exists to help users and compiler developers agree on the rules used.  If the standard text changes, it does not affect the existing working programs.  If the compiler developers change the rules in a way that break existing, previously working code, the users will be unhappy and switch to another compiler; possibly even fork the compiler itself.  Conversely, the compilers must evolve, or their developers will switch to a new one to work on.  The ye olde EGCS fork off GCC a couple of decades ago is an informational tale on this.

Similarly, the slow adoption of C11 and C17 shows that adding features to the standard does not mean the compilers or C libraries will automatically implement them, or that the users will use or demand them.  Adding features users don't need or use is not going to help the compiler or the standard library implementation, and is basically a waste of effort.

We know, from decades of practical experience, how the implementations and the standard interact and evolve.  It is directly observable.  This is why I get so angry when I perceive "language lawyerism", so called because it ignores the observable reality and instead relies on (re)interpretation of the text describing the consensus, and ignores the wider reaching real-world effects of doing so.

The text is not the key; the consensus it attempts to describe is.

(I consider myself a proponent of standards and of portable code where possible, although I do use compiler-specific extensions when necessary.  Most of the systems programming I do relies on C99 and POSIX.1-2008, but that's just a quirk of my focus.  The important thing to me is that I know the standard texts are imperfect, and can be interpreted in many different ways.  Thus, I must insist that the most practical interpretation of the standard text is the correct one; just like in physics, the model that best reproduces the observed behaviour and provides useful predictions, is the correct one to rely on.)
« Last Edit: September 03, 2021, 06:21:46 am by Nominal Animal »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3697
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #30 on: September 05, 2021, 03:55:59 pm »
"On the Z80, you do have 16-bit atomic loads and stores as long as the address is in one of the register pairs (BC, DE, HL), and even an atomic 16-bit swap (but only to the top of the stack), and also 16-bit addition and subtraction, so there are several different ways one could do it atomically.  "

I don't remember clearly but they aren't atomic if using DMA. On the 32F4, I have no idea but with it doing quite a lot in one CPU clock cycle one has a better chance of finding a way.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline gcewing

  • Regular Contributor
  • *
  • Posts: 197
  • Country: nz
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #31 on: September 09, 2021, 11:42:49 am »
the bit I could never understand is how it worked given that it was obviously impossible to read the two counters atomically. There could easily be pulses happening between reading one and reading the other.
It probably didn't matter. The count might be out by +/-1 on any given read, but that could happen anyway if you had just happened to read a little sooner or a little later, and the errors wouldn't accumulate.
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #32 on: September 09, 2021, 12:26:24 pm »
Also, true atomicity isn't usually required; we usually only need consistency.

For example, if you have a large structure, you can use two atomic generation counters; even a single byte suffices.  Have the writer increment counter A, then update the structure, then increment counter B.  Readers read counter B first, copy the structure to local store, and then the counter A.  Readers repeat until the two counters read as the same value.

If you have room for two or more copies of the structure, you can use two counters as indexes referring to the copies of the structure.  Readers use one index, writers use the other.  Writers increment the write index, store the updated structure to the referred to slot, then set the read index to match.  Readers read the read index once, then copy the structure from that slot to local store.

In both cases, only the counter or index needs to be atomic.  In the former, consistency is achieved by repeated reads.  In the latter, consistency is only achieved if updates are rare enough compared to reads, and reads are not interrupted by long-running tasks.  To achieve true consistency in the latter, generation counters can be added for each slot.
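
A rough sketch of the first scheme (two generation counters) using C11 atomics. The payload type and all names are hypothetical. It assumes a single writer, for example an interrupt handler, and readers that the writer can interrupt but that never interrupt the writer; a reader that preempted a half-finished write would spin forever.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical payload; any structure would do. */
typedef struct {
    uint32_t x, y, z;
} sample_t;

static volatile sample_t shared;     /* the structure being protected    */
static atomic_uchar      gen_a;      /* incremented before each update   */
static atomic_uchar      gen_b;      /* incremented after each update    */

/* Single writer: A leads, B catches up once the data is complete. */
void sample_write(const sample_t *src)
{
    atomic_fetch_add(&gen_a, 1);
    shared = *src;
    atomic_fetch_add(&gen_b, 1);
}

/* Reader: B first, copy, then A; retry if a write happened in between. */
void sample_read(sample_t *dst)
{
    unsigned char a, b;
    do {
        b = atomic_load(&gen_b);
        *dst = shared;
        a = atomic_load(&gen_a);
    } while (a != b);
}
```

Single-byte counters are enough because the reader only compares them for equality; it would take exactly 256 complete writes during one copy to fool it.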

This logic extends to (inode-based) filesystem operations as well.  We can't write a file atomically, but we can first create it with a temporary name in the target filesystem, then rename it over an existing file.  Any processes having the old contents (inode) open, are not affected as the inode is only deleted when the last open handle is closed, and the inode no longer has a file name associated with it.  Opening or re-opening the file yields access to the new contents.  (For robustness, I recommend using a temporary subdirectory relative to the target file to put the replacement file in.  This avoids certain possible races if other processes are scanning the target directory contents while the rename occurs and both old and new name are in the same directory; I do believe on some file systems there is a small time window when the scanning process misses both the old and new files, and does not see any file at all.)
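
The write-then-rename pattern, as a minimal sketch in standard C (replace_file is a made-up helper, and the caller picks the temporary path, which should be on the same filesystem as the target). The atomic replacement is a POSIX guarantee for rename(); ISO C on its own leaves renaming over an existing file implementation-defined. A robust version would also flush the data to storage before the rename, which is omitted here.

```c
#include <stdio.h>

/* Write len bytes to tmp_path, then rename it over target_path.
 * Returns 0 on success, -1 on failure (the temporary file is removed). */
int replace_file(const char *target_path, const char *tmp_path,
                 const void *data, size_t len)
{
    FILE *f = fopen(tmp_path, "wb");
    if (!f)
        return -1;

    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        remove(tmp_path);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp_path);
        return -1;
    }

    /* Readers holding the old file open keep seeing the old contents. */
    if (rename(tmp_path, target_path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```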
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3697
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #33 on: September 14, 2021, 02:17:17 pm »
A final update - a distillation of the previous posts

Code: [Select]

// Enable CYCCNT as a free running counter, for various timeouts where we don't want to rely on interrupts

//#define CoreDebug_DEMCR_TRCENA_Msk         (1UL << 24U)
//#define DWT_CTRL_CYCCNTENA_Msk             0x1UL


CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
DWT->CYCCNT = 0;
DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

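uint32_t start_time;
uint32_t max_count = 168000000;  // 168M cycles = 1 sec at a 168 MHz core clock

do
{
    debug_puts("+. ");
    start_time = DWT->CYCCNT;
    // 1 second wait; the unsigned subtraction stays correct across CYCCNT rollover
    while ((DWT->CYCCNT - start_time) < max_count) ;
}
while (true);  // needs <stdbool.h>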
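
For the millisecond timeouts I originally wanted, the same subtraction can be wrapped in a couple of small helpers so the calling loop doesn't have to block. This is only a sketch: us_to_cycles, cyc_timeout, cyc_timeout_start and cyc_timeout_expired are names I made up, and the counter value is passed in as a parameter so the logic can be tried on a PC. On the target you would pass DWT->CYCCNT and the actual core clock (168000000 here). The timeout must be shorter than 2^32 cycles, about 25.6 seconds at 168 MHz, or the conversion truncates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert microseconds to core clock cycles (result must fit in 32 bits). */
uint32_t us_to_cycles(uint32_t us, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)us * core_hz) / 1000000u);
}

/* Hypothetical timeout object: start count plus length in cycles. */
typedef struct {
    uint32_t start;
    uint32_t length;
} cyc_timeout;

/* Arm the timeout using the current counter value. */
void cyc_timeout_start(cyc_timeout *t, uint32_t now, uint32_t length)
{
    t->start = now;
    t->length = length;
}

/* True once at least 'length' cycles have passed, rollover included. */
bool cyc_timeout_expired(const cyc_timeout *t, uint32_t now)
{
    return (uint32_t)(now - t->start) >= t->length;
}
```

Typical use is to arm the timeout before a polling loop and then break out of the loop when cyc_timeout_expired() returns true.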
« Last Edit: September 14, 2021, 02:24:23 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

