Topic: 32F4 - a timer without interrupts - CYCCNT (Read 4972 times)


Nominal Animal

  • Super Contributor
  • Posts: 7533
  • Country: fi
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #25 on: September 01, 2021, 10:15:42 pm »
On the Z80, you do have 16-bit atomic loads and stores as long as the address is in one of the register pairs (BC, DE, HL), and even an atomic 16-bit swap (but only to the top of the stack), and also 16-bit addition and subtraction, so there are several different ways one could do it atomically.  Expressing them in C is difficult, though.

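Cortex-M4 (see PM0214 at st.com) implements load-linked, store-conditional primitives, which ARM calls exclusive loads and stores (LDREX/STREX).  Essentially, after the exclusive load you have a short interval, a few instructions, to do some operations, and then you can do a store-conditional write to that variable.  If nothing has cleared the exclusive monitor in between, the store succeeds; otherwise, the store fails (the value is not updated!).  On Cortex-M, the monitor is cleared by any exception (so an interrupt between the load and the store makes the store fail), by a CLREX instruction, and by any other STREX.  The success/failure status is written to a (chosen) register: 0 on success, 1 on failure.  CMSIS provides these as __LDREXB()/__LDREXH()/__LDREXW() and __STREXB()/__STREXH()/__STREXW() macros or functions.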
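The CMSIS intrinsics only exist when targeting ARM, so as a host-runnable sketch, here is the same load / modify / conditional-store / retry pattern written with C11 <stdatomic.h>. On ARMv7-M targets GCC typically compiles atomic_compare_exchange_weak() into an LDREX/STREX sequence, and the weak variant is allowed to fail spuriously, much like a STREX whose reservation was cleared by an interrupt. The function name atomic_add_u32() is hypothetical; in real code atomic_fetch_add() already does this.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically add 'delta' to *counter and return the new value.
 * Each iteration computes the new value from the last observed one and tries to
 * store it; if another context changed the counter (or the weak CAS failed
 * spuriously), 'expected' is refreshed with the current value and we retry. */
static uint32_t atomic_add_u32(_Atomic uint32_t *counter, uint32_t delta)
{
    uint32_t expected = atomic_load_explicit(counter, memory_order_relaxed);
    uint32_t desired;
    do {
        desired = expected + delta;   /* unsigned arithmetic wraps modulo 2^32 */
    } while (!atomic_compare_exchange_weak(counter, &expected, desired));
    return desired;
}
```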

For the 64-bit cycle counter, you can make it "lockless" by using two copies of the 64-bit counter, and an 8-bit LL/SC index or offset indicating which copy is valid:
Code:
static volatile word64   cycles_count[2];
static volatile uint8_t  cycles_index;  /* 0 if cycles_count[0] is valid, 8 if cycles_count[1] is valid */

uint64_t  cycles(void)
{
    word64    count;
    uint32_t  i;

    /* Load most recently fully updated value */
    do {
        i = __LDREXB(&cycles_index) & 8;
        count = cycles_count[i >> 3];
    } while (__STREXB(i, &cycles_index));

    const uint32_t  cyccnt = ARM_DWT_CYCCNT; /* Or DWT->CYCCNT */

    /* Update cycle count */
    count.u32_hi += (cyccnt < count.u32_lo);
    count.u32_lo  =  cyccnt;

    /* Store the cycle count, and if valid, the index too. */
    do {
        i = (__LDREXB(&cycles_index) ^ 8) & 8;
        cycles_count[i >> 3] = count;
    } while (__STREXB(i, &cycles_index));

    return count.u64;
}
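cycles_index holds the byte offset (0 or 8, that is, the slot number times sizeof (word64)) of the currently valid cycles_count entry; the code shifts it right by 3 to get the array index.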

The first do { .. } while loop obtains the currently valid 64-bit counter state.  It almost always completes in a single iteration, as it only loops if the operation is interrupted (or some other code accesses cycles_index).  Because we cannot read the 64-bit counter state atomically, we do need a loop here.

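We then read the cycle counter and update the local copy of the 64-bit counter state.  If the new 32-bit reading is smaller than the stored low half, the counter has wrapped, so the high half is incremented; this only works if cycles() is called at least once per wrap period (2^32 cycles, about 25.6 seconds at 168 MHz).  (We cannot read the cycle counter before we know the old counter state, because otherwise we might see time flowing backwards.)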
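The wraparound handling in the update step can be checked off-target. The sketch below is a hypothetical helper, extend_cycles(), that performs the same computation as the two count.u32_hi/count.u32_lo lines in cycles(), using shifts instead of the word64 union so it runs on any host.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the update step in cycles():
 * 'prev' is the saved 64-bit state, 'cyccnt' a fresh 32-bit CYCCNT reading.
 * A reading smaller than the saved low half means the 32-bit counter wrapped,
 * so the high half is incremented. Assumes at most one wrap since 'prev'. */
static uint64_t extend_cycles(uint64_t prev, uint32_t cyccnt)
{
    uint32_t lo = (uint32_t)prev;
    uint32_t hi = (uint32_t)(prev >> 32);

    hi += (cyccnt < lo);                      /* carry on wraparound */
    return ((uint64_t)hi << 32) | cyccnt;     /* new low half is the reading itself */
}
```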

Finally, in the second do { .. } while loop, we store the updated 64-bit counter value to the entry that is not currently valid.  If the store is not interrupted and no other code accesses cycles_index in the meantime, cycles_index is updated to mark the new entry as valid.  We do need a loop here, because there are only two slots we can use, and a nested call may have written to the same slot.  If there were more slots than there can be nested cycles() calls (consider interrupts with different priorities!), a single iteration would suffice.  In practice, the loop repeats so rarely that it is not worth using more than two slots.

Because cycles_index is only modified when the corresponding entry has been updated, the currently valid entry is never trashed.

If a cycles() call is interrupted by something that also calls cycles(), then afterwards the valid entry reflects the 64-bit cycle counter value obtained by the outermost call (which is obviously a bit earlier than the value the innermost call obtained).  The return values from cycles() are monotonic; only concurrent cycles() calls can cause the internal state to change in a non-monotonic manner.  This only matters if you are debugging and examining the cycle counter state variables: if one cycles() call is interrupted by a function that also calls cycles(), you can see the internal state updated in a non-monotonic order.

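The above needs the following definitions.  The __LDREX/__STREX helpers can be left out when the CMSIS headers are included, as CMSIS already provides them: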
Code:
#include <stdint.h>

typedef union {
    uint64_t        u64;
    int64_t         i64;
    uint32_t        u32[2];
    int32_t         i32[2];
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && __BYTE_ORDER__-0 == __ORDER_LITTLE_ENDIAN__-0
    struct {
        uint32_t    u32_lo;
        uint32_t    u32_hi;
    };
    struct {
        int32_t     i32_lo;
        int32_t     i32_hi;
    };
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && __BYTE_ORDER__-0 == __ORDER_BIG_ENDIAN__-0
    struct {
        uint32_t    u32_hi;
        uint32_t    u32_lo;
    };
    struct {
        int32_t     i32_hi;
        int32_t     i32_lo;
    };
#else
#error Unknown byte order.  Define __BYTE_ORDER__ to __ORDER_LITTLE_ENDIAN__ or to __ORDER_BIG_ENDIAN__.
#endif
} word64;

/* Exclusive load/store helpers, only needed without CMSIS.
 * STREX writes 0 to the result register on success and 1 on failure.
 * The "memory" clobber keeps the compiler from moving other memory
 * accesses across these instructions.  __asm__ (rather than asm) also
 * compiles with -std=c99/-std=c11. */

__attribute__((always_inline))
static inline uint8_t  __LDREXB(const volatile uint8_t *addr)
{
    uint32_t  result;
    __asm__ volatile ("ldrexb\t%0, [%1]\n\t"
                      : "=r" (result)
                      : "r" (addr)
                      : "memory"
                      );
    return (uint8_t)result;
}

__attribute__((always_inline))
static inline uint32_t  __STREXB(uint8_t val, volatile uint8_t *addr)
{
    uint32_t  result;
    __asm__ volatile ("strexb\t%0, %1, [%2]\n\t"
                      : "=&r" (result)
                      : "r" (val), "r" (addr)
                      : "memory"
                      );
    return result;
}

__attribute__((always_inline))
static inline uint16_t  __LDREXH(const volatile uint16_t *addr)
{
    uint32_t  result;
    __asm__ volatile ("ldrexh\t%0, [%1]\n\t"
                      : "=r" (result)
                      : "r" (addr)
                      : "memory"
                      );
    return (uint16_t)result;
}

__attribute__((always_inline))
static inline uint32_t  __STREXH(uint16_t val, volatile uint16_t *addr)
{
    uint32_t  result;
    __asm__ volatile ("strexh\t%0, %1, [%2]\n\t"
                      : "=&r" (result)
                      : "r" (val), "r" (addr)
                      : "memory"
                      );
    return result;
}

__attribute__((always_inline))
static inline uint32_t  __LDREXW(const volatile uint32_t *addr)
{
    uint32_t  result;
    /* The 32-bit form is plain LDREX; there is no "w" suffix. */
    __asm__ volatile ("ldrex\t%0, [%1]\n\t"
                      : "=r" (result)
                      : "r" (addr)
                      : "memory"
                      );
    return result;
}

__attribute__((always_inline))
static inline uint32_t  __STREXW(uint32_t val, volatile uint32_t *addr)
{
    uint32_t  result;
    /* 32-bit store-exclusive is STREX (not STREXH, which stores only 16 bits). */
    __asm__ volatile ("strex\t%0, %1, [%2]\n\t"
                      : "=&r" (result)
                      : "r" (val), "r" (addr)
                      : "memory"
                      );
    return result;
}

Note that the same approach works no matter how large each slot (cycles_count entry) is; it could very well be a structure.  Each such set of slots does need its own index or offset variable, of course.  Because the exclusive (load-linked/store-conditional) access only spans copying the structure (and updating the index in the store case), even a larger structure does not noticeably increase the retry count, because the window for interruption is only as long as the copy.  Again, there is no race window per se: even if the interrupt occurs at the worst possible moment, that only means the loop does another iteration, and each iteration is short anyway.
 

wek

  • Frequent Contributor
  • Posts: 561
  • Country: sk
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #26 on: September 01, 2021, 11:51:28 pm »
Quote
Type punning via a union like this is explicitly allowed in C.

Please give chapter and verse.

JW
 

Nominal Animal

  • Super Contributor
  • Posts: 7533
  • Country: fi
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #27 on: September 02, 2021, 07:18:13 am »
Quote
Type punning via a union like this is explicitly allowed in C.
Please give chapter and verse.

C99 6.5.2.3, footnote 82:
Quote
If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
C99 6.2.6 describes the rules for representations of values, like padding bits et cetera.

The exact same wording is in C11 (6.5.2.3, footnote 95), and in C17 (6.5.2.3, footnote 97).

If you are one of the language lawyers who insist that footnotes are not normative, so I shouldn't say "explicitly allows", just fuck off.  The footnotes exist for a purpose: they clarify the intent of the text of the standard.  Any C compiler that fails to allow type punning via a union as described in that footnote will break a lot of existing code, and therefore be useless to most people (all except language lawyer wanks).  Reality beats language lawyers (with a cluebat) every time.
 

wek

  • Frequent Contributor
  • Posts: 561
  • Country: sk
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #28 on: September 02, 2021, 05:42:50 pm »
Quote
C99 6.5.2.3, footnote 82:

Thanks.

This obviously boils down to the definition of "explicitly allowed".

I just wondered if I overlooked something more substantial - and by that I don't mean the fact that it's a footnote (otherwise I would just fuck off as per your recommendation), but that it refers to "[re]interpretation as an object representation", which are the traditional murky waters of C.

In this regard, this doesn't make type punning through unions any more or less "allowed" than type punning through pointer casting (C99 6.3.2.3 in all its beauty).  Both ultimately result in what C99 calls "unspecified behaviour", which can be translated as "whatever" ("The value of a union member other than the last one stored into" is listed in Annex J.1).  Arguably, with unions, it's easier for compiler makers to get it "right".

JW

PS. Apologies to OP for the OT.
PS2. Full disclosure: I do use type punning in both its forms extensively.
« Last Edit: September 02, 2021, 05:48:11 pm by wek »
 

Nominal Animal

  • Super Contributor
  • Posts: 7533
  • Country: fi
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #29 on: September 03, 2021, 06:14:17 am »
Quote
Full disclosure: I do use type punning in both its forms extensively.

Yep: that's exactly why the compilers do have to support them.

(I do recommend the union form, simply because the pointer-dereferencing form can run afoul of some seemingly unrelated pet rules compiler developers have; especially strict aliasing, and what that means when determining whether an object's value has changed or not.  It is one of the cases where "volatile" can seem to "fix" the situation, but it really only does so because it slightly changes the rules applied.  In other words, the union form is less likely to conflict with other stuff.)
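A minimal sketch of union-based punning, assuming float is a 32-bit IEEE 754 type (checked with a static assertion). The function names are hypothetical. The memcpy() variant is a commonly used alternative that the posts above do not discuss; compilers typically reduce it to a plain register move. The pointer-cast form (*(uint32_t *)&f) is deliberately not shown, since that is the one that runs into strict aliasing.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Union form: store through one member, read through another
 * (C99 6.5.2.3 footnote 82, "type punning"). */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* memcpy() form: copies the object representation byte by byte. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```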

C is extensively used for hardware interfacing and interchange formats, and these often require storage representations to be reinterpreted as a different type.  It would be nicer to have an explicit keyword for such, but the amount of existing C code that does type punning these ways is just too big to make such a change worthwhile.  It would just be another C11 Annex K that almost nobody ever uses.

It is easy to forget that it is the C compilers and libraries that users use, not the text of the standard.  The standard exists to help users and compiler developers agree on the rules used.  If the standard text changes, it does not affect existing working programs.  If the compiler developers change the rules in a way that breaks existing, previously working code, the users will be unhappy and switch to another compiler, or possibly even fork the compiler itself.  Conversely, the compilers must evolve, or their developers will move on to work on a new one.  The EGCS fork of GCC a couple of decades ago is an instructive tale here.

Similarly, the slow adoption of C11 and C17 shows that adding features to the standard does not mean the compilers or C libraries will automatically implement them, or that the users will use or demand them.  Adding features users don't need or use is not going to help the compiler or the standard library implementation, and is basically a waste of effort.

We know, from decades of practical experience, how the implementations and the standard interact and evolve.  It is directly observable.  This is why I get so angry when I perceive "language lawyerism": so called because it ignores the observable reality and instead relies on (re)interpretation of the text describing the consensus, and ignores the wider-reaching real-world effects of doing so.

The text is not the key; the consensus it attempts to describe is.

(I consider myself a proponent of standards and of portable code where possible, although I do use compiler-specific extensions when necessary.  Most of the systems programming I do relies on C99 and POSIX.1-2008, but that's just a quirk of my focus.  The important thing to me is that I know the standard texts are imperfect, and can be interpreted in many different ways.  Thus, I must insist that the most practical interpretation of the standard text is the correct one; just like in physics, the model that best reproduces the observed behaviour and provides useful predictions is the correct one to rely on.)
« Last Edit: September 03, 2021, 06:21:46 am by Nominal Animal »
 

peter-h (Topic starter)

  • Super Contributor
  • Posts: 4605
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #30 on: September 05, 2021, 03:55:59 pm »
"On the Z80, you do have 16-bit atomic loads and stores as long as the address is in one of the register pairs (BC, DE, HL), and even an atomic 16-bit swap (but only to the top of the stack), and also 16-bit addition and subtraction, so there are several different ways one could do it atomically.  "

I don't remember clearly, but they aren't atomic if DMA is used. On the 32F4 I have no idea, but since it does quite a lot in one CPU clock cycle, one has a better chance of finding a way.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

gcewing

  • Regular Contributor
  • Posts: 207
  • Country: nz
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #31 on: September 09, 2021, 11:42:49 am »
Quote
The bit I could never understand is how it worked, given that it was obviously impossible to read the two counters atomically. There could easily be pulses happening between reading one and reading the other.

It probably didn't matter. The count might be out by +/-1 on any given read, but that could happen anyway if you had just happened to read a little sooner or a little later, and the errors wouldn't accumulate.
 

Nominal Animal

  • Super Contributor
  • Posts: 7533
  • Country: fi
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #32 on: September 09, 2021, 12:26:24 pm »
Also, true atomicity isn't usually required; we usually only need consistency.

For example, if you have a large structure, you can use two atomic generation counters; even a single byte suffices.  Have the writer increment counter A, then update the structure, then increment counter B.  Readers read counter B first, copy the structure to local store, and then the counter A.  Readers repeat until the two counters read as the same value.
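A minimal sketch of the two-counter scheme, assuming a single-core MCU where exactly one writer (say, an ISR) updates the structure and the main loop reads it. The names struct sample, gen_a, gen_b, sample_write() and sample_read() are hypothetical. Plain volatile is enough here only because the compiler keeps volatile accesses in program order and a single core observes its own stores in order; a multi-core system would need real memory barriers.

```c
#include <stdint.h>

/* Hypothetical shared structure; any size works. */
struct sample {
    uint32_t timestamp;
    int32_t  x, y, z;
};

static volatile uint8_t       gen_a;   /* incremented by the writer before updating */
static volatile uint8_t       gen_b;   /* incremented by the writer after updating */
static volatile struct sample shared;

/* Writer: only one writer may exist, and it must not be interrupted by a reader. */
static void sample_write(const struct sample *s)
{
    gen_a = (uint8_t)(gen_a + 1);
    shared = *s;
    gen_b = (uint8_t)(gen_b + 1);
}

/* Reader: read B, copy the structure, read A; retry until they match. */
static struct sample sample_read(void)
{
    struct sample copy;
    uint8_t a, b;
    do {
        b = gen_b;
        copy = shared;
        a = gen_a;
    } while (a != b);
    return copy;
}
```

The reader must never preempt the writer: if an ISR spins in sample_read() while it has interrupted sample_write(), the counters never match and it loops forever. With 8-bit counters, a reader could only be fooled if the writer ran a multiple of 256 times during a single copy.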

If you have room for two or more copies of the structure, you can use two counters as indexes referring to the copies of the structure.  Readers use one index, writers use the other.  A writer increments the write index, stores the updated structure to the slot it refers to, then sets the read index to match.  Readers read the read index once, then copy the structure from that slot to local storage.
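A sketch of the slot/index variant under the same single-core, single-writer assumption; NSLOTS, slots, write_index, read_index, slot_write() and slot_read() are hypothetical names. A reader can still get a torn copy if the writer cycles all the way around to the slot being copied (NSLOTS writes during one copy), so NSLOTS should be large compared to the number of updates that can happen during a single read.

```c
#include <stdint.h>

#define NSLOTS 4u   /* hypothetical slot count */

struct sample {
    uint32_t timestamp;
    int32_t  value;
};

static volatile struct sample slots[NSLOTS];
static volatile uint8_t       write_index;   /* used only by the writer */
static volatile uint8_t       read_index;    /* slot readers should use */

/* Writer: advance the write index, fill that slot, then publish it. */
static void slot_write(const struct sample *s)
{
    uint8_t w = (uint8_t)((write_index + 1u) % NSLOTS);
    write_index = w;
    slots[w] = *s;
    read_index = w;
}

/* Reader: read the published index once, then copy that slot. */
static struct sample slot_read(void)
{
    uint8_t r = read_index;
    return slots[r];
}
```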

In both cases, only the counter or index needs to be atomic.  In the former, consistency is achieved by repeated reads.  In the latter, consistency is only achieved if updates are rare enough compared to reads, and reads are not interrupted by long-running tasks.  To achieve true consistency in the latter, generation counters can be added for each slot.

This logic extends to (inode-based) filesystem operations as well.  We can't write a file atomically, but we can first create it with a temporary name in the target filesystem, then rename it over the existing file.  Any processes that have the old contents (inode) open are not affected, as the inode is only deleted when the last open handle is closed and the inode no longer has a file name associated with it.  Opening or re-opening the file yields access to the new contents.  (For robustness, I recommend putting the replacement file in a temporary subdirectory relative to the target file.  This avoids certain possible races if other processes are scanning the target directory while the rename occurs and both the old and new names are in the same directory; I do believe that on some file systems there is a small time window when the scanning process misses both the old and new files, and does not see any file at all.)
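A POSIX sketch of the temporary-file-and-rename approach. The function name replace_file() and the ".tmp" subdirectory name are hypothetical. The temporary directory must be on the same filesystem as the target, because rename() cannot move files across filesystems. mkstemp() creates the file with mode 0600, so real code might want to fchmod() it to match the file being replaced.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replace dir/name with 'len' bytes of 'data'. Other processes see either the
 * old file or the complete new one; handles already open on the old file keep
 * reading the old inode. The replacement is written in dir/.tmp first. */
static int replace_file(const char *dir, const char *name,
                        const void *data, size_t len)
{
    char tmpdir[4096], tmppath[4096], target[4096];
    int n1 = snprintf(tmpdir, sizeof tmpdir, "%s/.tmp", dir);
    int n2 = snprintf(tmppath, sizeof tmppath, "%s/.tmp/%s.XXXXXX", dir, name);
    int n3 = snprintf(target, sizeof target, "%s/%s", dir, name);
    if (n1 < 0 || n2 < 0 || n3 < 0 || (size_t)n1 >= sizeof tmpdir ||
        (size_t)n2 >= sizeof tmppath || (size_t)n3 >= sizeof target)
        return -1;

    /* Create the temporary subdirectory if it does not exist yet. */
    if (mkdir(tmpdir, 0700) == -1 && errno != EEXIST)
        return -1;

    /* mkstemp() replaces XXXXXX with a unique suffix and opens the file. */
    int fd = mkstemp(tmppath);
    if (fd == -1)
        return -1;

    /* Write everything, handling short writes and EINTR. */
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            close(fd);
            unlink(tmppath);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Flush the new contents to storage before they become visible. */
    int rc = fsync(fd);
    if (close(fd) == -1)
        rc = -1;

    /* rename() atomically replaces the directory entry for the target. */
    if (rc == -1 || rename(tmppath, target) == -1) {
        unlink(tmppath);
        return -1;
    }
    return 0;
}
```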
 

peter-h (Topic starter)

  • Super Contributor
  • Posts: 4605
  • Country: gb
  • Doing electronics since the 1960s...
Re: 32F4 - a timer without interrupts - CYCCNT
« Reply #33 on: September 14, 2021, 02:17:17 pm »
A final update - a distillation of the previous posts

Code:

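// Enable CYCCNT as a free-running counter, for various timeouts where we don't want to rely on interrupts.
// The CMSIS device header defines CoreDebug, DWT and the masks used below:
//   CoreDebug_DEMCR_TRCENA_Msk = (1UL << 24U)
//   DWT_CTRL_CYCCNTENA_Msk     = 0x1UL
#include <stdbool.h>
#include <stdint.h>
#include "stm32f4xx.h"

void debug_puts(const char *s);   // the project's own debug output routine (assumed prototype)

void cyccnt_demo(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;   // enable the DWT/ITM trace block
    DWT->CYCCNT = 0;                                  // start counting from zero
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;              // start the cycle counter

    uint32_t start_time;
    const uint32_t max_count = 168000000;  // 168M = 1 sec at a 168 MHz core clock (see SystemCoreClock)

    do
    {
        debug_puts("+. ");
        start_time = DWT->CYCCNT;
        // 1 second wait; the unsigned subtraction stays correct across a CYCCNT wrap
        while ((DWT->CYCCNT - start_time) < max_count)
            ;
    }
    while (true);
}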
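The `(DWT->CYCCNT - start_time) < max_count` test works across a counter wrap because unsigned subtraction is modulo 2^32. Below is a host-testable sketch of the same check as a non-blocking helper; the name cyccnt_elapsed() is hypothetical, and the caller passes in the CYCCNT readings. It assumes the timeout is shorter than 2^32 cycles (about 25.6 s at 168 MHz).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true once at least 'ticks' counts have passed since 'start'.
 * 'now - start' is computed modulo 2^32, so it is the true elapsed count
 * even if the 32-bit counter wrapped once in between. */
static bool cyccnt_elapsed(uint32_t start, uint32_t now, uint32_t ticks)
{
    return (uint32_t)(now - start) >= ticks;
}
```

On the target it would be called as `cyccnt_elapsed(start_time, DWT->CYCCNT, max_count)` from a main loop that does other work between checks, instead of spinning.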
« Last Edit: September 14, 2021, 02:24:23 pm by peter-h »
 

