Note that the ARM cycle counter (CYCCNT) must be enabled in the DWT; it may not be enabled by default.
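As a rough sketch of what enabling it involves on an ARMv7-M core (Cortex-M3/M4/M7): set TRCENA in the debug DEMCR register, then set CYCCNTENA in DWT_CTRL. The helper name cyccnt_enable and the macro names below are made up for illustration; with CMSIS headers you would normally write CoreDebug->DEMCR, DWT->CTRL and DWT->CYCCNT instead of raw addresses. The registers are passed in as pointers only so that the logic can also be tried off-target.

```c
#include <stdint.h>

/* ARMv7-M memory-mapped debug registers (macro names are illustrative) */
#define DEMCR_ADDR          0xE000EDFCu   /* Debug Exception and Monitor Control */
#define DWT_CTRL_ADDR       0xE0001000u   /* DWT control register */
#define DWT_CYCCNT_ADDR     0xE0001004u   /* DWT cycle counter */
#define DEMCR_TRCENA        (1u << 24)    /* enables the DWT unit */
#define DWT_CTRL_CYCCNTENA  (1u << 0)     /* enables CYCCNT */

/* Hypothetical helper: turn on the cycle counter and reset it to zero. */
static inline void cyccnt_enable(volatile uint32_t *demcr,
                                 volatile uint32_t *dwt_ctrl,
                                 volatile uint32_t *dwt_cyccnt)
{
    *demcr      |= DEMCR_TRCENA;        /* DWT must be enabled first */
    *dwt_cyccnt  = 0;                   /* start counting from zero */
    *dwt_ctrl   |= DWT_CTRL_CYCCNTENA;  /* start the counter */
}
```

On the target, the call would look something like cyccnt_enable((volatile uint32_t *)DEMCR_ADDR, (volatile uint32_t *)DWT_CTRL_ADDR, (volatile uint32_t *)DWT_CYCCNT_ADDR);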
In C, and on all 32-bit ARM arches, given uint32_t before, after; the expression after - before is equivalent to (uint32_t)(after - before) and is nonnegative. In the case where before > after, the value follows modulo 2³² arithmetic, and is exactly 2³² - before + after.
No tricks needed, no ifs needed. If you know before occurred before after, and the difference between the two was less than 2³² cycles, then after - before always yields the correct number of cycles in between, even when before > after, due to modulo arithmetic rules. And those are enforced both by GCC/Clang, and the ARM hardware.
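To make the wraparound case concrete, here is a minimal sketch; the helper name elapsed_cycles is just for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: cycles elapsed between two CYCCNT samples.
   Correct as long as fewer than 2^32 cycles passed between them. */
static inline uint32_t elapsed_cycles(uint32_t before, uint32_t after)
{
    return after - before;   /* unsigned subtraction wraps modulo 2^32 */
}
```

For example, with before = 0xFFFFFFF0 and after = 0x00000010 (the counter wrapped in between), elapsed_cycles(before, after) is 32.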
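Signed arithmetic differs. Both GCC and Clang use twos complement representation on all 32-bit arches (with representable values ranging from -2³¹ to 2³¹-1, inclusive), and the hardware wraps modulo 2³². You can think of it as adding or subtracting 2³² to/from the result, until it is within the representable range. In reality, the overflow is either lost, or saved in a flag. Note, however, that overflow in signed arithmetic is formally undefined behavior in C, so an optimizing compiler is allowed to assume it never happens unless you compile with -fwrapv; converting an out-of-range unsigned value to a signed type is implementation-defined instead, and both GCC and Clang wrap it modulo 2³². GCC and Clang do provide integer overflow built-ins as extensions to C and C++, if you do want to detect signed or unsigned overflow.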
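A small sketch of those built-ins, using __builtin_add_overflow() and __builtin_sub_overflow() (GCC and Clang only). The wrapper names are made up for illustration. Each built-in returns true when the mathematically exact result does not fit the destination type, and stores the wrapped result either way.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if a + b does not fit in int32_t; *sum receives the wrapped result. */
static inline bool add_i32_overflows(int32_t a, int32_t b, int32_t *sum)
{
    return __builtin_add_overflow(a, b, sum);
}

/* True if a - b does not fit in uint32_t (that is, b > a);
   *diff receives the result modulo 2^32. */
static inline bool sub_u32_overflows(uint32_t a, uint32_t b, uint32_t *diff)
{
    return __builtin_sub_overflow(a, b, diff);
}
```

Unlike a plain signed addition, the built-in has fully defined behavior for all argument values, so it can be used to check for overflow without relying on -fwrapv.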
If you want to use a 64-bit cycle counter, you can use for example
    static volatile word64 cycle_counter;   // last observed 64-bit count

    static inline uint64_t cycles(void)
    {
        word64 old_state, new_state;
        old_state = cycle_counter;
        new_state.u32_lo = ARM_DWT_CYCCNT;  // or DWT->CYCCNT, depending on your headers
        // Carry into the high word if the low word wrapped since the last call
        new_state.u32_hi = old_state.u32_hi + (new_state.u32_lo < old_state.u32_lo);
        cycle_counter = new_state;  // TODO: Make this atomic!
        return new_state.u64;
    }
where the word64 type is a union,
    #include <stdint.h>

    typedef union {
        uint64_t u64;
        int64_t  i64;
    #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
        struct {                // low 32 bits stored first
            uint32_t u32_lo;
            uint32_t u32_hi;
        };
        struct {
            int32_t  i32_lo;
            int32_t  i32_hi;
        };
    #elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        struct {                // high 32 bits stored first
            uint32_t u32_hi;
            uint32_t u32_lo;
        };
        struct {
            int32_t  i32_hi;
            int32_t  i32_lo;
        };
    #else
    #error Unknown byte order
    #endif
    } word64;
Type punning via a union like this is explicitly allowed in C. The __BYTE_ORDER__ check is needed (this is written for GCC and Clang; other compilers may export other macros) because the byte order (endianness) of the target architecture determines whether the high or low 32 bits of the combined 64-bit value are stored first. You could use bit shifts for portable code, but cycle counting (CYCCNT and similar, for example __builtin_ia32_rdtsc() on x86 and x86-64) itself is already something very hardware-dependent; I believe that in this particular case, the type punning approach is better/warranted.
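For comparison, the portable bit-shift alternative mentioned above could look roughly like this. The helper names are made up for illustration; no byte order check is needed because shifts operate on values, not on memory layout.

```c
#include <stdint.h>

/* Combine two 32-bit halves into one 64-bit value. */
static inline uint64_t join_u32(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Extract the high and low 32-bit halves of a 64-bit value. */
static inline uint32_t high_u32(uint64_t v) { return (uint32_t)(v >> 32); }
static inline uint32_t low_u32(uint64_t v)  { return (uint32_t)v; }
```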
If the cycle_counter = new_state; assignment were atomic, or done with interrupts disabled on a single-core system, then it would be safe to call cycles() at any point, even in an interrupt context. As it is written now, it is not reliable if it is ever called in an interrupt context, because there is a tiny possibility of the interrupt occurring exactly in the middle of the 64-bit assignment, since it is done using two 32-bit assignments.
If you cannot make it atomic (noting that it is not enough to unconditionally disable then enable interrupts, because it might be called with interrupts already disabled), then there are ways to rewrite the function to do it anyway (because the cycle counter grows upward, and we can more or less assume that any interrupts take less than 2³⁰ or so cycles), but it gets a bit complicated and much less robust.
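A minimal sketch of the "save, disable, restore" pattern on a single-core system follows. Everything named here is an assumption for illustration: irq_save() and irq_restore() are hypothetical hooks (with CMSIS on Cortex-M they could be built from __get_PRIMASK(), __disable_irq() and __set_PRIMASK()), read_cyccnt() stands in for reading CYCCNT, and host stubs are used so the logic can be tried off-target. Bit shifts replace the union only to keep the block self-contained.

```c
#include <stdint.h>

/* Host stand-ins for the hardware (assumptions, not real registers). */
static uint32_t irq_mask;      /* plays the role of PRIMASK: 1 = interrupts off */
static uint32_t fake_cyccnt;   /* plays the role of CYCCNT */

/* Hypothetical hook: disable interrupts, return the previous mask state. */
static inline uint32_t irq_save(void)
{
    uint32_t old = irq_mask;
    irq_mask = 1;
    return old;
}

/* Hypothetical hook: restore the mask state returned by irq_save(). */
static inline void irq_restore(uint32_t old) { irq_mask = old; }

static inline uint32_t read_cyccnt(void) { return fake_cyccnt; }

static volatile uint32_t counter_hi, counter_lo;   /* 64-bit state in two halves */

static inline uint64_t cycles_irqsafe(void)
{
    uint32_t saved = irq_save();        /* works even if already disabled */
    uint32_t lo = read_cyccnt();
    uint32_t hi = counter_hi + (lo < counter_lo);   /* carry on wraparound */
    counter_hi = hi;
    counter_lo = lo;
    irq_restore(saved);                 /* re-enable only if they were enabled */
    return ((uint64_t)hi << 32) | lo;
}
```

Because the previous mask state is restored rather than interrupts being unconditionally enabled, calling this with interrupts already disabled leaves them disabled.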
The "odd" part in it, + (new_state.u32_lo < old_state.u32_lo), simply adds 1 if the low 32 bits wrap around. In C, logical expressions evaluate to 1 if true, and to zero if false. We just need to ensure the order of operations using the parentheses, and the compiler will handle the details for us.
Finally, note that you need to call cycles() at least once every 2³² cycles, so that CYCCNT wraparound is correctly detected. Perhaps in SysTick, or some other periodic timer you use.
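To see why the polling interval matters, here is a small host-side simulation. The type ext_counter and the functions ext_update() and simulate() are hypothetical names for this sketch; the counter is simulated, not read from hardware.

```c
#include <stdint.h>

typedef struct { uint32_t hi, lo; } ext_counter;   /* 64-bit state in two halves */

/* Fold a fresh 32-bit reading into the running 64-bit count. */
static inline uint64_t ext_update(ext_counter *c, uint32_t raw)
{
    c->hi += (raw < c->lo);   /* the comparison is 1 on wraparound, else 0 */
    c->lo  = raw;
    return ((uint64_t)c->hi << 32) | c->lo;
}

/* Simulate a free-running counter polled every `step` cycles, `n` times,
   and return the 64-bit count the polling code believes has elapsed. */
static inline uint64_t simulate(uint64_t step, unsigned n)
{
    ext_counter c = { 0, 0 };
    uint64_t true_count = 0, seen = 0;
    for (unsigned i = 0; i < n; i++) {
        true_count += step;
        seen = ext_update(&c, (uint32_t)true_count);  /* only low 32 bits visible */
    }
    return seen;
}
```

Polling every 0xC0000000 cycles (less than 2³²) four times yields 0x300000000, the true count. Polling every 0x140000000 cycles (more than 2³²) twice yields only 0x80000000 instead of 0x280000000, because wraparounds go unnoticed.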