Fast unsigned integer multiply by x100 on 8bit AVR?

Quote from: beduino on 14 Jan, 2019 19:57

Sorry, in my hurry earlier I didn't notice that we have +=100. I was just so shocked that the AVR C code generator could not produce code with fewer painful loops than the 2, 5 and 6 shift loops mentioned before.

You asked it to optimize for size, which will request it to produce the most compact code. In general loops are more compact than other forms of repetition so optimizing for size may favor loops.

Quote

Trying to figure out how the loop shown in your code might help ensure we have a consistent timer counter and time counter.

Well, I did have < wrong (should have been t1 > TCNT0)

Code: [Select]

uint32_t get_time(){
    uint32_t t0; uint8_t t1;
    do{
        t1 = TCNT0;
        t0 = avr_time_counter;
    }while(t1 > TCNT0); //do again if overflowed
    return t0+t1;
}



/*
00000006 <get_time>:
   6:   22 b7           in      r18, 0x32       ; 50
   8:   80 91 60 00     lds     r24, 0x0060     ; 0x800060 <_edata>
   c:   90 91 61 00     lds     r25, 0x0061     ; 0x800061 <_edata+0x1>
  10:   a0 91 62 00     lds     r26, 0x0062     ; 0x800062 <_edata+0x2>
  14:   b0 91 63 00     lds     r27, 0x0063     ; 0x800063 <_edata+0x3>
  18:   32 b7           in      r19, 0x32       ; 50
  1a:   32 17           cp      r19, r18
  1c:   a0 f3           brcs    .-24            ; 0x6 <get_time>
  1e:   68 2f           mov     r22, r24
  20:   79 2f           mov     r23, r25
  22:   8a 2f           mov     r24, r26
  24:   9b 2f           mov     r25, r27
  26:   62 0f           add     r22, r18
  28:   71 1d           adc     r23, r1
  2a:   81 1d           adc     r24, r1
  2c:   91 1d           adc     r25, r1
  2e:   08 95           ret
*/

It simply gets 'avr_time_counter' (which has to be a volatile, I assume it is), and also gets TCNT0, then it gets TCNT0 again and if the latest version (TCNT0) is less than t1, TCNT0 must have overflowed so do again. When TCNT0 >= t1, no overflow could have happened (not really true, but assuming you have no other interrupts that can exceed 10ms).

t1 = 99
t0 = 12300
//TCNT0 just rolled over, fired the isr, and is now back here
t1 > TCNT0 ? yes, do again - t1 = 99, TCNT now is 0 (less than 99 anyway)

t1 = 1
t0 = 12400
t1 > TCNT0 ? no, all done, no rollover, t1 = 1, TCNT0 = 3 (lets say)
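
The two cases above can be replayed on a PC. This is only a sketch: sim_start(), sim_tcnt0() and get_time_sim() are made-up names, and the scripted reads stand in for the real TCNT0 register and the compare ISR (a value lower than the previous one is treated as a rollover that adds 100 to the counter).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-ins for the AVR pieces (assumptions, not real registers). */
static volatile uint32_t avr_time_counter;   /* advanced by 100 per rollover */
static const uint8_t *sim_reads;             /* scripted TCNT0 values */
static size_t sim_len, sim_pos;

/* Load a script of TCNT0 values and a starting counter value. */
void sim_start(const uint8_t *reads, size_t len, uint32_t counter)
{
    sim_reads = reads;
    sim_len = len;
    sim_pos = 0;
    avr_time_counter = counter;
}

/* Return the next scripted TCNT0 value. A value lower than the previous
   one models a rollover, so the simulated ISR adds 100 to the counter. */
uint8_t sim_tcnt0(void)
{
    assert(sim_pos < sim_len);               /* script must not run out */
    uint8_t v = sim_reads[sim_pos];
    if (sim_pos > 0 && v < sim_reads[sim_pos - 1])
        avr_time_counter += 100;
    sim_pos++;
    return v;
}

/* Same shape as get_time() above, but reading the simulated timer. */
uint32_t get_time_sim(void)
{
    uint32_t t0;
    uint8_t t1;
    do {
        t1 = sim_tcnt0();
        t0 = avr_time_counter;
    } while (t1 > sim_tcnt0());              /* do again if overflowed */
    return t0 + t1;
}
```

With the script 99, 0, 1, 3 and a starting counter of 12300, the first pass is thrown away and the second pass returns 12400 + 1.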

Quote

but unsure whether is it cleared by hardware at the end of ISR when "reti" is called from ISR, or earlier at the begining of ISR handling

I would dare say that the flag is already clear upon entering the isr, as hardware clears it 'when executing' the vector - which also means that if global interrupts are not enabled, no vector executes and no flag is cleared (which means you can poll for it and clear it yourself if interrupts are not used).

I don't have avr hardware (that I want to dig out), but I think the isr simply becomes-

ISR(TIM0_COMPA_vect){
avr_time_counter += 100;
}

for a pic16, I use the nco as a system clock, and essentially do the same thing-
https://github.com/cv007/SNaPmate/blob/c334890d41945d3a2af0ab1c50772f087cf514a8/nco.c#L76
the nco has a 20bit counter with 2us resolution (internal 500kHz clock), I keep track of overflows and inc a counter +16, then when time wanted I do the function linked above, and shift my counter up into the upper 12 bits, to get 32bits total

The Germans did some neat 64bit routines here (use google xlate)
https://www.mikrocontroller.net/topic/237643?reply_to=2411363#2411398

Watch out for "linker trickery" if replacing the built-in routines with the asm code.

/Bingo

Quote from: cv007 on 14 Jan, 2019 20:20

Quote
Trying to figure out how the loop shown in your code might help ensure we have a consistent timer counter and time counter.
Well, I did have < wrong (should have been t1 > TCNT0)

Yep, it looked strange because with a 10kHz CTC timer and an 8MHz F_CPU system clock, the ISR should complete in a fraction of the time tick period (by time tick I mean one ISR call, btw).

I've changed ISR code and +=100 looks like this:

Code: [Select]

  30               	__vector_10:
  31 0014 1F92      		push r1
  32 0016 0F92      		push r0
  33 0018 0FB6      		in r0,__SREG__
  34 001a 0F92      		push r0
  35 001c 1124      		clr __zero_reg__
  36 001e 8F93      		push r24
  37 0020 9F93      		push r25
  38 0022 AF93      		push r26
  39 0024 BF93      		push r27
  40               	/* prologue: Signal */
  41               	/* frame size = 0 */
  42               	/* stack size = 7 */
  43               	.L__stack_usage = 7
  44 0026 8091 0000 		lds r24,avr_time_counter
  45 002a 9091 0000 		lds r25,avr_time_counter+1
  46 002e A091 0000 		lds r26,avr_time_counter+2
  47 0032 B091 0000 		lds r27,avr_time_counter+3
  48 0036 8C59      		subi r24,-100
  49 0038 9F4F      		sbci r25,-1
  50 003a AF4F      		sbci r26,-1
  51 003c BF4F      		sbci r27,-1
  52 003e 8093 0000 		sts avr_time_counter,r24
  53 0042 9093 0000 		sts avr_time_counter+1,r25
  54 0046 A093 0000 		sts avr_time_counter+2,r26
  55 004a B093 0000 		sts avr_time_counter+3,r27
  56               	/* epilogue start */
  57 004e BF91      		pop r27
  58 0050 AF91      		pop r26
  59 0052 9F91      		pop r25
  60 0054 8F91      		pop r24
  61 0056 0F90      		pop r0
  62 0058 0FBE      		out __SREG__,r0
  63 005a 0F90      		pop r0
  64 005c 1F90      		pop r1
  65 005e 1895      		reti

I added, at the beginning of avr_time_us_get(), a wait for OCF0A in TIFR to be cleared by the ISR during its execution, since OCF0A is set when TCNT0 is reset to 0 in CTC mode, as shown in the attached image from the ATmega328 intro to interrupts

Code: [Select]

	//nop();
	uint8_t isr_is;
	do { 
		isr_is= ((TIFR & (1<<OCF0A) )==0 ? 0 : 1 );
	} while (isr_is );
	//nop();

Without any do-while loop it looks like this, but for the moment I have no idea
how OCF0A being set in TIFR when TCNT0 becomes 0 could be useful in this function, if at all:

Code: [Select]

  87               	avr_time_us_get:
  88               	/* prologue: function */
  89               	/* frame size = 0 */
  90               	/* stack size = 0 */
  91               	.L__stack_usage = 0
  92               	/* #APP */
  93               	 ;  12 "avr_utils.c" 1
  94 007c 0000      		nop
  95               	 ;  0 "" 2
  96               	/* #NOAPP */
  97               	.L7:
  98 007e 08B6      		in __tmp_reg__,0x38
  99 0080 04FC      		sbrc __tmp_reg__,4
 100 0082 00C0      		rjmp .L7
 101               	/* #APP */
 102               	 ;  12 "avr_utils.c" 1
 103 0084 0000      		nop
 104               	 ;  0 "" 2
 105               	/* #NOAPP */
 106 0086 22B7      		in r18,0x32
 107 0088 6091 0000 		lds r22,avr_time_counter
 108 008c 7091 0000 		lds r23,avr_time_counter+1
 109 0090 8091 0000 		lds r24,avr_time_counter+2
 110 0094 9091 0000 		lds r25,avr_time_counter+3
 111 0098 620F      		add r22,r18
 112 009a 711D      		adc r23,__zero_reg__
 113 009c 811D      		adc r24,__zero_reg__
 114 009e 911D      		adc r25,__zero_reg__
 115 00a0 0895      		ret

We can see that the check for this flag can be very fast, but it is probably useless since the ISR clears this flag during its execution.

Code: [Select]

  97               	.L7:
  98 007e 08B6      		in __tmp_reg__,0x38
  99 0080 04FC      		sbrc __tmp_reg__,4
 100 0082 00C0      		rjmp .L7

I wouldn't like to disable global interrupts just to make things simple,
so I will look closer at your approach to this synchronization to see if it fits my needs. But I do not like this do-while loop,
since I'd like to catch the time difference as fast as possible and leave enough time for some additional computations. Even the x100 multiply shouldn't be so horrible, but with the +=100 trick in the ISR the only thing left is to get a consistent, uncorrupted 32bit "avr_time_counter" value

Quote from: bingo600 on 14 Jan, 2019 21:13

The Germans did some neat 64bit routines here (use google xlate)

Thanks for this hint, anyway I do not speak or read German

Can you make your timer roll over at 256 ticks rather than at 100? This way you eliminate the need for multiplication completely.

Quote

but probably useless since ISR clear this flag during its execution

You cannot make use of that flag if using the isr.

Quote

but I do not like this do-while loop

it only repeats if there is a problem. Most of the time it will not repeat, but if you happen to hit the wrong time, you will get 1 repeat. Hardly a deal killer to get the correct time, and you have to do 'something' about the problem.

Quote

Can you make your timer roll over at 256 ticks rather than at 100? This way you eliminate the need for multiplication completely.

That's probably a better idea- just let the counter run and use the overflow irq, increment the counter by 256 in the isr. Probably not a big difference, though (but still better). (the multiplication is eliminated in the +=100 version also, by the way)

Code: [Select]

//obviously a lot is missing- like timer setup, and so on, this is minimal to show the idea

#include <avr/io.h>
#include <avr/interrupt.h>

volatile uint32_t avr_time_counter;

uint32_t get_time(){
    uint32_t t;
    do{
        t = TCNT0 | avr_time_counter;
    }while((uint8_t)t > TCNT0); //do again if overflowed
    return t;
}

int main(void) {}

ISR(TIM0_OVF_vect){
    avr_time_counter += 256;
}

/*
00000006 <get_time>:
   6:   22 b7           in      r18, 0x32       ; 50
   8:   80 91 60 00     lds     r24, 0x0060     ; 0x800060 <_edata>
   c:   90 91 61 00     lds     r25, 0x0061     ; 0x800061 <_edata+0x1>
  10:   a0 91 62 00     lds     r26, 0x0062     ; 0x800062 <_edata+0x2>
  14:   b0 91 63 00     lds     r27, 0x0063     ; 0x800063 <_edata+0x3>
  18:   68 2f           mov     r22, r24
  1a:   79 2f           mov     r23, r25
  1c:   8a 2f           mov     r24, r26
  1e:   9b 2f           mov     r25, r27
  20:   62 2b           or      r22, r18
  22:   22 b7           in      r18, 0x32       ; 50
  24:   26 17           cp      r18, r22
  26:   78 f3           brcs    .-34            ; 0x6 <get_time>
  28:   08 95           ret
*/

Quote from: snarkysparky on 14 Jan, 2019 18:45

I say answer the OP question as he posted it. Maybe I want to marry a chicken with a porcupine , don't ask why, if you don't have any approaches then don't reply.

The better lesson -- for OPs and readers alike -- is to recognize that there are better root solutions out there, so don't ask XY Problems.

(I was wondering how long it would be until someone noticed this was an XY problem.)

Tim

Rats. And I thought I was being so clever with the multiple word sizes...

Quote

Quote
If you just use multiplication, good chance the C compiler can figure out everything by itself.

Nope, __mulsi3 is used when we leave too much to the C compiler

Change the optimization to -O3, and it will produce inline code for the 32bit multiply by 100... I believe that there are gcc-specific pragmas that will allow you to change optimization for a specific function.

Code: [Select]

volatile uint32_t counter;

uint32_t gettime() {
    uint32_t y = counter;
   0:   40 91 00 00     lds     r20, 0x0000     ; 0x800000 <__SREG__+0x7fffc1>
   4:   50 91 00 00     lds     r21, 0x0000     ; 0x800000 <__SREG__+0x7fffc1>
   8:   60 91 00 00     lds     r22, 0x0000     ; 0x800000 <__SREG__+0x7fffc1>
   c:   70 91 00 00     lds     r23, 0x0000     ; 0x800000 <__SREG__+0x7fffc1>
    y *= 100;
  10:   44 0f           add     r20, r20
  12:   55 1f           adc     r21, r21
  14:   66 1f           adc     r22, r22
  16:   77 1f           adc     r23, r23
  18:   44 0f           add     r20, r20
  1a:   55 1f           adc     r21, r21
  1c:   66 1f           adc     r22, r22
  1e:   77 1f           adc     r23, r23
  20:   db 01           movw    r26, r22
  22:   ca 01           movw    r24, r20
  24:   88 0f           add     r24, r24
  26:   99 1f           adc     r25, r25
  28:   aa 1f           adc     r26, r26
  2a:   bb 1f           adc     r27, r27
  2c:   88 0f           add     r24, r24
  2e:   99 1f           adc     r25, r25
  30:   aa 1f           adc     r26, r26
  32:   bb 1f           adc     r27, r27
  34:   48 0f           add     r20, r24
  36:   59 1f           adc     r21, r25
  38:   6a 1f           adc     r22, r26
  3a:   7b 1f           adc     r23, r27
  3c:   db 01           movw    r26, r22
  3e:   ca 01           movw    r24, r20
  40:   88 0f           add     r24, r24
  42:   99 1f           adc     r25, r25
  44:   aa 1f           adc     r26, r26
  46:   bb 1f           adc     r27, r27
  48:   88 0f           add     r24, r24
  4a:   99 1f           adc     r25, r25
  4c:   aa 1f           adc     r26, r26
  4e:   bb 1f           adc     r27, r27
  50:   84 0f           add     r24, r20
  52:   95 1f           adc     r25, r21
  54:   a6 1f           adc     r26, r22
  56:   b7 1f           adc     r27, r23
    y += TCNT0;
  58:   22 b7           in      r18, 0x32       ; 50
    return y;
  5a:   bc 01           movw    r22, r24
  5c:   cd 01           movw    r24, r26
  5e:   62 0f           add     r22, r18
  60:   71 1d           adc     r23, r1
  62:   81 1d           adc     r24, r1
  64:   91 1d           adc     r25, r1
}
  66:   08 95           ret
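
Reading the listing back, the inline code multiplies by 4, then by 5, then by 5 again (100 = 4 * 5 * 5). Below is a rough C sketch of that decomposition next to a plain multiply. The names mul100_plain(), mul100_shift(), mul100_check() and OPT_SPEED are made up for this example; the optimize function attribute is a GCC feature, so the macro is defined away on other compilers, and whether it changes the generated code depends on the compiler version.

```c
#include <assert.h>
#include <stdint.h>

/* GCC-only function attribute; defined away elsewhere (assumption: the
   rest of the build uses -Os). */
#if defined(__GNUC__) && !defined(__clang__)
#define OPT_SPEED __attribute__((optimize("O3")))
#else
#define OPT_SPEED
#endif

/* Plain multiply, with speed optimization requested for this function only. */
OPT_SPEED uint32_t mul100_plain(uint32_t y)
{
    return y * 100u;
}

/* The same decomposition the listing uses: 100 = 4 * 5 * 5. */
uint32_t mul100_shift(uint32_t y)
{
    y <<= 2;        /* 4y */
    y += y << 2;    /* 4y + 16y = 20y */
    y += y << 2;    /* 20y + 80y = 100y */
    return y;
}

/* Both forms wrap modulo 2^32, so they agree even on overflow. */
void mul100_check(void)
{
    assert(mul100_shift(1u) == 100u);
    assert(mul100_shift(123456u) == mul100_plain(123456u));
    assert(mul100_shift(0xFFFFFFFFu) == mul100_plain(0xFFFFFFFFu));
}
```

Writing the shifts by hand is mostly useful for seeing what the compiler did; in normal code the plain multiply is easier to read.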

Quote from: cv007 on 14 Jan, 2019 22:29

Quote
but I do not like this do-while loop
it only repeats if there is a problem. Most of the time it will not repeat, but if you happen to hit the wrong time, you will get 1 repeat. Hardly a deal killer to get the correct time, and you have to do 'something' about the problem.

Anyway, I've decided... not to use the do-while loop, but instead added a correction for time - something like going back in time

Quote from: cv007 on 14 Jan, 2019 22:29

Quote
but probably useless since ISR clear this flag during its execution
You cannot make use of that flag if using the isr.

But, I've used this flag as you can see in assembler code above

Warning: Do not try this code at home - it is not tested yet, but "Patent pending"

Update: Bug fixed in assembler listing below

Code: [Select]

  85               	.global	avr_time_us_get
  87               	avr_time_us_get:
  88               	/* prologue: function */
  89               	/* frame size = 0 */
  90               	/* stack size = 0 */
  91               	.L__stack_usage = 0
  92 007c 22B7      		in r18,0x32
  93               	.L7:
  94 007e 08B6      		in __tmp_reg__,0x38
  95 0080 04FC      		sbrc __tmp_reg__,4
  96 0082 00C0      		rjmp .L7
  97 0084 6091 0000 		lds r22,avr_time_counter
  98 0088 7091 0000 		lds r23,avr_time_counter+1
  99 008c 8091 0000 		lds r24,avr_time_counter+2
 100 0090 9091 0000 		lds r25,avr_time_counter+3
 101 0094 32B7      		in r19,0x32
 102 0096 3217      		cp r19,r18
 103 0098 00F4      		brsh .L8
 104 009a 6091 0000 		lds r22,avr_time_counter
 105 009e 7091 0000 		lds r23,avr_time_counter+1
 106 00a2 8091 0000 		lds r24,avr_time_counter+2
 107 00a6 9091 0000 		lds r25,avr_time_counter+3
 108 00aa 6456      		subi r22,100
 109 00ac 7109      		sbc r23,__zero_reg__
 110 00ae 8109      		sbc r24,__zero_reg__
 111 00b0 9109      		sbc r25,__zero_reg__
 112               	.L8:
 113 00b2 620F      		add r22,r18
 114 00b4 711D      		adc r23,__zero_reg__
 115 00b6 811D      		adc r24,__zero_reg__
 116 00b8 911D      		adc r25,__zero_reg__
 117 00ba 0895      		ret

Thanks for many hints

Quote from: T3sl4co1l on 14 Jan, 2019 22:47

The better lesson -- for OPs and readers alike -- is to recognize that there are better root solutions out there, so don't ask XY Problems.

Replace "don't ask XY problems" with "describe what you are trying to solve, rather than the problems you are having with your chosen solution to the original problem", and I'll agree. Otherwise it sounds like you don't want OP and readers alike to ask about their problems.

For what it's worth, I'd start with

Code: [Select]

#include <stdint.h>

static volatile unsigned char avr_timer_pre = 0;
static volatile unsigned char avr_timer_post = 0;
static volatile uint32_t      avr_timer_counter = 0;

#define  AVR_TIMER_STEP  128

void tcnt0_compare_match_isr(void)
{
    avr_timer_pre++;
    avr_timer_counter += AVR_TIMER_STEP;
    avr_timer_post++;
}

uint32_t get_time(void)
{
    uint32_t       counter;
    unsigned char  generation;
    do {
        generation = avr_timer_post;
        counter = avr_timer_counter;
    } while (generation != avr_timer_pre);
    return counter;
}

uint32_t get_time_coarse(void)
{
    uint32_t       counter;
    unsigned char  generation;
    do {
        generation = avr_timer_post;
        counter = avr_timer_counter;
    } while (generation != avr_timer_pre);
    return counter >> 7;
}

where the multiplier is 128 instead of 100. get_time() returns the original clock, and get_time_coarse() the slower clock.

In a real implementation, mark all these functions static inline, so the compiler can inline them in their callsites; while that increases code size, it can cut down on unnecessary register moves.

The avr_timer_pre and avr_timer_post form a generation counter pair. It assumes the hardware does not reorder normal reads and writes to ram. Any timer modification begins with modifying the pre counter, and completes when modifying the post counter. When reading the timer counter, you start by remembering the post counter, then copy the timer counter. If the pre counter does not match the remembered post counter, reading the timer counter was interrupted by a modification, and you redo the entire operation. This is essentially a spin lock, where writers are never interrupted, but readers may have to spin. (Readers will only spin when each iteration is interrupted by a timer update; thus at most twice in normal operation.)

With old avr-gcc-4.9.2, using -Wall -Os -mmcu=attiny85, that gets you (omitting directives for simplicity)

Code: [Select]

tcnt0_compare_match_isr:
        lds r24,avr_timer_pre
        subi r24,lo8(-(1))
        sts avr_timer_pre,r24
        lds r24,avr_timer_counter
        lds r25,avr_timer_counter+1
        lds r26,avr_timer_counter+2
        lds r27,avr_timer_counter+3
        subi r24,-128
        sbci r25,-1
        sbci r26,-1
        sbci r27,-1
        sts avr_timer_counter,r24
        sts avr_timer_counter+1,r25
        sts avr_timer_counter+2,r26
        sts avr_timer_counter+3,r27
        lds r24,avr_timer_post
        subi r24,lo8(-(1))
        sts avr_timer_post,r24
        ret

get_time:
.L3:
        lds r25,avr_timer_post
        lds r20,avr_timer_counter
        lds r21,avr_timer_counter+1
        lds r22,avr_timer_counter+2
        lds r23,avr_timer_counter+3
        lds r24,avr_timer_pre
        cpse r25,r24
        rjmp .L3
        movw r24,r22
        movw r22,r20
        ret

get_time_coarse:
.L7:
        lds r25,avr_timer_post
        lds r20,avr_timer_counter
        lds r21,avr_timer_counter+1
        lds r22,avr_timer_counter+2
        lds r23,avr_timer_counter+3
        lds r24,avr_timer_pre
        cpse r25,r24
        rjmp .L7
        movw r24,r22
        movw r22,r20
        ldi r18,7
        1:
        lsr r25
        ror r24
        ror r23
        ror r22
        dec r18
        brne 1b
        ret

Now, let's say you wanted both a fine timer (every tick) and a coarse timer (every 1000th tick), and the division by one thousand is problematic. If you can accept an additional cost to the interrupt service routine, you can provide both, with zero added cost to readers. (The downside is jitter in the ISR duration; every thousandth one takes twice as long as a normal call.)

Code: [Select]

#include <stdint.h>

static volatile unsigned char avr_timer_pre;
static volatile uint32_t      avr_timer_fine;       /* = AVR_COARSE_STEPS * avr_timer_coarse + avr_timer_step */
static volatile uint32_t      avr_timer_coarse;
static volatile uint16_t      avr_timer_step;
static volatile unsigned char avr_timer_post;

#define  AVR_COARSE_STEPS  1000

void tcnt0_compare_match_isr(void)
{
    uint16_t  step;

    avr_timer_pre++;

    avr_timer_fine++;

    step = avr_timer_step;
    if (step >= AVR_COARSE_STEPS - 1) {
        avr_timer_coarse++;
        avr_timer_step = 0;
    } else {
        avr_timer_step = step + 1;
    }

    avr_timer_post++;
}

uint32_t get_time_coarse(void)
{
    uint32_t       coarse;
    unsigned char  generation;

    do {
        generation = avr_timer_post;
        coarse = avr_timer_coarse;
    } while (generation != avr_timer_pre);

    return coarse;
}

uint32_t get_time_fine(void)
{
    uint32_t       fine;
    unsigned char  generation;

    do {
        generation = avr_timer_post;
        fine = avr_timer_fine;
    } while (generation != avr_timer_pre);

    return fine;
}

The ISR becomes

Code: [Select]

tcnt0_compare_match_isr:
        lds r24,avr_timer_pre
        subi r24,lo8(-(1))
        sts avr_timer_pre,r24
        lds r24,avr_timer_fine
        lds r25,avr_timer_fine+1
        lds r26,avr_timer_fine+2
        lds r27,avr_timer_fine+3
        adiw r24,1
        adc r26,__zero_reg__
        adc r27,__zero_reg__
        sts avr_timer_fine,r24
        sts avr_timer_fine+1,r25
        sts avr_timer_fine+2,r26
        sts avr_timer_fine+3,r27
        lds r24,avr_timer_step
        lds r25,avr_timer_step+1
        cpi r24,-25
        ldi r18,3
        cpc r25,r18
        brlo .L2
        lds r24,avr_timer_coarse
        lds r25,avr_timer_coarse+1
        lds r26,avr_timer_coarse+2
        lds r27,avr_timer_coarse+3
        adiw r24,1
        adc r26,__zero_reg__
        adc r27,__zero_reg__
        sts avr_timer_coarse,r24
        sts avr_timer_coarse+1,r25
        sts avr_timer_coarse+2,r26
        sts avr_timer_coarse+3,r27
        sts avr_timer_step+1,__zero_reg__
        sts avr_timer_step,__zero_reg__
        rjmp .L3
.L2:
        adiw r24,1
        sts avr_timer_step+1,r25
        sts avr_timer_step,r24
.L3:
        lds r24,avr_timer_post
        subi r24,lo8(-(1))
        sts avr_timer_post,r24
        ret

and if you use AVR_COARSE_STEPS less than 256, you can change avr_timer_step and step variables to unsigned char type, and simplify it even further.

A much more interesting case is where you want a normal timer tick, but also a slower adjustable/runtime calibrated tick. We can use a 16-bit tick rate, so that the slower tick counter is (ignoring overflows) fast*rate/65536. If you want a /10 slower clock, you can choose between 6553 and 6554, corresponding to 1:10.000916 and 1:9.99939 (or 0.099991 and 0.100006), respectively:

Code: [Select]

#include <stdint.h>

static volatile unsigned char avr_timer_pre;
static volatile uint32_t      avr_timer_fast;
static volatile uint32_t      avr_timer_slow;
static volatile uint16_t      avr_timer_phase;
static          uint16_t      avr_timer_rate;
static volatile unsigned char avr_timer_post;

void tcnt0_compare_match_isr(void)
{
    uint16_t  phase;

    avr_timer_pre++;

    avr_timer_fast++;

    phase = avr_timer_phase;
    phase += avr_timer_rate;
    avr_timer_phase = phase;
    avr_timer_slow += (phase < avr_timer_rate);

    avr_timer_post++;
}

uint32_t get_time_fast(void)
{
    uint32_t       fast;
    unsigned char  generation;

    do {
        generation = avr_timer_post;
        fast = avr_timer_fast;
    } while (generation != avr_timer_pre);

    return fast;
}

uint32_t get_time_slow(void)
{
    uint32_t       slow;
    unsigned char  generation;

    do {
        generation = avr_timer_post;
        slow = avr_timer_slow;
    } while (generation != avr_timer_pre);

    return slow;
}

This yields

Code: [Select]

tcnt0_compare_match_isr:
        lds r24,avr_timer_pre
        subi r24,lo8(-(1))
        sts avr_timer_pre,r24
        lds r24,avr_timer_fast
        lds r25,avr_timer_fast+1
        lds r26,avr_timer_fast+2
        lds r27,avr_timer_fast+3
        adiw r24,1
        adc r26,__zero_reg__
        adc r27,__zero_reg__
        sts avr_timer_fast,r24
        sts avr_timer_fast+1,r25
        sts avr_timer_fast+2,r26
        sts avr_timer_fast+3,r27
        lds r24,avr_timer_phase
        lds r25,avr_timer_phase+1
        sts avr_timer_phase+1,r25
        sts avr_timer_phase,r24
        lds r24,avr_timer_slow
        lds r25,avr_timer_slow+1
        lds r26,avr_timer_slow+2
        lds r27,avr_timer_slow+3
        sts avr_timer_slow,r24
        sts avr_timer_slow+1,r25
        sts avr_timer_slow+2,r26
        sts avr_timer_slow+3,r27
        lds r24,avr_timer_post
        subi r24,lo8(-(1))
        sts avr_timer_post,r24
        ret

get_time_fast:
.L3:
        lds r19,avr_timer_post
        lds r22,avr_timer_fast
        lds r23,avr_timer_fast+1
        lds r24,avr_timer_fast+2
        lds r25,avr_timer_fast+3
        lds r18,avr_timer_pre
        cpse r19,r18
        rjmp .L3
        ret

get_time_slow:
.L7:
        lds r19,avr_timer_post
        lds r22,avr_timer_slow
        lds r23,avr_timer_slow+1
        lds r24,avr_timer_slow+2
        lds r25,avr_timer_slow+3
        lds r18,avr_timer_pre
        cpse r19,r18
        rjmp .L7
        ret

Again, using just an 8-bit rate would still give you plenty of precision, to within 0.2% of the fast rate. For /10, 25 would correspond to 1:10.24 (0.09765625), and 26 to 1:9.846154 (0.1015625), but would simplify the timer interrupt service a bit.
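
To see the rate arithmetic at work, the phase accumulator can be run on a PC. This is a sketch under assumptions: timer_reset(), timer_tick() and slow_ticks_after() are made-up names, plain variables replace the volatile ones, and the rate is set explicitly. In the ISR listing above avr_timer_rate is never assigned, which is probably why the compiler emitted no addition there.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t timer_fast;    /* counts every tick */
static uint32_t timer_slow;    /* counts phase wrap-arounds */
static uint16_t timer_phase;
static uint16_t timer_rate;    /* slow = fast * rate / 65536 */

/* Start from zero with the given rate. */
void timer_reset(uint16_t rate)
{
    timer_fast = 0;
    timer_slow = 0;
    timer_phase = 0;
    timer_rate = rate;
}

/* One timer tick, same arithmetic as the ISR above. */
void timer_tick(void)
{
    uint16_t phase = (uint16_t)(timer_phase + timer_rate);  /* wraps at 65536 */
    timer_fast++;
    timer_phase = phase;
    timer_slow += (phase < timer_rate);   /* a wrap means one slow tick */
}

/* Run `ticks` fast ticks at the given rate and return the slow count. */
uint32_t slow_ticks_after(uint16_t rate, uint32_t ticks)
{
    timer_reset(rate);
    for (uint32_t i = 0; i < ticks; i++)
        timer_tick();
    assert(timer_fast == ticks);
    return timer_slow;
}
```

After 65536 fast ticks the slow counter equals the rate itself, and after only 10 ticks a rate of 6554 has already produced one slow tick while 6553 has not, which is the small difference between the two choices for /10.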

While the ISR does take roughly twice as long as the simple version, it executes the exact same instructions on every call, so it should take the exact same number of cycles each time. This makes it much easier to check the effects of the ISR latency; no oddball cases where the latency is much higher.

Do note that this is not what I'd necessarily go with, because I haven't used ATtiny85's for anything time critical, and I just whipped up the above code without testing it on actual hardware.

Quote

Anyway, I've decided... not to use the do-while loop, but instead added a correction for time - something like going back in time

So you want to replace 36 bytes of code/14 instructions, with 52 bytes/22 instructions? You certainly can do whatever you want to. I'm not sure what you are doing with that flag, as you cannot catch it if you are using the compare irq.

I've probably said enough, and you have enough info already.

Quote from: cv007 on 15 Jan, 2019 02:17

I'm not sure what you are doing with that flag, as you cannot catch it if you are using the compare irq.

One can imagine something similar to the example shown below, described here: http://maxembedded.com/2011/07/avr-timers-ctc-mode/

Code: [Select]

   // loop forever
    while(1)
    {
        // check whether the flag bit is set
        // if set, it means that there has been a compare match
        // and the timer has been cleared
        // use this opportunity to toggle the led
        if (TIFR & (1 << OCF1A)) // NOTE: '>=' used instead of '=='
        {
            PORTC ^= (1 << 0); // toggles the led
        }
  
        // wait! we are not done yet!
        // clear the flag bit manually since there is no ISR to execute
        // clear it by writing '1' to it (as per the datasheet)
        TIFR |= (1 << OCF1A);
  
        // yeah, now we are done!
    }

We would like to disable the ISR, so by simply adding a few lines of code I can now use the same time function to catch time differences in a loop without wasting time in the ISR when the time between retrievals is less than 100us, so this TIFR OCF0A flag can be very useful:

Code: [Select]

inline uint32_t avr_time_us_get_inline() {

	uint8_t x0 = TCNT0;

	uint8_t time_flag_is;
	if( time_flag_is= ((TIFR & (1<<OCF0A) )==0 ? 0 : 1 ) ) {
		// Manually correct time counter while probably disabled ISR
		avr_time_counter+= 100;

		// Clear OCF0A flag by writing logic 1
		TIFR |= (1<<OCF0A);		
	} // if
	....

Quote

I've probably said enough

And yet I continue

Maybe you already know this, or maybe you already have a truckload of tiny85's you need to use, but I'm sure the attiny family of parts will also have something available (even in 8pins) that has an input capture feature which would make it easier to get a more accurate time at the point of the event (whatever it may be), and with higher resolution. You are finding out why they came up with the input capture feature.

Quote from: cv007 on 15 Jan, 2019 16:45

You are finding out why they came up with the input capture feature.

Maybe, but for the moment I've managed to quite easily extend the time counter to...
64bit, which allows collecting microseconds for... more than 200000 years

The 32bit time counter would overflow after about an hour, but thanks to removing the "volatile" qualifier,
when only the microseconds from the last second or so are needed, the assembler code looks much better while loading this 'avr_time_count_max'
variable into a uint32_t variable: the unneeded LDS instructions for hi32(avr_time_count) are not generated, so we are able to do "time traveling" over hundreds/thousands of years and still get event timings on the order of microseconds when needed

Code: [Select]

#ifdef AVR_TIME_COUNTER_UINT64T
// (%i1) avr_time_count_max: 2^64;
// (%o1)                        18446744073709551616
// (%i2) avr_time_count_max/1000000.0/31536000.0;
// (%o2)                          584942.417355  [years]
static uint64_t avr_time_counter;
#else
// (%i3)  avr_time_count_max: 2^32;
// (%o3)                             4294967296
// (%i6) avr_time_count_max/1000000.0/3600;
// (%o6)                          1.193046471111111  [hours]
static uint32_t avr_time_counter;
#endif // AVR_TIME_COUNTER_UINT64T
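
The wrap-around times can also be checked with integer arithmetic. This is only a sketch: wrap_seconds(), wrap_minutes() and wrap_years() are made-up helper names, the counter is assumed to advance once per microsecond, and a year is assumed to be 365 days (31536000 seconds).

```c
#include <assert.h>
#include <stdint.h>

#define US_PER_SECOND     1000000ULL
#define SECONDS_PER_YEAR  31536000ULL   /* 365 days; an assumption */

/* Whole seconds until a counter of the given width wraps,
   assuming one count per microsecond. */
uint64_t wrap_seconds(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    uint64_t max = (bits == 64) ? UINT64_MAX : ((1ULL << bits) - 1);
    return max / US_PER_SECOND;
}

/* Whole minutes until the counter wraps. */
uint64_t wrap_minutes(unsigned bits)
{
    return wrap_seconds(bits) / 60;
}

/* Whole years until the counter wraps. */
uint64_t wrap_years(unsigned bits)
{
    return wrap_seconds(bits) / SECONDS_PER_YEAR;
}
```

Under these assumptions a 32bit counter wraps after 4294 seconds (about 71 minutes), a 48bit one after roughly 8 years, and a 64bit one is far beyond anything the hardware will live to see.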

I hope that the hardware clears the timer interrupt flag at the beginning of ISR execution, but that is easy to debug, so it's time to run some code and look at the logic analyser output

BTW: I was surprised that there is no left shift instruction taking the number of bits as a second parameter on those tiny "RISCy" AVRs

I can't get the idea, why would one need such large system tick counter.

Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it for 2000000 years long, unless you want to measure time intervals of such size.

A uint16_t millisecond timer is sufficient for 65 seconds. If no time interval used in the application will be larger than that, there is little to no use in making it larger.

It overflows after the minute or so I hear you saying? So what? There's no problem, if you know how to write code correctly, to accommodate for the overflow.

Tip: if writing delays or checking for intervals, use condition written as this: (actual_time - time_stamp > delay_interval). This way the wrap-around will work correctly.
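
A minimal sketch of that comparison with a 16-bit tick follows. interval_elapsed() and interval_example() are made-up names. The cast back to uint16_t matters on platforms where int is wider than 16 bits, because the subtraction is otherwise done in int and can go negative; on AVR, where int is 16 bits, it is harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True once more than `delay` ticks have passed since `stamp`.
   The unsigned subtraction wraps modulo 65536, so it stays correct
   when `now` has rolled over past zero. */
bool interval_elapsed(uint16_t now, uint16_t stamp, uint16_t delay)
{
    return (uint16_t)(now - stamp) > delay;
}

/* Example: the tick was stamped at 65530 and has since wrapped to 10,
   so 16 ticks have passed. */
void interval_example(void)
{
    assert(interval_elapsed(10, 65530, 15));
    assert(!interval_elapsed(10, 65530, 16));
}
```

This only works while the real interval is shorter than the counter period, which is the same limit mentioned above.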

Quote from: Yansi on 15 Jan, 2019 23:22

I can't get the idea, why would one need such large system tick counter.

Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it for 2000000 years long, unless you want to measure time intervals of such size.

That rather depends on the processor, of course.

The XMOS xCORE processors have 32 bit timers with a 40s interval - because they are counting instructions in their 100MHz/4000MIPS processors. They also have 16 bit timers on each I/O port counting I/O clock cycles at up to 250MHz. The combination of those timers and the architecture means they can guarantee exactly when output will occur or when input did occur - and the program responds with a 10ns latency

I think that does not disprove anything I've stated above.

Any modern 32bit architecture provides means of clock counting.

And BTW, I wouldn't touch XMOS chips with a stick. Blargh.

Quote from: tggzzz on 15 Jan, 2019 23:47

Quote from: Yansi on 15 Jan, 2019 23:22
I can't get the idea, why would one need such large system tick counter.

Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it for 2000000 years long, unless you want to measure time intervals of such size.

That rather depends on the processor, of course.

There is not much room for fancy playing with timers on the 8 pin ATtiny85 since they are 8bit, so with an 8MHz/16MHz clock, the timer in CTC mode has its compare match at 100-1/200-1 when the /8 prescaler is used. This way, just by catching TCNT0 timer counter differences we get microsecond time differences within a 100us window. A 64bit time counter can be used to try to catch events longer than 1 hour, but for shorter time differences even a 16bit time counter can be used, by using only 2 bytes of the longer time counter.
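
The compare values quoted above follow from clock / (prescaler * interrupt rate) - 1. A small sketch of that calculation is below; ctc_top() is a made-up helper name, and the assert only checks that the result fits an 8bit compare register.

```c
#include <assert.h>
#include <stdint.h>

/* Compare value for an 8-bit timer in CTC mode: the timer counts
   0..top, so top = counts per period - 1. */
uint8_t ctc_top(uint32_t f_cpu, uint16_t prescaler, uint32_t f_irq)
{
    uint32_t counts = f_cpu / ((uint32_t)prescaler * f_irq);
    assert(counts >= 1 && counts <= 256);   /* must fit the 8-bit timer */
    return (uint8_t)(counts - 1);
}
```

For a 10kHz interrupt with the /8 prescaler this gives 99 at 8MHz and 199 at 16MHz.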

Regardless of how many bytes we use to implement a timer counter longer than the hardware 8bit timers on this device, the key is still to synchronize reading the extended counter with the real hardware TCNT0 timer register if we need anything bigger than a 100us time difference. Even for differences below a millisecond we need another software time counter byte, so a 64bit/32bit/16bit or even 48bit timer counter can be implemented in software, depending on the application.
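To make the 100-1/200-1 numbers concrete, here is a small sketch of the arithmetic. The function names ctc_top and timer_tick_hz are made up for illustration, and it assumes the CTC period is (compare value + 1) timer ticks and that the result fits the 8bit compare register.

```c
#include <assert.h>
#include <stdint.h>

/* Timer ticks per second after the prescaler (hypothetical helper). */
static uint32_t timer_tick_hz(uint32_t f_cpu_hz, uint16_t prescaler)
{
    assert(prescaler > 0);
    return f_cpu_hz / prescaler;
}

/* Compare value for CTC mode, assuming the period is (top + 1) ticks. */
static uint8_t ctc_top(uint32_t f_cpu_hz, uint16_t prescaler, uint32_t isr_hz)
{
    assert(isr_hz > 0);
    uint32_t ticks_per_isr = timer_tick_hz(f_cpu_hz, prescaler) / isr_hz;

    /* Assumption: the result must fit an 8bit compare register. */
    assert(ticks_per_isr >= 1 && ticks_per_isr <= 256);
    return (uint8_t)(ticks_per_isr - 1u);
}
```

With an 8MHz clock and the 8 prescaler the timer ticks at 1MHz, so a 10kHz ISR needs 100 ticks per period, i.e. a compare value of 99. At 16MHz the same ISR rate needs 199.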

Since I've worked out how to include assembler code in AVR C to quite easily increment time counters of any byte size in the ISR (e.g. 64bit in the example code below), I can define e.g. AVR_TIME_COUNTER_UINT48 if wanted and not lose time on 64 bits.

Code: [Select]

//ISR
...
	asm volatile (
			"add %[ratc0], %[rinc] \n\t"
			"adc %[ratc1], __zero_reg__ \n\t"
			"adc %[ratc2], __zero_reg__ \n\t"
			"adc %[ratc3], __zero_reg__ \n\t"
#ifdef AVR_TIME_COUNTER_UINT64T
			"adc %[ratc4], __zero_reg__ \n\t"
			"adc %[ratc5], __zero_reg__ \n\t"
			"adc %[ratc6], __zero_reg__ \n\t"
			"adc %[ratc7], __zero_reg__ \n\t"
#endif // AVR_TIME_COUNTER_UINT64T
			"sts avr_time_counter, %[ratc0] \n\t"
			"sts avr_time_counter+1, %[ratc1] \n\t"
			"sts avr_time_counter+2, %[ratc2] \n\t"
			"sts avr_time_counter+3, %[ratc3] \n\t"
#ifdef AVR_TIME_COUNTER_UINT64T
			"sts avr_time_counter+4, %[ratc4] \n\t"
			"sts avr_time_counter+5, %[ratc5] \n\t"
			"sts avr_time_counter+6, %[ratc6] \n\t"
			"sts avr_time_counter+7, %[ratc7] \n\t"
#endif // AVR_TIME_COUNTER_UINT64T
			:
			:
			[rinc]"r"(rinc),
			[ratc0]"a"(*(patc+0) ), [ratc1]"a"(*(patc+1)),[ratc2]"a"(*(patc+2)),[ratc3]"a"(*(patc+3))
#ifdef AVR_TIME_COUNTER_UINT64T
			,[ratc4]"a"(*(patc+4)), [ratc5]"a"(*(patc+5)),[ratc6]"a"(*(patc+6)),[ratc7]"a"(*(patc+7))
#endif // AVR_TIME_COUNTER_UINT64T
			 );
...
reti

Quote from: beduino on 16 Jan, 2019 00:49

Regardless of how many bytes we use to implement a timer counter longer than the hardware 8bit timers on this device, the key is still to synchronize reading the extended counter with the real hardware TCNT0 timer register if we need anything bigger than a 100us time difference. Even for differences below a millisecond we need another software time counter byte, so a 64bit/32bit/16bit or even 48bit timer counter can be implemented in software, depending on the application.

Modern PIC16s let you concatenate hardware timers, so you can create a big timer in hardware. Not sure about 64-bit though.

Everything is related to time. You may have tasks which need very fine time resolution. Your code also executes in time. Therefore, if you want fine time resolution, you either need to count every cycle of your code, or you will need to use something which is not affected by code execution timing, such as CCP modules.

When you need to measure big periods of time, such as days, high resolution cannot be achieved, and even if it could, you usually don't need it for the tasks which measure time in days. Say, if you want to make a backup every day, it doesn't matter if it is a few minutes early or late.

Therefore, it is silly to use one single timer for everything. Chips will usually have multiple timers which you can use for different purposes.

Here is a proper suggestion for ATtiny85, for a 32-bit TIMER0 counter:

Code: [Select]

#include <stdint.h>

/* Both are defined in asm-timer0.s (via .comm), so C code declares them extern */
extern volatile uint8_t   avr_timer_updates[2];
extern volatile uint32_t  avr_timer_counter;

extern uint32_t get_timer(void);
extern uint32_t get_timer_coarse(void);

with the TIMER0 overflow interrupt, get_timer(), and get_timer_coarse() functions implemented in assembly in asm-timer0.s:

Code: [Select]

        .file "asm-timer0.s"
        ; SPDX-License-Identifier: CC0-1.0

        __SP_H__ = 0x3e
        __SP_L__ = 0x3d
        __SREG__ = 0x3f
        __tmp_reg__ = 0
        __zero_reg__ = 1

        .text

        ;
        ; timer0 overflow interrupt vector
        ;
        .global __vector_5
        .type   __vector_5, @function
__vector_5:
        ; ISR prolog
        push    r1
        push    r0
        in      r0, __SREG__
        push    r0
        clr     __zero_reg__

        push    r20
        push    r19
        push    r18

        lds     r20, avr_timer_updates+1
        inc     r20
        sts     avr_timer_updates+1, r20

        in      r19, 0x29                   ; OCR0A
        lds     r18, avr_timer_counter+0
        add     r18, r19
        sts     avr_timer_counter+0, r18
        brcc    .done

        lds     r18, avr_timer_counter+1
        inc     r18
        sts     avr_timer_counter+1, r18
        brne    .done

        lds     r18, avr_timer_counter+2
        inc     r18
        sts     avr_timer_counter+2, r18
        brne    .done

        lds     r18, avr_timer_counter+3
        inc     r18
        sts     avr_timer_counter+3, r18

.done:
        sts     avr_timer_updates+0, r20
        pop     r18
        pop     r19
        pop     r20

        ; ISR epilog
        pop     r0
        out     __SREG__, r0
        pop     r0
        pop     r1
        reti

        .size   __vector_5, .-__vector_5


        .global get_timer
        .type   get_timer, @function
get_timer:
        lds     r21, avr_timer_updates+0

        lds     r22, avr_timer_counter+0
        lds     r23, avr_timer_counter+1
        lds     r24, avr_timer_counter+2
        lds     r25, avr_timer_counter+3
        in      r18, 0x32                   ; r18 = TCNT0
        in      r19, 0x38                   ; r19 = TIFR

        sbrc    r19, 1                      ; TOV0
        rjmp    get_timer

        lds     r20, avr_timer_updates+1
        cpse    r20, r21
        rjmp    get_timer

        add     r22, r18
        adc     r23, __zero_reg__
        adc     r24, __zero_reg__
        adc     r25, __zero_reg__
        ret

        .size   get_timer, .-get_timer


        .global get_timer_coarse
        .type   get_timer_coarse, @function
get_timer_coarse:
        lds     r21, avr_timer_updates+0

        lds     r22, avr_timer_counter+0
        lds     r23, avr_timer_counter+1
        lds     r24, avr_timer_counter+2
        lds     r25, avr_timer_counter+3

        lds     r20, avr_timer_updates+1
        cpse    r20, r21
        rjmp    get_timer_coarse            ; retry the coarse read, not get_timer
        ret

        .size   get_timer_coarse, .-get_timer_coarse

        .comm   avr_timer_counter,4,1
        .comm   avr_timer_updates,2,1

Just feed that asm-timer0.s file to avr-gcc as if it was a C file. Not tested on actual ATtiny85 hardware, but it does compile using old avr-gcc-4.9.2 (avr-gcc-4.9.2 -Wall -mmcu=attiny85 -c asm-timer0.s), and the logic is sound, but do beware of bugs.

The idea is that whenever an overflow interrupt occurs, the avr_timer_counter value is incremented by OCR0A. The second of the avr_timer_updates[2] bytes is incremented before the counter is incremented, and the first after the counter is incremented, so that readers can spin if an interrupt occurs.

The get_timer() function adds TCNT0 to the counter value, so the result is essentially the 32-bit TIMER0 virtual counter. It uses both the avr_timer_updates[2] guard bytes, and TOV0 bit in TIFR to detect if the combined 32-bit counter is valid. If interrupts occur too often, it might spin forever; so test before use.

The get_timer_coarse() function omits the TCNT0 and TOV0 bit checks, and so is more lightweight, although the value is coarser. Although the timers are derived from the same source, you should not mix the values, unless you are prepared for get_timer_coarse() < get_timer() even if obtained at the very same moment somehow.
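For anyone who finds the guard bytes easier to follow in C, here is a rough host-side model of the same idea. It is only a sketch: struct soft_timer, tick_begin, tick_end and try_read_coarse are names invented for this illustration, and the ISR is split into two halves so that an "interrupted" read can be simulated on a PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the shared state kept by the ISR. */
struct soft_timer {
    volatile uint8_t  updates[2];   /* [1] bumped before, [0] set after the update */
    volatile uint32_t counter;
};

/* First half of the ISR: announce that an update is in progress. */
static void tick_begin(struct soft_timer *t)
{
    assert(t != NULL);
    t->updates[1] = (uint8_t)(t->updates[1] + 1u);
}

/* Second half of the ISR: change the counter, then publish the new generation. */
static void tick_end(struct soft_timer *t, uint8_t step)
{
    assert(t != NULL);
    t->counter += step;
    t->updates[0] = t->updates[1];
}

/* One read attempt; returns false if an update overlapped and the caller must retry. */
static bool try_read_coarse(const struct soft_timer *t, uint32_t *out)
{
    assert(t != NULL && out != NULL);
    uint8_t  before = t->updates[0];   /* read the "after" guard first */
    uint32_t value  = t->counter;
    uint8_t  after  = t->updates[1];   /* read the "before" guard last */

    if (before != after)
        return false;
    *out = value;
    return true;
}
```

A real reader would simply call try_read_coarse() in a loop until it returns true, which is what the rjmp back to the start of the function does in the assembly.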

The timer ISR itself is a bit tricky. On ticks where adding OCR0A does not carry out of the lowest byte, it takes only 25 instructions (I didn't bother to calculate cycle counts), and it uses only six bytes of stack. In the other cases, it takes 29, 33, or 36 instructions. If the jitter of a variable-duration TIMER0 ISR is problematic, the code can be changed to a fixed 33 instructions instead (no jumps nor conditional jumps), using

Code: [Select]

        ;
        ; timer0 overflow interrupt vector
        ;
        .global __vector_5
        .type   __vector_5, @function
__vector_5:
        ; ISR prolog
        push    r1
        push    r0
        in      r0, __SREG__
        push    r0
        clr     __zero_reg__

        push    r20
        push    r19
        push    r18

        lds     r20, avr_timer_updates+1
        inc     r20
        sts     avr_timer_updates+1, r20

        in      r19, 0x29                   ; OCR0A
        lds     r18, avr_timer_counter+0
        add     r18, r19
        sts     avr_timer_counter+0, r18

        lds     r18, avr_timer_counter+1
        adc     r18, __zero_reg__
        sts     avr_timer_counter+1, r18

        lds     r18, avr_timer_counter+2
        adc     r18, __zero_reg__
        sts     avr_timer_counter+2, r18

        lds     r18, avr_timer_counter+3
        adc     r18, __zero_reg__
        sts     avr_timer_counter+3, r18

        sts     avr_timer_updates+0, r20
        pop     r18
        pop     r19
        pop     r20

        ; ISR epilog
        pop     r0
        out     __SREG__, r0
        pop     r0
        pop     r1
        reti

        .size   __vector_5, .-__vector_5

I have no idea which one performs better in practice.

Note that the lds+adc+sts pattern that uses only one register to update all bytes in a multibyte integer works, because neither lds nor sts modify the carry flag; only adc does. I think avr-gcc only uses N registers for N-byte integers, because that way the externally visible change occurs in one short window, shortening race windows. In my case, using the two generation/update counters and spinning until they match avoids any need for that. In an ISR, it is useful because it lessens the amount of stack used. I could probably reduce the stack use even more, but I just grabbed the ISR prolog and epilog from what avr-gcc generates for ISR(TIMER0_OVF_vect) { ... } when using #include <avr/interrupt.h>.
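In case the one-register pattern is unclear, here is roughly the same thing written as portable C. It is just a sketch with an invented function name (add_u8_bytewise): the carry variable plays the role of the carry flag, and each loop iteration corresponds to one lds/adc/sts triple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add an 8bit increment to a little-endian counter of n bytes, one byte at a time. */
static void add_u8_bytewise(volatile uint8_t *bytes, size_t n, uint8_t inc)
{
    assert(bytes != NULL && n > 0);

    uint16_t sum = (uint16_t)(bytes[0] + inc);    /* lds + add */
    bytes[0] = (uint8_t)sum;                      /* sts */
    uint8_t carry = (uint8_t)(sum >> 8);          /* carry flag after add */

    for (size_t i = 1; i < n; i++) {
        sum = (uint16_t)(bytes[i] + carry);       /* lds + adc with zero */
        bytes[i] = (uint8_t)sum;                  /* sts */
        carry = (uint8_t)(sum >> 8);              /* carry flag after adc */
    }
}
```

For example, adding 100 to the bytes F0 FF 00 00 (low byte first) gives 54 00 01 00, with the carry rippling through the second byte into the third.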

Quote from: Nominal Animal on 16 Jan, 2019 05:38

Note that the lds+adc+sts pattern that uses only one register to update all bytes in a multibyte integer works, because neither lds nor sts modify the carry flag; only adc does. I think avr-gcc only uses N registers for N-byte integers, because that way the externally visible change occurs in one short window, shortening race windows.

I was thinking about the lds+adc+sts pattern, but I'm still learning how to include assembler code inside AVR C code, and I struggled to pass "avr_time_counter" so as to get a given byte of the multibyte uint32_t or uint64_t into a register.
So for the moment the assembler listing of the ISR looks like this. Of course, the fewer push/pop stack operations the better, since each costs 2 cycles.

Code: [Select]

// ISR
...
0034 E0E0      		ldi r30,lo8(avr_time_counter)
0036 F0E0      		ldi r31,hi8(avr_time_counter)
0038 2081      		ld r18,Z
003a 3181      		ldd r19,Z+1
003c 4281      		ldd r20,Z+2
003e 5381      		ldd r21,Z+3
0040 6481      		ldd r22,Z+4
0042 7581      		ldd r23,Z+5
0044 1681      		ldd r17,Z+6
0046 0781      		ldd r16,Z+7
0048 84E6      		ldi r24,lo8(100)
               	/* #APP */
               	 ;  277 "avr_utils.c" 1
004a 280F      		add r18, r24
004c 311D      		adc r19, __zero_reg__
004e 411D      		adc r20, __zero_reg__
0050 511D      		adc r21, __zero_reg__
0052 611D      		adc r22, __zero_reg__
0054 711D      		adc r23, __zero_reg__
0056 111D      		adc r17, __zero_reg__
0058 011D      		adc r16, __zero_reg__
...

Note that those ldi/ld/ldd instructions were added automatically by the AVR C compiler
when I declared those register operands as shown a few posts above:

Code: [Select]

...
asm volatile(....::	[ratc0]"a"(*(patc+0) ), [ratc1]"a"(*(patc+1)),[ratc2]"a"(*(patc+2)),[ratc3]"a"(*(patc+3)),... );
...

Not sure how to implement something like this in inline assembler:

Code: [Select]

asm volatile( "lds %[ratcX], avr_time_counter+X \n\t sts avr_time_counter, %[ratcX] " :???:??? );

where "ratcX" is byte X of the N-byte time counter.

Update: Never mind - I've found the answer on how to use LDS in AVR C inline assembler here: https://www.avrfreaks.net/forum/inline-asm-3

Code: [Select]

static inline void   test ( uint8_t ok );
uint8_t  input = 100;

static inline void   test ( uint8_t ok ){

asm volatile(
           "\n\t"
       "lds %[ok], input"   "\n\t"
           : [ok] "=d" ( ok )
           :
       );
}

Further optimisations to the timer ISR can now easily be made if needed, to support any multibyte time counter.

Anyway, that is yet another possible optimisation, but I believe TCNT0 alone should be enough to make a decent guard, and more. I'm mostly interested in the time at the moment the time retrieval function was called, and I think the only way to get that is to use the TCNT0 value read when the get time function is entered. Any later hardware updates to TCNT0 while the rest of the function runs, especially re-reading TCNT0 and adding the changed value to the software time counter while time is still running, are not too good. So in my implementation, when the ISR that maintains the software time counter (whatever its size in bytes) hits the get time function after TCNT0 was read at the beginning, I simply try to correct for this. When the ISR finishes, we have an updated time counter and TCNT0 counting from 0 again in timer CTC mode, so we might get quite a decent time, but it may be different (later) than when the get time function was called, since you sit in the do-while loop waiting for the ISR to complete.

That is why I've marked one of those assembler listings of the ISR "Patent pending", just for fun, to signal that something else is going on in my get time implementation, since this is what I want to achieve with the extended software time counter and the real-time TCNT0 update by the timer hardware in microsecond intervals on an 8MHz ATtiny85 @ 10kHz timer ISR.

The 48bit time counter ISR with an increment of 100 for the 8MHz @ 10kHz timer now looks good, written in AVR C inline assembler: only one register, r24, needs to be pushed/popped, plus the temporary r0 and the zero register r1.

The time counter is stored as "uint64_t", but in the ISR only 6 bytes are incremented, which is enough for more than a year even when storing microseconds there:

Code: [Select]

// (%i2) avr_time_count_max: 2^48;
// (%o2)                           281474976710656
// (%i3) avr_time_count_max/1000000.0/31536000.0;   (31536000 = seconds in a 365-day year)
// (%o3)                          approx. 8.9255 [years]
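The same range estimate can be done in plain integer C. This is only a sketch: seconds_until_wrap and years_until_wrap are hypothetical helper names, and a 365-day year (31536000 seconds) is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_YEAR 31536000ULL   /* assumption: 365-day year */

/* Whole seconds until a counter of `bits` bits wraps at the given tick rate. */
static uint64_t seconds_until_wrap(unsigned bits, uint64_t ticks_per_second)
{
    assert(bits >= 1 && bits <= 63 && ticks_per_second > 0);
    return (1ULL << bits) / ticks_per_second;
}

/* Whole years until the same counter wraps (rounded down). */
static uint64_t years_until_wrap(unsigned bits, uint64_t ticks_per_second)
{
    return seconds_until_wrap(bits, ticks_per_second) / SECONDS_PER_YEAR;
}
```

For a 48bit microsecond counter this gives 281474976 seconds, or 8 whole years. For comparison, a 32bit microsecond counter wraps after 4294 seconds, and a 16bit millisecond counter after 65 seconds.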

Quote from: Yansi on 15 Jan, 2019 23:22

I don't get why one would need such a large system tick counter.

Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it 2000000 years long, unless you want to measure time intervals of such size.

A uint16_t millisecond timer is sufficient for 65 seconds. If no time interval used in the application will be larger than that, there is little to no use in making it larger.

It overflows after a minute or so, I hear you saying? So what? There's no problem, if you know how to write code correctly, to accommodate for the overflow.

Tip: if writing delays or checking for intervals, use condition written as this: (actual_time - time_stamp > delay_interval). This way the wrap-around will work correctly.

Not actually true for a free running timer, which is what you are describing in the last sentence. The longest time you can cover from a 16 bit free running timer incrementing at a 1ms interval is 32.7 seconds.

Why so?

If I make interval > 1/2 of the range, then it still wraps correctly.

For example, having a 3bit counter, I can still use that to produce an interval over 3, say for example 5.

So for example if the timestamp is 2, then at time of 7 I get the difference of 5, at time of 0 I get 6, works correctly.

If I pick another timestamp, for example 5, then at time of 2 I get the difference of 5, at time of 3, I get 6.

So where do you see the problem? Because I don't.
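The arithmetic above can be checked with a few lines of C. This is a sketch with invented helper names (elapsed_u16, interval_expired, elapsed_3bit). The cast back to the narrow unsigned type matters on compilers where int is wider than 16 bits, because the subtraction is otherwise done in int and can go negative.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks modulo 2^16; the cast restores the wrap-around after promotion to int. */
static uint16_t elapsed_u16(uint16_t now, uint16_t stamp)
{
    return (uint16_t)(now - stamp);
}

/* The condition from the tip: (actual_time - time_stamp > delay_interval). */
static int interval_expired(uint16_t now, uint16_t stamp, uint16_t delay)
{
    return elapsed_u16(now, stamp) > delay;
}

/* The 3bit counter from the example: differences taken modulo 8. */
static uint8_t elapsed_3bit(uint8_t now, uint8_t stamp)
{
    assert(now < 8 && stamp < 8);
    return (uint8_t)((uint8_t)(now - stamp) & 7u);
}
```

This reproduces the numbers in the post: timestamp 2 gives 5 at time 7 and 6 at time 0, and timestamp 5 gives 5 at time 2 and 6 at time 3. Note that the check only holds if the counter is polled before it has wrapped a full period past the timestamp.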
