Sorry, in my hurry earlier I didn't notice that we have += 100. I was just so shocked that the AVR C code generator could not produce code with fewer painful loops than the shift-by-2, -5 and -6 loops mentioned before.
You asked it to optimize for size, which will request it to produce the most compact code. In general loops are more compact than other forms of repetition so optimizing for size may favor loops.
Trying to figure out how this loop showed in your code might help ensure we have consistent timer counter and timer counter.
Well, I did have < wrong (should have been t1 > TCNT0)
uint32_t get_time(){
uint32_t t0; uint8_t t1;
do{
t1 = TCNT0;
t0 = avr_time_counter;
}while(t1 > TCNT0); //do again if overflowed
return t0+t1;
}
/*
00000006 <get_time>:
6: 22 b7 in r18, 0x32 ; 50
8: 80 91 60 00 lds r24, 0x0060 ; 0x800060 <_edata>
c: 90 91 61 00 lds r25, 0x0061 ; 0x800061 <_edata+0x1>
10: a0 91 62 00 lds r26, 0x0062 ; 0x800062 <_edata+0x2>
14: b0 91 63 00 lds r27, 0x0063 ; 0x800063 <_edata+0x3>
18: 32 b7 in r19, 0x32 ; 50
1a: 32 17 cp r19, r18
1c: a0 f3 brcs .-24 ; 0x6 <get_time>
1e: 68 2f mov r22, r24
20: 79 2f mov r23, r25
22: 8a 2f mov r24, r26
24: 9b 2f mov r25, r27
26: 62 0f add r22, r18
28: 71 1d adc r23, r1
2a: 81 1d adc r24, r1
2c: 91 1d adc r25, r1
2e: 08 95 ret
*/
It simply gets 'avr_time_counter' (which has to be a volatile, I assume it is), and also gets TCNT0, then it gets TCNT0 again and if the latest version (TCNT0) is less than t1, TCNT0 must have overflowed so do again. When TCNT0 >= t1, no overflow could have happened (not really true, but assuming you have no other interrupts that can exceed 10ms).
t1 = 99
t0 = 12300
//TCNT0 just rolled over, fired the isr, and is now back here
t1 > TCNT0 ? yes, do again - t1 = 99, TCNT now is 0 (less than 99 anyway)
t1 = 1
t0 = 12400
t1 > TCNT0 ? no, all done, no rollover, t1 = 1, TCNT0 = 3 (lets say)
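The race in the trace above can be reproduced on a PC with a small simulation. This is only a sketch under stated assumptions: sim_tcnt0, sim_time_counter and read_tcnt0() are hypothetical stand-ins for TCNT0, avr_time_counter and a hardware register read, the timer is assumed to wrap from 99 to 0, and the "ISR" is assumed to run immediately at the wrap.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the hardware, for host-side experiments only. */
static uint8_t  sim_tcnt0;         /* plays the role of TCNT0 (0..99) */
static uint32_t sim_time_counter;  /* plays the role of avr_time_counter */

static void sim_reset(uint8_t tcnt, uint32_t counter)
{
    sim_tcnt0 = tcnt;
    sim_time_counter = counter;
}

/* Every simulated register read lets one timer tick pass afterwards.
   On the wrap from 99 to 0 the "ISR" runs and adds 100 to the counter. */
static uint8_t read_tcnt0(void)
{
    uint8_t value = sim_tcnt0;
    if (++sim_tcnt0 == 100) {
        sim_tcnt0 = 0;
        sim_time_counter += 100;
    }
    return value;
}

/* No consistency check: may pair an old TCNT0 with a new counter. */
static uint32_t get_time_naive(void)
{
    uint8_t t1 = read_tcnt0();
    uint32_t t0 = sim_time_counter;
    return t0 + t1;
}

/* Same structure as get_time() above: retry if TCNT0 went backwards. */
static uint32_t get_time_checked(void)
{
    uint32_t t0;
    uint8_t t1;
    do {
        t1 = read_tcnt0();
        t0 = sim_time_counter;
    } while (t1 > read_tcnt0());
    return t0 + t1;
}
```

Starting from sim_reset(99, 12300), get_time_naive() pairs the old TCNT0 (99) with the already updated counter (12400) and returns 12499, while get_time_checked() notices the wrap, retries once and returns 12401.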
but unsure whether it is cleared by hardware at the end of the ISR, when "reti" is executed, or earlier, at the beginning of ISR handling
I would dare say that the flag is already clear upon entering the isr, as hardware clears it 'when executing' the vector. That also means that if global interrupts are not enabled, no vector executes and no flag is cleared (which means you can poll for it and clear it yourself if interrupts are not used).
I don't have avr hardware (that I want to dig out), but I think the isr simply becomes-
ISR(TIM0_COMPA_vect){
avr_time_counter += 100;
}
for a pic16, I use the nco as a system clock, and essentially do the same thing-
https://github.com/cv007/SNaPmate/blob/c334890d41945d3a2af0ab1c50772f087cf514a8/nco.c#L76
The nco has a 20bit counter with 2us resolution (internal 500kHz clock). I keep track of overflows and increment a counter by 16 on each one; then, when the time is wanted, I call the function linked above and shift my counter up into the upper 12 bits, to get 32 bits total.
Trying to figure out how this loop showed in your code might help ensure we have consistent timer counter and timer counter.
Well, I did have < wrong (should have been t1 > TCNT0)
Yep, it looked strange, because with a 10kHz CTC timer and an 8MHz F_CPU system clock the ISR should complete in a fraction of the time tick period (by time tick I mean an ISR call, btw).
I've changed the ISR code, and with += 100 it looks like this:
__vector_10:
0014 1F92 push r1
0016 0F92 push r0
0018 0FB6 in r0,__SREG__
001a 0F92 push r0
001c 1124 clr __zero_reg__
001e 8F93 push r24
0020 9F93 push r25
0022 AF93 push r26
0024 BF93 push r27
/* prologue: Signal */
/* frame size = 0 */
/* stack size = 7 */
.L__stack_usage = 7
0026 8091 0000 lds r24,avr_time_counter
002a 9091 0000 lds r25,avr_time_counter+1
002e A091 0000 lds r26,avr_time_counter+2
0032 B091 0000 lds r27,avr_time_counter+3
0036 8C59 subi r24,-100
0038 9F4F sbci r25,-1
003a AF4F sbci r26,-1
003c BF4F sbci r27,-1
003e 8093 0000 sts avr_time_counter,r24
0042 9093 0000 sts avr_time_counter+1,r25
0046 A093 0000 sts avr_time_counter+2,r26
004a B093 0000 sts avr_time_counter+3,r27
/* epilogue start */
004e BF91 pop r27
0050 AF91 pop r26
0052 9F91 pop r25
0054 8F91 pop r24
0056 0F90 pop r0
0058 0FBE out __SREG__,r0
005a 0F90 pop r0
005c 1F90 pop r1
005e 1895 reti
When I add, at the beginning of avr_time_us_get(), a wait for OCF0A in TIFR to be cleared by the ISR during its execution (OCF0A is set when TCNT0 is reset to 0 in CTC mode, as shown in the attached image from the ATmega328 intro to interrupts):
//nop();
uint8_t isr_is;
do {
isr_is= ((TIFR & (1<<OCF0A) )==0 ? 0 : 1 );
} while (isr_is );
//nop();
Without the retry do-while loop it looks like this, but for the moment I have no idea how (if at all) the fact that OCF0A is set in TIFR when TCNT0 becomes 0 could be useful in this function:
avr_time_us_get:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
/* #APP */
; 12 "avr_utils.c" 1
007c 0000 nop
; 0 "" 2
/* #NOAPP */
.L7:
007e 08B6 in __tmp_reg__,0x38
0080 04FC sbrc __tmp_reg__,4
0082 00C0 rjmp .L7
/* #APP */
; 12 "avr_utils.c" 1
0084 0000 nop
; 0 "" 2
/* #NOAPP */
0086 22B7 in r18,0x32
0088 6091 0000 lds r22,avr_time_counter
008c 7091 0000 lds r23,avr_time_counter+1
0090 8091 0000 lds r24,avr_time_counter+2
0094 9091 0000 lds r25,avr_time_counter+3
0098 620F add r22,r18
009a 711D adc r23,__zero_reg__
009c 811D adc r24,__zero_reg__
009e 911D adc r25,__zero_reg__
00a0 0895 ret
We can see that checking whether this flag is set can be very fast, but it is probably useless, since the ISR clears this flag during its execution.
.L7:
007e 08B6 in __tmp_reg__,0x38
0080 04FC sbrc __tmp_reg__,4
0082 00C0 rjmp .L7
I would rather not disable global interrupts just to make things simple,
so I will look more closely at your approach to this synchronization to see if it fits my needs. But I do not like this do-while loop,
since I'd like to capture the time difference as fast as possible, leaving enough time for some additional computations. Even the x100 multiply shouldn't be that horrible, but with the += 100 trick in the ISR the only remaining problem is getting a consistent, uncorrupted 32-bit "avr_time_counter" value.
Can you make your timer roll over at 256 ticks rather than at 100? This way you eliminate the need for multiplication completely.
but probably useless since ISR clear this flag during its execution
You cannot make use of that flag if using the isr.
but I do not like this do-while loop
it only repeats if there is a problem. Most of the time it will not repeat, but if you happen to hit the wrong time, you will get 1 repeat. Hardly a deal killer to get the correct time, and you have to do 'something' about the problem.
Can you make your timer roll over at 256 ticks rather than at 100? This way you eliminate the need for multiplication completely.
That's probably a better idea- just let the counter run and use the overflow irq, increment the counter by 256 in the isr. Probably not a big difference, though (but still better). (the multiplication is eliminated in the +=100 version also, by the way)
//obviously a lot is missing- like timer setup, and so on, this is minimal to show the idea
#include <avr/io.h>
#include <avr/interrupt.h>
volatile uint32_t avr_time_counter;
uint32_t get_time(){
uint32_t t;
do{
t = TCNT0 | avr_time_counter;
}while((uint8_t)t > TCNT0); //do again if overflowed
return t;
}
int main(void) {}
ISR(TIM0_OVF_vect){
avr_time_counter += 256;
}
/*
00000006 <get_time>:
6: 22 b7 in r18, 0x32 ; 50
8: 80 91 60 00 lds r24, 0x0060 ; 0x800060 <_edata>
c: 90 91 61 00 lds r25, 0x0061 ; 0x800061 <_edata+0x1>
10: a0 91 62 00 lds r26, 0x0062 ; 0x800062 <_edata+0x2>
14: b0 91 63 00 lds r27, 0x0063 ; 0x800063 <_edata+0x3>
18: 68 2f mov r22, r24
1a: 79 2f mov r23, r25
1c: 8a 2f mov r24, r26
1e: 9b 2f mov r25, r27
20: 62 2b or r22, r18
22: 22 b7 in r18, 0x32 ; 50
24: 26 17 cp r18, r22
26: 78 f3 brcs .-34 ; 0x6 <get_time>
28: 08 95 ret
*/
I say answer the OP question as he posted it. Maybe I want to marry a chicken with a porcupine , don't ask why, if you don't have any approaches then don't reply.
The better lesson -- for OPs and readers alike -- is to recognize that there are better root solutions out there, so don't ask XY Problems.
(I was wondering how long it would be until someone noticed this was an XY problem)
Tim
Rats. And I thought I was being so clever with the multiple word sizes...
>If you just use multiplication, good chance the C compiler can figure out everything by itself.
Nope, __mulsi3 is used when we leave too much to the C compiler
Change the optimization to -O3, and it will produce inline code for the 32bit multiply by 100. I believe that there are gcc-specific pragmas that will allow you to change the optimization for a specific function.
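For what it's worth, here is a minimal sketch of the per-function route, assuming GCC. GCC documents an `optimize` function attribute (and a matching `#pragma GCC optimize`); the macro name OPTIMIZE_O3 and the helper name times100() are made up for this illustration, and whether a given avr-gcc version then really emits the inline shift-and-add sequence is something to check in the listing rather than take for granted.

```c
#include <stdint.h>

/* GCC-only: request -O3 for this one function even if the rest of the file
   is built with -Os. Other compilers simply see a plain function. */
#if defined(__GNUC__) && !defined(__clang__)
#define OPTIMIZE_O3 __attribute__((optimize("O3")))
#else
#define OPTIMIZE_O3
#endif

/* times100() is a hypothetical helper name used only for this illustration. */
OPTIMIZE_O3 uint32_t times100(uint32_t ticks)
{
    return ticks * 100u;
}
```

The GCC manual describes this attribute mainly as a debugging aid, so treat it as an experiment and compare the generated code with and without it.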
volatile uint32_t counter;
uint32_t gettime() {
uint32_t y = counter;
0: 40 91 00 00 lds r20, 0x0000 ; 0x800000 <__SREG__+0x7fffc1>
4: 50 91 00 00 lds r21, 0x0000 ; 0x800000 <__SREG__+0x7fffc1>
8: 60 91 00 00 lds r22, 0x0000 ; 0x800000 <__SREG__+0x7fffc1>
c: 70 91 00 00 lds r23, 0x0000 ; 0x800000 <__SREG__+0x7fffc1>
y *= 100;
10: 44 0f add r20, r20
12: 55 1f adc r21, r21
14: 66 1f adc r22, r22
16: 77 1f adc r23, r23
18: 44 0f add r20, r20
1a: 55 1f adc r21, r21
1c: 66 1f adc r22, r22
1e: 77 1f adc r23, r23
20: db 01 movw r26, r22
22: ca 01 movw r24, r20
24: 88 0f add r24, r24
26: 99 1f adc r25, r25
28: aa 1f adc r26, r26
2a: bb 1f adc r27, r27
2c: 88 0f add r24, r24
2e: 99 1f adc r25, r25
30: aa 1f adc r26, r26
32: bb 1f adc r27, r27
34: 48 0f add r20, r24
36: 59 1f adc r21, r25
38: 6a 1f adc r22, r26
3a: 7b 1f adc r23, r27
3c: db 01 movw r26, r22
3e: ca 01 movw r24, r20
40: 88 0f add r24, r24
42: 99 1f adc r25, r25
44: aa 1f adc r26, r26
46: bb 1f adc r27, r27
48: 88 0f add r24, r24
4a: 99 1f adc r25, r25
4c: aa 1f adc r26, r26
4e: bb 1f adc r27, r27
50: 84 0f add r24, r20
52: 95 1f adc r25, r21
54: a6 1f adc r26, r22
56: b7 1f adc r27, r23
y += TCNT0;
58: 22 b7 in r18, 0x32 ; 50
return y;
5a: bc 01 movw r22, r24
5c: cd 01 movw r24, r26
5e: 62 0f add r22, r18
60: 71 1d adc r23, r1
62: 81 1d adc r24, r1
64: 91 1d adc r25, r1
}
66: 08 95 ret
but I do not like this do-while loop
it only repeats if there is a problem. Most of the time it will not repeat, but if you happen to hit the wrong time, you will get 1 repeat. Hardly a deal killer to get the correct time, and you have to do 'something' about the problem.
Anyway, I've decided... not to use the do-while loop, but to add a correction for the time instead - something like going back in time
but probably useless since ISR clear this flag during its execution
You cannot make use of that flag if using the isr.
But, I've used this flag as you can see in assembler code above
Warning: Do not try this code at home - it is not tested yet, but "Patent pending"
Update: Bug fixed in assembler listing below
.global avr_time_us_get
avr_time_us_get:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
007c 22B7 in r18,0x32
.L7:
007e 08B6 in __tmp_reg__,0x38
0080 04FC sbrc __tmp_reg__,4
0082 00C0 rjmp .L7
0084 6091 0000 lds r22,avr_time_counter
0088 7091 0000 lds r23,avr_time_counter+1
008c 8091 0000 lds r24,avr_time_counter+2
0090 9091 0000 lds r25,avr_time_counter+3
0094 32B7 in r19,0x32
0096 3217 cp r19,r18
0098 00F4 brsh .L8
009a 6091 0000 lds r22,avr_time_counter
009e 7091 0000 lds r23,avr_time_counter+1
00a2 8091 0000 lds r24,avr_time_counter+2
00a6 9091 0000 lds r25,avr_time_counter+3
00aa 6456 subi r22,100
00ac 7109 sbc r23,__zero_reg__
00ae 8109 sbc r24,__zero_reg__
00b0 9109 sbc r25,__zero_reg__
.L8:
00b2 620F add r22,r18
00b4 711D adc r23,__zero_reg__
00b6 811D adc r24,__zero_reg__
00b8 911D adc r25,__zero_reg__
00ba 0895 ret
Thanks for many hints
The better lesson -- for OPs and readers alike -- is to recognize that there are better root solutions out there, so don't ask XY Problems.
Replace "don't ask XY problems" with "describe what you are trying to solve, rather than the problems you are having with your chosen solution to the original problem", and I'll agree. Otherwise it sounds like you don't want OP and readers alike to ask about their problems.
For what it's worth, I'd start with
#include <stdint.h>
static volatile unsigned char avr_timer_pre = 0;
static volatile unsigned char avr_timer_post = 0;
static volatile uint32_t avr_timer_counter = 0;
#define AVR_TIMER_STEP 128
void tcnt0_compare_match_isr(void)
{
avr_timer_pre++;
avr_timer_counter += AVR_TIMER_STEP;
avr_timer_post++;
}
uint32_t get_time(void)
{
uint32_t counter;
unsigned char generation;
do {
generation = avr_timer_post;
counter = avr_timer_counter;
} while (generation != avr_timer_pre);
return counter;
}
uint32_t get_time_coarse(void)
{
uint32_t counter;
unsigned char generation;
do {
generation = avr_timer_post;
counter = avr_timer_counter;
} while (generation != avr_timer_pre);
return counter >> 7;
}
where the multiplier is 128 instead of 100. get_time() returns the original clock, and get_time_coarse() the slower clock.
In a real implementation, mark all these functions static inline, so the compiler can inline them at their call sites; while that increases code size, it can cut down on unnecessary register moves.
The avr_timer_pre and avr_timer_post form a generation counter pair. It assumes the hardware does not reorder normal reads and writes to RAM. Any timer modification begins with modifying the pre counter, and completes when modifying the post counter. When reading the timer counter, you start by remembering the post counter, then copy the timer counter. If the pre counter does not match the remembered post counter, reading the timer counter was interrupted by a modification, and you redo the entire operation. This is essentially a spin lock, where writers are never interrupted, but readers may have to spin. (Readers will only spin when each iteration is interrupted by a timer update; thus at most twice in normal operation.)
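To see the generation pair catch a torn read, here is a rough PC-side simulation. It is a sketch only: fake_isr(), window(), fire_at and run() are hypothetical helpers invented for this illustration, and the 32-bit counter is read one byte at a time with a possible "interrupt" after every access, to mimic what an 8-bit CPU does.

```c
#include <stdint.h>

static uint8_t  pre, post;   /* generation counter pair */
static uint32_t counter;     /* shared tick counter */
static int      fire_at;     /* hypothetical knob: access step at which the fake ISR fires */
static int      step;

static void fake_isr(void) { pre++; counter += 128; post++; }

/* Called after every single-byte access to mimic an interrupt window. */
static void window(void) { if (step++ == fire_at) fake_isr(); }

/* Byte-by-byte read of the 32-bit counter, lowest byte first. */
static uint32_t read_counter_bytewise(void)
{
    uint32_t value = 0;
    for (int i = 0; i < 4; i++) {
        value |= (uint32_t)((counter >> (8 * i)) & 0xFFu) << (8 * i);
        window();
    }
    return value;
}

/* Same loop as get_time() above, counting how many passes were needed. */
static uint32_t get_time_sim(int *passes)
{
    uint32_t c;
    uint8_t generation;
    *passes = 0;
    do {
        generation = post;
        window();
        c = read_counter_bytewise();
        (*passes)++;
    } while (generation != pre);
    return c;
}

/* Reset the simulation and take one reading; fire < 0 means no interrupt. */
static uint32_t run(uint32_t start, int fire, int *passes)
{
    pre = post = 0;
    counter = start;
    fire_at = fire;
    step = 0;
    return get_time_sim(passes);
}
```

With run(0xFF80, 1, &passes) the fake ISR fires right after the low byte has been read, so the first pass assembles the torn value 0x10080; the generation mismatch forces a second pass, which returns the correct 0x10000 with passes == 2. With no interrupt (fire = -1) a single pass returns 0xFF80.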
With old avr-gcc-4.9.2, using -Wall -Os -mmcu=attiny85, that gets you (omitting directives for simplicity)
tcnt0_compare_match_isr:
lds r24,avr_timer_pre
subi r24,lo8(-(1))
sts avr_timer_pre,r24
lds r24,avr_timer_counter
lds r25,avr_timer_counter+1
lds r26,avr_timer_counter+2
lds r27,avr_timer_counter+3
subi r24,-128
sbci r25,-1
sbci r26,-1
sbci r27,-1
sts avr_timer_counter,r24
sts avr_timer_counter+1,r25
sts avr_timer_counter+2,r26
sts avr_timer_counter+3,r27
lds r24,avr_timer_post
subi r24,lo8(-(1))
sts avr_timer_post,r24
ret
get_time:
.L3:
lds r25,avr_timer_post
lds r20,avr_timer_counter
lds r21,avr_timer_counter+1
lds r22,avr_timer_counter+2
lds r23,avr_timer_counter+3
lds r24,avr_timer_pre
cpse r25,r24
rjmp .L3
movw r24,r22
movw r22,r20
ret
get_time_coarse:
.L7:
lds r25,avr_timer_post
lds r20,avr_timer_counter
lds r21,avr_timer_counter+1
lds r22,avr_timer_counter+2
lds r23,avr_timer_counter+3
lds r24,avr_timer_pre
cpse r25,r24
rjmp .L7
movw r24,r22
movw r22,r20
ldi r18,7
1:
lsr r25
ror r24
ror r23
ror r22
dec r18
brne 1b
ret
Now, let's say you wanted both a fine timer (every tick) and a coarse timer (every 1000th tick), and the division by one thousand is problematic. If you can accept an additional cost to the interrupt service routine, you can provide both, with zero added cost to readers. (The downside is jitter in the ISR duration; every thousandth one takes twice as long as a normal call.)
#include <stdint.h>
static volatile unsigned char avr_timer_pre;
static volatile uint32_t avr_timer_fine; /* = AVR_COARSE_STEPS * avr_timer_coarse + avr_timer_step */
static volatile uint32_t avr_timer_coarse;
static volatile uint16_t avr_timer_step;
static volatile unsigned char avr_timer_post;
#define AVR_COARSE_STEPS 1000
void tcnt0_compare_match_isr(void)
{
uint16_t step;
avr_timer_pre++;
avr_timer_fine++;
step = avr_timer_step;
if (step >= AVR_COARSE_STEPS - 1) {
avr_timer_coarse++;
avr_timer_step = 0;
} else {
avr_timer_step = step + 1;
}
avr_timer_post++;
}
uint32_t get_time_coarse(void)
{
uint32_t coarse;
unsigned char generation;
do {
generation = avr_timer_post;
coarse = avr_timer_coarse;
} while (generation != avr_timer_pre);
return coarse;
}
uint32_t get_time_fine(void)
{
uint32_t fine;
unsigned char generation;
do {
generation = avr_timer_post;
fine = avr_timer_fine;
} while (generation != avr_timer_pre);
return fine;
}
The ISR becomes
tcnt0_compare_match_isr:
lds r24,avr_timer_pre
subi r24,lo8(-(1))
sts avr_timer_pre,r24
lds r24,avr_timer_fine
lds r25,avr_timer_fine+1
lds r26,avr_timer_fine+2
lds r27,avr_timer_fine+3
adiw r24,1
adc r26,__zero_reg__
adc r27,__zero_reg__
sts avr_timer_fine,r24
sts avr_timer_fine+1,r25
sts avr_timer_fine+2,r26
sts avr_timer_fine+3,r27
lds r24,avr_timer_step
lds r25,avr_timer_step+1
cpi r24,-25
ldi r18,3
cpc r25,r18
brlo .L2
lds r24,avr_timer_coarse
lds r25,avr_timer_coarse+1
lds r26,avr_timer_coarse+2
lds r27,avr_timer_coarse+3
adiw r24,1
adc r26,__zero_reg__
adc r27,__zero_reg__
sts avr_timer_coarse,r24
sts avr_timer_coarse+1,r25
sts avr_timer_coarse+2,r26
sts avr_timer_coarse+3,r27
sts avr_timer_step+1,__zero_reg__
sts avr_timer_step,__zero_reg__
rjmp .L3
.L2:
adiw r24,1
sts avr_timer_step+1,r25
sts avr_timer_step,r24
.L3:
lds r24,avr_timer_post
subi r24,lo8(-(1))
sts avr_timer_post,r24
ret
and if you use AVR_COARSE_STEPS less than 256, you can change the avr_timer_step and step variables to unsigned char type, and simplify it even further.
A much more interesting case is the one where you want a normal timer tick, but also a slower adjustable/runtime calibrated tick. We can use a 16-bit tick rate, so that the slower tick counter is (ignoring overflows) fast*rate/65536. If you want a /10 slower clock, you can choose between 6553 and 6554, corresponding to 1:10.000916 and 1:9.99939 (or 0.099991 and 0.100006), respectively:
#include <stdint.h>
static volatile unsigned char avr_timer_pre;
static volatile uint32_t avr_timer_fast;
static volatile uint32_t avr_timer_slow;
static volatile uint16_t avr_timer_phase;
static uint16_t avr_timer_rate;
static volatile unsigned char avr_timer_post;
void tcnt0_compare_match_isr(void)
{
uint16_t phase;
avr_timer_pre++;
avr_timer_fast++;
phase = avr_timer_phase;
phase += avr_timer_rate;
avr_timer_phase = phase;
avr_timer_slow += (phase < avr_timer_rate);
avr_timer_post++;
}
uint32_t get_time_fast(void)
{
uint32_t fast;
unsigned char generation;
do {
generation = avr_timer_post;
fast = avr_timer_fast;
} while (generation != avr_timer_pre);
return fast;
}
uint32_t get_time_slow(void)
{
uint32_t slow;
unsigned char generation;
do {
generation = avr_timer_post;
slow = avr_timer_slow;
} while (generation != avr_timer_pre);
return slow;
}
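This yields the following. Note that avr_timer_rate is a static variable that is never assigned in this snippet, so the compiler treats it as a constant zero: the listing shows the phase and slow updates as plain load/store pairs with no addition. Once the rate is actually set somewhere (or declared volatile), the add and the carry test should appear.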
tcnt0_compare_match_isr:
lds r24,avr_timer_pre
subi r24,lo8(-(1))
sts avr_timer_pre,r24
lds r24,avr_timer_fast
lds r25,avr_timer_fast+1
lds r26,avr_timer_fast+2
lds r27,avr_timer_fast+3
adiw r24,1
adc r26,__zero_reg__
adc r27,__zero_reg__
sts avr_timer_fast,r24
sts avr_timer_fast+1,r25
sts avr_timer_fast+2,r26
sts avr_timer_fast+3,r27
lds r24,avr_timer_phase
lds r25,avr_timer_phase+1
sts avr_timer_phase+1,r25
sts avr_timer_phase,r24
lds r24,avr_timer_slow
lds r25,avr_timer_slow+1
lds r26,avr_timer_slow+2
lds r27,avr_timer_slow+3
sts avr_timer_slow,r24
sts avr_timer_slow+1,r25
sts avr_timer_slow+2,r26
sts avr_timer_slow+3,r27
lds r24,avr_timer_post
subi r24,lo8(-(1))
sts avr_timer_post,r24
ret
get_time_fast:
.L3:
lds r19,avr_timer_post
lds r22,avr_timer_fast
lds r23,avr_timer_fast+1
lds r24,avr_timer_fast+2
lds r25,avr_timer_fast+3
lds r18,avr_timer_pre
cpse r19,r18
rjmp .L3
ret
get_time_slow:
.L7:
lds r19,avr_timer_post
lds r22,avr_timer_slow
lds r23,avr_timer_slow+1
lds r24,avr_timer_slow+2
lds r25,avr_timer_slow+3
lds r18,avr_timer_pre
cpse r19,r18
rjmp .L7
ret
Again, using just an 8-bit rate would still give you plenty of precision, to within 0.2% of the fast rate. For /10, 25 would correspond to 1:10.24 (0.09765625), and 26 to 1:9.846154 (0.1015625), but it would simplify the timer interrupt service a bit.
While the ISR does take roughly twice as long as the simple version, it executes the exact same instructions on every call, so it should take the exact same number of cycles each time. This makes it much easier to check the effects of the ISR latency; no oddball cases where the latency is much higher.
Do note that this is not what I'd necessarily go with, because I haven't used ATtiny85's for anything time critical, and I just whipped up the above code without testing it on actual hardware.
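The rate arithmetic is easy to check on a PC. The sketch below uses the same wrap test as the ISR (a slow tick is counted whenever the 16-bit phase wraps); slow_ticks_after() is a hypothetical helper name, not part of the code above.

```c
#include <stdint.h>

/* Count the slow ticks produced by `fast_ticks` fast ticks at a given 16-bit
   rate, using the same carry test as the ISR: the phase wrapped if and only
   if the new phase is smaller than the rate that was just added. */
static uint32_t slow_ticks_after(uint16_t rate, uint32_t fast_ticks)
{
    uint16_t phase = 0;
    uint32_t slow = 0;
    for (uint32_t i = 0; i < fast_ticks; i++) {
        phase = (uint16_t)(phase + rate);  /* wraps modulo 65536 */
        slow += (phase < rate);            /* wrapped => one slow tick */
    }
    return slow;
}
```

Over exactly 65536 fast ticks the slow counter advances by exactly `rate`, so slow_ticks_after(6554, 65536) gives 6554 and slow_ticks_after(6553, 65536) gives 6553, which is where the 1:9.99939 and 1:10.000916 ratios come from.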
Anyway, I've decided... not to use the do-while loop, but to add a correction for the time instead - something like going back in time
So you want to replace 36 bytes of code/14 instructions, with 52 bytes/22 instructions? You certainly can do whatever you want to. I'm not sure what you are doing with that flag, as you cannot catch it if you are using the compare irq.
I've probably said enough, and you have enough info already.
I'm not sure what you are doing with that flag, as you cannot catch it if you are using the compare irq.
One can imagine something similar to the example shown below, described here:
http://maxembedded.com/2011/07/avr-timers-ctc-mode/
// loop forever
while(1)
{
// check whether the flag bit is set
// if set, it means that there has been a compare match
// and the timer has been cleared
// use this opportunity to toggle the led
if (TIFR & (1 << OCF1A)) // NOTE: '>=' used instead of '=='
{
PORTC ^= (1 << 0); // toggles the led
}
// wait! we are not done yet!
// clear the flag bit manually since there is no ISR to execute
// clear it by writing '1' to it (as per the datasheet)
TIFR |= (1 << OCF1A);
// yeah, now we are done!
}
I would like to disable the timer interrupt; by adding just a few lines of code I can then use the same time function to capture time differences in a loop without wasting time in the ISR, as long as the time between retrievals is less than 100us. So this TIFR OCF0A flag can be very useful:
inline uint32_t avr_time_us_get_inline() {
uint8_t x0 = TCNT0;
uint8_t time_flag_is;
if( time_flag_is= ((TIFR & (1<<OCF0A) )==0 ? 0 : 1 ) ) {
// Manually correct time counter while probably disabled ISR
avr_time_counter+= 100;
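// Clear OCF0A flag by writing logic 1 (plain assignment: with |= every other
// flag currently set in TIFR would be written back as 1 and cleared as well)
TIFR = (1<<OCF0A);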
} // if
....
I've probably said enough
And yet I continue
Maybe you already know this, or maybe you already have a truckload of tiny85's you need to use, but I'm sure the attiny family of parts will also have something available (even in 8pins) that has an input capture feature which would make it easier to get a more accurate time at the point of the event (whatever it may be), and with higher resolution. You are finding out why they came up with the input capture feature.
You are finding out why they came up with the input capture feature.
Maybe, but for the moment I've managed to quite easily extend the time counter to...
64 bits, which allows collecting microseconds for... more than 200000 years.
A 32-bit time counter could overflow after about an hour, but thanks to removing the "volatile" qualifier, when only the microseconds from the last second or so are needed, the assembler code looks much better while loading the 'avr_time_counter' variable into a uint32_t variable: the unneeded LDS instructions for hi32(avr_time_counter) are not generated. So we are able to do "time traveling" over hundreds or thousands of years and still get event timings on the order of microseconds when needed.
#ifdef AVR_TIME_COUNTER_UINT64T
// (%i1) avr_time_count_max: 2^64;
// (%o1) 18446744073709551616
// (%i2) avr_time_count_max/1000000.0/83376000.0;
// (%o2) 221247.6500876697 [years]
static uint64_t avr_time_counter;
#else
// (%i3) avr_time_count_max: 2^32;
// (%o3) 4294967296
// (%i6) avr_time_count_max/1000000.0/3600;
// (%o6) 1.193046471111111 [hours]
static uint32_t avr_time_counter;
#endif // AVR_TIME_COUNTER_UINT64T
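As a quick cross-check of those ranges, here is a small host-side sketch. The helper names are hypothetical, and a 365-day year (31536000 seconds) is assumed; the Maxima session in the comments above divides by a different constant, so the 64-bit figure there comes out lower.

```c
#include <stdint.h>

/* Seconds until a microsecond counter of the given width (32 or 64 bits) wraps. */
static double wrap_seconds(unsigned bits)
{
    double counts = (bits == 64) ? 18446744073709551616.0   /* 2^64 */
                                 : 4294967296.0;            /* 2^32 */
    return counts / 1e6;
}

static double wrap_hours(unsigned bits)
{
    return wrap_seconds(bits) / 3600.0;
}

/* Assumes a 365-day year of 31536000 seconds. */
static double wrap_years(unsigned bits)
{
    return wrap_seconds(bits) / 31536000.0;
}
```

Under these assumptions the 32-bit counter wraps after about 1.19 hours, matching the comment, and the 64-bit counter after roughly 585 thousand years, so "more than 200000 years" holds with plenty of margin.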
I hope that the hardware clears the timer interrupt flag at the beginning of ISR execution, but that is easy to debug, so it's time to run some code and look at the logic analyser output
BTW: I was surprised that there is no left shift taking the number of bits as a second operand on those tiny "RISCy" AVRs
I can't get the idea, why would one need such large system tick counter.
Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it for 2000000 years long, unless you want to measure time intervals of such size.
A uint16_t millisecond timer is sufficient for 65 seconds. If no time interval used in the application is larger than that, there is little to no use in making it larger.
It overflows after a minute or so, I hear you say? So what? There's no problem, if you know how to write code correctly, to accommodate the overflow.
Tip: if writing delays or checking for intervals, use condition written as this: (actual_time - time_stamp > delay_interval). This way the wrap-around will work correctly.
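A minimal sketch of that tip for a 16-bit tick counter follows. The function name interval_elapsed() is made up for this illustration; the condition itself is the one from the tip.

```c
#include <stdbool.h>
#include <stdint.h>

/* True once more than `delay_interval` ticks have elapsed since `time_stamp`.
   Unsigned subtraction wraps modulo 65536, so the result stays correct across
   a single counter overflow. The cast matters on platforms where int is wider
   than 16 bits: without it the operands are promoted to int and the
   difference can go negative instead of wrapping. */
static bool interval_elapsed(uint16_t actual_time, uint16_t time_stamp,
                             uint16_t delay_interval)
{
    return (uint16_t)(actual_time - time_stamp) > delay_interval;
}
```

For example, with time_stamp = 65500 and actual_time = 100 the counter has wrapped, yet the difference is still computed as 136 ticks, so a 100-tick delay is correctly reported as elapsed. The only requirement is that the real interval never exceeds the counter range.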
I can't get the idea, why would one need such large system tick counter.
Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it for 2000000 years long, unless you want to measure time intervals of such size.
That rather depends on the processor, of course.
The XMOS xCORE processors have 32 bit timers with a 40s interval - because they are counting instructions in their 100MHz/4000MIPS processors. They also have 16 bit timers on each I/O port counting I/O clock cycles at up to 250MHz. The combination of those timers and the architecture means they can guarantee exactly when output will occur or when input did occur - and the program responds with a 10ns latency.
I think that does not disprove anything I've stated above.
Any modern 32bit architecture provides means of clock counting.
And BTW, I wouldn't touch XMOS chips with a stick. Blargh.
I can't get the idea, why would one need such large system tick counter.
Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it for 2000000 years long, unless you want to measure time intervals of such size.
That rather depends on the processor, of course.
There is not too much room for fancy playing with timers on the 8-pin ATtiny85, since they are 8-bit. In combination with 8MHz/16MHz clocks, the timer in CTC mode has its compare match at 100-1/200-1 when the /8 prescaler is used, which means that just by capturing TCNT0 differences we get microsecond time differences within a 100us window. A 64-bit time counter can be used to catch events longer than 1 hour, but for shorter time differences even a 16-bit time counter can be used, by using only 2 bytes of the longer time counter.
Regardless of how many bytes we use to implement a timer counter longer than the 8-bit hardware timers on this device,
the key is still to synchronize reading the extended counter with the real hardware TCNT0 timer register if we need anything bigger than a 100us time difference. Even to stay below a millisecond we need another software time counter byte, so a 64-bit/32-bit/16-bit or even 48-bit timer counter can be implemented in software depending on the application.
Since I've managed to include assembler code in AVR C that quite easily increments timer counters of any byte size in the ISR (e.g. 64-bit in the example code below), I can define, if wanted, e.g. AVR_TIME_COUNTER_UINT48 and not lose time on 64 bits
//ISR
...
asm volatile (
"add %[ratc0], %[rinc] \n\t"
"adc %[ratc1], __zero_reg__ \n\t"
"adc %[ratc2], __zero_reg__ \n\t"
"adc %[ratc3], __zero_reg__ \n\t"
#ifdef AVR_TIME_COUNTER_UINT64T
"adc %[ratc4], __zero_reg__ \n\t"
"adc %[ratc5], __zero_reg__ \n\t"
"adc %[ratc6], __zero_reg__ \n\t"
"adc %[ratc7], __zero_reg__ \n\t"
#endif // AVR_TIME_COUNTER_UINT64T
"sts avr_time_counter, %[ratc0] \n\t"
"sts avr_time_counter+1, %[ratc1] \n\t"
"sts avr_time_counter+2, %[ratc2] \n\t"
"sts avr_time_counter+3, %[ratc3] \n\t"
#ifdef AVR_TIME_COUNTER_UINT64T
"sts avr_time_counter+4, %[ratc4] \n\t"
"sts avr_time_counter+5, %[ratc5] \n\t"
"sts avr_time_counter+6, %[ratc6] \n\t"
"sts avr_time_counter+7, %[ratc7] \n\t"
#endif // AVR_TIME_COUNTER_UINT64T
:
:
[rinc]"r"(rinc),
[ratc0]"a"(*(patc+0) ), [ratc1]"a"(*(patc+1)),[ratc2]"a"(*(patc+2)),[ratc3]"a"(*(patc+3))
#ifdef AVR_TIME_COUNTER_UINT64T
,[ratc4]"a"(*(patc+4)), [ratc5]"a"(*(patc+5)),[ratc6]"a"(*(patc+6)),[ratc7]"a"(*(patc+7))
#endif // AVR_TIME_COUNTER_UINT64T
);
...
reti
Regardless of how many bytes we use to implement a timer counter longer than the 8-bit hardware timers on this device,
the key is still to synchronize reading the extended counter with the real hardware TCNT0 timer register if we need anything bigger than a 100us time difference. Even to stay below a millisecond we need another software time counter byte, so a 64-bit/32-bit/16-bit or even 48-bit timer counter can be implemented in software depending on the application.
Modern PIC16s let you concatenate hardware timers, so you can create a big timer in hardware. Not sure about 64-bit though.
Everything is related to time. You may have tasks which need very fine time resolution. Your code also executes in time. Therefore, if you want fine time resolution, you either need to count every cycle of your code, or you will need to use something which is not affected by code execution timing, such as CCP modules.
When you need to measure big periods of time, such as days, high resolution cannot be achieved, and even if it could, you usually don't need it for the tasks which measure time in days. Say, if you want to make a backup every day, it doesn't matter if it is a few minutes early or late.
Therefore, it is silly to use one single timer for everything. Chips will usually have multiple timers which you can use for different purposes.
Here is a proper suggestion for ATtiny85, for a 32-bit TIMER0 counter:
#include <stdint.h>
static volatile uint8_t avr_timer_updates[2];
static volatile uint32_t avr_timer_counter;
extern uint32_t get_timer(void);
extern uint32_t get_timer_coarse(void);
with the TIMER0 overflow interrupt, get_timer(), and get_timer_coarse() functions implemented in assembly in asm-timer0.s:
.file "asm-timer0.s"
; SPDX-License-Identifier: CC0-1.0
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__tmp_reg__ = 0
__zero_reg__ = 1
.text
;
; timer0 overflow interrupt vector
;
.global __vector_5
.type __vector_5, @function
__vector_5:
; ISR prolog
push r1
push r0
in r0, __SREG__
push r0
clr __zero_reg__
push r20
push r19
push r18
lds r20, avr_timer_updates+1
inc r20
sts avr_timer_updates+1, r20
in r19, 0x29 ; OCR0A
lds r18, avr_timer_counter+0
add r18, r19
sts avr_timer_counter+0, r18
brcc .done
lds r18, avr_timer_counter+1
inc r18
sts avr_timer_counter+1, r18
brne .done
lds r18, avr_timer_counter+2
inc r18
sts avr_timer_counter+2, r18
brne .done
lds r18, avr_timer_counter+3
inc r18
sts avr_timer_counter+3, r18
.done:
sts avr_timer_updates+0, r20
pop r18
pop r19
pop r20
; ISR epilog
pop r0
out __SREG__, r0
pop r0
pop r1
reti
.size __vector_5, .-__vector_5
.global get_timer
.type get_timer, @function
get_timer:
lds r21, avr_timer_updates+0
lds r22, avr_timer_counter+0
lds r23, avr_timer_counter+1
lds r24, avr_timer_counter+2
lds r25, avr_timer_counter+3
in r18, 0x32 ; r18 = TCNT0
in r19, 0x38 ; r19 = TIFR
sbrc r19, 1 ; TOV0
rjmp get_timer
lds r20, avr_timer_updates+1
cpse r20, r21
rjmp get_timer
add r22, r18
adc r23, __zero_reg__
adc r24, __zero_reg__
adc r25, __zero_reg__
ret
.size get_timer, .-get_timer
.global get_timer_coarse
.type get_timer_coarse, @function
get_timer_coarse:
lds r21, avr_timer_updates+0
lds r22, avr_timer_counter+0
lds r23, avr_timer_counter+1
lds r24, avr_timer_counter+2
lds r25, avr_timer_counter+3
lds r20, avr_timer_updates+1
cpse r20, r21
rjmp get_timer_coarse
ret
.size get_timer_coarse, .-get_timer_coarse
.comm avr_timer_counter,4,1
.comm avr_timer_updates,2,1
Just feed that asm-timer0.s file to avr-gcc as if it was a C file. Not tested on actual ATtiny85 hardware, but it does compile using old avr-gcc-4.9.2 (avr-gcc-4.9.2 -Wall -mmcu=attiny85 -c asm-timer0.s), and the logic is sound, but do beware of bugs.
The idea is that whenever an overflow interrupt occurs, the avr_timer_counter value is incremented by OCR0A. The second of the avr_timer_updates[2] bytes is incremented before the counter is incremented, and the first after the counter is incremented, so that readers can spin if an interrupt occurs.
The get_timer() function adds TCNT0 to the counter value, so the result is essentially the 32-bit TIMER0 virtual counter. It uses both the avr_timer_updates[2] guard bytes, and TOV0 bit in TIFR to detect if the combined 32-bit counter is valid. If interrupts occur too often, it might spin forever; so test before use.
The get_timer_coarse() function omits the TCNT0 and TOV0 bit checks, and so is more lightweight, although the value is coarser. Although the timers are derived from the same source, you should not mix the values, unless you are prepared for get_timer_coarse() < get_timer() even if obtained at the very same moment somehow.
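To make the guard-byte idea easier to follow, here is a rough C model of what the ISR and get_timer_coarse() do. It is only a sketch: the names timer_tick() and read_timer_coarse() are made up for illustration, `step` stands in for the OCR0A value, and it assumes a single-core AVR where the "ISR" either runs completely or not at all between two reader instructions.

```c
#include <stdint.h>

/* Names mirror the assembly above; the C layout is an assumption. */
volatile uint32_t avr_timer_counter;
volatile uint8_t  avr_timer_updates[2];

/* Models the overflow ISR as a plain function (hypothetical name);
   `step` plays the role of OCR0A. */
void timer_tick(uint8_t step)
{
    uint8_t gen = (uint8_t)(avr_timer_updates[1] + 1u);
    avr_timer_updates[1] = gen;   /* second guard byte: update has started */
    avr_timer_counter += step;    /* multi-byte update, not atomic on AVR */
    avr_timer_updates[0] = gen;   /* first guard byte: update has finished */
}

/* Models get_timer_coarse(): retry until both guard bytes agree. */
uint32_t read_timer_coarse(void)
{
    uint32_t value;
    uint8_t before, after;
    do {
        before = avr_timer_updates[0];  /* written last by the ISR */
        value  = avr_timer_counter;
        after  = avr_timer_updates[1];  /* written first by the ISR */
    } while (before != after);          /* an update slipped in: read again */
    return value;
}
```

If an interrupt lands anywhere between the first and the last read, the two guard bytes differ and the reader simply tries again, so it never returns a half-updated counter.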
The timer ISR itself is a bit tricky: the ticks where adding OCR0A to the low byte does not carry (roughly (256-OCR0A)/256 of them) take only 25 instructions (I didn't bother to calculate cycle counts), and the ISR uses only six bytes of stack. In the other cases, it takes 29, 33, or 36 instructions. If the jitter of a variable-duration TIMER0 ISR is problematic, the code can be changed to a fixed 33 instructions instead (no jumps nor conditional jumps), using
;
; timer0 overflow interrupt vector
;
.global __vector_5
.type __vector_5, @function
__vector_5:
; ISR prolog
push r1
push r0
in r0, __SREG__
push r0
clr __zero_reg__
push r20
push r19
push r18
lds r20, avr_timer_updates+1
inc r20
sts avr_timer_updates+1, r20
in r19, 0x29 ; OCR0A
lds r18, avr_timer_counter+0
add r18, r19
sts avr_timer_counter+0, r18
lds r18, avr_timer_counter+1
adc r18, __zero_reg__
sts avr_timer_counter+1, r18
lds r18, avr_timer_counter+2
adc r18, __zero_reg__
sts avr_timer_counter+2, r18
lds r18, avr_timer_counter+3
adc r18, __zero_reg__
sts avr_timer_counter+3, r18
sts avr_timer_updates+0, r20
pop r18
pop r19
pop r20
; ISR epilog
pop r0
out __SREG__, r0
pop r0
pop r1
reti
.size __vector_5, .-__vector_5
I have no idea which one performs better in practice.
Note that the lds+adc+sts pattern, which uses only one register to update all bytes in a multibyte integer, works because neither lds nor sts modifies the carry flag; only adc does. I think avr-gcc only uses N registers for N-byte integers because that way the externally visible change occurs in one short window, shortening race windows. In my case, using the two generation/update counters and spinning until they match avoids any need for that. In an ISR, the one-register pattern is useful because it lessens the amount of stack used. I could probably reduce the stack use even more, but I just grabbed the ISR prolog and epilog from what avr-gcc generates for ISR(TIMER0_OVF_vect) { ... } when using #include <avr/interrupt.h>.
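As a portable illustration of why one scratch register is enough, the same byte-at-a-time update can be modelled in plain C. This is a sketch, not what avr-gcc emits; counter_add_bytewise() is a hypothetical name and the `carry` variable stands in for the AVR carry flag, which survives the loads and stores between the additions.

```c
#include <stdint.h>
#include <stddef.h>

/* Add an 8-bit step to a little-endian counter held in memory, touching one
   byte at a time and keeping only the carry between bytes. */
void counter_add_bytewise(volatile uint8_t *bytes, size_t len, uint8_t step)
{
    uint8_t carry = 0;      /* models the carry flag */
    uint8_t addend = step;  /* only the lowest byte gets the real addend */

    for (size_t i = 0; i < len; i++) {
        uint16_t sum = (uint16_t)(bytes[i] + addend + carry); /* lds, then add/adc */
        bytes[i] = (uint8_t)sum;       /* sts: does not disturb the carry */
        carry = (uint8_t)(sum >> 8);   /* carry out of this byte */
        addend = 0;                    /* higher bytes add only the carry */
    }
}
```

For example, adding 0x20 to the bytes F0 FF 00 00 (value 0x0000FFF0) leaves 10 00 01 00 (value 0x00010010).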
I was thinking about the lds+adc+sts pattern, but I am still learning how to include assembler code inside AVR C code, and I struggled to pass "avr_time_counter" so that a given byte of the multibyte uint32_t or uint64_t ends up in a register,
so for the moment the assembler listing of the ISR looks like this. Of course, the fewer push/pop stack operations the better, since each one costs 2 cycles.
// ISR
...
51 0034 E0E0 ldi r30,lo8(avr_time_counter)
52 0036 F0E0 ldi r31,hi8(avr_time_counter)
53 0038 2081 ld r18,Z
54 003a 3181 ldd r19,Z+1
55 003c 4281 ldd r20,Z+2
56 003e 5381 ldd r21,Z+3
57 0040 6481 ldd r22,Z+4
58 0042 7581 ldd r23,Z+5
59 0044 1681 ldd r17,Z+6
60 0046 0781 ldd r16,Z+7
61 0048 84E6 ldi r24,lo8(100)
62 /* #APP */
63 ; 277 "avr_utils.c" 1
64 004a 280F add r18, r24
65 004c 311D adc r19, __zero_reg__
66 004e 411D adc r20, __zero_reg__
67 0050 511D adc r21, __zero_reg__
68 0052 611D adc r22, __zero_reg__
69 0054 711D adc r23, __zero_reg__
70 0056 111D adc r17, __zero_reg__
71 0058 011D adc r16, __zero_reg__
...
Note that those ldi/ld/ldd instructions were added automatically by the AVR C compiler
when I passed those registers as operands, as shown a few posts above:
...
asm volatile(....:: [ratc0]"a"(*(patc+0) ), [ratc1]"a"(*(patc+1)),[ratc2]"a"(*(patc+2)),[ratc3]"a"(*(patc+3)),... );
...
Not sure how to implement something like this in inline assembler:
asm volatile( "lds %[ratcX], avr_time_counter+X \n\t sts avr_time_counter+X, %[ratcX] " :???:??? );
where "ratcX" is byte X of the N-byte time counter
Update: Never mind, I've found the answer on how to use LDS in AVR C inline assembler here:
https://www.avrfreaks.net/forum/inline-asm-3
static inline void test ( uint8_t ok );
uint8_t input = 100;
static inline void test ( uint8_t ok ){
asm volatile(
"\n\t"
"lds %[ok], input" "\n\t"
: [ok] "=d" ( ok )
:
);
}
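Building on that, a one-register lds/adc/sts update might look roughly like the sketch below. Treat it as an untested assumption rather than a verified implementation: time_counter_add() is a made-up name, the counter is taken to be a 32-bit global with external linkage (so the assembler can resolve the symbol written inside the asm string), and the #else branch exists only so the same file also builds on a PC.

```c
#include <stdint.h>

/* Assumed for this sketch: a global (non-static) 32-bit counter. */
volatile uint32_t avr_time_counter;

/* Add `inc` to the counter using a single scratch register on AVR. */
static inline void time_counter_add(uint8_t inc)
{
#ifdef __AVR__
    uint8_t tmp;
    __asm__ volatile(
        "lds %[t], avr_time_counter+0 \n\t"
        "add %[t], %[inc]             \n\t"
        "sts avr_time_counter+0, %[t] \n\t"
        "lds %[t], avr_time_counter+1 \n\t"
        "adc %[t], __zero_reg__       \n\t"
        "sts avr_time_counter+1, %[t] \n\t"
        "lds %[t], avr_time_counter+2 \n\t"
        "adc %[t], __zero_reg__       \n\t"
        "sts avr_time_counter+2, %[t] \n\t"
        "lds %[t], avr_time_counter+3 \n\t"
        "adc %[t], __zero_reg__       \n\t"
        "sts avr_time_counter+3, %[t] \n\t"
        : [t] "=&r" (tmp)      /* scratch register, written before inputs are done */
        : [inc] "r" (inc)
        : "memory", "cc");
#else
    avr_time_counter += inc;   /* portable fallback for non-AVR builds */
#endif
}
```

The update is not atomic, so it only makes sense inside the ISR (or with interrupts disabled), with readers protected by some retry scheme.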
Further optimisations to the timer ISR can now easily be made if needed, to support a multibyte time counter of any size.
Anyway, that is yet another possible optimisation, but I believe TCNT0 alone should be enough to make a decent guard. More importantly, I'm interested in the time at the moment the time retrieval function was called, and I think the only way to get that is to use the TCNT0 value read when the get-time function is entered. TCNT0 keeps being updated by hardware while the rest of the get-time function runs, so re-reading TCNT0 later and adding the changed value to the software time counter, while time is still running, is not ideal. So in my implementation, when the ISR that maintains the software time counter (whatever its size in bytes) fires after TCNT0 was read at the beginning of the get-time function, I simply try to correct for it. When the ISR finishes, we have an updated time counter and TCNT0 counting from 0 again in timer CTC mode, so we get a quite decent time, but it may be later than the moment the get-time function was called, since we sit in the do-while loop while waiting for the ISR to complete.
That is why I marked one of those assembler listings of the ISR "Patent pending", just for fun, to signal that something else is going on in my get-time implementation, since this is what I want to achieve with the extended software time counter and the real-time TCNT0 updated by timer hardware in microsecond intervals on an 8 MHz ATtiny85 with a 10 kHz timer ISR.
The 48-bit time counter ISR with an increment of 100 for an 8 MHz clock @ 10 kHz timer now looks good, written in AVR C inline assembler: only one register (r24) needs push/pop, plus the temporary register r0 and the zero register r1.
The time counter is stored as "uint64_t", but the ISR increments only 6 bytes, which is enough to store microseconds for more than a year:
// (%i2) avr_time_count_max: 2^48;
// (%o2) 281474976710656
// (%i3) avr_time_count_max/1000000.0/31536000.0;   (microseconds -> seconds -> 365-day years)
// (%o3) approximately 8.93 [years]
I don't get why one would need such a large system tick counter.
Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it 2000000 years long, unless you want to measure time intervals of that size.
A uint16_t millisecond timer is sufficient for 65 seconds. If no time interval used in the application is larger than that, there is little to no use in making it larger.
It overflows after a minute or so, I hear you say? So what? There's no problem if you know how to write the code correctly to accommodate the overflow.
Tip: if writing delays or checking for intervals, use condition written as this: (actual_time - time_stamp > delay_interval). This way the wrap-around will work correctly.
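A minimal sketch of that tip in C, assuming a free-running 16-bit tick counter; interval_elapsed() is a hypothetical helper name. The cast back to uint16_t matters on platforms where int is wider than 16 bits, because the subtraction is otherwise carried out in int and can go negative.

```c
#include <stdint.h>
#include <stdbool.h>

/* True once more than `interval` ticks have passed since `stamp`,
   even if the 16-bit counter wrapped around in between. */
bool interval_elapsed(uint16_t now, uint16_t stamp, uint16_t interval)
{
    return (uint16_t)(now - stamp) > interval;
}
```

With stamp = 65530 and interval = 10, the check is still false at now = 4 (10 ticks elapsed) and becomes true at now = 5 (11 ticks elapsed), although the counter has wrapped past zero.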
Not actually true for a free-running timer, which is what you are describing in the last sentence. The longest time you can cover with a 16-bit free-running timer incrementing at a 1 ms interval is 32.7 seconds.
Why so?
If I make the interval > 1/2 of the range, then it still wraps correctly.
For example, with a 3-bit counter, I can still produce an interval over 3, say 5.
So if the timestamp is 2, then at time 7 I get a difference of 5, and at time 0 I get 6, which works correctly.
If I pick another timestamp, for example 5, then at time 2 I get a difference of 5, and at time 3 I get 6.
So where do you see the problem? Because I don't.
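The 3-bit example can be checked with a few lines of C. This is just a sketch with a made-up helper, elapsed_mod(), that reduces the difference modulo 2^bits the same way a narrow hardware counter would.

```c
#include <stdint.h>

/* Elapsed ticks between `stamp` and `now` on a counter that is `bits`
   wide (1..8) and wraps modulo 2^bits. */
uint8_t elapsed_mod(uint8_t now, uint8_t stamp, uint8_t bits)
{
    uint8_t mask = (uint8_t)((1u << bits) - 1u);
    return (uint8_t)((uint8_t)(now - stamp) & mask);
}
```

As far as I can tell, both posts describe real limits of different things: the unsigned difference stays exact as long as fewer than 2^bits ticks have really elapsed, whereas a limit of half the range (32.7 s for 16 bits at 1 ms) applies when the difference is interpreted as a signed value, for instance to decide which of two timestamps is earlier.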