EEVblog Electronics Community Forum

Electronics => Microcontrollers => Topic started by: beduino on January 14, 2019, 12:26:09 am

Title: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 14, 2019, 12:26:09 am
Hello,
I'm trying to get AVR C to reuse x2 in the code below for a faster
multiply by 100. Since x2 can be reused, instead of 13 left shifts
we need only 6, as the C code suggests,
but the assembled listing still shows ugly-looking compiler-generated code
with 3 loops for the 2, 5 and 6 left shifts  :o
(https://www.eevblog.com/forum/microcontrollers/fast-unsigned-integer-multiply-by-x100-on-8bit-avr/?action=dlattach;attach=622792)
This is code for the ATtiny85 with optimisation for size enabled.
I cannot figure out how to get code where x2 is reused for x5,
while x6 is only 1 more left shift  :palm:
(https://www.eevblog.com/forum/microcontrollers/fast-unsigned-integer-multiply-by-x100-on-8bit-avr/?action=dlattach;attach=622798)
I will probably try to rewrite this part in assembler, but it puzzles me why we get such ugly assembler code
when I've suggested in C how to make this code more efficient. Or maybe I haven't?  :-//
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Yansi on January 14, 2019, 12:47:47 am
Why in hell would you want that, if AVR already has a two cycle 8x8 multiplier  :o

//EDIT: Sorry, I forgot that the "tinyAVR" garbage can't multiply. :-/  It's been a while since I toyed with them.

Better to ask the question, then: why do you need to multiply by 100? (Maybe the task can be optimized in other ways.)
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: NorthGuy on January 14, 2019, 12:49:48 am
If you just use multiplication, good chance the C compiler can figure out everything by itself.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 14, 2019, 01:06:58 am
Quote
If you just use multiplication, good chance the C compiler can figure out everything by itself.
Nope, __mulsi3 is used when we leave too much to the C compiler  :-DMM

Code: [Select]
return (x*100 +x0);
Any other ideas?
BTW: cutting down to 16 bits instead of 32 would probably also be fine, since microsecond time is only needed for the periods of frequencies higher than 100Hz. The result is the same, though: the bloody C compiler generates the same ugly-looking loops for x5 and x6. Only x2 looks good, since it is replaced by 2 inline shifts  :-/O
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Yansi on January 14, 2019, 01:10:05 am
So what are you trying to achieve with this? Stop being secretive, otherwise we can't help much.

More code is needed than just return (x*100 + x0).

What is the range of X and X0? Why does it need to get multiplied? What does it calculate? There may be better ways to implement that.


Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Rerouter on January 14, 2019, 01:31:21 am
I'm not that good at reading AVR assembler, but it would be: number in, shift twice, copy the result to another register, shift 3 times, copy to a third register, shift once, add all 3 registers.

Could this method not condense your assembler? If you need more registers, you always have the 3 GPIO registers that don't get touched by the C compiler.
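
To make that sequence concrete, here is a minimal sketch in plain C that can be checked on a PC. The function name mul100_shift_add is made up for illustration, and it assumes the three running values are x4, x32 and x64 (4 + 32 + 64 = 100). Whether avr-gcc keeps the shifts chained like this at -Os is not guaranteed.

```c
#include <stdint.h>

/* Multiply by 100 using chained left shifts: 100 = 4 + 32 + 64.
   Each shift continues from the previous result, so only 2 + 3 + 1 = 6
   single-bit shifts are needed in total. */
uint32_t mul100_shift_add(uint32_t x)
{
    uint32_t acc;

    x <<= 2;      /* x * 4  */
    acc = x;
    x <<= 3;      /* x * 32 */
    acc += x;
    x <<= 1;      /* x * 64 */
    acc += x;     /* 4x + 32x + 64x = 100x */

    return acc;
}
```

Results wrap modulo 2^32 exactly like a normal uint32_t multiply would.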
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: blacksheeplogic on January 14, 2019, 01:31:44 am
My guess, glancing at the output (although I don't know the compiler being used), is that the optimizer is reducing register dependencies.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: IanB on January 14, 2019, 01:35:57 am
Firstly, this seems to be an 8-bit micro, and you are trying to work with 32 bit values. So that is bound to lead to complex code since the 32 bit quantities have to be split into 8 bit chunks.

Secondly, have you tried writing code that looks more like this: ?

Code: [Select]
  uint32_t x0 = TCNT0;

  uint32_t x = avr_time_counter;

  x <<= 1;
  x <<= 1;
  x0 += x;

  x <<= 1;
  x <<= 1;
  x <<= 1;
  x0 += x;

  x <<= 1;
  x0 += x;

  return x0;
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: sleemanj on January 14, 2019, 02:02:47 am
You could use a volatile variable to "decouple" the stages, forcing the compiler to save and read the result between each step...

Code: [Select]
  volatile uint32_t t_x;

  x2 = x<<2; // First Step

  t_x = x2; // Save it
  x5 = t_x; // Read it
  x5 = x5 << 3; // Next Step

  t_x = x5; // Save it
  x6 = t_x; // Read it
  x6 = x6 << 1; // Last step


Should work, I expect. Of course, shuffling that stuff back and forth between RAM and registers isn't likely to save you a lot.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: T3sl4co1l on January 14, 2019, 02:42:49 am
You told it to optimize for size; I don't see that chaining the shifts saves much space compared to what it's produced.  (Changing the last one from six to one shifts would save the last loop, at least.) Reducing register pressure might help, but who knows.  That isn't obvious from the short section, at least.

Compilers are smart enough to implement constants in non-obvious ways, like this.  They're also smart enough to know if a drop-in library routine is better.

You can try link-time optimization and expensive optimizations (-flto, -fexpensive-optimizations) too, see if that helps any.

If you explicitly want it written as a loop, you should probably start with it that way, and let the compiler unroll if it wants to.  Example:
Code: [Select]
for (i = 0x1e; i; i >>= 2) {
    x0 <<= (i & 0x03);  // left shift by 2, then 3, then 1 bits
    x1 += x0;
}
Note the magic number 0x1e coding for the number of shifts to perform, as packed 2-bit integers.  Just the first thing that came to mind, maybe a bit too perverse for readability/maintainability's sake, but it should compile to almost exactly the function you were expecting. :)
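
For reference, a self-contained sketch of the same packed-shift idea that can be run on a PC. It assumes left shifts (which a multiply needs) and a hypothetical function name, mul100_packed_loop. 0x1e is 01 11 10 in binary, read from the low end as shifts of 2, 3 and 1.

```c
#include <stdint.h>

/* Multiply by 100 with the shift counts packed into one constant.
   0x1e = 0b011110: the 2-bit fields, lowest first, are 2, 3, 1.
   The loop ends when all fields have been consumed (i becomes 0). */
uint32_t mul100_packed_loop(uint32_t x)
{
    uint32_t sum = 0;
    uint8_t i;

    for (i = 0x1e; i; i >>= 2) {
        x <<= (i & 0x03);   /* x becomes 4x, then 32x, then 64x */
        sum += x;           /* running total: 4x, 36x, 100x     */
    }
    return sum;
}
```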

Tim
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Yansi on January 14, 2019, 02:58:30 am
Impressive trick with the for cycle there. :-+
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: cv007 on January 14, 2019, 05:41:24 am
I could be way off, and don't really know what you are doing, but maybe you can rethink the problem-

assuming TCNT0 will be 0-99 (some timer compare mode), and you are incrementing 'avr_time_counter' in an interrupt,
and it also seems you want to be able to get the 'current time'

this is what it seems you are doing-

tcnt0_compare_match_isr(){  avr_time_counter++; } //inc counter every overflow
uint32_t get_time(){  return avr_time_counter*100 + TCNT0; } //but don't want *100

if so, maybe this would make more sense-

tcnt0_compare_match_isr(){  avr_time_counter += 100; } //inc by 100, not too painful

uint32_t get_time(){
    uint32_t t0; uint8_t t1;
    do{
        t1 = TCNT0;
        t0 = avr_time_counter;
    }while(t1 < TCNT0); //do again if overflowed
    return t0+t1;
}

maybe not any better, but maybe it is (the get_time() could probably be better, just showing that TCNT0 can 'overflow' while dealing with these multi-byte numbers and could end up having a mismatch of tcnt0 and its bigger counter, so simply check if tcnt0 overflowed in the process and do again if so)

I'm sure there are also ways to stay with base 2 numbers (and their advantages) when you think you have to use base 10. That's probably not thought about much anymore, but when dealing with an attiny or a smaller pic, it sometimes takes a little effort and creative thinking to keep the multiply/divide code from showing up.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: westfw on January 14, 2019, 08:22:05 am
If people are going to play this game, they need to attach the code produced, preferably with instruction and cycle counts...
This bit has the advantage of being extremely obvious.  Both multiplies are optimized to inline code:
Code: [Select]
int main() {
    // multiply an 8bit number by 100, by first
    //  multiplying by 25 (to yield a 16bit number)
    //   and then by 4 (to yield 24 bits.)
    uint16_t x = PORTB;
    __uint24 x1 = x*25;  // always fits in 16bits.
    x1 *= 4;    // fits in 24 bits...
    PORTB = x1;
    PORTB = x1>>8;   // output all the bits...
    PORTB = x1>>16;
}
Produces:
Code: [Select]
   0:   88 b3           in      r24, 0x18       ; 24
    __uint24 x1 = x*25;
   2:   90 e0           ldi     r25, 0x00       ; extend to 16 bits.
   4:   9c 01           movw    r18, r24
   6:   22 0f           add     r18, r18
   8:   33 1f           adc     r19, r19
   a:   22 0f           add     r18, r18
   c:   33 1f           adc     r19, r19
   e:   82 0f           add     r24, r18
  10:   93 1f           adc     r25, r19
  12:   9c 01           movw    r18, r24
  14:   22 0f           add     r18, r18
  16:   33 1f           adc     r19, r19
  18:   22 0f           add     r18, r18
  1a:   33 1f           adc     r19, r19
  1c:   82 0f           add     r24, r18
  1e:   93 1f           adc     r25, r19
  20:   a0 e0           ldi     r26, 0x00       ; extend to 24 bits.
    x1 *= 4;   
  22:   88 0f           add     r24, r24
  24:   99 1f           adc     r25, r25
  26:   aa 1f           adc     r26, r26
  28:   88 0f           add     r24, r24
  2a:   99 1f           adc     r25, r25
  2c:   aa 1f           adc     r26, r26
    PORTB = x1;
  2e:   88 bb           out     0x18, r24       ; 24
That seems pretty good. Note that __uint24 is pretty new.
Annoyingly, there is no uint24_t.   Also, gcc will not inline a __uint24 * 10.
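
The listing gets to x25 as 5 x 5: it forms 5x = x + 4x, then 25x = 5x + 4*5x, and finally shifts by two for the x4. A rough C equivalent of those steps follows, with a made-up function name (mul100_via_25) and plain stdint types instead of __uint24 so that it also builds on a PC.

```c
#include <stdint.h>

/* Multiply an 8-bit value by 100 as 5 * 5 * 4, mirroring the shift/add
   pattern in the listing. The largest result is 255 * 100 = 25500,
   which fits in 16 bits. */
uint16_t mul100_via_25(uint8_t x)
{
    uint16_t x5  = (uint16_t)(x + ((uint16_t)x << 2));     /* 5x  = x + 4x   */
    uint16_t x25 = (uint16_t)(x5 + (uint16_t)(x5 << 2));   /* 25x = 5x + 20x */

    return (uint16_t)(x25 << 2);                           /* 100x = 25x * 4 */
}
```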
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: westfw on January 14, 2019, 11:24:56 am
Wait a minute.  Why are we calculating this with a 32bit integer?  16 is plenty for 255*100

Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: cv007 on January 14, 2019, 04:13:30 pm
Quote
Why are we calculating this with a 32bit integer?  16 is plenty for 255*100
He is adding avr_time_counter to tcnt0 to get some kind of system time. So it's not an 8bit number*255, it's a larger number*100 + 0 to 255 (or to 99 I presume, otherwise it would be a little odd). How large a number is needed, we don't know.

Ultimately, I think his bigger problem is yet unseen: reading TCNT0 while it's running and adding it to what is presumably an interrupt-incremented 32/24/16bit number. As I have shown, that's easily dealt with, but if it is not dealt with, incorrect times will show up occasionally (and in many cases, very incorrect).
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Yansi on January 14, 2019, 04:15:54 pm
Well, I think I might now understand what he is doing.

Why don't you just increment the 32bit system time variable by 100 by default, instead of 1? (You would not then need to multiply by a 100).
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 14, 2019, 05:27:50 pm
Thanks everyone for the brainstorm, which during lunch gave me the idea of telling the AVR C compiler to do something simpler:
instead of multiplying by 100, just write an optimised multiply by 10 and feed the result of one into another to get x100 = x10 * x10  :phew:
x10= x2 * 5, x2= (x<<1)
Code: [Select]
// Multiply by 100
inline uint32_t avr_x100_uint32(uint32_t x ) {

x= avr_x10_uint32(x ); // x10
x= avr_x10_uint32(x ); // x10

return x;
}

Hopefully, when I wrote this crude, simple x10=x2*5 this way  :-DD
Code: [Select]
// Multiply by 10
inline uint32_t avr_x10_uint32(uint32_t x ) {

uint32_t x2= (x<<1); // x2

x= x2;
x+= x2;
x+= x2;
x+= x2;
x+= x2;

return x;
}

Now, no complaints about the AVR C optimizer, which made pretty smart code from the crude brute-force approach shown above  8) There is only one loop, with 2 iterations for the shift by 2 (x4). I didn't even try to understand the whole generated code, but it looks much better than those bloody loops for 2, 5, 6  :)
Code: [Select]
  85                .global avr_time_us_get
  87                avr_time_us_get:
  88 007c 0F93      push r16
  89 007e 1F93      push r17
  90                /* prologue: function */
  91                /* frame size = 0 */
  92                /* stack size = 2 */
  93                .L__stack_usage = 2
  94 0080 42B7      in r20,0x32
  95 0082 0091 0000 lds r16,avr_time_counter
  96 0086 1091 0000 lds r17,avr_time_counter+1
  97 008a 2091 0000 lds r18,avr_time_counter+2
  98 008e 3091 0000 lds r19,avr_time_counter+3
  99 0092 000F      lsl r16
 100 0094 111F      rol r17
 101 0096 221F      rol r18
 102 0098 331F      rol r19
 103 009a D901      movw r26,r18
 104 009c C801      movw r24,r16
 105 009e 52E0      ldi r21,2
 106                1:
 107 00a0 880F      lsl r24
 108 00a2 991F      rol r25
 109 00a4 AA1F      rol r26
 110 00a6 BB1F      rol r27
 111 00a8 5A95      dec r21
 112 00aa 01F4      brne 1b
 113 00ac 800F      add r24,r16
 114 00ae 911F      adc r25,r17
 115 00b0 A21F      adc r26,r18
 116 00b2 B31F      adc r27,r19
 117 00b4 880F      lsl r24
 118 00b6 991F      rol r25
 119 00b8 AA1F      rol r26
 120 00ba BB1F      rol r27
 121 00bc 8C01      movw r16,r24
 122 00be 9D01      movw r18,r26
 123 00c0 000F      lsl r16
 124 00c2 111F      rol r17
 125 00c4 221F      rol r18
 126 00c6 331F      rol r19
 127 00c8 040F      add r16,r20
 128 00ca 111D      adc r17,__zero_reg__
 129 00cc 211D      adc r18,__zero_reg__
 130 00ce 311D      adc r19,__zero_reg__
 131 00d0 080F      add r16,r24
 132 00d2 191F      adc r17,r25
 133 00d4 2A1F      adc r18,r26
 134 00d6 3B1F      adc r19,r27
 135 00d8 080F      add r16,r24
 136 00da 191F      adc r17,r25
 137 00dc 2A1F      adc r18,r26
 138 00de 3B1F      adc r19,r27
 139 00e0 BC01      movw r22,r24
 140 00e2 CD01      movw r24,r26
 141 00e4 600F      add r22,r16
 142 00e6 711F      adc r23,r17
 143 00e8 821F      adc r24,r18
 144 00ea 931F      adc r25,r19
 145                /* epilogue start */
 146 00ec 1F91      pop r17
 147 00ee 0F91      pop r16

Quote
Ultimately, I think his bigger problem is yet unseen- reading TCNT0 while its running and adding to presumably an interrupt incremented 32/24/16bit number. As I have shown, that's easily dealt with, but if not dealt with incorrect times will show up occasionally (and in many cases, very incorrect).
Yep, that is another concern: how to synchronize so that TCNT0 and avr_time_counter are not corrupted when the compare match ISR fires. Exactly as someone noticed, it is a compare match at 99: with an 8MHz system clock and a timer prescaler of 8 we have 1MHz, so the time tick is 10kHz, and we need to multiply avr_time_counter by 100 to get an estimated time in microseconds. It is used for time differences rather than exact timing, and of course there will be some latency in the ISR (avr_time_count++ only btw).

I'm looking for a way to check, while reading TCNT0 and avr_time_counter, whether to wait until the timer compare match ISR (avr_time_counter++) completes. There should be a flag somewhere in the AVR registers showing that the timer 0 compare match is being executed?  :-/O

If someone has already done such a tricky thing on the ATtiny85, let us know...
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: cv007 on January 14, 2019, 05:41:38 pm
Quote
avr_time_count++ only btw
As I already showed you, avr_time_count += 100 would work just as well and eliminate any need for *100. You do get one extra instruction in the isr because of that. One.

Quote
Looking for a way to check
I already showed that.

Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Kleinstein on January 14, 2019, 05:47:52 pm
Reading a counter that is extended in resolution, while there is a possible update from an ISR in between, is a known problem.
A possible solution is to first disable interrupts, then read the timer register (= low byte) and the software counter, then check the interrupt flag. One needs to correct the software counter (+1) if there is an ISR pending and the reading from the timer is low (just past causing the interrupt). If there is an interrupt pending but the counter value is high, the interrupt comes after the current reading and thus no correction is needed.
After that, interrupts can be enabled again.

If x100 takes too much time, why divide by 100 with the timer at all? x64 or x128 are easier.
The 8 bit timers also usually allow for quite a few prescaler values.
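
The correction step described above boils down to a small decision once the three values have been sampled. Here is a sketch of just that logic as a pure function, so it can be tested off-target. The name combine_time, the "low" threshold of half a period, and the use of a counter that already advances by 100 per interrupt are all assumptions for illustration. On the ATtiny85 the inputs would come from TCNT0, the software counter and the OCF0A bit in TIFR, read with interrupts disabled.

```c
#include <stdint.h>
#include <stdbool.h>

#define TICKS_PER_PERIOD 100u   /* timer counts 0..99, as in this thread */

/* soft    : software counter in timer ticks (assumed += 100 per interrupt)
   tcnt    : timer value, sampled BEFORE the pending flag was read
   pending : true if the compare-match flag was set after sampling

   If an interrupt is pending and the timer reading is low, the timer has
   already wrapped but the ISR has not run yet, so 'soft' is one period
   behind. If the reading is high, the wrap happened after the sample and
   no correction is needed. */
uint32_t combine_time(uint32_t soft, uint8_t tcnt, bool pending)
{
    if (pending && tcnt < TICKS_PER_PERIOD / 2u) {
        soft += TICKS_PER_PERIOD;   /* account for the ISR that has not run */
    }
    return soft + tcnt;
}
```

The half-period threshold only works if the three reads take far less than half a period, which should hold for a handful of instructions at 100 ticks per period.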
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: kosine on January 14, 2019, 05:49:59 pm
Just a thought, but how accurate do you need this to be?

If you're running the chip using the internal clock it may be off by 5%, and will likely vary between chips. If that's tolerable, then maybe just x96 instead of x100.

If it needs to be more accurate, then an external crystal may be required, in which case maybe use one that divides by 2.
____

I just checked the ATtiny85 datasheet. Section 6.1.6 mentions that it has a compatibility mode for ATtiny15 that recalibrates the internal RC oscillator to 6.4MHz. This is divided by 4 to run the system clock at 1.6MHz. Not sure if that will help, but might be an easier frequency to work with.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Kilrah on January 14, 2019, 05:58:07 pm
Typical XY problem (http://xyproblem.info/). OP was asked several times what he actually wants to do, but instead of answering that insists on his x100 that may not even be a good solution to the actual problem in the first place.

So post the full picture if you want good answers; until then everyone's just wasting their time.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: NorthGuy on January 14, 2019, 06:09:56 pm
I don't agree with that. Besides OP, there are hundreds of other people who read the thread, and there are even more people who will read this thread in the future. They're not interested in the OP problem (whatever it is). They're presumably interested in multiplication by 100. If, instead, they find a solution to the OP problem, it will be mostly useless to them - they will waste their time, and more and more people will continue reading this thread and wasting their time.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Kilrah on January 14, 2019, 06:27:07 pm
That's a weird way to think about it...

OK going about the x100 thing might help future viewers, but AFAIK the main purpose of this thread isn't to give something to a potential future viewer, it's to solve OP's problem they have now. And a x100 may not be a good solution to solve that problem... and we aren't going to even know whether it is until they properly describe their actual goal.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: snarkysparky on January 14, 2019, 06:45:39 pm
I say answer the OP's question as he posted it.  Maybe I want to marry a chicken with a porcupine; don't ask why.  If you don't have any approaches, then don't reply.

Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 14, 2019, 07:57:56 pm
Quote
tcnt0_compare_match_isr(){  avr_time_counter += 100; } //inc by 100, not too painful
Sorry, in my hurry earlier I didn't notice that we have +=100; I was so shocked that the AVR C code generator could not produce code with fewer painful loops than the 2,5,6 shifts mentioned before  :-+

I'm trying to figure out how the loop shown in your code might help ensure we have a consistent timer counter and time counter.
Maybe by using this:
Quote
OCF0A: Output Compare Flag 0 A
from
Quote
TIFR – Timer/Counter Interrupt Flag Register
while according to the ATtiny85 datasheet:
Quote
The OCF0A bit is set when a Compare Match occurs between the Timer/Counter0 and the data in OCR0A – Output Compare Register0. OCF0A is cleared by hardware when executing the corresponding interrupt handling vector.
so maybe this OCF0A flag could be useful, but I'm unsure whether it is cleared by hardware at the end of the ISR, when "reti" is executed, or earlier, at the beginning of ISR handling  :-\

(https://www.eevblog.com/forum/microcontrollers/fast-unsigned-integer-multiply-by-x100-on-8bit-avr/?action=dlattach;attach=623584)
Anyway, at least 8MHz @ 3.3Vcc or 16MHz @ 5Vcc F_CPU usually will be used, so different system clock is not an option.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: IanB on January 14, 2019, 08:03:08 pm
Quote
Sorry, didn't notice in a hurry earlier that we have +=100 while was so shocked that AVR C generator was not able generate code with less amount of paintfull loops but 2,5,6 for shifting as mentioned before  :-+

You asked it to optimize for size, which will request it to produce the most compact code. In general loops are more compact than other forms of repetition so optimizing for size may favor loops.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: cv007 on January 14, 2019, 08:20:06 pm
Quote
Trying to figure out how this loop showed in your code might help ensure we have consistent timer counter and timer counter.
Well, I did have < wrong (should have been t1 > TCNT0)
Code: [Select]
uint32_t get_time(){
    uint32_t t0; uint8_t t1;
    do{
        t1 = TCNT0;
        t0 = avr_time_counter;
    }while(t1 > TCNT0); //do again if overflowed
    return t0+t1;
}



/*
00000006 <get_time>:
   6:   22 b7           in      r18, 0x32       ; 50
   8:   80 91 60 00     lds     r24, 0x0060     ; 0x800060 <_edata>
   c:   90 91 61 00     lds     r25, 0x0061     ; 0x800061 <_edata+0x1>
  10:   a0 91 62 00     lds     r26, 0x0062     ; 0x800062 <_edata+0x2>
  14:   b0 91 63 00     lds     r27, 0x0063     ; 0x800063 <_edata+0x3>
  18:   32 b7           in      r19, 0x32       ; 50
  1a:   32 17           cp      r19, r18
  1c:   a0 f3           brcs    .-24            ; 0x6 <get_time>
  1e:   68 2f           mov     r22, r24
  20:   79 2f           mov     r23, r25
  22:   8a 2f           mov     r24, r26
  24:   9b 2f           mov     r25, r27
  26:   62 0f           add     r22, r18
  28:   71 1d           adc     r23, r1
  2a:   81 1d           adc     r24, r1
  2c:   91 1d           adc     r25, r1
  2e:   08 95           ret
*/
It simply gets 'avr_time_counter' (which has to be a volatile, I assume it is), and also gets TCNT0, then it gets TCNT0 again and if the latest version (TCNT0) is less than t1, TCNT0 must have overflowed so do again. When TCNT0 >= t1, no overflow could have happened (not really true, but assuming you have no other interrupts that can exceed 10ms).

t1 = 99
t0 = 12300
//TCNT0 just rolled over, fired the isr, and is now back here
t1 > TCNT0 ? yes, do again - t1 = 99, TCNT now is 0 (less than 99 anyway)

t1 = 1
t0 = 12400
t1 > TCNT0 ? no, all done, no rollover, t1 = 1, TCNT0 = 3 (lets say)
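
The same walk-through can be replayed on a PC with a fake timer. Everything prefixed sim_ is a hypothetical stand-in (sim_read_tcnt0 for reading TCNT0, sim_time_counter for avr_time_counter, and the += 100 inside it for the ISR). It is only meant to show the retry catching the rollover, not real hardware timing.

```c
#include <stdint.h>

/* Simulated state: the timer is at 99 and about to roll over. */
static volatile uint32_t sim_time_counter = 12300;
static uint8_t sim_ticks = 99;

/* Fake TCNT0 read: returns the current tick, then advances time by one
   tick. When the timer wraps from 99 to 0 the "ISR" runs and adds 100. */
static uint8_t sim_read_tcnt0(void)
{
    uint8_t now = sim_ticks;

    if (++sim_ticks == 100) {
        sim_ticks = 0;
        sim_time_counter += 100;   /* what the compare-match ISR would do */
    }
    return now;
}

/* Same structure as get_time() above, using the simulated timer. */
static uint32_t sim_get_time(void)
{
    uint32_t t0;
    uint8_t t1;

    do {
        t1 = sim_read_tcnt0();
        t0 = sim_time_counter;
    } while (t1 > sim_read_tcnt0());   /* timer went backwards: rolled over, retry */

    return t0 + t1;
}
```

On the first pass t1 is 99 while t0 has already become 12400, which would give a bogus 12499; the second TCNT0 read returns 0, so the loop repeats and the call returns 12401.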

Quote
but unsure whether is it cleared by hardware at the end of ISR when "reti" is called from ISR, or earlier at the begining of ISR handling
I would dare say that flag is already clear upon entering the isr, as hardware clears it 'when executing' the vector. That also means that if global interrupts are not enabled, there is no vector execution and no flag clear (which means you can poll for it and clear it yourself if interrupts are not used).

I don't have avr hardware (that I want to dig out), but I think the isr simply becomes-

ISR(TIM0_COMPA_vect){
    avr_time_counter += 100;
}

for a pic16, I use the nco as a system clock, and essentially do the same thing-
https://github.com/cv007/SNaPmate/blob/c334890d41945d3a2af0ab1c50772f087cf514a8/nco.c#L76
the nco has a 20bit counter with 2us resolution (internal 500kHz clock). I keep track of overflows and inc a counter +16, then when the time is wanted I do the function linked above and shift my counter up into the upper 12 bits, to get 32bits total
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: bingo600 on January 14, 2019, 09:13:31 pm
The Germans did some neat 64bit routines here (use google xlate)
https://www.mikrocontroller.net/topic/237643?reply_to=2411363#2411398 (https://www.mikrocontroller.net/topic/237643?reply_to=2411363#2411398)

Watch out for "linker trickery" if replacing the built-in routines with the asm code.

/Bingo
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 14, 2019, 09:54:08 pm
Quote
Trying to figure out how this loop showed in your code might help ensure we have consistent timer counter and timer counter.
Well, I did have < wrong (should have been t1 > TCNT0)
Yep, it looked strange, because with a 10kHz CTC timer and an 8MHz F_CPU system clock the ISR should complete in a fraction of the time tick period (by time tick I mean an ISR call, btw).

I've changed the ISR code, and +=100 looks like this:
Code: [Select]
  30                __vector_10:
  31 0014 1F92      push r1
  32 0016 0F92      push r0
  33 0018 0FB6      in r0,__SREG__
  34 001a 0F92      push r0
  35 001c 1124      clr __zero_reg__
  36 001e 8F93      push r24
  37 0020 9F93      push r25
  38 0022 AF93      push r26
  39 0024 BF93      push r27
  40                /* prologue: Signal */
  41                /* frame size = 0 */
  42                /* stack size = 7 */
  43                .L__stack_usage = 7
  44 0026 8091 0000 lds r24,avr_time_counter
  45 002a 9091 0000 lds r25,avr_time_counter+1
  46 002e A091 0000 lds r26,avr_time_counter+2
  47 0032 B091 0000 lds r27,avr_time_counter+3
  48 0036 8C59      subi r24,-100
  49 0038 9F4F      sbci r25,-1
  50 003a AF4F      sbci r26,-1
  51 003c BF4F      sbci r27,-1
  52 003e 8093 0000 sts avr_time_counter,r24
  53 0042 9093 0000 sts avr_time_counter+1,r25
  54 0046 A093 0000 sts avr_time_counter+2,r26
  55 004a B093 0000 sts avr_time_counter+3,r27
  56                /* epilogue start */
  57 004e BF91      pop r27
  58 0050 AF91      pop r26
  59 0052 9F91      pop r25
  60 0054 8F91      pop r24
  61 0056 0F90      pop r0
  62 0058 0FBE      out __SREG__,r0
  63 005a 0F90      pop r0
  64 005c 1F90      pop r1
  65 005e 1895      reti

I added, at the beginning of avr_time_us_get(), a wait for OCF0A in TIFR to be cleared by the ISR during its execution, since OCF0A is set when TCNT0 is reset to 0 in CTC mode, as shown in the attached image from an ATmega328 intro to interrupts:
Code: [Select]
//nop();
uint8_t isr_is;
do {
isr_is= ((TIFR & (1<<OCF0A) )==0 ? 0 : 1 );
} while (isr_is );
//nop();

Without any do-while loop it looks like this, but for the moment I have no idea
how OCF0A being set in TIFR when TCNT0 becomes 0 could be useful in this function, if at all:
Code: [Select]
  87                avr_time_us_get:
  88                /* prologue: function */
  89                /* frame size = 0 */
  90                /* stack size = 0 */
  91                .L__stack_usage = 0
  92                /* #APP */
  93                ;  12 "avr_utils.c" 1
  94 007c 0000      nop
  95                ;  0 "" 2
  96                /* #NOAPP */
  97                .L7:
  98 007e 08B6      in __tmp_reg__,0x38
  99 0080 04FC      sbrc __tmp_reg__,4
 100 0082 00C0      rjmp .L7
 101                /* #APP */
 102                ;  12 "avr_utils.c" 1
 103 0084 0000      nop
 104                ;  0 "" 2
 105                /* #NOAPP */
 106 0086 22B7      in r18,0x32
 107 0088 6091 0000 lds r22,avr_time_counter
 108 008c 7091 0000 lds r23,avr_time_counter+1
 109 0090 8091 0000 lds r24,avr_time_counter+2
 110 0094 9091 0000 lds r25,avr_time_counter+3
 111 0098 620F      add r22,r18
 112 009a 711D      adc r23,__zero_reg__
 113 009c 811D      adc r24,__zero_reg__
 114 009e 911D      adc r25,__zero_reg__
 115 00a0 0895      ret

We can see that the check for this flag can be very fast, but it is probably useless since the ISR clears this flag during its execution.
Code: [Select]
  97                .L7:
  98 007e 08B6      in __tmp_reg__,0x38
  99 0080 04FC      sbrc __tmp_reg__,4
 100 0082 00C0      rjmp .L7

I'd rather not disable global interrupts, and keep things simple,
so I will look more closely at your approach to this synchronization to see if it fits my needs. I do not like this do-while loop, though,
since I'd like to capture the time difference as fast as possible and leave enough time for some additional computations. Even the x100 multiply shouldn't be so horrible, but with the +=100 trick in the ISR the only remaining thing is to get a consistent, uncorrupted 32bit "avr_time_counter" value  :-/O
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 14, 2019, 09:59:17 pm
Quote
The Germans did some neat 64bit routines here (use google xlate)
Thanks for this hint, although I do not speak or read German  ;)
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: NorthGuy on January 14, 2019, 10:05:33 pm
Can you make your timer roll over at 256 ticks rather than at 100? This way you eliminate the need for multiplication completely.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: cv007 on January 14, 2019, 10:29:52 pm
Quote
but probably useless since ISR clear this flag during its execution
You cannot make use of that flag if using the isr.

Quote
but I do not like this do-while loop
it only repeats if there is a problem. Most of the time it will not repeat, but if you happen to hit the wrong time, you will get 1 repeat. Hardly a deal killer to get the correct time, and you have to do 'something' about the problem.

Quote
Can you make your timer roll over at 256 ticks rather than at 100? This way you eliminate the need for multiplication completely.
That's probably a better idea- just let the counter run and use the overflow irq, increment the counter by 256 in the isr. Probably not a big difference, though (but still better). (the multiplication is eliminated in the +=100 version also, by the way)

Code: [Select]
//obviously a lot is missing- like timer setup, and so on, this is minimal to show the idea

#include <avr/io.h>
#include <avr/interrupt.h>

volatile uint32_t avr_time_counter;

uint32_t get_time(){
    uint32_t t;
    do{
        t = TCNT0 | avr_time_counter;
    }while((uint8_t)t > TCNT0); //do again if overflowed
    return t;
}

int main(void) {}

ISR(TIM0_OVF_vect){
    avr_time_counter += 256;
}

/*
00000006 <get_time>:
   6:   22 b7           in      r18, 0x32       ; 50
   8:   80 91 60 00     lds     r24, 0x0060     ; 0x800060 <_edata>
   c:   90 91 61 00     lds     r25, 0x0061     ; 0x800061 <_edata+0x1>
  10:   a0 91 62 00     lds     r26, 0x0062     ; 0x800062 <_edata+0x2>
  14:   b0 91 63 00     lds     r27, 0x0063     ; 0x800063 <_edata+0x3>
  18:   68 2f           mov     r22, r24
  1a:   79 2f           mov     r23, r25
  1c:   8a 2f           mov     r24, r26
  1e:   9b 2f           mov     r25, r27
  20:   62 2b           or      r22, r18
  22:   22 b7           in      r18, 0x32       ; 50
  24:   26 17           cp      r18, r22
  26:   78 f3           brcs    .-34            ; 0x6 <get_time>
  28:   08 95           ret
*/
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: T3sl4co1l on January 14, 2019, 10:47:13 pm
Quote
I say answer the OP question as he posted it.  Maybe I want to marry a chicken with a porcupine ,  don't ask why,  if you don't have any approaches then don't reply.

The better lesson -- for OPs and readers alike -- is to recognize that there are better root solutions out there, so don't ask XY Problems.

(I was wondering how long it would be until someone noticed this was an XY problem ;) )

Tim
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: westfw on January 14, 2019, 11:42:33 pm
Rats.  And I  thought I was being so clever with the multiple word sizes...

Quote
Quote
If you just use multiplication, good chance the C compiler can figure out everything by itself.
Nope, __mulsi3  is used when we let C compiler for too much
Change the optimization to -O3, and it will produce inline code for the 32bit multiply by 100. I believe that there are gcc-specific pragmas (e.g. #pragma GCC optimize, or the optimize function attribute) that will allow you to change optimization for a specific function.
Code: [Select]
volatile uint32_t counter;

uint32_t gettime() {
    uint32_t y = counter;
   0:   40 91 00 00     lds     r20, 0x0000     ; 0x800000 <__SREG__+0x7fffc1>
   4:   50 91 00 00     lds     r21, 0x0000     ; 0x800000 <__SREG__+0x7fffc1>
   8:   60 91 00 00     lds     r22, 0x0000     ; 0x800000 <__SREG__+0x7fffc1>
   c:   70 91 00 00     lds     r23, 0x0000     ; 0x800000 <__SREG__+0x7fffc1>
    y *= 100;
  10:   44 0f           add     r20, r20
  12:   55 1f           adc     r21, r21
  14:   66 1f           adc     r22, r22
  16:   77 1f           adc     r23, r23
  18:   44 0f           add     r20, r20
  1a:   55 1f           adc     r21, r21
  1c:   66 1f           adc     r22, r22
  1e:   77 1f           adc     r23, r23
  20:   db 01           movw    r26, r22
  22:   ca 01           movw    r24, r20
  24:   88 0f           add     r24, r24
  26:   99 1f           adc     r25, r25
  28:   aa 1f           adc     r26, r26
  2a:   bb 1f           adc     r27, r27
  2c:   88 0f           add     r24, r24
  2e:   99 1f           adc     r25, r25
  30:   aa 1f           adc     r26, r26
  32:   bb 1f           adc     r27, r27
  34:   48 0f           add     r20, r24
  36:   59 1f           adc     r21, r25
  38:   6a 1f           adc     r22, r26
  3a:   7b 1f           adc     r23, r27
  3c:   db 01           movw    r26, r22
  3e:   ca 01           movw    r24, r20
  40:   88 0f           add     r24, r24
  42:   99 1f           adc     r25, r25
  44:   aa 1f           adc     r26, r26
  46:   bb 1f           adc     r27, r27
  48:   88 0f           add     r24, r24
  4a:   99 1f           adc     r25, r25
  4c:   aa 1f           adc     r26, r26
  4e:   bb 1f           adc     r27, r27
  50:   84 0f           add     r24, r20
  52:   95 1f           adc     r25, r21
  54:   a6 1f           adc     r26, r22
  56:   b7 1f           adc     r27, r23
    y += TCNT0;
  58:   22 b7           in      r18, 0x32       ; 50
    return y;
  5a:   bc 01           movw    r22, r24
  5c:   cd 01           movw    r24, r26
  5e:   62 0f           add     r22, r18
  60:   71 1d           adc     r23, r1
  62:   81 1d           adc     r24, r1
  64:   91 1d           adc     r25, r1
}
  66:   08 95           ret
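
Reading the listing above: the inline sequence builds 100*y as 5*5*4*y. Two doublings give 4y, then "shift left by two and add" is done twice, each time multiplying by 5. A minimal C sketch of the same decomposition follows; mul100 is a name made up here, and there is no guarantee a given compiler and optimization level turns it into exactly those instructions.

```c
#include <stdint.h>

/* Multiply by 100 using only shifts and adds: 100 = 4 * 5 * 5. */
uint32_t mul100(uint32_t y)
{
    uint32_t t = y << 2;   /* 4*y */
    t += t << 2;           /* 4*y + 16*y  = 20*y  */
    t += t << 2;           /* 20*y + 80*y = 100*y */
    return t;
}
```

That is six single-bit shifts of a 32-bit value in total, which matches the "only 6 shifts" the original question was after.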
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 15, 2019, 12:20:07 am
Quote
but I do not like this do-while loop
it only repeats if there is a problem. Most of the time it will not repeat, but if you happen to hit the wrong time, you will get 1 repeat. Hardly a deal killer to get the correct time, and you have to do 'something' about the problem.

Anyway, I've decided not to use the do-while loop, and instead added a correction for time - something like going back in time  :popcorn:

(https://www.eevblog.com/forum/microcontrollers/fast-unsigned-integer-multiply-by-x100-on-8bit-avr/?action=dlattach;attach=623758)

Quote
but probably useless since ISR clear this flag during its execution
You cannot make use of that flag if using the isr.
But, I've used this flag as you can see in assembler code above  >:D

Warning: Do not try this code at home - it is not tested yet, but "Patent pending :D"  :o

Update: Bug fixed in assembler listing below  ::)
Code: [Select]
  85                .global avr_time_us_get
  87                avr_time_us_get:
  88                /* prologue: function */
  89                /* frame size = 0 */
  90                /* stack size = 0 */
  91                .L__stack_usage = 0
  92 007c 22B7      in r18,0x32
  93                .L7:
  94 007e 08B6      in __tmp_reg__,0x38
  95 0080 04FC      sbrc __tmp_reg__,4
  96 0082 00C0      rjmp .L7
  97 0084 6091 0000 lds r22,avr_time_counter
  98 0088 7091 0000 lds r23,avr_time_counter+1
  99 008c 8091 0000 lds r24,avr_time_counter+2
 100 0090 9091 0000 lds r25,avr_time_counter+3
 101 0094 32B7      in r19,0x32
 102 0096 3217      cp r19,r18
 103 0098 00F4      brsh .L8
 104 009a 6091 0000 lds r22,avr_time_counter
 105 009e 7091 0000 lds r23,avr_time_counter+1
 106 00a2 8091 0000 lds r24,avr_time_counter+2
 107 00a6 9091 0000 lds r25,avr_time_counter+3
 108 00aa 6456      subi r22,100
 109 00ac 7109      sbc r23,__zero_reg__
 110 00ae 8109      sbc r24,__zero_reg__
 111 00b0 9109      sbc r25,__zero_reg__
 112                .L8:
 113 00b2 620F      add r22,r18
 114 00b4 711D      adc r23,__zero_reg__
 115 00b6 811D      adc r24,__zero_reg__
 116 00b8 911D      adc r25,__zero_reg__
 117 00ba 0895      ret

Thanks for many hints  :-+
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Nominal Animal on January 15, 2019, 01:31:35 am
The better lesson -- for OPs and readers alike -- is to recognize that there are better root solutions out there, so don't ask XY Problems.
Replace "don't ask XY problems" with "describe what you are trying to solve, rather than the problems you are having with your chosen solution to the original problem", and I'll agree.  Otherwise it sounds like you don't want OP and readers alike to ask about their problems.



For what it's worth, I'd start with
Code: [Select]
#include <stdint.h>

static volatile unsigned char avr_timer_pre = 0;
static volatile unsigned char avr_timer_post = 0;
static volatile uint32_t      avr_timer_counter = 0;

#define  AVR_TIMER_STEP  128

void tcnt0_compare_match_isr(void)
{
    avr_timer_pre++;
    avr_timer_counter += AVR_TIMER_STEP;
    avr_timer_post++;
}

uint32_t get_time(void)
{
    uint32_t       counter;
    unsigned char  generation;
    do {
        generation = avr_timer_post;
        counter = avr_timer_counter;
    } while (generation != avr_timer_pre);
    return counter;
}

uint32_t get_time_coarse(void)
{
    uint32_t       counter;
    unsigned char  generation;
    do {
        generation = avr_timer_post;
        counter = avr_timer_counter;
    } while (generation != avr_timer_pre);
    return counter >> 7;
}
where the multiplier is 128 instead of 100. get_time() returns the original clock, and get_time_coarse() the slower clock.

In a real implementation, mark all these functions static inline, so the compiler can inline them in their callsites; while that increases code size, it can cut down on unnecessary register moves.

The avr_timer_pre and avr_timer_post form a generation counter pair.  It assumes the hardware does not reorder normal reads and writes to ram. Any timer modification begins with modifying the pre counter, and completes when modifying the post counter. When reading the timer counter, you start by remembering the post counter, then copy the timer counter. If the pre counter does not match the remembered post counter, reading the timer counter was interrupted by a modification, and you redo the entire operation.  This is essentially a spin lock, where writers are never interrupted, but readers may have to spin. (Readers will only spin when each iteration is interrupted by a timer update; thus at most twice in normal operation.)

With old avr-gcc-4.9.2, using -Wall -Os -mmcu=attiny85, that gets you (omitting directives for simplicity)
Code: [Select]
tcnt0_compare_match_isr:
        lds r24,avr_timer_pre
        subi r24,lo8(-(1))
        sts avr_timer_pre,r24
        lds r24,avr_timer_counter
        lds r25,avr_timer_counter+1
        lds r26,avr_timer_counter+2
        lds r27,avr_timer_counter+3
        subi r24,-128
        sbci r25,-1
        sbci r26,-1
        sbci r27,-1
        sts avr_timer_counter,r24
        sts avr_timer_counter+1,r25
        sts avr_timer_counter+2,r26
        sts avr_timer_counter+3,r27
        lds r24,avr_timer_post
        subi r24,lo8(-(1))
        sts avr_timer_post,r24
        ret

get_time:
.L3:
        lds r25,avr_timer_post
        lds r20,avr_timer_counter
        lds r21,avr_timer_counter+1
        lds r22,avr_timer_counter+2
        lds r23,avr_timer_counter+3
        lds r24,avr_timer_pre
        cpse r25,r24
        rjmp .L3
        movw r24,r22
        movw r22,r20
        ret

get_time_coarse:
.L7:
        lds r25,avr_timer_post
        lds r20,avr_timer_counter
        lds r21,avr_timer_counter+1
        lds r22,avr_timer_counter+2
        lds r23,avr_timer_counter+3
        lds r24,avr_timer_pre
        cpse r25,r24
        rjmp .L7
        movw r24,r22
        movw r22,r20
        ldi r18,7
        1:
        lsr r25
        ror r24
        ror r23
        ror r22
        dec r18
        brne 1b
        ret
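
The generation counter pair described above can be exercised on a PC. This is only a sketch under assumptions: the "interrupt" is faked by a hook (pending_ticks, a made-up name) that fires between the two halves of a counter read, and the names gen_pre, gen_post, counter, tick_isr() and get_time_checked() are illustrative, not the ones from the AVR code.

```c
#include <stdint.h>

volatile uint8_t  gen_pre  = 0;
volatile uint8_t  gen_post = 0;
volatile uint32_t counter  = 0x0000FF80u;

int pending_ticks = 0;   /* simulation hook: "interrupts" still to fire mid-read */

/* Same shape as the ISR: bump pre, update the counter, bump post. */
void tick_isr(void)
{
    gen_pre++;
    counter += 128;
    gen_post++;
}

/* Reads the counter in two 16-bit halves and lets a simulated interrupt
   fire in between, the way a multi-byte lds sequence can be interrupted. */
uint32_t read_counter_interruptible(void)
{
    uint16_t lo = (uint16_t)counter;
    if (pending_ticks > 0) {
        pending_ticks--;
        tick_isr();
    }
    uint16_t hi = (uint16_t)(counter >> 16);
    return ((uint32_t)hi << 16) | lo;
}

/* Reader: remember post, copy the counter, retry if pre has moved on. */
uint32_t get_time_checked(void)
{
    uint32_t value;
    uint8_t  generation;
    do {
        generation = gen_post;
        value = read_counter_interruptible();
    } while (generation != gen_pre);
    return value;
}
```

With one pending tick, the unguarded read would return the torn value 0x0001FF80 (old low half, new high half); the guarded reader notices the generation mismatch, goes around once more, and returns 0x00010000.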

Now, let's say you wanted both a fine timer (every tick) and a coarse timer (every 1000th tick), and the division by one thousand is problematic.  If you can accept an additional cost to the interrupt service routine, you can provide both, with zero added cost to readers. (The downside is jitter in the ISR duration; every thousandth one takes twice as long as a normal call.)
Code: [Select]
#include <stdint.h>

static volatile unsigned char avr_timer_pre;
static volatile uint32_t      avr_timer_fine;       /* = AVR_COARSE_STEPS * avr_timer_coarse + avr_timer_step */
static volatile uint32_t      avr_timer_coarse;
static volatile uint16_t      avr_timer_step;
static volatile unsigned char avr_timer_post;

#define  AVR_COARSE_STEPS  1000

void tcnt0_compare_match_isr(void)
{
    uint16_t  step;

    avr_timer_pre++;

    avr_timer_fine++;

    step = avr_timer_step;
    if (step >= AVR_COARSE_STEPS - 1) {
        avr_timer_coarse++;
        avr_timer_step = 0;
    } else {
        avr_timer_step = step + 1;
    }

    avr_timer_post++;
}

uint32_t get_time_coarse(void)
{
    uint32_t       coarse;
    unsigned char  generation;

    do {
        generation = avr_timer_post;
        coarse = avr_timer_coarse;
    } while (generation != avr_timer_pre);

    return coarse;
}

uint32_t get_time_fine(void)
{
    uint32_t       fine;
    unsigned char  generation;

    do {
        generation = avr_timer_post;
        fine = avr_timer_fine;
    } while (generation != avr_timer_pre);

    return fine;
}   
The ISR becomes
Code: [Select]
tcnt0_compare_match_isr:
        lds r24,avr_timer_pre
        subi r24,lo8(-(1))
        sts avr_timer_pre,r24
        lds r24,avr_timer_fine
        lds r25,avr_timer_fine+1
        lds r26,avr_timer_fine+2
        lds r27,avr_timer_fine+3
        adiw r24,1
        adc r26,__zero_reg__
        adc r27,__zero_reg__
        sts avr_timer_fine,r24
        sts avr_timer_fine+1,r25
        sts avr_timer_fine+2,r26
        sts avr_timer_fine+3,r27
        lds r24,avr_timer_step
        lds r25,avr_timer_step+1
        cpi r24,-25
        ldi r18,3
        cpc r25,r18
        brlo .L2
        lds r24,avr_timer_coarse
        lds r25,avr_timer_coarse+1
        lds r26,avr_timer_coarse+2
        lds r27,avr_timer_coarse+3
        adiw r24,1
        adc r26,__zero_reg__
        adc r27,__zero_reg__
        sts avr_timer_coarse,r24
        sts avr_timer_coarse+1,r25
        sts avr_timer_coarse+2,r26
        sts avr_timer_coarse+3,r27
        sts avr_timer_step+1,__zero_reg__
        sts avr_timer_step,__zero_reg__
        rjmp .L3
.L2:
        adiw r24,1
        sts avr_timer_step+1,r25
        sts avr_timer_step,r24
.L3:
        lds r24,avr_timer_post
        subi r24,lo8(-(1))
        sts avr_timer_post,r24
        ret
and if you use AVR_COARSE_STEPS less than 256, you can change avr_timer_step and step variables to unsigned char type, and simplify it even further.
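
As a sanity check of the invariant in the comment (fine = AVR_COARSE_STEPS * coarse + step), here is a small host-side sketch with the volatile qualifiers and generation counters left out. The names timer_fine, timer_coarse, timer_step, tick() and run_ticks() are made up for the simulation.

```c
#include <stdint.h>

#define COARSE_STEPS 1000

uint32_t timer_fine   = 0;
uint32_t timer_coarse = 0;
uint16_t timer_step   = 0;

/* Body of the ISR, minus the generation counters. */
void tick(void)
{
    timer_fine++;
    if (timer_step >= COARSE_STEPS - 1) {
        timer_coarse++;      /* every 1000th tick */
        timer_step = 0;
    } else {
        timer_step++;
    }
}

void run_ticks(uint32_t n)
{
    while (n--)
        tick();
}
```

After 2500 ticks this gives fine = 2500, coarse = 2 and step = 500, with no division anywhere.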

A much more interesting case is where you want a normal timer tick, but also a slower adjustable/runtime calibrated tick. We can use a 16-bit tick rate, so that the slower tick counter is (ignoring overflows) fast*rate/65536. If you want a /10 slower clock, you can choose between 6553 and 6554, corresponding to 1:10.000916 and 1:9.99939 (or 0.099991 and 0.100006), respectively:
Code: [Select]
#include <stdint.h>

static volatile unsigned char avr_timer_pre;
static volatile uint32_t      avr_timer_fast;
static volatile uint32_t      avr_timer_slow;
static volatile uint16_t      avr_timer_phase;
static          uint16_t      avr_timer_rate;
static volatile unsigned char avr_timer_post;

void tcnt0_compare_match_isr(void)
{
    uint16_t  phase;

    avr_timer_pre++;

    avr_timer_fast++;

    phase = avr_timer_phase;
    phase += avr_timer_rate;
    avr_timer_phase = phase;
    avr_timer_slow += (phase < avr_timer_rate);

    avr_timer_post++;
}

uint32_t get_time_fast(void)
{
    uint32_t       fast;
    unsigned char  generation;

    do {
        generation = avr_timer_post;
        fast = avr_timer_fast;
    } while (generation != avr_timer_pre);

    return fast;
}

uint32_t get_time_slow(void)
{
    uint32_t       slow;
    unsigned char  generation;

    do {
        generation = avr_timer_post;
        slow = avr_timer_slow;
    } while (generation != avr_timer_pre);

    return slow;
}
This yields
Code: [Select]
tcnt0_compare_match_isr:
        lds r24,avr_timer_pre
        subi r24,lo8(-(1))
        sts avr_timer_pre,r24
        lds r24,avr_timer_fast
        lds r25,avr_timer_fast+1
        lds r26,avr_timer_fast+2
        lds r27,avr_timer_fast+3
        adiw r24,1
        adc r26,__zero_reg__
        adc r27,__zero_reg__
        sts avr_timer_fast,r24
        sts avr_timer_fast+1,r25
        sts avr_timer_fast+2,r26
        sts avr_timer_fast+3,r27
        lds r24,avr_timer_phase
        lds r25,avr_timer_phase+1
        sts avr_timer_phase+1,r25
        sts avr_timer_phase,r24
        lds r24,avr_timer_slow
        lds r25,avr_timer_slow+1
        lds r26,avr_timer_slow+2
        lds r27,avr_timer_slow+3
        sts avr_timer_slow,r24
        sts avr_timer_slow+1,r25
        sts avr_timer_slow+2,r26
        sts avr_timer_slow+3,r27
        lds r24,avr_timer_post
        subi r24,lo8(-(1))
        sts avr_timer_post,r24
        ret

get_time_fast:
.L3:
        lds r19,avr_timer_post
        lds r22,avr_timer_fast
        lds r23,avr_timer_fast+1
        lds r24,avr_timer_fast+2
        lds r25,avr_timer_fast+3
        lds r18,avr_timer_pre
        cpse r19,r18
        rjmp .L3
        ret

get_time_slow:
.L7:
        lds r19,avr_timer_post
        lds r22,avr_timer_slow
        lds r23,avr_timer_slow+1
        lds r24,avr_timer_slow+2
        lds r25,avr_timer_slow+3
        lds r18,avr_timer_pre
        cpse r19,r18
        rjmp .L7
        ret
Again, using just an 8-bit rate would still give you plenty of precision, to within 0.2% of the fast rate. For /10, 25 would correspond to 1:10.24 (0.09765625), and 26 to 1:9.846154 (0.1015625); it would also simplify the timer interrupt service a bit.

While the ISR does take roughly twice as long as the simple version, it executes the exact same instructions on every call, so it should take the exact same number of cycles each time.  This makes it much easier to check the effects of the ISR latency; no oddball cases where the latency is much higher.
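
The phase accumulator is easy to check on a PC. One caveat about the listing above: avr_timer_rate is a plain static that is never assigned there, so the compiler is free to treat it as zero, which would explain why the generated ISR contains no add for the phase or the slow counter. In real code the rate would have to be set somewhere (and probably be volatile). The sketch below uses made-up names (sim_fast, sim_slow, sim_phase, sim_rate, sim_tick(), sim_run()) and assumes a rate of 6554.

```c
#include <stdint.h>

uint32_t sim_fast  = 0;
uint32_t sim_slow  = 0;
uint16_t sim_phase = 0;
uint16_t sim_rate  = 6554;   /* about 1/10: 6554/65536 */

/* One timer tick: the slow counter advances whenever the 16-bit
   phase accumulator wraps around. */
void sim_tick(void)
{
    sim_fast++;
    sim_phase = (uint16_t)(sim_phase + sim_rate);  /* wraps modulo 65536 */
    sim_slow += (sim_phase < sim_rate);            /* 1 exactly when the add wrapped */
}

void sim_run(uint32_t ticks)
{
    while (ticks--)
        sim_tick();
}
```

After exactly 65536 fast ticks the accumulator has wrapped 6554 times and is back at zero, so slow/fast is exactly rate/65536 over that period.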

Do note that this is not what I'd necessarily go with, because I haven't used ATtiny85's for anything time critical, and I just whipped up the above code without testing it on actual hardware.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: cv007 on January 15, 2019, 02:17:56 am
Quote
Anyway, I've decided to... do not use do-while loop, but instead added correction for time - something like back in time
So you want to replace 36 bytes of code/14 instructions,  with 52 bytes/22 instructions? You certainly can do whatever you want to. I'm not sure what you are doing with that flag, as you cannot catch it if you are using the compare irq.

I've probably said enough, and you have enough info already.

Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 15, 2019, 12:56:44 pm
I'm not sure what you are doing with that flag, as you cannot catch it if you are using the compare irq.

One can imagine something similar to the example shown below, described here: http://maxembedded.com/2011/07/avr-timers-ctc-mode/
Code: [Select]
   // loop forever
    while(1)
    {
        // check whether the flag bit is set
        // if set, it means that there has been a compare match
        // and the timer has been cleared
        // use this opportunity to toggle the led
        if (TIFR & (1 << OCF1A)) // NOTE: '>=' used instead of '=='
        {
            PORTC ^= (1 << 0); // toggles the led
        }
 
        // wait! we are not done yet!
        // clear the flag bit manually since there is no ISR to execute
        // clear it by writing '1' to it (as per the datasheet)
        TIFR |= (1 << OCF1A);
 
        // yeah, now we are done!
    }
We would like to disable the ISR interrupt, so simply by adding a few lines of code I can now use the same time function to catch time differences in a loop without wasting time in the ISR, when the time between time retrievals is less than 100us. So this TIFR OCF0A flag can be very useful:
Code: [Select]
inline uint32_t avr_time_us_get_inline() {

uint8_t x0 = TCNT0;

uint8_t time_flag_is;
if( time_flag_is= ((TIFR & (1<<OCF0A) )==0 ? 0 : 1 ) ) {
// Manually correct time counter while probably disabled ISR
avr_time_counter+= 100;

// Clear OCF0A flag by writing logic 1
TIFR |= (1<<OCF0A);
} // if
....
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: cv007 on January 15, 2019, 04:45:13 pm
Quote
I've probably said enough
And yet I continue :)

Maybe you already know this, or maybe you already have a truckload of tiny85's you need to use, but I'm sure the attiny family of parts will also have something available (even in 8pins) that has an input capture feature which would make it easier to get a more accurate time at the point of the event (whatever it may be), and with higher resolution. You are finding out why they came up with the input capture feature.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 15, 2019, 06:43:01 pm
You are finding out why they came up with the input capture feature.
Maybe, but for the moment, I've managed to quite easily extend the time counter to....
64bit, which allows collecting microseconds for... more than 200000 years  >:D
A 32bit time counter would overflow after about an hour, but thanks to removing the "volatile" qualifier,
when only the microseconds from the last second or so are needed, the assembler code looks much better when loading this 'avr_time_count_max'
variable into a uint32_t variable - the unneeded LDS for hi32(avr_time_count) are not generated, so we are able to do "time traveling" over hundreds/thousands of years and still get event timings with microsecond resolution when needed  8)

Code: [Select]
#ifdef AVR_TIME_COUNTER_UINT64T
// (%i1) avr_time_count_max: 2^64;
// (%o1)                        18446744073709551616
// (%i2) avr_time_count_max/1000000.0/31536000.0;
// (%o2)                          584942.417  [years]
static uint64_t avr_time_counter;
#else
// (%i3)  avr_time_count_max: 2^32;
// (%o3)                             4294967296
// (%i6) avr_time_count_max/1000000.0/3600;
// (%o6)                          1.193046471111111  [hours]
static uint32_t avr_time_counter;
#endif // AVR_TIME_COUNTER_UINT64T

I hope that the hardware clears the timer interrupt flag at the beginning of ISR execution, but that is easy to debug, so time to run some code and see the logic analyser outputs  :-/O

BTW: I was surprised that there is no left shift taking the number of bits as a second parameter on those tiny "RISCy" AVR's  :palm:
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Yansi on January 15, 2019, 11:22:26 pm
I can't get the idea, why would one need such large system tick counter.

Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it for 2000000 years long, unless you want to measure time intervals of such size.

A uint16_t millisecond timer is sufficient for 65 seconds. If no time interval used in the application will be larger than that, there is little to no use in making it larger.

It overflows after the minute or so I hear you saying? So what? There's no problem, if you know how to write code correctly, to accommodate for the overflow.

Tip: if writing delays or checking for intervals, use  condition written as this:  (actual_time - time_stamp > delay_interval). This way the wrap-around will work correctly.
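
A minimal sketch of that tip in C, assuming a 16-bit tick counter; interval_elapsed() is a name made up for the example. One thing to watch when testing on a PC: on AVR, int is 16 bits, so the subtraction of two uint16_t values wraps by itself, but on a host with a wider int the operands are promoted and the difference can go negative, hence the explicit cast.

```c
#include <stdint.h>
#include <stdbool.h>

/* True once more than delay_interval ticks have passed since time_stamp,
   even if the 16-bit tick counter has wrapped around in between. */
bool interval_elapsed(uint16_t now, uint16_t time_stamp, uint16_t delay_interval)
{
    /* The cast forces the difference back to 16 bits, so it wraps
       modulo 65536 regardless of how wide int is on the build machine. */
    return (uint16_t)(now - time_stamp) > delay_interval;
}
```

For example, with time_stamp = 65530 and now = 5 the counter has wrapped, and the elapsed time still comes out as 11 ticks. This only works while the real interval is shorter than one full counter period.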
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: tggzzz on January 15, 2019, 11:47:14 pm
I can't get the idea, why would one need such large system tick counter.

Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it for 2000000 years long, unless you want to measure time intervals of such size.

That rather depends on the processor, of course.

The XMOS xCORE processors have 32 bit timers with a 40s interval - because they are counting instructions in their 100MHz/4000MIPS processors. They also have 16 bit timers on each I/O port counting I/O clock cycles at up to 250MHz. The combination of those timers and the architecture means they can guarantee exactly when output will occur or when input did occur - and the program responds with a 10ns latency :)
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Yansi on January 16, 2019, 12:35:40 am
I think that does not disprove anything I've stated above.

Any modern 32bit architecture provides means of clock counting.

And BTW, I wouldn't touch XMOS chips with a stick. Blargh.   
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 16, 2019, 12:49:39 am
I can't get the idea, why would one need such large system tick counter.

Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it for 2000000 years long, unless you want to measure time intervals of such size.

That rather depends on the processor, of course.

There is not much room for fancy playing with timers on the 8 pin ATtiny85, since they are 8bit. With an 8MHz/16MHz clock and a prescaler of 8, the timer in CTC mode has its compare match at 100-1/200-1, which means that just by catching time differences of the timer counter TCNT0 we get microsecond time differences within a 100us window. A 64bit time counter can be used to catch events longer than 1 hour, but for shorter time differences even a 16bit time counter can be used, by using only 2 bytes of the longer time counter.

Regardless of how many bytes we use to implement a timer counter longer than the hardware 8bit timers on this device,
the key is still to synchronize reading the extended counter with the real hardware TCNT0 timer register if we need something bigger than a 100us time difference. Even to stay below a millisecond we need another software time counter byte, so a 64bit/32bit/16bit or even 48bit timer counter can be implemented in software, depending on the application.

Since I've worked out how to include assembler code in AVR C to quite easily increment timer counters of any byte size in the ISR - e.g. 64bit in the example code below - I can define e.g. AVR_TIME_COUNTER_UINT48 if wanted and not lose time on 64 bits  8)
Code: [Select]
//ISR
...
asm volatile (
"add %[ratc0], %[rinc] \n\t"
"adc %[ratc1], __zero_reg__ \n\t"
"adc %[ratc2], __zero_reg__ \n\t"
"adc %[ratc3], __zero_reg__ \n\t"
#ifdef AVR_TIME_COUNTER_UINT64T
"adc %[ratc4], __zero_reg__ \n\t"
"adc %[ratc5], __zero_reg__ \n\t"
"adc %[ratc6], __zero_reg__ \n\t"
"adc %[ratc7], __zero_reg__ \n\t"
#endif // AVR_TIME_COUNTER_UINT64T
"sts avr_time_counter, %[ratc0] \n\t"
"sts avr_time_counter+1, %[ratc1] \n\t"
"sts avr_time_counter+2, %[ratc2] \n\t"
"sts avr_time_counter+3, %[ratc3] \n\t"
#ifdef AVR_TIME_COUNTER_UINT64T
"sts avr_time_counter+4, %[ratc4] \n\t"
"sts avr_time_counter+5, %[ratc5] \n\t"
"sts avr_time_counter+6, %[ratc6] \n\t"
"sts avr_time_counter+7, %[ratc7] \n\t"
#endif // AVR_TIME_COUNTER_UINT64T
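: // outputs: the counter bytes are modified by add/adc, so they are read-write ("+") operands
[ratc0]"+a"(*(patc+0) ), [ratc1]"+a"(*(patc+1)),[ratc2]"+a"(*(patc+2)),[ratc3]"+a"(*(patc+3))
#ifdef AVR_TIME_COUNTER_UINT64T
,[ratc4]"+a"(*(patc+4)), [ratc5]"+a"(*(patc+5)),[ratc6]"+a"(*(patc+6)),[ratc7]"+a"(*(patc+7))
#endif // AVR_TIME_COUNTER_UINT64T
: // inputs
[rinc]"r"(rinc)
: "memory" // the sts instructions write avr_time_counter directly
);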
...
reti
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: NorthGuy on January 16, 2019, 01:48:47 am
Regardless of how many bytes we use to implement a timer counter longer than the hardware 8bit timers on this device,
the key is still to synchronize reading the extended counter with the real hardware TCNT0 timer register if we need something bigger than a 100us time difference. Even to stay below a millisecond we need another software time counter byte, so a 64bit/32bit/16bit or even 48bit timer counter can be implemented in software, depending on the application.

Modern PIC16s let you concatenate hardware timers, so you can create a big timer in hardware. Not sure about 64-bit though.

Everything is related to time. You may have tasks which need very fine time resolution. Your code also executes in time. Therefore, if you want fine time resolution, you either need to count every cycle of your code, or you will need to use something which is not affected by code execution timing, such as CCP modules.

When you need to measure big periods of time, such as days, high resolution cannot be achieved, and even if it could, you usually don't need it for the tasks which measure time in days. Say, if you want to make a backup every day, it doesn't matter if it is a few minutes early or late.

Therefore, it is silly to use one single timer for everything. Chips will usually have multiple timers which you can use for different purposes.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Nominal Animal on January 16, 2019, 05:38:34 am
Here is a proper suggestion for ATtiny85, for a 32-bit TIMER0 counter:
Code: [Select]
#include <stdint.h>

static volatile uint8_t   avr_timer_updates[2];
static volatile uint32_t  avr_timer_counter;

extern uint32_t get_timer(void);
extern uint32_t get_timer_coarse(void);
with the TIMER0 overflow interrupt, get_timer(), and get_timer_coarse() functions implemented in assembly in asm-timer0.s:
Code: [Select]
        .file "asm-timer0.s"
        ; SPDX-License-Identifier: CC0-1.0

        __SP_H__ = 0x3e
        __SP_L__ = 0x3d
        __SREG__ = 0x3f
        __tmp_reg__ = 0
        __zero_reg__ = 1

        .text

        ;
        ; timer0 overflow interrupt vector
        ;
        .global __vector_5
        .type   __vector_5, @function
__vector_5:
        ; ISR prolog
        push    r1
        push    r0
        in      r0, __SREG__
        push    r0
        clr     __zero_reg__

        push    r20
        push    r19
        push    r18

        lds     r20, avr_timer_updates+1
        inc     r20
        sts     avr_timer_updates+1, r20

        in      r19, 0x29                   ; OCR0A
        lds     r18, avr_timer_counter+0
        add     r18, r19
        sts     avr_timer_counter+0, r18
        brcc    .done

        lds     r18, avr_timer_counter+1
        inc     r18
        sts     avr_timer_counter+1, r18
        brne    .done

        lds     r18, avr_timer_counter+2
        inc     r18
        sts     avr_timer_counter+2, r18
        brne    .done

        lds     r18, avr_timer_counter+3
        inc     r18
        sts     avr_timer_counter+3, r18

.done:
        sts     avr_timer_updates+0, r20
        pop     r18
        pop     r19
        pop     r20

        ; ISR epilog
        pop     r0
        out     __SREG__, r0
        pop     r0
        pop     r1
        reti

        .size   __vector_5, .-__vector_5


        .global get_timer
        .type   get_timer, @function
get_timer:
        lds     r21, avr_timer_updates+0

        lds     r22, avr_timer_counter+0
        lds     r23, avr_timer_counter+1
        lds     r24, avr_timer_counter+2
        lds     r25, avr_timer_counter+3
        in      r18, 0x32                   ; r18 = TCNT0
        in      r19, 0x38                   ; r19 = TIFR

        sbrc    r19, 1                      ; TOV0
        rjmp    get_timer

        lds     r20, avr_timer_updates+1
        cpse    r20, r21
        rjmp    get_timer

        add     r22, r18
        adc     r23, __zero_reg__
        adc     r24, __zero_reg__
        adc     r25, __zero_reg__
        ret

        .size   get_timer, .-get_timer


        .global get_timer_coarse
        .type   get_timer_coarse, @function
get_timer_coarse:
        lds     r21, avr_timer_updates+0

        lds     r22, avr_timer_counter+0
        lds     r23, avr_timer_counter+1
        lds     r24, avr_timer_counter+2
        lds     r25, avr_timer_counter+3

        lds     r20, avr_timer_updates+1
        cpse    r20, r21
        rjmp    get_timer_coarse
        ret

        .size   get_timer_coarse, .-get_timer_coarse

        .comm   avr_timer_counter,4,1
        .comm   avr_timer_updates,2,1
Just feed that asm-timer0.s file to avr-gcc as if it was a C file.  Not tested on actual ATtiny85 hardware, but it does compile using old avr-gcc-4.9.2 (avr-gcc-4.9.2 -Wall -mmcu=attiny85 -c asm-timer0.s), and the logic is sound, but do beware of bugs.

The idea is that whenever an overflow interrupt occurs, the avr_timer_counter value is incremented by OCR0A. The second of the avr_timer_updates[2] bytes is incremented before the counter is incremented, and the first after the counter is incremented, so that readers can spin if an interrupt occurs.

The get_timer() function adds TCNT0 to the counter value, so the result is essentially the 32-bit TIMER0 virtual counter. It uses both the avr_timer_updates[2] guard bytes, and TOV0 bit in TIFR to detect if the combined 32-bit counter is valid.  If interrupts occur too often, it might spin forever; so test before use.

The get_timer_coarse() function omits the TCNT0 and TOV0 bit checks, and so is more lightweight, although the value is coarser.  Although the timers are derived from the same source, you should not mix the values, unless you are prepared for get_timer_coarse() < get_timer() even if obtained at the very same moment somehow.

The timer ISR itself is a bit tricky, as (256/OCR0A) of ticks only take 25 instructions (I didn't bother to calculate cycle counts), and it uses only six bytes of stack.  In the other cases, it takes 29, 33, or 36 instructions.  If the jitter of a variable-duration TIMER0 ISR is problematic, the code can be changed to a fixed 33 instructions instead (no jumps nor conditional jumps), using
Code: [Select]
        ;
        ; timer0 overflow interrupt vector
        ;
        .global __vector_5
        .type   __vector_5, @function
__vector_5:
        ; ISR prolog
        push    r1
        push    r0
        in      r0, __SREG__
        push    r0
        clr     __zero_reg__

        push    r20
        push    r19
        push    r18

        lds     r20, avr_timer_updates+1
        inc     r20
        sts     avr_timer_updates+1, r20

        in      r19, 0x29                   ; OCR0A
        lds     r18, avr_timer_counter+0
        add     r18, r19
        sts     avr_timer_counter+0, r18

        lds     r18, avr_timer_counter+1
        adc     r18, __zero_reg__
        sts     avr_timer_counter+1, r18

        lds     r18, avr_timer_counter+2
        adc     r18, __zero_reg__
        sts     avr_timer_counter+2, r18

        lds     r18, avr_timer_counter+3
        adc     r18, __zero_reg__
        sts     avr_timer_counter+3, r18

        sts     avr_timer_updates+0, r20
        pop     r18
        pop     r19
        pop     r20

        ; ISR epilog
        pop     r0
        out     __SREG__, r0
        pop     r0
        pop     r1
        reti

        .size   __vector_5, .-__vector_5
I have no idea which one performs better in practice.

Note that the lds+adc+sts pattern, which uses only one register to update all bytes in a multibyte integer, works because neither lds nor sts modifies the carry flag; only adc does.  I think avr-gcc only uses N registers for N-byte integers because that way the externally visible change occurs in one short window, shortening race windows.  In my case, using the two generation/update counters and spinning until they match avoids any need for that.  In an ISR, the pattern is useful because it lessens the amount of stack used.  I could probably reduce the stack use even more, but I just grabbed the ISR prolog and epilog from what avr-gcc generates for ISR(TIMER0_OVF_vect) { ... } when using #include <avr/interrupt.h>.
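For anyone who wants to see the carry chain outside of assembly, the same byte-at-a-time update can be sketched in portable C. The helper name counter_add_u8 is made up for this illustration; the `carry` variable plays the role of the carry flag that survives between the lds/sts pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* Add an 8-bit step to a little-endian multibyte counter one byte at a
   time, mirroring lds / add-or-adc / sts with a single working register. */
void counter_add_u8(volatile uint8_t *bytes, size_t len, uint8_t step)
{
    uint8_t carry = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t  addend = (i == 0) ? step : 0;   /* add, then adc with zero */
        uint16_t sum    = (uint16_t)(bytes[i] + addend + carry);
        bytes[i] = (uint8_t)sum;                 /* sts */
        carry    = (uint8_t)(sum >> 8);          /* carry into the next byte */
    }
}
```

Each byte is loaded, updated and stored before the next one is touched, which is why the intermediate state is visible to readers unless something like the guard bytes protects it.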
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 16, 2019, 09:25:18 am
Note that lds+adc+sts pattern that uses only one register to update all bytes in a multibyte integer works, because neither lds nor sts modify the carry flag; only adc does.  I think avr-gcc only uses N registers for N-byte integers, because that way the externally visible change occurs in one short window, shortening race windows.
I was thinking about the lds+adc+sts pattern, but I am still learning how to include assembler code inside AVR C code, and struggled to pass "avr_time_counter" so that a given byte of the multibyte uint32_t or uint64_t ends up in a register,
so for the moment the assembler listing of the ISR looks like this. Of course, the fewer "push/pop" stack operations the better, since each costs 2 cycles.
Code: [Select]
// ISR
...
  51 0034 E0E0      ldi r30,lo8(avr_time_counter)
  52 0036 F0E0      ldi r31,hi8(avr_time_counter)
  53 0038 2081      ld r18,Z
  54 003a 3181      ldd r19,Z+1
  55 003c 4281      ldd r20,Z+2
  56 003e 5381      ldd r21,Z+3
  57 0040 6481      ldd r22,Z+4
  58 0042 7581      ldd r23,Z+5
  59 0044 1681      ldd r17,Z+6
  60 0046 0781      ldd r16,Z+7
  61 0048 84E6      ldi r24,lo8(100)
  62                /* #APP */
  63                ;  277 "avr_utils.c" 1
  64 004a 280F      add r18, r24
  65 004c 311D      adc r19, __zero_reg__
  66 004e 411D      adc r20, __zero_reg__
  67 0050 511D      adc r21, __zero_reg__
  68 0052 611D      adc r22, __zero_reg__
  69 0054 711D      adc r23, __zero_reg__
  70 0056 111D      adc r17, __zero_reg__
  71 0058 011D      adc r16, __zero_reg__
...

Note that those ldi/ld/ldd instructions were added automatically by the AVR C compiler,
when I passed those registers as shown a few posts above:
Code: [Select]
...
asm volatile(....:: [ratc0]"a"(*(patc+0) ), [ratc1]"a"(*(patc+1)),[ratc2]"a"(*(patc+2)),[ratc3]"a"(*(patc+3)),... );
...
Not sure how to implement something like this in inline assembler:
Code: [Select]
asm volatile( "lds %[ratcX], avr_time_counter+X \n\t sts avr_time_counter+X, %[ratcX] " :???:??? );
where "ratcX" is byte X of the N-byte time counter  :-//

Update: Never mind - I've found the answer on how to use LDS in AVR C inline assembler here: https://www.avrfreaks.net/forum/inline-asm-3
Code: [Select]
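#include <stdint.h>

uint8_t input = 100;

/* Load the global 'input' with LDS. The symbol name is written directly
   into the asm template, so it is not an operand the compiler knows about. */
static inline uint8_t test(void) {
    uint8_t ok;

    asm volatile(
        "lds %[ok], input" "\n\t"
        : [ok] "=d" (ok)
        :
    );
    return ok;   /* return the loaded byte so the load is not wasted */
}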
Further optimisations to the timer ISR can now be made easily, if needed, to support any multibyte time counter  :popcorn:

Anyway, that is yet another possible optimisation, but I believe TCNT0 alone should be enough to make a decent guard, and more.  I'm more interested in the time differences between calls to the time retrieval function, so I think the only way is to use the TCNT0 value as it was when the get time function was called.  Any further hardware updates to TCNT0 while the rest of the get time function executes are not so good, especially reading TCNT0 again and adding the changed value to the software time counter while time is still running.  So in my implementation, when the get time function is interrupted, after TCNT0 was read at the beginning, by the ISR that maintains the software time counter (whatever its size in bytes), I simply try to correct for this.  When the ISR finishes, we have an updated time counter and TCNT0 counting from 0 again in CTC mode, so we might have quite a decent time, but it may be different (later) than the moment we called the get time function, since you are in a do-while loop waiting for the ISR to complete  ???

That is why I marked one of those ISR assembler listings "Patent pending :D", just for fun, to signal that something else is going on in my get time implementation, since this is what I want to achieve: an extended software time counter combined with the real-time TCNT0 updated by the timer hardware at microsecond intervals, on an 8MHz ATtiny85 with a 10kHz timer ISR  :)
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 16, 2019, 01:20:05 pm
The 48bit time counter ISR, incrementing by 100 for the 10kHz timer at 8MHz, now looks good written in AVR C inline assembler - only one register, r24, needs push/pop, plus the temporary r0 and the zero register r1  8)
The time counter is stored as "uint64_t", but the ISR increments only 6 bytes, which is enough to store microseconds for more than a year:
Code: [Select]
// (%i2) avr_time_count_max: 2^48;
// (%o2)                           281474976710656
// (%i3) avr_time_count_max/1000000.0/31536000.0;
// (%o3)                          8.92551296013 [years]

(https://www.eevblog.com/forum/microcontrollers/fast-unsigned-integer-multiply-by-x100-on-8bit-avr/?action=dlattach;attach=624871)
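As a rough cross-check of how long a 48-bit microsecond counter lasts, the sketch below counts whole days until an n-bit counter wraps. The function name wrap_days is made up for this illustration. For 48 bits it gives 3257 days, which is a little under nine years if a year is taken as 365 days.

```c
#include <stdint.h>

/* Whole days until an n-bit microsecond counter wraps around.
   Valid for 1 <= bits <= 63. */
uint64_t wrap_days(unsigned bits)
{
    const uint64_t us_per_day = UINT64_C(1000000) * 60 * 60 * 24;
    return (UINT64_C(1) << bits) / us_per_day;
}
```

By the same arithmetic a 32-bit microsecond counter wraps in well under a day (about 71.6 minutes), which is why the extra two bytes matter here.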
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: mikerj on January 16, 2019, 01:36:22 pm
I can't get the idea, why would one need such large system tick counter.

Typically, it is just used for timing of tasks, probably some coarse interval measurements and such. There is no need to make it for 2000000 years long, unless you want to measure time intervals of such size.

uint16_t millisecond timer is sufficient for 65 seconds. If no time interval used in the application will be larger than that, there is little to no use of making it larger.

It overflows after the minute or so I hear you saying? So what? There's no problem, if you know how to write code correctly, to accommodate for the overflow.

Tip: if writing delays or checking for intervals, use  condition written as this:  (actual_time - time_stamp > delay_interval). This way the wrap-around will work correctly.


Not actually true for a free running timer, which is what you are describing in the last sentence.  The longest time you can cover from a 16 bit free running timer incrementing at a 1ms interval is 32.7 seconds.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Yansi on January 16, 2019, 02:45:10 pm
Why so?

If I make interval > 1/2 of the range, then it still wraps correctly.

For example having a 3bit counter, I can still use that to produce interval over 3, say for example 5.

So for example if the timestamp is 2, then at time of 7 i get the difference of 5, at time of  0 I get 6, works correctly.

If I pick another timestamp, for example 5, then at time of 2 I get the difference of 5, at time of 3, I get 6.

So where do you see the problem? Because I don't.
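The 3-bit example above can be checked with a few lines of C. This is only a sketch with made-up helper names; it relies on nothing more than unsigned arithmetic wrapping modulo the counter range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Elapsed ticks on a free-running 16-bit counter. The subtraction wraps
   modulo 65536, so the result is right for any true gap below 65536. */
uint16_t elapsed_u16(uint16_t now, uint16_t stamp)
{
    return (uint16_t)(now - stamp);
}

/* The same idea on a 3-bit counter (modulo 8). */
uint8_t elapsed_3bit(uint8_t now, uint8_t stamp)
{
    return (uint8_t)((now - stamp) & 7u);
}

/* The condition from the tip: true once more than 'delay' ticks have passed. */
bool interval_expired(uint16_t now, uint16_t stamp, uint16_t delay)
{
    return elapsed_u16(now, stamp) > delay;
}
```

With a timestamp of 2, times 7 and 0 give differences 5 and 6; with a timestamp of 5, times 2 and 3 give 5 and 6 again, matching the numbers worked out by hand.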
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Leiothrix on January 16, 2019, 10:23:40 pm
The longest time you can cover from a 16 bit free running timer incrementing a 1ms interval is 32.7 seconds.

No, 65.5 seconds.  You'd use an unsigned int to hold the counter, not signed. 
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 18, 2019, 11:01:16 pm
For short time differences below 65ms, with the ISR disabled but the hardware timer incrementing TCNT0 every 1us, this code
Code: [Select]
inline uint16_t avr_time_us_fast_inline(uint16_t *time_us_prev, uint8_t *t0_prev) {
    uint8_t t0 = TCNT0;

    // static uint8_t t0_prev = 0;
    // static uint16_t time_us_prev = 0;

    // TCNT0 went backwards: the timer wrapped from 99 to 0 since the last call
    if (t0 < (*t0_prev)) {
        (*time_us_prev) += 100;
    }

    uint16_t time_us = (*time_us_prev) + t0;

    // the returned value, t0 included, becomes the base for the next call
    (*time_us_prev) = time_us;
    (*t0_prev) = t0;

    return time_us;
}

for fast timing retrieval, once optimized by AVR C, looks very good, since there are no memory accesses even though the function takes pointers  8)
Below is an example where this function is used. In the AVR C assembler listing, the only memory accesses generated are the LDD/STD accesses to the volatile variables - there are none for the function's pointer arguments, since the optimized code uses only registers for them
Code: [Select]
// forever init
volatile uint8_t time_fast_is = 0;
volatile uint8_t time_fast_is_copy = 1;
volatile uint8_t time_fast_is_copy2 = 2;

uint8_t t0_prev = 0;
uint16_t time_us_prev = 0;

uint16_t time_us_fast_prev = avr_time_us_fast_inline(&time_us_prev, &t0_prev);

// forever
for (;;) {
    uint16_t time_us_fast = avr_time_us_fast_inline(&time_us_prev, &t0_prev);

    uint16_t dtime_us_fast = time_us_fast - time_us_fast_prev;

    if (dtime_us_fast > 0) {
        time_fast_is = 1;
    } else {
        time_fast_is = 0;
    }

    time_us_fast_prev = time_us_fast;

    time_fast_is_copy = time_fast_is;
    time_fast_is_copy2 = time_fast_is_copy;
} // forever


Assembler listing of section above with inline avr_time_us_fast_inline function optimized:
Code: [Select]
 214 0116 1B82      std Y+3,__zero_reg__
 215 0118 81E0      ldi r24,lo8(1)
 216 011a 8A83      std Y+2,r24
 217 011c 82E0      ldi r24,lo8(2)
 218 011e 8983      std Y+1,r24
 219 0120 42B7      in r20,0x32
 220 0122 242F      mov r18,r20
 221 0124 30E0      ldi r19,0
 222 0126 61E0      ldi r22,lo8(1)
 223                .L14:
 224 0128 52B7      in r21,0x32
 225 012a C901      movw r24,r18
 226 012c 5417      cp r21,r20
 227 012e 00F4      brsh .L11
 228 0130 8C59      subi r24,-100
 229 0132 9F4F      sbci r25,-1
 230                .L11:
 231 0134 850F      add r24,r21
 232 0136 911D      adc r25,__zero_reg__
 233 0138 2817      cp r18,r24
 234 013a 3907      cpc r19,r25
 235 013c 01F0      breq .L12
 236 013e 6B83      std Y+3,r22
 237                .L13:
 238 0140 2B81      ldd r18,Y+3
 239 0142 2A83      std Y+2,r18
 240 0144 2A81      ldd r18,Y+2
 241 0146 2983      std Y+1,r18
 242 0148 452F      mov r20,r21
 243 014a 9C01      movw r18,r24
 244 014c 00C0      rjmp .L14

I haven't had time to test this on a real MPU, but the assembler code looks very fast. I'm not sure whether we can reach 1us per iteration in a "brute force" test "forever" loop with the ISR disabled, but it should be very fast - for example when waiting to measure the time difference between pin changes ...

Of course, the uint16_t time clock used here will overflow after 65ms, but that should sometimes be enough time for e.g. reading a packet, etc...
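One thing to watch in the function as posted: the value stored back into *time_us_prev already includes TCNT0, so the next call adds TCNT0 on top of it again (two calls at TCNT0 = 10 and then 20 return 10 and then 30). A minimal sketch of a variant that adds only the ticks elapsed since the previous call is below. The names fast_clock and fast_clock_update are made up, and TCNT0 is passed in as a parameter so the logic can be tried on a PC; on the ATtiny85 the caller would pass TCNT0 itself.

```c
#include <stdint.h>

#define TIMER_PERIOD 100u   /* CTC mode: TCNT0 counts 0..99 */

/* State kept by the caller, so an inlined build can hold it in registers. */
typedef struct {
    uint16_t time_us;   /* running total in timer ticks (1 us each) */
    uint8_t  t0_prev;   /* TCNT0 value seen by the previous call */
} fast_clock;

/* Advance the clock by the ticks elapsed since the previous call.
   Assumes at most one timer wrap between consecutive calls. */
uint16_t fast_clock_update(fast_clock *clk, uint8_t t0)
{
    uint8_t delta = (t0 >= clk->t0_prev)
        ? (uint8_t)(t0 - clk->t0_prev)
        : (uint8_t)(t0 + TIMER_PERIOD - clk->t0_prev);   /* wrapped once */
    clk->time_us = (uint16_t)(clk->time_us + delta);
    clk->t0_prev = t0;
    return clk->time_us;
}
```

The restriction is the same as before: the function has to be called at least once per 100 ticks, otherwise a whole period goes missing.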
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: NorthGuy on January 18, 2019, 11:42:48 pm
For short time differences lower than 65ms with ISR disabled, but timer incrementing in hardware TCNT0 each 1us, this code ...

appears to be too complex. Say, with timer overflowing at 256 you can do:

Code: [Select]
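#include <stdint.h>
#include <avr/io.h>   // TCNT0

// Overlay of the two bytes on the 16-bit value; relies on avr-gcc being
// little-endian and on anonymous structs (C11 or GNU C).
typedef union {
  struct {
    uint8_t time_low;
    uint8_t time_high;
  };
  uint16_t time;
} time_t;

uint16_t get_time() {
  static time_t cur_time;
  uint8_t t;

  // TCNT0 lower than last time: the 8-bit timer wrapped past 255
  if (cur_time.time_low > (t = TCNT0)) {
    cur_time.time_high ++;
  }
  cur_time.time_low = t;

  return cur_time.time;
}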

which does exactly the same as yours, although I don't think I would do this in real life. The necessity of calling it very often is too restrictive.
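To show why the calling frequency matters, here is a host-runnable sketch of the same high-byte trick without the union. The name extend_time is made up, and the TCNT0 value is passed in as `low` instead of being read from the hardware.

```c
#include <stdint.h>

/* Extend an 8-bit free-running counter to 16 bits without an ISR.
   'low' is the current TCNT0 value, passed in so this can run on a host. */
uint16_t extend_time(uint16_t *cur, uint8_t low)
{
    uint8_t high = (uint8_t)(*cur >> 8);
    if ((uint8_t)*cur > low) {
        high++;                 /* low byte went backwards: count one wrap */
    }
    *cur = (uint16_t)(((uint16_t)high << 8) | low);
    return *cur;
}
```

If the true time goes from 306 to 606 between two calls, the second call sees a low byte of 94, which is not below the previous 50, so no wrap is counted and the result is 350 rather than 606. Calls spaced less than 256 ticks apart never hit this.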

Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on January 20, 2019, 06:23:24 pm
Say, with timer overflowing at 256 you can do:
...
which does exactly the same as yours, although I don't think I would do this in real life. The necessity of calling it very often is too restrictive.
My timer in CTC mode restarts from 0 after reaching 99, so it is not the same. However, in the assembler code, incrementing by one looks similar to adding 100. Also, I do not have to call this function too often: when I know that e.g. the pin change interrupts while decoding some input bit stream are less than 100us apart, then it is sufficient to get the time during the pin change interrupt and store it to calculate differences, so it can be useful sometimes.

It is always worth looking at the generated assembler code to make sure we are not losing too much time; these experiments showed that playing with different C static/inline hints sometimes leads to interesting low-level code, e.g. in your case this variable
Code: [Select]
  static time_t cur_time;
will not be optimized into registers only, and you will probably get LDS/STS instructions to read and store it in the generated code; being static inside the "get_time" function, it is also difficult to reinitialize. That is why I pass those variables as pointers,
which in the "inline" version seem to be optimized so that registers are used, without the LDS/STS instructions you would probably get in your case ;)
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: NorthGuy on January 20, 2019, 08:47:22 pm
My timer in CTC mode start from 0 after reaching 99

Make it roll after 255. Are you trying to make your life more difficult on purpose?

in your case for this variable
Code: [Select]
  static time_t cur_time;
will not be optimized by using only registers

Of course not. It is a long-term variable holding the time.

There's no reason to speculate about the assembler. Just compile and post.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: rhb on January 20, 2019, 10:35:05 pm
I've not followed or read this thread, but skimming over it I thought this might prove useful.


From "Hacker's Delight" by Henry S. Warren, Jr.

Edit:

100x = (32x - 8x + x) * 4

3 shifts and 2 adds
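Written out in C, the identity from the edit above could look like the sketch below. The function name mul100_sub is made up; the intermediate 8x is reused to build 32x, so the shifts are by 3, 2 and 2 bit positions.

```c
#include <stdint.h>

/* 100*x = (32x - 8x + x) * 4, computed modulo 2^32. */
uint32_t mul100_sub(uint32_t x)
{
    uint32_t x8  = x << 3;        /* 8x               */
    uint32_t x32 = x8 << 2;       /* 32x, built on 8x */
    return (x32 - x8 + x) << 2;   /* 25x * 4          */
}
```

Because everything is unsigned, the result agrees with x * 100 even when the product overflows 32 bits.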
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: westfw on January 21, 2019, 06:07:14 am
Quote
100x = (32x - 8x + x) * 4
3 shifts and 2 adds
The original post had 3 shifts and 4 adds - theoretically only slightly worse.
The problem is that a 32bit shift is not particularly "inexpensive" on an AVR (which takes at least three instructions to shift 32bits by one position), and the original effort wouldn't factor the shifts to notice that 32x = 4*8x, or equivalent.

Since then, much of the discussion has been about how to re-think the overall program so that you never need to multiply by 100 in the first place.
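For reference, the factoring mentioned above can be spelled out by hand. This sketch assumes the decomposition from the first post was 100x = 64x + 32x + 4x (the shift counts 2, 5 and 6); the function name mul100_chain is made up.

```c
#include <stdint.h>

/* 100*x = 64x + 32x + 4x, each term built from the previous one. */
uint32_t mul100_chain(uint32_t x)
{
    uint32_t x4  = x << 2;     /* 2 single-bit shifts */
    uint32_t x32 = x4 << 3;    /* 3 more              */
    uint32_t x64 = x32 << 1;   /* 1 more: 6 in total instead of 2 + 5 + 6 = 13 */
    return x64 + x32 + x4;
}
```

Whether the compiler keeps the chain this way depends on the optimisation settings, so it is still worth checking the listing.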
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: rhb on January 21, 2019, 11:35:37 am
All I'd  meant to do was post the section from "Hacker's Delight"  which is full of obscure tricks. 

Unfortunately I read enough while scanning it  that the edit popped into my head after I got in bed.  I knew it was not going  to leave me alone unless I added the edit.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: splin on January 22, 2019, 06:11:18 am

Anyway, I've decided to... do not use do-while loop, but instead added correction for time - something like back in time  :popcorn:

(https://www.eevblog.com/forum/microcontrollers/fast-unsigned-integer-multiply-by-x100-on-8bit-avr/?action=dlattach;attach=623758)

Quote
but probably useless since ISR clear this flag during its execution
You cannot make use of that flag if using the isr.

But, I've used this flag as you can see in assembler code above  >:D

Warning: Do not try this code at home - it is not tested yet, but "Patent pending :D"  :o

I wouldn't bother with that patent as it doesn't work  :--

It seems that what you are trying to do is:

Quote
When added at the begining of avr_time_us_get() wait for OCF0A cleared in TIFR by ISR during its execution, since OCF0A is set when TCNT0 is reset to 0 in CTC

Ok, so what you need is to wait (for up to 100us) until you detect that OCF0A has changed from set to cleared, at which point you can be sure that the ISR has just executed and you can safely read avr_time_counter knowing that the ISR won't run whilst you're doing so. But as you have already surmised, OCF0A will only be active for a very short time (unless you have interrupts disabled - which I don't believe you do) - specifically from the moment the timer rolls over until the ISR starts.

The exact timings might be published in some application note but it doesn't matter - your non-interrupt code will probably never see it, and if it does, only because it happens to read the TIFR register within that very short time window. Depending on how the MPU is designed there is a possibility that your code could *never* see the OCF0A flag set because it is only set, within each 1us processor clock cycle, at a point *after* the 'in Rx, 0x38' instruction actually reads the flag; at the end of that instruction the ISR will execute, resetting the flag.

If you really want to wait until just after the ISR has executed then have a do while() loop waiting for TCNT0 to change. But I'm pretty sure that isn't what you want as it would waste far too much time - an average of 50us for each call to avr_time_us_get(). In your code you check the OCF0A flag and if it's clear you go on to read the 4 bytes of avr_time_counter - but the ISR can occur at any point after your check of OCF0A, including part way through reading avr_time_counter.

cv007 had the solution - re-read avr_time_counter if TCNT0 has rolled over (but it doesn't need to be in a loop unless you have an ISR that can take 100us or more).
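A minimal sketch of that re-read approach, written so it can be tried on a PC. The timer_source hooks and the sim_timer simulation are made up for this illustration; on the ATtiny85 the hooks would read TCNT0 and avr_time_counter directly, and the sketch assumes the whole read takes less than one timer period.

```c
#include <stdint.h>

/* Hypothetical hardware hooks so the logic can run on a host. */
typedef struct {
    uint8_t  (*read_tcnt0)(void *ctx);
    uint32_t (*read_counter)(void *ctx);
    void     *ctx;
} timer_source;

/* Read the time once; re-read the counter only if TCNT0 rolled over
   (went backwards) while the counter was being read. */
uint32_t read_time_us(const timer_source *src)
{
    uint8_t  before = src->read_tcnt0(src->ctx);
    uint32_t base   = src->read_counter(src->ctx);
    uint8_t  after  = src->read_tcnt0(src->ctx);
    if (after < before) {
        base = src->read_counter(src->ctx);   /* ISR ran: take the new base */
    }
    return base + after;
}

/* Simulated timer (assumption: CTC period of 100 ticks). Every TCNT0 read
   first lets 'step' ticks pass and runs the "ISR" on rollover. */
typedef struct {
    uint32_t counter;
    uint8_t  tcnt0;
    uint8_t  step;
} sim_timer;

static uint8_t sim_read_tcnt0(void *ctx)
{
    sim_timer *sim = ctx;
    sim->tcnt0 = (uint8_t)(sim->tcnt0 + sim->step);
    if (sim->tcnt0 >= 100) {
        sim->tcnt0 = (uint8_t)(sim->tcnt0 - 100);
        sim->counter += 100;
    }
    return sim->tcnt0;
}

static uint32_t sim_read_counter(void *ctx)
{
    sim_timer *sim = ctx;
    return sim->counter;
}

timer_source sim_source(sim_timer *sim)
{
    timer_source src = { sim_read_tcnt0, sim_read_counter, sim };
    return src;
}
```

Starting the simulation at TCNT0 = 95 with 3 ticks per read, the first call sees 98, then a stale counter of 0, then TCNT0 = 1 after the rollover; the re-read picks up 100 and the result is 101 instead of 1.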
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Doctorandus_P on January 26, 2019, 10:07:07 am
If you need some performance, then use a microcontroller that has MUL instructions, or at least a barrel shifter.
If you don't need the performance then why bother?

It all looks like some silly academic exercise to me.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Nominal Animal on January 26, 2019, 01:28:04 pm
If you need some performance, then use a microcontroller that has MUL instructions, or at least a barrel shifter.
If you don't need the performance then why bother?
For the same reason we don't live on the Savannah anymore, clubbing animals on the head, and making single-use tools by knapping flint.

"Just throw money at it!" is not the best option, it is just the easiest one, and one that any monkey with fistfuls of cash and no real skills can do.

In cases like this, where the microcontroller has the necessary performance, but the designer is having difficulty utilizing that, this "academic exercise" has two purposes: One is to save money and resources by using the cheaper hardware, the second is to become a better designer/developer for personal and business reasons by learning how to utilize the microcontroller to its full potential.  To me, that makes perfect business sense, and worth the bother.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: SiliconWizard on January 27, 2019, 08:03:06 pm
Knowing how to use fixed-point or fully integer calculations instead of resorting to floating-point is not just useful for performance reasons on low-end parts.
It's an essential skill every time you need to control the precision of calculations at every stage, something that is much harder (sometimes impossible) to guarantee with floating-point.
It's also an essential skill to actually understand how to properly use floating-point!
Oh, and it's also an essential skill if you do digital design. It's so useful that it's very far from being an academic exercise only.

Lastly, only if you have that skill can you actually judge whether/or when it's appropriate to use it or not.

I know they say ignorance is bliss, but it's certainly not an engineer's best friend.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Doctorandus_P on January 28, 2019, 05:05:06 pm
It seems that you are also simply assuming that the undisclosed "8bit AVR" that OP is using is cheaper than some other 32bit ARM (or other) processor.

There is hardly any correlation between processing speed and price in small microcontrollers nowadays.

I haven't even seen any evidence that price is a concern in this thread.
These "academic exercises" can be (are) useful to jog the brain and improve programming skills. That is exactly what academic exercises are for.

But I see my fault now.
I should not have used the word "silly", that was a fart of the moment and I apologise for that.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Nominal Animal on January 28, 2019, 08:19:07 pm
There is hardly any correlation between processing speed and price in small microcontrollers nowadays.
True, but I'd still claim it is a good idea to make sure you can utilize each microcontroller's subsystems to their fullest extent.

I *am* assuming ATtiny85 here. They happen to be quite cheap, but interestingly powerful microcontrollers. I personally like the original DigiSpark approach, where the less-than-square-inch PCB itself acts as a full-sized USB connector.  There are only a couple of I/O pins, but it opens up a large number of options.  I do normally use ATmega32u4 for USB 1.1 stuff, and various ARM Cortex microcontrollers (esp. Cortex-M4F) for anything that needs any computational oomph.

If it isn't the exact one OP is using, it seems to be darned close; close enough for the discussion to make sense.



In this particular case, we can throw away the stated question (fast integer multiply by 100), and look at the underlying problem OP is working on.

The microcontroller has an 8-bit timer/counter, that is used for a regular timer tick. (There are two other timers, one of them 16-bit if I recall correctly offhand, but in many cases you want to use them for something more important.)

The idea OP has, is to use the actual timer (at or around instruction clock frequency) value as a timer, with the overflow counter just updating the extra bits.  The problem is that to use a decimal rate (some power of ten cycles per second), the 8-bit timer must wrap around at 100 or 200.  Not all use cases need the exact timer, and for best bang for buck and largest number of options, one would prefer to have both a fine (down to timer/counter step) and coarse (just number of overflows) timer values, especially if coarse timer is cheaper to read.

This is not a complicated problem, and definitely does not warrant using a 16- or 32-bit microcontroller.

My suggestion above is the absolute overengineered one, that shows how to do it with zero issues as to cases when the reader is interrupted by the timer overflow itself.  It involves using generation counters, which are the most basic form of spinlocks (although mine is for a single writer and multiple readers only), and is easily formally proven to result in at most two iterations if the timer overflows occur at intervals greater than a few dozen cycles, and allow "atomic" snapshots of both the timer counter and the overflow counter as a single unsigned integer value.  Essentially, you can make it a cycle counter on ATtiny85 if you want, if you write the ISR in assembly.  The fine counter is incremented by the counter limit, so that the full cycle counter value is just a sum of the timer/counter and the overflow count. The coarse counter value is incremented by one.  You can implement either a minimum average cycle version (by only incrementing the least significant bytes when they do not overflow), or a fixed-duration version (whose latency effects are trivial to note and measure at run time, and being absolutely regular, are easy to take into account).
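Leaving the generation counters aside, the fine/coarse bookkeeping described above boils down to something like this sketch. The struct and function names are made up, and the timer/counter value is passed in rather than read from TCNT0.

```c
#include <stdint.h>

/* Two clocks maintained by one overflow handler. */
typedef struct {
    uint32_t fine;     /* advances by the counter limit per overflow */
    uint32_t coarse;   /* advances by one per overflow               */
} tick_counters;

/* What the overflow ISR would do on every timer wrap. */
void on_timer_overflow(tick_counters *t, uint8_t limit)
{
    t->fine   += limit;
    t->coarse += 1;
}

/* Full-resolution time: overflow total plus the live timer/counter value. */
uint32_t fine_time(const tick_counters *t, uint8_t tcnt)
{
    return t->fine + tcnt;
}
```

Because the fine counter already holds a multiple of the limit, the reader only needs one addition and never a multiplication.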

If this was a single one-off product being implemented for a paying customer, I'd agree with Doctorandus_P: then, switching to a more powerful microcontroller gives you much more leeway, and simply Makes Sense.

However, I don't think people like to discuss any single one-off products they're working on here.  So, I am working on the assumption that this is a prototype, or for-learning experimentation.  For that, just discovering the generation counters and using a single ISR to provide multiple clocks running at different rates, makes this thread worthwhile.

Not seeing the value in doing this, and instead recommending using a more powerful microcontroller, is alarming to me.
(This also means my argument may look too aggressive.  Just remember I am arguing against your argument, and not you as a person.)



I also do not think "academic exercise" is anything anyone should use in any derisive manner, ever.

Our technology is advancing at a tremendous pace, and software engineering and programming languages are not keeping up.  This leads to a prevalent belief that we already know everything there is to know about software engineering, and rather than waste time with "academic exercises", the proper cost-effective solution is to throw more hardware at it.

This is simply not true.  I have seen this personally in the HPC world.  Simply put, aside from support of new, much more performant hardware, no real advances have been made in the HPC software engineering side.  Almost all simulations still use process distribution, distribute data only when not doing computation, and avoid threading models, simply because they're too hard -- or "not cost-effective to teach the developers to do", as I've been told.  Data mining, expert systems, and "AI" are nothing new; even self-organizing maps were pretty well known by 1980s.  It's just that now we have the hardware to collect and process the vast amounts of data at timescales that allow even relatively crappy implementations (compared to biological ones!) produce "miraculous" results.

The exact same happened in the automotive world in the United States in the last fifty years or so, when the fuel consumption of a typical car grew, not shrunk, because it was not thought of as important.  It is even funnier to think that the typical "grocery bag" car in the early 1900s in New York was an electric car.  If you've ever read Donald Duck comics, the car Grandma Duck uses is a Detroit Electric from 1916.  Yet, electric cars are somehow thought of as a new innovation.  (The battery technologies and some of the materials tech are, but not the electric car concept, not in the least.)  We would have had cheap home 3D printers in the late 1980s, early 1990s at the latest, if it were not for certain patents that were mainly used to protect existing plastics manufacturing methods from competition.

I cannot stress enough how important it is not to let engineers and designers rely on hardware improvements to keep their work relevant, and stop learning.  It just isn't good for anyone in the long term.  It is a seductive option, because it makes your work easier in the short term;  but in the long term, the side effects make it a poor choice.

To everyone with degrees, I'd remind them that their studies were to prepare them for the real work, the real learning.  The only way you can belittle "academic exercises" is that if you do your work right, develop your skills to the fullest, you'll do much harder work and learn more every day than you ever did in academia.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: tggzzz on January 28, 2019, 10:13:31 pm
Those are very sensible points about "academic exercises", and HPC, and the necessity of academic studies.

The point about academic exercises is that they should enable the key point to be considered, without it getting lost in a morass of boring irrelevant stuff. The lessons learned should then be applicable to far more than merely a single example problem.

Much tech knowledge has a half-life of a few years, e.g. which button to press to get the frobnitz to kazump when the moon is in the third quarter. The understanding gained from academic exercises lasts a lifetime.

I recently returned to embedded software and electronics after a couple of decades doing other things. I was both delighted and horrified at how little had changed since the 1980s - the experience I gained 30 years ago was still directly relevant, so I slotted back in within a couple of weeks!

The major changes are the speed/resolution of ADCs and DACs, nanopower electronics, the ease of making PCBs, and that things are smaller, faster and cheaper. But all the fundamentals and pinch points were horrifyingly unchanged.

(Well, there are a few glimmers of hope, e.g. the capabilities of XMOS xCORE processors with xC, but even they are familiar from the 80s and 70s, respectively)
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: westfw on January 29, 2019, 07:48:34 am
Quote
Simply put, aside from support of new, much more performant hardware, no real advances have been made in the HPC software engineering side.
I'm not quite sure what "HPC" is supposed to mean in this context.  But I think it's a major mistake to omit
"massive performance increases" from the "significant advances" column.

Quote
We would have had cheap home 3D printers in the late 1980s, early 1990s at the latest
An interesting example.  I wonder at your definition of "cheap", given that "low cost" dot matrix printers from that era were $500 to $1000, and the Mac IIci I bought in that timeframe was about $7k (~1MB RAM, 100MB disk, built-in 640*480 graphics, color monitor).   While the technology of the day would have supported the "several stepper motors and a heater" sort of 3D printer, I think I'll claim that the CAD software needed to effectively drive such a printer would have been essentially impossible in that timeframe (on any "reasonable" (but not "low cost"!) home computer).  I mean: no "windows" yet; VGA graphics was "new"; a typical PC had a 16 to 20MHz CPU with 1 to 4MB of RAM.
(Although I did find an ad in a 1988 Byte Magazine for "DesignCad 3D" that claimed to work "even on EGA graphics": https://www.americanradiohistory.com/Archive-Byte/80s/Byte-1988-04.pdf)

Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: westfw on January 29, 2019, 08:09:56 am
Heh.  A whole article on 3D CAD from a 1988 PC Magazine!  "If you're going to be spending $3k for software, you should certainly have at least 640k of RAM, and you might want to spring the extra $3k for one of those new 1024*768 color monitors!"
https://books.google.com/books?id=ObYblXvjuhUC&lpg=PA121&ots=atgG2szPEd&dq=designcad%203d%201988&pg=PA115#v=onepage&q&f=false
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: NorthGuy on January 29, 2019, 02:08:16 pm
Heh.  A whole article on 3D CAD from a 1988 PC Magazine!  "If you're going to be spending $3k for software, you should certainly have at least 640k of RAM, and you might want to spring the extra $3k for one of those new 1024*768 color monitors!"
https://books.google.com/books?id=ObYblXvjuhUC&lpg=PA121&ots=atgG2szPEd&dq=designcad%203d%201988&pg=PA115#v=onepage&q&f=false

Interesting. The resolution of a 1080p monitor (which I'm looking at right now) is not that different - only 2.6 times more, but the CPU frequency is 500 times higher, and the memory is 25000 times bigger. If they could make 3D CAD back then, it should simply fly now.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: T3sl4co1l on January 29, 2019, 03:57:09 pm
And just to bring things a little back to topic here -- an AVR isn't much different in terms of raw computing power, versus a PC-compatible of the day (if certainly not one of the better workstations that you'd want to be running CAD on!).

The main difference is, programming it that way is a pain, and you have to add a ton of peripherals to support that kind of functionality.

Namely: with little SRAM, you need to treat it as a cache against external SRAM and Flash.  Probably the same goes for program memory as well.  Flash can be rewritten live, but it is a wear item, so that wouldn't be such a great idea; more likely, you'd implement a rich operating system, and run programs from external memory as an interpreted virtual machine.

And now that I've speculated about a thoroughly unpleasant system to develop for and use, let's just grab an STM32F4, stick an LCD on it, USB hub, external DRAM and Flash, and run Linux instead. ;-DD

Tim
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Nominal Animal on January 29, 2019, 04:52:20 pm
Quote
Simply put, aside from support of new, much more performant hardware, no real advances have been made in the HPC software engineering side.
I'm not quite sure what "HPC" is supposed to mean in this context.  But I think it's a major mistake to omit
"massive performance increases" from the "significant advances" column.
The massive performance increases stem from new hardware: the software side has not changed.  The software engineering side utilizing that hardware has not kept up, we are simply relying on very old (>25 year old) techniques with very little change since then. Except for new hardware. There is basically nothing new in coding/software engineering; everyone is just coasting on the hardware.

Based on my experience in MD simulations, the software side has really stagnated.  The hardware is not utilized to its full potential; the software folks have not kept up.  Even multithreading is still avoided, using multiprocessing instead.  If you look at GPGPU computation, it treats the GPUs as isolated compute units, very much like separate computers in a cluster environment.  Nothing new, and definitely not using the hardware to its full potential.

I wonder at your definition of "cheap"
I was obviously looking only at the hardware cost. Something as simple as a 6502 is definitely beefy enough to run HPGL or basic G-code.

given that "low cost" dot matrix printers from that era were $500 to $1000
We got a Star LC-10 in 1988, I think, but I do believe it was much cheaper than that.  In 1988, it cost under £200 in the UK (according to adverts).

I'll claim that the CAD software needed to effectively drive such a printer would have been essentially impossible in that timeframe (on any "reasonable" (but not "low cost"!) home computer).
GUI CAD? Absolutely agreed.

But direct path generation via a simple programming language, something between HPGL, Turtle Graphics, and Gcode? I claim that possible.  Didn't you ever write PostScript by hand to run on the printer itself?  (I definitely did, for the first HP LaserJet I got access to, in the early nineties.)

PostScript is a simple, but hugely powerful language. Because of rasterization, it did need surprisingly large amounts of memory (as in often more than on the associated computer, in the early times).  That we use proper CAD and slicing for 3D printing now does not mean it cannot be done much, much simpler.

Anyway, you have good enough points for me to want to amend my claim, to something like (without the patents,) "we might have had", with the point being that the stopping factor was not so much lack of existing technology, or the high cost of most of that technology (GUI CAD design notwithstanding, definitely), but patents obtained for anticompetitive purposes: for plastics manufacturers to use them to reduce competition in their field, reducing the need for further product development.

(While there has been a lot of research and development on the plastic materials themselves, even PLA is a hundred years old invention.  I would not be too surprised to find out that Lego's product development efforts have been a very big driving factor in the precision plastics industry.  Those little toys are surprisingly high-precision bulk-manufactured things.  The tolerances are, and were already in the eighties, absolutely ridiculous for the blocks to attach and detach hundreds of times with very consistent friction fit.  That should tell a lot about what kind of engineering/development actually pushes the world forwards.)



To clarify, my point was to show that in software engineering, for decades we have not done what is possible, only what is easy or makes short-term business sense.  The development in hardware has masked the software stagnation, but the stagnation is nevertheless obvious in my opinion.  Using other engineering areas like electric cars in the automotive world for comparison, this stagnation seems very costly, although calculating its exact price is very difficult: it is hard to say how much you lose by only using a fraction of the available tools.

While I do complain about the difficulty in getting funding for overcoming that stagnation by example, I understand the reluctance.  I do not agree, but I understand.  Funds are limited, and the risk/benefit ratio hard to estimate.  Hardware is easy and safe.

What I fail to understand, is the unfounded assertion that there is no need to overcome that; that it is somehow unprofessional or wasteful for an engineer to try to do that; that the core of what an engineer or scientist does is something other than learning and sharing that knowledge, even in product form; that the proper engineering approach is to throw more hardware at it and keep going like we always have in the software side.  I see no evidence supporting that approach. It makes no sense in the medium to long term; it only makes sense in the short term, for one-off commercial products and services.

I suspect that many have accepted that approach axiomatically, without examining it, because it does feel good to think that what you know and can do now will tide you over for the rest of your life, and is therefore emotionally very attractive as an axiomatic approach to your profession: it says you're complete now, no need to struggle to keep up anymore.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Nominal Animal on January 29, 2019, 05:01:39 pm
If they could make 3D CAD back then, it should simply fly now.
If we look at the differences in the approach, our current tools seem to heavily lean towards a what-you-see-is-what-you-get visual representation.  That was not always the case, not even for word processors.

This is why I think the CAD software used would have been different; more abstract.

Mathematical solid geometry modelling like OpenSCAD could have been possible, maybe.  I did not suggest it above, because I think the amount of data generated would have been costly to store (and I'm too lazy to work out if cassette tape drive data rates would suffice, and whether the entire tape approach would work); but slicing the models would have definitely been too slow to do in real time.  Dedicated helper processors, maybe?  Simon's Basic equivalent cartridge on the C64, but for 3D printing?  Not likely, but I don't think it impossible, either; hits me straight in the Uncanny Valley, whenever I think about it.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: NorthGuy on January 29, 2019, 05:29:36 pm
The development in hardware has masked the software stagnation, but the stagnation is nevertheless obvious in my opinion.

The rapid growth in hardware was the cause of the "software stagnation". The hardware growth has now slowed down, but another, much worse factor is starting to influence the software industry. Lots of software went free and open source. There's no money in it. Hence no progress. I expect it will only get worse with time.

Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: janoc on January 29, 2019, 06:08:34 pm
The development in hardware has masked the software stagnation, but the stagnation is nevertheless obvious in my opinion.

The rapid growth in hardware was the cause of the "software stagnation". The hardware growth has now slowed down, but another, much worse factor is starting to influence the software industry. Lots of software went free and open source. There's no money in it. Hence no progress. I expect it will only get worse with time.

That's complete nonsense, both on the "no money" and the "open source" parts.

If there was no money in it, then why do software guys command such high salaries and are in such high demand? One would think that nobody would want to do such work and companies would be going out of business or pivoting away from software left and right. Kinda don't see it - just look at any job website or ask any recruiter.

And re open source - open source certainly didn't cause any quality "stagnation", more like the opposite, because more people can (and do) participate and any crap code tends to be quickly pointed out and fixed, at least in the popular and actually used projects. And it also pushes vendors of competing commercial projects to fix their messes or their clients will jump ship - which they didn't have to do before.

Look at projects like LLVM which actually enabled building a ton of tooling for programming languages that simply wasn't feasible before because the barrier of entry in terms of complexity was so high. Or Linux. Or FreeBSD (Apple owes the BSD folks quite a bit there). Or OpenCascade. Or GCC ...

Also I don't see companies like Autodesk or even Microsoft fearing going out of business any time soon, despite there being open source alternatives for their products.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: NorthGuy on January 29, 2019, 07:38:13 pm
That's complete nonsense, both on the "no money" and the "open source" parts.

If there was no money in it, then why do software guys command such high salaries and are in such high demand? One would think that nobody would want to do such work and companies would be going out of business or pivoting away from software left and right. Kinda don't see it - just look at any job website or ask any recruiter.

And re open source - open source certainly didn't cause any quality "stagnation", more like the opposite, because more people can (and do) participate and any crap code tends to be quickly pointed out and fixed, at least in the popular and actually used projects. And it also pushes vendors of competing commercial projects to fix their messes or their clients will jump ship - which they didn't have to do before.

Look at projects like LLVM which actually enabled building a ton of tooling for programming languages that simply wasn't feasible before because the barrier of entry in terms of complexity was so high. Or Linux. Or FreeBSD (Apple owes the BSD folks quite a bit there). Or OpenCascade. Or GCC ...

Also I don't see companies like Autodesk or even Microsoft fearing going out of business any time soon, despite there being open source alternatives for their products.

The process is just starting, and you're speaking as if it is already complete.

LLVM is a huge ecosystem and huge effort, and people use it, but did it really make any difference in software development? Is today's LLVMed software any less buggy or less bloated than the software before LLVM? I don't think so.

Linux is developing sideways. It is certainly getting better in some places, but it didn't win lots of new users in the past 10 years. This certainly helps Microsoft, but mostly Microsoft twists the arms of computer manufacturers to pre-install Windows on every computer. This probably cannot last forever. However, open source Android has already pushed Microsoft away in mobile space.

There are places where market reach of free software is huge. GCC for example. Microsoft no longer sells their VC++ compiler, it's forced to be free. Is VC++ any worse than GCC? I don't think so.

Or FreeRTOS. 10 years ago there were lots of vendors, such as uOS. FreeRTOS pushed them all out, but not because FreeRTOS is any better, but because it's free.

So, there's no doubt that, little by little, the free software will take over everywhere, just give it enough time, I guess 20-30 years.

Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: janoc on January 29, 2019, 09:59:46 pm
The process is just starting, and you're speaking as if it is already complete.

That process has been "starting" for the 30-something years that free/open source software has existed. That's an eternity in IT.

LLVM is a huge ecosystem and huge effort, and people use it, but did it really make any difference in software development? Is today's LLVMed software any less buggy or less bloated than the software before LLVM? I don't think so.

Compared to what?

If I compare LLVM (or even GCC) to any of the proprietary (and expensive) compilers I had to deal with in the past, jeeze, give me LLVM any day! Most of that proprietary stuff was utter crap compared to LLVM or GCC. In fact, recommendation to install/compile GCC and GNU tools was usually the first thing anyone who had to deal with commercial Unix saw, because the vendor-supplied compilers were buggy and supported only obsolete C/C++ versions.

Linux is developing sideways. It is certainly getting better in some places, but it didn't win lots of new users in the past 10 years. This certainly helps Microsoft, but mostly Microsoft twists the arms of computer manufacturers to pre-install Windows on every computer. This probably cannot last forever. However, open source Android has already pushed Microsoft away in mobile space.

And what do you think Android is based on? Linux, surprise. Desktop Linux is irrelevant but pretty much everything mobile runs either iOS or a Linux kernel today. And that iOS seems to be doing quite well there. Microsoft had a stake in mobile but they have only themselves to blame because of their clueless and hamfisted OEM and developer support. The cost of the system had little to do with it. The same with Nokia's S60 Symbian - it didn't disappear because Linux or iOS were free (the latter certainly isn't) but because S60 was hopelessly outdated and when the first iPhone appeared it was literally like comparing a bullet train with a steam engine ...

There are places where market reach of free software is huge. GCC for example. Microsoft no longer sells their VC++ compiler, it's forced to be free. Is VC++ any worse than GCC? I don't think so.

Free software made some things into commodities. But that doesn't mean the paid-for tools ceased to exist. E.g. Microsoft still sells their compiler and tools. Only the Community edition of Visual Studio is free, which has severe licensing restrictions. If you have more than 5 users or make more than $100k annually you have to buy the commercial version.

The same holds for e.g. Unity 3D or Unreal engine - they are "free" in the sense that the development tools are free for personal use. The moment you start developing commercially or selling something, you owe them money. Etc.

Or FreeRTOS. 10 years ago there were lots of vendors, such as uOS. FreeRTOS pushed them all out, but not because FreeRTOS is any better, but because it's free.

I do wonder where is VxWorks, EUROS, Neutrino, Nucleus, QNX ... Also that FreeRTOS has a commercial license available as well, same as ChibiOS.

So, there's no doubt that, little by little, the free software will take over everywhere, just give it enough time, I guess 20-30 years.

Riiight ...  GCC alone is more than 30 years old. And we still have proprietary compilers (e.g. IAR) and some vendors even repackage and sell GCC-based toolchains (Microchip). Heck, some people prefer to use the expensive IAR compilers even where free GCC-based tools exist. Could it be that GCC simply doesn't (and cannot) cover all the market needs?

Free works for some things but we are not going to see a competitive high end CAD system (too complex, requires specialized knowledge and customers don't care about free, they need support, they need import and export of various proprietary data formats, etc.), free office software exists but it is pretty much irrelevant because Microsoft's formats are the standard, tools like the Adobe Creative Suite pretty much have no free or paid competition (and certainly aren't going to have any time soon, given how much work it would require - GIMP really isn't in the same league). Etc.

And that's generic, commodity software - most software is made-to-measure, custom development. Even if you use free components you will still need engineers to write all the application glue together. And they don't work for a smile and a beer.

I am certainly not worried about lack of work, if anything, there will be more of it in the future because everything is moving from hardware to software due to lower costs of changes and faster time to market.

Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: NorthGuy on January 29, 2019, 11:47:06 pm
That process is "starting" for 30 something years that free/open source software exists. That's eternity in IT.

If you remember, 30 years ago there was a time when people were willing to pay for software. There were lots of vendors, some of them very useful. Today, most people expect the software to be free. Some even get upset when someone asks them to pay for software. The trend is likely to continue.

If I compare LLVM (or even GCC) to any of the proprietary (and expensive) compilers I had to deal with in the past, jeeze, give me LLVM any day! Most of that proprietary stuff was utter crap compared to LLVM or GCC. In fact, recommendation to install/compile GCC and GNU tools was usually the first thing anyone who had to deal with commercial Unix saw, because the vendor-supplied compilers were buggy and supported only obsolete C/C++ versions.

IMHO, GCC  which existed 20 years ago was perfectly fine. Moreover, Microsoft VC++ for Windows was fine too. I also remember Borland C. They all worked about the same.

Today, I use what comes with the platform. It's GCC on Linux (and GCC-based on most embedded things), VC++ on Windows, LLVM on Mac. I cannot tell the difference.

People who develop LLVM do so because they like it. They like how any language can be compiled in, optimized within the same framework, then the assembler produced for any CPU. Maybe it is fascinating to the LLVM developers, but it doesn't look particularly fascinating to me, and I do not see any benefits for myself.

Free software made some things into commodities. But that doesn't mean the paid-for tools ceased to exist. E.g. Microsoft still sells their compiler and tools. Only the Community edition of Visual Studio is free, which has severe licensing restrictions. If you have more than 5 users or make more than $100k annually you have to buy the commercial version.

They sell their C# tools. I don't know if there are any open source substitutes.

Their C/C++ compiler is a free download which comes with the Platform SDK and doesn't even require opt-in for spying. 20 years ago I had to pay for it.

Riiight ...  GCC alone is more than 30 years old. And we still have proprietary compilers (e.g. IAR) and some vendors even repackage and sell GCC-based toolchains (Microchip). Heck, some people prefer to use the expensive IAR compilers even where free GCC-based tools exist. Could it be that GCC simply doesn't (and cannot) cover all the market needs?

For a while. But not forever. We're now in money printing era. When it turns towards austerity there will be huge pull towards money savings and everything free.

Right now, Altium is considered better choice than KiCAD, but I wouldn't bet that it stays this way forever.

I am certainly not worried about lack of work, if anything, there will be more of it in the future because everything is moving from hardware to software due to lower costs of changes and faster time to market.

Of course there will always be custom work for programmers ... or for programming AI robots :)
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: Nominal Animal on January 30, 2019, 04:36:17 am
Today, most people expect the software to be free.  Some even get upset when someone asks them to pay for software.
The unwillingness to pay for something they believe is free, is real.

However, I see much more frustration about the black box nature of proprietary software. (Which is why in the past ten years or so, I've looked at how to provide the best of both worlds, especially for "cottage industry" software: having the UI part open source, because that's where the problems that irritate devusers tend to be in; and the proprietary secret-sauce work horse part as a closed library.)

It is all a cultural issue, and cultural trends do shift.  I think we (as in software engineers and product designers) can push a shift to the better, by doing it ourselves. Leading via example.

I just don't want engineers and scientists to recommend against doing that; to claim that it is somehow better to just throw more money and hardware at every problem.

As to future austerity or long-term post-scarcity plans, I'd say that it is a question of if/when we realize that in a closed economy, neither the amount of debt nor the amount of money/capital can act as the primus motor for growth, only better circulation and flow of resources can. If we do not work that out, we're fuckered. At the global scale, we're definitely a closed economy, unless we meet some Ferengi right quick.  Thus far, we've coasted on the hardware ignoring the software: made the markets and economies larger, not better.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: LapTop006 on January 30, 2019, 04:58:06 am
LLVM is a huge ecosystem and huge effort, and people use it, but did it really make any difference in software development? Is today's LLVMed software any less buggy or less bloated than the software before LLVM? I don't think so.

I'd say it has hugely helped, with much better warning output to make fixing them easier, the various sanitizers (ASAN, TSAN, UBSAN, etc.) only some of which were available in GCC.

All of that has helped to make software less buggy.
Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on March 30, 2019, 11:08:03 pm
I wouldn't bother with that patent as it doesn't work  :--

It seems that what you are trying to do is:
Quote
When added at the beginning of avr_time_us_get() wait for OCF0A cleared in TIFR by ISR during its execution, since OCF0A is set when TCNT0 is reset to 0 in CTC
Sorry for the long delay, but I had a few projects going at the same time ;)

How did you know that it doesn't work?
I will take a closer look at this code, but I've successfully used the ideas above to handle the timing of the home roof LED lights, so this code now runs 24h/day with no problem whatsoever.

The patent pending markings were made for fun; I'm not a fan of patenting anything, as I believe patents are bad for innovation. Instead I simply didn't show the C source for this critical part, but it was smart enough I think, since now that I'm back on this project I do not remember how I did it  :-DD

Anyway, an optimized ISR using inline AVR assembler code that increments only the given number of bytes in the unsigned microsecond counter is perfectly fine.
The tricky part is in the functions which can be hit by this ISR timer code running at 100kHz, and this part of the code will now be very carefully tested. I've shown only the assembler code because I really think it is a very smart way to do it and it should work. The proof is the proper timing of the home lights triggered by a PIR sensor module and additional modules like magnetic transducers to detect a noisy dog or a radio for remote control.

I'll use logic analyser to debug this tricky part of code in realtime, to ensure it runs as expected.
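
For what it's worth, the tricky part described above (a main-loop function reading a multi-byte counter that a 100kHz ISR keeps incrementing) can be sketched in plain C. This is only a sketch under assumptions, not the undisclosed code: the names us_counter, timer_tick() and time_us_get() are made up, the 10us step assumes the 100kHz tick mentioned above, and the ISR is simulated by an ordinary function so it can be tried on a PC. On the real chip, disabling interrupts around the read (e.g. ATOMIC_BLOCK from avr-libc's <util/atomic.h>) would be the other common way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the ISR is simulated by a plain function. */
#define TICK_HZ     100000u                 /* assumed 100kHz timer tick */
#define US_PER_TICK (1000000u / TICK_HZ)    /* 10 us per tick */

static_assert(1000000u % TICK_HZ == 0u,
              "tick period must be a whole number of microseconds");

static volatile uint32_t us_counter;        /* written only by the "ISR" */

/* Body of the timer compare ISR: advance the time by one tick. */
void timer_tick(void)
{
    us_counter += US_PER_TICK;
}

/* Read the 4-byte counter without disabling interrupts.
 * An 8-bit CPU reads it byte by byte, so a tick in the middle of a
 * read can give a torn value. Read twice and retry until both reads
 * agree. */
uint32_t time_us_get(void)
{
    uint32_t a, b;
    do {
        a = us_counter;
        b = us_counter;
    } while (a != b);
    return a;
}
```

The double read only works if two reads comfortably fit between two ticks, which has to be checked against the cycle budget of the actual ISR.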

Title: Re: Fast unsigned integer multiply by x100 on 8bit AVR?
Post by: beduino on March 30, 2019, 11:30:25 pm
I *am* assuming ATtiny85 here. They happen to be quite cheap, but interestingly powerful microcontrollers.
Yep, I've successfully used the ATTiny85 in many projects since it fits my performance requirements and has a very small footprint on the PCB.
Now, I'm working on a project where this tiny AVR with an additional opamp (the same SO8 footprint) will make a complete speed/cadence sensor for a road cycling power meter.
At 5V Vcc I can run the ATTiny85 at 16MHz from the internal clock according to its specs, while at 3.3V Vcc easily at 8MHz.

The same 100kHz timer ISR will be used in those sensors, which at the same time will output data over a custom 1-wire protocol with messages of a few bytes, prioritized by sensor ID.
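
As a side note, the timer numbers for such a 100kHz tick are easy to work out. A rough sketch, assuming Timer0 in CTC mode with OCR0A as TOP (as in the ISR discussed earlier in the thread); the helper names ctc_top() and cycles_per_tick() are invented here, and the prescaler of 8 is just one possible choice:

```c
#include <assert.h>
#include <stdint.h>

/* CTC compare-match rate: f_tick = f_cpu / (prescaler * (1 + TOP)).
 * Returns the TOP value (what would go into OCR0A) for a wanted rate. */
uint8_t ctc_top(uint32_t f_cpu, uint32_t prescaler, uint32_t f_tick)
{
    uint32_t counts = f_cpu / (prescaler * f_tick);
    assert(counts >= 1u && counts <= 256u);  /* must fit an 8-bit timer */
    return (uint8_t)(counts - 1u);
}

/* CPU cycles between two ticks: the budget shared by the ISR and
 * everything else running on the chip. */
uint32_t cycles_per_tick(uint32_t f_cpu, uint32_t f_tick)
{
    return f_cpu / f_tick;
}
```

With these assumptions, 8MHz gives TOP = 9 and 80 cycles per tick, and 16MHz gives TOP = 19 and 160 cycles per tick, so the ISR itself has to stay well below that.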

So, now imagine tens of such small but quite powerful AVRs running in parallel, catching I/O interrupts, processing, easy to scale just by adding another sensor - something like a distributed multiprocessor system with decent processing power and, what is most important, all sensors running in parallel at 8MHz/16MHz at the size of an SO8 footprint!

It is amazing that such a small thing can run so many RISC operations per second and, with a little help from manually guided, optimized inline AVR assembler code, can be very powerful and small at the same time - we are talking about a few milliamps of current needed to make quite complicated computations, but in parallel  8)