Author Topic: A more efficient blink for AVR processors  (Read 17345 times)


Offline GiantGnomeTopic starter

  • Contributor
  • Posts: 25
  • Country: dk
A more efficient blink for AVR processors
« on: March 23, 2014, 01:52:14 pm »
I have decided to make my life more miserable and have started trying to wrap my mind around assembler programming for the AVR core. I am using the AVRA assembler, simply because it was the first thing that popped up in the Ubuntu Software Center.

As a first shot, I have attempted to implement the classic blink example. You put the clock speed and the number of milliseconds at the top of the program, and you are ready to go. The program is very space efficient (42 bytes), and if my cycle accounting is done right, it should be precise to the extent of your clock speed precision. Compare that to the Arduino blink example, which compiles to over a kilobyte and does not look to be precise to the microsecond.

Is there any grey-bearded assembly sage out there with any advice about this code, ie. other than to stay away from assembly in the first place?

Is there anyone out there with another example of super efficient assembler code for 'common' microcontroller tasks?

Code: [Select]
; My first piece of assembler code. It is made to mimic the blink
; example from the Arduino environment. The arduino example compiles
; to around 1Kb - this is 42 bytes. And should be very
; precise in the timing.
; It is possible to reduce the size of the program at the cost of
; precision by removing some of the nop's in the delay subroutine.
.nolist;
.include "m88def.inc";
.list;


; SETTING CLOCK SPEED - currently at 16 MHz
.equ clockCyclesPerMilliSecond = 16*1000
; The delay to put between blinks in milliseconds
.equ delayMilliseconds = 1000
; The direction register, the port and the bit to set the pin of the
; LED to flash
; Currently at PB5 (Arduino pin 13)
.equ DDR = DDRB
.equ PORT = PORTB
.equ BIT = 5

; SETTING UP REGISTERS
.DEF my_register = R16

; From some example. Not sure if this is needed... It works without it,
;     so it is currently removed to save 2 bytes of program space :-)
; rjmp setup

; setup
setup:
    SBI     DDR,BIT         ; Set pin to output

loop:
    sbi     PORT,BIT        ; 2 cycles - set pin HIGH
    rcall   Delay           ; 3 cycles (the call itself)
    cbi     PORT,BIT        ; 2 cycles - set pin LOW
    rcall   Delay           ; 3 cycles (the call itself)
    rjmp    loop            ; 2 cycles (the jump itself) - repeat
   
Delay:
    nop
    ; Delay consists of two loops - the inner loop loops for a
    ; millisecond, the outer counts the number of milliseconds-
    ; From every inner loop, there is subtracted the number of
    ; cycles to complete the outer loop (8). From the first time, there
    ; is also subtracted the number of cycles to call, setup and return
    ; from the subroutine as well as the cycles for switching the pin,
    ; half of the rjmp command and the nop in the start of this function
   
    ; inner loop : 4 cycles
    ; outer loop : 8 cycles
    ; pin switching, calling, returning and looping : 16 cycles
   
    ; Since precision is made by cutting the number of times the
    ; inner loop runs, it is important that the number of cycles
    ; in the outer loop and the one-time-fluff is divisible by 4.
   
    ldi     ZH,HIGH((clockCyclesPerMilliSecond-8-16)/4)
    ldi     ZL,LOW((clockCyclesPerMilliSecond-8-16)/4)
    ; A lot of nops and grief could be saved by only supporting a
    ; maximum of 255 millisecond delay.
    ldi     YL,LOW(delayMilliseconds)
    ldi     YH,HIGH(delayMilliseconds)
   
    delayloop:
            sbiw    ZL, 1       ; 2 cycles
            brne    delayloop   ; 2 cycles
       
        sbiw    YL,1                                     ; 2 cycles
        ldi     ZH,HIGH((clockCyclesPerMilliSecond-8)/4) ; 1 cycle
        ldi     ZL,LOW((clockCyclesPerMilliSecond-8)/4)  ; 1 cycle
        nop ; 1 cycle - added to make the number of cycles divisible by 4
        nop ; 1 cycle - added to make the number of cycles divisible by 4
        brne    delayloop                                ; 2 cycles
   
    nop ; added to make a number of cycles divisible by 4 ; 1 cycle
    ret ;                                                   3 cycles
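
One way to sanity-check the cycle accounting above is to tally it on a PC. The following is only a sketch and not part of the original program: the function names are made up for illustration, and the per-instruction cycle costs are assumptions taken from the AVR instruction set manual for classic cores with a 16-bit program counter (a branch that falls through is counted as 1 cycle and `ret` as 4), so check them against the documentation for your part.

```c
#include <stdint.h>

/* Assumed cycle costs (classic AVR core, 16-bit program counter).
 * These are assumptions for this sketch; verify them for your device. */
enum {
    CYC_NOP = 1,
    CYC_LDI = 1,
    CYC_SBIW = 2,
    CYC_BRNE_TAKEN = 2,
    CYC_BRNE_NOT_TAKEN = 1,
    CYC_RET = 4
};

/* Cycles spent in the inner sbiw/brne loop for n iterations (n >= 1).
 * The final brne falls through, so it costs one cycle less. */
uint64_t inner_loop_cycles(uint32_t n)
{
    return (uint64_t)(n - 1) * (CYC_SBIW + CYC_BRNE_TAKEN)
         + (CYC_SBIW + CYC_BRNE_NOT_TAKEN);
}

/* Cycles from the first instruction of Delay up to and including ret,
 * for ms >= 1. first_n and later_n are the inner-loop counts loaded
 * into Z for the first and for all later milliseconds. */
uint64_t delay_cycles(uint32_t first_n, uint32_t later_n, uint32_t ms)
{
    const uint64_t entry = CYC_NOP + 4 * CYC_LDI;  /* nop + four ldi */
    const uint64_t outer_body = CYC_SBIW + 2 * CYC_LDI + 2 * CYC_NOP;
    uint64_t total = entry;

    /* First millisecond: inner loop plus the outer-loop body. */
    total += inner_loop_cycles(first_n) + outer_body;

    /* Every later millisecond starts with a taken outer brne. */
    total += (uint64_t)(ms - 1)
           * (CYC_BRNE_TAKEN + inner_loop_cycles(later_n) + outer_body);

    total += CYC_BRNE_NOT_TAKEN;   /* last outer brne falls through */
    total += CYC_NOP + CYC_RET;    /* trailing nop and ret */
    return total;
}
```

Under these assumptions, `delay_cycles(3994, 3998, 1000)` comes out slightly below the 16,000,000 cycles of a full second at 16 MHz, mostly because each fall-through branch is one cycle shorter than a taken one. Treat that as a prompt to re-check the counts against the manual rather than as a verdict on the program.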
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: A more efficient blink for AVR processors
« Reply #1 on: March 23, 2014, 01:55:12 pm »
Quote
super efficient

The answer depends on your definition of "efficiency".
================================
https://dannyelectronics.wordpress.com/
 

Offline GiantGnomeTopic starter

  • Contributor
  • Posts: 25
  • Country: dk
Re: A more efficient blink for AVR processors
« Reply #2 on: March 23, 2014, 02:05:05 pm »
Well, let's keep it to speed efficient, memory efficient and/or program space efficient. Maybe power efficient also. No softies like readability or such :-)

My blink example is quite space efficient, and uses no RAM, if I understand it correctly. Speed optimization does not make sense in a blink example, but it could probably be way more power efficient by utilizing some of the power saving features of the AVRs.

Maybe that is how I should spend my Sunday  :)

Any other ideas for types of efficiency?
 

Offline Rasz

  • Super Contributor
  • ***
  • Posts: 2617
  • Country: 00
    • My random blog.
Re: A more efficient blink for AVR processors
« Reply #3 on: March 23, 2014, 02:11:18 pm »
Quote
Well, let's keep it to speed efficient, memory efficient and/or program space efficient. Maybe power efficient also. No softies like readability or such :-)

pick one, maybe two if you are lucky

your post suspiciously reminds me of this :)
« Last Edit: March 23, 2014, 02:13:18 pm by Rasz »
Who logs in to gdm? Not I, said the duck.
My fireplace is on fire, but in all the wrong places.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: A more efficient blink for AVR processors
« Reply #4 on: March 23, 2014, 02:19:44 pm »
 -, and uses no RAM,-

that would depend on your definition of ram.
================================
https://dannyelectronics.wordpress.com/
 

Offline GiantGnomeTopic starter

  • Contributor
  • Posts: 25
  • Country: dk
Re: A more efficient blink for AVR processors
« Reply #5 on: March 23, 2014, 03:15:04 pm »
-, and uses no RAM,-

that would depend on your definition of ram.

That would be SRAM. Sorry about that. Am I right in understanding that when I use only registers and program space, I avoid using SRAM? I have acquired a handful of ATtiny13As with only 64 bytes of SRAM and 1 kB of program memory, which encourages me to be somewhat frugal with memory use. As far as I can figure, the ATtiny13A couldn't hold the Arduino blink example without some optimization. It compiled to 1084 bytes.

your post suspiciously reminds me of this :)


Other than using inline assembly in arduino, it also reminds me of my project :-). I have used mostly information from here to piece together the instructions: http://www.avr-asm-tutorial.net/avr_en/index.html.
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #6 on: March 23, 2014, 04:06:58 pm »
I'll be the first one to encourage learning assembly, but I wouldn't use it to do tight loops for timing. Although it will work, it has no real practical application, since you are using the CPU constantly and can't do anything other than blink that LED.

Since the AVR chip doesn't have a performance counter (which counts cycles executed by the processor), I would recommend using timers and counters instead of cycle-counting loops. Not only will it be more accurate (if done right), but you can still use your CPU for other things.
   
Some links that might be of use:
   
http://www.seanet.com/~karllunt/interval.html
http://www.wrightflyer.co.uk/Using%20AVR%20Counter.pdf

The only time I would use code cycle delays is for communications, where a NOP delay here and there might make all the difference in the world.
 

Offline zapta

  • Super Contributor
  • ***
  • Posts: 6316
  • Country: 00
Re: A more efficient blink for AVR processors
« Reply #7 on: March 23, 2014, 04:41:40 pm »
Is there any grey-bearded assembly sage out there with any advice about this code, ie. other than to stay away from assembly in the first place?

No grey-bearded assembly sage here, more like an Arduino newbie, but here are my 2c:

Premature optimization is one of the seven deadly sins of programming.

http://blogs.msdn.com/b/ericgu/archive/2006/06/26/647877.aspx

If your goal is a blinking program, then the stock Arduino example is just fine and fits in memory, problem solved.  If you have another goal that requires a higher level of efficiency, you can still do a lot using efficient C++ code and examine the output of gcc using (IIRC) something like 'avr-objdump -S <path to the .elf file generated by the Arduino IDE>'

You would probably do better to spend your time learning to write efficient C++ code. I had to do just that recently when implementing a protocol proxy/injector based on an Arduino Mini Pro. All the code is in C++: 20 kbit/s two-way bit banging, protocol decoding and generation, application-specific logic, and a continuous 115 kbit/s serial output stream, using high-level library functions like sprintf and 'inline' everywhere, and still the program is less than 8 KB in size (my AVR has 32 KB).  (My Arduino project is here: https://github.com/zapta/linbus/tree/master/prototype/arduino )

These little AVRs and the Arduino IDE are very capable and have earned my respect.
 

Offline zapta

  • Super Contributor
  • ***
  • Posts: 6316
  • Country: 00
Re: A more efficient blink for AVR processors
« Reply #8 on: March 23, 2014, 04:56:43 pm »
I'll be the first one to encourage learning assembly, but I wouldn't use it to do tight loops for timing. Although it will work, it has no real practical application, since you are using the CPU constantly and can't do anything other than blink that LED.

I think this is the main obstacle that new programmers face when they want to go from the blinking LED example to more complex programs. The delay() function is basically a dead end for the reason you mentioned.  I wrote several Arduino programs recently (my first) and came up with a simple model for multitasking.  Maybe I will find time to do a write-up, don't know.  The rules are very simple:

1. Don't use delay and other long blocking functions.
2. Implement each task as having these two methods: void setup(); and void loop();
3. In the main setup() call exactly once the setup of each of your tasks (and same for loop).

Now you have not an arduino but an arbitrary number of parallel arduinos.

A blinking task can be implemented for example as:

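void setup() {
  pinMode(LED_BUILTIN, OUTPUT);   // init the LED output
}

void loop() {
  // Decide the LED state from the current time; never blocks.
  if (millis() % 1000 > 500) {
    digitalWrite(LED_BUILTIN, HIGH);   // set LED on
  } else {
    digitalWrite(LED_BUILTIN, LOW);    // set LED off
  }
}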

I would rename the arduino's delay function to evilEvilEvilDelay().   It definitely is.  ;-)
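
The same idea can be sketched in plain C so it can be tried on a PC. This is only an illustration of the model described above, not code from my project: the function names are hypothetical, and the current time is passed in as a parameter (on an Arduino it would come from millis()).

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical blink task: derives the LED state purely from the current
 * time, so it never blocks. The LED is on while the position within each
 * 1000 ms period is above 500 ms. */
bool blink_task_led_on(uint32_t now_ms)
{
    return (now_ms % 1000u) > 500u;
}

/* Hypothetical helper for a second task: reports whether at least
 * interval_ms has passed since last_run_ms. Unsigned subtraction keeps
 * the comparison valid when the millisecond counter wraps around. */
bool periodic_task_due(uint32_t now_ms, uint32_t last_run_ms, uint32_t interval_ms)
{
    return (uint32_t)(now_ms - last_run_ms) >= interval_ms;
}
```

Each task's loop() would call a function like these with the current time and return immediately, which is what lets several tasks share one main loop.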
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 8398
  • Country: de
  • A qualified hobbyist ;)
Re: A more efficient blink for AVR processors
« Reply #9 on: March 23, 2014, 05:01:21 pm »
Here we go:
- use the picoPower version of the AVR
- check which timer keeps running in which sleep mode, also take care about the clock source
- setup timer for the delay
- in the corresponding ISR toggle a flag and update the blink pin based on the flag
- sleep
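
For the "setup timer for the delay" step, the compare value can be worked out ahead of time. Below is a minimal sketch in C, assuming a 16-bit timer running in clear-on-compare-match mode, where a match occurs every (compare value + 1) timer ticks. The function name and the choice to reject inexact settings are assumptions made for illustration; the register names and mode bits for a real part have to come from its datasheet.

```c
#include <stdint.h>

/* Returns the compare value that makes a 16-bit timer match exactly
 * every period_ms milliseconds, or -1 if the period cannot be hit
 * exactly or does not fit in 16 bits with the given prescaler. */
int32_t timer_compare_value(uint32_t f_cpu_hz, uint32_t prescaler, uint32_t period_ms)
{
    uint64_t scaled = (uint64_t)f_cpu_hz * period_ms;   /* cycles * 1000 */
    uint64_t divisor = (uint64_t)prescaler * 1000u;

    if (prescaler == 0 || scaled == 0 || scaled % divisor != 0) {
        return -1;                     /* invalid or not an exact tick count */
    }

    uint64_t ticks = scaled / divisor; /* timer ticks per period */
    if (ticks == 0 || ticks > 65536u) {
        return -1;                     /* does not fit a 16-bit timer */
    }
    return (int32_t)(ticks - 1);       /* match happens after value + 1 ticks */
}
```

With a 16 MHz clock, a prescaler of 256 gives an exact 500 ms period, while 1024 does not divide evenly and 64 overflows 16 bits, so the function returns -1 for those two.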
 

Offline vvanders

  • Regular Contributor
  • *
  • Posts: 124
Re: A more efficient blink for AVR processors
« Reply #10 on: March 23, 2014, 05:04:06 pm »
+1 on using a timer and learning sleep modes. Low power usage is just as hard as (if not harder than) performance, and extremely useful in battery-powered situations.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: A more efficient blink for AVR processors
« Reply #11 on: March 23, 2014, 05:07:49 pm »
Quote
The rules are very simple

An excellent programming practice is to break those rules.

Quote
1. Don't use delay and other long blocking functions.

It depends on your application.

Quote
I would rename the arduino's delay function to evilEvilEvilDelay(). 

There are no evil / stupid routines or approaches, only evil / stupid programmers utilizing those routines and approaches inappropriately.
================================
https://dannyelectronics.wordpress.com/
 

Offline nuhamind2

  • Regular Contributor
  • *
  • Posts: 138
  • Country: id
Re: A more efficient blink for AVR processors
« Reply #12 on: March 23, 2014, 06:11:01 pm »
For the best granularity, nothing beats a simple nop. In my graduation project (an AVR VGA adapter) I use nops for short delays and as placeholder code (which I'm going to replace with more useful code); for long delays I use a loop.
 

Offline dfmischler

  • Frequent Contributor
  • **
  • Posts: 548
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #13 on: March 23, 2014, 06:25:59 pm »
I wrote many thousands of lines of assembly language before 1990 or so.  And my beard would be gray if I didn't shave it off every day or so.

It looks to me like your program is using RAM to store return addresses on the stack and return through them.

Knowing how to write assembly language will give you a good mindset for writing other languages.  And your ideas about performance/timing are essentially identical to the ones used by people writing code for early microprocessors, but they don't scale up when the systems start getting more complicated (e.g. multiple issue, cache memory, virtual memory, real-time task scheduling, etc).

I wrote a lot of assembly to do low-level things with hardware (e.g. device drivers), or to interface to system software that had no high-level call procedures, or to implement whole systems on hardware platforms with small memories.  I even wrote a few cross-assemblers and a linker in PDP-11 assembler (and I wrote a cross-assembler in TI-990 assembly).  Programmers fought over CPU architecture because it mattered when your code depended on it.  Fortran-66 was the most portable language for the minicomputers when I got into the industry.  I wrote something a lot like a compiler in Fortran once because it needed to run on DEC, DG, TI and IBM systems; it ran really slowly, though.  The C language was not popular except on Unix until sometime in the early-mid 80's.  Source language debugging tools were rare, too.

Then all that changed.  You could get a C compiler for almost anything, and if you wrote your code carefully it would run on almost any platform.  You could get a cross-compiler for your micro of choice that would run on a low-cost development system, and your program would fit in the target memory.  And your program would mostly run fast enough.  You could write a little bit of assembler to fix speed problems if you had to.  I can't think of any good reason to write a large program in assembler now.
« Last Edit: March 23, 2014, 06:38:31 pm by dfmischler »
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #14 on: March 23, 2014, 07:07:57 pm »
I can't think of any good reason to write a large program in assembler now.

Same reason as in the '80s: embedded processors have the same limitations as our old microprocessors. Some only have 2K of program space; granted, they run on batteries, and for a long time as well.

The Z80 is still in its prime for home automation and other embedded applications, especially since it was originally designed for just that. 6502s are also still in production and everywhere.

Although it seems PICs and AVRs are taking over. But there are a lot of embedded processors everywhere you look. The other day I was looking at the datasheet for an NFC chip PN5321/C1 and it has an 80C51 microcontroller built in with 40KB Rom program on it. Also another chip AK2117  (single chip digital multimedia) for mp3 and video players, also has an integrated 80C51.

Anyway, the smaller, more portable, and lower-power the device, the more likely it is that you don't have a lot of code space. So I think there is still a place for assembly when you are working on very spartan MCUs integrated into the die of some chip.


Edit: I also programmed in VAX assembler, and it's the only chip I've ever encountered that had queuing instructions. And yes, my beard has a lot of grey in it ;)
« Last Edit: March 23, 2014, 07:50:56 pm by miguelvp »
 

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 22436
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: A more efficient blink for AVR processors
« Reply #15 on: March 23, 2014, 10:57:12 pm »
Meh, I'd rather have Z80, or HC08/11 or something like that, than RISC.  It's just too verbose; you spend several lines just doing anything (setting up and reading an array, for instance).  At least it's not as bad as 6502 must've been, back in the day -- having to pull everything through a single accumulator register, ewww. :o

I do love that R0-R15 are basically universal accumulators (though the difference between R0-R15 and R16-R25(-R31) is annoying at times).  That's more register bits and (needless to say) more orthogonality than 8086.

MSP430 and MIPS32 (i.e., PIC32) are other instruction sets I should investigate.  They look comparable (RISC-ey) while being richer (bigger instructions, at least in the case of MIPS), without being overly complex (look at all the instruction parameters on an ARM -- take your pick as far as model).

And let's not forget 8051 that's still around... :P

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4382
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #16 on: March 24, 2014, 02:48:07 am »
As someone pointed out, you're using some RAM for the stack.  It's a good thing you used one of the AVRs that initializes the SP for you.

Without changing the general logic, I see a couple of changes:
1) Uses the "output to PINx" to toggle pins feature.
2) jump to your Y reload instead of including it twice.
3) replace doubled no-ops with two-cycle no-ops like "rjmp ."

Code: [Select]
.equ PORT = PINB
.equ BIT = 5


setup:
    SBI     DDR,BIT         ; Set pin to output

loop:
    sbi     PORT,BIT        ; 2 cycles - set pin HIGH
    rcall   Delay           ; 3 cycles (the call itself)
    rjmp    loop            ; 2 cycles (the jump itself) - repeat
   
Delay:
    nop
   
    ldi     ZH,HIGH((clockCyclesPerMilliSecond-8-16)/4)
    ldi     ZL,LOW((clockCyclesPerMilliSecond-8-16)/4)

DelayOuter:
    ldi     YL,LOW(delayMilliseconds)
    ldi     YH,HIGH(delayMilliseconds)
   
    delayloop:
            sbiw    ZL, 1       ; 2 cycles
            brne    delayloop   ; 2 cycles
       
        sbiw    YL,1                                     ; 2 cycles
rjmp delay2 ;;; Two cycle delay
delay2: brne    DelayOuter ; reload and restart inner loop
   
    nop ; added to make a number of cycles divisible by 4 ; 1 cycle
    ret ;                                                   3 cycles

On a broader level:
Using extra nops in your top-level Delay routine (at enter and exit, especially) is silly.  You're talking 125ns out of a human-visible 500ms...
Likewise the double no-op in the outer delay loop.
I would be inclined not to use Y and Z in a delay loop, since they are "important" for other special functions.  There's no good real reason to use double-byte math here.
Likewise, and as implied by your comments, I'd be inclined to make the inner loop longer so that the outer loop would be useful with a single-byte counter (1/50th second or something.)  Assuming you don't go to a timer-based approach.
 

Offline dfmischler

  • Frequent Contributor
  • **
  • Posts: 548
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #17 on: March 24, 2014, 12:36:08 pm »
I can't think of any good reason to write a large program in assembler now.
Same reason as in the 80's embedded processors have the same limitations as our old microprocessors.

In 1981 I got a job writing code for a Motorola 6800 system.  I think the hardware design was a bit "recycled".  There was room for up to 8K of EPROM (2716s) and 16K of RAM (2114s).  They could have built the same system with Z80, 8080, or 6502; I don't know if an 8051 would have worked at the time.  All of these had essentially the same trade-offs for implementation language (assembly) and memory size (small).  The 16-bit micros (8086, 68000) were pretty new, and not considered cost competitive for our needs, nor were the tools much better than for the 8-bitters yet.  Today, there are many more choices and smart engineering usually involves picking a micro and tools that allow better productivity and maintainability of the firmware, unless the production quantity is really huge.  And there are getting to be fewer and fewer applications that do not need real networking, etc.

 

Offline GiantGnomeTopic starter

  • Contributor
  • Posts: 25
  • Country: dk
Re: A more efficient blink for AVR processors
« Reply #18 on: March 24, 2014, 08:11:25 pm »
First of all thank you for a good patient post  :)
Without changing the general logic, I see a couple of changes:
1) Uses the "output to PINx" to toggle pins feature.
2) jump to your Y reload instead of including it twice.
3) replace doubled no-ops with two-cycle no-ops like "rjmp ."
Okay, let me see if I understand correctly:
1) As far as I could gather, the PINx is the "Input Pins Address", which I figured to mean that it was for input, not output? So when I do SBI to PINB, it actually toggles HIGH/LOW, rather than just setting HIGH?
2) I gather, that you refer to me reloading Z twice? I know it is rather anal (1 lousy ms), but I wanted to try my hand at cycle-counting, which is more of an academic exercise. But as you said, the precision is not really necessary for this example.
3) Ah, so you can use the rjmp without actually jumping anywhere to get two cycles in one command word. Nice little trick!

Using extra nops in your top-level Delay routine (at enter and exit, especially) is silly.  You're talking 125ns out of a human-visible 500ms...
Likewise the double no-op in the outer delay loop.

Again, yes I know. It does not REALLY matter.

Likewise, and as implied by your comments, I'd be inclined to make the inner loop longer so that the outer loop would be useful with a single-byte counter (1/50th second or something.)  Assuming you don't go to a timer-based approach.

Ah, this was my first idea. I really wanted to use that. But then I could not use the function to actually set an arbitrary number of milliseconds for the delay, right?

As someone pointed out, you're using some RAM for the stack.  It's a good thing you used one of the AVRs that initializes the SP for you.

Okay, this I had a hard time understanding, so I have to read a bit. Is it only when I use subroutines (RCALL / RET), where RCALL PUSHes the return address onto the stack and RET POPs it back? So the program as it is now uses 2 bytes (one 16-bit word) of memory?

So if I should ever encounter a (n AVR) processor, I should stay away from subroutines (or be really clever)?

Again thank you (all of you) for the help.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: A more efficient blink for AVR processors
« Reply #19 on: March 24, 2014, 09:00:03 pm »
- o when I do SBI to PINB, it actually toggles HIGH/LOW, rather than just setting HIGH?-

You could benefit from reading the datasheet - it is all documented there clearly.
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #20 on: March 24, 2014, 11:19:36 pm »
So while answering a question about the capabilities of different architectures for timing purposes, I pointed out that Intel chips have a Time Stamp Counter that is a very stable clock, and you only have to read the value to know how much time has elapsed (provided you know the frequency of your CPU, which you could measure with some external clock that would have to be precise to begin with).

https://www.eevblog.com/forum/beginners/why-is-blinky-not-stable/msg412403/#msg412403

So I did put a link to the Wiki page on the Time Stamp Counter http://en.wikipedia.org/wiki/Time_Stamp_Counter
And noticed that the AVR32 does have a program counter registers.

So looking at the AVR32 Technical Reference Manual
http://www.atmel.com/Images/doc32001.pdf

It referred me to the AVR32 Architecture Document for details
http://www.atmel.com/images/doc32000.pdf

The technical reference manual mentions the configuration register needed to turn on the performance counters and to check if it's present bit 4 has to be 1 on CONFIG0.

In the Architecture Document it states:
Quote
7. Performance counters

7.1 Overview
A set of performance counters let users evaluate the performance of the system. This is useful when scheduling code and performing optimizations. Two configurable event counters are provided in addition to a clock cycle counter. These three counters can be used to collect information about for example cache miss rates, branch prediction hit rate and data hazard stall cycles.

The three counters are implemented as 32-bit registers accessible through the system register interface. They can be configured to issue an interrupt request in case of overflow, allowing a software overflow counter to be implemented.
A performance counter control register is implemented in addition to the three counter registers. This register controls which events to record in the counter, counter overflow interrupt enable and other configuration data.

So, if you do have a performance counter on your chip, you can use one of the oscillators to measure how many ticks your processor can do in 1 second and store that. Or if you trust the processor not to deviate then just use your clockCyclesPerMilliSecond

Feel free to do this in assembly, but let me just write a C loop for demonstration:

Code: [Select]
// Note: clockCyclesPerMillisecond could be computed using precision oscillators instead of hard coding
// the known CPU frequency.
unsigned long clockCyclesPerMillisecond= 16*1000;
unsigned long ticksPerblink = 1000 * clockCyclesPerMillisecond;
unsigned long begin_time = getPerformanceCounter();
unsigned long current_time = 0L;

while(1) {
    current_time = getPerformanceCounter();
    if ((current_time-begin_time) > ticksPerblink ) {
         // inc by ticksPerblink because any extra ticks that passed will be ignored next loop
        begin_time += ticksPerblink ;
        flipLight();
    }
    // do other stuff here.
}

But note that  in the loop you could do other processes too, this will work well for assembler as well.
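
One detail the loop above relies on is that unsigned subtraction still gives the right elapsed count after the counter wraps around. A small sketch to show that, with made-up function names and assuming a 32-bit counter (at 16 MHz such a counter would wrap roughly every 268 seconds):

```c
#include <stdint.h>
#include <stdbool.h>

/* Ticks elapsed between start and now on a free-running 32-bit counter.
 * Unsigned arithmetic is modulo 2^32, so the result stays correct across
 * a single wraparound of the counter. */
uint32_t elapsed_ticks(uint32_t now, uint32_t start)
{
    return (uint32_t)(now - start);
}

/* True once at least interval ticks have passed since start. */
bool interval_elapsed(uint32_t now, uint32_t start, uint32_t interval)
{
    return elapsed_ticks(now, start) >= interval;
}
```

Comparing `now >= start + interval` directly would misbehave near the wrap point, which is why the difference is computed first and compared afterwards.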

As for the first link content, I think I will get myself a Galileo to experiment with the RDTSC instruction on an arduino like board with a pentium on it :)
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: A more efficient blink for AVR processors
« Reply #21 on: March 25, 2014, 12:14:29 am »
Quote
the AVR32 does have a program counter registers.

Would be difficult for a mcu to not have a program counter.
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #22 on: March 25, 2014, 12:17:50 am »
Quote
the AVR32 does have a program counter registers.

Would be difficult for a mcu to not have a program counter.

That you can access in code?
And by that I don't mean doing a jump, but to be able to read it so you know the cycles the mcu has been ticking

Edit: Arrgh, so I made a typo and say program counter instead of performance counter. sheesh
« Last Edit: March 25, 2014, 12:21:52 am by miguelvp »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: A more efficient blink for AVR processors
« Reply #23 on: March 25, 2014, 12:19:25 am »
I have a simpler approach:

Code: [Select]
  tmp = systick_get(); //get systick's current value
  do_my_things();
  tmp = systick_get() - tmp; //how many ticks have elapsed

You can implement systick differently on different chips.
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #24 on: March 25, 2014, 12:39:35 am »
I have a simpler approach:

Code: [Select]
  tmp = systick_get(); //get systick's current value
  do_my_things();
  tmp = systick_get() - tmp; //how much ticks have elapsed

You can implement systick differently on different chips.

Simpler is right, if you want to time the delta for do_my_things() then that's fine.

But that doesn't keep any time at all for controlling the blinky, just a delta won't help the OP much.

 

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 22436
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: A more efficient blink for AVR processors
« Reply #25 on: March 25, 2014, 01:28:30 am »
Efficient Blink

Advantages:
- 6 WORDs
- No memory required (not counting IO registers), only uses four registers
- Runs on any memory size
- Can be used stand-alone or with other programs

Disadvantages:
- Other programs cannot have loops (screws with timing -- or, the loops should have consistent timing); several interrupts are unavailable (for best timing)
- Fixed blink frequency only

Warning: erase before programming.

Code: [Select]
.cseg
.org 0

inc r0
brne overloop

in r16, DDRB
ldi r17, 1 << PB0
eor r16, r17
out DDRB, r16
overloop:

I'm disappointed that the port write is so classically RISC.  I'm sure it can be slimmed down.

Note: this is running in an ATmega32 on an Olimex board, so the LED is PB0, active low.  If it were PB7, I could clean it up a lot more.

Tim
« Last Edit: March 25, 2014, 01:34:20 am by T3sl4co1l »
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4382
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #26 on: March 25, 2014, 02:17:27 am »
Quote
2) I gather, that you refer to me reloading Z twice? I know it is rather anal (1 lousy ms), but I wanted to try my hand at cycle-counting, which is more of an academic exercise.
But the load takes the same cycles whether it happens before or after the jump.  I'm pretty sure my fix has exactly the same timing as your original code, with 2 fewer instructions.

Quote
- 6 WORDs
I don't think it's fair not to count the thousands of instructions at "overloop."
 

Offline GiantGnomeTopic starter

  • Contributor
  • Posts: 25
  • Country: dk
Re: A more efficient blink for AVR processors
« Reply #27 on: March 25, 2014, 05:34:54 am »
Quote
2) I gather, that you refer to me reloading Z twice? I know it is rather anal (1 lousy ms), but I wanted to try my hand at cycle-counting, which is more of an academic exercise.
But the load takes the same cycles whether it happens before or after the jump.  I'm pretty sure my fix has exactly the same timing as your original code, with 2 fewer instructions.

Not sure if I understand it correctly, but it seems that if I use your code, the first round of the inner loop would run (clockCyclesPerMilliSecond-16-8)/4 = 3994 times, whereas all subsequent rounds would run 65536 times, because the Z register is zero when the inner loop starts again. This makes the delay timing way off.

My intention was to have the inner loop to run 3994 times the first time, then 3998 all other times. The 3998 compensates for the cycles used in the outer loop, the 3994 compensates for the cycles in the main loop, the calls, setup and return from the subroutine.

You could argue that the last 16 cycles (1 microsecond) is not worth it for the extra instructions it costs... And you would probably be right.

(Also it seems to be stuck, because you decrement Y at the end of the outer loop and reset it to delayMilliseconds at the start of the outer loop. I ascribe this to a typo.)
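
The two reload values and the "starts at zero" case can be checked on a PC. This is a minimal sketch in plain C, not AVR code. The function names are made up for illustration, and the 16 MHz clock is the value from the original listing.

```c
#include <stdint.h>

/* Clock from the original listing: 16 MHz, i.e. 16000 cycles per ms. */
enum { CLOCK_CYCLES_PER_MS = 16 * 1000 };

/* Inner-loop count for the first millisecond: compensates for the
   8 cycles of outer-loop overhead plus 16 cycles of call/setup/return. */
static uint16_t first_reload(void) {
    return (uint16_t)((CLOCK_CYCLES_PER_MS - 8 - 16) / 4);
}

/* Inner-loop count for every later millisecond (outer-loop overhead only). */
static uint16_t later_reload(void) {
    return (uint16_t)((CLOCK_CYCLES_PER_MS - 8) / 4);
}

/* Models "sbiw ZL,1 / brne": decrement first, then test for zero.
   Returns how many times the loop body runs for a given start value. */
static uint32_t inner_loop_iterations(uint16_t start) {
    uint16_t z = start;
    uint32_t n = 0;
    do {
        z--;            /* 16-bit wraparound: 0 - 1 becomes 0xFFFF */
        n++;
    } while (z != 0);
    return n;
}
```

Starting from zero, the decrement-then-test loop runs 65536 times, which is why a missing reload throws the timing off so badly.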

So when I do SBI to PINB, it actually toggles HIGH/LOW, rather than just setting HIGH?

You could benefit from reading the datasheet - it is all documented there clearly.


You are absolutely right. It is very clear on page 51 of the ATtiny13A datasheet, section 10.2.2.  :-[

Lesson learned (again)! Do not rely on tutorials only - ALWAYS READ THE DATASHEET.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4382
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #28 on: March 25, 2014, 07:52:01 am »
oops.  It looks like I did it wrong.
You had
Code: [Select]
    ldi     ZH,HIGH((clockCyclesPerMilliSecond-8-16)/4)
    ldi     ZL,LOW((clockCyclesPerMilliSecond-8-16)/4)
    ; A lot of nops and grief could be saved by only supporting a
    ; maximum of 255 millisecond delay.
    ldi     YL,LOW(delayMilliseconds)
    ldi     YH,HIGH(delayMilliseconds)
   
    delayloop:
            sbiw    ZL, 1       ; 2 cycles
            brne    delayloop   ; 2 cycles
       
        sbiw    YL,1                                     ; 2 cycles
        ldi     ZH,HIGH((clockCyclesPerMilliSecond-8)/4) ; 1 cycle
        ldi     ZL,LOW((clockCyclesPerMilliSecond-8)/4)  ; 1 cycle
        nop     ; 1 cycle, added to make the number of cycles divisible by 4
        nop     ; 1 cycle, added to make the number of cycles divisible by 4
        brne    delayloop                                ; 2 cycles
And I should have re-arranged as well as changing the jump:
Code: [Select]
Delay:
    nop   
    ldi     YL,LOW(delayMilliseconds)
    ldi     YH,HIGH(delayMilliseconds)
DelayOuter:
    ldi     ZH,HIGH((clockCyclesPerMilliSecond-8-16)/4)
    ldi     ZL,LOW((clockCyclesPerMilliSecond-8-16)/4)
   
    delayloop:
            sbiw    ZL, 1       ; 2 cycles
            brne    delayloop   ; 2 cycles     
        sbiw    YL,1                                     ; 2 cycles
        rjmp    delay2      ; two-cycle delay (rjmp does not change the flags)
delay2: brne    DelayOuter  ; reload Z and restart the inner loop
Z is loaded at DelayOuter for both the initial loop, and also subsequent reloads.
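
One way to compare the two layouts is to add up the cycles for one pass of the outer loop. The sketch below is host-side C with helper names invented for illustration. It assumes the usual AVR timings (SBIW 2, LDI 1, NOP 1, RJMP 2, BRNE 2 when taken and 1 when it falls through) and ignores call, return and the final exit.

```c
#include <stdint.h>

/* Assumed AVR instruction timings, in CPU cycles. */
enum {
    CYC_SBIW = 2,
    CYC_LDI = 1,
    CYC_NOP = 1,
    CYC_RJMP = 2,
    CYC_BRNE_TAKEN = 2,
    CYC_BRNE_NOT_TAKEN = 1
};

/* Inner loop: n passes of "sbiw / brne" (n >= 1); the last brne falls through. */
static uint32_t inner_cycles(uint32_t n) {
    return n * (CYC_SBIW + CYC_BRNE_TAKEN)
         - (CYC_BRNE_TAKEN - CYC_BRNE_NOT_TAKEN);
}

/* Original layout: inner loop, sbiw Y, two ldi, two nop, brne taken. */
static uint32_t outer_pass_original(uint32_t n) {
    return inner_cycles(n) + CYC_SBIW + 2 * CYC_LDI + 2 * CYC_NOP
         + CYC_BRNE_TAKEN;
}

/* Rearranged layout: inner loop, sbiw Y, rjmp, brne taken,
   then the two ldi at DelayOuter. */
static uint32_t outer_pass_rearranged(uint32_t n) {
    return inner_cycles(n) + CYC_SBIW + CYC_RJMP + CYC_BRNE_TAKEN
         + 2 * CYC_LDI;
}
```

Under these assumptions both layouts cost the same per pass for the same inner count n. Which reload value each pass should use is a separate question from the jump itself.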

 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: A more efficient blink for AVR processors
« Reply #29 on: March 25, 2014, 01:12:53 pm »
Quote
But that doesn't keep any time at all

Check on SysTick for Cortex-M processors.

Quote
for controlling the blinky, just a delta won't help the OP much.

That's easily implemented. Here is one example:

Code: [Select]
  time0=systick_get(); //obtain current time
  while (systick_get() - time0 < desired_duration) continue; //wait for desired duration to pass
  //do your things.

You can implement it, I am sure, a gazillion different ways too. But the basic idea is to have one free-running timer at all times.
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #30 on: March 25, 2014, 02:05:08 pm »
Quote
Check on SysTick for Cortex-M processors.

Yes, it's the same thing I mentioned, but for ARM processors: it returns the clock ticks elapsed. But you have to know the frequency of your CPU and sample a well-known oscillator that can give you a second's worth of ticks that you can rely on for the rest of the program.

If you don't care about actually keeping time, then that is fine. But if you are going to use it to interface with other devices or humans, I would recommend making sure you can accurately determine at least milliseconds without drift (so not discarding ticks, but keeping them around for the next loop).

Quote
That's easily implemented. Here is one example:

Code: [Select]
  time0=systick_get(); //obtain current time
  while (systick_get() - time0 < desired_duration) continue; //wait for desired duration to pass
  //do your things.

You can implement it, I am sure, a gazillion different ways too. But the basic idea is to have one free-running timer at all times.

But that doesn't keep a constant time, just delta times. Actually just delta ticks.

Having a master clock that doesn't drift  and you can determine milliseconds since start at any given time would be a better approach.

say systick_get() - time0 < desired_duration gives you 3 extra ticks over desired_duration, you have drifted then 4 ticks.

On your approach you are really just delaying for the next frame, which is fine for applications that need to keep up a desired frame time say 120Hz per program loop then eat up extra cycles for the next frame. But you have to keep the running clock to prevent the tick drift even if it's probably no more than 4 ticks per loop.

Edit: but if it was for keeping up with a constant frame time, I would do the work first then find out how many ticks I have to delay by, instead of delay first then do work.

And I'm all for having one free-running timer at all times, that's why I suggested it in the first place
« Last Edit: March 25, 2014, 02:09:46 pm by miguelvp »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: A more efficient blink for AVR processors
« Reply #31 on: March 25, 2014, 02:11:07 pm »
Quote
But you have to know the frequency of your CPU

SysTick is driven by HCLK. You can read it fairly easily.

Quote
without drift

Maybe you can articulate what "drift" you are talking about before I respond further.
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #32 on: March 25, 2014, 03:26:55 pm »
Quote
without drift

Maybe you can articulate what "drift" you are talking about before I respond further.

say you do:
  time0=systick_get(); //obtain current time
and get 100000
and say your desired_duration is 1ms (1000 on a 1 MHz system)

Now say your loop takes 7 cycles per iteration (or any other value that 1000 is not divisible by):
while (systick_get() - time0 < desired_duration) continue; //wait for desired duration to pass

after 142 loops systick_get() - time0 will be 6 ticks short so it has to loop once more and 143*7 is 1001 ticks, so you drifted one microsecond.

If your loop took 9 cycles, then the drift will be 8us.

So that's why you must advance your begin time by desired_duration on the next loop, so that it compensates for those extra cycles here and there.
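
The numbers above can be reproduced with a small host-side model. This is only a sketch: the function names are invented, one tick is assumed to be one cycle, and the time spent doing the actual work between waits is ignored.

```c
#include <stdint.h>

/* Ticks that actually elapse when a polling loop costing `loop_cost`
   ticks per pass waits for `duration` ticks: the first multiple of
   loop_cost that is >= duration. */
static uint32_t polled_wait(uint32_t duration, uint32_t loop_cost) {
    uint32_t elapsed = 0;
    while (elapsed < duration) {
        elapsed += loop_cost;
    }
    return elapsed;
}

/* Relative waits: each period restarts from "now", so every overshoot
   is added to the running error. Returns the total drift in ticks. */
static uint32_t drift_relative(uint32_t duration, uint32_t loop_cost,
                               uint32_t periods) {
    uint32_t now = 0;
    for (uint32_t i = 0; i < periods; i++) {
        now += polled_wait(duration, loop_cost);
    }
    return now - duration * periods;
}

/* Absolute targets: each deadline is the previous deadline plus
   `duration`, so an overshoot is absorbed by the next period. */
static uint32_t drift_absolute(uint32_t duration, uint32_t loop_cost,
                               uint32_t periods) {
    uint32_t now = 0;
    uint32_t target = 0;
    for (uint32_t i = 0; i < periods; i++) {
        target += duration;
        while (now < target) {
            now += loop_cost;
        }
    }
    return now - target;
}
```

With a 7-tick loop and a 1000-tick period, the relative version gains one tick per period, while the absolute version stays within one loop pass of the target no matter how many periods have gone by.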
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: A more efficient blink for AVR processors
« Reply #33 on: March 25, 2014, 08:35:32 pm »
That kind of "drift" is inevitable in a real application (where the mcu is doing more than just blinking). Take your code for example, in the middle of it, an interrupt could have arrived thus lengthening the task to finish a job.

Putting aside the practical usefulness of being accurate to 1us here, you can improve the "drift" in the code that I posted. One approach may look like this:

Code: [Select]
  while (systick_get() < time_target) continue; //wait for target time to arrive
  time_target += desired_duration; //update time_target
  do_my_thing();  //execute user task

So every time desired_duration (or a multiple of it) has elapsed, your task is executed.

You can also institute a user isr for systick counter so do_my_thing is done there directly, or in the main() via a flag.

The goal isn't to eliminate drifting; but to come up with a compromise.
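
A free-running counter eventually wraps, and a plain `<` against an absolute target misbehaves at that point. The sketch below shows one common way around it, comparing via the signed difference. It is host-side C under stated assumptions: `systick_get()` here is a fake stand-in that advances 7 ticks per call, and `deadline_reached` and `wait_until` are illustrative names, not part of any vendor library.

```c
#include <stdint.h>
#include <stdbool.h>

/* True once `now` has reached or passed `target`, even if the 32-bit
   counter wrapped in between. Valid while the two are less than 2^31
   ticks apart. The cast relies on two's-complement conversion, which
   is what mainstream compilers do. */
static bool deadline_reached(uint32_t now, uint32_t target) {
    return (int32_t)(now - target) >= 0;
}

/* Fake tick source for experimenting on a PC (hypothetical; replace
   with the real timer read on the target). */
static uint32_t fake_ticks;
static uint32_t systick_get(void) {
    fake_ticks += 7;        /* pretend each poll costs 7 ticks */
    return fake_ticks;
}

/* Wait for *target, then advance it by one period so the next wait is
   measured from the previous deadline, not from "now".
   Returns the tick value seen when the wait ended. */
static uint32_t wait_until(uint32_t *target, uint32_t period) {
    uint32_t now;
    do {
        now = systick_get();
    } while (!deadline_reached(now, *target));
    *target += period;
    return now;
}
```

If the target is already in the past when `wait_until` is called, the loop exits after a single poll. During development that is a cheap signal that the task is taking longer than one period.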
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #34 on: March 26, 2014, 12:50:16 am »
Agreed, that's why I took the same approach to add the desired time to the begin time.
Adding it to the target time like you did works as well.

Furthermore if your new target time is lower than the last time tick, then it means your processes are taking longer than the desired time and during development you could adjust for that or rethink some processes. Having a timing system like this is pretty powerful.

It not only helps on the final product, but during development you can do very precise performance timings that might bite you if you didn't have such a system.

Too bad only higher end MCU's have dedicated instructions in the core fabric, but using an external oscillator driving an interrupt will work as well, many ways to skin that cat.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4382
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #35 on: March 26, 2014, 04:49:31 am »
Quote
Too bad only higher end MCU's have dedicated instructions in the core fabric
Lots of "moderate" level MCUs have a "systick timer" that is pretty close.  (standard on all the ARM Cortex processors, and it's high time we stopped thinking about a CM0 as "high end")
Although I guess they're not quite as valuable when you need to implement an ISR to handle times more than about 1s.
(But then, the built-in timers pretty much have to be 64bits (like the RDTSC instruction on x86) to do that, and I'm not sure I'd want a 64bit counter on my 8bit CPU anyway.)

This relatively tiny (32 bytes) timer-based blink is adapted from the code in optiboot:

Code: [Select]
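#include <avr/io.h>

// LED (bit number) and LED_PIN (that port's PINx register) are
// board-specific macros, as in optiboot; F_CPU is the clock in Hz.
int main() {
  DDRB |=  1<<LED;                  // LED pin as output
  // Set up Timer 1 for timeout counter
  TCCR1B = _BV(CS12) | _BV(CS10);   // div 1024
  do {
    TCNT1 = -(F_CPU/(1024*16));     // preload so overflow comes after ~1/16 s
    TIFR1 = 1<<TOV1;                // clear the overflow flag by writing 1 to it
    while(!(TIFR1 & _BV(TOV1)))     // busy-wait for the overflow
        ;
    LED_PIN |= 1<<LED;              // writing 1 to a PINx bit toggles the pin
  } while (1);
}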
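
The preload arithmetic can be sanity-checked on a PC. This is a hedged sketch in host-side C, not AVR code. The helper names are made up, and 16 MHz in the usage note is just an example value for F_CPU.

```c
#include <stdint.h>

/* Timer ticks counted before overflow: 1/16 s worth of ticks at
   F_CPU/1024 (same expression as in the listing). */
static uint32_t ticks_per_period(uint32_t f_cpu) {
    return f_cpu / (1024UL * 16UL);
}

/* Value that lands in the 16-bit TCNT1 when "-ticks" is assigned:
   the negative number wraps modulo 65536. */
static uint16_t tcnt1_reload(uint32_t f_cpu) {
    return (uint16_t)(0u - ticks_per_period(f_cpu));
}

/* Resulting period in microseconds (integer division truncates). */
static uint32_t period_us(uint32_t f_cpu) {
    uint64_t cpu_cycles = (uint64_t)ticks_per_period(f_cpu) * 1024u;
    return (uint32_t)(cpu_cycles * 1000000u / f_cpu);
}
```

At 16 MHz this gives 976 ticks, a preload of 64560 and a period of 62464 us, slightly under 1/16 s because the tick count is truncated.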
« Last Edit: March 26, 2014, 05:06:26 am by westfw »
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #36 on: March 26, 2014, 05:12:23 am »
Quote
Too bad only higher end MCU's have dedicated instructions in the core fabric
Lots of "moderate" level MCUs have a "systick timer" that is pretty close.  (standard on all the ARM Cortex processors, and it's high time we stopped thinking about a CM0 as "high end")

I guess I mean too bad that only 32bit processors....

And thanks for your implementation btw.

Only question I have about your code is what do you use to drive the ICP (input capture pin)? or is that built in on the AVR with an oscillator?

Edit: also I noticed that if you use the 16 bit timer and you had an accurate timer you could use the 16 bit timer as a single channel logic analyzer on either falling or rising edge on certain modes :)

Edit2: nevermind, it would only work on protocols that can be determined by looking only at rising edges or falling edges but not both :(
« Last Edit: March 26, 2014, 05:38:52 am by miguelvp »
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4382
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #37 on: March 26, 2014, 06:21:26 am »
Quote
what do you use to drive the ICP
It's not using the input capture feature at all.  Do you mean the clock in general?  The AVR (and for that matter, most microcontroller timers) has a variety of possible clock sources, many of which are internal.  In this case, the
  TCCR1B = _BV(CS12) | _BV(CS10); // div 1024
statement sets the clock source (CS means Clock Select, I think) as the system clock after it's passed through a /1024 prescaler...
Except for the "systick" and "performance" counters, microcontroller timers tend to be pretty complicated beasts with lots of different operational modes.  (Putting the 'simple' systick counter inside the CPU definition of the Cortex ARM processors, instead of leaving the function "outside" in a vendor-defined timer module is relatively brilliant...)
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #38 on: March 26, 2014, 07:07:38 am »
Thanks for the clarification, I thought ICP1 drove the Timer/Counter1 that you were referring to.

from the datasheet:
Quote
• ICP1 – Port D, Bit 6
ICP1 – Input Capture Pin: The PD6 pin can act as an Input Capture pin for Timer/Counter1.

But I missed the CS11 Clock Select bit that uses the external T1 pin (Port B, Bit 1), and that's a different pin from ICP1 (Port D, Bit 6).

(page 113 of the data sheet)

For whatever reason while reading the datasheet it seemed the 16 bit timer you were referring to was driven by that chip pin (ICP1) and overridable via T1.

http://www.atmel.com/Images/doc2466.pdf

I guess I'm not sure if the prescaler is driven by Port D, bit 6? or Port B, bit 1?

Sorry, I'm a bit confused. But if the prescaler is driven by the ICP1 pin (PD6), maybe your board has it pre-configured to some external oscillator. Or is it internal to the chip? I guess I could read through the whole thing; it might be educational anyway.

Thanks.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4382
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #39 on: March 26, 2014, 09:53:33 am »
The prescaler is driven by clkio, which is an internal signal:
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #40 on: March 26, 2014, 01:25:44 pm »
Awesome, that picture makes it more clear.
T0 and T1 are used to sync clocks but not needed.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4382
  • Country: us
Re: A more efficient blink for AVR processors
« Reply #41 on: March 26, 2014, 05:26:29 pm »
the synchronizers for T0/T1 come before the clock selector.
I read it as "the timer's clock source is one of (nothing), (sync'ed T0), (sync'ed T1), (Clkio), or one of four prescaled Clkio values; 8 possible sources, total."
T1/T0 are not needed, and CLKio is always present and internally generated.
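
For Timer/Counter1 specifically, the three clock-select bits can be decoded in a few lines. The sketch below is host-side C and reflects my reading of the ATmega clock-select table (values 6 and 7 select the external T1 pin on the falling and rising edge). The function and enum names are made up, so check the table in the datasheet for your part before relying on it.

```c
#include <stdint.h>

/* Bit positions of CS10 and CS12 within TCCR1B. */
enum { BIT_CS10 = 0, BIT_CS12 = 2 };

/* Prescaler divisor selected by CS12:CS10.
   Returns 0 when the timer is not clocked from clkio at all:
   value 0 stops the timer, values 6 and 7 use the external T1 pin. */
static uint16_t timer1_prescaler(uint8_t cs_bits) {
    switch (cs_bits & 0x07u) {
        case 1:  return 1;      /* clkio, no prescaling */
        case 2:  return 8;
        case 3:  return 64;
        case 4:  return 256;
        case 5:  return 1024;
        default: return 0;      /* stopped or external clock on T1 */
    }
}
```

The setting used earlier in the thread, CS12 together with CS10, decodes to the /1024 prescaler.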
 

