Author Topic: Even faster Arduino PORT manipulation  (Read 8680 times)


Offline LeopoldoTopic starter

  • Contributor
  • Posts: 44
Even faster Arduino PORT manipulation
« on: August 08, 2016, 09:04:41 pm »
Hi there!
I would like to know if it's possible to reduce the time needed for the ATMega2560 while performing this very simple loop:

void loop(){
  while(1){
    PORTC = integer[i++];
    if(i>=2)  i=0;
    if(PINE & 0b00100000)  goto out;
  }

  out:
  //whatever
}

 

Offline Signal32

  • Frequent Contributor
  • **
  • Posts: 251
  • Country: us
Re: Even faster Arduino PORT manipulation
« Reply #1 on: August 08, 2016, 09:07:49 pm »
Code: [Select]
void loop(){
  while(1){
    PORTC = integer[0];
    if(PINE & 0b00100000)  goto out;
    PORTC = integer[1];
    if(PINE & 0b00100000)  goto out;
    PORTC = integer[2];
    if(PINE & 0b00100000)  goto out;
  }
  out:
  //whatever
}

Better still, replace the integer[ x ] with a constant if they are constant.
« Last Edit: August 08, 2016, 09:20:11 pm by Signal32 »
 

Offline LeopoldoTopic starter

  • Contributor
  • Posts: 44
Re: Even faster Arduino PORT manipulation
« Reply #2 on: August 08, 2016, 09:22:27 pm »
I am very sorry, my computer screwed everything up before I could finish writing the post. As I was saying, the problem with this sketch is that, as you pointed out, PORT = integer is very slow, much slower than just having PORT = constant. So I would like to know whether there is a way to keep the functionality and still make it faster. In the final project I will have more than 20'000 integers to send to the PORT, one after the other, so I can't write them out manually as you did. That is also why there are so few statements in the while(1) loop: it needs to run fast, because the loop dictates the speed. Maybe some inline assembly? But I don't know anything about it.
Also, I know that goto is not "elegant", but I think it's the fastest way I could find to break the loop.
Thank you!
 

Offline Signal32

  • Frequent Contributor
  • **
  • Posts: 251
  • Country: us
Re: Even faster Arduino PORT manipulation
« Reply #3 on: August 08, 2016, 09:28:34 pm »
Can you post more of the code or even the exact project that you are working on?
 

Offline LeopoldoTopic starter

  • Contributor
  • Posts: 44
Re: Even faster Arduino PORT manipulation
« Reply #4 on: August 08, 2016, 09:34:29 pm »
unsigned int integer[2] =
  {0b0000111100001111,
   0b0000111111110000, 
  };
 

byte i=0;

void setup(){
  DDRC = 255;  // set PORTC to output
 
  pinMode(3, INPUT_PULLUP);
}
void loop(){
  while(1){
    PORTC = integer[i++];
    if(i>=2)  i=0;
    if(PINE & 0b00100000)  goto out;
  }
  out:
  delay(1000);
 
}
 

Offline LeopoldoTopic starter

  • Contributor
  • Posts: 44
Re: Even faster Arduino PORT manipulation
« Reply #5 on: August 08, 2016, 09:40:02 pm »
it is quite a simple sketch, I decided to use the while(1) loop even though I would have preferred the interrupt, but it was something like 4 times slower and for me, in this case, speed is crucial. Also, the void loop function is somewhat avoided in the most important part as it adds many, MANY additional clock cycles that have no reason to exist.
 Thank you.
 

Offline Signal32

  • Frequent Contributor
  • **
  • Posts: 251
  • Country: us
Re: Even faster Arduino PORT manipulation
« Reply #6 on: August 08, 2016, 09:44:23 pm »
And in the final version, the integer array will be much larger ?
Also why do you have an array of 16 bit values, but only using the lower 8 bits ?
 

Offline LeopoldoTopic starter

  • Contributor
  • Posts: 44
Re: Even faster Arduino PORT manipulation
« Reply #7 on: August 08, 2016, 09:48:58 pm »
The final array will have something like 20'000 variables; the fact that it is an integer array is because I originally intended to use 2 ports as outputs, thus splitting each variable in half through bitshift, but in the final version I'll probably use just single bytes and one port; as always, just to gain more speed.
 

Offline Signal32

  • Frequent Contributor
  • **
  • Posts: 251
  • Country: us
Re: Even faster Arduino PORT manipulation
« Reply #8 on: August 08, 2016, 09:58:09 pm »
What speed are you seeing now? What speed do you want to get to?
In your final version what you will want to do is have an unrolled loop:


PORTC = ptr[0];
if(PINE & 0b00100000)  goto out;
PORTC = ptr[1];
if(PINE & 0b00100000)  goto out;
PORTC = ptr[2];
if(PINE & 0b00100000)  goto out;
....
PORTC = ptr[99];
if(PINE & 0b00100000)  goto out;


Say you output 20,010 values: loop around that unrolled block (ptr[0]...ptr[99]) 200 times, advancing ptr by 100 after each pass, then output the final 10 values one at a time.
How was the interrupt version slower? Does the interrupt trigger often?
Perhaps someone with better knowledge of the chip's peripherals can suggest a better solution.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Even faster Arduino PORT manipulation
« Reply #9 on: August 08, 2016, 10:15:02 pm »
Not sure what the code is trying to do. But unrolling it would help.

If you provide more context for the code, a better implementation is possible.
================================
https://dannyelectronics.wordpress.com/
 

Offline LeopoldoTopic starter

  • Contributor
  • Posts: 44
Re: Even faster Arduino PORT manipulation
« Reply #10 on: August 08, 2016, 10:23:27 pm »
Unrolling the loop might help, but I've checked with an oscilloscope and the loop only requires 1 extra clock cycle, and I could deal with that; but going from constant to variable makes a huge difference, something like 20-30 times slower in the second case.

I actually don't understand how the interrupt happens to be slower than a normal loop; even if it's set to fire on every clock cycle (being beyond its capabilities, though), it never lets me change the port faster than roughly 35 clock cycles; instead, setting PORT = constant inside a loop requires no more than 1-2 clocks. But for now all I want to know is if there is a way to set the PORT = variable faster than what I've done using some sort of trick in the code...I actually don't care about other aspects of the code, as long as that section runs as fast as possible.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Even faster Arduino PORT manipulation
« Reply #11 on: August 08, 2016, 10:40:39 pm »
For that, I would use an incrementing pointer, like outport = *datptr++;

You will need to keep track of the count so as to reset the pointer.

If the processor supports DMA, it would be even better.

An interrupt will be slower due to its latency.
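
A minimal sketch of the incrementing-pointer idea, with a countdown that resets the pointer at the end of the table. The names table, TABLE_LEN, out_port and stream_table are made up for illustration, and out_port is an ordinary variable standing in for PORTC so that the code also runs on a PC; on the ATmega2560 the write would go to the real port register.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_LEN 5u   /* hypothetical size; the thread's final table is ~20'000 entries */

static const uint8_t table[TABLE_LEN] = { 0x0F, 0xF0, 0x3C, 0xC3, 0xAA };
static volatile uint8_t out_port;      /* assumption: stand-in for PORTC */

/* Writes `count` values to the port, restarting at the first table entry
   whenever the last one has been sent. Returns the last value written
   (0 if count is 0). */
uint8_t stream_table(size_t count)
{
    const uint8_t *p = table;
    size_t left = TABLE_LEN;           /* entries remaining before the wrap */
    uint8_t last = 0;

    while (count--) {
        last = *p++;                   /* fetch and advance in one step */
        out_port = last;
        if (--left == 0) {             /* end of table: reset pointer and count */
            p = table;
            left = TABLE_LEN;
        }
    }
    return last;
}
```

Calling stream_table(7) walks the 5-entry table once and then wraps around to output entries 0 and 1 again.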
 
The following users thanked this post: Leopoldo

Offline LeopoldoTopic starter

  • Contributor
  • Posts: 44
Re: Even faster Arduino PORT manipulation
« Reply #12 on: August 08, 2016, 11:13:14 pm »
Unfortunately, the ATMega2560 that I'm using does not support DMA. Tell me about that pointer thing: do you mean that I set the port directly from what is stored at that memory address? Would that be faster than reading the array the usual way?
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Even faster Arduino PORT manipulation
« Reply #13 on: August 09, 2016, 01:33:40 am »
the following code:

Code: [Select]
buff_ptr = buff; //reset the buffer
for (index=0; index<400; index++) //loop the data to OUT_PORT
OUT_PORT = *buff_ptr++;

on a 1MIPS attiny85:

no optimization: 40us per byte of data transmitted
-O1: 7.95us per byte of data transmitted.

In comparison, this piece of code:
Code: [Select]
for (index=0; index<400; index++)
OUT_PORT = buff_ptr[index]; //loop the data to OUT_PORT

runs 11.5us per byte of data transmitted @ -O1, or roughly 45% slower.

If you look at the disassembly, the most time is spent on incrementing the index. So that should be your focus.
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 3240
  • Country: gb
Re: Even faster Arduino PORT manipulation
« Reply #14 on: August 09, 2016, 08:26:33 am »
Also, I know that goto is not "elegant", but I think it's the fastest way I could find to break the loop.

C provides the 'break' keyword to exit loops.  I doubt it will be any faster than using goto, but it would be the preferred choice in this case.
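
For comparison, here is a sketch of the same loop written with break instead of goto. out_port and in_port are plain variables standing in for PORTC and PINE (an assumption, so the example can run off-target), and the max_writes limit exists only to keep a run on a PC finite; the original while(1) has no such limit.

```c
#include <stdint.h>

#define STOP_MASK 0x20u                /* bit 5, the same bit as 0b00100000 */

static volatile uint8_t out_port;      /* assumption: stand-in for PORTC */
static volatile uint8_t in_port;       /* assumption: stand-in for PINE */

static const uint8_t pattern[2] = { 0x0F, 0xF0 };

/* Alternates the two pattern bytes on the port until the stop bit is seen
   or max_writes is reached. Returns the number of port writes performed. */
unsigned run_until_stop(unsigned max_writes)
{
    unsigned writes = 0;
    uint8_t i = 0;

    while (writes < max_writes) {
        out_port = pattern[i];
        i ^= 1;                        /* toggle between index 0 and 1 */
        writes++;
        if (in_port & STOP_MASK)
            break;                     /* leaves the loop, like the goto did */
    }
    return writes;
}
```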
 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Even faster Arduino PORT manipulation
« Reply #15 on: August 09, 2016, 08:37:52 am »
If you're trying to really optimise you should be looking at the disassembled output code to see what the compiler is generating - a few lines of inline assembler can sometimes make a huge difference.
If those bit tests aren't generating SBIS/SBIC instructions then there's definitely scope to improve.
You may also want to see if using pin-change interrupts could be a better solution.

Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 
The following users thanked this post: amyk

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Even faster Arduino PORT manipulation
« Reply #16 on: August 09, 2016, 10:02:43 am »
Quote
PORT = integer is very slow, much slower than just having PORT = constant.
Um, that's NOT true.
A single-bit change with a constant bitmask is much faster than with a variable bit, because it becomes a single SBI or CBI instruction, rather than having to read the port, AND or OR in the appropriate bitmask, and write it back.
But writing the whole port at once is going to be about the same regardless of whether you use a constant or a variable.  In both cases, the code generated should be a simple OUT instruction (actually: faster than an SBI/CBI!), with the difference being whether you load a constant into a register beforehand, or fetch a variable from somewhere beforehand...
So it'll be more important to optimize the rest of the code.  And like Mike said, you should be looking at the generated code.

One possible speedup: if the size of your array is actually always 2, you should be able to do:
Code: [Select]
    PORTC = integer[i];
    i ^= 1;  // XOR to complement the LSB

 

Offline SimonR

  • Regular Contributor
  • *
  • Posts: 122
  • Country: gb
Re: Even faster Arduino PORT manipulation
« Reply #17 on: August 09, 2016, 10:38:25 am »
If you're trying to really optimise you should be looking at the disassembled output code to see what the compiler is generating

I agree. For time-critical stuff it's always worth looking at the disassembly. You may find that in your application a do loop is faster than a while loop, for instance. But if you use optimisation it might be the other way round, so you have to check all 4 configurations.
 

Offline amyk

  • Super Contributor
  • ***
  • Posts: 8275
Re: Even faster Arduino PORT manipulation
« Reply #18 on: August 09, 2016, 10:48:52 am »
If you want the fastest possible code then use pure Asm. Checking for interrupts is basically free, so use them to your advantage.
 
The following users thanked this post: Kilrah

Offline SimonR

  • Regular Contributor
  • *
  • Posts: 122
  • Country: gb
Re: Even faster Arduino PORT manipulation
« Reply #19 on: August 09, 2016, 10:57:29 am »
the following code:

Code: [Select]
buff_ptr = buff; //reset the buffer
for (index=0; index<400; index++) //loop the data to OUT_PORT
OUT_PORT = *buff_ptr++;



Code: [Select]
buff_ptr = buff;                        //reset the buffer
    for (index=0; index<400; index++)   //loop the data to OUT_PORT
        OUT_PORT = *buff_ptr++;

Good improvement, but this solution has 2 increments and 1 compare per loop, as does the original solution.
Better to do this, which has 1 increment and 1 compare per loop.

Code: [Select]
#define SIZE_OF_DATA (400)
unsigned int *buff_ptr = integer;           // this is the same as buff_ptr = &integer[0]; if you are new to pointers
unsigned int *buff_ptr_end = integer + SIZE_OF_DATA;
    do{
        OUT_PORT = *buff_ptr++;
    } while(buff_ptr < buff_ptr_end);

If you unroll the loop

Code: [Select]
#define SIZE_OF_DATA (400)
unsigned int *buff_ptr = integer;           // this is the same as buff_ptr = &integer[0]; if you are new to pointers
unsigned int *buff_ptr_end = integer + SIZE_OF_DATA;
    do{
        OUT_PORT = *buff_ptr++;
        OUT_PORT = *buff_ptr++;
        OUT_PORT = *buff_ptr++;
        OUT_PORT = *buff_ptr++;
    } while(buff_ptr < buff_ptr_end);

Now you only have 1/4 of a compare per output.

Note: if your data table is not an exact multiple of 4, then you may have to tidy up at the end of the loop.
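
One way the tidy-up could look, as a rough sketch: the unrolled loop handles the largest multiple of 4, and a second plain loop sends the 0 to 3 leftover bytes. The names send_all, demo_data and out_port are invented for the example, and out_port is an ordinary variable standing in for OUT_PORT so the code can be tried on a PC. It uses while rather than do/while so that tables shorter than 4 entries are also safe.

```c
#include <stddef.h>
#include <stdint.h>

static volatile uint8_t out_port;      /* assumption: stand-in for OUT_PORT */

static const uint8_t demo_data[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

/* Sends n bytes: whole groups of 4 in an unrolled loop, then the
   remaining 0..3 bytes one at a time. Returns the number of bytes sent. */
size_t send_all(const uint8_t *data, size_t n)
{
    const uint8_t *p = data;
    const uint8_t *const end4 = data + (n - n % 4);  /* end of the multiple-of-4 part */
    const uint8_t *const end = data + n;             /* end of the whole table */

    while (p < end4) {                 /* one compare per 4 outputs */
        out_port = *p++;
        out_port = *p++;
        out_port = *p++;
        out_port = *p++;
    }
    while (p < end) {                  /* tidy up the leftovers */
        out_port = *p++;
    }
    return (size_t)(p - data);
}
```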
« Last Edit: August 09, 2016, 10:59:26 am by SimonR »
 

Offline LeopoldoTopic starter

  • Contributor
  • Posts: 44
Re: Even faster Arduino PORT manipulation
« Reply #20 on: August 09, 2016, 12:51:28 pm »
@SimonR
What kind of compiler are you using? I can't get my Arduino compiler to accept the code you've given me, it says:
"Error: lvalue required as increment operand"
on this line:  PORTC = *integer++;
with integer defined as an array.

@westfw
I tried it out and you're right: it only takes a lot of time when the variable is an array element; when a single variable is assigned, it takes the same time as a constant.
Still, I can't use the code you've suggested me, as in the final design I will have more than 20'000 variables in the array.
 

Offline danieleff

  • Newbie
  • Posts: 2
  • Country: hu
Re: Even faster Arduino PORT manipulation
« Reply #21 on: August 09, 2016, 01:07:35 pm »
@SimonR
Still, I can't use the code you've suggested me, as in the final design I will have more than 20'000 variables in the array.
But the ATMega2560 only has 8 KB of SRAM.
 

Offline LeopoldoTopic starter

  • Contributor
  • Posts: 44
Re: Even faster Arduino PORT manipulation
« Reply #22 on: August 09, 2016, 01:19:19 pm »
I would use PROGMEM and store it in the flash...hoping it doesn't take too much to read it
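
A rough sketch of what reading such a table from flash could look like with avr-libc's PROGMEM and pgm_read_byte. The table contents, the name send_from_flash and the out_port variable (standing in for PORTC) are assumptions for illustration. The #else branch is only a fallback so the same file compiles on a PC, where there is no separate flash address space.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host fallback: data lives in normal memory, so a plain read is enough. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

static const uint8_t samples[4] PROGMEM = { 0x0F, 0xF0, 0x55, 0xAA };
static volatile uint8_t out_port;      /* assumption: stand-in for PORTC */

/* Sends the first n table entries (at most the whole table) to the port.
   Returns the last value written, or 0 if nothing was sent. */
uint8_t send_from_flash(size_t n)
{
    uint8_t last = 0;

    for (size_t i = 0; i < n && i < sizeof samples; i++) {
        last = pgm_read_byte(&samples[i]);   /* flash read instead of a RAM read */
        out_port = last;
    }
    return last;
}
```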
 

Offline bobaruni

  • Regular Contributor
  • *
  • Posts: 156
  • Country: au
Re: Even faster Arduino PORT manipulation
« Reply #23 on: August 09, 2016, 01:28:29 pm »
Try and avoid 16 or 32 bit operations on the 8 bit AVR as there is a time penalty.
If you can break everything down into multiple 8-bit arrays and use only 8-bit operations in the innermost section of the unrolled loop, this will speed things up considerably.
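
As an illustration of that idea, the 16-bit table from the sketch earlier in the thread could be split once, outside the time-critical loop, into a low-byte array and a high-byte array. The names words, lo, hi, port_lo and port_hi are hypothetical, and the two port variables are stand-ins for real port registers so the example runs on a PC.

```c
#include <stdint.h>

#define N 2u

/* The two values from the thread: 0b0000111100001111 and 0b0000111111110000. */
static const uint16_t words[N] = { 0x0F0F, 0x0FF0 };

static uint8_t lo[N];                  /* bits 7..0 of each word */
static uint8_t hi[N];                  /* bits 15..8 of each word */

static volatile uint8_t port_lo;       /* assumption: stand-in for one 8-bit port */
static volatile uint8_t port_hi;       /* assumption: stand-in for a second port */

/* Done once at startup, so the 16-bit shifts stay out of the fast loop. */
void split_words(void)
{
    for (uint8_t i = 0; i < N; i++) {
        lo[i] = (uint8_t)(words[i] & 0xFF);
        hi[i] = (uint8_t)(words[i] >> 8);
    }
}

/* The fast path then only touches 8-bit data. */
void send_pair(uint8_t i)
{
    port_lo = lo[i];
    port_hi = hi[i];
}
```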


 

Offline LeopoldoTopic starter

  • Contributor
  • Posts: 44
Re: Even faster Arduino PORT manipulation
« Reply #24 on: August 09, 2016, 01:30:54 pm »
I was wondering, could it be possible just to increment the port by a "fixed" number, that can eventually be changed while running the program(so using a variable), and have a high performance? This way I could use an external parallel memory such as an SRAM or EEPROM and read it by assigning a parallel address and obtaining a parallel output. But how do I set 2 ports on the Arduino so that one is used to address the memory from bit 7 to 0 and the other from 15 to 8? (I need to have 16 bit at this point, as they are needed for more than 256 memory positions)

I was trying this, but I got the same speed results as before, when I used the code: PORTC =  integer[i++]

unsigned int pos = 1; // incrementing factor, for the address of the external memory (I might want to increment by 1, 2, 3,...)

while(1){
   PORTC += pos;
 //  PORTA = ??
   
   if(PORTC>=1)  PORTC = 0;   // reset port when it reaches a fixed number, that won't be 1
 }
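
One possible way to drive a 16-bit address from two 8-bit ports, sketched under assumptions: the address is kept in a 16-bit variable, stepped by a variable amount, and then split into a low and a high byte. addr_lo_port and addr_hi_port are ordinary variables standing in for two ports (for example PORTC and PORTA), and step_address and limit are made-up names. Note that the two ports are written one after the other, so the external memory briefly sees a mixed address.

```c
#include <stdint.h>

static volatile uint8_t addr_lo_port;  /* assumption: stand-in for address bits 7..0 */
static volatile uint8_t addr_hi_port;  /* assumption: stand-in for address bits 15..8 */

static uint16_t addr;                  /* current external-memory address */

/* Advances the address by `step`, wraps to 0 once `limit` is reached,
   and puts the result on the two ports. Assumes addr + step stays
   below 65536. */
void step_address(uint16_t step, uint16_t limit)
{
    addr = (uint16_t)(addr + step);
    if (addr >= limit)
        addr = 0;
    addr_lo_port = (uint8_t)addr;          /* low byte */
    addr_hi_port = (uint8_t)(addr >> 8);   /* high byte */
}
```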
« Last Edit: August 09, 2016, 02:10:51 pm by Leopoldo »
 

Offline SimonR

  • Regular Contributor
  • *
  • Posts: 122
  • Country: gb
Re: Even faster Arduino PORT manipulation
« Reply #25 on: August 09, 2016, 01:37:33 pm »
@SimonR
What kind of compiler are you using? I can't get my Arduino compiler to accept the code you've given me, it

I'm not using any compiler, it's just generic C code. I don't know the ATmega at all so I don't know exactly how your compiler deals with the allocation of storage space. But something it is doing may be causing the problem.

You appear to be defining an array (integer[]) and then using its name as a pointer in your loop, which will definitely cause problems: an array name cannot be incremented. You need to create a pointer and point it at your array, which is why my example does this.

Code: [Select]
unsigned int *buff_ptr = integer;   
You then use buff_ptr in the loop and not integer.
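
A small self-contained sketch of the difference. Writing integer++ is rejected by the compiler ("lvalue required as increment operand"), whereas a separate pointer variable can be incremented freely. walk_array and out_port are names invented for the example, and out_port is a plain variable standing in for PORTC.

```c
#include <stdint.h>

static unsigned int integer[2] = { 0x0F0F, 0x0FF0 };
static volatile uint8_t out_port;      /* assumption: stand-in for PORTC */

/* Sends the low byte of each array element to the port.
   Returns how many elements were sent. */
unsigned walk_array(void)
{
    unsigned int *buff_ptr = integer;          /* pointer variable: may be incremented */
    unsigned int *buff_ptr_end = integer + 2;  /* one past the last element */

    do {
        out_port = (uint8_t)*buff_ptr++;       /* OK; `*integer++` would not compile */
    } while (buff_ptr < buff_ptr_end);

    return (unsigned)(buff_ptr - integer);
}
```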

If you define integer as const it should put it into flash. Hopefully the compiler will let you point to it and the example code will work. Someone else may be able to confirm this.


 

Offline Kilrah

  • Supporter
  • ****
  • Posts: 1852
  • Country: ch
Re: Even faster Arduino PORT manipulation
« Reply #26 on: August 09, 2016, 02:30:07 pm »
I would use PROGMEM and store it in the flash...hoping it doesn't take too much to read it

It does! Fetching data from flash vs RAM will likely have more impact than all the improvements that are discussed here... might want to have a close look at the doc before going further if you're already tight, the 8-bit AVR might simply not cut it for what you want to do.

Or do it in assembly, done right the difference is not huge there.
« Last Edit: August 09, 2016, 02:34:52 pm by Kilrah »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Even faster Arduino PORT manipulation
« Reply #27 on: August 09, 2016, 05:04:20 pm »
When I wrote my pieces I was focused on indexing the data via a pointer or via an array.

Then I hinted at the use of smaller data types to speed up the execution. But didn't implement it.

If I were to implement it, I would use two loops, both indexed via an 8bit type. You can unroll the inner loop for faster execution.
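
A sketch of what such a two-loop arrangement might look like for a 400-byte buffer, split as 4 x 100 so that both counters fit in 8 bits. OUTER, INNER, send_buffer and out_port are illustrative names, out_port is a plain variable standing in for the output port, and counting down to zero is a choice made here because it tends to compile to a simple decrement-and-branch.

```c
#include <stdint.h>

#define OUTER 4u
#define INNER 100u                     /* 4 * 100 = 400 bytes in total */

static uint8_t buff[OUTER * INNER];
static volatile uint8_t out_port;      /* assumption: stand-in for OUT_PORT */

/* Sends the whole buffer using two 8-bit counters instead of one
   16-bit index. Returns the number of bytes sent. */
unsigned send_buffer(void)
{
    const uint8_t *p = buff;

    for (uint8_t outer = OUTER; outer != 0; outer--) {
        for (uint8_t inner = INNER; inner != 0; inner--) {
            out_port = *p++;           /* candidate for unrolling */
        }
    }
    return (unsigned)(p - buff);
}
```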

I am not completely convinced the use of a pointer to the end of the buffer will speed up execution. Pointers on the avr are multi byte types and a comparison will likely take four or more ticks.

I think you will have to run the code to be sure.
 

Offline bktemp

  • Super Contributor
  • ***
  • Posts: 1616
  • Country: de
Re: Even faster Arduino PORT manipulation
« Reply #28 on: August 09, 2016, 05:35:09 pm »
Using assembler it takes 7 cycles to read data from flash and output it to a port without unrolling the loop. The end of the array must be located on a 256-byte boundary, because the loop only compares the high byte of the Z pointer (ZH).
Code: [Select]
loop:
lpm val, Z+  ; 3
out PORTD, val ; 1
cpi ZH, endofarray ; 1
brne loop ; 2

If I had to implement that, I would use a small STM32F or a PIC32 with DMA. Both should be able to write data at >10MHz to the IO ports.
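
To put the 7-cycle figure in perspective, here is a small worked calculation. It assumes a 16 MHz CPU clock, which is the usual Arduino Mega 2560 clock but is not stated in the thread, and the 20'000-byte table size mentioned earlier. The function names are invented for the example.

```c
#include <stdint.h>

/* Bytes output per second for a loop that costs `cycles_per_byte` cycles. */
uint32_t bytes_per_second(uint32_t f_cpu_hz, uint32_t cycles_per_byte)
{
    return f_cpu_hz / cycles_per_byte;
}

/* Time in microseconds to output n_bytes at that rate. The 64-bit
   intermediate avoids overflow in the multiplication. */
uint32_t table_time_us(uint32_t f_cpu_hz, uint32_t cycles_per_byte, uint32_t n_bytes)
{
    return (uint32_t)(((uint64_t)n_bytes * cycles_per_byte * 1000000u) / f_cpu_hz);
}
```

Under those assumptions, bytes_per_second(16000000, 7) gives 2285714 (about 2.3 MHz update rate), and table_time_us(16000000, 7, 20000) gives 8750, i.e. 8.75 ms for one pass over the table.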
 
The following users thanked this post: Leopoldo, Kilrah

Offline JPortici

  • Super Contributor
  • ***
  • Posts: 3461
  • Country: it
Re: Even faster Arduino PORT manipulation
« Reply #29 on: August 09, 2016, 06:26:36 pm »
you want fast? use assembly.
if the compiler isn't too smart or doesn't optimize well writing directly to register won't do it.

I know very little about ATMEL architecture but on pics, from enhanced midrange you can access both ram and linear data memory and even program memory (the lower word, so the 8/16/32 bit constant you want to fetch). You also have instruction with pre and post increment of the address pointer
so for example, on a pic16
Code: [Select]
I have already loaded FSR0H and FSR0L with the base address.
PORTB is all outputs.
The table is 200 elements for simplicity
Register _counter has already been loaded with 199

loop:
MOVIW  FSR0++  ; Load the array data into the accumulator and increment address
MOVWF  LATB    ;
DECFSZ _counter;
GOTO   loop    ;
four instructions, five clock cycles per loop (goto takes two clocks).
obviously, if the array is bigger than 256 the situation will be much worse, having to decrement and check a 16 bit number with an 8 bit mcu

on a dspic it is even easier because you can use any one of the accumulators and 16 bit arithmetics
Code: [Select]
W0 is the pointer, W1 stores the data from memory.

loop:
DO     loop_end,#9999; will do the loop 10000 times. actually, any 14 bit number + 1 times so max 16384 times. number can also be an accumulator.
MOV   [W0++],W1
loop_end:
MOV   W1,LATB
two instructions. with no overhead*. neat, huh? only prerequisite, the address pointer MUST BE an even number or an address error trap will be generated.
An address error trap will also be generated if the instruction tries to fetch data from an unimplemented location.
working with bytes over words is only a matter of using the .B suffix in the instruction, like so
Code: [Select]
MOV.B [W0++],W1
in that case W0 can also be an odd number. only the lower half of the register will be modified.
of course, if the number of repetitions is greater than 16k or there are chances you can exit the loop at any moment you can and should use the check for condition method.

I am sure you can conjure something similar with your mcu of choice

*in a dsPIC33E if the first instruction will fetch data from anywhere else than the SFR area, it will take two clock cycles instead of one
 

Offline SimonR

  • Regular Contributor
  • *
  • Posts: 122
  • Country: gb
Re: Even faster Arduino PORT manipulation
« Reply #30 on: August 09, 2016, 07:56:08 pm »
I am not completely convinced the use of a pointer to the end of the buffer will speed up execution. Pointers on the avr are multi byte types and a comparison will likely take four or more ticks.

I think you will have to run the code to be sure.

It's a fair point. If you are working with a CPU smaller than 16 bits, or even some of the 16-bit ones, then you really do need a very good grasp of how the architecture works and an even better idea of how your compiler makes use of it. You usually end up with some pretty strange solutions when you need to optimize this much.

As I said, I'm not familiar with the Atmel parts so I don't know what the overhead of using pointers is, or how it compares with an array and index. In my experience the pointer is nearly always faster, but maybe there is a clever trick with an index register that makes the array a better choice.

The point I was making is that if you can do a calculation once outside the loop to eliminate one inside then you should do it.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Even faster Arduino PORT manipulation
« Reply #31 on: August 11, 2016, 12:16:20 am »
I'm happy to report that incrementing via pointers vs. an index can be as much as 40% faster @ -O1, but 20% slower @ -O0.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Even faster Arduino PORT manipulation
« Reply #32 on: August 11, 2016, 01:33:29 am »
Unrolling the innermost loop will speed up the execution greatly: when I decomposed 400 increments into 2 * (200 unrolled outputs), I doubled the execution speed.
 

Offline SimonR

  • Regular Contributor
  • *
  • Posts: 122
  • Country: gb
Re: Even faster Arduino PORT manipulation
« Reply #33 on: August 11, 2016, 01:59:08 pm »
Interesting stuff DannyF. It just goes to show that you really do have to experiment with the compiler until you get the best out of it.

I recently had to implement the DMA signals REQ and ACK using GPIO pins. I did think that I would have to use assembly language coupled directly to the interrupt pin, but after trying a few different variations of code with different optimisation settings I did it all in C using the library interrupt handler, and the GPIO pins run as fast as they can go.
 

Offline Ammar

  • Regular Contributor
  • *
  • Posts: 154
  • Country: au
Re: Even faster Arduino PORT manipulation
« Reply #34 on: August 19, 2016, 08:09:46 am »
I learned to avoid the arduino environment for any timing specific applications. Works great for everything else and is super easy and quick, but as soon as you care about how many clock cycles something takes, stay away. Have you tried something like avr studio?
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Even faster Arduino PORT manipulation
« Reply #35 on: August 19, 2016, 02:07:31 pm »
Quote
I learned to avoid the arduino environment for any timing specific applications.

There is nothing wrong with the arduino environment - it is basically gcc-avr and you can use it for timing specific applications just as you would with gcc-avr/Arduino Studio.
 

