Author Topic: Even faster Arduino PORT manipulation  (Read 8678 times)


Offline SimonR

  • Regular Contributor
  • *
  • Posts: 122
  • Country: gb
Re: Even faster Arduino PORT manipulation
« Reply #25 on: August 09, 2016, 01:37:33 pm »
@SimonR
What kind of compiler are you using? I can't get my Arduino compiler to accept the code you've given me, it

I'm not using any compiler, it's just generic C code. I don't know the ATmega at all, so I don't know exactly how your compiler deals with the allocation of storage space. But something it is doing may be causing the problem.

You appear to be defining an array (integer[]) and then using its reference as a pointer in your loop, which will definitely cause problems. You need to create a pointer and point it at your array, which is why my example does this.

Code: [Select]
unsigned int *buff_ptr = integer;   
You then use buff_ptr in the loop and not integer.

If you define integer as const it should put it into flash. Hopefully the compiler will let you point to it and the example code will work. Someone else may be able to confirm this.
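
For anyone following along, here is a minimal sketch of the pointer approach described above, written as plain C so it also builds on a desktop. An array name cannot itself be incremented in C, which is why a separate pointer is needed. `port_write`, `BUFF_LEN`, and the array contents are hypothetical stand-ins chosen for illustration; on real hardware the write would be something like an assignment to the port register.

```c
#include <stddef.h>

#define BUFF_LEN 4  /* assumed length, for illustration only */

/* Sample data; the name `integer` follows the post above. */
static unsigned int integer[BUFF_LEN] = { 0x01, 0x02, 0x04, 0x08 };

/* Hypothetical stand-in for the hardware write (e.g. a port register). */
static unsigned int last_written;
static unsigned int write_count;

static void port_write(unsigned int value)
{
    last_written = value;
    write_count++;
}

/* Walk the buffer through a separate pointer rather than the array name. */
void send_buffer(void)
{
    unsigned int *buff_ptr = integer;   /* point at the first element */
    size_t n;

    for (n = 0; n < BUFF_LEN; n++) {
        port_write(*buff_ptr++);        /* read, then advance the pointer */
    }
}
```

Calling `send_buffer()` once pushes every element through `port_write` in order; `integer` itself is never modified.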


 

Offline Kilrah

  • Supporter
  • ****
  • Posts: 1852
  • Country: ch
Re: Even faster Arduino PORT manipulation
« Reply #26 on: August 09, 2016, 02:30:07 pm »
I would use PROGMEM and store it in the flash...hoping it doesn't take too much to read it

It does! Fetching data from flash vs RAM will likely have more impact than all the improvements that are discussed here... might want to have a close look at the doc before going further if you're already tight, the 8-bit AVR might simply not cut it for what you want to do.

Or do it in assembly; done right, the difference is not huge there.
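
A rough sketch of what reading the table from flash could look like in C with avr-libc's `PROGMEM` and `pgm_read_byte`. The table contents, `TABLE_LEN`, and `fake_port` are assumptions for illustration (on the target you would write to the real port register), and the `#else` branch is only a desktop fallback so the code can be compiled off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>           /* PROGMEM, pgm_read_byte */
#else
/* Desktop fallback (assumption): treat flash data as ordinary memory. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

#define TABLE_LEN 4                 /* assumed length, for illustration */

/* Table placed in flash on AVR; the values are made up. */
static const uint8_t table[TABLE_LEN] PROGMEM = { 0x11, 0x22, 0x44, 0x88 };

/* Hypothetical stand-in for the output port register. */
static volatile uint8_t fake_port;

/* Copy every table entry from flash to the port, one byte at a time. */
void send_table_from_flash(void)
{
    uint8_t i;

    for (i = 0; i < TABLE_LEN; i++) {
        fake_port = pgm_read_byte(&table[i]);   /* explicit flash read */
    }
}
```

How much slower this is than a RAM read depends on the code the compiler generates, so it is worth checking the listing or measuring.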
« Last Edit: August 09, 2016, 02:34:52 pm by Kilrah »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Even faster Arduino PORT manipulation
« Reply #27 on: August 09, 2016, 05:04:20 pm »
When I wrote my pieces I was focused on indexing the data via a pointer or via an array.

Then I hinted at the use of smaller data types to speed up the execution. But didn't implement it.

If I were to implement it, I would use two loops, both indexed via an 8-bit type. You can unroll the inner loop for faster execution.

I am not completely convinced the use of a pointer to the end of the buffer will speed up execution. Pointers on the AVR are multi-byte types and a comparison will likely take four or more ticks.

I think you will have to run the code to be sure.
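
One way the two-loop idea might look, as a sketch only. The 2 x 200 split, the `data` buffer, `fake_port`, and the `writes` counter are assumptions for illustration; the counter exists purely so the sketch can be checked off-target and would be dropped in real code.

```c
#include <stdint.h>

#define OUTER 2      /* assumed split: 2 x 200 = 400 values */
#define INNER 200

static uint8_t data[OUTER * INNER];   /* sample buffer, contents arbitrary */
static volatile uint8_t fake_port;    /* hypothetical stand-in for the port */
static uint16_t writes;               /* test aid only, not for real code */

/* Cover more than 255 elements using only 8-bit loop counters. */
void send_two_level(void)
{
    const uint8_t *p = data;
    uint8_t i, j;                     /* both counters fit in one register */

    for (i = 0; i < OUTER; i++) {
        for (j = 0; j < INNER; j++) {
            fake_port = *p++;
            writes++;
        }
    }
}
```

The intent is that the compare and increment in the inner loop stay single-byte operations; whether the compiler actually emits that is something to confirm in the generated assembly.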
================================
https://dannyelectronics.wordpress.com/
 

Offline bktemp

  • Super Contributor
  • ***
  • Posts: 1616
  • Country: de
Re: Even faster Arduino PORT manipulation
« Reply #28 on: August 09, 2016, 05:35:09 pm »
Using assembler, it takes 7 cycles to read data from flash and output it to a port without unrolling the loop. The end of the array must be located on a 256-byte boundary.
Code: [Select]
loop:
lpm val, Z+  ; 3
out PORTD, val ; 1
cpi ZH, endofarray ; 1
brne loop ; 2

If I had to implement that, I would use a small STM32F or a PIC32 with DMA. Both should be able to write data at >10MHz to the IO ports.
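
To put the 7 cycles (3 + 1 + 1 + 2) in perspective, here is a small arithmetic sketch. The 16 MHz clock is an assumption (it is not stated in the thread), and `bytes_per_second` is a hypothetical helper name.

```c
#include <stdint.h>

/* Port writes per second for a loop costing `cycles_per_byte` CPU cycles. */
uint32_t bytes_per_second(uint32_t f_cpu_hz, uint32_t cycles_per_byte)
{
    return f_cpu_hz / cycles_per_byte;   /* integer division, rounds down */
}
```

With an assumed 16 MHz clock, `bytes_per_second(16000000UL, 7)` gives 2285714, so roughly 2.3 million writes per second, compared with the >10MHz quoted above for the DMA-capable parts.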
 

Offline JPortici

  • Super Contributor
  • ***
  • Posts: 3461
  • Country: it
Re: Even faster Arduino PORT manipulation
« Reply #29 on: August 09, 2016, 06:26:36 pm »
you want fast? use assembly.
if the compiler isn't too smart or doesn't optimize well, writing directly to the register won't do it.

I know very little about the Atmel architecture, but on PICs, from enhanced midrange up, you can access both RAM and linear data memory and even program memory (the lower word, so the 8/16/32 bit constant you want to fetch). You also have instructions with pre- and post-increment of the address pointer
so for example, on a pic16
Code: [Select]
I have already loaded FSR0H and FSR0L with the base address.
PORTB is all outputs.
The table is 200 elements for simplicity
Register _counter has already been loaded with 199

loop:
MOVIW  FSR0++  ; Load the array data into the accumulator and increment address
MOVWF  LATB    ;
DECFSZ _counter;
GOTO   loop    ;
four instructions, five clock cycles per loop (goto takes two clocks).
obviously, if the array is bigger than 256 the situation will be much worse, having to decrement and check a 16 bit number with an 8 bit mcu

on a dspic it is even easier because you can use any one of the accumulators and 16 bit arithmetics
Code: [Select]
W0 is the pointer, W1 stores the data from memory.

loop:
DO     loop_end,#9999; will do the loop 10000 times. actually, any 14 bit number + 1 times so max 16384 times. number can also be an accumulator.
MOV   [W0++],W1
loop_end:
MOV   W1,LATB
two instructions. with no overhead*. neat, huh? only prerequisite, the address pointer MUST BE an even number or an address error trap will be generated.
An address error trap will also be generated if the instruction tries to fetch data from an unimplemented location.
working with bytes over words is only a matter of using the .B suffix in the instruction, like so
Code: [Select]
MOV.B [W0++],W1
in that case W0 can also be an odd number. only the lower half of the register will be modified.
of course, if the number of repetitions is greater than 16k or there are chances you can exit the loop at any moment you can and should use the check for condition method.

I am sure you can conjure something similar with your mcu of choice

*in a dsPIC33E if the first instruction will fetch data from anywhere else than the SFR area, it will take two clock cycles instead of one
 

Offline SimonR

  • Regular Contributor
  • *
  • Posts: 122
  • Country: gb
Re: Even faster Arduino PORT manipulation
« Reply #30 on: August 09, 2016, 07:56:08 pm »
I am not completely convinced the use of a pointer to the end of the buffer will speed up execution. Pointers on the avr are multi byte types and a comparison will likely take four or more ticks.

I think you will have to run the code to be sure.

It's a fair point. If you are working with a CPU smaller than 16 bits, or even some of the 16-bit ones, then you really do need a very good grasp of how the architecture works and an even better idea of how your compiler makes use of it. You usually end up with some pretty strange solutions when you need to optimize this much.

As I said, I'm not familiar with the Atmel parts, so I don't know what the overhead of using pointers is, or how it compares with an array and index. In my experience the pointer is nearly always faster, but maybe there is a clever trick with an index register that makes the array a better choice.

The point I was making is that if you can do a calculation once outside the loop to eliminate one inside then you should do it.
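
As a sketch of that point, the end address can be worked out once before the loop so the loop body only has a pointer compare left in it. `send_hoisted` and `fake_port` are hypothetical names for illustration; whether the multi-byte compare beats an 8-bit counter on a given part still has to be measured, as discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the output port register. */
static volatile uint8_t fake_port;

/* Send `len` bytes; returns how many bytes were written. */
size_t send_hoisted(const uint8_t *buf, size_t len)
{
    const uint8_t *p = buf;
    const uint8_t *const end = buf + len;   /* computed once, outside the loop */

    while (p != end) {                      /* no per-iteration arithmetic here */
        fake_port = *p++;
    }
    return (size_t)(p - buf);
}
```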
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Even faster Arduino PORT manipulation
« Reply #31 on: August 11, 2016, 12:16:20 am »
i'm happy to report that incrementing via pointers vs. an index can be as much as 40% faster @ -O1, but 20% slower @ -O0.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Even faster Arduino PORT manipulation
« Reply #32 on: August 11, 2016, 01:33:29 am »
unrolling the innermost loop will speed up the execution greatly: when I decompose 400 increments to 2 * (200 unrolled output), I doubled the execution speed.
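
A sketch of the unrolling idea using macros. To keep it short it unrolls by 8 rather than by 200 as in the post, so 400 bytes would be 50 passes; the macro names, `send_unrolled`, and `fake_port` are all hypothetical and chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the output port register. */
static volatile uint8_t fake_port;

/* One write, then repeated textually so the loop overhead is paid less often. */
#define OUT1(p)  (fake_port = *(p)++)
#define OUT4(p)  do { OUT1(p); OUT1(p); OUT1(p); OUT1(p); } while (0)
#define OUT8(p)  do { OUT4(p); OUT4(p); } while (0)

/* Send `blocks` groups of 8 bytes; returns the pointer past the last byte sent.
 * The total length must be a multiple of 8 in this simplified sketch. */
const uint8_t *send_unrolled(const uint8_t *p, uint8_t blocks)
{
    while (blocks--) {      /* 8-bit loop counter, one test per 8 writes */
        OUT8(p);
    }
    return p;
}
```

A larger unroll factor trades flash space for fewer loop tests, so the best factor depends on how much program memory is free.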
 

Offline SimonR

  • Regular Contributor
  • *
  • Posts: 122
  • Country: gb
Re: Even faster Arduino PORT manipulation
« Reply #33 on: August 11, 2016, 01:59:08 pm »
Interesting stuff DannyF. It just goes to show that you really do have to experiment with the compiler until you get the best out of it.

I recently had to implement the DMA signals REQ and ACK using GPIO pins. I did think that I would have to use assembly language coupled directly to the interrupt pin, but after trying a few different variations of code with different optimisation settings I did it all in C using the library interrupt handler, and the GPIO pins run as fast as they can go.
 

Offline Ammar

  • Regular Contributor
  • *
  • Posts: 154
  • Country: au
Re: Even faster Arduino PORT manipulation
« Reply #34 on: August 19, 2016, 08:09:46 am »
I learned to avoid the arduino environment for any timing specific applications. Works great for everything else and is super easy and quick, but as soon as you care about how many clock cycles something takes, stay away. Have you tried something like avr studio?
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Even faster Arduino PORT manipulation
« Reply #35 on: August 19, 2016, 02:07:31 pm »
Quote
I learned to avoid the arduino environment for any timing specific applications.

There is nothing wrong with the Arduino environment - it is basically gcc-avr and you can use it for timing specific applications just as you would with gcc-avr/Arduino Studio.
 

