Author Topic: Fastest AVR software SPI in the West  (Read 3509 times)

Offline ralphd (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 445
  • Country: ca
    • Nerd Ralph
Fastest AVR software SPI in the West
« on: March 30, 2015, 03:48:03 am »
I disassembled some software SPI code, starting with Adafruit's "fast" spiWrite (@ 17/18 cycles per bit) from their Arduino Nokia LCD library.  I wrote an optimized C SPI function that takes just 8 cycles per bit, and show how to shave 1 cycle per bit off that in assembler by using the carry flag like a 9th bit.  I finished with writing the fastest possible software bit-banged SPI AVR assembler routine, taking only 4 cycles per bit unrolled.

http://nerdralph.blogspot.ca/2015/03/fastest-avr-software-spi-in-west.html
Unthinking respect for authority is the greatest enemy of truth. Einstein
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Fastest AVR software SPI in the West
« Reply #1 on: March 30, 2015, 06:02:02 am »
This is fascinating.   The C compiler is ADDING A LOOP COUNTER, even though there was none in the source code, and none is needed!  And a 16-bit loop counter at that!  I wonder WTF it's thinking?  (Also, let this be a warning against "optimizing" your C code without carefully inspecting the results!)

(It does the same thing if you convert the for loop to a do-while and try to retain the (shifted result == 0) conditional...)

Code: [Select]
void spiWrite(uint8_t data)
{
 uint8_t bit;
 for(bit = 0x80; bit; bit >>= 1) {
  SPIPORT &= ~clkpinmask;
  if(data & bit) SPIPORT |= mosipinmask;
  else SPIPORT &= ~mosipinmask;
  SPIPORT |= clkpinmask;
 }
}

00000000 <spiWrite>:
   0:   28 e0           ldi     r18, 0x08       ; 8   Created loop counter in r19:r18
   2:   30 e0           ldi     r19, 0x00       ; 0
   4:   90 e8           ldi     r25, 0x80       ; 128
   6:   2d 98           cbi     0x05, 5 ; 5
   8:   49 2f           mov     r20, r25
   a:   48 23           and     r20, r24
   c:   11 f0           breq    .+4             ; 0x12
   e:   2c 9a           sbi     0x05, 4 ; 5
  10:   01 c0           rjmp    .+2             ; 0x14
  12:   2c 98           cbi     0x05, 4 ; 5
  14:   2d 9a           sbi     0x05, 5 ; 5
  16:   96 95           lsr     r25
  18:   21 50           subi    r18, 0x01       ; 1   decrement the created loop counter (16bits!)
  1a:   31 09           sbc     r19, r1
  1c:   21 15           cp      r18, r1         ; compare loop counter with zero.
  1e:   31 05           cpc     r19, r1
  20:   91 f7           brne    .-28            ; 0x6
  22:   08 95           ret
« Last Edit: March 30, 2015, 06:07:24 am by westfw »
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 3240
  • Country: gb
Re: Fastest AVR software SPI in the West
« Reply #2 on: March 30, 2015, 08:23:12 am »
Quote from: ralphd on March 30, 2015, 03:48:03 am
I disassembled some software SPI code, starting with Adafruit's "fast" spiWrite (@ 17/18 cycles per bit) from their Arduino Nokia LCD library.  I wrote an optimized C SPI function that takes just 8 cycles per bit, and show how to shave 1 cycle per bit off that in assembler by using the carry flag like a 9th bit.  I finished with writing the fastest possible software bit-banged SPI AVR assembler routine, taking only 4 cycles per bit unrolled.

http://nerdralph.blogspot.ca/2015/03/fastest-avr-software-spi-in-west.html

That only covers one combination of clock polarity and phase, so you'd need another three versions to cover all peripherals.
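
To make the polarity/phase point concrete, here is a minimal sketch of a bit-banged write that takes the mode as parameters. It is not code from the linked post. The pin masks, the simulated port and every function name below are hypothetical, chosen so the sketch can run on a PC; on an AVR the port writes would go to a real I/O register. CPOL is the idle level of the clock, and CPHA selects whether the slave samples on the leading (0) or trailing (1) clock edge.

```c
#include <stdint.h>

#define CLK_MASK  (1u << 5)   /* hypothetical pin assignments */
#define MOSI_MASK (1u << 4)

static uint8_t sim_port;             /* stands in for the AVR port register */
static uint8_t sim_cpol, sim_cpha;   /* mode the simulated slave listens in */
static uint8_t sim_received;         /* bits the simulated slave has sampled */

/* Put the simulated slave in a given mode and park the clock at its idle level. */
static void sim_select_mode(uint8_t cpol, uint8_t cpha)
{
    sim_cpol = cpol;
    sim_cpha = cpha;
    sim_port = cpol ? (uint8_t)CLK_MASK : 0;
    sim_received = 0;
}

/* Write the port; on the sampling edge the simulated slave shifts in MOSI. */
static void port_write(uint8_t value)
{
    uint8_t was_high = (sim_port & CLK_MASK) != 0;
    uint8_t is_high  = (value & CLK_MASK) != 0;
    /* Modes 0 and 3 (CPOL == CPHA) sample on the rising edge, modes 1 and 2 on the falling edge. */
    uint8_t sample_on_rising = (sim_cpol == sim_cpha);

    sim_port = value;
    if (was_high != is_high && is_high == sample_on_rising)
        sim_received = (uint8_t)((sim_received << 1) |
                                 ((value & MOSI_MASK) ? 1u : 0u));
}

static void set_clk(uint8_t level)
{
    port_write((uint8_t)(level ? (sim_port | CLK_MASK) : (sim_port & ~CLK_MASK)));
}

static void set_mosi(uint8_t level)
{
    port_write((uint8_t)(level ? (sim_port | MOSI_MASK) : (sim_port & ~MOSI_MASK)));
}

/* MSB-first write; the clock is assumed to be idle at level cpol on entry. */
void spi_write_mode(uint8_t data, uint8_t cpol, uint8_t cpha)
{
    uint8_t i;
    for (i = 8; i > 0; i--) {
        uint8_t out = (data & 0x80) != 0;
        if (cpha == 0) {
            set_mosi(out);              /* data valid before the leading edge */
            set_clk((uint8_t)!cpol);    /* leading edge: slave samples */
            set_clk(cpol);              /* trailing edge: back to idle */
        } else {
            set_clk((uint8_t)!cpol);    /* leading edge: data may change */
            set_mosi(out);
            set_clk(cpol);              /* trailing edge: slave samples */
        }
        data = (uint8_t)(data << 1);    /* next bit into bit 7 */
    }
}
```

Calling sim_select_mode(1, 1) followed by spi_write_mode(0xA5, 1, 1) should leave 0xA5 in sim_received. Selecting the mode at run time like this costs extra cycles per bit, so a speed-tuned routine would more likely be written once per mode, which is the "another three versions" being described.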

Quote from: westfw on March 30, 2015, 06:02:02 am
This is fascinating.   The C compiler is ADDING A LOOP COUNTER, even though there was none in the source code, and none is needed!  And a 16-bit loop counter at that!  I wonder WTF it's thinking?  (Also, let this be a warning against "optimizing" your C code without carefully inspecting the results!)

Look up "integer promotion".  A decent optimiser should remove this; what did you have the optimiser set to?
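
For anyone who has not met the term, here is a small sketch of what integer promotion means in C. The function names are hypothetical. It only illustrates the rule itself: operands narrower than int are converted to int before operators such as ~ are applied. It does not show that promotion is what produces the 16-bit counter in the listing.

```c
#include <stddef.h>
#include <stdint.h>

/* '~' is applied after the uint8_t operand has been promoted to int,
   so the result has the size of int, not of uint8_t. */
size_t complement_size(uint8_t mask)
{
    return sizeof(~mask);
}

/* The promoted complement has all the high bits set, so it is negative. */
int complement_is_negative(uint8_t mask)
{
    return (~mask) < 0;
}

/* Casting back truncates the result to 8 bits again. */
uint8_t complement_8bit(uint8_t mask)
{
    return (uint8_t)~mask;
}
```

With a mask of 0x20, the promoted complement is a negative int, while the cast version is 0xDF. An explicit (uint8_t) cast on an expression like ~mask is the usual way to tell the compiler that only the low 8 bits matter.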
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Fastest AVR software SPI in the West
« Reply #3 on: March 30, 2015, 09:17:52 am »
I fail to see how creating a counter out of nothing qualifies as "integer promotion" (although there are some other inefficiencies in there that I might blame on that).  For example, explicitly adding an 8-bit counter ends up shortening the code:
Code: [Select]
void spiWrite3(uint8_t data)
{
 uint8_t bit=0x80, i;
 for(i=8; i > 0; i--) {
  SPIPORT &= (uint8_t)~clkpinmask;
  if(data & bit) SPIPORT |= (uint8_t)mosipinmask;
  else SPIPORT &= (uint8_t)~mosipinmask;
  SPIPORT |= clkpinmask;
  bit >>= 1;
 }
}
Arduino normally compiles with -Os.  The same code is produced with -O2, and -O3 unrolls the loop.
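
A quick way to check the logic of a loop like this without an AVR is to run it against a simulated port on a PC. The sketch below does that, assuming the counter is meant to count down from 8 to 0. The pin masks, sim_port, port_write and spi_write_counted are hypothetical stand-ins, not names from the library, and the simulated slave simply samples MOSI on every rising clock edge. It says nothing about cycle counts, which only the disassembly can show.

```c
#include <stdint.h>

/* Hypothetical pin masks and a simulated port so the loop can run on a PC.
   On an AVR, SPIPORT would be a real I/O register instead. */
#define CLK_MASK  (1u << 5)
#define MOSI_MASK (1u << 4)

static uint8_t sim_port;      /* stands in for SPIPORT */
static uint8_t sim_received;  /* bits seen by a slave sampling on rising CLK */
static uint8_t sim_edges;     /* number of rising CLK edges seen so far */

/* Write the port; on a rising CLK edge the simulated slave shifts in MOSI. */
static void port_write(uint8_t value)
{
    if (!(sim_port & CLK_MASK) && (value & CLK_MASK)) {
        sim_received = (uint8_t)((sim_received << 1) |
                                 ((value & MOSI_MASK) ? 1u : 0u));
        sim_edges++;
    }
    sim_port = value;
}

/* Same shape as the loop above: a bit mask walking down from 0x80,
   plus an explicit 8-bit counter that counts down to zero. */
void spi_write_counted(uint8_t data)
{
    uint8_t bit = 0x80, i;
    for (i = 8; i > 0; i--) {
        port_write((uint8_t)(sim_port & ~CLK_MASK));                 /* CLK low */
        if (data & bit) port_write((uint8_t)(sim_port | MOSI_MASK)); /* MOSI = 1 */
        else            port_write((uint8_t)(sim_port & ~MOSI_MASK));/* MOSI = 0 */
        port_write((uint8_t)(sim_port | CLK_MASK));                  /* CLK high */
        bit >>= 1;
    }
}
```

After spi_write_counted(0xA5), sim_received should hold 0xA5 and sim_edges should have advanced by exactly 8. A loop that runs the wrong number of times shows up immediately in the edge count.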
« Last Edit: March 30, 2015, 09:24:28 am by westfw »
 

