Author Topic: The price of (STM32) Arduino  (Read 8941 times)

Offline dannyfTopic starter

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
The price of (STM32) Arduino
« on: April 30, 2016, 10:54:23 pm »
I have been playing around with an STM32F103C8 board over the last couple of weeks: https://www.eevblog.com/forum/microcontrollers/getting-started-on-stm32f103-coide-vs-keil/

This is a little inexpensive but versatile board. Roger Clark / maple has a set of Arduino routines running on those boards, making them a killer board: you can get those suckers on ebay for a song.

Anyway, I decided to see how fast I can flip a pin on those boards using the Arduino port, vs. direct register access: https://dannyelectronics.wordpress.com/2016/04/30/the-price-of-stm32-arduino/

The gist of the story is that the Arduino port routines are about 5x - 20x slower than going to the port registers directly. The 20x penalty is a lot but considerably smaller than the 50x - 100x penalty that I often hear people talking about on the AVR Arduinos.

To me, that is a fairly good trade-off.
================================
https://dannyelectronics.wordpress.com/
 

Offline Skimask

  • Super Contributor
  • ***
  • Posts: 1433
  • Country: us
Re: The price of (STM32) Arduino
« Reply #1 on: April 30, 2016, 11:04:12 pm »
I've never seen your site.  Don't care what others think.  I dig it.
I didn't take it apart.
I turned it on.

The only stupid question is, well, most of them...

Save a fuse...Blow an electrician.
 

Offline obiwanjacobi

  • Super Contributor
  • ***
  • Posts: 1013
  • Country: nl
  • What's this yippee-yayoh pin you talk about!?
    • Marctronix Blog
Re: The price of (STM32) Arduino
« Reply #2 on: May 01, 2016, 09:18:31 am »
digitalWrite performs a pin lookup. I believe there are methods to perform this pin lookup once and then use the resulting mcu pin register in the repeating code (saw this being used in the adafruit gfx libs)...

Could be a realistic scenario if you want to be somewhere in between full Arduino and raw registers...
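
A minimal sketch of that "look up once, reuse the register" idea, written so it runs on a PC. The simulated ports, the pin table and the names `slow_write`, `lookup_pin` and `CachedPin` are all assumptions made up for illustration; on the AVR core the equivalent lookup is done with the `digitalPinToPort()`, `portOutputRegister()` and `digitalPinToBitMask()` macros.

```cpp
#include <cassert>
#include <cstdint>

// Simulated output registers standing in for real GPIO ports (assumption:
// two ports, host-side only, so the sketch can run without hardware).
struct SimPort { uint32_t out = 0; };

inline SimPort& sim_port(unsigned index) {
    static SimPort ports[2];
    assert(index < 2);
    return ports[index];
}

// Hypothetical Arduino-style pin table: pin number -> (port, bit).
struct PinMap { uint8_t port; uint8_t bit; };
static const PinMap kPinTable[] = { {0, 0}, {0, 1}, {1, 4}, {1, 5} };
static const unsigned kPinCount = sizeof(kPinTable) / sizeof(kPinTable[0]);

// What a digitalWrite-like routine does: repeat the table lookup on every call.
inline void slow_write(unsigned pin, bool high) {
    assert(pin < kPinCount);
    const PinMap& m = kPinTable[pin];
    const uint32_t mask = 1u << m.bit;
    SimPort& p = sim_port(m.port);
    p.out = high ? (p.out | mask) : (p.out & ~mask);
}

// The in-between approach: do the lookup once and keep register + mask.
struct CachedPin {
    volatile uint32_t* reg;
    uint32_t mask;
    void set() const   { *reg = *reg | mask; }
    void clear() const { *reg = *reg & ~mask; }
};

inline CachedPin lookup_pin(unsigned pin) {
    assert(pin < kPinCount);
    const PinMap& m = kPinTable[pin];
    return CachedPin{ &sim_port(m.port).out, 1u << m.bit };
}
```

In a repeating loop only `set()` / `clear()` would be called, so the per-call cost drops to a read-modify-write of one register.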

[2c]
Arduino Template Library | Zalt Z80 Computer
Wrong code should not compile!
 

Offline dannyfTopic starter

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: The price of (STM32) Arduino
« Reply #3 on: May 01, 2016, 11:46:12 am »
Quote
the 50x - 100x penalty

I re-tested it on an Arduino Mini. The Arduino penalty on an AVR ranges from 26x (using the "^" operator) to 72x (using the PINx register): https://dannyelectronics.wordpress.com/2016/05/01/the-price-of-avr-arduino/

 

Offline dannyfTopic starter

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: The price of (STM32) Arduino
« Reply #4 on: May 01, 2016, 11:48:03 am »
Quote
I dig it.

glad that you enjoyed it.

Quote
I believe there are methods

There are fast / faster arduino routines but I have yet to try / benchmark one.

So far, it seems to confirm that the avr arduino routines are exceedingly slow.
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
Re: The price of (STM32) Arduino
« Reply #5 on: May 01, 2016, 05:03:35 pm »
Quote
I believe there are methods
There are fast / faster arduino routines but I have yet to try / benchmark one.
So far, it seems to confirm that the avr arduino routines are exceedingly slow.
Why don't you try posting on your site a method for programming the Arduino board using AVR Studio, and sending the generated hex with avrdude from the command line? The Arduino IDE, after its very inefficient compilation, sends its generated *.hex to the Arduino board using avrdude. A simple middle-man hack will reveal what arguments the Arduino IDE passes to avrdude; from there we can make a simple automated tool. I used this method (AVR Studio 4 -> hex -> avrdude -> Arduino board) to get assembly-class performance without a dedicated AVRISP mk2 programmer.

I haven't tried to hack how the Arduino IDE sends commands to the STM ARM Arduino board; I bought one but haven't had time to test it. I believe one could study how to program the STM in CoIDE, Keil, CooCox etc., compile to hex, and upload it with avrdude without needing the expensive Keil/J-Link programmer, but I don't know.

BTW, good job on your site. It's in my notification list now, since this forum has no "thread subscription" feature. Maybe you can put Google ads there to generate income ;). FWIW, YMMV.
Nature: Evolution and the Illusion of Randomness (Stephen L. Talbott): Its now indisputable that... organisms “expertise” contextualizes its genome, and its nonsense to say that these powers are under the control of the genome being contextualized - Barbara McClintock
 

Offline dannyfTopic starter

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: The price of (STM32) Arduino
« Reply #6 on: May 01, 2016, 07:50:14 pm »
Quote
why dont you try and post in your site, method on programming arduino the board using AVR Studio?

I'm not a user of Atmel Studio - I tried AVR Studio when it was introduced and was put off by the size of Atmel Studio, which succeeded AVR Studio.

I program AVRs mostly in Code::Blocks and IAR-AVR. IAR-AVR is rarely used by hobbyists, and Code::Blocks, a superb IDE in my view, wasn't as widely used as it should have been. However, I did put together a "getting started" series for Code::Blocks here: https://dannyelectronics.wordpress.com/2016/04/24/getting-started-on-avr-codeblocks/

It is pretty straightforward to use winavr, and that particular post contains an extension to allow you to use Atmel toolchains within CB, making it practically identical to AVR Studio, aside from the lack of debugging capabilities.

Hope it helps.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4392
  • Country: us
Re: The price of (STM32) Arduino
« Reply #7 on: May 01, 2016, 09:24:41 pm »
Quote
Arduino routines: 49.0Khz; Using “|” and “&” operators: 1,875Khz; 38x faster than the AVR Arduino routines

People tend to use this number to show how badly the Arduino code is written, but that's not really fair, since the Arduino digitalWrite() function performs a distinctly different function (given the AVR architecture) than the "fast" example.  The AVR has "set/clear bit in IORegister" single instructions that can be used to change a pin state in 2 cycles.  But the port, pin, and new value are built into the actual instruction, and therefore have to be known at compile time.  digitalWrite() allows the pin and the value to be variable at run-time, and if you try to implement THAT capability your code will get much slower.  Various people have looked at the Arduino AVR code and said: "you might be able to get it to go SLIGHTLY faster, but given what it does, it's not going to be very fast unless you specifically optimize for constants" (which is what the "fast" digitalWrite() operations do.)

Quote
Using STM32 Arduino routines: 833.4Khz; the Arduino port routines are about 5x - 20x slower

The ARM chips, on the other hand, do NOT have special purpose IO instructions, so there won't be as much difference between code examples that use constants and code that uses variable values.  The Arduino code still does pin-mapping and PWM-mitigation, so it will be slower, but it should never be AS MUCH slower as on an AVR (as your testing supports!)

I've been a little disappointed in the ARM implementations of digitalWrite.  The STM32 code number looks pretty good (38x the performance at 9x the clock rate, compared to the AVR version), but IIRC the official Arduino Due and Zero code doesn't do nearly as well - they fail to effectively leverage the ARM's 32bit-ness.  Sigh.  (And then Due at least goes through an extra layer of Atmel ASF.)
 

Offline uncle_bob

  • Supporter
  • ****
  • Posts: 2441
  • Country: us
Re: The price of (STM32) Arduino
« Reply #8 on: May 01, 2016, 09:36:57 pm »
Hi

Bit bang speed on most ARM implementations is ghastly slow compared to the rest of the things they do. For those who do a lot of pin change / bit bang stuff it is a weakness in the basic design. There are a few vendors who have tried to "fix" this. None of them that I have seen put the fix into all of their parts. There also is no general consensus on how to do it. Thus you get a bunch of tweaks if you move code between chips.

Bob
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4392
  • Country: us
Re: The price of (STM32) Arduino
« Reply #9 on: May 01, 2016, 09:48:47 pm »
The ARM instruction set doesn't allow for special purpose IO instructions.  So the minimum code for twiddling an IO bit is something like:
Code: [Select]
  load R1, <address of IO port>   ;1 16bit instruction, 1 32bit dataword
  load r2, <bitmask>              ; probably 32bit inst
  store r2, specialreg(R1)        ; write bitmask to special set/clear/toggle register
                                  ;   (if one exists.) (or use bit-banding)
At about 96 bits, that's a lot compared to the single 16bit instruction an AVR could use (and it gets worse if you have a dumber IO port HW.)
However, note that some of those values can STAY in registers once you've gone to the expense of loading them.  (No help at all for digitalWrite(), but significant for something like bitbang_SPI_OUT()...)
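
To make the "values can stay in registers" point concrete, here is a rough host-side sketch of a bit-banged shift-out. It assumes an STM32-style set/reset register (low 16 bits set pins, high 16 bits clear pins); `SimPort`, `write_bsrr` and `spi_out` are hypothetical names, and the port is simulated so the code runs on a PC. The masks are computed once before the loop, which is what lets a compiler keep them in registers for all eight bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simulated GPIO port with a set/reset register (assumption: STM32-style
// layout, where a single store sets or clears pins without a read first).
struct SimPort {
    uint16_t odr = 0;
    void write_bsrr(uint32_t v) {
        odr = static_cast<uint16_t>(odr & ~(v >> 16));     // high half clears
        odr = static_cast<uint16_t>(odr | (v & 0xFFFFu));  // low half sets
    }
};

// Shift one byte out MSB first. Returns the data-pin level seen at each
// rising clock edge, i.e. what a receiver would sample.
inline std::vector<int> spi_out(SimPort& port, unsigned data_pin,
                                unsigned clk_pin, uint8_t byte) {
    assert(data_pin < 16 && clk_pin < 16);
    // Loaded once, reused for every bit: the expensive part is outside the loop.
    const uint32_t data_set = 1u << data_pin;
    const uint32_t data_clr = 1u << (data_pin + 16);
    const uint32_t clk_set  = 1u << clk_pin;
    const uint32_t clk_clr  = 1u << (clk_pin + 16);

    std::vector<int> sampled;
    for (int bit = 7; bit >= 0; --bit) {
        port.write_bsrr(((byte >> bit) & 1u) ? data_set : data_clr);
        port.write_bsrr(clk_set);                          // rising edge
        sampled.push_back((port.odr >> data_pin) & 1);
        port.write_bsrr(clk_clr);                          // falling edge
    }
    return sampled;
}
```

Each pin change inside the loop is then a single store, with no port address or mask reloaded per bit.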
 

Offline uncle_bob

  • Supporter
  • ****
  • Posts: 2441
  • Country: us
Re: The price of (STM32) Arduino
« Reply #10 on: May 01, 2016, 10:23:04 pm »
Hi

The ARM architecture gets really fancy really fast in terms of CPU clocks, RAM clocks, flash clocks, cache and the like. Even with relatively low end parts there is a lot more going on to speed things up than on a simple 8 bit or 16 bit MCU. That makes translating code size or instruction counts into "speed" a task with a whole bunch of qualifiers on it. Without writing optimized code for a given real task on both machines ... who knows what you are actually comparing.

That said, no the ARM is not likely to win the bit bang per MHz contest.

Bob
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9985
  • Country: us
Re: The price of (STM32) Arduino
« Reply #11 on: May 02, 2016, 12:47:57 am »
Several years back, there was a lot of forum traffic discussing how slow bit-banging was on the NXP LPC2106 (ARM 7).  It had nothing to do with the instruction set but everything to do with the bus interface.  So, when the LPC2148 came along, there was a Fast I/O arrangement that gave pin access to the core.

You can see the location of the Fast General Purpose IO (upper left) and General Purpose IO (lower left) on page 7 of the user manual linked below.  To get any speed in bit-banging, they had to avoid crossing two bridges.

http://www.nxp.com/documents/user_manual/UM10139.pdf

For its intended purpose, digitalWrite works fine.  The Arduino was originally targeted at non-geeks who just might want to do a little animatronics.  Of course, it soon became a go-to board for just about everybody.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4392
  • Country: us
Re: The price of (STM32) Arduino
« Reply #12 on: May 02, 2016, 05:10:09 am »
Quote
It had nothing to do with the instruction set but everything to do with the bus interface.
I wouldn't say it had "nothing" to do with the instruction set; but older chips did put their GPIO on a slow bus, so there were additional cycles IN ADDITION to the long instruction sequence.   A "bad" GPIO bit set looks like:
Code: [Select]
  load R1, <address of IO port>   ;1 16bit instruction, 1 32bit dataword
  load r2, <bitmask>              ; probably 32bit inst
  load r3, portreg(R1)            ; load current ioport content
                                  ;   (possibly subject to extra latency)
  ior r3, r2                      ; or with bitmask
  store r3, portreg(R1)           ; store result (extra latency)
If the IO Bus has 4 cycle latency, you'd be using up 8 cycles there in addition to the 5+ cycles of the actual instructions.


(ok, the bus issue does sound pretty bad.  But even on the modern chips with single cycle IO Buses, the number of instructions required really slows things down compared to the older 8-bit architectures with single-instruction pin manipulation.)

 

Offline autobot

  • Regular Contributor
  • *
  • Posts: 66
Re: The price of (STM32) Arduino
« Reply #13 on: May 02, 2016, 08:44:59 am »
There's an arduino implemented in fpga called xlr8:

http://www.aloriumtech.com/

Is it possible to implement something reasonable in the FPGA to greatly accelerate the digitalWrite command, making this a non-issue?
 

Offline autobot

  • Regular Contributor
  • *
  • Posts: 66
Re: The price of (STM32) Arduino
« Reply #14 on: May 02, 2016, 10:43:59 am »
Quote
DigitalWrite() allows the pin and the value to be variable at run-time, and if you try to implement THAT capability your code will get much slower.

Not necessarily, if you implement this correctly. The mbed has a library called FastIO[1], which does this using C++ templates, I think, and a simple interface, and for a somewhat comparable mcu (stm32f030 / 48mhz) it gets a toggling rate of 188ns, i.e. ~6,000Khz, which I would guess is pretty close to the raw toggle rate.

[1]https://developer.mbed.org/users/Sissors/code/FastIO/
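
For readers who have not seen the template approach, a pin class along these lines might look roughly like the sketch below. This is not FastIO's actual API: `FastPin`, `StatusLed` and the simulated ports are assumptions for illustration, and the code is written to run on a PC. Note that the port and bit are template parameters, so they are fixed at compile time; that is where the speed comes from.

```cpp
#include <cassert>
#include <cstdint>

// Simulated output registers standing in for real GPIO ports (assumption:
// two ports, host-side only).
struct SimPort { uint32_t odr = 0; };

inline SimPort& sim_port(unsigned index) {
    static SimPort ports[2];
    assert(index < 2);
    return ports[index];
}

// Port and bit are template parameters, so the mask is a compile-time
// constant and a bad pin is rejected by the compiler, not at run time.
template <unsigned PortIndex, unsigned Bit>
struct FastPin {
    static_assert(PortIndex < 2, "unknown port");
    static_assert(Bit < 16, "bit out of range");

    static constexpr uint32_t mask() { return 1u << Bit; }

    static void write(bool high) {
        SimPort& p = sim_port(PortIndex);
        p.odr = high ? (p.odr | mask()) : (p.odr & ~mask());
    }

    static bool read() { return (sim_port(PortIndex).odr & mask()) != 0; }
};

// The "simple interface": name the pin once, then use it like an object.
using StatusLed = FastPin<1, 5>;
```

With an optimizing compiler, `StatusLed::write(true)` on real hardware can reduce to a load, an OR with a constant and a store, with no pin table involved.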
 

Offline dannyfTopic starter

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: The price of (STM32) Arduino
« Reply #15 on: May 02, 2016, 10:57:08 am »
"how slow bit-banging was on the NXP LPC2106 (ARM 7). "

There are 210x parts with FIOx registers.

I don't think we know for sure what was the cause of the gpio issues on the slower parts. The same instructions take 7 cycles to execute on the IOx registers vs 2 cycles on the FIOx registers.

I got for example 16Mhz on a 72Mhz stm32f103c part through the APB bus, very close to the theoretical 18Mhz limit. That's the same as what you will get from a stm32L105 where gpio hangs off the AHB bus.

I think at the end of the day, a bit-banging benchmark like this is mostly a (peripheral block) clock game. In real life, branching is likely a lot more impactful.
 

Offline obiwanjacobi

  • Super Contributor
  • ***
  • Posts: 1013
  • Country: nl
  • What's this yippee-yayoh pin you talk about!?
    • Marctronix Blog
Re: The price of (STM32) Arduino
« Reply #16 on: May 02, 2016, 11:35:13 am »
C++ templates rule!  O0

(hint: see my Arduino library in the signature)
 

Offline AndreasF

  • Frequent Contributor
  • **
  • Posts: 251
  • Country: gb
    • mind-dump.net
Re: The price of (STM32) Arduino
« Reply #17 on: May 02, 2016, 12:47:27 pm »
Quote
(ok, the bus issue does sound pretty bad.  But even on the modern chips with single cycle IO Buses, the number of instructions required really slows things down compared to the older 8-bit architectures with single-instruction pin manipulation.)

Not sure if the "smaller" STM32 micros have this, but at least on the STM32F4xx devices IO ports have dedicated write-only "set/reset" registers. If you combine those with bit banding (a whole word is mapped to a single bit), you can set or clear any single IO pin with two instructions. Granted, you couldn't "toggle" with these, as you're not taking into account the current state of the pin.
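
For reference, the bit-band mapping on Cortex-M3/M4 parts that implement it can be worked out with a small helper like the one below. The function names are made up for this sketch, and the register address used in the check (0x4001100C) is only an example value; take real addresses from your part's reference manual.

```cpp
#include <cassert>
#include <cstdint>

// Cortex-M3/M4 peripheral bit-band region and its alias region.
constexpr uint32_t kPeriphBase      = 0x40000000u;
constexpr uint32_t kPeriphAliasBase = 0x42000000u;
constexpr uint32_t kBitBandSize     = 0x00100000u;  // only the first 1 MB is aliased

// Each bit of the region gets its own 32-bit word in the alias region:
// alias = alias_base + byte_offset * 32 + bit * 4
constexpr uint32_t bitband_alias(uint32_t reg_addr, unsigned bit) {
    return kPeriphAliasBase + (reg_addr - kPeriphBase) * 32u + bit * 4u;
}

// Run-time variant that also checks the register really is in the region.
inline uint32_t checked_bitband_alias(uint32_t reg_addr, unsigned bit) {
    assert(reg_addr >= kPeriphBase && reg_addr - kPeriphBase < kBitBandSize);
    assert(bit < 32);
    return bitband_alias(reg_addr, bit);
}

// Example only: bit 13 of a register assumed to live at 0x4001100C.
static_assert(bitband_alias(0x4001100Cu, 13) == 0x422201B4u, "alias mismatch");

// On hardware, writing 1 or 0 to the alias word sets or clears that one bit:
//   *reinterpret_cast<volatile uint32_t*>(alias) = 1;
```

Because the alias address is a constant, the write is one address load plus one store, which matches the "two instructions" count above.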

my random ramblings mind-dump.net
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1725
  • Country: 00
Re: The price of (STM32) Arduino
« Reply #18 on: May 02, 2016, 12:50:59 pm »
One solution (that works quite well in GCC) is to inline the digitalWrite function by using a macro or putting the function inline in the header file. In that case there is a good chance the compiler will optimize out all constants and resolve the call to bit set/clr instructions. But if a variable is passed, it will use the run-time variant. This usually works quite well in GCC for any optimization level (other than none) and reasonably written code.

Downside: each digitalWrite() that uses a variable will have the full expansion of the function/macro, increasing code size dramatically yet not doing much for speed (it just saves a call/return and possibly a stack push/pop).

C++ is a more novel way of doing that, which can be much more expressive. You may want to watch/scroll through this talk. However, I can see why it's not widely implemented, because of the complexity in writing the rules.

It's unfortunate that the ARM chips cannot use ALU operations on I/O space directly, but instead have to go through processor registers. Instead, bit-banding or the set/clr registers are good alternatives, so the CPU can wiggle I/O by using MOV instructions.
 

Offline macboy

  • Super Contributor
  • ***
  • Posts: 2325
  • Country: ca
Re: The price of (STM32) Arduino
« Reply #19 on: May 02, 2016, 05:34:29 pm »
Quote
One solution (that works quite well in GCC) is to inline the digitalWrite function by using a macro or putting the function inline in the header file.

I have used dwf - digital write fast.
https://code.google.com/archive/p/digitalwritefast/downloads

The authors did some clever things to determine at compile time whether the passed in arg was constant or variable. If constant, all the necessary register and bitmask lookups are done at compile time, the write (or read) is done in about 4 cycles, slightly more if you need to maintain 100% compatibility with default digitalWrite (by always disabling PWM on every digitalWrite call). If the pin arg is a variable, then the default digitalWrite is called instead, so it is completely transparent.  Sometimes I add a #warning line to inform me if the slow digitalWrite is being called.
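
One way GCC (and Clang) let you make that compile-time decision is the `__builtin_constant_p` builtin. The sketch below shows the general dispatch idea only; it is not the library's code, and `WRITE_PIN`, `write_fast`, `write_slow` and the simulated register are hypothetical names used so it can run on a PC. The call counters exist purely to show which path was taken.

```cpp
#include <cassert>
#include <cstdint>

// Simulated output register plus counters recording which path ran.
struct SimState { uint32_t odr = 0; int fast_calls = 0; int slow_calls = 0; };

inline SimState& sim() { static SimState s; return s; }

inline void apply(unsigned pin, bool high) {
    assert(pin < 32);
    const uint32_t mask = 1u << pin;
    sim().odr = high ? (sim().odr | mask) : (sim().odr & ~mask);
}

// Stand-in for the path where port and mask are resolved at compile time.
inline void write_fast(unsigned pin, bool high) { ++sim().fast_calls; apply(pin, high); }

// Stand-in for the regular run-time digitalWrite.
inline void write_slow(unsigned pin, bool high) { ++sim().slow_calls; apply(pin, high); }

// __builtin_constant_p(pin) is true when the compiler can see that the pin
// argument is a constant; anything else falls through to the slow routine.
#define WRITE_PIN(pin, value) \
    (__builtin_constant_p(pin) ? write_fast((pin), (value)) \
                               : write_slow((pin), (value)))
```

Called as `WRITE_PIN(3, true)` the fast path is chosen; called with a pin read from a `volatile` variable the slow path is chosen, and the caller's source looks the same either way.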

I have modified some libraries to make use of dwf by allowing them to use constant pin numbers instead of variables which are set in their constructor. Usually I make a hardware.h file with pin definitions, then include that file in the (modified) library's own .h file.
 

Offline waspinator

  • Contributor
  • Posts: 49
  • Country: pl
Re: The price of (STM32) Arduino
« Reply #20 on: May 03, 2016, 12:20:57 am »
Quote
C++ templates rule!  O0

(hint: see my Arduino library in the signature)

just out of curiosity, why codeplex instead of github?
 

Offline obiwanjacobi

  • Super Contributor
  • ***
  • Posts: 1013
  • Country: nl
  • What's this yippee-yayoh pin you talk about!?
    • Marctronix Blog
Re: The price of (STM32) Arduino
« Reply #21 on: May 03, 2016, 05:27:34 am »
I don't get Git I guess. I experience it as convoluted and unnecessarily complex for a one-man project, and I hate the command line (see my command line gui project  8) ) other than for build scripts. To me that is a step backward. Also, at the time of my first open source project, github was hardly commonplace and there was no integration in Visual Studio. I sort of stuck with codeplex after that. Also, TFS and the Team Explorer in Visual Studio are something I work with daily - so it was also a case of using what I know.

Back on topic:
I watched the video linked in by Hans and I think that the presenter should experiment some more. Only using static classes seems only sensible for MCU related stuff. If you are building a library for say a simple LCD UI, I have not found a way to escape using objects. But the point he gets across is that you have to know the consequences of each C++ construct and use them effectively. In my library I use a construct that specifies the class you derive from as a template parameter. That way you can stack up any class hierarchy you like. If you think in reverse - where a base class is more specific - you can even avoid using virtual methods (and the vtable) altogether. I am not saying that I am an expert in C++ templates - I would love to convert my LCD GUI conventional classes to template classes - but there are more options than he shows in his presentation. But it was a nice 45 minutes.
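
A small sketch of the "base class as a template parameter" construct, as I read the description above. The class names (`SimPin`, `ActiveLow`, `Toggleable`, `Led`) are invented for illustration and are not taken from the Arduino Template Library; the pin is simulated so the code runs on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Most specific layer at the bottom: a simulated output pin.
class SimPin {
public:
    void write(bool high) { level_ = high; ++writes_; }
    bool level() const { return level_; }
    int writes() const { return writes_; }
private:
    bool level_ = false;
    int writes_ = 0;
};

// Layer that inverts polarity. Its base class is whatever the user supplies.
template <class BaseT>
class ActiveLow : public BaseT {
public:
    void write(bool on) { BaseT::write(!on); }
};

// Layer that remembers the logical state and adds toggle().
template <class BaseT>
class Toggleable : public BaseT {
public:
    void write(bool on) { on_ = on; BaseT::write(on); }
    void toggle() { write(!on_); }
    bool is_on() const { return on_; }
private:
    bool on_ = false;
};

// The hierarchy is stacked at the point of use; all calls resolve at
// compile time, so no virtual methods or vtable are involved.
using Led = Toggleable<ActiveLow<SimPin>>;

// Usage: turn the LED on, toggle it off, return the number of pin writes.
inline int led_demo() {
    Led led;
    led.write(true);       // logical on  -> pin driven low
    assert(!led.level());
    led.toggle();          // logical off -> pin driven high
    assert(led.level());
    return led.writes();
}
```

Swapping `ActiveLow` out, or inserting another layer, only changes the `using` line.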
 

