Author Topic: Dhrystone 2.1 on mcus  (Read 41830 times)

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Dhrystone 2.1 on mcus
« Reply #75 on: March 27, 2014, 06:07:18 pm »
What does the C in 80C51 designate? (or 65C02)

CMOS
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1626
  • Country: nl
Re: Dhrystone 2.1 on mcus
« Reply #76 on: March 29, 2014, 05:12:33 pm »
Okay, I've got a PIC32 target board (PIC32MX440F512H @ 80MHz) and ran the Dhrystone benchmark.
Setup                               | Cycles @ 80MHz | Dhrystone @ 80MHz | Code Size @ 80MHz | Cycles @ 20MHz | Dhrystone @ 20MHz | Code Size @ 20MHz
No optimizations                    |           1312 |               762 |             17436 |            996 |              1004 |             17436
GCC optimize level 1                |            631 |              1584 |             15260 |            455 |              2197 |             15256
GCC optimize level 2                |            481 |              2079 |             15156 |            345 |              2898 |             15152
GCC optimize level 3                |            445 |              2247 |             15216 |            339 |              2949 |             15212
GCC optimize level 3 + unroll loops |            437 |              2288 |             15308 |            326 |              3067 |             15300
GCC optimize level s(ize)           |            680 |              1470 |             15400 |            483 |              2070 |             15396
See text / speed                    |            423 |              2364 |             15308 |            326 |              3067 |             15300
See text / size                     |           1407 |               710 |             10552 |           1232 |               811 |             10548

After the standard GCC levels (0/1/2/3/s), I tried the following options, compared against the level 3 + unroll loops result (437 cycles, 15308 bytes):

"Remove unused sections" -> Code size dropped to 14880 bytes, execution time increased to 445 cycles (?). Slower, so I turned it off.
"Optimization level stdlib" set to level 3 -> 436 cycles / Dhrystone! (-1 cycle) Code is 16176 bytes (+868 bytes).
"Isolate each function in its own section" -> 423 cycles / Dhrystone (-14 cycles, 2364 Dhrystones/s/MHz). Code size was 15308 bytes (+0 bytes).
"Use legacy stdlib" -> Back to 439 cycles, code 33400 bytes. Nope, not an attractive feature at all :)
"Generate 16-bit code" -> 803 cycles / Dhrystone, code dropped to 15144 bytes. Quite a performance hit.

Out of personal interest, the smallest code I could get was 10552 bytes, with GCC 16-bit code, GCC optimize s, no unroll loops, remove unused functions, allow section overlap, optimize s for stdlib, 16-bit in linker (for stdlib?). Execution took 1407 cycles, though.

Now, I first ran the test only at 80MHz, wrote the above, and thought "something doesn't seem right", as the 1.65 DMIPS/MHz claim should give about 2900 Dhrystones/s/MHz. I quickly thought of the FLASH accelerator, and I suspect it's not as good as on most modern ARM chips or the new PIC32MZ. So I reran the test at 20MHz (which seems to run at full speed, i.e. lowering the clock further doesn't yield better results) and got these numbers. They seem about right, or better than claimed.
Also glad to see the new PIC32MZ has a FLASH accelerator that should do the business @ full speed :)
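
For anyone checking the arithmetic in the post above, here is a minimal C sketch of the two conversions involved. It assumes the usual convention that 1 DMIPS corresponds to 1757 Dhrystones per second (the VAX 11/780 reference score). The macro and function names are made up for illustration and do not come from the benchmark source.

```c
#include <assert.h>

/* Assumed convention: 1 DMIPS == 1757 Dhrystones per second. */
#define DHRYSTONES_PER_DMIPS 1757.0

/* Cycles per Dhrystone iteration -> Dhrystones per second per MHz.
   One MHz supplies 1e6 cycles per second. */
static double dhrystones_per_mhz(double cycles_per_dhrystone)
{
    assert(cycles_per_dhrystone > 0.0);
    return 1.0e6 / cycles_per_dhrystone;
}

/* Dhrystones per second per MHz -> DMIPS/MHz. */
static double dmips_per_mhz(double dhry_per_mhz)
{
    return dhry_per_mhz / DHRYSTONES_PER_DMIPS;
}
```

With the figures reported above, 437 cycles at 80MHz works out to about 2288 Dhrystones/s/MHz (roughly 1.30 DMIPS/MHz), while 326 cycles at 20MHz gives about 3067 (roughly 1.75 DMIPS/MHz), which is above the 1.65 DMIPS/MHz claim.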
« Last Edit: March 29, 2014, 05:27:20 pm by hans »
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #77 on: March 29, 2014, 06:33:53 pm »
Thanks. Your numbers are fairly consistent with mine (simulated). The poorer efficiency at higher frequency makes sense to me - maybe due to flash wait states.

The optimized numbers, not just for PIC32 but for other MCUs as well, are not that believable, as you would have to inspect each routine to make sure it is not being optimized away. One way to do that is to write something to a port in those routines so they don't get cut by the compiler, and then watch the port to make sure they are actually run. Too much work.
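
A rough sketch of the port-write idea in C, under some loud assumptions: `fake_port` is a hypothetical stand-in for a real memory-mapped port register (on hardware you would use the vendor's register name), and `proc_under_test` is an invented routine, not one of the Dhrystone procedures. The point is only that a store through a `volatile` object is an observable side effect, so the compiler has to emit it, along with whatever computation feeds the stored value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped 8-bit port register. */
static volatile uint8_t fake_port;

/* Invented example routine. The volatile store cannot be removed by the
   optimizer, so the computation of 'result' has to survive as well. */
static int proc_under_test(int x)
{
    assert(x >= 0 && x <= 245);    /* keep the marker within 8 bits */
    int result = x + 10;
    fake_port = (uint8_t)result;   /* marker write, visible on the port */
    return result;
}
```

Note that this only protects the work that ends up in the stored value. The routine can still be inlined, and anything that does not contribute to the write can still be removed.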
================================
https://dannyelectronics.wordpress.com/
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1626
  • Country: nl
Re: Dhrystone 2.1 on mcus
« Reply #78 on: March 29, 2014, 06:56:24 pm »
Yes I suspect it's the FLASH accelerator, as the PIC32MZ datasheet goes into the prefetch module, which is only 16 bytes deep. So I am pretty certain any call will result in a performance drop.

Unless the compiler inlines all the test functions, of course. In that case maybe a "dummy" function is possible so it tricks the compiler into thinking those functions are used more often, and "can't" inline them anymore. However, as the compiler will try to do the same on my "real" programs, I take it as a feature :)
Also, too much effort, and would need to do the same for every other chip as they also use GCC.
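
If someone does want to stop the inlining without the dummy-function trick, GCC-based compilers offer a `noinline` function attribute. This is a different technique from the one described above, and the sketch below is only an illustration: `proc_add` and `run_loop` are invented names, and I have not checked how XC32 v1.21 in particular treats it. The empty `__asm__` statement follows the GCC manual's advice for keeping calls to side-effect-free functions from being optimized away.

```c
#include <assert.h>

/* GCC/Clang-specific: never consider this function for inlining. */
#define NOINLINE __attribute__((noinline))

/* Invented stand-in for one of the small benchmark procedures. */
static NOINLINE int proc_add(int a, int b)
{
    __asm__ volatile ("");   /* empty asm so the call itself is not elided */
    return a + b + 2;
}

/* Calls proc_add once per iteration and returns the accumulated value. */
static int run_loop(int iterations)
{
    assert(iterations >= 0);
    int acc = 0;
    for (int i = 0; i < iterations; i++) {
        acc = proc_add(acc, i & 3);
    }
    return acc;
}
```

As noted above, the compiler will inline in real programs too, so forcing calls like this measures something different from what a normal build would do.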

Sorry I forgot to add the compiler; it's XC32 v1.21
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #79 on: March 29, 2014, 07:14:53 pm »
Updated the list for your compiler.

I compared XC32 vs. C32 in simulation and the numbers I got are almost identical so I think they are (essentially?) the same, just different branding.
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #80 on: March 30, 2014, 08:53:50 pm »
Added MSP430 scores. Fairly respectable, but considerably lower than the PIC24's.
 

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 6459
  • Country: nl
Re: Dhrystone 2.1 on mcus
« Reply #81 on: March 31, 2014, 12:01:35 pm »
Danny what was the difference in compiler settings between these two measurements?

Quote
STM32F4:          1,053,       IAR-ARM         
STM32F4:            494,       IAR-ARM
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #82 on: March 31, 2014, 10:57:31 pm »
Kjelt, the 1053 score came from me, IAR-ARM, no optimization.

The 494 score came from one of the participants and I can look back and see exactly what it is.
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #83 on: March 31, 2014, 11:01:06 pm »
hans got a score of 489 for STM32F4 using IAR, no optimization. I may have mis-transcribed it.

Corrected now.
 

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 6459
  • Country: nl
Re: Dhrystone 2.1 on mcus
« Reply #84 on: April 01, 2014, 08:11:07 am »
But a difference of a factor two with the same compiler and compiler settings does not compute?
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #85 on: April 01, 2014, 10:52:06 am »
Sure.

But I am not in the business of validating someone's results. They are what they are, as reported here.
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #86 on: May 26, 2014, 09:43:58 pm »
Updated simulated benchmark for PIC32MZ (not a single chip available) and PIC24F (XC16 and C30).

First, PIC32MZ vs. PIC32MX:

Quote
PIC32MX320:       3,378,     C32 2.x,        optimized (-O3)
PIC32MX440:       3,067,     XC32 1.21,      optimized (-O3) @ 20MHz
PIC32MX440:       2,288,     XC32 1.21,      optimized (-O3) @ 80MHz
PIC32MX320:       1,151,     C32 2.x
PIC32MX440:       1,004,     XC32 1.21,      @ 20MHz
PIC32MX440:       762,       XC32 1.21,      @ 80MHz

Simulation only:
PIC32MZ:          1,173
PIC32MZ:          3,413,     O3

The results look to be roughly comparable: non-optimized at around 1200 and optimized around 3400.

The compiler for PIC32MZ is XC32.

 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #87 on: May 26, 2014, 09:46:48 pm »
Secondly, PIC24F actual vs. simulation:

Quote
Measured:
PIC24F:                2,432,     C30 2.x,        optimized -O3 (speed)
PIC24F:                2,403,     XC16 pro,       optimized (-O3)
PIC24F:                1,901,     C30 2.x,        optimized -O2 (speed)
PIC24F:                1,237,     C30 2.x
PIC24F:                1,106,     [compiler?],    optimized (-O3)
PIC24F:                993,       XC16 free

Simulation only:
PIC24F:                2,433,     C30, O3
PIC24F:                2,404,     XC16, O3
PIC24F:                1,215,     C30
PIC24F:                978,       XC16

Very consistent results between the measured and simulated benchmarks.

Not unexpected.
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #88 on: May 26, 2014, 09:52:27 pm »
Now, XC16 vs. C30:

Quote
Optimized C30:
PIC24F:                2,432,     C30 2.x,        optimized -O3 (speed)
PIC24F (sim):       2,433    C30, O3
//PIC24F:                1,901,     C30 2.x,        optimized -O2 (speed)
//PIC24F:               1,106    [compiler?]    optimized (-O3)

Optimized XC16:
PIC24F:                2,403      XC16 pro,     optimized (-O3)
PIC24F (sim):       2,404,   XC16, O3

Slight edge to C30, about 1% higher scores.

Quote
Non-optimized C30:
PIC24F:                1,237,    C30 2.x,
PIC24F (sim):       1,215    C30

Non-optimized XC16:
PIC24F:               993,      XC16 free,   
PIC24F (sim):        978       XC16

C30 still leads, with XC16 scoring about 20% lower.

The old dog does have some tricks, :)

 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #89 on: August 14, 2014, 11:31:32 am »
I was quite impressed with the dhrystone performance of PIC24 - they are surprisingly fast in the benchmarks we have done earlier.

It turns out that TI has also done a series of benchmarks of 16-bit / 8-bit MCUs, as published in SLAA205.

Here is a screen shot of one of the tables in the appnote. Look at the cycle counts for pic24/dspic.

Wow!

A job well done, Microchip. I wish they had done more to push that chip.
 

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 6459
  • Country: nl
Re: Dhrystone 2.1 on mcus
« Reply #90 on: August 14, 2014, 12:03:41 pm »
Here they use another benchmark: coremark, might be interesting for comparison also.
http://www.eembc.org/coremark/
 

Offline bwat

  • Frequent Contributor
  • **
  • Posts: 278
  • Country: se
    • My website
Re: Dhrystone 2.1 on mcus
« Reply #91 on: August 14, 2014, 12:29:25 pm »
Here they use another benchmark: coremark, might be interesting for comparison also.
http://www.eembc.org/coremark/

Coremark was mentioned on the very first page. :)
"Who said that you should improve programming skills only at the workplace? Is the workplace even suitable for cultural improvement of any kind?" - Christophe Thibaut

"People who are really serious about software should make their own hardware." - Alan Kay
 

Offline amyk

  • Super Contributor
  • ***
  • Posts: 8240
Re: Dhrystone 2.1 on mcus
« Reply #92 on: August 14, 2014, 12:30:39 pm »
There's a list of DMIPS/MHz here that you may find helpful for comparison:
http://en.wikipedia.org/wiki/Instructions_per_second#Timeline_of_instructions_per_second
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #93 on: August 14, 2014, 05:32:51 pm »
I have a workstation with dual quad-core Xeons (3.0GHz+), equivalent to 250,000 DMIPS.

Pretty impressive vs. any Cortex M3 machines, :).
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #94 on: September 11, 2014, 11:26:59 am »
Added STM32F030F - from the ghetto thread.

Not terribly impressive: it is at the lower end of the CMx chips and not that much faster than typical 8-bit chips. Its advantage, I think, is its ability to run really fast - I have clocked the little guy at over 64MHz (flash wait = 1) or 52MHz (flash wait = 0). So its raw MIPS numbers are still impressive, in spite of its mediocre MIPS/MHz numbers.

Among the 8-bit / 16-bit chips, PIC24F remains the champ in the MIPS/MHz race. It just has a limited frequency range.
 

Offline dannyf (Topic starter)

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Dhrystone 2.1 on mcus
« Reply #95 on: September 24, 2014, 12:30:31 am »
Dug up an old STM8 training presentation.

Notice the 0.29 DMIPS/MHz figure? That translates to about 500 Dhrystones/s per MHz.

Very close to our test figures:

Code:
STM8S:                482,      IAR-STM8,      optimized
STM8S:                434,       IAR-STM8,
PIC18F26K20:     380,       XC8 pro
PIC18F26K20:     323,       XC8 free
PIC18F26K20:     322,       PICC18 pro
AT90USB1286:     237,       gcc-avr
PIC18F26K20:     168,       PICC18 lite

The presentation is also right that this kind of speed is 1.5 to 3x that of some PICs and AVRs.
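
The two claims above are easy to check with a few lines of C. This is just a sketch: it assumes the usual 1757 Dhrystones per second per DMIPS convention, and the names are invented for illustration.

```c
#include <assert.h>

/* Assumed convention: 1 DMIPS == 1757 Dhrystones per second. */
#define DHRYSTONES_PER_DMIPS 1757.0

/* DMIPS/MHz -> Dhrystones per second per MHz. */
static double dmips_to_dhrystones_per_mhz(double dmips_per_mhz)
{
    return dmips_per_mhz * DHRYSTONES_PER_DMIPS;
}

/* How many times faster 'score' is than 'baseline'. */
static double speedup(double score, double baseline)
{
    assert(baseline > 0.0);
    return score / baseline;
}
```

Feeding in the numbers from the list: 0.29 DMIPS/MHz comes out at roughly 510 Dhrystones/s per MHz, and the optimized STM8S score of 482 is about 1.5x the XC8 free PIC18 score (323), about 2.0x the gcc-avr score (237), and about 2.9x the PICC18 lite score (168).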
 

Offline BravoV

  • Super Contributor
  • ***
  • Posts: 7547
  • Country: 00
  • +++ ATH1
Re: Dhrystone 2.1 on mcus
« Reply #96 on: September 24, 2014, 02:46:14 am »
Any plan for TI's TM4C123x or TM4C129x series ?

Online coppice

  • Super Contributor
  • ***
  • Posts: 8605
  • Country: gb
Re: Dhrystone 2.1 on mcus
« Reply #97 on: September 24, 2014, 06:37:56 am »
Was curious when I discovered that 8051 (and 80C51?) cores are in a lot of uC. A quick digikey search shows 600-700 (plus have to subtract tape&reel, etc.). So let's say 500 (but still less if you subtract pkg types), but still a lot.  Or is that the only way to get an 8051? i.e. they only come as a core?

What does the C in 80C51 designate? (or 65C02)
The 8051 is a core anyone can use without paying royalties, and multiple reasonably good toolchains are available to support it. That makes it a no-brainer for many people who need to drop a simple, low-performance core into a chip. The original 8051 takes 12 clock cycles for one machine cycle. There are versions today which run at one clock cycle per machine cycle, so it can be reasonably fast. You will find an 8051 at the heart of a lot of devices you didn't even realise were processor based, as the user just sees them as a black box.

Every single engineer graduating in China has studied the 8051 in detail. This is a huge incentive for people to put it in their chips, as a huge number of the engineers developing with 8 bit MCUs today are in China. If you try to sell them a chip with an 8051 core you can talk about the things which make the chip interesting. If you have any other core the conversation ends up mostly about tools, and how much hassle it will be to get up to speed with the core. Even very popular cores, like the PICs, have this disadvantage in China. Many people will recognise a parallel with ARM cores in bigger MCUs on a global level. If a 32 bit MCU doesn't have an ARM core it will be a big struggle to sell it to a lot of people.
 

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 6459
  • Country: nl
Re: Dhrystone 2.1 on mcus
« Reply #98 on: September 24, 2014, 06:44:28 am »
8051 has been put in its grave for a long long time, it's now a zombie created by some companies lacking innovation effort.
Unfortunately zombies refuse to stay in their grave and act dead.
 

Online coppice

  • Super Contributor
  • ***
  • Posts: 8605
  • Country: gb
Re: Dhrystone 2.1 on mcus
« Reply #99 on: September 24, 2014, 08:16:26 am »
8051 has been put in its grave for a long long time, it's now a zombie created by some companies lacking innovation effort.
Unfortunately zombies refuse to stay in their grave and act dead.
That is the exact opposite of reality. People being really innovative seldom give a damn about the core. If that's where your innovation lies, it's pretty weak. It would be great if a better free-to-use core with good tools were available, but the 8051 is good enough for a lot of devices.
 

