Author Topic: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer  (Read 10119 times)


Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37738
  • Country: au
    • EEVblog
A five-part video series on building the Free PDK open source programmer for the 3-cent Padauk microcontrollers.
A new video is released at 9am Sydney time every day.
Part 1 is about how to take a GitHub hardware project and order the parts and PCB from a bill of materials. This could be applicable to any project you want to get manufactured.
https://github.com/free-pdk/

 
The following users thanked this post: cdev, thm_w, GromBeestje, DenCraw

Offline cdev

  • Super Contributor
  • ***
  • Posts: 7350
  • Country: 00
Whoa, three cents!

Will now HAVE to check this out, cheapskate that I am.
"What the large print giveth, the small print taketh away."
 

Offline ntldr

  • Newbie
  • Posts: 1
  • Country: de
Nice to hear that the project came through, and even with a proper C compiler. I was looking at these parts for a small project I ordered last week, but didn't want to go through with it due to the toolchain issues I remembered. Looks like the situation has changed a bit. Maybe next time. For now I'm stuck with an old PIC at ten times the price; what a waste...

One note regarding the video: JLC will happily do V-scoring on their prototype panels. They route out your panel (as if it were one large board) and then V-score the individual boards. They do have a minimum panel size, but aside from that I haven't had any issues panelizing with them. Oddly, it isn't even an extra cost despite being an extra process step.

As a matter of fact, it even looks like the panels in the GitHub repo were made at JLC, judging from the markings and order code on the edge rails.
 

Offline greenpossum

  • Frequent Contributor
  • **
  • Posts: 408
  • Country: au
It would be even nicer if some module manufacturer like Elecrow were to make and sell them. The landed price would then be closer to what I'd be willing to pay: under $10. Or maybe people within an area could organise a group buy?
 

Offline Crumble

  • Regular Contributor
  • *
  • Posts: 99
Well, I think you'll have more luck if you initiate this group buy yourself, so you can make sure the initiative starts in your vicinity; but at $10 I guess you'll have trouble getting even the postage paid, unless these people are neighbours... ;D The STM32 on it is already $3, so I guess you will have to wait for Shenzhen to start the presses to really get it for the $10 you mention. Maybe you can get it down to a reasonable price if you get a panel and a stencil made and place the SMD parts yourself. That would make it a much more attractive module to buy.

What would you guys use these micros for? It only seems to make sense to use these parts if you plan mass production, or have an unusually large project where you need a lot of self-contained modules linked together. I occasionally play with Arduino Nanos, and at $2 a piece it would take me years to go through $10 worth of them. After all, I can very easily reprogram them for the project I am working on, and they plug in and out of a (bread)board quite easily. Keep in mind you need to desolder these suckers every time and replace them, which is IMHO a bit of a pain in the butt for playing around. For smaller, space-constrained tasks I would prefer the ATtiny85, which has some quite powerful analog features too, and which can be programmed from an Arduino, which I already have. Yes, it's more than $1 a piece, but for projects I generally only build once or twice that is not a big issue to me.

In the meantime this is of course a great project to see unfolding; lots of great work has gone into creating this toolchain. For me it is of no use, but admittedly, when the opportunity to create a very cheap sellable product comes along, I feel I need to be ready, even though I doubt my fortunes in this matter... :-\ I guess that if you're a freelancer and one of your contacts comes around with a product on a very tight per-unit budget, you may make him very happy by getting one of these controllers to do the job.
 
The following users thanked this post: thm_w

Offline greenpossum

  • Frequent Contributor
  • **
  • Posts: 408
  • Country: au
Quote
What would you guys use these micros for? [...]

These boards are so light that if you leave the female connector for the recipient to solder, you could probably stick them in a letter.

I'm just a hobbyist and not thinking of large scale. You would never take these things out after soldering; your time is worth more than 3¢ plus the surrounding electronics. Think of it as a flexible digital logic chip that you use for a standalone function: say a running arrow for a small sign, a button debouncer, or a light that stays on for 20 seconds after the button is released.
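
For concreteness, a minimal sketch of what such a one-chip gadget could look like in C for SDCC's pdk port. The PA/PAC register names follow the Padauk datasheet convention, but the header name, pin assignments and delay constant are illustrative assumptions that would need checking against the free-pdk includes and tuning on real hardware:
Code: [Select]
// Hypothetical sketch: LED stays on ~20 s after the button is released.
// Assumes free-pdk style register definitions (PA = port A data,
// PAC = port A control/direction) and an 8 MHz system clock.
#include <pdk/device.h>   // assumed free-pdk device header
#include <stdint.h>

#define LED_PIN    4      // PA4 drives the LED (assumption)
#define BUTTON_PIN 3      // PA3 reads the button, active low (assumption)

void main(void) {
    PAC |=  (1 << LED_PIN);      // LED pin as output
    PAC &= ~(1 << BUTTON_PIN);   // button pin as input
    for (;;) {
        if (!(PA & (1 << BUTTON_PIN))) {       // button pressed
            PA |= (1 << LED_PIN);              // light on
            while (!(PA & (1 << BUTTON_PIN)))  // wait for release
                ;
            // crude busy-wait on the order of 20 s; the constant
            // depends on clock and code generation, so calibrate it
            for (volatile uint32_t t = 0; t < 2000000; t++)
                ;
            PA &= ~(1 << LED_PIN);             // light off
        }
    }
}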
 

Offline Crumble

  • Regular Contributor
  • *
  • Posts: 99
Quote
These boards are so light that if you leave the female connector for the recipient to solder, you could probably stick them in a letter.
Exactly! ;)

Unfortunately I am based in Europe, so I won't be able to send one to you with any practical ease... :-\
Quote
I'm just a hobbyist and not thinking of large scale.
I'm just a hobbyist too, but that only makes it worse. The upfront cost (mainly in time) of getting such a micro working is way larger than you can reasonably expect to recoup. Keep in mind that the equivalent flash controllers these cheap ones could theoretically replace are not all that expensive: an ATtiny10 is only $0.40, and a vastly more powerful ATmega328 is only $1.50. Did I mention the STM8 series, which can be had for <$0.22 a piece, with a <$4 programmer? :wtf: For the $10 you mention you can get the programmer and almost 30 of those chips, which can be reused if required.
Quote
Your time is worth more than 3¢ plus the surrounding electronics.
Your time is probably worth more than most common microcontrollers, so I would consider the time it is going to take between the moment you get your programmer delivered and the moment you get the first usable program into one of these micros. Purely from the perspective of cost in time, it may turn out to be difficult to justify going for this 3¢ device. I am actually quite happy with the speed at which you can get an Arduino Uno to do something from scratch.

I'm sorry if I sound a bit sarcastic; it was not intended, especially because initially I thought exactly the same way you do. Afterwards, though, I started to realise that if I wanted to start playing with microcontrollers I needed a few basic things (a programmer) and some practice, both of which were going to cost more than any savings I would ever make on controllers. For me the Atmel controllers are #1 because there is such a massive community around them (Arduino), so they are well supported. They are also so cheap I can afford to lose one with every project I do, even though I don't actually make that many projects with microcontrollers in them, and I have never blown one up. Only if I wanted to go professional would I consider switching to other parts based on price. The type of clients that require the use of 3-cent micros are probably not going to be the best-paying ones anyway, so I would avoid those if I were you... In the meantime it is great fun seeing how this cheap micro is triggering such a flurry of activity in the open source community. This may turn out to be very interesting; it is a bit like watching the Arduino take off.
 

Offline greenpossum

  • Frequent Contributor
  • **
  • Posts: 408
  • Country: au
Yes, but people don't think twice about putting, say, a transistor or a voltage regulator in a circuit. Why should it be different for a 3¢ MCU? You are still thinking: write a custom program, flash it, and retry until it works. You're still thinking all that MCU power is too precious to waste. Look at it from the other direction. You have a frequently used function you want an 8-pin device to do (well, 6 pins left after the power pins are connected), so you design it, simulate it with an MTP version if you like, and then add it to your list of 6-pin gadgets. When you need one, burn one. Or burn a pile and put them in your parts box.

The startup cost of the programmer is paid only once. At the moment it's for the keen, but once you have a jig set up, you can make any 6-pin gadget from your library on demand. Other people may have designs to share too. We're talking about very basic gadgets; that's why the Padauk C compiler can be so limited that they implemented a bastardised for loop which iterates over a list. I seem to remember that it unrolls the loop. If you can fit it into 1kB, why not?

I already went through the RPi, Arduino, ESP32, STM8, STM32, etc. phase. This is a different game: for 3¢ you can treat it like a transistor. And you keep talking about clients and projects; I already told you I have no interest in those.
 

Offline MathematicalJ

  • Contributor
  • Posts: 16
  • Country: us
Has anyone made a version with JLC Basic Components so it can be pick-and-placed by JLC as well?
 

Offline HKJ

  • Super Contributor
  • ***
  • Posts: 2904
  • Country: dk
    • Tests
It is a very nice project, but these cheap micros are not that well suited to hobby projects with very low production quantities.
Even with the low price and simple programmer, I do not see any reason to use one when the final quantity is expected to be below 100; there, a micro with lots of flash memory is much better suited. If you get into the 1,000s or 10,000s or more, it may be a very good idea to look at this project.
 

Offline Crumble

  • Regular Contributor
  • *
  • Posts: 99
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #10 on: May 15, 2020, 09:16:53 pm »
@greenpossum: I guess we just have very different ways of hobbying; for my use case (which includes playing with a lot of analog stuff, which does not help) I consider this product to be no more than a curiosity. In the meantime I will just follow along and enjoy the show. :popcorn:

@Peabody: It is funny you mention that; I was scrolling through the video description to find the link too, because you cannot read it when the video is set to 480p either (which happens when the connection is bad). :-DD For what it's worth, the link is https://github.com/free-pdk/. ;)
 

Offline greenpossum

  • Frequent Contributor
  • **
  • Posts: 408
  • Country: au
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #11 on: May 15, 2020, 10:56:28 pm »
Quote
@greenpossum: I guess we just have very different ways of hobbying; for my use case (which includes playing with a lot of analog stuff, which does not help) I consider this product to be no more than a curiosity. In the meantime I will just follow along and enjoy the show. :popcorn:

Well, the problem is that you already had a preconception of how I and other people might use these. As I said, forget that it's an MCU; just think of it as a very cheap configurable digital chip.
 

Offline ebclr

  • Super Contributor
  • ***
  • Posts: 2328
  • Country: 00
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #12 on: June 05, 2020, 09:25:26 pm »
I would like to understand what this Padauk is all about.

I use the STC15F104E, where low cost is the point: that CPU costs 7 cents (double the Padauk's 3 cents, but still near zero, and it isn't OTP), has standard 8051 compatibility, is flash-based and erasable, and has a lot of development tools, paid and free, for assembler, C, Pascal, real-time simulation (Proteus) and even Forth. Why would I think of using a 3-cent limited OTP CPU like the Padauk? This makes no sense. Why are you guys so excited about this trash? What is the point?
 

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 6364
  • Country: ca
  • Non-expert
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #13 on: June 08, 2020, 08:55:39 pm »
Quote
I use the STC15F104E, where low cost is the point: that CPU costs 7 cents [...]

36c each
https://lcsc.com/product-detail/STC_STC15F104E-35I-SOP8_C106847.html
Profile -> Modify profile -> Look and Layout ->  Don't show users' signatures
 


Offline thm_w

  • Super Contributor
  • ***
  • Posts: 6364
  • Country: ca
  • Non-expert
7c each

https://item.taobao.com/item.htm?spm=a230r.1.14.8.35ce1d9d60uX6n&id=572963383895&ns=1&abbucket=16#detail

taobao is not a major authorized electronics distributor.
Profile -> Modify profile -> Look and Layout ->  Don't show users' signatures
 

Offline ebclr

  • Super Contributor
  • ***
  • Posts: 2328
  • Country: 00
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #16 on: June 10, 2020, 12:45:47 am »
" taobao is not a major authorized electronics distributor."

I don' t care, bought one time works, bought 2nd time works again,  Now I'm on the " n"  time level and no problem at all. I do not have the necessity to this kind of nonsense in restrict my supplier to expensive ones
 

Offline greenpossum

  • Frequent Contributor
  • **
  • Posts: 408
  • Country: au
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #17 on: June 10, 2020, 03:07:54 am »
Quote
7c each

https://item.taobao.com/item.htm?spm=a230r.1.14.8.35ce1d9d60uX6n&id=572963383895&ns=1&abbucket=16#detail

Interesting, might give Taobao a try. One wonders what you're getting at that price. Seconds? End of run parts? How does the supplier make enough money to eat?
 

Offline spth

  • Regular Contributor
  • *
  • Posts: 163
  • Country: de
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #18 on: June 13, 2020, 11:25:06 am »
Quote
7c each

https://item.taobao.com/item.htm?spm=a230r.1.14.8.35ce1d9d60uX6n&id=572963383895&ns=1&abbucket=16#detail

Quote
Interesting, might give Taobao a try. One wonders what you're getting at that price. Seconds? End of run parts? How does the supplier make enough money to eat?

I don't know, but if you buy from taobao, there are Padauk µCs below €0.01 (the PMS15A, when bought in quantities of at least 10).
 

Offline spth

  • Regular Contributor
  • *
  • Posts: 163
  • Country: de
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #19 on: June 13, 2020, 11:35:11 am »
Quote
I would like to understand what this Padauk is all about.

I use the STC15F104E, where low cost is the point: that CPU costs 7 cents (double the Padauk's 3 cents, but still near zero, and it isn't OTP), has standard 8051 compatibility, is flash-based and erasable, and has a lot of development tools, paid and free, for assembler, C, Pascal, real-time simulation (Proteus) and even Forth. Why would I think of using a 3-cent limited OTP CPU like the Padauk? This makes no sense. Why are you guys so excited about this trash? What is the point?

There are Padauk µCs that have flash. They don't get mentioned on EEVblog much, but most development of the free tools used them (and they are thus among the currently best-supported parts; all the demo programs have been developed and tested on the flash devices).

When writing C programs with free tools, I do not see any advantage on either side of 8051 vs. Padauk. For both you'd use SDCC as the compiler. For both there are tools to write the programs onto the devices (though currently the Padauk ones seem better maintained).

The Padauk µCs are not trash. They are somewhat comparable to lower-end 8051-compatible devices like the one you mentioned. Of course, on the 8051 side you have the advantage of multiple vendors, and the possibility to move to bigger devices as necessary (Padauk currently tops out at 256 B of RAM). But you don't get hardware multithreading on 8051 (though that is a Padauk feature not yet supported by the free tools; you'd currently have to write the code for all cores but one in assembler).

Personally, I prefer the Padauk architecture to 8051, but it is not that much of a difference. Both are much nicer than PIC. Both are far less nice than STM8.
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #20 on: June 13, 2020, 07:40:00 pm »
Quote
Personally, I prefer the Padauk architecture to 8051, but it is not that much of a difference. Both are much nicer than PIC. Both are far less nice than STM8.

I'm curious why you prefer Padauk over 8051? Also, why do you think STM8 is better than both? (That hasn't necessarily been my experience.)

In general, it seems like 8051 gets a bad rap a lot of the time, and I'm not entirely sure why. Is it the split memory architecture? Or the lack of a good open source C compiler? Yes, I know about SDCC (and use it all the time), but it really isn't all that efficient/optimized in the code it generates. Still, it is good enough for high-level things, and one can always drop down to inline assembly in the places where speed and/or size efficiency actually matters.

Speaking of program size, 8051 is actually pretty good in this regard. There are a lot of 1-byte instructions in addition to the 2-byte instructions, and very few 3-byte instructions. The Padauk ICs use the equivalent of 2-byte instructions for everything (whether it is a 13/14/15/16-bit word). Other architectures (AVR/STM8) seem to have larger instructions on average, potentially contributing to larger programs (of course it also depends on the instruction set and how many instructions are needed to accomplish the task at hand).

Surely the Padauks with their limited SRAM are no better than 8051 in terms of memory architecture. Even comparing 8051 to other architectures, I don't find the split memory to be all that big of a deal really. In fact, it is kind of liberating. The internal 128/256 bytes of SRAM can be thought of as a giant 'register' pool or scratch pad, and then one can use the larger XRAM for normal things like global variables that aren't accessed as often, or arrays that have to be accessed indirectly anyway. The 8051's instructions really aren't that different for accessing XRAM than what other architectures require for indirect access (i.e. mov dptr, #address; movx a, @dptr; or movx @dptr, a; inc dptr). This is very similar to AVR's X, Y, Z registers, which are used as indirect pointers into SRAM. Most 8051 MCUs these days have support for dual dptrs as well. Maybe the biggest limitation is that the stack has to fit within the internal 256 bytes of SRAM, but I haven't found that to be too much of a limit so far.
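
To make that split concrete, here is a minimal sketch of the memory spaces in SDCC C for MCS-51 (the __data/__xdata/__code qualifiers are SDCC's; the variable names are just for illustration):
Code: [Select]
#include <stdint.h>

__data  uint8_t fast_counter;               // internal RAM: direct, cheap access
__xdata uint8_t big_buffer[512];            // external RAM: accessed via DPTR/MOVX
__code  const uint8_t table[] = {1, 2, 3};  // program memory: read via MOVC

uint8_t read_table(uint8_t i) {
    return table[i];    // compiler emits MOVC through DPTR here
}

void store(uint8_t i, uint8_t v) {
    big_buffer[i] = v;  // compiler emits MOVX @DPTR here
}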

One place where 8051 might fall behind is cycles per instruction, but this really depends on which variant you work with. One of my favorite 8051 MCUs right now is the Nuvoton N76E003 (~$0.20 each for a TSSOP20 with 18 I/Os, 18 KB flash (with up to 4 KB bootloader support), 256 bytes SRAM, 768 bytes XRAM, 12-bit ADC, 2x UART, SPI, I2C, etc.). It has instructions that vary from 1 to about 5 clock cycles, with the average being about 3 for most instructions. That is slightly inferior to the Padauk and AVR MCUs, where most instructions take 1-2 clock cycles, although the N76E003 runs at 16 MHz and the Padauk ICs only run at 8 MHz. But there are other cheap 8051 MCUs that are just as good. The CH551/CH552/CH554/CH559 MCUs are really efficient, with 1 clock cycle for most instructions and only a few in the 2+ range. And the CH551/CH552 are really inexpensive ($0.20 - $0.30) and even have USB support built in!

I agree that these Padauk MCUs are interesting and have their place, but I think they shine in a different area than 8051s and other MCUs. To me, the main benefits are the low cost, the lower power consumption, and the fact that they are good enough for a lot of simple things. But the small SRAM/flash size and lack of hardware peripherals can certainly be a limitation in many projects where spending a little bit more goes a long way.
 
The following users thanked this post: edavid

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #21 on: June 13, 2020, 08:01:09 pm »
Quote
Yes, but people don't think twice about putting, say, a transistor or a voltage regulator in a circuit. Why should it be different for a 3¢ MCU? [...] I already went through the RPi, Arduino, ESP32, STM8, STM32, etc. phase. This is a different game: for 3¢ you can treat it like a transistor.

Thank you for the insight!

I too had been trying to equate these too much with other MCUs, where the value proposition is harder to justify compared to slightly more expensive parts with greatly enhanced capabilities. But your comments about treating these more like simple programmable building blocks change things. I can now see finding uses for these as glue logic, replacing one or more 74-series ICs and/or discrete diodes/transistors/resistors. Or as supplemental devices that augment and free up resources on the main MCU. I already have a few ideas for things I want to try (a SPI CS multiplexer, an HD44780 driver, a GPIO expander, etc.).
 

Offline spth

  • Regular Contributor
  • *
  • Posts: 163
  • Country: de
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #22 on: June 14, 2020, 09:19:53 am »
Quote
Also, why do you think STM8 is better than both? (That hasn't necessarily been my experience.)

In general, it seems like 8051 gets a bad rap a lot of the time, and I'm not entirely sure why. Is it the split memory architecture? Or the lack of a good open source C compiler? Yes, I know about SDCC (and use it all the time), but it really isn't all that efficient/optimized in the code it generates. Still, it is good enough for high-level things, and one can always drop down to inline assembly in the places where speed and/or size efficiency actually matters.

Speaking of program size, 8051 is actually pretty good in this regard. There are a lot of 1-byte instructions in addition to the 2-byte instructions, and very few 3-byte instructions.

Let's compare STM8 to MCS-51 then. SDCC supports both. For the comparison, I'll assume we need a few KB of RAM (i.e. the large memory model for mcs51, the medium memory model for stm8) and want full reentrancy as in the C standard (i.e. the --stack-auto option for mcs51).
The STM8 has good support for pointers (flat address space, registers X and Y), while MCS-51 has to juggle memory spaces and go through dptr. Also, the STM8 has stack-pointer-relative addressing. And the SDCC stm8 port has fancier optimizations than the mcs51 one.
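
For reference, the build settings described above would look roughly like this (flag names per the SDCC manual; the file name is a placeholder):
Code: [Select]
/*
 * MCS-51, large memory model, fully reentrant:
 *     sdcc -mmcs51 --model-large --stack-auto dhry.c
 *
 * STM8, medium memory model:
 *     sdcc -mstm8 --model-medium dhry.c
 */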

Looking at a benchmark, we can see what this means (Dhrystone, STM8AF at 16 MHz vs C8051 at 24.5 MHz):

stm8 code size is half of mcs51:
https://sourceforge.net/p/sdcc/code/HEAD/tree/trunk/sdcc-extra/historygraphs/dhrystone-stm8-size.svg
https://sourceforge.net/p/sdcc/code/HEAD/tree/trunk/sdcc-extra/historygraphs/dhrystone-mcs51-size.svg

Despite the C8051 being single-cycle and having a 50% higher clock speed, the STM8 is 85% faster:
https://sourceforge.net/p/sdcc/code/HEAD/tree/trunk/sdcc-extra/historygraphs/dhrystone-stm8-score.svg
https://sourceforge.net/p/sdcc/code/HEAD/tree/trunk/sdcc-extra/historygraphs/dhrystone-mcs51-score.svg

The graphs are from SDCC, where they are used to track code size and speed to quickly notice regressions.
 
The following users thanked this post: thm_w

Offline spth

  • Regular Contributor
  • *
  • Posts: 163
  • Country: de
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #23 on: June 14, 2020, 10:04:30 am »
Quote
I'm curious why you prefer Padauk over 8051? […]

In general, it seems like 8051 gets a bad rap a lot of the time, and I'm not entirely sure why. Is it the split memory architecture? Or the lack of a good open source C compiler? Yes, I know about SDCC (and use it all the time), but it really isn't all that efficient/optimized in the code it generates. Still, it is good enough for high-level things, and one can always drop down to inline assembly in the places where speed and/or size efficiency actually matters.

Padauk having fewer memory spaces makes the architecture somewhat cleaner. On MCS-51, any pointer read/write has to go through support functions (or the programmer has to manually specify the memory space, i.e. use non-standard extensions of C). For Padauk, the problem exists for pointer reads only. It would be good if SDCC had better tracking of pointers, so the use of support functions could be optimized out in more cases. But often that would be very hard to do (e.g. passing pointers to a function defined in a different source file).

I have to admit that to some degree, this cleanliness of architecture comes at the cost of power: the Padauks are limited to far less memory than MCS-51.

Quote
Surely the Padauks with their limited SRAM are no better than 8051 in terms of memory architecture. [...] Most 8051 MCUs these days have support for dual dptrs as well.

Though it is somewhat unfortunate that there are so many different ways of handling dual dptr. There is a document suggesting splitting the existing mcs51 backend in SDCC into 5 different ones to cover the most common variants of dual-dptr handling (https://sourceforge.net/p/sdcc/wiki/8051%20Variants/). But even then there are many more variants not yet covered. Manufacturers don't even stick to one single way across their own product lines.

Quote
I agree that these Padauk MCUs are interesting and have their place, but I think they shine in a different area than 8051s and other MCUs. To me, the main benefits are the low cost, the lower power consumption, and the fact that they are good enough for a lot of simple things. But the small SRAM/flash size and lack of hardware peripherals can certainly be a limitation in many projects where spending a little bit more goes a long way.

The small SRAM size is clearly a limitation. The architecture tops out at 512 B, but all devices I know about have at most 256 B. Code memory is far less limited: the architecture supports up to 8 KW of 16-bit words, i.e. 16 KB, though all devices I know about have at most 4 KW. I am not sure yet about the peripheral situation: the Padauk FPPA approach (i.e. hardware multithreading) allows a lot of stuff to be done in software that would otherwise need a peripheral.
 

Offline ebclr

  • Super Contributor
  • ***
  • Posts: 2328
  • Country: 00
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #24 on: June 14, 2020, 03:31:10 pm »
8051 is around 40 years old but still simple. There are now a lot of new-generation 1-cycle 8051s. Also, I don't know any other processor where it is so easy to manipulate bits. Of course, this is for small projects directly linked to hardware manipulation, not for doing a lot of processing, Ethernet or anything like that; just the bare basics.

The modern 8051 families aren't the original Intel ones; they have more power, fewer cycles, more megahertz, even a lot of RAM. Some work on only 1.2 V, others have USB, all very cheap, and with a lot of timers. But remember, it is still an 8-bit processor. STM8 is a very nice processor too.

For small things, 8051 is a winner; this is why a lot of chips have an internal 8051 to do basic tasks. The ARM M0 tries to take this space nowadays, but simplicity is something ARM doesn't have. The venerable Cypress USB bridge has an 8051 inside.

8051 is still alive, and you need to learn at least the basics of 8051 if you are in this business. It is a legacy thing that refuses to die.
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #25 on: June 15, 2020, 12:30:01 am »
Quote
Let's compare STM8 to MCS-51 then. SDCC supports both. [...] The graphs are from SDCC, where they are used to track code size and speed to quickly notice regressions.

Thanks for the links, although to be honest, they seem to be more about how good SDCC is for a particular architecture over time than about the architecture itself. It looks like more work is being put into STM8, so it is on an upward trajectory (smaller code size and faster execution), while MCS51 may have had some regressions introduced, which speaks to a downward trajectory (larger code size and slower execution).

Also, turning on --stack-auto for re-entrancy as well as using the medium or large memory models seems to go against the SDCC defaults and recommendations. For the projects I have worked on, there wasn't a need for either of those. Potentially those options are dramatically contributing to the increased code size and lower performance. My observation from porting code originally written for AVR is that it usually compiles to a smaller size when ported to 8051. But it could also be that I am optimizing it in the process and it would be smaller regardless of the destination architecture.

I fully agree that SDCC is not particularly good at generating optimal code for the MCS51 architecture. What I'm not sure about is whether that is inherent to the 8051 architecture, or just because SDCC tries to work across so many architectures that it is hard to optimize for any one. Or is it just that some architectures have had more interest, and therefore more optimization work? Also, I'm not trying to pick on SDCC; I am very thankful it exists, and I can appreciate how difficult it must be to create and maintain a complex multi-architecture compiler.

I will say, however, that after looking at the generated code for MCS51, one has to wonder whether the authors actually read through and understood the full 8051 instruction set.

Something simple like:
Code: [Select]
uint8_t i = 8;
do { /* ... */ } while (--i);

should compile down to something like this (4 bytes, 6 cycles) (note: all cycle counts here and below are for the N76E003; other MCUs are even better cycle-wise):
Code: [Select]
mov r7,#0x08 ; 2 bytes, 2 cycles
00101$:
; ...
djnz r7,00101$ ; 2 bytes, 4 cycles

Why then does SDCC generate this (7 bytes, 8 cycles)? Does SDCC not know about the DJNZ (decrement and jump if not zero) instruction?:
Code: [Select]
mov r7,#0x08 ; 2 bytes, 2 cycles
00101$:
; ...
mov a,r7 ; 1 byte, 1 cycle
dec a ; 1 byte, 1 cycle
mov r7,a ; 1 byte, 1 cycle
jnz 00101$ ; 2 bytes, 3 cycles

Or, consider this:
Code: [Select]
char __code lookup[] = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
char low_nibble_to_hex(uint8_t nibble) {
    return lookup[nibble & 0xF];
}

for which SDCC generates (22 bytes, 31 cycles):
Code: [Select]
mov r7,dpl ; 2 bytes, 2 cycles
anl ar7,#0x0f ; 3 bytes, 4 cycles
mov r6,#0x00 ; 2 bytes, 2 cycles
mov a,r7 ; 1 byte, 2 cycles
add a,#_lookup ; 2 bytes, 2 cycles
mov dpl,a ; 2 bytes, 2 cycles
mov a,r6 ; 1 byte, 1 cycle
addc a,#(_lookup >> 8) ; 2 bytes, 2 cycles
mov dph,a ; 2 bytes, 2 cycles
clr a ; 1 byte, 1 cycle
movc a,@a+dptr ; 1 byte, 4 cycles
mov dpl,a ; 2 bytes, 2 cycles
ret ; 1 byte, 5 cycles

which could be as simple as (11 bytes, 19 cycles).  Does SDCC not know about the MOV DPTR, #address instruction?:
Code: [Select]
mov a,dpl ; 2 bytes, 3 cycles
anl a,#0x0f ; 2 bytes, 2 cycles
mov dptr,#_lookup ; 3 bytes, 3 cycles
movc a,@a+dptr ; 1 byte, 4 cycles
mov dpl,a ; 2 bytes, 2 cycles
ret ; 1 byte, 5 cycles

And, how about this:
Code: [Select]
void print(char __xdata *string) {
    char c = *string;
    while (c != 0) {
        // ...
        string++;
        c = *string;
    }
}

for which SDCC generates (23 bytes, 40 cycles, 28 cycles in the loop):
Code: [Select]
mov r6,dpl ; 2 bytes, 2 cycles
mov  r7,dph ; 2 bytes, 2 cycles
movx a,@dptr ; 1 byte, 4 cycles
mov r5,a ; 1 byte, 1 cycle
00101$:
mov a,r5 ; 1 byte, 1 cycle
jz 00104$ ; 2 bytes, 3 cycles
; ...
inc r6 ; 1 byte, 3 cycles
cjne r6,#0x00,00116$ ; 3 bytes, 4 cycles
inc r7 ; 1 byte, 3 cycles
00116$:
mov dpl,r6 ; 2 bytes, 2 cycles
mov dph,r7 ; 2 bytes, 2 cycles
movx a,@dptr ; 1 byte, 4 cycles
mov r5,a ; 1 byte, 1 cycle
sjmp 00101$ ; 2 bytes, 3 cycles
00104$:
ret ; 1 byte, 5 cycles

which could be as simple as this (8 bytes, 20 cycles, 11 cycles in the loop). Does SDCC not know about the INC DPTR instruction?:
Code: [Select]
movx a,@dptr ; 1 byte, 4 cycles
00101$:
jz 00104$ ; 2 bytes, 3 cycles
; ...
inc dptr ; 1 byte, 1 cycle
movx a,@dptr ; 1 byte, 4 cycles
sjmp 00101$ ; 2 bytes, 3 cycles
00104$:
ret ; 1 byte, 5 cycles

Or, how about this:
Code: [Select]
uint8_t div8(uint8_t a) {
    return a / 8;
}

for which SDCC generates this (17+ bytes, 29+ cycles):
Code: [Select]
mov r7,dpl ; 2 bytes, 4 cycles
mov r6,#0x00 ; 2 bytes, 2 cycles
mov __divsint_PARM_2,#0x08 ; 3 bytes, 3 cycles
mov (__divsint_PARM_2 + 1),r6 ; 2 bytes, 3 cycles
mov dpl,r7 ; 2 bytes, 4 cycles
mov dph,r6 ; 2 bytes, 4 cycles
ljmp __divsint ; 3 bytes, 4 cycles (plus unknown bytes/cycles inside __divsint and a final 1 byte, 5 cycles for ret)

which could be as simple as this (10 bytes, 16 cycles).  Why does SDCC need to farm this out to a helper function?:
Code: [Select]
mov a, dpl ; 2 bytes, 3 cycles
mov b, #0x08 ; 3 bytes, 3 cycles
div ab ; 2 bytes, 3 cycles
mov dpl, a ; 2 bytes, 2 cycles
ret ; 1 byte, 5 cycles

or even as simple as this (9 bytes, 14 cycles).  Does SDCC not know that / 8 is the same as >> 3?:
Code: [Select]
mov a,dpl ; 2 bytes, 3 cycles
swap a ; 1 byte, 1 cycle
rl a ; 1 byte, 1 cycle
anl a,#0x1f ; 2 bytes, 2 cycles
mov dpl,a ; 2 bytes, 2 cycles
ret ; 1 byte, 5 cycles

Sorry if some of these seem a bit contrived, but they are all subsets of things I have had to manually optimize around in a recent project. And they go to show how much impact a compiler implementation can have on program size and execution speed. I don't have experience using them, but from what I have read, the IAR and Keil compilers generate better/more optimized code than SDCC. Again, not trying to bash SDCC; just pointing out that what a particular compiler generates isn't the end-all be-all of a given processor architecture.

It would be an interesting exercise to take some real world code and hand optimize it to certain architectures to compare more real-world results.
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #26 on: June 15, 2020, 01:07:35 am »
Quote
Padauk having fewer memory spaces makes the architecture somewhat cleaner. [...] I have to admit that to some degree, this cleanliness of architecture comes at the cost of power: the Padauks are limited to far less memory than MCS-51.


Yes, exactly. Most of the Padauks are limited to less SRAM than the MCS-51's directly accessible lower 128 bytes anyway. And usually the upper 128 bytes of SRAM in the MCS-51 are mostly used by the stack, so is there really much difference, other than that the Padauks don't even have the option of using the slightly slower and harder-to-reach extended RAM?

I have found that when I am writing code for the MCS-51, I usually know which memory type I am targeting, so it hasn't been much of an issue really. I usually leave the 128/256 bytes of SRAM for registers, scratch pad, function arguments, heavily used small global variables, and the stack. Everything else goes in XRAM. And it should be clear when you need to read from program code instead of SRAM/XRAM. So I just use the __code or __xdata qualifiers whenever I am dereferencing a pointer, unless I really don't know where I am pointing. Sometimes that means two different versions of a function, which is sometimes less code (and always faster) than using generic pointers and figuring it out at run time.
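
As a minimal sketch of that "two versions of a function" idea, in the same SDCC syntax as the examples above (function names and the loop body stub are illustrative):
Code: [Select]
// One copy per memory space: each compiles to direct MOVC/MOVX access.
void print_code(char __code *s)   { while (*s) { /* emit *s */ s++; } }
void print_xdata(char __xdata *s) { while (*s) { /* emit *s */ s++; } }

// A single print(char *s) taking a generic pointer would instead have to
// dispatch on the memory space at run time through SDCC support calls.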

AVR has the same issue in that to read program code you have to use different instructions; hence you will see PROGMEM and associated helpers scattered through AVR/Arduino code. I actually like the SDCC __code attribute way of doing it much better.

And it looks like the Padauks also require different instructions (LDTABH/LDTABL), which are only available on the 15-bit (or higher?) MCUs.

Quote
Though it is somewhat unfortunate that there are so many different ways of handling dual dptr. There is a document suggesting splitting the existing mcs51 backend in SDCC into 5 different ones to cover the most common variants of dual-dptr handling (https://sourceforge.net/p/sdcc/wiki/8051%20Variants/). But even then there are many more variants not yet covered. Manufacturers don't even stick to one single way across their own product lines.

Yes, the lack of a standard dual-dptr implementation is unfortunate. Splitting the MCS51 backend into 5 different ones sounds less than ideal. Just spit-balling: what about passing a flag that defines a specific dual-dptr variant, with the default (if no flag is passed) being not to use a second dptr?

Quote
The small SRAM size is clearly a limitation. The architecture tops out at 512 B, but all devices I know about have at most 256 B. Code memory is far less limited: the architecture supports up to 8 KW of 16-bit words, i.e. 16 KB, though all devices I know about have at most 4 KW. I am not sure yet about the peripheral situation: the Padauk FPPA approach (i.e. hardware multithreading) allows a lot of stuff to be done in software that would otherwise need a peripheral.

The largest code memory of any Padauk IC I have been interested in so far is 3 KW of 15-bit words with 256 bytes of SRAM (PFS173). For code this isn't too big a limitation really, but I'm curious how efficiently data is stored. If I want to store an array of bytes in program memory, does it have to use a full 15-bit word for each byte?

As for peripherals, yes, the Padauk FPPA solution sounds interesting, but it will rarely if ever be better than dedicated hardware (for existing standards at least). I can spin up an I2C slave that operates at 400 kHz (it actually seems to work fine up to at least 800 kHz) with just a few lines of code on a N76E003 ($0.20) IC. Most of the time my program gets the full 16 MHz and is only interrupted occasionally when I2C data is ready to process. Try doing that with a Padauk IC. The FPPA approach will take the 8 MHz clock and divide it into effectively two 4 MHz processors, so the main program runs at a lower speed, and the peripheral has to be bit-banged with at most a 4 MHz clock as well. This could be really interesting for custom protocols, but for established protocols I fail to see how it would ever be better (aside from the potentially lower cost of the Padauk). Now, if we are talking about actual multitasking (not just duplicating a communications protocol), that could get interesting, although with the program size and memory limits being what they are, I'm not sure what would really even fit.
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #27 on: June 15, 2020, 03:35:13 am »
Quote
I don't have experience using them, but from what I have read, the IAR and Keil compilers generate better/more optimized code than SDCC. Again, not trying to bash SDCC; just pointing out that what a particular compiler generates isn't the end-all be-all of a given processor architecture.

I just downloaded the evaluation version of Keil C51 and fed the 4 examples above into it (slightly modified for Keil syntax). While not perfect, Keil generates something much closer to what I posted as the ideal version in each case (with some differences in how registers are used compared to SDCC).

Test code for Keil:
Code: [Select]
unsigned char code lookup[] = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};

unsigned char high_nibble_to_hex(unsigned char byte) {
    return lookup[byte >> 4];
}

unsigned char low_nibble_to_hex(unsigned char byte) {
    return lookup[byte & 0x0F];
}

void print(unsigned char xdata *string) {
    unsigned char c = *string;
    while (c != 0) {
        // ...
        string++;
        c = *string;
    }
}

unsigned char div8(unsigned char a) {
    return a / 8;
}

char xdata str_buf[10];

void main() {
    unsigned char i = div8(64);
    do {
        str_buf[0] = high_nibble_to_hex(i);
        str_buf[1] = low_nibble_to_hex(i);
        str_buf[2] = 0x00;
        print(str_buf);
    } while (--i);
}

Keil compiles it to:
Code: [Select]
C:0x0003: void print(unsigned char xdata *string) {
; unsigned char c = *string;
mov dpl, r7
mov dph, r6
movx a,@dptr
mov r7,a
; while (c != 0) {
0009:
mov a,r7
jz 0011
; string++;
inc dptr
; c = *string;
movx a,@dptr
mov r7,a
; }
sjmp 0009
0011:
; }
ret

C:0x0012: unsigned char high_nibble_to_hex(unsigned char byte) {
; return lookup[byte >> 4];
mov a,r7
swap a
anl a,#0x0f
mov dptr,#lookup
movc a,@a+dptr
mov r7,a
ret

C:0x001c: unsigned char low_nibble_to_hex(unsigned char byte) {
; return lookup[byte & 0x0F];
mov a,r7
anl a,#0x0F
mov dptr,#lookup
movc a,@a+dptr
mov r7,a
ret

C:0x0025: unsigned char div8(unsigned char a) {
; return a / 8;
mov a,r7
rrc a
rrc a
rrc a
anl a,#0x1f
mov r7,a
ret

C:0x0800: void main() {
; unsigned char i = div8(64);
mov r7,#0x40
lcall div8(C:0025)
mov r5,ar7
; do {
C:0807
; str_buf[0] = high_nibble_to_hex(i);
mov r7,ar5
lcall high_nibble_to_hex(C:0012)
mov dptr,#0x0000 (&str_buf[0])
mov a,r7
movx @dptr,a
; str_buf[1] = low_nibble_to_hex(i);
mov r7,ar5
lcall low_nibble_to_hex(C:001C)
mov dptr,#0x0001 (&str_buf[1])
mov a,r7
movx @dptr,a
; str_buf[2] = 0x00;
clr a
inc dptr
movx @dptr,a
; print(str_buf);
mov r6,#0x00 (&str_buf[0])
mov r7,#0x00 (&str_buf[0])
lcall print(C:0003)
; } while (--i);
djnz r5,c:0807
; }
ret

Here is the exact same code compiled by SDCC:
Code: [Select]
_high_nibble_to_hex: unsigned char high_nibble_to_hex(unsigned char byte) {
; return lookup[byte >> 4];
mov a,dpl
swap a
anl a,#0x0f
mov dptr,#_lookup
movc a,@a+dptr
; }
mov dpl,a
ret

_low_nibble_to_hex: unsigned char low_nibble_to_hex(unsigned char byte) {
mov r7,dpl
; return lookup[byte & 0x0F];
anl ar7,#0x0f
mov r6,#0x00
mov a,r7
add a,#_lookup
mov dpl,a
mov a,r6
addc a,#(_lookup >> 8)
mov dph,a
clr a
movc a,@a+dptr
; }
mov dpl,a
ret

_print: void print(unsigned char __xdata *string) {
; unsigned char c = *string;
mov r6,dpl
mov  r7,dph
movx a,@dptr
mov r5,a
; while (c != 0) {
00101$:
mov a,r5
jz 00104$
; string++;
inc r6
cjne r6,#0x00,00116$
inc r7
00116$:
; c = *string;
mov dpl,r6
mov dph,r7
movx a,@dptr
mov r5,a
sjmp 00101$
00104$:
; }
ret

_div8: unsigned char div8(unsigned char a) {
mov r7,dpl
; return a / 8;
mov r6,#0x00
mov __divsint_PARM_2,#0x08
mov (__divsint_PARM_2 + 1),r6
mov dpl,r7
mov dph,r6
ljmp __divsint

_main: void main() {
; unsigned char i = div8(64);
mov dpl,#0x40
lcall _div8
mov r7,dpl
; do {
00101$:
; str_buf[0] = high_nibble_to_hex(i);
mov dpl,r7
push ar7
lcall _high_nibble_to_hex
mov r6,dpl
pop ar7
mov dptr,#_str_buf
mov a,r6
movx @dptr,a
; str_buf[1] = low_nibble_to_hex(i);
mov dpl,r7
push ar7
lcall _low_nibble_to_hex
mov r6,dpl
mov dptr,#(_str_buf + 0x0001)
mov a,r6
movx @dptr,a
; str_buf[2] = 0x00;
mov dptr,#(_str_buf + 0x0002)
clr a
movx @dptr,a
; print(str_buf);
mov dptr,#_str_buf
lcall _print
pop ar7
; while (--i);
djnz r7,00101$
; }
ret

Interestingly enough, in this case SDCC actually did some of the same optimizations, although not as many as Keil. It's really weird that it chose to optimize high_nibble_to_hex and low_nibble_to_hex so differently: in one case SDCC uses the 'mov dptr,#_lookup' shorthand, in the other it blows it out into several instructions (Keil does both optimally). The print function still isn't using 'inc dptr' under SDCC (but does under Keil). SDCC is still using a division helper (while Keil does the shorter/quicker >> 3 equivalent). Both SDCC and Keil use the 'mov dptr,#_str_buf' form and 'djnz' in main (not sure why SDCC wasn't using djnz in my earlier tests), although SDCC has some extra push/pop code that Keil doesn't need (SDCC uses dpl/dph for the first function argument where Keil uses r7/r6).

So a lot of it definitely seems to come down to how well optimized the compiler is for the given architecture.
« Last Edit: June 15, 2020, 05:58:59 am by serisman »
 

Offline greenpossum

  • Frequent Contributor
  • **
  • Posts: 408
  • Country: au
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #28 on: June 15, 2020, 04:19:10 am »
I assume you changed code to __code when moving from Keil to SDCC.
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #29 on: June 15, 2020, 04:25:53 am »
Quote
I assume you changed code to __code when moving from Keil to SDCC.
Yes, and xdata to __xdata. Otherwise they were the same.
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #30 on: June 15, 2020, 04:29:47 am »
And sorry for taking this thread so far off topic. Although I do find it interesting to compare these low-cost MCUs at the instruction-set level. They each have their pros and cons for sure, and learning when to use one vs. another is surely useful information to have.
 

Offline spth

  • Regular Contributor
  • *
  • Posts: 163
  • Country: de
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #31 on: June 15, 2020, 10:10:49 am »
Quote
And it looks like the Padauks also require different instructions (LDTABH/LDTABL), which are only available on the 15-bit (or higher?) MCUs.

[…]

The largest code memory of any Padauk IC I have been interested in so far is 3 KW of 15-bit words with 256 bytes of SRAM (PFS173). For code this isn't too big a limitation really, but I'm curious how efficiently data is stored. If I want to store an array of bytes in program memory, does it have to use a full 15-bit word for each byte?

Accessing code memory is indeed different from accessing data memory. However, having two types of memory to read from and one to write to makes code generation easier than on mcs51, where there are more.

To store objects in code memory, SDCC currently uses one word per byte. That is unlikely to change for pdk13 and pdk14, and probably not for pdk15 either. However, a future pdk16 backend is likely to use one word for two bytes (reading via ldtabl and ldtabh).
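
A small illustration of that storage cost, assuming SDCC's usual __code qualifier applies to the pdk ports as it does elsewhere:
Code: [Select]
// Assumed SDCC-style placement of constant data on a pdk target:
char __code msg[] = "Hi!";  // 4 bytes including the terminator; on
                            // pdk13/14/15 each byte occupies a full
                            // program word, so this costs 4 words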
 

Offline spth

  • Regular Contributor
  • *
  • Posts: 163
  • Country: de
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #32 on: June 15, 2020, 07:32:28 pm »
Quote
Thanks for the links, although to be honest, they seem to be more about how good SDCC is for a particular architecture over time than about the architecture itself. It looks like more work is being put into STM8, so it is on an upward trajectory (smaller code size and faster execution), while MCS51 may have had some regressions introduced, which speaks to a downward trajectory (larger code size and slower execution).
Yes. In particular, there were some changes in the front-end that the stm8 port apparently adapted to better. And of course there is still potential for more machine-independent optimizations in SDCC.
Quote
Also, turning on --stack-auto for re-entrancy as well as using the medium or large memory models seems to go against the SDCC defaults and recommendations. For the projects I have worked on, there wasn't a need for either of those.
For Dhrystone, we need the large memory model, as it uses about 5 KB of RAM.
 

Offline ebclr

  • Super Contributor
  • ***
  • Posts: 2328
  • Country: 00
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #33 on: June 16, 2020, 08:19:30 am »
You did a good job showing that the one to blame is the SDCC compiler, not the 8051. I always use Keil. People who use SDCC are the same ones that use Linux: they don't want to pay for software, and they prefer to put up with a lot of restrictions and a much less friendly environment in a free thing, just to be "a freedom guy".
 

Offline spth

  • Regular Contributor
  • *
  • Posts: 163
  • Country: de
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #34 on: June 16, 2020, 08:31:37 am »
Quote
You did a good job showing that the one to blame is the SDCC compiler, not the 8051. I always use Keil. People who use SDCC are the same ones that use Linux: they don't want to pay for software, and they prefer to put up with a lot of restrictions and a much less friendly environment in a free thing, just to be "a freedom guy".

However, there are further aspects:
* You apparently confuse free-as-in-beer with free-as-in-freedom.
* Major MCS-51 hardware vendors paid Keil a lot of money to provide Keil licenses at no cost to users of their hardware, so often there is no monetary advantage in using SDCC.
* Keil claims ANSI-C compliance. Not only is the ANSI C standard ancient (1989) and superseded by later versions, but Keil also does a poor job at ANSI-C compliance. SDCC, on the other hand, has reasonable support for the historic ANSI C89/ISO C90, ISO C95, and ISO C99 standards and the current ISO C11/C17 standard, and is already working on support for the future C2x standard.
« Last Edit: June 16, 2020, 09:28:25 am by spth »
 

Offline spth

  • Regular Contributor
  • *
  • Posts: 163
  • Country: de
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #35 on: June 16, 2020, 09:57:40 am »
Yes, the lack of a standard dual dptr implementation is unfortunate.  Splitting the MCS51 backend into 5 different ones sounds less than ideal.  Just spit-balling, but what about passing in a flag that defines a specific dual dptr variant, with the default (if no flag is passed) being to not use a second dptr?

This would not be a full split (like mcs51 vs ds390). Rather, it would be more like the z80-and-related backends (z80, gbz80, z180, ez80_z80, tlcs90, r2k, r3k), which still share most of the code.

So it would not result in much code-duplication in the compiler.

However, one would want different standard libraries, so standard library functions can take advantage of dual dptr.

But, IMO, at the moment, other improvements for the mcs51 backend are more urgent.

P.S.: AFAIK, Keil uses dual dptr for standard library functions only, not for code generation.
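
For readers wondering what dual dptr buys a library routine, a sketch of the inner loop of an XDATA-to-XDATA memcpy, assuming a Dallas/Maxim-style part where only bit 0 of the DPS register (SFR 0x86) is implemented, so 'inc DPS' toggles the active pointer (the mechanism and SFR address vary by vendor):

    DPS equ 0x86            ; data pointer select (vendor-specific)
    ; r7 = byte count, DPTR0 = source, DPTR1 = destination
    copy_loop:
        movx a, @dptr       ; read source byte via DPTR0
        inc  dptr
        inc  DPS            ; switch to DPTR1
        movx @dptr, a       ; write destination byte
        inc  dptr
        inc  DPS            ; back to DPTR0
        djnz r7, copy_loop

Without the second pointer, the loop would have to save and reload dpl/dph on every iteration.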
 

Offline spth

  • Regular Contributor
  • *
  • Posts: 163
  • Country: de
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #36 on: June 16, 2020, 10:26:37 am »
Interestingly enough, in this case SDCC did actually do some of the same optimizations, although not as many as Keil.  It's really weird that it chose to optimize high_nibble_to_hex and low_nibble_to_hex so differently: in one case SDCC uses the 'mov dptr,#_lookup' shorthand, in the other it blows it out into several instructions (Keil does both optimally).  The print function still isn't using 'inc dptr' under SDCC (but does under Keil).  SDCC is still calling a division helper (where Keil emits the shorter/quicker >> 3 sequence).  Both SDCC and Keil use the 'mov dptr,#str_buf' form and 'djnz' in main (not sure why SDCC wasn't using it in my earlier tests), although SDCC has some extra push/pop code that Keil doesn't need (SDCC passes the first function argument in dpl/dph where Keil seems to use r7/r6).

So, a lot of it definitely seems to come down to how optimized the compiler is for the given architecture.

I think you are seeing two main problems here:

1) SDCC sometimes makes bad choices as to what to put into which register. It doesn't sufficiently take into account that some variables are often used as pointers, so they would be better off in dptr, etc. This is due to using a register allocator that at its core is still a simple linear scan allocator. The problem is known to SDCC developers, who came up with a better register allocator. Most ports have been converted to use the new allocator (AFAIK mcs51 and ds390 are the only remaining ports that still use the old one).

2) In the division, SDCC somehow doesn't notice that the left operand is nonnegative. SDCC used to have an optimization for cases like this in the front-end, but it was buggy (it used to affect only sizeof in rare cases, but with ISO C11 _Generic, it became a bigger issue). So that optimization was removed and replaced by other optimizations in later stages of the compiler. Apparently in your case this doesn't work, and the division (of an int, due to integer promotion) is done signed (the / 8 to >> 3 optimization is only valid for unsigned operands). The best solution would be for SDCC to implement generalized constant propagation, but something simpler should work in your case.
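
A tiny example of why the / 8 to >> 3 substitution is only valid for unsigned operands:

    int s = -1;
    /* C division truncates toward zero:          s / 8  ==  0
       An arithmetic right shift rounds downward: s >> 3 == -1 on the usual
       targets (and right-shifting a negative value is implementation-defined
       anyway), so the compiler may only substitute the shift when the left
       operand is known to be nonnegative, e.g. for unsigned types. */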
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #37 on: June 16, 2020, 05:05:43 pm »
And it looks like the Padauks also require different instructions (LDTABH/LDTABL) that are only available on the 15-bit (or higher?) MCUs.

[…]

The largest code memory of any Padauk IC I have been interested in so far is 3KW of 15-bit words with 256 bytes of SRAM (PFS173).  For code, this isn't too big of a limitation really, but I'm curious how efficiently data is stored.  If I want to store an array of bytes in program code, does it have to use a full 15-bit word for each byte?

Accessing code memory is indeed different from accessing data memory. However, having two types of memory to read from and one to write to makes code generation easier than on mcs51, where there are more.

To store objects in code memory, SDCC currently uses one word per byte. That is unlikely to change for pdk13 and pdk14, and probably not for pdk15 either.  However, a future pdk16 backend is likely to use one word for two bytes (reading via ldtabl and ldtabh).

I was curious how it was even possible to reference program code with PDK13/14, which don't have the LDTABH/LDTABL instructions.  I found the SDCC code for __gptrget, which is certainly an interesting solution to the problem (manipulating the SP and jumping to code that places a value in the accumulator and returns).  But this is nowhere near as clean or efficient as what can be accomplished on any MCS51 MCU, where a simple mov dptr, #base_addr; mov a, #offset; movc a, @a+dptr; can be used.  Is there a reason that PDK15 in SDCC isn't using the simpler LDTABL instruction?
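
(Spelled out, the three-instruction mcs51 sequence mentioned above, with a hypothetical table symbol:)

    mov  dptr, #_table   ; base address of the table in code memory
    mov  a, #offset      ; index into the table
    movc a, @a+dptr      ; a = table[offset]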
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #38 on: June 16, 2020, 05:12:46 pm »
Also, turning on --stack-auto for re-entrancy as well as using the medium or large memory models seems to go against the SDCC defaults and recommendations.  For the projects I have worked on, there wasn't a need to go with either of those. 
For Dhrystone, we need the large memory model, as it uses about 5KB of RAM-

Ahh.  Yeah, my projects either haven't needed that much RAM to begin with, or are mostly using RAM as a buffer, which means I have been able to get away with the small memory model so far.  I could certainly see that more complex projects with heavy data processing or deeply nested function requirements would benefit from a different architecture.  I would probably reach for a 32-bit ARM at that point (which can be obtained for as low as $0.40 or so).
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #39 on: June 16, 2020, 05:26:29 pm »
You did a good job showing that the one to blame is the SDCC compiler, not the 8051. I always use Keil. People who use SDCC are the same ones that use Linux: they don't want to pay for software, so they put up with a lot of restrictions and a much less friendly environment on a free thing, just to be "a freedom guy".

However, there are further aspects:
* You apparently confuse free-as-in-beer with free-as-in-freedom.
* Major MCS-51 hardware vendors paid Keil a lot of money to provide Keil licenses at no cost to users of their hardware, so often there is no monetary advantage in using SDCC.
* Keil claims ANSI-C compliance. Not only is the ANSI C standard ancient (1989) and superseded by later versions, but Keil also does a poor job at ANSI-C compliance. SDCC, on the other hand, has reasonable support for the historic ANSI C89/ISO C90, ISO C95, and ISO C99 standards and the current ISO C11/C17 standard, and is already working on support for the future C2x standard.

Yes, I agree.

I was not trying to blame SDCC.  I really was just trying to point out that a compiler's implementation can have a dramatic impact on performance and perception of an architecture, but it isn't the final word.

As I already stated, I am really thankful that we have SDCC as an option to begin with.  It isn't perfect (but neither is Keil).  And as long as one is aware of the limitations, they can usually be worked around.  Yes, ideally SDCC would be better, but that requires someone knowledgeable enough and caring enough to spend their time enhancing it.

I have paid for compilers in the past and would do so again if/when it made sense.  I remember buying Borland Turbo C/C++ back in the day, when it came on several 3.5" floppies.  Nowadays we thankfully have better open source options (GCC).  I don't like that Keil doesn't even show a price on their website.  You have to request a quote, which immediately turns me off as a consumer.  And their artificial limit of 2KB of program code for the evaluation version is way too restrictive to entice me to even give it a fair shake.  Maybe if that limit was 8KB or 16KB or so I would be more interested in giving it a real evaluation (i.e. learn it and use it for a real project).  I only downloaded it the other day to see if it was any better at generating more optimal code.
 

Offline spth

  • Regular Contributor
  • *
  • Posts: 163
  • Country: de
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #40 on: June 16, 2020, 05:27:37 pm »
Is there a reason that PDK15 in SDCC isn't using the simpler LDTABL instruction?

In the long term, the pdk15 backend will get ldtabl support.
However, for now it just wasn't worth the effort required:

* We already have a working solution with the stack hack (needed for pdk13 and pdk14 anyway)
* ldtabl needs its operand to be 16-bit aligned
* We don't want to change the alignment of all pointers to 16 bit in SDCC, as this would waste space in structs where a pointer follows members that are an odd number of bytes (see the sketch after this list).
* So some pointers will not be aligned.
* This means we need a temporary 16-bit-aligned location we can use for pointers - this location would need to be saved on interrupts, increasing interrupt latency
* For most efficient code, we'd want to track alignment, i.e. SDCC should know which pointers are actually 16-bit aligned, so it could use ldtabl for those.
* On the other hand, for better support of __sfr16, we are in a similar situation (16-bit-aligned operand required for some instructions).
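
As a sketch of the struct-padding point above (a hypothetical struct, assuming 2-byte pointers, consistent with the 16-bit alignment discussed in the list):

    struct entry {
        unsigned char tag;   /* 1 byte */
        /* a padding byte would be forced here if all pointers
           had to be 16-bit aligned */
        unsigned char *data; /* 2-byte pointer */
    };                       /* 4 bytes instead of 3 */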
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #41 on: June 16, 2020, 05:30:54 pm »
Yes, the lack of a standard dual dptr implementation is unfortunate.  Splitting the MCS51 backend into 5 different ones sounds less than ideal.  Just spit-balling, but what about passing in a flag that defines a specific dual dptr variant, with the default (if no flag is passed) being to not use a second dptr?

This would not be a full split (like mcs51 vs ds390). Rather, it would be more like the z80-and-related backends (z80, gbz80, z180, ez80_z80, tlcs90, r2k, r3k), which still share most of the code.

So it would not result in much code-duplication in the compiler.

However, one would want different standard libraries, so standard library functions can take advantage of dual dptr.

But, IMO, at the moment, other improvements for the mcs51 backend are more urgent.

P.S.: AFAIK, Keil uses dual dptr for standard library functions only, not for code generation.

Ok, interesting.

I could see where dual dptr would be easier to implement in standard library functions (i.e. memcpy could potentially benefit from it) even if it wasn't ready to be implemented for user code.

And, I agree that other improvements for MCS51 are more urgent than dual dptr.  I have rarely had the need for dual dptr to begin with, but I run across the need for other optimizations all the time.
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #42 on: June 16, 2020, 05:36:34 pm »
Interestingly enough, in this case SDCC did actually do some of the same optimizations, although not as many as Keil.  It's really weird that it chose to optimize high_nibble_to_hex and low_nibble_to_hex so differently: in one case SDCC uses the 'mov dptr,#_lookup' shorthand, in the other it blows it out into several instructions (Keil does both optimally).  The print function still isn't using 'inc dptr' under SDCC (but does under Keil).  SDCC is still calling a division helper (where Keil emits the shorter/quicker >> 3 sequence).  Both SDCC and Keil use the 'mov dptr,#str_buf' form and 'djnz' in main (not sure why SDCC wasn't using it in my earlier tests), although SDCC has some extra push/pop code that Keil doesn't need (SDCC passes the first function argument in dpl/dph where Keil seems to use r7/r6).

So, a lot of it definitely seems to come down to how optimized the compiler is for the given architecture.

I think you are seeing two main problems here:

1) SDCC sometimes makes bad choices as to what to put into which register. It doesn't sufficiently take into account that some variables are often used as pointers, so they would be better off in dptr, etc. This is due to using a register allocator that at its core is still a simple linear scan allocator. The problem is known to SDCC developers, who came up with a better register allocator. Most ports have been converted to use the new allocator (AFAIK mcs51 and ds390 are the only remaining ports that still use the old one).

Honestly, you are above my head here (I am not a compiler expert), but it sounds good to me.   Do you know if there is an intended timetable for converting MCS51 to use the new register allocator?

2) In the division, SDCC somehow doesn't notice that the left operand is nonnegative. SDCC used to have an optimization for cases like this in the front-end, but it was buggy (it used to affect only sizeof in rare cases, but with ISO C11 _Generic, it became a bigger issue). So that optimization was removed and replaced by other optimizations in later stages of the compiler. Apparently in your case this doesn't work, and the division (of an int, due to integer promotion) is done signed (the / 8 to >> 3 optimization is only valid for unsigned operands). The best solution would be for SDCC to implement generalized constant propagation, but something simpler should work in your case.

Yeah, makes sense.  I was only using unsigned variables, but apparently SDCC doesn't realize that or have that optimization at the moment.  I can always write the code as >> 3 (or << 3 for multiplication), and SDCC will generate better code.  It is just less clear what my code is trying to accomplish when written that way.
 

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #43 on: June 16, 2020, 05:38:24 pm »
Is there a reason that PDK15 in SDCC isn't using the simpler LDTABL instruction?

In the long term, the pdk15 backend will get ldtabl support.
However, for now it just wasn't worth the effort required:

* We already have a working solution with the stack hack (needed for pdk13 and pdk14 anyway)
* ldtabl needs its operand to be 16-bit aligned
* We don't want to change the alignment of all pointers to 16 bit in SDCC, as this would waste space in structs where a pointer follows members that are an odd number of bytes.
* So some pointers will not be aligned.
* This means we need a temporary 16-bit-aligned location we can use for pointers - this location would need to be saved on interrupts, increasing interrupt latency
* For most efficient code, we'd want to track alignment, i.e. SDCC should know which pointers are actually 16-bit aligned, so it could use ldtabl for those.
* On the other hand, for better support of __sfr16, we are in a similar situation (16-bit-aligned operand required for some instructions).

Cool, thanks for the explanation.
 

Offline spth

  • Regular Contributor
  • *
  • Posts: 163
  • Country: de
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #44 on: June 16, 2020, 05:46:54 pm »
Do you know if there is an intended timetable for converting MCS51 to use the new register allocator?
There is no timetable. The closest thing SDCC has to a timetable is treating a bug as release-critical, but there is no equivalent for features and lesser bugs.
Quote
2) In the division, SDCC somehow doesn't notice that the left operand is nonnegative. SDCC used to have an optimization for cases like this in the front-end, but it was buggy (it used to affect only sizeof in rare cases, but with ISO C11 _Generic, it became a bigger issue). So that optimization was removed and replaced by other optimizations in later stages of the compiler. Apparently in your case this doesn't work, and the division (of an int, due to integer promotion) is done signed (the / 8 to >> 3 optimization is only valid for unsigned operands). The best solution would be for SDCC to implement generalized constant propagation, but something simpler should work in your case.

Yeah, makes sense.  I was only using unsigned variables, but apparently SDCC doesn't realize that or have that optimization at the moment.  I can always write the code as >> 3 (or << 3 for multiplication), and SDCC will generate better code.  It is just less clear what my code is trying to accomplish when written that way.

With a quick test I just found that, for this case, dividing by an unsigned constant works, i.e. / 8u instead of / 8. Integer promotion then promotes the left operand to unsigned instead of int, so SDCC still recognizes it as division of an unsigned number.
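
In code (a minimal sketch of the difference):

    unsigned char x = 200;
    unsigned char a = x / 8;   /* x promoted to (signed) int: signed division helper */
    unsigned char b = x / 8u;  /* converted to unsigned int instead: SDCC can emit >> 3 */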
 
The following users thanked this post: serisman

Offline serisman

  • Regular Contributor
  • *
  • Posts: 100
  • Country: us
Re: EEVblog #1306 (1 of 5): 3 Cent Padauk Micro - Open Source Programmer
« Reply #45 on: June 16, 2020, 06:00:58 pm »
2) In the division, SDCC somehow doesn't notice that the left operand is nonnegative. SDCC used to have an optimization for cases like this in the front-end, but it was buggy (it used to affect only sizeof in rare cases, but with ISO C11 _Generic, it became a bigger issue). So that optimization was removed and replaced by other optimizations in later stages of the compiler. Apparently in your case this doesn't work, and the division (of an int, due to integer promotion) is done signed (the / 8 to >> 3 optimization is only valid for unsigned operands). The best solution would be for SDCC to implement generalized constant propagation, but something simpler should work in your case.

Yeah, makes sense.  I was only using unsigned variables, but apparently SDCC doesn't realize that or have that optimization at the moment.  I can always write the code as >> 3 (or << 3 for multiplication), and SDCC will generate better code.  It is just less clear what my code is trying to accomplish when written that way.

With a quick test I just found that, for this case, dividing by an unsigned constant works, i.e. / 8u instead of / 8. Integer promotion then promotes the left operand to unsigned instead of int, so SDCC still recognizes it as division of an unsigned number.

Thanks!  I'll have to try that out later.  I haven't been using that syntax for constants, but it makes sense why it might be required.
« Last Edit: June 16, 2020, 06:04:17 pm by serisman »
 

