Author Topic: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?  (Read 2399 times)


Offline pastaclub

  • Contributor
  • Posts: 5
  • Country: th
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #25 on: January 01, 2021, 12:16:28 pm »
A statement that has been repeated here many times is how such processors waste flash, because the limitations of the architecture lead to bloated programs. They do lead to bloated programs - but some people seem to be unaware that the "terrible 3 cent microcontroller" does NOT HAVE any flash. Those devices are one-time-programmable. When programming them with high voltage, internal fuses are burnt and you cannot erase them or re-program them.

I am not sure how such a PROM compares to flash memory in terms of cost, but I suppose it's much cheaper, and this is why they chose to minimize the area of the processor at the (cheaper) expense of needing more ROM. Padauk does have a few devices that can be programmed multiple times. These cost twice as much. Most of their devices are OTP.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 5841
  • Country: fr
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #26 on: January 01, 2021, 04:42:36 pm »
Quote
A statement that has been repeated here many times is how such processors waste flash, because the limitations of the architecture lead to bloated programs. They do lead to bloated programs - but some people seem to be unaware that the "terrible 3 cent microcontroller" does NOT HAVE any flash. Those devices are one-time-programmable. When programming them with high voltage, internal fuses are burnt and you cannot erase them or re-program them.

I am not sure how such a PROM compares to flash memory in terms of cost, but I suppose it's much cheaper, and this is why they chose to minimize the area of the processor at the (cheaper) expense of needing more ROM. Padauk does have a few devices that can be programmed multiple times. These cost twice as much. Most of their devices are OTP.

Thanks for pointing this out.
My point about separating a Flash die was more general - as I said, I didn't know much about Padauk MCUs. But generally speaking, a Flash chip on a separate die can make sense financially, as CMOS processes that allow embedding Flash cells are significantly more expensive than CMOS processes without that ability. So depending on the overall area, the number of pads, etc., it may be cheaper to separate the dies.

OTP ROM, OTOH, can be implemented on standard CMOS processes, so that's indeed cheaper in any case.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1934
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #27 on: January 01, 2021, 11:31:09 pm »
Quote
A statement that has been repeated here many times is how such processors waste flash, because the limitations of the architecture lead to bloated programs. They do lead to bloated programs - but some people seem to be unaware that the "terrible 3 cent microcontroller" does NOT HAVE any flash. Those devices are one-time-programmable. When programming them with high voltage, internal fuses are burnt and you cannot erase them or re-program them.

That's irrelevant to the principle of the discussion. Storage for the program costs money, regardless of the particular form it takes -- OTP, mask ROM, EPROM, flash, SRAM. The ratio of the cost of a bit of program space to a gate in the CPU core varies, but probably not by all that much -- and flash is probably the cheapest.

Regardless of the relative costs, above some program size the cost of an inefficient program coding outweighs the cost of making a better CPU core. That point might be at 1000 instructions, at 10,000 instructions, or at 100,000 instructions.

I would be surprised if it lies outside that range.

Especially when we're talking about the difference between, say, {8080, 6502, 8051, PIC} vs, say, low-complexity implementations of {PDP-11, MSP430, CM0, RV32E}

I've been trying to understand where the point is that it's worth adding the "C" extension for 16 bit opcodes to a RISC-V core. The 16 bit to 32 bit decoder takes something like 400 6-LUTs and saves 25%-30% from program size.  As a crude estimate, if you used those LUTs as program memory instead then that's about 3 KB of RAM-equivalent. To save 3 KB of program size you'd need the RV32I program to be 10 KB to 12 KB, or 2500 to 3000 instructions.

 
The following users thanked this post: I wanted a rude username

Offline westfw

  • Super Contributor
  • ***
  • Posts: 3344
  • Country: us
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #28 on: January 02, 2021, 01:37:53 am »
Quote
some people seem to be unaware that the "terrible 3 cent microcontroller" does NOT HAVE any flash. Those devices are one-time-programmable. When programming them with high voltage, internal fuses are burnt
I would be surprised if the Padauk program memory is not "flash", by most technical definitions.  They've just left out the (significant) additional circuitry needed to erase it.
Actual "fuse"-based PROM memory hasn't been used for ages, and I don't think it ever came close to achieving the density of even the low-end Padauk parts (~6 kbits.)
https://youtu.be/5jw5D0F008c
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 3344
  • Country: us
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #29 on: January 02, 2021, 02:10:31 am »
Quote
for any fixed number of conveniently addressed short term variable locations, code for any given construct is longer for an accumulator machine than for a 2-address or 3-address machine because the same number (or more) of addresses are needed in the code but more opcodes are needed.
Interesting.  I'd been thinking about 'bloat' solely as a software issue, where I meant something like "the code produced is larger than it needs to be to accomplish the results, because the architecture doesn't match the abstraction that the language, compiler, or programmer had in mind."   Like "we must allow recursion, so we need a stack, so I will have to simulate one."  Or, an interesting blip - the RCA1802 CPU had a non-traditional function-call mechanism, which meant that almost everyone implemented a "bloated" mechanism to bend it into more traditional forms.
Measuring bloat in "total bits required per task" is a different thing entirely ("architecture bloat?"), and I need to think about that...


Quote
I think the major problem is such CPUs were optimized for fairly simple tasks such as block copy/compare or multi-precision arithmetic
Hmm.  Maybe for 8080 and 6502.  I'm pretty sure PICs and probably 8051 were optimized to twiddle pins...
Set an individual pin on chip to 1:
low-end PIC, Padauk: 12bits, no registers.
Cortex M0: (usual case) 96bits, two registers.


Hmm.  RISC-V has an extensible instruction set, right?  Is there an "IO controller extension"?  I guess it would permit "immediate" addresses and values within a small subset of the address space, sort of like the "virtual ports" on newer AVR chips.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1934
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #30 on: January 02, 2021, 03:14:49 am »
Quote
I think the major problem is such CPUs were optimized for fairly simple tasks such as block copy/compare or multi-precision arithmetic
Hmm.  Maybe for 8080 and 6502.  I'm pretty sure PICs and probably 8051 were optimized to twiddle pins...
Set an individual pin on chip to 1:
low-end PIC, Padauk: 12bits, no registers.
Cortex M0: (usual case) 96bits, two registers.

Assuming memory-mapped I/O (and you have the freedom to design the memory map and I/O hardware):

6502: 48 bits for a location in ZP, 64 bits for absolute, 56 bits if there's a pointer in ZP (including a LDY #0). If you provide SET and RESET registers on your GPIO adapter then 40 bits for absolute, etc. If you do bit-banding of SET and RESET locations then 24 bits.

RV32IC: 16 bits for up to 32 I/O pins if you do bit-banding of SET and RESET locations and are willing to keep the base address of the I/O space in a dedicated register. An extra 16 bits to load the base address of the IO region if it's of the form {-32..31}*4096.


Quote
Hmm.  RISC-V has an extensible instruction set, right?  Is there an "IO controller extension" ?  I guess it would permit "immediate" addresses and values within a small subset of the address space, sort-of like the "Virtual ports" on newer AVR chips.

I'm not aware of such an extension or working group. There's a "Code Size" working group that is concerned with matching and beating (mostly) Thumb2 in code size, at the expense of a slightly more complex core.

You could of course add custom SET/RESET BIT instructions if you wanted to. You could hard-wire the I/O address space base address into the instruction, and use the reserved space in RVC 100xxxxxxxxxxx00 or reuse the opcodes for floating point or 64/128 bit load/store xx1xxxxxxxxxxx10 if you have a 32 bit CPU with no FP.

If performance is not super-critical (and bit twiddling usually isn't if you have a 100+ MHz CPU) then you could trap and emulate those on an unmodified CPU.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1934
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #31 on: January 02, 2021, 03:27:02 am »
More efficient than trap-and-emulate of course is a function call. That limits the code size for pretty much anything to a load immediate and a JSR -- five bytes on 6502 (for up to 256 GPIOs, subroutine anywhere in memory) or 4 bytes on RV32IC (for up to 64 GPIOs, subroutine within +/- 2 KB).

It seems to me though that only very very small programs are going to have their code size materially affected by the size of code to twiddle an IO pin. It's not really a sensible thing to optimize for except on the absolutely tiniest devices.
 

Offline jklasdf

  • Regular Contributor
  • *
  • Posts: 66
  • Country: us
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #32 on: January 11, 2021, 05:46:01 am »
Quote
some people seem to be unaware that the "terrible 3 cent microcontroller" does NOT HAVE any flash. Those devices are one-time-programmable. When programming them with high voltage, internal fuses are burnt
I would be surprised if the Padauk program memory is not "flash", by most technical definitions.  They've just left out the (significant) additional circuitry needed to erase it.
Actual "fuse"-based PROM memory hasn't been used for ages, and I don't think it ever came close to achieving the density of even the low-end Padauk parts (~6 kbits.)
https://youtu.be/5jw5D0F008c

I'm curious whether you have any other source indicating that the Padauk OTP microcontrollers (and the huge number of other Asian OTP microcontrollers cropping up recently on LCSC) are actually just flash without the circuitry to erase it. I was always under the impression that flash (both the cells themselves and support circuitry like the charge pumps needed to generate the operating voltages) is poorly suited to modern CMOS processes - hence the large number of flash microcontrollers that are essentially two separate dies in the same package.

At least one of the commenters on the electronupdate YouTube video/blog seems to think these are not using flash:
https://www.youtube.com/watch?v=5jw5D0F008c&lc=UgzjpUvRw-gayq-YV554AaABAg.8zs_DySsHKw9-H4ln1iw53

There are antifuses that are implementable in standard CMOS processes that will do several megabits (may not be the same "fuse" based PROM technology you're referring to)
https://news.synopsys.com/2018-01-10-Synopsys-Expands-DesignWare-IP-Portfolio-with-Acquisition-of-Kilopass-Technology
https://www.synopsys.com/designware-ip/technical-bulletin/non-volatile-memory-dwtb-q418.html
 

Offline jklasdf

  • Regular Contributor
  • *
  • Posts: 66
  • Country: us
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #33 on: January 11, 2021, 06:01:56 am »
Quote
A statement that has been repeated here many times is how such processors waste flash, because the limitations of the architecture lead to bloated programs. They do lead to bloated programs - but some people seem to be unaware that the "terrible 3 cent microcontroller" does NOT HAVE any flash. Those devices are one-time-programmable. When programming them with high voltage, internal fuses are burnt and you cannot erase them or re-program them.

That's irrelevant to the principle of the discussion. Storage for the program costs money, regardless of the particular form it takes -- OTP, mask ROM, EPROM, flash, SRAM. The ratio of the cost of a bit of program space to a gate in the CPU core varies, but probably not by all that much -- and flash is probably the cheapest.

Regardless of the relative costs, above some program size the cost of an inefficient program coding outweighs the cost of making a better CPU core. That point might be at 1000 instructions, at 10,000 instructions, or at 100,000 instructions.

I would be surprised if it lies outside that range.

Especially when we're talking about the difference between, say, {8080, 6502, 8051, PIC} vs, say, low-complexity implementations of {PDP-11, MSP430, CM0, RV32E}

I've been trying to understand where the point is that it's worth adding the "C" extension for 16 bit opcodes to a RISC-V core. The 16 bit to 32 bit decoder takes something like 400 6-LUTs and saves 25%-30% from program size.  As a crude estimate, if you used those LUTs as program memory instead then that's about 3 KB of RAM-equivalent. To save 3 KB of program size you'd need the RV32I program to be 10 KB to 12 KB, or 2500 to 3000 instructions.

A lot of Padauk's initial work when it was founded was with multi-core processor arrays: https://jaycarlson.net/2019/09/06/whats-up-with-these-3-cent-microcontrollers/

The low complexity of their cores might be a holdover from that work, and not from trying to optimize the total silicon size of their current microcontrollers.
 

Offline bson

  • Supporter
  • ****
  • Posts: 1820
  • Country: us
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #34 on: January 12, 2021, 11:02:06 pm »
Quote
There are antifuses that are implementable in standard CMOS processes that will do several megabits (may not be the same "fuse" based PROM technology you're referring to)
Megabits?  If so it's probably quite slow.  I've been under the impression that antifuses are problematic in that they don't permit selectively driving rows, or more specifically, when you drive a row you end up driving every row with a diode connection to it.  Wide rows then result in more cross diode connections.  As a result of all these interconnected rows you get really high capacitances, which causes access times to grow with the array size.  But it might be fine for a 4k OTP memory at what, 4MHz?  :-//
 

Offline jklasdf

  • Regular Contributor
  • *
  • Posts: 66
  • Country: us
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #35 on: January 13, 2021, 09:25:39 pm »
Quote
There are antifuses that are implementable in standard CMOS processes that will do several megabits (may not be the same "fuse" based PROM technology you're referring to)
Megabits?  If so it's probably quite slow.  I've been under the impression that antifuses are problematic in that they don't permit selectively driving rows, or more specifically, when you drive a row you end up driving every row with a diode connection to it.  Wide rows then result in more cross diode connections.  As a result of all these interconnected rows you get really high capacitances, which causes access times to grow with the array size.  But it might be fine for a 4k OTP memory at what, 4MHz?  :-//

What about antifuses prevents organizing them into rows for readout? Some antifuse technologies do use diodes, sure, but I don't think that fundamentally prevents you from making a design that can address and read out rows independently. Maybe in whatever specific implementation you were working with, everything really was just one huge "row"?

Let's say you have an antifuse array on an IC and you want to add another one on the same IC, completely independent. Are you saying it would be completely impossible to add another completely independent antifuse array, and if so what prevents it? Just being on the same die? For ICs with antifuses, would all the antifuses on the ICs in a wafer be linked somehow (can't read out one unless you readout all of them) until they're diced?

I think it's similar to a RAM array (what I usually think about for addressing and reading out rows): you can have rows of x4, x8, x16, etc., and you could theoretically have extremely wide rows (not even that "wide" if the total memory is small), but there's also nothing preventing you from organizing things into smaller "rows" for readout for practical reasons.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 3344
  • Country: us
Re: Padauk: why use 16 bit pointers when 8 bit can address the whole RAM?
« Reply #36 on: January 14, 2021, 01:58:11 am »
Quote
I'm curious if you have any other source that the Padauk OTP microcontrollers are actually just flash without the circuitry to erase it
No, I don't.  Presumably a sufficiently knowledgeable viewer with high enough magnification could decide.
I'm basing the assumption on never having seen anyone else produce a microcontroller specifically advertised as having antifuse-based OTP memory (it's common for FPGAs.)
It does seem to be possible; I can find a couple articles on the virtues of antifuse technology, notably:
https://www.eetimes.com/anti-fuse-memory-provides-robust-secure-nvm-option/
and some scholarly articles on relatively high-density antifuse memories (here's a 2006 paper on a microcontroller with a 32k antifuse memory and a 16-bit CPU: https://koasas.kaist.ac.kr/bitstream/10203/22026/1/000240077100016.pdf ).  (Others are paywalled.  Sigh.)
From the EETimes article:
Quote
Anti-fuse NVM became practical when standard logic CMOS arrived at the 180-nm process node. This was the first process node for which the gate oxide breakdown voltage was less than that of the junction breakdown voltage. With each successively smaller process geometry, the gate oxide breakdown voltage continues to decrease along with transistor dimensions and the oxide thickness. Thus, anti-fuse technology has the benefit of improving with each new process generation: more bit cells per area, less power consumed to write and read the memory, and increased reliability as a result of less power consumed during operation.
That article was from "Kilopass", apparently a semiconductor IP company that has since been acquired by Synopsys.
So they had some vested interest in selling their particular technology.  Perhaps high-density antifuse memory is mired in a swamp of IP issues that prevent its use in the western world (and that Padauk/China happily ignores?)  Here are some more recent items from Synopsys: https://search.synopsys.com/?q=antifuse&lang-id=en
 

