I don't know if there are any 6800 microcontrollers, but the 6805 line (which is quite similar architecturally) spurred several microcontroller families, including S08 and S12 families that are still labeled as current, rather than "legacy."
I don't know of any 6502-like microcontrollers.
ARM architecture is easy to understand as far as the interrupt vector table, how the interrupts work, and how to use them are concerned; the same goes for register pushing on interrupt entry. I find all this simpler than on AVR, although AVR is quite simple as well. Having prioritized interrupts that can pre-empt others, IMO, makes everything easier. You can write longer ISRs with lower priorities and make timing-critical ISRs pre-empt them; you have more tools in your box to work different ways case-by-case. Using two lines of code to set the interrupt priority and enable it is not complex.
In avionics we have constraints, and ISR pre-emption is extremely bad for us, hence strictly prohibited.
I don't want to give you a "no-go", simply a hint: I would be extra careful about abusing this feature.
so no OS allowed?
No-go for nested interrupts, no-go for pre-empted interrupts. Our directives are simple: interrupts are allowed, but! When an interrupt happens it MUST NOT be interrupted.
Even before cache and flash issues the interrupt overhead was already too high to bother.
[url]https://community.arm.com/processors/b/blog/posts/beginner-guide-on-interrupt-latency-and-interrupt-latency-of-the-arm-cortex-m-processors[/url]
For comparison, here's a famous 8 bitter:
[url]http://6502.org/tutorials/interrupts.html#a[/url]
[url]http://6502.org/tutorials/interrupts.html#1.3[/url]
A copy-paste of a table from there (clearly biased):
1802 (RCA): ? Most instructions take 16 clocks (6.4μs), some 24 (9.6μs). 2.5MHz @ 5V.
8080 (Intel): ? (still waiting for information)
8088 (Intel): 10 bus cycles or 40 clocks(?) (B)+(C). (still waiting for further information)
8086 (Intel): WDC says 182 clocks max total latency. (still waiting for information)
Z8 (Zilog): IRET (E) takes 16 execution cycles. I don't know how many clock cycles per execution cycle. 8MHz?
Z80 (Zilog): 11-19 clocks (B)+(C) depending on mode, or 2.75-4.75μs @ 4MHz. RETI (E) is 14 clocks, or 3.5μs @ 4MHz.
Z8000 (Zilog): IRET (E) takes 13 cycles in non-segmented mode, and 16 in segmented mode. I don't know if that's instruction cycles or clock cycles.
8048 (Intel): (?) Return (E) is 2.7μs @ 11MHz.
8051 (Dallas): 1.5μs @ 33MHz (52 clocks) latency.
8051 (Intel): 1.8μs (C) min @ 20MHz. 5.4μs (A)+(C) max total latency @ 20MHz (3-9μs @ 12MHz). Interrupt sequence (C) and return (E) take 4.6μs @ 20MHz.
ST80C51XA (Philips): 2.25μs for interrupt+return (C)+(E) @ 20MHz. Instructions 2-24 cy, or 0.1-1.2μs. Avg 5-6 cy, or around 0.27μs.
KS88 (Samsung): 3μs for interrupt+return (C)+(E) @ 8MHz. Instructions 6-28 cy, or 0.75-2.5μs. Avg 11 cy, or 1.38μs.
78K0 (NEC): 4.3μs for interrupt+return (C)+(E) @ 10MHz. Instructions 4-50 cy, or 0.4-5.0μs. Avg 15 cy, or 1.5μs.
COP8 (National): 70 clocks (7 instruction cycles). RETI (E) is 50 clocks (5 instruction cycles). (7μs & 5μs @ 10MHz)
μPD78C05 (NEC): RETI (E) takes 13 or 15 clocks (2.08 or 2.4μs at 6.25MHz).
μPD70008/A (NEC): Sequence (C) takes 13 or 19 clocks. Return (E) takes 14 clocks. Instructions take 4-23 clocks each. 6MHz in '87 book.
V20 (NEC): RETI (E) takes 39 clocks or 3.9μs @ 10MHz in '87 book. Instruction set is a superset of that of the 8086/8088.
V25 (NEC): ? (still waiting for information)
68000 (Motorola): 46 clocks or 2.875μs minimum @ 16MHz (B)+(C)?. Has a very complex interrupt system.
6800 (Motorola): (C)=13 clocks, including pushing the index register and both accumulators. RTI (E) takes 10 clocks. 2MHz.
6809 (Motorola): (C)=19 clocks. Stacks all registers. RTI (E) 15 clocks. 2MHz (8MHz/4). FIRQ and its RTI take 10 & 6 clocks, and work more like 6502 IRQ/RTI.
68HC05 (Motorola): 16 clocks typ (8μs @ 2MHz).
68HC08 (Motorola): Instructions 1-9 cy, or 0.125-1.125μs. Avg 4-5 cy, or around 0.55μs.
68HC11 (Motorola): (C)=14 clocks. RTI (E)=12 clocks. Total for interrupt+return = 8.75μs @ 4MHz (16MHz/4). Instructions 2-41 cy, or 0.5-10.25μs. Avg 6-7 cy, or around 1.6μs.
68HC12 (Motorola): 2.63μs for interrupt+return (C)+(E) @ 8MHz. Instructions 1-13 cy, or 0.125-1.625μs. Avg 3-4 cy, or 0.45μs.
68HC16 (Motorola): 2.25μs for interrupt+return (C)+(E) @ 16MHz. Instructions 2-38 cy, or 0.125-2.375μs. Avg 6-7 cy, or around 0.4μs.
PIC16 (Microchip): (C)=8 clocks (2 instruction cycles), and RETFIE (E) is also 8 clocks; but this doesn't even include saving and restoring the status register, which is an extra, rather mickey-mouse operation. 20MHz. Most instructions 4 cy, or 0.2μs.
TMS370 (TI): 15 cycles (3μs) min (C), 78 (15.6μs) max (A)+(C), and a cycle is 4 clocks (200ns min)! 20MHz. RTI (E) is 12 cy (48 clocks or 2.4μs).
TMS7000 (TI): (C)=19 cycles min (17 if from idle status). 5MHz, 400ns min cycle time (IOW, interrupt sequence is 7.6μs min, 6.8μs from idle). RETI (E) is 9 cycles, or 3.6μs @ 5MHz.
ST6 (STM): 78 clocks min, or 9.75μs @ 8MHz, just to fetch the interrupt vector; more to reach the first ISR instruction. RETI is 26 clocks, or 3.25μs.
ST7 (STM): 3μs for interrupt+return @ 8MHz. Instructions 2-12 cy, or 0.25-1.5μs. Avg 4-5 cy, or around 0.55μs.
ST9 (STM): External IRQ best case: 1.08μs @ 24MHz. NMI best case: 0.92μs. Internal interrupts best case: 1.04μs. 2.25μs @ 24MHz for interrupt and return. Instructions 6-38 instruction cy, or 0.5-3.67μs. Avg 17 cy, or around 1.4μs.
ST9+ (STM): 1.84μs @ 25MHz for interrupt and return. Instructions 2-26 instruction cy, or 0.16-1.04μs. Avg 11 cy, or around 0.9μs.
H8/300 (Hitachi): 8/16-bit. 2.1μs @ 10MHz for interrupt and return. Instructions 2-24 cy, or 0.2-3.4μs. Avg 5-6 cy, or around 0.55μs.
M16C M30218 (Mitsubishi/Renesas): 18 cy min (C), or 1.125μs @ 16MHz w/ 16-bit data bus. 50 cy max (A)+(C). REIT is 6 cy, or 0.375μs. Dual register sets like the Z80. Max instruction length 30 cy.
CIP-51 (Silicon Labs/Cygnal, μC p/n C8051F2xx): Total latency 5-18 cy or 0.2-0.72μs @ 25MHz. RETI takes 5 cy, or 0.2μs. This is the only one I have data on here that gives the 6502 below any competition.
65C02 (WDC): Normal latency (C) 7 clocks (0.35μs) min, 14 clocks (0.7μs) max (A)+(C). RTI 6 cy (0.3μs). 20MHz. Instructions 2-7 cy, or 0.1-0.35μs. Avg 4 cy, or 0.2μs. Special case: IRQ from the WAI instruction with the interrupt-disable bit I set: no more than 1 cy (0.05μs!).
You are still idolising. If someone tells me something is extremely hard to do, I already close one ear; I'm not going to get useful information from that person, other than that he/she can't do it and doesn't know people who can. I keep one ear open for clues on how not to tackle a problem, but even that is sketchy, because maybe the approach was good and the execution was wrong.
The AVR we were considering had 5 cycles latency in, 4 cycles out, and ran at 20MHz. I believe the total overhead for the ARM-based MCU was ~30 cycles, and it was clocked at 48MHz. It didn't work out. I believe the ARM was an M4 core. Neither ended up being selected, but it's just an example.
Add also that you may have to save even more on the stack if you use floating point. It gets very painful, very fast, and it's not unusual to have to start disassembling and massaging the ISR code or modifiers accordingly. One problem with this approach is that your well-meaning fiddling makes the code less maintainable when someone comes along without knowledge of your assumptions.
Quote: "The AVR we were considering had 5 cycles latency in, 4 cycles out and ran at 20MHz. I believe the total overhead for the ARM based MCU was ~30 and clocked at 48MHz. It didn't work out. Believe the ARM was an M4 core. Neither ended up being selected but it's just an example."

One thing to consider here is that (unlike most other microcontrollers) an ARM Cortex has already saved a whole bunch of registers on the stack when you enter the interrupt routine. On most other microcontrollers the software has to push the registers onto the stack itself before being able to do something useful. The latter adds to the total latency of the interrupt handling.
Yes, we actually didn't need to save or restore anything because we kept those registers away from the compiler. Bit of an edge case. The ARM manuals did specify the registers automatically saved and restored. If you use the floating-point hardware it saves and restores all of those registers too, which more than doubles the overhead if I remember right. It's good because you don't have to worry, bad because you can't do anything about it. In our case the AVR worked, except we had no time left to manage the UI and user IO stuff. After testing we decided it wasn't ideal to require a reset to change modes and things. The ARM would have worked better there, but the interrupt time wouldn't work. It's likely we could have found the perfect ARM MCU, but we fell back on our go-to MCU instead.
AFAIR an M4 is 12 or 29 cycles without/with FPU stacking, and with lazy stacking the hardware itself figures out whether an interrupt uses the FPU.
Our reference was http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka16366.html
I didn't do the calculations, but it's the data we had. You could be right that it's 12 in, 12 out, so 24. Slightly better than the AVR, but not by enough for our purposes.
Quote: "(unlike most other microcontrollers) an ARM Cortex has already saved a whole bunch of registers on the stack when you enter the interrupt routine."

Meanwhile, a typical avr-gcc ISR prologue looks like this:
push r1       ; save r1 (avr-gcc's "known zero" register)
push r0       ; save scratch register
in r0, 0x3f   ; read SREG (the status register)
push r0       ; save it
eor r1, r1    ; re-zero r1 for compiler-generated code
(Assembly language programmers complain about the save/setup of the "known zero" R1.) We're up to about 15 cycles, which is more than an ARM CM0-CM4 takes ALL THE TIME.

Quote: "Add also that you may have to save even more on the stack if you use floating point. It gets very painful, very fast, and it's not unusual to have to start disassembling and massaging the ISR code or modifiers accordingly. One problem with this approach is that your well-meaning fiddling makes the code less maintainable when someone comes along without knowledge of your assumptions."

Personally I try hard to avoid being dependent on software when it comes down to nanosecond timing on a regular microcontroller, because it is hard to achieve and hard to maintain. I guess these situations likely originate from a hardware designer thinking "they can fix this in software for sure".
Quote: "1) Caches are a moot point. They tend to be disabled by default. Just don't enable them."

You think so, eh? I'm talking less about formal "cache memory that you need to enable" and more about things like the "2*64-bit prefetch buffer" (STM32F1xx) or the "enhanced flash memory accelerator" (LPC176x). Your lovely single-cycle 120MHz RISC CPU isn't going to run so well if every instruction takes 5 additional cycles to fetch from the flash program memory (SAMD51; but 50ns access times for flash seem to be "typical").
10) Previous point tl;dr: With AVR, every beginner has a clear and simple route to follow. On ARM MCUs, everybody teaches a different way, it's hard to know what to do, and it often looks difficult and complex: many code examples are long just to blink an LED.
11) Once more: the biggest issue I see in learning ARM MCUs is the lack of simple, easy-to-understand, lightweight examples and tutorials for doing sane, sustainable development.
2) Getting an STM32 to blink an LED requires one (1) register write more than an AVR:
ldr r1, =IOPORT_BASE       ;; load address of ioport registers (48 bits!)
mov r0, #(1<<pinno)        ;; load the bit that needs setting (assumes pinno<=7, on CM0)
                           ;; 32 bits on v7m, for all single-bit values;
                           ;; needs to be another 48-bit "ldr r0,=b" on v6m, or maybe
                           ;; a mov followed by a shift.
str r0, [r1, #IOPORT_SET]  ;; write to the SETABIT register (1 word)
ARM assembly is easy peasy
Another problem is the mistake of lumping all ARMs together.
Especially from the POV of a beginner where the biggest hurdle is learning a toolchain and getting programming working.
An STM32 is as different from an NXP ARM as it is from a PIC. It's all about the peripherals and the tools.
Quote: "ARM assembly is easy peasy"

Pay no attention to the 4 possible instruction encodings for loading a constant into a register, which range from 16 to 48 bits of flash space and half of which are not available on a CM0... It's a RISC CPU, and RISC always has completely regular instructions!
There is currently a strange pattern of behaviour when offering MCUs to customers:
- For engineers: MCUs mostly sell for their peripheral and memory content, and the core doesn't matter a lot. Even the speed of the core doesn't matter a lot, because most MCUs aren't run at their full speed.
- For managers: If it doesn't have an ARM core they aren't interested. If it does have an ARM core, they will happily sit through a sales pitch for a device that is a horrible mismatch for their needs.
Quote:
"There is currently a strange pattern of behaviour when offering MCUs to customers:
- For engineers: MCUs mostly sell for their peripheral and memory content, and the core doesn't matter a lot. Even the speed of the core doesn't matter a lot, because most MCUs aren't run at their full speed.
- For managers: If it doesn't have an ARM core they aren't interested. If it does have an ARM core, they will happily sit through a sales pitch for a device that is a horrible mismatch for their needs."

- For marketers: "Our device is powered by a cutting-edge 2-core ARM processor."
Usually you'd use an LDR Rx, [PC, #offset] instruction to load a constant into a register on ARM.
if ((PORTB & SWITCHMASK) == MY_SWITCH_COMBO) ...