I don't know if there are any 6800 microcontrollers, but the 6805 line (which is quite similar architecturally) spurred several microcontroller families, including S08 and S12 families that are still labeled as current, rather than "legacy."
I don't know of any 6502-like microcontrollers.
ARM architecture is easy to understand as far as the interrupt vector table, how the interrupts work, and how to use them are concerned; the same goes for register pushing on interrupt entry. I find all this simpler than on AVR, although AVR is quite simple as well. Having prioritized interrupts that can pre-empt others, IMO, makes everything easier. You can write longer ISRs with lower priorities and make timing-critical ISRs pre-empt them; you have more tools in your box to work different ways case-by-case. Using two lines of code to set the interrupt priority and enable it is not complex.
In avionics we have constraints, and ISR pre-emption is extremely bad for us, hence strictly prohibited.
I don't want to give you a "no-go", simply a hint: I would be extra careful about abusing this feature.
so no OS allowed?
No-go for nested interrupts, no-go for pre-empted interrupts. Our directives are simple: interrupts are allowed, but! When an interrupt happens it MUST NOT be interrupted.
Even before cache and flash issues the interrupt overhead was already too high to bother.
[url]https://community.arm.com/processors/b/blog/posts/beginner-guide-on-interrupt-latency-and-interrupt-latency-of-the-arm-cortex-m-processors[/url]
For comparison, here's a famous 8 bitter:
[url]http://6502.org/tutorials/interrupts.html#a[/url]
[url]http://6502.org/tutorials/interrupts.html#1.3[/url]
A copy-paste of a table from there (clearly biased):
1802 (RCA): ? Most instructions take 16 clocks (6.4μs), some 24 (9.6μs). 2.5MHz @ 5V.
8080 (Intel): ? (still waiting for information)
8088 (Intel): 10 bus cycles or 40 clocks(?) (B)+(C). (still waiting for further information)
8086 (Intel): WDC says 182 clocks max total latency. (still waiting for information)
Z8 (Zilog): IRET (E) takes 16 execution cycles. I don't know how many clock cycles per execution cycle. 8MHz?
Z80 (Zilog): 11-19 clocks (B)+(C) depending on mode, or 2.75-4.75μs @ 4MHz. RETI (E) is 14 clocks, or 3.5μs @ 4MHz.
Z8000 (Zilog): IRET (E) takes 13 cycles in non-segmented mode, and 16 in segmented mode. I don't know if that's instruction cycles or clock cycles.
8048 (Intel): (?) Return (E) is 2.7μs @ 11MHz.
8051 (Dallas): 1.5μs @ 33MHz (52 clocks) latency.
8051 (Intel): 1.8μs (C) min @ 20MHz. 5.4μs (A)+(C) max total latency @ 20MHz (3-9μs @ 12MHz). Interrupt sequence (C) and return (E) take 4.6μs @ 20MHz.
ST80C51XA (Philips): 2.25μs for interrupt+return (C)+(E) @ 20MHz. Instructions 2-24 cy, or 0.1-1.2μs. Avg 5-6 cy, or around 0.27μs.
KS88 (Samsung): 3μs for interrupt+return (C)+(E) @ 8MHz. Instructions 6-28 cy, or 0.75-2.5μs. Avg 11 cy, or 1.38μs.
78K0 (NEC): 4.3μs for interrupt+return (C)+(E) @ 10MHz. Instructions 4-50 cy, or 0.4-5.0μs. Avg 15 cy, or 1.5μs.
COP8 (National): 70 clocks (7 instruction cycles). RETI (E) is 50 clocks (5 instruction cycles). (7μs & 5μs @ 10MHz)
μPD78C05 (NEC): RETI (E) takes 13 or 15 clocks (2.08 or 2.4μs at 6.25MHz).
μPD70008/A (NEC): Sequence (C) takes 13 or 19 clocks. Return (E) takes 14 clocks. Instructions take 4-23 clocks each. 6MHz in '87 book.
V20 (NEC): RETI (E) takes 39 clocks or 3.9μs @ 10MHz in '87 book. Instruction set is a superset of that of the 8086/8088.
V25 (NEC): ? (still waiting for information)
68000 (Motorola): 46 clocks or 2.875μs minimum @ 16MHz (B)+(C)?. Has a very complex interrupt system.
6800 (Motorola): (C)=13 clocks, including pushing the index register and both accumulators. RTI (E) takes 10 clocks. 2MHz.
6809 (Motorola): (C)=19 clocks. Stacks all registers. RTI (E) 15 clocks. 2MHz (8MHz/4). FIRQ and its RTI take 10 & 6 clocks, and work more like 6502 IRQ/RTI.
68HC05 (Motorola): 16 clocks typ (8μs @ 2MHz).
68HC08 (Motorola): Instructions 1-9 cy, or 0.125-1.125μs. Avg 4-5 cy, or around 0.55μs.
68HC11 (Motorola): (C)=14 clocks. RTI (E)=12 clocks. Total for interrupt+return = 8.75μs @ 4MHz (16MHz/4). Instructions 2-41 cy, or 0.5-10.25μs. Avg 6-7 cy, or around 1.6μs.
68HC12 (Motorola): 2.63μs for interrupt+return (C)+(E) @ 8MHz. Instructions 1-13 cy, or 0.125-1.625μs. Avg 3-4 cy, or 0.45μs.
68HC16 (Motorola): 2.25μs for interrupt+return (C)+(E) @ 16MHz. Instructions 2-38 cy, or 0.125-2.375μs. Avg 6-7 cy, or around 0.4μs.
PIC16 (Microchip): (C)=8 clocks (2 instruction cycles), and RETFIE (E) is also 8 clocks; but this doesn't even include saving and restoring the status register, which is an extra, rather mickey-mouse operation. 20MHz. Most instructions 4 cy, or 0.2μs.
TMS370 (TI): 15 cycles (3μs) min (C), 78 (15.6μs) max (A)+(C), and a cycle is 4 clocks (200ns min)! 20MHz. RTI (E) is 12 cy (48 clocks or 2.4μs).
TMS7000 (TI): (C)=19 cycles min (17 if from idle status). 5MHz, 400ns min cycle time (IOW, interrupt sequence is 7.6μs min, 6.8μs from idle). RETI (E) is 9 cycles, or 3.6μs @ 5MHz.
ST6 (STM): 78 clocks min, or 9.75μs @ 8MHz, just to fetch the interrupt vector; more to reach the first ISR instruction. RETI is 26 clocks, or 3.25μs.
ST7 (STM): 3μs for interrupt+return @ 8MHz. Instructions 2-12 cy, or 0.25-1.5μs. Avg 4-5 cy, or around 0.55μs.
ST9 (STM): External IRQ best case: 1.08μs @ 24MHz. NMI best case: 0.92μs. Internal interrupts best case: 1.04μs. 2.25μs @ 24MHz for interrupt and return. Instructions 6-38 instruction cy, or 0.5-3.67μs. Avg 17 cy, or around 1.4μs.
ST9+ (STM): 1.84μs @ 25MHz for interrupt and return. Instructions 2-26 instruction cy, or 0.16-1.04μs. Avg 11 cy, or around 0.9μs.
H8/300 (Hitachi): 8/16-bit. 2.1μs @ 10MHz for interrupt and return. Instructions 2-24 cy, or 0.2-3.4μs. Avg 5-6 cy, or around 0.55μs.
M16C M30218 (Mitsubishi/Renesas): 18 cy min (C), or 1.125μs @ 16MHz w/ 16-bit data bus. 50 cy max (A)+(C). REIT is 6 cy, or 0.375μs. Dual register sets like the Z80. Max instruction length 30 cy.
CIP-51 (Silicon Labs/Cygnal, μC p/n C8051F2xx): Total latency 5-18 cy or 0.2-0.72μs @ 25MHz. RETI takes 5 cy, or 0.2μs. This is the only one I have data on here that gives the 6502 below any competition.
65C02 (WDC): Normal latency (C) 7 clocks (0.35μs) min, 14 clocks (0.7μs) max (A)+(C). RTI 6 cy (0.3μs). 20MHz. Instructions 2-7 cy, or 0.1-0.35μs. Avg 4 cy, or 0.2μs. Special case: IRQ from the WAI instruction with the interrupt-disable bit I set: no more than 1 cy (0.05μs!).
You are still idolising. If someone tells me something is extremely hard to do, I already close one ear; I'm not going to get useful information from that person, other than that he/she can't do it and doesn't know people who can. I keep one ear open for clues on how not to tackle a problem, but even that is sketchy, because maybe the approach was good and the execution was wrong.
The AVR we were considering had 5 cycles latency in, 4 cycles out, and ran at 20MHz. I believe the total overhead for the ARM-based MCU was ~30 cycles, and it was clocked at 48MHz. It didn't work out. I believe the ARM was an M4 core. Neither ended up being selected, but it's just an example.
Add also that you may have to save even more on the stack if you use floating point. It gets very painful, very fast, and it's not unusual to have to start disassembling and massaging the ISR code or modifiers accordingly. One problem with this approach is that your well-meaning fiddling makes the code less maintainable when someone comes along without knowledge of your assumptions.
Quote: "The AVR we were considering had 5 cycles latency in, 4 cycles out and ran at 20MHz. I believe the total overhead for the ARM based MCU was ~30 and clocked at 48MHz. It didn't work out. Believe the ARM was an M4 core. Neither ended up being selected but it's just an example."

One thing to consider here is that (unlike most other microcontrollers) an ARM Cortex has already saved a whole bunch of registers on the stack when you enter the interrupt routine. On most other microcontrollers the software has to push the registers onto the stack itself before being able to do something useful. The latter adds to the total latency of the interrupt handling.
Yes, we actually didn't need to save or restore anything because we kept those registers away from the compiler. Bit of an edge case. The ARM manuals did specify the registers automatically saved and restored. If you use the floating-point hardware it saves and restores all of those registers too, which more than doubles the overhead if I remember right. It's good because you don't have to worry, bad because you can't do anything about it. In our case the AVR worked, except we had no time left to manage the UI and user IO stuff. After testing we decided it wasn't ideal to require a reset to change modes and things. The ARM would have worked better there, but the interrupt time wouldn't work. It's likely we could have found the perfect ARM MCU, but we fell back on our go-to MCU instead.
AFAIR an M4 is 12 or 29 cycles without/with FPU stacking, and with lazy stacking the hardware itself figures out whether an interrupt uses the FPU.
Our reference was http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka16366.html
I didn't do the calculations, but it's the data we had. You could be right that it's 12 in, 12 out, so 24. Slightly better than the AVR, but not by enough for our purposes.
Quote: "(unlike most other microcontrollers) an ARM Cortex has already saved a whole bunch of registers on the stack when you enter the interrupt routine."

Meanwhile, a typical avr-gcc ISR prologue looks like this:
push r1       ; save r1 (avr-gcc's "known zero" register)
push r0       ; save scratch register
in r0, 0x3f   ; read SREG (the status register)
push r0       ; save it
eor r1, r1    ; re-zero r1 for compiler-generated code
(Assembly language programmers complain about the save/setup of the "known zero" R1.) We're up to about 15 cycles, which is more than an ARM CM0-CM4 takes ALL THE TIME.

Quote: "Add also that you may have to save even more on the stack if you use floating point. It gets very painful, very fast, and it's not unusual to have to start disassembling and massaging the ISR code or modifiers accordingly. One problem with this approach is that your well-meaning fiddling makes the code less maintainable when someone comes along without knowledge of your assumptions."

Personally I try hard to avoid being dependent on software when it comes down to nanosecond timing on a regular microcontroller, because it is hard to achieve and hard to maintain. I guess these situations likely originate from a hardware designer thinking "they can fix this in software for sure".
Quote: "1) Caches are a moot point. They tend to be disabled by default. Just don't enable them."

You think so, eh? I'm talking less about formal "cache memory that you need to enable" and more about things like the "2*64-bit prefetch buffer" (STM32F1xx) or the "enhanced flash memory accelerator" (LPC176x). Your lovely single-cycle 120MHz RISC CPU isn't going to run so well if every instruction takes 5 additional cycles to fetch from the flash program memory (SAMD51; but 50ns access times for flash seem to be "typical").
10) Previous point tl;dr: With AVR, every beginner has a clear and simple route to follow. On ARM MCUs, everybody teaches a different way, it's hard to know what to do, and it often looks difficult and complex: many code examples are long just to blink an LED.
11) Once more: the biggest issue I see in learning ARM MCUs is the lack of simple, easy-to-understand, lightweight examples and tutorials for doing sane, sustainable development.
2) Getting an STM32 to blink an LED requires one (1) register write more than an AVR:
ldr r1, =IOPORT_BASE       ;; load address of ioport registers (48 bits!)
mov r0, #(1<<pinno)        ;; load the bit that needs setting (assumes pinno<=7, on CM0)
                           ;; 32 bits on v7m, for all single-bit values;
                           ;; needs to be another 48-bit "ldr r0,=b" on v6m, or maybe
                           ;; a mov followed by a shift.
str r0, [r1, #IOPORT_SET]  ;; write to the SETABIT register (1 word)
ARM assembly is easy peasy
Another problem is the mistake of lumping all ARMs together.
Especially from the POV of a beginner where the biggest hurdle is learning a toolchain and getting programming working.
An STM32 is as different from an NXP ARM as it is from a PIC. It's all about the peripherals and the tools.
Quote: "ARM assembly is easy peasy"

Pay no attention to the 4 possible instruction encodings for loading a constant into a register, which range from 16 to 48 bits of flash space and half of which are not available on a CM0... It's a RISC CPU, and RISC always has completely regular instructions!
There is currently a strange pattern of behaviour when offering MCUs to customers:
- For engineers: MCUs mostly sell for their peripheral and memory content, and the core doesn't matter a lot. Even the speed of the core doesn't matter a lot, because most MCUs aren't run at their full speed.
- For managers: If it doesn't have an ARM core they aren't interested. If it does have an ARM core, they will happily sit through a sales pitch for a device that is a horrible mismatch for their needs.
Quote:
"There is currently a strange pattern of behaviour when offering MCUs to customers:
- For engineers: MCUs mostly sell for their peripheral and memory content, and the core doesn't matter a lot. Even the speed of the core doesn't matter a lot, because most MCUs aren't run at their full speed.
- For managers: If it doesn't have an ARM core they aren't interested. If it does have an ARM core, they will happily sit through a sales pitch for a device that is a horrible mismatch for their needs."

- For marketers: "Our device is powered by a cutting-edge 2-core ARM processor."
Usually you'd use an LDR Rx, [PC, #offset] instruction to load a constant into a register on ARM.
if ((PORTB & SWITCHMASK) == MY_SWITCH_COMBO) ...