Author Topic: How are interrupts handlers implemented? (Read 7725 times)

mikeselectricstuff · « **Reply #50 on:** May 10, 2022, 11:03:35 am »

Something I missed when they went to the Cortex architecture is the FIQ, with its dedicated register bank and ability to put the ISR directly at the vector address. This could achieve extremely low latency, allowing tricks like reading low-res data directly from cameras.

brucehoult · « **Reply #51 on:** May 10, 2022, 11:24:59 am »

Quote from: DiTBho on May 10, 2022, 10:04:14 am

Quote from: brucehoult on May 10, 2022, 08:53:44 am
Simpler internally, or simpler to use?

Internally. I am a RISC-purist, MIPS-addicted.

Yup. MIPS and RISC-V are very similar in this. But always reserving $k0 and $k1 for interrupts seems a bit bodgy. The RISC-V solution to "how do I get a register to work with?" is the mscratch CSR ... in MIPS terms, an unused register in CP0 that the interrupt handler can just write a GPR into (or swap with).
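To make the swap concrete, here is a rough host-side C model of what `csrrw t0, mscratch, t0` does. It is a sketch only: `struct hart_model` and `csrrw_swap_t0_mscratch` are hypothetical names made up for illustration, and a real handler would use the instruction itself rather than C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a hart: one GPR of interest plus the mscratch CSR. */
struct hart_model {
    uint32_t t0;        /* a general-purpose register the handler wants to use */
    uint32_t mscratch;  /* machine-mode scratch CSR */
};

/* Models `csrrw t0, mscratch, t0`: the old CSR value lands in t0 and the
   old t0 value lands in the CSR, so nothing is lost and no stack is needed. */
static void csrrw_swap_t0_mscratch(struct hart_model *h)
{
    assert(h != 0);
    uint32_t old_csr = h->mscratch;
    h->mscratch = h->t0;
    h->t0 = old_csr;
}
```

Running the swap a second time at handler exit puts both values back where they started, which is why one scratch CSR is enough to bootstrap a handler.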

DiTBho · « **Reply #52 on:** May 10, 2022, 12:23:18 pm »

PowerPC and POWER have a similar trick. It's not tricky to implement in a Cop0, so it's welcome as far as I'm concerned.

westfw · « **Reply #53 on:** May 10, 2022, 10:19:02 pm »

Quote

I don't like it. ARM was simpler years ago.

I'm inclined to agree.
I tried writing a disassembler for CM4, and those thumb2 instruction encodings are just AWFUL, in ways that I thought RISC intentionally avoided. Plain thumb (CM0) isn't too bad, but it has a lot of non-orthogonality and special casing that I again thought would have been foreign to "principles."
I guess the increase in code density is considered worthwhile (at a time when most of a microcontroller die is occupied by code memory), but it's pretty ugly.

Similarly, the NVIC is neat, but ... removes choices from the programmer. I prefer the sort of "has vectors, but how much context to save is all up to you" of some of the simpler architectures.

langwadt · « **Reply #54 on:** May 10, 2022, 10:31:39 pm »

Quote from: DiTBho on May 10, 2022, 07:28:39 am

I don't like it. ARM was simpler years ago.

it was also slower, used more memory and required you to jump through assembly hoops to get things done

brucehoult · « **Reply #55 on:** May 11, 2022, 02:14:11 am »

Quote from: westfw on May 10, 2022, 10:19:02 pm

I tried writing a disassembler for CM4, and those thumb2 instruction encodings are just AWFUL, in ways that I thought RISC intentionally avoided. Plain thumb (CM0) isn't too bad, but it has a lot of non-orthogonality and special casing that I again thought would have been foreign to "principles."

Right.

T16 has a lot more complex encoding than A32, with 19 different instruction formats.

RISC-V C extension also has more complex encoding than the base ISA, with 8 instruction formats vs 4. At least it maintains the property of the base ISA of the bottom three bits of rs1 and rs2 always being in the same place (if they exist) and the MSB (sign bit) of immediate/offset always being in the same place. Which of the two possible register fields is rd (if it exists) does vary though.

T32 encoding seems just random and ugly. It has the excuse of having to fit in around the two actually independent instructions that make up each of the T16 BL/BLX instructions. In T16 you can separate the two instructions and they still work (though assemblers and compilers never do), but in T32 they are actual 32 bit instructions.

A64 encoding seems equally ugly, for no reason apparent to me.

Quote

I guess the increase in code density is considered worthwhile (at a time when most of a microcontroller die is occupied by code memory), but it's pretty ugly.

T16 was also constrained by having to be something close to a complete and efficient ISA in itself, at least for code that a C compiler would generate. The original CPUs could always switch back to A32 mode if you needed a weird thing (such as the hi half of a multiply, or some system function) but you couldn't just randomly and efficiently throw a 32 bit opcode in the middle of 16 bit code. The CM0 has a handful of T32 instructions for those purposes, and you can intermix them.

RVC was designed with the knowledge that it didn't have to be complete, because using a full size instruction instead is always possible at any point.

Quote

Similarly, the NVIC is neat, but ... removes choices from the programmer. I prefer the sort of "has vectors, but how much context to save is all up to you" of some of the simpler architectures.

NVIC is easy, but one size fits all. Short simple functions that need only one or two working registers can get lower latency with a simpler mechanism.

brucehoult · « **Reply #56 on:** May 11, 2022, 02:18:23 am »

Quote from: langwadt on May 10, 2022, 10:31:39 pm

Quote from: DiTBho on May 10, 2022, 07:28:39 am
I don't like it. ARM was simpler years ago.

it was also slower, used more memory and required you to jump through assembly hoops to get things done

True, but the necessary assembly language to provide all NVIC features is very nearly as fast (maybe faster on a dual-issue or wider core), only a couple of dozen instructions long, and can literally be printed in the manual, or supplied in platform or compiler libraries so 99% of programmers don't have to write or understand it themselves.

ejeffrey · « **Reply #57 on:** May 11, 2022, 04:10:33 am »

Quote from: langwadt on May 10, 2022, 10:31:39 pm

Quote from: DiTBho on May 10, 2022, 07:28:39 am
I don't like it. ARM was simpler years ago.

it was also slower, used more memory and required you to jump through assembly hoops to get things done

I'm not any kind of CPU architecture expert, but the argument that "ARM Cortex-M interrupts follow the C calling convention so you can just write C handlers" always seemed a bit pointless. Plenty of other platforms let you write C ISRs with just a keyword or attribute to specify the calling convention. It's trivial for the compiler to add a slightly different prologue / epilogue which can then be partially optimized away based on the registers actually used by the ISR. Saving and restoring half the register file plus checking for magic values on branch instructions seems like an awful lot of baggage to avoid adding __attribute__((interrupt)) to at most a couple dozen functions in your project. Again, I'm not an expert and there may be other reasons why the ARM approach is desirable, but that particular argument doesn't seem particularly compelling.
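For illustration, a minimal sketch of what such an attribute-marked ISR can look like with GCC or Clang. The macro `ISR_ATTR`, the handler name, and the fake data register are all hypothetical. The attribute is only applied on RISC-V and ARM targets (where GCC documents `interrupt` as a function attribute) and expands to nothing elsewhere, so the logic can be exercised on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portability macro: apply the attribute only where the
   compiler accepts it on a void(void) function. */
#if defined(__riscv) || defined(__arm__)
#define ISR_ATTR __attribute__((interrupt))
#else
#define ISR_ATTR
#endif

#define RX_BUF_SIZE 16u
static_assert((RX_BUF_SIZE & (RX_BUF_SIZE - 1u)) == 0u,
              "RX_BUF_SIZE must be a power of two for the index mask");

/* Stand-in for a memory-mapped receive data register (assumption). */
static volatile uint32_t fake_rx_data_reg;
static volatile uint32_t rx_buf[RX_BUF_SIZE];
static volatile uint32_t rx_head;

/* Reads one word from the "peripheral" and stores it in a ring buffer.
   A handler this small needs only two or three working registers. */
ISR_ATTR void uart_rx_isr(void)
{
    uint32_t head = rx_head;
    rx_buf[head & (RX_BUF_SIZE - 1u)] = fake_rx_data_reg;
    rx_head = head + 1u;
}
```

On a target where the attribute is active, the compiler generates the save/restore code itself and only for the registers the body actually touches.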

nctnico · « **Reply #58 on:** May 11, 2022, 08:24:22 am »

Quote from: ejeffrey on May 11, 2022, 04:10:33 am

Quote from: langwadt on May 10, 2022, 10:31:39 pm
Quote from: DiTBho on May 10, 2022, 07:28:39 am
I don't like it. ARM was simpler years ago.

it was also slower, used more memory and required you to jump through assembly hoops to get things done

I'm not any kind of CPU architecture expert, but the argument that "ARM Cortex-M interrupts follow the C calling convention so you can just write C handlers" always seemed a big pointless. Plenty of other platforms let you write C ISRs with just a keyword or attribute to specify the calling convention. It's trivial for the compiler to add a slightly different prologue / epilogue which can then be partially optimized away based on the registers actually used by the ISR. Saving and restoring half the register file plus checking for magic values on branch instructions seems like an awful lot of baggage to avoid adding __attribute__((interrupt)) to at most a couple dozen functions in your project. Again, I'm not an expert and there may be other reasons why the ARM approach is desirable, but that particular argument doesn't seem particularly compelling.

It is not that simple. On the older ARM architectures (like ARM7TDMI) you'll need a wrapper (written in assembler) to demultiplex the interrupts from a vectored interrupt handler peripheral. This more or less requires you to save all the registers on the stack anyway. Also, many ARM Cortex controllers have DMA nowadays which takes away the need for interrupts with a high repetition rate.

Siwastaja · « **Reply #59 on:** May 11, 2022, 10:38:40 am »

Also actual complex systems benefit from tail-chaining, which is realistically only possible if the hardware does the stacking (because hardware can trivially check if another interrupt is pending).

With software push/pop, you save a few cycles on some simple handlers by not stacking everything, but then if another IRQ gets pending during the first, the sooner you get there the better, but having the software stupidly pop all the registers just to push them all again in the next handler is wasted time and this happens in the worst case, making long wait even longer.
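A software entry stub can avoid the pop-then-push cost by checking for further pending interrupts before it restores context. The following is a host-side sketch of that idea only. The pending mask, the counters, and every function name here are hypothetical, and a real implementation would be a few lines of assembly reading the interrupt controller.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_IRQS 32u

typedef void (*irq_handler_t)(void);

static uint32_t pending_mask;            /* stand-in for controller pending bits */
static unsigned context_saves;
static unsigned context_restores;
static unsigned handlers_run;
static irq_handler_t handlers[NUM_IRQS];

/* Placeholders for pushing/popping the caller-saved registers. */
static void save_caller_saved_regs(void)    { context_saves++; }
static void restore_caller_saved_regs(void) { context_restores++; }

/* In this model the lowest set bit is the highest-priority source. */
static unsigned highest_pending(uint32_t mask)
{
    unsigned n = 0;
    assert(mask != 0u);
    while ((mask & 1u) == 0u) { mask >>= 1; n++; }
    return n;
}

/* Save once, drain every pending interrupt, restore once. */
static void irq_entry(void)
{
    save_caller_saved_regs();
    while (pending_mask != 0u) {
        unsigned n = highest_pending(pending_mask);
        pending_mask &= ~(UINT32_C(1) << n);
        if (handlers[n]) handlers[n]();
    }
    restore_caller_saved_regs();
}

/* Sample handlers: the first models a second IRQ arriving while it runs. */
static void timer_handler(void) { handlers_run++; pending_mask |= UINT32_C(1) << 3; }
static void uart_handler(void)  { handlers_run++; }
```

With `timer_handler` on source 1 and `uart_handler` on source 3, a single entry runs both handlers with one save and one restore.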

brucehoult · « **Reply #60 on:** May 11, 2022, 11:38:03 am »

Quote from: Siwastaja on May 11, 2022, 10:38:40 am

Also actual complex systems benefit from tail-chaining, which is realistically only possible if the hardware does the stacking (because hardware can trivially check if another interrupt is pending).

With software push/pop, you save a few cycles on some simple handlers by not stacking everything, but then if another IRQ gets pending during the first, the sooner you get there the better, but having the software stupidly pop all the registers just to push them all again in the next handler is wasted time and this happens in the worst case, making long wait even longer.

That's easy to avoid with suitable hardware design.

Here, once again, is standard (i.e. published for all to use) RISC-V interrupt handler code that implements interrupt chaining and late arrival of high priority interrupts.

https://github.com/riscv/riscv-fast-interrupt/blob/master/clic.adoc#c-abi-trampoline-code

If an interrupt comes in while another handler is executing, there are 5 instructions from one handler returning to the new one being called.

langwadt · « **Reply #61 on:** May 11, 2022, 02:30:50 pm »

Quote from: brucehoult on May 11, 2022, 11:38:03 am

Quote from: Siwastaja on May 11, 2022, 10:38:40 am
Also actual complex systems benefit from tail-chaining, which is realistically only possible if the hardware does the stacking (because hardware can trivially check if another interrupt is pending).

With software push/pop, you save a few cycles on some simple handlers by not stacking everything, but then if another IRQ gets pending during the first, the sooner you get there the better, but having the software stupidly pop all the registers just to push them all again in the next handler is wasted time and this happens in the worst case, making long wait even longer.

That's easy to avoid to suitable hardware design.

Here, once again, is standard (i.e. published for all to use) RISC-V interrupt handler code that implements interrupt chaining and late arrival of high priority interrupts.

https://github.com/riscv/riscv-fast-interrupt/blob/master/clic.adoc#c-abi-trampoline-code

If an interrupt comes in while another handler is executing, there are 5 instructions from one handler returning to the new one being called.

so a bunch of carefully handcrafted assembly taking up code memory and probably with waitstates

Siwastaja · « **Reply #62 on:** May 11, 2022, 03:37:00 pm »

Quote from: langwadt on May 11, 2022, 02:30:50 pm

so a bunch of carefully handcrafted assembly taking up code memory and probably with waitstates

Indeed, a hardware stacker can run in parallel with the flash controller prefetching the vector address, and then prefetching the first instructions of the ISR. With a software solution, you just wait for the flash, doing nothing, and then start stacking.

But as always, the devil is in the details, and I'm 100% sure there are many cases where the software solution ends up being faster. But I like the ARM Cortex way, really. It gives consistently small (albeit not always the absolute minimum) latency, minimizes code size and enables standard functions to be used as handlers, although as ejeffrey says, the last one isn't practically that important.

ejeffrey · « **Reply #63 on:** May 11, 2022, 03:46:10 pm »

Quote from: nctnico on May 11, 2022, 08:24:22 am

It is not that simple. On the older ARM architectures (like ARM7TDMI) you'll need a wrapper (written in assembler) to demultiplex the interrupts from a vectored interrupt handler peripheral. This more or less requires you to save all the registers on the stack anyway. Also, many ARM Cortex controllers have DMA nowadays which takes away the need for interrupts with a high repetition rate.

I've never used ARM7TDMI, but I wasn't really comparing a vectored vs. non-vectored controller, I was more comparing it to something like x86 or (to my understanding) 68k where there is an interrupt vector table but the CPU only saves minimal state and ISRs use a dedicated iret instruction to restore processor state and return. On these systems you can still write ISRs in C if you just mark them as such to the compiler. Like I said, I'm not an expert on the performance and complexity tradeoffs but "the CPU implements the platform C ABI and uses a magic return value so you can write handlers in C" is a bit of a silly argument on ARM's part because it's been possible to write ISRs in C for ages.

That said, an interrupt demultiplexer which could be but in no way needs to be written in assembly as long as you have macros/intrinsics for accessing the interrupt source is pretty simple and low overhead and only needs to be written once. Maybe that makes ISR latency worse or is in some other way less desirable but simply avoiding that is not a very compelling argument to me.
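A demultiplexer of that kind might look roughly like the sketch below. The source register, the table size, and all names are hypothetical. On real hardware `fake_irq_source_reg` would be a read of the interrupt controller's "active source" register through a macro or intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SOURCES 32u

typedef void (*isr_t)(void);

/* Stand-in for the controller register holding the active source index. */
static volatile uint32_t fake_irq_source_reg;
static isr_t isr_table[NUM_SOURCES];
static unsigned spurious_count;

/* Installs a handler; returns 0 on success, -1 for an out-of-range source. */
static int irq_register(unsigned source, isr_t fn)
{
    assert(fn != NULL);
    if (source >= NUM_SOURCES) return -1;
    isr_table[source] = fn;
    return 0;
}

/* Single entry point: read the source, look it up, call the handler.
   Unregistered or out-of-range sources are counted as spurious. */
void irq_demux(void)
{
    uint32_t source = fake_irq_source_reg;
    isr_t fn = (source < NUM_SOURCES) ? isr_table[source] : NULL;
    if (fn != NULL) {
        fn();
    } else {
        spurious_count++;
    }
}

/* Sample handler used to exercise the table. */
static unsigned adc_hits;
static void adc_isr(void) { adc_hits++; }
```

The cost per interrupt is one register read, one bounds check, one table load and an indirect call, and it only has to be written once.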

langwadt · « **Reply #64 on:** May 11, 2022, 04:09:40 pm »

Quote from: ejeffrey on May 11, 2022, 03:46:10 pm

Quote from: nctnico on May 11, 2022, 08:24:22 am
It is not that simple. On the older ARM architectures (like ARM7TDMI) you'll need a wrapper (written in assembler) to demultiplex the interrupts from a vectored interrupt handler peripheral. This more or less requires you to save all the registers on the stack anyway. Also, many ARM Cortex controllers have DMA nowadays which takes away the need for interrupts with a high repetition rate.

I've never used ARM7TDMI, but I wasn't really comparing a vectored vs. non-vectored controller, I was more comparing it to something like x86 or (to my understanding) 68k where there is an interrupt vector table but the CPU only saves minimal state and ISRs use a dedicated iret instruction to restore processor state and return. On these systems you can still write ISRs in C if you just mark them as such to the compiler.

but then you'll have multiple copies of the stacking/restoring code taking up flash, and probably with waitstates so it is slow

SiliconWizard · « **Reply #65 on:** May 11, 2022, 05:16:15 pm »

Quote from: brucehoult on May 11, 2022, 11:38:03 am

Quote from: Siwastaja on May 11, 2022, 10:38:40 am
Also actual complex systems benefit from tail-chaining, which is realistically only possible if the hardware does the stacking (because hardware can trivially check if another interrupt is pending).

With software push/pop, you save a few cycles on some simple handlers by not stacking everything, but then if another IRQ gets pending during the first, the sooner you get there the better, but having the software stupidly pop all the registers just to push them all again in the next handler is wasted time and this happens in the worst case, making long wait even longer.

That's easy to avoid to suitable hardware design.

Here, once again, is standard (i.e. published for all to use) RISC-V interrupt handler code that implements interrupt chaining and late arrival of high priority interrupts.

https://github.com/riscv/riscv-fast-interrupt/blob/master/clic.adoc#c-abi-trampoline-code

If an interrupt comes in while another handler is executing, there are 5 instructions from one handler returning to the new one being called.

It's hard to beat a completely hardware solution here though. Now granted it may just be a matter of a couple cycles, and the software approach is more flexible.
The beauty of RISC-V apart from its simplicity is that you can easily extend it. With ARM, you get what they give (uh, sell) you.

ejeffrey · « **Reply #66 on:** May 11, 2022, 05:50:47 pm »

Quote from: langwadt on May 11, 2022, 04:09:40 pm

Quote from: ejeffrey on May 11, 2022, 03:46:10 pm
Quote from: nctnico on May 11, 2022, 08:24:22 am
It is not that simple. On the older ARM architectures (like ARM7TDMI) you'll need a wrapper (written in assembler) to demultiplex the interrupts from a vectored interrupt handler peripheral. This more or less requires you to save all the registers on the stack anyway. Also, many ARM Cortex controllers have DMA nowadays which takes away the need for interrupts with a high repetition rate.

I've never used ARM7TDMI, but I wasn't really comparing a vectored vs. non-vectored controller, I was more comparing it to something like x86 or (to my understanding) 68k where there is an interrupt vector table but the CPU only saves minimal state and ISRs use a dedicated iret instruction to restore processor state and return. On these systems you can still write ISRs in C if you just mark them as such to the compiler.

but then you'll have multiple copies of the stacking/restoring code taking up flash, and probably with waitstates so it is slow

Yes, it seems pretty likely that if your typical ISR requires all 8 caller saved registers it's probably more efficient to have the CPU do it, especially on a microcontroller that is executing from flash with wait states but can access the stack in SRAM in a single cycle. That said ARM has STM/LDM that could do the register stacking/unstacking with a single instruction each so it wouldn't really be saving much space or time even executing from flash. The tradeoff is that if your ISR only requires 2-3 registers -- which would be typical of an ISR that reads a word from an IO register and stores it in a buffer -- you are doing unnecessary save/restores.

I definitely don't know enough to know how often these situations come up in real applications. My point was just that the argument "this approach is great because you can use C functions as ISRs" is both unimportant and disingenuous. Interrupt latency and performance matter; the ability to use standard calling convention C functions as ISRs matters very little.

langwadt · « **Reply #67 on:** May 11, 2022, 08:27:37 pm »

Quote from: ejeffrey on May 11, 2022, 05:50:47 pm

Quote from: langwadt on May 11, 2022, 04:09:40 pm
Quote from: ejeffrey on May 11, 2022, 03:46:10 pm
Quote from: nctnico on May 11, 2022, 08:24:22 am
It is not that simple. On the older ARM architectures (like ARM7TDMI) you'll need a wrapper (written in assembler) to demultiplex the interrupts from a vectored interrupt handler peripheral. This more or less requires you to save all the registers on the stack anyway. Also, many ARM Cortex controllers have DMA nowadays which takes away the need for interrupts with a high repetition rate.

I've never used ARM7TDMI, but I wasn't really comparing a vectored vs. non-vectored controller, I was more comparing it to something like x86 or (to my understanding) 68k where there is an interrupt vector table but the CPU only saves minimal state and ISRs use a dedicated iret instruction to restore processor state and return. On these systems you can still write ISRs in C if you just mark them as such to the compiler.

but then you'll have multiple copies of the stacking/restoring code taking up flash, and probably with waitstates so it is slow

Yes, it seems pretty likely that if your typical ISR requires all 8 caller saved registers it's probably more efficient to have the CPU do it, especially on a microcontroller that is executing from flash with wait states but can access the stack in SRAM in a single cycle. That said ARM has STM/LDM that could do the register stacking/unstacking with a single instruction each so it wouldn't really be saving much space or time even executing from flash. The tradeoff is that if your ISR only requires 2-3 registers -- which would be typical of an ISR that reads a word from an IO register and stores it in a buffer -- you are doing unnecessary save/restores.

but the hardware is clever enough to fetch instructions in parallel with the stacking (and unstacking) so there probably isn't much to gain

Quote from: ejeffrey on May 11, 2022, 05:50:47 pm

I definitely don't know enough to know how often these situations come up in real applications. My point was just that the argument "this approach is great because you can use C functions as ISRs" is both unimportant and disingenuous. Interrupt latency and performance matter, the ability to use standard calling convention C functions as ISRs matter very little.

maybe, but not requiring the compiler to support some special decoration of ISRs is convenient, and the code generation should be very well optimized and debugged for using registers according to the normal calling convention

brucehoult · « **Reply #68 on:** May 11, 2022, 08:49:53 pm »

Quote from: langwadt on May 11, 2022, 02:30:50 pm

Quote from: brucehoult on May 11, 2022, 11:38:03 am
Quote from: Siwastaja on May 11, 2022, 10:38:40 am
Also actual complex systems benefit from tail-chaining, which is realistically only possible if the hardware does the stacking (because hardware can trivially check if another interrupt is pending).

With software push/pop, you save a few cycles on some simple handlers by not stacking everything, but then if another IRQ gets pending during the first, the sooner you get there the better, but having the software stupidly pop all the registers just to push them all again in the next handler is wasted time and this happens in the worst case, making long wait even longer.

That's easy to avoid to suitable hardware design.

Here, once again, is standard (i.e. published for all to use) RISC-V interrupt handler code that implements interrupt chaining and late arrival of high priority interrupts.

https://github.com/riscv/riscv-fast-interrupt/blob/master/clic.adoc#c-abi-trampoline-code

If an interrupt comes in while another handler is executing, there are 5 instructions from one handler returning to the new one being called.

so a bunch of carefully handcrafted assembly taking up code memory and probably with waitstates

Yes, carefully crafted once by the experts who designed the interrupt hardware, in parallel with designing that hardware. For an ABI similar to the Cortex-M one I make it 94 bytes of code. Wait states are up to the implementation. A chip manufacturer could put the code in on-chip ROM. The linker script could put it in SRAM -- ITCM in ARM terminology, ITIM in the RISC-V world (or at least SiFive).

MadScientist · « **Reply #69 on:** May 11, 2022, 09:45:01 pm »

To get back to the core question: interrupt programming doesn't require any "manufacturer's" code or predetermined setup. If you are writing in assembler, you just write your interrupt function following the constraints of the processor architecture.
If writing in C, the startup code, which you can equally write from scratch yourself, will typically place dummy null vectors, and you code your C function, instructing the linker to link the file appropriately.

Again, there is no "magic" code. Hence any IDE or toolchain can be used.

Of course, C compilers are provided with "typical" startup code, and manufacturers will provide "canned" sample code, etc. But this isn't needed; you can write your own quite easily (and yes, it can all be in C, no assembly required!).

At the end of the day, all an interrupt is, essentially, is an abrupt change of the program counter; the processor then expects instructions to be at that location. All languages designed for embedded use have facilities to place code at specific memory locations.
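As a rough sketch of the "place code at specific memory locations" idea, a vector table can be an ordinary C array of function pointers that the linker is told where to put. Everything below is hypothetical: the four-entry layout does not match any real chip, the section name `.isr_vector` is an assumption about the linker script, and `take_interrupt` merely models on a host what the hardware does when it loads the program counter from the table.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*vector_t)(void);

static unsigned default_hits;
static unsigned systick_hits;

/* Dummy handler for vectors the application does not use. */
static void default_handler(void) { default_hits++; }
static void systick_handler(void) { systick_hits++; }

/* On an ARM build, ask the linker to keep the table in a named section;
   the section name is an assumption. On a host build this expands to nothing. */
#if defined(__arm__)
#define VECTOR_SECTION __attribute__((section(".isr_vector"), used))
#else
#define VECTOR_SECTION
#endif

/* Illustrative layout only. */
enum { VEC_RESET, VEC_NMI, VEC_HARDFAULT, VEC_SYSTICK, VEC_COUNT };

VECTOR_SECTION static const vector_t vector_table[VEC_COUNT] = {
    [VEC_RESET]     = default_handler,
    [VEC_NMI]       = default_handler,
    [VEC_HARDFAULT] = default_handler,
    [VEC_SYSTICK]   = systick_handler,
};

static_assert(sizeof vector_table / sizeof vector_table[0] == VEC_COUNT,
              "every vector needs an entry");

/* Host-side model of the hardware jumping through the table. */
static void take_interrupt(unsigned n)
{
    assert(n < VEC_COUNT);
    vector_table[n]();
}
```

Whether the handlers themselves can be plain C functions depends on the architecture, since the hardware or the compiler still has to take care of saving context.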

westfw · « **Reply #70 on:** May 12, 2022, 05:18:00 am »

Quote

yes it can all be in C , no assembly required !

That depends on the architecture. You can't do a PIC8 or AVR8 ISR without either ASM, or hooks built into the compiler (the hypothetical "ISR" attribute/tag/macro/pragma/whatever.) I don't think you could do an ARM32 (pre NVIC), either. C does not have explicit access to the stack, nor to some of the context that needs to be saved.

DiTBho · « **Reply #71 on:** May 12, 2022, 11:57:02 am »

R16K was the last MIPS-IV released to the public before it reached a dead end; by then MIPS had moved on to MIPS32 and MIPS64.

My Atlas-YellowKnife accepts CPU modules and I received a MIPS-IV R18200 prototype as a sample from a little company.

It's an FPGA-CPU board, but adapted for the Atlas motherboard released years ago by MIPS Inc. Same connectors, etc.

After R16K there was a plan to add a Nested Vectored Interrupt Controller, but it has been removed, and cop0 has no hardware support for nested interrupts. According to the user manual, they also appear to be banned on the software side.

MIPS is dead, like the walking dead; if you listen carefully, you can hear what it's saying from its tomb - "nested interrupts are eeevvviiilll"!

harsh words, not to be underestimated for personal ego

brucehoult · « **Reply #72 on:** May 12, 2022, 10:46:56 pm »

Quote from: DiTBho on May 12, 2022, 11:57:02 am

MIPS is dead like a walking-dead, if you listen to carefully, you can hear from its tomb what it's saying - "nested interrupts are eeevvviiilll"!

They're making RISC-V stuff now (first chips were announced this week, shipping in September) so they don't have a choice :-)

nctnico · « **Reply #73 on:** May 13, 2022, 08:26:43 am »

Quote from: ejeffrey on May 11, 2022, 05:50:47 pm

Quote from: langwadt on May 11, 2022, 04:09:40 pm
Quote from: ejeffrey on May 11, 2022, 03:46:10 pm
Quote from: nctnico on May 11, 2022, 08:24:22 am
It is not that simple. On the older ARM architectures (like ARM7TDMI) you'll need a wrapper (written in assembler) to demultiplex the interrupts from a vectored interrupt handler peripheral. This more or less requires you to save all the registers on the stack anyway. Also, many ARM Cortex controllers have DMA nowadays which takes away the need for interrupts with a high repetition rate.

I've never used ARM7TDMI, but I wasn't really comparing a vectored vs. non-vectored controller, I was more comparing it to something like x86 or (to my understanding) 68k where there is an interrupt vector table but the CPU only saves minimal state and ISRs use a dedicated iret instruction to restore processor state and return. On these systems you can still write ISRs in C if you just mark them as such to the compiler.

but then you'll have multiple copies of the stacking/restoring code taking up flash, and probably with waitstates so it is slow

Yes, it seems pretty likely that if your typical ISR requires all 8 caller saved registers it's probably more efficient to have the CPU do it, especially on a microcontroller that is executing from flash with wait states but can access the stack in SRAM in a single cycle. That said ARM has STM/LDM that could do the register stacking/unstacking with a single instruction each so it wouldn't really be saving much space or time even executing from flash. The tradeoff is that if your ISR only requires 2-3 registers -- which would be typical of an ISR that reads a word from an IO register and stores it in a buffer -- you are doing unnecessary save/restores.

I definitely don't know enough to know how often these situations come up in real applications. My point was just that the argument "this approach is great because you can use C functions as ISRs" is both unimportant and disingenuous. Interrupt latency and performance matter, the ability to use standard calling convention C functions as ISRs matter very little.

It actually makes life a whole lot easier. In many cases the compiler-provided way (on older microcontrollers) depends on having separate vectors for each interrupt, which typically require a jump (increasing interrupt latency) to the actual routine as well. Add nested interrupts to that and things get complicated quickly. When you are going to add in stuff like naked C functions (which could be prone to error due to the next software engineer not understanding what is going on) things can get really messy in terms of maintainability. OTOH the NVIC found in ARM Cortex-Mx microcontrollers solves all this in hardware and offers a very clean interface to the software developer. What is not to like about that?

On top of that, interrupt latency is highly overrated when it comes to modern microcontrollers running at tens of MHz. If your application depends on interrupt latency on a modern microcontroller, then there is something seriously wrong with how the system (hardware + software) has been designed. There are better ways to achieve the same goal (DMA for example).

