It seems to have two levels of hardware register stacking (on-chip shadow registers), which should take just 1 cycle to save/restore RA and all the A and T registers.
Does this mean if I have no more than two interrupt priorities, I can let the hardware shadow those registers?
Yes, but what matters is not the number of priorities but the depth of nesting. Priority just decides which of two simultaneous interrupts is serviced first. You can't get nested interrupts unless you explicitly re-enable interrupts inside your interrupt function, which you must not do until you've read and saved all useful CSRs, e.g. mepc, mcause, mtval, mstatus.MPP, mstatus.MPIE. Considerable care is needed.
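To make the "save the CSRs first" ordering concrete, here is a minimal sketch of the generic RISC-V pattern. The names run_preemptible, the CSR_* macros and the sim_* host stand-ins are my own inventions for illustration, not anything from WCH's headers, and WCH's cores may have their own nesting controls that change the details. It is meant to be called from inside a handler, around the part you are willing to have preempted.

```c
#include <stdint.h>

#define MSTATUS_MIE (1UL << 3)   /* machine interrupt enable bit in mstatus */

#if defined(__riscv)
/* Real CSR access on a RISC-V target (GNU statement expression). */
#define CSR_READ(csr) \
    ({ unsigned long v_; __asm__ volatile("csrr %0, " #csr : "=r"(v_)); v_; })
#define CSR_WRITE(csr, v) __asm__ volatile("csrw " #csr ", %0" : : "r"(v))
#define CSR_SET(csr, m)   __asm__ volatile("csrs " #csr ", %0" : : "r"(m) : "memory")
#define CSR_CLEAR(csr, m) __asm__ volatile("csrc " #csr ", %0" : : "r"(m) : "memory")
#else
/* Hypothetical host stand-in so the ordering can be exercised off-target. */
static unsigned long sim_mepc, sim_mstatus;
#define CSR_READ(csr)     (sim_##csr)
#define CSR_WRITE(csr, v) (sim_##csr = (v))
#define CSR_SET(csr, m)   (sim_##csr |= (m))
#define CSR_CLEAR(csr, m) (sim_##csr &= ~(m))

/* Mimics the visible damage done by a nested trap and its mret. */
static void sim_nested_interrupt(void)
{
    sim_mepc = 0xDEAD;             /* nested trap overwrote mepc */
    sim_mstatus &= ~(3UL << 11);   /* nested mret reset MPP (when U-mode exists) */
}
#endif

/* Run work() with interrupts re-enabled, then put the trap state back. */
void run_preemptible(void (*work)(void))
{
    /* 1. Save everything a nested trap would overwrite. If the handler
          still needs mcause or mtval later, read them here as well. */
    unsigned long saved_mepc    = CSR_READ(mepc);
    unsigned long saved_mstatus = CSR_READ(mstatus);   /* holds MPP and MPIE */

    /* 2. Only now is it safe to allow nesting. */
    CSR_SET(mstatus, MSTATUS_MIE);
    work();
    CSR_CLEAR(mstatus, MSTATUS_MIE);

    /* 3. Restore, with interrupts still off, so the final mret returns to
          the originally interrupted code with its original MPP/MPIE. */
    CSR_WRITE(mepc, saved_mepc);
    CSR_WRITE(mstatus, saved_mstatus & ~MSTATUS_MIE);
}
```

With hardware stacking there is the extra constraint discussed above: this only stays within the two shadow levels if the nested handler does not itself re-enable interrupts.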
And it appears that the saved registers agree with the RISC-V calling ABI. Or am I mistaken?
Sure. The function itself should, as usual, save and restore any S registers it uses.
For context, I decided at the start to write code that will work with any RISC-V compiler, so on the CH32V003 I did not enable the automatic interrupt register saving. Since it was saving to memory, I was not really sure how much that would save anyway.
Previous experiments showed that on CH32V003 the HPE was faster or at least no slower unless the number of registers you needed to save was truly small -- 3 or fewer, maybe?
So if I enable this feature and add an assembly "mret" at the end of a handler written as an ordinary function, this will work? (I don't mind losing two bytes for the extra "ret".)
Nooooooooo!!!
Returning behind the compiler's back means that it won't get a chance to restore any S registers it saved, or delete the stack frame.
The "correct" way to do it is to use
__attribute__((interrupt("WCH-Interrupt-fast"))) with WCH's own compiler (or this *might* have been upstreamed by now -- the previous time we discussed this here was 18 months ago).
Or, to roll it yourself with a generic compiler, the correct way is:
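void my_handler_inner(void){
    // ... ordinary C; the compiler saves/restores any S registers it uses
}
__attribute__((naked))
void my_handler(void){
    // with the feature enabled, hardware has already stacked RA and the A and T registers
    asm volatile ("call my_handler_inner");
    asm volatile ("mret");
}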
The inner handler must NOT be inlined. Using an asm statement to call it both ensures this and also makes Clang happy as it doesn't allow non-asm code in a naked function.
User jnk0le proposed an extension to the existing __attribute__((interrupt)) feature which would work for the CH32V003, for the bigger WCH cores, and for any future core with a similar feature. He suggested (for CH32V003):
__attribute__((interrupt, prestacked("x1,x5-x7,x10-x15")))
This proposal was made in October 2023. I don't know the current status.