The WCH CH32V003 microcontroller with its QingKeV2 RISC-V CPU core has a slightly different interrupt controller (named Programmable Fast Interrupt Controller, or PFIC) than the higher-spec CH32V30x series with the QingKeV4 CPU core, specifically related to how the Hardware Prologue/Epilogue (HPE) feature works.
When enabled, the HPE hardware stacking feature automatically saves a set of CPU registers when an interrupt fires, and restores them after the ISR returns. Code within the ISR therefore does not have to do this work itself, which supposedly improves performance by reducing interrupt latency (i.e. the time between the event that triggers the interrupt and the point at which the ISR can act on it).
On the CH32V30x, HPE saves registers in a single cycle to a private internal hardware stack that supports up to 3 levels of nesting. On the CH32V003, however, HPE saves registers to the normal stack in RAM, with a maximum nesting depth of 2.
In another recent thread, the discussion turned to whether HPE actually gives the CH32V003 any interrupt latency advantage, given that it just saves registers to the in-RAM stack as normal code would.
So I decided to benchmark it, roughly following the testing methodology newbrain used in a thread about the CH32V307.
The code I wrote to perform the benchmark goes as follows:
- PC1 and PC2 are configured as outputs, and PC3 as an input.
- PC1 is connected externally to PC3.
- An EXTI interrupt is configured for PC3 with a rising-edge trigger.
- In the main loop, PC1 is set high.
- In the EXTI ISR, PC2 is set high.
- The ISR also calls a sub-function, to create a worst-case scenario in which all caller-saved registers have to be saved.
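For reference, "setting a pin high" on this part means writing to the GPIO port's bit set/reset register (BSHR). The helpers below are hypothetical (they are not WCH SDK functions) and assume the CH32V003's STM32F1-style BSHR layout, where bit n sets pin n and bit n+16 resets it. They only compute the values and address involved; they do not touch hardware.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of the WCH SDK.
 * Assumed BSHR layout: writing 1 to bit n drives pin n high,
 * writing 1 to bit n+16 drives pin n low, and 0 bits are ignored. */
static inline uint32_t bshr_set_mask(unsigned pin)
{
    return UINT32_C(1) << pin;          /* e.g. PC2 -> 0x00000004 */
}

static inline uint32_t bshr_reset_mask(unsigned pin)
{
    return UINT32_C(1) << (pin + 16u);  /* e.g. PC1 -> 0x00020000 */
}

/* Assumed address of GPIOC BSHR: port base 0x40011000 plus offset 0x10,
 * which matches the lui/sw pair at the start of the ISR listings. */
#define GPIOC_BSHR_ADDR (UINT32_C(0x40011000) + UINT32_C(0x10))

/* On the target, the ISR's write would look like this (not run on a host):
 *   *(volatile uint32_t *)GPIOC_BSHR_ADDR = bshr_set_mask(2);  // PC2 high
 */
```

With these, the ISR's PC2 write is bshr_set_mask(2), i.e. 4, which is the value loaded by li a4,4 in both ISR listings; the main loop's PC1 write would be bshr_set_mask(1), i.e. 2.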
Using an oscilloscope, I measured the delay between the rising edges of the two output signals (PC1 set in the main loop, then PC2 set in the ISR).
For the test case with HPE disabled, the ISR was marked with __attribute__((interrupt)). For the case with HPE enabled, it was marked with __attribute__((interrupt("WCH-Interrupt-fast"))). The without-HPE case was compiled with mainline GCC 12.2 using the options -mabi=ilp32e -march=rv32ec_zicsr -mcmodel=medany -misa-spec=2.2. The with-HPE case was compiled with WCH's GCC 8.2 fork using -mabi=ilp32e -mcmodel=medany -march=rv32ecxw. Both builds also used -Os.
The chip was initialised to run at the default 24 MHz using the HSI oscillator.
Here is the ISR disassembly for the case without HPE:
0000021a <EXTI7_0_IRQHandler>:
21a: fd810113 addi sp,sp,-40
21e: c23a sw a4,4(sp)
220: c03e sw a5,0(sp)
222: d206 sw ra,36(sp)
224: d016 sw t0,32(sp)
226: ce1a sw t1,28(sp)
228: cc1e sw t2,24(sp)
22a: ca2a sw a0,20(sp)
22c: c82e sw a1,16(sp)
22e: c632 sw a2,12(sp)
230: c436 sw a3,8(sp)
232: 400117b7 lui a5,0x40011
236: 4711 li a4,4
238: cb98 sw a4,16(a5)
23a: 3f31 jal 156 <foo>
23c: 400107b7 lui a5,0x40010
240: 40078793 addi a5,a5,1024 # 40010400 <__global_pointer$+0x2000fc00>
244: 577d li a4,-1
246: cbd8 sw a4,20(a5)
248: 5092 lw ra,36(sp)
24a: 5282 lw t0,32(sp)
24c: 4372 lw t1,28(sp)
24e: 43e2 lw t2,24(sp)
250: 4552 lw a0,20(sp)
252: 45c2 lw a1,16(sp)
254: 4632 lw a2,12(sp)
256: 46a2 lw a3,8(sp)
258: 4712 lw a4,4(sp)
25a: 4782 lw a5,0(sp)
25c: 02810113 addi sp,sp,40
260: 30200073 mret
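The size of this software prologue follows directly from the calling convention. Because the ISR calls foo(), it must preserve every register a callee is allowed to clobber; under the ilp32e ABI on RV32E those are ra, t0-t2 and a0-a5. The small sketch below (function names are illustrative) just does that bookkeeping, reproducing the 40-byte frame and the 11 instructions that execute before the ISR's first useful instruction (the lui at 0x232).

```c
#include <stddef.h>

/* Caller-saved (temporary) registers in the RISC-V ilp32e ABI.
 * RV32E has only x0-x15; of these, ra, t0-t2 and a0-a5 may be clobbered
 * by a called function, so an ISR that calls foo() must preserve them. */
static const char *const ilp32e_caller_saved[] = {
    "ra", "t0", "t1", "t2", "a0", "a1", "a2", "a3", "a4", "a5"
};

enum { XLEN_BYTES = 4 };  /* RV32: each register occupies 4 bytes */

static size_t caller_saved_count(void)
{
    return sizeof ilp32e_caller_saved / sizeof ilp32e_caller_saved[0];
}

/* Stack bytes reserved by the prologue (the N in addi sp,sp,-N). */
static size_t prologue_frame_bytes(void)
{
    return caller_saved_count() * XLEN_BYTES;
}

/* Instructions executed before the ISR body starts:
 * one addi to move sp, plus one sw per saved register. */
static size_t prologue_instructions(void)
{
    return 1 + caller_saved_count();
}
```

The epilogue mirrors this: ten lw instructions plus the addi that restores sp, before mret.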
And the ISR disassembly for the case with HPE:
0000021a <EXTI7_0_IRQHandler>:
21a: 400117b7 lui a5,0x40011
21e: 4711 li a4,4
220: cb98 sw a4,16(a5)
222: 3f15 jal 156 <foo>
224: 400107b7 lui a5,0x40010
228: 577d li a4,-1
22a: 40e7aa23 sw a4,1044(a5) # 40010414 <__global_pointer$+0x2000fc14>
22e: 40078793 addi a5,a5,1024
232: 30200073 mret
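Apart from the missing prologue and epilogue, the two ISR bodies perform the same two stores, even though the two compilers form the EXTI address differently. A minimal sketch of the RV32I address arithmetic (lui places a 20-bit immediate in the upper bits; addi and load/store offsets add a sign-extended 12-bit immediate) makes this explicit. The helper names are illustrative, and the register names in the comments assume the CH32V003 register map (GPIOC at 0x40011000, EXTI at 0x40010400).

```c
#include <stdint.h>

/* lui rd, imm20: rd = imm20 << 12 (low 12 bits cleared). */
static uint32_t rv_lui(uint32_t imm20)
{
    return (imm20 & 0xFFFFFu) << 12;
}

/* addi, and the effective address of sw/lw: base + immediate.
 * imm12 is expected to be in -2048..2047; it is not range-checked here.
 * The addition wraps modulo 2^32, as it does on the CPU. */
static uint32_t rv_add_imm(uint32_t base, int32_t imm12)
{
    return base + (uint32_t)imm12;
}

/* Store addresses in the two listings:
 *   both:        lui a5,0x40011; sw a4,16(a5)    -> 0x40011010 (GPIOC BSHR)
 *   GCC 12.2:    lui a5,0x40010; addi a5,a5,1024;
 *                sw a4,20(a5)                    -> 0x40010414 (EXTI INTFR)
 *   WCH GCC 8.2: lui a5,0x40010; sw a4,1044(a5)  -> 0x40010414 (EXTI INTFR)
 */
```

The value stored to the EXTI address, li a4,-1, is 0xFFFFFFFF; assuming INTFR is write-1-to-clear, this clears every pending EXTI flag, including the one for PC3.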
The results are as follows:
Without HPE: 1.45 µs
With HPE: 0.87 µs
So using HPE cuts interrupt latency by 0.58 µs, i.e. by 40%!
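To put the measurements in terms of CPU clock cycles, multiply each delay by the clock frequency. A minimal sketch, assuming the core runs at exactly the nominal 24 MHz (the HSI oscillator has some tolerance) and ignoring the oscilloscope's measurement resolution:

```c
/* cycles = time x frequency; with time in microseconds and frequency
 * in MHz, the factors of 10^-6 and 10^6 cancel. */
static double us_to_cycles(double t_us, double f_mhz)
{
    return t_us * f_mhz;
}
```

This gives roughly 34.8 cycles without HPE, 20.9 cycles with HPE, and a saving of about 14 cycles. For comparison, the software prologue in the first listing executes 11 instructions (one addi and ten sw) before the ISR body starts, so the saving is of the same order as the cost of executing that prologue.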
The assumption was that, because the CH32V003 simply does in hardware what software would otherwise do, HPE would offer little or no speed-up. But here we see that it is more than half a microsecond faster, which strongly suggests that something special is going on when the hardware saves registers to the stack.