Author Topic: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #25 on: March 30, 2023, 06:39:35 am »
I can't remember which thread it was or who did it, but there was someone here on the EEVblog forum that did some benchmarking of regular vs. HPE vs. VTF interrupts and found that HPE did make a significant difference in ISR latency, but VTF added basically nothing.
It might have been me, I haven't seen other similar benchmarks.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4037
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #26 on: March 30, 2023, 07:09:49 am »
I can't remember which thread it was or who did it, but there was someone here on the EEVblog forum that did some benchmarking of regular vs. HPE vs. VTF interrupts and found that HPE did make a significant difference in ISR latency, but VTF added basically nothing.
It might have been me, I haven't seen other similar benchmarks.

Note that those tests appear to have been using a bigger core (CH32V307 is in the post title), which probably does have on-chip duplicate register sets for the hardware register save/restore, not dumping them to the stack in RAM like the CH32V003 is doing.

So the results could be very different on the smaller core.
 

Offline HwAoRrDk (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1477
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #27 on: March 30, 2023, 08:06:43 am »
It might have been me, I haven't seen other similar benchmarks.

Ah, yes, that was what I was thinking of. Thank you.

Note that those tests appear to have been using a bigger core (CH32V307 is in the post title), which probably does have on-chip duplicate register sets for the hardware register save/restore, not dumping them to the stack in RAM like the CH32V003 is doing.

I think you're right. The CH32V003 (QingKeV2) CPU manual says this regarding HPE: "V2 series microprocessors, support hardware to automatically save 10 of the shaped Caller Saved registers to the user stack area", whereas the CH32V307 (QingKeV4) manual says "The V4 series microprocessors support hardware single cycle automatic saving of 16 of the shaped Caller Saved registers to an internal stack area that is not visible to the user".
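For reference, the 10 versus 16 in those two manual quotes lines up with the number of integer caller-saved registers in the standard RISC-V calling convention on RV32E and on the full 32-register integer file respectively. A small host-side C sketch that counts them, assuming the V4 core uses the full register file (the helper names are made up for illustration; nothing here comes from WCH's manuals):

```c
#include <stddef.h>
#include <string.h>

/* Integer caller-saved registers in the standard RISC-V calling convention
   for the full 32-register file: ra, t0-t6 and a0-a7. */
static const char *const caller_saved_full[] = {
    "ra", "t0", "t1", "t2", "t3", "t4", "t5", "t6",
    "a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7",
};

/* RV32E only implements x0..x15, so t3-t6 (x28-x31) and a6-a7 (x16-x17)
   do not exist there. Returns 1 if the named register exists on RV32E. */
static int exists_on_rv32e(const char *name)
{
    static const char *const missing[] = { "t3", "t4", "t5", "t6", "a6", "a7" };
    for (size_t i = 0; i < sizeof missing / sizeof missing[0]; i++)
        if (strcmp(name, missing[i]) == 0)
            return 0;
    return 1;
}

/* Counts caller-saved registers; a non-zero rv32e restricts to the E subset. */
unsigned caller_saved_count(int rv32e)
{
    unsigned n = 0;
    size_t total = sizeof caller_saved_full / sizeof caller_saved_full[0];
    for (size_t i = 0; i < total; i++)
        if (!rv32e || exists_on_rv32e(caller_saved_full[i]))
            n++;
    return n;
}

/* Bytes needed to save all of them, at 4 bytes per 32-bit register. */
unsigned save_area_bytes(int rv32e)
{
    return 4u * caller_saved_count(rv32e);
}
```

Calling `caller_saved_count(1)` gives the RV32E figure and `caller_saved_count(0)` the full-register-file figure; `save_area_bytes(1)` is the stack space a complete software or hardware save would need on the small core.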

Maybe I shall try and replicate newbrain's benchmark test on the CH32V003 sometime.

Anyway, I compiled my code with WCH's compiler (GCC 8.2-based) with HPE enabled and using 'WCH-Interrupt-fast', plus also specifying -march=rv32ecxw, and for that my SysTick_Handler ISR disassembly looks like this:

Code: [Select]
000014d0 <SysTick_Handler>:
    14d0: 84418713          addi a4,gp,-1980 # 20000044 <sys_timestamp>
    14d4: 4354                lw a3,4(a4)
    14d6: 435c                lw a5,4(a4)
    14d8: 1161                addi sp,sp,-8
    14da: c222                sw s0,4(sp)
    14dc: 0785                addi a5,a5,1
    14de: c026                sw s1,0(sp)
    14e0: 84f1a423          sw a5,-1976(gp) # 20000048 <sys_timestamp+0x4>
    14e4: 00d7f663          bgeu a5,a3,14f0 <SysTick_Handler+0x20>
    14e8: 431c                lw a5,0(a4)
    14ea: 0785                addi a5,a5,1
    14ec: 84f1a223          sw a5,-1980(gp) # 20000044 <sys_timestamp>
    14f0: 85818413          addi s0,gp,-1960 # 20000058 <interval_calls>
    14f4: 4481                li s1,0
    14f6: 8722                mv a4,s0
    14f8: 431c                lw a5,0(a4)
    14fa: 00f4eb63          bltu s1,a5,1510 <SysTick_Handler+0x40>
    14fe: 4412                lw s0,4(sp)
    1500: e000f7b7          lui a5,0xe000f
    1504: 0007a223          sw zero,4(a5) # e000f004 <__global_pointer$+0xc000e804>
    1508: 4482                lw s1,0(sp)
    150a: 0121                addi sp,sp,8
    150c: 30200073          mret
    1510: 441c                lw a5,8(s0)
    1512: 17fd                addi a5,a5,-1
    1514: c41c                sw a5,8(s0)
    1516: e799                bnez a5,1524 <SysTick_Handler+0x54>
    1518: 445c                lw a5,12(s0)
    151a: 9782                jalr a5
    151c: 405c                lw a5,4(s0)
    151e: 85818713          addi a4,gp,-1960 # 20000058 <interval_calls>
    1522: c41c                sw a5,8(s0)
    1524: 0485                addi s1,s1,1
    1526: 0431                addi s0,s0,12
    1528: bfc1                j 14f8 <SysTick_Handler+0x28>

Compare to using mainline GCC 12.2 with no HPE, regular 'interrupt' attribute, and -march=rv32ec_zicsr:

Code: [Select]
0000163c <SysTick_Handler>:
    163c: 7179                addi sp,sp,-48
    163e: c03e                sw a5,0(sp)
    1640: 84018793          addi a5,gp,-1984 # 20000040 <sys_timestamp>
    1644: c436                sw a3,8(sp)
    1646: c23a                sw a4,4(sp)
    1648: 43d4                lw a3,4(a5)
    164a: 43d8                lw a4,4(a5)
    164c: d606                sw ra,44(sp)
    164e: d416                sw t0,40(sp)
    1650: d21a                sw t1,36(sp)
    1652: d01e                sw t2,32(sp)
    1654: ce22                sw s0,28(sp)
    1656: cc26                sw s1,24(sp)
    1658: ca2a                sw a0,20(sp)
    165a: c82e                sw a1,16(sp)
    165c: c632                sw a2,12(sp)
    165e: 0705                addi a4,a4,1
    1660: c3d8                sw a4,4(a5)
    1662: 00d77563          bgeu a4,a3,166c <SysTick_Handler+0x30>
    1666: 4398                lw a4,0(a5)
    1668: 0705                addi a4,a4,1
    166a: c398                sw a4,0(a5)
    166c: 85418413          addi s0,gp,-1964 # 20000054 <interval_calls>
    1670: 4481                li s1,0
    1672: 8722                mv a4,s0
    1674: 431c                lw a5,0(a4)
    1676: 02f4e563          bltu s1,a5,16a0 <SysTick_Handler+0x64>
    167a: 4472                lw s0,28(sp)
    167c: e000f7b7          lui a5,0xe000f
    1680: 0007a223          sw zero,4(a5) # e000f004 <__global_pointer$+0xc000e804>
    1684: 50b2                lw ra,44(sp)
    1686: 52a2                lw t0,40(sp)
    1688: 5312                lw t1,36(sp)
    168a: 5382                lw t2,32(sp)
    168c: 44e2                lw s1,24(sp)
    168e: 4552                lw a0,20(sp)
    1690: 45c2                lw a1,16(sp)
    1692: 4632                lw a2,12(sp)
    1694: 46a2                lw a3,8(sp)
    1696: 4712                lw a4,4(sp)
    1698: 4782                lw a5,0(sp)
    169a: 6145                addi sp,sp,48
    169c: 30200073          mret
    16a0: 441c                lw a5,8(s0)
    16a2: 17fd                addi a5,a5,-1
    16a4: c41c                sw a5,8(s0)
    16a6: e799                bnez a5,16b4 <SysTick_Handler+0x78>
    16a8: 445c                lw a5,12(s0)
    16aa: 9782                jalr a5
    16ac: 405c                lw a5,4(s0)
    16ae: 85418713          addi a4,gp,-1964 # 20000054 <interval_calls>
    16b2: c41c                sw a5,8(s0)
    16b4: 0485                addi s1,s1,1
    16b6: 0431                addi s0,s0,12
    16b8: bf75                j 1674 <SysTick_Handler+0x38>

It seems to me that the only difference is indeed that the former only saves s0 and s1.

By the way, it seems the saving in code size with HPE can be fairly significant. My code was approximately 200 bytes smaller when compiled with WCH's compiler. Not sure how much of that is due to the proprietary 'XW' compact instructions, and how much is due to the shorter ISR prologue/epilogue, but given I only have 5 ISRs in my entire codebase, there must be some contribution from the former.
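For anyone who doesn't want to trace the listings by hand: going only by the offsets and branches in the two disassemblies, the handler body appears to correspond to roughly the C below. This is a reconstruction, not the actual source. The struct layouts, the field names, which half of sys_timestamp is the high word, `MAX_INTERVAL_CALLS`, `fake_systick_sr` (standing in for the store to 0xe000f004) and the demo callback are all guesses or stand-ins so that it can run on a PC.

```c
#include <stdint.h>

/* Hypothetical entry layout inferred from the offsets in the listings.
   The +4/+8/+12 offsets apply to the 32-bit target; on a 64-bit host the
   function pointer is wider, so the host layout differs. */
typedef struct {
    uint32_t reload;         /* +4: copied back into 'remaining' after a call */
    uint32_t remaining;      /* +8: ticks left until the callback runs */
    void (*callback)(void);  /* +12: invoked through jalr */
} interval_call_t;

#define MAX_INTERVAL_CALLS 4 /* assumed capacity, not visible in the listing */

static struct {
    uint32_t count;          /* +0: number of active entries */
    interval_call_t entry[MAX_INTERVAL_CALLS];
} interval_calls;

/* The listing bumps the word at +4 first and carries into the word at +0. */
static volatile struct { uint32_t hi, lo; } sys_timestamp;

static volatile uint32_t fake_systick_sr; /* stand-in for the register at 0xE000F004 */

static unsigned demo_hits;                /* demo only */
static void demo_callback(void) { demo_hits++; }

void SysTick_Handler_model(void)
{
    /* 64-bit tick counter kept as two 32-bit halves with a manual carry. */
    uint32_t old = sys_timestamp.lo;
    uint32_t now = sys_timestamp.lo + 1;
    sys_timestamp.lo = now;
    if (now < old)
        sys_timestamp.hi = sys_timestamp.hi + 1;

    /* Count each entry down; on reaching zero, call it and reload. */
    for (uint32_t i = 0; i < interval_calls.count; i++) {
        interval_call_t *e = &interval_calls.entry[i];
        if (--e->remaining == 0) {
            e->callback();
            e->remaining = e->reload;
        }
    }

    fake_systick_sr = 0; /* the listing writes zero to the SysTick register last */
}
```

The indirect call through `callback` is what forces the GCC 12.2 build to save every caller-saved register, since the compiler cannot see what the callee clobbers.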
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8172
  • Country: fi
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #28 on: March 30, 2023, 08:12:24 am »
I can't stop liking the ARM's idea that stacking, interrupt entry and exit are completely handled in hardware, and interrupt handlers can be any usual C functions, even if it increases latency to very small/simple interrupt handlers.

Despite the fact that it's theoretically very easy to let the compiler just handle the generation of the correct prologue/epilogue, we still see problems like this arising: the compiler is instructed with the wrong attributes, and it's difficult to notice and understand what's going on. And in benchmarking we tend to see that the difference between doing it in HW or SW is very small anyway; either can have an advantage in a particular test case, but the advantage is always small.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4037
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #29 on: March 30, 2023, 09:29:29 am »
I can't stop liking the ARM's idea that stacking, interrupt entry and exit are completely handled in hardware, and interrupt handlers can be any usual C functions, even if it increases latency to very small/simple interrupt handlers.

It certainly has its attractions for inexperienced programmers using unmodified toolchains.

WCH *almost* achieved that. They just forgot one thing. All would be wonderful if they had added:

  • copy SP to new hidden register/CSR MISP after stacking saved registers (if not using a shadow register set) and before calling the interrupt handler.
  • when RET instruction is executed then IF in_fast_interrupt and SP = MISP THEN execute MRET instead of RET

Then you could use absolutely standard C functions as interrupt handlers, just like on Cortex-M.
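To make the two bullets above concrete, here is a toy C model of the proposed behaviour. It is purely illustrative: MISP does not exist on any real part, and the struct, the function names and the single-level (non-nested) interrupt assumption are all invented for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t sp;
    uint32_t misp;           /* the proposed hidden register */
    bool in_fast_interrupt;
} cpu_model_t;

typedef enum { DID_RET, DID_MRET } ret_kind_t;

/* Interrupt entry: hardware stacks the caller-saved registers, then
   latches the resulting SP into MISP before jumping to the handler. */
void enter_fast_interrupt(cpu_model_t *cpu, uint32_t hw_frame_bytes)
{
    cpu->sp -= hw_frame_bytes;
    cpu->misp = cpu->sp;
    cpu->in_fast_interrupt = true;
}

/* What the core would do on decoding RET under the proposal. */
ret_kind_t execute_ret(cpu_model_t *cpu, uint32_t hw_frame_bytes)
{
    if (cpu->in_fast_interrupt && cpu->sp == cpu->misp) {
        cpu->sp += hw_frame_bytes;   /* hardware unstacking, as for MRET */
        cpu->in_fast_interrupt = false;
        return DID_MRET;
    }
    return DID_RET;                  /* ordinary function return */
}
```

The SP comparison is what tells the handler's own RET apart from the RET of a function it called: a handler that makes calls has to save ra in a frame of its own, so SP sits below MISP while any callee is returning.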

Quote
Despite the fact that it's theoretically very easy to let the compiler just handle the generation of the correct prologue/epilogue, we still see problems like this arising: the compiler is instructed with the wrong attributes, and it's difficult to notice and understand what's going on. And in benchmarking we tend to see that the difference between doing it in HW or SW is very small anyway; either can have an advantage in a particular test case, but the advantage is always small.

Yes, the speed advantage to hardware stacking is small to zero or even negative.

Which is why I'd rather see it done in software BUT the tools should be improved to make this transparent to the average user.

The RISC-V people have gone to the trouble of working out exactly what the code looks like to e.g. duplicate the functionality of Arm's NVIC, including chaining of handlers if a new interrupt has come in while another is executing (avoiding restoring and then immediately re-saving registers), checking if a higher priority interrupt has come in during stacking of registers for a low priority interrupt etc.

https://github.com/riscv/riscv-fast-interrupt/blob/master/clic.adoc#calling-c-abi-functions-as-interrupt-handlers

Keep the hardware simple, improve the software tooling. A few dozen bytes of flash (maybe even mask rom in the chip) for the dispatch function is cheap.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #30 on: March 30, 2023, 07:40:53 pm »
I can't stop liking the ARM's idea that stacking, interrupt entry and exit are completely handled in hardware, and interrupt handlers can be any usual C functions, even if it increases latency to very small/simple interrupt handlers.

I can understand that, but the RISC-V approach is at the same time simpler (from the processor's POV) and more flexible.
Now certainly this approach makes it harder IMO for designers of RISC-V CPU cores to implement something with the same level of performance.
And the annoying thing with interrupts on RISC-V is that there are almost as many implementations as there are vendors, each with their own tricks and compiler attributes. But that's the price to pay with modularity. We can't have it all. Downside is that for many RISC-V MCUs, you need a patched version of GCC only distributed by the vendor, since the extensions have not made it to the mainline and may take a good while before making it, unless the vendor is an active member of RISC-V International, as far as I can tell.

One thing I would like to see is return stacks handled in hardware.
Never give compilers and/or developers the ability to mess with return addresses, even if it's by mistake.

I have a few ideas, but it's far from being simple if we want something both secure and flexible.
« Last Edit: March 30, 2023, 07:44:35 pm by SiliconWizard »
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 824
  • Country: es
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #31 on: March 30, 2023, 09:21:03 pm »
Why don't vendors just provide a template interrupt function with __attribute__((naked)), a couple of asm() lines doing the correct custom prologue/epilogue, and a //your code … //end of your code section in between?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4037
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #32 on: March 31, 2023, 12:29:56 am »
One thing I would like to see is return stacks handled in hardware.
Never give compilers and/or developers the ability to mess with return addresses, even if it's by mistake.

Even the very first RISC-V chip distributed, the FE-310 in late 2016, had a 2-entry return address stack in the CPU core, but the purpose was to predict the return address (e.g. because a reload of RA from memory hadn't completed yet) rather than to enforce that the return address has not been tampered with.

But see:

https://github.com/riscv/riscv-cfi/blob/main/cfi_backward.adoc
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #33 on: March 31, 2023, 01:41:32 am »
One thing I would like to see is return stacks handled in hardware.
Never give compilers and/or developers the ability to mess with return addresses, even if it's by mistake.

Even the very first RISC-V chip distributed, the FE-310 in late 2016, had a 2-entry return address stack in the CPU core, but the purpose was to predict the return address (e.g. because a reload of RA from memory hadn't completed yet) rather than to enforce that the return address has not been tampered with.

Yes, that's something that is often mentioned for improving branch prediction. But not the same thing indeed.

But see:

https://github.com/riscv/riscv-cfi/blob/main/cfi_backward.adoc

Nice!
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #34 on: March 31, 2023, 04:54:58 am »


Quote
I can't stop liking the ARM's idea that stacking, interrupt entry and exit are completely handled in hardware, and interrupt handlers can be any usual C functions


It's cute, but I'm not sure I like the loss of the ability for really tight ISRs.  (usually the CPU clock is fast enough that it doesn't matter, but...)  And you're giving that up JUST to have your ISRs be normal C functions (well, plus the tail optimization stuff, I guess.)


Does RISC-V implement something that allows the equivalent of ARM's "lazy" Floating point stacking?  (that's also "neat, but maybe slower than I'd want.")
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4037
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #35 on: March 31, 2023, 08:48:29 am »
Does RISC-V implement something that allows the equivalent of ARM's "lazy" Floating point stacking?  (that's also "neat, but maybe slower than I'd want.")

I don't know the details of whatever Arm does, but haven't such facilities been standard since ... the 68020 and 80386?

RISC-V provides the mechanisms to implement a number of different policies with regard to FP (and other) state. It does not dictate any particular policy. Saving and restoring FP state can each, independently, be eager or lazy.

Anyway, to quote:

----

3.1.6.6 Extension Context Status in mstatus Register

Supporting substantial extensions is one of the primary goals of RISC-V, and hence we define a standard interface to allow unchanged privileged-mode code, particularly a supervisor-level OS, to support arbitrary user-mode state extensions.

To date, the V extension is the only standard extension that defines additional state beyond the floating-point CSR and data registers.

The FS[1:0] and VS[1:0] WARL fields and the XS[1:0] read-only field are used to reduce the cost of context save and restore by setting and tracking the current state of the floating-point unit and any other user-mode extensions respectively. The FS field encodes the status of the floating-point unit state, including the floating-point registers f0–f31 and the CSRs fcsr, frm, and fflags. The VS field encodes the status of the vector extension state, including the vector registers v0–v31 and the CSRs vcsr, vxrm, vxsat, vstart, vl, vtype, and vlenb. The XS field encodes the status of additional user-mode extensions and associated state. These fields can be checked by a context switch routine to quickly determine whether a state save or restore is required. If a save or restore is required, additional instructions and CSRs are typically required to effect and optimize the process.

The design anticipates that most context switches will not need to save/restore state in either or both of the floating-point unit or other extensions, so provides a fast check via the SD bit.

The FS, VS, and XS fields use the same status encoding as shown in Table 3.3, with the four possible status values being Off, Initial (e.g. zeroed), Clean, and Dirty.

When an extension’s status is set to Off, any instruction that attempts to read or write the corresponding state will cause an illegal instruction exception. When the status is Initial, the corresponding state should have an initial constant value. When the status is Clean, the corresponding state is potentially different from the initial value, but matches the last value stored on a context swap. When the status is Dirty, the corresponding state has potentially been modified since the last context save.

During a context save, the responsible privileged code need only write out the corresponding state if its status is Dirty, and can then reset the extension’s status to Clean. During a context restore, the context need only be loaded from memory if the status is Clean.

----

FS is for the FPU. VS is for the Vector unit. XS is a summary of any additional standard (none now) or custom extensions that add state that needs to be context-switched. Each such extension will add additional status bits elsewhere.

As well as lazy save (only if the state is Dirty), an OS can choose whether to eagerly reload the state when switching back to a process (and set the state to Clean) or lazily (and set the state to Off) so that FP or Vector etc context is loaded only on the first execution of a relevant instruction.
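A rough C sketch of the "save only if Dirty" check described above, run against a plain integer standing in for mstatus on RV32. The field positions (VS at bits 10:9, FS at 14:13, XS at 16:15, SD at bit 31) and the Off/Initial/Clean/Dirty encoding follow the privileged spec; the function names are made up, and the actual register stores are left as a comment because this is a host-side model rather than kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status encoding shared by the FS, VS and XS fields. */
enum ext_status { EXT_OFF = 0, EXT_INITIAL = 1, EXT_CLEAN = 2, EXT_DIRTY = 3 };

#define MSTATUS_VS_SHIFT 9u
#define MSTATUS_FS_SHIFT 13u
#define MSTATUS_XS_SHIFT 15u
#define MSTATUS_SD_RV32  (1u << 31)

static enum ext_status get_field(uint32_t mstatus, unsigned shift)
{
    return (enum ext_status)((mstatus >> shift) & 3u);
}

static uint32_t set_field(uint32_t mstatus, unsigned shift, enum ext_status s)
{
    return (mstatus & ~(3u << shift)) | ((uint32_t)s << shift);
}

/* Real hardware derives SD from the three fields; the model does it by hand. */
static uint32_t recompute_sd(uint32_t mstatus)
{
    bool dirty = get_field(mstatus, MSTATUS_FS_SHIFT) == EXT_DIRTY ||
                 get_field(mstatus, MSTATUS_VS_SHIFT) == EXT_DIRTY ||
                 get_field(mstatus, MSTATUS_XS_SHIFT) == EXT_DIRTY;
    return dirty ? (mstatus | MSTATUS_SD_RV32) : (mstatus & ~MSTATUS_SD_RV32);
}

/* Lazy save of FP state: returns the updated mstatus value and reports
   through *saved whether the registers had to be written out. */
uint32_t save_fp_if_dirty(uint32_t mstatus, bool *saved)
{
    *saved = false;
    if (!(mstatus & MSTATUS_SD_RV32))
        return mstatus;               /* fast path: nothing is Dirty */
    if (get_field(mstatus, MSTATUS_FS_SHIFT) == EXT_DIRTY) {
        /* a real kernel would store f0-f31 and fcsr here */
        *saved = true;
        mstatus = set_field(mstatus, MSTATUS_FS_SHIFT, EXT_CLEAN);
    }
    return recompute_sd(mstatus);
}
```

A second call straight after a save takes the fast path, because FS is now Clean and SD is clear.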

FP state is 128 bytes for a single precision FPU, 256 bytes for a double precision FPU. Typically it is the same size as the integer register state, so no big deal if it is not done lazily.

However vector state is (on an application class processor) a minimum of 512 bytes and the most common size in the next few years is likely to be 1024 bytes, with 2048 bytes (512 bit vector registers) not uncommon.
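The sizes quoted here are just 32 architectural registers times the register width. A one-line C helper (name invented for this note) makes the arithmetic explicit; it counts register data only and ignores the handful of associated CSRs.

```c
/* Bytes of register state for a file of 32 registers, each reg_bits wide. */
unsigned regfile_bytes(unsigned reg_bits)
{
    return 32u * reg_bits / 8u;
}
```

Passing 32 or 64 gives the single and double precision FP figures; 128, 256 and 512 give the three vector figures.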

The vector state is managed more aggressively than the FP state. The ABI specifies that vector registers are caller-save i.e. their contents are undefined after ANY function call, and this includes system calls. This enables the OS to set the vector state to Off or Initial before returning from any syscall. The vector state only needs to be saved and restored if a context switch happens as the result of an interrupt (e.g. 100 Hz system tick), and NOT if a context switch happens as a result of a system call blocking (e.g. I/O).

Also:

----

Changing the setting of FS has no effect on the contents of the floating-point register state. In particular, setting FS=Off does not destroy the state, nor does setting FS=Initial clear the contents. Similarly, the setting of VS has no effect on the contents of the vector register state. Other extensions, however, might not preserve state when set to Off.

----

So, it is also possible to implement a policy of not saving FP or Vector state when you switch away from a process, but simply set the unit to Off and then, when that process is later resumed, if no other process has used the FPU or VPU in the meantime then simply turn the FPU or VPU back on, without ever saving and restoring the state.
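That last policy can be sketched as a small C model. Everything here is hypothetical scaffolding (task ids, the counters, the function names); the only things taken from the text above are the two facts it relies on: turning the unit Off keeps the register contents, and touching it while Off traps.

```c
typedef struct {
    int current;      /* id of the running task */
    int owner;        /* task whose values are live in the FP registers, -1 if none */
    int fpu_on;       /* 0 models FS=Off for the running task */
    unsigned saves;   /* how often FP state was written to memory */
    unsigned loads;   /* how often FP state was loaded or initialised */
} fpu_model_t;

/* Context switch: never touches FP registers. The unit is re-enabled only
   if the incoming task still owns what is sitting in them. */
void switch_to(fpu_model_t *f, int next)
{
    f->current = next;
    f->fpu_on = (f->owner == next);
}

/* The running task executes an FP instruction. If the unit is Off this
   models the illegal-instruction trap and the handler's response. */
void use_fpu(fpu_model_t *f)
{
    if (f->fpu_on)
        return;                /* no trap, no cost */
    if (f->owner >= 0)
        f->saves++;            /* write out the previous owner's registers */
    f->loads++;                /* bring in (or initialise) this task's registers */
    f->owner = f->current;
    f->fpu_on = 1;
}
```

If task 0 uses the FPU, task 1 runs without touching it, and task 0 resumes, the model records no save at all and only the single initial load.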
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4037
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #36 on: March 31, 2023, 09:09:40 am »
One thing I would like to see is return stacks handled in hardware.

https://github.com/riscv/riscv-cfi/blob/main/cfi_backward.adoc

Nice!

TLDR summary:

Add PUSH_RA, POP_RA, and POP_CHECK_RA that push or pop the Return Address (aka Link Register) contents to a special RAM area (page(s)) with permissions that allow *only* these instructions to access it.

POP_CHECK_RA removes an item from the special hidden stack and traps if the value is not the same as what is already in RA.

On CPUs that don't implement this extension these instructions are all NOP.

Can use it in two ways:

1) on any CPU, new or old

Code: [Select]
foo:
    addi sp,sp,-16
    sw ra,12(sp)
    PUSH_RA
    :
    :
    lw ra,12(sp)
    addi sp,sp,16
    POP_CHECK_RA
    ret

On new CPUs this will trap if the return address has been tampered with on the stack. On old CPUs it will execute as always (just maybe a couple of cycles slower)

2) on new CPUs only

Code: [Select]
foo:
    PUSH_RA
    :
    :
    POP_RA
    ret

There is no possibility that the return address has been tampered with.
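A host-side C model of the three operations, for anyone who wants to poke at the semantics. The function names mirror the mnemonics used in this summary, not necessarily those in the linked spec; the fixed depth and the boolean "would trap" return values are simplifications made up for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHADOW_DEPTH 16        /* arbitrary size for the model */

typedef struct {
    uint32_t slot[SHADOW_DEPTH];
    unsigned top;              /* number of entries currently held */
} shadow_stack_t;

/* PUSH_RA: copy the return address onto the protected stack.
   Returns false if the model's stack is full. */
bool push_ra(shadow_stack_t *ss, uint32_t ra)
{
    if (ss->top == SHADOW_DEPTH)
        return false;
    ss->slot[ss->top++] = ra;
    return true;
}

/* POP_CHECK_RA: pop one entry and compare it with the RA that was just
   reloaded from the normal stack. A false return models the trap. */
bool pop_check_ra(shadow_stack_t *ss, uint32_t ra)
{
    if (ss->top == 0)
        return false;
    return ss->slot[--ss->top] == ra;
}

/* POP_RA: pop one entry straight into RA, bypassing the normal stack.
   Returns false if the model's stack is empty. */
bool pop_ra(shadow_stack_t *ss, uint32_t *ra)
{
    if (ss->top == 0)
        return false;
    *ra = ss->slot[--ss->top];
    return true;
}
```

Usage 1 corresponds to `push_ra` followed later by `pop_check_ra` with whatever value came back from the ordinary stack; usage 2 corresponds to `push_ra` followed by `pop_ra`, where the ordinary stack copy is never consulted.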
 
The following users thanked this post: SiliconWizard

Offline uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #37 on: March 31, 2023, 09:23:35 am »

But turning off "fast interrupts" and using the standard RISC-V __attribute__((interrupt)) is probably not slower, and will for sure use less stack (for this particular handler).

not really, using:

Code: [Select]
void EXTI7_0_IRQHandler(void) __attribute__((interrupt()));
void EXTI7_0_IRQHandler(void)
{
  GPIOD->BSHR = GPIO_Pin_4;
  GPIOD->BCR = GPIO_Pin_4;
  EXTI_ClearITPendingBit(EXTI_Line0);
}
results in

Code: [Select]
00000150 <EXTI7_0_IRQHandler>:
 150: fd810113          addi sp,sp,-40
 154: ca2a                sw a0,20(sp)
 156: c23a                sw a4,4(sp)
 158: c03e                sw a5,0(sp)
 15a: d206                sw ra,36(sp)
 15c: d016                sw t0,32(sp)
 15e: ce1a                sw t1,28(sp)
 160: cc1e                sw t2,24(sp)
 162: c82e                sw a1,16(sp)
 164: c632                sw a2,12(sp)
 166: c436                sw a3,8(sp)
 168: 400117b7          lui a5,0x40011
 16c: 4741                li a4,16
 16e: 40e7a823          sw a4,1040(a5) # 40011410 <__global_pointer$+0x20010be8>
 172: 40e7aa23          sw a4,1044(a5)
 176: 4505                li a0,1
 178: 2e09                jal 48a <EXTI_ClearITPendingBit>
 17a: 5092                lw ra,36(sp)
 17c: 5282                lw t0,32(sp)
 17e: 4372                lw t1,28(sp)
 180: 43e2                lw t2,24(sp)
 182: 4552                lw a0,20(sp)
 184: 45c2                lw a1,16(sp)
 186: 4632                lw a2,12(sp)
 188: 46a2                lw a3,8(sp)
 18a: 4712                lw a4,4(sp)
 18c: 4782                lw a5,0(sp)
 18e: 02810113          addi sp,sp,40
 192: 30200073          mret


 and 856 ns ~ 41 clock cycles @ 48 MHz
 
 while using:
 
Code: [Select]
void EXTI7_0_IRQHandler(void) __attribute__((interrupt("WCH-Interrupt-fast")));

 results in
 
Code: [Select]
00000150 <EXTI7_0_IRQHandler>:
 150: 400117b7          lui a5,0x40011
 154: 4741                li a4,16
 156: 40e7a823          sw a4,1040(a5) # 40011410 <__global_pointer$+0x20010be8>
 15a: 40e7aa23          sw a4,1044(a5)
 15e: 4505                li a0,1
 160: 2ced                jal 45a <EXTI_ClearITPendingBit>
 162: 30200073          mret

 and 420 ns ~ 20 clock cycles @ 48 MHz
 

Offline HwAoRrDk (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1477
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #38 on: March 31, 2023, 09:41:02 am »
I benchmarked the interrupt latency myself and put the results in another thread.

Summary: it's almost twice as fast on the CH32V003 with HPE hardware stacking enabled (0.87 us vs. 1.45 us ISR latency).

Although, admittedly that is for worst-case scenario where the ISR has to save all registers. Performance advantage is probably not so significant for an ISR that only has to save a couple.

So, uliano's results seem to correlate with mine. By the way, I presume you were measuring the period from one interrupt to the next? i.e. low pulse length.
 

Offline uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #39 on: March 31, 2023, 09:52:30 am »
Oh sorry I didn't notice that!

I don't see this example as worst case though, does my ISR really need all the 10 registers?

I'm not into assembly enough to tell, but it would seem really strange for just setting & resetting a pin + clearing the flag.
« Last Edit: March 31, 2023, 10:24:55 am by uliano »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4037
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #40 on: March 31, 2023, 10:03:16 am »

But turning off "fast interrupts" and using the standard RISC-V __attribute__((interrupt)) is probably not slower, and will for sure use less stack (for this particular handler).

not really, using:

Yes, really.

Your 21 cycles of difference is pretty much precisely the time required to do 12 memory reads to fetch an extra 48 bytes of instructions at 4 bytes per fetch and 2 cycles per fetch because of the 1 wait state from flash at 48 MHz.

But the POINT of __attribute__((interrupt)) is that it does less work because it saves only what is needed to be saved. Which would only be a0, a4, and a5 if you hadn't called a standard C function, EXTI_ClearITPendingBit().

Get that function inlined, or as a macro, and see what happens.
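The fetch estimate above can be reproduced from the addresses in the two listings in uliano's post. The C sketch below uses a deliberately crude model (every byte of the function is fetched exactly once, 4 bytes per fetch, 2 cycles per fetch, nothing overlapped); the helper names are invented for this note.

```c
/* Size of a function given the address of its first instruction and the
   address and length of its last instruction. */
unsigned code_bytes(unsigned first_addr, unsigned last_addr, unsigned last_len)
{
    return last_addr + last_len - first_addr;
}

/* Cycles spent fetching 'bytes' of code, rounding up to whole fetches. */
unsigned fetch_cycles(unsigned bytes, unsigned bytes_per_fetch,
                      unsigned cycles_per_fetch)
{
    unsigned fetches = (bytes + bytes_per_fetch - 1u) / bytes_per_fetch;
    return fetches * cycles_per_fetch;
}

/* Extra fetch cycles of the plain 'interrupt' build over the
   'WCH-Interrupt-fast' build; both listings start at 0x150 and end with a
   4-byte mret, at 0x192 and 0x162 respectively. */
unsigned extra_fetch_cycles(void)
{
    unsigned standard = code_bytes(0x150u, 0x192u, 4u); /* 70 bytes */
    unsigned fast     = code_bytes(0x150u, 0x162u, 4u); /* 22 bytes */
    return fetch_cycles(standard - fast, 4u, 2u);
}
```

Under those assumptions the extra 48 bytes cost 12 fetches, i.e. 24 cycles, which is in the same range as the 21-cycle gap between the measured 41 and 20 cycles.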
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4037
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #41 on: March 31, 2023, 10:07:31 am »
I don't see this example as worst case though, does my ISR really need all the 10 registers?

I'm not into arm assembly enough to tell, but it would seem really strange for just setting & resetting a pin + clearing the flag.

If the interrupt routine calls a standard ABI C function then it needs to save and restore everything, because it doesn't know what that C function might overwrite.

If you're going to do that then, yes, use the hardware stacking.

Inline everything it needs into the interrupt function and you can often save and restore far fewer registers and be faster.
 

Offline uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #42 on: March 31, 2023, 10:21:28 am »
And here we are

Code: [Select]
void EXTI7_0_IRQHandler(void) __attribute__((interrupt()));
void EXTI7_0_IRQHandler(void)
{
  GPIOD->BSHR = GPIO_Pin_4;
  GPIOD->BCR = GPIO_Pin_4;
  EXTI->INTFR = EXTI_Line0;
}

00000150 <EXTI7_0_IRQHandler>:
 150: 1161                addi sp,sp,-8
 152: c23a                sw a4,4(sp)
 154: c03e                sw a5,0(sp)
 156: 4741                li a4,16
 158: 400117b7          lui a5,0x40011
 15c: 40e7a823          sw a4,1040(a5) # 40011410 <__global_pointer$+0x20010be8>
 160: 40e7aa23          sw a4,1044(a5)
 164: 400107b7          lui a5,0x40010
 168: 4705                li a4,1
 16a: 40e7aa23          sw a4,1044(a5) # 40010414 <__global_pointer$+0x2000fbec>
 16e: 4712                lw a4,4(sp)
 170: 4782                lw a5,0(sp)
 172: 0121                addi sp,sp,8
 174: 30200073          mret

539ns ~ 26 clocks @ 48 MHz
 

Offline uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #43 on: March 31, 2023, 10:24:17 am »
However, this is more a best case than the other one was a worst case. I mean, ISRs usually do something more...
 

Offline uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #44 on: March 31, 2023, 10:26:14 am »
I presume you were measuring the period from one interrupt to the next? i.e. low pulse length.

time elapsed between the pulse generated in main to the pulse generated in isr
 

Offline HwAoRrDk (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1477
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #45 on: March 31, 2023, 10:42:29 am »
Inline everything it needs into the interrupt function and you can often save and restore far fewer registers and be faster.

I suppose that's one argument to be made against using the vendor's SPL/HAL libraries - usage invariably involves calling functions for everything. As opposed to a more bare metal approach, just using registers directly.
 

Offline HwAoRrDk (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1477
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #46 on: March 31, 2023, 10:47:31 am »
539ns ~ 26 clocks @ 48 MHz

Ah, interesting. Even a favourable case is still slightly slower. But this is chasing nanoseconds now, haha.

I was going to re-run my benchmark with such a scenario, but I don't really need to bother now. :)
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4037
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #47 on: March 31, 2023, 11:13:35 am »
539ns ~ 26 clocks @ 48 MHz

Ah, interesting. Even a favourable case is still slightly slower. But this is chasing nanoseconds now, haha.

I was going to re-run my benchmark with such a scenario, but I don't really need to bother now. :)

He's running at 48 MHz where instruction fetches take 2 clock cycles instead of 1 at your 24 MHz.

It's worth checking at 24 MHz. I'm going for 0.6 µs, a little faster than the HPE case :-)
« Last Edit: March 31, 2023, 11:18:47 am by brucehoult »
 

Offline uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #48 on: March 31, 2023, 11:17:27 am »
2 clocks per instruction or per 32 bits fetched from flash? It seems to me that the stack-related instructions here are 16 bits long.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4037
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #49 on: March 31, 2023, 11:27:11 am »
2 clocks per instruction or per 32 bits fetched from flash? It seems to me that the stack-related instructions here are 16 bits long.

Per 32 bits.
 

