Author Topic: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data  (Read 6029 times)

0 Members and 1 Guest are viewing this topic.

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
I've run into a bizarre problem with the SysTick timer interrupt on my CH32V003 project. Basically, what's happening is that a certain section of code being executed inside the SysTick ISR is somehow causing data transmitted on the UART to be corrupted. Yeah, I know, sounds crazy! But that is what I've painstakingly narrowed it down to. I am at a loss as to how this is occurring, and how to proceed in fixing it.

The problem also only occurs when my code is compiled with -Os, and not when compiled in debug mode with -g (and no -O flags).

The UART corruption takes the form of a character being corrupted roughly every N characters. For example, a string of "-----------------------------" might become "------------<-----------<----". The corrupted character is random (i.e. not a bit-flip of the correct character) but always the same at each instance. What's interesting - and this is what put me on to the trail of the SysTick ISR - is that I noticed that N is a value such that, given the baud rate the UART is operating at, it roughly corresponds to the regular period that the SysTick ISR is being called. In my case, baud is 115200, the bit period is 8.68 us, so each character takes 86.8 us (8N1, so 10 bits per character), and if N = 12 (as per example above), that's 1041 us - approximately matching the 1 ms period of the ISR!

This is my ISR's code:

Code: [Select]
extern void SysTick_Handler(void) __attribute__((naked));

void SysTick_Handler(void) {
const uint32_t ticks_orig = sys_timestamp.ticks;
if(++sys_timestamp.ticks < ticks_orig) sys_timestamp.epochs++;

/**** THIS BLOCK IS THE CULPRIT (when commented, problem disappears) ****/
for(size_t i = 0; i < interval_calls.count; i++) {
if(--interval_calls.calls[i].interval_remaining_ms == 0) {
interval_calls.calls[i].callback();
interval_calls.calls[i].interval_remaining_ms = interval_calls.calls[i].interval_ms;
}
}

SysTick->SR = 0;

interrupt_return(); /* macro for "mret" inline assembly instruction */
}

The SysTick timer is set up with this code:

Code: [Select]
void SysTick_Init(const uint32_t hclk_freq_hz) {
SysTick->CMP = (hclk_freq_hz / 1000) - 1;
SysTick->CNT = 0;
SysTick->SR = 0;
SysTick->CTLR = STK_STRE | STK_STCLK | STK_STIE | STK_STE;
interrupt_enable(SysTicK_IRQn); /* macro to enable IRQ in PFIC->IENR */
}

The ISR code is just some straightforward code that increments a timestamp, but also has a rudimentary 'callback' system that will call any registered functions at a given interval in milliseconds. I use it to sample a GPIO for button input every 20 ms.

What's stupid is that even when no interval callbacks are registered (i.e. interval_calls.count = 0, verified by dumping that value with GDB) and the loop body should not be executing, the problem still occurs, so it's not like it's inadvertently calling some random function pointer and jumping of into who-knows-where and clobbering the UART registers or something.

In trying to get some insight into this by debugging with GDB, there's also something wacky going on there concerning the ISR that I can't explain. According to the disassembly produced by objdump (from the compiled .elf), SysTick_Handler starts at address 0xf5c, yet GDB thinks SysTick_Handler starts at 0xf66 if I set a breakpoint by name. If I set a breakpoint by address at 0xf5c, it doesn't get hit! [False alarm, somehow I managed to set the breakpoint to a disabled state.] I don't know WTF is going on there, but I have a feeling it might be part of the problem.

Here's the disassembly of the ISR in case it helps:

Code: [Select]
00000f5c <SysTick_Handler>:
     f5c: 82818793          addi a5,gp,-2008 # 20000028 <sys_timestamp>
     f60: 43d4                lw a3,4(a5)
     f62: 43d8                lw a4,4(a5)
     f64: 0705                addi a4,a4,1 # 1001 <uart_init+0x53>
     f66: c3d8                sw a4,4(a5)
     f68: 00d77963          bgeu a4,a3,f7a <SysTick_Handler+0x1e>
     f6c: 0007c703          lbu a4,0(a5)
     f70: 0705                addi a4,a4,1
     f72: 0ff77713          zext.b a4,a4
     f76: 00e78023          sb a4,0(a5)
     f7a: 83c18413          addi s0,gp,-1988 # 2000003c <interval_calls>
     f7e: 4481                li s1,0
     f80: 8722                mv a4,s0
     f82: 431c                lw a5,0(a4)
     f84: 00f4e863          bltu s1,a5,f94 <SysTick_Handler+0x38>
     f88: e000f7b7          lui a5,0xe000f
     f8c: 0007a223          sw zero,4(a5) # e000f004 <__global_pointer$+0xc000e804>
     f90: 30200073          mret
     f94: 441c                lw a5,8(s0)
     f96: 17fd                addi a5,a5,-1
     f98: c41c                sw a5,8(s0)
     f9a: e799                bnez a5,fa8 <SysTick_Handler+0x4c>
     f9c: 445c                lw a5,12(s0)
     f9e: 9782                jalr a5
     fa0: 405c                lw a5,4(s0)
     fa2: 83c18713          addi a4,gp,-1988 # 2000003c <interval_calls>
     fa6: c41c                sw a5,8(s0)
     fa8: 0485                addi s1,s1,1
     faa: 0431                addi s0,s0,12
     fac: bfd9                j f82 <SysTick_Handler+0x26>

I don't know where to proceed with this. :( Any suggestions for avenues of attack?
« Last Edit: March 29, 2023, 03:07:09 am by HwAoRrDk »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #1 on: March 29, 2023, 03:44:01 am »
Presumably you have enabled their "fast interrupt" register stacking thing.

I worry a little about calling a lot of code from that state, but haven't really thought about it.

I absolutely worry about the timing implications of calling an arbitrary amount of code from an interrupt handler. I think you should increment the tick count there and nothing else. Put the callback scheduling stuff in non-interrupt code.
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #2 on: March 29, 2023, 05:09:21 am »
Presumably you have enabled their "fast interrupt" register stacking thing.

Yes, hardware prologue/epilogue (HPE) is being used, but not vector table-free (VTF).

I worry a little about calling a lot of code from that state, but haven't really thought about it.

I absolutely worry about the timing implications of calling an arbitrary amount of code from an interrupt handler. I think you should increment the tick count there and nothing else. Put the callback scheduling stuff in non-interrupt code.

The callback stuff is only really an abstraction to allow me to separate code into different files rather than pile stuff into the ISR. Technically it could be calling an arbitrary amount of code, but in practice it isn't, and I know not to use it like that. It's only a very small amount of code in the callbacks. I started out with two: one to toggle an LED every 500 ms (kind of a 'heartbeat'); a second to sample a GPIO for a button every 20 ms. Not exactly taxing. :) Got rid of the former and now I have just the latter.

As I mentioned before, everything works just fine when compiled without -Os. So something about the optimisation is screwing things up, or perhaps revealing an unseen issue.

One further thing I've noticed when attempting to investigate with GDB, is that the corrupted characters appear after the ISR returns. That is, if my code is printing a long string of characters, it happens when the ISR interrupts execution somewhere inside my putchar()-like function that writes a single character to the UART data register. But I couldn't get a handle on what exactly was being affected or how.

Aside: I am finding it quite hard to read RISC-V assembly, as the ordering of operands is not consistent. Sometimes the destination is listed first (e.g. "lw a4,4(a5)"), sometimes last (e.g. "sw a4,4(a5)"). Very confusing. Why is that?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #3 on: March 29, 2023, 07:00:02 am »
As I mentioned before, everything works just fine when compiled without -Os. So something about the optimisation is screwing things up, or perhaps revealing an unseen issue.

I think the problem is probably in the code that gets interrupted.

This code looks fine, other than not being as optimised as I'd like. For example, loading from 4(a5) in both the 2nd and 3rd instructions, which could result in different values, which could break your code.

I'd expect the optimiser to combine those loads. If it hasn't then I imagine it is because
Code: [Select]
sys_timestamp.ticks is declared as volatile in code we don't see. In which case it is your responsibility to make sure you load it only once.

In fact in general that you don't factor out things such as
Code: [Select]
interval_calls.calls[i] in your code scares me.


Quote
Aside: I am finding it quite hard to read RISC-V assembly, as the ordering of operands is not consistent. Sometimes the destination is listed first (e.g. "lw a4,4(a5)"), sometimes last (e.g. "sw a4,4(a5)"). Very confusing. Why is that?

"store" is the *only* exception, I imagine for mental compatibility with every other ISA that has instructions called "load" and "store", not "move" or both load and store being "ld". ARM for example. Or MIPS.
« Last Edit: March 29, 2023, 07:01:39 am by brucehoult »
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #4 on: March 29, 2023, 07:53:58 am »
I think the problem is probably in the code that gets interrupted.

That's a thought I had when I realised the corruption happens at the point the ISR has returned back into UART code. But the code in my uart_putchar() is really simple, such that I don't see how it could go wrong.

Code: [Select]
int uart_putchar(int c) {
if(!uart_opts.binary && c == '\n') uart_putchar('\r');
while(!(USART1->STATR & USART_STATR_TXE));
USART1->DATAR = c & USART_DATAR_DR;
return c;
}

Yes, I know, a recursive call. But it's inherently bounded, so sue me. ;D

Maybe I need to figure out how to get GDB to break on every write to USART1->DATAR, and see what value is being written to the register. Perhaps that will shed some light on whether the corruption comes from the character to be transmitted being corrupted before being written to USART1->DATAR.

This code looks fine, other than not being as optimised as I'd like. For example, loading from 4(a5) in both the 2nd and 3rd instructions, which could result in different values, which could break your code.

I'd expect the optimiser to combine those loads. If it hasn't then I imagine it is because
Code: [Select]
sys_timestamp.ticks is declared as volatile in code we don't see. In which case it is your responsibility to make sure you load it only once.

Indeed, sys_timestamp is volatile. Not sure what you mean by "your responsibility to make sure you load it only once". You mean in case it changes in-between accesses? That won't happen, because the SysTick_Handler ISR is the only thing that ever writes to it - after all, that's its job, and no-where else's.

In fact in general that you don't factor out things such as
Code: [Select]
interval_calls.calls[i] in your code scares me.

Again, not sure what you mean by this. You mean "factor out" as in make a pointer instead?

Code: [Select]
for(size_t i = 0; i < interval_calls.count; i++) {
interval_call_t *c = &interval_calls.calls[i];
if(--c->interval_remaining_ms == 0) {
c->callback();
c->interval_remaining_ms = c->interval_ms;
}
}

How would that have a meaningful change to things? It's not like i is going to change mid-loop... :-//

"store" is the *only* exception, I imagine for mental compatibility with every other ISA that has instructions called "load" and "store", not "move" or both load and store being "ld". ARM for example. Or MIPS.

It is? Ah, well, I'm not familiar with ARM, or MIPS, or x86, etc. Still, it's a bit taxing mentally when every third instruction in RISC-V code seems to be a load or store of some kind. I suppose I'll get used to it eventually. :)
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #5 on: March 29, 2023, 09:34:14 am »
This code looks fine, other than not being as optimised as I'd like. For example, loading from 4(a5) in both the 2nd and 3rd instructions, which could result in different values, which could break your code.

I'd expect the optimiser to combine those loads. If it hasn't then I imagine it is because
Code: [Select]
sys_timestamp.ticks is declared as volatile in code we don't see. In which case it is your responsibility to make sure you load it only once.

Indeed, sys_timestamp is volatile. Not sure what you mean by "your responsibility to make sure you load it only once". You mean in case it changes in-between accesses? That won't happen, because the SysTick_Handler ISR is the only thing that ever writes to it - after all, that's its job, and no-where else's.

Yes. Because it is declared volatile the compiler has to assume it could change between accesses, so can't combine them the way it normally would (and the way it does for interval_calls.calls[i]. Loading it twice is just a waste of time.

The code is also loading interval_calls.count every time around the loop, which is presumably also completely unnecessary. It's even having to recalculate the address of interval_calls each time around the loop, after the possible call to callback(), because there are only two S registers in RV32E and they are used up by i and the address of interval_calls.calls\[i\].

Quote
In fact in general that you don't factor out things such as interval_calls.calls[i] in your code scares me.

Again, not sure what you mean by this. You mean "factor out" as in make a pointer instead?

Code: [Select]
for(size_t i = 0; i < interval_calls.count; i++) {
interval_call_t *c = &interval_calls.calls[i];
if(--c->interval_remaining_ms == 0) {
c->callback();
c->interval_remaining_ms = c->interval_ms;
}
}

How would that have a meaningful change to things? It's not like i is going to change mid-loop... :-//

Well no it's not. And the compiler does essentially that rewrite for you anyway.  I just find it a lot easier to write, read, and think about this way. In fact I'd go further.

Code: [Select]
        interval_call_t *c = interval_calls.calls;
        size_t limit = interval_calls.count;
for(size_t i = 0; i < limit; i++, c++) {
if(--c->interval_remaining_ms == 0) {
c->callback();
c->interval_remaining_ms = c->interval_ms;
}
}

I appreciate this is just a style thing, not an issue of the code being correct or not. The compiler will work hard for you and figure out that you're using the same element of interval_calls.calls every time.

But ... the compiler has to actually do some work to analyse the code to see that. And so does the human reader.
« Last Edit: March 29, 2023, 12:07:16 pm by brucehoult »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #6 on: March 29, 2023, 09:36:26 am »
Ok, I give up. How are you supposed to escape an array access using the variable i, outside of an actual code block, without this BBS thinking you want to use italics?
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 3240
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #7 on: March 29, 2023, 11:37:46 am »
Ok, I give up. How are you supposed to escape an array access using the variable i, outside of an actual code block, without this BBS thinking you want to use italics?

You enclose the part you don't want the buletin board codes to be active in nobbc tags  [nobbc] your stuff  [/nobbc]

e.g.

interval_calls.calls[i].interval_remaining_ms = interval_calls.calls[i].interval_ms;

IMO The dodgiest part of the systick code is that there is no check for a null pointer.


Quote
The corrupted character is random (i.e. not a bit-flip of the correct character) but always the same at each instance

Does this mean the corruption always result in the same (incorrect) character being printed, or something else?  I read this a few times and it didn't immediately make this clear.

Checking the assembly listing of the uart_putchar() function may be instructive.  Disabling interrupts around each operation in this function, one at a time, may help to narrow down the exact part of the code causing issues.
« Last Edit: March 29, 2023, 11:50:43 am by mikerj »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #8 on: March 29, 2023, 12:10:59 pm »
Checking the assembly listing of the uart_putchar() function may be instructive.

It is certainly possible that while(!(USART1->STATR & USART_STATR_TXE)); might cause problems with optimisation turned on, if USART1->STATR is not declared volatile.
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 825
  • Country: es
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #9 on: March 29, 2023, 03:49:16 pm »
Try removing the “naked” attribute (and probably, disabling HPE) just to be sure it works as expected and doesn’t clobber some reg at the “right” moment.
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 3240
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #10 on: March 29, 2023, 04:09:22 pm »
Try removing the “naked” attribute (and probably, disabling HPE) just to be sure it works as expected and doesn’t clobber some reg at the “right” moment.

Well spotted, I missed the function attribute.  Shouldn't the "interrupt" attribute be used here rather than "naked"?

https://gcc.gnu.org/onlinedocs/gcc/RISC-V-Function-Attributes.html
 

Offline DavidAlfa

  • Super Contributor
  • ***
  • Posts: 5913
  • Country: es
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #11 on: March 29, 2023, 04:21:30 pm »
Just why the recursive thing?
As stated, try enforcing volatile access.
Code: [Select]
int uart_putchar_(int c) {
        while(!((volatile)USART1->STATR & USART_STATR_TXE));
USART1->DATAR = c & USART_DATAR_DR;
return c;
}
int uart_putchar(int c) {
if(!uart_opts.binary && c == '\n')
            uart_putchar_('\r');
        return uart_putchar_(c);
}
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 

Offline jnk0le

  • Contributor
  • Posts: 41
  • Country: pl
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #12 on: March 29, 2023, 06:30:59 pm »
Quote
This is my ISR's code:

Code: [Select]

extern void SysTick_Handler(void) __attribute__((naked));

void SysTick_Handler(void) {
   //...
   interrupt_return(); /* macro for "mret" inline assembly instruction */
}

According to the v003 core datasheet [1], par 3.4; it stacks only the caller saved registers.

And because the `naked` attribute removes prologue/epilogue entirely, you get corruption due to use of callee saved registers.


[1] http://www.wch-ic.com/downloads/QingKeV2_Processor_Manual_PDF.html
 

Offline AVI-crak

  • Regular Contributor
  • *
  • Posts: 125
  • Country: ru
    • Rtos
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #13 on: March 29, 2023, 06:59:26 pm »
C-style "callback" for interrupt is a potential bug hazard. In your case, CH32V003 does not explicitly save the program context before calling the break, while sharing static variables (via "callback") within functions can generate errors.
About UART.
The algorithm becomes extremely simple when sending the entire string (with a trailing zero), and much more complicated when sending one character at a time.
You might be interested https://github.com/AVI-crak/Rtos_cortex/blob/master/sPrint.h
One macro printo() combines many functions, if something is not used - it will be removed automatically.
printo() uses the GCC _Generic((X) macro, so all preparation for printing is done at the compilation stage. No additional effort is required, up to 9 variables or constants in the parameter of each macro.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #14 on: March 29, 2023, 07:32:37 pm »
Try removing the “naked” attribute (and probably, disabling HPE) just to be sure it works as expected and doesn’t clobber some reg at the “right” moment.

Well spotted, I missed the function attribute.  Shouldn't the "interrupt" attribute be used here rather than "naked"?

https://gcc.gnu.org/onlinedocs/gcc/RISC-V-Function-Attributes.html

The "interrupt" attribute is for standard RISC-V cores. It causes the compiler to save all the registers it uses, not only any S registers it uses.

But the OP is using the custom "fast interrupt" mode, which saves registers in hardware before calling the interrupt function. WCH's supplied compiler has a different __attribute __ ((interrupt("WCH-Interrupt-fast"))) for this. If you're not using their compiler then using a naked function with an mret forced at the end *should* be equivalent.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #15 on: March 29, 2023, 07:40:53 pm »
According to the v003 core datasheet [1], par 3.4; it stacks only the caller saved registers.

Ohhh! That is not generally known. Certainly wasn't to me!

Most people who are using the fast interrupt feature are also using their compiler, which presumably does the right thing.

So, to use a standard toolchain, what you need to do with any interrupt function complex enough to need to use the S registers (the most obvious being if you call some other function from it), the way to do it with a generic RISC-V toolchain is:

Code: [Select]
void SysTick_Handler_inner(void){
    ... do all the stuff here
}

__attribute__((naked))
void SysTick_Handler(void){
    SysTick_Handler_inner();
    asm volatile ("mret");
}

Edit after thinking:

No, that's stupid.  Simply use a normal function (as on ARMv7-M), but force mret instead of ret.

Code: [Select]
void SysTick_Handler(void){
    ... do all the stuff here
    asm volatile ("mret");
    __builtin_unreachable(); // suppress the usual ret
}

No attributes needed at all. Any S registers used will be saved and restored in software, the others won't be (and don't need to be).

But turning off "fast interrupts" and using the standard RISC-V __attribute__((interrupt)) is probably not slower, and will for sure use less stack (for this particular handler).
« Last Edit: March 29, 2023, 10:03:01 pm by brucehoult »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #16 on: March 29, 2023, 08:23:18 pm »
Ok, so back after reading the manual for this chip.

The __attribute __ ((interrupt("WCH-Interrupt-fast"))) saves 10 registers to the stack. The normal stack in RAM. I'd assuming when hearing about the feature and that it supports 2-deep interrupt nesting that they'd put a 2nd and 3rd register set in the CPU and it would just be BOOM single cycle or something and you're done.

So, actually, saving to the stack in RAM won't be all that much faster than doing the saving in software, and if you have a small interrupt handler that only needs a couple of registers then using the standard RISC-V __attribute__((intterrupt)) that saves only what it uses will probably be faster.

The 10 registers saved:

x1: ra (only actually useful if your interrupt routine calls other functions)

x5-x7: t0-t2

x10-x15: a0-a5

So, the OP's disassembled code changes: ra, a3, a4, a5, s0, s1

The hardware stacking is needlessly saving and restoring: t0, t1, t2, a0, a1, a2

The hardware stacking is NOT saving: s0, s1

The OP would be just as well off turning off the "fast interrupt" feature and using standard RISC-V __attribute__((interrupt)).

NB: I'm assuming here that the compiler is using the RV32E ABI that is simply RV32I truncated to 16 registers. I haven't checked the hex opcodes against the disassembled instructions to check.

There is a proposed EABI that should give considerably better code in many cases, mainly by increasing the S registers to 5 instead of 2 (this code, as compiled, really wants to have 3 S registers, and has to re-create the contents of a4 because it can't use an s2 instead) by cutting down to 2 T registers and 4 A registers (the same as 32 bit ARM, so should be enough for ported software).

Standard ABI vs EABI for x5-x15:

t0, t1, t2, s0, s1, a0, a1, a2, a3, a4, a5
t0, s3, s4, s0, s1, a0, a1, a2, a3, s2, t1

Hardware stacking for the EABI would only need to stack x5, x10-x13, x15 i.e. 6 registers instead of 10.  Which just goes to show WCH are not using it.

https://github.com/riscv-non-isa/riscv-eabi-spec/blob/master/EABI.adoc
 

Offline jnk0le

  • Contributor
  • Posts: 41
  • Country: pl
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #17 on: March 29, 2023, 09:19:57 pm »
Quote
No, that's stupid.  Simply use a normal function (as on ARMv7-M), but force mret instead of ret.

Code: [Select]

void SysTick_Handler(void){
    ... do all the stuff here
    asm volatile ("mret");
}


No attributes needed at all. Any S registers used will be saved and restored in software, the others won't be (and don't need to be).

You will end up with an unreachable ret after the mret but it's only 2 bytes so NDB.

That will break when anything is allocated on stack.
To get rid of epilogue after inline asm, you can use `__builtin_unreachable();`

Quote
The __attribute __ ((interrupt("WCH-Interrupt-fast"))) saves 10 registers to the stack. The normal stack in RAM. I'd assuming when hearing about the feature and that it supports 2-deep interrupt nesting that they'd put a 2nd and 3rd register set in the CPU and it would just be BOOM single cycle or something and you're done.
Thats in V3 and V4 of their core version.


Quote
Hardware stacking for the EABI would only need to stack x5, x10-x13, x15 i.e. 6 registers instead of 10.  Which just goes to show WCH are not using it.
There is EABIEN bit in INTSYSCR.
Not present on V4 so that is most probably pushing eabi register set. (less cycles)
Of course its functionality is not documented at all.
« Last Edit: March 29, 2023, 09:28:44 pm by jnk0le »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #18 on: March 29, 2023, 10:28:00 pm »
That will break when anything is allocated on stack.

Ah yeah, crud.  Even the saving of s0 and s1 will do that.

And you can't even call another function with the main guts in from a standard function, because saving the return address will also do it.

Code: [Select]
void bar(int *i);

__attribute__((noinline))
void my_handler_inner(){
    int xxx[20];
    for (int i=0; i<10; ++i)
        bar(&xxx[i]);
}

__attribute__((naked))
void my_handler(){
    my_handler_inner();
    asm volatile ("mret");
    //__builtin_unreachable();
}

This works on gcc. Clang complains about the call: "error: non-ASM statement in naked function is not supported". Using `asm volatile ("call my_handler_inner")` makes both gcc and clang happy.

https://godbolt.org/z/Kv7dhr7G8

The caller MUST be naked, otherwise it will allocate a stack frame and save ra but never deallocate the stack space.

The called function must NOT be inlined, otherwise any stack it uses (e.g. to save s0 or s1 or to allocate an array) will also never be deallocated.


OR just turn fast interrupts off (don't turn it on) and use the standard __attribute__((interrupt))

OR use WCH's recommended IDE and compiler and use their __attribute __ ((interrupt("WCH-Interrupt-fast")))
« Last Edit: March 29, 2023, 10:48:34 pm by brucehoult »
 

Offline jnk0le

  • Contributor
  • Posts: 41
  • Country: pl
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #19 on: March 29, 2023, 11:20:09 pm »
well, the issues that were raised in this topic are the ones I have already proposed solutions for in various places.
You can have a look at 1.4.1 of the "total-embedded/Xteic" spec that I'm doing right now. It should be much cleaner than current approach leading to a lot of outdated (of course) toolchain builds for every new interrupt controller.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #20 on: March 29, 2023, 11:29:38 pm »
well, the issues that were raised in this topic are the ones I have already proposed solutions for in various places.
You can have a look at 1.4.1 of the "total-embedded/Xteic" spec that I'm doing right now. It should be much cleaner than current approach leading to a lot of outdated (of course) toolchain builds for every new interrupt controller.

Interesting document. I haven't previously seen it (or heard of you).

I've only taken a very quick glance at it so far, but I'm amused you quote me in it :-)
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #21 on: March 30, 2023, 01:29:02 am »
Does this mean the corruption always result in the same (incorrect) character being printed, or something else?  I read this a few times and it didn't immediately make this clear.

Yes. I could probably have worded that better. For example, when the corrupted character occurs, each instance will always be a '<', but whether it is '<' or '|' or whatever is random. And the corrupted character bears no apparent relation to the correct character.

Checking the assembly listing of the uart_putchar() function may be instructive.

Here it is:

Code: [Select]
00001034 <uart_putchar>:
    1034: 86cff2ef          jal t0,a0 <__riscv_save_0>
    1038: 1ffff797          auipc a5,0x1ffff
    103c: fc97c783          lbu a5,-55(a5) # 20000001 <uart_opts+0x1>
    1040: 842a                mv s0,a0
    1042: e791                bnez a5,104e <uart_putchar+0x1a>
    1044: 47a9                li a5,10
    1046: 00f51463          bne a0,a5,104e <uart_putchar+0x1a>
    104a: 4535                li a0,13
    104c: 37e5                jal 1034 <uart_putchar>
    104e: 40014737          lui a4,0x40014
    1052: 80075783          lhu a5,-2048(a4) # 40013800 <__global_pointer$+0x20013000>
    1056: 0807f793          andi a5,a5,128
    105a: dfe5                beqz a5,1052 <uart_putchar+0x1e>
    105c: 1ff47793          andi a5,s0,511
    1060: 80f71223          sh a5,-2044(a4)
    1064: 8522                mv a0,s0
    1066: 844ff06f          j aa <__riscv_restore_0>
...

Not sure what the "..." at the end represents. Perhaps it means there's a gap to the next function, which is at 0x106c. For alignment?
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #22 on: March 30, 2023, 01:40:15 am »
It is certainly possible that while(!(USART1->STATR & USART_STATR_TXE)); might cause problems with optimisation turned on, if USART1->STATR is not declared volatile.

I checked, all the members of USART1 struct are volatile.

I am using the WCH-supplied headers, so one should well hope the peripheral registers are all marked volatile, otherwise they've got big problems with their own HAL/SPL! ;D Not that they are completely free of mistakes, though. I have had to make several corrections - only 4 EXTI were defined for AFIO_EXTICR; RCC_CFGR0 had incorrect definitions for non-existent APB1 & 2 prescalers, but not for extended ADC prescaler.
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #23 on: March 30, 2023, 02:20:14 am »
So, actually, saving to the stack in RAM won't be all that much faster than doing the saving in software

I can't remember which thread it was or who did it, but there was someone here on the EEVblog forum that did some benchmarking of regular vs. HPE vs. VTF interrupts and found that HPE did make a significant different in ISR latency, but VTF added basically nothing.

The hardware stacking is NOT saving: s0, s1

Ohhhhhhhhhh... Yes, I think you're right! |O

And it explains why commenting out the for-loop dealing with the callbacks fixes the problem - it removes a line of code that sets register s0 to the base memory address of the interval_calls struct:

Code: [Select]
addi s0,gp,-1988 # 2000003c <interval_calls>

So even though interval_calls.count equals zero, and the loop body does not execute, it's too late, s0 has already been clobbered. And when the ISR returns back into uart_putchar(), guess what code is used to write the character to be transmitted to the UART data register?

Code: [Select]
andi a5,s0,511
sh a5,-2044(a4)

Yup, it takes the character value from s0 (previously copied there from the function argument in a0), masks it, then writes to USART1->DATAR. (It also later copies s0 back to a0 for the function return value, but that didn't affect anything as I'm not using the return value.)

So, if I am understanding correctly, to solve my problem my options here are as follows:

- Switch to WCH's version of compiler (IIRC, GCC 8.x-based) and use __attribute__((interrupt("WCH-Interrupt-fast"))) on my ISRs.
- Turn off HPE and use regular __attribute__((interrupt)) on my ISRs.

In order to do the latter, I think I need to modify this part of the startup assembly code:

Code: [Select]
li t0, 0x3
csrw 0x804, t0

Changing the immediate value to 0x2 on the first instruction, so that it doesn't set the HWSTKEN bit 0, leaving HPE disabled.

Do I still need to use mret to return from the ISR with __attribute__((interrupt))? (And perhaps use __builtin_unreachable() to get rid of the redundant regular ret, as suggested.) I see I do not, __attribute__((interrupt)) generates mret anyway.

I could switch to the WCH compiler. In their 'community' distribution you can just extract the RISC-V GCC toolchain on its own from the installer archive. That's what I did to get their custom OpenOCD. Might be worth doing, as it would also mean I can take advantage of their proprietary 'XW' extensions to the compact instruction set.
« Last Edit: March 30, 2023, 02:33:19 am by HwAoRrDk »
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #24 on: March 30, 2023, 06:30:48 am »
Yup, it takes the character value from s0 (previously copied there from the function argument in a0), masks it, then writes to USART1->DATAR.

Y'know, I've just noticed that there is evidence that this is almost certainly what was happening, because there is a correlation between the address that was being stored in s0 in SysTick_Handler(), and the corrupted character being transmitted on the UART in uart_putchar().

What value was being written to s0? 0x2000003c. What character was showing up? The '<' character, which has an ASCII value of 0x3c. And what is the LSB of s0? Same!

The fact I was seeing a different corrupt character from time to time must have been because I was re-compiling the code with changes and making the interval_calls variable reside at different memory addresses.

With regard to what the WCH compiler must be doing with its custom __attribute__((interrupt("WCH-Interrupt-fast"))), am I right in thinking that it is basically just changing the behaviour of the regular 'interrupt' attribute to save only s0 & s1 on the stack (when necessary), rather than additionally all the rest that HPE instead does automatically (x1, x5-7, x10-15)? I might try the WCH compiler and compare the ISR code that is generated.

Anyway, I modified the startup assembly code as I previously mentioned, to disable HPE, and changed my code to use __attribute__((interrupt)) on the ISRs, and now everything is back working again. :-+

Thank you everyone for your help. I probably wouldn't have got to the bottom of this without it.
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #25 on: March 30, 2023, 06:39:35 am »
I can't remember which thread it was or who did it, but there was someone here on the EEVblog forum that did some benchmarking of regular vs. HPE vs. VTF interrupts and found that HPE did make a significant different in ISR latency, but VTF added basically nothing.
It might have been me, I haven't seen other similar benchmarks.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #26 on: March 30, 2023, 07:09:49 am »
I can't remember which thread it was or who did it, but there was someone here on the EEVblog forum that did some benchmarking of regular vs. HPE vs. VTF interrupts and found that HPE did make a significant different in ISR latency, but VTF added basically nothing.
It might have been me, I haven't seen other similar benchmarks.

Note that those tests appear to have been using a bigger core (CH32V307 is in the post title), which probably does have on-chip duplicate register sets for the hardware register save/retore, not dumping them to the stack in RAM like the CH32V003 is doing.

So the results could be very different on the smaller core.
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #27 on: March 30, 2023, 08:06:43 am »
It might have been me, I haven't seen other similar benchmarks.

Ah, yes, that was what I was thinking of. Thank you.

Note that those tests appear to have been using a bigger core (CH32V307 is in the post title), which probably does have on-chip duplicate register sets for the hardware register save/retore, not dumping them to the stack in RAM like the CH32V003 is doing.

I think you're right. The CH32V003 (QingKeV2) CPU manual says this regarding HPE: "V2 series microprocessors, support hardware to automatically save 10 of the shaped Caller Saved registers to the user stack area", whereas the CH32V307 (QingKeV4) manual says "The V4 series microprocessors support hardware single cycle automatic saving of 16 of the shaped Caller Saved registers to an internal stack area that is not visible to the user".

Maybe I shall try and replicate newbrain's benchmark test on the CH32V003 sometime.

Anyway, I compiled my code with WCH's compiler (GCC 8.2-based) with HPE enabled and using 'WCH-Interrupt-fast', plus also specifying -march=rv32ecxw, and for that my SysTick_Handler ISR disassembly looks like this:

Code: [Select]
000014d0 <SysTick_Handler>:
    14d0: 84418713          addi a4,gp,-1980 # 20000044 <sys_timestamp>
    14d4: 4354                lw a3,4(a4)
    14d6: 435c                lw a5,4(a4)
    14d8: 1161                addi sp,sp,-8
    14da: c222                sw s0,4(sp)
    14dc: 0785                addi a5,a5,1
    14de: c026                sw s1,0(sp)
    14e0: 84f1a423          sw a5,-1976(gp) # 20000048 <sys_timestamp+0x4>
    14e4: 00d7f663          bgeu a5,a3,14f0 <SysTick_Handler+0x20>
    14e8: 431c                lw a5,0(a4)
    14ea: 0785                addi a5,a5,1
    14ec: 84f1a223          sw a5,-1980(gp) # 20000044 <sys_timestamp>
    14f0: 85818413          addi s0,gp,-1960 # 20000058 <interval_calls>
    14f4: 4481                li s1,0
    14f6: 8722                mv a4,s0
    14f8: 431c                lw a5,0(a4)
    14fa: 00f4eb63          bltu s1,a5,1510 <SysTick_Handler+0x40>
    14fe: 4412                lw s0,4(sp)
    1500: e000f7b7          lui a5,0xe000f
    1504: 0007a223          sw zero,4(a5) # e000f004 <__global_pointer$+0xc000e804>
    1508: 4482                lw s1,0(sp)
    150a: 0121                addi sp,sp,8
    150c: 30200073          mret
    1510: 441c                lw a5,8(s0)
    1512: 17fd                addi a5,a5,-1
    1514: c41c                sw a5,8(s0)
    1516: e799                bnez a5,1524 <SysTick_Handler+0x54>
    1518: 445c                lw a5,12(s0)
    151a: 9782                jalr a5
    151c: 405c                lw a5,4(s0)
    151e: 85818713          addi a4,gp,-1960 # 20000058 <interval_calls>
    1522: c41c                sw a5,8(s0)
    1524: 0485                addi s1,s1,1
    1526: 0431                addi s0,s0,12
    1528: bfc1                j 14f8 <SysTick_Handler+0x28>

Compare to using mainline GCC 12.2 with no HPE, regular 'interrupt' attribute, and -march=rv32ec_zicsr:

Code: [Select]
0000163c <SysTick_Handler>:
    163c: 7179                addi sp,sp,-48
    163e: c03e                sw a5,0(sp)
    1640: 84018793          addi a5,gp,-1984 # 20000040 <sys_timestamp>
    1644: c436                sw a3,8(sp)
    1646: c23a                sw a4,4(sp)
    1648: 43d4                lw a3,4(a5)
    164a: 43d8                lw a4,4(a5)
    164c: d606                sw ra,44(sp)
    164e: d416                sw t0,40(sp)
    1650: d21a                sw t1,36(sp)
    1652: d01e                sw t2,32(sp)
    1654: ce22                sw s0,28(sp)
    1656: cc26                sw s1,24(sp)
    1658: ca2a                sw a0,20(sp)
    165a: c82e                sw a1,16(sp)
    165c: c632                sw a2,12(sp)
    165e: 0705                addi a4,a4,1
    1660: c3d8                sw a4,4(a5)
    1662: 00d77563          bgeu a4,a3,166c <SysTick_Handler+0x30>
    1666: 4398                lw a4,0(a5)
    1668: 0705                addi a4,a4,1
    166a: c398                sw a4,0(a5)
    166c: 85418413          addi s0,gp,-1964 # 20000054 <interval_calls>
    1670: 4481                li s1,0
    1672: 8722                mv a4,s0
    1674: 431c                lw a5,0(a4)
    1676: 02f4e563          bltu s1,a5,16a0 <SysTick_Handler+0x64>
    167a: 4472                lw s0,28(sp)
    167c: e000f7b7          lui a5,0xe000f
    1680: 0007a223          sw zero,4(a5) # e000f004 <__global_pointer$+0xc000e804>
    1684: 50b2                lw ra,44(sp)
    1686: 52a2                lw t0,40(sp)
    1688: 5312                lw t1,36(sp)
    168a: 5382                lw t2,32(sp)
    168c: 44e2                lw s1,24(sp)
    168e: 4552                lw a0,20(sp)
    1690: 45c2                lw a1,16(sp)
    1692: 4632                lw a2,12(sp)
    1694: 46a2                lw a3,8(sp)
    1696: 4712                lw a4,4(sp)
    1698: 4782                lw a5,0(sp)
    169a: 6145                addi sp,sp,48
    169c: 30200073          mret
    16a0: 441c                lw a5,8(s0)
    16a2: 17fd                addi a5,a5,-1
    16a4: c41c                sw a5,8(s0)
    16a6: e799                bnez a5,16b4 <SysTick_Handler+0x78>
    16a8: 445c                lw a5,12(s0)
    16aa: 9782                jalr a5
    16ac: 405c                lw a5,4(s0)
    16ae: 85418713          addi a4,gp,-1964 # 20000054 <interval_calls>
    16b2: c41c                sw a5,8(s0)
    16b4: 0485                addi s1,s1,1
    16b6: 0431                addi s0,s0,12
    16b8: bf75                j 1674 <SysTick_Handler+0x38>

It seems to me that the only difference is indeed that the former only saves s0 and s1.

By the way, it seems the saving in size of code with HPE can be fairly significant. My code was approximately 200 bytes smaller when compiled with WCH's compiler. Not sure how much of that is due to the proprietary 'XW' compact instructions, and how much due to shorter ISR prologue/eplilogue, but given I only have 5 ISRs in my entire codebase, there must be some contribution from the former.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8178
  • Country: fi
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #28 on: March 30, 2023, 08:12:24 am »
I can't stop liking the ARM's idea that stacking, interrupt entry and exit are completely handled in hardware, and interrupt handlers can be any usual C functions, even if it increases latency to very small/simple interrupt handlers.

Despite the fact it's theoretically very easy to let compiler just handle the generation of correct prologue/epilogue, we still see problems like this arising, compiler instructed with wrong attributes and it's difficult to notice and understand what's going on. And in benchmarking we tend to see that the difference of doing it in HW or SW is very small anyway, either can have an advantage in particular test case but the advantage is always small.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #29 on: March 30, 2023, 09:29:29 am »
I can't stop liking the ARM's idea that stacking, interrupt entry and exit are completely handled in hardware, and interrupt handlers can be any usual C functions, even if it increases latency to very small/simple interrupt handlers.

It certainly has its attractions for inexperienced programmers using unmodified toolchains.

WCH *almost* achieved that. They just forgot one thing. All would be wonderful if they had added:

  • copy SP to new hidden register/CSR MISP after stacking saved registers (if not using a shadow register set) and before calling the interrupt handler.
  • when RET instruction is executed then IF in_fast_interrupt and SP = MISP THEN execute MRET instead of RET

Then you could use absolutely standard C functions as interrupt handlers, just like on Cortex-M.

Quote
Despite the fact it's theoretically very easy to let compiler just handle the generation of correct prologue/epilogue, we still see problems like this arising, compiler instructed with wrong attributes and it's difficult to notice and understand what's going on. And in benchmarking we tend to see that the difference of doing it in HW or SW is very small anyway, either can have an advantage in particular test case but the advantage is always small.

Yes, the speed advantage to hardware stacking is small to zero or even negative.

Which is why I'd rather see it done in software BUT the tools should be improved to make this transparent to the average user.

The RISC-V people have gone to the trouble of working out exactly what the code looks like to e.g. duplicate the functionality of Arm's NVIC, including chaining of handlers if a new interrupt has come in while another is executing (avoiding restoring and then immediately re-saving registers), checking if a higher priority interrupt has come in during stacking of registers for a low priority interrupt etc.

https://github.com/riscv/riscv-fast-interrupt/blob/master/clic.adoc#calling-c-abi-functions-as-interrupt-handlers

Keep the hardware simple, improve the software tooling. A few dozen bytes of flash (maybe even mask rom in the chip) for the dispatch function is cheap.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14487
  • Country: fr
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #30 on: March 30, 2023, 07:40:53 pm »
I can't stop liking the ARM's idea that stacking, interrupt entry and exit are completely handled in hardware, and interrupt handlers can be any usual C functions, even if it increases latency to very small/simple interrupt handlers.

I can understand that, but the RISC-V approach is at the same time simpler (from the processor's POV) and more flexible.
Now certainly this approach makes it harder IMO for designers of RISC-V CPU cores to implement something with the same level of performance.
And the annoying thing with interrupts on RISC-V is that there are almost as many implementations as there are vendors, each with their own tricks and compiler attributes. But that's the price to pay with modularity. We can't have it all. Downside is that for many RISC-V MCUs, you need a patched version of GCC only distributed by the vendor, since the extensions have not made it to the mainline and may take a good while before making it, unless the vendor is an active member of RISC-V International, as far as I can tell.

One thing I would like to see is return stacks handled in hardware.
Never let compilers and/or developers the ability to mess with return addresses, even if it's by mistake.

I have a few ideas, but it's far from being simple if we want something both secure and flexible.
« Last Edit: March 30, 2023, 07:44:35 pm by SiliconWizard »
 

Offline abyrvalg

  • Frequent Contributor
  • **
  • Posts: 825
  • Country: es
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #31 on: March 30, 2023, 09:21:03 pm »
Why vendors don’t just provide a template interrupt function with attribute(naked), a couple of asm() lines doing correct custom prologue/epilogue and a //your code … //end of your code section between?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #32 on: March 31, 2023, 12:29:56 am »
One thing I would like to see is return stacks handled in hardware.
Never let compilers and/or developers the ability to mess with return addresses, even if it's by mistake.

Even the very first RISC-V chip distributed, the FE-310 in late 2016, had a 2-entry return address stack in the CPU core, but the purpose was to predict the return address (e.g. because a reload of RA from memory hadn't completed yet) rather than to enforce that the return address has not been tampered with.

But see:

https://github.com/riscv/riscv-cfi/blob/main/cfi_backward.adoc
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14487
  • Country: fr
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #33 on: March 31, 2023, 01:41:32 am »
One thing I would like to see is return stacks handled in hardware.
Never let compilers and/or developers the ability to mess with return addresses, even if it's by mistake.

Even the very first RISC-V chip distributed, the FE-310 in late 2016, had a 2-entry return address stack in the CPU core, but the purpose was to predict the return address (e.g. because a reload of RA from memory hadn't completed yet) rather than to enforce that the return address has not been tampered with.

Yes, that's something that is often mentioned for improving branch prediction. But not the same thing indeed.

But see:

https://github.com/riscv/riscv-cfi/blob/main/cfi_backward.adoc

Nice!
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #34 on: March 31, 2023, 04:54:58 am »


Quote
I can't stop liking the ARM's idea that stacking, interrupt entry and exit are completely handled in hardware, and interrupt handlers can be any usual C functions


It's cute, but I'm not sure I like the loss of the ability for really tight ISRs.  (usually the CPU clock is fast enough that it doesn't matter, but...)  And you're giving that up JUST to have your ISRs be normal C functions (well, plus the tail optimization stuff, I guess.)


Does RISC-V implement something that allows the equivalent of ARM's "lazy" Floating point stacking?  (that's also "neat, but maybe slower than I'd want.")
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #35 on: March 31, 2023, 08:48:29 am »
Does RISC-V implement something that allows the equivalent of ARM's "lazy" Floating point stacking?  (that's also "neat, but maybe slower than I'd want.")

I don't know the details of whatever Arm does, but haven't such facilities been standard since ... the 68020 and 80386?

RISC-V provides the mechanisms to implement a number of different policies with regard to FP (and other) state. It does not dictate any particular policy. Saving and restoring FP state can each, independently, be eager or lazy.

Anyway, to quote:

----

3.1.6.6 Extension Context Status in mstatus Register

Supporting substantial extensions is one of the primary goals of RISC-V, and hence we define a standard interface to allow unchanged privileged-mode code, particularly a supervisor-level OS, to support arbitrary user-mode state extensions.

To date, the V extension is the only standard extension that defines additional state beyond the floating-point CSR and data registers.

The FS[1:0] and VS[1:0] WARL fields and the XS[1:0] read-only field are used to reduce the cost of context save and restore by setting and tracking the current state of the floating-point unit and any other user-mode extensions respectively. The FS field encodes the status of the floating-point unit state, including the floating-point registers f0–f31 and the CSRs fcsr, frm, and fflags. The VS field encodes the status of the vector extension state, including the vector registers v0–v31 and the CSRs vcsr, vxrm, vxsat, vstart, vl, vtype, and vlenb. The XS field encodes the status of additional user-mode extensions and associated state. These fields can be checked by a context switch routine to quickly determine whether a state save or restore is required. If a save or restore is required, additional instructions and CSRs are typically required to effect and optimize the process.

The design anticipates that most context switches will not need to save/restore state in either or both of the floating-point unit or other extensions, so provides a fast check via the SD bit.

The FS, VS, and XS fields use the same status encoding as shown in Table 3.3, with the four possible status values being Off, Initial (e.g. zeroed), Clean, and Dirty.

When an extension’s status is set to Off, any instruction that attempts to read or write the corresponding state will cause an illegal instruction exception. When the status is Initial, the corresponding state should have an initial constant value. When the status is Clean, the corresponding state is potentially different from the initial value, but matches the last value stored on a context swap. When the status is Dirty, the corresponding state has potentially been modified since the last context save.

During a context save, the responsible privileged code need only write out the corresponding state if its status is Dirty, and can then reset the extension’s status to Clean. During a context restore, the context need only be loaded from memory if the status is Clean.

----

FS is for the FPU. VS is for the Vector unit. XS is a summary of any additional standard (none now) or custom extensions that add state that needs to be context-switched. Each such extension will add additional status bits elsewhere.

As well as lazy save (only if the state is Dirty), an OS can choose whether to eagerly reload the state when switching back to a process (and set the state to Clean) or lazily (and set the state to Off) so that FP or Vector etc context is loaded only on the first execution of a relevant instruction.

FP state is 128 bytes for a single precision FPU, 256 bytes for a double precision FPU. Typically it is the same size as the integer register state, so no big deal if it is not done lazily.

However vector state is (on an application class processor) a minimum of 512 bytes and the most common size in the next few years is likely to be 1024 bytes, with 2048 bytes (512 bit vector registers) not uncommon.

The vector state is managed more aggressively than the FP state. The ABI specifies that vector registers are caller-save i.e. their contents are undefined after ANY function call, and this includes system calls. This enables the OS to set the vector state to Off or Initial before returning from any syscall. The vector state only needs to be saved and restored if a context switch happens as the result of an interrupt (e.g. 100 Hz system tick), and NOT if a context switch happens as a result of a system call blocking (e.g. I/O).

Also:

----

Changing the setting of FS has no effect on the contents of the floating-point register state. In particular, setting FS=Off does not destroy the state, nor does setting FS=Initial clear the contents. Similarly, the setting of VS has no effect on the contents of the vector register state. Other extensions, however, might not preserve state when set to Off.

----

So, it is also possible to implement a policy of not saving FP or Vector state when you switch away from a process, but simply set the unit to Off and then, when that process is later resumed, if no other process has used the FPU or VPU in the meantime then simply turn the FPU or VPU back on, without ever saving and restoring the state.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #36 on: March 31, 2023, 09:09:40 am »
One thing I would like to see is return stacks handled in hardware.

https://github.com/riscv/riscv-cfi/blob/main/cfi_backward.adoc

Nice!

TLDR summary:

Add PUSH_RA, POP_RA, and POP_CHECK_RA that push or pop the Return Address (aka Link Register) contents to a special RAM area (page(s)) with permissions that allow *only* these instructions to access it.

POP_CHECK_RA removes an item from the special hidden stack and traps if the value is not the same as what is already in LR.

On CPUs that don't implement this extension these instructions are all NOP.

Can use it in two ways:

1) on any CPU, new or old

Code: [Select]
foo:
    addi sp,sp,-16
    sw ra,12(sp)
    PUSH_RA
    :
    :
    lw ra,12(sp)
    addi sp,sp,12
    POP_CHECK_RA
    ret

On new CPUs this will trap if the return address has been tampered with on the stack. On old CPUs it will execute as always (just maybe a couple of cycles slower)

2) on new CPUs only

Code: [Select]
foo:
    PUSH_RA
    :
    :
    POP_RA
    ret

There is no possibility that the return address has been tampered with.
 
The following users thanked this post: SiliconWizard

Online uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #37 on: March 31, 2023, 09:23:35 am »

But turning off "fast interrupts" and using the standard RISC-V __attribute__((interrupt)) is probably not slower, and will for sure use less stack (for this particular handler).

not really, using:

Code: [Select]
void EXTI7_0_IRQHandler(void) __attribute__((interrupt()));
void EXTI7_0_IRQHandler(void)
{
  GPIOD->BSHR = GPIO_Pin_4;
  GPIOD->BCR = GPIO_Pin_4;
  EXTI_ClearITPendingBit(EXTI_Line0);
}
results in

Code: [Select]
00000150 <EXTI7_0_IRQHandler>:
 150: fd810113          addi sp,sp,-40
 154: ca2a                sw a0,20(sp)
 156: c23a                sw a4,4(sp)
 158: c03e                sw a5,0(sp)
 15a: d206                sw ra,36(sp)
 15c: d016                sw t0,32(sp)
 15e: ce1a                sw t1,28(sp)
 160: cc1e                sw t2,24(sp)
 162: c82e                sw a1,16(sp)
 164: c632                sw a2,12(sp)
 166: c436                sw a3,8(sp)
 168: 400117b7          lui a5,0x40011
 16c: 4741                li a4,16
 16e: 40e7a823          sw a4,1040(a5) # 40011410 <__global_pointer$+0x20010be8>
 172: 40e7aa23          sw a4,1044(a5)
 176: 4505                li a0,1
 178: 2e09                jal 48a <EXTI_ClearITPendingBit>
 17a: 5092                lw ra,36(sp)
 17c: 5282                lw t0,32(sp)
 17e: 4372                lw t1,28(sp)
 180: 43e2                lw t2,24(sp)
 182: 4552                lw a0,20(sp)
 184: 45c2                lw a1,16(sp)
 186: 4632                lw a2,12(sp)
 188: 46a2                lw a3,8(sp)
 18a: 4712                lw a4,4(sp)
 18c: 4782                lw a5,0(sp)
 18e: 02810113          addi sp,sp,40
 192: 30200073          mret


 and 856 ns ~ 41 clock cycles @ 48 MHz
 
 while using:
 
Code: [Select]
void EXTI7_0_IRQHandler(void) __attribute__((interrupt("WCH-Interrupt-fast")));

 results in
 
Code: [Select]
00000150 <EXTI7_0_IRQHandler>:
 150: 400117b7          lui a5,0x40011
 154: 4741                li a4,16
 156: 40e7a823          sw a4,1040(a5) # 40011410 <__global_pointer$+0x20010be8>
 15a: 40e7aa23          sw a4,1044(a5)
 15e: 4505                li a0,1
 160: 2ced                jal 45a <EXTI_ClearITPendingBit>
 162: 30200073          mret

 and 420 ~ 20 clock cycles @ 48 MHz
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #38 on: March 31, 2023, 09:41:02 am »
I benchmarked the interrupt latency myself and put the results in another thread.

Summary: it's almost twice as fast on the CH32V003 with HPE hardware stacking enabled (0.87 us vs. 1.45 us ISR latency).

Although, admittedly that is for worst-case scenario where the ISR has to save all registers. Performance advantage is probably not so significant for an ISR that only has to save a couple.

So, uliano's results seem to correlate with mine. By the way, I presume you were measuring the period from one interrupt to the next? i.e. low pulse length.
 

Online uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #39 on: March 31, 2023, 09:52:30 am »
Oh sorry I didn't noticed that!

I don't see this example as worst case though, does my ISR really need all the 10 registers?

I'm not into assembly enough to tell, but it would seem really strange for just setting & resetting a pin + clearing the flag.
« Last Edit: March 31, 2023, 10:24:55 am by uliano »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #40 on: March 31, 2023, 10:03:16 am »

But turning off "fast interrupts" and using the standard RISC-V __attribute__((interrupt)) is probably not slower, and will for sure use less stack (for this particular handler).

not really, using:

Yes, really.

Your 21 cycles of difference is pretty much precisely the time required to do 12 memory reads to fetch an extra 48 bytes of instructions at 4 bytes per fetch and 2 cycles per fetch because of the 1 wait state from flash at 48 MHz.

But the POINT of __attribute__((interrupt)) is that it does less work because it saves only what is needed to be saved. Which would only be a0, a4, and a5 if you hadn't called a standard C function, EXTI_ClearITPendingBit().

Get that function inlined, or as a macro, and see what happens.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #41 on: March 31, 2023, 10:07:31 am »
I don't see this example as worst case though, does my ISR really need all the 10 registers?

I'm not into arm assembly enough to tell, but it would seem really strange for just setting & resetting a pin + clearing the flag.

If the interrupt routine calls a standard ABI C function then it needs to save and restore everything, because it doesn't know what that C function might overwrite.

If you're going to do that then, yes, use the hardware stacking.

Inline everything it needs into the interrupt function and you can often save and restore far fewer registers and be faster.
 

Online uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #42 on: March 31, 2023, 10:21:28 am »
And here we are

Code: [Select]
void EXTI7_0_IRQHandler(void) __attribute__((interrupt()));
void EXTI7_0_IRQHandler(void)
{
  GPIOD->BSHR = GPIO_Pin_4;
  GPIOD->BCR = GPIO_Pin_4;
  EXTI->INTFR = EXTI_Line0;
}

00000150 <EXTI7_0_IRQHandler>:
 150: 1161                addi sp,sp,-8
 152: c23a                sw a4,4(sp)
 154: c03e                sw a5,0(sp)
 156: 4741                li a4,16
 158: 400117b7          lui a5,0x40011
 15c: 40e7a823          sw a4,1040(a5) # 40011410 <__global_pointer$+0x20010be8>
 160: 40e7aa23          sw a4,1044(a5)
 164: 400107b7          lui a5,0x40010
 168: 4705                li a4,1
 16a: 40e7aa23          sw a4,1044(a5) # 40010414 <__global_pointer$+0x2000fbec>
 16e: 4712                lw a4,4(sp)
 170: 4782                lw a5,0(sp)
 172: 0121                addi sp,sp,8
 174: 30200073          mret

539ns ~ 26 clocks @ 48 MHz
 

Online uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #43 on: March 31, 2023, 10:24:17 am »
however this is a best case more than the other being the worst, I mean usually ISR do something more...
 

Online uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #44 on: March 31, 2023, 10:26:14 am »
I presume you were measuring the period from one interrupt to the next? i.e. low pulse length.

time elapsed between the pulse generated in main to the pulse generated in isr
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #45 on: March 31, 2023, 10:42:29 am »
Inline everything it needs into the interrupt function and you can often save and restore far fewer registers and be faster.

I suppose that's one argument to be made against using the vendor's SPL/HAL libraries - usage invariably involves calling functions for everything. As opposed to a more bare metal approach, just using registers directly.
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #46 on: March 31, 2023, 10:47:31 am »
539ns ~ 26 clocks @ 48 MHz

Ah, interesting. Even a favourable case is still slightly slower. But this is chasing nanoseconds now, haha.

I was going to re-run my benchmark with such a scenario, but I don't really need to bother now. :)
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #47 on: March 31, 2023, 11:13:35 am »
539ns ~ 26 clocks @ 48 MHz

Ah, interesting. Even a favourable case is still slightly slower. But this is chasing nanoseconds now, haha.

I was going to re-run my benchmark with such a scenario, but I don't really need to bother now. :)

He's running at 48 MHz where instruction fetches take 2 clock cycles instead of 1 at your 24 MHz.

It's worth checking at 24 MHz. I'm going for 0.6 µs, a little faster than the HPE case :-)
« Last Edit: March 31, 2023, 11:18:47 am by brucehoult »
 

Online uliano

  • Regular Contributor
  • *
  • Posts: 175
  • Country: it
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #48 on: March 31, 2023, 11:17:27 am »
2 clock per instruction or per 32 bits fetched from flash? It seems to me that stack related instructions here are 16 bit long.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #49 on: March 31, 2023, 11:27:11 am »
2 clock per instruction or per 32 bits fetched from flash? It seems to me that stack related instructions here are 16 bit long.

Per 32 bits.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #50 on: March 31, 2023, 12:08:34 pm »
Ah, interesting. Even a favourable case is still slightly slower. But this is chasing nanoseconds now, haha.

If you want to chase another few ns...

Compile your main program code (and any library code it calls) with "-ffixed-a4 -ffixed-a5" and then don't bother saving registers in your interrupt routine at all.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8178
  • Country: fi
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #51 on: March 31, 2023, 05:22:30 pm »
Also often the need for low latency if to do some tiny simple operation, like write one GPIO pin, then continue doing something else which is not that latency critical anymore. You can split stacking in two parts (maybe with the first part with zero registers stacked, or just one). I don't know if you can control C compiler to do this for you, so may need to use assembly for the whole ISR.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14487
  • Country: fr
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #52 on: March 31, 2023, 08:38:44 pm »
Also often the need for low latency if to do some tiny simple operation, like write one GPIO pin, then continue doing something else which is not that latency critical anymore. You can split stacking in two parts (maybe with the first part with zero registers stacked, or just one). I don't know if you can control C compiler to do this for you, so may need to use assembly for the whole ISR.

As we often discuss, use appropriate hardware peripherals as much as possible if you need to trigger some hardware action like a GPIO toggle with minimal latency from a condition that is supported by said hardware. Not always possible in all cases of course, but that should be the #1 approach before resorting to fiddling with interrupt latency. IMO.

May sound obvious but I still see a lot of this stuff done in ISRs when the hardware would have allowed much more efficient handling without resorting to interrupts and software handling.

 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #53 on: April 01, 2023, 03:46:39 am »
I can't stop liking the ARM's idea that stacking, interrupt entry and exit are completely handled in hardware, and interrupt handlers can be any usual C functions, even if it increases latency to very small/simple interrupt handlers.

It certainly has its attractions for inexperienced programmers using unmodified toolchains.

WCH *almost* achieved that. They just forgot one thing. All would be wonderful if they had added:

  • copy SP to new hidden register/CSR MISP after stacking saved registers (if not using a shadow register set) and before calling the interrupt handler.
  • when RET instruction is executed then IF in_fast_interrupt and SP = MISP THEN execute MRET instead of RET

Then you could use absolutely standard C functions as interrupt handlers, just like on Cortex-M.

Ha!  I wrote a tweet to WCH suggesting this. They just PHONED me and I had a conversation with their CTO and a couple of engineers. They like the idea and will look into it to see if they can include it in future revisions of their cores. They like the idea of making things simpler for people coming from Cortex-M, and also not needing a custom toolchain. They said if I want any chips or boards just write and they'll send some for free.
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #54 on: April 01, 2023, 05:32:47 am »
Compile your main program code (and any library code it calls) with "-ffixed-a4 -ffixed-a5" and then don't bother saving registers in your interrupt routine at all.

After looking at the docs for that and related options, a thought experiment: could one compile an ISR with several -fcall-used-reg options specifying that the set of 10 registers that HPE saves/restores are not to be saved/restored in code? Then you could in theory have HPE enabled and use the regular __attribute__((interrupt)) together.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #55 on: April 01, 2023, 06:06:07 am »
Compile your main program code (and any library code it calls) with "-ffixed-a4 -ffixed-a5" and then don't bother saving registers in your interrupt routine at all.

After looking at the docs for that and related options, a thought experiment: could one compile an ISR with several -fcall-used-reg options specifying that the set of 10 registers that HPE saves/restores are not to be saved/restored in code? Then you could in theory have HPE enabled and use the regular __attribute__((interrupt)) together.

Nice idea, but sadly it doesn't seem to do anything with an __attribute__((interrupt)) function.

https://godbolt.org/z/Mrfq8GqY3
 

Online HwAoRrDkTopic starter

  • Super Contributor
  • ***
  • Posts: 1480
  • Country: gb
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #56 on: April 01, 2023, 06:22:43 am »
Nice idea, but sadly it doesn't seem to do anything with an __attribute__((interrupt)) function.

https://godbolt.org/z/Mrfq8GqY3

Hmm, that's weird; playing with different compiler versions, it works in GCC 8.x, but not in later versions. I wonder why that is? ???

GCC 8.2.0:

Code: [Select]
handler:
        li      a4,1048576
        lw      a5,0(a4)
        addi    a5,a5,1
        sw      a5,0(a4)
        ret

GCC 12.2.0:

Code: [Select]
handler:
        addi    sp,sp,-16
        sw      a4,12(sp)
        sw      a5,8(sp)
        li      a4,1048576
        lw      a5,0(a4)
        addi    a5,a5,1
        sw      a5,0(a4)
        lw      a4,12(sp)
        lw      a5,8(sp)
        addi    sp,sp,16
        mret

Edit: Oh, I just noticed the 8.2 version has "ret" instead of "mret" - seems to be just ignoring the interrupt attribute completely.
« Last Edit: April 01, 2023, 06:24:26 am by HwAoRrDk »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data
« Reply #57 on: April 01, 2023, 07:52:51 am »
8.2 is from July 2018 which would have been marginal whether __attribute__((interrupt)) got in upstream:

https://patchwork.ozlabs.org/project/gcc/patch/20180525223027.1792-1-jimw@sifive.com/
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf