Electronics > Microcontrollers

Bizarre problem on CH32V003 with SysTick ISR corrupting UART TX data

(1/12) > >>

HwAoRrDk:
I've run into a bizarre problem with the SysTick timer interrupt on my CH32V003 project. Basically, what's happening is that a certain section of code being executed inside the SysTick ISR is somehow causing data transmitted on the UART to be corrupted. Yeah, I know, sounds crazy! But that is what I've painstakingly narrowed it down to. I am at a loss as to how this is occurring, and how to proceed in fixing it.

The problem also only occurs when my code is compiled with -Os, and not when compiled in debug mode with -g (and no -O flags).

The UART corruption takes the form of a character being corrupted roughly every N characters. For example, a string of "-----------------------------" might become "------------<-----------<----". The corrupted character is random (i.e. not a bit-flip of the correct character) but always the same at each instance. What's interesting - and this is what put me on to the trail of the SysTick ISR - is that I noticed that N is a value such that, given the baud rate the UART is operating at, it roughly corresponds to the regular period that the SysTick ISR is being called. In my case, baud is 115200, the bit period is 8.68 us, so each character takes 86.8 us (8N1, so 10 bits per character), and if N = 12 (as per example above), that's 1041 us - approximately matching the 1 ms period of the ISR!

This is my ISR's code:


--- Code: ---extern void SysTick_Handler(void) __attribute__((naked));

void SysTick_Handler(void) {
const uint32_t ticks_orig = sys_timestamp.ticks;
if(++sys_timestamp.ticks < ticks_orig) sys_timestamp.epochs++;

/**** THIS BLOCK IS THE CULPRIT (when commented, problem disappears) ****/
for(size_t i = 0; i < interval_calls.count; i++) {
if(--interval_calls.calls[i].interval_remaining_ms == 0) {
interval_calls.calls[i].callback();
interval_calls.calls[i].interval_remaining_ms = interval_calls.calls[i].interval_ms;
}
}

SysTick->SR = 0;

interrupt_return(); /* macro for "mret" inline assembly instruction */
}

--- End code ---

The SysTick timer is set up with this code:


--- Code: ---void SysTick_Init(const uint32_t hclk_freq_hz) {
SysTick->CMP = (hclk_freq_hz / 1000) - 1;
SysTick->CNT = 0;
SysTick->SR = 0;
SysTick->CTLR = STK_STRE | STK_STCLK | STK_STIE | STK_STE;
interrupt_enable(SysTicK_IRQn); /* macro to enable IRQ in PFIC->IENR */
}

--- End code ---

The ISR code is just some straightforward code that increments a timestamp, but also has a rudimentary 'callback' system that will call any registered functions at a given interval in milliseconds. I use it to sample a GPIO for button input every 20 ms.

What's stupid is that even when no interval callbacks are registered (i.e. interval_calls.count = 0, verified by dumping that value with GDB) and the loop body should not be executing, the problem still occurs, so it's not like it's inadvertently calling some random function pointer and jumping of into who-knows-where and clobbering the UART registers or something.

In trying to get some insight into this by debugging with GDB, there's also something wacky going on there concerning the ISR that I can't explain. According to the disassembly produced by objdump (from the compiled .elf), SysTick_Handler starts at address 0xf5c, yet GDB thinks SysTick_Handler starts at 0xf66 if I set a breakpoint by name. If I set a breakpoint by address at 0xf5c, it doesn't get hit! [False alarm, somehow I managed to set the breakpoint to a disabled state.] I don't know WTF is going on there, but I have a feeling it might be part of the problem.

Here's the disassembly of the ISR in case it helps:


--- Code: ---00000f5c <SysTick_Handler>:
     f5c: 82818793          addi a5,gp,-2008 # 20000028 <sys_timestamp>
     f60: 43d4                lw a3,4(a5)
     f62: 43d8                lw a4,4(a5)
     f64: 0705                addi a4,a4,1 # 1001 <uart_init+0x53>
     f66: c3d8                sw a4,4(a5)
     f68: 00d77963          bgeu a4,a3,f7a <SysTick_Handler+0x1e>
     f6c: 0007c703          lbu a4,0(a5)
     f70: 0705                addi a4,a4,1
     f72: 0ff77713          zext.b a4,a4
     f76: 00e78023          sb a4,0(a5)
     f7a: 83c18413          addi s0,gp,-1988 # 2000003c <interval_calls>
     f7e: 4481                li s1,0
     f80: 8722                mv a4,s0
     f82: 431c                lw a5,0(a4)
     f84: 00f4e863          bltu s1,a5,f94 <SysTick_Handler+0x38>
     f88: e000f7b7          lui a5,0xe000f
     f8c: 0007a223          sw zero,4(a5) # e000f004 <__global_pointer$+0xc000e804>
     f90: 30200073          mret
     f94: 441c                lw a5,8(s0)
     f96: 17fd                addi a5,a5,-1
     f98: c41c                sw a5,8(s0)
     f9a: e799                bnez a5,fa8 <SysTick_Handler+0x4c>
     f9c: 445c                lw a5,12(s0)
     f9e: 9782                jalr a5
     fa0: 405c                lw a5,4(s0)
     fa2: 83c18713          addi a4,gp,-1988 # 2000003c <interval_calls>
     fa6: c41c                sw a5,8(s0)
     fa8: 0485                addi s1,s1,1
     faa: 0431                addi s0,s0,12
     fac: bfd9                j f82 <SysTick_Handler+0x26>

--- End code ---

I don't know where to proceed with this. :( Any suggestions for avenues of attack?

brucehoult:
Presumably you have enabled their "fast interrupt" register stacking thing.

I worry a little about calling a lot of code from that state, but haven't really thought about it.

I absolutely worry about the timing implications of calling an arbitrary amount of code from an interrupt handler. I think you should increment the tick count there and nothing else. Put the callback scheduling stuff in non-interrupt code.

HwAoRrDk:

--- Quote from: brucehoult on March 29, 2023, 03:44:01 am ---Presumably you have enabled their "fast interrupt" register stacking thing.

--- End quote ---

Yes, hardware prologue/epilogue (HPE) is being used, but not vector table-free (VTF).


--- Quote from: brucehoult on March 29, 2023, 03:44:01 am ---I worry a little about calling a lot of code from that state, but haven't really thought about it.

I absolutely worry about the timing implications of calling an arbitrary amount of code from an interrupt handler. I think you should increment the tick count there and nothing else. Put the callback scheduling stuff in non-interrupt code.

--- End quote ---

The callback stuff is only really an abstraction to allow me to separate code into different files rather than pile stuff into the ISR. Technically it could be calling an arbitrary amount of code, but in practice it isn't, and I know not to use it like that. It's only a very small amount of code in the callbacks. I started out with two: one to toggle an LED every 500 ms (kind of a 'heartbeat'); a second to sample a GPIO for a button every 20 ms. Not exactly taxing. :) Got rid of the former and now I have just the latter.

As I mentioned before, everything works just fine when compiled without -Os. So something about the optimisation is screwing things up, or perhaps revealing an unseen issue.

One further thing I've noticed when attempting to investigate with GDB, is that the corrupted characters appear after the ISR returns. That is, if my code is printing a long string of characters, it happens when the ISR interrupts execution somewhere inside my putchar()-like function that writes a single character to the UART data register. But I couldn't get a handle on what exactly was being affected or how.

Aside: I am finding it quite hard to read RISC-V assembly, as the ordering of operands is not consistent. Sometimes the destination is listed first (e.g. "lw a4,4(a5)"), sometimes last (e.g. "sw a4,4(a5)"). Very confusing. Why is that?

brucehoult:

--- Quote from: HwAoRrDk on March 29, 2023, 05:09:21 am ---As I mentioned before, everything works just fine when compiled without -Os. So something about the optimisation is screwing things up, or perhaps revealing an unseen issue.

--- End quote ---

I think the problem is probably in the code that gets interrupted.

This code looks fine, other than not being as optimised as I'd like. For example, loading from 4(a5) in both the 2nd and 3rd instructions, which could result in different values, which could break your code.

I'd expect the optimiser to combine those loads. If it hasn't then I imagine it is because
--- Code: ---sys_timestamp.ticks
--- End code ---
is declared as volatile in code we don't see. In which case it is your responsibility to make sure you load it only once.

In fact in general that you don't factor out things such as
--- Code: ---interval_calls.calls[i]
--- End code ---
in your code scares me.



--- Quote ---Aside: I am finding it quite hard to read RISC-V assembly, as the ordering of operands is not consistent. Sometimes the destination is listed first (e.g. "lw a4,4(a5)"), sometimes last (e.g. "sw a4,4(a5)"). Very confusing. Why is that?

--- End quote ---

"store" is the *only* exception, I imagine for mental compatibility with every other ISA that has instructions called "load" and "store", not "move" or both load and store being "ld". ARM for example. Or MIPS.

HwAoRrDk:

--- Quote from: brucehoult on March 29, 2023, 07:00:02 am ---I think the problem is probably in the code that gets interrupted.

--- End quote ---

That's a thought I had when I realised the corruption happens at the point the ISR has returned back into UART code. But the code in my uart_putchar() is really simple, such that I don't see how it could go wrong.


--- Code: ---int uart_putchar(int c) {
if(!uart_opts.binary && c == '\n') uart_putchar('\r');
while(!(USART1->STATR & USART_STATR_TXE));
USART1->DATAR = c & USART_DATAR_DR;
return c;
}

--- End code ---

Yes, I know, a recursive call. But it's inherently bounded, so sue me. ;D

Maybe I need to figure out how to get GDB to break on every write to USART1->DATAR, and see what value is being written to the register. Perhaps that will shed some light on whether the corruption comes from the character to be transmitted being corrupted before being written to USART1->DATAR.


--- Quote from: brucehoult on March 29, 2023, 07:00:02 am ---This code looks fine, other than not being as optimised as I'd like. For example, loading from 4(a5) in both the 2nd and 3rd instructions, which could result in different values, which could break your code.

I'd expect the optimiser to combine those loads. If it hasn't then I imagine it is because
--- Code: ---sys_timestamp.ticks
--- End code ---
is declared as volatile in code we don't see. In which case it is your responsibility to make sure you load it only once.

--- End quote ---

Indeed, sys_timestamp is volatile. Not sure what you mean by "your responsibility to make sure you load it only once". You mean in case it changes in-between accesses? That won't happen, because the SysTick_Handler ISR is the only thing that ever writes to it - after all, that's its job, and no-where else's.


--- Quote from: brucehoult on March 29, 2023, 07:00:02 am ---In fact in general that you don't factor out things such as
--- Code: ---interval_calls.calls[i]
--- End code ---
in your code scares me.

--- End quote ---

Again, not sure what you mean by this. You mean "factor out" as in make a pointer instead?


--- Code: --- for(size_t i = 0; i < interval_calls.count; i++) {
interval_call_t *c = &interval_calls.calls[i];
if(--c->interval_remaining_ms == 0) {
c->callback();
c->interval_remaining_ms = c->interval_ms;
}
}

--- End code ---

How would that have a meaningful change to things? It's not like i is going to change mid-loop... :-//


--- Quote from: brucehoult on March 29, 2023, 07:00:02 am ---"store" is the *only* exception, I imagine for mental compatibility with every other ISA that has instructions called "load" and "store", not "move" or both load and store being "ld". ARM for example. Or MIPS.

--- End quote ---

It is? Ah, well, I'm not familiar with ARM, or MIPS, or x86, etc. Still, it's a bit taxing mentally when every third instruction in RISC-V code seems to be a load or store of some kind. I suppose I'll get used to it eventually. :)

Navigation

[0] Message Index

[#] Next page

There was an error while thanking
Thanking...
Go to full version
Powered by SMFPacks Advanced Attachments Uploader Mod