Author Topic: Ideas for code instrumentation (Read 7246 times)

Siwastaja · « **on:** November 12, 2022, 11:38:11 am »

I like to error() out when fundamental sanity checks fail in ways that have no obvious/easy recovery. I was thinking about error handlers, and the quite obvious approach is to store as much information as possible in a RAM segment not cleared by the init code, then just reset, and let the networking code (or whatever is relevant for the project) deliver the "oh, we rebooted with error code 123, with the following extra info: ..." message.
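
A minimal sketch of the "keep it in RAM across reset" idea, not taken from any poster's code. It assumes the linker script provides a `.noinit` section that the startup code leaves untouched; the names `crash_record_store` and `crash_record_take` and the magic value are made up for illustration. A magic word plus a simple checksum distinguish a real record from power-on garbage.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: ".noinit" exists in the linker script and is neither zeroed
 * nor initialized at startup. On a PC build the attribute is dropped so
 * the sketch still compiles and runs. */
#if defined(__arm__)
#define NOINIT __attribute__((section(".noinit")))
#else
#define NOINIT
#endif

#define CRASH_MAGIC 0xC0DEFA11u   /* hypothetical marker value */

typedef struct {
    uint32_t magic;      /* CRASH_MAGIC when the record is valid */
    uint32_t code;       /* error code passed to error() */
    uint32_t extra[4];   /* free-form context words */
    uint32_t check;      /* checksum over the fields above */
} crash_record_t;

NOINIT static crash_record_t crash_record;

static uint32_t crash_checksum(const crash_record_t *r)
{
    uint32_t sum = r->magic ^ r->code;
    for (size_t i = 0; i < 4; i++)
        sum = ((sum << 1) | (sum >> 31)) ^ r->extra[i];
    return sum;
}

/* Call from error() just before triggering the reset. */
void crash_record_store(uint32_t code, const uint32_t extra[4])
{
    crash_record.magic = CRASH_MAGIC;
    crash_record.code = code;
    for (size_t i = 0; i < 4; i++)
        crash_record.extra[i] = extra ? extra[i] : 0u;
    crash_record.check = crash_checksum(&crash_record);
}

/* Call early after boot. Returns 1 and copies the record out if the
 * previous run left a valid one; the record is then invalidated so it is
 * reported only once. */
int crash_record_take(crash_record_t *out)
{
    if (crash_record.magic != CRASH_MAGIC ||
        crash_record.check != crash_checksum(&crash_record))
        return 0;
    *out = crash_record;
    crash_record.magic = 0u;
    return 1;
}
```

After reset, the networking code would call `crash_record_take()` once and, if it returns 1, send `code` and `extra[]` along with the usual "we rebooted" message.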

A call graph that led into the error would be nice, and on architectures where return addresses for function calls are stored on the stack, it's just a matter of scanning the stack and storing values which are valid code addresses (with the possibility of some rare false positives: data that happens to fall within the code segment address range but is not actually a return address).
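
The stack scan described above could look roughly like this. It is a sketch under assumptions: 32-bit addresses, a descending stack (so the lowest address holds the innermost frame), and code bounds passed in by the caller (typically taken from linker symbols). The function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Scan `words` stack words starting at the current stack pointer and keep
 * every value that falls inside [text_start, text_end). Values are stored
 * in scan order, i.e. innermost candidate first on a descending stack.
 * False positives are possible: any data word in that range is kept. */
size_t scan_for_code_addrs(const uint32_t *stack, size_t words,
                           uint32_t text_start, uint32_t text_end,
                           uint32_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < words && n < max_out; i++) {
        uint32_t v = stack[i];
        if (v >= text_start && v < text_end)
            out[n++] = v;
    }
    return n;
}
```

The collected words can then be stored in the crash record and resolved to function names offline with the map file or a disassembly.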

But ARM Cortex-M only stacks return addresses when entering an interrupt handler; normal function calling uses the LR register.

So maybe this: at the beginning of each function (except some super performance critical funcs), use some helper macro to allocate 4-8 bytes on the stack, and store some magic number, plus any kind of identifier for the function, like a #defined unique constant for each .c file plus the __LINE__ number - these would fit in maybe 20 bits. Maybe something else, too: any status information that can be automatically discarded after function return. Once the function exits, the stack allocation disappears and here we go, an easily generated call graph, plus the magic number reduces false entries.
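
One way the helper macro could be sketched. The bit layout (12-bit magic, 8-bit file ID, 12-bit line number, so the identifier takes 20 bits) and all names are assumptions for illustration. The `volatile` qualifier is there to keep the compiler from optimizing the store away; in practice that gives the word a stack slot, but the magic value must be chosen so that it cannot be mistaken for a valid RAM or flash address on the target.

```c
#include <stdint.h>
#include <stddef.h>

#define TAG_MAGIC 0xA5Cu   /* hypothetical 12-bit magic in the top bits */

/* 12-bit magic | 8-bit file ID | 12-bit line number */
#define TAG_WORD(file_id, line)                        \
    (((uint32_t)TAG_MAGIC << 20) |                     \
     (((uint32_t)(file_id) & 0xFFu) << 12) |           \
     ((uint32_t)(line) & 0xFFFu))

/* Place at the top of a function. FILE_ID must be #defined once per
 * .c file. The word lives on the stack until the function returns. */
#define FUNC_TAG()                                              \
    volatile uint32_t func_tag_ = TAG_WORD(FILE_ID, __LINE__);  \
    (void)func_tag_

int      tag_is_valid(uint32_t w) { return (w >> 20) == TAG_MAGIC; }
unsigned tag_file(uint32_t w)     { return (w >> 12) & 0xFFu; }
unsigned tag_line(uint32_t w)     { return w & 0xFFFu; }

/* Error-handler side: collect every tagged word found on the stack. */
size_t tag_collect(const uint32_t *stack, size_t words,
                   uint32_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < words && n < max_out; i++)
        if (tag_is_valid(stack[i]))
            out[n++] = stack[i];
    return n;
}

#define FILE_ID 7u   /* example value for this file */

int example_function(int x)
{
    FUNC_TAG();
    return x + 1;
}
```

With this layout, line numbers above 4095 and more than 256 files would need a different split of the 20 bits.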

Any comments on this, or completely orthogonal run-time instrumentation ideas? Things that have helped you to catch those unimaginable bugs that only happen after you have a lot of units in the field, in some weird corner case, so that instead of trying to reproduce the bug, you already have enough data to possibly solve it outright, or at least get a massive hint on how to try to reproduce it. (Anything else except "make it completely bug free by single-stepping in debugger on lab table").

nctnico · « **Reply #1 on:** November 12, 2022, 02:46:30 pm »

I'm not quite sure if you really want to do a complete stack trace. What has helped me a lot over the years is to have functions print an error to a UART output in case a bad parameter is encountered or some external error is detected, and to make it possible to capture this output. With at least a different return value for an error, but preferably some more information about the error and the state of the system, you can quickly trace back where the program went wrong. Of course you can make this logging internal, into an SPI flash or similar that can be read out later, but I never went this far; the assumption is that most units simply run without any problems and those with problems get a PC hooked up to them to do logging.

Siwastaja · « **Reply #2 on:** November 12, 2022, 02:57:33 pm »

Yes, my go-to handler has been error(int code), which simply runs an infinite loop that prints the code to UART and blinks the code with an LED, in case the UART becomes inoperable or the thing is at a customer who has no access to the UART but can always count the blinks.

My question is, why not? Implementing something like call trace should not be many hours of work, and it is entirely possible it will save months of bug-hunting in case a serious and very rarely reoccurring bug remains in production.

A complete stack dump might not be reasonable because one easily runs out of storage and/or an interface to send it out, especially if one has many kilobytes of buffers on the stack. But the call stack should be just 10-15 levels deep max (I rarely use recursive functions on embedded). Due to interrupts (possibly many nested interrupts), the call path can be pretty unpredictable, so func() {if (condition) error(57);} is not always reached by an obvious path at all; there can be some weird sequence of events. And due to interaction with peripherals, it is not possible to analyze it on a PC in advance. Maybe the path to func() is the same 99.99999% of the time, but once a year some peripheral acts a bit differently, e.g. causing an unexpected interrupt.

dietert1 · « **Reply #3 on:** November 12, 2022, 04:24:33 pm »

One can also use a short cyclic buffer to save this type of info, e.g. with 20 entries. I remember making a procedure that saved a string pointer for a constant string = name of function together with processor cycle value to the cyclic buffer. Then called this helper at each function entry. This method can be useful if there are multiple tasks/stacks. Using a macro one can make this a compile time option.
Don't remember whether it was for post-mortem analysis or for profiling. I could use the debugger to translate the string pointers into strings, so there was a plain text display in the watch panel.

Regards, Dieter

ataradov · « **Reply #4 on:** November 12, 2022, 05:58:39 pm »

The buffer idea is what we used for a wireless stack that could run on 100s of nodes. If you have a bug that only happens overnight in a busy network on one of the nodes, then it is not viable to have all of them under the debugger (this was on AVRs, so no live attach).

We had a cyclic buffer and statements that add a marker (single byte). The markers were placed manually to better reflect expected code flow. The markers were not only placed at function entries, but also recorded which branches were taken and sometimes even the values of variables.

The buffer would be continuously dumped on UART on assert(), unexpected reset or just a button press.
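
A small sketch of such a single-byte marker buffer. The names, the buffer length and the dump interface are assumptions, not the original wireless-stack code. The buffer keeps only the most recent markers; the dump routine returns them oldest first so they can be pushed out over UART. If markers are also written from interrupt handlers, the write would need to be protected against preemption, which is not shown here.

```c
#include <stdint.h>
#include <stddef.h>

#define TRACE_LEN 32u   /* power of two so the index can be masked */

static uint8_t  trace_buf[TRACE_LEN];
static uint32_t trace_head;   /* total number of markers written so far */

/* Record one marker; the oldest entry is overwritten when full. */
void trace_mark(uint8_t marker)
{
    trace_buf[trace_head & (TRACE_LEN - 1u)] = marker;
    trace_head++;
}

/* Copy the stored markers, oldest first, into `out`.
 * Returns the number of markers copied. */
size_t trace_dump(uint8_t *out, size_t max_out)
{
    uint32_t count = trace_head < TRACE_LEN ? trace_head : TRACE_LEN;
    uint32_t start = trace_head - count;
    size_t n = 0;
    for (uint32_t i = 0; i < count && n < max_out; i++)
        out[n++] = trace_buf[(start + i) & (TRACE_LEN - 1u)];
    return n;
}
```

Typical use is a `trace_mark(0x21);` on each interesting branch, with a table in the source mapping marker values to locations.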

Some compilers (GCC included) give you the ability to add instrumentation functions on function entry/exit. You can use that to add automatic logging too.

NorthGuy · « **Reply #5 on:** November 12, 2022, 06:37:12 pm »

I did two things:

- Post-mortem dumps into internal flash, so you can read it later

- Ability to read RAM at run time. Then you can create various variables, counters etc. and then you can check them all. Very handy. The drawback is that this is not a snapshot, so it is not synchronous.

More importantly, I try to avoid rare cases - design algorithms and routines which do not have special rare cases, try to make timing deterministic etc. etc.

SiliconWizard · « **Reply #6 on:** November 12, 2022, 06:42:00 pm »

Why not, but one question is, what do you do with the trace information that you can read from RAM? How do you transmit it from the device in the field to the developers?

Siwastaja · « **Reply #7 on:** November 12, 2022, 07:03:38 pm »

Quote from: SiliconWizard on November 12, 2022, 06:42:00 pm

Why not, but one question is, what do you do with the trace information that you can read from RAM? How do you transmit it from the device in the field to the developers?

Assuming it's truly a rare error, then reset will fix the device enough to communicate using whatever interfaces you have available. You can keep RAM content through reset, then send the data.

Of course, sometimes you have no interfaces, then you're out of luck. But such devices tend to be quite simple, so maybe you don't need sophisticated reporting, but a simple error code via led blink count (and customer calling you) is enough.

ataradov · « **Reply #8 on:** November 12, 2022, 07:06:41 pm »

This is useful for debugging during development. What to do with the device in the field is a separate question. But just saving into the internal MCU flash is probably the most reliable way if you are doing RMAs. I would not expect to get any useful cooperation from the majority of customers. If your device has an SD card, then saving a file there and asking for that file is probably the most you can expect.

nctnico · « **Reply #9 on:** November 12, 2022, 08:41:50 pm »

Quote from: NorthGuy on November 12, 2022, 06:37:12 pm

More importantly, I try to avoid rare cases - design algorithms and routines which do not have special rare cases, try to make timing deterministic etc. etc.

Yes and no. Yes to designing software not to have too many paths, but definitely no to making timing deterministic. Deterministic timing is next to impossible to achieve for software consisting of more than a few lines of code, so it is better to design software in a way that timing doesn't matter, or so that synchronisation can be recovered.

For example: when I need to read values from a microcontroller's ADC I don't use interrupts or anything like that. I let the ADC sample free-running at a higher rate than needed, so every time I need updated ADC values in my software, the values are simply there, guaranteed.

tggzzz · « **Reply #10 on:** November 12, 2022, 09:24:41 pm »

The best answer to the OP's question completely depends on the type of "error" that is being caught and diagnosed.

If you are debugging your FSMs' interactions with other equipment, capture the sequence of events and states.

If you are debugging memory problems, then track what causes allocs and frees.

Leave all the many fine-grained logging statements in the code, but ensure evaluating whether something should be logged is computationally trivial. The Java Log4J is a good example, but no doubt there are others.
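
In C, the "trivial check before logging" idea can be approximated with a macro that compares the level against a run-time threshold before any argument is evaluated. This is a sketch with made-up names, not Log4J's API; `printf` stands in for whatever output channel the device has.

```c
#include <stdio.h>

enum { LOG_ERROR = 0, LOG_WARN = 1, LOG_INFO = 2, LOG_DEBUG = 3 };

int log_threshold = LOG_WARN;   /* can be changed at run time */

/* The arguments are only evaluated (and formatted) when the level passes
 * the threshold, so a disabled statement costs one compare and branch. */
#define LOG(level, ...)                      \
    do {                                     \
        if ((level) <= log_threshold) {      \
            printf(__VA_ARGS__);             \
            putchar('\n');                   \
        }                                    \
    } while (0)

/* Stand-in for an expensive state query, used to show lazy evaluation. */
int expensive_calls;
int expensive_state(void) { expensive_calls++; return 42; }
```

With the threshold at `LOG_WARN`, a statement such as `LOG(LOG_DEBUG, "state=%d", expensive_state());` never calls `expensive_state()`.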

SiliconWizard · « **Reply #11 on:** November 12, 2022, 09:41:38 pm »

Quote from: tggzzz on November 12, 2022, 09:24:41 pm

The Java Log4J is a good example, but no doubt there are others

A good example of a catastrophe?

westfw · « **Reply #12 on:** November 12, 2022, 11:12:41 pm »

Quote

ARM Cortex-M only stacks return addresses when entering an interrupt handler

Huh? Cortex-M stacks full context on entering an ISR. You can't count on there having been a valid stack frame when the interrupt occurred (stack frames seem to frequently be optimized away), and LR itself will have magic numbers, but you should be able to derive a good stack trace without much more effort than a check for those magic numbers...

ataradov · « **Reply #13 on:** November 12, 2022, 11:24:22 pm »

His point was the opposite. You can get a real return address from the ISR stack frame, but you can't derive full call graph from the stack on multiple normal function calls.
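
For reference, the basic Cortex-M exception frame (without floating-point state) is eight words pushed by hardware: R0-R3, R12, LR, the return address and xPSR, lowest address first. A sketch of pulling the interrupted PC and LR out of it follows; the function and type names are hypothetical, and obtaining the right stack pointer in the handler (MSP or PSP, depending on EXC_RETURN) needs a few lines of assembly that are not shown.

```c
#include <stdint.h>

/* Word offsets within the hardware-stacked exception frame. */
enum {
    EXC_R0, EXC_R1, EXC_R2, EXC_R3, EXC_R12,
    EXC_LR,    /* LR of the interrupted code */
    EXC_PC,    /* address the interrupted code would resume at */
    EXC_XPSR,
    EXC_WORDS
};

typedef struct {
    uint32_t pc;
    uint32_t lr;
} fault_site_t;

/* `frame` is the stack pointer value that was active when the exception
 * was taken, as seen on handler entry. */
fault_site_t fault_site_from_frame(const uint32_t *frame)
{
    fault_site_t s;
    s.pc = frame[EXC_PC];
    s.lr = frame[EXC_LR];
    return s;
}
```

This gives one reliable caller/callee pair at the point of interruption; anything deeper still has to come from scanning or instrumentation.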

You can do this with IAR because it uses (used?) separate call and data stacks.

As a less automated way of extracting the call stack, I usually use a Python script that takes the disassembly (raw objdump output) and the bytes of the stack. The script then tries to match words on the stack to the addresses of the branch instructions. This reproduces the call stack very reliably.

But if you really want full and complete information, then look at "-finstrument-functions" for GCC. It generates an instrumentation function call for each function entry and exit. You can log call sites in those functions. It is not a good idea for final code, as it generates a lot of overhead, but it is useful for debugging.

tggzzz · « **Reply #14 on:** November 12, 2022, 11:54:54 pm »

Quote from: SiliconWizard on November 12, 2022, 09:41:38 pm

Quote from: tggzzz on November 12, 2022, 09:24:41 pm
The Java Log4J is a good example, but no doubt there are others

A good example of a catastrophe?

A good example of averting a "catastrophe".

I've successfully used it - in a soft realtime system - to prevent lawyers becoming involved in a dispute between companies.

voltsandjolts · « **Reply #15 on:** November 13, 2022, 10:23:08 am »

Quote from: Siwastaja on November 12, 2022, 07:03:38 pm

Of course, sometimes you have no interfaces, then you're out of luck. But such devices tend to be quite simple, so maybe you don't need sophisticated reporting, but a simple error code via led blink count (and customer calling you) is enough.

Yup, for simpler systems with no UI, SDcard, USB etc you are stuck with a blinking LED, which is better than nothing.

Almost every customer does have a mobile phone though, and they could provide a 1 minute video of said LED(s). You could then decode maybe a byte-per-second from the video. Here is a video decoder for Morse code, but something else would be more appropriate here, Manchester maybe. Dunno, stupid idea.

Siwastaja · « **Reply #16 on:** November 13, 2022, 11:59:04 am »

Quote from: voltsandjolts on November 13, 2022, 10:23:08 am

Almost every customer does have a mobile phone though, and they could provide a 1 minute video of said LED(s). You could then decode maybe a byte-per-second from the video.

What an excellent idea, didn't think about it. I agree you could safely encode approx. 8-10 bits or so per second. If the blink pattern repeats every 30 seconds and you can ask the customer to video it for 1 minute to catch one full sequence, that's already over 200 bits of payload. That's an error code, a state transition graph with a dozen previous states maybe even with timestamps, a dozen interesting flags (peripheral status flags for example), and indicator flags for branches taken. Not bad!

Regarding the workload, if this is truly a "once in a lifetime" corner case, then the developer is happy to spend half an hour to decode the signal manually by looking at the video frame-by-frame. If it becomes a larger issue, then it pays back to write a simple video processing tool.

You can encode even more data on it, so that "LED on" is actually a high-frequency (a few MHz) signal which just looks solid on camera. Then you can get said ~200 bits with a camera phone, but once you connect logic analyzer to it (optically or electrically), you can read out much more. But this is getting into the "no one will ever need to do such a thing" territory.

dietert1 · « **Reply #17 on:** November 13, 2022, 12:48:30 pm »

Probably somewhere in this world an app already exists that reads a LED blinking sequence like a QR code and outputs the data in some format.

Alti · « **Reply #18 on:** November 13, 2022, 03:29:46 pm »

You could also embed a USB debugger into a product and then ask a customer to plug in USB and run a remote GDB server on their computer. If that is a Cortex-M3/M4 then you could even embed the gdb server into the product firmware (no additional hardware needed) because it has a dedicated Debug Monitor IRQ. I have not tried it but it seems technically possible. Via USB, UART or maybe Ethernet.

voltsandjolts · « **Reply #19 on:** November 13, 2022, 08:02:48 pm »

Quote from: Siwastaja on November 13, 2022, 11:59:04 am

Regarding the workload, if this is truly a "once in a lifetime" corner case, then the developer is happy to spend half an hour to decode the signal manually by looking at the video frame-by-frame. If it becomes a larger issue, then it pays back to write a simple video processing tool.

Hmm, I wonder if scope decode could be used.
Hook an LDR/photodiode to the scope, play the video and hold the LDR/pd against the screen.
Heh, a lazy electronics engineer's way to avoid writing video decoder stuff

tellurium · « **Reply #20 on:** November 14, 2022, 01:28:01 pm »

What we did on one of our firmware projects (non-ARM, but ESP32) was the following:

1. Installed a crash handler which saved the backtrace to a special place in RAM before rebooting:

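```c
RTC_NOINIT_ATTR char backtrace_buf[512];  // Crash backtrace buffer

// The __wrap_/__real_ names follow the GNU ld --wrap=symbol convention:
// calls to xt_unhandled_exception land in the wrapper below, and the
// original handler stays reachable as __real_xt_unhandled_exception.
extern void __real_xt_unhandled_exception(void *);
IRAM_ATTR void __wrap_xt_unhandled_exception(XtExcFrame *frame) {
  do_backtrace(frame, 0);                 // save the backtrace first
  __real_xt_unhandled_exception(frame);   // then continue with the default handling
}
```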

2. When a device reconnects to the cloud, it uploads the contents of the backtrace buffer if non-empty

This way we knew which devices crash at which places.

voltsandjolts · « **Reply #21 on:** December 12, 2022, 11:15:02 am »

Quote from: voltsandjolts on November 13, 2022, 08:02:48 pm

Hmm, I wonder if scope decode could be used.
Hook an LDR/photodiode to the scope, play the video and hold the LDR/pd against the screen.
Heh, a lazy electronics engineer's way to avoid writing video decoder stuff

Heh, just tried this for fun.
Soft uart at 10 baud 8N1 driving LED.
Video the LED.
Playback the video with crude LDR detector just above LED on screen.
Serial decode LDR data using logic analyser (because my scope decode is 100 baud minimum).
Works fine.
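
For anyone wanting to repeat the experiment, the 8N1 framing is simple enough to generate by hand: one start bit (0), eight data bits LSB first, one stop bit (1), so 10 bit times per byte, which at 10 baud is one byte per second. The sketch below only expands bytes into a sequence of line levels; the function name is made up, and whether "LED on" means 1 or 0 is left to the caller. A 100 ms timer tick would then output one level per tick.

```c
#include <stdint.h>
#include <stddef.h>

#define BITS_PER_FRAME 10u   /* 1 start + 8 data + 1 stop */

/* Expand `len` bytes into 8N1 line levels (1 = idle/mark, 0 = space).
 * Only whole frames are written. Returns the number of levels stored. */
size_t uart_8n1_encode(const uint8_t *data, size_t len,
                       uint8_t *levels, size_t max_levels)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (max_levels - n < BITS_PER_FRAME)
            break;
        levels[n++] = 0;                                  /* start bit */
        for (unsigned b = 0; b < 8; b++)
            levels[n++] = (uint8_t)((data[i] >> b) & 1u); /* LSB first */
        levels[n++] = 1;                                  /* stop bit */
    }
    return n;
}
```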

Very large creepage and clearance opto isolator

Doctorandus_P · « **Reply #22 on:** December 13, 2022, 03:36:02 pm »

I also use the serial data format and a Logic Analyzer.

I wrote a little library with "DEBUG(x)" macros that turned the "x" into a start bit, 5 data bits and a stop bit (using set-bit or clear-bit asm instructions).
This worked quite nicely on the AVR controllers I used, but the bitrate was too high for the ARM controllers I used later.
For even lower overhead, bytes can be written to a spare UART or SPI bus. I2C may also work if you can disable / ignore the absence of the "ACK" bits.

My little debug lib also has an option to replace the debug statements with an empty statement, so it does not have any runtime penalty at all.
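
A rough idea of how such a macro with a compile-time off switch might be structured. This is not the original library: the names are invented, the bit order (LSB first) and polarity are assumptions, and the pin write is a stand-in that records levels so the sketch can run on a PC. On a real MCU `debug_pin_write` would be a single set-bit or clear-bit instruction, and every write would have to take the same time for a logic analyzer to decode the frame.

```c
#include <stdint.h>

#ifndef DEBUG_TRACE_ENABLED
#define DEBUG_TRACE_ENABLED 1
#endif

/* Stand-in for the port write: records the levels instead of driving a pin. */
uint8_t  debug_levels[64];
unsigned debug_level_count;

void debug_pin_write(unsigned level)
{
    if (debug_level_count < sizeof debug_levels)
        debug_levels[debug_level_count++] = (uint8_t)level;
}

/* Emit one frame: start bit, 5 data bits (LSB first), stop bit. */
void debug_emit(unsigned value)
{
    debug_pin_write(0);
    for (unsigned b = 0; b < 5; b++)
        debug_pin_write((value >> b) & 1u);
    debug_pin_write(1);
}

#if DEBUG_TRACE_ENABLED
#define DEBUG(x) debug_emit(x)
#else
#define DEBUG(x) ((void)0)   /* compiles to nothing */
#endif
```

Building with `-DDEBUG_TRACE_ENABLED=0` turns every `DEBUG(x)` into an empty statement.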

The beauty of using a logic analyzer is that you also can correlate code execution in the uC with other external events.
Sometimes I've also added a simple toggle of the same bit when the uC is idling. This can for example be used to measure ISR latency.

I've heard that some (a lot, most?) of the ARM uCs have some trace capability and you may be able to log that.

ejeffrey · « **Reply #23 on:** December 13, 2022, 04:59:21 pm »

Quote from: Siwastaja on November 12, 2022, 11:38:11 am

But ARM Cortex-M only stacks return addresses when entering an interrupt handler; normal function calling uses the LR register.

So maybe this: at the beginning of each function (except some super performance critical funcs), use some helper macro to allocate 4-8 bytes on the stack, and store some magic number, plus any kind of identifier for the function, like a #defined unique constant for each .c file plus the __LINE__ number - these would fit in maybe 20 bits. Maybe something else, too: any status information that can be automatically discarded after function return. Once the function exits, the stack allocation disappears and here we go, an easily generated call graph, plus the magic number reduces false entries.

I believe the deprecated GCC option -mapcs-frame causes the generated binaries to be built with a traditional frame pointer which you can easily unwind in a fault handler. I don't know if this is even supported on Cortex-M and I don't know if it has been removed entirely from GCC. You can also compile with exception handling (-fexceptions or -funwind-tables) to enable the necessary information to generate a stack trace at runtime.

If you don't do that, I think the only way to reliably get a stack trace is with the debug symbols.

Nominal Animal · « **Reply #24 on:** December 13, 2022, 05:42:13 pm »

Like Ataradov mentioned in #4 and #13, GCC's (and Clang's) -finstrument-functions option causes the function
void __cyg_profile_func_enter (void *this_fn, void *call_site);
to be called just after every function entry, even an inlined one, and
void __cyg_profile_func_exit (void *this_fn, void *call_site);
just before exit from that function. They're always paired. The first parameter identifies the function (it is the same as the address of the function being called), and the second identifies where the function was called from; i.e. callee and caller addresses, respectively.

If you expand that into a triplet that includes the stack pointer at the entry, you could save these into a separate call trace stack (three pointers per nesting level), and even track how full the stack is, especially during development (I'm thinking of testing phase, where the devices are not necessarily in a testbench setup in the developer's desk, but tested by butterfinger users like myself). As this would be separate from the standard stack, you might even be able to detect a corner case stack buffer overrun.
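
A sketch of that separate call trace stack. The two hook signatures are the ones GCC and Clang use with -finstrument-functions; everything else (names, depth, the use of `__builtin_frame_address(0)` as an approximation of the stack pointer) is an assumption. The slot index is claimed before the entry is written so that an interrupt arriving in between does not clobber it; with an RTOS, one such trace per task would be needed.

```c
#include <stdint.h>
#include <stddef.h>

#define NO_INSTR __attribute__((no_instrument_function))
#define CALL_TRACE_DEPTH 16u

typedef struct {
    void *fn;         /* callee (address of the function entered) */
    void *call_site;  /* caller (where it was called from) */
    void *sp;         /* approximate stack position at entry */
} call_entry_t;

call_entry_t call_trace[CALL_TRACE_DEPTH];
unsigned     call_depth;                 /* may exceed CALL_TRACE_DEPTH */
uintptr_t    lowest_sp = UINTPTR_MAX;    /* deepest stack use seen,
                                            assuming a descending stack */

NO_INSTR void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    /* Frame address of the hook itself: slightly below the caller's SP. */
    void *sp = __builtin_frame_address(0);
    unsigned slot = call_depth++;

    if (slot < CALL_TRACE_DEPTH) {
        call_trace[slot].fn = this_fn;
        call_trace[slot].call_site = call_site;
        call_trace[slot].sp = sp;
    }
    if ((uintptr_t)sp < lowest_sp)
        lowest_sp = (uintptr_t)sp;
}

NO_INSTR void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    if (call_depth > 0)
        call_depth--;
}
```

An error handler can then copy the first `call_depth` entries of `call_trace` (capped at `CALL_TRACE_DEPTH`) into the crash record, and `lowest_sp` gives a high-water mark for stack use.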

