Author Topic: Annoying long runtime bug  (Read 2609 times)

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Annoying long runtime bug
« on: December 02, 2022, 12:15:38 pm »
I have a bug that takes hours and hours to manifest.  It leaves no trace as to if/when it's going to happen, just that the device stops doing its thing.

I added an independently clocked watchdog, but it is being refreshed.  So it's not locking up or hard faulting.

So yesterday after work I fired it up on the breadboard with the debugger attached and let it run and run and run.  By 11pm when I was heading to bed it was still running.

At 2am when I got up for the bathroom, I poked my head in and ....  it was locked up.  However I was NOT about to sit and debug it at 2am!  So this morning I opened up the VM to find that at some point the debugger had disconnected, damn.

Of course if you run it without the debugger attached it will lock up inside 20 minutes!  (Does anyone know a way to connect CubeMX to an already running core?)

It will be a missed interrupt that deadlocks a state machine, which then deadlocks (or provides no impetus for) the others.  I hate state machines.  A bit like I hate the cold and rain, but just like the weather you can't avoid state machines; a necessary evil.  Just so damn hard to get right and cover all possible scenarios.

I suppose I can just plaster it in debug logging.  If it's refreshing its watchdog timer, I can put a big fat state-dumping log statement there to show me the state machine variables etc.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #1 on: December 02, 2022, 12:22:30 pm »
A question I have is around missing interrupts.

Take the scenario where you have a bursty process which takes a few milliseconds.  While it's doing its thing, 2 interrupts are received.  By the time the core gets to the second one, another has already "overwritten" or superseded it.

Are there any "engineering" best practices around recovering from the potential state machine mess this can create?

My own personal feeling is to aim for "detection" of invalid states or invalid transitions and have a way to completely reset the state machine and notify other state machines accordingly.  I mean, would it not be prudent to be able to stop, clear and restart any sub-component and not have the whole system lock up while doing so?

I suppose this problem is already targeted by an RTOS et al.  Maybe I should consider going to an RTOS and stop managing my own wonky/janky state machines, timing and interrupt management.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #2 on: December 02, 2022, 12:32:10 pm »
(Sometimes I make these posts because I often answer half my own problem just by writing them.)

Two things occurred to me.  First, I have no idea and no way to tell how much "load" is on the core at any point in time or generally.  There is a possibility that something is (or everything is) doing too much in interrupts such that they are getting missed.

Second thing was that, while an IWDG will help reset your core when it locks up, you want a NON-independent watchdog timer to tell you if you are overloading the core.  AKA the RTOS idle task, which could be done with a standard watchdog.  If I set the timeout of the watchdog as an interrupt and refresh the timer in the interrupt context, that will tell me very quickly if interrupts are backlogged longer than the period of the timer "reset window".

"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline GromBeestje

  • Frequent Contributor
  • **
  • Posts: 280
  • Country: nl
Re: Annoying long runtime bug
« Reply #3 on: December 02, 2022, 12:38:55 pm »
Attaching to a running target: I am not sure how to do it within CubeMX, but for Eclipse for embedded developers, using the SEGGER J-Link plugin, there is an option "Connect to running target".



 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21675
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Annoying long runtime bug
« Reply #4 on: December 02, 2022, 02:36:20 pm »
Impossible to say without source, of course, but some general ideas:

- In the main() spin loop, add a counter (increment per loop).  Reset it every heartbeat or so.  Just before resetting, save the value to a cpuCount variable.  Monitor that via debugger, or spit it out via serial, whatever.  Interrupts use up CPU cycles --> reduced spin count --> more CPU usage.  So it's an inverse measure.  Disable all the main() housekeeping and interrupts to see what the maximum value is, and, obviously the minimum value is about 0.  (Depending on platform, interrupt saturation may still let through one main() instruction per interrupt, so it might not go completely to zero, it just grinds extremely slowly.)  (If you're crashing out at the same time, this might not show up, of course.)
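
Something like this, for instance (a minimal sketch; assumes the HAL tick is running and a 1 s heartbeat, and cpuCount/loopCount are just placeholder names):

#include "stm32f4xx_hal.h"

volatile uint32_t cpuCount;                       // last completed count, watch via debugger or serial

int main(void)
{
    uint32_t loopCount = 0;
    uint32_t lastTick  = HAL_GetTick();

    while (1) {
        loopCount++;                              // one increment per pass of the spin loop

        if (HAL_GetTick() - lastTick >= 1000) {   // once per heartbeat (1 s here)
            cpuCount  = loopCount;                // snapshot, then restart the count
            loopCount = 0;
            lastTick += 1000;
        }

        /* ... the usual main() housekeeping ... */
    }
}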

- Others have advocated for this before: run everything through a single interrupt, so nothing gets out of sync.  Which really means, have all the ISRs jump to the same handler and figure it out from there.  Maybe call with a fixed parameter so the callee knows what device to service and switch() from there.
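
A skeletal sketch of that (made-up source IDs, STM32-style vector names, and assuming you own those vectors rather than letting the HAL fill them in):

#include "stm32f4xx_hal.h"

enum irq_source { IRQ_SRC_UART1, IRQ_SRC_UART2, IRQ_SRC_TIM3 };   // made-up IDs

static void common_handler(enum irq_source src)
{
    switch (src) {
    case IRQ_SRC_UART1: /* service UART1, clear its flags */ break;
    case IRQ_SRC_UART2: /* service UART2 */                  break;
    case IRQ_SRC_TIM3:  /* service the timer */              break;
    }
}

/* Every vector funnels into the same place with a fixed parameter: */
void USART1_IRQHandler(void) { common_handler(IRQ_SRC_UART1); }
void USART2_IRQHandler(void) { common_handler(IRQ_SRC_UART2); }
void TIM3_IRQHandler(void)   { common_handler(IRQ_SRC_TIM3);  }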

Downsides are full stack overhead (at least on many devices; anything with register swaps ala Z80 could potentially do very well at this, assuming the compiler emits relevant instructions to use it) and somewhat more latency, and no way to prioritize high-rate or missable interrupts while another is executing.

I'm not a fan of this myself, at least on the smaller platforms with a handful of active interrupts, that I've been working with.  I do have issues getting the priorities right and making sure everything is consistent in any order, but with only a handful, it's not unmanageable.  Granted, it wouldn't take many more to get there.

Complexity of those interactions goes up, probably exponentially with number, so if you're using a lot... it may be a worthwhile change.

Which segues into the next point...

- Don't try to write something you can't understand.  Reduce it to manageable pieces you can reason about.  This level is different for everyone, but know your strengths and weaknesses, and plan accordingly.

If you have multiple overlapping and interacting interrupts, that sounds like a recipe for disaster.  Add a state machine below that and who knows?  Maybe that's descriptive of the present issue, I don't know.  But just look at that laundry list of hazards.  It's too complex and (literally) unreasonable.  Break it into smaller pieces, isolate the processes.

You can deal with high-rate interrupts by isolating them with minimal or zero side effects, for example buffering data then handling it either in another (strictly lower priority) interrupt, or preferably in the main() spin loop or heartbeat.  Bring it down to a manageable place where everything can be resolved, in preordained order.

In your state machine, plan for zero or more items in each buffer.  Maybe it will sometimes run too often, or interrupts get held up, so you sometimes read an empty buffer -- just take that into account.  If more than one item is seen, create a plan to resolve them: process them in order, take the latest one, whatever.

If the order of multiple simultaneous data/items/events matters, you can save a timestamp with each event and resolve them later.  Or better yet, use a common event buffer -- which can be designed and tested* for atomicity and therefore is a safe gate between asynchronous interrupts and synchronous (i.e. main() loop) logic.

*Mind, this is a fairly simple system and we're already well into the realm of "tests don't prove anything".  The best testing you could do here, would be something like fuzzing inputs and triggering interrupts with random inputs to try and force any possible timing or logic error.  For sure, you're going to have minimal to nonexistent test coverage if you're going by internal timers and stuff -- the timing is too consistent, at best plus or minus a few CPU cycles due to instruction timings, and sure you can set different timer rates or even randomize them, but those updates might still be synchronous with other CPU functions and you can't guarantee every pattern will be seen.  You can at least get independent timing from an external source -- for example, hooking up a GPIO interrupt to a signal generator and giving that a spin.  But you still have no way to enumerate every possible combination and sequence of interrupt, latency, delay, etc.

On the upside, at least on simple platforms, the clock rates are fairly low (10s MHz to low 100s), so the timing doesn't need to be very fine-grained to find that one-in-a-million hangup where two interrupts conspire to corrupt the buffer state, or something like that.  And it's usually just a timing coincidence, not pattern-dependent.  And, note the failure threshold: one error at all, is evidence of total design failure; absence of error is not evidence of success.  If it comes down to interrupts interfering within a single clock cycle, out of say ~100s Hz average interrupt rates, that's literally a one in a million chance, and you need to be able to detect and log/report that one failure when it happens.  So, plan accordingly.
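
Going back to the common event buffer idea, a minimal sketch of what I mean (IDs, sizes and names are arbitrary; pushes are PRIMASK-masked so ISRs at different priorities can't interleave a half-finished update):

#include <stdint.h>
#include <stdbool.h>
#include "stm32f4xx_hal.h"

typedef struct {
    uint8_t  source;       // which ISR produced the event (made-up IDs)
    uint8_t  data;         // small payload, e.g. a received byte
    uint32_t timestamp;    // e.g. HAL_GetTick() or a free-running timer count
} event_t;

#define EVT_LEN 32
static event_t  evt_q[EVT_LEN];
static volatile uint32_t evt_head, evt_tail;

/* Producer, callable from any interrupt. */
static void evt_push(uint8_t source, uint8_t data)
{
    uint32_t primask = __get_PRIMASK();
    __disable_irq();
    uint32_t next = (evt_head + 1) % EVT_LEN;
    if (next != evt_tail) {                              // queue full: drop the event (and count it?)
        evt_q[evt_head] = (event_t){ source, data, HAL_GetTick() };
        evt_head = next;
    }
    __set_PRIMASK(primask);
}

/* Consumer, called only from the main() loop; returns false when empty. */
static bool evt_pop(event_t *out)
{
    if (evt_tail == evt_head)
        return false;
    *out = evt_q[evt_tail];
    evt_tail = (evt_tail + 1) % EVT_LEN;
    return true;
}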

- If it's overlapped interrupts, also check the stack from time to time.  This isn't usually easy to do from inside the language, but there may be some methods (library?), or if nothing else, a little ASM can do it.

Examples:
* In each interrupt, log the stack pointer: take the minimum value between the variable and the current value, and store that back.  (ATOMICALLY!)
* From time to time, scan the memory space where the stack resides (usually upper RAM?).  This will either be uninitialized RAM (random?), or you can clear it or fill it with patterns (0s or 0xff's or 0xcafebabe's or...) before program start (__initn or something like that).  It will be overwritten with return pointers, assorted register contents, and local variables as functions utilize it.
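
For instance, a rough sketch of the paint-and-scan version on a Cortex-M (assumes the usual STM32 linker symbols _ebss/_estack and that the paint runs before any heap use -- check your own script's names):

#include <stdint.h>
#include "stm32f4xx_hal.h"          // for __get_MSP()

extern uint32_t _ebss;              // end of .bss   -- from the usual STM32 linker script
extern uint32_t _estack;            // top of stack  -- ditto

#define STACK_FILL 0xCAFEBABEu

/* Paint the free region between .bss and the current stack pointer.
   Call once, very early. */
void stack_paint(void)
{
    uint32_t *p  = &_ebss;
    uint32_t *sp = (uint32_t *)__get_MSP();
    while (p < sp)
        *p++ = STACK_FILL;
}

/* Scan up from the bottom: the first overwritten word marks how deep
   the stack (or heap) has ever reached.  Returns untouched words. */
uint32_t stack_free_words(void)
{
    uint32_t *p = &_ebss;
    uint32_t n  = 0;
    while (p < &_estack && *p == STACK_FILL) { p++; n++; }
    return n;
}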

Again, this is an observational method, so it doesn't prove anything (it's a testing method).  But it may give direction to a problem you already know exists.

Downside, you can't scan it after the CPU's crashed, and monitoring from main() say will only get a check so often.  Well, if you can attach debug and dump RAM, you don't need anything in the program, that'd be something.

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline eutectique

  • Frequent Contributor
  • **
  • Posts: 390
  • Country: be
Re: Annoying long runtime bug
« Reply #5 on: December 02, 2022, 03:10:33 pm »
First, I have no idea and no way to tell how much "load" is on the core at any point in time or generally.  There is a possibility that something is (or everything is) doing too much in interrupts such that they are getting missed.

Hard to tell without knowing your software architecture, but you've got some sort of dispatcher loop, perhaps, which is triggered by sysclock, or a timer, or an event.  Wiggle a pin when the CPU is sleeping or woken up to do something, and see it with an oscilloscope.
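
Something like this, as a sketch (assuming a sleep-in-main dispatcher; the port, pin and the dispatch_pending_work() call are placeholders):

while (1) {
    HAL_GPIO_WritePin(GPIOB, GPIO_PIN_0, GPIO_PIN_RESET);   // low  = sleeping
    __WFI();                                                 // wait for the next interrupt
    HAL_GPIO_WritePin(GPIOB, GPIO_PIN_0, GPIO_PIN_SET);     // high = busy
    dispatch_pending_work();                                 // placeholder dispatcher
}
/* The pin's duty cycle on the scope approximates CPU load. */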

I would also suggest Segger SystemView, it is trivially portable to a bare-metal project. But again, I have no idea about your setup, especially the debug probe.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #6 on: December 02, 2022, 03:37:35 pm »
Segmenting / compartmentalising is something I have been trying to stick to.

The "flow" of the application in no particular order:

UART Interrupt per character on UART1.  Depending on what is happening, this will block.
* In the command | response pattern it will spin on its buffer waiting for the response it expects, or time out on SysTick.
* Normal run mode: it's appending them into a buffer character by character and pushing the buffer onto a queue once it sees a non-repeating newline.

UART Interrupt per character on UART2.  Appending to a buffer until it sees... a non-repeating newline, when it pushes the buffer onto a queue.

The buffer queue is non-blocking; if it's full it overwrites the last entry, so it is lossy.  UART1 and UART2 are using different buffer queues.  Each is 8 buffers of 2K!  Buffers are handled asynchronously, with "get and release" implemented on both sides of the state machine and the head/tail pointers protected with IRQ masks.  The buffers are fairly well tested, but nothing is 100%.

A callback ping-pong state machine controls an RGB LED.  It reuses a timer between DMA PWM output for the LED colour data, and the complete callback is used to reconfigure the timer for "time elapsed" count up (which is basically just turning off the DMA, updating the prescaler and resetting it).  The callback from that can then cause another DMA write and they ping-pong onwards.  The effect is a flashing LED (it's like artisan blinky!) which does not block your main code.  I do have my suspicions this could be subject to deadlocking if an interrupt is missed.  The blocking call "Update()" for it waits on the state machine clearing previous operations and it has previously locked up in there; I added a forced reset with a timeout of 1 second.  ... it's a candidate for bugs.

The main loop is basically consuming the two UARTs line buffer by line buffer.  It has to parse a small amount of JSON using a performant tokeniser then run a switch() statement on whether the message contains a certain field, tokenize out the value and update the display.

Which is the second point I need to reconsider.  I think I need another queue in there: the render queue.  As my experiments with the window watchdog have just taught me, that main loop will spend a very long time rendering a message that contains all 9 of the values it's looking for.  Each call to write some text does an interrupt-based write to the TFT memory.  The driver does not use DMA.  So each of those is about 1ms.  A single main loop could take 100ms.

The second UART buffers are command inputs and debug outputs.

So I'm going to poke, provoke and torture that LED state machine such that it can't get into a messed up state.

Given the TFT is not using a DMA driver, it is possible I am maxing out the core for far too long and it may be doing way too much in its interrupt contexts that I'm not aware of.  I've been meaning to give the bi-directional DMA driver a go as it's a far better way to use the screen.  You can use the screen itself as the frame buffer and even trigger on the refresh interrupt properly.

EDIT:  The LED ping-pong can be made more resilient by enabling the auto-reload preload on its delay timer.  If it missed it, it will fire again and it's up to its callback to turn it off again.  But how can it miss it?  It's a bit in a timer register and it's the only thing using it.
« Last Edit: December 02, 2022, 03:41:20 pm by paulca »
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #7 on: December 02, 2022, 03:44:35 pm »
... and no, I won't be surprised if this comes down to something stupid, simple and utterly blonde.

I had an issue last week and I had to laugh at myself because I had done this....

char buffer[24];                       // local buffer, lives on the stack
// blah blah blah
HAL_DMA_Transit(..... buffer, 24);     // non-blocking DMA transmit pointed at that stack buffer
return;                                // stack frame (and buffer) gone before the DMA finishes

LOL  Hmmm.  That's going to work.  At least seeing the stack come out the UART gave me a clue as to what was happening!
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline voltsandjolts

  • Supporter
  • ****
  • Posts: 2298
  • Country: gb
Re: Annoying long runtime bug
« Reply #8 on: December 02, 2022, 03:51:52 pm »
Sounds like UARTs are busy, so random suggestion....

If a UART reception overrun occurs, the UART will halt by default, no more chars received until overrun condition is cleared, by writing ORECF to ICR.
Maybe HAL deals with this, IDK.

You can ignore overruns by setting e.g.
USART2->CR3 |= USART_CR3_OVRDIS;

Or something like that on your particular mcu.

 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #9 on: December 02, 2022, 03:52:42 pm »
On componentisation...  I think this is the first time I have been really trying to put any kind of structure into the layout of functions and their data.

It was going well.  I started with each module allocating its own memory in C files, correctly exporting only "public" functions and variables, making C variables private and static, etc. etc.  I even passed in pointers for the timers and DMA streams etc.

It didn't carry on quite that well.  I found allocating the memory in the modules is not the way to go if you want to, say, use that module twice!

So, at this stage the hygiene level is about 80% and some pointers have been borrowed and there is a bit of "invasive code" in at least one callback because it's shared with another timer.  Stuff like that, which needs a designed interface around it, but has instead just "grown into place".
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #10 on: December 02, 2022, 03:57:00 pm »
Sounds like UARTs are busy, so random suggestion....

If a UART reception overrun occurs, the UART will halt by default, no more chars received until overrun condition is cleared, by writing ORECF to ICR.
Maybe HAL deals with this, IDK.

Ah... very good point.  There ARE holes in my state machines.  I am not catching ANY of the UART error conditions.  Error conditions which are highly likely to occur on a breadboard powered off unisolated USB power with a WiFi adapter on it.

All it takes is an overrun, or a RESET or a BREAK condition detected, and HAL is very likely to call the error callback and not the complete callback.

There are places where that would be immediately evident.  And others where it wouldn't.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Annoying long runtime bug
« Reply #11 on: December 02, 2022, 04:58:49 pm »
Sounds to me like you should be using an RTOS; manufacturers sometimes include examples with FreeRTOS.

Your interrupt routines should never block.  You get in, do some small thing and get out.  No floating point arithmetic inside an interrupt handler, it takes too long.  Get in, do some small thing and get out!

How you handle queue overflow is an issue.  Why did it overflow?  That can only happen if higher level code can't keep up and, if that's the case, you're using the wrong device or maybe the queues are too small.  In a perfect world, none of the code would be blocking.  Input queues, output queues, semaphores and mutexes were created for a reason.  Code can certainly be pre-emptive but it shouldn't have spin loops.

In at least some of the ARM world, programmers are using the NVIC with much success.  Read the documentation, particularly the part about priority and let it help with interrupt scheduling.  Then figure out the priority for each of the handlers.  Some interrupts can wait, some can not.  Side note, the column interrupt from a card reader (old school) was often the highest priority because the card didn't stop once it started moving.  Getting the priority right is a big task but critical.  Remember, no blocking in the ISR.

There is often an idle task that runs when nothing else is ready.  A lot of debugging can be done in this task without impacting any other tasks.

Using an RTOS allows the tasks to be coded separately with little to no interaction.  Messages are passed through queues or semaphores.  Each task performs its function from start to finish in spite of being interrupted.

https://www.freertos.org/tutorial/index.html
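
A minimal sketch of that pattern with FreeRTOS (an ISR feeding a queue, one task draining it; the names, the queue length and the assumption that you own the UART vector and its priority allows FromISR calls are mine):

#include "FreeRTOS.h"
#include "queue.h"
#include "task.h"
#include "stm32f4xx.h"

static QueueHandle_t rxQueue;

void USART1_IRQHandler(void)                       // assumes you own this vector
{
    BaseType_t woken = pdFALSE;
    uint8_t c = (uint8_t)USART1->DR;               // F4-style data register read; clears RXNE
    xQueueSendFromISR(rxQueue, &c, &woken);        // never blocks inside the ISR
    portYIELD_FROM_ISR(woken);
}

static void uartTask(void *arg)
{
    (void)arg;
    uint8_t c;
    for (;;) {
        if (xQueueReceive(rxQueue, &c, portMAX_DELAY) == pdTRUE) {
            /* parse/handle the byte at task level, not in the ISR */
        }
    }
}

void app_start(void)
{
    rxQueue = xQueueCreate(64, sizeof(uint8_t));
    xTaskCreate(uartTask, "uart", 256, NULL, tskIDLE_PRIORITY + 1, NULL);
    vTaskStartScheduler();
}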
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #12 on: December 02, 2022, 05:18:47 pm »
Thanks.  I found and fixed one bug anyway.  The UART error conditions.

A common technique if something is broken and you can't figure out why: break it some more.  Break it worse.  You will often learn something.

So I pointed an unreasonable amount of message flow at the data UART (UART1) and caught it in the act at last!  And sure enough, the first thing I went to look at was the UART status, to find it in ORE + IDLE.

So I implemented the error callback.  Currently I just restart the UARTs....  I should probably reset the buffers they were writing to or I will be missing a character or two.  I set a volatile flag and flash the LED red for 50ms each time it goes 1 (in the main loop).  At an "unreasonable" 10-20 messages a second the LED flashes red quite a lot.  Back at its normal leisurely 1-2 messages/s I expect it might flash once every few hours when a perfect storm of load hits it.  I have caught the BIG PC that is sending it data being a bit unfair.  It will happily go off and not send any message for a second or two, then transmit them all at once (it is a message bus after all).  Luxury of large MB buffer space.
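
For reference, the shape of it is roughly this (HAL_UART_ErrorCallback is the real HAL hook; uart_error_flag and rx_byte are placeholders, and the re-arm call depends on how reception was started):

void HAL_UART_ErrorCallback(UART_HandleTypeDef *huart)
{
    uart_error_flag = 1;                        // volatile flag polled in main() to flash the LED
    HAL_UART_AbortReceive(huart);               // drop the current transfer and clear the error state
    /* reset the line buffer this UART was filling here, then re-arm: */
    HAL_UART_Receive_IT(huart, &rx_byte, 1);    // per-character interrupt reception again
}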

I'll run it for the evening and see if it has any other bugs.  I do need to optimise the value update rate and improve the "only send on change" to be individual-value granular, as updating 8 tiles in one pass of main was too much for the WWDG at maximum timeout.  It's main though, it should be able to take its time.

I did also get to test the IWDG as I did lock the code up once and it did reliably reset the whole MCU after a few seconds.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #13 on: December 02, 2022, 05:23:24 pm »
Next thing I want to try is using a spare basic timer to time trace sections of code.  That should be the easiest possible use of a timer.  Maybe making that a macro somehow.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21675
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Annoying long runtime bug
« Reply #14 on: December 02, 2022, 05:30:52 pm »
On componentisation...  I think this is the first time I have been really trying to put any kind of structure into the layout of functions and their data.

It was going well.  I started with each module allocating its own memory in C files, correctly exporting only "public" functions and variables, making C variables private and static, etc. etc.  I even passed in pointers for the timers and DMA streams etc.

It didn't carry on quite that well.  I found allocating the memory in the modules is not the way to go if you want to, say, use that module twice!

So, at this stage the hygiene level is about 80% and some pointers have been borrowed and there is a bit of "invasive code" in at least one callback because it's shared with another timer.  Stuff like that, which needs a designed interface around it, but has instead just "grown into place".

Ah yes...

There are levels, at least as it's traditionally done in C.

It's a bit easier, or more obvious, with OOP, but in C you have to do all that by hand without all the syntactic sugar to help you along.

Usually the next step up is to stuff all the locals into a struct.  You can do this trivially already, what with your state variables being local to the module, that is.  Then just allocate one static/local struct for the module and be done.  (The compiler will probably resolve the struct accesses to direct pointers, no overhead computing offsets.  Alternately, maybe those are faster, and it'll choose that way instead!)

Now with everything packed up, you can make a new struct, and call the same functions but on it instead, and boom, you've got a general purpose module that can repeat.
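
Sketching it with a made-up blink module (htim2/htim3 being the usual CubeMX handle names; none of this is your actual code):

#include "stm32f4xx_hal.h"

typedef struct {
    TIM_HandleTypeDef *tim;       // the hardware this instance owns
    uint32_t           period_ms;
    uint8_t            state;     // what used to be a file-scope static
} blinker_t;

void blinker_init(blinker_t *b, TIM_HandleTypeDef *tim, uint32_t period_ms)
{
    b->tim       = tim;
    b->period_ms = period_ms;
    b->state     = 0;
}

void blinker_update(blinker_t *b)
{
    /* operate only on *b -- no globals, so the module repeats */
}

/* Two independent instances driven by the same code: */
extern TIM_HandleTypeDef htim2, htim3;
static blinker_t led1, led2;

void app_init(void)
{
    blinker_init(&led1, &htim2, 500);
    blinker_init(&led2, &htim3, 125);
}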

Basically OOP under the hood, is a collection of functions associated to a struct, and maybe the function pointers are members of that struct too (you can put them there if you like!), and together that's called a class.  And the constructor/destructor are done somewhat automatically, at least up to having the place to put them (usually special named functions) but you still have to fill in all the init/free/etc. yourself (as in C++), but maybe a bit more is done automatically (Java etc.), whatever.

Which also makes inheritance a very natural thing to do, all the base elements are there, and just append to the struct for whatever junk you add onto it, and call the base con/de/structors first then handle your part of it and you're set.

Which also maybe explains why multiple inheritance isn't trivial and a lot(?) of languages (Java) don't do it, you can't just glue two disparate structures together and make their potentially completely different internal logic play nice together.  Maybe they can be resolved with certain restrictions, or with symbolic representation (keeping source, or some representation of it, rather than a compiled binary, of the base classes), I don't know, but yeh.

But I digress, also, if you haven't used Java or whatever, this probably isn't any familiarity at all... but in that case, maybe knowing roughly how they do it, maybe is still something of a clue.

Anyway, for C purposes, who's responsible for allocating those structs, and where and when, is an open question.  Often, the module using them declares/allocates them, then calls the constructor to init those structs, and so on and so forth.  The pointers to these objects can be passed freely between modules and any can operate upon them (just don't lose track of who's doing what, of course!).  Given headers for the module in question of course (so, you can use #include "header.h" as equivalent to Java's import my.Module.Class.Subclass or whatever).

And of course you don't get the syntactic sugar of object.doStuff(params)*, but that's just saying doStuff(object, params), basically any non-static method on an object has an implicit first parameter this which links to the object it was called from (via "." member operator).

*Unless you put the function pointers in the struct, but you still need to call object.doStuff(object, params).

You can even do, at least certain kinds of parameterization, by passing in callback functions (to the constructor).  So you can do the basic data structures and algorithms gimmicks of, here's an array of X, here's a linked list of Y, here's... whatever.  The constructor and internal logic handle the size of the objects (which would have to be dynamic (heap alloc) in this case) and the callbacks handle the actual operations on them (so, in C, you have to implement and pass around every piddly function that does basic things like add, subtract, concatenate, etc., on the data type).  This is all more straightforward in C++ of course (including operator overloading so you can use e.g. "+" for string concatenation, and the compiler being able to optimize callbacks down to static calls or inlined functions even*).

*I think? I don't use C++ so I haven't seen personally, I'm just assuming it has this kind of visibility.

It really helps to see a couple projects that work this way; I've used few myself, but for example the freemodbus module I used recently has some callbacks in it, making it flexible for ASCII, RTU, TCP, etc.  It's not fully object-oriented in the above sense, but it wouldn't be hard I think to put all its state objects into a struct and instantiate it that way.  (Which would then have to include hardware interface, which they have as a file you have to fill out the implementation of.  Having a e.g. serial port object would be handy anyway, good way to introduce buffering, general allocation of identical resources (you want how many ports with Modbus? You got it!..), and only modest overhead.  Downside: most platforms you can't resolve interrupts by object.)

Also, as these things go, maybe some of the above context helps you with the HAL itself:


Ah... very good point.  There ARE holes in my state machines.  I am not catching ANY of the UART error conditions.  Error conditions which are highly likely to occur on a breadboard powered off unisolated USB power with a WiFi adapter on it.

All it takes is an overrun, or a RESET or a BREAK condition detected, and HAL is very likely to call the error callback and not the complete callback.

There are places where that would be immediately evident.  And others where it wouldn't.

The HAL is largely written in that sort of way, i.e. you set up a state object, insert callbacks, activate the instance, etc. etc., and there you go.  You're constructing and instantiating a whole object!

Well, handling all the status and error conditions may feel more natural too when you consider them part of an object, too.

And you could even make, for example, a common error handler that just iterates over all the objects fed to it of that type, and does whatever it needs to.  Maybe/probably the error handling will be different for each one so this wouldn't be useful, but just to say, it could be, if it were.  You don't have to check each one with repeated code (eliminate code duplication)!

You may also find -- I mean, maybe not, since this is a fairly early point on the learning curve I guess, with respect to these things; but something to think about / look forward to in the future -- you just don't like the way the HAL objects allocate and interact with the hardware, and what API they expose for your program to work with.  Well, you have all the source (HAL is nothing but headers), you can run the hardware yourself bare-bones if you like -- and maybe you work with that directly, or at the CMSIS level instead, and implement your own, preferred, perhaps higher level too (buffering and fault logic and etc.?), interface.

The downside of course, of cooking your own, is it's almost certainly going to be very narrowly scoped, tailored to your immediate need, and may not be very general with respect to other parts in the family, or other families; so, there is a lot of potential there for keeping things general, or for scope creep and overgeneralizing.

Also, maybe you're not using all the HAL tools that are provided; the API is massive, you might just be missing something.  I don't recall offhand but maybe there's buffering and stuff already there?  Could be usable directly, or adaptable.

---

Oh, on another note, be careful of when to decode and parse, versus perform IO.  That mention of blocking (a thread? an interrupt..?!!) waiting for a char or timeout, seems devilishly nasty, at least from the least favorable interpretation of that description.  I like to do those by keeping internal state in a function (static locals) or struct (object) and parsing what's available then leaving immediately. 
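
Roughly this shape, as a sketch (uart_getchar_nonblocking() is a placeholder for whatever non-blocking read you have):

#include <stdbool.h>
#include <stddef.h>

int uart_getchar_nonblocking(void);    // placeholder: returns -1 when the RX buffer is empty

/* Consume whatever has arrived, keep state across calls, return immediately. */
bool get_input_line(char *out, size_t maxlen)
{
    static size_t pos = 0;
    int c;
    while ((c = uart_getchar_nonblocking()) >= 0) {
        if (c == '\n') {
            out[pos] = '\0';
            pos = 0;
            return true;               // a complete line is ready
        }
        if (pos < maxlen - 1)
            out[pos++] = (char)c;
    }
    return false;                      // nothing complete yet; come back later
}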

A... I guess fairly simple example is here: https://github.com/T3sl4co1l/Reverb/blob/master/console.c  It's a basic command prompt sort of thing, so it basically takes line input, persistently / no timeout delay, but it also has editing features (backspace/delete, cursor move, insert).  The GetInputLine function is non-blocking and returns immediately in almost all cases (except when executing an actual command, in which case execution is passed to the function found in the list, if one is found).  So you can just spin-loop on it and it takes little overhead in main().

Speaking of wait for timeout, Modbus does that -- might want to look at the freemodbus implementation to see how they handle that.  It's (IIRC offhand) just a one-shot timer reset on last char rx or something like that, and when it times out, it clears a flag in the state machine.  Simple as that.  Modbus in general is quite simple and effective, and may be interesting just to study.  (Or that's exactly what you're doing and trying to roll your own, heh, in which case I would recommend just going with this codebase, it seems fairly mature and I didn't find any problems with it.)

Tim
« Last Edit: December 02, 2022, 05:36:17 pm by T3sl4co1l »
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline eutectique

  • Frequent Contributor
  • **
  • Posts: 390
  • Country: be
Re: Annoying long runtime bug
« Reply #15 on: December 02, 2022, 05:34:52 pm »
Next thing I want to try is using a spare basic timer to time trace sections of code.  That should be the easiest possible use of a timer.  Maybe making that a macro somehow.

Cycle counter is your friend.
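
On a Cortex-M4 like the F411 that's the DWT cycle counter; a minimal sketch (do_the_thing() is a placeholder):

/* Enable once at startup (Cortex-M3/M4/M7): */
CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
DWT->CYCCNT = 0;
DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;

/* Then around the section to be measured: */
uint32_t start  = DWT->CYCCNT;
do_the_thing();                               // placeholder for the code under test
uint32_t cycles = DWT->CYCCNT - start;        // unsigned subtraction survives wrap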
 
The following users thanked this post: T3sl4co1l

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8172
  • Country: fi
Re: Annoying long runtime bug
« Reply #16 on: December 02, 2022, 05:41:20 pm »
This is why I created this thread:
https://www.eevblog.com/forum/microcontrollers/ideas-for-code-instrumentation/

ALWAYS check all error flags in peripheral interrupts. In presence of error flags you don't know how to handle, just call some kind of error(15) which blinks the LED 15 times so it's obvious what happened; it still hangs, but at least you know why and where.

Bonus points for printing the peripheral status register contents on UART. And of course, write the report every 10 seconds or whatever, so that you don't need to keep the UART cable connected all the time.
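
Something like this as a sketch (LED_PORT/LED_PIN are placeholders; busy-wait delays because the tick is off at that point):

void error(int n)                              // last resort: hang visibly
{
    __disable_irq();                           // nothing else is trustworthy now
    for (;;) {
        for (int i = 0; i < n; i++) {
            HAL_GPIO_WritePin(LED_PORT, LED_PIN, GPIO_PIN_SET);
            for (volatile uint32_t d = 0; d < 400000; d++) {}   // crude delay
            HAL_GPIO_WritePin(LED_PORT, LED_PIN, GPIO_PIN_RESET);
            for (volatile uint32_t d = 0; d < 400000; d++) {}
        }
        for (volatile uint32_t d = 0; d < 2000000; d++) {}      // pause between groups
    }
}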
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #17 on: December 02, 2022, 06:05:17 pm »
Should have mentioned I meant structuring C code specifically.  Granted I haven't worked in a large C++ project in 10 years either!  Even then working in established code bases you hardly ever see the day to day nitty gritty bits underneath that run it.

Lots to read!  Thanks. 

The idea of the struct with function pointers is ... yeah, pretty much a poor man's class.  I mean, outside of a "proper" OOP language like Java, most just explicitly pass the instance anyway.  Perl, PHP, Python (all the Ps) do this.  (pun!)

I keep thinking in Enterprise Java terms, like if I have a "big main buffer" I want to push it up and treat it differently.  It's just a pointer to a struct, it's a single 32-bit int, you don't need to be precious about it.... just the memory it allocates, which... doesn't really exist in your code anyway and is up to the linker.  So it matters not where you put it.

Actually that is pretty much how I fixed the buffers module.  I passed the buffers struct to all the functions.  Then I allocated two structs.
« Last Edit: December 02, 2022, 06:07:55 pm by paulca »
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #18 on: December 02, 2022, 06:11:33 pm »
What are your thoughts on moving all allocating definitions to a single file and just pulling in extern pointers where you need them in .h files?

Similarly moving all IT handlers and HAL callback implementations to a single file?
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21675
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Annoying long runtime bug
« Reply #19 on: December 02, 2022, 06:55:21 pm »
What are your thoughts on moving all allocating definitions to a single file and just pulling in extern pointers where you need them in .h files?

Similarly moving all IT handlers and HAL callback implementations to a single file?

ALL allocations would probably be a horrible mess.  You want locality of allocation and use.  I can see there being a compromise, though I don't know a good example offhand.

I mostly use separate files for big constants, externally generated data, etc.  Or stuff that isn't necessarily needed in main.c or whatever directly.  Like, maybe consts in EEPROM; presets, menu data, graphics, long strings, etc. in Flash.  EEPROM often can be allocated in its own file and thus generates an .elf section that can be programmed by itself, and that doesn't need reference from main.c to be done.  (But, if you're duplicating the initial values across Flash, EEPROM and RAM variables, maybe set them all from a macro and not split them up across files.)

I suppose you don't need to use locals, or statics in functions; you could rip out all locals and cram them into structs (and let the compiler figure out if they're local-local or reused and thus not collapsible in that way).  Then allocate everything in main.c.  But any time you need to make modest to large changes to a module, you probably have to touch at least those two files (plus header(s)).  The main advantage to locals, I guess, is not having to pass around the goddamned context pointer all the time -- cleaner code -- and some modules will simply never be reused so are safe to leave as hard-coded singletons.

But you could slice it a different way, hierarchically.  Maybe you put all the interrupts and HAL stuff into one file, or set of files; these then define a higher level abstraction or interface, between the remainder of your codebase (which now interacts strictly through those function calls), and the hardware of your platform.  This might be useful for portability: the internal program state will behave identically (well, assuming you've not used UB and implementation-dependent code..), and only the lower levels need to change between platforms.

Downside: platforms might not support all the features your program expects, so some nasty hacks might have to adapt them (say you need a periodic interrupt on a platform with only one-shot timers; I... don't think such an example exists among MCUs, but I mean just for simple illustration).

At some point you start writing a whole-ass emulation layer for what interfaces/devices your program expected, and it starts getting easier to rewrite the program instead, or go back and forth between a few seemingly-worst-case (most different) platforms and figuring out a compromise or suitable generalization among them.  But I mean, that's pretty high level shit right there, like Microsoft's bread and butter kind of stuff -- at various points in their history I think, like, some of their earliest work was porting BASIC and etc. between multiple, very different, systems -- the same code (I think they had an abstract / pseudo-assembler / VM they wrote for??) might run, via code translation, on anything from 6502 to Z80 to 8086, interfacing with whatever BIOS/DOS, and direct hardware (if no software interface provided), probably with a lot of platform-dependent hacks (i.e. having to implement device access in native asm for certain targets, basically what you'd use an e.g. #ifdef __ARM_ARCH_7__ for today).  (I don't know what all they did -- I think not many people know, anyway, beyond those that actually worked in it; it was all very proprietary of theirs?  So needless to say I'm just speculating/extrapolating here.  What I've heard of it sounds neat, something like this, anyways.)

So it's kind of fascinating that yeah, that's something that's not only been done, but big business as far as software development goes, or went.  (I mean, it certainly still goes -- MS has all manner of runtime code tools, I don't know what all even goes into running a .NET app, or anything else, but it's my understanding some serious translation can happen at load time, or even runtime (JIT), either just running things on W10+ kernel, or specifically things written in these bytecode languages.  I'm not clear on how much of that applies to what things, mind.  Also, there are other possible explanations for why W10 is so god damned slow starting up seemingly basic apps like Calc, and I don't know what the hell all it's actually doing.)

Needless to say... you don't want to go down such a deep and wide rabbit hole... but, just to say, it can be done, given enough time and experience.  And that it's a huge sliding scale: if you want to just toss all the hardware interaction stuff to one side, and leave the rest in the abstract, yeah, it's not a big deal, it can be done that way -- maybe it's easier, maybe it's harder, maybe it's just different, but it is a reasonable way to slice things, at least for certain values of "reasonable".  And maybe it has advantages like portability, but nothing is truly portable, so, how easy that goes, depends.  But that kind of pattern can help with it, if still very susceptible to the particulars.

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #20 on: December 02, 2022, 07:18:43 pm »
I suppose it comes back to ... if it works, it's correct.

Other sound bites like, when you have engineered it enough that it functions to requirements.  STOP.

There is a vast rabbit hole and where you end up is in a HAL library or a framework.  Ultimately you need to limit functionality to make it useful.  A lesson that STM need to learn.

In the hobbyist world, at least, I have the flexibility to make those "frameworks" and libraries work for only my projects. :)  Nobody has to get out of bed for them. :D
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #21 on: December 02, 2022, 07:23:43 pm »
On the positive side of things the ESP32 with the Espressif AT 2.4 firmware has basically simplified the whole Wifi and MQTT thing to a set of line buffers.

If I pull the power out of it, disconnect the UART, then power/connect it again, it will pick up where it left off; it will recover to its previous state including MQTT subscriptions.

To be honest, I shortened my boot time from 10 seconds to 1 second by just not restarting and reconfiguring the module.  I figure if I go even further, accept the currently stored parameters (like WiFi SSID/password) and just listen for "WIFI CONNECTED", the boot time would come down to faster than my own boot time.

It is a bit of a shame to waste an entire core though.  I should really be using a single core ESP32 for this.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8172
  • Country: fi
Re: Annoying long runtime bug
« Reply #22 on: December 02, 2022, 07:31:30 pm »
What are your thoughts on moving all allocating definitions to a single file and just pulling in extern pointers where you need them in .h files?

Similarly moving all IT handlers and HAL callback implementations to a single file?

No and no!

One module = one .h file and one .c file. One module does one logical thing, with clear and simple interfaces. Interfaces are functions. (In some rare cases, extern variables defined in .h and declared in .c.)

Whoever wants to use uart_send(), includes uart.h, which contains shared datatypes and function prototypes.

And uart.c contains uart_inthandler(), and uart_inthandler() uses only internal (static) variables defined in uart.c (and not uart.h).

Pretty normal stuff. Abstraction and encapsulation.
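
Roughly, as a skeleton (names only, not real driver code):

/* uart.h -- the interface other modules see */
#ifndef UART_H
#define UART_H
#include <stdint.h>
#include <stddef.h>
void uart_init(void);
void uart_send(const uint8_t *data, size_t len);
#endif

/* uart.c -- everything else stays in here */
#include "uart.h"

static volatile uint8_t  rx_buf[256];      // internal state, invisible outside uart.c
static volatile uint16_t rx_head, rx_tail;

void uart_init(void)  { /* configure the peripheral */ }
void uart_send(const uint8_t *data, size_t len) { /* queue for transmission */ }

void USART1_IRQHandler(void)               // the "uart_inthandler()" of the post
{
    /* touches only the statics above */
}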
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
Re: Annoying long runtime bug
« Reply #23 on: December 02, 2022, 07:34:10 pm »
Yes.  But you have 6 UARTS and 3 modules using them.  Who owns the memory?
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline Perkele

  • Contributor
  • Posts: 49
  • Country: ie
Re: Annoying long runtime bug
« Reply #24 on: December 02, 2022, 07:55:44 pm »
Test setup that I'm using when chasing these kinds of bugs:
- on STM32: data pushed over ITM, timestamped on receiving side, dissected and stored into a CSV or a similar format.
- If latency issues are preventing use of ITM (fast ISRs), then toggling of GPIOs (if any available) to signal some states or their changes. GPIOs are connected to a low-cost logic analyzer, monitored by sigrok (PulseView).
- If there are some weird voltage drops, or other factors triggering the bug, then using whatever measurement instrument is appropriate. If SCPI is not available on that instrument, then sigrok with SmuView. End result is again a CSV with timestamped measurements.
- If there's any kind of console or a similar output from that device, it also gets timestamped and stored into a separate file.

In an ideal world I would sit down and finally write an interface between sigrok and our in-house test system.  There are situations when we need to do long-term monitoring for problems that appear randomly and take several days to reproduce.  To avoid large amounts of data being generated, we would only need to keep a snapshot of the last couple of hours or days worth of data.  When the test system detects an error, it would take this snapshot and discard all the other data.
 

