I have an extremely frustrating bug in a Cortex-M7 project I'm currently working on: every once in a while the system hangs somewhere and becomes unresponsive. I have a watchdog implemented, so the system at least resets cleanly, but this is still very undesirable. Unfortunately, it may only happen once every few hours, and so far it has not correlated with any other action or activity, which makes it rather hard to debug. I know (or am pretty sure) that it's not hitting a fault, because I have a fault handler implemented and it isn't triggering. I could add some instrumentation to various parts of the code, and will probably start on that next, but there's enough real-time work going on that extensive logging is difficult and, because the hang is so infrequent, any brute-force approach will take days to carry out.
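For what it's worth, the kind of instrumentation I have in mind is a cheap "breadcrumb" ring rather than real logging. This is only a rough sketch: all the names (`crumb_log_t`, `crumb_drop`, and so on) are made up for illustration, and it assumes the buffer can live in a RAM region that the startup code does not zero, so it survives a watchdog reset. How to set that up is linker-specific and not shown.

```c
#include <stdint.h>
#include <string.h>

#define CRUMB_COUNT 16u          /* power of two so the index can be masked */
#define CRUMB_MAGIC 0xC0FFEE01u  /* marks the buffer as already initialised */

typedef struct {
    uint32_t magic;              /* CRUMB_MAGIC once the log is valid */
    uint32_t head;               /* total number of crumbs written so far */
    uint32_t ids[CRUMB_COUNT];   /* most recent checkpoint IDs */
} crumb_log_t;

/* Assumption: on the target this object would be placed in a no-init
 * section so its contents survive a watchdog reset. */
static crumb_log_t crumb_log;

/* Clear the log only on a cold start; after a warm reset the magic is
 * still intact and the old crumbs are kept for inspection. */
static void crumb_init(crumb_log_t *log)
{
    if (log->magic != CRUMB_MAGIC) {
        memset(log, 0, sizeof *log);
        log->magic = CRUMB_MAGIC;
    }
}

/* Record a checkpoint ID: one store and one increment.
 * Not interrupt-safe as written if called from several priorities. */
static inline void crumb_drop(crumb_log_t *log, uint32_t id)
{
    log->ids[log->head & (CRUMB_COUNT - 1u)] = id;
    log->head++;
}

/* Return the most recently recorded ID, or 0 if nothing was recorded. */
static uint32_t crumb_last(const crumb_log_t *log)
{
    if (log->head == 0u) {
        return 0u;
    }
    return log->ids[(log->head - 1u) & (CRUMB_COUNT - 1u)];
}
```

The idea would be to call `crumb_init(&crumb_log)` once at boot, sprinkle `crumb_drop(&crumb_log, SOME_ID)` at task and ISR entry points, and read `crumb_last()` back after the watchdog fires. It only narrows down the region of the hang, not the exact instruction.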
Anyway, that's not the real point of the thread, because I have a number of other strategies in mind to try, but one thing in particular has been a problem:
I've been trying to wait for the system to hang with the watchdog disabled, and then attach via SWD to inspect the state and hopefully find *where* it's hanging. The problem is that I cannot seem to start a debug session without the application being reset. I'm using a J-Link Plus and Atollic, and I've cut the standard debug startup script down so that it should just connect to the target and halt. This works fine normally: I can start the application, then connect at any point and stop it in its tracks. But whenever I DO catch the system in a hang, the same debug script seems to reset the system to the beginning of the application. What gives?
Any ideas what might be causing the apparent application reset? Does that give any hint as to what the underlying fault may be? Is there another setup or tool I should be using here that may allow me to see what's going on without disturbing the hang state? Even catching a glimpse of the PC and other system registers would be nice!
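One fallback I'm considering for getting at the PC without the debugger: if some interrupt can still pre-empt the hung code, the core stacks the interrupted context on entry, and the handler can pull the PC out of that frame. This is a sketch under assumptions. I haven't confirmed that the watchdog on this part has an early-warning interrupt, the handler glue that passes the stacked SP in is not shown, and `hang_snapshot_t` and `snapshot_from_frame` are names I made up. The word order of the frame is the standard Cortex-M one.

```c
#include <stdint.h>

/* Word offsets in the frame the Cortex-M core pushes on exception entry:
 * r0, r1, r2, r3, r12, lr, pc, xPSR. With the FPU active, extra words
 * follow these eight, so the offsets below still apply. */
enum {
    FRAME_R0 = 0,
    FRAME_R12 = 4,
    FRAME_LR = 5,
    FRAME_PC = 6,
    FRAME_XPSR = 7
};

/* The few registers worth keeping across a reset (hypothetical layout). */
typedef struct {
    uint32_t pc;    /* address the interrupted code would have resumed at */
    uint32_t lr;    /* its link register, i.e. a hint about the caller */
    uint32_t xpsr;  /* low bits hold the active exception number, if any */
} hang_snapshot_t;

/* stacked_sp must point at the first word (r0) of the stacked frame,
 * i.e. the MSP or PSP value that was in use when the exception hit. */
static hang_snapshot_t snapshot_from_frame(const uint32_t *stacked_sp)
{
    hang_snapshot_t snap;
    snap.pc = stacked_sp[FRAME_PC];
    snap.lr = stacked_sp[FRAME_LR];
    snap.xpsr = stacked_sp[FRAME_XPSR];
    return snap;
}
```

The snapshot would then need to be stored somewhere that survives the reset. The obvious limitation is that this sees nothing if the hang is in a handler of equal or higher priority, or with interrupts masked.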
(Actually, as I write this, it occurs to me that the debug script I've been using is not aware of the bootloader. Perhaps execution is somehow ending up at an address below the application, and this confuses the debugger, since it doesn't know about any code below the application start address? Oh well, I'll have to wait a few hours for it to fail again to test that, so if anyone has any suggestions in the meantime...)