A piece of equipment we make has a 68HC908JK3ECPE micro in it and runs 24/7. A customer rang up and said the device was acting up. The field service tech went out, had a look, and found that, sure enough, something was wrong: the unit was still functioning, but the A/D ports were not reading some pots properly, among other things. He pushed the reset switch, which pulls the micro's reset line low, and saw no improvement. He replaced the board and all was good. When he brought the old board back I took a look at it, and it fired up and ran just fine. 18 hours later it is still all okay.
What I want to know is this: is there a difference on that particular micro between resetting by pulling the reset line low, and fully powering off then on again? Maybe in theory there shouldn't be a difference, but could there be something in the micro that the reset line has no effect on but only powering down will clear?
Even if the silicon doesn't have any reset-related design flaws, just about any LSI CMOS chip can suffer from latchup induced by ESD or EMI transients, or from a single event upset due to cosmic or background radiation, and firmware can suffer from uninitialised variables.
Frequently, a RESET does not modify RAM contents, while a power-cycle does (nominally leaving RAM with random values, but frequently with conveniently beguiling zeros).
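To make the RAM point concrete, here is a minimal host-side sketch in C (not the poster's firmware, which is in assembly). The names app_state_t, corrupt_state(), reset_without_init(), reset_with_init() and state_is_sane() are all hypothetical, as are the limits of 4 channels and 8 samples. It only illustrates the idea that a reset handler which trusts whatever is already in RAM stays broken after a pin reset, while one that explicitly re-initialises its state recovers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical application state; on a real target this lives in RAM
   that a pin reset may leave untouched. */
typedef struct {
    uint8_t  adc_channel;   /* next A/D input to sample (assumed 0..3) */
    uint8_t  filter_count;  /* samples accumulated so far (assumed 0..8) */
    uint16_t filter_sum;    /* running sum used for averaging */
} app_state_t;

static app_state_t g_state;

/* Simulate a glitch scribbling garbage over the state in RAM. */
void corrupt_state(void)
{
    memset(&g_state, 0xA5, sizeof g_state);
}

/* A reset path that assumes RAM is already in a good state. */
void reset_without_init(void)
{
    /* intentionally does nothing to g_state */
}

/* A reset path that sets every variable explicitly, so the result is
   the same after a pin reset and after a power cycle. */
void reset_with_init(void)
{
    g_state.adc_channel  = 0;
    g_state.filter_count = 0;
    g_state.filter_sum   = 0;
}

/* Range check on the fields; returns 1 if the state looks plausible. */
int state_is_sane(void)
{
    return g_state.adc_channel < 4 && g_state.filter_count <= 8;
}
```

Calling corrupt_state() followed by reset_without_init() leaves state_is_sane() returning 0, while reset_with_init() brings it back to 1. The same discipline applies in assembly: clear or load every RAM location the code depends on at the start of the reset vector.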
Your overall problem description sounds suspiciously like an "uninitialized variable has a particularly obnoxious value" sort of error. Does your code use malloc()? Are you sure that your malloc() clears the returned chunk of memory, as many people expect it to? Skipping the clear is a frequent optimization in "baby" mallocs on deep embedded systems.
Malloc() typically doesn't zero allocated memory, while calloc() does.
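For anyone who does work in C, a short sketch of the difference. The helpers alloc_zeroed() and all_zero() are hypothetical names made up for this illustration; malloc(), calloc() and memset() are the standard library calls. The idea is simply never to rely on malloc() handing back cleared memory.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate n bytes and clear them explicitly, without relying on
   malloc() to zero anything. Returns NULL if allocation fails. */
void *alloc_zeroed(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) {
        memset(p, 0, n);
    }
    return p;
}

/* Returns 1 if all n bytes starting at p are zero, 0 otherwise. */
int all_zero(const unsigned char *p, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Both calloc(16, 1) and alloc_zeroed(16) give a block for which all_zero() returns 1. A bare malloc(16) gives no such guarantee, even though it often happens to return zeros on a freshly started system.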
Does your code use malloc()?
No, it's written in assembly.