Failure mode/probability of micro-controller sleep/standby/etc modes?

Perhaps this is too general a question (e.g. it depends on the particular micro-controller), but I am curious about the failure modes that keep a micro-controller from coming out of a sleep state, and how probable they are.  For example, some flip-flop gets flipped and the device no longer detects the wake-up condition.  Does anyone have any experience with this?

More detailed backstory: I'm doing a Failure Analysis (FA) for a consumer product that I am helping design (electronics + firmware).  The device is battery powered and probably will sit around for quite a bit of time so minimizing quiescent current consumption is important.  Of course cost is also important.  I am using a STM32F030 value line micro and its lowest-power, most disabled standby mode to cut power (just it and an ultra-low quiescent current LDO).  Two wakeup pins are used to detect a button press or the start of battery charging.  It all works just fine - in my lab - but I want to know how possible it is that it won't wake up.  This would be a catastrophic event and to the user the device would simply appear to be broken.

It seems to me that the probability of this kind of failure must be pretty low, since I'm sure there are a huge number of devices out there that use this strategy.  What makes me wonder is that I have remote controls that sometimes seem to stop working completely until their AA or AAA batteries are removed and re-inserted.  That seems like it could be a hung micro.

If this is a real risk then I could do things like add an external watchdog and wake up the micro every so often to tickle it, lest it issue a reset.  That's an additional cost.  Or I could do something like use the low-going status signal from the charger IC to reset the micro through a capacitor so users could recover a device by plugging it into a charger.  But this has some non-ideal side-effects.

Are you actually doing FA, as in there is already a failure? Or you are just trying to predict what may go wrong? Trying to predict failure modes like this is a waste of time. You will never guess what may go wrong when things go wrong. You may be running into some unique silicon bugs that only apply to your specific use case. There is no way to predict that.

If it is really critical, provision as many watchdogs and recovery methods as acceptable by the design and the price and don't worry about it. When the failures happen, you will have to investigate and address them as they show up.

And as you assume more and more things that may fail in theory, your whole system runs a risk of turning into a mess of watchdogs, which in turn may be the source of the issues itself.

What is the worst case scenario if your system fails in some way?
Is there a risk of injury, death or just some financial loss?

I am contributing to a predictive FA being spearheaded by the client.  The failure mode of the micro not coming out of standby mode is that the device appears dead to the customer and the cost to the client could be a warranty expense and reputation hit.  It's not life threatening or risky in that way at all (it's just a consumer gadget that would look like it died).   I understand the risk is low and the cost is probably low too.  However it's an item on the FA list so I'm just seeing if people have experience they could share.  I did some online research without finding much helpful info so I turned to the brain trust here to see what experience you all have.

I will probably do what Alex suggested and use either the IWDG or RTC wake-up facility to try to wake up the device periodically while it is asleep so it can reset the various register states (on the assumption that would eliminate some possible failure modes) and then return to standby.  This can be done infrequently enough so as not to impact the sleep power budget significantly, costs me nothing but a little time, and it's simple enough that it's probably not a big risk of introducing more bugs.
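
To put a number on "infrequently enough", here is a rough sketch of the average-current arithmetic for a periodic wake-up. The function name and every figure mentioned afterwards are illustrative assumptions, not STM32F030 datasheet values or measurements from this design.

```c
#include <assert.h>

/*
 * Average supply current, in microamps, for a device that sits in standby
 * at i_sleep_ua and wakes once every period_s seconds, staying awake for
 * t_awake_s seconds while drawing i_awake_ua.
 *
 * Hypothetical helper for illustration only: all inputs are assumptions
 * supplied by the caller.
 */
double average_current_ua(double i_sleep_ua, double i_awake_ua,
                          double t_awake_s, double period_s)
{
    /* The awake time must fit inside one wake-up period. */
    assert(period_s > 0.0);
    assert(t_awake_s >= 0.0 && t_awake_s <= period_s);

    /* Fraction of each period spent awake. */
    double duty = t_awake_s / period_s;

    /* Time-weighted average of the two current levels. */
    return i_awake_ua * duty + i_sleep_ua * (1.0 - duty);
}
```

With assumed figures of 2 uA in standby, 5 mA while awake, 5 ms awake per wake-up and one wake-up per minute, average_current_ua(2.0, 5000.0, 0.005, 60.0) comes out at roughly 2.4 uA, so under those assumptions the periodic wake-up adds about 0.4 uA. Stretching the period to ten minutes brings the overhead down by another factor of ten.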

If you can run the device from an internal RC oscillator, this would eliminate the possibility of an external crystal failure (or of the oscillator not starting because it is outside its temperature range). Having the device wake up regularly using the RTC timer is also helpful, so that it can re-initialize the wake-up conditions.

It is also a good idea to test your device at temperature extremes in a climate chamber and to subject it to external disruptions such as ESD and radiated-immunity testing. Personally I like to know what a device can take before it starts to misbehave. The limits for consumer goods are quite low when it comes to ESD and radiated immunity. In the real world a device can be subjected to much more mayhem. The radiated immunity level for consumer devices to pass EMC testing is typically 3 V/m, but I like my designs to keep working at 30 V/m. It helps to reduce complaints & returns from consumers.

