| Is "integrated circuit burn-in" a thing? |
| IDEngineer:
I've got a weird situation going on here. Several of my ongoing projects involve MEMS sensors. I'm qualifying a new (to me) one from ST, the LSM6DS3, which is very popular and deeply stocked at multiple large distributors, so it moves a lot of volume. In other words, it's not some obscure part that nobody's heard of.

When I powered up the first two proto PCBs with this part, the LSM kept exhibiting failure modes. It would take anywhere from 30 minutes to 2 hours, but within 2 hours you could pretty much guarantee it would fail. By "fail" I mean the part would change operational modes with no external stimulus. For example, if the part was configured (via SPI) for an update rate of 400Hz, after a while it would switch to (say) 12.5Hz. You could see this on a scope. The even weirder aspect was that reading the configuration registers back showed the PROPER VALUES for 400Hz (or any other value, I tried several) were still in the registers. And yes, I tried rewriting those proper values on the chance that this would cause the chip to re-evaluate them and go back to proper operation - but no luck, the chip stayed in its self-configured mode. The only way to restore proper operation was to invoke its "soft reset" SPI command or power cycle the device.

There was another error condition where the SPI interface itself became unresponsive and the ONLY way to regain control of the device was to cycle power. Ugh.

I wrote a bunch of failsafe routines to detect and gracefully handle everything but the SPI failure, and over a few days got it to the point where no more failures occurred (at least in the sense that if they did occur, my firmware caught them and dynamically restored proper operation). Then, separately, I discovered that the SPI problem appears to be caused by the internal digital low pass filters requiring too much processing power.
This led me to suspect that the internal filters are implemented in firmware, not hardware, and when they hit some edge case they take out the internal CPU. By disabling all possible internal filtering and going back to my own filters in my own firmware, the SPI errors also disappeared. At this point two prototypes ran for a couple of weeks with no errors at all.

(I'd like to add that every MEMS device I've worked with has weirdnesses like this. They're finicky, and their poor documentation doesn't help. I don't know why they all seem to be like this, but they're simply a PITA to get running. I feel like I could write an aftermarket cheat sheet on the ones I've sleuthed through. However, they're darned useful so I just budget extra time whenever I start working with a new one, knowing I'll be basically writing my own appnote to get them going.)

Yesterday, I fired up a fresh proto PCB and left it running overnight. By early this morning, it had entered one of the failure modes! Since my firmware can handle anything but the SPI lockup, all evidence points to the LSM6DS3's internal firmware having locked up again. I didn't have the unit instrumented (I thought we were past this) so I couldn't dig any deeper beyond knowing that my recovery code couldn't handle it, and it handles everything I know about except the SPI failure. Cycling power instantly fixed it, exactly as with the first two units, which further supports the SPI failure theory. Since then it's been running flawlessly for about 8 hours so far today.

Thinking about differences between this latest board and the earlier ones, the only one is accumulated run time. They're all running the same firmware, powered by the same supply, flashed by the same programmer, etc. Those first two boards got dozens of hours on them while I was working through the error handling, and their error rate dropped as I discovered more failure modes and wrote firmware to handle them.
But maybe those parts were just "burning in" and my code had little to do with it. This latest one had never been out of its antistatic bag before I flashed it and let it run overnight, and within hours it failed such that commanding a reset over SPI would not work (the classic symptom of SPI failure on this part).

I've never considered "burn-in" of integrated circuits to be a "thing" before, but this sure feels like it. And these are MEMS devices, which brings in a mechanical aspect that most ICs don't have.

Is there anything in the industry about IC burn-in? Perhaps specific to MEMS devices in particular? |
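The recovery strategy described in this post (notice that the output rate has drifted, try a soft reset, fall back to a power cycle when SPI stops responding) might be sketched roughly as follows. This is a host-side Python model for illustration, not the poster's firmware: the class name, the 10 % tolerance, and the limit of two soft resets are all assumptions. It checks the measured output rate instead of the configuration registers, because the post reports that the registers still read back correctly while the part is misbehaving.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"
    SOFT_RESET = "soft_reset"
    POWER_CYCLE = "power_cycle"


@dataclass
class OdrWatchdog:
    """Compares the measured data-ready rate with the configured ODR."""
    expected_hz: float
    tolerance: float = 0.10      # assumption: accept +/-10 % deviation
    max_soft_resets: int = 2     # assumption: escalate after two attempts
    soft_resets: int = 0

    def check(self, measured_hz: Optional[float]) -> Action:
        # measured_hz is None when the sensor no longer answers on SPI;
        # per the post, only a power cycle recovers from that state.
        if measured_hz is None:
            return Action.POWER_CYCLE
        error = abs(measured_hz - self.expected_hz) / self.expected_hz
        if error <= self.tolerance:
            self.soft_resets = 0          # healthy again, clear the counter
            return Action.NONE
        if self.soft_resets < self.max_soft_resets:
            self.soft_resets += 1
            return Action.SOFT_RESET
        return Action.POWER_CYCLE         # soft resets did not help
```

In real firmware the measured rate would come from timing the data-ready interrupt or from counting samples over a fixed window, and the returned action would drive the soft-reset command or a load switch on the sensor's supply rail.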
| AndyC_772:
ST's MEMS accelerometers can be very fragile and easily damaged by soldering heat. Were the boards soldered by hand? Have you tried buying a preassembled evaluation board and wiring across to it instead of fitting the bare device to your PCB? |
| IDEngineer:
Thanks for replying!

--- Quote from: AndyC_772 on April 23, 2019, 09:45:40 pm ---
ST's MEMS accelerometers can be very fragile and easily damaged by soldering heat.
--- End quote ---

How would that situation self-correct with time, as appears to be happening here?

--- Quote ---
Were the boards soldered by hand?
--- End quote ---

No, they were assembled by a contract manufacturing house that we use all the time. Very high end shop, leading edge pick-and-place and reflow ovens, etc. They do all of our protos and most of our production. I'm confident they followed ST's guidelines from the spec sheet. But again, even if they didn't, anything "damaged by soldering heat" doesn't usually spontaneously repair itself. These parts seem to stop self-reconfiguring after a while, and the two protos that were originally having problems now work perfectly. |
| Whales:
I'd try and contact ST. Even if the parts are very popular, there could be bad batches or other shenanigans. |
| IDEngineer:
--- Quote from: Whales on April 23, 2019, 10:16:59 pm ---
I'd try and contact ST. Even if the parts are very popular, there could be bad batches or other shenanigans.
--- End quote ---

I did that, first thing (like two weeks ago). ST no longer offers formal trouble tickets and now directs you to their online forums. I have two lengthy threads there, complete with scope screenshots where applicable, but the responses I've gotten have nothing to do with my questions!

Example: Here's a paragraph excerpted from one of my messages on the ST forum:

--- Quote ---
...The bug appears to be related to LPF2, the digital filter block after the A/D. If you look at Figure 5, page 32 of the LSM6DS3's spec sheet, the "Digital LP Filter" and the "Composite Filter" are configurable filters that post-process the output of the A/D converter. Extensive testing suggests that if those digital filters are completely disabled, the LSM6DS3 becomes reliable. But enabling those filters - with any of a variety of configuration options - risks the LSM6DS3 silently switching its ODR (best case) or simply locking up (worst case)....
--- End quote ---

...and here's the response:

--- Quote ---
HI IDEengineer, thank you for your effort and the related detailed description of the activity. Sorry for our delay in the answers, I would double check the LPF2 filter behavior, but consider that all the filters introduces some delays and you should have to wait some ms to get a stable value each reading. Regards
--- End quote ---

I talk about disabling filters to improve reliability, and they answer that digital filters have delays. Non sequitur! :wtf: :rant:

Here's what I wrote back, in part:

--- Quote ---
...I understand your point, that filters introduce delay, but that is not what is happening. There is a bug in the LSM6DS3, and our research suggests it is related to the workload caused by enabling LPF2 and its related filter chain. Since we disabled LPF2 and its filter chain, we have not been able to reproduce the bug. In our opinion, the LSM6DS3 should be treated as a bare IMU as if it did not contain LPF2 nor anything past it in the signal chain. LPF2 and all further filters should be forcefully disabled at startup if reliable operation is desired.
--- End quote ---

...and there's been no response since. So it appears ST themselves are not going to be a source of useful assistance. |
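The "forcefully disable LPF2 at startup" advice amounts to a read-modify-write on the relevant control register. A minimal Python sketch of that pattern follows. The register address (`CTRL8_XL` at 0x17) and bit position (`LPF2_XL_EN` as bit 7) are my reading of the LSM6DS3 register map and should be treated as assumptions to verify against the datasheet. `FakeBus` is a hypothetical stand-in for a real SPI register interface, and the sketch clears only this one bit, not every filter option in the chain.

```python
CTRL8_XL = 0x17        # assumed register address; verify in the datasheet
LPF2_XL_EN = 1 << 7    # assumed bit position; verify in the datasheet


class FakeBus:
    """Hypothetical stand-in for an SPI register interface."""

    def __init__(self) -> None:
        # Start with the filter bit set plus one unrelated bit.
        self.regs = {CTRL8_XL: 0x84}

    def read(self, addr: int) -> int:
        return self.regs.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self.regs[addr] = value & 0xFF


def disable_lpf2(bus) -> bool:
    """Clear the LPF2 enable bit without touching the other bits.

    Returns True if the readback shows the bit cleared.
    """
    value = bus.read(CTRL8_XL)
    bus.write(CTRL8_XL, value & ~LPF2_XL_EN & 0xFF)
    return (bus.read(CTRL8_XL) & LPF2_XL_EN) == 0
```

Note that a successful readback only confirms the write landed. As reported earlier in the thread, the registers can hold the correct values while the part behaves differently, so this does not replace monitoring the actual output rate.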