Author Topic: How do you do failure analysis when hardware gets destroyed? (Read 849 times)

e100 · « **on:** May 30, 2019, 11:42:14 am »

Take for example, a rocket launching a satellite that goes off in the wrong direction and has to be destroyed before it falls back to earth.
Are there systems onboard monitoring the behaviour of the code (like a debugger) and sending the data to the ground station in real time, or do you just send the raw sensor telemetry to the ground station and later play that back into a duplicate system in a lab and look at how it misbehaves?

David Hess · « **Reply #1 on:** May 30, 2019, 02:39:15 pm »

There is whatever telemetry they think they need. When I do this sort of thing even with hardware which will survive, I may include execution traces and state information in my "telemetry".
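As a rough illustration of folding execution traces and state into telemetry, here is a minimal Python sketch. The record layout (timestamp, event id, state value), the `TraceBuffer` class and the `decode` helper are all hypothetical names made up for this example, not taken from any real flight system. The idea is a fixed-size ring buffer that keeps only the most recent events and gets drained into whatever payload space the downlink has free.

```python
import struct
from collections import deque

# Hypothetical record layout (an assumption for this sketch):
# timestamp in ms (uint32), event id (uint16), state value (int32), little-endian.
RECORD = struct.Struct("<IHi")


class TraceBuffer:
    """Fixed-size ring buffer of trace records; the oldest entries are dropped when full."""

    def __init__(self, capacity=64):
        self._records = deque(maxlen=capacity)

    def log(self, t_ms, event_id, value):
        # Cheap enough to call from many places in the code.
        self._records.append((t_ms, event_id, value))

    def drain(self, max_bytes):
        """Pack the oldest buffered records into one payload of at most max_bytes."""
        out = bytearray()
        while self._records and len(out) + RECORD.size <= max_bytes:
            out += RECORD.pack(*self._records.popleft())
        return bytes(out)


def decode(payload):
    """Ground-side decoder: turn a payload back into (t_ms, event_id, value) tuples."""
    return [RECORD.unpack_from(payload, off)
            for off in range(0, len(payload), RECORD.size)]
```

On the ground, `decode(payload)` gives back the last events the code saw before contact was lost, which can be lined up against the ordinary sensor telemetry.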

ejeffrey · « **Reply #2 on:** May 30, 2019, 04:54:03 pm »

On the software side it isn't that much different than hardware that isn't destroyed. When you have faults in production that didn't show up in testing you need to get whatever log data out you can leading up to the crash. Once your application crashes it doesn't necessarily leave any other useful traces. Hopefully you left enough information to figure out how to reproduce the fault on a development system or simulator. Of course with a rocket you get a lot fewer "tries" than with most other systems.
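A minimal sketch of the "reproduce it on a development system" step, in Python, assuming the recorded log is just a list of sensor samples. `controller_step` is a hypothetical stand-in for the code under test, and the sample field names are invented for this example. The harness feeds the recorded samples through the code one at a time and stops at the first one where a sanity check fails.

```python
def controller_step(state, sample):
    """Stand-in for the software under test (hypothetical).

    Integrates a pitch-rate reading into a pitch estimate.
    """
    new_state = dict(state)
    new_state["pitch_deg"] += sample["pitch_rate_dps"] * sample["dt_s"]
    return new_state


def replay(samples, step, initial_state, check):
    """Feed recorded samples through `step` in order.

    Returns (index, state) for the first sample after which `check(state)`
    is false, or (None, final_state) if the check never fails.
    """
    state = initial_state
    for i, sample in enumerate(samples):
        state = step(state, sample)
        if not check(state):
            return i, state
    return None, state


# Illustrative values only, not real flight data.
recorded = [{"dt_s": 0.5, "pitch_rate_dps": r} for r in (1.0, 2.0, 50.0, 400.0)]
index, state = replay(
    recorded,
    controller_step,
    {"pitch_deg": 0.0},
    check=lambda s: abs(s["pitch_deg"]) < 20.0,
)
```

Once the failing sample index is known, the same log can be replayed under a debugger or with extra logging turned on, as many times as needed.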

To me the part that sounds hard is the hardware side. If some mechanical part of a car engine fails you can often examine both the failure and the secondary damage it caused. But in a rocket you may only get whatever data sensors were able to log and try to infer what hardware fault caused that condition.

T3sl4co1l · « **Reply #3 on:** May 30, 2019, 07:40:51 pm »

Destruction is hardly total. For example, say the engine turbine explodes, but you didn't know that -- as you recover fragments of rocket, you'll find a ton of casing and charred pipes and supports and such, and not many engine parts, but you will find them, and you'll find more and more as collection rate gets closer to 100%. It's not going to be a nuclear detonation where the whole thing is vaporized (and even then, things can survive nuclear blasts*), you're going to find whole chunks of it.

*A popular story-but-that's-all, https://io9.gizmodo.com/no-a-nuclear-explosion-did-not-launch-a-manhole-cover-1715340946 but the design of the 1960s Orion project might be considered a legacy of such thinking, and was expected to work just fine.

And in turn, when you find chunks of the engine housing, you're likely to see puncture and tear failures, suggestive of something impacting them at high velocity, and radially at that -- like an exploding rotor. And if you find chunks of turbine, you're likely to see stress cracks indicative of imminent failure, in radial directions, and mashed bits of housing smeared across them, and so on. You may even see different amounts of charring, indicative of where those bits flew through the fireball, or how the fireball itself developed as the turbine came apart.

Collecting telemetry is just an easier way to monitor these things before failure -- if you have enough telemetry that you can complete the investigation with strong confidence, you can save the cost to comb a hundred km ellipse of countryside looking for bits.

But you may still need to do that anyway, to improve confidence in that investigation, or to remediate the toxic mess you've dropped (rocket fuels are often nasty substances, and some exotic materials are arguably even worse*). Or because you don't want competitors investigating your trade secrets in the same way.**

*I don't know if anyone's actually using beryllium in commercial rocket designs? I don't know that it's all that competitive against modern alloys and composites, or that the costs and hazards are justifiable.

**Historical note: in WWII, the Germans successfully reconstructed the proximity fuze, developed by Britain and the USA. This was a bit of particularly robust vacuum tube electronics, placed in the tip of an explosive artillery shell -- you might not think anything would be left of it, but they were indeed concerned about this security risk, and rightfully so. Initially, proximity fuzes were limited to the Pacific, where the risk of reverse-engineering was small (anti-aircraft shells used over the sea). Later, they approved them for use in Europe (over land); by then it was too late for the Germans to do anything with them.

Tim

Sal Ammoniac · « **Reply #4 on:** May 30, 2019, 08:21:43 pm »

Reconstructing the debris from crashed airliners is routine (in the U.S. at least). Great pains are taken to recover as much of the wreckage as possible and to attempt to put it back together.

Example: the TWA 800 crash in the 1990s.

