FPGA failure modes
AussieBruce:
I have an application that requires backup to deal with device failure, it uses an ARM micro (LPC1768) and a sidekick small PIC acting as a watchdog, there is an SPI data loop between the two and if an irregularity is detected whatever CPU is in action initiates a safe state. There is an incentive to replace the larger CPU with an FPGA for various reasons, but I’m wondering whether the failure modes of that class of device might prejudice integrity. When a computer fails, it generally either stops completely, or goes into some sort of chaotic loop, either event will be handled correctly by my present system. However, with a gate array, where functions are performed by different areas of the device, somewhat autonomously, I wonder whether there is a risk that a fault may cause one area of function to fail, but others to continue operating correctly. Is that the case, or do some gate arrays somehow check for irregularities across the entire die?
Incidentally, this is not a 'certified' application, i.e. it doesn't have to comply with official standards. It just has to be 'significantly better'.
james_s:
I have never experienced a failure of any FPGA. I don't think they are any less reliable than most other parts; it certainly isn't something I'd worry about so long as it's operated within its specs. If reliability is critical you might want two entirely separate units.
Berni:
Chips don't tend to fail in just local areas of the die.
Some old chips are famous for randomly giving up the ghost from old age, but they come from a much more primitive era of semiconductor manufacturing. Things have improved a lot since then, so in general, for a chip to stop working you have to subject it to enough external torture: too much voltage or current through a pin, running it really hot, physical damage, etc. And when damage does happen on the silicon die, there is a high chance it ends up finding a power trace and shorting the whole supply rail; this is why dead chips usually get really hot while being completely dead.
In terms of resilience, FPGAs do tend to be better, since a single bit flip caused by a cosmic ray will usually not crash an FPGA in any big way. One could even build logic blocks in triplicate and use voting to cut out any that misbehave.
Failure is actually most likely to come from memory. Programmable logic with burned-in ROM is a thing of the past; today most large programmable logic boots from ordinary flash memory. This flash image gets read in byte by byte to configure all the junctions between logic blocks and the blocks themselves. The configuration itself is held in local SRAM cells, so it should be just as susceptible to a cosmic-ray bit flip, but it does come back after a power cycle. More of an issue is flash bit rot: modern flash crams in so many bits per cell that they might rot away in just a few decades. Once that happens, the FPGA might refuse to load the flash image and just sit there uninitialized, doing nothing. However, since the flash is only used at boot, a bit rotting away while the FPGA is running won't crash it (unlike a typical MCU, which actively runs from flash all the time).
I did something similar before and ended up using an Altera MAX V CPLD for the job, though it's really more like a tiny FPGA than a CPLD. The point is to use a simple chip that doesn't have many things to fail, like a gazillion supply voltages, fancy sensitive PLLs, external memory, etc.
SiliconWizard:
Well, there are a couple of things specific to FPGAs compared to CPUs. Most FPGAs are RAM-based, meaning their whole configuration is essentially held in RAM. Whether RAM content can be corrupted more easily by external events than general logic is a complex matter. A CPU-based system can run entirely from Flash or ROM, but will still need RAM at some point to store data/state. The consequences of corruption can be different, but not necessarily any less severe.
As said above, one thing you can do with FPGAs is implement redundant structures - something that most general-purpose CPUs do not have. From that POV, FPGAs have the edge, as long as you take those extra steps.
x86guru:
FPGAs that use volatile SRAM-based LUTs are susceptible to alpha and neutron radiation, which can cause LUT and general configuration corruption. And since an FPGA's configuration SRAM/LUTs do not have ECC, the corruption can't be corrected.