Author Topic: Securing a critical logic level against failure (Read 2009 times)

BreakingOhmsLaw · « **on:** May 26, 2020, 06:40:32 am »

Hi everybody. What is your preferred way to protect an MCU output that controls something critical (e.g. unlock a door or trigger Armageddon) against unintended activation? For example, due to an undefined state during startup, a brownout or device failure?
My method right now requires the pin to have a specific output frequency. That signal is fed through a bandpass to get the fundamental, and the result is rectified and fed into a capacitor that has a bleed resistor. That voltage is passed into a Schmitt trigger and finally to a MOSFET or BJT.
It works, but it is awfully complex and has drawbacks like a noticeable delay, BOM cost and also possible failures in the opamps involved.
Is there a simple gold-standard way to do this more efficiently?

MosherIV · « **Reply #1 on:** May 26, 2020, 06:43:58 am »

Pull up/down.

Ian.M · « **Reply #2 on:** May 26, 2020, 07:33:37 am »

+ a watchdog circuit (or internal module) and an external supply supervisor or internal BOR module to reset the MCU if it's crashed or the supply to it dips excessively.

If the state of the MCU I/O pins can't be trusted at low voltages, use two I/O pins on the same port which must be at different levels to activate, pullups/pulldowns to hold them inactive when tristate, and wide voltage range logic to combine them. See the NC7SZ57/NC7SZ58 universal two-input logic gates.
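
One way to sanity-check the two-pin scheme on paper is to enumerate every combination of pin states, tristate included, and confirm that only one of them activates the output. The sketch below is a Python model rather than firmware. The pull directions and the names `PULL`, `resolve` and `activate` are assumptions made for illustration, and the gate is modelled as a plain "A AND NOT B".

```python
from itertools import product

# Assumed pull directions: pin A is pulled down, pin B is pulled up,
# so a tristated pin (None) always reads as its inactive level.
PULL = {"a": 0, "b": 1}

def resolve(level, pull):
    """Level seen by the gate: the driven value, or the pull resistor if high-Z."""
    return pull if level is None else level

def activate(a, b):
    """True only when A is driven high AND B is driven low."""
    return resolve(a, PULL["a"]) == 1 and resolve(b, PULL["b"]) == 0

# Every combination of low, high and tristate on both pins.
states = (0, 1, None)
active = [(a, b) for a, b in product(states, repeat=2) if activate(a, b)]
print(active)  # [(1, 0)]
```

A port that resets to all-low, all-high or all-tristate stays inactive in this model. A pin stuck in the active state is a separate failure that this check does not cover.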

Whales · « **Reply #3 on:** May 26, 2020, 07:35:55 am »

So a discrete hardware-level watchdog for keeping an eye on the MCU? Hmm.

A single pull up/down isn't enough; a crashed or shorted MCU could sustain these.

Ian.M · « **Reply #4 on:** May 26, 2020, 08:00:00 am »

It depends on what level of paranoia is justifiable. If failure can result in an unintended ICBM launch, it wouldn't be overkill to have six or more MCUs of different architectures, in independently powered triads running independently developed firmware + 'tell me three times' voting logic and separate actuator circuits, so that all three MCUs must agree to activate, and if not it falls back to the other three MCUs, and if both triads internally disagree, fails safe.
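
The fallback order described here can be written down as a tiny decision function. This is only one reading of the post, sketched in Python: a triad counts only when all three of its members agree, the second triad is consulted only if the first disagrees internally, and the result is inactive if both disagree. The names `triad_verdict` and `decide` are hypothetical.

```python
def triad_verdict(votes):
    """Return the shared vote if all three members agree, else None."""
    if len(votes) != 3:
        raise ValueError("a triad needs exactly three votes")
    return votes[0] if votes[0] == votes[1] == votes[2] else None

def decide(primary, fallback):
    """Use the first internally consistent triad; fail safe if neither is."""
    for triad in (primary, fallback):
        verdict = triad_verdict(triad)
        if verdict is not None:
            return verdict
    return False  # both triads disagree internally: stay inactive

print(decide((True, True, False), (True, True, True)))   # fallback triad decides
print(decide((True, False, True), (True, True, False)))  # fails safe
```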

For less critical applications, the watchdog deals with the stuck pin case. If it's internal, it can't cope with a shorted pin, but external watchdogs are available that require a regular pulse from the MCU to hold off, and their output can be used to drive lock-out logic as well as reset or halt the MCU. Whether or not you also need a monostable to require a minimum signal duration (greater than the watchdog period) for activation is application specific, but it can be a good place to apply the watchdog lockout signal.

RoGeorge · « **Reply #5 on:** May 26, 2020, 08:53:56 am »

- against high-Z pins during reset, put physical pull-up or pull down resistors outside of the MCU.
- against program hang, use the watchdog timer reset
- against unstable power, use the brown-out reset
- against cosmic rays, use a rad-hard MCU
- against bad switches, transit states of contacts or cut wires, never assume the state is either closed or open. Use two bits (two inputs) for each switch, and evaluate the state (A and not A). This can also detect when a switch is in a transit state or in an undefined state.
- use a heartbeat mechanism and a hot standby redundancy, or full redundancy of many units plus voting to evaluate each decision
- use anti-tamper sensors against physical attacks
- use proper encryption when sending/receiving signals
- always look for single points of failure, sometimes they can appear unintended during the lifetime of a product, even if those single points of failure were not present in the design phase
- log activities and events
- limit the personnel access and secure the physical location
- ...
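
For the two-inputs-per-switch item in the list above, the decoding logic is small enough to spell out. A minimal sketch in Python, assuming a changeover contact wired so that input A reads 1 when closed and input B reads 1 when open. The function name and the wiring convention are assumptions, not something specified in the post.

```python
def switch_state(a, b):
    """Decode a switch read on two inputs that should always be complementary."""
    if (a, b) == (1, 0):
        return "closed"
    if (a, b) == (0, 1):
        return "open"
    # (0, 0) or (1, 1): contact in transit, cut wire, or shorted inputs.
    return "undefined"

for reading in [(1, 0), (0, 1), (0, 0), (1, 1)]:
    print(reading, switch_state(*reading))
```

Treating every non-complementary reading as "undefined" keeps the safe decision with the caller, which can debounce a brief transit or raise a fault if the state persists.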

Kleinstein · « **Reply #6 on:** May 26, 2020, 09:18:01 am »

I would consider using 2 pins with opposite logic. One can use a simple NPN to require a low on the emitter and high at the base to sink collector current. When operating something like a relay / coil one could consider switching both sides.

Ian.M · « **Reply #7 on:** May 26, 2020, 09:32:41 am »

A minimal charge pump to drive a MOSFET gate is only four components including the bleed resistor to guarantee the MOSFET's off if the I/O pin isn't toggling fast enough. However, it's NOT a suitable approach if you need fast turn-on/turn-off.

You need two caps, one resistor and a BAT54S or similar dual series Schottky diode, arranged in the classic charge pump circuit, pumping up from ground, with the resistor and MOSFET gate as its load. Choose the caps so a few pin flips don't pump enough charge to start turning the MOSFET on, but sustained toggling at your main (or other convenient) loop execution rate turns it hard on. Add a long enough software delay on startup so repeated resetting can't toggle the pin fast enough to activate it. Apart from the MOSFET, only the caps and resistor are safety critical - the coupling cap mustn't fail short, and the storage cap and resistor mustn't fail open.
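
To get a feel for the cap sizing, the pump can be approximated with a per-cycle charge-sharing model. This is a rough Python sketch under stated assumptions, not a substitute for SPICE or a bench test: ideal diodes with a fixed drop `vd`, a zero-impedance pin, gate capacitance lumped into the storage cap `c2`, and component values (5 V, 4.7 nF, 100 nF, 1 MΩ) picked purely for illustration. Each full cycle moves the output toward `vcc - 2*vd` by the fraction `c1 / (c1 + c2)`, then the bleed resistor discharges it for one period.

```python
import math

def pump(cycles, f_hz, vcc=5.0, vd=0.3, c1=4.7e-9, c2=100e-9, r=1e6):
    """Approximate gate voltage after `cycles` pin cycles at `f_hz`, starting at 0 V."""
    v = 0.0
    v_max = vcc - 2 * vd                      # ceiling set by the two diode drops
    decay = math.exp(-1.0 / (f_hz * r * c2))  # bleed over one period
    for _ in range(cycles):
        if v < v_max:
            v += (c1 / (c1 + c2)) * (v_max - v)  # charge sharing, c1 into c2
        v *= decay
    return v

print(f"3 flips at 1 kHz:  {pump(3, 1000):.2f} V")
print(f"sustained 1 kHz:   {pump(5000, 1000):.2f} V")
print(f"sustained 10 Hz:   {pump(5000, 10):.2f} V")
```

With these assumed values a handful of flips or slow toggling leaves the gate well below a typical threshold, while sustained 1 kHz toggling settles a few volts higher. The real numbers depend on the MOSFET and the pin's drive strength.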

BreakingOhmsLaw · « **Reply #8 on:** May 26, 2020, 09:45:37 am »

Quote from: Ian.M on May 26, 2020, 09:32:41 am

A minimal charge pump to drive a MOSFET gate is only four components[...]

That's actually quite clever, thanks Ian. I'm going to try this one out and add it to my bag of tricks.
Should come in handy when pin count is an issue so XORing two outputs is not an option.

Jeroen3 · « **Reply #9 on:** May 26, 2020, 11:32:06 am »

We used to have some gate driver on a board that had two inverted signals. IN+ and IN-.
IN+ had to be driven high, and IN- pulled low, in order for it to work. If you add some logic chips you can wire a watchdog or comparator to it.
The onsemi FAN3100.

David Hess · « **Reply #10 on:** May 27, 2020, 10:23:39 pm »

For that sort of thing I AC coupled it, or AC coupled *two* outputs which also have to be anti-phase. A pulse transformer could be useful here if galvanic isolation is required.

T3sl4co1l · « **Reply #11 on:** May 28, 2020, 01:14:45 am »

What they said ^^

I suppose you could generalize that, perhaps putting a timed state machine on the hardware side. The passive (pull-up/down) case is simply one state with no timeout; a WDT with min and max times would have three states (waiting-high, waiting-low, latched off) and two timers; the bandpass filter (of whatever sort, passive RC, RLC or transformer, active, timers, etc.) is another implementation of this, merely give or take how precise the timing margins are, and whether they respond to, say, timing per cycle, or averaged over some cycles (for a higher Q filter), or if it's sensitive to input amplitude say (well, not an issue with strict digital sources, but for sake of argument it might be worth noting).
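
The three-state windowed watchdog mentioned above can be modelled in a few lines. The Python sketch below is an assumption-laden illustration, not a description of any particular watchdog chip: times are integer milliseconds, both timers are measured from the last accepted edge, and the output is only enabled after two good edges so that it does not start out permissive. The class and attribute names are made up.

```python
class WindowWatchdog:
    """Expects alternating edges spaced between t_min and t_max apart."""

    def __init__(self, t_min, t_max):
        self.t_min = t_min
        self.t_max = t_max
        self.state = "waiting-high"
        self.last_edge = 0
        self.good_edges = 0

    def _check_timeout(self, now):
        # Max timer: too long since the last accepted edge latches off.
        if self.state != "latched-off" and now - self.last_edge > self.t_max:
            self.state = "latched-off"

    def edge(self, now, level):
        """Feed one pin transition at time `now`; `level` is the new level."""
        self._check_timeout(now)
        if self.state == "latched-off":
            return
        expected = 1 if self.state == "waiting-high" else 0
        # Min timer: an edge that arrives too early also latches off.
        if level != expected or now - self.last_edge < self.t_min:
            self.state = "latched-off"
            return
        self.state = "waiting-low" if level == 1 else "waiting-high"
        self.last_edge = now
        self.good_edges += 1

    def output_enabled(self, now):
        self._check_timeout(now)
        return self.state != "latched-off" and self.good_edges >= 2

wd = WindowWatchdog(t_min=8, t_max=12)
for t, level in [(10, 1), (20, 0), (30, 1)]:
    wd.edge(t, level)
print(wd.output_enabled(31), wd.output_enabled(50))  # True False
```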

The next step up from that, might be a simple state loop, say, a counter and comparator; every time the bus is accessed, the counter increments, and the correct count needs to be echoed back, else the bus enters an error state (which might start as error correction or re-synchronization or something, and progress to a full faulting condition if expected inputs are not given).
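
A counter-and-comparator loop of this kind might look like the following Python model. It is a sketch under assumptions: an 8-bit counter, a hypothetical limit of three consecutive mismatches before the bus faults for good, and no attempt at the re-synchronization step the post mentions. The class name `CountedBus` is invented for the example.

```python
class CountedBus:
    """Accepts an access only if the master echoes the current access count."""

    def __init__(self, max_errors=3):
        self.count = 0
        self.errors = 0
        self.max_errors = max_errors
        self.faulted = False

    def access(self, echoed):
        """Return True if the access is accepted, False otherwise."""
        if self.faulted:
            return False
        if echoed == self.count:
            self.count = (self.count + 1) % 256  # 8-bit counter, wraps around
            self.errors = 0
            return True
        self.errors += 1
        if self.errors >= self.max_errors:
            self.faulted = True  # full fault: stays off until reset externally
        return False

bus = CountedBus()
print([bus.access(n) for n in (0, 1, 2)])   # in-sequence accesses are accepted
print([bus.access(9) for _ in range(3)])    # three bad echoes in a row
print(bus.faulted)
```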

A higher level example might be a bus keepalive condition. USB heartbeat for example. A null packet or token is passed back and forth periodically, preventing a timeout condition. It might truly be null, or it might be stateful (e.g., a TCP packet with sequence count), or it might carry, say, link status, or configuration, or other metadata, etc. This isn't really feasible with discrete logic, but is common between more advanced hardware systems (would be reasonable to implement in an FPGA), or lots of software.

Security devices are very much relevant here; we can include sequential keyed (and various others) kinds of interfaces, from keyfobs and garage door openers to full on industrial cryptography. The number of bits (representing the state) goes up considerably, as does the complexity in manipulating them; it's not very useful to analyze these systems as state machines, but that doesn't magically stop them still being state machines in general.

So, that's some theory, and you can choose anywhere in the range based on how tough it needs to be.

Regarding the hardware complexity of the bandpass design -- keep in mind you can use single transistors for e.g. Sallen-Key filters (the unity-gain type, where a simple follower is all that's needed). You still need all the R and C parts of course, plus some biasing most likely, but it is what it is.

One could probably also arrange a Schmitt trigger as an oscillator with gain just below threshold, so that it exhibits resonance of a crude sort. Remember that digital is really just analog, heavily quantized (to 1 bit) -- linear operations like filtering don't simply cease to apply!

Tim

mikerj · « **Reply #12 on:** May 28, 2020, 06:45:31 am »

Quote from: Ian.M on May 26, 2020, 09:32:41 am

A minimal charge pump to drive a MOSFET gate is only four components including the bleed resistor to guarantee the MOSFET's off if the I/O pin isn't toggling fast enough. However, it's NOT a suitable approach if you need fast turn-on/turn-off.

You need two caps, one resistor and a BAT54S or similar dual series Schottky diode, arranged in the classic charge pump circuit, pumping up from ground, with the resistor and MOSFET gate as its load. Choose the caps so a few pin flips don't pump enough charge to start turning the MOSFET on, but sustained toggling at your main (or other convenient) loop execution rate turns it hard on. Add a long enough software delay on startup so repeated resetting can't toggle the pin fast enough to activate it. Apart from the MOSFET, only the caps and resistor are safety critical - the coupling cap mustn't fail short, and the storage cap and resistor mustn't fail open.

Should be obvious to most but worth mentioning that for failsafe applications the pin would need to be toggled using software, using whatever conditionals are required to ensure all paths within the code are running. Don't use a timer, either with a hardware output or indirectly via an interrupt. It should be treated as a (correctly implemented) watchdog timer.
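
One common shape for "toggle only when the code paths have actually run" is a set of check-in flags that the main loop inspects before flipping the pin. The Python below models that idea only. The task names in `REQUIRED` and the `Heartbeat` class are hypothetical, and on a real MCU the toggle would be a GPIO write in the main loop, not in an interrupt.

```python
REQUIRED = {"sensors", "control", "comms"}  # assumed task names

class Heartbeat:
    def __init__(self):
        self.seen = set()
        self.pin = 0

    def checkpoint(self, name):
        """Called by each task when it has completed its work for this pass."""
        self.seen.add(name)

    def end_of_loop(self):
        """Toggle the pin only if every required task checked in this pass."""
        ok = REQUIRED <= self.seen
        if ok:
            self.pin ^= 1
        self.seen.clear()  # flags must be earned again on the next pass
        return ok

hb = Heartbeat()
for task in ("sensors", "control", "comms"):
    hb.checkpoint(task)
print(hb.end_of_loop(), hb.pin)   # all tasks ran, pin toggles

hb.checkpoint("sensors")          # control and comms never ran
print(hb.end_of_loop(), hb.pin)   # pin is left alone
```

This only shows that each task reached its checkpoint, not that it did anything useful.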

nctnico · « **Reply #13 on:** May 28, 2020, 12:09:11 pm »

Quote from: BreakingOhmsLaw on May 26, 2020, 06:40:32 am

Hi everybody. What is your preferred way to protect an MCU output that controls something critical (e.g. unlock a door or trigger Armageddon) against unintended activation? For example, due to an undefined state during startup, a brownout or device failure?
Is there a simple gold-standard way to do this more efficiently?

Yes. I use a standard watchdog chip for such situations, or some other external logic. For example, to do cycle-by-cycle current limiting and to prevent both the high- and low-side MOSFETs from being switched on at once in a software PWM controlled converter.

splin · « **Reply #14 on:** May 28, 2020, 02:29:38 pm »

Quote from: mikerj on May 28, 2020, 06:45:31 am

Should be obvious to most but worth mentioning that for failsafe applications the pin would need to be toggled using software, using whatever conditionals are required to ensure all paths within the code are running.

It should be obvious to most that anyone who can come up with a reliable method to 'ensure all paths within the code are running', for anything but the most trivial of cases, will be set up for life in software consultancy Nirvana.

I think fail-a-bit-safer-in-some-circumstances might be a better label. Sure a watchdog trigger should be done in software and definitely not (solely) in a timer interrupt routine that may continue to function long after the rest of the software has gone to code execution hell. But putting it in the main loop (if there is one) might not help much. The system tick timer that triggers the main loop/scheduler may well be working, all the tasks get invoked but if important peripherals/DMA/interrupts etc. have gotten their state corrupted somehow then chances are the system isn't going to do anything useful. Including reacting to that critical early warning message from a sensor sent to the disabled or blocked UART.

When you've added enough code to periodically check that everything is working correctly, all peripherals are still configured correctly and tested (non-intrusively of course as you can't interfere with their normal operation) you've doubled the amount of code and the chances are it's your sophisticated monitoring and test code that's going to be the most unreliable part of your system. The problem with software systems is that the size of the error space is enormous. Just one bit out of millions of memory or register bits or the flip-flops controlling the state of the processor getting flipped due to a brownout/cosmic ray/ESD/cell phone interference can have almost impossible to predict consequences.

Who watches the watchers?

One strategy you could consider is to have an external hardware timer that periodically resets the processor and external peripherals - preferably a full power cycle reset - to guarantee they can't remain in a corrupted state for too long. It would likely need extra hardware to ensure that external signals are maintained over the reset. It won't help if, during a crash, the software erroneously decides to trigger the explosive bolts and goes through the correct, but elaborate, sequence required to prevent it being done by accident... (the war games scenario).

nctnico · « **Reply #15 on:** May 28, 2020, 04:40:07 pm »

Quote from: splin on May 28, 2020, 02:29:38 pm

Quote from: mikerj on May 28, 2020, 06:45:31 am
Should be obvious to most but worth mentioning that for failsafe applications the pin would need to be toggled using software, using whatever conditionals are required to ensure all paths within the code are running.
It should be obvious to most that anyone who can come up with a reliable method to 'ensure all paths within the code are running', for anything but the most trivial of cases, will be set up for life in software consultancy Nirvana.

You can get quite close by range checking. In a critical system I add checks for ranges in the hardware driver layer. The software on top can fall apart but it will never result in the hardware being driven in a way which defies the laws of nature. Sometimes I put such checks in a layer on top of the hardware driver layer as well. That way there is double checking against invalid input values.
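
A driver-level range check can be as small as the Python sketch below. The limits, the choice to fall back to a safe value rather than clamp, and the name `driver_set_duty` are all assumptions for illustration. The post does not say which policy is used.

```python
import math

DUTY_MAX = 0.8    # assumed hardware limit for the PWM duty cycle
SAFE_DUTY = 0.0   # value applied whenever the request is not believable

def driver_set_duty(requested):
    """Return the duty the hardware is actually given for a requested value."""
    # Reject anything that is not a plain number (bool is excluded on purpose).
    if isinstance(requested, bool) or not isinstance(requested, (int, float)):
        return SAFE_DUTY
    # Reject NaN and anything outside the physically sensible range.
    if math.isnan(requested) or not (0.0 <= requested <= DUTY_MAX):
        return SAFE_DUTY
    return float(requested)

for value in (0.5, 0.95, -0.1, float("nan")):
    print(value, "->", driver_set_duty(value))
```

Falling back to a safe value instead of clamping means a corrupted request cannot quietly run the output at its maximum.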

T3sl4co1l · « **Reply #16 on:** May 28, 2020, 05:00:50 pm »

Quote from: splin on May 28, 2020, 02:29:38 pm

It won't help if, during a crash, the software erroneously decides to trigger the explosive bolts and goes through the correct, but elaborate, sequence required to prevent it being done by accident... (the war games scenario).

I find it a little amusing because, most likely there is a function call to construct exactly that sequence. No one wants to put together all the pieces at time of use -- that would be error-prone at the very least. More likely the core logic is simply going to say: when internal state is this and inputs are that: call the function.

So if the CPU jumps into a stray pointer, executes whatever randomly, and it happens to call very near either that function or the part of the function which calls it; off it goes.

On the ATXMEGA I've been working with, there is hardware protection, a little dance you need to go through, to activate certain registers, like master clock settings. So naturally, the first thing you do is include "ccpwrite.h" which emits inline asm() guaranteeing timing of the operation.

At least it still protects against errant RAM access (these registers are memory-mapped), so the second most likely attack vector is stack corruption I suppose, followed probably fairly distantly by jump tables (which might not be vulnerable at all on this platform, due to Harvard architecture; I forget how gcc builds them).

So, it might be interesting to discuss how else these mechanisms could be made, and improved, without inviting too many bugs. One option comes to mind: the function checks the return address it was called from. If it's not in its lookup table, log an error. This suggests shades of INTERCAL, but IIRC and AIUI, it's been used for secure operations like the Ethereum cryptographic virtual machine.
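
As a loose analogy for the return-address check, here is the same idea in Python, where the "address" is replaced by the name of the calling function read from the interpreter's stack frame. This is only an illustration of the control flow. A name check is far weaker than an address check on real hardware, and every name in the sketch (`fire_bolts`, `launch_sequence`, `stray_jump`, `ALLOWED_CALLERS`) is invented.

```python
import inspect

ALLOWED_CALLERS = {"launch_sequence"}  # the function's "lookup table"
error_log = []

def fire_bolts():
    """Critical action: proceeds only when called from an expected place."""
    caller = inspect.currentframe().f_back.f_code.co_name
    if caller not in ALLOWED_CALLERS:
        error_log.append(f"fire_bolts called from unexpected caller {caller!r}")
        return False
    return True

def launch_sequence():
    return fire_bolts()   # the one legitimate call site

def stray_jump():
    return fire_bolts()   # stands in for execution arriving from somewhere else

print(launch_sequence(), stray_jump())
print(error_log)
```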

Tim

Doctorandus_P · « **Reply #17 on:** May 28, 2020, 08:42:41 pm »

No such circuit can be designed without careful analysis of what you want to protect against.

Do you want to protect against software malfunctions?
Do you want to protect against component failure in the final driver circuit?
Do you want to protect against deliberate attacks from malicious people?

A weird nomenclature with doors is that some are "fail safe", while others are "fail secure".

"Fail Secure" doors can not be opened during for example a total blackout.
"Fail Safe" doors unlock automatically during such a total blackout, for example to let people escape out of a burning building.

David Hess · « **Reply #18 on:** May 29, 2020, 01:26:54 am »

The AC coupling method I posted about protects against interface failures such as being stuck in any state. It is particularly important if the interface state does not reset to a known state, although that is unlikely to be a problem today; halt and catch fire was a real thing in some systems. Conveniently, it also facilitates transformer coupling.

A related concept is the use of AC excitation and synchronous demodulation on transducers to remove interference like from a magnet influencing a hall sensor on a throttle control ... Toyota. There is a reason LVDTs are popular in industrial control systems.

On the software side I remember long ago writing a set of interlocks into control code to prevent improper outputs but proving that this increased reliability was difficult to impossible. I suspect simplifying the critical code as much as possible and fully documenting the state machine would be more effective. Today my first instinct would be to place the simplified critical code on a completely separate processor which is responsible for preventing improper operation.

floobydust · « **Reply #19 on:** May 29, 2020, 01:57:43 am »

Hardware-wise, for safety critical industrial furnaces and burners, most common is a dual MCU system where one MCU is the main and the second is supervisory (watches the main I/O). Both must have consensus to activate any safety critical output, such as a gas valve.

For a single MCU system, for high SIL one requirement is you must be able to self-test CPU functionality (microcode), which rules out 98% of the MCUs out there. The TI Hercules lock-step ARM MCUs do offer that feature, but they are stupid complicated and expensive, with errata etc.

In your FMEDA you consider (and document and test) each MCU pin open/shorted to adjacent/short to GND/short to VCC, as well as the passives failing open or high/low value. Transistors are six permutations. The crystal oscillator is the worst part, as they do stop or go flaky and off frequency. It's a lot of work to look at all failure modes in a product.
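
The bookkeeping side of that analysis lends itself to a generated checklist. The Python sketch below only enumerates rows and makes no judgement about effects. The part names are placeholders, and reading "six permutations" for a transistor as three open terminals plus three terminal-to-terminal shorts is an assumption, not something the post spells out.

```python
from itertools import combinations

PIN_MODES = ("open", "short to adjacent pin", "short to GND", "short to VCC")
PASSIVE_MODES = ("open", "value high", "value low")

# Assumed reading of "six permutations": 3 opens + 3 pairwise shorts.
TERMINALS = ("C", "B", "E")
BJT_MODES = tuple(f"{t} open" for t in TERMINALS) + tuple(
    f"{a}-{b} short" for a, b in combinations(TERMINALS, 2)
)

def fmeda_rows(pins, passives, transistors):
    """List every (part, failure mode) pair that needs to be considered."""
    rows = [(p, m) for p in pins for m in PIN_MODES]
    rows += [(c, m) for c in passives for m in PASSIVE_MODES]
    rows += [(q, m) for q in transistors for m in BJT_MODES]
    return rows

rows = fmeda_rows(["PA0", "PA1"], ["C1", "C2", "R1"], ["Q1"])
for part, mode in rows:
    print(f"{part}: {mode}")
print(len(rows), "failure modes to document and test")
```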

An AC signal can protect the integrity of a H/W signal line, but if you are a bonehead and generate the signal in an RTI, which is not the way to do it in firmware, your Main can hang or another task can get hung and you still make a lovely AC signal. I have seen that, as well as a Jr. EE kicking the watchdog in an RTI and wondering why it never tripped.

If you have a critical output malfunction, it can be good enough if the MCU can alert the user there is a problem and generate an alarm. Self-check of the output in other words.

coppercone2 · « **Reply #20 on:** May 29, 2020, 03:39:03 am »

human and mechanical switch thats periodically inspected

for a door, you use a guard

for a silo, you use a specially trained marine guard unit and numerous technicians inspecting systems constantly

when you don't do these things, like russia did with dead hand, the whole world turns against you pretty quickly. or you might get the movie fail-safe

RoGeorge · « **Reply #21 on:** May 29, 2020, 06:00:51 am »

Quote from: coppercone2 on May 29, 2020, 03:39:03 am

for a door, you use a guard

And a door turret, second 00:30 .. 00:50

