Author Topic: Ionizing Radiation and error detection  (Read 5134 times)

Offline nwvlab (Topic starter)

  • Regular Contributor
  • *
  • Posts: 65
  • Country: it
    • next-hack.com
Ionizing Radiation and error detection
« on: February 09, 2016, 01:24:14 pm »
Hi there!

As feature sizes scale down, single-event effects, caused for instance by an alpha particle, could become an issue not only for military or space applications, but also in some consumer, automotive, or industrial fields.

Luckily, some recent microcontrollers embed some form of error detection in certain SRAM regions or in their cache. Such error detection techniques can mitigate the effects of radiation-induced single-event upsets.

However, I have not yet found any “consumer” microcontroller that also features error detection on the peripheral or MCU configuration registers (but maybe I missed some). I think that a bit flip in one of those registers might be as bad as a bit flip in a random SRAM location. Moreover, there can be several hundred peripheral configuration registers, amounting to some kbits (i.e. the probability of a bit flip in one of these registers is not orders of magnitude smaller than the bit-flip probability in the SRAM).

I’m wondering if anyone knows why manufacturers do not implement some protection on these registers too. Is it due to complexity/cost constraints? Or do the manufacturers rely on the user, who could write the code so that the registers are periodically scrubbed? (Still, this would not be as effective as hardware detection.) Or maybe these registers are implemented at the transistor level in a way that makes them intrinsically more immune to ionizing particles (larger MOSFETs)?
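
To make the software-scrubbing option concrete, here is a minimal sketch in Python, standing in for firmware code. The register names and values are made up for illustration, a plain dict plays the role of the memory-mapped registers, and it assumes a known-good ("golden") copy of the configuration is kept somewhere trustworthy, such as flash.

```python
# Hypothetical golden configuration; names and values are illustrative only.
GOLDEN = {"UART_BAUD": 0x0683, "ADC_CTRL": 0x0021, "TIMER_CFG": 0x8003}


def scrub(registers, golden):
    """Rewrite every register that differs from its golden value.

    Returns the list of register names that had to be repaired.
    """
    repaired = []
    for name, expected in golden.items():
        if registers.get(name) != expected:
            registers[name] = expected
            repaired.append(name)
    return repaired


# Simulate a single-event upset: flip bit 5 of one configuration register.
regs = dict(GOLDEN)
regs["ADC_CTRL"] ^= 1 << 5

print(scrub(regs, GOLDEN))  # ['ADC_CTRL']
print(scrub(regs, GOLDEN))  # []
```

A scheme like this only limits how long a corrupted value stays active: between two scrub passes the peripheral still runs with the wrong configuration, which is why it is probably weaker than detection in hardware.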

Cheers!

Offline Xenoamor

  • Regular Contributor
  • *
  • Posts: 83
  • Country: wales
Re: Ionizing Radiation and error detection
« Reply #1 on: February 09, 2016, 01:31:54 pm »
... how is a 2 proton, 2 neutron atom going to hit an electronic device?
Are you talking about helium atoms hitting ICs here?

As with all decent software, you should be using the watchdog to recover from any program stalls, and you should have feedback from whatever you are controlling to check it's doing what you tell it.

Pinging around the net I found:
Quote
In order to prevent latch-up in space, epitaxial substrates, silicon on insulator (SOI) or silicon on sapphire (SOS) are often used to further reduce or eliminate the susceptibility.

Most if not all devices have anti-latchup protection.
https://en.wikipedia.org/wiki/Latch-up
Check what NASA has to say as well:
http://nepp.nasa.gov/docuploads/31342D2C-05DD-4165-88EC472A3AB88B6A/SEU_Flash-97.pdf

EDIT -
I imagine checksums or some kind of RAID array of flash memory are used in space applications.
« Last Edit: February 09, 2016, 01:48:25 pm by Xenoamor »
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8605
  • Country: gb
Re: Ionizing Radiation and error detection
« Reply #2 on: February 09, 2016, 02:00:48 pm »
Quote
As feature sizes scale down, single-event effects, caused for instance by an alpha particle, could become an issue not only for military or space applications, but also in some consumer, automotive, or industrial fields.

Alpha particles were a problem around 1980, when alpha emissions from a ceramic package could corrupt the operation of the finest-geometry chips of the time. This was particularly troublesome with DRAM devices. Die coatings were found which protected the chips, and the problem went away. Why did you ask about alpha in particular? It's things like gamma which are the real problem, especially in space.

Quote
Luckily, some recent microcontrollers embed some form of error detection in certain SRAM regions or in their cache. Such error detection techniques can mitigate the effects of radiation-induced single-event upsets.

However, I have not yet found any “consumer” microcontroller that also features error detection on the peripheral or MCU configuration registers (but maybe I missed some). I think that a bit flip in one of those registers might be as bad as a bit flip in a random SRAM location. Moreover, there can be several hundred peripheral configuration registers, amounting to some kbits (i.e. the probability of a bit flip in one of these registers is not orders of magnitude smaller than the bit-flip probability in the SRAM).

Do automotive MCUs count as consumer? There are lockstep redundant MCUs from several automotive suppliers, such as the TI Hercules ARM chips.
 

Offline nwvlab (Topic starter)

  • Regular Contributor
  • *
  • Posts: 65
  • Country: it
    • next-hack.com
Re: Ionizing Radiation and error detection
« Reply #3 on: February 09, 2016, 02:25:35 pm »
Quote
Why did you ask about alpha in particular? It's things like gamma which are the real problem, especially in space.

Because I'm referring to sea-level soft errors. Alpha particles are heavy enough to induce a single-event effect, while still being quite frequent, because, for instance, neutrons coming from cosmic rays can be captured by the atoms composing the chip or the package. The resulting isotopes can be radioactive and undergo radioactive decay, in particular alpha decay.

Gamma rays are typically a concern in terms of total ionizing dose, i.e. small amounts of radiation whose effects accumulate over time (charge loss in flash memory, charge trapping in the oxides, trap generation, etc.).

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8605
  • Country: gb
Re: Ionizing Radiation and error detection
« Reply #4 on: February 09, 2016, 03:29:35 pm »
Quote
Why did you ask about alpha in particular? It's things like gamma which are the real problem, especially in space.

Because I'm referring to sea-level soft errors. Alpha particles are heavy enough to induce a single-event effect, while still being quite frequent, because, for instance, neutrons coming from cosmic rays can be captured by the atoms composing the chip or the package. The resulting isotopes can be radioactive and undergo radioactive decay, in particular alpha decay.

Gamma rays are typically a concern in terms of total ionizing dose, i.e. small amounts of radiation whose effects accumulate over time (charge loss in flash memory, charge trapping in the oxides, trap generation, etc.).

Alpha can't get through the package. The only time alpha was a problem was when it came from the package itself, right next to the chip. A thin film over the chip was sufficient to block it. Anything helium gas tight is helium nucleus tight.
 

Offline nwvlab (Topic starter)

  • Regular Contributor
  • *
  • Posts: 65
  • Country: it
    • next-hack.com
Re: Ionizing Radiation and error detection
« Reply #5 on: February 09, 2016, 03:42:09 pm »
Quote
Alpha can't get through the package. The only time alpha was a problem was when it came from the package itself, right next to the chip. A thin film over the chip was sufficient to block it. Anything helium gas tight is helium nucleus tight.

Right, an alpha particle cannot get through the package if it comes from the outside.

However, if it comes from inside the chip, you can't do anything about it. How can this happen? By neutron capture, a stable isotope can become unstable and then decay. A 5 MeV alpha particle can travel some tens of microns before it's stopped, and that is a very long distance if it originated "close enough" to sensitive nodes.

Offline Daving

  • Contributor
  • Posts: 34
Re: Ionizing Radiation and error detection
« Reply #6 on: February 09, 2016, 05:44:09 pm »
From Wikipedia:

A sheet of paper or a few cm of air blocks alpha.
A thin sheet of metal or a few meters of air blocks beta.
X-rays and gamma rays take much more material to stop, but they don't have enough energy to flip bits in most MCUs with larger features than those found in consumer CPUs.

Cosmic rays are what you need to be concerned about, but at sea level they are not a substantial issue.

Memory in many architectures is not linearly arranged. In other words, if you have byte 0 and byte 1, you don't have:
Code: [Select]
B0b0 B0b1 B0b2 B0b3 B0b4 B0b5 B0b6 B0b7
B1b0 B1b1 B1b2 B1b3 B1b4 B1b5 B1b6 B1b7

You typically end up with something more like this:
Code: [Select]
B0b0 | B1b1 | B2b2 | ...
B1b0 | B2b1 | B0b2 | ...
B2b0 | B0b1 | B1b2 | ...

Actual arrangements vary.  This helps to prevent 2-bit errors in a single byte or word.
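
As a rough illustration of why interleaving helps, the Python sketch below assumes a simple scheme in which physical column `c` of a memory row belongs to word `c % interleave`. Real layouts differ, and the function names are invented for this example.

```python
def cell_owner(column, interleave):
    """Map a physical column of a memory row to (word index, bit index)."""
    return column % interleave, column // interleave


def words_hit(columns, interleave):
    """Count how many flipped bits land in each logical word."""
    hits = {}
    for column in columns:
        word, _bit = cell_owner(column, interleave)
        hits[word] = hits.get(word, 0) + 1
    return hits


# One particle strike that upsets two neighbouring physical cells.
strike = [10, 11]

print(words_hit(strike, interleave=1))  # {0: 2} -> a 2-bit error in one word
print(words_hit(strike, interleave=4))  # {2: 1, 3: 1} -> two 1-bit errors
```

With interleaving, the same physical damage shows up as two separate single-bit errors, which per-word parity or ECC can handle.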
Next, many memory systems include some form of ECC, the simplest of which is a 2D parity check, which can correct any 1-bit error and detect any 2-bit error.

Here's an example with 16 bits of data:
The parity bit is the rightmost or bottommost bit in each row or column (on the other side of the line from all the other bits).
The parity bit is a 1 if there are an ODD number of 1s in the row or column.
The parity bit is a 0 if there are an EVEN number of 1s in the row or column.

Code: [Select]
0 1 1 0 | 0
1 0 0 0 | 1
1 0 1 0 | 0
0 0 1 1 | 0
_______
0 1 1 1

If a single bit flips:
Code: [Select]
0 1 1 0 | 0
1 0 1 0 | 1 <-wrong row
1 0 1 0 | 0
0 0 1 1 | 0
_______
0 1 1 1
    ^ Wrong column
So you know the row and column of the error and can fix it.

If 2-bits flip
Code: [Select]
0 1 1 0 | 0
1 1 0 0 | 1 <- wrong row
1 1 1 0 | 0 <- also a wrong row
0 0 1 1 | 0
_______
0 1 1 1
  ^ This column looks correct.

The error is detected, but cannot be corrected.  The two flagged rows are known, but any column in them could hold the flipped bits.

Of course, there are cases where multiple-bit errors can be corrected, but correction is only guaranteed when a single bit is wrong.

If 2 parity bits are in error, you will detect the error, but you may make a bad correction.
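
The 2D parity scheme above can be sketched in a few lines of Python. This is only an illustration using the same 16-bit example; the function names `encode` and `check` are invented here, and the stored parity bits are passed in as separate lists.

```python
def encode(rows):
    """Return (row_parity, col_parity) for a grid of bits."""
    row_parity = [sum(row) % 2 for row in rows]
    col_parity = [sum(col) % 2 for col in zip(*rows)]
    return row_parity, col_parity


def check(rows, row_parity, col_parity):
    """Return (status, rows); status is 'ok', 'corrected' or 'detected'."""
    now_rows, now_cols = encode(rows)
    bad_rows = [i for i, (a, b) in enumerate(zip(now_rows, row_parity)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(now_cols, col_parity)) if a != b]
    if not bad_rows and not bad_cols:
        return "ok", rows
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # One row and one column disagree: flip the bit where they cross.
        fixed = [list(row) for row in rows]
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
        return "corrected", fixed
    # Any other pattern is reported but left untouched.
    return "detected", rows


data = [[0, 1, 1, 0],
        [1, 0, 0, 0],
        [1, 0, 1, 0],
        [0, 0, 1, 1]]
row_parity, col_parity = encode(data)  # ([0, 1, 0, 0], [0, 1, 1, 1])

single = [list(row) for row in data]
single[1][2] ^= 1                      # the single-bit example
print(check(single, row_parity, col_parity)[0])  # corrected

double = [list(row) for row in data]
double[1][1] ^= 1                      # the two-bit example
double[2][1] ^= 1
print(check(double, row_parity, col_parity)[0])  # detected
```

Flipping one stored row parity bit and one stored column parity bit makes `check` report "corrected" while actually flipping a good data bit, which is the bad-correction case mentioned above.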

There are other more robust ECC techniques.
https://en.wikipedia.org/wiki/Forward_error_correction
 

Offline nwvlab (Topic starter)

  • Regular Contributor
  • *
  • Posts: 65
  • Country: it
    • next-hack.com
Re: Ionizing Radiation and error detection
« Reply #7 on: February 09, 2016, 07:00:23 pm »
OK, this does not answer my initial question, which was: why don't they implement ECC (or at least a parity check) on the peripheral and CPU config registers as well?

Besides, cosmic rays at sea level might indeed be an issue. Back in 1996, IBM showed that in a 256 MB system you should expect about 1 soft error per month. That is not negligible!
Assuming the error rate scales with the total amount of memory, if you have 1000 256 kB systems you should expect roughly the same error rate (i.e. one of these 1000 systems is likely to have a soft error in a month).
Similarly, with 1 million 256-byte systems, you are likely to have 1 error per month in at least one of these 1E6 systems.
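
The scaling argument can be written out as a quick Python calculation. It assumes the quoted figure of about 1 error per month per 256 MB applies unchanged per byte, regardless of memory technology, which is a simplification; the function names are made up for this sketch.

```python
ERRORS_PER_MONTH_REF = 1.0   # quoted figure: about 1 soft error per month
REF_BYTES = 256 * 2**20      # ... for a 256 MB system


def fleet_errors_per_month(bytes_per_device, devices):
    """Expected soft errors per month over a whole fleet (linear scaling)."""
    rate_per_byte = ERRORS_PER_MONTH_REF / REF_BYTES
    return rate_per_byte * bytes_per_device * devices


def years_between_errors(bytes_per_device):
    """Mean time between soft errors for a single device, in years."""
    return 1.0 / (fleet_errors_per_month(bytes_per_device, 1) * 12)


print(fleet_errors_per_month(256 * 1024, 1000))  # 1000 systems with 256 kB
print(fleet_errors_per_month(256, 10**6))        # 1E6 systems with 256 bytes
print(years_between_errors(256))                 # a single 256-byte device
```

Both fleets come out slightly below 1 error per month, because 256 MB is 1024 times (not 1000 times) 256 kB. A single 256-byte device, on the same assumption, would see an error only once in tens of thousands of years.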

(IBM considered errors in DRAM, but SRAM has similar problems, plus there is also a positive feedback, and possibly higher soft error rates.)

Offline edavid

  • Super Contributor
  • ***
  • Posts: 3381
  • Country: us
Re: Ionizing Radiation and error detection
« Reply #8 on: February 09, 2016, 07:44:53 pm »
Quote
OK, this does not answer my initial question, which was: why don't they implement ECC (or at least a parity check) on the peripheral and CPU config registers as well?

Because they are much larger than RAM cells, so they have lower (negligible) error rates.
« Last Edit: February 09, 2016, 10:12:04 pm by edavid »
 

Offline Daving

  • Contributor
  • Posts: 34
Re: Ionizing Radiation and error detection
« Reply #9 on: February 09, 2016, 08:55:10 pm »
Quote
OK, this does not answer my initial question, which was: why don't they implement ECC (or at least a parity check) on the peripheral and CPU config registers as well?
Because they are much larger than RAM cells, so they have lower error rates.

Also, many microcontrollers in use have 256 B rather than 256 MB.  The error rate is on the order of less than 1 soft error per millennium [citation needed].
Most uC users don't need ECC, and those that do can implement it in software.
 

Offline nwvlab (Topic starter)

  • Regular Contributor
  • *
  • Posts: 65
  • Country: it
    • next-hack.com
Re: Ionizing Radiation and error detection
« Reply #10 on: February 09, 2016, 09:32:09 pm »
Quote
Also, many microcontrollers in use have 256 B rather than 256 MB.  The error rate is on the order of less than 1 soft error per millennium [citation needed].

Yep, but you must take into account that you have millions of devices. Therefore the number of "failing devices" in one year is relatively high.

Offline RogerRowland

  • Regular Contributor
  • *
  • Posts: 193
  • Country: gb
    • Personal web site
Re: Ionizing Radiation and error detection
« Reply #11 on: February 10, 2016, 06:33:44 am »
Quote
Yep, but you must take into account that you have millions of devices. Therefore the number of "failing devices" in one year is relatively high.

The pragmatic view is probably as simple as this: the reason manufacturers don't implement any protection is that it isn't considered a significant problem and it doesn't affect sales. If it ain't broke ....
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8605
  • Country: gb
Re: Ionizing Radiation and error detection
« Reply #12 on: February 10, 2016, 06:42:09 am »
Makers of medical devices, like blood glucose meters, are generally fairly unconcerned about device failures, but very very concerned about devices giving wrong results. They are, therefore, prepared to pay for self correcting memories and some other fault detection complexity in their MCUs. I haven't heard of them showing concern about alpha corruption.
 

Offline nwvlab (Topic starter)

  • Regular Contributor
  • *
  • Posts: 65
  • Country: it
    • next-hack.com
Re: Ionizing Radiation and error detection
« Reply #13 on: February 10, 2016, 02:18:48 pm »
Quote
I haven't heard of them showing concern about alpha corruption.

Don't focus on alpha. I wrote "for instance".

Maybe they don't care because they are not aware of it, and they assume that every error must be due to a software bug, EMI, or whatever?

I recall that many years ago, soft errors were only a minor topic within a larger "device and process" track at the IEEE International Reliability Physics Symposium (IRPS). For a few years now, they have had a dedicated track.


Offline Dago

  • Frequent Contributor
  • **
  • Posts: 659
  • Country: fi
    • Electronics blog about whatever I happen to build!
Re: Ionizing Radiation and error detection
« Reply #14 on: February 10, 2016, 02:20:19 pm »
What do you mean by "consumer" exactly? There are plenty of automotive/safety microcontrollers which have ECC on RAM and peripheral memory, plus lockstep cores, such as the TI TMS570: http://www.ti.com/lsds/ti/microcontrollers_16-bit_32-bit/c2000_performance/safety/tms570/overview.page
 

Offline Jeroen3

  • Super Contributor
  • ***
  • Posts: 4067
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: Ionizing Radiation and error detection
« Reply #15 on: February 10, 2016, 02:58:12 pm »
Have you already tried to get a "consumer grade" chip into a faulty condition with a radiation source?
 

Offline nwvlab (Topic starter)

  • Regular Contributor
  • *
  • Posts: 65
  • Country: it
    • next-hack.com
Re: Ionizing Radiation and error detection
« Reply #16 on: February 10, 2016, 03:50:12 pm »
Quote
Have you already tried to get a "consumer grade" chip into a faulty condition with a radiation source?

Yep, memories.

Offline Daving

  • Contributor
  • Posts: 34
Re: Ionizing Radiation and error detection
« Reply #17 on: February 12, 2016, 04:33:15 pm »
Quote
Makers of medical devices, like blood glucose meters, are generally fairly unconcerned about device failures, but very very concerned about devices giving wrong results. They are, therefore, prepared to pay for self correcting memories and some other fault detection complexity in their MCUs. I haven't heard of them showing concern about alpha corruption.

Yes, however you have medical developers and watch developers buying the same part.  The watch developers don't need to pay for ECC memory that they don't need, when the medical folks can add the checks in software.

I skydive, and the AADs (Dave did a teardown of a CYPRES) do a self-check every time they're turned on.

Self-checks can cover a checksum or CRC of the flash.
Then, during operation, important state variables can be written as an ECC struct.  Reads and writes would go through a function of the ECC API.  It would be slower, but the safety is in place.
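
A minimal Python sketch of both ideas follows. The names `flash_ok` and `ProtectedVar` are hypothetical, the flash image is just a byte string, and the "ECC struct" is replaced here by the simplest stand-in: three copies with a bitwise majority vote, rather than a real error-correcting code such as Hamming.

```python
import zlib


def flash_ok(image, expected_crc):
    """Start-up self-check: compare the CRC-32 of the image with a stored value."""
    return zlib.crc32(image) == expected_crc


class ProtectedVar:
    """Integer state variable stored three times and read by majority vote."""

    def __init__(self, value):
        self._copies = [value, value, value]

    def write(self, value):
        self._copies = [value, value, value]

    def read(self):
        a, b, c = self._copies
        # Each result bit takes the value held by at least two of the copies.
        voted = (a & b) | (a & c) | (b & c)
        # Write the voted value back so a single upset does not linger.
        self._copies = [voted, voted, voted]
        return voted


firmware = b"example firmware image"
stored_crc = zlib.crc32(firmware)
print(flash_ok(firmware, stored_crc))               # True
print(flash_ok(b"x" + firmware[1:], stored_crc))    # False

altitude = ProtectedVar(0x1234)
altitude._copies[1] ^= 0x40      # simulate a bit flip in one copy
print(hex(altitude.read()))      # 0x1234
```

Voting tolerates any corruption confined to one copy, at the cost of three times the storage and a slower read, which matches the speed-for-safety trade-off described above.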
 

