Topic: How many of you validate your I2C data in embedded systems?


Offline globoyTopic starter

  • Regular Contributor
  • *
  • Posts: 178
  • Country: us
I am prompted to ask this question by a recent failure mode where, very occasionally - say 1 in 2 million reads - an I2C read of a device returns invalid data.  The bad data leads the system to decide it needs to shut down.  Perhaps a fortunate outcome because otherwise I may have never known the corruption was occurring.

So I am curious what those of you who develop embedded systems do to validate I2C (or SPI accessed) control register data.  I know that usually it's easy to detect a failed transaction - a missing ACK or address decode or timeout - since the driver can report the failure.  But unless there is some sort of validation then corrupt data may make its way into the system.  Perhaps often it doesn't matter - for example a single bad battery voltage pushed into an averaging array probably won't negatively affect system operation.  But in cases like reading a status bit that is clear-on-read, it can matter, because the bit cannot simply be read again.

Most serial protocols, and most data storage systems, include data integrity support such as a checksum or error correcting codes, but most hardware control register interfaces do not.

Just curious what others do.
 
The following users thanked this post: AndersJ

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8574
  • Country: gb
Re: How many of you validate your I2C data in embedded systems?
« Reply #1 on: June 03, 2023, 04:13:22 pm »
Have you checked your signal integrity, especially under high EMI conditions? A well designed board should exchange I2C data continuously, with only the rarest of errors. If individual transfers are of high importance, like storing key data in an I2C EEPROM or FRAM, people tend to add a checksum to each stored block of bytes.
 
The following users thanked this post: globoy

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26682
  • Country: nl
    • NCT Developments
Re: How many of you validate your I2C data in embedded systems?
« Reply #2 on: June 03, 2023, 04:14:30 pm »
For sensors / analog inputs I typically use averaging. That also takes care of false readings. In some cases I ignore outliers. With sensors there can be many causes of false readings that aren't necessarily interface related but can also be caused by external interference.

When dealing with SPI / I2C based storage, I always have a checksum over the data (from a simple modulo 256 to SHA256 depending on the application). Typically I use a round robin system that wears an external memory evenly. So when a verify fails, I let the software use the next sector. This covers flash failures and communication failures.
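A minimal sketch of the checksum-plus-round-robin idea, for illustration only. The record layout, `PAYLOAD_LEN`, `NUM_SLOTS` and all function names are hypothetical and not taken from the post; the checksum is the simple modulo-256 sum mentioned above. On read, the newest slot whose checksum verifies is used, so a corrupted slot is skipped automatically.

```c
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_LEN 14u  /* hypothetical payload size */
#define NUM_SLOTS    4u  /* hypothetical number of round-robin slots */

typedef struct {
    uint8_t seq;                  /* incremented on every write */
    uint8_t payload[PAYLOAD_LEN];
    uint8_t sum;                  /* modulo-256 sum of seq and payload */
} record_t;

static uint8_t sum_mod256(const uint8_t *p, size_t n)
{
    uint8_t s = 0;
    while (n--) {
        s = (uint8_t)(s + *p++);
    }
    return s;
}

/* Compute and store the checksum before the record is written out. */
void record_seal(record_t *r)
{
    r->sum = sum_mod256((const uint8_t *)r, offsetof(record_t, sum));
}

/* Verify a record that was read back from the external memory. */
int record_is_valid(const record_t *r)
{
    return r->sum == sum_mod256((const uint8_t *)r, offsetof(record_t, sum));
}

/* Return the index of the newest slot that verifies, or -1 if none does.
 * "Newest" uses wrap-around comparison of the 8-bit sequence number. */
int newest_valid_slot(const record_t *slots, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!record_is_valid(&slots[i])) {
            continue;
        }
        if (best < 0) {
            best = (int)i;
            continue;
        }
        uint8_t ahead = (uint8_t)(slots[i].seq - slots[best].seq);
        if (ahead != 0 && ahead < 0x80) {
            best = (int)i;
        }
    }
    return best;
}
```

Note that a plain sum accepts an all-zero block as valid, which is one reason to move to a CRC or a hash when the application justifies it.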

There is also a third use case and that is controlling something external. My go-to approach is to continuously update the output values so that if a write goes wrong, it is corrected quickly. A 10 ms interval usually is enough because many mechanical actuators (like relays, valves, etc) need more time to react anyway.

edit: typo
« Last Edit: June 03, 2023, 06:18:51 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: I wanted a rude username, globoy

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 16510
  • Country: us
  • DavidH
Re: How many of you validate your I2C data in embedded systems?
« Reply #3 on: June 03, 2023, 04:22:55 pm »
For analog measurements I have sometimes calculated the standard deviation and peak-to-peak which could be used to reject bad readings or detect degradation.
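One way this could look, as a sketch under assumptions: the window statistics are computed over the last few accepted samples, and a new sample is rejected when it lies more than `k` standard deviations from the mean. The names `window_stats` and `is_outlier` and the choice of `k` are hypothetical. The comparison is done on squared values so that no square root is needed.

```c
#include <stddef.h>

typedef struct {
    double mean;
    double variance;      /* population variance; stddev is its square root */
    double peak_to_peak;  /* max - min over the window */
} stats_t;

/* Compute statistics over a window of n >= 1 previously accepted samples. */
stats_t window_stats(const double *x, size_t n)
{
    stats_t s = { 0.0, 0.0, 0.0 };
    double sum = 0.0, lo = x[0], hi = x[0];

    for (size_t i = 0; i < n; i++) {
        sum += x[i];
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    s.mean = sum / (double)n;

    for (size_t i = 0; i < n; i++) {
        double d = x[i] - s.mean;
        s.variance += d * d;
    }
    s.variance /= (double)n;
    s.peak_to_peak = hi - lo;
    return s;
}

/* True when the sample is more than k standard deviations from the mean.
 * With zero variance, any sample that differs from the mean is flagged. */
int is_outlier(const stats_t *s, double sample, double k)
{
    double d = sample - s->mean;
    return d * d > k * k * s->variance;
}
```

A slowly growing variance or peak-to-peak value across windows can also be logged as a sign of degradation, independent of the per-sample rejection.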

For packetized communications between microcontrollers, I almost always use some form of error detection.  I do the same for stored data records.


 
The following users thanked this post: globoy

Offline globoyTopic starter

  • Regular Contributor
  • *
  • Posts: 178
  • Country: us
Re: How many of you validate your I2C data in embedded systems?
« Reply #4 on: June 03, 2023, 04:26:34 pm »
I have looked at signal integrity with a 200 MHz BW scope and it looks ok.  Have not correlated it with EMI but it's being tested and failing in the same location.

I have been doing some testing where I leave systems running and have them print out unexpected results. So far I've gotten two failures this way and in both cases extra (reserved 0) bits were set.  In my mind this reduces the possibility that, for example, the data being driven is too slow and is being sampled incorrectly since there should be long stretches of time where SDA is pulling the bus low.

Regarding checksums and the like.  Yup, exactly.  Things like storage or even streams of data can be - and usually are - protected with additional data.  But what to do with control registers?

Probably this particular problem has some acceptable work-arounds (I can validate the data and perhaps even recover from an errant shut down by immediately rebooting in that case).

But I'm a little embarrassed to admit for the last 15 years of doing embedded systems I've sort-of ignored this until it has come to bite me.  So now I'm curious what y'all do.
« Last Edit: June 03, 2023, 04:34:20 pm by globoy »
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26682
  • Country: nl
    • NCT Developments
Re: How many of you validate your I2C data in embedded systems?
« Reply #5 on: June 03, 2023, 04:36:16 pm »
In some cases when I2C goes off-board (short wires or a board-to-board connector) I like to add 100 pF capacitors to the SDA and SCL lines. This helps to be more immune to external influences.
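As a rough check of what an added capacitor does to the edges, the rise time of an open-drain line between 30% and 70% of the supply is approximately 0.8473 x R x C (the factor is ln(7/3)). The sketch below assumes a 4.7 kOhm pull-up, which is not stated in the post, and counts only the added 100 pF, ignoring trace and pin capacitance. The function names are hypothetical.

```c
/* Rise time (30% to 70% of VDD) of an RC-limited open-drain bus line.
 * ohm * pF gives 1e-12 s, i.e. 1e-3 ns, hence the scale factor. */
double i2c_rise_time_ns(double pullup_ohms, double bus_pf)
{
    return 0.8473 * pullup_ohms * bus_pf * 1e-3;
}

/* Compare against the maximum rise time for the bus speed in use:
 * 1000 ns for Standard-mode (100 kHz), 300 ns for Fast-mode (400 kHz). */
int rise_time_within(double rise_ns, double limit_ns)
{
    return rise_ns <= limit_ns;
}
```

Under those assumptions `i2c_rise_time_ns(4700.0, 100.0)` comes out at roughly 398 ns: fine for a 100 kHz bus, but over the 300 ns Fast-mode limit unless the pull-up is made stronger.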
 
The following users thanked this post: boB, globoy

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8090
  • Country: fi
Re: How many of you validate your I2C data in embedded systems?
« Reply #6 on: June 03, 2023, 05:15:48 pm »
I always try to add validation / error tolerance / recovery, given practical constraints (most often that of development time).

At an early stage, I prefer to "fail on iffy things", i.e. the assert() approach, and come up with some kind of logging system; when the stuff runs on internal testers or pilot customers, a reboot is not catastrophic, but will force us to commit time to fixing or working around the issue. Later these types of asserts can be changed into some form of self-recovery + logging.

I2C sensors rarely offer anything helpful with regard to reliability, so what you can do is add some kind of watchdog timer and maybe detection of a value that has been stuck for more than the last 100 samples or so, followed by a power-down, power-up and reinitialization sequence of the sensor. Any timing guarantees go out the window, but such is life with I2C.
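The stuck-value detection could be sketched as follows. The threshold of 100 comes from the post; the type and function names are hypothetical, and the power-cycle and reinitialization step is left to the caller because it is hardware specific.

```c
#include <stdint.h>

#define STUCK_LIMIT 100u  /* identical consecutive samples before we react */

typedef struct {
    int32_t  last;     /* most recent sample value */
    uint32_t repeats;  /* how many times in a row it has been seen */
    int      primed;   /* 0 until the first sample arrives */
} stuck_detector_t;

/* Feed every new sample. Returns 1 once the same value has been seen
 * STUCK_LIMIT times in a row, 0 otherwise. */
int stuck_update(stuck_detector_t *d, int32_t sample)
{
    if (d->primed && sample == d->last) {
        if (d->repeats < UINT32_MAX) {
            d->repeats++;
        }
    } else {
        d->last = sample;
        d->repeats = 1;
        d->primed = 1;
    }
    return d->repeats >= STUCK_LIMIT;
}

/* Call after the sensor has been power-cycled and reinitialized. */
void stuck_reset(stuck_detector_t *d)
{
    d->repeats = 0;
    d->primed = 0;
}
```

This only makes sense for sensors whose readings normally show at least some noise in the low bits; a genuinely constant signal would trigger it too.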

Signal integrity can be verified with an oscilloscope and improved by simple schematic/PCB design, but a much bigger problem is buggy MCU I2C peripherals and buggy I2C slaves.

For off-board stuff, try to maintain signal integrity by choosing a sensible pinout, like GND-SDA-GND-SCL-GND. It makes a huge difference.
« Last Edit: June 03, 2023, 05:18:55 pm by Siwastaja »
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 16510
  • Country: us
  • DavidH
Re: How many of you validate your I2C data in embedded systems?
« Reply #7 on: June 03, 2023, 05:23:03 pm »
Quote
I have looked at signal integrity with a 200 MHz BW scope and it looks ok.  Have not correlated it with EMI but it's being tested and failing in the same location.

Limiting the bandwidth of the communications channel to what is required can help.  There is no need to have higher bandwidth than necessary.

Quote
Regarding checksums and the like.  Yup, exactly.  Things like storage or even streams of data can be - and usually are - protected with additional data.  But what to do with control registers?

Reading back values written to control registers for verification may help, although I remember one case where this caused errors.
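A sketch of write-then-read-back verification with a bounded number of retries. The `reg_bus_t` hooks are hypothetical stand-ins for whatever the real driver provides, and the `sim_*` functions are a tiny simulated device so the logic can be exercised on a host. The `mask` argument is there because read-back is not safe for every bit: write-only, self-clearing or clear-on-read bits have to be excluded, which may be the kind of case where read-back causes errors.

```c
#include <stdint.h>

/* Hypothetical bus hooks; each returns 0 when the transfer was ACKed. */
typedef struct {
    int (*write_reg)(uint8_t reg, uint8_t value);
    int (*read_reg)(uint8_t reg, uint8_t *value);
} reg_bus_t;

/* Write a register, read it back and compare the bits selected by mask.
 * Returns 0 once a write has been verified, -1 if all attempts fail. */
int write_verified(const reg_bus_t *bus, uint8_t reg, uint8_t value,
                   uint8_t mask, unsigned attempts)
{
    while (attempts--) {
        uint8_t readback;
        if (bus->write_reg(reg, value) != 0) {
            continue;
        }
        if (bus->read_reg(reg, &readback) != 0) {
            continue;
        }
        if ((readback & mask) == (value & mask)) {
            return 0;
        }
    }
    return -1;
}

/* Simulated 8-register device that can corrupt upcoming writes. */
static uint8_t  sim_regs[8];
static unsigned sim_corrupt_writes;  /* number of next writes to corrupt */

static int sim_write(uint8_t reg, uint8_t value)
{
    if (sim_corrupt_writes) {
        sim_corrupt_writes--;
        value ^= 0x80;  /* flip one bit, as a glitch on the bus might */
    }
    sim_regs[reg & 7u] = value;
    return 0;
}

static int sim_read(uint8_t reg, uint8_t *value)
{
    *value = sim_regs[reg & 7u];
    return 0;
}

static const reg_bus_t sim_bus = { sim_write, sim_read };
```

With one corrupted write and three attempts, `write_verified(&sim_bus, 2, 0x15, 0xFF, 3)` succeeds on the second attempt; if every write is corrupted it returns -1 and the caller decides how to fail.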

Quote
Probably this particular problem has some acceptable work-arounds (I can validate the data and perhaps even recover from an errant shut down by immediately rebooting in that case).

I try to make the programs resilient to errors by either catching them as early as possible or catching and correcting them later.
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7673
  • Country: de
  • A qualified hobbyist ;)
Re: How many of you validate your I2C data in embedded systems?
« Reply #8 on: June 03, 2023, 06:11:26 pm »
It depends on the risk involved. If it's totally uncritical the answer might be 'I don't care'. The other extreme would be to go into fail-safe mode after a few consecutive failed reads or to have a second sensor on a different bus.
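The "fail-safe after a few consecutive failed reads" end of that range can be sketched as a small state tracker. The threshold of 3 and all names here are hypothetical; what counts as a failed read (NACK, timeout, implausible value) is up to the application.

```c
#define MAX_CONSECUTIVE_FAILURES 3u  /* hypothetical threshold */

typedef enum {
    SENSOR_OK,
    SENSOR_DEGRADED,  /* recent failures, still below the threshold */
    SENSOR_FAILSAFE   /* latched until the application resets it */
} sensor_state_t;

typedef struct {
    unsigned       consecutive_failures;
    sensor_state_t state;
} sensor_health_t;

/* Call after every read attempt with read_ok != 0 when the read was
 * both acknowledged and plausible. Returns the resulting state. */
sensor_state_t health_update(sensor_health_t *h, int read_ok)
{
    if (h->state == SENSOR_FAILSAFE) {
        return SENSOR_FAILSAFE;  /* stays latched, a good read does not clear it */
    }
    if (read_ok) {
        h->consecutive_failures = 0;
        h->state = SENSOR_OK;
    } else if (++h->consecutive_failures >= MAX_CONSECUTIVE_FAILURES) {
        h->state = SENSOR_FAILSAFE;
    } else {
        h->state = SENSOR_DEGRADED;
    }
    return h->state;
}
```

Requiring several failures in a row keeps a single glitch from shutting the system down, while latching the fail-safe state keeps the system from flapping in and out of it.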
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: How many of you validate your I2C data in embedded systems?
« Reply #9 on: June 03, 2023, 08:26:42 pm »
Yes, this is obviously one question that can't be answered in a single, general manner.

* If some I2C transmission replies with a NACK, there is no data to speak of. If that was a write transfer that is supposed to trigger an action on the I2C slave and you get a NACK, then you just don't know. The slave device may have received the command and executed it, and the NACK comes from a signal integrity problem. Or the device may just have malfunctioned and done nothing at all. Or something you didn't expect.

* If all your transfers get an ACK, that's not always sufficient to assume that everything went right. If you are reading data and the values are critical, ideally select an I2C device that has embedded CRC - quite a few sensors do have that. Favor devices that compute CRCs and check the CRCs.

* If you have no way of checking data integrity (such as CRCs), then you are on your own and will do what needs to be done when measurements can't be trusted. It all depends on the application. Sometimes just an average is OK. Other times you'll add redundancy with additional sensors. You'll also check the returned values to reject values that make no sense. It can be tricky. Doing that can lead to rockets crashing.

* When the associated risk is critical, it's tough shit.

But to sum up:
- Use CRCs when CRCs are available and favor devices that handle them.
- Use redundancy if that's economically viable.
- Validate sensor values just as you should validate all of your inputs. (This one can get very tricky though, as I said above. But do it at least as best as you reasonably can.)
- Unless not critical at all, never assume anything about the range of values you *should* be getting, never settle for "this particular case should never happen, so let's ignore it." It rarely ends well.

Just a few pointers. Now the way to handle things if none of the above leads to a recoverable state entirely depends on the application.
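To illustrate the "use CRCs when available" point: several sensors append a CRC-8 byte to every 16-bit word they return. The sketch below assumes polynomial 0x31 with initial value 0xFF and no final XOR, a parameter set used by some humidity/temperature sensors; the correct parameters for any given part must be taken from its datasheet. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-8, polynomial 0x31 (x^8 + x^5 + x^4 + 1), init 0xFF,
 * MSB first, no final XOR. */
uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0xFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x31)
                               : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

/* A frame is two data bytes followed by their CRC. Returns 1 when the
 * received CRC matches the one computed over the data bytes. */
int word_crc_ok(const uint8_t frame[3])
{
    return crc8(frame, 2) == frame[2];
}
```

With these parameters the two bytes 0xBE 0xEF give a CRC of 0x92, so the frame `{0xBE, 0xEF, 0x92}` is accepted and any single-bit change in it is rejected.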
 

Online wek

  • Frequent Contributor
  • **
  • Posts: 476
  • Country: sk
Re: How many of you validate your I2C data in embedded systems?
« Reply #10 on: June 03, 2023, 09:23:16 pm »
Quote
So far I've gotten two failures this way and in both cases extra (reserved 0) bits were set.

That's strange indeed. IMO a bit shift (an extra clock from a noisy SCL transition), and then 0-instead-of-1 data (noise while SDA is recessive), are much more likely than that.

I'd suspect software bug.

Can you toggle a pin upon detecting the error, to trigger the scope, so that the transfer in question is captured?

JW
 

Offline globoyTopic starter

  • Regular Contributor
  • *
  • Posts: 178
  • Country: us
Re: How many of you validate your I2C data in embedded systems?
« Reply #11 on: June 03, 2023, 09:51:00 pm »
Quote
Can you toggle a pin upon detecting the error, to trigger the scope, so that the transfer in question is captured?

With some difficulty but yes.  It's probably worth doing.  I didn't do it so far because I didn't want to tie my scope up for days waiting for a failure.
 

Offline AVI-crak

  • Regular Contributor
  • *
  • Posts: 124
  • Country: ru
    • Rtos
Re: How many of you validate your I2C data in embedded systems?
« Reply #12 on: June 04, 2023, 09:40:28 am »
Chips with an I2C interface fall into two types: very simple ones, and ones with full support for the interface protocol.
It is impossible to verify information from the simple chips, precisely because they are so simple. These are port expanders, address switches and a very rare type, an address stub. Such devices handle only part of the protocol and cannot respond correctly when the signals are distorted. The solution is to handle some of the interface errors on the MCU side.

Devices with full protocol support can handle error situations correctly. Data line errors raise exceptions on the MCU and on the external devices, so it is enough to provide handling for those situations. Errors inside the device itself, behind the I2C interface, can be caught through response delays (the clock line being held low).
For example, a timeout when reading data.

Devices with full I2C support and an active transfer mode are a separate headache.
The I2C interface has a large number of error flags, and the situation is even worse than that: you need to process combinations of errors. In this situation there is no point in hurrying, because most often you have to clock the I2C bus without receiving data. Therefore the whole variety of states can be processed in the main thread; it won't get any worse.

And now tell me, fellow programmers: how many of you have seen an I2C driver with complete error handling?
 

Offline Doctorandus_P

  • Super Contributor
  • ***
  • Posts: 3313
  • Country: nl
Re: How many of you validate your I2C data in embedded systems?
« Reply #13 on: June 10, 2023, 03:05:48 pm »
I do not use I2C much, but have written some software that uses the internal EEPROM in a uC. And because I already have a 16-bit CRC function for communicating over UART/RS485, I also add a 16-bit CRC to the EEPROM data.
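A sketch of reusing one CRC-16 routine for stored blocks. The post does not say which CRC-16 variant is used; this example assumes the common CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF, MSB first), and the `block_*` helpers and the big-endian CRC placement are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16, polynomial 0x1021, init 0xFFFF, MSB first. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Append the CRC (big-endian) after the payload.
 * buf must have room for payload_len + 2 bytes. */
void block_seal(uint8_t *buf, size_t payload_len)
{
    uint16_t crc = crc16_ccitt(buf, payload_len);
    buf[payload_len]     = (uint8_t)(crc >> 8);
    buf[payload_len + 1] = (uint8_t)(crc & 0xFF);
}

/* Returns 1 when the stored CRC matches the payload. */
int block_is_valid(const uint8_t *buf, size_t payload_len)
{
    uint16_t stored = (uint16_t)(((uint16_t)buf[payload_len] << 8)
                                 | buf[payload_len + 1]);
    return crc16_ccitt(buf, payload_len) == stored;
}
```

The usual check value for this variant is 0x29B1 for the ASCII string "123456789", which is a quick way to confirm that two implementations agree.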
 

Offline Georgy.Moshkin

  • Regular Contributor
  • *
  • Posts: 143
  • Country: hk
  • R&D Engineer
    • Electronic projects, modules and courses on Arduino and STM32
Re: How many of you validate your I2C data in embedded systems?
« Reply #14 on: June 11, 2023, 02:46:14 pm »
I have only tested some products' serial ports by sending random data. Some devices failed even after small data bursts. Failures: garbled image and reboot on a smart camera, erased firmware or wrong settings in some rare cases.
In my opinion, the ability to store a recording of the SDA and SCL lines may be useful. Very similar to a 2-channel oscilloscope, which monitors the I2C bus and stores a sliding window of ADC samples for the most recent transfer. I thought about implementing it by using a timer's 16-bit repetition counter on STM32; this allows recording a predefined number of samples after software/hardware triggering using DMA, with no waste of performance.
« Last Edit: June 11, 2023, 02:48:48 pm by Georgy.Moshkin »
 

Offline globoyTopic starter

  • Regular Contributor
  • *
  • Posts: 178
  • Country: us
Re: How many of you validate your I2C data in embedded systems?
« Reply #15 on: June 11, 2023, 04:34:21 pm »
After some bone-headed mistakes, each wasting days, I finally captured a failure on the scope using another GPIO to trigger the scope as wek suggested.  The scope trace below shows the failure.  It is the read data for a single-byte read.  You can see what appears to be the slave stretching the clock for about 27 µs during the ACK phase.   The transaction appears to complete successfully after the clock stretching (the ESP32 issues a NACK indicating the end of the read).  The data (0x01) the slave delivers is correct data but the driver hands the code incorrect data (0xb4).  The driver does not indicate any I2C errors occurred.  So I am suspecting a rare issue in the ESP32 or its driver.  I have a work-around in the code that attempts to validate the data by looking for reserved bits being set or invalid combinations of bits.  But I guess I could try some additional back-burner experiments to try to further quantify the issue - more failure collection and see if I can come up with an experiment to see how often the slave clock stretches and if this correlates with the failure.
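A sketch of what such a plausibility check might look like. The actual register layout is not given in the thread, so the masks below are purely hypothetical: bits 7..3 are assumed reserved (always read as 0), and two flags are assumed to be mutually exclusive.

```c
#include <stdint.h>

/* Hypothetical layout of an 8-bit status register. */
#define STATUS_RESERVED_MASK   0xF8u  /* bits 7..3: reserved, read as 0 */
#define STATUS_FLAG_X          0x02u
#define STATUS_FLAG_Y          0x04u  /* assumed never set together with X */
#define STATUS_EXCLUSIVE_MASK  (STATUS_FLAG_X | STATUS_FLAG_Y)

/* Returns 1 when the value could have come from the device, 0 when it
 * has reserved bits set or an impossible flag combination. */
int status_is_plausible(uint8_t value)
{
    if (value & STATUS_RESERVED_MASK) {
        return 0;
    }
    if ((value & STATUS_EXCLUSIVE_MASK) == STATUS_EXCLUSIVE_MASK) {
        return 0;
    }
    return 1;
}
```

With this assumed layout the correct value 0x01 passes and the corrupted value 0xb4 is rejected. A rejected read can simply be repeated when the register has no read side effects; for a clear-on-read register the information may already be lost, so the check can only flag the problem.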
 
The following users thanked this post: boB

Offline eutectique

  • Frequent Contributor
  • **
  • Posts: 357
  • Country: be
Re: How many of you validate your I2C data in embedded systems?
« Reply #16 on: June 11, 2023, 05:29:55 pm »
Quote
But I guess I could try some additional back-burner experiments to try to further quantify the issue - more failure collection and see if I can come up with an experiment to see how often the slave clock stretches and if this correlates with the failure.

I would use a spare dev board (or similar) to implement an I2C slave with controllable parameters. This way there would be no need to hunt for rare events that happen once in a blue moon.
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 821
Re: How many of you validate your I2C data in embedded systems?
« Reply #17 on: June 11, 2023, 08:09:15 pm »
Quote
The data (0x01) the slave delivers is correct data but the driver hands the code incorrect data (0xb4).
Not sure if that makes it easier or harder if the problem is only contained within the esp32 hardware/code, but it narrows the search. I think I would want to grab the i2c ram (and rx status) to see if the correct data is actually inside and the corruption is in the transfer of this i2c ram byte to somewhere else, or maybe the wrong byte is read from the i2c ram and the wrong value will be seen inside the i2c ram.
 

Offline brumbarchris

  • Regular Contributor
  • *
  • Posts: 216
  • Country: ro
Re: How many of you validate your I2C data in embedded systems?
« Reply #18 on: June 11, 2023, 08:42:25 pm »
Nice catch! (on the scope)

I would second eutectique's suggestion above: just make another programmable i2c slave to send valid data, instead of your current slave. Do it by bit banging, so that you will have manual control to introduce any clock stretching as needed.

Cristian
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1621
  • Country: nl
Re: How many of you validate your I2C data in embedded systems?
« Reply #19 on: June 11, 2023, 09:31:42 pm »
Does my eye also spot that the data changes at seemingly random positions with respect to clock?

Sometimes it seems to be in the middle of clock transition, while other times it seems to occur at the falling edge.

Not sure when the I2C will read in the bit (hopefully the rising edge), but perhaps that could lead to some setup/hold time issues.

Otherwise this seems to be more of an RTL/software bug. Apart from the delay, which can be completely normal, the trace looks quite normal to me.

It is quite hard to explain how a byte 0x01 would change to 0xB4 unless some crazy stuff happens with the signal line (which doesn't appear to be the case, apart from the edges moving around), or some crazy stuff goes wrong with some buffer/state machine. Maybe some interrupt is not serviced in time etc. If the target in question is ESP32, I would be quite cautious with that. I don't know if you use(d) ESP-IDF, but the bloat in that framework is IMO very substantial (although it does serve its purposes), and for issues like these I don't like random stuff running in the background where, even worse, it is not always clear what's happening.
 
The following users thanked this post: boB

Online wek

  • Frequent Contributor
  • **
  • Posts: 476
  • Country: sk
Re: How many of you validate your I2C data in embedded systems?
« Reply #20 on: June 11, 2023, 09:58:19 pm »
Nice catch indeed!

How do you know it's the slave which stretches the clock?

(I use resistor divider to determine that using oscilloscope).

+1 to eutectique's suggestion.

JW
 

Offline globoyTopic starter

  • Regular Contributor
  • *
  • Posts: 178
  • Country: us
Re: How many of you validate your I2C data in embedded systems?
« Reply #21 on: June 11, 2023, 11:24:28 pm »
Quote
Does my eye also spot that the data changes at seemingly random positions with respect to clock?

I think this is the difference between the ESP32 driving the SDA line and the slave (a Silicon Labs EFM8SB micro) driving the SDA line.  Earlier when debugging this design I did zoom in on I2C transactions involving different slaves and it does look like timing is being met.

In general millions and millions of transactions are executed successfully in periods stretching into multiple days.  This problem only shows up rarely.  Two of three slaves are accessed frequently (every 30 and every 100 mSec).  The other slave is accessed very rarely.

Quote
It is quite hard to explain how a byte 0x01 would change to 0xB4 unless some crazy stuff happens with the signal line (which doesn't appear to be the case, apart from the edges moving around), or some crazy stuff goes wrong with some buffer/state machine. Maybe some interrupt is not serviced in time etc. If the target in question is ESP32, I would be quite cautious with that. I don't know if you use(d) ESP-IDF, but the bloat in that framework is IMO very substantial (although it does serve its purposes), and for issues like these I don't like random stuff running in the background where, even worse, it is not always clear what's happening.

It is strange and I agree, I wonder if some interrupt is missed or something.  This is an IDF project and has a handful of tasks running.  Most of the time they should be pretty idle.  Classic Bluetooth is also running and the device has been connected during my testing.  I know BT runs at a high priority on the same CPU that is also doing I2C operations (I2C access is mutex protected for use with multiple tasks).  My fear is that it's some more complicated set of conditions occurring that will be hard to reproduce in a dedicated setup.

I've done several products with the ESP32 and this is the first one where I've seen an issue with the I2C bus.  A good thing is that I can probably dig into the driver and instrument it some (I've found I often have to look at the IDF source to figure something out). 

Quote
How do you know it's the slave which stretches the clock?

Actually, I don't.  I made an assumption because I know the EFM8 can stretch the clock briefly for short periods.  However you're right, it could be another chip and I've also done the trick of putting a resistor in series with SCL to find out who's pulling it low.  It would be a pain to do that with this design (e.g. cutting traces, bodging in a resistor, etc) but probably could be done.

I have another capture running.  When that one finishes I think I'm going to start another without Bluetooth connected to anything.

 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 821
Re: How many of you validate your I2C data in embedded systems?
« Reply #22 on: June 12, 2023, 01:56:20 am »
Quote
This problem only shows up rarely.
That you are aware of (?). Maybe there is more data corruption inside the esp32 i2c framework that goes unnoticed. If these are mostly single byte reads, it looks like the i2c peripheral will put the first read byte into the first i2c ram location so maybe you can compare all single byte read values with what is stored in the i2c ram which then requires just a single additional hardware read. If the values do not match, somewhere from hardware buffer to your result the value was corrupted. You would have to make sure the reading of the i2c buffer takes place before another transaction is started. If there is more i2c data corruption than first thought, the simple verification could possibly speed up the 'catch' rate.
 

