Author Topic: ESP8266 somehow causing noise/data corruption on serial RX line

Offline exscape (Topic starter)

  • Contributor
  • Posts: 43
ESP8266 somehow causing noise/data corruption on serial RX line
« on: October 24, 2021, 01:33:54 pm »
I'm about to pull my hair out over this; I can't make any sense of what's going on.
Sorry for the length, but I don't think I can make it much shorter without leaving out important information.

I have a custom board with an Atmega328p, two temperature sensors, and an ESP8266 for Wi-Fi support (added later, as an afterthought).
The ESP board is an ESP-01, a simple board with 8 pins. Pins 4 and 8 are RX and TX respectively, so they are connected to the Atmega's TX and RX.
I'm not using the "AT" firmware but fairly simple custom code; see below for the debug code used.

The Atmega and ESP communicate via a simple UART connection, using Arduino's SoftwareSerial library on the Atmega (on pins A2 and A3), and the Arduino ESP8266 core's Serial class on the ESP. The communication FROM Atmega TO ESP is sometimes corrupted, but it always works perfectly from ESP to Atmega.
I have a USB-to-serial adapter placed on the two serial data lines, near the ESP, so I can check the data on my computer. The computer seems to receive the same data the ESP receives (i.e. sometimes NOT the data the Atmega transmits).


[Inline doesn't seem to work? Schematic is attached below]

The corruption is sometimes a bit or two at a time, causing single characters to look wrong, and sometimes bigger.
For testing, I have therefore set the Atmega to simply send a test string in a loop every 100 ms.
Every now and then the serial adapter reads nothing but (repeated) gibberish instead of a mildly corrupted test string; not sure why.

If the ESP is held in reset mode, or doesn't do anything at all in the code, it works perfectly every time as seen from the USB-to-serial adapter -- so I don't think the Atmega or the SoftwareSerial library can be the cause here. Without the ESP in the picture the data looks perfect at 300 baud, 9600 baud and 115200 baud.

Once the ESP is added to the picture, e.g. by having it communicate via Wi-Fi, the corruption appears (as seen on the computer, but also in the data from Serial.read() if I'm calling that).

I don't have another way of checking the data, but I do have an oscilloscope (w/o UART decoding), so I hooked it up to the ESP's RX line and the header's ground pin. Most of the time it looked fine, with nice transitions and voltages of either 0-80 mV or 3.3-3.36 V.
However, every now and then it looks like this, with an intermediate voltage of ~2.64 V for a while:


[Inline doesn't seem to work? Scope capture attached below]

I then figured that perhaps the ESP is driving the RX pin for some reason I've missed (perhaps the pin is used for something else as well?). I recoded the Atmega to do nothing (set the RX *and* TX pins to INPUT and just call delay() in a loop), let the ESP send Wi-Fi data in a loop, and checked the oscilloscope then. Nothing. It hovers at about 200 mV for some reason, but I suppose both microcontrollers have the pins in a high-impedance state here and that's fine?
It never transitions above 600 mV, so I suppose that rules out that it's driving the pin?

Other thoughts:

1) Due to the logic level difference (Atmega 5 V, ESP 3.3 V and not 5 V tolerant), I used a simple resistive divider from Atmega to ESP, see schematic.
Is it possible the resistance is too high and that causes noise due to a high impedance connection? I'm doubtful but I can't rule it out. Due to the way it's assembled, changing the resistors would be quite the pain, but I'll try it if you guys do think it's a possible or likely cause of the problems.

2) Could it be EMI? But can it really be this strong? It seems the issue appears when the ESP is working, regardless of what it's actually doing.

3) Insufficient power filtering? The 3.3 V line is from a 3.3 V regulator (TI TPS7A4501DCQT), which is rated for 1.5 A. It is located maybe 7-10 cm trace distance from the ESP, but there is an electrolytic/polymer cap plus a ceramic cap on the daughterboard near the ESP.
I measured with the scope over that polymer cap; BW limit off, 50 ms/div, 20 mV/div with AC coupling. Roughly 110-135 mV peak-to-peak when it works perfectly (running simple code), roughly 115-145 mV peak-to-peak when it doesn't (heavy Wi-Fi use). So I suppose this is also not likely to be the cause?

I've added the code and output below just in case, but it might not be necessary to check!

Appendix: code and example serial output

Atmega328p:
Code: [Select]
#include <SoftwareSerial.h>

#define ESP8266_RX A2 // Connected to ESP TX
#define ESP8266_TX A3 // Connected to ESP RX
#define BAUDRATE 9600

SoftwareSerial softwareSerial(ESP8266_RX, ESP8266_TX);
void setup() { 
  // I *think* these are still necessary due to a very old SoftwareSerial bug
  pinMode(ESP8266_TX, OUTPUT);
  pinMode(ESP8266_RX, INPUT);
  softwareSerial.begin(BAUDRATE);
}

void loop() {
  softwareSerial.println("TEST TEST TEST TEST");
  delay(100);
}

ESP8266:
Code: [Select]
#include <ESP8266WiFi.h>
#include <WiFiManager.h>
#include <ArduinoOTA.h>
#include <WiFiUdp.h>
#include "WiFiLogger.h"

WiFiUDP wifiUDP;

void setup() {
  Serial.begin(9600);
  WiFiManager wifiManager;
  wifiManager.setDebugOutput(false);

  yield();
  wifiManager.autoConnect("WiFiAddon Captive", "*****************");
  yield();

  WiFiLogger.println("Setting up OTA updates...");
  ArduinoOTA.setPassword((const char*)"***************");
  ArduinoOTA.setHostname("DAQWiFiAddon");
  ArduinoOTA.begin();

  wifiUDP.begin(40100);

  WiFiLogger.println("WiFi module started up in debug mode");
}

void loop() {
  ArduinoOTA.handle();
  WiFiLogger.println("In loop still..."); // Does *not* use the serial port
 
  // It fails with or without this line:
  // while (Serial.available()) Serial.read();
 
  delay(100);
}

With the ESP held in RESET, the data received on the USB-to-serial adapter is just "TEST TEST TEST TEST" repeated as you'd expect.
With it running, the output looks roughly like this (with some correct lines removed to reduce the line count):

TEST TEST TEST TEST
TEST TEST TEST TEST
⸮TEST TEST TEST TEST
TEST TEST TEST TEST
TEST TEST TEST TEST
TEST TEST TEST TEST
TES⸮⸮ͪ⸮⸮⸮TEST TEST TEST TEST
TEST TEST TEST TEST
TEST TEST TEST TE⸮T
TEST TEST TEST TEST
TEST TEST TEST TEST
TEST TEST TE⸮TEST TE⸮TEST TEST TEST TEST
TEST TEST TEST TEST
TEST TES⸮TEST TEST TEST TEST

... you get the picture.  :D
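
To put a number on the corruption instead of eyeballing the terminal, something like the following host-side helper could be run over a saved capture. This is only a rough sketch and not code from this project: `LineStats` and `countCorruptedLines` are made-up names, and it assumes the capture has already been read into a string and that every good line equals the test string exactly.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not from the project): tallies lines in a serial capture.
struct LineStats {
    std::size_t total = 0;      // non-empty lines seen
    std::size_t corrupted = 0;  // lines that differ from the expected string
};

LineStats countCorruptedLines(const std::string& capture,
                              const std::string& expected) {
    LineStats stats;
    std::istringstream in(capture);
    std::string line;
    while (std::getline(in, line)) {
        // println() sends "\r\n"; drop the '\r' that getline() leaves behind.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;  // skip blank lines
        ++stats.total;
        if (line != expected) ++stats.corrupted;
    }
    return stats;
}
```

Note that when the line ending itself is corrupted, two test strings run together (as in some of the lines above), and this helper counts that as a single bad line.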
« Last Edit: October 24, 2021, 01:36:02 pm by exscape »
 

Offline lunacyworks

  • Contributor
  • Posts: 25
  • Country: us
Re: ESP8266 somehow causing noise/data corruption on serial RX line
« Reply #1 on: October 24, 2021, 07:57:10 pm »
I know due to the GPIO 0 pin being used for flashing, there can be some issues.  But the question I have is what happens if you switch the pins you use for TX and RX?
 

Offline exscape (Topic starter)

  • Contributor
  • Posts: 43
Re: ESP8266 somehow causing noise/data corruption on serial RX line
« Reply #2 on: October 24, 2021, 08:25:49 pm »
Quote from: lunacyworks on October 24, 2021, 07:57:10 pm
I know due to the GPIO 0 pin being used for flashing, there can be some issues.  But the question I have is what happens if you switch the pins you use for TX and RX?

On which device? AFAIK you can't change them on the ESP except to swap for two entirely different pins; that would be a massive pain since everything's on routed PCBs though, and in addition those pins aren't routed to the 8-pin connection the ESP-01 has.
I could change them on the ATmega since it's a software "UART" but the routing issue remains there as well.

(Two-way communication does work albeit with maybe 90% accuracy so the routing has to be correct -- but it is still possible (likely?) that something's up with the pins used.)
 

Offline exscape (Topic starter)

  • Contributor
  • Posts: 43
Re: ESP8266 somehow causing noise/data corruption on serial RX line
« Reply #3 on: October 25, 2021, 06:00:20 pm »
This is pretty bizarre.
I got the error rate down to maybe once per 30 seconds (transmitting ~20 bytes every 100 ms, so something like 1 error per 6000 bytes) by switching to the SoftwareSerial library on the ESP. I'm considering simply being content here; either adding a simple checksum to the transfer, or simply accepting that there will be errors now and then...  |O
 

Offline exscape (Topic starter)

  • Contributor
  • Posts: 43
Re: ESP8266 somehow causing noise/data corruption on serial RX line
« Reply #4 on: November 03, 2021, 10:59:40 am »
Even adding a CRC didn't solve the issue completely. I've tried my best to make the data transfer reliable over an unreliable medium, but it's still not there. It's a long story, and I'd prefer to focus on the actual issue at hand, i.e. why the error rate is something like one error every 500 bits.
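
For reference, a minimal sketch of the kind of per-frame CRC check meant here. It is not the actual protocol used in this project: the CRC-8 parameters (polynomial 0x07, initial value 0x00) and the "payload followed by one CRC byte" frame layout are assumptions picked for illustration, and `crc8` and `frameIsValid` are hypothetical names.

```cpp
#include <stddef.h>
#include <stdint.h>

// CRC-8 with polynomial 0x07 (x^8 + x^2 + x + 1), initial value 0x00,
// no bit reflection and no final XOR. Bitwise version: slow but tiny.
// With these parameters the ASCII string "123456789" gives 0xF4.
uint8_t crc8(const uint8_t* data, size_t len) {
    uint8_t crc = 0x00;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x80) ? static_cast<uint8_t>((crc << 1) ^ 0x07)
                               : static_cast<uint8_t>(crc << 1);
        }
    }
    return crc;
}

// Assumed frame layout: payload bytes followed by one CRC byte.
bool frameIsValid(const uint8_t* frame, size_t len) {
    if (len < 2) return false;  // need at least one payload byte plus the CRC
    return crc8(frame, len - 1) == frame[len - 1];
}
```

A check like this only detects corruption; the receiver still has to drop the frame or ask for a resend. An 8-bit CRC also lets roughly 1 in 256 randomly corrupted frames through, so it cannot make a noisy link fully reliable by itself.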

I changed the resistors in the voltage divider (the "logic level converter") from 2.2k/4.4k to 330/660 ohms. Not surprisingly, it didn't help whatsoever.

It seems the issue has worsened a bit though, as I've been having issues even connecting the USB-to-serial converter. It usually just hangs when I connect it, and the computer receives zeroes.
The wall wart for the project is floating, and I connect GND/RX on the USB-to-serial to the board. It's not rare that it hangs the same second, and it almost always hangs within 20-30 seconds. I just don't understand why adding another reader would hinder communication so badly; even less so now that the impedance was reduced by a factor of about 6.7.

EDIT: I'm not even sure why or how, but the bit errors stopped appearing. I would have to say it was because of the resistor change, since that's the ONLY hardware change I've made (and I haven't made *any* software change -- not that I believe a software change on the ESP could matter for what the computer is receiving). I'm 100% certain I was getting errors after that change, but not entirely sure they were of the same kind.

Part of the "long story" mentioned above is that the transmission bugged out every time an FF byte was sent over the serial link. That issue remained after the bit errors disappeared, so it crashed after maybe 5-10 minutes on average, whenever the CRC contained FF. I then realized that I was still using software serial on the ESP, and the only reason I switched from HW to SW was that it seemed to work slightly better at the time -- but perhaps a coding issue in the serial library (or some interrupt timing stuff) was the cause of the FF hangups?
And indeed, after switching back to HW serial again, the FF issue was gone, and thus ALL issues were gone!  ;D

From having errors once every 30 seconds or so, I've now got exactly 0 bit errors in over 180 000 transmissions (something like 25-30 hours).
« Last Edit: November 06, 2021, 08:40:21 pm by exscape »
 

