Hi all,
I have a problem that has only shown up after building 300 or so of a new custom flight yoke device, and so far I haven't been able to determine either the cause or a fix. TL;DR summary is that the MCU still runs and talks over USB, but GPIOs that should be in input + pulled high mode are seemingly driven to odd voltages like 0.2V, 1.4V, or 2.0V. Input registers can still detect the logic level when strongly driven externally, but something is internally wrong.
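For scale, here is a rough estimate of what those pin voltages would imply if the internal pull-up is the only thing pulling the pin high. It is only a sketch: it assumes a 3.3V rail and a nominal 40k pull-up (an assumed value; check the SAMD21 datasheet for the actual range), and it models whatever is dragging the pin down as a single resistance to GND, which may not match what is really happening inside the chip.

```python
VDD = 3.3          # main rail in volts
R_PULLUP = 40e3    # ASSUMED nominal internal pull-up in ohms (check the datasheet)

def implied_leakage(v_pin, vdd=VDD, r_pullup=R_PULLUP):
    """Return (current in A, equivalent resistance to GND in ohms) for a pin
    that sits at v_pin with nothing external connected."""
    current = (vdd - v_pin) / r_pullup      # current flowing through the pull-up
    if current == 0:
        return 0.0, float("inf")            # pin at the rail: no leakage path
    r_equiv = v_pin / current               # resistance that would sink that current
    return current, r_equiv

for v in (0.2, 1.4, 2.0):
    i, r = implied_leakage(v)
    print(f"{v:.1f} V -> {i * 1e6:.1f} uA through the pull-up, ~{r / 1e3:.1f}k to GND")
```

Under those assumptions the three readings correspond to tens of microamps flowing through the pull-up, which is far more than a healthy high-impedance input should sink.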
Here's a short list of salient points about the current board design:
- ATSAMD21G18A microcontroller design w/external 32768Hz crystal
- SPX3819 3.3V regulator for main power rail, same as on many Adafruit boards
- USB type B connector for data + power
- Downstream USB hub (USB2514) for low-power related peripherals (almost never used)
- External 5V supply connection in case downstream peripherals are more power-hungry (no customers have used this to date)
- MBR120 diodes to protect either USB or 5V in from feeding into the other one
- MCU configured as a downstream device of the USB2514, but a physical switch allows rerouting MCU USB data lines directly to the USB B connector for dev/troubleshooting
- Two ALS31300 hall effect sensors connected via I2C
- 6x inputs comprising 2 momentary pushbuttons and a 4-way hat switch, all active LOW connecting GND to the pin through a 220-ohm safety resistor*
- All 6 button GPIOs are set to input + pulled high, all other GPIOs are left in their POR state and electrically unconnected
- *1 pin has no 220-ohm safety resistor, to allow it to be used as an optional VDD source for an LED in a different build variant, but this is only a standard button in the failing units
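To put numbers on what a healthy button input should look like: with the internal pull-up enabled and the switch closing to GND through 220 ohms, the pin is a simple resistor divider. A quick sketch, assuming a 3.3V rail and an internal pull-up somewhere between 20k and 60k (assumed values, not measured on these boards):

```python
VDD = 3.3          # main rail in volts
R_SERIES = 220.0   # safety resistor between the switch and the pin, in ohms

def pressed_voltage(r_pullup, r_series=R_SERIES, vdd=VDD):
    """Pin voltage with the button closed: pull-up on top, series resistor to GND."""
    return vdd * r_series / (r_pullup + r_series)

# ASSUMED pull-up values spanning a plausible range; check the datasheet
for r_pullup in (20e3, 40e3, 60e3):
    mv = pressed_voltage(r_pullup) * 1e3
    print(f"pull-up {r_pullup / 1e3:.0f}k: pressed = {mv:.1f} mV, released = {VDD:.1f} V")
```

So a pressed button should read a few tens of millivolts at most, and a released one should sit at the rail. The one pin without the 220-ohm resistor should read essentially 0V when pressed.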
The circuit is not very complicated. I built about 30 of these boards myself (home stencil+paste, PnP, reflow) and did dozens of hours of firmware development on a couple of different workstations without any issues. Now, some units that have made it into the field (nearing 10% of those shipped) have failed, usually after many hours of perfect functionality. Buttons no longer respond correctly and sometimes remain in a "pressed" state all the time, rendering the yoke useless to the customer.
I'm not the most knowledgeable electrical engineer in the world, but I have done many designs around the SAMD21 and I've never seen this failure mode. The USB hub is the only departure from my typical designs, but I can't figure out why or how that could cause this kind of problem, especially since none of the failing units (to my knowledge) have involved the use of any downstream peripherals. I have not been able to reproduce the problem myself so far.
As for troubleshooting, this is what I've found:
- No detectable physical issues such as solder bridges, flux residue, bad part placement, broken components, etc.
- Different sets of GPIOs seem to be affected, not always the same ones
- GPIOs on multiple ports (PORTA + PORTB) are affected
- Erase+reflash of firmware has no impact
- Disconnecting hall effect sensors (via ribbon cable) has no impact
- Disconnecting buttons/hat switch (via ribbon cable) has no impact
- Removing 220 ohm safety resistors has no impact
- Swapping in a new MCU IC with hot air and flashing new firmware fixes everything immediately
- Monitoring various I/O pins, USB data lines, power nets, etc. with a DSO reveals nothing unexpected (e.g. spikes or dips), even through many repeated USB cable removal/insertion cycles
- One customer had two different units fail (now on his 3rd, still working)--suggests a possible customer-side problem given the odds?
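On the odds in that last point, a rough sketch. It assumes failures are independent and that every unit has the same ~10% chance of failing (the field failure rate mentioned above). The customer counts are hypothetical, since the exact number of units shipped isn't stated here.

```python
def prob_both_fail(p_fail):
    """Chance that one customer's original unit AND its replacement both fail."""
    return p_fail ** 2

def prob_at_least_one_repeat(p_fail, n_customers):
    """Chance that at least one of n customers sees two failures in a row."""
    return 1.0 - (1.0 - prob_both_fail(p_fail)) ** n_customers

P_FAIL = 0.10  # ASSUMED per-unit failure probability, taken from the ~10% field rate

print(f"one given customer, two failures: {prob_both_fail(P_FAIL):.1%}")
for n in (50, 100, 300):  # HYPOTHETICAL customer counts
    print(f"{n} customers: {prob_at_least_one_repeat(P_FAIL, n):.0%} chance someone sees a repeat")
```

For any single customer a double failure is unlikely, but across a few hundred units in the field it becomes fairly likely that somebody sees one by chance alone. That data point on its own may not be strong evidence of a customer-side cause.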
Searching online has not revealed a definitive answer to this. Not being able to reproduce the problem locally is limiting; "here's a custom hardware-tweaked replacement, see if this works for a few weeks" makes for an incredibly inefficient troubleshooting process. The majority of yokes are working fine, but every new device shipped is gambling on customer happiness.
I'm asking here in case anyone has seen this before, or knows better than I do what questions to ask--or who to pay for consulting, if necessary. About the only thing I've found so far is that there's no TVS protection on the USB input. But even in the case of USB subsystem damage, I wouldn't expect these symptoms.
Any brave/knowledgeable EEVblog forum souls care to weigh in? I can share additional details if anyone needs to know something I haven't shown.