Topic: Rare (0.1%) failure of ULPI initialisation on embedded processor

Offline tom66Topic starter

  • Super Contributor
  • ***
  • Posts: 7031
  • Country: gb
  • Electronics Hobbyist & FPGA/Embedded Systems EE
We've got a board with a Xilinx Zynq SoC and a USB3320 transceiver.  I'm not posting this in the FPGA forum because, although the device is an FPGA, this looks more like a general embedded bug: the USB peripheral is part of the hard ARM side of the Zynq, not the fabric.

Our board is a custom hardware platform running Linux 5.10 (petalinux) using Zynq 7010 and 256MB DDR3.  Boot is via SD card.

In about 1 in 1000 boot cycles, the USB peripheral fails to initialise.  In the kernel log we see "udc-core: couldn't find an available UDC or it's busy" when our script attempts to initialise the USB peripheral.  Only a hard power cycle fixes the issue.
 
With extra debug turned on, I noticed that every failure like this is accompanied with this kernel log:

Code:
ULPI transceiver vendor/product ID 0x0424/0x0007
Found SMSC USB3320 ULPI transceiver.
ULPI integrity check: failed!

A successful boot, instead, has "ULPI integrity check: passed."

The origin of this message is here: https://github.com/Xilinx/linux-xlnx/blob/master/drivers/usb/phy/phy-ulpi.c (in ulpi_check_integrity).

One thing that struck me about this is that all the routine does is two single-byte write/read-back checks.  It doesn't retry anything.  If there's an SEU, the check will fail and the device won't be enumerated.  But if our interface fails 1 in 1000 writes, then perhaps we're not in SEU territory but in unreliable-bus territory?

I did notice that when I manually exercise the interface using some low-level C code, the failure rate increases with SDIO activity on our SD card. That fits the boot-time failures, since bootup involves a lot of SDIO read activity.  However, looking at the board layout, the traces are quite far from one another, and the supply voltage to the devices appears stable throughout, so the coupling mechanism is not obvious to me.

I was considering getting the kernel engineer to patch this function and just try a few more times to see if ULPI works.  Whilst running normally we don't see any particular issues with USB and can run for hours on end without fault.
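For illustration, the retry idea could look roughly like the user-space sketch below. This is not the kernel patch: `struct ulpi_bus`, `sim_write`, `sim_read` and the fault injection are all made up here to stand in for the driver's register accessors, and the scratch register address 0x16 and the 0x55/0xAA patterns are my assumptions about what the integrity check uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SCRATCH_REG 0x16u /* assumed ULPI scratch register address */

/* Hypothetical bus abstraction standing in for the real register accessors. */
struct ulpi_bus {
    int (*write)(struct ulpi_bus *bus, uint8_t reg, uint8_t val);
    int (*read)(struct ulpi_bus *bus, uint8_t reg);
    uint8_t scratch;          /* simulated scratch register contents */
    unsigned writes_seen;     /* number of writes issued so far */
    unsigned corrupt_first_n; /* simulation only: first n writes get bit 0 flipped */
};

/* Simulated write: corrupts the value for the first corrupt_first_n writes. */
static int sim_write(struct ulpi_bus *bus, uint8_t reg, uint8_t val)
{
    (void)reg;
    if (bus->writes_seen < bus->corrupt_first_n)
        val ^= 0x01;
    bus->writes_seen++;
    bus->scratch = val;
    return 0;
}

/* Simulated read: always returns what the register actually holds. */
static int sim_read(struct ulpi_bus *bus, uint8_t reg)
{
    (void)reg;
    return bus->scratch;
}

/* One pass of the check: write each pattern and read it back.
 * Returns 0 on success, negative on I/O error or mismatch. */
static int scratch_check_once(struct ulpi_bus *bus)
{
    static const uint8_t patterns[] = { 0x55, 0xAA };

    for (size_t i = 0; i < sizeof(patterns); i++) {
        int ret = bus->write(bus, SCRATCH_REG, patterns[i]);
        if (ret < 0)
            return ret;
        ret = bus->read(bus, SCRATCH_REG);
        if (ret < 0)
            return ret;
        if (ret != patterns[i]) {
            fprintf(stderr, "scratch mismatch: wrote 0x%02x, read 0x%02x\n",
                    (unsigned)patterns[i], (unsigned)ret);
            return -1;
        }
    }
    return 0;
}

/* Retry wrapper: returns the attempt number that passed (1..max_attempts),
 * or -1 if every attempt failed. */
static int scratch_check_retry(struct ulpi_bus *bus, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (scratch_check_once(bus) == 0)
            return attempt;
    }
    return -1;
}
```

Returning the attempt count rather than a plain pass/fail means the retries can be logged, so a marginal bus is still visible in the field instead of being silently papered over.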

Curious what others might think is going on here.  I haven't yet probed the ULPI bus, that will be the next task.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11763
  • Country: us
    • Personal site
Re: Rare (0.1%) failure of ULPI initialisation on embedded processor
« Reply #1 on: June 08, 2023, 03:58:20 pm »
Even if traces on the board are not close, they all come together inside the package. The interference may not even be observable outside.

I personally would add more information to that message: the written and the read-back value. And maybe extend the loop to write more values and not bail on the first failure. It would show if some specific bit fails consistently. This may guide the investigation. It may be something simple, like one of the data traces being longer/shorter and close to the setup/hold margins, so that a minor interference affects it.

Also, do a repeated read in case of failure to see if it is a read or a write error.
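Something along these lines would give a per-bit picture. It is only a sketch against a simulated PHY: `struct sim_phy`, `phy_write_scratch` and `phy_read_scratch` are hypothetical stand-ins for the real register accessors, and the fault model (selected bits dropped on every Nth write) is invented purely to have something to measure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical simulated PHY; replace the two accessors with real ones. */
struct sim_phy {
    uint8_t scratch;          /* simulated scratch register contents */
    uint8_t write_fault_mask; /* bits forced to 0 when a write is corrupted */
    unsigned fault_period;    /* corrupt every Nth write; 0 = never */
    unsigned writes;          /* number of writes issued so far */
};

static void phy_write_scratch(struct sim_phy *phy, uint8_t val)
{
    phy->writes++;
    if (phy->fault_period && phy->writes % phy->fault_period == 0)
        val &= (uint8_t)~phy->write_fault_mask;
    phy->scratch = val;
}

static uint8_t phy_read_scratch(const struct sim_phy *phy)
{
    return phy->scratch;
}

/* Write all 256 values, read each back, and do not stop on failure.
 * bit_errors[b] counts how often data bit b differed.
 * Returns the number of mismatching values. */
static unsigned scratch_sweep(struct sim_phy *phy, unsigned bit_errors[8])
{
    unsigned mismatches = 0;

    assert(bit_errors != NULL);
    for (int b = 0; b < 8; b++)
        bit_errors[b] = 0;

    for (unsigned v = 0; v < 256; v++) {
        uint8_t wrote = (uint8_t)v;
        phy_write_scratch(phy, wrote);
        uint8_t got = phy_read_scratch(phy);
        uint8_t diff = (uint8_t)(wrote ^ got);

        if (diff == 0)
            continue;

        mismatches++;
        printf("wrote 0x%02x, read 0x%02x, diff 0x%02x\n",
               (unsigned)wrote, (unsigned)got, (unsigned)diff);
        for (int b = 0; b < 8; b++) {
            if (diff & (1u << b))
                bit_errors[b]++;
        }
    }
    return mismatches;
}
```

If the counts pile up on a single bit, that points at one data line; errors spread evenly across all bits would suggest something common to the whole bus, such as the clock or a control signal.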

Working fine later on may not be a good indicator that there is no issue. USB and ULPI are very fault tolerant; they will hide minor errors.
Alex
 
Offline tom66Topic starter

  • Super Contributor
  • ***
  • Posts: 7031
  • Country: gb
  • Electronics Hobbyist & FPGA/Embedded Systems EE
Re: Rare (0.1%) failure of ULPI initialisation on embedded processor
« Reply #2 on: June 08, 2023, 06:00:03 pm »
Thanks.  I think it's worth trying a patch.

Looking at our test log, what I find particularly interesting is that there are no occurrences where the VendorID/ProductID is read wrong.

Instead, it is always the scratch register test that fails.   To me, this suggests a failure on write:  the read direction is okay, but any write (from SoC to ULPI) has a small chance of being corrupted, which leads to the value being read back wrong.

Indeed, in the C program I noticed the same anomaly.  If the register reads back wrong, it reads back wrong every time after that, until it is re-written.
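That observation is essentially a read-versus-write classifier, which could be sketched as below. Again this runs against a simulated PHY: `struct sim_phy`, the two accessors and the one-shot fault flags are hypothetical and only there so the logic can be exercised without hardware.

```c
#include <assert.h>
#include <stdint.h>

enum fault_kind { FAULT_NONE, FAULT_WRITE, FAULT_READ };

/* Hypothetical simulated PHY with one-shot fault injection. */
struct sim_phy {
    uint8_t scratch;        /* simulated scratch register contents */
    int corrupt_next_write; /* if set, the next write stores a flipped bit */
    int corrupt_next_read;  /* if set, the next read returns a flipped bit */
};

static void phy_write_scratch(struct sim_phy *phy, uint8_t val)
{
    if (phy->corrupt_next_write) {
        val ^= 0x10;
        phy->corrupt_next_write = 0;
    }
    phy->scratch = val;
}

static uint8_t phy_read_scratch(struct sim_phy *phy)
{
    uint8_t val = phy->scratch;

    if (phy->corrupt_next_read) {
        val ^= 0x10;
        phy->corrupt_next_read = 0;
    }
    return val;
}

/* Write a pattern once, then read it back. On a mismatch, re-read without
 * re-writing: if any re-read returns the expected value, the register held
 * the right data and the first read was bad; if every re-read is still
 * wrong, the register itself holds bad data, so the write was corrupted. */
static enum fault_kind classify_scratch_fault(struct sim_phy *phy,
                                              uint8_t pattern, int rereads)
{
    assert(rereads > 0);

    phy_write_scratch(phy, pattern);
    if (phy_read_scratch(phy) == pattern)
        return FAULT_NONE;

    for (int i = 0; i < rereads; i++) {
        if (phy_read_scratch(phy) == pattern)
            return FAULT_READ;
    }
    return FAULT_WRITE;
}
```

One caveat: a read path that fails persistently would also be classed as a write fault here, so the VendorID/ProductID reads being consistently correct is what makes the write-side conclusion reasonable.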
 

Offline PCB.Wiz

  • Super Contributor
  • ***
  • Posts: 1878
  • Country: au
Re: Rare (0.1%) failure of ULPI initialisation on embedded processor
« Reply #3 on: June 09, 2023, 03:46:18 am »
Retry on write is easy to test.
If that does not fix it, perhaps the POR is bad? A non-monotonic Vcc rise can do that on some parts.
Of course, if retry on write does fix it, you have to wonder how fragile normal operation is.
 

