Topic: Intel Atom C2000 Failures


Offline bson

  • Supporter
  • ****
  • Posts: 1640
  • Country: us
Intel Atom C2000 Failures
« on: September 27, 2017, 09:12:48 pm »
Does anyone know more about the Atom C2000 family failures?  In the errata sheet, Intel states:
Quote
AVR54. System May Experience Inability to Boot or May Cease Operation
Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock
outputs) may stop functioning.
Implication: If the LPC clock(s) stop functioning the system will no longer be able to boot.
Workaround: A platform level change has been identified and may be implemented as a workaround
for this erratum.
Status: For the steppings affected, see Table 1, “Errata Summary Table” on page 9.

The LPC (Low Pin Count) bus is Intel's replacement for the legacy ISA bus, and it is one of the two supported BIOS boot locations: the C2000 can boot either from SPI (the default) or from LPC, selected via external sense pins at power-up. This is fixed in a later stepping, and curiously the "fix" consists of eliminating the ability to mux the LPC bus pins with GPIO; they are no longer software selectable. This is pretty much ALL I've been able to find on the subject.

There's a workaround that consists of adding an external 100 ohm resistor, but it's not clear which pins it goes on. On some Synology NAS units it's added across two pads of a connector, so it's not a series current limiter on an output but almost certainly a stiff pullup or pulldown. This leads me to suspect it really goes on a configuration sense pin. Intel hasn't made their "platform level change" public. Tracing it out on a board is hard, since the SoC is a large BGA package that would need to be desoldered.

Does anyone know more about this?  Like, for example, where the resistor is added - in particular is it added to the LPC clock outputs, or to the sense inputs? 

It's also not clear if the clock output actually fails, or this is merely a convenient symptom any engineer with a scope can identify.  (The two LPC clocks are only 25MHz.)  Some possible root causes I can think of are:

1. The sense input pullup is undersized and fails, resulting in the CPU trying to fetch boot firmware from SPI.
2. The sense configures it for LPC boot while the pins are reset to GPIO, resulting in duplicate pin drivers that short out internally.
3. 1 + 2: multiple sense inputs with slightly different thresholds result in inconsistent pin configuration with both pin drivers enabled.
4. The clock pin output driver actually dies.

#4 sounds simple and straightforward, but somewhat implausible to me.  This isn't Intel's first rodeo, and besides how would an external resistor help with this?

Here's the C2000 family datasheet:
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/atom-c2000-microserver-datasheet.pdf

Errata:
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/atom-c2000-family-spec-update.pdf
« Last Edit: September 27, 2017, 09:15:02 pm by bson »
 
The following users thanked this post: rthorntn

Offline amyk

  • Super Contributor
  • ***
  • Posts: 6767
Re: Intel Atom C2000 Failures
« Reply #1 on: September 28, 2017, 02:23:14 am »
 
The following users thanked this post: rthorntn

Offline Monkeh

  • Super Contributor
  • ***
  • Posts: 6303
  • Country: gb
Re: Intel Atom C2000 Failures
« Reply #2 on: September 28, 2017, 04:31:04 am »
Quote from: bson on September 27, 2017, 09:12:48 pm
#4 sounds simple and straightforward, but somewhat implausible to me.  This isn't Intel's first rodeo, and besides how would an external resistor help with this?

From what I've been able to gather, this is exactly what's happening - the resistor is connecting a different clock pin to the LPC bus. This is not the first time Intel has had consistent aging failures like this.
 
The following users thanked this post: rthorntn

Offline bernroth

  • Regular Contributor
  • *
  • Posts: 125
  • Country: de
Re: Intel Atom C2000 Failures
« Reply #3 on: November 28, 2017, 03:40:02 pm »
I made a post some time ago:

https://supportforums.cisco.com/t5/firewalling/clock-signal-repair-pictures-isr4300-asa-isr4400/m-p/3088505/highlight/false?attachment-id=107384

The fix applied by Cisco is by putting a 110 ohm pull-up resistor from either LPC_CLKOUT0 or LPC_CLKOUT1 to +3.3V
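Why such a stiff value rather than the usual kilohm-range pull-up? A rough sketch, assuming the resistor alone has to charge the clock line on each rising edge (i.e. the on-chip high-side driver is no longer helping) and assuming a load capacitance of 20 pF, which is a guess and not a measured figure. The 25 MHz clock rate is the one quoted at the top of the thread.

```python
import math

# Figures below are assumptions for illustration:
CLOCK_HZ = 25e6                  # LPC clock rate as quoted earlier in this thread
HALF_PERIOD_S = 0.5 / CLOCK_HZ   # high time at 50% duty: 20 ns
C_LOAD_F = 20e-12                # ASSUMED trace + input capacitance, not measured


def rc_rise_time_10_90(r_ohms: float, c_farads: float) -> float:
    """10%-90% rise time of a single-pole RC edge: ln(9) * R * C, about 2.2 RC."""
    return math.log(9) * r_ohms * c_farads


def fits_half_period(r_ohms: float, c_farads: float = C_LOAD_F) -> bool:
    """True if a resistor-only rising edge completes within the clock's high time."""
    return rc_rise_time_10_90(r_ohms, c_farads) < HALF_PERIOD_S


if __name__ == "__main__":
    for r in (110, 1_000, 10_000):
        tr_ns = rc_rise_time_10_90(r, C_LOAD_F) * 1e9
        print(f"{r:>6} ohm: 10-90% rise {tr_ns:6.1f} ns, fits in 20 ns: {fits_half_period(r)}")
```

Under these assumptions 110 ohm gives a rise time of roughly 5 ns, while 1k and 10k are already slower than the 20 ns high time, which would fit with the fixes all using something around 100 ohm.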

I don't want to talk about their repair quality  |O

Currently I am trying to fix another Cisco router. I found the signals LPC_CLKOUT0 and LPC_CLKOUT1.

Does anyone know which one of these pins requires the pull-up?
Maybe I'll just put two pull-up resistors :)

 
The following users thanked this post: rthorntn

Offline hfiennes

  • Newbie
  • Posts: 1
  • Country: us
Re: Intel Atom C2000 Failures
« Reply #4 on: October 07, 2019, 09:01:13 pm »
Ok, a very late reply but my Synology DS415+ box died, and the 100 ohm resistor (across pins 1 & 6 of a 12 pin, 2mm header) made it work again.

It's all back together now so I can't really verify this theory, but do we know *how* the clock output dies? The Register article quotes Intel as saying "a degradation of a circuit element under high use conditions at a rate higher than Intel’s quality goals after multiple years of service."

https://www.theregister.co.uk/2017/02/06/cisco_intel_decline_to_link_product_warning_to_faulty_chip/

Could the issue be the PFET in the clock driver dying? If that was the case, a strong pull-up on the clock line - meaning only the NFET needs to be functional to get a clock out of the pin - would indeed solve the issue. This seems to square with the Cisco fix too - it's just a strong pull-up.
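The dead-PFET theory can be sketched as a simple logic-level model of a CMOS push-pull output. This is a hypothetical illustration of the idea above, not Intel's documented circuit; it assumes the NFET still works and can sink the pull-up current to well below the receiver's low threshold.

```python
from typing import List, Optional

# Hypothetical logic-level model of the dead-PFET theory; not Intel's documented circuit.


def pin_level(drive_high: bool, pfet_ok: bool, has_pullup: bool) -> Optional[int]:
    """Logic level on a CMOS push-pull output pin: 1, 0, or None (floating).

    Assumes the NFET still works and can sink the pull-up current well below
    the receiver's low threshold.
    """
    if not drive_high:
        return 0                      # NFET on: pin pulled to ground
    if pfet_ok:
        return 1                      # healthy PFET sources the high level
    return 1 if has_pullup else None  # dead PFET: only the resistor can pull high


def clock_waveform(pfet_ok: bool, has_pullup: bool, cycles: int = 2) -> List[Optional[int]]:
    """Sample the pin once per half-cycle (high phase, then low phase)."""
    return [pin_level(phase, pfet_ok, has_pullup)
            for _ in range(cycles) for phase in (True, False)]


if __name__ == "__main__":
    print(clock_waveform(pfet_ok=True, has_pullup=False))   # [1, 0, 1, 0]
    print(clock_waveform(pfet_ok=False, has_pullup=False))  # [None, 0, None, 0]
    print(clock_waveform(pfet_ok=False, has_pullup=True))   # [1, 0, 1, 0]
```

In this model the external resistor turns the broken push-pull stage into an open-drain output, which is why a plain pull-up would be enough to bring the clock back.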

(it also means that there's 33mA being sunk by the driver 50% of the time, which makes me fear for the longevity of the fix - and whether something like the smart pull-up on an I2C FM+ bus would stress the C2000 less)
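The 33 mA figure is just Ohm's law on the 3.3V rail. A small sketch of the static load for the 100 ohm jumper and Cisco's 110 ohm rework, assuming the pin is pulled to exactly 0 V while low (NFET voltage drop ignored), a 50% duty cycle, and no edge transients:

```python
from typing import NamedTuple

VCC = 3.3  # volts; the pull-up rail used in the fixes described in this thread


class PullupLoad(NamedTuple):
    i_low_a: float  # current through the resistor while the pin is held low
    p_low_w: float  # resistor dissipation while low (V^2 / R)
    i_avg_a: float  # current averaged over the clock cycle
    p_avg_w: float  # dissipation averaged over the clock cycle


def pullup_load(r_ohms: float, low_fraction: float = 0.5, vcc: float = VCC) -> PullupLoad:
    """Static load a pull-up puts on the pin driver.

    Simplification: the pin sits at exactly 0 V while low and edge transients
    are ignored, so current flows only during the low phase.
    """
    i_low = vcc / r_ohms
    p_low = vcc * i_low
    return PullupLoad(i_low, p_low, i_low * low_fraction, p_low * low_fraction)


if __name__ == "__main__":
    for r in (100, 110):  # 100 ohm jumper vs. Cisco's 110 ohm rework
        load = pullup_load(r)
        print(f"{r} ohm: {load.i_low_a * 1e3:.1f} mA while low, "
              f"{load.i_avg_a * 1e3:.1f} mA avg, {load.p_avg_w * 1e3:.1f} mW avg in the resistor")
```

For 100 ohm this works out to about 33 mA while the pin is low, around 16.5 mA on average, and roughly 54 mW average dissipation in the resistor; 110 ohm brings the low-phase current down to 30 mA.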
« Last Edit: October 07, 2019, 09:06:04 pm by hfiennes »
 
The following users thanked this post: rthorntn

Offline rthorntn

  • Frequent Contributor
  • **
  • Posts: 304
  • Country: au
Re: Intel Atom C2000 Failures
« Reply #5 on: November 18, 2019, 05:42:45 am »
I have five c2xxx Supermicro motherboards, one is a "2013" A1SAi-2750F and the rest are "2015" A1SAi-2550F's.

Basically I'm wondering out loud whether I should preemptively mod these or just wait for them to die and fix them then. I don't run them 24/7 at the moment, and I wouldn't put anything business-critical on them now. How would one go about figuring out where to put the resistor?

Thanks.
« Last Edit: November 18, 2019, 06:12:29 am by rthorntn »
 

Online EEVblog

  • Administrator
  • *****
  • Posts: 31256
  • Country: au
    • EEVblog
Re: Intel Atom C2000 Failures
« Reply #6 on: February 26, 2020, 06:24:23 am »
Quote from: hfiennes on October 07, 2019, 09:01:13 pm
Ok, a very late reply but my Synology DS415+ box died, and the 100 ohm resistor (across pins 1 & 6 of a 12 pin, 2mm header) made it work again.

I just shot a video on this after finding a DS415+ in the dumpster!
Yes, the resistor fix works. I assumed it was bypassing a clock somehow, but couldn't trace the exact details.
 
The following users thanked this post: hsn93, Marco1971

Offline Mazian

  • Newbie
  • Posts: 1
  • Country: us
Re: Intel Atom C2000 Failures
« Reply #7 on: May 14, 2020, 06:46:35 pm »
Quote from: rthorntn on November 18, 2019, 05:42:45 am
I have five c2xxx Supermicro motherboards, one is a "2013" A1SAi-2750F and the rest are "2015" A1SAi-2550F's.

Basically I'm wondering out loud if I should preemptively mod these, or just wait for them to die and fix them, I don't run them 24/7 atm and I wouldn't put anything business critical on them now, how would one go about figuring out where to stick the resistor?

Bit of a late followup, but... I'm also running an A1SAi-2750F, and a friend pointed me at the very helpful DS415+ video. A user on another forum found the pins for a similar board, and the manual for the A1SAi boards shows the same header, JTPM1, on page 43. I made a 100 ohm jumper and popped it onto the board across pins 1 (LPC clock) and 9 (+3.3V).

Can't be 100% sure it's doing anything, since my board hadn't died yet, but at least it didn't make it worse!
 
The following users thanked this post: awallin

Online awallin

  • Frequent Contributor
  • **
  • Posts: 656
Re: Intel Atom C2000 Failures
« Reply #8 on: May 15, 2020, 10:44:35 am »
Thanks for posting this!  :-+

We tried this on our SuperMicro C2000s, and it does work! One of these was run 24/7 since 2015 and died last week - now back from the dead  8)
IIRC these are SuperMicro 5018A-MLTN4 systems with A1SAM-2550F motherboards.
 

