I'll add some new details since this thread might have some more visibility, being associated with the video and all:
Why does a hard pull-up work?
It looks like the LPC bus just rests high via a typical pull-up (tens of kOhm) and gets driven low by the devices. The drivers for the clock lines must degrade to the point where they develop a parasitic resistance, in parallel with the driver transistor, that's low enough to overcome the normal pull-up. Adding a hard pull-up lets enough current flow for there to be a usable voltage at the clock output.
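To put rough numbers on that, here's a minimal sketch that treats the degraded driver as a plain resistor to ground forming a divider with the pull-up. The 3.3 V rail, the 10 kOhm "typical" pull-up and especially the 300 Ohm parasitic resistance are my assumptions for illustration, not measured values, and this is only a static view of the high phase of the clock.

```python
def pin_voltage(v_rail, r_pullup, r_parasitic):
    """Voltage at the clock pin when a pull-up fights a resistive leak to ground."""
    return v_rail * r_parasitic / (r_pullup + r_parasitic)


V_RAIL = 3.3          # assumed LPC I/O rail, in volts
R_PARASITIC = 300.0   # hypothetical leak resistance of a degraded driver, in ohms

weak = pin_voltage(V_RAIL, 10_000.0, R_PARASITIC)  # assumed typical pull-up
hard = pin_voltage(V_RAIL, 150.0, R_PARASITIC)     # 150 Ohm bodge resistor

print(f"10 kOhm pull-up: {weak:.2f} V")
print(f"150 Ohm pull-up: {hard:.2f} V")
```

With those made-up values the weak pull-up leaves the pin at roughly 0.1 V (stuck low), while the 150 Ohm resistor lifts it to roughly 2.2 V. The real numbers depend entirely on how far the driver has degraded.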
Why do systems fail to boot?
The tl;dr is that one of the clock output pins is used at SoC power-on to choose the boot device - either an LPC or SPI flash memory. SPI is the default, LPC can be selected by pulling it down at power-on. Since basically no C2000 system boots from an LPC flash, this places the CPU in a state where it tries to read from a flash memory that does not exist. There are several open questions (the specific pins don't seem to be correct, there are like 4-6 different clock outputs that would be suitable, how is the LPC clock supposed to work if it's pulled down externally, etc.), but that's the clearest explanation anyone's publicly provided.
Is the mod safe?
With a 150 Ohm resistor, that's some 14 mA of current constantly being sunk by the SoC, per bodged pin (many systems use at least two clocks, one for the BMC and one exposed to the TPM header). Not great, but not terrible if each pin has an absolute max rating of 50 mA, which seems to be a standardish rating for parallel PCI clock buffers (the design of which was likely borrowed wholesale for the LPC bus).
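The current through the bodge resistor is just Ohm's law, but it depends on where the pin voltage actually sits while the SoC is sinking. A quick sketch, assuming a 3.3 V rail (my assumption) and taking the 50 mA limit as the guessed rating mentioned above rather than anything from a C2000 datasheet:

```python
def sink_current_ma(v_rail, v_pin, r_pullup):
    """Current through the pull-up into the pin, in milliamps."""
    return (v_rail - v_pin) / r_pullup * 1000.0


V_RAIL = 3.3        # assumed LPC I/O rail, in volts
R_BODGE = 150.0     # bodge resistor value, in ohms
ABS_MAX_MA = 50.0   # guessed per-pin rating, not a datasheet figure

# Upper bound: pin dragged all the way to 0 V.
worst_case = sink_current_ma(V_RAIL, 0.0, R_BODGE)

# Pin voltage at which the resistor would carry 14 mA.
v_pin_at_14ma = V_RAIL - 0.014 * R_BODGE

print(f"worst case: {worst_case:.1f} mA (limit {ABS_MAX_MA:.0f} mA)")
print(f"14 mA corresponds to about {v_pin_at_14ma:.1f} V at the pin")
```

Under those assumptions the worst case is about 22 mA per pin, and 14 mA corresponds to the pin sitting around 1.2 V. Either way it stays under the assumed 50 mA.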
What's the official fix?
Short of a new SoC, which nobody actually seems to have done, it's bodge resistors on any affected LPC clock lines. Supermicro uses 150 Ohm, ASRock uses 120 Ohm.
Clock lines plural?
Many/most systems need more than just the one clock line used for boot:
- TPMs connect over the LPC bus (hence why it's exposed in the first place). Not typically critical, except for the whole wrong boot device thing.
- BMCs connect over the LPC bus for in-band management, to provide sensor info and to provide SuperIO functionality. Typically not critical, but very annoying. Does not break iKVM, but don't lose your BMC password.
- Some systems have additional SuperIO controllers for fans, serial, etc. (ASRock uses a BMC and a SuperIO, because they like wasting money I guess.)
I guess Intel's reference design just puts the TPM on the clock line that controls the boot device and everyone followed along, or something. Fortunate coincidence.
More info (and examples of fixes for Supermicro and ASRock Rack boards):
- https://www.eevblog.com/forum/microcontrollers/intel-atom-c2000-failures/
- https://www.truenas.com/community/threads/fyi-intel-c2000-family-of-processors-system-fault-may-lead-to-dead-system.50314 (Shameless plug alert)