So I did some fault injection of my own. This is what I discovered.
When testing a 16752A in a 16903A chassis (the 3-slot 16900 series Windows XP chassis), the DRAM chip identifiers given by the self test are COMPLETELY WRONG!
Chip 0 = the main logic analysis chip for pod 1/2
Chip 1 = the main logic analysis chip for pod 3/4
So, it was telling me there was a fault on bit 0x10000000 of U90/U60. At first glance, this makes sense - U90 and U60 are on opposite sides of the board from each other, and their data lines are connected together - U90's D15 connects to U60's D0 and so on.
Ok, I'll inject another fault on a different pin of U90 - let's go with D15... and run the self test.
New failure on U31 / U74 at bit 0x80000000. Ok, so the data bits on the bottom chip obviously line up with the bit numbering they're using here, but the identifiers are completely wrong.
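For anyone following along, here's a quick sketch of how I read those failing-bit values. It assumes the self test reports a 32-bit mask with one bit set per failing data line, which is what the results above suggest but isn't something I've seen documented. The function name is just something I made up.

```python
def failing_bits(mask: int, width: int = 32) -> list[int]:
    """Return the positions of the set bits in a failing-bit mask.

    Assumption: bit 0 is the least significant bit of the value
    the self test prints, and the bus is `width` bits wide.
    """
    return [i for i in range(width) if (mask >> i) & 1]


# The two masks reported so far:
print(failing_bits(0x10000000))  # first injected fault
print(failing_bits(0x80000000))  # second injected fault
```

So 0x10000000 is bit 28 and 0x80000000 is bit 31 of the 32-bit word.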
I was also very confused as to whether this was talking about the data bus between the FPGAs and the DRAMs, or between the FPGAs and the acquisition ICs, so I injected a fault there as well and ran some tests.
No change to the "Memory Data Bus Test", but a new fault on the "Analyzer chip memory bus test"
Ok, so "Memory Data Bus" = between the FPGAs and the DRAMs, including all of the 33 ohm resistor packs. Those were a huge problem on my card - the corrosion on their solder joints was pretty bad. I took most of them off, cleaned the pads, repaired a couple of pads that were eaten away right where the pad transitions to the trace at the edge of the solder mask opening, and soldered them all back.
and "Analyzer chip memory bus" = between the acquisition ASICs and the FPGAs
Chip identifiers completely unreliable
Ok, now we're getting somewhere.
In the Memory Data Bus test on the 16752A in the 16903A, it uses the nomenclature:
Chip => bank => port
Chip = 0 / 1 - which acquisition ASIC or that general side of the board - pod 1/2 = chip 0, pod 3/4 = chip 1
Bank = 0 / 1 = Top / bottom side memories - not exactly sure which bank is which side of the board as the chips' data pins are wired together
Port = 0 to 3 => seems like each FPGA has 2 ports, and there are 2 FPGAs per ASIC. Each "port" is 4 chips (2 on each side of the board).
At least for Chip 0, with the bottom of the PCB facing up and the pod connectors towards you, the "ports" go from 0 in the middle of the board to 3 on the left side of the board.
Port 0 = U76 and U77 on the bottom, U36 and U37 on the top.
Port 3 = U89 and U90 on the bottom, U59 and U60 on the top.
The port numbering and the byte order within the ports follow no logical order and are all over the place.
Port 1 = the chips right by the central bus of traces that runs to the top section of the board, aka right next to where a runner with the double-sided adhesive was. I finally found the right chip!
There's also a "BONUS" port on each Chip, which seems to be the one extra DRAM that sits only on the top side and has no partner on the bottom.
On the Analyzer Chip Memory Bus test, it uses the same nomenclature, but drops "bank" and only talks about chip and port
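To keep myself from looking at the wrong chips again, the mapping worked out so far can be written down as a small lookup. This is only a sketch: it covers chip 0, ports 0 and 3, because those are the only designators I've listed in full, and the names in the code are made up for illustration. Anything not in the table comes back as None rather than a guess.

```python
# DRAM designators per port for chip 0 (pod 1/2 side), from my notes.
# Ports 1 and 2, the BONUS port, and all of chip 1 are deliberately missing.
CHIP0_PORT_DRAMS = {
    0: {"bottom": ("U76", "U77"), "top": ("U36", "U37")},
    3: {"bottom": ("U89", "U90"), "top": ("U59", "U60")},
}


def drams_for(chip: int, port: int):
    """Return the DRAMs on a given chip/port, or None if not mapped yet."""
    if chip != 0:
        return None  # chip 1 side has not been traced out
    return CHIP0_PORT_DRAMS.get(port)


print(drams_for(0, 3))  # the port on the left side of the board
print(drams_for(0, 1))  # not in the table, so None
```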
So seeing as my failure on chip 0 is on port 1 bit 0x00000002, that would be U82 / U36, not U90 / U60 as the incorrect self test says.
Time to go poke around with the continuity tester now that I know where I'm actually looking for a fault!
I wonder how they managed to screw that up!