Author Topic: Series defect on agilent 167xx boards?  (Read 37524 times)


Offline MateKrisz

  • Regular Contributor
  • *
  • Posts: 97
  • Country: hu
Re: Series defect on agilent 167xx boards?
« Reply #150 on: December 01, 2022, 08:55:42 pm »
I'm back again. My digital microscope has arrived and I checked the 16720A PCB for the first time. I found physical damage on the board: one trace is broken. I attached some pictures of it. I think this trace connects U87 pin 11 to U72 ?? The clock hides the connection. I think I need to replace this trace with a jumper wire soldered directly on the PCB.
 

Offline ahakman

  • Regular Contributor
  • *
  • Posts: 87
Re: Series defect on agilent 167xx boards?
« Reply #151 on: February 13, 2023, 10:45:48 am »
I have a couple 16752a cards I'm trying to fix as I have another project that requires a logic analyzer. I fixed one by removing the plastic runners and doing some careful track repair. That one passes all self tests. But the second one is being very difficult.

On the second card, it fails the Memory Data Bus Test. It's the same single bits on 2 banks
Chip 0 Bank 0 Port 1 Bits 0x00000002 U2
Chip 0 Bank 1 Port 1 Bits 0x00000002 U64
Chip 1 Bank 0 Port 3 Bits 0x10000000 U60
Chip 1 Bank 1 Port 3 Bits 0x10000000 U90

U2 and U64 are on opposite sides of the board in the same location. Same with U60 and U90. I verified that their data bits are indeed connected together (in a way that's convenient for the layout, not necessarily D0 on one connects to D0 on the other).

I verified continuity between all data bits on U60 and U90 to the 33 ohm resistor packs, and verified that on the other side of that, I do indeed measure about 35+ ohms. From there, the signals go to the Virtex FPGA, and then the other side of the FPGA looks like it connects to the actual logic chip.

What I'm a little confused about is how it can be bit 28 on U60 and U90 when they're only 16 bits wide each?? Or are they set up in a 32-bit arrangement with their companion chips (U89 and U59), so that a failure in the upper word calls out U60 and U90, but a failure in the lower word would call out U89 and U59??
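
A minimal sketch of the arithmetic behind that question. It assumes (this is not confirmed anywhere in the thread at this point) that two 16-bit DRAMs sit side by side to form one 32-bit port, with word 0 as the low word. The function name and the `chip_width` parameter are made up for illustration.

```python
def decode_mask(mask, chip_width=16):
    """Return (bus_bit, word_index, bit_within_word) for every set bit in mask.

    chip_width=16 is an assumption: two 16-bit DRAMs forming one 32-bit port,
    where word_index 0 is the low word and 1 is the high word.
    """
    hits = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            hits.append((bit, bit // chip_width, bit % chip_width))
        bit += 1
    return hits


# The two masks reported by the Memory Data Bus Test above.
for mask in (0x00000002, 0x10000000):
    print(hex(mask), decode_mask(mask))
```

Under that assumption, 0x10000000 is bus bit 28, which would land on bit 12 of whichever chip holds the high word, and 0x00000002 is bit 1 of the low word.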

How does the Chip / Bank / Port nomenclature work?
Chip 0 / 1 I get - the 2 main logic analyzer asics
Is Bank which FPGA memory controller it's talking to?
Is Port which set of DRAMs the FPGA is talking to?
Does U60 tell me the same information as Chip 1 Bank 0 Port 3?

Because it's a single bit error, and I've traced the signal from the DRAM through the 33 ohm resistor pack to the other side of that, and that goes directly to the FPGA, does that point at the BGA ball under the FPGA? I find it hard to believe that one ball is broken on each of the outermost FPGAs, but it could happen I suppose.
Could this also be an interconnect issue between the FPGA and the main logic analyzer chip?

Or is the U60 / U90 thing a complete red herring, and the problem is somewhere else entirely? If I run one of the later tests, I get the same bits failing (bit 1 and bit 28), but it calls out entirely different chip identifiers??? HUH??? I'd have to go stick the card back in the analyzer and run the tests again to check which test and which chips it was calling out, but the wrong bits are in the same positions, but it was on completely different chips (U37 seems to ring a bell). I also kind of read somewhere that if there are multiple self test failures, to basically ignore all the tests after the first one that failed, as they're all dependent on each other. Is that true?

I've traced out many of the lines, including through vias that were under or close to the runners and double sided sticky pads between the main logic analyzer chip and the FPGA that controls U60 / U90, and I can't find anything that looks or measures broken.

I'm a bit stumped on this one. The one card was relatively "easy" to fix (I guess if you count scraping solder mask with a pin under a microscope and repairing traces with a single strand from a 22 gauge wire 'easy'), and this one is the exact opposite.
 

Offline MarkL

  • Supporter
  • ****
  • Posts: 2099
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #152 on: February 13, 2023, 07:38:05 pm »
The two-connector (4 pod) analyzer boards are pretty much two independent acquisition engines.  It's split down the center and there is no crossover for acquisition data flow (although I think there is some muxing between the pods before  acquisition occurs in ASICs U22 and U45).

The data R/W access from the backplane to all the memory chips, however, is common between the two sides.  I think this is what it's complaining about, or at least is where I would look first.

As you've noted, the bit assignments are done based on what's best for the layout, and not how the chip manufacturers label the pins.  It's possible you're looking at a single problem, but I think probably not since the chips are on opposite edges of the board.

The nomenclature is a bit confusing.  To be honest I have not seen errors reported with "Chip 0" and "Chip 1" before, but I think it is referring to the Virtex FPGAs as "0" and "1" within each side.  The big acquisition ASICs under the heatsinks are numbered like this:

  16750A/51A/52A: U45 (pods 1+2 "Chip 9"), U22 (pods 3+4 "Chip 8")

And then I'm guessing on the pod 1+2 side:

  Chip 0 is U41
  Chip 1 is U52

And then on the pod 3+4 side:

  Chip 0 is U10
  Chip 1 is U25

Repeating... This is only a guess.  Was there any Chip8/9 information printed with these errors?

(EDIT: So this guess was wrong.  The discrepancy is from units running HP-UX vs. units running windows.  Chip 8/9 in HP-UX is Chip 0/1 in windows.  Keep reading...)


I would closely examine the large clump of traces going down the center of the board and heading towards the two FPGAs near the backplane connector, and mostly on the bottom since they are all directly under one of the runners.  I think the Altera MAX (U18) is responsible for backplane access to the acquisition memory, and on the 16750/1/2 boards that access is done through the Virtex.

Do you have more detailed output from the failing pv test with debug turned on (d r=10 d=9) ?  Are there any other errors being reported on the Memory Data Bus Test, or are any other pv tests failing?  In general, the first test to fail is the one to focus on fixing, since other tests very often fail as a result of the first failure.  But it's not to say to disregard clues that might exist in later failed tests, so it's at least worth running all the tests once to be aware of what else pops up.

If unable to find a break anywhere, one troubleshooting method is to put the failing test into a loop and then start perturbing operation of the various signal lines with a low resistance to ground.  Try a 33R to start, but you might need to go as low as 10R.  The idea is to see if you can generate the same error report on other bits and then try to zero in on which physical data line is having trouble.  The data lines are usually (but not always) in bit order next to each other, so you can usually tell when you're getting close to the culprit.

This method will also reveal a lot about the nomenclature as various errors are reported.  Perturbing data, address, and control signals on various memory chips will help create an understanding of Chip/Bank/Port and bit ordering.

You can also do this on your working card to try to recreate the error message you're seeing on the bad card.
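
When perturbing lines one at a time, it helps to keep each test report and compare it against the unperturbed baseline. Here is a rough sketch of that bookkeeping. It assumes the report lines look like the Memory Data Bus Test lines quoted earlier in the thread ("Chip 0 Bank 0 Port 1 Bits 0x00000002 U2"); other tests may print a different layout. The function names and the extra "perturbed" line are invented for illustration only.

```python
import re

# Line layout copied from the Memory Data Bus Test output quoted in this thread.
# Assumption: other tests or OS versions may format their reports differently.
LINE_RE = re.compile(
    r"Chip\s+(\d+)\s+Bank\s+(\d+)\s+Port\s+(\d+)\s+Bits\s+(0x[0-9A-Fa-f]+)\s+(U\d+)"
)


def parse_report(text):
    """Return a set of (chip, bank, port, mask, designator) tuples found in text."""
    found = set()
    for m in LINE_RE.finditer(text):
        chip, bank, port, mask, ref = m.groups()
        found.add((int(chip), int(bank), int(port), int(mask, 16), ref))
    return found


def new_failures(baseline_text, perturbed_text):
    """Failures present in the perturbed run but not in the baseline run."""
    return sorted(parse_report(perturbed_text) - parse_report(baseline_text))


baseline = """\
Chip 0 Bank 0 Port 1 Bits 0x00000002 U2
Chip 1 Bank 0 Port 3 Bits 0x10000000 U60
"""
# Hypothetical extra line, as if one more data line had been pulled low.
perturbed = baseline + "Chip 1 Bank 0 Port 3 Bits 0x80000000 U31\n"

for failure in new_failures(baseline, perturbed):
    print(failure)
```

Logging which physical pin was loaded next to each set of new failures gradually builds the Chip/Bank/Port map that the missing documentation would otherwise provide.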

It's frustrating not having any documentation for this level of diagnostics.  The exact point of failure could be sitting right in front of us and we'd never know it.
« Last Edit: February 14, 2023, 03:37:29 pm by MarkL »
 
The following users thanked this post: alm

Offline ahakman

  • Regular Contributor
  • *
  • Posts: 87
Re: Series defect on agilent 167xx boards?
« Reply #153 on: February 14, 2023, 09:47:15 am »
So I did some fault injection of my own. This is what I discovered.

When testing a 16752A in a 16903A chassis (the 3 slot 16900 series Windows XP chassis), the DRAM chip identifiers given by the self test are COMPLETELY WRONG!

Chip 0 = the main logic analysis chip for pod 1/2
Chip 1 = the main logic analysis chip for pod 3/4

So, it was telling me there was a fault on bit 0x10000000 of U90/U60. At first glance, this makes sense - U90 and U60 are on opposite sides of the board from each other, and their data lines are connected together - U90's D15 connects to U60's D0 and so on.

Ok, I'll inject another fault on a different pin on U90 - let's inject a fault on D15... run the self test

new failure on U31 / U74 at bit 0x80000000. Ok, so the data bits on the bottom chip align with the numbering they're using here obviously, but the identifiers are completely wrong.
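
A small sketch of how a reported bus bit could be turned into pins to probe on a back-to-back DRAM pair. It leans on two assumptions drawn from the observations above: the bottom chip's data pin number equals the bus bit within its 16-bit word (D15 produced 0x80000000, bit 31), and the "and so on" wiring to the top chip is a full mirror (D15 to D0, D14 to D1, ...). Neither is verified for every port, and the function name is made up.

```python
CHIP_WIDTH = 16  # assumed DRAM data width


def pins_for_bus_bit(bus_bit):
    """Map a 32-bit bus bit to (word, bottom-chip pin, top-chip pin).

    Assumes the bottom chip's Dn matches the bit position within its word
    and that the top chip is wired as a mirror image (D15 <-> D0).
    """
    if not 0 <= bus_bit < 2 * CHIP_WIDTH:
        raise ValueError("bus_bit must be in the range 0..31")
    word = "high" if bus_bit >= CHIP_WIDTH else "low"
    bottom = bus_bit % CHIP_WIDTH
    top = CHIP_WIDTH - 1 - bottom
    return word, f"D{bottom}", f"D{top}"


print(pins_for_bus_bit(31))  # the injected fault on D15
print(pins_for_bus_bit(28))  # the original failing bit 0x10000000
```

If the mirror assumption holds, the original 0x10000000 failure would sit on D12 of the bottom chip and D3 of its partner on the top side.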

I was also very confused as to whether this was talking about the data bus between the FPGAs and the DRAMs, or between the FPGAs and the acquisition ICs, so I injected a fault there as well and ran some tests.

No change to the "Memory Data Bus Test", but a new fault on the "Analyzer chip memory bus test"

Ok, so "Memory Data Bus" = between the FPGAs and the DRAMS including all of the 33 ohm resistor packs (which were a huge problem on my card - I took most of them off, cleaned the pads, had to repair a couple pads as they were eaten away right where the pad transitions to the trace at the boundary of the opening in the solder mask, and soldered them all back - the corrosion on the solder joints of those on my card was pretty bad)

and "Analyzer chip memory bus" = between the acquisition ASICs and the FPGAs
Chip identifiers completely unreliable

Ok, now we're getting somewhere.

On the 16752A in 16903A Memory Data Bus test, it uses the nomenclature
Chip => bank => port

Chip = 0 / 1 - which acquisition ASIC or that general side of the board - pod 1/2 = chip 0, pod 3/4 = chip 1
Bank = 0 / 1 = Top / bottom side memories - not exactly sure which bank is which side of the board as the chips' data pins are wired together
Port = 0 to 3 => seems like each FPGA has 2 ports - and there's 2 FPGAs per ASIC. Each "port" is 4 chips (2 on each side of the board). At least for Chip 0, with the bottom of the PCB facing up, and the pod connectors towards you, the "ports" go from 0 in the middle of the board to 3 on the left side of the board. Port 0 = U76 U77 on the bottom and U36 and U37 on the top. Port 3 = U89 and U90 on the bottom and U59 and U60 on the top. The port numbering and the byte order in the ports follows no logical order, and is all over the place. Port 1 - the chips right by the central bus of traces that runs to the top section of the board - aka right next to where a runner with the double sided adhesive was. I finally found the right chip!

There's also a "BONUS" port on each Chip which seems to be the one extra DRAM that doesn't have a partner that's only on the top side.

On the Analyzer Chip Memory Bus test, it uses the same nomenclature, but drops "bank" and only talks about chip and port

So seeing as my failure on chip 0 is on port 1 bit 0x00000002, that would be U82 / U36, not U90 / U60 as the incorrect self test says.

Time to go poke around with the continuity tester now that I know where I'm actually looking for a fault!

I wonder how they managed to screw that up!
« Last Edit: February 14, 2023, 10:53:25 am by ahakman »
 

Offline ahakman

  • Regular Contributor
  • *
  • Posts: 87
Re: Series defect on agilent 167xx boards?
« Reply #154 on: February 14, 2023, 11:35:48 am »
Here's the issue - hard to tell in the photo, but that's a nodule of corrosion and obviously the track is completely eaten away between the pad and the trace.

[ Specified attachment is not available ]

Don't mind the resistor pack being crooked - I re-flowed them all with hot air - obviously I need to remove them and clean and inspect all those pads properly too, not just reflow with some flux.

What a mess
« Last Edit: February 14, 2023, 11:37:32 am by ahakman »
 

Offline ahakman

  • Regular Contributor
  • *
  • Posts: 87
Re: Series defect on agilent 167xx boards?
« Reply #155 on: February 14, 2023, 11:51:45 am »
Here's some context, if it can help someone else. I labeled the couple chips I know for sure by fault injection and running the self test (again, I stress what the self test reports on a 16903A 3ch Windows XP mainframe - I have some 16702B's I could try the test in as well in HP-UX - the 16752A cards are the only cards I have though that are new enough to work in the newer mainframe, which boots faster, has a better screen, and the hard drive doesn't sound like a jet engine running).

C0 P1 L = Chip 0 Port 1 Low Word

[Attachment 1716143-0]
« Last Edit: February 14, 2023, 11:58:38 am by ahakman »
 

Offline MarkL

  • Supporter
  • ****
  • Posts: 2099
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #156 on: February 14, 2023, 03:04:42 pm »
Wow, I can't see any problem with that joint on the resistor pack.  A good reason to always do end-to-end continuity checks in areas with corrosion or on traces that transit corroded areas.  Another common location for breaks that are hard to spot is the solder blobs which are probe points.  The traces leading up to them are exposed for a very short distance after they come out from underneath the soldermask.

So, after reflowing does the card work now?

Thanks for the update on the nomenclature.  All my information is using a 16702B, and clearly they changed the chip 8/9 designation to chip 0/1 for the windows OS models.  And from your investigation, it sounds like they got much of the Uxx identifier reporting just plain wrong in the windows version.  Thanks, Agilent.

Each pod pair has 32 bits of data that is acquired and stored.  Plus, there are two extra bits for the clock/qualifier inputs that are also acquired and stored.  These are the Bonus bits, and I think they are stored in U19/U47 (the DRAM chips that don't have a partner on the underside), or in whatever logical grouping includes those two chips.

Some people have installed SCSI2SD adapters to get rid of the "jet engine" hard drive in the 167xx units.  But the drive noise is nothing compared to the chassis fans, IMO.
 

Offline ahakman

  • Regular Contributor
  • *
  • Posts: 87
Re: Series defect on agilent 167xx boards?
« Reply #157 on: February 14, 2023, 06:51:21 pm »
I didn't scrape the corrosion off, repair the trace and try it yet - it was about 3AM when I finally found the right chip by moving my induced fault around. Hopefully tonight I will get that fixed along with the other memory error on the other chip (after figuring out which port is the one that's failing) and look into what's causing the error with the Bonus bank on Chip 1, then we'll see how many self tests still fail.

Yes, this card is very tricky compared with the first one I fixed. The first one had very obvious corroded traces that looked green under the solder mask. Scrape the mask off until you get to good copper on both sides, and solder in a jumper - problems solved.

This one is all about the ends of the traces where they meet pads being corroded, which are MUCH harder to spot visually. And the corrosion is further away from where the runners were. This card almost looks like it was above another card that was off gassing or something, and the corrosion is much more widespread. As I said earlier, ALL of those 33 ohm resistor packs looked just downright awful, and I can see now that some of them still need some attention. They didn't look the best on the other card that's working either, but better than this one. Long term, I probably need to remove, clean / rehab the pads, and re-solder ALL of them on the other card too - but that can wait until later.

I do want to get this up so I can actually use it for the project I have in mind. I should probably just stick the pair of 16550A cards that don't have runners and thus don't have any corrosion into a 16702B chassis and use that. For some reason I initially thought those weren't supported in a 16702B chassis, but looking at the compatibility matrix again, it looks like they are - they're among the 165xx series cards that work in the 167xx series mainframes (in the same way that the 16752a is one of the 167xx series cards that work in the 169xx series mainframe)

The next issue is going to be cables - I see now that the 167xx series cards use a wider plug (not to mention that spacer built into the back of the card) than the 165xx cards do. I have some cables for the 165xx cards, but none for the 167xx series cards :(

 

Offline ahakman

  • Regular Contributor
  • *
  • Posts: 87
Re: Series defect on agilent 167xx boards?
« Reply #158 on: February 26, 2023, 02:41:46 am »
I was dragging my feet working on this again as I was waiting for my new soldering microscope to arrive.

I fixed all of the connections on all of the 33 ohm resistor packs, and now all my memory errors are gone! Now I only have 2 self test failures left (down from about 10 self tests failing before):
Comparators and ZoomChipSel

These are obviously in a different area of the board - back to the microscope to do some detailed inspection...

Edited to add: I found the comparator problem - I think it was a trace I repaired previously, but there wasn't enough solder on my bodge wire and it wasn't making a good connection (or it could've been one of the vias I reflowed and got a bunch of weird looking junk out of). Sweet, both of my 16752A cards are passing self test now!

« Last Edit: February 26, 2023, 04:08:46 am by ahakman »
 
The following users thanked this post: alm

Offline MarkL

  • Supporter
  • ****
  • Posts: 2099
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #159 on: February 26, 2023, 07:55:10 pm »
Great!  Glad you found the problem.

The comparator control lines seem to be a common victim of corrosion.  It's likely the comparator failure(s) caused the zoom failures since the system uses the comparators to set up test data patterns for the zoom chips.

For reference for future readers, the comparators are 1NB4-5036 next to the pod connectors (top and bottom), and the zoom chips are 1NB4-5040 next to them (top only).
 

Offline ahakman

  • Regular Contributor
  • *
  • Posts: 87
Re: Series defect on agilent 167xx boards?
« Reply #160 on: February 26, 2023, 10:24:50 pm »
Yes, the comparators are right down by the pod cable connectors on the external side of the card, but I think the reference voltages come from a DAC that's way up close to the backplane connector. They run right under where one of the plastic runners was, and the trace was broken there. I had already bridged that trace with a wire, but it was completely disconnected from one side, either from cleaning the board with q-tips and acetone, or it just didn't solder well the first time.

For others reading this thread, just because it says "comparator failure" and the comparators are down on the external connector side of the board doesn't necessarily mean that the problem is there. Always focus on the areas with the plastic runners and the areas around where they were.

And if you have memory data bus errors, focus on all of the 33 ohm resistor packs. Especially focus on any signs of corrosion where the pads turn into traces right at the edge of the solder mask opening for the pads.

These are the kind of self test numbers I like to see:
[Attachment 1725992-0]
« Last Edit: February 27, 2023, 12:21:54 am by ahakman »
 

Offline MarkL

  • Supporter
  • ****
  • Posts: 2099
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #161 on: February 27, 2023, 02:41:27 am »
Quote
Yes, the comparators are right down by the pod cable connectors on the external side of the card, but I think the reference voltages come from a DAC that's way up close to the backplane connector. They run right under where one of the plastic runners was, and the trace was broken there. I had already bridged that trace with a wire, but it was completely disconnected from one side, either from cleaning the board with q-tips and acetone, or it just didn't solder well the first time.

For others reading this thread, just because it says "comparator failure" and the comparators are down on the external connector side of the board doesn't necessarily mean that the problem is there. Always focus on the areas with the plastic runners and the areas around where they were.
...
100% agree.  It's the path between the DAC and the comparators that can get severed which causes these problems.  It's much more rare that it's an actual chip failure (although it has happened).

On the 16752A, the DAC is U39 (AD7841ASZ), and as you point out is near the backplane connector on the top.  The traces to the comparators run on the bottom of the card on the very outer edge on the POD1/2 connector side.  And as you say, they pass right under one of the runners.  The crossing is adjacent to U90, and is a favorite corrosion spot.

There is a similar looking set of traces on the opposite edge, also on the bottom, and if corroded these can create board ID errors.

There's a mapping of DAC output pins to comparator inputs earlier in this thread for anyone needing to do the end-to-end continuity check:

  https://www.eevblog.com/forum/repair/series-defect-on-agilent-167xx-boards/msg2720304/#msg2720304
 

Offline fpgaarcade

  • Contributor
  • Posts: 18
  • Country: se
Re: Series defect on agilent 167xx boards?
« Reply #162 on: March 22, 2023, 10:30:10 am »
Bit of an odd request.

I've been hunting on ebay for a while for a dead 167xx or similar board - anything with the modern low density probe connector. Most of them are in the US and the shipping costs a fortune.

I am producing a new high end FPGA board for retro gaming, and it has a daughterboard slot. I'm thinking about using the HP front end comparator and circuit around it, connected to the FPGA.
I should be able to get speeds of around 1.6 Gb/s per channel.

I see the pinout of the comparator is quite well understood but has anybody drawn a complete schematic yet?
Does it vary much with the highest speed boards, say the 2GHz 16751a?

I could do with the cable as well, I have some probes to play with..
I'm in Sweden and happy to pay for parts - although I do need at least one intact front end with all the parts present.

One reason is to make a modern logic analyser with huge depth, but the other is for real time debug/emulation of chips in arcade boards.


Thanks for reading.

Mike.
www.fpgaarcade.com
mike@fpgaarcade.com
 

Offline MarkL

  • Supporter
  • ****
  • Posts: 2099
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #163 on: March 22, 2023, 03:25:34 pm »
Quote
Bit of an odd request.

I've been hunting on ebay for a while for a dead 167xx or similar board - anything with the modern low density probe connector. Most of them are in the US and the shipping costs a fortune.

I am producing a new high end FPGA board for retro gaming, and it has a daughterboard slot. I'm thinking about using the HP front end comparator and circuit around it, connected to the FPGA.
I should be able to get speeds of around 1.6 Gb/s per channel.

I see the pinout of the comparator is quite well understood but has anybody drawn a complete schematic yet?
Does it vary much with the highest speed boards, say the 2GHz 16751a?

I could do with the cable as well, I have some probes to play with..
I'm in Sweden and happy to pay for parts - although I do need at least one intact front end with all the parts present.

One reason is to make a modern logic analyser with huge depth, but the other is for real time debug/emulation of chips in arcade boards.
...
Not an odd request at all; an interesting idea!  The probing is not a trivial piece to get right.  Why not take advantage of a probing system that's already out there and readily available.

The 16715A, 16716A, 16717A, 16718A, 16719A, 16740A, 16741A, 16742A, 16750A/B, 16751A/B, and 16752A/B all use the same front end design, including the DAC.  In fact, many of the cards are identical and only differ by model setting resistors.  The 16753A, 16754A, 16755A, 16756A, 16760A use a different front-end design and comparator.

The former all support 2GHz Timing Zoom (except the 16715A which has unpopulated areas for it), but that's just the sample clock rate.  If you're shooting for 1.6Gbps, you should take note that the max state capture is 400MHz in the 16750/1/2 cards, and channel-to-channel skew is only specified as <1.0ns.  Besides the acquisition ASICs, the front-end could be contributing to those limits.  When you get a board, you might want to measure the actual switching characteristics of the comparators in their natural habitat before proceeding with a design.
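
To put those numbers side by side, here is a quick back-of-the-envelope sketch. The three figures (1.6 Gb/s target, 400 MHz state capture, skew bound of 1.0 ns) are the ones quoted above; the function and variable names are only illustrative, and the skew figure is a spec limit, not a measurement.

```python
def period_ns(rate_ghz):
    """Period in ns of a clock or bit stream running at rate_ghz (GHz or Gb/s)."""
    return 1.0 / rate_ghz


def skew_in_bit_periods(skew_ns, rate_gbps):
    """How many bit periods a given skew spans at the given data rate."""
    return skew_ns * rate_gbps


target_rate_gbps = 1.6   # hoped-for per-channel rate
state_clock_ghz = 0.4    # max state capture on the 16750/1/2 cards
skew_bound_ns = 1.0      # channel-to-channel skew is specified as < 1.0 ns

print(f"bit period at {target_rate_gbps} Gb/s: {period_ns(target_rate_gbps):.3f} ns")
print(f"state clock period at 400 MHz: {period_ns(state_clock_ghz):.3f} ns")
print(f"worst-case skew in bit periods: "
      f"{skew_in_bit_periods(skew_bound_ns, target_rate_gbps):.2f}")
```

A bit period of roughly 0.625 ns is shorter than the 1.0 ns skew bound, so per-channel deskew would likely be needed unless the real front-end skew turns out to be much better than the spec limit.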

Unfortunately schematics don't exist.  Some of the passives connected to the incoming data lines are unlabeled and would need to be measured with appropriate high-frequency gear.  The easiest approach would probably be to duplicate their layout exactly and lift all the front-end components from the board.  Length-matched traces may include some post-comparator delay compensation, so you may need to tweak lengths in your final design.

I'm in the US, so I'm unfortunately in the category of "costs a fortune" shipping.
 

Offline fpgaarcade

  • Contributor
  • Posts: 18
  • Country: se
Re: Series defect on agilent 167xx boards?
« Reply #164 on: March 22, 2023, 09:52:37 pm »
Thanks for the detailed response.
I can get hold of a working 1680A for a bit which looks to use the same front end? and I can probe around that. I've got access to a decent 'scope at work.

The test mode feature of the comparator is interesting, I should be able to use that to compensate for delays between the front end and the FPGA.

I doubt I'll get the layout quite as good as the original, but hopefully sufficient.

The MPSoC Xilinx device I am using is quite a beast, with Ethernet, USB and a couple of built in ARMs. It will be able to stream the captured data direct to the connected DDR4 memory.

 

Offline MarkL

  • Supporter
  • ****
  • Posts: 2099
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #165 on: March 23, 2023, 01:13:32 am »
Quote
Thanks for the detailed response.
I can get hold of a working 1680A for a bit which looks to use the same front end? and I can probe around that. I've got access to a decent 'scope at work.
...
I don't have a 1680A, but it uses the same 40-pin probing system as the cards mentioned previously.

I was able to find a few teardown photos which show the 1680A acquisition card.  The photos weren't good enough to read the number off the comparator, but it has the right number of pins, and the layout and passives look the same on the input side.  The layout of the test clock area also looks the same.

You'll know for sure when you get it open and can verify if it's using the 1NB4-5036 comparator.

Please post your findings if you can - thanks!
 

Offline fpgaarcade

  • Contributor
  • Posts: 18
  • Country: se
Re: Series defect on agilent 167xx boards?
« Reply #166 on: March 23, 2023, 11:27:28 am »
I will!
 

Offline dorkshoei

  • Frequent Contributor
  • **
  • Posts: 493
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #167 on: March 24, 2023, 08:59:25 pm »
Well that's interesting.

A long time ago I bought my first 16xxxx, a 16700b.  One of the installed cards was a 16534a and it passed all the self-tests and showed valid calibration.  It seemed to work based on some minimal usage but I'd never re-run the calibration.

A year ago I bought a  mint 16702b from a local seller on Craigslist.  He assured me it passed self-tests but it turned out he thought that meant it powered on  :-/O   All three of the 16752a cards failed test (I've got two working since).   The 16720a passed self test (and failed spectacularly a week later, now every test fails).  The 16534a passed self test.

When I got home I happened to notice that C107 on the 16702b/16534a card had imploded (pic).   Given the cap type and location I doubted it was critical but not exactly encouraging.

This week I fixed the damaged pcb/cap and got around to running calibration on both the 16534a cards using a T-cable setup from AliExpress.  I expected one card to pass fine and it did.  The other immediately threw "PROBLEM: either the cables are not connected properly or there is a serious problem with this module".  Continuing on, only 2 of the 7 tests pass: hysteresis and trigger level.

What was unexpected was that it's the original module from the 16700b that's failing.  The one with the repaired cap is working fine.   Never assume LOL.

Question: So on the working one I'd previously removed the old runners and bought some of the recommended 3M tape.   Before I apply conformal coating, reinstall the runners and bless it good, are there any other tests I should run?   I see a bunch of other tests in the service manual (hooking up to a multimeter and signal generator).

[I'll read through this thread for tips, I recall some from MarkL, on fixing the non-working card]


Thanks!
« Last Edit: March 24, 2023, 09:03:22 pm by dorkshoei »
 

Offline ahakman

  • Regular Contributor
  • *
  • Posts: 87
Re: Series defect on agilent 167xx boards?
« Reply #168 on: March 25, 2023, 06:40:43 am »
So I've managed to get 2 more 16752A cards working, of 3 more that I bought. After my first round of repair attempts on this batch of 3, I had one working (which I've stacked with my original 2 I repaired before, filling my 16903A chassis - but I have a line on a 16902B so it would still be nice to get all 5 cards working), one failing the ZoomAcq test, and one failing the Memory Unload Modes Test.

The one failing the ZoomAcq test seemed very suspiciously similar to the failing PLL chip problem reported earlier in this thread, so I just swapped the PLLs on the 2 boards that had problems, and now I have one card that works entirely, and presumably one that's failing both memory unloads and the ZoomAcq tests now.

Does anyone still have any "beyond repair" cards they could scavenge a PLL chip from? Or that maybe are not quite as "beyond repair" as they thought?

Does anyone have experience with where the fault would be for the Memory Unload Modes Test? Reading the service manual, that test tests reading the memory data off the card to the backplane (so presumably to the CPLD chips close to the backplane connector). The card that's failing that test had by far the worst corrosion on it. I suspect one of the vias around the middle runner (with the huge parallel bus of tracks that runs up the middle of the card), or the next 2 runners towards the "POD 1/2" side of the card - there was some very nasty corrosion there, but I just can't see anything that looks broken after I cleaned it all up. I tried probing a few of the most suspect vias on a known good card to see if I could find where they went, so I could test which one was broken on the broken card, but I wasn't able to trace some of them (inner layer traces to BGA pads I'm thinking??)
« Last Edit: March 25, 2023, 06:53:06 am by ahakman »
 

Offline MarkL

  • Supporter
  • ****
  • Posts: 2099
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #169 on: March 25, 2023, 07:33:34 pm »
Quote
...
This week I fixed the damaged pcb/cap and got around to running calibration on both the 16534a cards using a T-cable setup from AliExpress.  I expected one card to pass fine and it did.  The other immediately threw "PROBLEM: either the cables are not connected properly or there is a serious problem with this module".  Continuing on, only 2 of the 7 tests pass: hysteresis and trigger level.

What was unexpected was that it's the original module from the 16700b that's failing.  The one with the repaired cap is working fine.   Never assume LOL.
Just to be clear, is the 16534A that's failing its cal passing all the self-tests?  If you run it anyway, can you see any traces with a signal applied to one or both channels?

Quote
Question: On the working one I'd previously removed the old runners and bought some of the recommended 3M tape.  Before I apply conformal coating, reinstall the runners, and bless it as good, are there any other tests I should run?  I see a bunch of other tests in the service manual (hooking up to a multimeter and signal generator).
I've never worked through the performance section.  It seems like it's a fair amount of work and wouldn't gain you that much unless you were using the card to produce verifiable test results.

The only thing I think would be useful to check is that the 50R termination is working on the specified attenuator ranges.  I don't think the 50R terminator is checked in either the self-tests or cal.  I've had attenuators where the 50R termination relay was flaky, and one where the resistor itself was blown.
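One rough way to do that check, assuming a signal generator with a 50 ohm output impedance: read the displayed amplitude with the input set to 1M, then again with 50R selected.  With a healthy terminator the amplitude should drop to about half.  The sketch below back-calculates the effective termination resistance from the two readings using the plain voltage divider formula.  The function name and the example readings are made up for illustration, and the loading of the 1M input is ignored.

```python
def estimate_termination_ohms(v_unterminated, v_terminated, source_ohms=50.0):
    """Estimate the input termination resistance from two amplitude readings.

    Voltage divider: v_terminated = v_unterminated * R / (R + source_ohms),
    so R = source_ohms * v_terminated / (v_unterminated - v_terminated).
    """
    if v_unterminated <= 0 or v_terminated <= 0:
        raise ValueError("amplitudes must be positive")
    if v_terminated >= v_unterminated:
        # No drop at all: the terminator is not switching in
        # (for example a stuck relay or an open resistor).
        return float("inf")
    return source_ohms * v_terminated / (v_unterminated - v_terminated)


# Hypothetical readings in volts, not taken from a real card.
print(estimate_termination_ohms(1.00, 0.50))  # amplitude halves
print(estimate_termination_ohms(1.00, 1.00))  # no drop when 50R is selected
```

A result far from 50 ohms on only some attenuator ranges would point more towards the relay than the resistor, but that is an inference, not something the self-tests report.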
 

Offline MarkL

  • Supporter
  • ****
  • Posts: 2099
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #170 on: March 25, 2023, 08:08:24 pm »
...
The one failing the ZoomAcq test seemed very suspiciously similar to the failing PLL chip problem reported earlier in this thread, so I just swapped the PLLs on the 2 boards that had problems, and now I have one card that works entirely, and presumably one that's failing both memory unloads and the ZoomAcq tests now.

Does anyone still have any "beyond repair" cards they could scavenge a PLL chip from? Or that maybe are not quite as "beyond repair" as they thought?
Wow, another dead PLL.  Can you see any clock output from the dead one?

The PLL can be had on any of the 167xx cards with Timing Zoom except the 16715A.  It might help to know your country for anyone who can scrounge up a PLL.

Quote
Does anyone have experience with where the fault would be for the Memory Unload Modes Test? Reading the service manual, that test checks reading the memory data off the card to the backplane (so presumably to the CPLD chips close to the backplane connector). The card that's failing that test had by far the worst corrosion on it. I suspect one of the vias around the middle runner (with the huge parallel bus of tracks that runs up the middle of the card), or the next 2 runners towards the "POD 1/2" side of the card - there was some very nasty corrosion there, but I just can't see anything that looks broken after I cleaned it all up. I tried probing a few of the most suspect vias on a known good card to see if I could find where they went, so I could test which one was broken on the broken card, but I wasn't able to trace some of them (inner layer traces to BGA pads I'm thinking??)
It's really better to check continuity via to via on all the traces running through or near corroded areas.  Extremely sharp probes pushed into the via holes at an angle work well.  On multiple occasions I've had traces with no visible breaks and it turned out the corrosion had gotten under the soldermask.  It can take some time to do the testing.  But you're right, it could also have eaten away the via hole plating, and that's happened to me too.

The HP-UX based analyzers are able to turn on detailed debugging output when running any of the verification tests (pv).  Is there any more detail from the Windows version on which bit(s) and/or chips are failing during "Memory Unload Modes Test"?

On the 1675x cards I think the acquisition memory access path from the backplane is through one of the FPGAs near the backplane connector (I think it's the Altera MAX), up to the Virtex FPGAs on top, and then back down to the actual DRAM chips.  On the 1671x cards, it goes direct from the backplane controller FPGA to the memory chips (there are no Virtex FPGAs acting as a memory controller layer).
 

Offline dorkshoei

  • Frequent Contributor
  • **
  • Posts: 493
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #171 on: March 25, 2023, 08:35:44 pm »
Just to be clear, is the 16534A that's failing its cal passing all the self-tests?  If you run it anyway, can you see any traces with a signal applied to one or both channels?
The card had been working (at least for my usage) the previous time I tried.  Now it passes self-test but fails most of the cal.  I'd have to try again to see if it's still showing traces.

Quote
I've never worked through the performance section.  It seems like it's a fair amount of work and wouldn't gain you that much unless you were using the card to produce verifiable test results.
I'm not.  I just don't want to glue down the new runners only to find there is a fault :-)
 

Offline MarkL

  • Supporter
  • ****
  • Posts: 2099
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #172 on: March 25, 2023, 08:36:50 pm »
One further thought on the failing 16534A scope card...

It's worth the time to check the on-board regulator outputs before heading down other troubleshooting rabbit holes.  There are five regulators and their output voltages are all labeled on the top of the board.

I had one card that self-tested fine, but consistently failed calibration because one of the output setting resistors had gone bad.

I've also had bad output setting resistors on logic analysis cards, so it's never a bad idea to verify regulator outputs on those cards as well.  I remember one card that had a very out of spec ECL termination voltage, which caused a number of self-tests to fail.

Bad resistors occur more often on cards that have corrosion.  Maybe the corrosion is getting into the film on the resistor, but I've never been able to see any damage under a microscope.  It's just a correlation at the moment.
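If you want to keep a record of the regulator check described above, something like the following could flag rails that are off.  This is only a sketch: the nominal values in the example are placeholders (use whatever is printed on your board), and the 5% tolerance is my assumption, not a figure from the service manual.

```python
def check_rails(measured, tolerance=0.05):
    """Return (nominal, reading, deviation) for each rail outside tolerance.

    `measured` maps the nominal voltage printed on the board to the
    meter reading taken at that regulator's output.
    """
    bad = []
    for nominal, reading in measured.items():
        deviation = (reading - nominal) / abs(nominal)
        if abs(deviation) > tolerance:
            bad.append((nominal, reading, deviation))
    return bad


# Placeholder rails and readings, NOT the actual 16534A values.
readings = {5.0: 5.02, 3.3: 3.31, -5.2: -4.60, -2.0: -2.01}
for nominal, reading, deviation in check_rails(readings):
    print(f"{nominal:+.2f} V rail reads {reading:+.2f} V ({deviation:+.1%})")
```

A rail that is out by a fixed ratio rather than dead or noisy would fit the bad output setting resistor scenario.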
 

Offline MarkL

  • Supporter
  • ****
  • Posts: 2099
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #173 on: March 25, 2023, 08:47:47 pm »
...  I just don't want to glue down the new runners only to find there is a fault :-)
Understood.  I left my runners off, and I dislike conformal coatings passionately.  Time will tell if I'm wrong.
 

Offline MarkL

  • Supporter
  • ****
  • Posts: 2099
  • Country: us
Re: Series defect on agilent 167xx boards?
« Reply #174 on: March 26, 2023, 06:00:17 pm »
On the 16534A card again...

Besides the previous question on the self-test, are the cal errors associated with one channel or both?

If faulty on only one channel, a useful troubleshooting technique is to compare signals on equivalent nodes between the working and broken channel.

Similarly, if you have a fully working card, you can use that as your comparison.  A working card is a really useful resource, given the lack of any detailed documentation for these units.
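For the side-by-side comparison, it can help to write the readings down per node and let a few lines of code pick out the mismatches.  A minimal sketch, with hypothetical node names and readings (there is no published schematic, so you would label nodes by component pin or test point yourself) and arbitrary tolerances:

```python
def compare_nodes(good, suspect, rel_tol=0.10, abs_tol=0.05):
    """Return nodes whose readings differ between two channels or cards.

    Each result is (node, good_reading, suspect_reading, delta),
    sorted with the largest difference first.
    """
    diffs = []
    for node in sorted(good.keys() & suspect.keys()):
        g, s = good[node], suspect[node]
        delta = abs(g - s)
        # Ignore small differences: at least abs_tol volts or rel_tol of the good reading.
        if delta > max(abs_tol, rel_tol * abs(g)):
            diffs.append((node, g, s, delta))
    diffs.sort(key=lambda d: d[3], reverse=True)
    return diffs


# Hypothetical DC readings in volts, for illustration only.
good_channel = {"preamp_out": 1.20, "adc_ref": 2.50, "trig_comp": 0.80}
bad_channel = {"preamp_out": 1.18, "adc_ref": 1.10, "trig_comp": 0.79}
for node, g, s, delta in compare_nodes(good_channel, bad_channel):
    print(f"{node}: good {g:.2f} V, suspect {s:.2f} V (diff {delta:.2f} V)")
```

Starting with the node that shows the largest difference and working back towards the input is usually the quickest way to narrow things down.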

You can probe two cards at the same time by using a 16701B expansion chassis, or a home-made card extender.  User DogP has some gerbers available for an extender that works well:

  https://www.eevblog.com/forum/repair/series-defect-on-agilent-167xx-boards/msg4031926/#msg4031926

There was another user who had done something similar (or was working on it), and I think it was a full-length extender card.  Can't find the post at the moment.
 

