Topic: SRAM woes, ISA bus diagnostics (Read 8696 times)

CkRtech · « **on:** April 29, 2017, 10:51:11 pm »

Greetings eevblog,

I recently restored an old motherboard (detailed here) and have turned my attention to testing it prior to finishing up repairs.

Sadly, it looks like I am having issues with the cache: 256k of cache, 32k x 8 across eight sockets with an extra one for the tag. I've adjusted timing, speeds and wait states, and exhausted everything I can think of to make it happy with 256k of cache. I get freezes, graphics corruption, reboots, the works - all of which disappear if I turn off External cache.

I tried swapping in the cache from a computer in my garage that seemed to be functioning properly, only to run into the same problems. I returned the donor cache to the donor computer, and now IT doesn't even seem to boot with external cache enabled.

I have yet another system in the garage with some cache, but I am seriously concerned that I have turned myself into an Evil Cache Killer. I have a pretty good anti-static workstation, but I am quite hesitant.

1: How well does 20+ year old SRAM hold up?
2: What do you like to do to test, and can you recommend an affordable tester of some sort?
3: Should I poke around the now-empty sockets with a scope or something to look for noise? I am fairly certain my PSU is fine.
4: How concerned would you be, given the situation I described above? Is it bad hardware, bad luck, or stupidity?

Appreciate all feedback.

Rasz · « **Reply #1 on:** April 30, 2017, 05:13:15 am »

1: who cares, you can buy new old stock chips for a few dollars each
2: you could build something on a breadboard with an Arduino
3: start with a multimeter, checking for shorts to the ground/supply lines, and check the supply itself
4: meh, could be anything; you never posted a full picture of the board or its model number, so no idea how far the cache was from the leak
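
To flesh out item 2: whatever drives the chip (an Arduino on a breadboard or a dedicated programmer), the test itself is usually a march pattern. Below is a minimal Python sketch of March C- adapted to a byte-wide part by using all-0 (0x00) and all-1 (0xFF) bytes. The `FakeSram` class and its injected stuck bit are hypothetical stand-ins for real pin-level access. Solid backgrounds like these won't catch every fault between bits of the same byte, so treat it as a starting point rather than a complete tester.

```python
from typing import Callable, List, Optional, Tuple


class FakeSram:
    """Hypothetical stand-in for a byte-wide SRAM such as a 32K x 8 cache chip.

    A real tester would replace read()/write() with code that drives the
    chip's address, data and control pins.
    """

    def __init__(self, size: int, stuck_at: Optional[Tuple[int, int, int]] = None):
        self.size = size
        self.mem = bytearray(size)
        # Optional injected fault: (address, bit, level) forces one bit high or low on read.
        self.stuck_at = stuck_at

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value & 0xFF

    def read(self, addr: int) -> int:
        value = self.mem[addr]
        if self.stuck_at is not None and self.stuck_at[0] == addr:
            _, bit, level = self.stuck_at
            value = (value | (1 << bit)) if level else (value & ~(1 << bit) & 0xFF)
        return value


def march_c_minus(size: int,
                  read: Callable[[int], int],
                  write: Callable[[int, int], None]) -> List[Tuple[int, int, int]]:
    """Run March C- with 0x00/0xFF data; return (address, expected, got) for each failed read."""
    errors: List[Tuple[int, int, int]] = []
    up = range(size)
    down = range(size - 1, -1, -1)

    # Each march element: (address order, value expected on read or None, value to write or None).
    elements = [
        (up, None, 0x00),    # M0: write 0
        (up, 0x00, 0xFF),    # M1: ascending, read 0 then write 1
        (up, 0xFF, 0x00),    # M2: ascending, read 1 then write 0
        (down, 0x00, 0xFF),  # M3: descending, read 0 then write 1
        (down, 0xFF, 0x00),  # M4: descending, read 1 then write 0
        (up, 0x00, None),    # M5: read 0
    ]
    for order, expect, value in elements:
        for addr in order:
            if expect is not None:
                got = read(addr)
                if got != expect:
                    errors.append((addr, expect, got))
            if value is not None:
                write(addr, value)
    return errors


good = FakeSram(32 * 1024)
bad = FakeSram(32 * 1024, stuck_at=(0x1234, 3, 1))  # bit 3 stuck high at one address
print(len(march_c_minus(good.size, good.read, good.write)))  # 0
print(march_c_minus(bad.size, bad.read, bad.write)[0])       # (4660, 0, 8)
```

Every read-0 step hits the stuck-high bit, so the faulty address is reported three times (M1, M3, M5) while a healthy part reports nothing.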

helius · « **Reply #2 on:** April 30, 2017, 06:20:39 am »

For a time in the 1990s, PC clones were sold with cache SRAM chips that were fake. This was one of the first widespread examples of fake or remarked chips (knowingly used by the board makers, such as PC Chips).

Rasz · « **Reply #3 on:** April 30, 2017, 07:28:01 am »

Quote from: helius on April 30, 2017, 06:20:39 am

For a time in the 1990s, PC clones were sold with cache SRAM chips that were fake. This was one of the first widespread examples of fake or remarked chips (knowingly used by the board makers, such as PC Chips).

and if you think it was just one isolated incident, here is an interview with Art Astrin, the Apple RF engineer responsible for the desktop Wi-Fi revolution (the first 802.11b product? I think they started selling before the standard was ratified). This type of thing is really common in China/Taiwan.

the case of the vanishing capacitors, replaced by glued-in plastic cubes, because cheaper!!1

CkRtech · « **Reply #4 on:** April 30, 2017, 09:33:52 am »

Corrosion is (was) on the opposite side of the board from the cache and cache controller. I have a datasheet for this cache, although the variants listed on that sheet don't go down to 20ns.

It is typical for boards to have 8+1 chips on them. If it is possible that only one is bad, it doesn't make sense to spend the money to buy a whole new set - the counterpoint would naturally be the question of how quickly the cost of a tester offsets the cost of a full replacement... however, this will definitely not be the only board I work on.

Of course it is possible that another power, thermal or speed related issue is causing problems (changing the video card makes it slightly more stable, but not 100%). I have tweaked plenty of BIOS settings, altered jumpers, and tried a different power supply to make it happy, but something is still keeping it from being stable. The problem seems to disappear with external cache off, but is that due to faulty cache, an angry controller IC, or fed-up tantalum filter caps? I've been swimming in it for a while now.

Photo is from earlier. I have the board in a case and covered with components/wires/etc at the moment.

TassiloH · « **Reply #5 on:** April 30, 2017, 10:33:20 am »

Hi,

Quote

256k of cache - 32k x 8 across eight sockets with an extra for tag.

this might be a stupid question, but you noticed that the 9th chip (the tag SRAM) is likely a different type and must not be mixed up with the other 8, right?

stj · « **Reply #6 on:** April 30, 2017, 04:51:49 pm »

9th chip is usually smaller and used for parity bits.

drussell · « **Reply #7 on:** April 30, 2017, 07:56:01 pm »

Quote from: stj on April 30, 2017, 04:51:49 pm

9th chip is usually smaller and used for parity bits.

Uhh... No...

A chip used for parity would have to be the same as the ones it is doing parity for.
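
One way to read this claim: byte parity keeps one extra bit for every data byte at every address, so the parity store needs exactly as many addresses as the data chips. The Python sketch below computes even parity for a hypothetical 8-byte cache row (the row width and the even-parity convention are assumptions): eight data bytes produce eight parity bits, i.e. one more x8 chip of the same depth.

```python
def even_parity_bit(byte: int) -> int:
    """Bit that makes the total count of 1s (8 data bits + parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def row_parity(row: bytes) -> int:
    """One even-parity bit per data byte, packed so bit i belongs to byte i."""
    bits = 0
    for i, b in enumerate(row):
        bits |= even_parity_bit(b) << i
    return bits


row = bytes([0x00, 0x01, 0x03, 0x07, 0xFF, 0x80, 0x55, 0xFE])  # hypothetical 8-byte row
print(f"{len(row)} data bytes -> {len(row)} parity bits: {row_parity(row):#04x}")  # 0xaa
```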

A tag RAM stores just the locations of what is currently mirrored in the cache, so the system knows what it can grab from the cache instead of having to go to main memory.
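
A minimal sketch of what a tag store does, assuming a direct-mapped 256 KB cache with 16-byte lines (both sizes are assumptions for illustration, not this board's documented organisation): each cache line has one tag slot, and a lookup hits only when that slot holds the upper address bits of the requested address.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

CACHE_BYTES = 256 * 1024               # assumed: 256 KB direct-mapped cache
LINE_BYTES = 16                        # assumed line size
NUM_LINES = CACHE_BYTES // LINE_BYTES  # 16384 tag slots


@dataclass
class DirectMappedTags:
    # One tag per cache line; None means the line holds nothing valid yet.
    tags: List[Optional[int]] = field(default_factory=lambda: [None] * NUM_LINES)

    @staticmethod
    def split(addr: int) -> Tuple[int, int]:
        """Split an address into (line index, tag)."""
        index = (addr // LINE_BYTES) % NUM_LINES  # which cache line the address maps to
        tag = addr // CACHE_BYTES                  # upper bits that tell apart addresses sharing a line
        return index, tag

    def lookup(self, addr: int) -> bool:
        """Hit if the stored tag for this line matches the address's upper bits."""
        index, tag = self.split(addr)
        return self.tags[index] == tag

    def fill(self, addr: int) -> None:
        """Record that this line now mirrors the main-memory block containing addr."""
        index, tag = self.split(addr)
        self.tags[index] = tag


t = DirectMappedTags()
print(t.lookup(0x12340))                # False: nothing cached yet
t.fill(0x12340)
print(t.lookup(0x12344))                # True: same 16-byte line
print(t.lookup(0x12340 + CACHE_BYTES))  # False: same line index, different tag
```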

stj · « **Reply #8 on:** April 30, 2017, 08:14:50 pm »

go look at some old 486 class motherboards.

CkRtech · « **Reply #9 on:** April 30, 2017, 10:19:21 pm »

All the chips on the board are the same: 32k. The only requirement for the tag chip (as per the printed manual I fortunately still possess) is that if you use 128KB of cache, the tag should be 8KB, and if you use 256KB of cache, the tag should be 32KB.

Devil's advocate: in defense of what has been said regarding parity and pairing an x9 tag chip with x8 chips, some chipsets do have stricter requirements on how you add and configure external cache.

A little update from my end: I'm trying something a bit different here. I elected to disable the internal L1 (welcome, slowness) and leave L2 enabled. The benchmark has been looping for several hours now. No glitches, freezes, or reboots.

james_s · « **Reply #10 on:** May 01, 2017, 03:32:22 am »

SRAM chips are quite fragile; they are easily damaged by ESD. Do any of your other boards work? A known working board would be a good way to test the cache. It's entirely possible that you have remaining damage somewhere that is affecting the cache; it only takes one bad via or corroded trace to make it not work.

CkRtech · « **Reply #11 on:** May 01, 2017, 07:43:56 am »

Interestingly, I managed to get the board to work with L1 and L2 enabled by changing the AT clock divider to CPUCLOCK/8. It should be at CPUCLOCK/4 (33 MHz / 4 in this case).
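
For reference, the divider arithmetic, assuming a 33.33 MHz CPU clock (consistent with the 8.33 MHz bus clock reported later in the thread). Going from /4 to /8 doubles the length of every ISA bus clock, which would give marginal signals more time to settle; that is one plausible reading of why /8 is stable, not a confirmed diagnosis.

```python
CPU_MHZ = 100 / 3  # assumed 33.33 MHz: the thread later reports 8.33 MHz at /4


def at_bus_clock_mhz(cpu_mhz: float, divider: int) -> float:
    """ISA (AT) bus clock produced by dividing the CPU clock."""
    return cpu_mhz / divider


def period_ns(freq_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / freq_mhz


for div in (4, 8):
    f = at_bus_clock_mhz(CPU_MHZ, div)
    print(f"CPUCLOCK/{div}: {f:.2f} MHz, {period_ns(f):.0f} ns per bus clock")
# CPUCLOCK/4: 8.33 MHz, 120 ns per bus clock
# CPUCLOCK/8: 4.17 MHz, 240 ns per bus clock
```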

I ran 40+ loops of a Doom demo level (my go-to test at the moment), and every single one completed without a freeze or crash. Previously, with the same ISA video card and the AT bus at the auto-selected (and presumably correct) CPUCLOCK/4, it would crash about 1 run in 6.

With the VESA (VLB) card installed, it crashes on the first demo as always. VLB has much stricter timing requirements.

All this testing was done with L2 turned on.

My theory is that the crystal used for bus timing has an issue, whether it was inaccurate to begin with or is riddled with too much noise.

I imagine this means that my SRAM issue is not an SRAM issue.

Running at CPUCLOCK/4 with L1 off but L2 on also produced stable results - just extremely slow, of course.

Humbly, I submit that perhaps I damaged the crystal with heat, as it sits on the component side of the board between the two slots I desoldered during my clean-up phase.

stj · « **Reply #12 on:** May 01, 2017, 12:21:23 pm »

I doubt it; a 32kHz watch crystal and a pair of crap parallel ceramics is hardly precise.

my first thing to check would be the electrolytics and the cache chip sockets.

Rasz · « **Reply #13 on:** May 01, 2017, 01:57:17 pm »

do you have a scope? I would suspect decoupling.
looks like a signal integrity problem: with L2 cache enabled, the CPU can bang on the ISA bus faster, and this most likely exposes whatever weakness you've got on the board (still-corroded vias or aged caps).
does it manifest on every ISA slot?

you didn't mention anything about the resistor banks (RNx, RPx); those are responsible for signal termination on fast parallel buses

drussell · « **Reply #14 on:** May 01, 2017, 03:24:08 pm »

Yeah, that little crystal is just the clock for the real-time (wall) clock, not the xx MHz CPU clock. This board sounds new enough to use a programmable clock generator chip, so you can set the various speeds, rather than the single discrete oscillator running at the system CPU clock rate that was common up to 33 and 40 MHz 386-level machines.

I agree with the above posted idea that your problems are likely bus related.

What manufacturer and model is that board? I can't see it mentioned anywhere in either thread.

drussell · « **Reply #15 on:** May 01, 2017, 04:47:52 pm »

Oh, and also.... What is your methodology for ensuring that your video card(s) are actually functioning correctly? Have you tried them in other, faster machines, preferably with the bus speed overclocked a bit to test the stability of the card(s) on a known good system?

james_s · « **Reply #16 on:** May 01, 2017, 05:09:32 pm »

I remember running into compatibility issues with cards, especially VLB stuff. We've been spoiled in recent years with computer hardware that usually more or less just works when you plug it in. When I was a kid I spent countless hours screwing around: swapping cards out, messing with IRQ, DMA and address range jumpers, DOS drivers and memory managers. I feel nostalgic sometimes and enjoy playing with the old stuff, but I wouldn't want to go back and have to rely on it again.

CkRtech · « **Reply #17 on:** May 01, 2017, 10:23:57 pm »

Quote from: stj on May 01, 2017, 12:21:23 pm

I doubt it; a 32kHz watch crystal and a pair of crap parallel ceramics is hardly precise.

So I guess I have two 32kHz crystals on the board (the attached photo shows both)? There are two barrel crystals: one, which I replaced with one from a 386, sits closer to the keyboard IC, and a second, which I haven't touched, sits near those ISA slots.

Also - No electrolytics on this board. Just tantalums and ceramics.

Quote from: Rasz on May 01, 2017, 01:57:17 pm

do you have a scope? I would suspect decoupling.
looks like a signal integrity problem: with L2 cache enabled, the CPU can bang on the ISA bus faster, and this most likely exposes whatever weakness you've got on the board (still-corroded vias or aged caps).
does it manifest on every ISA slot?

you didn't mention anything about the resistor banks (RNx, RPx); those are responsible for signal termination on fast parallel buses

I do have a scope. What do you suggest I probe-walk?

I have tried other slots, but I don't know if I have tried all of them. I could move I/O and VGA to the ones furthest away from the RAM slots to see what happens.

I haven't replaced or measured anything in the resistor networks. I can continuity check their paths again to see if perhaps I have missed something.

Quote from: drussell on May 01, 2017, 03:24:08 pm

What manufacturer and model is that board? I can't see it mentioned anywhere in either thread.

AOpen Vi11. I think the BIOS is dated ~Sept/Nov 1993.

Quote from: drussell on May 01, 2017, 04:47:52 pm

Oh, and also.... What is your methodology for ensuring that your video card(s) are actually functioning correctly? Have you tried them in other, faster machines, preferably with the bus speed overclocked a bit to test the stability of the card(s) on a known good system?

I pulled the VGA card from a working board that was actually running Windows 95 quite well - "quite well" relative to a 486 with 24 MB of RAM and a period-appropriate hard drive, that is. I also took the I/O card from it to make sure both expansion cards I was using came from a functioning system. I also tried all of the cache from the functioning system, and the symptoms were the same. I put the original cache back in before I discovered I could remedy the situation (at least with ISA VGA) by raising the CPUCLOCK divider.

In addition to the borrowed ISA VGA, I have another VLB VGA card. Each VLB card shows the same problem in this Vi11 motherboard. The ISA VGA card works "most" of the time at CPUCLOCK/4 and seems to be rock solid at CPUCLOCK/8. I know there are major restrictions on timing for VLB, and I just figured that the inaccurate timing or some sort of noise that is present in the signal(s) was causing the instability. Memtest86+ loops the RAM tests and finds no problems.

I think we are on the right track: the problem is bus related. Before your welcome feedback, my plan was to check what ISA slot pin B20 (CLK) was showing, to see whether it was accurate, stable or noisy. It would be nice if I had something other than the default 1x/10x Rigol probe with that alligator-clip ground antenna.
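
If the scope capture can be pulled off as a list of voltage samples at a fixed sample interval (the export step itself is not shown), the bus clock frequency can be estimated in software. The Python sketch below counts rising edges using the standard TTL input thresholds (0.8 V low, 2.0 V high) as hysteresis, so small noise near a single threshold isn't counted as extra edges. The function names are hypothetical.

```python
from typing import List, Sequence


def rising_edges(samples: Sequence[float], dt: float,
                 low: float = 0.8, high: float = 2.0) -> List[float]:
    """Times (seconds) of rising edges, with hysteresis: an edge only counts
    after the signal has first dropped below `low` and then risen above `high`."""
    edges: List[float] = []
    armed = False
    for i, v in enumerate(samples):
        if v < low:
            armed = True
        elif v > high and armed:
            edges.append(i * dt)
            armed = False
    return edges


def estimate_frequency(samples: Sequence[float], dt: float,
                       low: float = 0.8, high: float = 2.0) -> float:
    """Average frequency (Hz) over all complete cycles in the capture."""
    edges = rising_edges(samples, dt, low, high)
    if len(edges) < 2:
        raise ValueError("need at least two rising edges")
    return (len(edges) - 1) / (edges[-1] - edges[0])


# Synthetic example: a 120 ns (about 8.33 MHz) 0-5 V square wave sampled every 1 ns.
wave = [0.0 if (i % 120) < 60 else 5.0 for i in range(1200)]
print(f"{estimate_frequency(wave, 1e-9) / 1e6:.2f} MHz")  # 8.33 MHz
```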

But aside from via continuity and solder joint double-checks, I will happily measure something elsewhere if you guys think it would be a better direction to go.

Thanks again for the feedback.

james_s · « **Reply #18 on:** May 01, 2017, 10:48:44 pm »

I have seen crystals of other frequencies in those little round cans before, but the vast majority are 32kHz. You could scope the crystal and see what frequency is on it; the loading of the scope probe may cause it to stop running, but it probably won't hurt anything.

CkRtech · « **Reply #19 on:** May 02, 2017, 02:40:48 am »

Crystal appears to be 14.3 MHz.
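
14.3 MHz is consistent with the standard PC/ISA OSC frequency of 14.31818 MHz (four times the NTSC colour-burst frequency), the signal routed to slot pin B30. The small Python helper sketched below expresses a frequency reading as a parts-per-million error against that nominal value. Note that a reading rounded to three significant figures, like the one above, is far too coarse to judge crystal accuracy; the "error" it shows is just rounding.

```python
NOMINAL_OSC_HZ = 315e6 / 22  # 14.31818... MHz = 4 x NTSC colour burst (315/88 MHz)


def ppm_error(measured_hz: float, nominal_hz: float = NOMINAL_OSC_HZ) -> float:
    """Frequency error in parts per million relative to nominal."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


def period_ns(freq_hz: float) -> float:
    """Length of one cycle in nanoseconds."""
    return 1e9 / freq_hz


print(f"nominal period: {period_ns(NOMINAL_OSC_HZ):.2f} ns")  # 69.84 ns
print(f"'14.3 MHz' reading: {ppm_error(14.3e6):.0f} ppm")      # -1270 ppm, i.e. rounding
```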

I also included a scope capture of the ISA CLK pin with the BIOS CPUCLOCK divider returned to /4 (33MHz/4). Accurate-ish, but not very pretty.

Rasz · « **Reply #20 on:** May 02, 2017, 05:31:55 pm »

scope all ISA address/data (maybe even IRQ) signals during looped Doom or some other VGA-intensive test; look for signal ringing or general garbage
also put a scope on the supply lines: are they stable or full of IO noise?
you can test SRAM cache chips using a <$40 TL866
I read your VOGONS thread; are you using both VLB IO and VGA? VLB was always troublesome
also, you wrote there that 128K of cache worked OK all the time, so why bother complicating your life? do you even have >8MB of RAM for this board?

tantalum caps used for decoupling do go bad when old, and some of them even go bad in style: loud bang, and quite poisonous when inhaled

james_s · « **Reply #21 on:** May 02, 2017, 05:36:22 pm »

Most of the tantalum cap failures I've seen have been shorts, and yeah it can be a little exciting when they give way. I was peering into my old XT trying to figure out why the PSU was shutting down when there was a loud bang and bits of tantalum capacitor whizzed by my face.

CkRtech · « **Reply #22 on:** May 02, 2017, 11:37:54 pm »

Ha. At the moment, I don't mind complicating my life for the sake of learning. My electronics hobby wants to be square with my vintage computing hobby, and I see this as an excellent opportunity to learn more about how this stuff works using a real-life example from when I was quite a bit younger. The fact that I am the original owner (since ~1994) also adds to the desire.

Tests were with ISA VGA, VLB I/O (apologies), and auto config for timing (CPUCLOCK/4, mostly showing as 8.33MHz).

Some scoping. Hmm. The first signal I checked was address line 00. The following video shows the probe measurement as the hard drive loads Doom and starts the demo level. I believe it begins with the previous demo finishing; then the command prompt appears showing the next execution in the batch file (I set it up to loop about 12 times), Doom loads (you can hear the hard drive with the volume up), and then the Doom footage plays.

You can see noise/garbage (appropriate term?) at the bottom of the signal during heavy graphics activity (the previous demo finishing, then the next demo loading and running) that fills up a 1V division. The pulse rise/fall has extra junk that makes it reach down an extra 1/2 volt. Ideally, shouldn't these be nice, clean 5V pulses? (Hmm... perhaps the "noise" I am seeing is just the more frequent rises happening thanks to the high volume of activity, plus a squirrelly start pulse.)
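
The observations above (junk at the low level, edges dipping about half a volt below ground) can be quantified from exported samples. The Python sketch below reports overshoot above the high level, undershoot below ground and the peak-to-peak spread on each side of the midpoint. `v_low`/`v_high` are parameters because TTL-driven bus lines often sit well below 5 V when high. A long ground clip lead adds ringing of its own, so readings taken that way will overstate the problem.

```python
from typing import Dict, Sequence


def level_stats(samples: Sequence[float], v_low: float = 0.0,
                v_high: float = 5.0) -> Dict[str, float]:
    """Overshoot/undershoot beyond the nominal levels, plus the peak-to-peak
    spread of samples on each side of the midpoint (a rough noise measure)."""
    mid = (v_low + v_high) / 2
    lows = [v for v in samples if v < mid]
    highs = [v for v in samples if v >= mid]
    return {
        "overshoot": max(0.0, max(samples) - v_high),
        "undershoot": max(0.0, v_low - min(samples)),
        "low_pp": max(lows) - min(lows) if lows else 0.0,
        "high_pp": max(highs) - min(highs) if highs else 0.0,
    }


# Hypothetical samples: an edge dipping 0.5 V below ground, then a high level with some ringing.
print(level_stats([0.0, 0.2, -0.5, 0.1, 5.0, 5.3, 4.9, 0.0]))
```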

Video of Address 00:
Address 00 (A31) during Doom finish/launch/start

OSC (B30) signal - something that should be a representation of that smooth (at the crystal) 14MHz signal from my earlier post, I believe:

This next signal seems rather odd. I assume the bounce up/down from 0 in the following video is because my probe wasn't seated properly in the outside pocket of the ISA slot contact I was probing. This is IRQ 4 or IRQ 3 (I scoped both but forgot which one is in this video), i.e. the COM1 or COM2 serial port, and it shows a signal (from the 20-second mark) that is in sync with the hard drive reads you can hear in the video. Neither COM port has anything plugged into it.

Why would hard drive access manifest itself visually in an IRQ signal for COM port 1 (IRQ 4) or COM port 2 (IRQ 3)?

Video of either COM1 or COM2:
COM port monitor

EDIT: For additional testing, I swapped the I/O card for an ISA one (the original one in the system) and monitored several of the signals again. I also monitored the AT clock signal at B20. The clock signal stayed at 8.33MHz with an occasional quick dip to 4.XX MHz; I assume that is down to my alligator-clip grounding. However, I did notice the AT clock signal reflecting hard disk reads during a loading screen. I don't know what acceptable "noise" would be, but it seems like anything on the clock signal should stay within a specified tolerance and NOT let other signal activity in.
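
A single "dip to 4.XX" reading is what you would see if one clock period stretched to roughly double, or if a noisy or poorly grounded probe made the scope miss an edge. Given a list of rising-edge times (for example from an edge detector run on exported samples), the Python sketch below flags each cycle whose instantaneous frequency strays from nominal by more than a chosen tolerance. The 5 % default is an arbitrary assumption.

```python
from typing import List, Sequence, Tuple


def flag_glitches(edge_times: Sequence[float], nominal_hz: float,
                  tolerance: float = 0.05) -> List[Tuple[int, float]]:
    """Return (cycle index, instantaneous frequency in Hz) for every cycle whose
    frequency differs from nominal by more than `tolerance` (a fraction)."""
    flagged: List[Tuple[int, float]] = []
    for i in range(1, len(edge_times)):
        f = 1.0 / (edge_times[i] - edge_times[i - 1])
        if abs(f - nominal_hz) / nominal_hz > tolerance:
            flagged.append((i - 1, f))
    return flagged


# Hypothetical edges of an ~8.33 MHz clock where the third cycle is twice as long.
edges = [0.0, 120e-9, 240e-9, 480e-9, 600e-9]
for idx, f in flag_glitches(edges, nominal_hz=1 / 120e-9):
    print(f"cycle {idx}: {f / 1e6:.2f} MHz")  # cycle 2: 4.17 MHz
```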

Is it worth (a subjective word, I know) replacing all the tantalums?

stj · « **Reply #23 on:** May 03, 2017, 02:18:15 am »

scope the 5V & 12V lines on the motherboard PSU connector.
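
A simple way to judge such captures, sketched below in Python: compute each rail's mean and peak-to-peak ripple from DC-coupled samples and compare them against limits. The limits in `LIMITS` (±5 % DC, 50 mV p-p on +5 V, 120 mV p-p on +12 V) are borrowed from later ATX power supply guidance as an assumed yardstick; they are not an AT-era or AOpen specification.

```python
from statistics import mean
from typing import Dict, Sequence

# Assumed yardstick borrowed from later ATX guidance, not an AT-era spec:
# rail -> (nominal volts, allowed DC deviation as a fraction, max ripple+noise in volts p-p)
LIMITS = {
    "+5V": (5.0, 0.05, 0.050),
    "+12V": (12.0, 0.05, 0.120),
}


def check_rail(name: str, samples: Sequence[float]) -> Dict[str, object]:
    """Mean level and peak-to-peak ripple of DC-coupled samples, checked against LIMITS."""
    nominal, dc_tol, max_pp = LIMITS[name]
    avg = mean(samples)
    ripple = max(samples) - min(samples)
    return {
        "mean": avg,
        "ripple_pp": ripple,
        "dc_ok": abs(avg - nominal) <= dc_tol * nominal,
        "ripple_ok": ripple <= max_pp,
    }


# Hypothetical readings, for illustration only.
print(check_rail("+5V", [5.01, 4.99, 5.02, 5.00]))
print(check_rail("+12V", [11.2, 11.3]))
```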

Ian.M · « **Reply #24 on:** May 03, 2017, 04:15:54 am »

Solid tantalum caps don't lose capacitance with age. There's little point in replacing them unless you suspect they have been overstressed or degraded by prolonged moisture penetration at elevated temperatures, or unless one of the same type on the same rail is showing signs of running hot or has already failed catastrophically.

However, if the PSU is also vintage, recapping it selectively (all small aluminum electrolytics, plus any large ones that fail on capacitance below the lower tolerance limit, high ESR or physical signs of distress, plus any others directly in parallel with a defective one) is probably worth doing.
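
This selective-recap rule can be written as a small screening function. The Python sketch below assumes capacitance and ESR have already been measured with an LCR/ESR meter; the `Cap` fields, the per-part ESR limit and what counts as "small" are assumptions you would fill in from datasheets and your own judgement. It marks small electrolytics, any part failing on capacitance, ESR or physical distress, and anything wired directly in parallel with a failing part.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cap:
    ref: str                   # hypothetical designator, e.g. "C12"
    nominal_uf: float
    tolerance: float           # lower tolerance as a fraction, e.g. 0.20 for -20 %
    measured_uf: float
    esr_ohm: float
    esr_limit_ohm: float       # assumed to come from the part's datasheet
    small: bool                # below your own "small" cutoff (assumption)
    distressed: bool = False   # bulging, leaking or venting
    parallel_group: str = ""   # caps wired directly in parallel share a label


def caps_to_replace(caps: List[Cap]) -> List[str]:
    """Designators to replace: all small caps, defective large ones, and their parallel partners."""
    defective = {
        c.ref for c in caps
        if c.distressed
        or c.measured_uf < c.nominal_uf * (1 - c.tolerance)
        or c.esr_ohm > c.esr_limit_ohm
    }
    bad_groups = {c.parallel_group for c in caps if c.ref in defective and c.parallel_group}
    return sorted(c.ref for c in caps
                  if c.small or c.ref in defective or c.parallel_group in bad_groups)


# Hypothetical measurements, for illustration only.
caps = [
    Cap("C1", 10, 0.2, 10, 0.5, 2.0, small=True),
    Cap("C2", 2200, 0.2, 2100, 0.05, 0.1, small=False),
    Cap("C3", 2200, 0.2, 1500, 0.05, 0.1, small=False, parallel_group="A"),
    Cap("C4", 2200, 0.2, 2150, 0.05, 0.1, small=False, parallel_group="A"),
    Cap("C5", 1000, 0.2, 990, 0.3, 0.1, small=False),
    Cap("C6", 1000, 0.2, 990, 0.05, 0.1, small=False, parallel_group="B"),
]
print(caps_to_replace(caps))  # ['C1', 'C3', 'C4', 'C5']
```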

