Author Topic: Fixing graphics card voltage regulators  (Read 3526 times)

Offline jcd

  • Contributor
  • Posts: 6
  • Country: de
Fixing graphics card voltage regulators
« on: January 01, 2018, 07:52:55 pm »
Hello,

Maybe a bit of an odd first post, but I've been watching the channel for a couple of years now and I couldn't come up with a better place to ask for help.

I have an XFX Radeon R9 390 8GB which I've been using for a bit over 2 years, so it is out of warranty at this point.

The card needed dust removal about 1 year ago, and the VRM temperatures (for which it apparently has 2 sensors) seemed a bit high at 100 degrees, so I took it apart and cleaned it.
I noticed the thermal pad for the VRM mosfet heatsink had more or less disintegrated, and I didn't feel confident in it. So I removed the spacers and replaced the pad with a thick layer of thermal paste that has a very thick consistency. I checked with a straight edge whether the mosfet surfaces all line up well enough to not produce big gaps, and it looked very good to me.

The card then ran flawlessly for about 1 year with lower temperatures, until the fans developed a rattle. So 2 days ago I took it apart again and cleaned the fans, and when unplugging the card I noticed a small component lying underneath it on the mainboard. It turned out to be a VRM inductor from the GPU. No idea how long it had been lying there.

It honestly looks to me as if the pads of the VRM inductor got so hot that it actually de-soldered itself: there is no rough broken surface, but rather smooth solder that looks like it was liquid when the inductor dropped off.

The card can use up to 350 watts, and I used it extensively with overclocking. I always checked the temperature readouts, but who knows where those sensors are.

I cleaned the fans, but they're hopeless, so I ordered some replacement fans. I've replaced fans on the last 3 graphics cards I owned over the years, and sometimes even entire coolers, so I wasn't too bothered by that, which might have been a mistake.

What bothered me was the VRM inductor that fell off! Obviously the card had been working without it, so I re-assembled the card and tested it: it worked fine for a couple of hours yesterday.

This morning I started the PC, and nothing. On closer inspection it seems that the power supply's short-circuit protection is being triggered: the fans and LEDs come on for a split second, then there is a clicking noise from the PSU and it goes back to only providing the 5V standby voltage. I have to unplug the PSU and let the capacitors drain for a few seconds before it will start again.

With the graphics card unplugged, the system runs fine (CPU integrated graphics useful for once).

I'm not entirely sure where the problem is. Since I messed with the VRMs twice, putting thermal paste on the mosfets and taking the heatsinks on and off, and since an inductor plainly came off the PCB, I strongly suspect there is a problem in that area. I already cleaned off all the thermal paste and checked for damage. I found a suspicious looking solder spot under a mosfet, but with the way these SMD components are shaped there is no way to tell if it is a short. It looked like a bit of solder had squirted out on the side of the mosfet.

I was quite careful with the card, so I don't think I've done any mechanical damage. I might have done some ESD damage to it; I only have barebones ESD equipment (I ground myself and avoid having materials that produce static electricity around).

The thermal compound is non-conductive according to the manufacturer; I double checked that. But they won't even tell anyone what's inside, so who knows. I used Thermalright Chill Factor 3 both times I took the card apart and cleaned it.

I made a very crappy mobile phone picture of the missing VRM inductor (I placed it next to its pads). Luckily it is very large, so it is easy to see:


I also found this review which has higher resolution pictures of the naked PCB of the card:

https://www.vortez.net/articles_pages/xfx_r9_390x_double_dissipation_review,5.html

My questions:

1. I suspect that one mosfet got so hot that it maybe de-soldered itself and caused a short? That should be easy to check in theory, but I don't think I can actually reach the mosfet pads with a probe without de-soldering the part.

2. What kind of damage should I look for? I checked the entire PCB with a magnifying glass and couldn't spot anything obvious. The VRM mosfets and caps in particular all still look good. The other VRM inductors are mechanically solid and don't move when I try to wiggle them.

3. What can I measure with only a basic multimeter to check for issues? A short should be easy to measure, but I don't really have an idea where to start.

4. Would a missing VRM inductor on one phase cause immediate issues if the card was only used in 2D mode at low clock speeds and a fraction of its maximum power consumption?
It seems to me the card has 6 phases for power delivery to the GPU itself.
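
To put rough numbers on question 4: if the load is shared evenly, each phase carries the total core current divided by the number of working phases. The sketch below assumes a core voltage of 1.2 V and core power of 250 W under load and 20 W in 2D mode. These figures are illustrative guesses, not measurements from this card, and `per_phase_current` is a made-up helper name.

```python
def per_phase_current(core_power_w: float, core_voltage_v: float, phases: int) -> float:
    """Average output current per phase, assuming the load is shared evenly."""
    if phases <= 0:
        raise ValueError("need at least one working phase")
    total_current_a = core_power_w / core_voltage_v
    return total_current_a / phases


# Hypothetical operating points (assumptions, not measured on this card).
CORE_VOLTAGE_V = 1.2   # assumed GPU core voltage
LOAD_POWER_W = 250.0   # assumed share of board power going to the core under load
IDLE_POWER_W = 20.0    # assumed core power in 2D mode

for label, power_w in (("load", LOAD_POWER_W), ("2D idle", IDLE_POWER_W)):
    for phases in (6, 5):
        amps = per_phase_current(power_w, CORE_VOLTAGE_V, phases)
        print(f"{label:8s} {phases} phases: {amps:5.1f} A per phase")
```

Whatever the real numbers are, going from 6 working phases to 5 raises the even share per phase by 20%, which matters far more under full load than in 2D mode. It says nothing about how the controller reacts to a phase that delivers no current at all.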

I have a bit of electronics and soldering experience, and I feel confident re-soldering the VRM inductor, since it is a huge chunky component. I don't feel confident soldering these mosfets at all.

Thanks in advance for any help. This card was quite expensive and still runs the games I play nicely, so I would really like to repair it.
 

Offline Messtechniker

  • Frequent Contributor
  • **
  • Posts: 328
  • Country: de
  • Old analog audio hand - No voodoo.
Re: Fixing graphics card voltage regulators
« Reply #1 on: January 01, 2018, 08:23:30 pm »
Quote from: jcd on January 01, 2018, 07:52:55 pm
thick layer of thermal paste

That's definitely a mistake and a likely fault cause. You should apply only a
thin layer of thermal paste to smooth out the minute roughness of the heat
conducting surfaces so that you get a good metal-to-metal contact where possible.
The thermal paste is used to replace any small air gaps (air is a good thermal insulator)
with a material that has better thermal conduction properties. Bear in mind that the
thermal properties of thermal paste are not as good as a metal-to-metal contact.
So apply thermal paste sparingly.
The German proverb "Viel hilft viel" (roughly "more is better") does not apply here.
Agilent 34465A, Hameg HMO1022, R&S HMC 8043, Voltcraft VC 940 and M-Audio Audiophile 192
 

Offline jcd

  • Contributor
  • Posts: 6
  • Country: de
Re: Fixing graphics card voltage regulators
« Reply #2 on: January 01, 2018, 08:34:43 pm »
Well, I might have phrased that in a misleading way.

I've been applying thermal paste for about 15 years now, and I am fully aware that the layer should be as thin as possible.

But the heatsink on the VRM mosfets is a quite rough piece of aluminium, the surface that touches the (multiple) mosfets is only an extruded aluminium profile surface, not machined flat. Also, the mosfets span maybe 8cm on the PCB and they aren't perfectly flat like a silicon die. There is a good reason why the manufacturer of the card used a thermal pad, I guess.

That is why I was a bit more liberal with the thermal compound. The vast majority of it got squeezed out and ended up in between all the mosfets. I mention this because I fear it might have caused issues: there is still a possibility that the thermal compound is conductive enough to cause problems, despite the manufacturer saying otherwise.

I don't think the coat of thermal compound between the mosfets and the heatsink was too thick. The heatsink is screwed down with two decently beefy screws and the aluminium part has threaded steel inserts, so the mounting force is considerable and squeezed out all of the excess thermal compound.
« Last Edit: January 01, 2018, 08:37:57 pm by jcd »
 

Online mariush

  • Super Contributor
  • ***
  • Posts: 3603
  • Country: ro
  • .
Re: Fixing graphics card voltage regulators
« Reply #3 on: January 01, 2018, 08:45:03 pm »
You were probably supposed to use a thermal pad, maybe 0.5 mm to 1 mm thick, between the mosfets and the heatsink, not thermal paste.
As mentioned above, thermal paste is supposed to be applied in a very thin layer to fill microscopic voids between the heatsink and the IC packages, not in thick layers.

Either way, the mosfets themselves are probably rated for 150°C, so the circuit probably has a 100°C to 120°C threshold for the VRM temperature. The heatsinks on the VRM help a bit, but they're not really "critical": even without them, the fans above blowing through the main heatsink fins would have added enough cooling to keep the VRM within reasonable temperatures.

The GPU chip itself in that generation (R9 390) is probably capable of 110°C or something like that, but it should be programmed to throttle down at around 85-90°C.

The temperature would have to go above ~220°C for the inductor to fall off due to the solder melting, and the mosfets would shut down before that happened. My guess is that it was a factory flaw: too little solder paste applied in the first place.
 

Offline jcd

  • Contributor
  • Posts: 6
  • Country: de
Re: Fixing graphics card voltage regulators
« Reply #4 on: January 01, 2018, 09:00:30 pm »
There are tools these days (GPU-Z) which allow detailed monitoring. I do not know how accurate the temperature readouts are, but since there are tools for interacting with the voltage controls in depth these days, even in the driver, I guess those are readouts of internal sensors.

I've always kept an eye on the temperatures, especially before and after cleaning and applying thermal compound.

I noticed a slight drop (10 degrees) in VRM temperatures after replacing the pad with compound. If the sensor doesn't directly measure the mosfet temperature, but rather the heatsink temperature, then could a lower readout actually mean higher fet temperatures?
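
The worry about a heatsink-mounted sensor can be illustrated with a lumped thermal-resistance model. The sketch below is only a toy model under stated assumptions: heat leaves the fets along two parallel paths (through the interface and heatsink, and through the PCB), and every number in it (30 W dissipated, 40 °C ambient, the K/W values) is hypothetical rather than taken from this card.

```python
def fet_and_sink_temps(power_w, ambient_c, r_interface, r_sink_air, r_pcb_air):
    """Steady-state fet and heatsink temperatures for two parallel heat paths.

    Path 1: fet -> interface (paste or pad) -> heatsink -> air
    Path 2: fet -> PCB -> air
    All thermal resistances are in K/W.
    """
    r_sink_path = r_interface + r_sink_air
    r_total = (r_sink_path * r_pcb_air) / (r_sink_path + r_pcb_air)
    t_fet = ambient_c + power_w * r_total
    # Share of the heat that actually flows through the heatsink.
    q_sink_w = (t_fet - ambient_c) / r_sink_path
    t_sink = ambient_c + q_sink_w * r_sink_air
    return t_fet, t_sink


# Hypothetical values: same power and heatsink, good vs poor interface.
for label, r_if in (("good interface", 0.2), ("poor interface", 2.0)):
    t_fet, t_sink = fet_and_sink_temps(30.0, 40.0, r_if, 2.0, 4.0)
    print(f"{label}: fet {t_fet:.1f} C, heatsink {t_sink:.1f} C")
```

In this toy model a worse interface makes the heatsink read cooler while the fets run hotter, so a readout taken on the heatsink could indeed mislead. If the sensor sits on the PCB next to the fets instead, a lower readout is more likely to be a real improvement.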

Any idea what could explain the (probable) short circuit that appeared after running for a few hours?

EDIT:

I also wonder: do you actually think the controller that controls the VRMs reliably checks the temperature itself and shuts the card off if anything gets too hot? The GPU throttles itself for sure, but I am not certain about the VRMs.

Also, I checked the components. The mosfets are IRF6894 WRNU 1515 and IRF6811 SLLS 1438. Looking at the datasheets and the package, there is no way I got thermal paste on any spot that could have caused a short circuit. As I assumed, the pads are hidden deep under the metal cap of the chips.

I'm a bit lost as to why the PC won't even turn on with the card plugged in. Could that be the missing inductor? Can I safely just solder it back on?
« Last Edit: January 01, 2018, 09:19:23 pm by jcd »
 

Offline Armadillo

  • Super Contributor
  • ***
  • Posts: 1725
  • Country: 00
Re: Fixing graphics card voltage regulators
« Reply #5 on: January 01, 2018, 09:42:11 pm »
Most VRMs have built-in thermal protection and over-current protection, but the controller does not monitor the temperature of these VRMs on board. Despite the internal protections, we do know that VRMs fail, depending on the power transients they are subjected to.

Edit: Some VRMs do send a signal to the graphics controller to reduce power when the VRM heats up. In that case the controller does not monitor the VRM; the VRM requests action from the controller.

The card should be protected by fuses. If a mosfet were to fail short circuit, the fuse should be the first thing to blow and should not allow the mosfet to heat up so profusely. Could the cause be mechanical force, shock or vibration rather than electrical?

I suppose if the fuses haven't blown, you could try soldering it back on and give it a go.

Edit: Check all fuses to make sure.

 ;D
« Last Edit: January 01, 2018, 09:50:34 pm by Armadillo »
 

Online mariush

  • Super Contributor
  • ***
  • Posts: 3603
  • Country: ro
  • .
Re: Fixing graphics card voltage regulators
« Reply #6 on: January 01, 2018, 10:28:39 pm »
The VRM controller may be monitoring the current drawn by each phase or the output voltage of each phase, and when it detects no current draw or no voltage on the phase that is missing its inductor, it may shut down the whole circuit for protection.
IMHO you should resolder the inductor before trying things again.
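
As a rough illustration of that kind of per-phase check (not the logic of any particular VRM controller), a controller could compare each phase's current against an even share of the total and treat a phase far below that share as faulty. The function names, the 20% threshold and the sample currents below are all hypothetical.

```python
from typing import List, Sequence


def find_dead_phases(phase_currents_a: Sequence[float], min_share: float = 0.2) -> List[int]:
    """Return indices of phases carrying far less than an even share of the load."""
    total_a = sum(phase_currents_a)
    if total_a <= 0:
        return []  # no load at all, nothing to compare against
    even_share_a = total_a / len(phase_currents_a)
    return [i for i, amps in enumerate(phase_currents_a) if amps < min_share * even_share_a]


def should_shut_down(phase_currents_a: Sequence[float]) -> bool:
    """Hypothetical policy: shut the whole VRM down if any phase looks dead."""
    return bool(find_dead_phases(phase_currents_a))


# Made-up readings in amps; index 2 stands for the phase with the missing inductor.
readings = [30.0, 31.0, 0.0, 29.0, 30.0, 30.0]
print(find_dead_phases(readings))
print(should_shut_down(readings))
```

Whether the controller on this card does anything like this is exactly the open question; the sketch only shows that such a check is cheap to implement.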

You should investigate why the inductor fell off. Look at the solder joints on the circuit board: does it look like cracked solder (rough edges), or does it look like the solder melted? If the solder melted due to too much heat, maybe check whether the inductor got so hot that it melted the solder at its terminals (extremely rare, but the coil inside an inductor could be shorted and produce heat internally this way), or maybe the mosfet closest to the inductor is shorted or damaged in some way and that caused the inductor or the traces from mosfet to inductor to overheat.

Re. how temperature is measured: most likely it's a surface-mount, resistor-like temperature sensor near one of the mosfets, not a temperature sensor on the heatsinks. Some "power stages" have internal temperature monitors and can trigger interrupts or send feedback to the VRM controller through some pin. If you place an SMD sensor close enough to a mosfet on the circuit board and account for the 10-20°C difference in temperature, you can make a good enough "over temperature" protection mechanism.
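
A minimal sketch of such an offset-based check, assuming a 15°C sensor-to-fet offset (the middle of the 10-20°C range above) and a 120°C trip point (the upper end of the threshold guessed earlier in the thread). Both defaults and both function names are illustrative, not values read from this card.

```python
def estimated_fet_temp_c(sensor_c: float, offset_c: float = 15.0) -> float:
    """Estimate the fet temperature from a nearby PCB sensor reading."""
    return sensor_c + offset_c


def over_temperature(sensor_c: float, limit_c: float = 120.0, offset_c: float = 15.0) -> bool:
    """True if the estimated fet temperature has reached the trip point."""
    return estimated_fet_temp_c(sensor_c, offset_c) >= limit_c


for reading_c in (90.0, 100.0, 110.0):
    print(reading_c, estimated_fet_temp_c(reading_c), over_temperature(reading_c))
```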

And yeah, look at the PCIe slot pinout and make sure there are no blown fuses on the 12V power coming from the PCIe slot or on the 12V coming from the 6-pin / 6+2-pin PCIe power connectors. On some video cards with 4 phases or more, the VRM circuit feeds a bunch of phases from the PCIe slot and a bunch from the extra power connectors. Make sure the VRM gets "juice" from both sources.
If a mosfet died shorted, it could have tripped a fuse for the extra power connector while the card could still get some power from the slot.
 

Offline Messtechniker

  • Frequent Contributor
  • **
  • Posts: 328
  • Country: de
  • Old analog audio hand - No voodoo.
Re: Fixing graphics card voltage regulators
« Reply #7 on: January 01, 2018, 10:53:02 pm »
Quote from: jcd on January 01, 2018, 08:34:43 pm
But the heatsink on the VRM mosfets is a quite rough piece of aluminium, the surface that touches the (multiple) mosfets is only an extruded aluminium profile surface, not machined flat.

So the fault cause would in this case be bad design specs or no specs at all regarding flatness.
Then nobody in Purchasing, Production and QA cared. Why should they, if there are no specs?
Trying to compensate for roughness with thermal paste is bad design, especially in view of the
elevated operating temperatures here. The first thing to do is to replace the heat sink with a better,
machined one. Or have the existing heat sink machined flat.
GPU-Z will tell the tale.
« Last Edit: January 01, 2018, 10:55:06 pm by Messtechniker »
 

Offline jcd

  • Contributor
  • Posts: 6
  • Country: de
Re: Fixing graphics card voltage regulators
« Reply #8 on: January 01, 2018, 11:09:01 pm »
First, I forgot to thank you guys for all the replies!

Quote from: mariush on January 01, 2018, 10:28:39 pm
The VRM controller may be monitoring the current drawn by each phase or the output voltage of each phase, and when it detects no current draw or no voltage on the phase that is missing its inductor, it may shut down the whole circuit for protection.

How would it do that? If it just shuts down power to the GPU, then the PC should boot fine but without recognizing the GPU, I guess?

Quote from: mariush on January 01, 2018, 10:28:39 pm
IMHO you should resolder the inductor before trying things again.

I'll do that.

Quote from: mariush on January 01, 2018, 10:28:39 pm
You should investigate why the inductor fell off. Look at the solder joints on the circuit board: does it look like cracked solder (rough edges), or does it look like the solder melted? If the solder melted due to too much heat...

It is hard to tell. These lead-free joints all look like crap compared to the leaded solder joints I'm used to making myself. Part of the pad looks rough, but part of the solder looks like it was hot when the inductor fell off, resulting in a little spike.

Quote from: mariush on January 01, 2018, 10:28:39 pm
maybe check whether the inductor got so hot that it melted the solder at its terminals (extremely rare, but the coil inside an inductor could be shorted and produce heat internally this way), or maybe the mosfet closest to the inductor is shorted or damaged in some way and that caused the inductor or the traces from mosfet to inductor to overheat.

How can I check if the inductor is still good? Visually it looks fine from the outside.

Quote from: mariush on January 01, 2018, 10:28:39 pm
Re. how temperature is measured: most likely it's a surface-mount, resistor-like temperature sensor near one of the mosfets, not a temperature sensor on the heatsinks. Some "power stages" have internal temperature monitors and can trigger interrupts or send feedback to the VRM controller through some pin. If you place an SMD sensor close enough to a mosfet on the circuit board and account for the 10-20°C difference in temperature, you can make a good enough "over temperature" protection mechanism.

If that is the case, then the VRM probably did not overheat. I checked the GPU-Z values for the card quite often and I would have noticed that.

Quote from: mariush on January 01, 2018, 10:28:39 pm
And yeah, look at the PCIe slot pinout and make sure there are no blown fuses on the 12V power coming from the PCIe slot or on the 12V coming from the 6-pin / 6+2-pin PCIe power connectors. On some video cards with 4 phases or more, the VRM circuit feeds a bunch of phases from the PCIe slot and a bunch from the extra power connectors. Make sure the VRM gets "juice" from both sources.
If a mosfet died shorted, it could have tripped a fuse for the extra power connector while the card could still get some power from the slot.

What do these fuses look like? Where are they usually located?

Quote from: Messtechniker on January 01, 2018, 10:53:02 pm
So the fault cause would in this case be bad design specs or no specs at all regarding flatness.
Then nobody in Purchasing, Production and QA cared. Why should they, if there are no specs?
Trying to compensate for roughness with thermal paste is bad design, especially in view of the
elevated operating temperatures here. The first thing to do is to replace the heat sink with a better,
machined one. Or have the existing heat sink machined flat.
GPU-Z will tell the tale.

I don't think that heatsink is actually a problem. Sure, it is not a milled, ground and polished part, but it probably doesn't need to be, and they used a thermal pad to begin with. I'm just being quite pedantic about it: compared to the milled copper surface of the GPU heatsink it is a bit lower in quality, and as someone familiar with metalworking it caught my eye.

I would assume there might be gaps of a few 0.01 mm between the mosfets and the heatsink at worst, which is why I was more generous with the thermal compound. As I said, I checked the fets and the heatsink with a straight edge when I removed the pad and the spacers and mounted the heatsink directly with compound.

I would imagine that in an industrial PCB manufacturing process you do not want a worker to check each card for perfectly flat soldered FETs before attaching a heatsink, hence a thermal pad to even out manufacturing tolerances (like they do on the memory chips as well).
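
For a sense of scale in the pad-versus-paste argument, the conduction resistance of a flat layer is its thickness divided by (conductivity times area). The sketch below compares a 1 mm pad with a 0.03 mm paste film; the 3 W/(m·K) conductivity used for both materials and the 5 mm x 6 mm contact area per fet are assumptions chosen for illustration, not datasheet values.

```python
def layer_thermal_resistance(thickness_m: float, conductivity_w_mk: float, area_m2: float) -> float:
    """Conduction resistance of a flat layer in K/W: R = t / (k * A)."""
    return thickness_m / (conductivity_w_mk * area_m2)


# Hypothetical values: same conductivity for pad and paste, one fet-sized contact patch.
CONDUCTIVITY_W_MK = 3.0        # assumed for both materials
CONTACT_AREA_M2 = 5e-3 * 6e-3  # assumed 5 mm x 6 mm contact area

pad_r = layer_thermal_resistance(1.0e-3, CONDUCTIVITY_W_MK, CONTACT_AREA_M2)
paste_r = layer_thermal_resistance(0.03e-3, CONDUCTIVITY_W_MK, CONTACT_AREA_M2)
print(f"1 mm pad:       {pad_r:.2f} K/W per fet")
print(f"0.03 mm paste:  {paste_r:.2f} K/W per fet")
```

With these assumed numbers the pad comes out at roughly 11 K/W and the paste film at roughly 0.3 K/W per fet. That only holds if the gap really is that thin everywhere; the pad's job is to guarantee contact when it isn't.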
 

Offline Armadillo

  • Super Contributor
  • ***
  • Posts: 1725
  • Country: 00
Re: Fixing graphics card voltage regulators
« Reply #9 on: January 01, 2018, 11:38:27 pm »

Quote from: jcd on January 01, 2018, 11:09:01 pm
What do these fuses look like? Where are they usually located?

They are normally SMD parts with a letter printed on them, usually green or brown in color.



 

Offline jcd

  • Contributor
  • Posts: 6
  • Country: de
Re: Fixing graphics card voltage regulators
« Reply #10 on: January 02, 2018, 12:26:10 am »
Thanks!
 

Offline Bashstreet

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
Re: Fixing graphics card voltage regulators
« Reply #11 on: January 02, 2018, 05:56:31 am »
Quote from: jcd on January 01, 2018, 07:52:55 pm
The card required dust removal about 1 year ago, also the VRM temperatures (for which it has 2 sensors apparently) seemed a bit high at 100 degrees, so i took it apart and cleaned it.
I noticed the thermal pad for the VRM mosfet heatsink kind of disintegrated, and i didn't feel confident in it. So i removed the spacers and replaced it with a thick layer of thermal paste that has a very thick consistency. I checked with a straight edge if the mosfet surfaces all line up well enough to not produce big gaps, and it seemed very good to me.

Just no! It is not "a bit high", it is WAY TOO HIGH!

It is never sensible to overclock a card that does not have adequate cooling.
Running a GPU from any manufacturer at 100 degrees is a sure way to fry the chip, or to kill it within a year or two if you are lucky.
You never want your GPU to run hotter than 80 degrees in any situation; most Nvidia cards start to throttle at 82, and ATI cards a bit higher depending on the model.
Most GPUs are already well optimized, and getting a couple of percent more out of the card is not worth the damage you will do.

VRM heatsinks are basic on most cards and rely on a simple aluminium heatsink and silicone pads to transfer the heat. The pads slowly disintegrate, especially if you go and remove them.
The best option is to get a new pad. Yes, you can try thermal grease, but the problem, as you noticed, is the unevenness of the heatsink and the uneven pressure it applies.

I do not think the thermal paste is the problem at all (unless you went totally bonkers with it). The problem was the high temperatures caused by excessive current draw.

If you can repair it, remove all overclocking and make sure temperatures are acceptable UNDER LOAD.



 

Offline jcd

  • Contributor
  • Posts: 6
  • Country: de
Re: Fixing graphics card voltage regulators
« Reply #12 on: January 02, 2018, 12:37:54 pm »
The maximum temperature the VRM mosfets are specified for is 150 degrees. The GPU itself will throttle at 90 degrees and clock down aggressively. I was specifically talking about VRM temperatures.

Meanwhile I found some hardcore overclocker guys who know these VRM circuits very well, and after some measurements it seems the high-side mosfet of the lowest phase, the one where the inductor dropped off, is shorted to ground.

I'll try removing that mosfet, since the card should work with only 5 phases. If it does, I'll order a spare part. There is a local electronics store that offers repairs; I'll check with them whether they would solder the new part on for me professionally.

Also, I might have shorted the drain of all the VRM mosfets by replacing the thermal pad with compound, since the cap isn't just ground, it is the drain of the fet  :(

Measurements for the other phases look OK though, so they seem intact.
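
The comparison across phases can be written down as a simple check on multimeter readings. The sketch below flags any phase whose high-side fet reads (almost) zero ohms while its neighbours do not. The readings, the 1 ohm threshold and the function name are all hypothetical; real values depend on the board and the meter, and the GPU core rail itself can legitimately read very low to ground.

```python
from typing import Dict, List


def flag_suspect_phases(readings_ohm: Dict[str, float], short_below_ohm: float = 1.0) -> List[str]:
    """Return the names of phases whose resistance reading looks like a dead short."""
    return sorted(name for name, ohms in readings_ohm.items() if ohms < short_below_ohm)


# Made-up resistance readings across each high-side fet, card unpowered.
readings = {
    "phase 1": 0.2,      # the phase that lost its inductor in this example
    "phase 2": 4500.0,
    "phase 3": 4700.0,
    "phase 4": 4600.0,
    "phase 5": 4550.0,
    "phase 6": 4650.0,
}
print(flag_suspect_phases(readings))
```

The useful part is the comparison: one phase that reads orders of magnitude lower than five otherwise identical phases is a strong hint, whatever the absolute numbers are.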
 

Offline Bashstreet

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
Re: Fixing graphics card voltage regulators
« Reply #13 on: January 04, 2018, 05:36:01 am »
Running mosfets at 150 degrees, or even at 100 degrees, is a sure way to reduce their lifespan (which is what happened to you).
I do not know where you got the idea that it is "safe" to run mosfets at these temperatures, but you might want to think again.
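
One way to put a number on this is the common rule of thumb, loosely derived from the Arrhenius equation, that expected component life roughly halves for every 10 degrees of extra operating temperature. It is only a heuristic and not a datasheet figure for these mosfets; the 80 degree reference point and the function name below are arbitrary choices for illustration.

```python
def relative_lifetime(temp_c: float, reference_c: float = 80.0, halving_step_c: float = 10.0) -> float:
    """Expected life relative to the reference temperature (rule-of-thumb estimate)."""
    return 2.0 ** ((reference_c - temp_c) / halving_step_c)


for temp_c in (80.0, 90.0, 100.0):
    print(f"{temp_c:.0f} C: {relative_lifetime(temp_c):.2f}x the life at 80 C")
```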
« Last Edit: January 04, 2018, 05:38:36 am by Bashstreet »
 

