Author Topic: NVidia GTX Titan Black: Anything more to check before I consider baking my card?  (Read 2124 times)

Offline brian27

  • Contributor
  • Posts: 6
  • Country: nl
Hi all, greetings to all of you.

I am new here. I have some troubleshooting skills and I work in electrical engineering for a living.

I have ended up with a dead NVidia GeForce GTX Titan Black (reference card, made by Asus, out of warranty of course; the card is almost 5 years old).
The card stopped working without any prior warning sign.
I never overclocked the card. The last thing I did with it was measure the worst-case system temperature using OCCT, stressing both CPU and GPU.
I put my PC to sleep during the stress test. When I resumed, Windows was frozen. I restarted, and the card was no longer detected. I was running two cards in SLI and now only one of them is detected.

I tried to detect the card using NVFlash in both Windows and DOS; however, it reports that no NVidia adapter was found.

My quick PC spec:
Intel Xeon E5-2680v2
Asus P9X79 Pro
2x GeForce GTX Titan Black
32GB DDR3 ECC
Silverstone Strider 1500W Silver SST1500

The PCB of my dead GPU is reference NVidia P2083.

I followed the debugging guide on this thread:
https://www.eevblog.com/forum/repair/dead-graphics-card-780ti-dead-mosfet-perhaps/
and found no shorted MOSFETs.

Then I powered the card on and measured the output voltages. Surprisingly, the GPU stays cool at about 35C with an ambient temperature of around 20C, measured with an infrared thermometer.
Core voltage, measured on all the core inductors: 0.85V
RAM voltage, measured on the RAM inductors: 1.52V
All shunts see the 11.85V supply.
I measured the supply of the BIOS ROM and it gets a clean 3.35V.

See the attachment for the probing points and the voltage values.

My question is: is there any chance that the BIOS got corrupted and that this is why the card is not detected by the system? I have never re-flashed the BIOS or tampered with it.
If everything else is correct, should I consider baking the card or reheating it with a heat gun? I also found a replacement chip on eBay at a reasonable price. What's your suggestion on replacing the GPU chip?
 

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3058
  • Country: fr
If you want to destroy the card for good, do "bake" it.

Search this forum on what the "baking" does and doesn't do. If the card has died because of cracked solder balls or an underfill problem, "baking" it will "fix" it for only a short time at best. The only way to really repair that is replacing the chip but see below ...

If the problem is something else - e.g. a dead power regulator or some bad capacitors then the "baking" won't do anything and you can only fry the card for good. What could have been a few bucks repair will now be an unrepairable mess and garbage bin material.

TL;DR - "baking" anything except food is a stupid idea. It won't really fix anything and only can make things worse.

Also, these two videos explain well why this "reflowing" and "baking" is complete BS, despite geniuses like iFixit or Linus from Linus Tech Tips claiming otherwise.

 (warning, LOUD and a lot of swearing)



Re GPU being cool - that's quite understandable because the GPU is not really doing anything, since the card isn't running. If it wasn't cool that would be a problem indicating a possible short (and fried chip).

Re replacing the GPU chip - not realistically doable. You can't replace a BGA of that size with a heat gun. And having it done professionally using a BGA rework machine would cost more than any value that card may still have. Plus you have no guarantee that the chip from eBay is not counterfeit or broken, so you could end up with large labor bill and a still broken card.

Re BIOS - could happen. If you have a second working card, you could try to extract the BIOS from that one and transplant it into the broken one. If it is just that it may fix it. Firmware corruption is not a very common issue but it could happen.

Also, if you have a second working card, compare voltages between the two as well. The voltages you have measured look reasonable but who knows how that particular card was set up and the voltages could be higher/lower depending on how the manufacturer configured it (and some are likely software configurable too). 
« Last Edit: October 25, 2018, 10:38:11 pm by janoc »
 

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 1950
  • Country: ca
If you want to destroy the card for good, do "bake" it.

Search this forum on what the "baking" does and doesn't do. If the card has died because of cracked solder balls or an underfill problem, "baking" it will "fix" it for only a short time at best. The only way to really repair that is replacing the chip but see below ...
..
Also, these two videos explain well why this "reflowing" and "baking" is complete BS, despite geniuses like iFixit or Linus from Linus Tech Tips claiming otherwise.

So either it is complete BS, or it can work for some time. It's one or the other, not both.
If all other troubleshooting options have been exhausted (eeprom, etc.), and replacing the chip means it is BER, then baking is a completely legitimate option to try.

Good advice though.
« Last Edit: October 25, 2018, 11:52:27 pm by thm_w »
 

Offline cdev

  • Super Contributor
  • ***
  • Posts: 5082
  • Country: 00
"What the large print giveth, the small print taketh away."
 

Offline Rasz

  • Super Contributor
  • ***
  • Posts: 2339
  • Country: 00
    • My random blog.
check Q6 and that small IC next to it (afair it's a 74 series logic gate array)

you have 2 same cards? great, you can compare signals on all small transistors (q7 8 9 10 etc)
Who logs in to gdm? Not I, said the duck.
My fireplace is on fire, but in all the wrong places.
 

Offline blueskull

  • Supporter
  • ****
  • Posts: 13275
  • Country: cn
  • Power Electronics Guy
The only legit reason to bake a card is either you want to use it for something urgent, or you are trying to sell it (which is f*ing immoral, but people do this from time to time).
If you want to actually fix it, don't bake it. It's already gone.
 

Offline brian27

  • Contributor
  • Posts: 6
  • Country: nl
Search this forum on what the "baking" does and doesn't do. If the card has died because of cracked solder balls or an underfill problem, "baking" it will "fix" it for only a short time at best. The only way to really repair that is replacing the chip but see below ...

Re BIOS - could happen. If you have a second working card, you could try to extract the BIOS from that one and transplant it into the broken one. If it is just that it may fix it. Firmware corruption is not a very common issue but it could happen.

Also, if you have a second working card, compare voltages between the two as well. The voltages you have measured look reasonable but who knows how that particular card was set up and the voltages could be higher/lower depending on how the manufacturer configured it (and some are likely software configurable too). 

Indeed. I am well aware of that and not in favor of baking my card, which is why I am looking for measurement points or other things to check before baking.

The problem is that the BIOS (and the card itself) is not detected by NVFlash in either Windows or DOS, so I am considering buying a USB BIOS programmer with a clip to program the chip standalone. Also, assuming the VRMs are working and the card is still not detected by NVFlash, does that hint at something being wrong with the GPU or the BIOS?

If you want to destroy the card for good, do "bake" it.

Search this forum on what the "baking" does and doesn't do. If the card has died because of cracked solder balls or an underfill problem, "baking" it will "fix" it for only a short time at best. The only way to really repair that is replacing the chip but see below ...
..
Also, these two videos explain well why this "reflowing" and "baking" is complete BS, despite geniuses like iFixit or Linus from Linus Tech Tips claiming otherwise.

So either it is complete BS, or it can work for some time. It's one or the other, not both.
If all other troubleshooting options have been exhausted (eeprom, etc.), and replacing the chip means it is BER, then baking is a completely legitimate option to try.

Good advice though.

What do you mean by BER?

Is this the card?

https://www.videocardbenchmark.net/gpu.php?gpu=GeForce+GTX+TITAN+Black&id=2842

Yes. This to be exact:
https://www.techpowerup.com/gpu-specs/asus-gtx-titan-black.b2759

check Q6 and that small IC next to it (afair it's a 74 series logic gate array)

you have 2 same cards? great, you can compare signals on all small transistors (q7 8 9 10 etc)

Many thanks for your suggestion. Tonight I will try to measure the voltage on each pin of Q6 - Q10 and of the 74-series logic IC next to Q6.
However, I have some questions for my own understanding: what do those transistors do? Are they FETs or BJTs? Do you know which type they are?

Is the 74 IC also part of the control logic of these transistors?
 

Offline Bud

  • Super Contributor
  • ***
  • Posts: 3979
  • Country: ca
I baked my card once; it failed again when running a game. I baked it a second time and now I do not run games on it, but use it as a secondary video card for general computing, as a secondary display for Altium and other CAD programs, and to watch video on the TV connected to its HDMI port. It has been doing well ever since in this mode.

Edit: I also used it for mining cryptocurrency after the second bake, it handled the load just fine.
« Last Edit: October 26, 2018, 01:29:38 pm by Bud »
Facebook-free life and Rigol-free shack.
 

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3058
  • Country: fr

So either it is complete BS, or it can work for some time. It's one or the other, not both.
If all other troubleshooting options have been exhausted (eeprom, etc.), and replacing the chip means it is BER, then baking is a completely legitimate option to try.

Good advice though.

It is bullshit, because a fix is not something that "works for some time" (days, a few weeks tops).

Baking is never a legitimate option to try. All that you will achieve if you heat the board below the melting temperature of the solder (that's what is usually understood as "baking", as opposed to "reflowing") is that the underfill in the GPU melts.

If you get lucky, the messed-up bumps on the flip chip BGA will realign and the board will work again for a short time, because the underfill will hold them in place once it solidifies. That will work until the thermal stresses make it fail again: the underfill alone is not meant to ensure contact, and by heating the board so little that the solder didn't even start melting you have done nothing about the cracked solder bumps (not the solder balls under the BGA, but the solder bumps that attach the flip chip to the carrier!). These bumps tend to crack because of the thermal stresses the card is under during its life, which, together with some types of underfill, causes the problem. The only way to fix this is to replace the chip.

This is what I mean when talking about the "bumps":


 
Another thing that could happen is that you heat a failing capacitor and it will start working - until it cools down and fails again. Again, you haven't really fixed anything.

Of course, while doing all this you could easily damage even an originally good GPU by prolonged exposure to heat (it is not designed to be kept this hot for that long!), so when the original problem is eventually found and fixed you will discover that the GPU is now messed up and you have artifacts on the screen or some other problem.


If you actually heat the board so hard that the solder reflows, you will likely end up with parts falling off the board/knocked out of alignment - good luck fixing that.


This is why "baking" anything is BS and not a fix. Of course, some people will do this and then sell a (temporarily) "working" card on eBay or Craigslist. Which is fraud, IMO, but difficult to prove unless it is obvious the board was "cooked". Or record a Youtube video showing how they baked their card and it works - but not that it has died on them again several days later ...
« Last Edit: October 26, 2018, 07:48:41 pm by janoc »
 

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 1950
  • Country: ca
What do you mean by BER?

Beyond economical repair - i.e. it's cheaper to buy a new card than to fix this one.

It is bullshit, because a fix is not something that "works for some time" (days, a few weeks tops).

Baking is never a legitimate option to try. All that you will achieve if you heat the board below the melting temperature of the solder (that's what is usually understood as "baking", as opposed to "reflowing") is that the underfill in the GPU melts.

If you get lucky, the messed-up bumps on the flip chip BGA will realign and the board will work again for a short time, because the underfill will hold them in place once it solidifies. That will work until the thermal stresses make it fail again: the underfill alone is not meant to ensure contact, and by heating the board so little that the solder didn't even start melting you have done nothing about the cracked solder bumps (not the solder balls under the BGA, but the solder bumps that attach the flip chip to the carrier!). These bumps tend to crack because of the thermal stresses the card is under during its life, which, together with some types of underfill, causes the problem. The only way to fix this is to replace the chip.

You say a few weeks tops, and yet there are hundreds if not thousands of people who baked their Xboxes. Surely not all of them failed after a few weeks?
Maybe there is some misunderstanding on my part here, because OP said "bake card or reheat with a heat gun", which to me implied reflowing the chip.
 

Offline cdev

  • Super Contributor
  • ***
  • Posts: 5082
  • Country: 00
Rather than using the blunt instrument of heating the whole thing, and quite possibly destroying it, what I would do is try to find some documentation on the card first, then on similar cards generally, and then troubleshoot. As people here will tell you, when a piece of electronic equipment is broken, the chances of being able to repair it are often pretty good.

If you can't find documentation, I would apply general rules to the situation.

Trace the power, voltage regulation circuitry, etc, try to observe what it does do when power is applied, and see if there is anything that seems obviously wrong.

If you have logs, look at them to see if the hardware is recognized at all at some low level.  All working video cards will be recognized as a VESA card and supply a very basic functionality without any 3D acceleration. When the computer is booted, they will enumerate themselves in the logs.

If it's completely dead, that may indicate something simple is wrong. Part of the power circuitry, for example.

And so on.
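
For the enumeration check, here is a minimal sketch, assuming the machine can be booted from a Linux live USB (that is an assumption; the thread only mentions Windows and DOS). It walks the kernel's sysfs PCI tree and lists every PCI function whose vendor ID is 0x10de, which is NVIDIA's PCI vendor ID. The function names and the `sysfs_root` parameter are made up for illustration. A working card typically shows up as one or more functions (for example the GPU and its HDMI audio function), so with two cards fitted and only one set of entries listed, the dead card is not answering on the PCIe bus at all.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA


def _read_attr(dev_dir, name):
    """Read one sysfs attribute file; return None if it is missing or unreadable."""
    try:
        return (dev_dir / name).read_text().strip().lower()
    except OSError:
        return None


def find_pci_devices(vendor_id, sysfs_root="/sys/bus/pci/devices"):
    """Return (address, device_id, class_code) for every PCI function of one vendor."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not Linux, or sysfs is not mounted
    found = []
    for dev_dir in sorted(root.iterdir()):
        if _read_attr(dev_dir, "vendor") == vendor_id.lower():
            found.append((dev_dir.name,
                          _read_attr(dev_dir, "device"),
                          _read_attr(dev_dir, "class")))
    return found


if __name__ == "__main__":
    # One line per NVIDIA function the kernel managed to enumerate.
    for address, device_id, class_code in find_pci_devices(NVIDIA_VENDOR_ID):
        print(address, device_id, class_code)
```

Running `lspci -d 10de:` from a shell on the same live system should give the same information in a friendlier form.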
"What the large print giveth, the small print taketh away."
 

Offline Refrigerator

  • Frequent Contributor
  • **
  • Posts: 859
  • Country: lt
I bake my cards, laptops, and even phones from time to time and if you do it right there's not much harm you can do.
By the looks of it your card just crapped out from the stress test, happens all the time.
If there are no shorted mosfets then nothing else but the GPU chip itself died.
I'd say bake it, but before (if) you do check for any knocked off components, just in case you knocked one off while handling the card around.
If you decide to stick it in your oven i can tell you how i do it, just a simple trick to get the temps right.
BTW, i only bake my own stuff for myself, or maybe if my friend asks me to because it's a bit of a hit or miss in terms of reliability.
For example i baked a SONY Vaio laptop and used it for two years for games and it's still working fine, baked HD7850 1GD5 and 2GD5 model and both work fine, baked my LG G3 about six times but that one would only come back to life for two months after the bake, there were also some laptops i baked that would come back to life but only for a short period of time, sometimes less than a week.
Basically it's a bit of gamble but as a last resort it works pretty well.
PS: don't heatgun your GPU, that's how i killed my G3, probably would have lasted a few more bakes in the oven had i not cooked it  >:D >:D
Just started a blog at http://brimmingideas.blogspot.com/ . Not much in it as of now but more is sure to come :)
 

Offline brian27

  • Contributor
  • Posts: 6
  • Country: nl
If you have logs, look at them to see if the hardware is recognized at all at some low level.  All working video cards will be recognized as a VESA card and supply a very basic functionality without any 3D acceleration. When the computer is booted, they will enumerate themselves in the logs.

If it's completely dead, that may indicate something simple is wrong. Part of the power circuitry, for example.

Fully agree. If I had seen artifacts on screen in the first place, or 3D acceleration had caused instability, or I could not install the driver in Windows, I would be more inclined to bake it.
However, this card is not detected at all under Windows, DOS (NVFlash) or even the BIOS. The system behaves as if no graphics card were attached. So I wonder whether the supply circuitry has failed or the BIOS itself is corrupted.
What do you mean by logs? Do you have an idea how to log the enumeration during the boot process?

PS: don't heatgun your GPU, that's how i killed my G3, probably would have lasted a few more bakes in the oven had i not cooked it  >:D >:D

Interesting, I thought a heat gun would be more reliable than a household oven.

check Q6 and that small IC next to it (afair it's a 74 series logic gate array)

you have 2 same cards? great, you can compare signals on all small transistors (q7 8 9 10 etc)

Alright, I found out that Q6, Q7, Q8, Q9 and Q10 are N-channel enhancement-mode MOSFETs: https://assets.nexperia.com/documents/data-sheet/2N7002.pdf
On my card, Q7 - Q10 all show the same characteristics when measured with the ohmmeter:
Rds = 0.7k
Rgs = 10k
Rgd = 10.8k

However for Q6:
Rds = open
Rgd = open
Rgs = 100k

Does this make sense? What should I check on the 74 logic IC next to it?

I have not yet had time to disassemble my other card, the one that is working.
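
To make this comparison easy to repeat once the second card is opened up, here is a small sketch that tabulates the ohmmeter readings above and flags any transistor whose readings sit far from the group's median. The 2x threshold and all names are arbitrary choices for illustration. In-circuit resistance depends on whatever else is connected to each pin, so an outlier only says that Q6 sits in a different circuit or behaves differently, not that it is faulty.

```python
import math
import statistics

OPEN = math.inf  # meter reads open circuit / over range

# In-circuit ohmmeter readings from the post, in ohms.
_Q7_TO_Q10 = {"Rds": 700.0, "Rgs": 10e3, "Rgd": 10.8e3}
READINGS = {
    "Q6": {"Rds": OPEN, "Rgs": 100e3, "Rgd": OPEN},
    "Q7": dict(_Q7_TO_Q10),
    "Q8": dict(_Q7_TO_Q10),
    "Q9": dict(_Q7_TO_Q10),
    "Q10": dict(_Q7_TO_Q10),
}


def odd_ones_out(readings, ratio=2.0):
    """Return designators with any reading more than `ratio` times away from the group median."""
    flagged = []
    params = sorted({p for r in readings.values() for p in r})
    for param in params:
        values = [r[param] for r in readings.values() if param in r]
        median = statistics.median_low(values)  # picks an actual reading, never averages
        for name, r in readings.items():
            value = r.get(param)
            if value is None or value == median or name in flagged:
                continue
            low, high = sorted((value, median))
            # A zero-ohm reading or a large ratio both count as "far from the median".
            if low <= 0 or high / low > ratio:
                flagged.append(name)
    return flagged


if __name__ == "__main__":
    print(odd_ones_out(READINGS))
```

Adding the working card's readings as a second table and running the same function on it would show whether Q6 also stands apart there.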
« Last Edit: October 27, 2018, 12:16:44 am by brian27 »
 

Offline cdev

  • Super Contributor
  • ***
  • Posts: 5082
  • Country: 00
I don't know anything about Windows. No comprende Windowse! Sorry. Somebody else here can probably help. Remember that it may not complete the boot so you want to write it to some non-volatile storage medium, like a disk. Also and/or if you can log to another machine that may be useful too.
"What the large print giveth, the small print taketh away."
 

Offline Rasz

  • Super Contributor
  • ***
  • Posts: 2339
  • Country: 00
    • My random blog.
Many thanks for your suggestion. Tonight I will try to measure the voltage on each pin of Q6 - Q10 and of the 74-series logic IC next to Q6.
However, I have some questions for my own understanding: what do those transistors do? Are they FETs or BJTs? Do you know which type they are?

Is the 74 IC also part of the control logic of these transistors?

monkey see monkey do - I've seen that logic chip fail before, afaik that 74 is responsible for some enable signal, results in dead card when bad


You say a few weeks tops, and yet there are hundreds if not thousands of people who baked their Xboxes. Surely not all of them failed after a few weeks?
Maybe there is some misunderstanding on my part here, because OP said "bake card or reheat with a heat gun", which to me implied reflowing the chip.

The 360 has an old AMD GPU. Bad underfill was an Nvidia blunder, and it happened a generation later. The 360 was just a case of bad thermal design and the then-new RoHS requirements, resulting in PCB bending that ripped/cracked BGA balls. It was the source of the "bake it bro" myth.
The whole Nvidia thing is written up here, with some good remarks from an actual fab engineer in the comments: https://www.theinquirer.net/inquirer/news/1004378/why-nvidia-chips-defective


Who logs in to gdm? Not I, said the duck.
My fireplace is on fire, but in all the wrong places.
 

Online wraper

  • Supporter
  • ****
  • Posts: 11071
  • Country: lv
So either it is complete BS, or it can work for some time. It's one or the other, not both.
It's BS because it's often presented as repair when it's a short term "fix" at best.
 

Offline ebastler

  • Super Contributor
  • ***
  • Posts: 3418
  • Country: de
I bake my cards, laptops, and even phones from time to time and if you do it right there's not much harm you can do.
[...]
 probably would have lasted a few more bakes in the oven had i not cooked it 

Maybe that's a hobby in its own right, but it is certainly not a repair...  :P
 
The following users thanked this post: Rasz, wraper

Offline Refrigerator

  • Frequent Contributor
  • **
  • Posts: 859
  • Country: lt
Maybe that's a hobby in its own right, but it is certainly not a repair...  :P
I never said that it is repair  ::)
But yeah, a hobby is what it is for me because i wanted to find out myself what happens when you bake stuff.
And i did find out what baking is about, i also wasted a bunch of money in the process  :palm:
It basically boils down to what you are trying to bake, for example, HD7850's take baking very well, the first one i baked has been working for well over a year now, handling PUBG and other games as well as it can.
Haven't baked any Nvidia yet, though. Got a dead GTX550ti to try baking, but that one turned out to just have a knocked-off 0603 inductor and worked fine after i replaced it.
Just started a blog at http://brimmingideas.blogspot.com/ . Not much in it as of now but more is sure to come :)
 

Offline brian27

  • Contributor
  • Posts: 6
  • Country: nl
monkey see monkey do - I've seen that logic chip fail before, afaik that 74 is responsible for some enable signal, results in dead card when bad

I've measured Q6 and the IC next to it on both the good and the bad card, and the readings are the same.
What I noticed does differ between the good and the bad card is all the Qs next to the RAM chips: Q4 - Q5 and Q7 - Q10. On the good card, the gates see 3.3V. On the dead card, they stay at 0V.

I have yet to find out where the control logic driving these FETs is. Could they be driven by the GPU or by the logic IC?
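
The good-versus-dead comparison can be kept in a small script so that further probe points are easy to add. This is only a sketch: the designator list assumes "Q4 - Q5, Q7 - Q10" means Q4, Q5, Q7, Q8, Q9 and Q10, the 0.2 V tolerance is arbitrary, and the function name is made up.

```python
# Gate voltages in volts, as reported above.
# Assumption: the ranges in the post expand to exactly these six transistors.
GATE_NODES = ["Q4", "Q5", "Q7", "Q8", "Q9", "Q10"]
good_card = {q: 3.3 for q in GATE_NODES}
dead_card = {q: 0.0 for q in GATE_NODES}


def differing_nodes(good, dead, tol_v=0.2):
    """Return (node, good_v, dead_v) for nodes probed on both cards that differ by more than tol_v."""
    return [
        (node, good[node], dead[node])
        for node in good
        if node in dead and abs(good[node] - dead[node]) > tol_v
    ]


if __name__ == "__main__":
    for node, good_v, dead_v in differing_nodes(good_card, dead_card):
        print(f"{node}: good {good_v:.2f} V, dead {dead_v:.2f} V")
```

With every one of these gates low on the dead card, the table would suggest a shared upstream signal that is missing rather than six separate transistor faults, but that remains to be confirmed by tracing where the gates are driven from.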
« Last Edit: October 27, 2018, 10:01:41 pm by brian27 »
 

Offline cdev

  • Super Contributor
  • ***
  • Posts: 5082
  • Country: 00
Where do you guys get your dead GPUs? Ebay? Miners? Dumpsters?
"What the large print giveth, the small print taketh away."
 

