Author Topic: [MOVED] Hacking NVidia Cards into their Professional Counterparts (Read 1649226 times)

gordan · « **Reply #600 on:** October 23, 2013, 11:37:06 am »

I've always used Kepler BIOS Tweaker to sort out the checksums for me. NiBiTor doesn't really understand recent BIOSes properly, especially the UEFI-wrapped ones.

The memory configuration is stored in the EEPROM. I guess you could make some kind of a piggy-back adapter to re-write the EEPROM out of band. I also did some basic comparisons between BIOS images with the same BIOS version but for cards with different amounts of RAM, if you look a few pages back on the thread. I'm sure it should be possible to work out where the memory size is stored.

Please, do report back when you find the 3rd nibble resistor pair on the GTX780. I'm most interested in the prospect of turning a Titan into a K6000.

oguz286 · « **Reply #601 on:** October 23, 2013, 11:52:09 am »

User athanor posted photos of his TITAN a while back (https://www.eevblog.com/forum/chat/hacking-nvidia-cards-into-their-professional-counterparts/msg212115/?topicseen#msg212115).
I sent him a personal message to ask if he could measure some resistor values so that we can cross-reference them. He hasn't responded, though.

If anyone has a TITAN or a K20 or K20X and could measure some resistors, then it would help tremendously.

gordan · « **Reply #602 on:** October 24, 2013, 09:43:11 pm »

Just completed modifying my new GTX690 and I can confirm what I suggested previously - the only hard-mod required is removing the resistor controlling the 3rd nibble.

This makes the 3rd nibble unstable, but it only flaps between A and B. That's not a problem, since it is only an instability in the least significant bit of the 3rd nibble, i.e. the 5th bit of the device ID.
Since that means only the bottom 5 bits need to change/stabilize, this can be achieved using a soft mod.

On my card, the default soft-strap in the UEFI header starting at 0xC is effectively null (fully hard-strapped card), i.e.
FF FF FF 7F 00 00 00 80
We want all 5 accessible bits of the device ID to go high, so we need to change the OR section to:
00 3C 00 90.
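A minimal Python sketch of why this stabilizes the ID. It assumes the strap is applied as (hard strap AND mask) OR mask, and that the device ID bits sit where Reply #606 later in this thread places them (ID bits 0-3 at strap bits 10-13, ID bit 4 at strap bit 28). The function names and the two example hard-strapped IDs (3rd nibble flapping between A and B, 4th nibble 8) are illustrative assumptions, not dumps from a card.

```python
import struct

# Assumed positions of device ID bits 0..4 inside the 32-bit strap word
# (layout as described in Reply #606 of this thread).
ID_BIT_POS = (10, 11, 12, 13, 28)

def decode_strap(raw: bytes):
    """Unpack 8 little-endian bytes into (AND mask, OR mask)."""
    return struct.unpack("<II", raw)

def effective_id(hard_id: int, and_mask: int, or_mask: int) -> int:
    """Apply (hard & AND) | OR to the low 5 bits of a hard-strapped device ID."""
    result = hard_id & ~0x1F
    for id_bit, pos in enumerate(ID_BIT_POS):
        hard = (hard_id >> id_bit) & 1
        keep = (and_mask >> pos) & 1
        force = (or_mask >> pos) & 1
        result |= ((hard & keep) | force) << id_bit
    return result

# AND section left at its default, OR section changed as in the post.
and_mask, or_mask = decode_strap(bytes.fromhex("FF FF FF 7F 00 3C 00 90"))
for hard_id in (0x11A8, 0x11B8):
    print(hex(hard_id), "->", hex(effective_id(hard_id, and_mask, or_mask)))
```

With all five soft-strappable bits forced high, both hard-strap readings come out as 0x11BF, so the flapping bit no longer matters.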

And voila. GTX690 is now stably a Grid K2 with only two resistors removed, none replaced - which makes the mod at least 4x less complex. The improvement in difficulty is probably greater than 4x since manually re-soldering a 0402 component is considerably more difficult than removing it unless you truly are a ninja with a soldering iron.

Also note that the UEFI headers don't appear to be a part of the checksum - Kepler BIOS Tweaker doesn't detect a checksum mismatch after the above change to the strap. Nvflash also doesn't complain about anything other than the fact that the BIOS ID you end up flashing doesn't match the device ID of the board you are flashing to, so you have to do it with --overridetype - no big deal. It also appears there is no strap checksum in the UEFI headers, unlike in the main BIOS payload. All of this makes the process even simpler.

oguz286 · « **Reply #603 on:** October 24, 2013, 10:00:26 pm »

Can I ask how you figured out where the UEFI header is? The guy who did the GTX480 to Tesla mod is a colleague of mine, but I did not know that the soft-strap locations for Kepler cards were known. Any link to where I can find such information?
Maybe this is also possible on the 700-series.

BTW, I was very busy so I couldn't make any progress with the GTX780, but I will have some more time tomorrow.

gordan · « **Reply #604 on:** October 24, 2013, 11:25:13 pm »

UEFI header is the first 1024 (0x400) bytes. If you look at the BIOS closely you'll find that once you strip off the first 0x400 bytes and all the padding and crypto certs off the end (more or less everything past 64KB), you are left with what is a pretty familiar Nvidia BIOS that hasn't changed much since the Fermi (4xx series) days.
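A rough Python sketch of that split, for anyone who wants to look at the two parts separately. The 0x400 header size is from the post; treating "past 64KB" as measured from the start of the remaining payload is an assumption, as are the function names. The 55 AA check is the standard PCI option ROM signature, and "NVGI" is simply what the header dumps later in this thread start with.

```python
UEFI_HEADER_SIZE = 0x400     # first 1024 bytes, per the post
LEGACY_LIMIT = 64 * 1024     # assumption: cut-off counted from the start of the payload

def split_image(image: bytes):
    """Return (uefi_header, legacy_payload) from a full BIOS dump."""
    if len(image) <= UEFI_HEADER_SIZE:
        raise ValueError("image is too small to contain a header and a payload")
    header = image[:UEFI_HEADER_SIZE]
    payload = image[UEFI_HEADER_SIZE:UEFI_HEADER_SIZE + LEGACY_LIMIT]
    return header, payload

def has_nvgi_magic(header: bytes) -> bool:
    """True if the header starts with the ASCII bytes seen in the dumps in this thread."""
    return header[:4] == b"NVGI"

def looks_like_option_rom(payload: bytes) -> bool:
    """True if the payload starts with the PCI option ROM signature 55 AA."""
    return payload[:2] == b"\x55\xAA"
```

If either check fails on a real dump, the offsets assumed here are probably wrong for that image and nothing should be flashed on the basis of this split.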

The soft-straps for Kepler are similar to Fermi. At least the ID bits are the same, but I suspect most if not all bits will be the same. It is possible (even likely) some of the previously unknown/unused bits now do something new/different, of course.

I'd be surprised if this worked fundamentally differently on the 7xx series cards.

Update on my GTX690/Grid K2 mod - the card works fine on bare metal, but I cannot for the life of me get it to work with VGA passthrough. Tried both GPUs, and all I get is error 10 (device cannot start). My GTX680 works fine, both as a K5000 and a K2. I suspect the doubled-up PLX PCIe bridge on the 690 is causing a problem - NF200 bridges on my SR-2 are already problematic, and further bridging on top of them isn't going to be helping. The PCIe arrangement is thus Intel PCIe bridge -> NF200 (known to have VT-d affecting bugs) -> PLX bridge -> PLX bridge -> GPU.

Which means I need to start seriously considering either taking a chance on a Titan in the hope that we can figure out how to mod it into a K6000, and that the driver "just works" in that arrangement, or getting an ATI card for the VM this was intended for. Tough choice... I guess it depends on whether I can get a Titan for a similar price to an R9 290X...

oguz286 · « **Reply #605 on:** October 25, 2013, 10:54:46 am »

I tried to softmod the 4th nibble (0x1004 to 0x1002, so from 4 to 2). My BIOS contains the following:

00000000: 4E 56 47 49 0C 01 10 80 B8 04 00 00 DE 10 4B 10
00000010: 08 E2 00 00 00 06 00 00 02 10 10 82 FF 3F FC 7F
00000020: 00 50 00 80 0E 10 10 82 FF FF FF 73 00 00 00 8C
00000030: 00 50 00 80 0E 10 10 82 FF FF FF 73 00 00 00 8C
00000040: 36 10 10 02 FF FF FF FF 00 00 00 00 02 87 08 02

So in my case I guess that means that:

	HEX	Binary
AND0	7FFC3FFF	0111 1111 1111 1100 0011 1111 1111 1111
OR0	80005000	1000 0000 0000 0000 0101 0000 0000 0000
AND1	73FFFFFF	0111 0011 1111 1111 1111 1111 1111 1111
OR1	8C000000	1000 1100 0000 0000 0000 0000 0000 0000

If the bit positions for the device ID are still the same, then to go from 0x1004 to 0x1002 I would change AND0 to 7FFC3BFF and OR0 to 80005200 so that I don't take the resistor value for the 4th nibble, and force it to 2 via OR0.
I tried this and also changed the device ID strings from 0x1004 to 0x1002 in the BIOS, updated the checksum, flashed it to the card... it still displays 0x1004.

I hope I did something wrong, but if not, then NVIDIA probably changed some things. The reason why I started with nibble 3 is that I hoped I could change the 4th nibble without hardmodding.

gordan · « **Reply #606 on:** October 25, 2013, 11:32:50 am »

I just spotted an error in what I said - I'm not sure where 0xC came from. Looking at the BIOS dumps I think I meant 0x1C. Annoyingly, I don't have my GTX690 dumps handy, and they are the most minimally modified ones I have.

Ignore the second strap mask - there is nothing of interest there. The important bit is AND0/OR0.

The 32-bit mask can be represented this way:

-xx4 xxxx xxxx xxxx xx32 10xx xxxx xxxx

You want to flip bit 2 low (AND0 to 0, OR0 to 0) and bit 1 high (OR0 to 1).

OLD AND0   7FFC3FFF   0111 1111 1111 1100 0011 1111 1111 1111
NEW AND0   7FFC2FFF   0111 1111 1111 1100 0010 1111 1111 1111
OLD OR0   80005000   1000 0000 0000 0000 0101 0000 0000 0000
NEW OR0   80004800   1000 0000 0000 0000 0100 1000 0000 0000

Which makes the new mask:
AND0 7FFC2FFF
OR0 80004800

And of course remember the byte order is little-endian when editing in the BIOS.
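The same calculation as a small Python sketch, assuming the bit layout in the mask diagram above (ID bits 0-3 at strap bits 10-13, ID bit 4 at strap bit 28). The helper names are made up for illustration. Forcing a bit low clears it in both AND0 and OR0; forcing it high sets it in OR0.

```python
# Assumed mapping: device ID bit number -> strap bit position.
ID_BIT_POS = {0: 10, 1: 11, 2: 12, 3: 13, 4: 28}

def force_id_bits(and0: int, or0: int, wanted: dict):
    """Return new (AND0, OR0) with the given device ID bits forced to 0 or 1."""
    for id_bit, value in wanted.items():
        mask = 1 << ID_BIT_POS[id_bit]
        if value:
            or0 |= mask       # OR forces the bit high regardless of the hard strap
        else:
            and0 &= ~mask     # AND drops the hard-strap value...
            or0 &= ~mask      # ...and OR must not set it again
    return and0 & 0xFFFFFFFF, or0 & 0xFFFFFFFF

def le_bytes(word: int) -> str:
    """Format a 32-bit word the way it appears in the BIOS (little-endian)."""
    return word.to_bytes(4, "little").hex(" ").upper()

# 0x1004 -> 0x1002: ID bit 2 low, ID bit 1 high.
new_and0, new_or0 = force_id_bits(0x7FFC3FFF, 0x80005000, {2: 0, 1: 1})
print(f"AND0 {new_and0:08X} = {le_bytes(new_and0)}")
print(f"OR0  {new_or0:08X} = {le_bytes(new_or0)}")
```

Run against the GTX690 default strap (7FFFFFFF / 80000000) with all five ID bits forced high, the same function gives OR0 = 90003C00, which matches the earlier post.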

Disclaimer - I may be completely wrong in the above calculation - I haven't had enough coffee yet today.

On the off-chance I'm right, however, the relevant hex pseudo-patch would be:

< 00000010: 08 E2 00 00 00 06 00 00 02 10 10 82 FF 3F FC 7F
> 00000010: 08 E2 00 00 00 06 00 00 02 10 10 82 FF 2F FC 7F
< 00000020: 00 50 00 80 0E 10 10 82 FF FF FF 73 00 00 00 8C
> 00000020: 00 48 00 80 0E 10 10 82 FF FF FF 73 00 00 00 8C
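For those who would rather script the edit than use a hex editor, a minimal Python sketch of the same patch. It assumes AND0 lives at offset 0x1C and OR0 at 0x20, as in the dump above; the function names are hypothetical and the sample bytes are just the first three rows of that dump.

```python
import struct

STRAP_OFFSET = 0x1C   # AND0 at 0x1C, OR0 at 0x20 in the dump posted above

# First 0x30 bytes of the GTX780 header as posted in Reply #605.
ORIGINAL = bytes.fromhex(
    "4E 56 47 49 0C 01 10 80 B8 04 00 00 DE 10 4B 10"
    "08 E2 00 00 00 06 00 00 02 10 10 82 FF 3F FC 7F"
    "00 50 00 80 0E 10 10 82 FF FF FF 73 00 00 00 8C"
)

def read_strap(image: bytes, offset: int = STRAP_OFFSET):
    """Return (AND0, OR0) as integers; the file stores them little-endian."""
    return struct.unpack_from("<II", image, offset)

def patch_strap(image: bytes, and0: int, or0: int, offset: int = STRAP_OFFSET) -> bytes:
    """Return a copy of the image with AND0/OR0 overwritten."""
    patched = bytearray(image)
    struct.pack_into("<II", patched, offset, and0, or0)
    return bytes(patched)

patched = patch_strap(ORIGINAL, 0x7FFC2FFF, 0x80004800)
print(patched[0x10:0x20].hex(" ").upper())
print(patched[0x20:0x30].hex(" ").upper())
```

Working on a copy and diffing it against the original dump before flashing is a cheap sanity check that only the intended eight bytes changed.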

If you could let me know in the next few hours if that works for you, I'd very much appreciate it - I need to make a decision on whether to get a Titan by tonight, and a confirmation that soft-modding works on the GTX780 would go a long way toward persuading me that is the way forward.

Unfortunately, GTX780 and Titan only differ in the 4th nibble, so while we should have no trouble figuring out which resistor controls the 4th nibble based on the difference, finding the 3rd nibble will be more difficult.

oguz286 · « **Reply #607 on:** October 25, 2013, 12:42:13 pm »

Yeah, I haven't had coffee either. I cannot recall the reasoning behind my modifications, but yes, they are wrong.
Your modifications look right, and I will try them when I get back home. Hopefully it will work.

EDIT: Well gordan, you were right! nvflash now displays device ID 0x1002! Now we just have to figure out how to modify the 3rd nibble (which is going to be harder).

Cubexed · « **Reply #608 on:** October 25, 2013, 05:54:17 pm »

Hello, I have a GTX 770 with PCI ID 0x1184. I am wondering whether, by changing the soft straps, I can turn it into 0x118F (Tesla K10), as both have the same chip. I want it mainly for VGA passthrough, as http://wiki.xen.org/wiki/XenVGAPassthroughTestedAdapters indicates that might work, but I want an expert opinion before attempting anything.

gordan · « **Reply #609 on:** October 25, 2013, 08:13:42 pm »

GTX770 is a relabeled GTX680.
Yes, you can change the soft-straps as described above to turn it into a Tesla K10.
Tesla cards do not work for VGA passthrough.

The only cards that will work for VGA passthrough are:

Fermi (soft-moddable)
Quadro 2000 (or modified GTS450)
Quadro 4000 (no directly equivalent GeForce)
Quadro 5000 (or modified GTX470)
Quadro 6000 (or modified GTX480)

Kepler: (some hard-modding required)
Quadro K5000/Grid K2 (or modified GTX680/GTX770, some got a modified GTX690 to work, but mine refuses to, almost certainly due to the extra PCIe bridging on the card on top of NF200 PCIe bridges on the motherboard - happy to sell you my modified one if you're interested).

IIRC GTX650 -> Grid K1 has also been done.

Also, as you can see above, some effort is going into figuring out how to modify a GTX780/Titan into a K6000, but we're not there yet.

axero · « **Reply #610 on:** October 27, 2013, 03:55:21 pm »

Earlier in this thread there was a discussion about the FLR feature which, surprisingly, in spite of its sheer simplicity, is rarely implemented in PCI hardware for some reason. Another way to trigger a reset in selected hardware is to use the ACPI API and alter the power states of that hardware. In ACPI the power state D0 means fully on and the power state D3 means off. In power state D3, the Vcc of the PCIe slot is turned off, and in state D0 it is turned back on. Therefore this reset method is called D3-D0.

Now to my question: does this really work on PCIe cards that also take power from an auxiliary power source, i.e. directly from the PSU like most graphics adapters do these days? Setting the slot to D3 will not cut the power to the card, only to the slot the card is sitting in. Has anyone managed to reset a GPU with the D3-D0 method above in spite of the auxiliary power connection? I would really like to know more about that.

Also, I read a lot of questions regarding the vGPU feature offered by the Grid K2 GPU. A few years ago I wrote a post regarding a similar feature in the Intel Community (and some other forums which I cannot remember):

https://communities.intel.com/thread/25945

In my posts there I was suggesting the development of Intel VT-x/AMD-V like extensions that allow the GPU(s) to be shared seamlessly among VMs (and the host) just like the CPU cores can be shared among the (physical and virtual) machines with hypervisors such as VirtualBox, Parallels Desktop, Xen, KVM, Hyper-V, VMware and so on. A lot has happened since then, so I suspect that the vGPU feature is exactly that: a set of hardware-assisted GPU extensions. The rendered image (of the computer desktop, a DirectX game or video playback) is then shared over a network connection, most likely by using a remote desktop protocol such as RDP, VNC, Spice, etc. (The AMD FirePro R5000 et al. also have the capability of outputting the video signal over an Ethernet connection; the card even has its own NIC.)

If this is the case then perhaps those extensions get disabled on non-K2 GPUs by zener zapping a certain set of fusebits. That would probably lead to the GPU ignoring such instructions as it receives them. Perhaps some instruction sets execution bits have a connection to one of those fusebits through an AND-gate (or NAND-gate depending on how you look at it). These instructions may then get blocked even before they reach the cores.

Edit: Regarding Nvidia's penchant for artificially disabling things, they appear to have done that in their Linux binary blob. This article mentions this about their Mosaic feature:

http://www.phoronix.com/scan.php?page=news_item&px=MTQ3NDE

gordan · « **Reply #611 on:** October 27, 2013, 11:47:35 pm »

IIRC the power management trick is something Xen already implements, provided the GPU actually reports as supporting D3 and D0 power management states.

It doesn't always work, though. Due to various issues, one of my gaming VMs is running an HD7970 card, and for some bizarre, inexplicable reason GPU-Z locks up the VM and the GPU hard. Issuing the reset (presumably implemented using the above trick, since the card doesn't support FLR) does nothing to shake it loose. The particularly weird thing is that if I use an HD4850 card for dom0, the whole machine locks up solid as soon as the ATI driver is loaded in domU. This doesn't happen, however, if the domU card is an HD7450. Really strange. Because of this I had to switch back to my 8800GT card for dom0.

The trick might, however, just work well enough to resolve the issue of performance degradation after domU reboots (instead of rebooting, power it off, issue the reset, then start up the domU).

While this is still nowhere nearly the "just works" 100% reliable and robust experience achieved using something like the soft-modded Fermi (GTS450/GTX470/GTX480) cards into corresponding Quadros (Quadro (2/5/6)000 respectively), it is a vast improvement on ATI usability since a few months ago. So much so, in fact, that given my experience of assorted oddness with my Kepler cards (GTX680 won't do dual-link DVI modes but only when virtualized (works fine on bare metal), GTX690 modified to Grid K2 that flat out refuses to initialize in domU), I'm actually switching back to using ATI cards for some VMs where I need more performance than a GTX480/Q6000.

Anyway, back on the issue of power management reset trick, (most?) Nvidia cards don't seem to support the required levels of power management to do that trick (or at least they don't seem to advertise it), yet they work just fine across domU reboots.
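What a card advertises can be checked from its PCI configuration space. The sketch below is only an illustration: it walks the standard capability list and reports the Power Management capability (ID 0x01) with its current power state and No_Soft_Reset bit, plus the Function Level Reset bit from the PCI Express capability (ID 0x10). The register offsets are the standard PCI ones; the function names are made up, and whether Xen's reset logic looks at exactly these fields is an assumption.

```python
import struct
from typing import Optional

CAP_ID_PM = 0x01      # PCI Power Management capability
CAP_ID_PCIE = 0x10    # PCI Express capability

def find_capability(cfg: bytes, cap_id: int) -> Optional[int]:
    """Walk the capability list in config space and return the offset of cap_id."""
    status = struct.unpack_from("<H", cfg, 0x06)[0]
    if not status & 0x10:              # status bit 4: capability list present
        return None
    ptr = cfg[0x34] & 0xFC             # capabilities pointer
    seen = set()
    while ptr and ptr not in seen and ptr + 1 < len(cfg):
        seen.add(ptr)
        if cfg[ptr] == cap_id:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC      # next capability
    return None

def reset_hints(cfg: bytes) -> dict:
    """Summarize the fields relevant to D3-D0 and FLR style resets."""
    info = {"pm_capability": False, "power_state": None,
            "no_soft_reset": None, "flr": None}
    pm = find_capability(cfg, CAP_ID_PM)
    if pm is not None and pm + 6 <= len(cfg):
        pmcsr = struct.unpack_from("<H", cfg, pm + 4)[0]
        info["pm_capability"] = True
        info["power_state"] = f"D{pmcsr & 0x3}"      # 3 means D3hot
        info["no_soft_reset"] = bool(pmcsr & 0x8)
    pcie = find_capability(cfg, CAP_ID_PCIE)
    if pcie is not None and pcie + 8 <= len(cfg):
        devcap = struct.unpack_from("<I", cfg, pcie + 4)[0]
        info["flr"] = bool(devcap & (1 << 28))       # Function Level Reset capable
    return info
```

On Linux the bytes can be read from /sys/bus/pci/devices/<address>/config (as root, otherwise only the first 64 bytes are returned). A set No_Soft_Reset bit means a D3hot to D0 transition is not expected to reset the function's internal state.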

Finally, I don't think any of the extensions for virtualization are disabled - I seem to recall that somebody posted somewhere that they got vSGA/ESXi working with a GeForce GK104 based card after modifying it. If you think about it, no special hardware support is required. All vSGA does is implement a virtual GPU driver that offloads DirectX and suchlike via a paravirtualized interface to the host's GPU. I may be wrong, but IIRC older models like Quadro 6000 are also supported.

I guess the next logical step from there might be to use remote GPU rendering using a trick along the lines of DMA mapping the BARs using RDMA over infiniband.

Also, you don't actually need Mosaic on Linux - you can achieve the exact same thing using standard Xorg configuration and extensions like Xinerama. I've been using it for years - it's the only way to get stretched full screen desktop working on monitors like the IBM T221 that require two dual-link DVI channels to achieve enough bandwidth for their full resolution and frame rate, and thus appear as either 2x1920x2400 screens or 4x960x2400 screens that need to be "glued" together in Xorg.

axero · « **Reply #612 on:** October 28, 2013, 01:10:31 am »

Perhaps a failure to reset e.g. the earlier generations of the ATi/AMD card might be due to the auxiliary power keeping the card "alive" even when the power to the PCIe slot is cut. If that is the case then maybe a switch or relay that turns off the auxiliary input (upon detection of a Vcc cut) might help (such relays ought to be able to take quite a few amps though 240W@12V => 20A).

After reading a whitepaper on vSGA/vDGA deployment in ESXi, I found at the end of that document that the Quadro 6000 uses "Bridge reset" as a reset method (a search for '"bridge reset" virtualization' in Google made me discover that the Quadro 4000 and Grid K1 also use this method; this can be found in ESXi through the command "esxcli hardware pci list -c 0x300 -m 0xffc"). Bridge reset is what I believe to be also known as "bus reset", which presumably means that the entire PCI bus is reset. The different reset methods are discussed very briefly in the VM DirectPath documentation, although I don't know any more about it. How bridge or bus reset works is to this date a mystery to me...

The special thing about the vGPU feature is that it can be shared among up to 8 virtual guests as one GPU, i.e. it is not dedicated to one VM as with VGA passthrough (or vDGA in ESXi). That requires a more sophisticated solution than when it is dedicated, which made me suspect that the drivers are not only paravirtualized but also hardware assisted through certain extensions (mind you that the AMD-V/Intel VT-x extensions do not require special paravirtualized drivers on the guest side). The downside with this technology is that it currently only gives up to 512MB of video RAM to each VM and that only DirectX up to version 9.0c is supported, at least in ESXi. Other conditions may apply in Hyper-V and other hypervisors that support the vGPU technology. So, maybe there are no hardware extensions involved with the vGPU technology after all.

oguz286 · « **Reply #613 on:** October 28, 2013, 01:32:32 am »

Well, here's an update. In trying to find the resistor(s) that control the third nibble, ijsf (the guy who did the original GTX480 to Tesla hack) and I screwed around with the BIOS, and sure enough, the card was not recognized anymore.

I disconnected the power to the EEPROM but that didn't help either. In the end I ended up hooking up the EEPROM to my Raspberry Pi and writing a Python script that can read from and write to the EEPROM, and finally managed to write the original BIOS back. Luckily the card works again, and now I can always reflash the card because I have a breakout board that I can hook up to my RPi.

gordan · « **Reply #614 on:** October 28, 2013, 07:57:45 am »

Quote from: axero on October 28, 2013, 01:10:31 am

Perhaps a failure to reset e.g. the earlier generations of the ATi/AMD card might be due to the auxiliary power keeping the card "alive" even when the power to the PCIe slot is cut. If that is the case then maybe a switch or relay that turns off the auxiliary input (upon detection of a Vcc cut) might help (such relays ought to be able to take quite a few amps though 240W@12V => 20A).

You know, you might be on to something here - I thought I found that resetting the HD7450 card works, but - that only draws power from the slot, no auxiliary ATX power goes into it. So perhaps you are right. Putting the card into D3 state, and cycling ATX aux power might just do the trick. You'd need a multi-throw (3 lines per power connector) switch to make this work, but it does sound like something worth testing. Do post some designs if you come up with an internal USB+hub widget that can do this.

Quote from: axero on October 28, 2013, 01:10:31 am

The special thing about the vGPU feature is that it can be shared among up to 8 virtual guests as one GPU, i.e. it is not dedicated to one VM as with VGA passthrough (or vDGA in ESXi). That requires a more sophisticated solution than when it is dedicated, which made me suspect that the drivers are not only paravirtualized but also hardware assisted through certain extensions (mind you that the AMD-V/Intel VT-x extensions do not require special paravirtualized drivers on the guest side). The downside with this technology is that it currently only gives up to 512MB of video RAM to each VM and that only DirectX up to version 9.0c is supported, at least in ESXi. Other conditions may apply in Hyper-V and other hypervisors that support the vGPU technology. So, maybe there are no hardware extensions involved with the vGPU technology after all.

It's not that magical/complicated. VMware has had accelerated emulated drivers in desktop hypervisors for years. They offloaded guest's 3D rendering onto the host's OpenGL subsystem. vSGA is conceptually similar - it is a hardware accelerated emulated graphics device that offloads the rendering work to the real GPU.

Quote from: oguz286 on October 28, 2013, 01:32:32 am

Well, here's an update. In trying to find the resistor(s) that control the third nibble, ijsf (the guy who did the original GTX480 to Tesla hack) and I screwed around with the BIOS, and sure enough, the card was not recognized anymore.

I disconnected the power to the eeprom but that didn't help either. In the end I ended up with hooking up the eeprom to my Raspberry Pi, writing a python script that can read from and write to the eeprom and finally managed to write the original BIOS back. Luckily the card works again, and now I can always reflash the card because I have a breakout-board that I can hook up to my RPi

Awesome stuff.

Any chance you could use this opportunity of having an unbrickable card to investigate whether the meaning of any of the bits in the first 32-bit strap have changed, and whether one of the previously unknown bits might have been used to set the 6th device ID bit? It'd be really handy if the soldering solution could be fully deprecated.

axero · « **Reply #615 on:** October 28, 2013, 05:03:32 pm »

Maybe it's not that much "magic" in sharing a GPU between VMs but it is quite tricky to do that without overhead and yet be rather as feature rich as if it were on bare metal. Before AMD-v and Intel VT-x the CPU sharing took a rather substantial penalty from the virtualization. Now this penalty is rather small thanks to the hardware assisted virtualization technology offered through VT-x and AMD-v. From the papers on vGPU there seems to be a rather small penalty to sharing the GPU, either they have really managed to bring up smart drivers or there is something hardware assisted to back it up. Maybe there is a rather substantial overhead that is "offset" by the capabilities of the GPU.

I have started a thread for resetting PCI devices with auxiliary power input here:

https://www.eevblog.com/forum/projects/acpi-power-saving-circuitry-for-150-w-pci-devices-%28ie-gpus%29/

I guess a further discussion about it should be taken there.

gordan · « **Reply #616 on:** October 28, 2013, 05:23:45 pm »

Quote from: axero on October 28, 2013, 05:03:32 pm

Maybe it's not that much "magic" in sharing a GPU between VMs but it is quite tricky to do that without overhead and yet be rather as feature rich as if it were on bare metal.

It's not without overhead - the overhead is likely quite substantial, not including the inevitable overhead of actually encoding it into an MPEG stream in realtime and sending it over the network. While it is a cool feature, its use-case is rather narrow.

Quote from: axero on October 28, 2013, 05:03:32 pm

Before AMD-v and Intel VT-x the CPU sharing took a rather substantial penalty from the virtualization.

Even with those, the virtualization performance penalty is substantial:
Virtualized Performance - or Lack Thereof (http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/)

There were also other solutions before VT-x that provided only marginally worse performance (e.g. kqemu)

Quote from: axero on October 28, 2013, 05:03:32 pm

Now this penalty is rather small thanks to the hardware assisted virtualization technology offered through VT-x and AMD-v. From the papers on vGPU there seems to be a rather small penalty to sharing the GPU, either they have really managed to bring up smart drivers or there is something hardware assisted to back it up. Maybe there is a rather substantial overhead that is "offset" by the capabilities of the GPU.

That's pretty much it - the GPU has enough processing power to produce reasonable results within the given constraints. That doesn't mean it's particularly efficient. I would be surprised if the performance is much more than 50% of what you might expect on bare metal, especially after you account for the MPEG encoding. I have a virtualized gaming rig that is pretty finely tuned, and the frame rates on bare metal are at least 10-20% higher - and that's just with VGA passthrough, which is a lot less overheady than something like vSGA.

gordan · « **Reply #617 on:** October 29, 2013, 07:32:59 am »

Quote from: oguz286 on October 28, 2013, 01:32:32 am

Well, here's an update. In trying to find the resistor(s) that control the third nibble, ijsf (the guy who did the original GTX480 to Tesla hack) and I screwed around with the BIOS, and sure enough, the card was not recognized anymore.

I disconnected the power to the eeprom but that didn't help either. In the end I ended up with hooking up the eeprom to my Raspberry Pi, writing a python script that can read from and write to the eeprom and finally managed to write the original BIOS back. Luckily the card works again, and now I can always reflash the card because I have a breakout-board that I can hook up to my RPi

Any chance you could post a detailed explanation of what you did to make an unbricking rig? I have a suspicion that the root cause of the death of my first GTX690 might have been a misflash that corrupted the PLX chip (PCIe bridge) EEPROM. It'd be nice to have a go at resurrecting it.

axero · « **Reply #618 on:** October 29, 2013, 11:32:26 am »

Quote from: gordan on October 28, 2013, 05:23:45 pm

Quote from: axero on October 28, 2013, 05:03:32 pm
Before AMD-v and Intel VT-x the CPU sharing took a rather substantial penalty from the virtualization.

Even with those virtualization performance penalty is substantial:
http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/

There were also other solutions before VT-x that provided only marginally worse performance (e.g. kqemu)

I guess it depends on what type of load you expose the virtual CPU to. I have seen tests on the Phoronix.com website where the difference between VM and bare-metal performance is considerably smaller. Look for example at this article:

http://www.phoronix.com/scan.php?page=article&item=ubuntu_1110_xenkvm&num=2

Regarding VirtualBox4 that has shown a notoriously bad result in the test you refer to (perhaps an old version? VB4.3 is out now), if you intend to run VB and find VB4 to be sluggish, you could also try the earlier VirtualBox 3.2.

gordan · « **Reply #619 on:** October 29, 2013, 09:26:29 pm »

My advice on the subject of benchmarks is to only believe your own. The reason I carried out mine was because I didn't believe all the ones that showed a negligible performance penalty. The performance hit from core-to-core migration alone is very significant (bigger than the "industry claims" of performance hit of virtualization).

Some years ago when I was folding I found that just pinning process threads to cores boosted performance by nearly 25% on a Core2 Quad. Now, granted, on a C2Q you get doubly hit when the process migrates between cores that aren't on the same die (C2Q is a 2x2 design) since the other die won't have its caches primed for that process, but even staying on the same die there is a slow-down.
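For reference, pinning can be done from inside a program as well as with external tools. A minimal Python sketch for Linux, assuming the standard os.sched_setaffinity call; the helper name is made up and this is not the method used for the folding result above.

```python
import os

def pin_to_cpu(cpu: int, pid: int = 0) -> set:
    """Restrict a process (0 = the caller) to one logical CPU; Linux only."""
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)      # CPUs we are currently allowed to run on
    print("before:", sorted(allowed))
    print("after: ", sorted(pin_to_cpu(min(allowed))))
```

Which logical CPU numbers share a die or a cache is machine-specific, so the choice of CPU here is arbitrary.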

Now take this a further level of abstraction up, where the guest generally has no idea what the physical CPU layout might be, and you are making the problem massively worse because the guest process scheduler is running completely blind, while the hypervisor is additionally context switching different VMs' vCPUs all over the place to run on overbooked physical hardware. It doesn't take much imagination to see how the performance cannot be anything but poor.

Anyway, this is getting off-topic.

gordan · « **Reply #620 on:** October 30, 2013, 12:43:38 pm »

I'm disposing of some of my modified GeForce cards. If anyone who reads this wants a VGA passthrough capable Nvidia card on the cheap but lacks the confidence to attack it with a soldering iron and/or hex editor, you may be interested in this.

I have:
~~2x GTS450 -> Q2000~~
~~1x GTX470 -> Q5000~~
1x GTX690 -> Grid K2

PM me if you are interested in any of these and are in the EU (outside the EU the import duty and shipping would make this uneconomical).

oguz286 · « **Reply #621 on:** October 30, 2013, 06:05:15 pm »

Quote from: gordan on October 29, 2013, 07:32:59 am

Quote from: oguz286 on October 28, 2013, 01:32:32 am
Well, here's an update. In trying to find the resistor(s) that control the third nibble, ijsf (the guy who did the original GTX480 to Tesla hack) and I screwed around with the BIOS, and sure enough, the card was not recognized anymore.

I disconnected the power to the eeprom but that didn't help either. In the end I ended up with hooking up the eeprom to my Raspberry Pi, writing a python script that can read from and write to the eeprom and finally managed to write the original BIOS back. Luckily the card works again, and now I can always reflash the card because I have a breakout-board that I can hook up to my RPi

Any chance you could post a detailed explanation of what you did to make an unbricking rig? I have a suspicion that the root cause of the death of my first GTX690 might have been a misflash that corrupted the PLX chip (PCIe bridge) EEPROM. It'd be nice to have a go at resurrecting it.

It's really not that complicated. I have a Gigadevice GD25Q10B SPI-EEPROM on my card and I used the datasheet to figure out the pin layout. The SPI protocol is really simple, as there are only six pins that you have to connect: Vcc, Vss, CE# (= chip select), SCLK (= serial clock), MISO (= Master In, Slave Out) and MOSI (= Master Out, Slave In). The WE# and HOLD# pins are connected to Vcc, so you just solder wires onto the EEPROM and connect them to the Raspberry Pi. Then you just use the spidev user space module to send commands to the EEPROM, which are listed in the datasheet. To get it running quickly I used py-spidev so that I could whip up a simple Python script that dumps the EEPROM (to check if I can communicate with the chip correctly), and one script that writes the correct ROM to the chip.
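The scripts themselves were not posted, so the following is only a sketch of what such a py-spidev script could look like, not the author's code. It assumes the opcodes common to 25-series SPI flash parts (0x9F read ID, 0x03 read, 0x06 write enable, 0x02 page program, 0x05 read status) and 256-byte pages; all of these should be checked against the datasheet of the actual chip. Erasing before programming is left out on purpose, and the erase opcode should likewise come from the datasheet.

```python
CMD_JEDEC_ID = 0x9F
CMD_READ = 0x03          # followed by a 24-bit address
CMD_WRITE_ENABLE = 0x06
CMD_PAGE_PROGRAM = 0x02
CMD_READ_STATUS = 0x05   # bit 0 = write in progress
PAGE_SIZE = 256          # assumed page size
CHUNK = 1024             # bytes per read transfer, kept well under spidev's buffer

def open_spi(bus=0, device=0, speed_hz=1_000_000):
    """Open /dev/spidev<bus>.<device>; needs py-spidev, hence the local import."""
    import spidev
    spi = spidev.SpiDev()
    spi.open(bus, device)
    spi.max_speed_hz = speed_hz
    spi.mode = 0
    return spi

def addr24(addr):
    return [(addr >> 16) & 0xFF, (addr >> 8) & 0xFF, addr & 0xFF]

def read_jedec_id(spi):
    """Return the 3 ID bytes; a quick check that the wiring works."""
    return bytes(spi.xfer2([CMD_JEDEC_ID, 0, 0, 0])[1:])

def read_flash(spi, start, length):
    """Dump `length` bytes starting at `start`."""
    out = bytearray()
    addr = start
    while len(out) < length:
        n = min(CHUNK, start + length - addr)
        reply = spi.xfer2([CMD_READ] + addr24(addr) + [0] * n)
        out += bytes(reply[4:])      # first 4 bytes clock out during the command
        addr += n
    return bytes(out)

def program_page(spi, addr, data):
    """Program up to one page; the target area must already be erased."""
    if (addr % PAGE_SIZE) + len(data) > PAGE_SIZE:
        raise ValueError("write would cross a page boundary")
    spi.xfer2([CMD_WRITE_ENABLE])
    spi.xfer2([CMD_PAGE_PROGRAM] + addr24(addr) + list(data))
    while spi.xfer2([CMD_READ_STATUS, 0])[1] & 0x01:
        pass                         # wait until the chip reports it is done
```

Reading the ID and dumping the chip twice, then comparing the dumps, is a reasonable way to confirm the connection is reliable before writing anything.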

One potential problem though: I have a different card that has the Pm25LV512 chip, which I couldn't program. It's because it reads/writes data on the falling edge of the clock, whereas the spidev module reads/writes on the rising edge of the clock. This is stated in the datasheet of that chip, and the datasheet of the GD25Q10B that is on the GTX780 states that it reads/writes on the rising edge of the clock. If you have a chip that acts on the falling edge of the clock you need to use a different Linux kernel module that supports this mode (spidev does not), which involves recompiling the Raspbian kernel.
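For reference, clock edge behaviour in SPI is normally described by the four standard modes, built from clock polarity (CPOL) and clock phase (CPHA). The sketch below only shows that mapping and which edge data is sampled on. py-spidev does expose a `mode` attribute that accepts 0-3, but whether any of those settings would have been enough for the Pm25LV512 on that particular setup is not something this thread establishes.

```python
def spi_mode(cpol: int, cpha: int) -> int:
    """Standard SPI mode number from clock polarity and clock phase."""
    return (cpol << 1) | cpha

def sampling_edge(mode: int) -> str:
    """Clock edge on which data is sampled in the given SPI mode."""
    cpol, cpha = (mode >> 1) & 1, mode & 1
    # CPHA=0 samples on the leading edge, CPHA=1 on the trailing edge;
    # CPOL decides whether the leading edge is rising (0) or falling (1).
    return "rising" if cpol == cpha else "falling"

for mode in range(4):
    print(f"mode {mode}: sample on {sampling_edge(mode)} edge")
```

Modes 0 and 3 sample on the rising edge, modes 1 and 2 on the falling edge.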

foxdie · « **Reply #622 on:** October 31, 2013, 09:12:49 pm »

Hi all, just want to chime in here.

Firstly, it's great to see such passionate hacking of Nvidia's offerings. There have been a few geekgasms perusing this thread.

Secondly, this thread has grown to become quite a whopper and extracting information is becoming quite a challenge. What would be nice is a summary post once in a while, like gordan's post here. That helps people of all experience levels get a broad overview of progress, and those with deeper understanding can delve deeper.

What's not clear to me at this point is what will and won't work for those of us who want to make daily use of the end result. I personally would like to give guests in ESXi a decent 3D performance bump but I'm not sure how to approach that (what card is seen as the best starting point, what work needs doing to it etc). I realise this thread isn't about making card X work with technology Y but most of us are here for the virtualisation benefits.

I'm not scared to crack out a soldering iron and a multimeter (although I doubt my hands are steady enough to resolder SMD resistors hehe), I'd just like some recommendations on what direction to take.

gordan · « **Reply #623 on:** November 01, 2013, 12:48:36 am »

Quote from: foxdie on October 31, 2013, 09:12:49 pm

What's not clear to me at this point is what will and won't work for those of us who want to make daily use of the end result. I personally would like to give guests in ESXi a decent 3D performance bump but I'm not sure how to approach that (what card is seen as the best starting point, what work needs doing to it etc). I realise this thread isn't about making card X work with technology Y but most of us are here for the virtualisation benefits.

It comes down to 2 things:
1) Budget
2) Hand steadiness / suitable soldering equipment availability (tiny soldering iron, at least a good magnifying lamp, preferably an electronics microscope)

If you are on a tiny budget (<= £50) and only need reasonable performance with direct VGA passthrough at resolutions up to 1080p, get a GTS450 GDDR5 card and turn it into a Quadro 2000.

As your budget and performance requirements go up, a GTX470 or GTX480 may be more suitable (they make good Quadro 5000 and Quadro 6000 cards respectively). These work not only with direct VGA passthrough, but should also work with ESXi vSGA. There have been reports of people successfully running ESXi vSGA with 6 clients using a single Quadro 6000 card to achieve mostly playable Borderlands at 800x600. A modified GTX480 should be somewhat faster at this than a real Quadro 6000. A modified GTX470 should be about 50% faster than a real Quadro 5000. I use Xen rather than ESXi, but I wouldn't expect anything to notice any difference between one of these modified cards and the real thing as far as GPU acceleration offload is concerned.

If your budget and performance requirements are even higher, you have little choice but to modify a GTX680/GTX690/GTX770 into a Quadro K5000 or Grid K2. Either mod works fine, and I have not observed any obvious difference in functionality or stability between the two. For this, however, you will need to at least remove the resistor controlling the 3rd device ID nibble. Replacing an 0402 resistor is harder than removing it, but leaving the resistor off can cause device ID instability. You can compensate for this by soft-modding the available strap bits.
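To make the "3rd nibble" wording concrete, here is a small sketch that splits a device ID into nibbles and lists which bits differ between the stock and target IDs. The device IDs used below (0x1180 for GTX680, 0x11BA for Quadro K5000, 0x11BF for Grid K2) are my assumption from the public PCI ID list, so verify them against what your own card reports. Bit positions are counted from zero, and the function names are only illustrative.

```python
def nibbles(device_id):
    """Split a 16-bit PCI device ID into four nibbles, most significant first."""
    return [(device_id >> shift) & 0xF for shift in (12, 8, 4, 0)]


def changed_bits(old_id, new_id):
    """Return the bit positions (0 = least significant) that differ between two IDs."""
    diff = old_id ^ new_id
    return [bit for bit in range(16) if diff & (1 << bit)]


# Assumed IDs from the public PCI ID list; check your own card with lspci -nn.
GTX680 = 0x1180
QUADRO_K5000 = 0x11BA
GRID_K2 = 0x11BF

for name, target in (("K5000", QUADRO_K5000), ("Grid K2", GRID_K2)):
    print(name, [hex(n) for n in nibbles(target)], changed_bits(GTX680, target))
```

Under those assumed IDs, both targets differ from the stock GTX680 only in the 3rd and 4th nibbles, which is why the resistor on the 3rd nibble is the one that matters.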

It is perhaps worth noting that my experience with modified Fermi cards has been 100% problem-free (Q2000, Q5000 and Q6000 mods have all always worked flawlessly).

Kepler modding has been somewhat unpredictable for me. I had a GTX690 card that wouldn't cooperate for some reason, and another that works fine, only I cannot use it due to a motherboard bug (avoid anything with NF200 PCIe bridges if you want to virtualize). I have a GTX680 card that is quite thoroughly modified into a K5000, but it is also not trouble-free (read back on the thread about the bizarre dual-link DVI issue: single-link DVI modes work fine, dual-link modes don't, and neither does DP, but all this is only a problem when running virtualized; on bare metal the card works absolutely fine on all ports and in all modes). Nobody else reported similar issues (and in fact, many people reported a resounding success with modified Keplers), so this is probably just my talent for finding bugs in everything showing up.

So if you can live with the performance, and you want to put in the least possible amount of effort on a relatively tight budget, GTX470 or GTX480 is probably the best price/performance/ease compromise.

Does that answer your question?

foxdie · « **Reply #624 on:** November 01, 2013, 12:30:57 pm »

Hi gordan, you're a legend for getting back to me so quick.

Budget isn't an issue (within reason), I'm looking to set up a virtualised gaming rig much like yourself. There'll be a Windows 7 (64-bit ent) VM that'll need as much 3D gaming grunt as possible and a couple of other VMs that need some acceleration to be responsive and usable (possibly 3D too). Naturally I want to go for as much power as possible, I plan to use the machine for current and next gen gaming.

My system will be a SuperMicro X9SRA motherboard with a Xeon E5-2620 v2 and 32GB Reg ECC RAM.

I thought about converting a GTX 480 into a Q6000 as they're fairly cheap to pick up used on ebay, however I'm not sure which brand would follow reference design to make the modification less of a headache.

What wasn't made clear earlier was that a Q6000 clone (GTX 470 / 480) can be used as vSGA with 6 guests; that's a great piece of information for those looking to accelerate 3D on multiple VMs on a budget.

What would also be handy to know, and again I assume this is probably beyond scope of this thread, would be if multiple Q6000 clones can be added to a system, one passthrough'd to one VM directly for as much acceleration as possible, and a second Q6000 clone distributing as vSGA between remaining VMs? GTX 480s can be picked up second hand for around £100 each on eBay so they'd make a great price vs benefit starting point.

As for the GTX 680, I was secretly hoping you would have found a solution by now, but as with all things of this nature, it wouldn't be too easy or everyone would be doing this to their cards.

