> .. slow 8 year old Volta GPU with only 16GB of RAM
My thought process was to test with a $40 card and a makeshift heatsink, just to get the connections validated. After that, I was planning to move to a P100: still an old card, but it does the job for running quantized models or small fine-tunes.
> The cost of spinning a board with an SXM2 interface conversion function is likely to massively exceed the cost of buying a modern GPU with 2-10x the performance of this card.
At worst, I will write it off as a learning experience.
Based on my research so far (which I still need to compile and share):
- The SXM2 differential pins map 1:1 to PCIe lanes: 32 differential pairs, plus SMBus, REFCLK and the other PCIe housekeeping pins.
- The rest of the pins are power pins and NVLink pins. NVLink is optional and can probably be left unconnected (still to be confirmed).
- For LLM inference, the major bottleneck is GPU memory bandwidth. I can take a hit in CPU-to-GPU bandwidth: once the model is loaded, the PCIe link is used much less, so a link downgrade might be acceptable.
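To put rough numbers on that last point, here is a back-of-envelope sketch. Everything in it is an assumption for illustration: a PCIe 3.0 link (8 GT/s per lane with 128b/130b encoding), the 732 GB/s spec-sheet memory bandwidth of the 16GB P100, a hypothetical 8 GB quantized model, and the usual simplification that single-stream decoding reads every weight once per token. Real throughput will be lower on both sides.

```python
# Back-of-envelope only; every constant here is an assumption, not a measurement.

# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, 8 bits per byte
# -> roughly 0.985 GB/s per lane per direction (theoretical).
PCIE3_GBS_PER_LANE = 8 * (128 / 130) / 8


def diff_pairs(lanes: int) -> int:
    """Each PCIe lane is one TX pair plus one RX pair."""
    return lanes * 2


def pcie_bandwidth_gbs(lanes: int) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s."""
    return lanes * PCIE3_GBS_PER_LANE


def load_time_s(model_gb: float, lanes: int) -> float:
    """Best-case time to copy the weights to the GPU once."""
    return model_gb / pcie_bandwidth_gbs(lanes)


def decode_tokens_per_s_ceiling(model_gb: float, hbm_gbs: float) -> float:
    """Upper bound for batch-1 decoding if each token reads all weights once."""
    return hbm_gbs / model_gb


MODEL_GB = 8.0   # hypothetical quantized model size
HBM_GBS = 732.0  # assumed P100 16GB spec-sheet memory bandwidth

if __name__ == "__main__":
    ceiling = decode_tokens_per_s_ceiling(MODEL_GB, HBM_GBS)
    print(f"decode ceiling (independent of link width): {ceiling:.1f} tok/s")
    for lanes in (16, 8, 4, 1):
        print(
            f"x{lanes:<2} ({diff_pairs(lanes):>2} diff pairs): "
            f"{pcie_bandwidth_gbs(lanes):5.2f} GB/s, "
            f"one-time load {load_time_s(MODEL_GB, lanes):5.2f} s"
        )
```

Under these assumptions, narrowing the link only stretches the one-time load, while the decode ceiling depends on memory bandwidth alone. That is the reasoning behind accepting a downgraded link; it would not hold for workloads that stream data over PCIe continuously.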
For the PCB:
- I am aiming for as frugal an overall solution as possible.
- The PCB will be bare-bones: SXM socket pins to PCIe male pins only.
- In addition to the data traces, power will come from either a PCIe GPU power header or plain 12V pins.
- Looking at the dimensions of some PCIe extension cables, a 100x100 board should do it. PCBWay's or JLCPCB's cheap 4-layer option should cover it.
- The biggest hurdle/cost sink is the BGA socket with 400 pins. Of the two sockets, only one is required if NVLink is not used.
- My first experiment is going to be checking whether a PCB cutout can be used in place of the socket: basically a BGA pad pattern placed on top of a supporting structure that has mounting screws, connected with either soldered pins or high-density headers. This still needs more research.