NVIDIA Releases Open-Source GPU Kernel Modules

--- Quote from: magic on May 14, 2022, 08:50:23 am ---Not sure what you are talking about.

--- End quote ---

Keyword[]={ GSP-RISC-V }

Every GPU loads a piece of firmware, but this one's hefty: 34 MB, with ~900 functions implemented! Hence we can affirm that with NVIDIA, a *good portion* of the features that AMD and Intel drivers implement in the kernel are instead provided by a binary blob inside the GPU, and this blob runs on the GSP, which is a RISC-V core(1).

You actually have a programmable GPU! We have already read some good articles like this(2), but this NVIDIA solution looks more like the one I saw in 2001, where a RISC-like CPU was paired with a dual-port RAM and a DAC to create a 2D-accelerated video card.

It was a CPU, running software, re-programmable, not dedicated hardware.

(1) Only available on Turing and newer GPUs.
(2) General-purpose computing on graphics processing units

NVIDIA Turing : RISC-V = Rendition Vérité 1000 : V1K-RISC

The V1000 was only a slow V1K-RISC CPU @25 MHz. It had a single-cycle 32×32 multiply, occupying a solid part of the chip; a single-cycle instruction for computing an approximate reciprocal, i.e. a two-cycle approximate integer division; and the usual set of RISC instructions, but encoded in the custom "V1K-RISC" format. Oh, and another "bilinear load" instruction that read a 2x2 block of linear memory and performed bilinear filtering based on the fractional u and v values passed to the instruction. The texture map had a tiny cache, seemingly only 4 pixels, so when a perfectly matching 2x2 block reappeared, you got a reduction in the load on the memory bandwidth.

A very old product, rare to see, and nobody remembers open-source drivers ever really working with X11.

There are ~20 years between NVIDIA Turing and the Vérité 1000, but they really look similar: no open-source drivers, everything achieved by reverse engineering the binary-only PCI drivers.

Oh, and when it somehow works, well, it's always a bit bumpy  :-//

Nominal Animal:
Closed-source Linux kernel drivers make kernel issues undebuggable, because of the structure of the Linux kernel: a driver can do anything, and crashes usually occur somewhere else completely.  Use-after-frees, stale (incorrect) addresses scribbling over unrelated kernel memory, and so on, are typical examples of this.  (Many end users have a difficult time believing this, and insist that "my [closed source] drivers cannot be the cause, as other users would have reported it also, so it must be [code I'm responsible for]!" Yet, if the closed source drivers are never loaded in the first place, the bug does not occur... It is unbelievably frustrating for someone like me just trying to help.)

Closed firmware running on a peripheral communicating via a Linux kernel controlled bus (including address translation and DMA engines), on the other hand, poses no problems in this respect.  Besides, a typical kernel developer does not have the necessary documentation and/or hardware tools to safely develop custom firmware for arbitrary devices.

Because of this, I personally do not care much what internal firmware a graphics card might run, as long as its access to any other hardware and CPU-accessible memory is completely controlled by the Linux kernel.  As long as all code running on the CPU is open source, I can at least try to debug and fix issues, and that's good enough for me.

One problem with the infamous proprietary binary blob is long-term support, especially in case of larger changes in the Linux kernel. Manufacturers can drop support for their binary blob at any time, for any reason. When that happens and the driver requires significant changes due to kernel changes, you're stuck, because you can't modify the binary blob. You could try to add some compatibility/translation layer for the outdated blob, but this would be just :horse:.

Nominal Animal:
Yep, what madires wrote.

A major reason the Linux kernel is so darned versatile –– I mean, being used in everything from phones to routers to desktops to servers to HPC clusters –– is that it is internally modular, with no stable internal APIs.  The userspace interface (syscalls, the /proc and /sys pseudofilesystems) should be stable, and mostly is (with limited exceptions), but internally everything is subject to change and refactoring.  The amount of refactoring-like code churn in it is stupefying.  I mean, it is no longer possible for a single human to keep track of all the changes by themselves.  (That's why Linus delegates subsystems to proven, capable developers.)

(All that said, I haven't taken a look at the sources to find out exactly what NVidia is doing; whether they open sourced just a shim, or a full kernel driver with any closed source blobs running on the GPU only.  I don't have any NVidia hardware, and prefer OpenCL over CUDA anyway, so it does not affect me right now.)

It would be darned nice to have a computing ASIC, preferably one with a large number of cores able to do 4-component Euclidean and homogeneous coordinate vectors, basic vector algebra (dealing with anything 3D), and basic real and complex algebra, using double-precision floating point at full speed, plus a memory model with extremely fast read-only "global" memory access (data lookup) and read-write "local" memory accesses; perhaps some kind of "page" per core.  Forget about trigonometric and special functions: just super-fast, basic mathy cores in parallel.  That is what could give us a new order of magnitude of efficiency in many simulations (MD, FEM, most 3D non-field/QM stuff).  Do note that graphics doesn't need double precision; single precision suffices for human-vision-related stuff, but not really for the kinds of simulations I'm talking about.
