NVIDIA Releases Open-Source GPU Kernel Modules


nctnico:

--- Quote from: madires on May 14, 2022, 02:51:44 pm ---One problem with the infamous proprietary binary blob is long-term support, especially in the case of larger changes in the Linux kernel. Manufacturers can drop support for their binary blob at any time, for any reason. When that happens and the driver requires significant changes due to kernel changes, you're stuck because you can't modify the binary blob. You could try to add a compatibility/translation layer for the outdated blob, but that would be just :horse:.

--- End quote ---
That is mainly a problem caused by the Linux kernel developers changing internal structures at will without really thinking things through. From a software maintenance perspective the Linux kernel is a hot mess, so I am very grateful to any commercial hardware vendor that provides Linux support. It is a thankless job to keep shooting at a moving target.

To be honest, it wouldn't surprise me if every new kernel version introduces as many new bugs as it fixes (or adds as features). One example I ran into many years ago: I had a problem with an SoC that wouldn't always come back up after a system reboot (the reboot command). It turned out that the code that reset the power management to a voltage level at which the processor could run at its default speed had been removed in the kernel version I happened to be using (some developers thought removing it was a good optimisation). Occasionally the voltage regulator was set to a lower voltage just before the processor was reset through software, leaving the processor stuck because the voltage was too low for the frequency. It turned out this bug affected a few PC platforms as well.

Nominal Animal:

--- Quote from: nctnico on May 14, 2022, 06:38:37 pm ---That is mainly a problem caused by the Linux kernel developers changing internal structures at will without really thinking things through.

--- End quote ---
As I just explained, "without really thinking things through" != reality.  It is actually a very common, very silly misconception.

If the Linux kernel developers did not change internal structures as needed, the kernel would not have the capabilities or the hardware support it has today.  Simply put, the lack of a rigid internal interface is the cost of being so versatile.  Feel free to disagree, but that is the actual reasoning among Linux kernel developers, the people who do this stuff day in, day out.

As to SBCs and Linux running on SoCs, I still haven't seen a clean, well-ordered toolchain/kernel SDK from a commercial vendor.  They look more like Linux newbies' development trees, not something put together with a sensible design.  Routers are a perfect example: just compare a plain OpenWrt installation to, say, Asus router images or the Realtek SDK to see for yourself.  The latter are laughably messy, like something put together by a high-schooler.  Which is also why I wish more people (especially those who might become integrators building such images at some point) would learn Linux From Scratch, git, and historical material like the Linux Standard Base and the Filesystem Hierarchy Standard, the Unix philosophy and software minimalism, and the technical reasons why so many core developers dislike systemd.  It is not that hard to do Linux integration properly; it is just that newbies (with a proprietary software background) make the same mistakes again and again.

Kernel bugs do occur, of course, but that is because most developers today, Linux kernel developers included, are more interested in adding new functionality and features than in fixing bugs and making things robust.  Userspace-breaking changes get reverted (and the language Linus used to use to berate the authors of such breaking changes is exactly what so many people complained about; pity, I liked the sharpness).  So for the least maintenance work, you want to be able to upgrade to newer vanilla kernels, while recommending LTS kernels that you test and help maintain yourself.  That is minimal work compared to testing and maintaining your own kernel fork.

It might be that NVIDIA is feeling pressure on this front from the HPC field.  It is well known nowadays that if you do e.g. CUDA stuff, bug-wise you're on your own; nobody outside NVIDIA can actually help you.  (There is a related tale about the "tainted" flag the Linux kernel sets when a binary-only driver has been loaded.  Some users, apparently believing that Linux kernel developers should be telepathic and clairvoyant, able to fix bugs even when the kernel data structures have been accessed by unknown code, argued that the flag should not apply to NVIDIA drivers and is "just offensive"...  :palm:  See the first paragraph describing this in the Linux kernel documentation, and consider the precise and gentle language used.  Heh.)
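For the curious: the taint state mentioned above is just a bitmask the kernel exposes in /proc/sys/kernel/tainted, and loading a proprietary module such as the classic NVIDIA blob sets bit 0. A minimal sketch of decoding that value in Python (the flag subset here follows the kernel's tainted-kernels documentation; newer kernels define additional bits):

```python
# Decode the Linux kernel taint bitmask, as read from /proc/sys/kernel/tainted.
# Subset of flags from Documentation/admin-guide/tainted-kernels.rst;
# newer kernels define further bits not listed here.
TAINT_FLAGS = {
    0: "P: proprietary module was loaded",
    1: "F: module was force-loaded",
    4: "M: machine check exception occurred",
    7: "D: kernel died recently (OOPS or BUG)",
    9: "W: kernel issued a warning (WARN)",
    12: "O: out-of-tree module was loaded",
    13: "E: unsigned module was loaded",
}

def decode_taint(value: int) -> list[str]:
    """Return a human-readable description for each taint bit set in value."""
    return [desc for bit, desc in TAINT_FLAGS.items() if value & (1 << bit)]

# 4097 = bit 0 (proprietary) + bit 12 (out-of-tree), the typical combination
# after loading the classic proprietary NVIDIA driver.
print(decode_taint(4097))
```

On a live system you would feed it `int(open("/proc/sys/kernel/tainted").read())`; a value of 0 means the kernel is untainted and upstream developers will look at bug reports.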

magic:

--- Quote from: Nominal Animal on May 14, 2022, 08:52:12 pm ---It might be that NVIDIA is feeling pressure on this front from the HPC field.  It is well known nowadays that if you do e.g. CUDA stuff, bug-wise you're on your own; nobody outside NVIDIA can actually help you.
--- End quote ---
The fact that desktop support is "alpha quality" is a big hint ;)

I think it's not only HPC but also the big "machine learning" farms, particularly the "cloud" ones available for hire. They must feel at least a little uneasy about the security implications of allowing network-facing applications, or applications written by total strangers, to call into a proprietary blob in the kernel. And of course, maintaining a tainted kernel is an extra headache too.
