After this project, I'd like to play with a simple OS not written in C/C++.
I mean, I'd like to compile (and maybe modify) some good piece of code, upload it onto a board (68k? modern STM32?), and play with it.
Ada? Pascal? Modula2? Oberon? Assembly? All welcome
Other than assembly language, those are all isomorphic to C/C++, certainly the GNU version if not the standard (e.g. with nested functions possible). They differ only in surface syntax, the standard library, and things such as how visibility of names is controlled. Generated code is identical. The same goes for Lisp.
Really? Can you get buffer overflows in Ada using the normal, cliché programming style? Or access/mutate aCamel as if it were aHorse?
You don't get buffer overflows in properly-written C (or especially C++) code.
I wondered how long it would be until the No True Scotsman fallacy would rear its head! https://en.m.wikipedia.org/wiki/No_true_Scotsman
That is an unfair statement.
Unlike C++, C has two different "modes": the hosted environment and the freestanding environment. The former includes the standard C library – functions like fgets(), strcpy(), and so on – whereas the latter is used when programming kernels and microcontrollers, often with a replacement set of functions (see the Linux kernel's C functions, or the Arduino environment, for examples).
Buffer overflows are an intrinsic part of the standard C library, but not of the C freestanding environment. It is quite possible to replace the standard C library with a completely different API, including arrays with explicit bounds and garbage collection, while keeping the C compiler and syntax.
Therefore, this is not a No True Scotsman argument, because it identifies the problematic but optional half of C. I know this, because I myself am working on a "better" substitute (for my own needs and uses).
So you think it is an acceptable argument to exclude "half" of C uses and concentrate on the less inconvenient other half?
No, I think it is an error to consider the hosted C environment the only valid C, and to call relying on freestanding C 'excluding "half" of C', when the main problems are all in the hosted C environment only.
The above linked argument is not a case of No True Scotsman, exactly because of the dual nature of the C standard, and the problems being avoidable by using freestanding C. It is a valid argument, because many C developers are not aware of the differences between the hosted environment and the freestanding environment, and conflate the two.
(Technically, one does not even need to write freestanding C, just avoid most of the standard C library API and use other APIs instead: say, Boehm GC for memory allocation, and mutable data buffers with explicit bounds instead of standard C strings. That almost completely avoids both buffer overruns and dynamic memory management problems.)
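A minimal sketch of the "mutable data buffers with explicit bounds" idea above, in plain C. The type and function names are illustrative, not from any existing library:

```c
#include <stddef.h>
#include <string.h>

/* A counted buffer: the capacity travels with the data, so every
   access can be bounds-checked. Names are illustrative only. */
typedef struct {
    size_t size;   /* bytes available in data[] */
    size_t used;   /* bytes currently stored */
    char   data[]; /* flexible array member (C99) */
} buffer;

/* Append at most as many bytes as fit; returns the count copied.
   Truncation replaces the silent overflow of e.g. strcpy(). */
size_t buffer_append(buffer *buf, const char *src, size_t len)
{
    size_t room = buf->size - buf->used;
    if (len > room)
        len = room;
    memcpy(buf->data + buf->used, src, len);
    buf->used += len;
    return len;
}
```

A real replacement library would add allocation helpers (possibly on top of Boehm GC, as suggested above), but even this much makes the bounds part of the data structure rather than the caller's responsibility.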
I fully agree that the C standard committee has dropped the ball over two decades ago, mainly due to increased vendor pressure and complete rejection of the POSIX standard, and instead veering into C++ and vendor-specific optional interfaces (like the so-called "safe I/O functions", which are nothing of the sort).
Put another way, buffer overruns and dynamic memory management issues are not an inherent part of C, only of the library that forms the core of the hosted C environment: the standard C library. It is quite possible, and indeed very feasible, to replace or augment the standard C library with something completely different that 1) does not suffer from buffer overruns, because array boundaries are part of the data structures used by the replacement library's interfaces, and 2) has efficient automatic garbage collection; and the code will still be C that a typical C developer can read and maintain. To develop such code, a typical C developer will have to learn the new interfaces, but that's it.
Having experimented with and delved into this, I am amazed that no real work has been published on this front: I'm basically drowning in possibilities, and have to write a lot of test code just to see which options I prefer, for code running under the Linux kernel on typical ARM and Intel hardware.
To me, it feels like computer scientists are arguing amongst themselves about how many sides a polygonal wheel should have, completely ignoring round, circular wheels... We really have not made much real progress in software engineering (and I suspect in computer science too) in the last two or three decades. Small optimizations only.
Things like the Arduino library (which replaces the standard C library for Arduino development; and although the code is compiled with a C++ compiler, it relies on the GNU C++ compiler providing a freestanding C++ environment on top of the C freestanding environment) are honestly quite horrible, possibly even worse than the standard C library. I shan't say much about the various vendor-provided Hardware Abstraction Libraries, except that every single one I've seen has been a disappointment in the software engineering sense – compare a contractor seeing a house built of timber, but held together with twine instead of nails or screws (or even pegs).
The dual nature is important to realize, because the non-library part of C is so simple yet powerful. It could be much better (code-level concurrency, barriers, memory semantics/coherency, etc.), but that sort of thing is better explored in other languages. However, the C library part, which is not a compulsory part of C, only an optional one, is the biggest problem with C, and it is easily replaced with something else. (All the C compilers I have used provide compile-time flags and options that make it trivial to replace or omit the standard C library.)
many C developers are not aware of the differences between the hosted environment and the freestanding environment, and conflate the two.
I fully agree that the C standard committee has dropped the ball over two decades ago, mainly due to increased vendor pressure
Put another way, buffer overruns and dynamic memory management issues are not an inherent part of C;
To me, it feels like computer scientists are arguing amongst themselves how many sides should a polygonal wheel have, completely ignoring round, circular wheels... We really have not made much real progress in software engineering (and I'm suspecting in computer science too) in the last two or three decades. Small optimizations only.
Things like the Arduino library ... are honestly quite horrible, possibly even worse than the standard C library. I shan't talk much about the various vendor-provided Hardware Abstraction Libraries
The dual nature is important to realize, because the non-library part of C is so simple yet powerful.
It could be much better (code-level concurrency, barriers, memory semantics/coherency etc.), but that sort of stuff is better explored with other languages.
However, the C library part, which is not a compulsory/required part of C, only an optional part, is the biggest problem with C, and is easily replaced with something else.
Using C without the standard library doesn't make it any safer or more secure as a language. Proof of which is the long history of Linux kernel level exploits. Or exploits in no-OS network appliances. The MITRE database is a good place to look for examples.
To think that "libraries" are somehow a problem and not using them will improve the situation is odd. It is simply difficult to write secure code in C. It requires a certain mindset and a lot of experience. Rare traits in the industry.
Unlike C++, C has two different "modes": the hosted environment and the freestanding environment. The former includes the standard C library – functions like fgets(), strcpy(), and so on – whereas the latter is used when programming kernels and microcontrollers, often with a replacement set of functions (see the Linux kernel's C functions, or the Arduino environment, for examples).
Buffer overflows are an intrinsic part of the standard C library, but not of the C freestanding environment. It is quite possible to replace the standard C library with a completely different API, including arrays with explicit bounds and garbage collection, while keeping the C compiler and syntax.
Can't believe I missed this thread. Right up my street.
100% agree with you. There is no way to make C and/or C++ a "safe" environment to write software in. It is absolutely 100% impossible. The problem is that the compiler's fundamental model is memory, regardless of what suit you dress your code in: whichever macros or libraries you use, whether or not Coverity has buggered you for cash, and whether or not you have used clever compiler features to trip up people attacking your code.
Using C without the standard library doesn't make it any safer or more secure as a language. Proof of which is the long history of Linux kernel level exploits. Or exploits in no-OS network appliances. The MITRE database is a good place to look for examples.
To think that "libraries" are somehow a problem and not using them will improve the situation is odd. It is simply difficult to write secure code in C. It requires a certain mindset and a lot of experience. Rare traits in the industry.
The original standard C library is certainly a problem. It's full of functions with no internal size checks. A good library isn't a magic cure for problems, but the original standard C library is like a banana skin.
typedef double float_t;
I stand by my original points.
I'll highlight and comment on a few of your points below...
There is no way to make C and/or C++ a "safe" environment to write software in. It is absolutely 100% impossible.
Unlike C++, C has two different "modes": the hosted environment and the freestanding environment. The former includes the standard C library – functions like fgets(), strcpy(), and so on – whereas the latter is used when programming kernels and microcontrollers, often with a replacement set of functions (see the Linux kernel's C functions, or the Arduino environment, for examples).
Buffer overflows are an intrinsic part of the standard C library, but not of the C freestanding environment. It is quite possible to replace the standard C library with a completely different API, including arrays with explicit bounds and garbage collection, while keeping the C compiler and syntax.
If you believe that, how do you explain the regular exploits like:
100 Million More IoT Devices Are Exposed and They Won't Be the Last (WiReD)
Gabe Goldberg <gabe@gabegold.com>
Wed, 14 Apr 2021 19:41:06 -0400
The Name:Wreck flaws in TCP/IP are the latest in a series of vulnerabilities with global implications.
https://www.wired.com/story/namewreck-iot-vulnerabilities-tcpip-millions-devices/
or
A Casino Gets Hacked Through a Fish-Tank Thermometer (Entrepreneur)
Amos Shapir <amos083@gmail.com>
Fri, 16 Apr 2021 17:49:35 +0300
Hackers gain entry to a casino's internal net via a fish tank, and steal list of customers:
https://www.entrepreneur.com/article/368943
Both of those are from yesterday's comp.risks (Volume 32 Issue 60, Saturday 17 April 2021), which everybody should be reading.
See https://catless.ncl.ac.uk/Risks/
I stand by my original points.
I really only object to the No True Scotsman fallacy claim.
I'll highlight and comment on a few of your points below...
I understand your points, and fully acknowledge their basis in fact; I only disagree with some of your conclusions.
As an example, consider PHP: widely used, but usually pretty horrible. Especially its earlier versions were basically a security hole waiting to happen (the magic quotes stuff in particular). Yet one could write quite secure web service code in it, if one paid sufficient attention and avoided the features that usually lead to security problems. I know, because I have.
However, many of those security holes have since been plugged (magic quotes are no longer supported, database interfaces switched from building query strings to using variable references so quoting is not even an issue, and so on). The most problematic design principle currently is that most PHP services are designed to be able to upgrade themselves, which necessarily means the installation is vulnerable to script drops/bombs et cetera. We could avoid that, and even things like password leaks, if we leveraged the POSIX/Unix user and group hierarchies: server interpreters refusing to execute code owned by the user that can upload content to the server, and login/logout/account management facilities restricted to a few specific pages, with all others not even having access to the sensitive fields of the user database...
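The ownership rule described above ("refuse to execute code owned by the user that can upload content") can be sketched with POSIX stat(). This is a hypothetical policy helper with illustrative names, not code from any existing interpreter:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical policy check: refuse to execute a script that is
   owned by the account allowed to upload content to the server.
   Returns 1 if execution is acceptable under this policy, else 0. */
int safe_to_execute(const char *path, uid_t upload_uid)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;                   /* missing or unreadable: refuse */
    return st.st_uid != upload_uid; /* refuse uploader-owned files */
}
```

A script dropped through an upload hole would be owned by the upload account, so the interpreter would simply refuse to run it, even if every other part of the service were compromised.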
We could do better with Python, but unfortunately Python insists on its "own" WSGI interfaces (as opposed to, say, FastCGI). (In particular, a page engine can be written as a FastCGI script, with each request (connection) served by a forked child. The engine can preload each instance with the main data structures, like navigation and supported file types, deduplicating most of the work done on most page loads.) As a result, typical widely used Python-based web services are vulnerable to similar bugs as PHP ones, on top of their own WSGI ones! No true forward development, just... steps in odd directions, in my opinion.
This "proves" to me that current software bugs and insecurity are really not a feature of the respective programming languages, but a consequence of us human developers accepting a software "engineering" culture that has discarded almost all good engineering principles, and is just sticking stuff together with spit and bubblegum, banking on the product working just long enough that nobody is held responsible for the crappiness.
And that we should not blame the languages for not trying to stop the developers from implementing idiotic designs.
Can low level programming be made totally "safe" just from features of the language itself?
I agree with Nominal Animal about good engineering practice being largely replaced with silver bullets and tools.
Various security reports I have read don't show C as the language with the most security exploits in practice, though. I would have to dig a little to provide links, but AFAIR PHP, JavaScript, and even Java were largely the sad "winners" here. That doesn't necessarily show anything about the languages themselves, though, but rather *how* (and probably by whom) they are typically used.
Unlike C++, C has two different "modes": the hosted environment and the freestanding environment. The former includes the standard C library – functions like fgets(), strcpy(), and so on – whereas the latter is used when programming kernels and microcontrollers, often with a replacement set of functions (see the Linux kernel's C functions, or the Arduino environment, for examples).
Buffer overflows are an intrinsic part of the standard C library, but not of the C freestanding environment. It is quite possible to replace the standard C library with a completely different API, including arrays with explicit bounds and garbage collection, while keeping the C compiler and syntax.
If you believe that, how do you explain the regular exploits like:
100 Million More IoT Devices Are Exposed and They Won't Be the Last (WiReD)
Gabe Goldberg <gabe@gabegold.com>
Wed, 14 Apr 2021 19:41:06 -0400
The Name:Wreck flaws in TCP/IP are the latest in a series of vulnerabilities with global implications.
https://www.wired.com/story/namewreck-iot-vulnerabilities-tcpip-millions-devices/
or
A Casino Gets Hacked Through a Fish-Tank Thermometer (Entrepreneur)
Amos Shapir <amos083@gmail.com>
Fri, 16 Apr 2021 17:49:35 +0300
Hackers gain entry to a casino's internal net via a fish tank, and steal list of customers:
https://www.entrepreneur.com/article/368943
Both of those are from yesterday's comp.risks (Volume 32 Issue 60, Saturday 17 April 2021), which everybody should be reading.
See https://catless.ncl.ac.uk/Risks/
I explain those by pointing out that possible ≠ easy.
It is much, much easier to write buggy C than it is to write robust, secure C code. C is a dangerous tool, but so powerful and useful that many choose to use it nevertheless.
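To make the "much easier to write buggy C than robust C" point concrete, here is an illustrative contrast (my example, not from the thread) between the classic unsafe idiom and a bounds-aware equivalent:

```c
#include <string.h>

/* The classic trap: nothing stops src from overrunning dst. */
void risky_copy(char *dst, const char *src)
{
    strcpy(dst, src);   /* undefined behaviour if src doesn't fit */
}

/* A bounds-aware version: the caller states dst's capacity, and
   the copy truncates instead of overflowing. Illustrative name. */
size_t careful_copy(char *dst, size_t dstsize, const char *src)
{
    size_t n;
    if (dstsize == 0)
        return 0;          /* no room at all, not even for '\0' */
    n = strlen(src);
    if (n >= dstsize)
        n = dstsize - 1;   /* leave room for the terminator */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;              /* bytes copied, excluding the '\0' */
}
```

The point is not that the careful version is hard to write; it is that the language never forces you to choose it over strcpy(), so the three-extra-lines version loses by default.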
Nothing is perfect, so where do you draw the line for "safe"? Even if the code is guaranteed to work perfectly for all possible inputs on standard hardware, there are glitches. Most current AMD64 architecture laptop and desktop machines do not support ECC memory, so single-bit errors can occur because of an odd cosmic ray, or for a number of other reasons.
I don't think it is a No True Scotsman argument to say something like "a properly trained person will never point a firearm at anything or anyone, even when the safety is on, unless they are ready to kill them". This is a simple rule known for as long as firearms have existed; but the firearm itself does not enforce the rule. A lot of people (including soldiers) do not know or neglect to follow that rule, so accidents happen, and people get killed. With C, bugs and security failures occur often because the developers do not care (about making the code secure against unexpected inputs); I've heard countless times that "we don't have the time for that right now; we'll add those in later".
I'm not sold on the C/C++ is evil because of the dreaded buffer overrun. Some asshat can make some sloppy C code and chances are good it will only crash at best. But an interpreter, ironically written in C, could have a bug which could expose 100s of thousands of users to an exploit.
I'm not sold on the C/C++ is evil because of the dreaded buffer overrun. Some asshat can make some sloppy C code and chances are good it will only crash at best. But an interpreter, ironically written in C, could have a bug which could expose 100s of thousands of users to an exploit.
Better that exploit gets fixed in one patch than 100,000 independent programs in C that don’t.
It’s even lower level than that. Solve a problem once, properly. Not a million times, badly.
It’s even lower level than that. Solve a problem once, properly. Not a million times, badly.
NIH is endemic in the C++ community, from academia to grunts in the trenches.
Too many of the academic papers on C++ reference only other C++ papers. To give a contrary example, Gosling's Java whitepaper was notable for nicking concepts from many other languages, where each concept had been proven in practice and all concepts played nicely with each other.
It’s even lower level than that. Solve a problem once, properly. Not a million times, badly.
NIH is endemic in the C++ community, from academia to grunts in the trenches.
Too many of the academic papers on C++ reference only other C++ papers. To give a contrary example, Gosling's Java whitepaper was notable for nicking concepts from many other languages, where each concept had been proven in practice and all concepts played nicely with each other.
I think that’s out of necessity. Someone has to write their own “framework” at every C++ house I’ve seen and work to some poorly defined subset of the language which doesn’t have so many foot guns.
Rust is the same. But the academic papers are blog posts and brigading on tech news aggregators.
Similar with Go. It’s that fuzzy joy that Java was (if you avoided J2EE 1.x )
I'm not sold on the C/C++ is evil because of the dreaded buffer overrun. Some asshat can make some sloppy C code and chances are good it will only crash at best. But an interpreter, ironically written in C, could have a bug which could expose 100s of thousands of users to an exploit.
Better that exploit gets fixed in one patch than 100,000 independent programs in C that don’t.
Only after the impact of a high-yield exploit which has been refined to maximize the damage, versus random programs with bad code that hackers can't be bothered attacking.
I'm not sold on the C/C++ is evil because of the dreaded buffer overrun. Some asshat can make some sloppy C code and chances are good it will only crash at best. But an interpreter, ironically written in C, could have a bug which could expose 100s of thousands of users to an exploit.
Better that exploit gets fixed in one patch than 100,000 independent programs in C that don’t.
Only after the impact of a high-yield exploit which has been refined to maximize the damage, versus random programs with bad code that hackers can't be bothered attacking.
And that we should not blame the languages for not trying to stop the developers from implementing idiotic designs.
Again, agreed.
But when the fundamental properties of a language mean that large applications are "castles built on sand", we shouldn't shy away from recognising that choosing a different language ought to mean the "castle is built on rock". It is, of course, possible to choose a different language so that the "castle is built on a swamp".
We should distinguish swamps and sand from rock, and choose rock wherever possible.