The #1 thing is to ban free() and base a C library on garbage collection. I disagree with Apple and prefer tracing over reference counting. The Boehm library works very well, with a few modifications -- I vastly reduced the amount of static memory needed, optimized the object size buckets, and a number of other things.
The other big thing is to ban arbitrary large object allocations.
- Choose a maximum malloc() size which might be 256 bytes or 4K or whatever. This allows you to vastly decrease fragmentation by keeping pools of objects of every possible size -- and there are only a small number of possible sizes.
- larger objects are a fixed-depth tree of nodes of the maximum size (except for possibly the last node at each depth), with each node containing an array of pointers to the next level nodes. With a small machine or big maximum object size the tree might be only two levels, with one layer of pointers and the second layer being the actual user data.
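A minimal sketch of the two-level variant described above, assuming a 4 KiB maximum allocation; the names (bigobj, big_alloc, big_at) are made up here for illustration:

```c
#include <stdlib.h>

#define CHUNK 4096   /* assumed maximum malloc() size */
/* Pointers per node, sized so the node itself also fits in one chunk. */
#define PTRS_PER_NODE ((CHUNK - 2 * sizeof(size_t)) / sizeof(void *))

/* A large object: one pointer node referring to CHUNK-sized data chunks. */
typedef struct {
    size_t size;                  /* total user bytes */
    size_t nchunks;               /* number of data chunks */
    void  *chunk[PTRS_PER_NODE];  /* second level: the user data */
} bigobj;

/* Allocate a large object as a two-level tree of maximum-size nodes,
 * so the underlying allocator only ever sees pool-sized requests. */
static bigobj *big_alloc(size_t size)
{
    size_t nchunks = (size + CHUNK - 1) / CHUNK;
    if (nchunks > PTRS_PER_NODE)
        return NULL;              /* would need a deeper tree */
    bigobj *o = calloc(1, sizeof *o);
    if (!o)
        return NULL;
    o->size = size;
    o->nchunks = nchunks;
    for (size_t i = 0; i < nchunks; i++) {
        o->chunk[i] = malloc(CHUNK);
        if (!o->chunk[i]) {       /* unwind on failure */
            while (i--)
                free(o->chunk[i]);
            free(o);
            return NULL;
        }
    }
    return o;
}

/* Byte access: index through the tree. */
static unsigned char *big_at(bigobj *o, size_t off)
{
    return (unsigned char *)o->chunk[off / CHUNK] + off % CHUNK;
}
```

With a bigger maximum object size the same scheme extends to more levels; the depth is fixed in advance, so access cost stays O(depth).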
Quote: "The #1 thing is to ban free() and base a C library on garbage collection. I disagree with Apple and prefer tracing over reference counting. The Boehm library works very well, with a few modifications -- I vastly reduced the amount of static memory needed, optimized the object size buckets, and a number of other things."
I disagree here. I think improvements to C should be made by making the language simpler, with fewer gotchas -- simpler to understand and simpler to predict. Garbage collection implementations always have corner cases and harder-to-predict behaviours (e.g. runtime slowdowns).
It's bad enough that dynamic memory allocation is already unpredictable, I don't want more of this :(
Quote: "The #1 thing is to ban free() and base a C library on garbage collection."

At the low level, daemons and such, I prefer pool allocators. That is, having each allocation belong to a context, and being able to free an entire context. This matches the needs of connection-oriented services very well, and also allows trivial implementation of per-connection memory use limits.
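A minimal sketch of that idea, with made-up names (pool, pool_alloc, pool_free_all) and the per-connection limit mentioned above:

```c
#include <stdlib.h>

/* Every allocation belongs to a context; the whole context is freed
 * at once (e.g. when a connection closes). Sketch only. */
struct pool_block {
    struct pool_block *next;
};

typedef struct {
    struct pool_block *blocks;  /* singly linked list of allocations */
    size_t used;                /* bytes handed out so far */
    size_t limit;               /* per-context memory cap, 0 = none */
} pool;

static void *pool_alloc(pool *p, size_t n)
{
    if (p->limit && p->used + n > p->limit)
        return NULL;                    /* per-connection limit hit */
    struct pool_block *b = malloc(sizeof *b + n);
    if (!b)
        return NULL;
    b->next = p->blocks;
    p->blocks = b;
    p->used += n;
    return b + 1;                       /* user memory follows header */
}

static void pool_free_all(pool *p)
{
    while (p->blocks) {
        struct pool_block *next = p->blocks->next;
        free(p->blocks);
        p->blocks = next;
    }
    p->used = 0;
}
```

A real implementation would carve allocations out of larger slabs instead of calling malloc() per object, but the interface -- allocate into a context, free the context -- is the point.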
Quote: "Make a single standard for the sizes of types and printf()-like format strings"

Using u8/u16/s32/s64 as typedefs for uint8_t/uint16_t/int32_t/int64_t, sure. Having printf()-like functions with native formatting specifiers for them, definitely.
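For contrast, here is what the current standard requires: fixed-width typedefs plus the <inttypes.h> PRI macros for formatting. The short u8/u16/... aliases are this discussion's proposal (not standard C), and format_pair is a made-up helper:

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Proposed short aliases (names from the discussion; not standard C). */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;
typedef int32_t  s32;
typedef int64_t  s64;

/* Today, printing fixed-width types portably needs <inttypes.h> macros;
 * a new library could give them first-class format specifiers instead. */
static int format_pair(char *buf, size_t n, u16 port, s64 big)
{
    return snprintf(buf, n, "port=%" PRIu16 " big=%" PRId64, port, big);
}
```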
Quote: "Saner string funcs"

Most definitely: strlcpy() (which ensures the target will contain a terminating nul byte), and also memfill(buffer, offset, size), which repeats the first offset bytes of buffer across the rest of the buffer, up to size bytes; for example, to initialize floating-point vectors or structure arrays efficiently. I also like strndup(), which is not in standard C (it's in BSD and POSIX.1) but is definitely useful.
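A sketch of the memfill() described above (not a standard function); it doubles the initialized region on each pass, so it runs in O(log(size/offset)) memcpy() calls:

```c
#include <string.h>

/* Replicate the first 'offset' bytes of 'buffer' across the rest of
 * the buffer, up to 'size' bytes total. Sketch only. */
void memfill(void *buffer, size_t offset, size_t size)
{
    unsigned char *p = buffer;
    size_t have = offset;       /* bytes initialized so far */
    if (offset == 0 || size <= offset)
        return;
    while (have < size) {
        size_t n = (size - have < have) ? size - have : have;
        memcpy(p + have, p, n); /* source and target never overlap */
        have += n;
    }
}
```

For example, to set every element of a double array to 1.0: assign element 0, then memfill(a, sizeof a[0], sizeof a).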
char *p = NULL;
size_t p_max = 0;
ssize_t p_len = my_printf(&p, &p_max, "foo-%s-%.3f", name, version);
(using current standard C library types here, for clarity). This way, if one needs to construct such strings often, they can reuse the buffer, but still get it dynamically reallocated whenever it needs to be.

I normally go for more sensible names like u8, u16, u32, u64, s8, s16, s32, s64, f32, f64 (rather than things like uint32_t, uint64_t, etc).
Quote: "Since we are talking about a new base library, why restrict to what everyone else knows and understands?"

Quote: "I normally go for more sensible names like u8, u16, u32, u64, s8, s16, s32, s64, f32, f64 (rather than things like uint32_t, uint64_t, etc)"

Why is that? Wouldn't it be sensible to use what everyone else knows and understands? One could make an argument that using something else is like aliasing 'while' as 'whilst'.
If it's a library, it's probably going to be coexisting with already well-known libraries, and being the odd one out isn't a benefit.

Because it is a replacement for the standard C library, requiring that C be compiled in freestanding mode, libraries developed against the standard C library will not be compatible.
However, I have noticed that a lot of C programmers avoid using them, and I wonder why. Is it the type name, the _t suffix? Or something else?
1 or 2-letter differences (out of 8 or so letters) are hard to spot but make big differences.
I am thinking of writing a new support library from scratch, replacing the standard C library but exposing the same (Linux/POSIX) concepts, without the historical baggage of the standard C library functions.
Quote: "The C programming language is used BECAUSE of the standard. If you replace the standard with your own stuff, there is no reason to use C anymore."
As a contrasting opinion:
The 'C programming language' is only one part of C. C is more than the official standards that compiler authors stand by. It's a combined and universal effort that other ideas, languages and libraries get almost inevitably joined to in some way.
(...)
Quote: "To be frank, that doesn't make much sense. The C programming language is used BECAUSE of the standard. If you replace the standard with your own stuff, there is no reason to use C anymore."
Umm, sometimes I happen to use C because I don't have other high level languages for certain targets.
Quote: "That's just your opinion."

Quote: "I am thinking of writing a new support library from scratch, replacing the standard C library, but exposing the same (Linux/POSIX) concepts, but without the historical baggage of standard C library functions"

To be frank, that doesn't make much sense. The C programming language is used BECAUSE of the standard. If you replace the standard with your own stuff, there is no reason to use C anymore. Then you are free to use much better, more modern programming languages.
Quote: "During the last decade, much better options than C have been developed"

Better how? That is the rub, isn't it?
Yes, I know. We all have that problem. And this will never change if developers of new libraries keep using C. If we want to advance, there is no other way than to make a step forward.
That's just your opinion.
Because it does make sense, and C is actively used for both operating systems (the Linux and BSD kernels) and embedded work, your opinion is faulty.
Better how? That is the rub, isn't it?
Frankly, the CS articles I've read in the last decade or so have concentrated more on constructing abstractions that help unskilled developers write code, and on automatic detection and correction of the programming errors they create, than on robust, efficient, long-term maintainable code bases. Hot air and gushing about favourites, without any true invention or progress.
The purpose of this library is to see how much better C could be, if we wanted it to be.
The C standard provides two completely different environments, hosted and freestanding.
Quote: "I take it that you are fully aware of the challenges and the problems. So ... that would actually entail that you go the extra mile and don't use C. If you have good ideas, look into the future, not the past."

Why do you insist D, Rust, and Go are "the future", and C is "the past"? No, I am not interested in how convinced you are, or who else says so; I am interested in logical arguments for or against, or personal experiences.
Quote: "I know that actually a different programming language would be the real painkiller."

Yes, that I do agree with. Perhaps, one day, I will take that step, too.
Quote: "I think I've missed something. What is a hosted or freestanding C environment, or, rather, the difference?"

In a hosted environment, the standard C library provides you a number of functions, from malloc() to exit(). In a freestanding environment, it does not; you only have some macros (from specific header files you can still include) and variadic parameter support (from <stdarg.h>).
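Concretely, these are the headers C11 (4p6) guarantees even in a freestanding environment, and variadic support still works without a hosted libc; sum_ints is just a made-up demonstration:

```c
/* Headers guaranteed even in freestanding mode (C11 4p6):
 * <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>,
 * <stdbool.h>, <stddef.h>, <stdint.h>, <stdnoreturn.h>.
 * No malloc(), no printf(), no exit() -- those are hosted-only. */
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Variadic parameter support works without the hosted C library. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    va_start(ap, count);
    while (count--)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

This is why a replacement library targets freestanding mode: the compiler still provides the language-level pieces, and the library supplies everything else.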
Quote: "I'd like to see standard hash tables/dictionaries."

Environment variables are an example of a dictionary every process has access to, and I have an idea on those.
Quote: "I take it that you are fully aware of the challenges and the problems. So ... that would actually entail that you go the extra mile and don't use C. If you have good ideas, look into the future, not the past."

Quote: "Why do you insist D, Rust, and Go are "the future", and C is "the past"? No, I am not interested in how convinced you are, or who else says so; I am interested in logical arguments for or against, or personal experiences."
Quote: "Please stick precisely to what the other one says, otherwise we will make no progress. I never said that D, Rust or Go are the future. I gave those languages as examples for new languages that have been developed and that fix severe flaws of C. (And I repeat what I said earlier: I explicitly do not recommend Go.) I do not say that those languages are the future. I even hope that they ARE NOT."

You wrote,

Quote: "During the last decade, much better options than C have been developed (including the embedded use case), for example D, Rust, or Go (I don't recommend Go, but of course I have to include it here)."

and later,

Quote: "I take it that you are fully aware of the challenges and the problems. So ... that would actually entail that you go the extra mile and don't use C. If you have good ideas, look into the future, not the past."

I'm sorry, but I cannot understand how else I should interpret these except as "don't use C, because it would be looking into the past", and by inference, that for example D, Rust, or Go are "the future" that I should be looking into.
Quote: "What kind of anti-intellectual stance is that? There is actual RESEARCH available, research that provides all those "logical arguments", research that has been done for DECADES by many people in the mathematics and CS community, some of them even clever. And you tell me "no, I am not interested what others have to say"."

Ahem. "Please stick precisely to what the other one says, otherwise we will make no progress."
Quote: "I disagree here. I think improvements to C should be made by making the language simpler and having fewer gotchas, especially in terms of simpler to understand and simpler to predict. Garbage collection implementations always have corner cases and harder-to-predict behaviours (e.g. runtime slowdowns). It's bad enough that dynamic memory allocation is already unpredictable, I don't want more of this :("
Garbage collection is faster than malloc/free. I can (and have) proven that by using LD_PRELOAD to replace the C library malloc with the Boehm GC for a large range of programs, with the result that they finish sooner and/or use less CPU time.
That's even on programs that are completely ignorant of the fact they are now running under GC.
A properly-written GC can easily limit the maximum pause time to an imperceptible level.
There are also cache effects. If, instead of accessing different cache lines, the GC causes the process to access the same cache lines (by reusing the same memory region allocated to the process), it can actually run in less wall-clock time than the never-free()ing code, because of fewer cache misses.

Quote: "Garbage collection is faster than malloc/free. I can (and have) proven that by using LD_PRELOAD to replace the C library malloc with the Boehm GC for a large range of programs, with the result that they finish sooner and/or use less CPU time."

That would depend on whether the program ever garbage collected. You might see the same performance increase by simply not calling free() and hoping you do not run out of memory before the program finishes.
Quote: "Fix the size of int, long, char, etc; don't let them be platform specific. In practice this means having to use different type names like stdint.h does (otherwise you break compat with other code & compilers). I normally go for more sensible names like u8, u16, u32, u64, s8, s16, s32, s64, f32, f64 (rather than things like uint32_t, uint64_t, etc)"

Found the x86 user :D
Quote: "C was designed to run on platforms where 32 bits is slower than 16 bits and on platforms where the contrary is the case. It will even support a 36 bit CPU for that matter, though probably fewer people care about that nowadays."

That is completely true, but all our interchange formats use these base integer and IEEE-754 floating-point types, and that is what we really need them for.
Quote: "Fix the size of int, long, char, etc; don't let them be platform specific. [...]"

Quote: "Found the x86 user :D"
I have spent a good 10 minutes now trying to come up with a convincing defence. All it has done is remind me that I miss my snow laptop (http://halestrom.net/darksleep/blog/006_chromebook/) (edit: 7 years now?!?!)
Quote: "I have spent a good 10 minutes now trying to come up with a convincing defence. All it has done is remind me that I miss my snow laptop (http://halestrom.net/darksleep/blog/006_chromebook/) (edit: 7 years now?!?!)"

Okay, so you might know that ARM has no 16 bit registers and no 16 bit operations besides load/store, IIRC. So perhaps 32 bit would be a good fit for int on ARM, but then try to use it on AVR and see what happens :scared:
Quote: "The distinction between "int" and "long" has become annoying, because on LP64 x86-64 ABIs, longs tend to be "cheaper" than ints, as the native general purpose register size is 64 bits. On ILP64, there is no distinction."

There is nothing cheaper about 64 bit on x86-64 until you actually need 64 bits. There is a full set of 32 bit registers and a full set of operations on them, implemented efficiently. In fact, to get 64 bit arithmetic you add a special prefix byte to the corresponding 32 bit instruction. Ditto when accessing the R8-R15 registers, by the way. In all honesty, x86-64 is an LP64 architecture -- an i386 with long pointers and the ability to do 64 bit arithmetic if you really insist. If you think about it, its primary design objective was to run 32 bit Windows natively, unlike Intel Itanic :P
I agree with the latter part, the standard uint_t names are annoying.
I disagree. I don't find it annoying, but rather elegant, practical, and coherent :D
As I've already mentioned, I really like getline(), because it avoids line length limits.
Quote: "As I've already mentioned, I really like getline(), because it avoids line length limits."

But it's inconsistent. You sometimes have to free() and sometimes not, and sometimes have to free() even if you say you don't want anything from it! I think it's trying to cover too many bases and would be better split into two functions: one that allocates memory and one that uses pre-allocated memory.

I don't understand what you mean. You start with
char *buffer_ptr = NULL;
size_t buffer_max = 0;
or with

size_t buffer_max = somenumber;
char *buffer_ptr = malloc(buffer_max); /* verified to have succeeded */
The buffer_ptr must always be either NULL, or a pointer to dynamically allocated memory as returned by malloc(), calloc(), realloc(), or aligned_alloc(). It cannot and must not point to statically allocated memory.

ssize_t len = getline(&buffer_ptr, &buffer_max, file_handle);
If len == -1, then either there was no more data to read, or there was an error.

free(buffer_ptr);
buffer_ptr = NULL;
buffer_max = 0;
Because free(NULL) is safe, this is always safe to do, even when buffer_ptr == NULL.

Does this buffer the line until it's completely read, then? If the stream is, say, a serial port, does it return after each call with -1, meantime buffering the input, until it sees a delimiter, and then return the buffer length? If so, I misinterpreted the description.

Plain reading of IEEE Std 1003.1-2017 (https://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html) says that if an error occurs, getline() and getdelim() should return -1 with errno set, but most current implementations instead return the currently buffered data. Most of them also assume that when read() returns a short count, it is either because of end-of-input or an error, and that just isn't true in practice.
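Putting the fragments above together into one compilable piece (count_lines is a made-up wrapper, not part of any library):

```c
#define _POSIX_C_SOURCE 200809L  /* getline() is POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* The getline() pattern discussed above: buffer starts as NULL/0,
 * is grown by getline() as needed, and is freed exactly once. */
int count_lines(FILE *in)
{
    char   *buffer_ptr = NULL;   /* NULL or malloc()-family pointer only */
    size_t  buffer_max = 0;
    ssize_t len;
    int     lines = 0;

    while ((len = getline(&buffer_ptr, &buffer_max, in)) != -1)
        lines++;                 /* buffer is reused across iterations */

    free(buffer_ptr);            /* safe even if getline never allocated */
    return lines;
}
```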
OK. I retract my objection, then.

It was a good point, though. Being able to use the higher-level "stream" interfaces on all types of descriptors is important.
Quote: "I'd like to see standard hash tables/dictionaries."

Environment variables are an example of a dictionary every process has access to, and I have an idea on those.
Do you have any examples of interfaces you've found useful? Function prototypes would give a good idea, with example real-world use cases.
Compare to e.g. qsort_r() instead of qsort() for sorting: the comparison operation often needs external information, like offset or column within the string to start the comparison at, and passing an untyped user pointer to the comparison function makes that easy in a thread-safe manner, as one does not need to use global variables.
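The qsort_r() pattern just described might look like the following with glibc's calling convention (BSD and macOS ship a qsort_r() with a different argument order, so this sketch is glibc-specific; sort_by_column and cmp_at_column are names made up here):

```c
#define _GNU_SOURCE          /* for glibc's qsort_r() */
#include <stdlib.h>
#include <string.h>

/* glibc's prototype, repeated for clarity. */
extern void qsort_r(void *base, size_t nmemb, size_t size,
                    int (*compar)(const void *, const void *, void *),
                    void *arg);

/* Compare two strings starting at a column offset carried in 'arg',
 * instead of smuggling the offset through a global variable. */
static int cmp_at_column(const void *a, const void *b, void *arg)
{
    size_t col = *(const size_t *)arg;
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa + col, sb + col);
}

/* Sort an array of strings by the text starting at 'col'. */
static void sort_by_column(const char **rows, size_t n, size_t col)
{
    qsort_r(rows, n, sizeof rows[0], cmp_at_column, &col);
}
```

Because the context travels through the untyped pointer argument, the comparison stays thread-safe: two threads can sort by different columns at the same time.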
Quote: "rather than things like uint32_t"

I find it a bit depressing how often people have used uint16_t when what they really should have used was uint_fast16_t. Of course, if you didn't like uint16_t because of readability or typeability issues, you'll REALLY hate uint_fast16_t :-(
Quote: "I would really like a better set of string (text) functions. Maybe more capable of dealing with unicode than current stuff, but ... definitely more like the support in other languages."

It is interesting to note that current Unicode limits code points to 0 through 0x10FFFF, inclusive (1,114,112 unique code points), which means that UTF-8 encodes each code point in 1 to 4 bytes. All newline conventions are either one or two bytes long. Commonly interesting escape/end sequences are two or three bytes long. And so on.
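As a small illustration of those size classes, the UTF-8 encoded length of a code point can be computed directly (utf8_len is a name made up here):

```c
#include <stdint.h>

/* Bytes needed to encode a Unicode code point in UTF-8
 * (valid code points run 0 .. 0x10FFFF). */
static int utf8_len(uint32_t cp)
{
    if (cp <= 0x7F)     return 1;   /* ASCII */
    if (cp <= 0x7FF)    return 2;
    if (cp <= 0xFFFF)   return 3;   /* rest of the Basic Multilingual Plane */
    if (cp <= 0x10FFFF) return 4;   /* supplementary planes */
    return 0;                       /* not a valid code point */
}
```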
Quote: "I'm not sure what that would look like, exactly. One possibility is that strings could have their own garbage-collected memory management, without switching other things away from malloc/free."

Definitely an intriguing option. I also like the underlying idea of modularity.
Quote: "I find it a bit depressing how often people have used uint16_t, when what they really should have used was uint_fast16_t. Of course, if you didn't like uint16_t because of readability or typeability issues, you'll REALLY hate uint_fast16_t :-("

That's exactly the reason I was mulling using u16 for uint_fast16_t and u16e for uint16_t.
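That naming scheme could be sketched as a small header: default to the "fast" types, and reserve an 'e' suffix for exact widths. These names come from the post above and are not standard C:

```c
#include <stdint.h>

/* Default aliases map to the "fast" types... */
typedef uint_fast8_t   u8;
typedef uint_fast16_t  u16;
typedef uint_fast32_t  u32;

/* ...and the 'e' ("exact") suffix gives fixed widths, for wire
 * formats and memory-mapped structures. */
typedef uint8_t        u8e;
typedef uint16_t       u16e;
typedef uint32_t       u32e;
```

The common case (loop counters, arithmetic) then gets the fastest representation by default, and the exact-width names only appear where layout actually matters.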