EEVblog Electronics Community Forum

Topic started by: Nominal Animal on January 08, 2021, 05:46:53 am

Title: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 08, 2021, 05:46:53 am
I've been thinking about writing my own systems programming library, to use instead of the standard C library, under Linux (and possibly other OSes, at least *BSDs).

As you probably know, the C programming language is actually defined separately for hosted environments (full standard C libraries available) and freestanding environments (standard C library not available, with only <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h> headers available).

I am thinking of writing a new support library from scratch, replacing the standard C library, but exposing the same (Linux/POSIX) concepts, but without the historical baggage of standard C library functions.  In essence, applications using this library will be compiled freestanding.  Note that this is a general-purpose systems programming library – for daemons and such –, not a tightly-constrained embedded one, so dynamic memory allocation will be heavily used.  (I am completely fed up with line length limitations stemming from fgets() when POSIX getline() exists, in particular.)  Most of the POSIX syscall functions (getpid(), getppid(), clock_gettime(), read(), write(), and so on) will be exposed using their traditional names, but probably without errno, i.e. if an error occurs, the functions will return a negative error number, instead of setting a thread-specific errno variable to reflect the error.
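To make the "negative error number instead of errno" convention concrete, here is a minimal sketch. The name xwrite() is hypothetical, and it wraps the hosted write() only so that it can be compiled and tried today; a real freestanding library would presumably issue the syscall directly (the raw Linux syscall interface already reports errors as negative values).

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: returns the number of bytes written,
   or a negative errno value (for example -EBADF) on failure.
   Assumption: built on the hosted write() purely for illustration. */
static ssize_t xwrite(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    return (n == -1) ? -(ssize_t)errno : n;
}
```

A caller can then test the result directly, e.g. if (n < 0) handle_error(-n); without touching any thread-specific state.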

While I have my own wishlist based on my own experiences, I'd like to know what others would really like to see in a C "standard" library, and especially why.  Many of you have encountered things I haven't, so I'd like to hear your thoughts on this.  My reason for doing this is simple: I want to know how much better we could do, if we wanted to.
Title: Re: Replacement for C standard library: your wishlist?
Post by: brucehoult on January 08, 2021, 06:47:40 am
I have some things that I actually implemented around a dozen years ago while helping to create a Java to native (ARM) compiler for the BREW phones that were common just before iPhone and Android came out. The phones then had between 40 KB and 4 MB of RAM. Sadly as it was proprietary and I'm a good boy I no longer have access to that work.

I used the same library to hand write code using C and it was very pleasant.

The #1 thing is to ban free() and base a C library on garbage collection. I disagree with Apple and prefer tracing over reference counting. The Boehm library works very well, with a few modifications -- I vastly reduced the amount of static memory needed and optimized the object size buckets and a number of other things.


The other big thing is to ban arbitrary large object allocations.

- Choose a maximum malloc() size which might be 256 bytes or 4K or whatever. This allows you to vastly decrease fragmentation by keeping pools of objects of every possible size -- and there are only a small number of possible sizes.

- larger objects are a fixed-depth tree of nodes of the maximum size (except for possibly the last node at each depth), with each node containing an array of pointers to the next level nodes. With a small machine or big maximum object size the tree might be only two levels, with one layer of pointers and the second layer being the actual user data.

- for example on a 32 bit machine with 64 KB maximum malloc size and 2-level tree, a composite object can have maximum size of a 64K node containing 16K pointers to 16K nodes of 64K each of user data. That is, the maximum size of user data in a single string, array, buffer etc would be 16384 * 65536 = 1 GB.

- because the tree depth is fixed for a given implementation of the library, accessing a data item is straight line code with no branching. To access byte n in an array of bytes pointed to by p:
  - check that n is less than the current array size
  - the byte you want is at p[n>>16][n&0xffff]
  - this is one shift, one AND, and two memory loads. That's slightly slower than just a memory load, but not much, especially if the top level of pointers is in L1 cache but the main data isn't.
  - for small objects the top level of pointers is also small. e.g. if the object is less than 64 KB in size then the pointer layer is just a 4 byte object containing one pointer (or whatever malloc()'s minimum size is)

- if you want a smaller maximum malloc() size then a 3 level tree can use a 4 KB maximum malloc() size to address a layer of up to 1024 pointers to a layer with up to 2^20 (1048576) pointers, to a layer with up to 4 GB of data. If 512 MB is enough in a single variable then you can reduce the maximum malloc() size to 2 KB.

- the overhead of chasing two or three pointers to find your data only applies to random accesses. If you want to process data sequentially then you can use C++ iterators that usually do only one access for sequential operations, doing an extra memory access only when moving from one node to the next. In C you can use an iterator function that you give a callback function that takes a pointer to the start of a contiguous block and a length (and also a void *userdata for accessing an environment, intermediate results etc). This can be extended to iterator functions that take two or three trees and call callbacks with two or three pointers and a single length.

- it's handy if the top level of pointers is actually a descriptor with the current number of elements *and* a byte offset to be added/subtracted from the user's offset before using it. This allows things such as substring() or array slices to be done by sharing structure with the original object. Especially if all the nodes are garbage collected, so if you take a few substrings from a huge buffer and then forget the reference to the original buffer then all the parts that aren't in the substring can get GCd.
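The two-level layout and the callback-style iterator described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the names (big_bytes, big_at, big_foreach, sum_chunk) are hypothetical, plain calloc()/free() stand in for the garbage-collected size-class pools, and every data node is allocated at full 64 KB size for simplicity.

```c
#include <stddef.h>
#include <stdlib.h>

#define NODE_BITS 16u                          /* 64 KB data nodes */
#define NODE_SIZE ((size_t)1 << NODE_BITS)
#define NODE_MASK (NODE_SIZE - 1)              /* 0xffff */

struct big_bytes {               /* hypothetical descriptor */
    size_t          len;         /* current number of bytes */
    unsigned char **node;        /* top level: one pointer per data node */
};

static size_t big_nodes(size_t len)
{
    return (len >> NODE_BITS) + ((len & NODE_MASK) != 0);
}

static void big_free(struct big_bytes *b)
{
    if (b->node)
        for (size_t i = 0; i < big_nodes(b->len); i++)
            free(b->node[i]);
    free(b->node);
    b->node = NULL;
    b->len = 0;
}

static int big_init(struct big_bytes *b, size_t len)
{
    size_t nodes = big_nodes(len);
    b->len = len;
    b->node = calloc(nodes ? nodes : 1, sizeof *b->node);
    if (!b->node) { b->len = 0; return -1; }
    for (size_t i = 0; i < nodes; i++) {
        b->node[i] = calloc(1, NODE_SIZE);
        if (!b->node[i]) { big_free(b); return -1; }
    }
    return 0;
}

/* Random access: one bounds check, one shift, one AND, two loads. */
static unsigned char *big_at(struct big_bytes *b, size_t n)
{
    if (n >= b->len)
        return NULL;
    return &b->node[n >> NODE_BITS][n & NODE_MASK];
}

/* Sequential access: the callback sees one contiguous chunk at a time. */
typedef void (*chunk_fn)(unsigned char *data, size_t len, void *userdata);

static void big_foreach(struct big_bytes *b, chunk_fn fn, void *userdata)
{
    for (size_t off = 0; off < b->len; off += NODE_SIZE) {
        size_t left = b->len - off;
        fn(b->node[off >> NODE_BITS], left < NODE_SIZE ? left : NODE_SIZE, userdata);
    }
}

/* Example callback: add every byte of the chunk to *userdata. */
static void sum_chunk(unsigned char *data, size_t len, void *userdata)
{
    unsigned long *sum = userdata;
    for (size_t i = 0; i < len; i++)
        *sum += data[i];
}
```

With this layout, summing a 100000-byte object calls sum_chunk() twice (once with 65536 bytes, once with 34464), so the inner loop runs over contiguous memory and the pointer chase happens only at node boundaries.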

Title: Re: Replacement for C standard library: your wishlist?
Post by: Whales on January 08, 2021, 06:55:05 am
Make a single standard for the sizes of types and printf()-like format strings

Fix the size of int, long, char, etc; don't let them be platform specific.  In practice this means having to use different type names like stdint.h does (otherwise you break compat with other code & compilers),  I normally go for more sensible names like   u8, u16, u32, u64,   s8, s16, s32, s64,   f32, f64  (rather than things like uint32_t, uint64_t, etc)

This means implementing printf()-like functions to use '%d' to mean one specific datasize too.  In the current world we have to write things like "I have %"PRIu32" and %"PRId64" cats" if we want to be exactly specific about our data sizes, which really sucks.  Even between glibc x86_64 and glibc arm(6?) the definitions of some 'long' variants (long long IIRC?) are different; let alone between whole microcontroller families  >:(
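For reference, this is roughly what the current situation looks like in compilable form. The short typedef names follow the suggestion above; the function name describe_cats() is made up for the example. PRIu32 and PRId64 are the standard <inttypes.h> macros for uint32_t and int64_t.

```c
#include <inttypes.h>
#include <stdio.h>

typedef uint32_t u32;    /* short names, as suggested above */
typedef int64_t  s64;

/* Formats into buf and returns whatever snprintf() returns.
   The PRI macros expand to the correct length modifier for each platform. */
static int describe_cats(char *buf, size_t size, u32 mine, s64 yours)
{
    return snprintf(buf, size, "I have %" PRIu32 " and %" PRId64 " cats", mine, yours);
}
```

A library with its own formatter could instead reserve one conversion per fixed-size type, so the format string would not need to be spliced together from macros.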

I basically always have to be specific about sizes, otherwise I start having to add if() cases when I deal with data saved in files or from over a network.  Fixing the sizes of types would make things much simpler and get rid of a lot of hidden gotchas.

(Technically we can also talk about struct packing & alignment on different platforms, but that's moving more into compiler territory rather than libc territory)
Title: Re: Replacement for C standard library: your wishlist?
Post by: Whales on January 08, 2021, 07:08:11 am
Saner string funcs

I notice your mention of fgets() :)

Check out BSD's string funcs. (https://github.com/openbsd/src/tree/master/lib/libc/string)  Notably I keep a copy of strlcpy() (https://github.com/openbsd/src/blob/master/lib/libc/string/strlcpy.c) in my personal library, it's a much saner option than strncpy() and strcpy() (both are broken).  This is surprising for a lot of people because the string functions with 'n' in their name are typically the fixed options, but not in this case.
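A minimal sketch of the strlcpy() semantics, for readers who have not used it. This is not the OpenBSD source linked above; it is a short re-implementation under the hypothetical name my_strlcpy() so it cannot clash with a libc that already provides strlcpy().

```c
#include <stddef.h>
#include <string.h>

/* Copies at most size-1 bytes of src into dst and always nul-terminates
   (when size > 0).  Returns strlen(src), so a return value >= size
   tells the caller that the copy was truncated. */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = (len < size - 1) ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

Compare with strncpy(dst, src, size), which leaves dst without a terminating nul byte whenever src has size or more characters.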

Sidenote: I've seen some projects bring in a whole linked libbsd dependency just to get this one func  :P
Title: Re: Replacement for C standard library: your wishlist?
Post by: Whales on January 08, 2021, 07:13:32 am
The #1 thing is to ban free() and base a C library on garbage collection. I disagree with Apple and prefer tracing over reference counting. The Boehm library works very well, with a few modifications -- I vastly reduced the amount of static memory needed and optimized the object size buckets and a number of other things.

I disagree here.  I think improvements to C should be made by making the language simpler and having fewer gotchas, especially in terms of simpler to understand & simpler to predict.  Garbage collection implementations always have corner cases and harder to predict behaviours (eg runtime slowdowns).

It's bad enough that dynamic memory allocation is already unpredictable, I don't want more of this  :(
Title: Re: Replacement for C standard library: your wishlist?
Post by: Whales on January 08, 2021, 07:19:13 am
The other big thing is to ban arbitrary large object allocations.

- Choose a maximum malloc() size which might be 256 bytes or 4K or whatever. This allows you to vastly decrease fragmentation by keeping pools of objects of every possible size -- and there are only a small number of possible sizes.

- larger objects are a fixed-depth tree of nodes of the maximum size (except for possibly the last node at each depth), with each node containing an array of pointers to the next level nodes. With a small machine or big maximum object size the tree might be only two levels, with one layer of pointers and the second layer being the actual user data.

Isn't this sort of what gets done anyway?  By the OS + memory mapper + memory controller/MMU?  A chunk of what appears to be contiguous memory from your program's perspective can actually be split up over all sorts of places and this gets internally tracked for you by some structures.

The method you describe to solve memory fragmentation sounds like pre-fragmenting everything.  Wouldn't a finer-grained MMU/similar achieve the exact same thing, but with better hardware acceleration?
Title: Re: Replacement for C standard library: your wishlist?
Post by: brucehoult on January 08, 2021, 07:51:27 am
The #1 thing is to ban free() and base a C library on garbage collection. I disagree with Apple and prefer tracing over reference counting. The Boehm library works very well, with a few modifications -- I vastly reduced the amount of static memory needed and optimized the object size buckets and a number of other things.

I disagree here.  I think improvements to C should be made by making the language simpler and having fewer gotchas, especially in terms of simpler to understand & simpler to predict.  Garbage collection implementations always have corner cases and harder to predict behaviours (eg runtime slowdowns).

It's bad enough that dynamic memory allocation is already unpredictable, I don't want more of this  :(

Garbage collection is faster than malloc/free. I can prove (and have proved) that by using LD_PRELOAD to replace the C library malloc with Boehm GC for a large range of programs, with the result that they finish sooner and/or use less CPU time.

That's even on programs that are completely ignorant of the fact they are now running under GC.

A properly-written GC can easily limit the maximum pause time to an imperceptible level.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 08, 2021, 09:23:03 am
The #1 thing is to ban free() and base a C library on garbage collection.
At the low level, daemons and such, I prefer pool allocators.  That is, having each allocation belong to a context, and being able to free an entire context.  This matches very well the needs of connection-oriented services, and also allows trivial implementation of per-connection memory use limits.

I am very ambivalent on GC.  If rock solid, it does eliminate a big swathe of possible bugs.  On the other hand, I've never found those bugs to be hard to avoid anyway.  I think it is a slightly higher-level concept than I'd like; the memory use overhead, especially in a service daemon, worries me.  Speed is not an issue, I know.  In any case, I'm not ruling either one out for now.

I do believe easy control of allocation limits is important, because typical Linux machines overcommit memory, and if it is easy for a service to internally control its memory use (with minimal code overhead), developers might actually use it.  In particular, if a service runs out of memory, I'd prefer it just drops a few connections, rather than die.
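A rough sketch of what a context-based allocator with a built-in limit could look like. All names (struct pool, pool_alloc(), pool_free_all()) are hypothetical, and the blocks come from plain malloc() only to keep the example short; a real implementation would more likely carve allocations out of larger chunks.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

union pool_block {               /* header placed in front of each allocation */
    union pool_block *next;
    max_align_t       align;     /* keeps the payload suitably aligned */
};

struct pool {                    /* hypothetical per-connection context */
    union pool_block *blocks;    /* every allocation made in this context */
    size_t            used;      /* payload bytes currently allocated */
    size_t            limit;     /* hard cap for this context */
};

static void pool_init(struct pool *p, size_t limit)
{
    p->blocks = NULL;
    p->used = 0;
    p->limit = limit;
}

/* Returns NULL when the context would exceed its limit or malloc() fails. */
static void *pool_alloc(struct pool *p, size_t size)
{
    union pool_block *b;
    if (size > p->limit - p->used || size > SIZE_MAX - sizeof *b)
        return NULL;
    b = malloc(sizeof *b + size);
    if (!b)
        return NULL;
    b->next = p->blocks;
    p->blocks = b;
    p->used += size;
    return b + 1;
}

/* Releases everything that belongs to the context in one call. */
static void pool_free_all(struct pool *p)
{
    while (p->blocks) {
        union pool_block *next = p->blocks->next;
        free(p->blocks);
        p->blocks = next;
    }
    p->used = 0;
}
```

A service would create one such context per connection; when pool_alloc() returns NULL it can drop that connection and call pool_free_all(), instead of the whole process running out of memory.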

Make a single standard for the sizes of types and printf()-like format strings
Using u8/u16/s32/s64 as typedefs for uint8_t/uint16_t/int32_t/int64_t, sure.  Having printf()-like function with native formatting specifiers for them, definitely.
But, because this is C, and not a new programming language, we cannot restrict the size of say int or long.  We do need to stay within the rules of freestanding C, more or less.

There do need to be a couple of types that depend on the architecture, like size_t/ssize_t and uintptr_t/intptr_t.  But these should obviously have native formatting specifiers, just like size_t has (%zu) and ssize_t has (%zd).

Saner string funcs
Most definitely; strlcpy() (which ensures the target will contain an end-of-string nul byte), and also memfill(buffer, offset, size) which repeats the first offset bytes of buffer to the rest of the buffer, up to size; for example, to initialize floating-point vectors or structure arrays efficiently.  I also like strndup(), which is not in standard C (it is in BSD and POSIX.1), but is definitely useful.

Some of these are optimizations (memfill() definitely is), some avoid the horrible corner cases (like strncpy() not adding a string-terminating nul byte if the source string is long enough), and others make it easier to use better patterns.
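One possible way to implement memfill() as described above, assuming a void * return value like memset() (the return type is a guess; only the three parameters are given). The already-filled prefix is copied onto the region after it, so the filled part doubles on each pass and only about log2(size/offset) memcpy() calls are needed.

```c
#include <stddef.h>
#include <string.h>

/* Repeats the first `offset` bytes of buffer until `size` bytes are filled.
   Source and destination of each memcpy() never overlap, because at most
   `offset` bytes are copied to position `offset`. */
static void *memfill(void *buffer, size_t offset, size_t size)
{
    unsigned char *p = buffer;
    if (offset == 0)
        return buffer;           /* nothing to repeat */
    while (offset < size) {
        size_t n = (size - offset < offset) ? size - offset : offset;
        memcpy(p + offset, p, n);
        offset += n;
    }
    return buffer;
}
```

For example, to set every element of float v[8] to 1.5f, store v[0] = 1.5f and call memfill(v, sizeof v[0], sizeof v).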

For example, GNU and BSD provide asprintf(), which dynamically allocates the buffer to be formatted.  Very few programmers actually use it.  I'd prefer something along the lines of ssize_t my_printf(char **strp, size_t *sizep, const char *fmt, ...) with an interface similar to getline().  That is, to e.g. construct some complicated string, you might use
Code:
    char   *p = NULL;
    size_t  p_max = 0;
    ssize_t  p_len = my_printf(&p, &p_max, "foo-%s-%.3f", name, version);
(using current standard C library types here, for clarity).  This way, if one needs to construct such strings often, they can reuse the buffer, but still get it dynamically reallocated whenever it needs be.
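A sketch of how such a my_printf() might be written on top of the hosted vsnprintf(), just to show that the getline()-like interface is easy to provide. Assumptions: it returns -1 on failure rather than a negative error number, and it grows the buffer to exactly the size needed instead of using a smarter growth policy.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Formats into *strp, reallocating it (and updating *sizep) when it is
   NULL or too small.  Returns the string length, or -1 on failure. */
static ssize_t my_printf(char **strp, size_t *sizep, const char *fmt, ...)
{
    va_list args;
    int len;

    /* First attempt: use the existing buffer, if there is one. */
    va_start(args, fmt);
    len = vsnprintf(*strp, *strp ? *sizep : 0, fmt, args);
    va_end(args);
    if (len < 0)
        return -1;
    if (*strp && (size_t)len < *sizep)
        return len;              /* it fit, including the nul byte */

    /* Too small: grow the buffer and format again. */
    size_t need = (size_t)len + 1;
    char *p = realloc(*strp, need);
    if (!p)
        return -1;               /* the old buffer is still valid */
    *strp = p;
    *sizep = need;
    va_start(args, fmt);
    len = vsnprintf(p, need, fmt, args);
    va_end(args);
    return len;
}
```

As with getline(), the caller starts with a NULL pointer and zero size, reuses the pair across calls, and calls free() on the pointer once at the end.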

I also dislike the interfaces that use static internal storage, and are therefore non-thread-safe (strtok() and so on).  BSD and GNU provide thread-safe versions, that point to either a prepared context variable (strtok_r()), or use dynamically allocated memory.
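For comparison, this is the strtok_r() pattern with a caller-owned context variable. The helper name count_fields() is made up for the example; strtok_r() itself is POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the delimiter-separated fields in line (which is modified in place).
   The tokenizer state lives in a local variable instead of hidden static
   storage, so several threads, or nested loops, can tokenize independently. */
static size_t count_fields(char *line, const char *delims)
{
    size_t n = 0;
    char *state;                 /* caller-owned context for strtok_r() */

    assert(line != NULL && delims != NULL);
    for (char *tok = strtok_r(line, delims, &state);
         tok != NULL;
         tok = strtok_r(NULL, delims, &state))
        n++;
    return n;
}
```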
Title: Re: Replacement for C standard library: your wishlist?
Post by: PlainName on January 08, 2021, 10:00:59 am
Quote
I normally go for more sensible names like   u8, u16, u32, u64,   s8, s16, s32, s64,   f32, f64  (rather than things like uint32_t, uint64_t, etc)

Why is that? Wouldn't it be sensible to use what everyone else knows and understands? One could make an argument that using something else is like aliasing 'while' as 'whilst'.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 08, 2021, 10:29:23 am
Quote
I normally go for more sensible names like   u8, u16, u32, u64,   s8, s16, s32, s64,   f32, f64  (rather than things like uint32_t, uint64_t, etc)
Why is that? Wouldn't it be sensible to use what everyone else knows and understands? One could make an argument that using something else is like aliasing 'while' as 'whilst'.
Since we are talking about a new base library, why restrict to what everyone else knows and understands?

This is, after all, an attempt to discover what we could do better, not what everyone is already most comfortable with.

I do intend to provide a few macros of foreach_foo(...) format, for example to replace/augment ancillary data (https://www.man7.org/linux/man-pages/man3/cmsg.3.html) macros.
Say, foreach_msg_descriptor(fd, &msg) { /* fd is a new descriptor received via recvmsg(sockfd, &msg, flags) */ }.  These, too, will look oddly familiar but subtly weird to many, especially if you haven't looked at Linux kernel sources (which uses such macros to make many things much easier).
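As a simpler relative of foreach_msg_descriptor(), here is a sketch of a foreach-style macro over the ancillary data headers themselves. The macro name foreach_cmsg() and the helper count_descriptors() are hypothetical; CMSG_FIRSTHDR(), CMSG_NXTHDR(), and CMSG_LEN() are the standard cmsg(3) macros.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical macro: loop over every ancillary data header in *msg.
   The loop variable cm is declared by the macro itself. */
#define foreach_cmsg(cm, msg)                         \
    for (struct cmsghdr *cm = CMSG_FIRSTHDR(msg);     \
         cm != NULL;                                  \
         cm = CMSG_NXTHDR((msg), cm))

/* Counts the descriptors passed as SCM_RIGHTS in a received message. */
static size_t count_descriptors(struct msghdr *msg)
{
    size_t n = 0;
    foreach_cmsg(cm, msg) {
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS)
            n += (cm->cmsg_len - CMSG_LEN(0)) / sizeof (int);
    }
    return n;
}
```

The body reads like an ordinary for loop, which is the same trick the Linux kernel uses for its list_for_each() family of macros.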
Title: Re: Replacement for C standard library: your wishlist?
Post by: PlainName on January 08, 2021, 10:57:50 am
Well, for backwards compatibility, ease of use and familiarity. Otherwise you might as well just do a complete language. Call it C+ or something. Hmmm, C#? Ah, D! Oh noes.... gotta be something not already sat on... D- then  ;)

Edit: forgot a bit...

'Better' tends to drag along 'comfortable' with it. There have been many better mousetraps that dropped by the wayside because they were a pain to use, or just different enough that potential users couldn't easily see the benefits. If it's a library it's going to be coexisting with already well-known libraries probably, and being the odd one out isn't a benefit. Where there is a common 'thing' it is usually sensible to use the common paradigm. I don't think that doing so would be detrimental to the kind of features you are all discussing, but I am willing to be told different.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 08, 2021, 11:43:11 am
If it's a library it's going to be coexisting with already well-known libraries probably, and being the odd one out isn't a benefit.
Because it is a replacement for the standard C library, requiring that C be compiled in freestanding mode, libraries developed against standard C will not be compatible.
Backwards compatibility is not an option.

I do not have any problems using uintN_t, int_fastN_t, size_t etc. in C; I do so constantly.  However, I have noticed a lot of C programmers avoid using them, and I wonder why.  Is it the type name, the _t suffix?  Or something else?

Like I said, I don't really understand why the _t suffix was added to these types.  It does not bother me per se; I just want to know how that suffix has affected the types' use.  Do they make it easier to read C code?  If they did, then shouldn't they be more popular by now?

Since <stdint.h> is available in freestanding mode, these types would be typedef'd to those in this library headers anyway, so it is not a question of either or –– except for which type names are used for function parameters and return values.  (But even those, the most common integer type is size, not a fixed-width integer type.)

I too am willing to be convinced otherwise, but familiarity is not an argument that sways me, because if it did, then there would be no sense in doing this at all; as the answer then would be to use the most popular subset of POSIX/BSD/GNU C.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Whales on January 08, 2021, 11:59:11 am
Personally I don't use the official stdint.h names like int32_t and uint64_t for a couple of reasons:

1 or 2-letter differences (out of 8 or so letters) are hard to spot but make big differences.  "Hiding a tree in the forest" is not a good idea if the wrong type of tree is a possessed demon that eats your favourite numbers, so I prefer to have such trees standing on their own where everyone can make sure they're not lantana.  Compare how clearly words like int, long and float are differentiated at a brief glance.  Now look at the first line of this post and notice that one of the types I wrote was signed.

Of course I'm also lazy, typing u32 is much easier than uint32_t.  Words like 'int' are really easy to type, which is important, because you type them constantly in C code.

EDIT: I don't usually use the more advanced stdint.h types like int_least32_t and int_fast32_t; but I presume shortening these more corner-casey types might not be a good idea.  ie don't do il32 and if32
Title: Re: Replacement for C standard library: your wishlist?
Post by: DiTBho on January 08, 2021, 12:16:46 pm
The #1 thing is to ban free() and base a C library on garbage collection.

GC is something that makes me perplexed. Probably because I have never understood how it works internally.

I have implemented my own version of malloc() and free(). I know how they work internally, and this makes me less perplexed.
Title: Re: Replacement for C standard library: your wishlist?
Post by: PlainName on January 08, 2021, 12:18:12 pm
Quote
However, I have noticed a lot of C programmers avoid using them, and I wonder why.  Is it the type name, the _t suffix?  Or something else?

Yes, that was essentially my question. The rest was just collateral :)
Title: Re: Replacement for C standard library: your wishlist?
Post by: PlainName on January 08, 2021, 12:20:41 pm
Quote
1 or 2-letter differences (out of 8 or so letters) are hard to spot but make big differences.

That's a very reasonable view.

Actually, that's what I used to use too, but at some point I went with the flow. Can't remember when or why, now, though.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Fixpoint on January 08, 2021, 12:28:00 pm
I am thinking of writing a new support library from scratch, replacing the standard C library, but exposing the same (Linux/POSIX) concepts, but without the historical baggage of standard C library functions

To be frank, that doesn't make much sense. The C programming language is used BECAUSE of the standard. If you replace the standard with your own stuff, there is no reason to use C anymore. Then you are free to use much better, more modern programming languages.

During the last decade, much better options than C have been developed (including the embedded use case), for example D, Rust, or Go (I don't recommend Go, but of course I have to include it here). From a computer-scientific and software-development standpoint, C is bad from so many angles, I don't know where to start. So, if you don't want to use the old libraries anyway, why don't you look at the new languages? That would make much more sense than to carry this 1970's abomination further into the 21st century.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Whales on January 08, 2021, 12:35:36 pm
As a contrasting opinion: "modern" programming languages do indeed solve some problems, but they bring new problems to the table too (https://halestrom.net/darksleep/blog/036_timesafety/).  They are not a panacea, and small language/lib changes are just as valid as a path to experiment with as large language/lib changes (both with their own unique upsides & downsides).

Quote
The C programming language is used BECAUSE of the standard. If you replace the standard with your own stuff,

The 'C programming language' is only one part of C.  C is more than the official standards that compiler authors stand by.  It's a combined and universal effort that other ideas, languages and libraries get almost inevitably joined to in some way.

For a bit of perspective on this: (https://tailscale.com/blog/two-internets-both-flakey/) standards like ipv4 in the internet are never going to be replaced, only joined to or extended.  Other new "replacement" or different protocols, like ipv6, get bolted on via things like ipv6-to-ipv4 bridges.  If you write an incompatible network protocol then someone, somewhere will inevitably write a bridge that converts it to ipv4 (or ipv6), hence joining it back to the universal internet.  You cannot replace, only add.

Does this mean creating ipv6 is bad?  Does it mean using ipx is bad?  Does it mean writing new C libraries is bad?  No.  It's a natural part of the evolution of both.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Fixpoint on January 08, 2021, 12:47:10 pm
As a contrasting opinion

Unfortunately, that is not a contrasting opinion but a platitude without any information content. The fact that no technology is perfect, including the new ones, is obvious and neither has anything to do with nor contradicts the fact that it would be a good idea to replace C with better stuff.

Quote
The 'C programming language' is only one part of C.  C is more than the official standards that compiler authors stand by.  It's a combined and universal effort that other ideas, languages and libraries get almost inevitably joined to in some way.
(...)

Again, all of that has nothing to do with what I said. What I say is that C must be replaced with a safer technology that implements computer-scientific knowledge that has been commonplace for many decades now (it even was already known at the time when C was developed). Honestly, C's nonsense design is nothing that only I can see. So, there is no reason to discuss this further.

If you want to implement new libraries, fine, go ahead. If it's good stuff I would love to use it, BUT of course not in C. You are missing a chance here. Please don't promote C along the way but use a better technology. When we develop a new car, we don't use a steam engine for that. It's as simple as that.
Title: Re: Replacement for C standard library: your wishlist?
Post by: DiTBho on January 08, 2021, 01:06:55 pm
To be frank, that doesn't make much sense. The C programming language is used BECAUSE of the standard. If you replace the standard with your own stuff, there is no reason to use C anymore.

Umm, sometimes I happen to use C because I don't have other high level languages for certain targets.

For example, I am writing a CAS software just right now for a proprietary terminal, and the only available SDK only provides a small C-89 compiler but no standard C libraries.

The same happened when I had to write an embedded firmware for an industrial embroidery machine whose control board is based on the A29k CPU, and yet again, I found nothing but a small SDK with a small C compiler and non standard C libraries.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Fixpoint on January 08, 2021, 01:11:11 pm
Umm, sometimes I happen to use C because I don't have other high level languages for certain targets.

Yes, I know. We all have that problem. And this will never change if developers of new libraries keep using C. If we want to advance, there is no other way than to make a step forward.

By the way, if a new library is implemented using a non-C language, you even could use it from C because natively compiled languages always provide means for efficient C bindings.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 08, 2021, 01:25:21 pm
I am thinking of writing a new support library from scratch, replacing the standard C library, but exposing the same (Linux/POSIX) concepts, but without the historical baggage of standard C library functions
To be frank, that doesn't make much sense. The C programming language is used BECAUSE of the standard. If you replace the standard with your own stuff, there is no reason to use C anymore. Then you are free to use much better, more modern programming languages.
That's just your opinion.

The C standard provides two completely different environments, hosted and freestanding.  If I were to take your opinion at face value, the freestanding environment does not exist; there is no reason to ever use it.  Because it does, and is actively used for both OS (Linux and BSD kernels) and embedded uses, your opinion is faulty.
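A compiler tells the program which of the two environments it is being built for through the standard macro __STDC_HOSTED__ (1 when hosted, 0 when freestanding; with GCC and Clang, -ffreestanding selects the latter). A minimal sketch, with a made-up helper name:

```c
#include <stddef.h>              /* available in both environments */

#if __STDC_HOSTED__
#include <stdio.h>               /* only guaranteed to exist when hosted */
#endif

/* Returns 1 if compiled for a hosted environment, 0 if freestanding. */
static int is_hosted(void)
{
    return __STDC_HOSTED__;
}
```

A library meant to be used freestanding can use the same test to refuse, or adapt, when someone builds it in hosted mode.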

During the last decade, much better options than C have been developed
Better how?  That is the rub, isn't it.

Frankly, the CS articles I've read in the last decade or so, have concentrated more on constructing abstractions that help unskilled developers write code, and automatic detection and correction of the programming errors they create, than looking at robust, efficient, long-term maintainable code bases.  Hot air and gushing about favourites, without any true invention or progress.

I have seen zero proof that Rust, D, or Go, are actually technically superior to C.  All I've seen are opinions, sometimes based on chosen statistics.  You can always pick poor code in other languages, and write a much better version in your favourite language; but that is not proof of anything.  You need to actually compare the best implementations against each other, and that usually shows results programming language developers don't like to see, so they don't, either consciously or subconsciously.

The purpose of this library is to see how much better C could be, if we wanted it to be.
Title: Re: Replacement for C standard library: your wishlist?
Post by: DiTBho on January 08, 2021, 01:50:55 pm
Yes, I know. We all have that problem. And this will never change if developers of new libraries keep using C. If we want to advance, there is no other way than to make a step forward.

100% agree with this. I thought 2021 should be *THE* year to learn something new, so instead of spending money on fireworks on New Year's Eve, I invested some money in a Rust course.

I can say "Rust", because I have a couple of customers that can provide a solid Rust SDK, so I can actually program something for a real target.

I have zero idea about what is better than what, I simply want to learn something new, with the hope that it will help my mind to grow in knowledge and wisdom.

That's it  ;D
Title: Re: Replacement for C standard library: your wishlist?
Post by: Fixpoint on January 08, 2021, 01:52:14 pm
That's just your opinion.

Surprise! But honestly, not quite. Fortunately, there are many people on this planet who know what I am talking about. But of course I get your point.

Quote
Because it does, and is actively used for both OS (Linux and BSD kernels) and embedded uses, your opinion is faulty.

Not so fast. Yes, there is software that doesn't use the standard library. But this software has a reason for doing that, so the question is why this software should use YOUR new library. The only point your library could make is that it is BETTER than the standard library. But what "better" means depends on the project.

Quote
Better how?  That is the rub, isn't it.

Well, not necessarily.

Quote
Frankly, the CS articles I've read in the last decade or so, have concentrated more on constructing abstractions that help unskilled developers write code, and automatic detection and correction of the programming errors they create, than looking at robust, efficient, long-term maintainable code bases.  Hot air and gushing about favourites, without any true invention or progress.

Without going into details: Yes, I agree that long-term maintainability is the big issue and that there hasn't been much innovation regarding that. Know what? That means that we are on the same page here. How, in God's name, are you going to tackle this problem with a C library that intends to replace the standard one?

I take it that you are fully aware of the challenges and the problems. So ... that would actually entail that you go the extra mile and don't use C. If you have good ideas, look into the future, not the past.

Quote
The purpose of this library is to see how much better C could be, if we wanted it to be.

You can make it as great as you please. This kind of experiment is not necessary, the result is clear.

Years ago I myself developed a C library that I *personally* much prefer to the standard one, it features a special form of hybrid memory management (basically a safe mixture of static and dynamic memory) and includes important data structures like strings, lists, and maps in a safe implementation (doesn't use this unsafe rubbish from the 70s). (But still, I don't want to use this library if I can avoid C.)

More recently, I also have developed a second C library (usable on small microcontrollers) that actually tackles the problem of long-term maintainability. Maybe I will release it when it is more mature, currently it is a bit basic.

However, I am doing nobody a favor with those libraries, and that holds me back. I know that actually a different programming language would be the real painkiller.
Title: Re: Replacement for C standard library: your wishlist?
Post by: PlainName on January 08, 2021, 02:41:57 pm
Quote
The C standard provides two completely different environments, hosted and freestanding.

I think I've missed something. What is a hosted or freestanding C environment, or, rather, the difference?
Title: Re: Replacement for C standard library: your wishlist?
Post by: SiliconWizard on January 08, 2021, 03:12:26 pm
Apart from replacing unsafe functions (even though a number of them have already been deprecated), I'd like to see standard hash tables/dictionaries. And, as you mentioned, pool allocation functions.

Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 08, 2021, 05:18:57 pm
I take it that you are fully aware of the challenges and the problems. So ... that would actually entail that you go the extra mile and don't use C. If you have good ideas, look into the future, not the past.
Why do you insist D, Rust, and Go are "the future", and C is "the past"?  No, I am not interested in how convinced you are, or who else says so; I am interested in logical arguments for or against, or personal experiences.

Like I said, I don't want more abstractions.  D++ is object-oriented, and I don't need that abstraction.  I don't want a runtime like Go has, either; if you have ever dealt with C++ runtime from other languages, you know why.  Perhaps Go avoids that particular trap, but that is yet to be seen.  I want an efficient low-level systems programming language, and that is what C is.  No, C is not particularly good.  A big part of not-particularly-good is the standard library, easily shown by comparing programs written with POSIX C to those doing the same thing in standard C, so I want to find out how much better we could do.  Rust is the closest, but I'm not convinced it is better than C yet, and I don't see how I could contribute anything meaningful there.

As to "better", I intend to have real-world compilable code that everyone can look at and compare to, and tell me how and why I'm wrong.  That will be useful.

I know that actually a different programming language would be the real painkiller.
Yes, that I do agree with.  Perhaps, one day, I will take that step, too.

But for now, freestanding C provides an easy starting point.  I intend to create the library, then implement a few typical service daemons, so that one can compare them to similar daemons written in C and other languages.  Instead of abstract examples or handwaving, I want tangible, real-world code to examine and compare to.  (And not just compare the human-readable parts, but also the binaries generated, and definitely the run-time resource use.)

I think I've missed something. What is a hosted or freestanding C environment, or, rather, the difference?
In a hosted environment, the standard C library provides you a number of functions, from malloc() to exit().  In a freestanding environment, it does not; you only have some macros (from specific header files you can still include) and variadic parameter support (from <stdarg.h>).
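
As a small illustration of the difference, here is a sketch that uses only a header available in both environments. The standard macro __STDC_HOSTED__ is 1 when compiling for a hosted environment and 0 when freestanding; the names fs_strlen() and fs_environment() are made up for this example and are not part of any existing library.

```c
#include <stddef.h>   /* size_t: available even in a freestanding environment */

/* __STDC_HOSTED__ is 1 in a hosted environment and 0 in a freestanding one. */
#if __STDC_HOSTED__
#define ENV_NAME "hosted"
#else
#define ENV_NAME "freestanding"
#endif

/* Hypothetical stand-in for strlen(): relies on nothing from the C library. */
size_t fs_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Name of the environment this translation unit was compiled for. */
const char *fs_environment(void)
{
    return ENV_NAME;
}
```

Compiled with something like -ffreestanding, fs_environment() would report "freestanding"; in an ordinary build it reports "hosted".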

I'd like to see standard hash tables/dictionaries.
Environment variables are an example of a dictionary every process has access to, and I have an idea on those.

Do you have any examples of interfaces you've found useful?  Function prototypes would give a good idea, with example real-world use cases.
Compare to e.g. qsort_r() instead of qsort() for sorting: the comparison operation often needs external information, like offset or column within the string to start the comparison at, and passing an untyped user pointer to the comparison function makes that easy in a thread-safe manner, as one does not need to use global variables.
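
A minimal sketch of that kind of interface follows. It does not call qsort_r() itself (its prototype differs between glibc and the BSDs); instead it uses a small insertion sort, so sort_strings_ctx() and cmp_from_column() are hypothetical names chosen only for illustration. The point is the untyped user pointer handed through to the comparison function.

```c
#include <stddef.h>
#include <string.h>

/* Comparison callback that receives an untyped user pointer, as qsort_r() does. */
typedef int (*cmp_ctx_fn)(const void *a, const void *b, void *ctx);

/* Hypothetical helper: insertion sort over an array of string pointers.
   The ctx pointer is passed unchanged to every comparison. */
void sort_strings_ctx(const char **items, size_t count, cmp_ctx_fn cmp, void *ctx)
{
    for (size_t i = 1; i < count; i++) {
        const char *key = items[i];
        size_t j = i;
        while (j > 0 && cmp(&items[j - 1], &key, ctx) > 0) {
            items[j] = items[j - 1];
            j--;
        }
        items[j] = key;
    }
}

/* Compare two strings starting at the column given via ctx (a size_t).
   Strings shorter than the column compare as empty. */
int cmp_from_column(const void *a, const void *b, void *ctx)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    size_t col = *(const size_t *)ctx;
    size_t la = strlen(sa), lb = strlen(sb);
    return strcmp(sa + (col < la ? col : la), sb + (col < lb ? col : lb));
}
```

Because the column travels in ctx rather than in a global variable, two threads can sort with different columns at the same time.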

And everyone else, if you have found a particularly useful function you have used when writing low-level code – a service daemon, a command-line utility, etc. – please let me know; the most interesting bit is how/why you found it useful.
Title: Re: Replacement for C standard library: your wishlist?
Post by: PlainName on January 08, 2021, 05:47:04 pm
Quote
In a hosted environment, the standard C library provides you a number of functions, from malloc() to exit().  In a freestanding environment, it does not; you only have some macros (from specific header files you can still include) and variadic parameter support (from <stdarg.h>).

OK, thanks. First time I've come across that distinction. Well, those names for it anyway.

Title: Re: Replacement for C standard library: your wishlist?
Post by: Fixpoint on January 08, 2021, 07:43:45 pm
I take it that you are fully aware of the challenges and the problems. So ... that would actually entail that you go the extra mile and don't use C. If you have good ideas, look into the future, not the past.
Why do you insist D, Rust, and Go are "the future", and C is "the past"?

Please stick precisely to what the other one says, otherwise we will make no progress. I never said that D, Rust or Go are the future. I gave those languages as examples for new languages that have been developed and that fix severe flaws of C. (And I repeat what I said earlier: I explicitly do not recommend Go.) I do not say that those languages are the future. I even hope that they ARE NOT.

Quote
No, I am not interested in how convinced you are, or who else says so; I am interested in logical arguments for or against, or personal experiences.

What kind of anti-intellectual stance is that? There is actual RESEARCH available, research that provides all those "logical arguments", research that has been done for DECADES by many people in the mathematics and CS community, some of them even clever. And you tell me "no, I am not interested what others have to say". Well -- what do you expect from me? How am I supposed to react to this? Clap my hands? Do you think saying something like that makes you a free thinker? Honestly, I don't know what to make of this.

Don't get me wrong: If you are SERIOUSLY interested in this stuff, I am definitely willing to talk about it. But currently I am under the impression that that's not really the case.

By the way, object-orientation has nothing to do with all this. D (the name is D, not D++) SUPPORTS object orientation, but that does not mean that--

No, I'm sorry. I am already beginning to give a lecture on the 101. I shouldn't do that.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 08, 2021, 09:23:10 pm
I take it that you are fully aware of the challenges and the problems. So ... that would actually entail that you go the extra mile and don't use C. If you have good ideas, look into the future, not the past.
Why do you insist D, Rust, and Go are "the future", and C is "the past"?
Please stick precisely to what the other one says, otherwise we will make no progress. I never said that D, Rust or Go are the future. I gave those languages as examples for new languages that have been developed and that fix severe flaws of C. (And I repeat what I said earlier: I explicitly do not recommend Go.) I do not say that those languages are the future. I even hope that they ARE NOT.
You wrote,
During the last decade, much better options than C have been developed (including the embedded use case), for example D, Rust, or Go (I don't recommend Go, but of course I have to include it here).
and later,
I take it that you are fully aware of the challenges and the problems. So ... that would actually entail that you go the extra mile and don't use C. If you have good ideas, look into the future, not the past.
I'm sorry, but I cannot understand how else I should interpret these except as "don't use C, because it would be looking into the past", and by inference, that for example D, Rust, or Go are "the future" that I should be looking into.

What kind of anti-intellectual stance is that? There is actual RESEARCH available, research that provides all those "logical arguments", research that has been done for DECADES by many people in the mathematics and CS community, some of them even clever. And you tell me "no, I am not interested what others have to say".
Ahem. "Please stick precisely to what the other one says, otherwise we will make no progress."

I did not say anything remotely like that.  I said, "Frankly, the CS articles I've read in the last decade or so, have concentrated more on constructing abstractions that help unskilled developers write code, and automatic detection and correction of the programming errors they create, than looking at robust, efficient, long-term maintainable code bases.  Hot air and gushing about favourites, without any true invention or progress."  I later said, "No, I am not interested in how convinced you are, or who else says so; I am interested in logical arguments for or against, or personal experiences."

In other words, most of the CS articles about this kind of stuff that I have looked at (at ACM, including in TOPLAS (https://dl.acm.org/journal/toplas)), are either irrelevant in practice, or of poor quality; usually both.  Consider A large scale study of programming languages and code quality in github (https://dl.acm.org/doi/abs/10.1145/2635868.2635922): cited 126 times, but a recent reproduction study (https://dl.acm.org/doi/10.1145/3340571) showed the same data does not actually support the original conclusions claimed.  Perhaps you consider that a single bad example among the good ones, but in my experience, this is typical of current CS articles.  Sure, there are good papers in there describing, say, an algorithm, perhaps even something that in some selected cases produces a few percent increase in efficiency for some workloads, but I haven't seen anything that measurably affects software engineering in years.  Nothing that makes a significant practical difference.

Maybe I haven't read enough; certainly possible!  If you disagree, just point me to a paper you think is relevant to low-level systems programming and if/when implemented, would make a difference to software projects' code quality.  Right now, I believe you are seeing serious progress where I see only abstract, occasionally incremental development; I see nothing that really affects real-world practical software engineering.

My stance on this is simple: C is a language proven in practice, but its standard library has significant easily demonstrated issues.  C compilers do not force us to link against the standard library, so it is a very straightforward matter to create a replacement library, one that does not implement standard C functions, but provides a different set.  I believe I have at least a partial idea of what kind of a set/API would be "better", but I would highly appreciate others' experience and findings, because those surely help: I do not know everything, and I err often.  Opinions are less useful than the reasons for those opinions, because with the reasons, I can compare to my own reasons, and decide whether my own opinion is on a weaker basis, and should change.
Aside from garbage collection (and automatic methods of detecting programmer errors), I know very few good CS articles relevant to low-level systems programming published in the last decade or so.  Definitely none that describe anything fundamentally better than I believe is achievable with freestanding C and a "better" base library.
Title: Re: Replacement for C standard library: your wishlist?
Post by: MarginallyStable on January 08, 2021, 09:42:41 pm
The #1 thing is to ban free() and base a C library on garbage collection. I disagree with Apple and prefer tracing over reference counting. The Boehm library works very well, with a few modifications -- I vastly reduced the amount of static memory needed and optimized the object size buckets and a number of other things.

I disagree here.  I think improvements to C should be made by making the language simpler and having less gotchas, especially in terms of simpler to understand & simpler to predict.  Garbage collection implementations always have corner cases and harder to predict behaviours (eg runtime slowdowns).

It's bad enough that dynamic memory allocation is already unpredictable, I don't want more of this  :(

Garbage collection is faster than malloc/free. I can (and have) proved that by using LD_PRELOAD to replace the C library malloc with Boehm GC for a large range of programs, with the result that they finish sooner and/or use less CPU time.

That's even on programs that are completely ignorant of the fact they are now running under GC.

A properly-written GC can easily limit the maximum pause time to an imperceptible level.


That would depend on if the program ever garbage collected. You might see the same performance increase by simply not calling free and hoping you do not run out of memory before the program finishes.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 08, 2021, 10:20:28 pm
Garbage collection is faster than malloc/free. I can (and have) proved that by using LD_PRELOAD to replace the C library malloc with Boehm GC for a large range of programs, with the result that they finish sooner and/or use less CPU time.
That would depend on if the program ever garbage collected. You might see the same performance increase by simply not calling free and hoping you do not run out of memory before the program finishes.
There are also cache effects.  Meaning, if instead of accessing different cache lines the GC causes the process to access the same cache lines (by reusing the same memory region allocated for the process) it can actually run in less wall clock time than the never-free()ing code, because of fewer cache misses.
Title: Re: Replacement for C standard library: your wishlist?
Post by: brucehoult on January 08, 2021, 11:09:28 pm
The #1 thing is to ban free() and base a C library on garbage collection. I disagree with Apple and prefer tracing over reference counting. The Boehm library works very well, with a few modifications -- I vastly reduced the amount of static memory needed and optimized the object size buckets and a number of other things.

I disagree here.  I think improvements to C should be made by making the language simpler and having less gotchas, especially in terms of simpler to understand & simpler to predict.  Garbage collection implementations always have corner cases and harder to predict behaviours (eg runtime slowdowns).

It's bad enough that dynamic memory allocation is already unpredictable, I don't want more of this  :(

Garbage collection is faster than malloc/free. I can (and have) proved that by using LD_PRELOAD to replace the C library malloc with Boehm GC for a large range of programs, with the result that they finish sooner and/or use less CPU time.

That's even on programs that are completely ignorant of the fact they are now running under GC.

A properly-written GC can easily limit the maximum pause time to an imperceptible level.


That would depend on if the program ever garbage collected. You might see the same performance increase by simply not calling free and hoping you do not run out of memory before the program finishes.

Yep, you can try that too. And I have.  For example with gcc compiles (a relatively short-running program).

What you'll find is that even relatively small compiles quickly use GB of RAM and it's faster to do a few GCs. But not too many.

Maybe I'll put together a page on how to try this experiment for yourself.
Title: Re: Replacement for C standard library: your wishlist?
Post by: magic on January 09, 2021, 01:19:59 am
Fix the size of int, long, char, etc; don't let them be platform specific.  In practice this means having to use different type names like stdint.h does (otherwise you break compat with other code & compilers),  I normally go for more sensible names like   u8, u16, u32, u64,   s8, s16, s32, s64,   f32, f64  (rather than things like uint32_t, uint64_t, etc)
Found the x86 user :D

C was designed to run on platforms where 32 bits is slower than 16 bits and on platforms where the contrary is the case. It will even support a 36 bit CPU for that matter, though probably fewer people care about that nowadays.

I agree with the latter part, the standard uint_t names are annoying.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Whales on January 09, 2021, 01:44:52 am
Fix the size of int, long, char, etc; don't let them be platform specific. [...]
Found the x86 user :D

I have spent a good 10 minutes now trying to come up with a convincing defence.  All it has done is remind me that I miss my snow laptop (http://halestrom.net/darksleep/blog/006_chromebook/) (edit 7 years now?!?!)
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 09, 2021, 05:24:25 am
C was designed to run on platforms where 32 bits is slower than 16 bits and on platforms where the contrary is the case. It will even support a 36 bit CPU for that matter, though probably fewer people care about that nowadays.
That is completely true, but all our interchange formats use these base integer and IEEE-754 floating-point types; and that is what we really need them for.

In real world applications, size_t (and to a lesser degree, uintptr_t/intptr_t) are the "common" integer type, being able to describe the size of any in-memory structure (in units of char, "byte").

The distinction between "int" and "long" has become annoying, because on LP64 x86-64 ABIs, longs tend to be "cheaper" than ints, as the native general purpose register size is 64 bits.  On ILP64, there is no distinction.  What we need, is a (set of) integer types that can describe at least a specific range of values.  We could look at Fortran (its kind= notation), but make it simpler, explicit; so that to declare an unsigned integer type variable foo, we'd use
    INT(0, max) foo;
and to declare a signed integer type variable bar,
    INT(min, max) bar;
with the preprocessor macros evaluating to a scalar unsigned or signed type that can represent all integers from min up to and including max – but possibly a much larger range –, using the fastest representation and size for the current hardware; or emitting a compile error if no suitable type exists at preprocess time.  And yes, this implies that e.g.
    typedef  INT(0,256*256*256-1)  rgb_color;
    typedef  INT(-1, 1024*1024*1024-1)  my_size;
would be perfectly acceptable ways of declaring integer types.

Unfortunately, the C preprocessor does not really make that possible; it would require a ternary macro expression evaluator that is evaluated at preprocess time, with numerical values evaluated (with basic +-*/ integer arithmetic) at preprocess stage.  (That would sure be useful for standard C code too, though.)

The best we have, is the badly-named uint_fastN_t and int_fastN_t types, with <stdint.h> defining them for N = 8, 16, 32, and 64.  We could define them for all N between 8 and 64, but since there is no hardware that GCC and clang compile to with integer types having any other values of N, there hasn't been any practical need.  Maybe we should, though, at least for N multiples of 8/16/32?

I still have to come up with a good, short descriptive name for these types, that are easy to type too.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 09, 2021, 05:26:43 am
One idea does keep bugging me:  What if we used uNe and sNe for the exact-width types, and uN and sN for the fast types that may have bigger range?

In other words, that u32 == uint_fast32_t, and s64e == int64_t?

I know that is a big break from existing practice, and would feel weird to many, but it just seems to me it would match real-world use cases better.
If you have a function or a local variable, you want to use the fast types.  You should only use the exact-width types in data structures (and in arithmetic expressions via casts).
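
A minimal sketch of how that naming might look in practice, assuming the mapping suggested above (plain names for the fast types, an 'e' suffix for exact width). The struct record layout and record_checksum() are invented purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Naming from the post: plain names are "fast", an 'e' suffix means exact width. */
typedef uint_fast32_t u32;
typedef uint32_t      u32e;
typedef uint16_t      u16e;

/* Exact-width types belong in data structures, where the layout matters. */
struct record {
    u32e id;
    u16e flags;
    u16e length;
};

/* Fast types for parameters and locals; the cast through the exact-width
   type reduces the result modulo 2^32 even if u32 is wider than 32 bits. */
u32 record_checksum(const struct record *r)
{
    u32 sum = r->id;
    sum += r->flags;
    sum += r->length;
    return (u32)(u32e)sum;
}
```

On x86-64 glibc, for example, uint_fast32_t is 64 bits wide, so the explicit cast is what keeps the arithmetic equivalent to 32-bit wraparound.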
Title: Re: Replacement for C standard library: your wishlist?
Post by: brucehoult on January 09, 2021, 06:08:05 am
Fix the size of int, long, char, etc; don't let them be platform specific. [...]
Found the x86 user :D

I have spent a good 10 minutes now trying to come up with a convincing defence.  All it has done is remind me that I miss my snow laptop (http://halestrom.net/darksleep/blog/006_chromebook/) (edit 7 years now?!?!)

Geez, that thing's in far worse shape than my 2011 11" MacBook Air (dual core i7 1.8 GHz, 2.9 GHz turbo)

A dual 1.7 GHz A15 is nothing to be sneezed at. I've got a quad core A15 2.0 in an Odroid XU4 and it rocks if 32 bit is good enough. I'm surprised it can run on batteries though. Well, I mean ... anything *can* ... just not for long.
Title: Re: Replacement for C standard library: your wishlist?
Post by: magic on January 09, 2021, 09:06:48 am
I have spent a good 10 minutes now trying to come up with a convincing defence.  All it has done is remind me that I miss my snow laptop (http://halestrom.net/darksleep/blog/006_chromebook/) (edit 7 years now?!?!)
Okay, so you might know that ARM has no 16 bit registers and no 16 bit operations besides load/store IIRC. So perhaps 32 bit would be a good fit for int on ARM, but then try to use it on AVR and see what happens :scared:

The distinction between "int" and "long" has become annoying, because on LP64 x86-64 ABIs, longs tend to be "cheaper" than ints, as the native general purpose register size is 64 bits.  On ILP64, there is no distinction.
There is nothing cheaper about 64 bit on x86-64 until you actually need 64 bits. There is a full set of 32 bit registers and a full set of operations on them implemented efficiently. In fact, to get 64 bit arithmetic you add a special prefix byte to the corresponding 32 bit instruction. Ditto when accessing the R8-R15 registers, by the way. In all honesty, x86-64 is an LP64 architecture - an i386 with long pointers and ability to do 64 bit arithmetic if you really insist. If you think about it, its primary design objective was to run 32 bit Windows natively unlike Intel Itanic :P
Title: Re: Replacement for C standard library: your wishlist?
Post by: DiTBho on January 09, 2021, 11:36:51 am
Yesterday I tried to compile a C++ project with a lot of big templates on an i686 32bit Linux box, and g++ crashed because it was unable to manage more than 2Gbyte of stack, or something similar. The CPU is already 64bit, but I have two kernels and two rootfs,  32bit and 64bit, so I rebooted the computer into the x86-64 64bit one, and everything went fine.

gcc is Mbyte-hungry
g++ looks Gbyte-hungry
Title: Re: Replacement for C standard library: your wishlist?
Post by: DiTBho on January 09, 2021, 11:51:39 am
I agree with the latter part, the standard uint_t names are annoying.

I disagree. I don't find it annoying, but rather elegant, practical, and coherent :D
Title: Re: Replacement for C standard library: your wishlist?
Post by: SiliconWizard on January 09, 2021, 04:58:29 pm
I agree with the latter part, the standard uint_t names are annoying.

I disagree. I don't find it annoying, but rather elegant, practical, and coherent :D

Ditto. This is actually a consistent naming convention for std C types, that I use in my own code as well.
It certainly shouldn't be changed. Shorter and less consistent identifiers are not just less consistent and possibly ambiguous, but they drastically increase the possibility of identifier clash, which the std committee is well inspired to avoid at all costs.
Title: Re: Replacement for C standard library: your wishlist?
Post by: magic on January 09, 2021, 07:36:34 pm
There is nothing inconsistent or confusing about the set proposed by Whales, which is also used in many actual C projects.

At the very least, drop the damn _t. As if I didn't know that intNN is a type, as if everybody weren't using an editor which highlights it anyway |O
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 10, 2021, 05:58:50 am
Does anyone have any particular function interfaces they really like?

As I've already mentioned, I really like getline() (https://www.man7.org/linux/man-pages/man3/getline.3.html), because it avoids line length limits.  (Perhaps there should be an optional length limit, or rather a maximum memory use limit.)  I'd love a getdelim() (https://www.man7.org/linux/man-pages/man3/getdelim.3.html) that takes a set of possible delimiters instead of a single character, and the inverse (that only reads input as long as they are within a specified set), similar to strspn() and strcspn(); but without consuming the delimiter (like getdelim() does).  It would allow lexical analysis directly on file-like streams, as well as universal newline support, with very little overhead.  Instead of a first-level stream type, I think it'd be better to have buffer types (readonly, read-write, writeonly) exposing this functionality, attachable to a file descriptor.  I think having the stream abstraction explicitly described as an abstraction could also help new programmers get a more realistic overview of Linux/BSD/etc. OS operations.
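
A rough sketch of the "set of delimiters, delimiter not consumed" idea, written against plain stdio so it can be tried today. The function name read_until_any() and its return convention (byte count, or a negative errno value) are assumptions made for this example; the real interface would sit on the buffer types described above rather than on FILE.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Grow *buf (getline-style) so that it can hold at least 'need' bytes. */
static int ensure_room(char **buf, size_t *cap, size_t need)
{
    if (*buf && need <= *cap)
        return 0;
    size_t ncap = *cap ? *cap : 64;
    while (ncap < need)
        ncap *= 2;
    char *nbuf = realloc(*buf, ncap);
    if (!nbuf)
        return -ENOMEM;
    *buf = nbuf;
    *cap = ncap;
    return 0;
}

/* Hypothetical: read from 'in' until a byte in 'delims' or end of input.
   The delimiter is pushed back, so the next read sees it.
   Returns the number of bytes stored (excluding '\0'), or a negative errno. */
long read_until_any(FILE *in, char **buf, size_t *cap, const char *delims)
{
    size_t len = 0;
    int err;

    for (;;) {
        int c = getc(in);
        if (c == EOF) {
            if (ferror(in))
                return -EIO;
            break;
        }
        if (c != '\0' && strchr(delims, c)) {
            ungetc(c, in);          /* leave the delimiter unread */
            break;
        }
        if ((err = ensure_room(buf, cap, len + 2)) != 0)
            return err;
        (*buf)[len++] = (char)c;
    }
    if ((err = ensure_room(buf, cap, len + 1)) != 0)
        return err;
    (*buf)[len] = '\0';
    return (long)len;
}
```

Calling it with delims "\r\n" gives a simple form of universal newline handling, since the caller decides what to do with whichever delimiter comes next.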

I also find iconv_open() (https://www.man7.org/linux/man-pages/man3/iconv_open.3.html), iconv() (https://www.man7.org/linux/man-pages/man3/iconv.3.html), iconv_close() (https://www.man7.org/linux/man-pages/man3/iconv_close.3.html) very powerful for character set conversions; POSIX regex (https://www.man7.org/linux/man-pages/man3/regex.3.html) (regcomp(), regexec(), regfree()) for regular expressions; nftw() (https://www.man7.org/linux/man-pages/man3/nftw.3.html) for walking filesystem trees (although how it limits the number of file descriptors it uses needs work, and it needs atfile support); scandirat() (https://www.man7.org/linux/man-pages/man3/scandirat.3.html) for listing directory contents; glob() for finding files matching a pattern (except it too needs atfile support).

Instead of open(), I intend to expose open(dirfd, pathname, what, how [, mode]), with how containing flags affecting the file descriptor, and what affecting filename resolution (O_DIRECTORY, O_BENEATH, openat2() RESOLVE_ flags).  (The number of flags needed exceeds 32, so instead of requiring a larger type, I think splitting them into two by purpose works better.) 
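
To make the what/how split concrete, here is a minimal sketch layered on POSIX openat(). Everything named here (xopen(), the WHAT_ and HOW_ constants) is hypothetical, only a handful of flags are mapped, and the Linux-specific O_BENEATH and openat2() RESOLVE_ flags are left out. Errors come back as negative errno values, in line with the no-errno convention mentioned in the first post.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical 'what' flags: affect how the pathname is resolved. */
#define WHAT_DIRECTORY  (1u << 0)   /* pathname must refer to a directory */
#define WHAT_NOFOLLOW   (1u << 1)   /* do not follow a final symbolic link */

/* Hypothetical 'how' flags: affect the resulting file descriptor. */
#define HOW_READ     (1u << 0)
#define HOW_WRITE    (1u << 1)
#define HOW_CREATE   (1u << 2)
#define HOW_CLOEXEC  (1u << 3)

/* Returns a file descriptor (>= 0), or a negative errno value on failure. */
int xopen(int dirfd, const char *pathname, unsigned what, unsigned how, mode_t mode)
{
    int flags;

    if ((how & HOW_READ) && (how & HOW_WRITE))
        flags = O_RDWR;
    else if (how & HOW_WRITE)
        flags = O_WRONLY;
    else
        flags = O_RDONLY;

    if (how & HOW_CREATE)
        flags |= O_CREAT;
    if (how & HOW_CLOEXEC)
        flags |= O_CLOEXEC;
    if (what & WHAT_DIRECTORY)
        flags |= O_DIRECTORY;
    if (what & WHAT_NOFOLLOW)
        flags |= O_NOFOLLOW;

    int fd = openat(dirfd, pathname, flags, mode);
    return (fd == -1) ? -errno : fd;
}
```

A call such as xopen(AT_FDCWD, ".", WHAT_DIRECTORY, HOW_READ | HOW_CLOEXEC, 0) then reads as "resolve this as a directory, open it read-only and close-on-exec".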

Since Unix domain sockets and named pipes (FIFOs) are visible in the filesystem, I think we should expose them via what flags explicitly.  A synthetic dirfd constant would expose abstract Unix domain sockets (not visible in the filesystem), with how determining whether to connect or to listen, and what determining the socket type (stream, datagram, seqpacket).  The hope is to encourage use of Unix domain sockets for IPC.

(I do think that having to use mkfifo(), bind(), and connect() to access filesystem-visible FIFOs and sockets is a silly complication due to development history, and that the above is about fixing that, not providing an abstraction.)
Title: Re: Replacement for C standard library: your wishlist?
Post by: PlainName on January 10, 2021, 11:34:40 am
Quote
As I've already mentioned, I really like getline(), because it avoids line length limits.

But it's inconsistent. You sometimes have to free() and sometimes not, and sometimes have to free() even if you say you don't want anything from it! I think it's trying to cover too many bases and would be better split into two functions: one that allocates memory and one that uses pre-allocated memory.

edit: also it's a blocking call. Sure, we all use an RTOS nowadays but sometimes we don't (and this is specifically for small embedded systems, don't forget). If it doesn't block you can add blocking easily enough, not so easy to do it the other way.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 10, 2021, 11:53:38 am
Quote
As I've already mentioned, I really like getline(), because it avoids line length limits.
But it's inconsistent. You sometimes have to free() and sometimes not, and sometimes have to free() even if you say you don't want anything from it! I think it's trying to cover too many bases and would be better split into two functions: one that allocates memory and one that uses pre-allocated memory.
I don't understand what you mean.  You start with
Code: [Select]
char   *buffer_ptr = NULL;
size_t  buffer_max = 0;
or with
Code: [Select]
size_t  buffer_max = somenumber;
char   *buffer_ptr = malloc(buffer_max); /* verified to have succeeded */
The buffer_ptr must always be either NULL, or a pointer to dynamically allocated memory as returned by malloc(), calloc(), realloc(), or aligned_alloc().  It cannot and must not point to statically allocated memory.

To read an input line, you do
Code: [Select]
ssize_t  len = getline(&buffer_ptr, &buffer_max, file_handle);
If len==-1, then either there was no more data to read, or there was an error.

Regardless of the result, it is always safe to discard the buffer via
Code: [Select]
free(buffer_ptr);
buffer_ptr = NULL;
buffer_max = 0;
Because free(NULL) is safe, this is always safe to do, even when buffer_ptr==NULL.

You do have to be aware that after every getline() call, the function may have modified buffer_ptr and buffer_max; you must not assume they are unchanged.

TL;DR: You start with a NULL pointer and a zero size, or an already dynamically allocated buffer, and can discard (or steal the dynamically allocated buffer) at any time you wish.  After you're done with it, you always discard the buffer.  Very consistent, very simple, very robust.
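
Putting the pieces above together, a complete loop might look like the following sketch. The helper longest_line() is made up for illustration; the getline() calls and the final cleanup follow the pattern described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: length of the longest line in 'in', not counting
   the trailing newline. Returns -1 if a read error occurred. */
long longest_line(FILE *in)
{
    char   *buffer_ptr = NULL;   /* start with no buffer at all */
    size_t  buffer_max = 0;
    long    longest = 0;
    ssize_t len;

    /* getline() grows (and may move) the buffer as needed on every call. */
    while ((len = getline(&buffer_ptr, &buffer_max, in)) != -1) {
        if (len > 0 && buffer_ptr[len - 1] == '\n')
            len--;
        if (len > longest)
            longest = (long)len;
    }

    int failed = ferror(in);     /* -1 means end of input or an error */

    /* Always safe, even if getline() never allocated anything. */
    free(buffer_ptr);
    buffer_ptr = NULL;
    buffer_max = 0;

    return failed ? -1 : longest;
}
```

There is a single free() on every path, whether the input was empty, had one line, or had a line of several megabytes.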

So, what is the inconsistency?
Title: Re: Replacement for C standard library: your wishlist?
Post by: Siwastaja on January 10, 2021, 12:54:56 pm
The base C language is quite fine actually, most of the footguns are related to the stupidly designed standard library. Similarly, most things that are more complex than they need to be, are related to the library. People think they prefer "higher level" languages because of the core language features, but I think they actually like simple-to-use, powerful libraries that come with said languages.

With C, there is always some extra housekeeping to do, but there is no reason why searching, concatenating, parsing, generating strings for example must be a delicate 100-line endeavor vs. a two-liner in a "modern" language. With a decent, modern-day library, it would be, say, a 10-line job in C.

So I think this project is exactly on the right track. The freestanding C plus a few core library features (those that are likely provided by compiler built-ins instead of actually linking to stdlib) like memcpy or malloc are fine, but the mess caused by all the complex, unsafe, and unhelpful library "helpers" makes full-blown standard C more difficult than it needs to be.
Title: Re: Replacement for C standard library: your wishlist?
Post by: PlainName on January 10, 2021, 03:14:19 pm
Does this buffer the line until it's completely read, then? If the stream is, say, a serial port, does it return after each call with -1, meantime buffering the input, until it sees a delimiter, and then return the buffer length? If so, I misinterpreted the description.

Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 10, 2021, 04:07:47 pm
Does this buffer the line until it's completely read, then? If the stream is, say, a serial port, does it return after each call with -1, meantime buffering the input, until it sees a delimiter, and then return the buffer length? If so, I misinterpreted the description.
Plain reading of the IEEE Std 1003.1-2017 (https://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html) says that if an error occurs, getline() and getdelim() should return -1 with errno set, but most current implementations instead return the currently buffered data.  Most of them also assume that when read() returns a short count, it is either because of end-of-input or an error, and that just isn't true in practice.

The behaviour you and POSIX describe – that when the underlying descriptor is nonblocking, the call will return an error indicating it would block if the buffer does not contain a delimiter and end-of-input hasn't been received yet (because the buffer-filling read() call reports an error, EAGAIN/EWOULDBLOCK) – is what makes sense, obviously.  This also applies when the underlying read() is interrupted by signal delivery.  Instead of returning whatever contents there may be in the buffer, it should return an error (EAGAIN/EWOULDBLOCK, or EINTR).

Like I said, I like the interface; not necessarily the implementation details (which I consider buggy anyway).
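
To make the intended behaviour concrete, here is a minimal sketch, not an existing implementation.  It assumes the negative-error-number return convention proposed at the start of this thread; the names struct linereader, lr_getline() and set_nonblocking() are made up for illustration.  The key point is that when read() fails with EAGAIN/EWOULDBLOCK or EINTR, the partial line stays in the buffer and the caller gets the error, and a short read is not treated as end-of-input.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical line reader; none of these names come from an existing library. */
struct linereader {
    int    fd;    /* underlying descriptor, possibly nonblocking */
    char  *buf;   /* buffered input that has not been consumed yet */
    size_t len;   /* bytes buffered */
    size_t cap;   /* bytes allocated */
    int    eof;   /* nonzero once read() has returned 0 */
};

/* Convenience helper: switch a descriptor to nonblocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -errno;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -errno : 0;
}

/* Returns the length of the next line (delimiter included) and stores a
   malloc()ed, NUL-terminated copy in *line.  Returns 0 at end of input.
   Returns a negative errno value (-EAGAIN, -EINTR, -ENOMEM, ...) when no
   complete line is available; the buffered partial line is kept, so the
   caller can simply retry later. */
static ssize_t lr_getline(struct linereader *lr, char **line, char delim)
{
    assert(lr != NULL && line != NULL);
    for (;;) {
        char  *end = lr->len ? memchr(lr->buf, delim, lr->len) : NULL;
        size_t n   = end ? (size_t)(end - lr->buf) + 1 : lr->len;

        if (end || (lr->eof && n > 0)) {
            /* A complete line, or the final unterminated line. */
            char *copy = malloc(n + 1);
            if (!copy)
                return -ENOMEM;
            memcpy(copy, lr->buf, n);
            copy[n] = '\0';
            memmove(lr->buf, lr->buf + n, lr->len - n);
            lr->len -= n;
            *line = copy;
            return (ssize_t)n;
        }
        if (lr->eof)
            return 0;

        if (lr->len == lr->cap) {
            /* Grow the buffer; there is no fixed line length limit. */
            size_t cap = lr->cap ? 2 * lr->cap : 256;
            char  *tmp = realloc(lr->buf, cap);
            if (!tmp)
                return -ENOMEM;
            lr->buf = tmp;
            lr->cap = cap;
        }

        ssize_t got = read(lr->fd, lr->buf + lr->len, lr->cap - lr->len);
        if (got < 0)
            return -errno;           /* nothing consumed or discarded */
        if (got == 0)
            lr->eof = 1;
        else
            lr->len += (size_t)got;  /* a short count is not end of input */
    }
}
```

A zero-initialized struct linereader with only fd set is a valid starting state.  On a serial port or pipe in nonblocking mode, a caller would typically wait in poll() after getting -EAGAIN and then call lr_getline() again.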
Title: Re: Replacement for C standard library: your wishlist?
Post by: PlainName on January 10, 2021, 04:13:24 pm
OK. I retract my objection, then.
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 10, 2021, 05:05:14 pm
OK. I retract my objection, then.
It was a good point, though.  Being able to use the higher-level "stream" interfaces on all types of descriptors is important.

Another important case is full duplex I/O.  We need to make that simple, because so many file descriptor type interfaces are full duplex in Linux and BSDs.
I do have a few ideas (based on past experience) on how to implement this, but there are several ways of implementing it.

Many things boil down to exposing at least the receive/read buffer, so that the user can check if a specific code or sequence has been received, and how much data there is in the read buffer, without consuming or discarding the buffered data.  Exactly how best to do that is an open question for me.
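
Just to have something concrete to poke at, one possible shape for such a "peek" interface is sketched below.  This is only one of several options, and struct rxbuf, rx_pending() and rx_find() are hypothetical names, not a settled design.  Neither function consumes or discards anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical read-side buffer of a stream; names are illustrative only. */
struct rxbuf {
    const unsigned char *data;  /* first unconsumed byte */
    size_t               len;   /* number of unconsumed bytes */
};

/* Number of bytes currently buffered. */
static size_t rx_pending(const struct rxbuf *rx)
{
    assert(rx != NULL);
    return rx->len;
}

/* Offset of the first occurrence of seq[0..n-1] in the buffered data,
   or -1 if the sequence has not been received (yet). */
static long rx_find(const struct rxbuf *rx, const void *seq, size_t n)
{
    assert(rx != NULL && seq != NULL);
    if (n == 0 || n > rx->len)
        return -1;
    for (size_t i = 0; i + n <= rx->len; i++)
        if (memcmp(rx->data + i, seq, n) == 0)
            return (long)i;
    return -1;
}
```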
Title: Re: Replacement for C standard library: your wishlist?
Post by: SiliconWizard on January 10, 2021, 05:48:17 pm
I'd like to see standard hash tables/dictionaries.
Environment variables are an example of a dictionary every process has access to, and I have an idea on those.

Do you have any examples of interfaces you've found useful?  Function prototypes would give a good idea, with example real-world use cases.
Compare to e.g. qsort_r() instead of qsort() for sorting: the comparison operation often needs external information, like offset or column within the string to start the comparison at, and passing an untyped user pointer to the comparison function makes that easy in a thread-safe manner, as one does not need to use global variables.

Dictionaries are useful for a very large set of applications.

I'm not sure about the interface. I think ideally you should deal with arbitrary keys and values, so keys and values would be "items", each for instance being a pointer to the item's data, and a size field. I would certainly not restrict it to "strings".

Then a function adding a key-value pair to the dictionary would take 2 "items" as parameters.
Another function returning the value from the key would for instance take 1 "item" as parameter (the key) and would return the value as a pointer to an "item". A returned NULL would mean no matching key found.
You could also add a function for removing an entry (by key).

The function creating a dictionary may take an optional parameter defining if we want a dynamically resizable one, or a fixed max size (number of entries). Optionally it could also allow the use of a statically allocated dictionary (if we want to avoid dynamic allocation for instance.)

As for the key searching, the exact match could be either implemented with memcmp() (or equivalent), or with an optionally user-defined compare function for cases where a key match would not be strict binary equality of the whole key's data.
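
To give an idea of the prototypes, here is a rough sketch of that interface.  All names (struct item, dict_add(), dict_get(), dict_remove()) are made up for this post.  To keep it short it only covers the fixed-size, statically allocatable variant and looks keys up with a plain linear scan; a real implementation would hash the key, and a custom match function would then need a matching custom hash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An "item": arbitrary bytes plus a size.  All names here are hypothetical. */
struct item {
    const void *data;
    size_t      size;
};

struct dict_entry {
    struct item key, value;
    int         used;
};

/* Returns nonzero when the two keys match. */
typedef int (*dict_match_fn)(const struct item *a, const struct item *b);

struct dict {
    struct dict_entry *slots;  /* caller-provided storage, may be static */
    size_t             max;    /* fixed maximum number of entries */
    dict_match_fn      match;  /* NULL selects exact binary equality */
};

static int dict_keys_match(const struct dict *d,
                           const struct item *a, const struct item *b)
{
    if (d->match)
        return d->match(a, b);
    return a->size == b->size &&
           (a->size == 0 || memcmp(a->data, b->data, a->size) == 0);
}

static struct dict_entry *dict_find(const struct dict *d, const struct item *key)
{
    for (size_t i = 0; i < d->max; i++)
        if (d->slots[i].used && dict_keys_match(d, &d->slots[i].key, key))
            return &d->slots[i];
    return NULL;
}

/* Adds or replaces an entry.  Returns 0 on success, -1 when full.
   The dictionary stores the pointers; it does not copy the data. */
static int dict_add(struct dict *d, struct item key, struct item value)
{
    assert(d != NULL && d->slots != NULL);
    struct dict_entry *e = dict_find(d, &key);
    for (size_t i = 0; !e && i < d->max; i++)
        if (!d->slots[i].used)
            e = &d->slots[i];
    if (!e)
        return -1;
    e->key = key;
    e->value = value;
    e->used = 1;
    return 0;
}

/* Returns the value for key, or NULL when there is no matching key. */
static const struct item *dict_get(const struct dict *d, struct item key)
{
    struct dict_entry *e = dict_find(d, &key);
    return e ? &e->value : NULL;
}

/* Removes the entry for key.  Returns 0 if removed, -1 if not found. */
static int dict_remove(struct dict *d, struct item key)
{
    struct dict_entry *e = dict_find(d, &key);
    if (!e)
        return -1;
    e->used = 0;
    return 0;
}
```

A static dictionary is then just a zero-initialized array of struct dict_entry plus a struct dict pointing at it, with no dynamic allocation involved.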
Title: Re: Replacement for C standard library: your wishlist?
Post by: westfw on January 15, 2021, 09:28:44 am
I would really like a better set of string (text) functions.  Maybe more capable of dealing with Unicode than the current stuff, but ... definitely more like the support in other languages.

I'm not sure what that would look like, exactly.  One possibility is that strings could have their own garbage-collected memory management, without switching other things away from malloc/free.

Quote
rather than things like uint32_t
I find it a bit depressing how often people have used uint16_t, when what they really should have used was uint_fast16_t.  Of course, if you didn't like uint16_t because of readability or typeability issues, you REALLY hate uint_fast16_t :-(
Title: Re: Replacement for C standard library: your wishlist?
Post by: Nominal Animal on January 15, 2021, 01:45:08 pm
I would really like a better set of string (text) functions.  Maybe more capable of dealing with Unicode than the current stuff, but ... definitely more like the support in other languages.
It is interesting to note that Unicode currently limits code points to 0 through 0x10FFFF, inclusive (1,114,112 possible code points), which means that UTF-8 sequences are 1 to 4 bytes long.  All newline conventions are either one or two bytes long.  Commonly interesting escape/end sequences are two or three bytes long.  And so on.

It seems to me that we really need string functions that, instead of single-byte characters, work on characters or character sequences that are 1 to 4 bytes long.  This covers not only UTF-8, but other use cases as well.  UTF-8 sequences are in many ways even easier, because the initial byte also describes the sequence length; this makes them relatively easy to support automagically when globbing or implementing regular expressions.
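
As a small illustration of how the initial byte describes the sequence length, consider the sketch below.  The function names are hypothetical.  Note that a lead byte can announce up to four bytes, which is what code points up to U+10FFFF require.  Continuation bytes are deliberately not validated here.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with byte c,
   or 0 if c cannot start a sequence (continuation or invalid byte). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;                       /* 0xxxxxxx: ASCII */
    if (c >= 0xC2 && c <= 0xDF)
        return 2;                       /* 110xxxxx (0xC0, 0xC1 would be overlong) */
    if (c >= 0xE0 && c <= 0xEF)
        return 3;                       /* 1110xxxx */
    if (c >= 0xF0 && c <= 0xF4)
        return 4;                       /* 11110xxx, up to U+10FFFF */
    return 0;
}

/* Counts characters in s[0..len-1] by hopping from lead byte to lead byte.
   A byte that cannot start a sequence is counted as one unit. */
static size_t utf8_count(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t count = 0;
    for (size_t i = 0; i < len; count++) {
        size_t n = utf8_seq_len(s[i]);
        i += n ? n : 1;
    }
    return count;
}
```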

(I've worked quite a bit with wide character strings and wide I/O, and while they solve the individual character problem, they do not solve combined glyphs nor newline conventions nor escape sequences.)

For operations that are done to a limited-size buffer, the functions need to be able to return the case when the decisive sequence is cut short by the end of the buffer, so that the caller knows to resize/grow/move the buffer.
(So, the equivalent of strnstr() should be able to return one of three results: found here, not found, or cut short by end of buffer.)
However, even this matters most for those short sequences - a few characters at most; longer string matching is a much rarer operation, relatively speaking.
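
A minimal sketch of such a three-way search follows.  The names enum seq_result and find_seq() are hypothetical.  The "cut short" case is reported when the buffer ends with a proper prefix of the sequence, for example a lone "\r" at the end of the buffer when looking for "\r\n".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum seq_result { SEQ_FOUND, SEQ_NOT_FOUND, SEQ_CUT_SHORT };

/* Searches buf[0..len-1] for seq[0..n-1].
   SEQ_FOUND:     *pos is the offset of the first complete match.
   SEQ_CUT_SHORT: no complete match, but the buffer ends with a proper
                  prefix of seq starting at *pos; more data is needed.
   SEQ_NOT_FOUND: neither; *pos is left untouched. */
static enum seq_result find_seq(const char *buf, size_t len,
                                const char *seq, size_t n, size_t *pos)
{
    assert(buf != NULL && seq != NULL && pos != NULL && n > 0);
    for (size_t i = 0; i < len; i++) {
        size_t avail = len - i;
        size_t cmp   = avail < n ? avail : n;
        if (memcmp(buf + i, seq, cmp) == 0) {
            *pos = i;
            return cmp == n ? SEQ_FOUND : SEQ_CUT_SHORT;
        }
    }
    return SEQ_NOT_FOUND;
}
```

On SEQ_CUT_SHORT the caller would keep the bytes from *pos onwards, grow or refill the buffer, and search again; on SEQ_NOT_FOUND everything in the buffer can be processed.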

Making the common operations efficient is the key.  After that, the rarer operations only need to be non-silly.

I'm not sure what that would look like, exactly.  One possibility is that strings could have their own garbage-collected memory management, without switching other things away from malloc/free.
Definitely an intriguing option.  I also like the underlying idea of modularity.

Perhaps, instead of a monolithic base library, it should be split into a core and optional sub-libraries?

I find it a bit depressing how often people have used uint16_t, when what they really should have used was uint_fast16_t.  Of course, if you didn't like uint16_t because of readability or typeability issues, you REALLY hate uint_fast16_t :-(
That's exactly the reason I was mulling using u16 for uint_fast16_t and u16e for uint16_t.

Minimum-size fast types should be the most commonly used ones, so why not make them the easiest ones to use?  The exact-sized types can then be thought of as stricter type variants (in the logical sense; computationally completely separate types), so a suffix denoting "exactly" seems logical to me.
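
In header form, that naming could look something like the sketch below.  Only u16 and u16e are the names I mentioned above; the 8- and 32-bit variants and sum_samples() are extrapolated here purely for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stdint.h>

/* Short names: minimum-size fast types. */
typedef uint_fast8_t  u8;
typedef uint_fast16_t u16;
typedef uint_fast32_t u32;

/* "e" suffix: exactly this many bits. */
typedef uint8_t  u8e;
typedef uint16_t u16e;
typedef uint32_t u32e;

static_assert(sizeof(u16e) * CHAR_BIT == 16, "u16e is exactly 16 bits");
static_assert(sizeof(u16) * CHAR_BIT >= 16, "u16 is at least 16 bits");

/* Exact-size type for the stored data, fast types for counter and sum. */
static u32 sum_samples(const u16e *samples, u16 count)
{
    u32 total = 0;
    for (u16 i = 0; i < count; i++)
        total += samples[i];
    return total;
}
```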