But that's why threads cannot be safely implemented as a C library, nor can you assume C semantics will produce the right assembly code for trmem; and the more you push {space, speed} optimization, the more likely it won't.
That's why I always suggest to directly use assembly for this stuff: because you have full control of it!
If we define 'this stuff' as these hardware-details (transactional memory, spinlocks, ll/sc-based locks or lockless accessors), we are in absolute agreement.
With GCC and clang, correctly written extended inline assembly will use machine constraints and "references" for the registers used, so that the C compiler can optimize the (inlined) code for each use site. For full functions that will always be called and not inlined, external assembly is absolutely fine.
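As a minimal sketch of what I mean by constraints and register "references" (add_u32() is a hypothetical helper made up for illustration, and the assembly is only used on x86 and AArch64 targets with a GCC-compatible compiler; everywhere else it falls back to plain C):

```c
#include <stdint.h>

/* add_u32() is a hypothetical helper, used only to show the constraint syntax. */
static inline uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+r": a is both read and written, in any general-purpose register.
       "r":  b is read-only, in any general-purpose register.
       "cc": the instruction modifies the condition flags. */
    __asm__ ("addl %1, %0" : "+r" (a) : "r" (b) : "cc");
    return a;
#elif defined(__GNUC__) && defined(__aarch64__)
    uint32_t result;
    /* %w0, %w1, %w2 name the 32-bit views of whichever registers get chosen. */
    __asm__ ("add %w0, %w1, %w2" : "=r" (result) : "r" (a), "r" (b));
    return result;
#else
    /* Assumption: no inline assembly available here, so use plain C. */
    return a + b;
#endif
}
```

Because the registers are not hard-coded and the statement is not marked volatile, the compiler is free to pick registers per call site, and to drop the statement entirely if the result is unused.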
If you think about it, my-c works better precisely because the optimizer works worse
This is a key observation. The more limited or stricter the optimization strategy for a C-like language, the better the control over the exact code generated, but the worse the portability becomes (especially wrt. compilers generating code for a different hardware architecture from the same source).
While "new" languages like Rust, Julia, etc. are developed to hopefully avoid the core concepts of C that cause that effect, only time will tell, really.
Actually, to be honest, the my-C optimizer does almost nothing, and it's a great thing for me as you don't even have to care about the problems for which in C you often have to use "volatile" to ensure that the optimizer won't "asphalt your code" like a stone-crushing vehicle driven by a monkey.
Have you noticed how I always accompany my suggested code snippets including volatile with a specific explanation along the lines of "it stops the compiler from caching and inferring the value from surrounding code, as the value of such variables can change or be modified by code not seen by the compiler", exactly because it is such a heavy hammer? It is way too common for C programmers to simply sprinkle them on variables semi-randomly, until the code seems to work; the often described 'let's throw spaghetti at the wall to see what sticks' approach.
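A minimal sketch of one case where volatile is warranted, assuming a POSIXy system (got_signal, on_signal() and wait_for_signal_demo() are hypothetical names for illustration): the flag is modified by a signal handler, which is code the compiler cannot see being called from the loop.

```c
#include <signal.h>

/* Modified by the signal handler, i.e. by code the compiler does not see
   being called from the loop below; hence volatile. */
volatile sig_atomic_t got_signal = 0;

void on_signal(int signum)
{
    (void)signum;
    got_signal = 1;
}

/* Returns 1 once the handler has run, -1 if the handler could not be installed. */
int wait_for_signal_demo(void)
{
    if (signal(SIGUSR1, on_signal) == SIG_ERR)
        return -1;

    /* Deliver the signal to ourselves; real code would receive it from outside. */
    if (raise(SIGUSR1) != 0)
        return -1;

    /* Without volatile, the compiler would be allowed to read got_signal once
       and turn this into an infinite loop (or remove the check entirely). */
    while (!got_signal) {
        /* busy-wait; real code would use sigsuspend() or similar */
    }
    return 1;
}
```

Note that volatile here only covers the signal handler case; it does not make accesses atomic or ordered between threads.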
Base C is a very simple language with very complex optimization engines bolted on top. To use it effectively and to write portable, efficient code, one needs to understand a lot about the language, its theoretical model (the abstract machine the language specification uses), as well as existing machine architectures and their differences. One of my pet peeves is the ubiquitous opendir()/readdir()/closedir() example/exercise/use case, which is wrong on most current operating systems, because the directory tree may be modified during scanning, and none of the existing examples take that into account. The proper solution is to use POSIX scandir(), glob(), or nftw(), or the fts family of functions from BSD and derivatives, which are supposed to work even when the directory tree is concurrently modified. To implement these, you need to either use a helper process (fts family, using its current working directory to walk the directory tree), or so-called "atfile" support (as e.g. standardized in POSIX via openat(), fstatat(), etc.). Exactly why involves understanding how file systems are implemented, and their access properties (what is atomic, what is not, and so on).
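For a single directory, a minimal sketch using POSIX scandir() could look like the following (list_directory() and skip_dot_entries() are hypothetical names for illustration; a full tree walk would use nftw() or fts instead):

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Filter callback: keep everything except "." and "..". */
int skip_dot_entries(const struct dirent *entry)
{
    return strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0;
}

/* Prints the sorted names in the directory at path.
   Returns the number of entries, or -1 with errno set if scanning failed. */
int list_directory(const char *path)
{
    struct dirent **list = NULL;
    int count = scandir(path, &list, skip_dot_entries, alphasort);
    if (count < 0)
        return -1;

    for (int i = 0; i < count; i++) {
        puts(list[i]->d_name);
        free(list[i]);      /* each entry is individually allocated */
    }
    free(list);             /* as is the array itself */
    return count;
}
```

The point of the design is that the C library hands back the complete listing in one call, so the caller never has to maintain its own readdir() loop while the directory is being changed underneath it.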
With experience, that understanding distills into rules of thumb (like using memcpy() or accessor functions, instead of 'tricks' like the UNALIGNED() macros I showed above, to ensure correct machine instruction level access patterns), often ending up "codified" in programming howtos, guides, and books; but when used without true understanding, they easily lead to misuse and inefficient or buggy code.
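The memcpy() rule of thumb can be sketched as a pair of accessor functions (load_u32() and store_u32() are hypothetical names for illustration; values are in native byte order):

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned address.
   memcpy() with a small constant size is typically compiled to a plain load
   (or whatever access sequence the target architecture requires). */
uint32_t load_u32(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Write a 32-bit value to a possibly unaligned address. */
void store_u32(void *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);
}
```

Compared to casting the pointer to uint32_t * and dereferencing it, this avoids both the alignment and the aliasing pitfalls, and leaves the choice of machine instructions to the compiler.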
A good example of that is when using low-level POSIX/Unix/BSD I/O from <unistd.h>, i.e. read() and write(). Ages ago, operating systems never returned short counts for normal files. The belief that they never do still persists today, even though it is absolutely false. On POSIXy systems, a signal delivery to a userspace handler installed without the SA_RESTART flag can cause them to fail with errno==EINTR; some filesystems, like Linux userspace (FUSE) ones, can return short reads or writes whenever they want, even for local files; slow network connections can cause short reads from shared network folders; and pipes and sockets often return short reads or writes. I've had dozens of arguments about this with otherwise very proficient C programmers, with their argument basically boiling down to "that doesn't happen to me, so I don't care".
As to security aspects, don't even get me started.
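Returning to the read()/write() point, handling short counts and EINTR takes only a small loop. A minimal sketch, assuming blocking descriptors (write_all() and read_all() are hypothetical names for illustration, not standard functions):

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes. Returns 0 on success, or an errno value on failure. */
int write_all(int fd, const void *data, size_t len)
{
    const char *ptr = data;

    while (len > 0) {
        ssize_t n = write(fd, ptr, len);
        if (n > 0) {
            ptr += n;               /* short write: continue with the rest */
            len -= (size_t)n;
        } else if (n == -1 && errno == EINTR) {
            continue;               /* interrupted by a signal: retry */
        } else if (n == -1) {
            return errno;
        } else {
            return EIO;             /* write() returned 0 or an invalid value */
        }
    }
    return 0;
}

/* Read up to len bytes, stopping early only at end of input.
   Returns the number of bytes read, or -1 with errno set on failure. */
ssize_t read_all(int fd, void *data, size_t len)
{
    char *ptr = data;
    size_t have = 0;

    while (have < len) {
        ssize_t n = read(fd, ptr + have, len - have);
        if (n > 0)
            have += (size_t)n;      /* short read: keep going */
        else if (n == 0)
            break;                  /* end of input */
        else if (errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        else
            return -1;
    }
    return (ssize_t)have;
}
```

With nonblocking descriptors, EAGAIN/EWOULDBLOCK would need separate handling; this sketch simply reports them as errors.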
This leads to an annoying dichotomy on my own part. With threads like this one, where a specific detail is discussed, I do not usually even consider whether there are real use cases for applying the detail or not; I just discuss what I know about it, because I tend to suspect there is a use case, or the OP would have discussed the problem they're trying to solve via that detail. With threads like this one about opendir()/chdir() on this forum, my response will be severely annoying (sorry, MikeK) even if/when they are explicitly useful/correct. See my "original" answer to that question in 2015 at StackOverflow, read the comments, and note how it was not the correct answer for the asker. To me, it really feels like seeing babies draw on the kitchen cabinets with their own poop.
I suspect that something like that dichotomy, mixed with experience that goes beyond the book examples and single architectures, and experience that is based on the book examples and having found that sufficient, is the underlying reason why so many threads about C details branch out and get a bit 'flame-y'.
Now, add to that useful pieces from domain- and hardware-specific variants of C like DiTBho's my-c, or my own that makes arrays base-level objects (allowing buffer overrun detection and tracking at compile time through function hierarchies), and conflagration is nearly assured.
Add to that the high-level concepts like monads (or even threads!) that can be used to sidestep many of the issues in real-world code, and the discussion will vary from friendly to heated, from practical to theoretical, and so on. I for one like to try and be useful, and find all of those aspects interesting, but that leads to walls of text like this post.
Apologies for this and the preceding over-long posts. I'm still trying to learn how to be more concise.