Author Topic: array of pointers to different types C  (Read 2074 times)


Offline IanB

  • Super Contributor
  • ***
  • Posts: 11891
  • Country: us
Re: array of pointers to different types C
« Reply #25 on: December 06, 2023, 04:14:37 pm »
Quote
It's an infamous source of subtle and difficult bugs. You may get what you want for a while if the compiler doesn't perform such optimizations or doesn't notice the opportunity for optimization (for example, due to the code being spread across multiple functions or files), but it's a ticking time bomb waiting for a sufficiently advanced compiler to set it off. And compiler optimizations are only getting more complex and aggressive with time, not less.

GCC has the -fno-strict-aliasing mode which I believe is supposed to make such code work correctly (i.e. as expected by a sane person, not a language lawyer). I have never tried it, though, preferring to rewrite unsafe aliasing when I see it.

This is a really unfortunate state of affairs.

I have been out of touch with C for over 20 years, and it appears things have been changing over the decades.

K&R conceived of C as a low-level programming language, to be used as an alternative to assembly language. As such, it should/would be hardware-oriented and close to the metal, as you would need when working with microcontrollers. If a high-level general purpose programming language is needed, then C++ is available.

If, as I understand, the standards committee has been abstracting C into a general-purpose programming language that is hardware agnostic, then that leaves an unfortunate gap. What language is one supposed to use to write operating systems, device drivers and microcontroller code?

From what I read, it seems the Linux kernel has been affected by this too.
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 6779
  • Country: pl
Re: array of pointers to different types C
« Reply #26 on: December 06, 2023, 05:59:53 pm »
C++ is a similar mess.

Pointers make C an unpleasant language to compile. Pointer variables can alias, i.e. point to the same object. Pointer arithmetic means that the target of a pointer variable can change in ways difficult to predict. The heavy use of pointers and pointer arithmetic in typical C code means that everything the compiler "knows" about the value behind a pointer can change at any time when other pointers are written through. It's hard to prove with certainty that two pointers will never alias at run time, particularly if they come "from outside" - as function parameters or global variables that could have been initialized by anyone to any value.
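
To illustrate (a minimal sketch, not from the original post; the function name is made up): when two pointers of the same type might refer to the same object, the compiler cannot keep the pointed-to value in a register.

Code: [Select]
/* If 'out' and 'n' may alias, the compiler cannot cache *n in a register:
 * every store through 'out' might have changed it, so *n has to be
 * reloaded on each loop iteration. */
void scale(int *out, const int *n, int count)
{
    for (int i = 0; i < count; i++)
        out[i] = out[i] * *n;   /* *n must be re-read after each store */
}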

C is not only a systems language, it's also a speed freak language. People want C code to run fast. People want operating systems to be fast too. So compiler vendors convinced the standards committee to accept a compromise: pointers to the same type may alias at any time, and the compiler must prove that they don't before assuming otherwise; pointers to different types must not alias at all. This relieves the compiler from aliasing worries which in 99.9% of cases would be completely unfounded, at the cost of breaking the remaining 0.1% of code.
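
A sketch of the 0.1% that the rule breaks (names are illustrative): reading a float's bits through an int pointer is exactly the cross-type access the compromise forbids.

Code: [Select]
#include <stdint.h>

/* Undefined behaviour under the strict aliasing rule: 'f' is a float
 * object, but it is read through a pointer to an incompatible type
 * (uint32_t).  The compiler is allowed to assume the two accesses can
 * never refer to the same storage and reorder or cache them accordingly. */
uint32_t float_bits_broken(float f)
{
    return *(uint32_t *)&f;
}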

You can still type pun via unions or by copying individual chars between variables of different types. I believe it's illegal to cast a char array to any other type and access it as such, although I have surely done it many times before I knew better and it worked; typically problems only occur when concurrent accesses are made through the two incompatible pointers.
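
For completeness, a sketch of the byte-copy route mentioned above, which stays within the rules (memcpy, or equivalently copying unsigned chars one at a time):

Code: [Select]
#include <stdint.h>
#include <string.h>

/* Well-defined type punning: copy the object representation byte by byte.
 * Modern compilers typically optimize the memcpy away entirely. */
uint32_t float_bits(float f)
{
    uint32_t u;
    _Static_assert(sizeof u == sizeof f, "size mismatch");
    memcpy(&u, &f, sizeof u);
    return u;
}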
 

Offline IanB

  • Super Contributor
  • ***
  • Posts: 11891
  • Country: us
Re: array of pointers to different types C
« Reply #27 on: December 06, 2023, 07:15:43 pm »
Quote
Pointers make C an unpleasant language to compile.
Unpleasant to compile, or unpleasant to optimize? Maybe if predictable behavior is needed we should turn off optimizations?

Quote
C is not only a systems language, it's also a speed freak language.
I'm certainly familiar with that perspective. But might it also be the case that people rely too much on the optimizer instead of their own coding skills?

Imagine if you were writing in assembly and the assembler decided to rewrite your code for you?
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: array of pointers to different types C
« Reply #28 on: December 07, 2023, 07:36:06 am »
I don't like quoting the C standards in general, but here, consider these quoted while nodding in agreement with the above posts; I'll explain why further below, after the horizontal line.

Quote
I believe it's illegal to cast a char array to any other type and access it as such, although I have surely done it many times before I knew better and it worked; typically problems only occur when concurrent accesses are made through the two incompatible pointers.
Well, unsigned char is the special type: it allows access to the storage representation of other types:

Quote from: ISO C99 6.2.6.1p4
Values stored in non-bit-field objects of any other object type consist of n×CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.
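
As an illustration of what that paragraph permits (a sketch; the names and values are made up): you can examine any object's bytes through an unsigned char pointer, e.g. to check endianness at run time.

Code: [Select]
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t value = 0x01020304u;

    /* Accessing the object representation via unsigned char is always allowed. */
    const unsigned char *bytes = (const unsigned char *)&value;

    printf("first byte in memory: 0x%02X (%s-endian)\n",
           bytes[0], bytes[0] == 0x04 ? "little" : "big");
    return 0;
}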

C99 and later have three pointer-related qualifiers: const, volatile, and restrict.
const is a promise that the code itself will not try to modify the value.  volatile tells the compiler that the value may be changed by external code or causes at any point during execution.  restrict is an aliasing-related promise: that the pointed-to object will only be referenced, directly or indirectly, via this particular pointer; that any access to the pointed-to object will depend on the value of this pointer.  (An entire section, 6.7.3.1 in C99, is dedicated to the formal definition of restrict, though.)
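
A minimal sketch of the restrict promise in practice (function name invented): because the caller guarantees the arrays do not overlap, the compiler is free to vectorize and keep values in registers.

Code: [Select]
#include <stddef.h>

/* The caller promises that 'dst' and 'src' do not overlap for the accessed
 * elements; violating that promise is undefined behaviour, but keeping it
 * lets the compiler vectorize the loop freely. */
void add_arrays(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}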

Type punning via a union was described in ISO C99 as a footnote (6.5.2.3 Structure and union members, footnote 82):

Quote from: ISO C99 6.5.2.3, footnote 82
If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
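
That footnote in code form, roughly (a sketch; the helper name is made up): write through one union member, read through another.

Code: [Select]
#include <stdint.h>

/* Store through one member, read through another: the bytes are
 * reinterpreted as the new type (C99 6.5.2.3 footnote 82).
 * Note that C++ does not give the same guarantee. */
static inline uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}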

The common initial sequence was described in ISO C99 6.5.2.3p5:

Quote from: ISO C 6.5.2.3p5
if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
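
A sketch of the common initial sequence guarantee; the tagged-message shape below is a typical use, with all the names invented for illustration.

Code: [Select]
#include <stdint.h>

/* Both structures begin with members of compatible types, so the common
 * initial part may be inspected through either member, as long as this
 * union declaration is visible. */
struct msg_ping { uint8_t type; uint16_t seq; };
struct msg_data { uint8_t type; uint16_t len; uint8_t payload[32]; };

union message {
    struct msg_ping ping;
    struct msg_data data;
};

static uint8_t message_type(const union message *m)
{
    /* Reading the common initial member is permitted regardless of which
     * structure was stored last. */
    return m->ping.type;
}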



In a very real sense, the "default C" language has become more abstracted and "further away from the hardware" in succeeding ISO C standards.  However, in my opinion, ISO C99 also added the tools to drill straight through those abstractions: type punning, exact-width two's complement types intN_t and uintN_t, minimum-width and fast types int_leastN_t/uint_leastN_t and int_fastN_t/uint_fastN_t, size_t, intmax_t and uintmax_t, intptr_t and uintptr_t, and so on.
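
A sketch of what "drilling through the abstractions" tends to look like in embedded code; the register address and name are invented for illustration.

Code: [Select]
#include <stdint.h>

/* A memory-mapped peripheral register: exact-width type for the layout,
 * uintptr_t for the address, volatile because the hardware may change the
 * value at any time.  0x40021000 is a made-up address. */
#define STATUS_REG (*(volatile uint32_t *)(uintptr_t)0x40021000u)

static inline int device_ready(void)
{
    return (STATUS_REG & UINT32_C(1)) != 0;
}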

The main point about ISO C99 was that it did not state anything new, only documented existing agreements and behaviour of C compilers that their users had found useful/necessary.  You could say that the increased abstraction was necessary to allow better optimization schemes to evolve, while the added features were necessary for the low-level programmers (mostly kernel and library programmers) to keep performance and portability across a large diverse set of architectures.  (At this point, computer architectures were even more diverse than now.)

Then came the odd misstep that is ISO C11.  It was mostly a push by Microsoft to allow their C++ compiler to also compile ISO C (they still refuse to support ISO C99, though); hence the infamous Annex K defining their "safe I/O functions", which is likely to be removed from the next ISO C standard.  Its main impact was aligning the atomic memory model semantics with C++, plus the _Generic facility allowing type-dependent polymorphic functions via a preprocessor macro: e.g. func(X) resolves to func_int(X) if X is of type int, to func_d(X) if X is of type double, and so on.
(Some disagree vehemently with this characterization, but I say the existence of Annex K is proof enough.  There is also the entire OOXML debacle in the same timeframe (first decade of this century), which in my opinion illustrates the approach MS then had with "standardization": weapon, rather than collaboration.)
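
The _Generic mechanism described above, as a small sketch (func_int/func_d stand in for whatever typed implementations you actually have):

Code: [Select]
#include <stdio.h>

static void func_int(int x)  { printf("int: %d\n", x); }
static void func_d(double x) { printf("double: %g\n", x); }

/* The macro selects the callee from the static type of X at compile time. */
#define func(X) _Generic((X),   \
        int:    func_int,       \
        double: func_d          \
    )(X)

int main(void)
{
    func(42);     /* resolves to func_int(42) */
    func(3.14);   /* resolves to func_d(3.14) */
    return 0;
}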

ISO C17 was basically a stationary point.  Not only was this around the time Microsoft changed its approach to open source and, to a lesser degree, to standardization, but C17 added very little new.

If we look at what is to become ISO/IEC 9899:2024 (Wikipedia), it looks like standard development is switching back to the practice-driven way C99 was developed, incorporating features and facilities already provided by various C compilers that have been found useful (and sometimes necessary).  Sure, the new bit operations in <stdbit.h> have new names, like stdc_count_ones() instead of popcount(), but we can deal with those as things settle.  (I also haven't checked how the new things stand with respect to freestanding vs. hosted implementations, i.e. their impact on embedded development, but I'm expecting it is sane/positive.)
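
A sketch, assuming a C23 toolchain that already ships <stdbit.h>; stdc_count_ones() is the type-generic form mentioned above.

Code: [Select]
#include <stdbit.h>   /* C23; requires a toolchain that provides this header */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t status = 0xA5u;

    /* stdc_count_ones() is the C23 type-generic population count. */
    unsigned ones = stdc_count_ones(status);

    printf("0x%08X has %u bits set\n", (unsigned)status, ones);
    return 0;
}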

In my opinion, all of the above means that those who want to write efficient low-level code in C need to keep up with the ISO C standards, but even more so with the features and facilities their toolchains provide; especially with binutils, gcc, and clang.  The language has drifted a long way from the low-level, simple origins K&R described; but the same tasks and performance (with even better portability!) can still be achieved by using the new language features.

In particular, in embedded development, I very much rely on ELF object file format features exposed by the compiler and linker: the __attribute__((section ("foo"))) shenanigans.  Even in systems programming, <dlfcn.h> is indispensable for me for run-time extensions (plug-ins and such).
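
A minimal sketch of those section shenanigans, assuming GCC/Clang with GNU ld (which synthesizes __start_<name>/__stop_<name> symbols for sections whose name is a valid C identifier); the "commands" registry and all the names are invented for illustration.

Code: [Select]
#include <stdio.h>

struct command {
    const char *name;
    void (*handler)(void);
};

/* Drop each registered descriptor into the "commands" ELF section; 'used'
 * keeps the compiler from discarding seemingly unreferenced objects. */
#define REGISTER_COMMAND(sym) \
    static const struct command sym##_desc \
        __attribute__((section("commands"), used))

static void hello_handler(void) { puts("hello"); }

REGISTER_COMMAND(hello) = { "hello", hello_handler };

/* GNU ld provides these symbols delimiting the section. */
extern const struct command __start_commands[];
extern const struct command __stop_commands[];

int main(void)
{
    /* Walk every descriptor the linker collected into the section. */
    for (const struct command *c = __start_commands; c < __stop_commands; c++)
        printf("command: %s\n", c->name);
    return 0;
}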

Is it worth it, chasing a moving target like this, instead of just staying with good ol' K&R C?

Well, I remember the time in the nineties when it was easy to exceed the performance of C compiler-generated assembly (by gcc, icc, pathscale, portland group) by rewriting it by hand.  Nowadays, SIMD vectorization and perhaps avoiding one or two unnecessary register moves at the beginning of a function are about all that is left: optimization has progressed by leaps and bounds.  To me, the changes are worth it: I eagerly expect to switch to C23/C24 as soon as it becomes practical.  And I do write a lot of C, both freestanding (microcontroller/embedded) and hosted (especially combined with POSIX C) systems stuff.
 
The following users thanked this post: IanB, nfmax, RAPo

