EEVblog Electronics Community Forum

Products => Computers => Programming => Topic started by: newbrain on June 12, 2022, 07:47:12 am

Title: New C23 working draft
Post by: newbrain on June 12, 2022, 07:47:12 am
I know only a subset of a minority of a segment will be interested, but a new working draft for what is supposed to become C23 was published on open-std.org on the 8th of June.

You can find it here (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf).

Quite a number of proposals have been included; off the top of my head:
- Remove K&R style function declarations
- Mandatory 2's complement representation for integers
- Binary literals with 0b or 0B
- Digit separator ' in numeric literals, as in 123'456'789
- unreachable() macro
- Optional VLAs (as in C11 and C17, but not C99), but mandatory variably modified types (this part is the change)
Title: Re: New C23 working draft
Post by: jeremy on June 12, 2022, 09:22:38 am
Thank you for sharing; there are indeed a few of us out here interested in this stuff. I'm glad to see that C (and C++) are getting some effort put into them; I just hope the compilers catch up quickly.
Title: Re: New C23 working draft
Post by: SiliconWizard on June 12, 2022, 06:24:30 pm
Yup, I've been keeping an eye on this too.

Haven't seen a lot of significant changes, though, apart from some sugar-coating and the removal of obsolete constructs.
The binary literals are cool but nothing groundbreaking (I've actually never used them, even with compilers supporting them as an extension; I find them hard to read and easy to fuck up). The digit separator is nice, but nothing essential.

The mandatory 2's complement triggers a "finally!" reaction, but while it's important for compiler writers and standardization, in practice, all C compilers I've ever dealt with used 2's complement. So...

All in all, nothing big really. IMHO.
I'm still waiting for the support of "modules" in C. And also maybe some better support for generic programming - as long as it's kept simple, we don't want C to become C++.

As a question, I wonder what is the use of variably modified types if VLAs are not supported. Could you explain?
Title: Re: New C23 working draft
Post by: newbrain on June 12, 2022, 10:17:25 pm
Quote
Could you explain?
I could, but then I'd have to kill you.

Jokes aside, IIUC, a VMT can be used as a function parameter while the actual argument is a regular array.
This comes in handy for passing multidimensional arrays without pointer contortions, and it's a really welcome improvement.

See, e.g., Example 4 at the bottom of page 118.
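A minimal sketch of what this means in practice (the names `trace` and `trace_demo` are made up for illustration): the parameter `m` has a variably modified type, yet both arguments are ordinary fixed-size arrays, so no VLA object is ever created. It assumes a compiler that accepts VM types, i.e. a C99 compiler or a C11/C17 one that does not define `__STDC_NO_VLA__`.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of the main diagonal of an n x n matrix. The parameter is adjusted
   to const double (*m)[n]: a pointer to a row whose length n is only
   known at run time, so m[i][i] is indexed correctly for any n. */
double trace(size_t n, const double m[n][n])
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += m[i][i];
    return sum;
}

/* Hypothetical demo: two ordinary fixed-size arrays of different sizes
   go through the same function; no VLA object is ever created. */
double trace_demo(void)
{
    static const double a[2][2] = { { 1, 2 }, { 3, 4 } };
    static const double b[3][3] = { { 1, 0, 0 }, { 0, 5, 0 }, { 0, 0, 9 } };
    return trace(2, a) + trace(3, b);   /* (1 + 4) + (1 + 5 + 9) == 20 */
}
```

The arrays are declared `const` to match the `const double` parameter; passing a non-const `double[2][2]` to it draws a qualifier warning before C23.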

Other things:
- bool, true, false (so <stdbool.h> is not needed to use them), static_assert, alignas and alignof are now first-class keywords.
- the set of reserved identifiers has been relaxed.
  Though everyone was doing it, me included, it was strictly speaking UB to use identifiers such as strength, EMPLOYEE, total and so on, due to 7.31, which reserved a vast number of prefixes for "future library use" (for these examples: str, E and to). They are now only "potentially" reserved.
Note that I'm not talking about the more commonly known (double) underscore prefixed ones.

EtA:
Quote
I find them hard to read and easy to fuck up
Same here! (Apart from me eschewing extensions as much as possible.)
But now, with the introduction of the digit separator, they become much more usable!
Title: Re: New C23 working draft
Post by: SiliconWizard on June 13, 2022, 12:02:54 am
Quote
Could you explain?
I could, but then I'd have to kill you.

Jokes aside, IIUC, a VMT can be used as a function parameter while the actual argument is a regular array.
This comes in handy for passing multidimensional arrays without pointer contortions, and it's a really welcome improvement.

OK, so as I thought, there are no cases of VMT in C at the moment other than arrays with unspecified size. I can see the point, but I admit I absolutely never use array types as function parameters, and indexing a "multi-dimensional array" is kinda trivial, but why not.

Otherwise, yeah, the rest looks like sugar-coating. So after 20-odd years of C99, they now feel safe to add 'bool' as a keyword instead of _Bool (and so on). Great.
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 13, 2022, 04:29:12 am
As a question, I wonder what is the use of variably modified types if VLAs are not supported. Could you explain?

In a previous discussion here I already provided a link to my detailed explanation on SO:

https://stackoverflow.com/a/54163435/187690

In short, the key points are that:

1. VLAs and regular arrays are perfectly inter-compatible (as long as their sizes match), which means that you can write functions accepting VLAs, yet pass-in ordinary arrays
2. You can allocate VLAs in dynamic memory

In other words one can use the power of VLAs, yet never have to create VLAs "on the stack". (The latter is often [mis-]used as a major vector of criticism directed at VLAs. But in reality such criticism is completely misguided.)
Title: Re: New C23 working draft
Post by: magic on June 13, 2022, 06:26:09 am
Well, I am one of those VLA haters ;)


1. VLAs and regular arrays are perfectly inter-compatible (as long as their sizes match), which means that you can write functions accepting VLAs, yet pass-in ordinary arrays
2. You can allocate VLAs in dynamic memory
Not sure you can do 2 without resorting to 1?


And most importantly, none of that trickery buys you more than a better readable function declaration. Certainly not a guarantee of correctness of any sort, not even under the simplest of circumstances.
Code:
$ cat test.c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

void test(int n, int x[n]) {
        for (int i = 0; i < 4; i++)
                printf("%d\n", x[i]);
}

int main() {
        int n = 3;
        int x[n];
        for (int i = 0; i < n; i++)
                x[i] = i;
        test(0, x);
}
$ gcc test.c -o test -Wall -Wextra
$ clang test.c -o test -Wall -Wextra
$ ./test
0
1
2
134517376


Usefulness is further limited by not being usable in data structures; you really need to pass all your dimensions as separate arguments.
Code:
$ cat test.c
struct string {
        unsigned n;
        char s[n];
}
$ gcc -c test.c
test.c:3:9: error: ‘n’ undeclared here (not in a function)
  char s[n];
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 13, 2022, 06:40:46 am
Not sure you can do 2 without resorting to 1?

Um... It is not exactly clear to me what you mean.

Code:
int n = 42, m = 25;
int (*a)[n][m] = malloc(sizeof *a);

Here I do 2 without resorting to 1.
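A slightly fuller sketch of the same idea, using a hypothetical helper `heap_matrix_sum`. It uses a pointer to a row, `int (*a)[m]`, rather than a pointer to the whole `[n][m]` array as above, so elements are reached as `a[i][j]` instead of `(*a)[i][j]`. Only the pointer's type is variably modified; the storage comes from `malloc`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates an n x m int matrix on the heap through a
   pointer to a variable-length row, fills it, sums it and frees it.
   No VLA object lives on the stack; only the pointer's type is variably
   modified. Returns -1 if the allocation fails. */
long heap_matrix_sum(size_t n, size_t m)
{
    assert(m > 0);                          /* a VLA length must be positive */
    /* sizeof(int[m]) rather than sizeof *a: when the operand of sizeof
       has a VLA type it is evaluated, and a is not initialized yet. */
    int (*a)[m] = malloc(n * sizeof(int[m]));
    if (a == NULL)
        return -1;

    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j)
            a[i][j] = (int)(i * m + j);     /* plain 2-D indexing */

    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j)
            sum += a[i][j];

    free(a);
    return sum;
}
```

For a 4 x 6 matrix the elements are 0..23, so the sum is 276.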


Code:
$ cat test.c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

void test(int n, int x[n]) {
        for (int i = 0; i < 4; i++)
                printf("%d\n", x[i]);
}

int main() {
        int n = 3;
        int x[n];
        for (int i = 0; i < n; i++)
                x[i] = i;
        test(0, x);
}
$ gcc test.c -o test -Wall -Wextra
$ clang test.c -o test -Wall -Wextra
$ ./test
0
1
2
134517376

I don't see what you are trying to demonstrate with this code. Also, declaring a VLA in a function parameter list does not override the "classic" rule: array declarations in function parameter lists are automatically replaced with pointer declarations. So, your function declaration is actually equivalent to

Code:
void test(int n, int *x)

I.e. the function does not really use VLA at all. Basically, your example does not demonstrate anything related to VLA.
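The adjustment is easy to observe. In this sketch (the function names are illustrative), C11 `_Generic` shows that `int x[n]` has really become `int *`, while a pointer to a VLA keeps the run-time length in its type, so `sizeof *p` is computed at run time:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if x has type int * inside the function: the declarator
   int x[n] is adjusted to int *x, so n is not part of x's type. */
int param_is_plain_pointer(int n, int x[n])
{
    (void)n;
    return _Generic(x, int *: 1, default: 0);
}

/* With a pointer to a VLA the length survives in the type: sizeof *p is
   evaluated at run time and equals n * sizeof(int). p must be valid. */
size_t pointee_size(int n, int (*p)[n])
{
    return sizeof *p;
}
```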

You really need a multidimensional array as a parameter to make VLA have any tangible effect, as in my examples at the link.

Usefulness is further limited by not being usable in data structures; you really need to pass all your dimensions as separate arguments.

This is akin to saying that usefulness of a microscope is limited by not being able to hammer in nails.

Natural semantics of VLA does not imply ability to use them inside data structures. Everybody understands that VLAs are run-time allocated. Nobody expects to be able to use run-time allocated arrays as struct fields. Such expectations do not arise.

And most importantly, none of that trickery buys you more than a better readable function declaration.

Again, here's an example where a VLA parameter declaration translates into tangible functionality:

Code:
#include <stdio.h>

void test(unsigned n, unsigned m, int a[n][m])
{
  for (unsigned i = 0; i < n; ++i)
  {
    for (unsigned j = 0; j < m; ++j)
      printf("%2d ", a[i][j]);
    printf("\n");
  }
}

int main()
{
  int a[2][3] =
  {
    { 0, 1, 2 },
    { 3, 4, 5 }
  };
  test(2, 3, a);
 
  printf("\n");
   
  int b[4][6] =
  {
    { 0, 1, 2, 3, 4, 5 },
    { 6, 7, 8, 9, 10, 11 }, 
    { 12, 13, 14, 15, 16, 17 }, 
    { 18, 19, 20, 21, 22, 23 }
  };
  test(4, 6, b);
}

This is far beyond "a better readable function declaration". This is impossible to implement in C without VLA, i.e. in C it is impossible to work with plain multi-dimensional arrays in any reasonable way without VLA support. Which is what makes VLA (or, more precisely, VMT) critically important. That is why it becomes mandatory in C23.
Title: Re: New C23 working draft
Post by: brucehoult on June 13, 2022, 07:40:03 am
Quote
This is impossible to implement in C without VLA, i.e. in C it is impossible to work with plain multi-dimensional arrays in any reasonable way without VLA support.

Code:
void test(unsigned n, unsigned m, void *p)
{
  int *a = p;
  for (unsigned i = 0; i < n; ++i)
  {
    for (unsigned j = 0; j < m; ++j)
      printf("%2d ", a[i*m+j]);
    printf("\n");
  }
}

Code:
$ ./vla_fake 
 0  1  2
 3  4  5

 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
$

WFM

Not type-safe, certainly, but I think it falls within the bounds of "reasonable".
Title: Re: New C23 working draft
Post by: magic on June 13, 2022, 07:54:53 am
I don't see what you are trying to demonstrate with this code. Also, declaring a VLA in a function parameter list does not override the "classic" rule: array declarations in function parameter lists are automatically replaced with pointer declarations. So, your function declaration is actually equivalent to

Code:
void test(int n, int *x)

I.e. the function does not really use VLA at all. Basically, your example does not demonstrate anything related to VLA.
Well, I just wanted to confirm that "modern C" is still as primitive as you described ;)

Even though VLAs know their size, there is not the slightest attempt at bounds checking, or at tracking whether you maintain their dimensions correctly; apparently there is not even a warning in very obviously invalid cases.

You really need a multidimensional array as a parameter to make VLA have any tangible effect, as in my examples at the link.

This is far beyond "a better readable function declaration". This is impossible to implement in C without VLA, i.e. in C it is impossible to work with plain multi-dimensional arrays in any reasonable way without VLA support.
Okay, you have a point here.
(Though I would still prefer the C++ solution of simulated multi-dimensional arrays that know their size internally, instead of a bunch of function arguments passed around).

Usefulness is further limited by not being usable in data structures; you really need to pass all your dimensions as separate arguments.

This is akin to saying that usefulness of a microscope is limited by not being able to hammer in nails.

Natural semantics of VLA does not imply ability to use them inside data structures. Everybody understands that VLAs are run-time allocated. Nobody expects to be able to use run-time allocated arrays as struct fields. Such expectations do not arise.

Right, it wasn't a good example. But this still doesn't work and I hope you appreciate that it could be useful:
Code:
#include <stdlib.h>

struct matrix {
        const unsigned n;
        const unsigned m;
        float (*x)[n][m];   // rejected: the member cannot refer to n and m
};

int main() {
        int n = 3, m = 4;
        float (*x)[n][m] = malloc(sizeof(*x));
        struct matrix mat = {n, m, x};
}

It seems I would need to declare the array as simply *x and then cast it to (*x)[n][m] in each function that works with my struct.
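One way to do this, sketched with hypothetical names: keep the dimensions and a `void *` in the struct, and re-create the 2-D view as a local pointer to a VLA row in each function. Since `void *` converts implicitly, not even a cast is needed. It assumes n and m are both positive, because a VLA length must be greater than zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type: a struct member cannot have a variably modified
   type, so the struct stores the dimensions and an untyped pointer. */
struct matrix {
    size_t n, m;
    void *data;               /* n rows of m floats, from malloc */
};

int matrix_init(struct matrix *mat, size_t n, size_t m)
{
    assert(n > 0 && m > 0);   /* VLA lengths must be positive */
    float (*x)[m] = malloc(n * sizeof(float[m]));
    if (x == NULL)
        return -1;
    *mat = (struct matrix){ n, m, x };
    return 0;
}

/* Each function re-creates the 2-D view locally. void * converts
   implicitly to float (*)[m], so no cast is required. */
void matrix_fill(struct matrix *mat)
{
    float (*x)[mat->m] = mat->data;
    for (size_t i = 0; i < mat->n; ++i)
        for (size_t j = 0; j < mat->m; ++j)
            x[i][j] = (float)(i * 10 + j);
}

float matrix_get(const struct matrix *mat, size_t i, size_t j)
{
    const float (*x)[mat->m] = mat->data;
    return x[i][j];
}

void matrix_free(struct matrix *mat)
{
    free(mat->data);
    mat->data = NULL;
}

/* Hypothetical demo: element (2, 3) of a 3 x 4 matrix. */
float matrix_demo(void)
{
    struct matrix mat;
    if (matrix_init(&mat, 3, 4) != 0)
        return -1.0f;
    matrix_fill(&mat);
    float v = matrix_get(&mat, 2, 3);   /* 2 * 10 + 3 */
    matrix_free(&mat);
    return v;
}
```

The cost is one line per function; the dimensions still travel with the data instead of as separate arguments.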

Code:
      printf("%2d ", a[i*m+j]);

Not type-safe, certainly, but I think it falls within the bounds of "reasonable".
Yes, you can, but I agree with TheCalligrapher here - it really sucks.
Title: Re: New C23 working draft
Post by: newbrain on June 13, 2022, 08:48:35 am
Not type-safe, certainly, but I think it falls within the bounds of "reasonable".
Feasible, of course, but exactly what I referred to as "pointer contortions":
* the need to use a "universal" void* or, alternatively, to cast the actual argument or pass the address of the first element, etc.
* the need to do your own offset calculation explicitly, lowering the abstraction, as you just see an unravelled array and need to consider its memory layout (fixed and known, but still...).
It's ugly, less safe (as in more error-prone) and less understandable (especially with more than two axes).
Title: Re: New C23 working draft
Post by: brucehoult on June 13, 2022, 10:31:47 am
Not type-safe, certainly, but I think it falls within the bounds of "reasonable".
Feasible, of course, but exactly what I referred to as "pointer contortions":
* the need to use a "universal" void* or, alternatively, to cast the actual argument or pass the address of the first element, etc.
* the need to do your own offset calculation explicitly, lowering the abstraction, as you just see an unravelled array and need to consider its memory layout (fixed and known, but still...).
It's ugly, less safe (as in more error-prone) and less understandable (especially with more than two axes).

The safety, in a practical sense, is very little different. The most cursory unit test of the function will show whether the address calculation is correct or not.

There is no difference in efficiency. The multiply in a loop will be strength-reduced to an add in either case.

The much bigger worry is whether the function -- VLT or not -- is called correctly. There is absolutely no check in either version that the correct bounds are passed.

In particular, it would be very easy to accidentally swap the dimensions. The VLT version has no protection against swapped or incorrect dimensions.
Title: Re: New C23 working draft
Post by: Siwastaja on June 13, 2022, 11:14:37 am
The binary literals are cool but nothing groundbreaking (I've actually never used them even with compilers supporting them as an extension, I find them hard to read and easy to fuck up), the digit separation is nice though, but nothing essential.

Combined, they are pretty great. I use binary literals quite a bit, but the biggest issue is the sheer number of digits, which makes eyeballing byte/nibble boundaries difficult and more error-prone than with hexadecimal. Adding digit separation makes binary literals great again. 0b0101'1111'0110'1001 looks quite nice to me. But yeah, small stuff.
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 13, 2022, 01:14:14 pm
Quote
This is impossible to implement in C without VLA, i.e. in C it is impossible to work with plain multi-dimensional arrays in any reasonable way without VLA support.

Code:
void test(unsigned n, unsigned m, void *p)
{
  int *a = p;
  for (unsigned i = 0; i < n; ++i)
  {
    for (unsigned j = 0; j < m; ++j)
      printf("%2d ", a[i*m+j]);
    printf("\n");
  }
}

Code:
$ ./vla_fake 
 0  1  2
 3  4  5

 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
$

WFM

Not type-safe, certainly, but I think it falls within the bounds of "reasonable".

Where is the rest of the code? What do the calls to `test` look like?

If you are implying that this can work with my original `main`, then no, it doesn't. You are not allowed to reinterpret/access a two-dimensional [N][M] array as a one-dimensional [N*M] array in C. The behavior is undefined.
Title: Re: New C23 working draft
Post by: magic on June 13, 2022, 01:51:15 pm
Are you sure of that?

There are no two dimensional arrays in C. Simply taking a[0] in your example gives type int[2] which can be passed to a function demanding int[] without even casting and given the well-defined memory layout of arrays I'm not sure at which point the code is supposed to break. Or which part of the standard says that you aren't supposed to do so. FWIW, I'm pretty sure I have seen such code in the wild.
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 13, 2022, 02:08:47 pm
Are you sure of that?

There are no two dimensional arrays in C.

??? Yes, there are.

Simply taking a[0] in your example gives type int[2] which can be passed to a function demanding int[] without even casting

Firstly, `a[0]` in my example is `int[3]`.

Secondly, of course, it can be passed without casting. `a[0]` is a one-dimensional array of type `int[3]`. It can be passed to a function expecting a one-dimensional array (i.e. a pointer, in my case). Nothing unusual here. However, the basic language rules still apply: you are not allowed to access beyond the boundary of that one-dimensional array.

and given the well-defined memory layout of arrays I'm not sure at which point the code is supposed to break.

The "well-defined memory layout" does not matter. What matters is that you are working with an `int[3]` array and language rules do not permit access beyond its boundary. Otherwise the behavior is undefined. The compiler is allowed to assume that undefined behavior never happens and translate (optimize) the code under that assumption. For example, the compiler is allowed to, say, inline the `test` call and, say, unroll the loop to 3 iterations at most (since there "can't possibly be" more than 3).

FWIW, I'm pretty sure I have seen such code in the wild.

Of course. This has always been a hack that we had to employ when working with multi-dimensional arrays because there simply is no other way. As I said above, this is impossible to implement properly without VLA support.

There are many hacks that exist "in the wild". Some of them have been killed for good at some point by the above "undefined behavior never happens" optimization rule (e.g. strict overflow semantics, strict aliasing semantics, returning null as pointers to locals, and so on and so forth). Some still survive...
Title: Re: New C23 working draft
Post by: Nominal Animal on June 13, 2022, 02:18:19 pm
Whenever I end up using nontrivial 2D arrays (for example linear algebra stuff), I end up having to use origin[row*rowstride + col*colstride] anyway.
It might seem like a slowdown, but it is actually at the core of why naïve Fortran yields such good results.  I haven't microbenchmarked it either, but I suspect the "extra" multiplication is insignificant compared to memory bandwidth.

For trivial arrays, I tend to use inner loops that access consecutive memory via a pointer.  The cost of setting up that pointer just outside the inner loop gets lost in the noise.
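A small sketch of both patterns, assuming the matrix lives in a single flat array (`demo_storage` and the function names are hypothetical). Swapping the strides gives a transposed view without moving any data, and because all indexing stays inside one array object, the sub-array bounds question discussed in this thread does not arise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2 x 3 matrix stored in ONE flat array, row-major, so all
   index arithmetic below stays inside a single array object. */
static const double demo_storage[6] = { 1, 2, 3,
                                        4, 5, 6 };

/* Element (row, col) of a strided view starting at origin. */
double at(const double *origin, ptrdiff_t rowstride, ptrdiff_t colstride,
          ptrdiff_t row, ptrdiff_t col)
{
    return origin[row * rowstride + col * colstride];
}

/* Sum of one row whose elements are contiguous (colstride == 1): the
   pointer is set up once, outside the loop, and the loop just walks it. */
double contiguous_row_sum(const double *origin, ptrdiff_t rowstride,
                          ptrdiff_t row, ptrdiff_t cols)
{
    const double *p = origin + row * rowstride;
    const double *end = p + cols;
    double sum = 0.0;
    while (p < end)
        sum += *p++;
    return sum;
}
```

With rowstride 3 and colstride 1 this is the normal view; rowstride 1 and colstride 3 is the transpose, so `at(demo_storage, 3, 1, 1, 2)` and `at(demo_storage, 1, 3, 2, 1)` name the same element.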

As to the C23 draft, nothing pokes my eye immediately; it looks like quite bog-standard (pun not intended) gradual development, not pushing for anything "new"/"unexpected".  (Which is a good thing: the C standard should codify expected and existing practice, not dictate new behaviour.)

You are not allowed to reinterpret/access a two-dimensional [N][M] array as a one-dimensional [N*M] array in C. The behavior is undefined.
I am not so sure that array bounds actually apply that way here (making the behaviour undefined).

The way I interpret C99/C11/C17 6.5.2.1 paragraphs 2 to 4 is that the behaviour is fully defined, albeit implicitly, because each successive indexing converts the expression to a lower-dimensional array with pointer semantics.  In short, because m[x][y] is identical to (m[x])[y].

In particular, consider type punning via an union:
    union {
        type a[N*M];
        type b[N][M];
    } foo;
Assuming 0 <= n < N and 0 <= m < M, are (foo.a[n*M+m]) and (foo.b[n][m]) equivalent expressions accessing the same element, or not?
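The addresses, at least, are easy to check. This sketch (with `int` as the element type and N = 3, M = 4 picked arbitrarily) only confirms that both expressions designate the same location; it says nothing about whether reading through the "other" member is defined, which is the actual question.

```c
#include <assert.h>

#define N 3   /* arbitrary example dimensions */
#define M 4

/* Nominal Animal's union, with int as the element type. */
union pun {
    int a[N * M];
    int b[N][M];
};

/* Returns 1 if foo.a[n*M + m] and foo.b[n][m] have the same address for
   every valid n and m. Only addresses are compared here; whether reading
   through the other member is defined is a separate question. */
int union_views_coincide(void)
{
    static union pun foo;
    for (int n = 0; n < N; ++n)
        for (int m = 0; m < M; ++m)
            if (&foo.a[n * M + m] != &foo.b[n][m])
                return 0;
    return 1;
}
```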
Title: Re: New C23 working draft
Post by: newbrain on June 13, 2022, 03:10:46 pm
Are you sure of that?
From 6.5 Expressions in C11:
Quote
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type
Aliasing an array using an incompatible type (arrays of different sizes or element types are not compatible) does not fall under any of the above cases, so, since it violates a "shall" outside of a Constraints section, it is automatically UB.

Similarly, in Appendix J, J.2 Undefined Behavior:
Quote
— An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

Another case where "it works" and "everybody does it", but it is still non-conforming code.
There's a correct way to do it since C99, and C23, untying VLAs from VMT, improves its usability.
Title: Re: New C23 working draft
Post by: brucehoult on June 13, 2022, 09:36:53 pm
Quote
This is impossible to implement in C without VLA, i.e. in C it is impossible to work with plain multi-dimensional arrays in any reasonable way without VLA support.

Code:
void test(unsigned n, unsigned m, void *p)
{
  int *a = p;
  for (unsigned i = 0; i < n; ++i)
  {
    for (unsigned j = 0; j < m; ++j)
      printf("%2d ", a[i*m+j]);
    printf("\n");
  }
}

Code:
$ ./vla_fake 
 0  1  2
 3  4  5

 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
$

WFM

Not type-safe, certainly, but I think it falls within the bounds of "reasonable".

Where is the rest of the code? What do the calls to `test` look like?

I did not change the rest of your code.

Quote
If you are implying that this can work with my original `main`, then no, it doesn't. You are not allowed to reinterpret/access a two-dimensional [N][M] array as a one-dimensional [N*M] array in C. The behavior is undefined.

Sure does work. I tried it on arm64, amd64, and riscv64 machines.

The only thing that could make it not work is if rows are padded, in which case m should be rounded up in some way before multiplying. If that happens on some platform you have then 1) that's weird, and 2) that can easily be incorporated with some #if.

The only time I could ever see that happening is if the array element size is smaller than the word size.
Title: Re: New C23 working draft
Post by: magic on June 14, 2022, 10:29:50 am
There are no two dimensional arrays in C.
??? Yes, there are.
... in languages like BASIC, Fortran or Matlab.

C has no first-class two-dimensional arrays; there are only arrays of arrays, which are sometimes referred to as such, and all the usual array rules apply to them.

An argument in favor of such conversion is:
1. we are allowed to convert an array of arrays to a pointer to an array and use that as we please
2. we are allowed to convert an array to a pointer to its individual element and use that as we please

I understand your objection to be that the scalar pointer obtained in step 2 is only permitted to refer to elements of the particular "inner" array obtained in step 1 and that the implementation is permitted to blindly assume that it is so.

The "well-defined memory layout" does not matter. What matters is that you are working with an `int[3]` array and language rules do not permit access beyond its boundary. Otherwise the behavior is undefined. The compiler is allowed to assume that undefined behavior never happens and translate (optimize) the code under that assumption. For example, the compiler is allowed to, say, inline the `test` call and, say, unroll the loop to 3 iterations at most (since there "can't possibly be" more than 3).
I agree that my "solution" was a blatant cheat on the type system and the above is a potential concern.

So let's go to the original code posted by Bruce, where the full array is simply passed as a void pointer. Now the compiler would need to somehow conclude that the void* converted to int* by the callee is a pointer to one of the inner arrays, rather than to the full array of arrays passed by the caller (note that when the array of arrays decays to a pointer, the pointer can still be used by the callee to access the whole array, otherwise memcpy wouldn't work). And we could go further, using memcpy to extract individual elements and only then converting them to int.

From 6.5 Expressions in C11:
Quote
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type
Aliasing an array using an incompatible type (arrays of different sizes and element types are not compatible) does not fall in any of the above cases, so, going against a "shall" outside of a Constraints section it's automatically UB.
I'm not 100% sure what this rule means in the context of accessing array members. Naive reading seems to imply that no access to an individual array member through a pointer is legal, if we read "the object" as "the array" and consider such member access to be an array access too.

Similarly, in Appendix J, J.2 Undefined Behavior:
Quote
— An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
We are not accessing anything out of declared bounds, but declaring no bounds on an alias to an object declared as a particular array type elsewhere.

If there is something I would be concerned about in practical terms, it would be alias analysis.

Sure does work. I tried it on arm64, amd64, and riscv64 machines.
The discussion is whether it is permitted to stop working tomorrow all of a sudden.
Title: Re: New C23 working draft
Post by: brucehoult on June 14, 2022, 11:07:50 am
Sure does work. I tried it on arm64, amd64, and riscv64 machines.
The discussion is whether it is permitted to stop working tomorrow all of a sudden.

That's what unit tests are for, possibly run up front in a ./configure step.

I think that, as with very many UB things, it's not about future machines or compilers, but ones in the distant past, or highly specialised ones.
Title: Re: New C23 working draft
Post by: Nominal Animal on June 14, 2022, 12:21:02 pm
That's what unit tests are for, possibly run up front in a ./configure step.
Or as an optional build-check target, compiling a set of test programs that verify expected behaviour.

I kinda-sorta prefer the separate build check target, because that way one can still cross-compile the sources.  (That is, one can cross-compile the tests, and only need to run the tests on the target architecture to verify the compiler operation.)
Title: Re: New C23 working draft
Post by: free_electron on June 14, 2022, 12:55:06 pm
no more forward declaration.
Title: Re: New C23 working draft
Post by: magic on June 14, 2022, 01:44:17 pm
The discussion is whether it is permitted to stop working tomorrow all of a sudden.
That's what unit tests are for, possibly run up front in a ./configure step.
Unit testing may easily miss compiler bugs and overzealous "optimization".
Your trivial test case is not the same code which will run in production.

Things may get even worse nowadays with link time optimizations.

I think that, as with very many UB things, it's not about future machines or compilers, but ones in the distant past, or highly specialised ones.
Well, except for that time when GCC removed NULL checks from Linux on pointers that have been (accidentally) dereferenced earlier, creating serious kernel data corruption vulnerabilities visible only in the generated machine code.

I also recall fixing some open source code which interpreted float as int by means of pointers rather than union and stopped working in GCC 4.0. There may still be some codebases which pass -fno-strict-aliasing to GCC.
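For reference, a sketch of the two ways to get at the bits of a `float` that C does allow, assuming `float` is a 32-bit IEEE 754 type (the static assertion only checks the size). The function names are made up; the expected values in the checks are the binary32 encodings of 1.0f and -2.0f.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32. This only checks the size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copying the object representation never violates the aliasing rules. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reading a union member other than the one last stored is permitted in
   C (the bytes are reinterpreted); C++ has different rules. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun = { .f = f };
    return pun.u;
}

/* The pointer-cast form, *(uint32_t *)&f, accesses a float object through
   an lvalue of type uint32_t, which the effective type rules do not allow. */
```

Compilers typically turn the fixed-size `memcpy` into a plain register move, so there is usually no call overhead.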
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 14, 2022, 01:53:44 pm
You are not allowed to reinterpret/access a two-dimensional [N][M] array as a one-dimensional [N*M] array in C. The behavior is undefined.
I am not so sure that array bounds actually apply that way here (making the behaviour undefined).

The way I interpret C99/C11/C17 6.5.2.1 paragraphs 2 to 4 is that the behaviour is fully defined, albeit implicitly, because each successive indexing converts the expression to a lower-dimensional array with pointer semantics.  In short, because m[x][y] is identical to (m[x])[y].

I don't see how this equivalence can possibly save the day here.

The issue is essentially the same as with the proverbial "struct hack". The "flexible array member" declaration with `[]` was introduced into the language specifically because all "hackish" variants with `[0]`, `[1]` or `[a lot]` suffered from various array access problems. The allegedly "cleanest" `[1]` variant was problematic because it declared an array of size 1 and then accessed beyond its bounds.
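For comparison, a sketch of the flexible array member version of a length-prefixed string (the `string_new` and `string_demo` helpers are hypothetical). The struct and its trailing characters come from one `malloc`, and `s` is declared with no size at all, so no declared bound is ever exceeded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: the last member has no size, and its
   storage is allocated together with the struct. */
struct string {
    size_t n;
    char s[];
};

/* Hypothetical helper: copy len bytes of src into a new struct string. */
struct string *string_new(const char *src, size_t len)
{
    struct string *p = malloc(sizeof *p + len + 1);   /* +1 for '\0' */
    if (p == NULL)
        return NULL;
    p->n = len;
    memcpy(p->s, src, len);
    p->s[len] = '\0';
    return p;
}

/* Hypothetical demo: returns 1 if the copy round-trips. */
int string_demo(void)
{
    struct string *p = string_new("hello, world", 5);
    int ok = p != NULL && p->n == 5 && strcmp(p->s, "hello") == 0;
    free(p);
    return ok;
}
```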

Again, the standard is pretty specific about it: if an array object is declared with a specific size, the language prohibits you from accessing beyond its boundary, i.e. use pointer arithmetic that crosses that boundary. No exceptions are made for sub-arrays within a multi-dimensional array.

A similar/related problem would arise in a case like this:

Code:
#include <stdio.h>

int main()
{
  int a[][3] = { { 1, 2, 3 }, { 4, 5, 6 } };
 
  for (const int *p = &a[0][0]; p < &a[0][3]; ++p)
    printf("%d ", *p);

  printf("\n");

  for (const int *p = &a[0][0]; p < &a[1][0]; ++p)
    printf("%d ", *p);
   
  printf("\n");
}

Both loops work the same in practice, but the second one is undefined. Even though you can "prove" that `&a[1][0]` is the same as `&a[0][3]`, you are still not allowed to compare `p` to `&a[1][0]`: pointer `p` works over `a[0]`, while `&a[1][0]` is obtained through a completely different array `a[1]`. These two pointers cannot be compared with `<`.
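A conforming way to visit every element, sketched with a hypothetical `sum_all`: the outer pointer walks the rows as elements of the outer array, and the inner pointer never leaves the row it started in, so every relational comparison involves pointers into the same array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sum_all: visits every element of a rows x cols array.
   row steps through the elements of the outer array; p only ever moves
   within *row and is only compared with *row + cols, the end of that
   same row. */
long sum_all(size_t rows, size_t cols, const int a[rows][cols])
{
    long sum = 0;
    for (const int (*row)[cols] = a; row < a + rows; ++row)
        for (const int *p = *row; p < *row + cols; ++p)
            sum += *p;
    return sum;
}
```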
Title: Re: New C23 working draft
Post by: newbrain on June 14, 2022, 01:59:52 pm
no more forward declaration.
Where do you see that?
In 6.7.2.1 (struct and union) I only see additions related to the attribute specifier sequence, another new thing introduced in C23.

The syntax for struct/union still includes:
Quote
struct-or-union attribute-specifier-sequence_opt identifier
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 14, 2022, 02:03:12 pm
Sure does work. I tried it on arm64, amd64, and riscv64 machines.

Sorry, but "I tried it" does not really hold any weight in such matters. It doesn't matter.

The only thing that could make it not work is if rows are padded, in which case m should be rounded up in some way before multiplying. If that happens on some platform you have then 1) that's weird, and 2) that can easily be incorporated with some #if.

An array cannot possibly introduce any padding beyond what is already present inside the individual element. The language guarantees that

Code:
sizeof(T [N]) == sizeof(T) * N

which immediately implies that

Code:
sizeof(T [A][B]...[Z]) == sizeof(T) * A * B * ... * Z

That immediately precludes any possibility of "row padding" or anything like that. So, from the pure address arithmetic standpoint your code "should work".

But the language says that your code has undefined behavior. Which means that the compiler can do anything with it. As I said above, the primary source of "undefinedness" in undefined code is the compiler deliberately screwing up your code either to optimize it or to teach you a lesson. The fact that your code "worked" in your experiments simply means that the compiler writers haven't gotten around to it yet.
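The size guarantee itself can be checked at compile time. A minimal C11 sketch using `_Static_assert` (the dimensions and the struct are arbitrary examples); as the post says, this settles the address arithmetic, not the undefined-behavior question:

```c
#include <stddef.h>

/* Compile-time checks of sizeof(T[N]) == N * sizeof(T), applied to
   nested arrays: if any assertion failed, compilation would stop. */
_Static_assert(sizeof(int[4]) == 4 * sizeof(int), "no padding between elements");
_Static_assert(sizeof(int[3][4]) == 3 * sizeof(int[4]), "no padding between rows");
_Static_assert(sizeof(int[3][4]) == 3 * 4 * sizeof(int), "hence no gaps at all");

/* Padding may exist inside an element, but never between elements. */
struct padded { char c; int i; };
_Static_assert(sizeof(struct padded[5]) == 5 * sizeof(struct padded),
               "an array of structs adds nothing beyond each struct's own size");

/* Total element count of an int[3][4], derived from sizes alone. */
size_t elements_3x4(void)
{
    return sizeof(int[3][4]) / sizeof(int);
}
```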
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 14, 2022, 02:03:47 pm
no more forward declaration.

What exactly is this about?
Title: Re: New C23 working draft
Post by: Nominal Animal on June 14, 2022, 03:13:40 pm
Again, the standard is pretty specific about it: if an array object is declared with a specific size, the language prohibits you from accessing beyond its boundary, i.e. use pointer arithmetic that crosses that boundary. No exceptions are made for sub-arrays within a multi-dimensional array.
Consider the case where you start with a linear (single-dimensional) array, and type-pun that into a multidimensional one.  Where are the boundaries violated?
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 14, 2022, 04:46:35 pm
Again, the standard is pretty specific about it: if an array object is declared with a specific size, the language prohibits you from accessing beyond its boundary, i.e. use pointer arithmetic that crosses that boundary. No exceptions are made for sub-arrays within a multi-dimensional array.
Consider the case where you start with a linear (single-dimensional) array, and type-pun that into a multidimensional one.  Where are the boundaries violated?

Type-punning is type-punning. In your example you used type-punning through a union, which is a valid way to type-pun in C. But type-punning is a completely different story. You can basically type-pun anything into anything as long as you stay within the storage and don't run into a trap representation.

Meanwhile, accessing a two-dimensional array `a[N][M]` by pointer arithmetic from `&a[0][0]` with `M * N` range is not a valid way to type-pun a 2D array as a 1D array.
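For contrast, a minimal sketch of the union approach, assuming a hypothetical `union grid6` that overlays a flat `int[6]` with an `int[2][3]`: each member is indexed only within its own declared bounds, and the reinterpretation happens through the union rather than through pointer arithmetic across rows.

```c
#include <stddef.h>

/* Hypothetical union overlaying a flat array and a 2x3 view of the same
   storage.  Writing one member and reading the other is type punning
   through a union, which C allows; each member is indexed only within
   its own declared bounds. */
union grid6 {
    int flat[6];
    int rows[2][3];
};

/* Fill through the 2D member, then read back through the flat one. */
int flat_after_fill(size_t i)
{
    union grid6 g;

    for (size_t r = 0; r < 2; ++r)
        for (size_t c = 0; c < 3; ++c)
            g.rows[r][c] = (int)(r * 3 + c) * 10;
    return g.flat[i];   /* i must be < 6 */
}
```

`flat_after_fill(4)` reads the element that was written as `rows[1][1]`, i.e. 40.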
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 14, 2022, 05:12:52 pm
1. As K&R declarations/definitions go away, `()` becomes equivalent to `(void)`. No more chastising newbies for `int main()`... End of an era.

2. It is now permissible to omit parameter names in function definitions:

Code:
void foo(int, double) {}

3. I remember someone mentioned that C23 will fix this silly annoyance (and the Wikipedia C23 article mentions it):

Code:
void foo(const int (*a)[10]) {}

int main()
{
  int a[10];
  foo(&a); /* error: pointers to arrays with different qualifiers are incompatible in ISO C */
}

but the draft document does not seem to mention it in the introductory list of changes.

4. They finally said it explicitly: "The calloc function returns either a pointer to the allocated space or a null pointer if the space cannot be allocated or if the mathematical product nmemb * size is not representable as a value of type size_t."
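Before C23 spelled this out, careful code often did the overflow check itself. A sketch of such a check, with hypothetical helper names `mul_overflows` and `checked_zalloc`; the test `nmemb > SIZE_MAX / size` detects exactly the case where the mathematical product does not fit in `size_t`:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero if nmemb * size is not representable in size_t, the case in
   which the draft explicitly requires calloc to return a null pointer. */
int mul_overflows(size_t nmemb, size_t size)
{
    return size != 0 && nmemb > SIZE_MAX / size;
}

/* calloc-like allocation that performs the overflow check itself
   before computing the byte count for malloc. */
void *checked_zalloc(size_t nmemb, size_t size)
{
    if (mul_overflows(nmemb, size))
        return NULL;
    void *p = malloc(nmemb * size);
    if (p != NULL)
        memset(p, 0, nmemb * size);
    return p;
}
```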
Title: Re: New C23 working draft
Post by: SiliconWizard on June 14, 2022, 06:16:42 pm
Whenever I end up using nontrivial 2D arrays (for example linear algebra stuff), I end up having to use origin[row*rowstride + col*colstride] anyway.

Yeah, I personally never *declare* multidimensional arrays in C anyway and never use arrays as function parameters (pointers instead).
The indexing is a trivial task.

I'd be curious to see if using multidimensional arrays might compile to more efficient code than handling the indexing as above, but I suspect this will not make any difference.
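To make the comparison concrete, a small sketch of the two styles under discussion, with hypothetical function names: explicit `row*rowstride + col*colstride` indexing on a flat buffer, and letting the compiler index through a pointer to a variably modified row type (a C99 feature; VLA objects are optional in C11/C17, and per the draft variably modified types become mandatory in C23). Swapping the strides in the first version gives a transposed view for free.

```c
#include <stddef.h>

/* Element (row, col) of a matrix stored in a flat buffer, with explicit
   strides.  Plain row-major: rowstride == cols, colstride == 1.
   Swapping the two strides (and the dimensions) gives the transpose. */
double get_strided(const double *origin, size_t rowstride, size_t colstride,
                   size_t row, size_t col)
{
    return origin[row * rowstride + col * colstride];
}

/* The same element, with the compiler doing the indexing through a
   pointer to a variably modified row type, double[cols].  No VLA
   object is created; only the type depends on a run-time value. */
double get_vm(size_t rows, size_t cols, double (*m)[cols],
              size_t row, size_t col)
{
    (void)rows;             /* documents the shape; not needed to index */
    return m[row][col];
}
```

For `double flat[6] = {0, 1, 2, 10, 11, 12}` viewed as 2x3, `get_strided(flat, 3, 1, 1, 2)` and the transposed `get_strided(flat, 1, 3, 2, 1)` both return 12.0; for a `double grid[2][3]` holding the same values, `get_vm(2, 3, grid, 1, 2)` does too.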

And I would have liked C to add an operator to get the number of elements of an array. 50 years of C and we still have to do it with sizeof(A)/sizeof(A[0]) or similar (or define such a macro).
Maybe I missed it though in one of the latest std revisions?
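For reference, the usual macro form of the idiom (the name `ARRAY_SIZE` is a common convention, not a standard one). It only works on an actual array object; applied to a pointer, including a function parameter declared with array syntax, it silently computes something meaningless:

```c
#include <stddef.h>

/* Number of elements of an array object.  Not valid on pointers:
   array parameters are adjusted to pointers, so don't use it there. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const int primes[] = { 2, 3, 5, 7, 11, 13 };

size_t count_primes(void)
{
    return ARRAY_SIZE(primes);
}
```

On a 2D array it counts rows; applied to one row it counts columns.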
Title: Re: New C23 working draft
Post by: magic on June 14, 2022, 09:27:08 pm
I'm reading the PDF now; this is going to be the fundamental problem with casting "multidimensional" arrays to ordinary pointers or vice versa.
Quote
6.2.7

2. All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.

Type compatibility is defined in 6.2.7 and further elaborated on in 6.7.6.2 when it comes to arrays.
Quote
6. For two array types to be compatible, both shall have compatible element types, and if both size specifiers are present, and are integer constant expressions, then both size specifiers shall have the same constant value.

For int[N][M] and int[N*M] to be compatible, the element types int[M] and int would have to be compatible; they are not.


This is the usual rule that breaks any type punning by means other than a union (union avoids multiple declarations by being a single declaration of two types simultaneously) or memcpy to a char buffer. It absolutely can be a real world problem, because compilers assume that pointers to different types (other than char) don't overlap in memory and will generate invalid code, such as caching the last value seen through one pointer while the object is being modified through the other pointer.
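A minimal sketch of the `memcpy` route mentioned above, reinterpreting a `float` as a `uint32_t`. The expected values in the note below assume IEEE 754 single precision, which is what mainstream compilers use but which the C standard does not require:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer without
   breaking the effective-type rules: memcpy copies the object
   representation, so no uint32_t lvalue ever aliases the float. */
uint32_t float_bits(float f)
{
    uint32_t u;
    _Static_assert(sizeof u == sizeof f, "assumes a 32-bit float");
    memcpy(&u, &f, sizeof u);
    return u;
}
```

With IEEE 754 binary32, `float_bits(1.0f)` should give `0x3F800000` and `float_bits(-0.0f)` `0x80000000`. Compilers typically turn such a fixed-size `memcpy` into a plain register move, so there is usually no runtime cost.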

6.7.6.2 is quite explicit that no exception is being made for re-dimensioning arrays, although sane compilers may make such exception at this time because the construct is somewhat popular. I tried to produce miscompilation with GCC with -fstrict-aliasing and failed so far, but maybe I didn't try hard enough. GCC manual gives some examples of UB code that is likely to fail in practice, but doesn't say anything about arrays.
Title: Re: New C23 working draft
Post by: magic on June 15, 2022, 08:01:23 am
By the way, if C had two-dimensional arrays, you could take a pointer to element 0,0 and iterating it till the end of the two dimensional array would not access the array out of bounds. But you can't because of the almost-equivalence of pointers with 1D arrays and lack of true multi-dimensional arrays, with nested arrays being the widely used substitute.

And yes, I'm reading the PDF. They use the d-word when defining arrays. Obviously, it's marketing wank not unlike most of the Batteriser claims. No one can be fooled who has ever seen Fortran matrix code and a comparable implementation in C.
 |O
Title: Re: New C23 working draft
Post by: brucehoult on June 15, 2022, 10:53:27 am
Whenever I end up using nontrivial 2D arrays (for example linear algebra stuff), I end up having to use origin[row*rowstride + col*colstride] anyway.

Yeah, I personally never *declare* multidimensional arrays in C anyway and never use arrays as function parameters (pointers instead).
The indexing is a trivial task.

I'd be curious to see if using multidimensional arrays might compile to more efficient code than handling the indexing as above, but I suspect this will not make any difference.

I ran both versions through Clang generating .ll intermediate code, without any middle-end getting called.

The "illegal" version used a multiply and then add and then getelementptr [1] with i8* while the VLA version used getelementptr twice, once for each dimension. That's no real difference in practice.

The bigger difference was that, as the dimension size arguments were declared as "int" on LP64 machines (all three of arm64, amd64, and riscv64), the multiply version used i32 = i32 * i32 while the VLA version immediately cast the dimensions to i64.

So, that's a difference if the array is bigger than 4 GB.

But it's easily fixed with a cast in the array indexing calculation.

[1] for those not familiar with LLVM internals, getelementptr in llvm intermediate code is pretty much exactly an "LEA [base + index*size + offset]" where everything can be variable. A lot of n00bs get confused, thinking it's a memory access. It's not. It's just pure pointer arithmetic.
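The cast fix mentioned above amounts to converting the `int` dimensions to `size_t` before multiplying, so the arithmetic is done in the wider type (64-bit on LP64) rather than in `int`, where overflow would be undefined. A sketch, with a hypothetical `index_2d` helper that assumes non-negative arguments:

```c
#include <stddef.h>

/* Offset of element (row, col) in a row-major buffer whose dimensions
   arrive as int.  Converting each operand to size_t first makes the
   multiply happen in size_t instead of int, so a large array cannot
   overflow a 32-bit intermediate.  Arguments must be non-negative. */
size_t index_2d(int row, int col, int ncols)
{
    return (size_t)row * (size_t)ncols + (size_t)col;
}
```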
Title: Re: New C23 working draft
Post by: Nominal Animal on June 15, 2022, 03:17:32 pm
No one can be fooled who has ever seen Fortran matrix code and a comparable implementation in C.
At the function call ABI level, because of array slicing support, one-dimensional array references tend to be origin,stride,length with each element accessed as (using C syntax) origin[stride*index] for index=1..length, inclusive.  (In fact, gfortran also records offset, minimum_index, and maximum_index, making it origin[offset+stride*index] for index=minimum_index..maximum_index, or something along those lines.)
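A rough C model of such a one-dimensional descriptor, assuming a hypothetical `struct slice` with a base pointer, a stride in elements (possibly negative) and a length. It uses zero-based indexing rather than the 1-based form above, and gfortran's real descriptor layout is not reproduced here:

```c
#include <stddef.h>

/* Hypothetical 1-D array descriptor: element i (0-based) is at
   origin[stride * i].  A negative stride walks backwards, so origin
   must then point at the last element in memory order. */
struct slice {
    double    *origin;
    ptrdiff_t  stride;
    size_t     length;
};

static double *slice_at(struct slice s, size_t i)
{
    return s.origin + s.stride * (ptrdiff_t)i;
}

/* Sum of the elements visible through a slice. */
double slice_sum(struct slice s)
{
    double sum = 0.0;
    for (size_t i = 0; i < s.length; ++i)
        sum += *slice_at(s, i);
    return sum;
}
```

With `double v[6] = {1, 2, 3, 4, 5, 6}`, the slice `{v, 2, 3}` sees 1, 3, 5 (sum 9), `{v + 5, -1, 6}` walks the array backwards, and if `v` holds a row-major 2x3 matrix, `{v, 4, 2}` is its main diagonal (stride = columns + 1).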

Even with this "added complication" in array indexing, naïvely written Fortran array/matrix code tends to beat code written in naïve C to achieve the same, i.e. by physicists and such, in my experience.  It does not mean that "Fortran is faster/better than C", unless you limit the domain to "for naïve programmers".  Since naïvety can be a positive quality for example when making new science utilising linear algebra – such is easier to review, reproduce, and verify –, Fortran will always have niches where it is preferable to C, unless something even more suitable for naïve programmers comes along.

I'm sure most of you have seen my own matrix interface in C (https://www.eevblog.com/forum/microcontrollers/(c)-pointers-what-is-the-point/msg1687913/#msg1687913), where matrices and views of matrices are the same thing, so you can have e.g. two matrices and three vectors using the same data – row-major and column-major matrices (transposes), and the main diagonal plus the two adjacent off-diagonal vectors (immediately above and below), as often needed for C² continuity in piecewise cubic curve models – where any change to one is immediately visible in all the others, because they refer to the exact same data.  I do use row and column numbering starting from zero, though.  In my experience, any heavier operation, like matrix-matrix multiplication, benefits from a heuristic check that considers the matrix shapes and reorders one or both matrices into temporary storage to optimize cache locality.  Much better gains are available if sequences of operations (like multiple matrices multiplied together, or raising a square matrix to a power) can be optimized.  Element access to a matrix just isn't the bottleneck; it's cache locality/memory bandwidth and algorithmic optimizations (to minimize the number of base operations needed) that yield true efficiency improvements.
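A minimal sketch of the view idea in two dimensions, assuming a hypothetical `struct mview` (not the linked interface, whose details may differ): each view stores an origin pointer plus per-dimension steps, so a transpose is just swapped dimensions and steps, and the main diagonal is a 1 x n view whose column step is `rowstep + colstep`. All views share one buffer, so a write through one is visible through the others.

```c
#include <stddef.h>

/* Hypothetical 2-D view: element (r, c) is origin[r*rowstep + c*colstep].
   Several views may refer to the same underlying buffer. */
struct mview {
    double    *origin;
    size_t     rows, cols;
    ptrdiff_t  rowstep, colstep;
};

static double *mview_at(struct mview m, size_t r, size_t c)
{
    return m.origin + (ptrdiff_t)r * m.rowstep + (ptrdiff_t)c * m.colstep;
}

/* Transpose without copying: swap the dimensions and the steps. */
struct mview mview_transpose(struct mview m)
{
    struct mview t = { m.origin, m.cols, m.rows, m.colstep, m.rowstep };
    return t;
}

/* Main diagonal as a 1 x n view, n = min(rows, cols). */
struct mview mview_diagonal(struct mview m)
{
    size_t n = m.rows < m.cols ? m.rows : m.cols;
    struct mview d = { m.origin, 1, n, 0, m.rowstep + m.colstep };
    return d;
}
```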
Title: Re: New C23 working draft
Post by: free_electron on June 15, 2022, 03:29:13 pm
no more forward declaration.

What exactly is this about?
wishful thinking on my part. i hate the concept of header files. the compilers should collect the function definitions from the source files. you don't need to define a function ahead of time. it is irritating to have to define stuff two times.
Title: Re: New C23 working draft
Post by: Siwastaja on June 15, 2022, 05:24:38 pm
wishful thinking on my part. i hate the concept of header files. the compilers should collect the function definitions from the source files. you don't need to define a function ahead of time. it is irritating to have to define stuff two times.

As much as we all hate collecting function prototypes manually into another file in an error-prone manual process (yuk), or automating that with code generation (yuk), this would be too big of a change to be realistic. This is not something you can just remove and replace with something simple; it would require reinventing the wheel to the point where it would be almost a whole new language, not just an improved version of C any more.

One of the key strengths of C is that it does not change too dramatically or fast, but is still maintained.
Title: Re: New C23 working draft
Post by: newbrain on June 15, 2022, 05:32:41 pm
wishful thinking on my part. i hate the concept of header files. the compilers should collect the function definitions from the source files. you don't need to define a function ahead of time. it is irritating to have to define stuff two times.
If you want Arduino, you know where to find it :-DD

Such a change is far too backwards-incompatible to ever even be considered.

Now, that the #include mechanism is not the best of all possible worlds, and that C lacks a proper module concept, we can probably agree; but I think you are conflating two issues: the visibility of symbols inside a translation unit, and their accessibility from outside it. #include has nothing to do with the former.
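A small illustration of the first of those two issues, with hypothetical function names: inside a single translation unit, a function used before its definition needs a declaration in scope, and no header is involved. With mutually recursive functions, no ordering of the definitions avoids the forward declaration:

```c
/* Forward declaration (prototype): is_even calls is_odd before its
   definition appears, and reordering cannot help because is_odd
   calls is_even in turn. */
static int is_odd(unsigned n);

static int is_even(unsigned n)
{
    return n == 0 ? 1 : is_odd(n - 1);
}

static int is_odd(unsigned n)
{
    return n == 0 ? 0 : is_even(n - 1);
}

/* The only function with external linkage: this is what a header
   would declare for other translation units. */
int parity_of(unsigned n)
{
    return is_odd(n);
}
```

Only `parity_of` is visible to other files; making it available to them is the second, separate issue that headers (or modules) address.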
Title: Re: New C23 working draft
Post by: Ed.Kloonk on June 15, 2022, 05:33:46 pm
wishful thinking on my part. i hate the concept of header files. the compilers should collect the function definitions from the source files. you don't need to define a function ahead of time. it is irritating to have to define stuff two times.

As much as we all hate collecting function prototypes manually into another file in an error-prone manual process (yuk), or automating that with code generation (yuk), this would be too big of a change to be realistic. This is not something you can just remove and replace with something simple; it would require reinventing the wheel to the point where it would be almost a whole new language, not just an improved version of C any more.

One of the key strengths of C is that it does not change too dramatically or fast, but is still maintained.

Because I'm a dinosaur, I like keeping the func declarations in a list and use the rest of the line to describe more about its intention (not to be confused with what it actually does).

The other reason is I prefer a basic editor and the handy list helps me pick the right sub with the right spelling.
Title: Re: New C23 working draft
Post by: SiliconWizard on June 15, 2022, 07:30:59 pm
no more forward declaration.

What exactly is this about?
wishful thinking on my part. i hate the concept of header files. the compilers should collect the function definitions from the source files. you don't need to define a function ahead of time. it is irritating to have to define stuff two times.

That's one of the benefits you would get from the implementation of "modules" that I highly suggested earlier.

But note that modules do not necessarily avoid having to define an *interface*, which is what function prototypes roughly provide.
Among languages implementing modules, some require a separate interface, some do not.
Separate interfaces have their benefits too.
Title: Re: New C23 working draft
Post by: free_electron on June 16, 2022, 01:10:36 pm
Such a change is far too backwards-incompatible to ever even be considered.

how hard is this to implement ? scan all source code to collect the function definitions (it's a simple string parser) and write a file that contains all the function prototypes. include that file at the top of the project and start compilation as usual.
you don't have to break anything. simple compiler flag "auto_function_declare = true"

the only thing backwards is C itself ! the compiler is lazy. it needs statement terminators ,  it can't figure out when = means assign and when compare , it needs this, it needs that. many other languages have solved all these irritations long before C even came along.
Title: Re: New C23 working draft
Post by: newbrain on June 16, 2022, 01:17:35 pm
it needs statement terminators ,  it can't figure out when = means assign and when compare
:horse:

Quote
how hard is this to implement ?
Without changing rules for linkage of objects (or should they not be included?) and functions, and their visibility?
And what about typedef names, which share the name space of ordinary identifiers?
How are conflicts to be resolved?

How hard?
Very hard, I think impossible, keeping some kind of backwards compatibility.
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 16, 2022, 06:20:27 pm
the only thing backwards is C itself ! the compiler is lazy. it needs statement terminators ,

Because without them it is impossible to figure out where a statement ends.

it can't figure out when = means assign and when compare

Because it is impossible to "figure out".

, it needs this, it needs that. many other languages have solved all these irritations long before C even came along.

Because those "many other languages" simply don't have the features that lead to those ambuguities.
Title: Re: New C23 working draft
Post by: free_electron on June 16, 2022, 08:43:21 pm

Because those "many other languages" simply don't have the features that lead to those ambuguities.
so remove the source of the ambiguity !
i like how you use amBUGuities... i'm gonna steal that... i came up with one this morning instead of simulator : semi-lie-er

as for the statement end : at the <CR> or <CR><LF>. if the line needs continuation use a continuation character. you need far fewer of those as lines are 80 characters. if the statement doesn't fit in 80 characters : split it. it is too complex to understand anyway.
Title: Re: New C23 working draft
Post by: brucehoult on June 16, 2022, 10:40:29 pm
the only thing backwards is C itself ! the compiler is lazy. it needs statement terminators ,  it can't figure out when = means assign and when compare , it needs this, it needs that. many other languages have solved all these irritations long before C even came along.

What prevents you from using one of those many other languages?
Title: Re: New C23 working draft
Post by: SiliconWizard on June 17, 2022, 12:52:38 am
That's getting pretty funny. ;D
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 17, 2022, 02:17:21 am

Because those "many other languages" simply don't have the features that lead to those ambuguities.
so remove the source of the ambiguity !

"The best cure for dandruff - guillotine"

We should all stick to programming Turing machines. No ambuguities whatsoever.
Title: Re: New C23 working draft
Post by: SiliconWizard on June 17, 2022, 03:06:25 am
Ahah. Ambiguity is a relatively subjective notion here anyway when it comes to operators.

In C, there is actually no ambiguity to speak of, since there are two distinct operators for assignment and equality. I would personally call it ambiguous if it had only *one* operator for both operations, which it does not.

Sure you can always prefer other combinations, such as ':=' for assignment and '=' for equality, but that doesn't fundamentally change anything regarding ambiguity. And as others have said, pick another language in this case.

You may also not like the fact that an assignment expression is a value, and maybe that's part of what makes it all feel ambiguous? (Sure, this is the reason 'if (a = b)' can lead to programming errors, although those are now caught relatively easily by compilers and should give you a warning.) Well, other languages have that characteristic too.
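A short sketch of what "an assignment expression is a value" buys in C: chained assignment and the assign-and-test loop condition. The same property is what lets `if (a = b)` compile; with GCC, `-Wall` enables `-Wparentheses`, which warns about an assignment used as a truth value unless it is wrapped in an extra pair of parentheses. Function names are illustrative:

```c
#include <stddef.h>

/* Chained assignment: parsed as a = (b = (c = 7)), since the value of
   each assignment is the value stored. */
int chain_demo(void)
{
    int a, b, c;
    a = b = c = 7;
    return a + b + c;
}

/* Read-and-test idiom: the assignment's value is compared to '\0'.
   The inner parentheses are required because != binds tighter than =. */
size_t count_until_nul(const char *s)
{
    size_t n = 0;
    char ch;
    while ((ch = *s++) != '\0')
        ++n;
    return n;
}
```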

Or maybe you should have a look at some languages which not only have the '=' and '==' operators, but also '===' and combinations thereof. Does that look even better? :-DD
Title: Re: New C23 working draft
Post by: magic on June 17, 2022, 07:27:50 am
This problem doesn't exist in Microsoft QBasic, Visual BASIC, etc; both operations are = and ambiguity is resolved by assignment not being permitted in expression context and an expression alone (other than function calls) not being a statement.

Not much of value is lost that way, you don't see that sort of C code anyway:
Code:
foo(a,b) == c+d;
a | c->bar(x);

Only chain assignments are not possible.
Title: Re: New C23 working draft
Post by: brucehoult on June 17, 2022, 10:27:31 am
This problem doesn't exist in Microsoft QBasic, Visual BASIC, etc; both operations are = and ambiguity is resolved by assignment not being permitted in expression context and an expression alone (other than function calls) not being a statement.

Not much of value is lost that way, you don't see that sort of C code anyway:

I disagree. When I moved from BCPL to C I *really* missed the ability to put a block with a series of statements/loops/local variables etc anywhere a subexpression can go.
Title: Re: New C23 working draft
Post by: magic on June 17, 2022, 01:20:39 pm
That's a completely different issue, and not really incompatible with C.

Indeed, in C++, you can do just that by defining and immediately calling an anonymous function, although the syntax is more awkward than it needs to be.

Code:
int main() {
        return []() {return 123;} ();
}
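For comparison on the C side: GCC and Clang already provide something close to a block-as-expression as a non-standard extension, "statement expressions" `({ ... })`, whose value is that of the last expression statement inside. A sketch with an illustrative function name; `__extension__` only suppresses `-pedantic` diagnostics, and none of this is ISO C:

```c
/* GCC/Clang extension: a compound statement used as an expression.
   It can declare locals and read the enclosing function's variables
   directly; its value is that of the last expression statement. */
int clamp_sum(int x, int y, int limit)
{
    return __extension__ ({
        int s = x + y;            /* local to the block */
        s > limit ? limit : s;    /* value of the whole expression */
    });
}
```

`clamp_sum(2, 3, 10)` gives 5 and `clamp_sum(8, 5, 10)` gives 10. Unlike the lambda, no capture list is involved: the block simply sees `x`, `y` and `limit`.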
Title: Re: New C23 working draft
Post by: DavidAlfa on June 17, 2022, 02:05:31 pm
What's the benefit of using 2's complement? Just makes everything harder:

Decimal   8-bit 2's notation
−42      1101 0110
  42     0010 1010

I doubt much hardware supports that. The MSB bit as sign indicator is way more simple.
Can't understand the logic here, or I got completely lost.
Title: Re: New C23 working draft
Post by: magic on June 17, 2022, 02:19:07 pm
Simple parts of the hardware are made even simpler: signed addition and subtraction are exactly the same as unsigned, no hardware duplication, fewer instructions. OTOH, I suppose multiplication gets harder, but it is hard anyway.

Inverting the sign is easy if you have already implemented subtraction.
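A sketch of that point on 8-bit values, with hypothetical helper names: one modulo-256 adder gives the right bit pattern whether the operands are read as signed or unsigned, and negation is "invert and add one". `as_signed8` converts arithmetically so it does not rely on implementation-defined narrowing conversions:

```c
#include <stdint.h>

/* One adder for both readings: plain addition modulo 256. */
uint8_t add8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Two's complement negation: invert all bits, then add one. */
uint8_t neg8(uint8_t a)
{
    return (uint8_t)(~a + 1u);
}

/* Signed reading of an 8-bit pattern, computed arithmetically. */
int as_signed8(uint8_t u)
{
    return u < 128 ? (int)u : (int)u - 256;
}
```

`neg8(42)` is `0xD6`, the `1101 0110` pattern from the question, and `as_signed8(add8(neg8(42), 50))` is 8.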
Title: Re: New C23 working draft
Post by: Siwastaja on June 17, 2022, 02:35:38 pm
how hard is this to implement ?

For someone who suffers from severe case of Dunning-Kruger's, problems seem much easier than they actually are.

Trust me, C is fine. It's just not for you. But professionals need professional tools, not sugar coated shiny toys or training wheels. Professionals are also totally fine specifying if they mean assignment or comparison, it carries zero mental load; the opposite would be a total disaster.
Title: Re: New C23 working draft
Post by: Siwastaja on June 17, 2022, 02:39:45 pm
What's the benefit of using 2's complement? Just makes everything harder:

Decimal   8-bit 2's notation
−42      1101 0110
  42     0010 1010

I doubt much hardware supports that. The MSB bit as sign indicator is way more simple.
Can't understand the logic here, or I got completely lost.

Are you joking or serious? All hardware for the last 30-40 years or so exclusively uses 2's complement. That's exactly why the C standard now allows you to assume this is always the case. Earlier, everyone assumed that anyway because sign-and-magnitude hardware has always been extremely rare and esoteric.

The reason is simplicity of hardware: same old adder logic can just add/subtract numbers without special handling of the sign.

For the very same reason, it's kind of easy for humans, too: the 2's complement number system forms a full circle, where wrapping occurs from the most positive to the most negative number. But doing direct conversions from negative decimal to 2's complement binary indeed requires some extra steps which may seem confusing for a human at first. I have utilized this property to deal with actual physical angles which also wrap around mechanically: if the full numerical range equals 360 degrees, you can just interpret the number as unsigned (0 to 360 deg) or signed (-180 to +180 deg). The latter interpretation only works if you use 2's complement: there is only one instance of zero, and numbers grow in the same direction. With sign-and-magnitude, you have two zeroes, and around zero the direction is swapped! How painful; the only thing that seems a tad easier is manual human conversion of arbitrary negative numbers.
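A sketch of the angle trick, assuming 16-bit "binary angles" where 65536 steps make one full turn (the type name `bangle` and the helpers are hypothetical). The difference is computed with wrapping unsigned arithmetic and then read as a signed value, which gives the shortest rotation from `a` to `b`:

```c
#include <stdint.h>

/* 65536 steps = 360 degrees.  Unsigned reading: 0..360 degrees;
   two's complement reading: -180..+180 degrees. */
typedef uint16_t bangle;

/* Shortest signed rotation from a to b, in [-32768, 32767] steps.
   The subtraction wraps modulo 65536, like the mechanism itself; the
   signed reading is done arithmetically, avoiding an
   implementation-defined narrowing conversion. */
int32_t bangle_diff(bangle a, bangle b)
{
    uint16_t d = (uint16_t)(b - a);
    return d < 32768u ? (int32_t)d : (int32_t)d - 65536;
}

/* Whole degrees (any integer, negative allowed) to a binary angle. */
bangle bangle_from_deg(int32_t deg)
{
    int32_t m = deg % 360;
    if (m < 0)
        m += 360;
    return (bangle)((uint32_t)m * 65536u / 360u);
}
```

From 350 to 10 degrees the result is +3641 steps (about +20 degrees), and in the opposite direction -3641, with no special case for crossing zero.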
Title: Re: New C23 working draft
Post by: TheCalligrapher on June 17, 2022, 02:50:04 pm
That's a completely different issue, and not really incompatible with C.

Indeed, in C++, you can do just that by defining and immediately calling an anonymous function, although the syntax is more awkward than it needs to be.

Code:
int main() {
        return []() {return 123;} ();
}

It is awkward because you cluttered it with a superfluous pair of `()`. When lambda has no parameters the `()` can be omitted

Code:
int main() 
{
  return []{return 123;}();
}

Done. All awkwardness gone without a trace :)

P.S. On a serious note, the fact that it is permissible to drop the whole `()` in what is in essence a function definition is immensely awkward by itself.
Title: Re: New C23 working draft
Post by: SiliconWizard on June 17, 2022, 06:09:20 pm
What's the benefit of using 2's complement? Just makes everything harder:

Decimal   8-bit 2's notation
−42      1101 0110
  42     0010 1010

I doubt much hardware supports that. The MSB bit as sign indicator is way more simple.
Can't understand the logic here, or I got completely lost.

Are you joking or serious? All hardware for the last 30-40 years or so exclusively uses 2's complement. That's exactly why the C standard now allows you to assume this is always the case. Earlier, everyone assumed that anyway because sign-and-magnitude hardware has always been extremely rare and esoteric.(...)

I was just wondering the same. :popcorn:

The obvious benefit of 2's complement is that you can use a simple adder to add two signed integers. The exact same adder as for unsigned operations.
You'd assume that this was obvious for most people these days. Apparently not.

Title: Re: New C23 working draft
Post by: DavidAlfa on June 17, 2022, 09:12:18 pm
Yeah I realized the hardware part shortly after...
But the representation is weird from the human side.
Also I expected most of the ALUs to natively handle "classic" representation after 40 years lol.
That's just weird to make such changes now.
Title: Re: New C23 working draft
Post by: magic on June 17, 2022, 10:17:25 pm
It is awkward because you cluttered it with a superfluous pair of `()`. When lambda has no parameters the `()` can be omitted
LOL, I didn't know about it. But it's still a few more characters than strictly necessary to implement code blocks as expressions.

Done. All awkwardness gone without a trace :)
Just wait till the "block expression" needs to access some of the local variables of the containing function :popcorn:
Title: Re: New C23 working draft
Post by: Nominal Animal on June 17, 2022, 10:43:39 pm
But the representation is weird from the human side.
Is it?

I think it is just the name that veers people into thinking it is unintuitive or weird.  Mathematically, if you have \$N\$ bits, \$a_0\$ through \$a_{N-1}\$, they correspond to a signed integer value \$v\$,
$$v = -a_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} a_i 2^i$$
with negative values in two's complement form.  We call bit \$a_{N-1}\$ the sign bit, but mathematically it corresponds to value \$-2^{N-1}\$.  All the rest of the bits have positive value (\$a_i\$ corresponding to positive value \$2^i\$).

I find this form rather easy to grasp intuitively.  All you need to do is consider the most significant bit to have negative value if the bit pattern is a signed integer with two's complement form, and positive value if the bit pattern is an unsigned integer; then just sum the values corresponding to the bits set in the pattern to obtain the numerical value.  For the inverse (numerical value to bit pattern), do the most significant bit first.
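The formula translates directly into code. A sketch that sums the bit weights exactly as written, with the top bit weighted -2^(n-1) (the function name is hypothetical, and `n` is limited to 1..31 bits to keep the arithmetic inside `int32_t`):

```c
#include <stdint.h>

/* Value of the low n bits of 'bits' (1 <= n <= 31) read as two's
   complement: bit i contributes +2^i, except bit n-1, which
   contributes -2^(n-1). */
int32_t twos_value(uint32_t bits, unsigned n)
{
    int32_t v = 0;
    for (unsigned i = 0; i + 1 < n; ++i)
        if (bits & (UINT32_C(1) << i))
            v += (int32_t)1 << i;
    if (bits & (UINT32_C(1) << (n - 1)))
        v -= (int32_t)1 << (n - 1);
    return v;
}
```

`twos_value(0xD6, 8)` is -128 + 64 + 16 + 4 + 2 = -42.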
Title: Re: New C23 working draft
Post by: brucehoult on June 18, 2022, 12:26:59 am
What's the benefit of using 2's complement? Just makes everything harder:

Decimal   8-bit 2's notation
−42      1101 0110
  42     0010 1010

Code:
 42  0010 1010

-42 1101 0101  1's complement. This at least is somewhat easy to work with.
-42 1101 0110  add 1 to get 2's complement. Best: add, subtract, and multiply are the same for signed and unsigned

-42 1010 1010  sign-magnitude. ugh. Far more complex to make an ALU.

Quote
I doubt much hardware supports that. The MSB bit as sign indicator is way more simple.

Basically all integer hardware since 1970ish.
Title: Re: New C23 working draft
Post by: brucehoult on June 18, 2022, 12:29:19 am
Simple parts of the hardware are made even simpler: signed addition and subtraction are exactly the same as unsigned, no hardware duplication, fewer instructions. OTOH, I suppose multiplication gets harder, but it is hard anyway.

In an NxN multiplication, the lower N bits of the result are the same no matter whether the operands are read as 2's complement or unsigned. Only a full 2N-bit result needs different versions.
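A quick way to check this, as a sketch with hypothetical helpers: multiply the same 8-bit patterns once as unsigned and once as two's complement. The full 16-bit products differ, the low bytes do not:

```c
#include <stdint.h>

/* Low 8 bits of an 8x8 multiply on raw bit patterns. */
uint8_t mul8_low(uint8_t a, uint8_t b)
{
    return (uint8_t)((unsigned)a * b);
}

/* Full product with both patterns read as unsigned. */
uint16_t mul8_full_unsigned(uint8_t a, uint8_t b)
{
    return (uint16_t)((unsigned)a * b);
}

/* Full product with both patterns read as two's complement. */
int mul8_full_signed(uint8_t a, uint8_t b)
{
    int sa = a < 128 ? a : a - 256;
    int sb = b < 128 ? b : b - 256;
    return sa * sb;
}
```

For `0xD6 * 3` the unsigned product is 642 (`0x282`) and the signed one is -126; both end in the byte `0x82`.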
Title: Re: New C23 working draft
Post by: newbrain on June 18, 2022, 09:10:50 am
Basically all integer hardware since 1970ish.
So much so that on the 6502 one needed to explicitly set the carry flag before subtractions (and clear it before addition, but that's normal).

I'm quite convinced  that addition was implemented as:
   Accumulator + operand + Carry bit
(as discussed in other posts, this works beautifully for signed and unsigned maths), and subtraction in the same way, simply flipping the bits of the operand:
   Accumulator + ~operand + Carry bit
The ~operand + Carry(1) is simply -operand.
In this way an extra adder to perform the negation, or a different circuit to perform subtraction, was spared.
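A toy C model of that datapath, under the same assumption (a single 8-bit adder, with SBC feeding it the inverted operand); the names are hypothetical, and decimal mode and the N/V/Z flags are left out:

```c
#include <stdint.h>

/* Toy 8-bit accumulator plus carry flag (0 or 1). */
struct cpu8 {
    uint8_t  a;
    unsigned carry;
};

/* The one adder: A + operand + C, carry out of bit 7 into C. */
static void adder(struct cpu8 *cpu, uint8_t operand)
{
    unsigned sum = (unsigned)cpu->a + operand + cpu->carry;
    cpu->a = (uint8_t)sum;
    cpu->carry = sum >> 8;
}

/* ADC: A + M + C */
void adc(struct cpu8 *cpu, uint8_t m) { adder(cpu, m); }

/* SBC: A + ~M + C, so with C = 1 this is A - M; C = 0 means borrow. */
void sbc(struct cpu8 *cpu, uint8_t m) { adder(cpu, (uint8_t)~m); }
```

Starting from A = 10 with the carry set, `sbc(&cpu, 3)` leaves 7 with the carry still set (no borrow); forgetting the SEC, i.e. carry clear, gives 6.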
Title: Re: New C23 working draft
Post by: Siwastaja on June 18, 2022, 10:16:45 am
But the representation is weird from the human side.
Is it?

I think it is just the name that veers people into thinking it is unintuitive or weird.  Mathematically, if you have \$N\$ bits, \$a_0\$ through \$a_{N-1}\$, they correspond to a signed integer value \$v\$,
$$v = -a_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} a_i 2^i$$
with negative values in two's complement form.  We call bit \$a_{N-1}\$ the sign bit, but mathematically it corresponds to value \$-2^{N-1}\$.  All the rest of the bits have positive value (\$a_i\$ corresponding to positive value \$2^i\$).

I find this form rather easy to grasp intuitively.  All you need to do is consider the most significant bit to have negative value if the bit pattern is a signed integer with two's complement form, and positive value if the bit pattern is an unsigned integer; then just sum the values corresponding to the bits set in the pattern to obtain the numerical value.  For the inverse (numerical value to bit pattern), do the most significant bit first.

In other words, for those who struggle with mathematical notation, if you plot unsigned representation on X axis, and actual value (how big the number is: -42 is bigger than -43) of the number on Y axis, sign and magnitude system creates a weird triangle where the value first goes down, then stays at the same value for two steps, and then goes up. Two's complement number system is just a straight line, which wraps around in the middle, creating a sawtooth-like curve. But each step is of the same size and direction.

I find nothing intuitive about sign-and-magnitude: we humans kind-of use something like it, but struggle with it in everyday life, at least in Finnish we talk about negative temperatures with a specific term, and use terms like increase/decrease and get confused whether the actual temperature is going up or down. I would dare to say a significant % of people, in the lower end of IQ bell curve, struggle with the whole concept of negative numbers, and trying to put them in order (ask "which is bigger, -42 or -43", and you won't get 100% the same answer). But if we have systems where negative numbers are not needed (i.e., with enough offset), this is easier. For example, with Fahrenheit, you need negative numbers much more rarely than positive numbers.

Two's complement is basically nothing else than a fixed offset with wrapping. You just don't have a separate concept of "negative numbers" needing special/manual handling. This is why it's easier for machines, and it would be easier for humans, too, if we started from scratch without the traditional baggage we carry.
Title: Re: New C23 working draft
Post by: brucehoult on June 18, 2022, 10:35:05 am
Basically all integer hardware since 1970ish.
So much so that on the 6502 one needed to explicitly set the carry flag before subtractions (and clear it before addition, but that's normal).

I'm quite convinced that addition was implemented as:
   Accumulator + operand + Carry bit
(as discussed in other posts, this works beautifully for signed and unsigned maths), and subtraction in the same way, simply flipping the bits of the operand:
   Accumulator + ~operand + Carry bit
The ~operand + Carry(1) is simply -operand.
In this way an extra adder to perform the negation, or a different circuit to perform subtraction, was spared.

There's nothing to be convinced of, that's exactly how it works.

That's how it works on everything; it's just that many CPUs have an XOR on the path between the carry flag and the adder, and between the carry out and the flags register, to invert the carry if it's a subtract.

But the 6502 is not the only ISA where you SEC before a subtraction. ARM is the same. It's just that on ARM there are both SUB and SBC instructions, so you don't see it unless you specifically look at the carry flag at the end of a multi-precision operation. If you subtracted the LSBs with SBC instead of SUB, then you'd have to set the carry flag at the start, same as on the 6502.

Other ISAs that are like the 6502 and ARM include:

- MSP430

- PA-RISC

- PIC

- Power/PowerPC

- System/360
Title: Re: New C23 working draft
Post by: tellurium on July 11, 2022, 12:32:09 am
What I personally miss in libc is a way to interface with the *printf* family of functions. FILE is an opaque object, and there is no standard API to make a custom FILE. I'd prefer to have a lower-level printf API like this:

Code: [Select]
#include <stdarg.h>  /* for va_list */

int xprintf(void (*x)(char, void *), void *param, const char *fmt, ...);
int vxprintf(void (*x)(char, void *), void *param, const char *fmt, va_list ap);

Here, x(char, void *param) is a generic putchar-like function that outputs the given character, and param is a user-specified pointer handed to x along with each character.

Also, printf specifiers for hex, base64, and JSON-string-escape formatting would be nice. Or maybe some way of defining a custom format specifier, like '%J', that expects a pointer to a custom formatting function.