Author Topic: An 'interesting' thing you can do in C++ (Read 18775 times)

nfmax · « **Reply #75 on:** September 12, 2018, 05:02:25 pm »

Ah yes, the aliasing problem. Interestingly, with default switches this code doesn't generate a warning. On my system, the assignment occurs after the post-increment, thereby clobbering it.

Code: [Select]

byrd:ctest max$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

I think we can add the pre/post increment/decrement operators to the Dodgy C Constructs list! You can always write:

Code: [Select]

array[a] = array[b];
array[b] = array[b] + 1;

Which is unambiguous.

newbrain · « **Reply #76 on:** September 12, 2018, 05:38:45 pm »

Quote from: NorthGuy on September 12, 2018, 04:17:28 pm

Quote from: newbrain on September 12, 2018, 02:19:42 pm
Moreover, note 8 of 5.1.1.3 clearly states that implementations can successfully translate an invalid program.

I would simply call it wrong.

Compiler may not know that the program is invalid. For example:
[...]

Of course!

Quote from: nfmax on September 12, 2018, 05:02:25 pm

I think we can add the pre/post increment/decrement operators to the Dodgy C Constructs list!

I don't see anything dodgy in those operators. Following the line of thought "it's dodgy because it can result in undefined behaviour", one could also put all the arithmetic operators in that category.

As NorthGuy correctly described, UB is in most cases not the result of sloppy thinking; it frees compilers and the resulting code from the burden of excessive runtime checks, trading safety for efficiency.
Fortran, for example, has a different set of rules for aliasing, allowing an even higher degree of shortcuts and optimizations (and biting my ass more than once in faraway times).

If one wants thoughtless safety, other languages are more suitable.

RoGeorge · « **Reply #77 on:** September 12, 2018, 05:45:44 pm »

Quote from: NorthGuy on September 12, 2018, 04:17:28 pm

Compiler may not know that the program is invalid. For example:

Code: [Select]
x[a] = x[b]++;
This code is invalid (and result is undefined) only if a == b.

If a = b, then the value of x[a] will remain unchanged.
The rule I know is that the right side of an assignment is evaluated first, then the left side, then the assignment is made.

x[a] = x[a]++ is the same as n = n++. Let's say n=10.
- First we read the value (10) stored at address n, in order to use it in evaluating the right side expression. In other words, we make a copy of n.
- Then, we increment the content of address n. Now n will contain the value 11
- Then we continue the evaluation of the right side expression (in our case, nothing to do), so the right side result is 10. In memory, at address n is stored 11.
- Now, it's time to evaluate the left hand side of the assignment (for n=n++, there is nothing to evaluate on the left side; for x[y]=x[z]++, on the left side we will evaluate the byte address of x plus y*sizeof(x[0]))
- Then, the assignment: Our n, which is now 11, will receive the value calculated in the right hand side, which is 10.

So, we will end up with an unchanged value. No undefined situation. What am I missing?
Isn't the right side of an assignment always evaluated first?

dmills · « **Reply #78 on:** September 12, 2018, 05:59:04 pm »

That is NOT the rule in C!

The problem comes from the fact that all side effects are only guaranteed to be evaluated before the next sequence point, and an assignment (=) is NOT a sequence point!

Consider the canonically broken i = i++;
There is only ONE sequence point, the semicolon.

The value of i will be copied to a temporary.
i will be incremented sometime before that sequence point.
The temporary will be assigned to i sometime before that sequence point, but there is no guarantee about which order these last two operations will occur in.

Here be Nasal Demons.

Regards, Dan.

glarsson · « **Reply #79 on:** September 12, 2018, 06:00:37 pm »

Quote from: RoGeorge on September 12, 2018, 05:45:44 pm

No undefined situation. What am I missing?
Isn't the right side of an assignment always evaluated first?

You are missing what is written in the C standard. Left vs. right does not matter. The simple rule is that you may not modify a variable more than once between sequence points. The assignment is not a sequence point; the next one is at the semicolon.

For a=a++; the compiler is allowed to instruct the processor to execute a=a and a++ at the same time for efficiency. In this case this creates a race condition, or nasal demons.

GeorgeOfTheJungle · « **Reply #80 on:** September 12, 2018, 06:04:00 pm »

Yep, that makes sense IMO, and in the same vein (1st evaluate the right side, then the left side)

Code: [Select]

#include <stdio.h>

int main (int argc, char *argv[]) {
    int a=  0, b = 0;
    a+= ++a;
    b+= b++;
    printf("%i,%i\n",a,b);
    return 0;
}

Gives 2,1. At least with my compiler!

Then there's whatever the C Standard says, but out of comp.lang.c and comp.unix.programmer, who reads that?

ralphrmartin · « **Reply #81 on:** September 12, 2018, 06:19:03 pm »

You could do this in Algol68, as in 1968...

glarsson · « **Reply #82 on:** September 12, 2018, 06:21:04 pm »

Quote from: GeorgeOfTheJungle on September 12, 2018, 06:04:00 pm

Gives 2,1. At least with my compiler!

That's stupid.

You might get another result in the next program because the compiler is allowed to generate different code depending on circumstances, e.g. if you add more complex code near the ub-invoking code, the compiler might run out of registers and generate different code.

newbrain · « **Reply #83 on:** September 12, 2018, 06:24:32 pm »

Quote from: GeorgeOfTheJungle on September 12, 2018, 06:04:00 pm

Then there's whatever the C Standard says, but out of comp.lang.c and comp.unix.programmer, who reads that?

Those who want working code.

The very case of a[ i] = a[i++] changed behaviour in a recent(ish) update of gcc, at the same level of optimization.
I still remember my friend complaining loudly that "the new gcc in Ubuntu is broken"...

Edit: [ i ] eaten by ~~dog~~ forum SW

GeorgeOfTheJungle · « **Reply #84 on:** September 12, 2018, 06:26:30 pm »

Quote from: glarsson on September 12, 2018, 06:21:04 pm

Quote from: GeorgeOfTheJungle on September 12, 2018, 06:04:00 pm
Gives 2,1. At least with my compiler!
That's stupid.

$gcc --version
Copyright (C) 2007 Free Software Foundation, Inc.

Go tell them!

newbrain · « **Reply #85 on:** September 12, 2018, 06:28:00 pm »

Quote from: GeorgeOfTheJungle on September 12, 2018, 06:26:30 pm

Quote from: glarsson on September 12, 2018, 06:21:04 pm
Quote from: GeorgeOfTheJungle on September 12, 2018, 06:04:00 pm
Gives 2,1. At least with my compiler!
That's stupid.

$gcc --version
Copyright (C) 2007 Free Software Foundation, Inc.

Go tell them!

Obvious troll is obvious.

glarsson · « **Reply #86 on:** September 12, 2018, 06:41:29 pm »

Quote from: GeorgeOfTheJungle on September 12, 2018, 06:26:30 pm

Go tell them!

No. I didn't say the compiler was stupid. It is allowed to give that result. I meant that it is stupid to try to figure out how a compiler handles undefined behavior. That information is useless as you shouldn't use it/depend on it.

GeorgeOfTheJungle · « **Reply #87 on:** September 12, 2018, 06:52:37 pm »

But the C standard != gcc documentation. Perhaps it's un-undefined somewhere there (in gcc). My gcc 4.2 also lets me use an rvalue as lvalue (the OP code runs fine), and that's not in the standard, so... what happens when UB (by the std) becomes defined somewhere else (e.g. gcc)? Can one use that, or is it a sin?

Edit: Not this case.

$ gcc -O3 -Wsequence-point kk.c
kk.c: In function ‘main’:
kk.c:5: warning: operation on ‘a’ may be undefined
kk.c:6: warning: operation on ‘b’ may be undefined

RoGeorge · « **Reply #88 on:** September 12, 2018, 06:55:13 pm »

Now I noticed that the title says C++, and I was thinking about C all the time. I don't know about C++. My bad, sorry.

Please let me ask again: I always thought that in C (not C++), the postfix ++ is an atomic read-modify-write operation. Being atomic would also imply that the increment is computed on the spot, so the ++ cannot be postponed until the next ";". Example:
x=10;
y=x++ + x++;
will end IMO with x is 12, and y is 21 (NOT 20).

Is this correct for C?
Is this undefined for C++?

newbrain · « **Reply #89 on:** September 12, 2018, 07:04:15 pm »

Quote from: RoGeorge on September 12, 2018, 06:55:13 pm

Now I noticed that the title says C++, and I was thinking about C all the time. I don't know about C++. My bad, sorry.

Please let me ask again: I always thought that in C (not C++), the postfix ++ is an atomic read-modify-write operation. Being atomic would also imply that the increment is computed on the spot, so the ++ cannot be postponed until the next ";". Example:
x=10;
y=x++ + x++;
will end IMO with x is 12, and y is 21 (NOT 20).

Is this correct for C?
Is this undefined for C++?

No, it's not atomic. Its only guarantee is that the side effect (increment) will happen as if it was carried out before the next sequence point.
If the incremented variable is volatile, we have the guarantee that it actually happens before the next sequence point.
(But I should check the fine print).

So, it's never correct, in either C or C++.

BTW, this is really, really, the most common (and asked about) case of UB, and it's clearly spelled out and described in the C standard as such (6.5, J.2).

Maybe GeorgeOfTheJungle is right, and I'm the only one who enjoys reading the standard (and the rationale! that was really a good read!)

Siwastaja · « **Reply #90 on:** September 12, 2018, 07:21:44 pm »

Even if you didn't want to read the standard, these particular Undefined Behavior examples (as well as the aliasing rules) are discussed to death everywhere on the 'net; Google easily gives you answers, rationale, and discussion.

C is an interesting language because it's quite braindead..ish, and the standard more or less sucks; but it's not catastrophic enough to be unusable, and most "elegant" "replacements" end up being more fundamentally problematic (or just don't gain traction); so C remains surprisingly usable decade after decade, for the foreseeable future; and I don't think it's completely due to legacy code. A lot of completely new projects, even from young, new developers, spawn in C all the time. (As a side note, I almost exclusively write in C, as well.)

Same is true with C++, except with 100x more bloat, 100x more uncertainty and 100x more issues - and even less "elegance". Unlike C, where the horror show has stalled to a stable state, in C++, the horrors continue developing to the Next Levels all the time, directed by a very productive committee. Somehow, it still remains both usable and popular as well. This is almost magical, IMHO.

GeorgeOfTheJungle · « **Reply #91 on:** September 12, 2018, 07:39:07 pm »

http://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/Warning-Options.html

Quote from: Free Software Foundation

-Wsequence-point
Warn about code that may have undefined semantics because of violations of sequence point rules in the C standard.
The C standard defines the order in which expressions in a C program are evaluated in terms of sequence points, which represent a partial ordering between the execution of parts of the program: those executed before the sequence point, and those executed after it. These occur after the evaluation of a full expression (one which is not part of a larger expression), after the evaluation of the first operand of a &&, ||, ? : or , (comma) operator, before a function is called (but after the evaluation of its arguments and the expression denoting the called function), and in certain other places. Other than as expressed by the sequence point rules, the order of evaluation of subexpressions of an expression is not specified. All these rules describe only a partial order rather than a total order, since, for example, if two functions are called within one expression with no sequence point between them, the order in which the functions are called is not specified. However, the standards committee have ruled that function calls do not overlap.

It is not specified when between sequence points modifications to the values of objects take effect. Programs whose behavior depends on this have undefined behavior; the C standard specifies that "Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.". If a program breaks these rules, the results on any particular implementation are entirely unpredictable.

Examples of code with undefined behavior are a = a++;, a[n] = b[n++] and a[i++] = i;. Some more complicated cases are not diagnosed by this option, and it may give an occasional false positive result, but in general it has been found fairly effective at detecting this sort of problem in programs.

The present implementation of this option only works for C programs. A future implementation may also work for C++ programs.

The C standard is worded confusingly, therefore there is some debate over the precise meaning of the sequence point rules in subtle cases. Links to discussions of the problem, including proposed formal definitions, may be found on our readings page, at http://gcc.gnu.org/readings.html.

glarsson · « **Reply #92 on:** September 12, 2018, 07:44:31 pm »

C is not braindead unless you have the wrong expectations. It was designed to write operating systems, device drivers etc. It will live on for a long time as it is very hard to design a competing language that is sufficiently better.

In my view C is like a small sports car with manual gearbox, no ABS, no airbags, no traction control etc. If you can handle the challenge you will drive very fast on the twisty roads, but it will not be an SUV with all the latest driver aids. If you need that kind of support you need a different language, e.g. Ada. :-)

GeorgeOfTheJungle · « **Reply #93 on:** September 12, 2018, 07:48:31 pm »

Quote from: newbrain on September 12, 2018, 07:04:15 pm

Maybe GeorgeOfTheJungle is right, and I'm the only one who enjoys reading the standard (and the rationale! that was really a good read!)

I have never read it. In my defence, there was no such thing when I started (1986) programming in C (only the C book by Kernighan). But I've had to read the EcmaScript one more than once, and that's a brick.

glarsson · « **Reply #94 on:** September 12, 2018, 07:55:36 pm »

I have read the C standard. I bought a copy from ANSI. Sometimes I think I'm the only one who has read it...

newbrain · « **Reply #95 on:** September 12, 2018, 08:04:26 pm »

Quote from: glarsson on September 12, 2018, 07:55:36 pm

I have read the C standard. I bought a copy from ANSI. Sometimes I think I'm the only one who has read it...

And that makes two!
Sweden - Rest of the world: 2 - 0

GeorgeOfTheJungle · « **Reply #96 on:** September 12, 2018, 08:17:29 pm »

Wow, this is even cool, look:

Quote

If two settings of the same object cannot be proven to be disjoint in time, the evaluation is considered undefined. Example:
(x = 1) * (x = 2)

(from http://www.open-std.org/jtc1/sc22/wg14/www/docs/n927.htm)

Code: [Select]

#include <stdio.h>

int main () {
    int a, b;
    b= (a = 1) * (a = 2);
    printf("%i,%i\n", a, b);
    return 0;
}

$ gcc -O0 -Wsequence-point kk.c -o kk.out
kk.c: In function ‘main’:
kk.c:5: warning: operation on ‘a’ may be undefined
$ ./kk.out
2,4

With a well chosen bunch of these ~WTFs someone could write a winner for the obfuscated C contest!

ejeffrey · « **Reply #97 on:** September 12, 2018, 08:29:26 pm »

Quote from: GeorgeOfTheJungle on September 12, 2018, 06:52:37 pm

$ gcc -O3 -Wsequence-point kk.c
kk.c: In function ‘main’:
kk.c:5: warning: operation on ‘a’ may be undefined
kk.c:6: warning: operation on ‘b’ may be undefined

Note that you can't rely on this warning. The compiler will try to help if you ask but it can't identify all undefined behavior.

Quote

so... what happens when UB (by the std) becomes defined somewhere else (e.g. gcc)

gcc doesn't define the behavior. When you write code with potentially undefined behavior gcc will pick some concrete implementation that is correct if no undefined behavior actually happens. It makes no promises to do the same thing every time and in every circumstance. That's the devil of undefined behavior: it might work the way you expect most of the time but can change based on unpredictable circumstances.

The fundamental issue here is that there are a lot of machine optimizations that are non-obvious, especially on modern superscalar processors. If you want the best performance you have to give the compiler some freedom to rearrange things without changing the meaning of the code. It's hard to do this without creating some undefined behavior.

The real problem is with aliasing. a = a++ is undefined by the standard but would be easy for the compiler to detect and do something sensible. The problem is when you have a = b++ where 'a' and 'b' are expressions that might or might not alias to the same variable. The C standard says that the compiler has to assume that any two expressions of the same type might be aliases (unless it can prove otherwise) but that only blocks reordering across sequence points. Between sequence points it is your job to make sure that you don't violate the rules.

RoGeorge · « **Reply #98 on:** September 12, 2018, 08:29:38 pm »

I'm not a programmer, so I go to the standards only on very rare occasions, when something does not work how I expected. So far, my rules of thumb kept me safe, especially because I don't do stunts, yet I was so wrong. Thank you all for helping me clarify this.

Volatile or not, it does not work how I supposed.
I just tested now, and to my surprise if a=10, then any of the following

b=a + a++
b=a++ + a
b=a++ + a++

returns b as 21, no matter if a is volatile or not, which is quite a surprise for me. Thanks again!

The first one especially seems straight-up broken. I would have sworn it should return 20, not 21.

newbrain · « **Reply #99 on:** September 12, 2018, 09:42:20 pm »

Quote from: RoGeorge on September 12, 2018, 08:29:38 pm

I'm not a programmer, so I go to the standards only on very rare occasions, when something does not work how I expected. So far, my rules of thumb kept me safe, especially because I don't do stunts, yet I was so wrong. Thank you all for helping me clarify this.

Volatile or not, it does not work how I supposed.
I just tested now, and to my surprise if a=10, then any of the following
b=a + a++
b=a++ + a
b=a++ + a++
returns b as 21, no matter if a is volatile or not, which is quite a surprise for me. Thanks again!

The first one especially seems straight-up broken. I would have sworn it should return 20, not 21.

Sorry for bringing up the volatile qualifier; it is a bit of a red herring in this case.

What I poorly tried to explain is that in x++ the actual increment might happen at any moment, even after the semicolon sequence point, as long as the result of the program is the same (when I check x sometime after this expression, I find it incremented).
If x is volatile, this side effect is guaranteed to happen before the sequence point.
From "inside" the program, in this case, the volatile qualifier makes absolutely no difference.
See also the last paragraph of ejeffrey's post.

I don't see why you would expect the first case to be 20: there is no strictly defined right-to-left or left-to-right evaluation order in C (APL, e.g., is strictly right to left, Forth left to right).
And no, using parentheses to force things is useless.

The most important concepts here are:

Side effects:
Things that happen to your variables, such as an assignment or an increment, and also a read if the object is volatile.
Sequence points:
They represent the anchor points that define the ordering of the side effects.
Semicolons, but also the ? in the conditional operator, the comma operator, && and ||, function calls.

Clause 2 in 5.1.2.3:

Quote

At certain specified points in the execution sequence called sequence points, all side effects
of previous evaluations shall be complete and no side effects of subsequent evaluations
shall have taken place.

Note that between two sequence points the standard does not impose any ordering requirement.

Don't do stunts (such as the ones you posted), and you'll be mostly safe.

