Author Topic: Previously unknown 'C' behavior (to me).  (Read 18576 times)

Offline hamster_nz (Topic starter)

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Previously unknown 'C' behavior (to me).
« on: April 22, 2017, 10:37:17 am »
Had to stomp on an interesting bug today. It seems:

  a = a << (b+c);

is not equivalent to:

  a = (a << b) << c;
 
At least, not with the two compilers I tested. Here's the test case:

Code: [Select]
#include <stdio.h>

int main(int argc, char *argv[])
{
  /* int is 32 bits */
  int a,b;
  int n0 = 24, n1 = 8;

  a = -1;
  a = a << (n0+n1);

  b = -1;
  b = (b << n0) << n1;

  printf("%08X should equal %08X\n", a, b);
}

And with GCC I get different answers for the same code, depending on whether optimizations are turned on:

Code: [Select]
$ gcc -o check check.c -Wall -pedantic
$ ./check
FFFFFFFF should equal 00000000
$ gcc -o check check.c -Wall -pedantic -O4
$ ./check
00000000 should equal 00000000
$
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19829
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Previously unknown 'C' behavior (to me).
« Reply #1 on: April 22, 2017, 11:11:03 am »
There are many many many such "curious cases" in C/C++, often related to overflow/underflow and shifting. You really have to be not far off a language lawyer to predict them all, and then to be able to do accurate "numerical analysis" to verify they cannot occur in each location you use them. Both those skills are demonstrably in short supply, and they are unlikely to occur in someone who knows the problem domain and is trying to use C/C++ as a tool to solve their problem.

Some consider that if you don't know the many C/C++ "gotchas", then it is your fault for not understanding your tool. Others consider that it indicates the tool itself has problems.

One way to minimise (not completely avoid) such surprises is to use a very constrained subset, e.g. MISRA-C. Another way is to choose a different tool which doesn't have the problems.

For more amusing C/C++ topics, see http://yosefk.com/c++fqa/
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline Jeroen3

  • Super Contributor
  • ***
  • Posts: 4081
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: Previously unknown 'C' behavior (to me).
« Reply #2 on: April 22, 2017, 11:17:44 am »
You're shifting a 31 bit value with 32. It should warn you about that.
 

Offline hamster_nz (Topic starter)

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Previously unknown 'C' behavior (to me).
« Reply #3 on: April 22, 2017, 11:31:48 am »
In this case it is exposing the Intel CPU's behavior - when shifting 32-bit types only the lowest 5 bits of the shift count are used (except on the 8086).

The case where it turned up was a bit obscure, but not _that_ obscure - I needed to quickly generate a varying length bit mask that covers between 1 and 32 bits, depending on phase alignment of a signal.  Most interesting is how the behavior changes depending on optimizer settings. Just glad I found it early....
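
For the mask use case described above, one way to stay clear of a full-width shift is to shift an all-ones value right instead of shifting a value left. This is only a minimal sketch, assuming a 32-bit mask and a bit count between 1 and 32; the name low_bits_mask is made up for illustration and does not come from the original code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns a mask with the lowest n bits set, n in 1..32.
 * The shift count is (32 - n), which always lies in 0..31, so the
 * n == 32 case never shifts by the full width of the type. */
uint32_t low_bits_mask(unsigned n)
{
    assert(n >= 1u && n <= 32u);   /* range stated in the post: 1 to 32 bits */
    return UINT32_MAX >> (32u - n);
}
```

With n = 32 the shift count is 0, so the result is 0xFFFFFFFF without ever shifting by 32.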
 
The following users thanked this post: Brutte

Offline hamster_nz (Topic starter)

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Previously unknown 'C' behavior (to me).
« Reply #4 on: April 22, 2017, 11:50:21 am »
Quote
You're shifting a 31 bit value with 32. It should warn you about that.

If the shift value is a constant greater than thirty-one (e.g. "z = x<<32;") then GCC will give a warning.

If it is a variable with a constant value it won't warn:
Code: [Select]
$ cat check.c
#include <stdio.h>

int main(int argc, char *argv[])
{
  int a;
  int n0 = 32;
  a = 1;
  a = a << n0;
  printf("%08X\n",a);
}
$ gcc -o check check.c -Wall -pedantic -O4
$

... unless the variable is 'const':

Code: [Select]

$ cat check.c
#include <stdio.h>

int main(int argc, char *argv[])
{
  int a;
  const int n0 = 32;
  a = 1;
  a = a << n0;
  printf("%08X\n",a);
}
$ gcc -o check check.c -Wall -pedantic -O4
check.c: In function ‘main’:
check.c:8:9: warning: left shift count >= width of type
   a = a << n0;
         ^
$

Well, you learn something every day :-)
 

Offline mib

  • Contributor
  • Posts: 14
Re: Previously unknown 'C' behavior (to me).
« Reply #5 on: April 22, 2017, 11:55:44 am »
Shifting by the bit size of the operand or more is undefined. Left shift of a negative number is undefined. You're doing both, so it shouldn't be a surprise that you get odd results.

Never shift signed integers unless you know exactly what you're doing and always explicitly limit calculated shifts to the bit size of the operand. Otherwise 'weird shit' will happen.
 
The following users thanked this post: Rasz, Frank, Richard Crowley, newbrain, JPortici

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4108
  • Country: nz
Re: Previously unknown 'C' behavior (to me).
« Reply #6 on: April 22, 2017, 02:29:43 pm »
Quote
Shifting by the bit size of the operand or more is undefined.

Yes.

Some CPUs will give you the least significant bits of the infinitely precise result. Others (most now) will only use the lower bits of the shift count.

On a 32 bit machine, a<<b is likely to be a<<(b%32). If you write it like that yourself then you can be absolutely sure what answer you will get, and on modern x86 and ARM the % will be optimized away.

So:

a << 32 -> a << (32%32)  ->  a << 0  ->  a

(a << 24) << 8   ->  (a << (24%32)) << (8%32)  -> 0

Quote
Left shift of a negative number is undefined.

True in theory, but only because you just might be on a sign/magnitude machine. At least on unoptimized code. If you run the optimizer and the compiler proves that you're shifting a negative number then it can go "oh ho ho! Undefined behaviour! Nasal demons for you...".

Best to only shift variables declared as unsigned, and include the % yourself, just to be sure what you'll get.
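
A minimal sketch of that advice, assuming int is 32 bits as in the original test case (so uint32_t is not promoted to a signed type). The name shl_mod32 is hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: left shift on an unsigned value with the count
 * reduced modulo 32 explicitly, so the result no longer depends on what
 * the CPU happens to do with an out-of-range count.
 * Assumes int is 32 bits wide. */
uint32_t shl_mod32(uint32_t value, unsigned count)
{
    return value << (count % 32u);
}
```

Note that this pins down the "only the low 5 bits of the count are used" behaviour: shl_mod32(x, 32) returns x unchanged, while two shifts by 24 and then 8 return 0.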
 
The following users thanked this post: GeorgeOfTheJungle

Offline Jope

  • Regular Contributor
  • *
  • Posts: 110
  • Country: de
Re: Previously unknown 'C' behavior (to me).
« Reply #7 on: April 22, 2017, 02:59:34 pm »
When in doubt, whip it out. That is, the C standard. You can download it here: C99.

Page 85:
"The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with
zeros. If E1 has an unsigned type, the value of the result is E1 x 2^E2, reduced modulo
one more than the maximum value representable in the result type. If E1 has a signed
type and nonnegative value, and E1 x 2^E2 is representable in the result type, then that is
the resulting value; otherwise, the behavior is undefined."
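
The quoted rule for signed operands can be turned into a predicate. The sketch below is one possible reading, not text from the standard: the function name is hypothetical, the width calculation assumes int has no padding bits, and the count check reflects the sentence that precedes the quoted passage in the same section (a negative count, or one greater than or equal to the width of the promoted left operand, is also undefined):

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if "value << count" has defined behaviour
 * for int operands under the C99 wording quoted above. */
bool signed_shl_is_defined(int value, int count)
{
    /* Assumption: int has no padding bits, so this is its width. */
    const int width = (int)(sizeof(int) * CHAR_BIT);

    if (count < 0 || count >= width)
        return false;                 /* shift count out of range */
    if (value < 0)
        return false;                 /* E1 must be nonnegative */
    /* E1 * 2^E2 must be representable: value <= INT_MAX / 2^count. */
    return value <= (INT_MAX >> count);
}
```

Applied to the original test case, both the negative operand (-1) and the count of 32 fail the check on a machine with 32-bit int.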

 

Offline Kalvin

  • Super Contributor
  • ***
  • Posts: 2145
  • Country: fi
  • Embedded SW/HW.
Re: Previously unknown 'C' behavior (to me).
« Reply #8 on: April 22, 2017, 03:09:50 pm »
Quote
Never shift signed integers unless you know exactly what you're doing and always explicitly limit calculated shifts to the bit size of the operand. Otherwise 'weird shit' will happen.

Oops, you missed the "f" ;)
 
The following users thanked this post: hamster_nz, Ian.M

Online Kjelt

  • Super Contributor
  • ***
  • Posts: 6489
  • Country: nl
Re: Previously unknown 'C' behavior (to me).
« Reply #9 on: April 22, 2017, 03:24:08 pm »
I always try to avoid multiple operations on a single line of code.
If not for readability, then for these kinds of cases.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19829
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Previously unknown 'C' behavior (to me).
« Reply #10 on: April 22, 2017, 03:47:00 pm »
Quote
If you run the optimizer and the compiler proves that you're shifting a negative number then it can go "oh ho ho! Undefined behaviour! Nasal demons for you...".

If only that was the worst it could do! If unlucky, then your machine could appear to be "brickerbotted" and/or your company be destroyed :)

Before touching a keyboard to invoke a C/C++ compiler, best to read, learn and inwardly digest - and then apply - things like https://www.schneier.com/blog/archives/2017/04/new_c_secure_co.html
 

Offline Kalvin

  • Super Contributor
  • ***
  • Posts: 2145
  • Country: fi
  • Embedded SW/HW.
Re: Previously unknown 'C' behavior (to me).
« Reply #11 on: April 22, 2017, 04:24:01 pm »
As the C/C++ languages are broken and unsafe by design (meaning they do not fail on invalid operation, but instead continue execution with the invalid result) you should always perform the checks by yourself:

Code: [Select]
const int n_left_shifts = number_of_left_shifts_you_want;
const int max_allowed_left_shifts = 31;
if (0 <= n_left_shifts && n_left_shifts <= max_allowed_left_shifts)
{
  a = a << n_left_shifts;
}
else
{
  /* put your error handling code here */
}
 

Offline rsjsouza

  • Super Contributor
  • ***
  • Posts: 6010
  • Country: us
  • Eternally curious
    • Vbe - vídeo blog eletrônico
Re: Previously unknown 'C' behavior (to me).
« Reply #12 on: April 23, 2017, 01:41:50 pm »
Quote
Never shift signed integers unless you know exactly what you're doing and always explicitly limit calculated shifts to the bit size of the operand. Otherwise 'weird shit' will happen.

That reminded me how certain architectures were designed to deal with this. I am familiar with the C5000 DSPs from TI, where you can explicitly set a configuration bit that selects either zero fill or sign extension. Pretty handy for fast calculations, but certainly a pitfall if you migrate the code to other architectures.
Vbe - vídeo blog eletrônico http://videos.vbeletronico.com

Oh, the "whys" of the datasheets... The information is there not to be an axiomatic truth, but instead each speck of data must be slowly inhaled while carefully performing a deep search inside oneself to find the true metaphysical sense...
 

Offline Jeroen3

  • Super Contributor
  • ***
  • Posts: 4081
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: Previously unknown 'C' behavior (to me).
« Reply #13 on: April 23, 2017, 01:51:34 pm »
It is generally better for code cleanliness if you multiply and divide signed numbers. Some platforms do, however, have signed shifts.
If you use literals, the compiler might choose these signed shifts for you.
But if you're very tight on performance, you should write a platform-specific assembly routine, with pseudocode documentation, that performs your signed shift.
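
A small sketch of the multiply/divide approach, with hypothetical helper names. Multiplication is defined for negative operands as long as the product fits in int, and since C99 division truncates toward zero, whereas right-shifting a negative value is implementation-defined:

```c
/* Hypothetical helper: scale up by 8 without writing x << 3.
 * Defined for negative x, provided x * 8 fits in int. */
int times_eight(int x)
{
    return x * 8;
}

/* Hypothetical helper: scale down by 8 without writing x >> 3.
 * C99 and later truncate toward zero, so -17 / 8 is -2. */
int div_eight(int x)
{
    return x / 8;
}
```

Compilers will typically turn multiplication and division by a constant power of two into shift instructions where the target has suitable ones, so this form usually costs little.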

Quote
As the C/C++ languages are broken and unsafe by design

Well, you're both right and wrong. You should know that C is one level above machine code; C++ is somewhat higher, but you can still get uncomfortably close to the machine code.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3169
  • Country: ca
Re: Previously unknown 'C' behavior (to me).
« Reply #14 on: April 23, 2017, 02:30:48 pm »
Quote
As the C/C++ languages are broken and unsafe by design (meaning they do not fail on invalid operation, but instead continue execution with the invalid result)

This is what makes them more efficient. There is nothing broken or unsafe about it.

There's a popular misconception that bugs need to be detected by compilers and tools rather than by the programmer. This has produced lots of bloat, but the software doesn't appear to be any less buggy.
 
The following users thanked this post: janoc

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19829
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Previously unknown 'C' behavior (to me).
« Reply #15 on: April 23, 2017, 02:42:38 pm »
Quote
As the C/C++ languages are broken and unsafe by design (meaning they do not fail on invalid operation, but instead continue execution with the invalid result)
Quote
This is what makes them more efficient. There is nothing broken or unsafe about it.

No, it doesn't. It enables them to give incorrect results.

Now, if I am allowed to write a program that is allowed to give incorrect results, then I can make it much faster and "more efficient" than any C compiler. (And that's true for any program you might like to define!)
« Last Edit: April 23, 2017, 02:44:55 pm by tggzzz »
 

Offline radar_macgyver

  • Frequent Contributor
  • **
  • Posts: 700
  • Country: us
Re: Previously unknown 'C' behavior (to me).
« Reply #16 on: April 23, 2017, 02:45:28 pm »
An interesting tool you can use to find out why such 'odd behaviors' occur is Compiler Explorer at godbolt.org

hamster_nz's original example compiles to the actual shifts when using -O0 on gcc 6.3. When enabling any optimization, it gets reduced down to a single "mov eax, 0".

https://godbolt.org/g/hzTnfV
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4108
  • Country: nz
Re: Previously unknown 'C' behavior (to me).
« Reply #17 on: April 23, 2017, 02:56:08 pm »
Quote
Well, you're both right and wrong. You should know that C is one level above machine code; C++ is somewhat higher, but you can still get uncomfortably close to the machine code.

That's not quite right.

C is one level above a kind of least common denominator of a large set of machine codes.

There will be many things that are perfectly well defined in any particular machine code, but if you translate a machine code program naively to C then you can easily hit undefined behaviour that causes (or at least permits) the compiler to generate something quite different. Such as simply replacing your code by exit(0).

At least that's true with signed variables, as we see here.

Operations on unsigned, on the other hand, are much more rigidly defined. They happen to map efficiently to most modern CPUs, but other CPUs may be compelled to do something quite inefficient in order to get the results defined by C.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3169
  • Country: ca
Re: Previously unknown 'C' behavior (to me).
« Reply #18 on: April 23, 2017, 03:14:15 pm »
Quote
No, it doesn't. It enables them to give incorrect results.

This is not an incorrect result. It is an undefined result. On x86, the shift can be done in one assembler instruction. If you want to check the argument first, you need 2 more instructions. So the code becomes three times longer (actually more than 3 times) and 3 times slower.

Shifting a 32-bit integer by more than 31 bits is not a useful operation, so hopefully the programmer will not do this. If the compiler were forced to produce some defined result for it then, depending on the platform, extra checks would have to be employed to produce the desired result, making the code bigger and slower. This would "punish" good programmers as well as bad ones. Hence the undefined result for an unlikely situation. The undefined result harms only a bad programmer who didn't take the time to learn the standard. If the C standard required some sort of deterministic response, then all programmers would get the inefficient code.
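
To make the cost under discussion concrete, here is a sketch of a shift that has a defined result for every count, assuming the wanted behaviour is "all bits shifted out gives 0" and that int is 32 bits. The name shl_total is hypothetical, and the extra comparison is exactly the kind of check the paragraph above refers to:

```c
#include <stdint.h>

/* Hypothetical helper: left shift with a defined result for any count.
 * Counts of 32 or more return 0, matching the mathematical result
 * reduced modulo 2^32. Assumes int is 32 bits wide. */
uint32_t shl_total(uint32_t value, unsigned count)
{
    return (count < 32u) ? (value << count) : 0u;
}
```

With this helper, shifting by n0+n1 in one step and shifting by n0 and then n1 give the same answer, which is what the original test case expected.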

edit: fixed the unefficient typo and re-phrased the last sentence.
« Last Edit: April 23, 2017, 04:31:23 pm by NorthGuy »
 
The following users thanked this post: janoc

Offline rsjsouza

  • Super Contributor
  • ***
  • Posts: 6010
  • Country: us
  • Eternally curious
    • Vbe - vídeo blog eletrônico
Re: Previously unknown 'C' behavior (to me).
« Reply #19 on: April 23, 2017, 03:48:39 pm »
Quote
Well, you're both right and wrong. You should know that C is one level above machine code; C++ is somewhat higher, but you can still get uncomfortably close to the machine code.
Quote
That's not quite right.

C is one level above a kind of least common denominator of a large set of machine codes.

There will be many things that are perfectly well defined in any particular machine code, but if you translate a machine code program naively to C then you can easily hit undefined behaviour that causes (or at least permits) the compiler to generate something quite different. Such as simply replacing your code by exit(0).

Quite right. Another example of a "level above machine code" was something on the C5000 DSPs called "algebraic assembly", which allowed mathematical and logic operations to be written "in English" but using registers, symbols and addresses directly. This was a level below C programming, but it never took off across other devices and platforms (IIRC Analog Devices also had a similar thing).
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19829
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Previously unknown 'C' behavior (to me).
« Reply #20 on: April 23, 2017, 03:58:22 pm »
Quote
No, it doesn't. It enables them to give incorrect results.
Quote
This is not an incorrect result. It is an undefined result. On x86, the shift can be done in one assembler instruction. If you want to check the argument first, you need 2 more instructions. So the code becomes three times longer (actually more than 3 times) and 3 times slower.

Shifting a 32-bit integer by more than 31 bits is not a useful operation, so hopefully the programmer will not do this. If the compiler were forced to produce some defined result for it then, depending on the platform, extra checks would have to be employed to produce the desired result, making the code bigger and slower. This would "punish" good programmers as well as bad ones. Hence the undefined result for an unlikely situation. The undefined result harms only a bad programmer who didn't take the time to learn the standard. Otherwise all programmers would get the efficient code.

Your argument about "punishing programmers" is bizarre. If anybody is "punished" it is the users of such malfunctioning programs!

The points I made in my earlier response https://www.eevblog.com/forum/microcontrollers/previously-unknown-'c'-behavior-(to-me)/msg1191462/#msg1191462 are relevant to your response, viz:
Quote
There are many many many such "curious cases" in C/C++, often related to overflow/underflow and shifting. You really have to be not far off a language lawyer to predict them all, and then to be able to do accurate "numerical analysis" to verify they cannot occur in each location you use them. Both those skills are demonstrably in short supply, and they are unlikely to occur in someone who knows the problem domain and is trying to use C/C++ as a tool to solve their problem.

Some consider that if you don't know the many C/C++ "gotchas", then it is your fault for not understanding your tool. Others consider that it indicates the tool itself has problems.

One way to minimise (not completely avoid) such surprises is to use a very constrained subset, e.g. MISRA-C. Another way is to choose a different tool which doesn't have the problems.

For more amusing C/C++ topics, see http://yosefk.com/c++fqa/
« Last Edit: April 23, 2017, 04:01:49 pm by tggzzz »
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 27232
  • Country: nl
    • NCT Developments
Re: Previously unknown 'C' behavior (to me).
« Reply #21 on: April 23, 2017, 05:17:16 pm »
This discussion is rather moot. At some point somebody is going to tell us we all need this in order not to hurt ourselves while eating:
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3169
  • Country: ca
Re: Previously unknown 'C' behavior (to me).
« Reply #22 on: April 23, 2017, 06:05:57 pm »
Quote
At some point somebody is going to tell us we all need this in order not to hurt ourselves while eating:


 
The following users thanked this post: janoc, nctnico

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19829
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Previously unknown 'C' behavior (to me).
« Reply #23 on: April 23, 2017, 06:16:56 pm »
Quote
This discussion is rather moot. At some point somebody is going to tell us we all need this in order not to hurt ourselves while eating:

The point is not whether a programmer hurts themselves with their tool, it is whether they hurt other people, e.g. the people that use the program directly or indirectly. And there are many many many examples of people being "disadvantaged" by programs written in C/C++ due to any of the many many many reasons they can cause nasal daemons to appear at the least convenient moment.

I first used C professionally in 1981, for a hard realtime system. I think I have a better appreciation of its strengths and weaknesses than many young programmers, and know when it is and isn't the most appropriate tool. Claims of "better efficiency" (whatever that might mean) are pretty naive from several angles.

If you want to consider one of the more surprising ways of improving C's performance on benchmarks on machine X, don't run optimised output, but do run non-optimised postprocessed output on an emulation of machine X. Yup, emulation can make it faster :) See HP's experimental Dynamo compiler for the numerical results.
« Last Edit: April 23, 2017, 06:25:38 pm by tggzzz »
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 27232
  • Country: nl
    • NCT Developments
Re: Previously unknown 'C' behavior (to me).
« Reply #24 on: April 23, 2017, 07:07:44 pm »
Quote
This discussion is rather moot. At some point somebody is going to tell us we all need this in order not to hurt ourselves while eating:
Quote
The point is not whether a programmer hurts themselves with their tool, it is whether they hurt other people, e.g. the people that use the program directly or indirectly. And there are many many many examples of people being "disadvantaged" by programs written in C/C++ due to any of the many many many reasons they can cause nasal daemons to appear at the least convenient moment.

If you want to measure by that metric then Delphi and Java are far, far worse because they claim to make programming easy. Anyway, I don't think anyone is claiming C and/or C++ are the best languages around, but for some applications there simply isn't a good alternative. In the case of C++ you can use the Boost and STL libraries, which offer solutions for commonly used constructs ("design patterns" seems to be the phrase du jour), so the chances of screwing things up badly are greatly reduced.
 

