Author Topic: Previously unknown 'C' behavior (to me).

hamster_nz · « **on:** April 22, 2017, 10:37:17 am »

Had to stomp on an interesting bug today. It seems:

a = a << (b+c);

is not equivalent to :

a = (a << b) << c;

Or at least for two compilers I tested. Here's the test case:

Code: [Select]

#include <stdio.h>

int main(int argc, char *argv[])
{
  /* int is 32 bits */
  int a,b;
  int n0 = 24, n1 = 8;

  a = -1;
  a = a << (n0+n1);

  b = -1;
  b = (b << n0) << n1;

  printf("%08X should equal %08X\n", a, b);
}

And with GCC, I get different answers for the same code depending on whether I turn optimizations on:

Code: [Select]

$ gcc -o check check.c -Wall -pedantic
$ ./check
FFFFFFFF should equal 00000000
$ gcc -o check check.c -Wall -pedantic -O4
$ ./check
00000000 should equal 00000000
$

tggzzz · « **Reply #1 on:** April 22, 2017, 11:11:03 am »

There are many many many such "curious cases" in C/C++, often related to overflow/underflow and shifting. You really have to be not far off a language lawyer to predict them all, and then to be able to do accurate "numerical analysis" to verify they cannot occur in each location you use them. Both those skills are demonstrably in short supply, and they are unlikely to occur in someone who knows the problem domain and is trying to use C/C++ as a tool to solve their problem.

Some consider that if you don't know the many C/C++ "gotchas", then it is your fault for not understanding your tool. Others consider that it indicates the tool itself has problems.

One way to minimise (not completely avoid) such surprises is to use a very constrained subset, e.g. MISRA-C. Another way is to choose a different tool which doesn't have the problems.

For more amusing C/C++ topics, see http://yosefk.com/c++fqa/

Jeroen3 · « **Reply #2 on:** April 22, 2017, 11:17:44 am »

You're shifting a 31 bit value with 32. It should warn you about that.

hamster_nz · « **Reply #3 on:** April 22, 2017, 11:31:48 am »

In this case it is exposing the Intel CPU's behavior - when shifting 32-bit types only the lowest 5 bits of the shift count are used (except on the 8086).

The case where it turned up was a bit obscure, but not _that_ obscure - I needed to quickly generate a varying length bit mask that covers between 1 and 32 bits, depending on phase alignment of a signal. Most interesting is how the behavior changes depending on optimizer settings. Just glad I found it early....
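For what it's worth, a mask covering 1 to 32 bits can be built without the shift count ever reaching 32. This is only a minimal sketch, not the code from the original project: the name low_mask is made up for illustration, and it assumes the mask is wanted in the low bits of a 32-bit unsigned value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns a mask with the low `nbits` bits set,
 * for nbits in 1..32. Shifting RIGHT by (32 - nbits) keeps the shift
 * count in 0..31, so it never reaches the width of the type. */
uint32_t low_mask(unsigned nbits)
{
    assert(nbits >= 1 && nbits <= 32);   /* caller's contract */
    return UINT32_MAX >> (32u - nbits);
}
```

The usual left-shift form, (1u << nbits) - 1, hits the undefined case exactly at nbits == 32, which is the case that matters here.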

hamster_nz · « **Reply #4 on:** April 22, 2017, 11:50:21 am »

Quote from: Jeroen3 on April 22, 2017, 11:17:44 am

You're shifting a 31 bit value with 32. It should warn you about that.

If the shift value is a constant value greater than thirty one (e.g. "z = x<<32;") then GCC will give a warning.

If it is a variable with a constant value it won't warn:

Code: [Select]

$ cat check.c
#include <stdio.h>

int main(int argc, char *argv[])
{
  int a;
  int n0 = 32;
  a = 1;
  a = a << n0;
  printf("%08X\n",a);
}
$ gcc -o check check.c -Wall -pedantic -O4
$

... unless the variable is 'const':

Code: [Select]


$ cat check.c
#include <stdio.h>

int main(int argc, char *argv[])
{
  int a;
  const int n0 = 32;
  a = 1;
  a = a << n0;
  printf("%08X\n",a);
}
$ gcc -o check check.c -Wall -pedantic -O4
check.c: In function ‘main’:
check.c:8:9: warning: left shift count >= width of type
   a = a << n0;
         ^
$

Well, you learn something every day :-)

mib · « **Reply #5 on:** April 22, 2017, 11:55:44 am »

Shifting by more than the bit size of the operand is undefined. Left shift of a negative number is undefined. You're doing both, so shouldn't be a surprise that you get odd results.

Never shift signed integers unless you know exactly what you're doing and always explicitly limit calculated shifts to the bit size of the operand. Otherwise 'weird shit' will happen.

brucehoult · « **Reply #6 on:** April 22, 2017, 02:29:43 pm »

Quote from: mib on April 22, 2017, 11:55:44 am

Shifting by more than the bit size of the operand is undefined.

Yes.

Some CPUs will give you the least significant bits of the infinitely precise result. Others (most now) will only use the lower bits of the shift count.

On a 32 bit machine, a<<b is likely to be a<<(b%32). If you write it like that yourself then you can be absolutely sure what answer you will get, and on modern x86 and ARM the % will be optimized away.

So:

a << 32 -> a << (32%32) -> a << 0 -> a

(a << 24) << 8 -> (a << (24%32)) << (8%32) -> 0

Quote

Left shift of a negative number is undefined.

True in theory, but only because you just might be on a sign/magnitude machine. At least on unoptimized code. If you run the optimizer and the compiler proves that you're shifting a negative number then it can go "oh ho ho! Undefined behaviour! Nasal demons for you...".

Best to only shift variables declared as unsigned, and include the % yourself, just to be sure what you'll get.
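As a rough sketch of that advice (the function names are invented for illustration, and it assumes int is no wider than 32 bits, as in the test case at the top of the thread), here are the two behaviours written out explicitly so the result no longer depends on the CPU or the optimizer:

```c
#include <stdint.h>

/* Count reduced modulo 32, spelled out: mimics CPUs that only look at
 * the low 5 bits of the shift count. */
uint32_t shl_mod32(uint32_t a, unsigned n)
{
    return (uint32_t)(a << (n % 32u));
}

/* Low 32 bits of the "infinitely precise" result: any count of 32 or
 * more shifts every bit out, so the answer is 0. */
uint32_t shl_wide(uint32_t a, unsigned n)
{
    return (n < 32u) ? (uint32_t)(a << n) : 0u;
}
```

With these, shl_mod32(x, 32) returns x unchanged and shl_wide(x, 32) returns 0, so you pick the semantics you actually want instead of getting whichever one the hardware happens to have.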

Jope · « **Reply #7 on:** April 22, 2017, 02:59:34 pm »

When in doubt, whip it out. That is, the C standard. You can download it here: C99.

Page 85:
"The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with
zeros. If E1 has an unsigned type, the value of the result is E1 x 2^E2, reduced modulo
one more than the maximum value representable in the result type. If E1 has a signed
type and nonnegative value, and E1 x 2^E2 is representable in the result type, then that is
the resulting value; otherwise, the behavior is undefined."
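
The quoted rule for signed operands can be turned into a check. This is only a sketch: shl_is_defined is a made-up name, it assumes int has no padding bits, and the limit on the shift count comes from a neighbouring paragraph of the same section of the standard (count negative or >= width is undefined), not from the text quoted above.

```c
#include <limits.h>

/* Hypothetical predicate: returns 1 if `e1 << e2` has defined behaviour
 * for int operands under the C99 rules, 0 otherwise. */
int shl_is_defined(int e1, int e2)
{
    const int width = (int)(sizeof(int) * CHAR_BIT); /* assumes no padding bits */

    if (e2 < 0 || e2 >= width)
        return 0;               /* shift count out of range */
    if (e1 < 0)
        return 0;               /* negative left operand */
    /* e1 * 2^e2 must be representable, i.e. e1 <= INT_MAX / 2^e2 */
    return e1 <= (INT_MAX >> e2);
}
```

Note that by this rule even 1 << 31 is undefined for a 32-bit int, since 2^31 is not representable.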

Kalvin · « **Reply #8 on:** April 22, 2017, 03:09:50 pm »

Quote from: mib on April 22, 2017, 11:55:44 am

Never shift signed integers unless you know exactly what you're doing and always explicitly limit calculated shifts to the bit size of the operand. Otherwise 'weird shit' will happen.

Oops, you missed the "f"

Kjelt · « **Reply #9 on:** April 22, 2017, 03:24:08 pm »

I always try to prevent multiple operations on a single line of code.
If not for readability, then for these kinds of cases.

tggzzz · « **Reply #10 on:** April 22, 2017, 03:47:00 pm »

Quote from: brucehoult on April 22, 2017, 02:29:43 pm

If you run the optimizer and the compiler proves that you're shifting a negative number then it can go "oh ho ho! Undefined behaviour! Nasal demons for you...".

If only that was the worst it could do! If unlucky, then your machine could appear to be "brickerbotted" and/or your company be destroyed

Before touching a keyboard to invoke a C/C++ compiler, best to read, learn and inwardly digest - and then apply - things like https://www.schneier.com/blog/archives/2017/04/new_c_secure_co.html

Kalvin · « **Reply #11 on:** April 22, 2017, 04:24:01 pm »

As the C/C++ languages are broken and unsafe by design (meaning they do not fail on invalid operation, but instead continue execution with the invalid result) you should always perform the checks by yourself:

Code: [Select]

const int n_left_shifts = number_of_left_shifts_you_want;
const int max_allowed_left_shifts = 31;
if (0 <= n_left_shifts && n_left_shifts <= max_allowed_left_shifts)
{
  a = a << n_left_shifts;
}
else
{
  /* put your error handling code here */
}

rsjsouza · « **Reply #12 on:** April 23, 2017, 01:41:50 pm »

Quote from: mib on April 22, 2017, 11:55:44 am

Never shift signed integers unless you know exactly what you're doing and always explicitly limit calculated shifts to the bit size of the operand. Otherwise 'weird shit' will happen.

That reminded me how certain architectures deal with this by design - I am familiar with C5000 DSPs from TI, where you can explicitly set a configuration bit that selects either zero fill or sign extension. Pretty handy for fast calculations, but certainly a pitfall if you migrate the code to other architectures.

Jeroen3 · « **Reply #13 on:** April 23, 2017, 01:51:34 pm »

It is generally better for code cleanliness if you multiply and divide signed numbers. Some platforms do however have signed shifts.
If you use literals, the compiler might choose these signed shifts for you.
But, if you're very tight on performance you should write a platform-specific assembly routine, with pseudo code documentation, that performs your signed shift.

Quote

As the C/C++ languages are broken and unsafe by design

Well you're both right and wrong, you should know that C is one level above machine code, C++ is somewhat higher, but you can still get uncomfortably close to the machine code.

NorthGuy · « **Reply #14 on:** April 23, 2017, 02:30:48 pm »

Quote from: Kalvin on April 22, 2017, 04:24:01 pm

As the C/C++ languages are broken and unsafe by design (meaning they do not fail on invalid operation, but instead continue execution with the invalid result)

This is what makes them more efficient. There is nothing broken or unsafe about it.

There's a popular misconception that bugs need to be detected by compilers and tools, not by the programmer. This has produced lots of bloat, but the software doesn't appear to be less buggy.

tggzzz · « **Reply #15 on:** April 23, 2017, 02:42:38 pm »

Quote from: NorthGuy on April 23, 2017, 02:30:48 pm

Quote from: Kalvin on April 22, 2017, 04:24:01 pm
As the C/C++ languages are broken and unsafe by design (meaning they do not fail on invalid operation, but instead continue execution with the invalid result)

This is what makes them more efficient. There is nothing broken or unsafe about it.

No, it doesn't. It enables them to give incorrect results.

Now, if I am allowed to write a program that is allowed to give incorrect results, then I can make it much faster and "more efficient" than any C compiler. (And that's true for any program you might like to define!)

radar_macgyver · « **Reply #16 on:** April 23, 2017, 02:45:28 pm »

An interesting tool you can use to find out why such 'odd behaviors' occur is Compiler Explorer at godbolt.org

hamster_nz's original example compiles to the actual shifts when using -O0 on gcc 6.3. When enabling any optimization, it gets reduced down to a single "mov eax, 0".

https://godbolt.org/g/hzTnfV

brucehoult · « **Reply #17 on:** April 23, 2017, 02:56:08 pm »

Quote from: Jeroen3 on April 23, 2017, 01:51:34 pm

Well you're both right and wrong, you should know that C is one level above machine code, C++ is somewhat higher, but you can still get uncomfortably close to the machine code.

That's not quite right.

C is one level above a kind of least common denominator of a large set of machine codes.

There will be many things that are perfectly well defined in any particular machine code, but if you translate a machine code program naively to C then you can easily hit undefined behaviour that causes (or at least permits) the compiler to generate something quite different. Such as simply replacing your code by exit(0).

At least that's true with signed variables, as we see here.

Operations on unsigned, on the other hand, are much more rigidly defined. They happen to map efficiently to most modern CPUs, but other CPUs may be compelled to do something quite inefficient in order to get the results defined by C.

NorthGuy · « **Reply #18 on:** April 23, 2017, 03:14:15 pm »

Quote from: tggzzz on April 23, 2017, 02:42:38 pm

No, it doesn't. It enables them to give incorrect results.

This is not an incorrect result. This is an undefined result. On x86, the shift can be done in one assembler instruction. If you want to check the argument first, you need 2 more instructions. So, the code becomes three times longer (actually more than 3 times) and 3 times slower.

Shifting a 32-bit integer by more than 31 bits is not a useful operation, so hopefully the programmer will not do this. If the compiler were forced to produce some defined result for it, then, depending on the platform, extra checks would have to be employed to produce the desired result, making the code bigger and slower. This would "punish" good programmers as well as bad ones. Hence the undefined result for an unlikely situation. The undefined result harms only a bad programmer who didn't take time to learn the standard. If the C standard required some sort of deterministic response, then all programmers would get the inefficient code.

edit: fixed the unefficient typo and re-phrased the last sentence.

rsjsouza · « **Reply #19 on:** April 23, 2017, 03:48:39 pm »

Quote from: brucehoult on April 23, 2017, 02:56:08 pm

Quote from: Jeroen3 on April 23, 2017, 01:51:34 pm
Well you're both right and wrong, you should know that C is one level above machine code, C++ is somewhat higher, but you can still get uncomfortably close to the machine code.

That's not quite right.

C is one level above a kind of least common denominator of a large set of machine codes.

There will be many things that are perfectly well defined in any particular machine code, but if you translate a machine code program naively to C then you can easily hit undefined behaviour that causes (or at least permits) the compiler to generate something quite different. Such as simply replacing your code by exit(0).

Quite right. Another example of a "level above machine code" was something on the C5000 DSPs that was called "algebraic assembly", which allowed mathematical and logic operations to be written "in english" but using registers, symbols and addresses directly. This was a level below C programming, but it never took off across other devices and platforms (IIRC Analog Devices also had a similar thing).

tggzzz · « **Reply #20 on:** April 23, 2017, 03:58:22 pm »

Quote from: NorthGuy on April 23, 2017, 03:14:15 pm

Quote from: tggzzz on April 23, 2017, 02:42:38 pm
No, it doesn't. It enables them to give incorrect results.

This is not an incorrect result. This is an undefined result. On x86, the shift can be done in one assembler instruction. If you want to check the argument first, you need 2 more instructions. So, the code becomes three times longer (actually more than 3 times) and 3 times slower.

Shifting a 32-bit integer by more than 31 bits is not a useful operation, so hopefully the programmer will not do this. If the compiler were forced to produce some defined result for it, then, depending on the platform, extra checks would have to be employed to produce the desired result, making the code bigger and slower. This would "punish" good programmers as well as bad ones. Hence the undefined result for an unlikely situation. The undefined result harms only a bad programmer who didn't take time to learn the standard. Otherwise all programmers would get the efficient code.

Your argument about "punishing programmers" is bizarre. If anybody is "punished" it is the users of such malfunctioning programs!

The points I made in my earlier response https://www.eevblog.com/forum/microcontrollers/previously-unknown-'c'-behavior-(to-me)/msg1191462/#msg1191462 are relevant to your response, viz:

Quote

There are many many many such "curious cases" in C/C++, often related to overflow/underflow and shifting. You really have to be not far off a language lawyer to predict them all, and then to be able to do accurate "numerical analysis" to verify they cannot occur in each location you use them. Both those skills are demonstrably in short supply, and they are unlikely to occur in someone who knows the problem domain and is trying to use C/C++ as a tool to solve their problem.

Some consider that if you don't know the many C/C++ "gotchas", then it is your fault for not understanding your tool. Others consider that it indicates the tool itself has problems.

One way to minimise (not completely avoid) such surprises is to use a very constrained subset, e.g. MISRA-C. Another way is to choose a different tool which doesn't have the problems.

For more amusing C/C++ topics, see http://yosefk.com/c++fqa/

nctnico · « **Reply #21 on:** April 23, 2017, 05:17:16 pm »

This discussion is rather moot. At some point somebody is going to tell us we all need this in order not to hurt ourselves while eating:

NorthGuy · « **Reply #22 on:** April 23, 2017, 06:05:57 pm »

Quote from: nctnico on April 23, 2017, 05:17:16 pm

At some point somebody is going to tell us we all need this in order not to hurt ourselves while eating:

tggzzz · « **Reply #23 on:** April 23, 2017, 06:16:56 pm »

Quote from: nctnico on April 23, 2017, 05:17:16 pm

This discussion is rather moot. At some point somebody is going to tell us we all need this in order not to hurt ourselves while eating:

The point is not whether a programmer hurts themselves with their tool, it is whether they hurt other people, e.g. the people that use the program directly or indirectly. And there are many many many examples of people being "disadvantaged" by programs written in C/C++ due to any of the many many many reasons they can cause nasal daemons to appear at the least convenient moment.

I first used C professionally in 1981, for a hard realtime system. I think I have a better appreciation of its strengths and weaknesses than many young programmers, and know when it is and isn't the most appropriate tool. Claims of "better efficiency" (whatever that might mean) are pretty naive from several angles.

If you want to consider one of the more surprising ways of improving C's performance on benchmarks on machine X, don't run optimised output, but do run non-optimised postprocessed output on an emulation of machine X. Yup, emulation can make it faster

See HP's experimental Dynamo compiler for the numerical results.

nctnico · « **Reply #24 on:** April 23, 2017, 07:07:44 pm »

Quote from: tggzzz on April 23, 2017, 06:16:56 pm

Quote from: nctnico on April 23, 2017, 05:17:16 pm
This discussion is rather moot. At some point somebody is going to tell us we all need this in order not to hurt ourselves while eating:
The point is not whether a programmer hurts themselves with their tool, it is whether they hurt other people, e.g. the people that use the program directly or indirectly. And there are many many many examples of people being "disadvantaged" by programs written in C/C++ due to any of the many many many reasons they can cause nasal daemons to appear at the least convenient moment.

If you want to measure by that metric then Delphi and Java are far far worse because they claim to make programming easy. Anyway, I don't think anyone is claiming C and/or C++ are the best languages around but for some applications there simply isn't a good alternative. In case of C++ you can use Boost and STL libraries which offer solutions for commonly used constructs (design patterns seems to be the phrase 'du jour') so chances of screwing things up badly are greatly reduced.

andyturk · « **Reply #25 on:** April 23, 2017, 07:19:15 pm »

Quote from: tggzzz on April 22, 2017, 11:11:03 am

Some consider that if you don't know the many C/C++ "gotchas", then it is your fault for not understanding your tool. Others consider that it indicates the tool itself has problems.

tggzzz,

Your critiques of C and C++ are all over these forums. What would you recommend as an alternative?

rsjsouza · « **Reply #26 on:** April 23, 2017, 07:23:46 pm »

Quote from: nctnico on April 23, 2017, 07:07:44 pm

Quote from: tggzzz on April 23, 2017, 06:16:56 pm
Quote from: nctnico on April 23, 2017, 05:17:16 pm
This discussion is rather moot. At some point somebody is going to tell us we all need this in order not to hurt ourselves while eating:
The point is not whether a programmer hurts themselves with their tool, it is whether they hurt other people, e.g. the people that use the program directly or indirectly. And there are many many many examples of people being "disadvantaged" by programs written in C/C++ due to any of the many many many reasons they can cause nasal daemons to appear at the least convenient moment.
If you want to measure by that metric then Delphi and Java are far far worse because they claim to make programming easy. Anyway, I don't think anyone is claiming C and/or C++ are the best languages around but for some applications there simply isn't a good alternative. In case of C++ you can use Boost and STL libraries which offer solutions for commonly used constructs (design patterns seems to be the phrase 'du jour') so chances of screwing things up badly are greatly reduced.

I agree. In the case of embedded (my area of work), the use of peripheral or HW-centric libraries helps prevent screw ups as well, as they intrinsically inherit years of experience in a particular platform. Obviously, such libraries must go through the regular cycle of fixing bugs and testing as well.

C is not perfect but, as mentioned by nctnico, other paradigms were tried but are not free of the effects of bad or smart-assery programming. More than once in my life I found the latter, which brings me the phrase: "Just because you can do really intricate constructs with C, it doesn't mean you should".

NorthGuy · « **Reply #27 on:** April 23, 2017, 08:21:28 pm »

Quote from: tggzzz on April 23, 2017, 06:16:56 pm

The point is not whether a programmer hurts themselves with their tool, it is whether they hurt other people, e.g. the people that use the program directly or indirectly. And there are many many many examples of people being "disadvantaged" by programs written in C/C++ due to any of the many many many reasons they can cause nasal daemons to appear at the least convenient moment.

I don't see it that way. If you program firmware for a product, you take total responsibility for the programming. If you made it buggy and didn't test it well, it is you who's hurting the clients. Not the C language, nor any other tool you used, but you.

Moreover, if you use any libraries, Linux, or whatever you decide to drag into your embedded project, you take full responsibility for all the bugs and vulnerabilities contained therein. And if these bugs and vulnerabilities happen to hurt your users, then this is absolutely, 100% your fault, because it is caused by your decision to drag all this stuff in.

It is, of course, your choice what language and what tools you want to use. You decide whether you want full freedom or an illusion of safety. And you're responsible for all the consequences. There's no tool of any sort which will prevent you from making bugs. And there will never be such a tool. Programming with less bugs (and less bloat for that matter) is a responsibility of a human.

hamster_nz · « **Reply #28 on:** April 23, 2017, 08:59:03 pm »

Quote from: brucehoult on April 23, 2017, 02:56:08 pm

Operations on unsigned, on the other hand, are much more rigidly defined. They happen to map efficiently to most modern CPUs, but other CPUs may be compelled to do something quite inefficient in order to get the results defined by C.

Signed/unsigned is a distraction - the same problem exists for "unsigned int" variables too:

Code: [Select]

$ cat check.c
#include <stdio.h>

int main(int argc, char *argv[])
{
  /* unsigned int is 32 bits */
  unsigned int a,b;
  unsigned int n0 = 24, n1 = 8;

  a = 0xFFFFFFFF;
  a = a << (n0+n1);

  b = 0xFFFFFFFF;
  b = (b << n0) << n1;

  printf("%08X should equal %08X\n", a, b);
}
$ ./check
FFFFFFFF should equal 00000000

I think that there is something deeper here. The same problem exists with division. With unsigned numbers, do you not agree that 'shift right by n' is equivalent to division by 2^n?

Code: [Select]

$ cat check.c
#include <stdio.h>

int main(int argc, char *argv[])
{
  unsigned int a,b;
  unsigned int n0 = 1<<23, n1 = 1<<8;

  a = 0xFFFFFFFF;
  a = a / (n0*n1);

  b = 0xFFFFFFFF;
  b = (b / n0) / n1;

  printf("%08X should equal %08X\n", a, b);
  return 0;
}
$ ./check
00000001 should equal 00000001

Have a look at what happens when

unsigned int n0 = 1<<23, n1 = 1<<8;

is replaced with

unsigned int n0 = 1<<24, n1 = 1<<8;

I do know what is happening and why, but I quite like the 'equivalence breaking' between the shift and division operators - they are only the same within limited bounds then each breaks in a different way.
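
For anyone who doesn't want to run it: with n0 = 1<<24 and n1 = 1<<8 the product n0*n1 wraps to 0 in 32-bit unsigned arithmetic, so the first division becomes a divide by zero, while the two-step division is still fine. A guarded version might look like the sketch below; div_by_product is a hypothetical name, not something from the original code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: computes a / (n0 * n1) into *out, but only when
 * the product is non-zero and does not wrap around in 32 bits.
 * Returns false (leaving *out untouched) otherwise. */
bool div_by_product(uint32_t a, uint32_t n0, uint32_t n1, uint32_t *out)
{
    if (n0 == 0u || n1 == 0u)
        return false;                 /* product would be zero */
    if (n0 > UINT32_MAX / n1)
        return false;                 /* n0 * n1 would wrap */
    *out = a / (n0 * n1);
    return true;
}
```

The wrap check divides instead of multiplying, so the check itself cannot overflow.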

I don't think it is a 'C' vs 'a better language' problem. Any programming language that ends up generating or executing bit-shift opcodes will most likely exhibit this platform specific behavior unless the language designer has gone out of their way to make it defined - (e.g. by masking with the size of the data type being shifted). Even those writing in assembler might not see it coming.

PS. We all know that "x>>1" is not the same as "x/2" for signed values? If not, try where x = -1.
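
To spell that out: in C99, -1 / 2 is 0 because division truncates toward zero, while -1 >> 1 is implementation-defined and on most compilers is an arithmetic shift giving -1. If what you want is the "round toward negative infinity" result without shifting a negative value, something like the following sketch works; both function names are invented for illustration.

```c
/* Halves x, rounding toward zero (what x / 2 does in C99). */
int half_trunc(int x)
{
    return x / 2;
}

/* Halves x, rounding toward negative infinity, using only division
 * and remainder so no negative value is ever shifted. */
int half_floor(int x)
{
    int q = x / 2;
    /* For negative odd x, C99's truncating division gives x % 2 == -1
     * and q is one too high for a floor result. */
    return (x % 2 < 0) ? q - 1 : q;
}
```

half_trunc(-1) is 0 and half_floor(-1) is -1; the two agree for every non-negative x and for every even x.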

tggzzz · « **Reply #29 on:** April 23, 2017, 09:02:59 pm »

Quote from: nctnico on April 23, 2017, 07:07:44 pm

Quote from: tggzzz on April 23, 2017, 06:16:56 pm
Quote from: nctnico on April 23, 2017, 05:17:16 pm
This discussion is rather moot. At some point somebody is going to tell us we all need this in order not to hurt ourselves while eating:
The point is not whether a programmer hurts themselves with their tool, it is whether they hurt other people, e.g. the people that use the program directly or indirectly. And there are many many many examples of people being "disadvantaged" by programs written in C/C++ due to any of the many many many reasons they can cause nasal daemons to appear at the least convenient moment.
If you want to measure by that metric then Delphi and Java are far far worse because they claim to make programming easy.

That is a bizarre chain of - and I use the word loosely - reasoning.

Quote

Anyway, I don't think anyone is claiming C and/or C++ are the best languages around but for some applications there simply isn't a good alternative. In case of C++ you can use Boost and STL libraries which offer solutions for commonly used constructs (design patterns seems to be the phrase 'du jour') so chances of screwing things up badly are greatly reduced.

There is an extremely perceptive comment by someone that will be remembered after we are long gone:
"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult"
Tony Hoare, in his Turing Award lecture
http://zoo.cs.yale.edu/classes/cs422/2011/bib/hoare81emperor.pdf

In that he also notes about his Algol 60 implementation for a 2kIPS machine:
"Every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to - they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law." (My emphasis)

tggzzz · « **Reply #30 on:** April 23, 2017, 09:04:45 pm »

Quote from: andyturk on April 23, 2017, 07:19:15 pm

Quote from: tggzzz on April 22, 2017, 11:11:03 am
Some consider that if you don't know the many C/C++ "gotchas", then it is your fault for not understanding your tool. Others consider that it indicates the tool itself has problems.

tggzzz,

Your critiques of C and C++ are all over these forums. What would you recommend as an alternative?

I wouldn't be dogmatic, because there are several alternatives, each with their own set of advantages and disadvantages. An engineer would be aware of them.

BTW, you do me too much credit. Excellent C/C++ critiques are all over the web, in far more detail and with far more understanding than I have.

I've repeatedly pointed people towards the FQA, but here's another from someone that has decades (since the 60s) of experience finding and debugging foul problems in hardware and software: "What is an Object in C Terms?" http://www.open-std.org/jtc1/sc22/wg14/9350 Note the "wg14" in the URL; I presume everybody commenting on this thread knows what WG14 is.

One statement, chosen more or less at random, indicates the scope of the problems...
"C99 introduced the concept of effective type (6.5 paragraph 6), but it has had the effect of making a confusing situation totally baffling. This is because it has introduced a new category of types, it has invented new terminology without defining it, its precise intent is most unclear, and it has not specified its effect on the library. "

tggzzz · « **Reply #31 on:** April 23, 2017, 09:06:51 pm »

Quote from: rsjsouza on April 23, 2017, 07:23:46 pm

Quote from: nctnico on April 23, 2017, 07:07:44 pm
Quote from: tggzzz on April 23, 2017, 06:16:56 pm
Quote from: nctnico on April 23, 2017, 05:17:16 pm
This discussion is rather moot. At some point somebody is going to tell us we all need this in order not to hurt ourselves while eating:
The point is not whether a programmer hurts themselves with their tool, it is whether they hurt other people, e.g. the people that use the program directly or indirectly. And there are many many many examples of people being "disadvantaged" by programs written in C/C++ due to any of the many many many reasons they can cause nasal daemons to appear at the least convenient moment.
If you want to measure by that metric then Delphi and Java are far far worse because they claim to make programming easy. Anyway, I don't think anyone is claiming C and/or C++ are the best languages around but for some applications there simply isn't a good alternative. In case of C++ you can use Boost and STL libraries which offer solutions for commonly used constructs (design patterns seems to be the phrase 'du jour') so chances of screwing things up badly are greatly reduced.
I agree. In the case of embedded (my area of work), the use of peripheral or HW-centric libraries helps prevent screw ups as well, as they intrinsically inherit years of experience in a particular platform. Obviously, such libraries must go through the regular cycle of fixing bugs and testing as well.

C is not perfect but, as mentioned by nctnico, other paradigms were tried but are not free of the effects of bad or smart-assery programming. More than once in my life I found the latter, which brings me the phrase: "Just because you can do really intricate constructs with C, it doesn't mean you should".

Agreed. The only caveat is that with C/C++ you don't need to have intricate constructs - even simple ones cause problems, as illustrated by this thread!

tggzzz · « **Reply #32 on:** April 23, 2017, 09:12:27 pm »

Quote from: NorthGuy on April 23, 2017, 08:21:28 pm

Quote from: tggzzz on April 23, 2017, 06:16:56 pm
The point is not whether a programmer hurts themselves with their tool, it is whether they hurt other people, e.g. the people that use the program directly or indirectly. And there are many many many examples of people being "disadvantaged" by programs written in C/C++ due to any of the many many many reasons they can cause nasal daemons to appear at the least convenient moment.

I don't see it that way. If you program firmware for a product, you take total responsibility for the programming. If you made it buggy and didn't test it well, it is you who's hurting the clients. Not the C language, nor any other tool you used, but you.

Moreover, if you use any libraries, Linux, or whatever you decide to drag into your embedded project, you take full responsibility for all the bugs and vulnerabilities contained therein. And if these bugs and vulnerabilities happen to hurt your users, then this is absolutely, 100% your fault, because it is caused by your decision to drag all this stuff in.

It is, of course, your choice what language and what tools you want to use. You decide whether you want full freedom or an illusion of safety. And you're responsible for all the consequences. There's no tool of any sort which will prevent you from making bugs. And there will never be such a tool. Programming with less bugs (and less bloat for that matter) is a responsibility of a human.

That would be a valid point if, and only if, programmers were held responsible for their failures. As it is they easily shelter behind EULAs and very very long disclaimers. They demonstrably have neither personal nor corporate responsibility.

In engineering disciplines the practitioners are indeed held responsible in civil law and, in many cases, in criminal law.

IanB · « **Reply #33 on:** April 23, 2017, 09:20:59 pm »

Is it possible to write an incorrect program? Yes.

Is it possible for any programming environment to prevent someone writing an incorrect program? It is not possible to catch all potential errors, so the answer is no.

Given this, it is all about degrees of incorrectness.

Every program ever written must first exist as a set of algorithmic steps in the mind of its creator. These algorithmic steps must then be translated into whatever real world programming environment will be used for implementation.

If the algorithmic steps are wrong, then the actual program will be wrong.

If the translation to a real system has errors, then the program will be wrong again.

Processes and tools can help to prevent mistakes, but ultimately the responsibility for correctness lies with humans, not machines. If you write a bad program, do not blame your tools.

On the subject of this thread, why would an algorithmic step call for shifting a 32 bit quantity by 32 or more bits in one operation? It's not a question of what the compiler or hardware do with this operation, it is about why do you logically need this operation to occur in your design?

tggzzz · « **Reply #34 on:** April 23, 2017, 09:37:56 pm »

Quote from: IanB on April 23, 2017, 09:20:59 pm

Is it possible to write an incorrect program? Yes.

Is it possible for any programming environment to prevent someone writing an incorrect program? It is not possible to catch all potential errors, so the answer is no.

Given this, it is all about degrees of incorrectness.

Every program ever written must first exist as a set of algorithmic steps in the mind of its creator. These algorithmic steps must then be translated into whatever real world programming environment will be used for implementation.

If the algorithmic steps are wrong, then the actual program will be wrong.

If the translation to a real system has errors, then the program will be wrong again.

Agreed so far.

Quote

Processes and tools can help to prevent mistakes, but ultimately the responsibility for correctness lies with humans, not machines. If you write a bad program, do not blame your tools.

That misses the point. You should choose the right tool (and use it appropriately), and you should avoid using inappropriate tools where better tools are available.

Hardware example (off the top of my head): don't choose and/or recommend using a multimeter with insulation clearances that are known to be defective. Or don't recommend using a scope to look at RF signals (an SA is almost always a more appropriate instrument).

hamster_nz · « **Reply #35 on:** April 23, 2017, 11:01:22 pm »

Quote from: IanB on April 23, 2017, 09:20:59 pm

On the subject of this thread, why would an algorithmic step call for shifting a 32 bit quantity by 32 or more bits in one operation? It's not a question of what the compiler or hardware do with this operation, it is about why do you logically need this operation to occur in your design?

I am moving my GPS receiver code from processing one sample at a time to processing 32 bits at a time. Because of pesky things like Doppler shift, sometimes a chip spans 15 samples, or other times it might span 17 samples - the upshot is that the phase isn't fixed, and slowly creeps. Because the sample rate is 16x the nominal Gold code chip rate, 32 samples might cover up to three different code 'chips':

e.g. at one given point in time I might have 11111111111100000000000000001111, with the first set of 1s from bit 1021 of the Gold code, the set of 0s from bit 1022, and the last set of 1s from bit 0 of the Gold code (the codes are 1023 bits long).

I also need to create a mask that represents which bits are in the next repetition of the Gold code. In this case the mask will be 00000000000000000000000000001111, as the last four 1s are from the next repeat of the Gold code.

I tripped over this issue, as I had 'n0' being the count of repeats from the oldest code bit, 'n1' being the count of bits from the middle code bit, and 'n2' being the count of bits from the most recent code bit - so as you would expect n0+n1+n2 = 32.

To make the mask quickly I am taking 0xFFFFFFFF, and then shifting it right by (n0+n1) - wanting branchless code for speed:
mask = 0xFFFFFFFF;
mask >>= n0+n1;

When the phase was such (or the doppler shift was such) that no bits were used from the most recent code bit it would mask out all the bits.

So now I am using this, and it is fine.
mask = 0xFFFFFFFF;
mask >>= n0;
mask >>= n1;

So it is a real world need / use case. It just was an interesting previously unknown behavior to me!
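A minimal sketch of the two mask constructions described above, assuming n0 + n1 never exceeds 32. The function names are hypothetical and not taken from the receiver code. The first variant mirrors the two-step shift; the second widens to 64 bits so that a single shift by up to 32 stays below the operand width. Both are branchless.

```c
#include <stdint.h>

/* Hypothetical helper: two separate shifts.
   Each individual count must be in the range 0..31. */
static uint32_t mask_two_step(unsigned n0, unsigned n1)
{
    uint32_t mask = UINT32_C(0xFFFFFFFF);
    mask >>= n0;
    mask >>= n1;
    return mask;
}

/* Hypothetical helper: one shift carried out in 64 bits.
   A total count of 0..32 is below the 64-bit operand width,
   and the result is truncated back to 32 bits. */
static uint32_t mask_widened(unsigned n0, unsigned n1)
{
    return (uint32_t)(UINT64_C(0xFFFFFFFF) >> (n0 + n1));
}
```

With n0 = 12 and n1 = 16 both return 0x0000000F, matching the example pattern above; with n0 + n1 = 32 both return 0.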

TNorthover · « **Reply #36 on:** April 24, 2017, 01:09:41 am »

Quote from: IanB on April 23, 2017, 09:20:59 pm

On the subject of this thread, why would an algorithmic step call for shifting a 32 bit quantity by 32 or more bits in one operation?

I've found shifts the same as the type's width come up pretty naturally, implementing rotates for example.

Unfortunately it's one of the harder things to specify at zero cost. Most CPUs have settled on 2s-complement arithmetic so the undefined behaviour with signed overflow is mostly good for loop optimizations in compilers these days. But real CPUs do lots of weird and wonderful things with out of range shifts (even 2 different behaviours in the same CPU).
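For the rotate case, a commonly used sketch that never shifts by the full width looks like this. The name rotl32 is just an illustration; both counts are masked into 0..31, so a rotate by 0 or by 32 leaves the value unchanged instead of hitting an out-of-range shift.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32).
   Both shift counts are kept in the range 0..31. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```

The naive form (x << n) | (x >> (32 - n)) shifts right by 32 when n is 0, which is exactly the out-of-range case discussed in this thread.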

Kalvin · « **Reply #37 on:** April 24, 2017, 06:52:34 am »

Quote from: NorthGuy on April 23, 2017, 02:30:48 pm

Quote from: Kalvin on April 22, 2017, 04:24:01 pm
As the C/C++ languages are broken and unsafe by design (meaning they do not fail on invalid operation, but instead continue execution with the invalid result)

This is what makes them more efficient. There is nothing broken or unsafe about it.

There's a popular misconception that bugs need to be detected by compilers and tools, but not by the programmer. This produced lots of bloat, but the software doesn't appear to be less buggy.

If that is the case, then why do even simple microcontrollers have an exception trap for division by zero? Some more advanced controllers have exceptions for misaligned data access. More advanced microprocessors may have segmentation fault exceptions. All these errors and exceptions are due to the programmer's errors.

The programming language needs to be able to verify that the actions generated are valid - either at compile time or at run time. Firstly, the programmer cannot know what kind of code the compiler is generating, and secondly, adding the validation code manually throughout the source code makes it harder to read and maintain, and the validation code may obscure the meaning of the algorithm. Of course, there should be means to disable those features if the performance or code size is an issue, but they should be enabled by default.

If there is a violation during runtime, the system should raise an exception or call an exception handler so the programmer gets notice of the problem and can decide what to do - either continue with the wrong result or restart the system.

Addition:
The programming language should be constructed so that it allows the programmer to be able to write code in a secure manner so that the compiler would work hard during the compilation time to detect possible errors as early as possible. And the compiler should be able to emit code which checks the errors during runtime, for example checking the array boundaries and variable ranges. Of course, the programmer should be able to choose how strict checking the compiler will perform - if one wants to write sloppy code, that is just fine - but if someone wants to write strictly checked code the compiler should provide sufficient capability to check type compatibility, variable ranges, array access etc.

tggzzz · « **Reply #38 on:** April 24, 2017, 07:53:45 am »

Quote from: hamster_nz on April 23, 2017, 11:01:22 pm

So it is a real world need / use case. It just was an interesting previously unknown behavior to me!

The real nasty in your case is that it depended on compiler optimisation setting. In any sane system optimised externally-visible behaviour should be the same as unoptimised externally-visible behaviour (exception: faster or smaller code or similar).

You should now be concerned that there is some other form of differing behaviour that you haven't spotted.
You should now be concerned that something might emerge when you recompile with the next version of the compiler (yup, that's a real life problem).
Ditto any library that you use.

hamster_nz · « **Reply #39 on:** April 24, 2017, 08:13:03 am »

Quote from: tggzzz on April 24, 2017, 07:53:45 am

Quote from: hamster_nz on April 23, 2017, 11:01:22 pm
So it is a real world need / use case. It just was an interesting previously unknown behavior to me!

The real nasty in your case is that it depended on compiler optimisation setting. In any sane system optimised externally-visible behaviour should be the same as unoptimised externally-visible behaviour (exception: faster or smaller code or similar).

You should now be concerned that there is some other form of differing behaviour that you haven't spotted.
You should now be concerned that something might emerge when you recompile with the next version of the compiler (yup, that's a real life problem).
Ditto any library that you use.

I agree that it is nasty trap for those not in the know.

However the code I am now using is good - using two shifts, each less than 18 bits, will give the same result no matter which compiler... Also using uint32_t types to make sure the number of bits in each value is as expected.

janoc · « **Reply #40 on:** April 24, 2017, 09:18:58 am »

Quote from: tggzzz on April 24, 2017, 07:53:45 am

Quote from: hamster_nz on April 23, 2017, 11:01:22 pm
So it is a real world need / use case. It just was an interesting previously unknown behavior to me!

The real nasty in your case is that it depended on compiler optimisation setting. In any sane system optimised externally-visible behaviour should be the same as unoptimised externally-visible behaviour (exception: faster or smaller code or similar).

You should now be concerned that there is some other form of differing behaviour that you haven't spotted.
You should now be concerned that something might emerge when you recompile with the next version of the compiler (yup, that's a real life problem).
Ditto any library that you use.

That's the ideal case, I agree. In the real world we have to deal with compiler/optimizer bugs, unfortunately.

And you still didn't show us that tool that would be up to your standards. E.g. I would love to write embedded code in Haskell where many of these classes of bugs are simply impossible due to the design of the language. But alas, the micro I am writing the code for is not able to run it.

So we can be doing this until the cows come home, or we can get real work done in imperfect languages like C or even assembler instead. Heck, the Space Shuttle code was written in a high level assembly language and no critical error that caused a problem in-flight was ever found. The difference in the code quality was not the tool but the engineering process used to produce it (there is this well known article on it: https://www.fastcompany.com/28121/they-write-right-stuff ).

Don't blame the tools for engineer's mistakes - if you know that the tool has flaws, build safeguards in your process to compensate for it. The same as you do for human errors.

hamster_nz · « **Reply #41 on:** April 24, 2017, 10:32:08 am »

Just to show that my project is heading somewhere: from running at 10 seconds of CPU time per GPS channel per second (or a minute to process one second of samples), it now takes 0.02 seconds to process each channel - about 500x quicker than the original single-sample-at-a-time processing. Here's about 0.4 s of the received I/Q values - you can clearly see the BPSK data. Next up is adding the two carrier and late/early tracking loops.

With a bit of luck and hard work I will soon have a real time GPS receiver up and running..

tggzzz · « **Reply #42 on:** April 24, 2017, 01:51:56 pm »

Quote from: hamster_nz on April 24, 2017, 08:13:03 am

Quote from: tggzzz on April 24, 2017, 07:53:45 am
Quote from: hamster_nz on April 23, 2017, 11:01:22 pm
So it is a real world need / use case. It just was an interesting previously unknown behavior to me!

The real nasty in your case is that it depended on compiler optimisation setting. In any sane system optimised externally-visible behaviour should be the same as unoptimised externally-visible behaviour (exception: faster or smaller code or similar).

You should now be concerned that there is some other form of differing behaviour that you haven't spotted.
You should now be concerned that something might emerge when you recompile with the next version of the compiler (yup, that's a real life problem).
Ditto any library that you use.

I agree that it is nasty trap for those not in the know.

However the code I am now using is good - using two shifts, each less than 18 bits, will give the same result no matter which compiler... Also using uint32_t types to make sure the number of bits in each value is as expected.

Keep at the back of your mind that there are many many other such pitfalls with the definition and implementation of C/C++. I've already pointed to the "FQA" and "What is an object".

tggzzz · « **Reply #43 on:** April 24, 2017, 02:03:37 pm »

Quote from: janoc on April 24, 2017, 09:18:58 am

Quote from: tggzzz on April 24, 2017, 07:53:45 am
Quote from: hamster_nz on April 23, 2017, 11:01:22 pm
So it is a real world need / use case. It just was an interesting previously unknown behavior to me!

The real nasty in your case is that it depended on compiler optimisation setting. In any sane system optimised externally-visible behaviour should be the same as unoptimised externally-visible behaviour (exception: faster or smaller code or similar).

You should now be concerned that there is some other form of differing behaviour that you haven't spotted.
You should now be concerned that something might emerge when you recompile with the next version of the compiler (yup, that's a real life problem).
Ditto any library that you use.

That's the ideal case, I agree. In the real world we have to deal with compiler/optimizer bugs, unfortunately.

Indeed, but that kind of starting point leads towards not trying to improve things. If you had a castle to build, what would you think of someone that suggested building it on sand rather than rock?

Do you think it is better to start by choosing tools that have more or fewer inherent problems?

Quote

And you still didn't show us that tool that would be up to your standards. E.g. I would love to write embedded code in Haskell where many of these classes of bugs are simply impossible due to the design of the language. But alas, the micro I am writing the code for is not able to run it.

That's an example of why I won't give a dogmatic answer.

Nonetheless I'm sure there are other less ambiguous/undefined languages that would run on your processor.

Quote

Don't blame the tools for engineer's mistakes - if you know that the tool has flaws, build safeguards in your process to compensate for it. The same as you do for human errors.

Indeed. But often the best thing is to avoid the unnecessarily dangerous tool in the first place.

Don't try and open a jamjar with a carving knife: use a jar opener or, if that isn't available, use an oyster knife.

rstofer · « **Reply #44 on:** April 24, 2017, 03:34:59 pm »

The idea that there exists a perfect programming language with perfect run-time libraries (somebody wrote them!) that absolutely prevents programming errors is complete fantasy. Ada might be as close as it gets and it's a freaking nightmare! Even then, the error checking is totally dependent on the programmer. It's just as easy to code around the internal checks as it is with any other language. It is also possible to write very defensive code but that can be done in any language. Like taking the shift count modulo 32 and handling the possible zero shift count.

We don't need languages with more features (C++, Java) or oddball features (in my view, Python), we need simpler languages without features. K&R C comes to mind. Not that we couldn't screw up but at least the language didn't guide us down the path to destruction (objects).

Still, shifting a 32 bit value left by 32 places could reasonably be expected to be problematic. NO, I wouldn't have thought of it! I'm not that smart, before the fact. After the fact, I could see where the compiler would only allow 5 bits of shift count (0..31) and 32 modulo 32 is 0 so no shift occurs. Seems reasonable. After the fact... Why there are two results is interesting. I guess the compiler writer (another programmer) didn't consider the effect of optimization.

I think I would put the masks in an array. I think it might be faster to grab the pre-built mask than it would be to keep shifting a mask over and over. I don't KNOW that, but I might look at the assembly code to see.
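A rough sketch of what such a table could look like, assuming the mask is 0xFFFFFFFF shifted right by a count of 0..32 as described earlier in the thread. The names mask_table and init_mask_table are made up for illustration, and whether the lookup actually beats the shifts would need measuring.

```c
#include <stdint.h>

/* Hypothetical table: entry i holds 0xFFFFFFFF shifted right by i bits.
   33 entries, so a count of 32 is a valid index and yields 0. */
static uint32_t mask_table[33];

static void init_mask_table(void)
{
    for (unsigned i = 0; i < 32; i++)
        mask_table[i] = UINT32_C(0xFFFFFFFF) >> i;
    mask_table[32] = 0;   /* the case a single 32-bit shift cannot express */
}
```

After calling init_mask_table() once, the mask for a given total count is just mask_table[n0 + n1].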

I always liked Pascal... And I still use Fortran...

tggzzz · « **Reply #45 on:** April 24, 2017, 04:19:49 pm »

Quote from: rstofer on April 24, 2017, 03:34:59 pm

The idea that there exists a perfect programming language with perfect run-time libraries (somebody wrote them!) that absolutely prevents programming errors is complete fantasy. Ada might be as close as it gets and it's a freaking nightmare! Even then, the error checking is totally dependent on the programmer. It's just as easy to code around the internal checks as it is with any other language. It is also possible to write very defensive code but that can be done in any language. Like taking the shift count modulo 32 and handling the possible zero shift count.

Of course it is possible to produce bad programs in any language; that's so trivially true that I didn't think it was worth saying! But that's not the point.

The question is whether the language is sufficiently well defined as to allow you to predict what will happen when executed - or whether there are surprises that language lawyers can shelter underneath.

Quote

We don't need languages with more features (C++, Java) or oddball features (in my view, Python), we need simpler languages without features. K&R C comes to mind.

That's a more interesting observation.

IMNSHO C++ is so baroque as to be unusable. Java-the-language is becoming more complex than necessary, but the language and VM definition is demonstrably pretty sound w.r.t. unexpected behaviour.

Now C. There's a reasonable argument to be made that many of the compromises in the later versions of C are due to it being pulled in two directions: as a low-level language for embedded programming, and as a high level language for general applications. Since the two directions are fundamentally at odds with each other, it is unsurprising that the language is neither "fish nor fowl" but is a camel (i.e. a horse designed by a committee) or an eierlegende-Wollmilchsau. In that case a return to (something near) K&R C would be attractive.

But it won't happen; hence the need to use alternatives.

Quote

Still, shifting a 32 bit value left by 32 places could reasonably be expected to be problematic. NO, I wouldn't have thought of it! I'm not that smart, before the fact. After the fact, I could see where the compiler would only allow 5 bits of shift count (0..31) and 32 modulo 32 is 0 so no shift occurs. Seems reasonable. After the fact... Why there are two results is interesting. I guess the compiler writer (another programmer) didn't consider the effect of optimization.

Don't feel too bad about it! The vast majority of C/C++ practitioners are the same, whether or not they like to admit it.

NorthGuy · « **Reply #46 on:** April 24, 2017, 04:51:04 pm »

Quote from: rstofer on April 24, 2017, 03:34:59 pm

Why there are two results is interesting. I guess the compiler writer (another programmer) didn't consider the effect of optimization.

In the first case, the shift is done at run-time. The amount of the shift is calculated, then the shift is done using a CPU instruction, which results in a shift by 0 bits (the CPU only uses the last 5 bits of the shift count), that is, no shift at all. The value of "a" was -1, which remains unchanged, so minus one is the result:

Code: [Select]

$ gcc -o check check.c -Wall -pedantic
$ ./check
FFFFFFFF should equal 00000000

In the second case, with the higher optimization level, the compiler has figured out that all the operands of the expression are always the same, so the whole expression can be pre-calculated at compile time. The compiler is likely to use 64-bit arithmetic, so the result of the shift is zero. At run time, the program simply assigns zero to the destination variable:

Code: [Select]

$ gcc -o check check.c -Wall -pedantic -O4
$ ./check
00000000 should equal 00000000
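The two outcomes can be modelled in portable C. This is only a sketch: the function names are made up, and unsigned operands are used so that every shift here is well defined. In C itself, shifting by a count greater than or equal to the width of the promoted left operand is undefined behaviour, which is why the compiler is allowed to produce either result.

```c
#include <stdint.h>

/* Models the run-time case: only the low 5 bits of the count
   are used, as an x86 32-bit shift instruction does. */
static uint32_t shl32_masked_count(uint32_t v, unsigned n)
{
    return v << (n & 31u);
}

/* Models the compile-time case: the shift is evaluated in a wider
   type and then truncated to 32 bits. Counts of 64 or more give 0. */
static uint32_t shl32_wide(uint32_t v, unsigned n)
{
    return n < 64u ? (uint32_t)((uint64_t)v << n) : 0u;
}
```

For v = 0xFFFFFFFF and n = 32 the first returns 0xFFFFFFFF and the second returns 0, the same pair of values printed by the unoptimised build above.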

rstofer · « **Reply #47 on:** April 24, 2017, 04:54:37 pm »

Quote from: tggzzz on April 24, 2017, 04:19:49 pm

Quote from: rstofer on April 24, 2017, 03:34:59 pm
Still, shifting a 32 bit value left by 32 places could reasonably be expected to be problematic. NO, I wouldn't have thought of it! I'm not that smart, before the fact. After the fact, I could see where the compiler would only allow 5 bits of shift count (0..31) and 32 modulo 32 is 0 so no shift occurs. Seems reasonable. After the fact... Why there are two results is interesting. I guess the compiler writer (another programmer) didn't consider the effect of optimization.

Don't feel too bad about it! The vast majority of C/C++ practitioners are the same, whether or not they like to admit it.

How many people have ever read the formal definition of ANY language? About the only definition that might be readable by mere mortals is assembly language and even that will be based on the quality of the hardware description.

At the edges, what should happen when a 32 bit value is shifted left about a bazillion places? Let's suppose the definition actually provides an answer. How do we know that the compiler writer met the requirements? How can we ever test everything? Yes, I know GCC comes with a validation suite but I can't say that I ever looked at it. Is it complete? Well, it's pretty clear something slipped by...

This idea of correctness is a serious problem in the nuclear industry, as it should be. The verification programs were written in a very specific dialect of Fortran on a very specific computer with very specific libraries and there would be no changes. Ever! That created a situation where obsolete hardware was being maintained indefinitely simply because nobody wanted to go back through the task of verifying the verification programs. The NRC had already bought off on the existing code/hardware. I don't think there was a 'patch of the month' program.

A reactor is no place to find out that Intel has another bug in the FPU.

And what about that problem in the 14th decimal place of the SINH function? What's with that? No, I don't know if there is a real problem, I'm making it up. But you don't know that! Now everybody is going to go looking at the e^x code.

GeorgeOfTheJungle · « **Reply #48 on:** April 24, 2017, 05:01:54 pm »

Quote from: tggzzz on April 24, 2017, 04:19:49 pm

Don't feel too bad about it! The vast majority of C/C++ practitioners are the same, whether or not they like to admit it.

Yes, and the trap is quietly sitting there just waiting to catch the next unsuspecting poor guy.

rstofer · « **Reply #49 on:** April 24, 2017, 05:11:16 pm »

Quote from: GeorgeOfTheJungle on April 24, 2017, 05:01:54 pm

Quote from: tggzzz on April 24, 2017, 04:19:49 pm
Don't feel too bad about it! The vast majority of C/C++ practitioners are the same, whether or not they like to admit it.

Yes, and the trap is quietly sitting there just waiting to catch the next unsuspecting poor guy.

Or not...

A defensive programmer might have masked the sum of the two shift counts with 0x1F to force it into range or perhaps even deliberately handled the situation where the sum is greater than 31. I guess if you don't know how the compiler is going to react, you can take the position of defending against it.

I wish I were that smart and that, with luck, I might have done something like the above. Probably not...

What is more problematic is finding this kind of thing in somebody else's code. It's easy to see the problem when there are constants for the shift count. It might be more difficult if the count was a member of a struct pointed to by a pointer that was passed around shamelessly. Or, hey, it could have been a member of a union so the value could have different representations and the union could be in a struct. How cool is that?

tggzzz · « **Reply #50 on:** April 24, 2017, 05:14:38 pm »

Quote from: rstofer on April 24, 2017, 04:54:37 pm

Quote from: tggzzz on April 24, 2017, 04:19:49 pm
Quote from: rstofer on April 24, 2017, 03:34:59 pm
Still, shifting a 32 bit value left by 32 places could reasonably be expected to be problematic. NO, I wouldn't have thought of it! I'm not that smart, before the fact. After the fact, I could see where the compiler would only allow 5 bits of shift count (0..31) and 32 modulo 32 is 0 so no shift occurs. Seems reasonable. After the fact... Why there are two results is interesting. I guess the compiler writer (another programmer) didn't consider the effect of optimization.

Don't feel too bad about it! The vast majority of C/C++ practitioners are the same, whether or not they like to admit it.

How many people have ever read the formal definition of ANY language?

Normal users shouldn't have to read the formal definition. People implementing language tools (compilers etc) need to. They need a good formal definition, otherwise we are back to "operational semantics" where in order to see what it will do, you have to do it and observe what was done.

Quote

At the edges, what should happen when a 32 bit value is shifted left about a bazillion places? Let's suppose the definition actually provides an answer. How do we know that the compiler writer met the requirements?

We can't. But without a good unambiguous definition you can't even write the tests because you don't know what is meant to happen.

And the folklore told by language experts is that is exactly what occurs with C/C++; in some cases the compiler writers have had to ask experienced users "what is meant by X".

Quote

How can we ever test everything? Yes, I know GCC comes with a validation suite but I can't say that I ever looked at it. Is it complete? Well, it's pretty clear something slipped by...

You don't have to assume anything slipped by. Nasal daemons do occur in the C/C++ world.

Quote

This idea of correctness is a serious problem in the nuclear industry, as it should be. The verification programs were written in a very specific dialect of Fortran on a very specific computer with very specific libraries and there would be no changes. Ever! That created a situation where obsolete hardware was being maintained indefinitely simply because nobody wanted to go back through the task of verifying the verification programs. The NRC had already bought off on the existing code/hardware. I don't think there was a 'patch of the month' program.

ISTR adverts for people with PDP-11 experience, because they were going to continue to be used until the 2040s!

Quote

A reactor is no place to find out that Intel has another bug in the FPU.

And what about that problem in the 14th decimal place of the SINH function? What's with that? No, I don't know if there is a real problem, I'm making it up. But you don't know that! Now everybody is going to go looking at the e^x code.

Those are implementation issues, not definition issues. Besides, anyone that relies on the exact value of floating point numbers is living in a state of sin (not sine).

tggzzz · « **Reply #51 on:** April 24, 2017, 05:16:44 pm »

Quote from: rstofer on April 24, 2017, 05:11:16 pm

A defensive programmer might have masked the sum of the two shift counts with 0x1F to force it into range or perhaps even deliberately handled the situation where the sum is greater than 31. I guess if you don't know how the compiler is going to react, you can take the position of defending against it.

Precisely.

And, of course, you would have to assume that the compiler correctly implemented your defensive code, and/or didn't optimise it away!

GeorgeOfTheJungle · « **Reply #52 on:** April 24, 2017, 06:16:04 pm »

Quote from: rstofer on April 24, 2017, 05:11:16 pm

Quote from: GeorgeOfTheJungle on April 24, 2017, 05:01:54 pm
And the trap is quietly sitting there just waiting to catch the next unsuspecting poor guy.
Or not...

Honestly I also would have expected (uint32_t) 0xffffffff << 32 to be 0

Kalvin · « **Reply #53 on:** April 24, 2017, 07:51:56 pm »

Quote from: rstofer on April 24, 2017, 05:11:16 pm

A defensive programmer might have masked the sum of the two shift counts with 0x1F to force it into range or perhaps even deliberately handled the situation where the sum is greater than 31. I guess if you don't know how the compiler is going to react, you can take the position of defending against it.

... and create incorrect result. For example, what will happen if the shift count is 32 and you mask it with 0x1F ...

rstofer · « **Reply #54 on:** April 24, 2017, 08:11:46 pm »

Quote from: Kalvin on April 24, 2017, 07:51:56 pm

Quote from: rstofer on April 24, 2017, 05:11:16 pm
A defensive programmer might have masked the sum of the two shift counts with 0x1F to force it into range or perhaps even deliberately handled the situation where the sum is greater than 31. I guess if you don't know how the compiler is going to react, you can take the position of defending against it.

... and create incorrect result. For example, what will happen if the shift count is 32 and you mask it with 0x1F ...

It will go to 0 and be caught in the following code

Code: [Select]

shift_count &= 0x1f;
if (shift_count > 0) {
  value = value << shift_count;
}
else {
  /* do whatever you want when shift_count == 0 */
}

Something like that...

There is no point in shifting a 32 bit quantity 32+ bits to the left. You can set the value to 0 or trap on an erroneous shift_count.

Kalvin · « **Reply #55 on:** April 24, 2017, 08:21:50 pm »

Quote from: rstofer on April 24, 2017, 08:11:46 pm

Quote from: Kalvin on April 24, 2017, 07:51:56 pm
Quote from: rstofer on April 24, 2017, 05:11:16 pm
A defensive programmer might have masked the sum of the two shift counts with 0x1F to force it into range or perhaps even deliberately handled the situation where the sum is greater than 31. I guess if you don't know how the compiler is going to react, you can take the position of defending against it.

... and create incorrect result. For example, what will happen if the shift count is 32 and you mask it with 0x1F ...

It will go to 0 and be caught in the following code

Code: [Select]

shift_count &= 0x1f;
if (shift_count > 0) {
  value = value << shift_count;
}
else {
  /* do whatever you want when shift_count == 0 */
}

Something like that...

There is no point in shifting a 32 bit quantity 32+ bits to the left. You can set the value to 0 or trap on an erroneous shift_count.

Your code will fail miserably and give you incorrect value if the shift count happens to be 33 or more ...
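To make the difference concrete, here is a small sketch comparing the masked count with an explicit range check. The function names are hypothetical, and the checked version assumes that "shifted out completely" should mean a result of 0.

```c
#include <stdint.h>

/* Masking the count: 33 silently becomes 1, 32 becomes 0. */
static uint32_t shl32_masked(uint32_t value, unsigned count)
{
    return value << (count & 0x1Fu);
}

/* Range check: any count of 32 or more yields 0,
   so the shift itself only ever sees 0..31. */
static uint32_t shl32_checked(uint32_t value, unsigned count)
{
    return count < 32u ? value << count : 0u;
}
```

With value = 1 and count = 33, the masked version returns 2 while the checked version returns 0.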

GeorgeOfTheJungle · « **Reply #56 on:** April 24, 2017, 08:38:47 pm »

Quote from: Kalvin on April 24, 2017, 08:21:50 pm

Your code will fail miserably and give you incorrect value if the shift count happens to be 33 or more ...

Yes it will...

rstofer · « **Reply #57 on:** April 24, 2017, 09:20:55 pm »

Quote from: GeorgeOfTheJungle on April 24, 2017, 08:38:47 pm

Quote from: Kalvin on April 24, 2017, 08:21:50 pm
Your code will fail miserably and give you incorrect value if the shift count happens to be 33 or more ...

Yes it will...

Why would you want to shift a 32 bit value more than 31 places? OK, I can almost see shifting 32 places to clear the value but, really, is that the best way to set a value to 0? Why not just trap the condition and set the value directly to 0? Then you know how it will turn out!

Of course, there is the possibility that 'value' is something other than 32 bits so maybe testing against 8*sizeof(value)-1 (which the compiler will treat as a constant) is a better upper limit. If somebody seriously wants to shift beyond the end bit position then 8 * sizeof(value) would work.

You would still need to test against 0 either to prevent shifting zero places (why bother?) or shifting to a negative number of bits (if shift_count is signed). One is a nop and the other just has to be a mistake.

Right or wrong, I'm going to define shift_count as unsigned and just mask it into the range of 0..31. I should certainly know enough about the range of shift_count values to know if that is satisfactory.
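
For comparison, here is a minimal sketch of the "test against the width" alternative mentioned above, assuming an unsigned 32-bit value. The name shl32_checked and the choice to return 0 for out-of-range counts are illustrative assumptions; trapping on a bad count would be an equally valid policy.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: a left shift that is defined for any count.
   Counts greater than or equal to the width of the type yield 0
   instead of running into undefined behaviour. */
static uint32_t shl32_checked(uint32_t value, unsigned shift_count)
{
    const unsigned width = CHAR_BIT * sizeof value;  /* 32 for uint32_t */

    if (shift_count >= width)
        return 0;               /* every bit has been shifted out */
    return value << shift_count;
}
```

With a helper like this, shifting by 24+8 in one step and shifting by 24 and then by 8 both give 0, whereas masking a count of 33 with 0x1F would silently turn it into a shift by 1.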

brucehoult · « **Reply #58 on:** April 24, 2017, 11:56:18 pm »

Quote from: rstofer on April 24, 2017, 08:11:46 pm

It will go to 0 and be caught in the following code

Code: [Select]
shift_count &= 0x1f;
if (shift_count > 0) {
  value = value << shift_count;
}
else {
  /* do whatever you want when shift_count == 0 */
}

Yeah, nah. You might want to check for greater than zero *before* masking it.

Quote

There is no point in shifting a 32 bit quantity 32+ bits to the left. You can set the value to 0 or trap on an erroneous shift_count.

Code like the following is VERY common:

Code: [Select]

v = (u<<n) | (u>>(32-n))

It's a left rotate.

Incidentally, it works perfectly regardless of whether u<<32 and u>>32 give 0 or u :-) (quiz: why?)

You'd better hope that n isn't outside the range 0..32 though, unless you've got a CPU that only uses the lower bits. If in any doubt, use (n%32) in both places -- the compiler will optimize it away if it's not needed. And if it *is* needed ... then you need it.
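
A minimal sketch of the "(n%32) in both places" form, assuming a 32-bit unsigned operand. The function name rotl32 is made up for illustration. Reducing both counts modulo 32 means neither shift can ever be asked to move 32 or more bits, so the result does not depend on what the CPU does with out-of-range counts.

```c
#include <stdint.h>

/* Rotate u left by n bits. Both shift counts are reduced modulo 32,
   so they always stay in the range 0..31. */
static uint32_t rotl32(uint32_t u, unsigned n)
{
    n %= 32;
    return (u << n) | (u >> ((32 - n) % 32));
}
```

For n = 0 both halves are simply u, so the OR gives u back unchanged; a count of 40 behaves like a count of 8.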

Bruce Abbott · « **Reply #59 on:** April 25, 2017, 03:12:04 am »

Quote from: rstofer on April 24, 2017, 09:20:55 pm

Why would you want to shift a 32 bit value more than 31 places?

'Why' is irrelevant. The question is: if you want to do that, how do you do it in the language you are using? In assembler you can shift as many times as you like and the result is well defined. In C it isn't, so you have to trap or avoid out-of-range values. That's the price you pay for efficient portability.

Quote from: tggzzz

Normal users shouldn't have to read the formal definition.

So you think 'normal' users should be able to just wing it and hope for the best? No wonder there's so much crap code out there (and crap coders).

Speaking of which... how often have you looked at some new language or platform touted as being more powerful and easier to use, only to discover that the 'formal definitions' are either suspiciously terse or non-existent? But that's how it is these days - devices are made 'user-friendly' so you don't need a manual to use them - then you spend hours trying to figure out how to use them properly (and never really knowing whether you are doing it right).

IanB · « **Reply #60 on:** April 25, 2017, 05:24:56 am »

Quote from: Bruce Abbott on April 25, 2017, 03:12:04 am

In assembler you can shift as many times as you like and the result is well defined. In C it isn't so you have to trap or avoid out-of-range values.

If I remember previous comments in the thread, then apparently not. If the hardware masks off all but the lower 5 bits of the shift argument before applying it, then even in assembly the results may not be what you hoped for.

This may be well defined by the hardware specification, but fewer people read the hardware manual than read the compiler manual...

westfw · « **Reply #61 on:** April 25, 2017, 05:50:53 am »

Quote

Code like the following is VERY common:
v = (u<<n) | (u>>(32-n))
It's a left rotate.

Clearly what we need is a language with a native "rotate" operator!
(I'm amused that if this "undefined behavior" slips into someone's C "rotate" code, it's essentially because they were avoiding using assembly language (where there was probably a native "rotate" instruction that would have worked better.))

Quote

[whine, whine, C is such an awful language]

Someone asked for a credible replacement a while back. I haven't seen any actual suggestions...

My latest complaint is that some C compilers have stopped doing what I ask them to, because (apparently) the compiler writers have decided that they'd rather use those "undefined behavior" sections to "teach us a lesson", rather than have code follow the expected behavior.
I mean, if you do embedded programming, you'd expect:

Code: [Select]

    for (uint8_t count=1; count != 0; count++) {
       // Stuff
    }

to terminate after about 255 loops, right? Not be optimized into an infinite loop because "integer overflow behavior is undefined and the optimizer decided that you'll never hit zero by incrementing an unsigned variable" ? Ha hah! Surprise!

Code: [Select]

for (uint8_t i=0x80; i != 0; i>>=1) {}

Isn't generating the obvious code, either :-(
( http://www.avrfreaks.net/forum/avr-gcc-creating-loop-counters-out-nothing-no-reason )

tggzzz · « **Reply #62 on:** April 25, 2017, 06:35:15 am »

Quote from: Bruce Abbott on April 25, 2017, 03:12:04 am

Quote from: tggzzz
Normal users shouldn't have to read the formal definition.
So you think 'normal' users should be able to just wing it and hope for the best? No wonder there's so much crap code out there (and crap coders).

Ideally yes, since the code's behaviour would be obvious from looking at it. That is obviously possible with languages like LISP and Forth, since they are so simple; other languages approach that ideal.
For most languages that won't be possible, but the formal definition should be readable and understandable.

With C/C++, the standards are so dense, impenetrable and incomprehensible that even the language designers have arguments about what various sections mean and whether or not one section conflicts with another.[1] Mere mortals - including compiler writers - are caught in the crossfire. The result is "operational semantics", where you have to run each compiler and each compiler version to find out what it actually does.

Quote

Speaking of which... how often have you looked at some new language or platform touted as being more powerful and easier to use, only to discover that the 'formal definitions' are either suspiciously terse or non-existent? But that's how it is these days - devices are made 'user-friendly' so you don't need a manual to use them - then you spend hours trying to figure out how to use them properly (and never really knowing whether you are doing it right).

Well, .... yes

It isn't a new phenomenon. As Tony Hoare (Quicksort, CSP, synchronisation/monitors, etc) wryly quipped long ago about one of the first high level languages, Algol-60 was an improvement on many of its successors.

[1] at first the design committee didn't believe the C++ template language was Turing-complete; they hadn't intended that was the case! Then their noses were rubbed in that fact when they saw a simple program that could never finish being compiled - because the compiler was (very slowly) emitting the sequence of prime numbers during compilation. The language itself was out of control.

brucehoult · « **Reply #63 on:** April 25, 2017, 07:05:41 am »

Quote from: Bruce Abbott on April 25, 2017, 03:12:04 am

Quote from: tggzzz
Normal users shouldn't have to read the formal definition.
So you think 'normal' users should be able to just wing it and hope for the best? No wonder there's so much crap code out there (and crap coders).

JavaScript has got to be the worst offender here! The semantic fuckups in it are legion.

brucehoult · « **Reply #64 on:** April 25, 2017, 07:13:00 am »

Quote from: westfw on April 25, 2017, 05:50:53 am

Quote
Code like the following is VERY common:
v = (u<<n) | (u>>(32-n))
It's a left rotate.

Clearly what we need is a language with a native "rotate" operator!
(I'm amused that if this "undefined behavior" slips into someone's C "rotate" code, it's essentially because they were avoiding using assembly language (where there was probably a native "rotate" instruction that would have worked better.))

Any decent compiler in the last 20 years will in fact turn that into a rotate instruction, if one exists.

Quote

I mean, if you do embedded programming, you'd expect:
Code: [Select]
for (uint8_t count=1; count != 0; count++) {
   // Stuff
}

to terminate after about 255 loops, right? Not be optimized into an infinite loop because "integer overflow behavior is undefined and the optimizer decided that you'll never hit zero by incrementing an unsigned variable" ? Ha hah! Surprise!

Maybe better read the manual :-) That will work fine. Overflow for unsigned integers is well defined. They are N bit binary modulo arithmetic, whether the underlying CPU is binary or not. Guaranteed. Decimal or trinary or whatever computers have to work harder to make shifts and & and | and ^ and ~ work the same as in binary.

It's *signed* integers where it's undefined.
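
A small sketch of the difference, using plain unsigned and int. The function names are hypothetical. The unsigned increment simply wraps; for the signed case the usual approach is to test for overflow before adding, since the addition itself must never overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic wraps modulo UINT_MAX + 1, so this is well
   defined even when x is UINT_MAX (the result is then 0). */
static unsigned wrapping_inc(unsigned x)
{
    return x + 1u;
}

/* Signed overflow is undefined, so check the operands first.
   Returns true when a + b would not fit in an int. */
static bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}
```

The comparisons in add_would_overflow are arranged so that the subtractions on the right-hand side cannot themselves overflow.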

NorthGuy · « **Reply #65 on:** April 25, 2017, 01:53:06 pm »

Quote from: westfw on April 25, 2017, 05:50:53 am

I mean, if you do embedded programming, you'd expect:
Code: [Select]
for (uint8_t count=1; count != 0; count++) {
   // Stuff
}

to terminate after about 255 loops, right? Not be optimized into an infinite loop because "integer overflow behavior is undefined and the optimizer decided that you'll never hit zero by incrementing an unsigned variable" ? Ha hah! Surprise!

This is not accurate because

"A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type". (C99 6.2.5.9)

To nitpick, this citation doesn't apply here because of the integer promotion. When "count++" is calculated, the value of "count" must be fetched, converted to "int" (this is called integer promotion), then 1 is added and the result is assigned back to "count". While assigning the value to "count", the result is converted to the type of "count" using this rule:

"if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type".(C99 6.3.1.3.2)

Thus, 255+1=256 should be converted to 0 when assigned to "count" (which is uint8_t), producing the exact result you would expect.

Therefore, the code should work as expected. Optimizing it to infinite loop would be against C99.
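
The two steps can be sketched separately. This is only an illustration with made-up function names, assuming the usual case where int is wider than 8 bits: the addition happens in int, and the conversion back to uint8_t is what brings 256 down to 0.

```c
#include <stdint.h>

/* count + 1 is evaluated after promoting count to int,
   so for count == 255 the result is 256, not 0. */
static int promoted_sum(uint8_t count)
{
    return count + 1;
}

/* Storing the result back into a uint8_t reduces it modulo 256. */
static uint8_t incremented(uint8_t count)
{
    count++;
    return count;
}

/* Counts the passes through the loop from the quoted post. */
static unsigned loop_passes(void)
{
    unsigned passes = 0;
    for (uint8_t count = 1; count != 0; count++)
        passes++;
    return passes;
}
```

Here promoted_sum(255) is 256, incremented(255) is 0, and loop_passes() returns 255.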

The development of the C standard took a long time and a lot of thinking, so it is very good at producing the most reasonable behaviour for the vast majority of the situations.

rstofer · « **Reply #66 on:** April 25, 2017, 03:05:09 pm »

The shift count for the ARM barrel shifter is just 5 bits and considered unsigned. Shifts of 0..31 are allowed. Masking the count with 0x1F makes a lot of sense even though the hardware is going to do it anyway. At the very least, it documents the programmer's intent. I haven't looked but I would expect a similar hardware limitation on every CPU that implements a multi-bit shift.

tggzzz · « **Reply #67 on:** April 25, 2017, 03:58:34 pm »

Quote from: NorthGuy on April 25, 2017, 01:53:06 pm

The development of the C standard took a long time and a lot of thinking, so it is very good at producing the most reasonable behaviour for the vast majority of the situations.

Yes, but... You need to define the criteria by which "most reasonable" is determined.

Is it so that it caters for mainstream and baroque machine architectures?
Is it so that non-expert users can understand and predict what will happen?
Is it so that compiler and tool writers can understand and predict what will happen?
Is it so that the pre-existing compilers don't have to radically change the code they generate?
Is it so that the committee can agree they understand the standard, and that the standard is finished?
etc etc.

GeorgeOfTheJungle · « **Reply #68 on:** April 25, 2017, 04:16:52 pm »

Quote from: rstofer on April 25, 2017, 03:05:09 pm

The shift count for the ARM barrel shifter is just 5 bits and considered unsigned. Shifts of 0..31 are allowed. Masking the count with 0x1F makes a lot of sense even though the hardware is going to do it anyway. At the very least, it documents the programmer's intent. I haven't looked but I would expect a similar hardware limitation on every CPU that implements a multi-bit shift.

Why would you want to do that? Why mask with 0x1f (or %32)? How is shifting say 32 times/bits to the left the same as not shifting? Or shifting 33 times/bits to the left the same as shifting once?

TNorthover · « **Reply #69 on:** April 25, 2017, 04:49:45 pm »

Quote from: rstofer on April 25, 2017, 03:05:09 pm

The shift count for the ARM barrel shifter is just 5 bits and considered unsigned.

If only it was that simple. If the amount is variable then what you actually get is "x << (amt % 256)". Unless it's 64-bit, in which case what you said holds. Unless it's actually a NEON instruction (both 32-bit and 64-bit), in which case it's the byte that matters again but it's signed and you get a right shift if it's negative (either arithmetic or logical, depending on preference).

Basically it's a complete mess and that's why no-one should be relying on the behaviour of out of range shifts in portable code (or ever, really).

NorthGuy · « **Reply #70 on:** April 25, 2017, 04:52:29 pm »

Quote from: tggzzz on April 25, 2017, 03:58:34 pm

Yes, but... You need to define the criteria by which "most reasonable" is determined.

Is it so that it caters for mainstream and baroque machine architectures?
Is it so that non-expert users can understand and predict what will happen?
Is it so that compiler and tool writers can understand and predict what will happen?
Is it so that the pre-existing compilers don't have to radically change the code they generate?
Is it so that the committee can agree they understand the standard, and that the standard is finished?
etc etc.

It is so that it hinders you the least.

hamster_nz · « **Reply #71 on:** April 25, 2017, 09:26:32 pm »

I'm thinking that the rule "a shift by n bits or more is undefined" came later on in life for C.

The intent of shift operations is clear
: 'there was n bits in the bed, and the little one said "Roll over, roll over", so they all rolled over and one fell out'

At the time of the early versions of C most CPUs would not have had a barrel shifter, so "n<<6" would end up as 6 "shift left" opcodes or a small loop, or maybe a CISC instruction that took 'quite a few' cycles to execute. Hence even within the 80x86 family the behavior varies.

"The C Programming Language" (edition 2) makes no reference to masking the shift count.

I can't be bothered to research it, but I would bet that when barrel shifters became common it broke code, and the standard was updated to note this as "undefined" behavior.

The whole "mask the shift operand" is just what hardware does, it isn't what it should do. It should do this:

Code: [Select]

if(n > 31) /* Protect against broken barrel shifters */
  a = 0;
else
  a <<= n;

IanB · « **Reply #72 on:** April 25, 2017, 09:35:55 pm »

Quote from: NorthGuy on April 25, 2017, 01:53:06 pm

To nitpick, this citation doesn't apply here because of the integer promotion. When "count++" is calculated, the value of "count" must be fetched, converted to "int" (this is called integer promotion) ...

Can you explain why count must be converted to int? Why can the ++ operator not be applied directly to an unsigned char type?

hamster_nz · « **Reply #73 on:** April 25, 2017, 09:57:47 pm »

Quote from: IanB on April 25, 2017, 09:35:55 pm

Quote from: NorthGuy on April 25, 2017, 01:53:06 pm
To nitpick, this citation doesn't apply here because of the integer promotion. When "count++" is calculated, the value of "count" must be fetched, converted to "int" (this is called integer promotion) ...

Can you explain why count must be converted to int? Why can the ++ operator not be applied directly to an unsigned char type?

It is a reflection of the intent of C's promotion rules, which aim to use the native word size of the platform.

If the range of the type being used fits into an 'int', then it is computed as an 'int'.

A contrived example being:

Code: [Select]

uint8_t a = 16;
int8_t b = -128;
int8_t c = -64;
a = a * b / c;   /* evaluated as int: 16 * -128 = -2048, then -2048 / -64 = 32 */

Will set a to 32

So the value of "(count++)" will be an int, even when "count" is a 'unsigned char'.

NorthGuy · « **Reply #74 on:** April 25, 2017, 09:59:59 pm »

Quote from: IanB on April 25, 2017, 09:35:55 pm

Can you explain why count must be converted to int? Why can the ++ operator not be applied directly to an unsigned char type?

All operations on arithmetic operands which are shorter than "int" require integer promotion - they are first converted to "int" (or "unsigned int") then operated upon.

"If an int can represent all values of the original type (as restricted by the width, for a
bit-field), the value is converted to an int ; otherwise, it is converted to an unsigned
int . These are called the integer promotions". (C99 6.3.1.1.2)

Of course, in real life, the operations may be done without promotion, but only if the result is exactly the same as if the promotion has been applied.
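
A short sketch of what that means for two uint8_t operands, with hypothetical function names and assuming int is wider than 8 bits. The sum is formed in int, and only an explicit or implicit conversion back to the narrow type reduces it.

```c
#include <stdint.h>

/* Both operands are promoted to int before the addition,
   so 200 + 100 gives 300 here rather than wrapping. */
static int sum_as_int(uint8_t a, uint8_t b)
{
    return a + b;
}

/* Converting the int result back to uint8_t reduces it modulo 256,
   so 200 + 100 becomes 44. */
static uint8_t sum_as_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

A compiler is free to use an 8-bit add for sum_as_u8, because the result is the same as if the promotion had been applied.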

hamster_nz · « **Reply #75 on:** April 25, 2017, 10:07:38 pm »

Quote from: westfw on April 25, 2017, 05:50:53 am

I mean, if you do embedded programming, you'd expect:
Code: [Select]
for (uint8_t count=1; count != 0; count++) {
   // Stuff
}

to terminate after about 255 loops, right? Not be optimized into an infinite loop because "integer overflow behavior is undefined and the optimizer decided that you'll never hit zero by incrementing an unsigned variable" ? Ha hah! Surprise!

Um, it does for me... (at least on GCC):

Code: [Select]

$ cat check3.c
#include <stdio.h>

int main(int argc, char *argv[])
{
    for (unsigned char count=1; count != 0; count++) {
      printf("x\n");
    }
}
$ ./check3| wc -l
255
$

Bruce Abbott · « **Reply #76 on:** April 26, 2017, 12:56:03 am »

Quote from: IanB on April 25, 2017, 05:24:56 am

If the hardware masks off all but the lower 5 bits of the shift argument before applying it then even in assembly the results may not be what you hoped for.

Competent assembly language programmers know what the instructions do, and don't hope for results that are impossible.

tggzzz · « **Reply #77 on:** April 26, 2017, 07:14:35 am »

Quote from: Bruce Abbott on April 26, 2017, 12:56:03 am

Quote from: IanB on April 25, 2017, 05:24:56 am
If the hardware masks off all but the lower 5 bits of the shift argument before applying it then even in assembly the results may not be what you hoped for.
Competent assembly language programmers know what the instructions do, and don't hope for results that are impossible.

Indeed. But they shouldn't have to. C is the only language I've used where I've found it beneficial to look at the compiler's output to see if it is doing what I intend.

Now I quite enjoy doing that, but that should only be necessary to see if the output hasn't been pessimised, not to check correctness.

brucehoult · « **Reply #78 on:** April 26, 2017, 01:20:54 pm »

Quote from: tggzzz on April 26, 2017, 07:14:35 am

Quote from: Bruce Abbott on April 26, 2017, 12:56:03 am
Quote from: IanB on April 25, 2017, 05:24:56 am
If the hardware masks off all but the lower 5 bits of the shift argument before applying it then even in assembly the results may not be what you hoped for.
Competent assembly language programmers know what the instructions do, and don't hope for results that are impossible.

Indeed. But they shouldn't have to. C is the only language I've used where I've found it beneficial to look at the compiler's output to see if it is doing what I intend.

Now I quite enjoy doing that, but that should only be necessary to see if the output hasn't been pessimised, not to check correctness.

If the assembly language output is incorrect according to the C specification then submit a bug report to your compiler vendor.

If the assembly language output is correct according to the C specification but incorrect according to your notions of what your ideal language *should* do then I suggest you use a programming language more appropriate to your needs. Java or C# might be it. Or D. Or Go. Or a Lisp. Or Python. There are many many languages designed to provide safety and "mathematical" results.

C is a very sharp but incredibly useful knife. Skilled people should be allowed sharp knives if they want them and know how to use them safely.

Alas, many people who think they know, don't. And don't want to learn.

Kalvin · « **Reply #79 on:** April 26, 2017, 01:26:49 pm »

Quote from: tggzzz on April 26, 2017, 07:14:35 am

Quote from: Bruce Abbott on April 26, 2017, 12:56:03 am
Quote from: IanB on April 25, 2017, 05:24:56 am
If the hardware masks off all but the lower 5 bits of the shift argument before applying it then even in assembly the results may not be what you hoped for.
Competent assembly language programmers know what the instructions do, and don't hope for results that are impossible.

Indeed. But they shouldn't have to. C is the only language I've used where I've found it beneficial to look at the compiler's output to see if it is doing what I intend.

Now I quite enjoy doing that, but that should only be necessary to see if the output hasn't been pessimised, not to check correctness.

Me too. I typically take a look at the compiler generated assembly output to see what is going on and how well I can express my intentions for the compiler. Nowadays compilers are pretty good at optimizing so it is also useful to see how the compilers performed the optimization.

tggzzz · « **Reply #80 on:** April 26, 2017, 01:43:41 pm »

Quote from: brucehoult on April 26, 2017, 01:20:54 pm

If the assembly language output is correct according to the C specification but incorrect according to your notions of what your ideal language *should* do then I suggest you use a programming language more appropriate to your needs. Java or C# might be it. Or D. Or Go. Or a Lisp. Or Python. There are many many languages designed to provide safety and "mathematical" results.

Alas, many people who think they know, don't. And don't want to learn.

Quite. In this case it is that none of those languages provide safety, and none of them provide "mathematical" results.

I like tools that help me concentrate on my problems. I dislike tools that force me to concentrate on them rather than my problem.

I like draining swamps, not being up to my neck in alligators.

NorthGuy · « **Reply #81 on:** April 26, 2017, 05:40:36 pm »

Quote from: Kalvin on April 26, 2017, 01:26:49 pm

Me too. I typically take a look at the compiler generated assembly output to see what is going on and how well I can express my intentions for the compiler. Nowadays compilers are pretty good at optimizing so it is also useful to see how the compilers performed the optimization.

I never do this. If I want to get particular assembly, I use assembler. It's sort of silly to tweak C code into generating the assembler output you want while you can do it better and faster without the C compiler.

If I decided to use C, I don't care about assembler.

tggzzz · « **Reply #82 on:** April 26, 2017, 06:40:43 pm »

Quote from: NorthGuy on April 26, 2017, 05:40:36 pm

Quote from: Kalvin on April 26, 2017, 01:26:49 pm
Me too. I typically take a look at the compiler generated assembly output to see what is going on and how well I can express my intentions for the compiler. Nowadays compilers are pretty good at optimizing so it is also useful to see how the compilers performed the optimization.

I never do this. If I want to get particular assembly, I use assembler. It's sort of silly to tweak C code into generating the assembler output you want while you can do it better and faster without the C compiler.

If I decided to use C, I don't care about assembler.

That's a reasonable conceptual approach.

But it does bring the requirement for you to accurately inform the C compiler about what it can and can't presume about your assembler code, particularly w.r.t. optimisation. I don't know how the hell you can not only ensure but also assure that has been done adequately. N.B. assure != ensure, assure != insure, ensure != insure (although many nominally English speakers get the last wrong).

Bruce Abbott · « **Reply #83 on:** April 26, 2017, 07:18:25 pm »

Quote from: tggzzz on April 26, 2017, 07:14:35 am

I've found it beneficial to look at the compiler's output to see if it is doing what I intend.

Now I quite enjoy doing that, but that should only be necessary to see if the output hasn't been pessimised, not to check correctness.

When I am programming in a high level language I 'think' in that language, so looking at the assembler output isn't very useful. If it isn't what I expect then either the source code is wrong (most likely), or the compiler is buggy (unlikely) or what I am trying to do isn't suited to that language. I have written a few PC programs in C and FreeBASIC, and wouldn't dream of looking at the machine code output. Just trying to understand it would be a nightmare, let alone determining if it was doing what I wanted!

Quote from: NorthGuy

If I want to get particular assembly, I use assembler.

Same here. If I want efficiency I use assembler. If I just want to bang some code out with the least effort I use whatever language does it for me best. No point agonizing over how efficient your compiler is if it gets the job done.

Quote

It's sort of silly to tweak C code into generating the assembler output you want while you can do it better and faster without the C compiler.

It does seem a bit silly, however I can see the appeal of being able to do everything in the same language.

Kalvin · « **Reply #84 on:** April 26, 2017, 07:47:38 pm »

Quote from: NorthGuy on April 26, 2017, 05:40:36 pm

Quote from: Kalvin on April 26, 2017, 01:26:49 pm
Me too. I typically take a look at the compiler generated assembly output to see what is going on and how well I can express my intentions for the compiler. Nowadays compilers are pretty good at optimizing so it is also useful to see how the compilers performed the optimization.

I never do this. If I want to get particular assembly, I use assembler. It's sort of silly to tweak C code into generating the assembler output you want while you can do it better and faster without the C compiler.

If I decided to use C, I don't care about assembler.

In the embedded systems in which the code size and RAM size are limited one typically needs to be very careful in how to write the code. Writing code for PC or Linux-based embedded system with lots of resources available, I do not really bother looking at the compiler output. When I really need to optimize the code for speed, I might take a look at the compiler output and try to figure out how the compiler will be able to create more optimized code.

grumpydoc · « **Reply #85 on:** April 26, 2017, 08:18:08 pm »

Quote from: NorthGuy on April 26, 2017, 05:40:36 pm

I never do this. If I want to get particular assembly, I use assembler. It's sort of silly to tweak C code into generating the assembler output you want while you can do it better and faster without the C compiler.

If I decided to use C, I don't care about assembler.

I have done this but as part of an effort to reverse engineer code which I knew had been built with a particular compiler. I knew roughly what the C code would look like based on the assembler, then tweaked it until the output matched exactly.

I briefly became interested in whether the process could be used to build a decompiler but quickly realised that it was a non starter as it required too much wetware input to get to the first approximation.

tggzzz · « **Reply #86 on:** April 26, 2017, 09:24:35 pm »

Quote from: Bruce Abbott on April 26, 2017, 07:18:25 pm

Quote from: tggzzz on April 26, 2017, 07:14:35 am
I've found it beneficial to look at the compiler's output to see if it is doing what I intend.

Now I quite enjoy doing that, but that should only be necessary to see if the output hasn't been pessimised, not to check correctness.
When I am programming in a high level language I 'think' in that language, so looking at the assembler output isn't very useful. If it isn't what I expect then either the source code is wrong (most likely), or the compiler is buggy (unlikely) or what I am trying to do isn't suited to that language. I have written a few PC programs in C and FreeBASIC, and wouldn't dream of looking at the machine code output. Just trying to understand it would be a nightmare, let alone determining if it was doing what I wanted!

Think of embedded systems where you need to be sure peripherals' registers have to be peeked/poked in very specific ways.

Think of embedded systems where you are cycle counting some parts of the code, and have to exclude unexpected code bloat in others.

Think of SMP systems where you have to ensure and assure multiprocessors, memory and caches interact correctly.

Think of the useful motto w.r.t both your tools and your understanding of them, "trust, but verify".

hamster_nz · « **Reply #87 on:** April 26, 2017, 09:51:57 pm »

I used to do a lot of looking at the compiler output, to see how I could influence the speed of code through styles used in the high level language.

However, if you do it for a while you get a feel for what does and what doesn't work, and you can look at code and go "well, that looks slow", "can I avoid this branch?", "how can I help this loop unroll", "if I align this data with that" and "I could speed this up by...", and you can get 50% the gain for 10% of the pain. I now only look at the generated code if I get something that is either spectacularly good or spectacularly bad.

Writing better performing code for me is now 80% of doing my job better (e.g. find a better way to do something), 19% of using the source to coerce the compiler to do a better job of code generation, and 1% of targeted optimization.

tggzzz · « **Reply #88 on:** April 27, 2017, 06:59:32 am »

Quote from: evb149 on April 27, 2017, 03:39:36 am

A lot of "High Performance Computing" HPC codes are written in languages like C++, C, sometimes even JAVA or Python simply because the compilers are pretty good compared to manually written ASM, in the few cases where you have an extreme bottleneck that benefits highly from ultra optimization there are libraries for just those core performance primitives, and the rest of the program can stay portable and high level.

For those interested in HPC, have a look at http://people.ds.cam.ac.uk/nmm1/ Of particular relevance to this thread are the sections on what you have to do to avoid "surprises" with C/C++, and also the tenuous relationships between computer arithmetic and maths. It is not beginner material. It is based on personal experience since the 1960s.

Quote

It is a bit surprising that there aren't more "formal methods" used to express algorithms and data structures and state machines and so on though so that more such "boilerplate" patterns and program elements can actually be "provably correct" and also more easily analyzable with respect to side effects, execution time, dependencies, hazards, error cases, etc. When you look at "design patterns" you see a lot of them implemented in high level languages but not so much integration of them into almost "atomic" memes / atoms of program construction so that you can sort of take the HLL out of the picture for those "elemental blocks / patterns" and then have a HLL that is more designed around working at the levels in between and above those pattern elements.

Yes and no.

"State space explosion" and the halting problem are "issues" for FSMs. In other cases the formal methods are more difficult than the code, and even if you do something useful there are two problems: ensuring correct mapping to a program, and interfacing with the non-formal stuff outside the proof.

As for design patterns, it is usually easier to throw something together, do a few unit tests to get a green light, ship the product and get customer service to deny there are intermittent problems.

If you are interested in this topic, have a look at xC on the XMOS processors. It enables CSP/Occam style programming and interfacing with external devices on multicore processors, and the development environment will tell you exactly how fast each block will execute.

For example, there is hardware and language support to allow output to occur on a specific (500MHz) clock cycle, and to wait for inputs to change, recording the new value and the clock cycle at which the change occurred.

Kalvin · « **Reply #89 on:** April 27, 2017, 07:34:02 am »

Quote from: evb149 on April 27, 2017, 03:39:36 am

Anyone do anything interesting with code generation or domain specific languages or metaprogramming for embedded or otherwise?

I have thought about using a Lisp-like language as an intelligent C macro preprocessor with hygienic macros and code generation. Simple code generation, such as iterators and linked list operations, is something I would like to do. Unfortunately my Lisp is too weak for the task at the moment. Also, separating the preprocessor from the compiler makes it quite difficult to create a good code generator that emits code depending on the data types: extracting the data types from the source code would require a complex parser. The best option would be to integrate the preprocessor into the compiler, so that the data type information could come from the compiler's parser.
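
For comparison, this is roughly what the plain C preprocessor can manage for linked lists. It is only a sketch: `DEFINE_LIST` and the generated names are made up for this example. Note that the macro cannot discover the element type by itself, so the caller has to pass both the type and a name prefix by hand, which is exactly the missing type information mentioned above.

```c
#include <stdlib.h>

/* DEFINE_LIST(T, NAME) generates, for element type T:
 *   NAME_node    a singly linked node holding one T
 *   NAME_push    prepend a value; returns the new head, or the old head
 *                unchanged if allocation fails
 *   NAME_length  count the nodes
 *   NAME_free    release the whole list
 */
#define DEFINE_LIST(T, NAME)                                          \
    typedef struct NAME##_node {                                      \
        T value;                                                      \
        struct NAME##_node *next;                                     \
    } NAME##_node;                                                    \
                                                                      \
    NAME##_node *NAME##_push(NAME##_node *head, T value)              \
    {                                                                 \
        NAME##_node *n = malloc(sizeof *n);                           \
        if (n == NULL) {                                              \
            return head;                                              \
        }                                                             \
        n->value = value;                                             \
        n->next = head;                                               \
        return n;                                                     \
    }                                                                 \
                                                                      \
    size_t NAME##_length(const NAME##_node *head)                     \
    {                                                                 \
        size_t len = 0;                                               \
        for (; head != NULL; head = head->next) {                     \
            len++;                                                    \
        }                                                             \
        return len;                                                   \
    }                                                                 \
                                                                      \
    void NAME##_free(NAME##_node *head)                               \
    {                                                                 \
        while (head != NULL) {                                        \
            NAME##_node *next = head->next;                           \
            free(head);                                               \
            head = next;                                              \
        }                                                             \
    }

/* One expansion per element type; each produces its own set of functions. */
DEFINE_LIST(int, int_list)
DEFINE_LIST(double, dbl_list)
```

After these two expansions, `int_list_push(NULL, 42)` and `dbl_list_push(NULL, 2.5)` are ordinary, separately typed functions. Anything smarter, such as choosing a comparison or print routine from the type, is beyond what the preprocessor can see.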

tggzzz · « **Reply #90 on:** April 27, 2017, 08:04:10 am »

Quote from: evb149 on April 27, 2017, 03:39:36 am

Anyone do anything interesting with code generation or domain specific languages or metaprogramming for embedded or otherwise?

DSLs are superficially appealing, but they suffer from a number of disadvantages:

- difficult to employ someone to use/maintain programs written in them; "5 years FrobDSL experience" on your CV counts for little
- no tool support (e.g. browsing, syntax colouring, command completion) unless you write it yourself; vi and grep don't cut the mustard any more
- they start out simple, but end up with cancerous growths that nobody understands; completely unlike C++, of course

A well-written library in a standard language has none of those problems.

NorthGuy · « **Reply #91 on:** April 27, 2017, 04:35:07 pm »

Quote from: evb149 on April 27, 2017, 03:39:36 am

With a good and "familiar" optimizing compiler one doesn't have to write particularly "tight" C code to get tight machine language. In fact one can rely very heavily on the compiler's optimizations and write extremely "simple" and "verbose" C code. One just can take it for granted that the compiler will analyze the dependencies and flow and constant / variable parts of the data in a given block and optimize accordingly.

It is true that if you write bad code the compiler can improve it with optimizations. Actually, the worse your code is, the more the compiler can do. The opposite is also true: if your code is already good, the compiler cannot improve it as much.

However, the difference between good and bad code begins long before the first line is written. There's a very important planning and thinking phase. As a result, good code may use a completely different data structure and have a different flow. Good code usually comes out tight (there is less of it), simple (effort was spent simplifying it), verbose (easy to understand), and reliable (less room for bugs) at the same time. Bad code is likely to be bigger, overcomplicated, hard to understand and therefore buggy.

The compiler has to preserve the functionality as it is written. Therefore, there's absolutely no way a compiler can turn bad code into good code. The idea that it can is a misconception.
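
A small sketch of what I mean, with made-up function names. Both functions answer the same question (does the array contain a duplicate?). An optimizer will usually tighten the loops of either one, but as far as I know no mainstream compiler will replace the first algorithm with the second; that choice is made in the planning phase.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Compares every pair: O(n^2) comparisons regardless of optimization level. */
bool has_duplicate_quadratic(const int *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (v[i] == v[j]) {
                return true;
            }
        }
    }
    return false;
}

/* qsort comparison callback that avoids overflow from subtracting. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts a copy, then scans neighbours: typically O(n log n) for the sort
 * plus one linear pass, at the cost of a temporary buffer. */
bool has_duplicate_sorted(const int *v, size_t n)
{
    if (n < 2) {
        return false;
    }
    int *copy = malloc(n * sizeof *copy);
    if (copy == NULL) {
        /* Out of memory: fall back to the slower method. */
        return has_duplicate_quadratic(v, n);
    }
    memcpy(copy, v, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_int);

    bool found = false;
    for (size_t i = 1; i < n; i++) {
        if (copy[i - 1] == copy[i]) {
            found = true;
            break;
        }
    }
    free(copy);
    return found;
}
```

For small arrays the quadratic version may well be the faster one, which is another decision the compiler cannot make for you.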

nctnico · « **Reply #92 on:** April 27, 2017, 05:05:08 pm »

Quote from: Kalvin on April 27, 2017, 07:34:02 am

Quote from: evb149 on April 27, 2017, 03:39:36 am
Anyone do anything interesting with code generation or domain specific languages or metaprogramming for embedded or otherwise?

I have thought about using a Lisp-like language as an intelligent C macro preprocessor with hygienic macros and code generation. Simple code generation, such as iterators and linked list operations, is something I would like to do. Unfortunately my Lisp is too weak for the task at the moment.

I wouldn't go that route. Your projects will be completely unmaintainable because the next person needs to understand two languages AND how you split things between them. Besides, going this route shows a lack of understanding of the programming language and an inability to look for existing solutions. Throw in a bit of the 'not invented here' syndrome and the chaos is complete. If you want iterators, linked lists and so on, then use a C/C++ library which offers those and rewrite the memory allocation back-end if you have to.

Kalvin · « **Reply #93 on:** April 28, 2017, 02:23:33 pm »

Quote from: nctnico on April 27, 2017, 05:05:08 pm

Quote from: Kalvin on April 27, 2017, 07:34:02 am
Quote from: evb149 on April 27, 2017, 03:39:36 am
Anyone do anything interesting with code generation or domain specific languages or metaprogramming for embedded or otherwise?

I have thought about using a Lisp-like language as an intelligent C macro preprocessor with hygienic macros and code generation. Simple code generation, such as iterators and linked list operations, is something I would like to do. Unfortunately my Lisp is too weak for the task at the moment.

I wouldn't go that route. Your projects will be completely unmaintainable because the next person needs to understand two languages AND how you split things between them. Besides, going this route shows a lack of understanding of the programming language and an inability to look for existing solutions. Throw in a bit of the 'not invented here' syndrome and the chaos is complete. If you want iterators, linked lists and so on, then use a C/C++ library which offers those and rewrite the memory allocation back-end if you have to.

If you are using C macros, you are maintaining and creating chaos - and you are using an inferior macro language with poor capabilities and unhygienic macros.
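
A minimal sketch of what "unhygienic" means in practice. The macro and function names are made up for this example. The macro's own loop counter `i` silently captures the caller's `i`, and the usual workaround (an unlikely name) only makes a clash less probable rather than impossible.

```c
/* Unhygienic: the loop counter `i` is visible to whatever `stmt` expands to. */
#define REPEAT(n, stmt) \
    do { for (int i = 0; i < (n); i++) { stmt; } } while (0)

/* Common workaround: pick a counter name callers are unlikely to use. */
#define REPEAT_SAFER(n, stmt) \
    do { for (int repeat_i_ = 0; repeat_i_ < (n); repeat_i_++) { stmt; } } while (0)

/* Intent: start at 10 and add 10 three times, giving 40. */
int sum_with_capture(void)
{
    int i = 10;
    int total = i;
    REPEAT(3, total += i);   /* `i` here binds to the macro's counter */
    return total;            /* 10 + 0 + 1 + 2 = 13, not 40 */
}

int sum_without_capture(void)
{
    int i = 10;
    int total = i;
    REPEAT_SAFER(3, total += i);   /* `i` is the caller's variable */
    return total;                  /* 10 + 10 + 10 + 10 = 40 */
}
```

The first function compiles without complaint at default warning levels and returns the wrong answer; a hygienic macro system would rename the counter automatically so that this cannot happen.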

