Author Topic: What did you learn or decided to figure out recently? by mostly nonembedded dev  (Read 4046 times)

0 Members and 1 Guest are viewing this topic.

Offline 0db

  • Frequent Contributor
  • **
  • Posts: 336
  • Country: zm
Found a bug in a colleague's source.

if (var1) then if (var2) then do1(); else do2();
gives rise to an ambiguity in interpretation.

And a simple bracket-block solve it.
But which one is correct?

  • a) if (var1) then { if (var2) then do1(); } else do2();
  • b) if (var1) then { if (var2) then do1(); else do2(); }

Solved by a couple of empirical speculative attempts to "understand" what the original author wanted to do with the code.
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
Solved by a couple of empirical speculative attempts to "understand" what the original author wanted to do with the code.
Don't leave us with a cliffhanger!

Which language is that?
What is the language interpretation?

At least in C the meaning is completely  unambiguous.
I would not like to encounter code like that, so I don't write it.
But the meaning is not ambiguous in any way. Any editor can indent it to make it, if not nice, at least readable.
Code: [Select]
    if (var1)
        if (var2)
            do1();
        else
            do2();

At least in Forth, there's no doubt. It's confusing no matter what.  ;)
Code: [Select]
var1 IF var2 IF  do1 ELSE do2 THEN THEN
var1 IF var2 IF do1 THEN ELSE do2 THEN
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline 0db

  • Frequent Contributor
  • **
  • Posts: 336
  • Country: zm
Which language is that?

C language.

At least in C the meaning is completely  unambiguous.

I don't know, ambiguous grammar is a "context-free grammar" for which there exists a string that can have more than one leftmost derivation or parse tree, while an "unambiguous grammar" is a context-free grammar for which every valid string has a unique leftmost derivation or parse tree.

It looks the  "selection-statement = IF ( expression ) statement ELSE statement" definition in the C grammar could ambiguously be parsed because "statement" contains the "selection-statement" itself,  although, it also looks like in C the first tree is "usually" chosen by associating the else with the nearest if.

I don't know, and I am not sure  :-//

But facts are that the above C lines were "ambiguous" because when tried on two different C compilers exposed two different behaviors, until a couple of block-brackets finally fixed the ambiguity.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14479
  • Country: fr
It looks the  "selection-statement = IF ( expression ) statement ELSE statement" definition in the C grammar could ambiguously be parsed because "statement" contains the "selection-statement" itself,

Whereas I do agree this is REALLY bad style, newbrain is right in saying this is unambiguous in C.

The standard states: "An else is associated with the lexically nearest preceding if that is allowed by the syntax.", so newbrain's interpretation is right.

Of course, yes this is definitely bad style - using indentation would be a bare minimum to make this remotely readable, and using braces when dealing with nested if's is a must IMHO for readability reasons.
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
C language.
Sorry, the "then"s throw my parser off  ::).
I thought it was some scripting language (PowerShell like).

Since SiliconWizard spared me the burden of quoting the content, I'll fill in the chapter and verse: 6.8.4.1, paragraph 3.
It's the same in C99, C11 and C18, while in C89 it was phrased slightly different:
Quote
An else is associated with the lexically immediately preceding else-less if that is in the same block (but not in an enclosed block).

But facts are that the above C lines were "ambiguous" because when tried on two different C compilers exposed two different behaviors, until a couple of block-brackets finally fixed the ambiguity.
Either this was many years ago (let's say before C89*) or at least one of the two compilers was shit: even if set in a non-conforming mode (the default for many compilers!) it would be idiotic to change a semantic that's 'always' been there.
The other might have also been crap for other reasons, of course.

*Though the C89 Rationale does not mention any change in this respect vs. 'K&R' C.
« Last Edit: August 02, 2020, 04:38:31 pm by newbrain »
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline Renate

  • Super Contributor
  • ***
  • Posts: 1460
  • Country: us
Some places are strict and insist that all ifs and elses must use braces.
I find that a bit overblown in simple else-less ifs.
Code: [Select]
if (verbose) printf("Found %u\n", count);Although I tend to use braces in an both halves of an if/else if one of the branches uses braces.
And I won't give an example because then we'd just squabble over where the braces fall!
 
The following users thanked this post: newbrain

Offline 0db

  • Frequent Contributor
  • **
  • Posts: 336
  • Country: zm
using indentation would be a bare minimum to make this remotely readable

lesson learned #1: braces unconditionally reduce the resulting clutter

Lesson learned #2: indentation is for sure a premium, but it's completely vanished when the code passes through code beautifiers, and I wouldn't rely on it.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14479
  • Country: fr
Some places are strict and insist that all ifs and elses must use braces.
I find that a bit overblown in simple else-less ifs.
Code: [Select]
if (verbose) printf("Found %u\n", count);Although I tend to use braces in an both halves of an if/else if one of the branches uses braces.
And I won't give an example because then we'd just squabble over where the braces fall!

Yes and same.

Cases for which I use braces:
- Nested if, or another flow control construct: so for instance: "if () { if () ... }", or "if () { while () ... }".
- As you do, if the "then" or "else" clause uses braces, I put braces for both clauses.

As for indentation, I ALWAYS indent "then" and "else" clauses. Never any "if" on a single line, even when the "then" and/or "else" clause is very short/simple.

As a "beginner" in C, I used to do otherwise. I often wrote "if"'s on a single line, would also avoid braces.
One thing I was often writing was things like this: "for () if() ... else ...". Although it looks less "ambiguous" than nested "if"'s, I consider this bad style these days and would never write code like this anymore. In such a case, I also use braces and indentation.

 
The following users thanked this post: newbrain

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
[...] vanished when the code passes through code beautifiers, and I wouldn't rely on it.
I agree, I would also not rely on any code beautifier that gets this wrong.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14479
  • Country: fr
[...] vanished when the code passes through code beautifiers, and I wouldn't rely on it.
I agree, I would also not rely on any code beautifier that gets this wrong.

Ahah, exactly. I guess 0db meant they wouldn't rely on indentation for this reason, but I agree with you: any "code beautifier" that doesn't indent code properly according to the C grammar should definitely not be used. Worse yet, a code beautifier that would actually *remove* indentation when the original indentation makes sense would be a code "uglifier".

I don't like code beautifiers anyway. The rules they use may not fit your own rules, and very few implement rules that actually are unquestionably good style. Code style is an integral part of writing code IMHO. I don't want tools that mess with this. And if used to make up for the bad style some developers use in a given team, then I would rather get those people to use a better style than use a code beautifier. But I do admit some developers have a hard time using a good and *consistent* style. I just do think this usually reveals a deeper problem that any automated tool won't solve. Just my 2 cents.



 

Offline 0db

  • Frequent Contributor
  • **
  • Posts: 336
  • Country: zm
Just found there were also problems with the C optimizer.

while (True) { if (var) { do1(); } else { do2() }; }
got translated at the machine level into something like
if (var) { while (True) do1() } else { while (True) do2() }

since "var" is expected to be always True then do2() is expected to be dead code, kept in the source for documenting reasons, and I was seriously surprised to see it running.

Looking at the assembly code revealed the mess made by the optimizer, while turning off optimization produces the expected behavior with do1() running!

Lesson learned #3: don't trust the compiler, and first compile with optimization off.

But what I do really think? At this point I seriously think that this proprietary C/89 compiler for SuperHitachi must be rather bugged to have all of these weird behaviors, and I am going to replace it with a Gcc-SH or LLVM.
 
The following users thanked this post: newbrain

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
That, under some conditions, is a good optimization.
It saves a number (infinite) evaluations of var.

Roughly speaking, 'var' must be side effects free if it is an expression or not volatile if a simple variable.

EtA: and, importantly, the compiler must know that do1() and do2() do not change var.
« Last Edit: August 02, 2020, 06:36:33 pm by newbrain »
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14479
  • Country: fr
Just found there were also problems with the C optimizer.

while (True) { if (var) { do1(); } else { do2() }; }
got translated at the machine level into something like
if (var) { while (True) do1() } else { while (True) do2() }

Well, it looks strictly equivalent to the original code to me? But as newbrain said, factoring out the "if (var)" test is an obvious optimization here. It's equivalent, but more efficient. As long, of course, as "var" doesn't change values while this piece of code executes.

If OTOH "var" can change values during the loops, then I'd agree this optimization is wrong. But you seem to be certain "var" is always true here, so the optimization would be correct.

since "var" is expected to be always True then do2() is expected to be dead code, kept in the source for documenting reasons, and I was seriously surprised to see it running.

If the "else" part is executed, then obviously 'var' is false at some point - at least the translated code you mention here can't mean otherwise. You should probably use a debugger or something to figure out WHY do2() is executed. A conditional breakpoint on "var" becoming false would be interesting here. You might have assumed it's always true, when it may not be.

If the behavior is different between optimized and non-optimized code, then I would suspect something wrong about how "var" is handled, but not about this piece of code, which again, unless I missed something, is strictly equivalent to the original code.

The fact the compiler doesn't strip off the "else" clause seems consistent with the fact you see it running at some point. Likely the optimizer didn't detect that "var" was always true, and the execution seems to be consistent with that. It isn't.

We'd probably need to see the rest of the code, because this short piece doesn't tell us anything much, but there's something fishy with "var". Clearly.

One possible cause - just a guess as I don't know anything about "var" - would be that "var" is some global variable, that can change values but has not been qualified volatile. Then the compiler's optimizer is allowed to compile the piece of code you mentioned as you saw. But then it would mean that "var" can actually change values somewhere else - for instance inside an interrupt, or another thread, something like this. If this could be the case, then the solution would be to qualify "var" as volatile to guarantee the correct compilation of this.

If this is not the case, another possible cause, even simpler, would be a missing or incorrect initialization of "var".

Unless the optimizer is completely buggy, what you seem to witness could be explained by the above. If the optimizer factors out the "if", then it means it assumes "var" doesn't change values while the loops execute, which would be the case if "var" is just a local variable, or a global, non-volatile-qualified variable. Then if the optimizer doesn't prune the "else" clause, it further means it had no way, from static analysis, to know that "var" is always true.

All this should give you things to look at in the code.
« Last Edit: August 02, 2020, 06:15:18 pm by SiliconWizard »
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
Exactly!
Lesson #3, for reasonably decent compilers, means really: compile without optimization, then with.

If the resulting behaviour differs, what part of the semantics of C am I forgetting or abusing?

Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline 0db

  • Frequent Contributor
  • **
  • Posts: 336
  • Country: zm
The default setup for optimization did some kind of "loop unswitching" transformation, for which you have to prove that the condition will always be constant during the loop, otherwise the optimization can't be done.

The "condition" is not undefined, it's declared as =True before the loop. Thus I really don't understand, but I tried "loop unrolling" instead of "loop unswitching", and this time it worked as expected.

edit:
well, looking at the machine code, "loop unrolling" didn't produce any modification. Thus the generated code is the same as for the case of "no optimization".

It seems the CASIO SuperHitachi C compiler has some problem with "Loop unswitching", which should only duplicate the function, one with the conditional section and one without, without altering the condition logic, but in my case it appears negated by judging what I see happening.

Code: [Select]
bool condition = True;
...
while(True)
{
    if (condition) do1();
    else do2(); /* deadcode */
}
  • no optimization: it worked as expected
  • "loop unswitching": the wrong branch was executed
  • "loop unrolling": it worked as expected

It seems the compiler is able to determine that condition does not change and is perhaps inlining the function and optimizing out the "if branch", but for some reason it issued a negative logic when the condition is a variable rather than a constant.

Code: [Select]
#define condition True
...
while(True)
{
    if (condition) do1();
    else do2(); /* deadcode */
}
  • no optimization: it worked as expected
  • "loop unswitching": it worked as expected
  • "loop unrolling": it worked as expected

This time it worked. Weird. Very very weird!

Anyway, I do not need any "loop unswitching". It can improves the speed of the code but keeping the code size smaller is more important for me.
« Last Edit: August 02, 2020, 08:58:32 pm by 0db »
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
Ugh.
A C89 compiler that can get so many things wrong should be canned without a second thought.
We produce so many bugs by ourselves that we don't need any help.

SuperH seems to be a target supported by gcc.

I understand that in a professional/enterprise environment a compiler switch is not easy to bring about, but it can be a worthy battle to fight.

When I started on my second job, we were using a C++ compiler (I've removed its name from my memory) that generated slow and mostly wrong code.
It had patchy support of C++ 2.0 (this was before the C++98 standard), and was based on C++ to C translator.

The day we switched to gcc (I had escaped that project since long time  :phew:) we had an immediate doubling of performance and the disappearance of many unresolved bugs.
Nandemo wa shiranai wa yo, shitteru koto dake.
 
The following users thanked this post: 0db

Offline 0db

  • Frequent Contributor
  • **
  • Posts: 336
  • Country: zm
SuperH seems to be a target supported by gcc.

Sure it is and I have just successfully compiled a gcc toolchain for SuperH  :D

The only problem is the CASIO environment and IDE because I don't have all the source for the libraries neither the full specs nor documentation.

Padding is a particularly complex part of the C specification and interacts poorly with other parts of the language, but a new defect I noticed with the CASIO C89 compiler is that you cannot absolutely compare two structs using a comparison via memcmp because sometimes the copy of a struct does retain its padding and it's awkwardly sized and aligned.

It's not predictable, it's completely out of control, there is neither magic-flag nor #pragma to force it, and that's a serious problem for the software I am trying to port, and I have really enough.

Thus or I will switch to Gcc, or I will rewrite the code. Or perhaps ... it is even worth having both.

Lesson learned #4: an apparently "simple task" may have a lot skeletons in the closet. Never be too optimistic with anything written in C, never think that if you have two compilers and targets it will be a piece of cake switching  between them.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14479
  • Country: fr
Well, this C compiler sure looks like a dead cow. :-DD
This reminds me of the optimizers in early C compilers and their bugs...

As to structs and padding, the standard is relatively clear about that, as it's implementation-defined IIRC, and you get no guarantee about the padding's actual content (it's supposed to be unused space anyway, so you're not supposed to assume anything as to what it may contain). This makes comparing structs with memcmp() impossible, and will only work on some given implementation if you're lucky. That may suck, but I would never rely on anything regarding padding, so unfortunately, the only sure and portable way of comparing structs is comparing them member by member. Of course if you're using some pragma or attribute to define that such structs are "packed" (no padding), you can use memcmp(), but then, attributes/pragmas for "packed" structures are not necessarily available with all compilers, or may differ (although #pragma pack(...) is pretty common). Another issue with packing is that it may lead to alignment issues depending on the CPU.

So, we have two different issues there. Whereas from your examples just above, it's now clear that this was an optimizer bug and nothing wrong with the code itself, comparing structs with memcmp() is OTOH clearly not portable, and should usually be avoided.

For the first, switching to GCC is guaranteed to get you a correct result with any level of optimization, but for the struct comparison thing, I would suggest rewriting your comparisons.
 

Offline 0db

  • Frequent Contributor
  • **
  • Posts: 336
  • Country: zm
portable way of comparing structs is comparing them member by member

That's how I am rewriting the original source code (not written by me), but it's a lot of work and to be honest I felt a bit lazy at the beginning and I looked for an easier "voodoo-workarounding" with just the C compiler :D
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
Comparing structs with memcmp if of course not guaranteed to work, even in the absence of padding, depending on the target architecture and C implementation.
Classic examples are (real mode) x86 pointers with segment:offset, the same pointer can be represented by different values.
Other examples include +/-0, floating point NaN etc.

Even using equality operator (==) might have surprising results if some of the fields are floating point, and NaN or Inf are involved - these should be specially treated according to the desired outcome, using the macros in <math.h>.

Nandemo wa shiranai wa yo, shitteru koto dake.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14479
  • Country: fr
Comparing structs with memcmp if of course not guaranteed to work, even in the absence of padding, depending on the target architecture and C implementation.
Classic examples are (real mode) x86 pointers with segment:offset, the same pointer can be represented by different values.
Other examples include +/-0, floating point NaN etc.

Even using equality operator (==) might have surprising results if some of the fields are floating point, and NaN or Inf are involved - these should be specially treated according to the desired outcome, using the macros in <math.h>.

Yes, of course that was assuming that the binary comparison of all members made sense, which is not always the case.

FP numbers are a good example. Exact equality of FP numbers usually doesn't make sense anyway, so yes, '==' is to be avoided as well in most cases. You should usually compare the absolute value of the difference with some small "epsilon" instead.

Related bonus question: before dividing some FP numbers (N/D), does comparing the divisor D like so guarantee that it will never raise a divide-by-zero condition?
Code: [Select]
if (D != 0.0)
    Q = N / D;

As to pointers, I guess that's another one too. I didn't find anything in the standard that guarantees equality of pointers pointing to the same memory location, except for statically allocated objects, for which it says that pointers to a given statically allocated object should always compare equal. Although I'm not sure I found this crystal-clear in the std, I'm assuming this is true when you use the "&" operator, or for arrays. Once you mess with pointer arithmetic, I guess all odds are off strictly speaking, although it's relatively common to see pointer comparisons as some sort of "sentinel value" when going through some memory block. Whereas it usually works, I guess this may be slippery on some platforms using segmented pointers?

The usual approach to comparing structs is to write comparison functions for each struct type for which you need a comparison.
It's a general rule of course, but DO NOT duplicate code for doing this. Write functions.
« Last Edit: August 03, 2020, 04:43:21 pm by SiliconWizard »
 

Offline Renate

  • Super Contributor
  • ***
  • Posts: 1460
  • Country: us
I don't see garbage in the padding to be a problem.
If the struct is on the stack or allocated I always do a memset(0);
If it's initialized then the compiler put in zeroes in the static copy that it memcpy's.
Code: [Select]
   struct {char a; int b;} q={0xfe, 0x12345678};

   printf("%08x %08x", ((int *) &q)[0], ((int *) &q)[1]);
I can't think of many (any?) times that I compare a whole struct, usually it's just the identifying field for a match.
If I needed to do that then memcmp would be good enough for me (given that I know the padding is zero and who uses selectors anyway?).
As far as pointers go, if you're getting that far, maybe you need a deep compare to see if two structs are equivalent.
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Related bonus question: before dividing some FP numbers (N/D), does comparing the divisor D like so guarantee that it will never raise a divide-by-zero condition?
Code: [Select]
if (D != 0.0)
    Q = N / D;
It does; but not from the overflow or underflow conditions, nor does it ensure the result is finite.

In C, a floating-point division can be a divide-by-zero, overflow (result inf or -inf), or underflow (result 0 or -0).
Also, when the result is not exact (say, 3/7, which is an irrational number without exact representation in IEEE-754 Binary32 or Binary64, which current architectures tend to use for floating-point math), the inexact condition will occur, but it is irrelevant to most people.
 
The following users thanked this post: newbrain, 0db


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf