C programming puzzle

#25 Reply
Posted by Cerebus on 22 Oct, 2020 17:25
Quote from: SiliconWizard on 22 Oct, 2020 15:40
Quote from: Cerebus on 22 Oct, 2020 14:22
I'd disagree slightly, but it's all about semantics (in the everyday sense). C and the whole Wurthian set of languages are equally easy to parse, but C is harder to translate.

Whatever you include in parsing and what you mean by "translating". Please elaborate, getting into academic semantic subtleties was not the point. To me parsing is the whole thing from tokenization to getting some kind of tree which makes sense of the source code from a syntactic POV.
(As defined on Wikipedia for instance:
Quote
A parser is a software component that takes input data (frequently text) and builds a data structure – often some kind of parse tree, abstract syntax tree or other hierarchical structure, giving a structural representation of the input while checking for correct syntax. The parsing may be preceded or followed by other steps, or these may be combined into a single step.
)

Again, with C, a recursive descent approach or whatever you choose is NOT sufficient, because of the heavy dependency on context. The whole idea is that the more context-dependent a given grammar is, the more difficult it is to parse. (And obviously the more difficult it is to correctly define the grammar itself to begin with.)

The grammar for C is context-free, it has a context free grammar. The semantics are not "context free" (in a more informal sense). If you think the grammar of C is "context-dependent" then you are confusing the grammar of the language and the semantics of the language. Parsing Context Sensitive Grammars is not merely hard, some problems in parsing context sensitive grammars are provably undecidable. If C had a Context Sensitive Grammar I suspect we would still be waiting for people to write the first ever compiler for it.

Quote
Have you actually written a complete C parser? It IS a bitch. I'll give you just one example: C declarations. Try writing a parser that correctly identifies all parts of a C declaration in the general case, and make it correct. Now do the same with Pascal (or Modula, Oberon...) Tell me what you think. I can guarantee you that C declarations are fun. As I said earlier, correctly parsing a C declaration involves taking the context into account in a non-trivial way. You may think that (as can be read sometimes) C declarations are hard to read for humans, but very easy to parse for a machine. It's not completely false, but a very naive way of seeing it. Take a complex C declaration and see how well your parser does for identifying the correct type and whatever is actually the identifier being declared (a task that is merely syntactic and that I would definitely include in a parser's job even with a strict definition of parsing.) It's impossible to do correctly without properly identifying the context, whereas parsing a - say - Pascal declaration can be done almost context-free.

To play with C declarations there's a fun site: https://cdecl.org/

Parsing = syntax and just syntax, all the rest is semantics (in the computer science sense).

Code: [Select]
void fred() { Marshmallow bert = charlie (7); }
is syntactically correct C. That will parse completely happily, what it wouldn't do is compile or indeed pass the subsequent tests for being a valid C program. The parsing is easy, figuring out that the semantics are correct is relatively hard (e.g. that "Marshmallow" ought to be a type and that the type declaration is missing). You can't do it from syntax alone (for CFLs), therefore you can't do it from parsing alone, at least not with any parser that computer science currently knows how to construct automatically [from the grammar of the language]. I may be wrong on the final point about automatic construction, I haven't kept up with the state of the art on producing parsers from 2 level grammars and informally (i.e. I haven't proven it) a 2 level grammar like VWF could represent the semantics of a C declaration in full. VWF certainly could represent the semantics of Algol 68 and, as we all know, C is partially descended from Algol 68.

At some point in trying to compile that the compiler has a complete, valid [abstract] syntax tree. Only then can it find the errors by running a semantic pass on the parse tree. Although it is possible to combine the parsing and semantic passes into one it is not necessary, and personally I dislike that style, it dates from when compilers tried to be single pass largely owing to restrictions on program size; it's much cleaner to separate a compiler into lots of passes with clean, well defined, interfaces between them.

Am I being pedantic by treating parsing as just the grammar bit? No, because if there's one field where precision is everything, and being pedantic is virtually a necessity, it's in programming. A parser generator like Yacc will deliver you a parser, it won't directly help you with validating the semantics of the language you're trying to parse.

Have I written a complete C parser? Yes, many years ago, as an exercise, for K&R C. Have I written a complete C compiler? No. Have I written complete compilers for other languages? Yes, both for personal fun and commercially, and worked commercially on parts of compilers with other people.

#26 Reply
Posted by golden_labels on 22 Oct, 2020 17:31
Reminds me of that beauty in PHP. Not intentional — it was a bug. But still amazing: Bug #61095 PHP can't add hex numbers.

#27 Reply
Posted by SiliconWizard on 23 Oct, 2020 16:35
Quote from: golden_labels on 22 Oct, 2020 17:31
Reminds me of that beauty in PHP. Not intentional — it was a bug. But still amazing: Bug #61095 PHP can't add hex numbers.

Nice find! Yes it looks sort of similar.
Note: admittedly the OP's case is not as bad, as it spits out an error. Giving a bogus result, as with the PHP bug, is way worse.

#28 Reply
Posted by SiliconWizard on 23 Oct, 2020 17:26
@Cerebus: I think this is sort of getting fruitless and largely outside of the scope of this thread.

I'll just once again make my point: to me parsing = syntax analysis. That's again the general definition: https://en.wikipedia.org/wiki/Parsing
Syntax analysis isn't just about checking the syntax is correct - it's also about splitting a given text into basic elements, each with an identified function. Nothing more, agreed, but nothing less. If you think it's not what parsing is about, then fine, we just don't agree.

I gave the C declaration example, as it's notoriously tricky to handle properly. As I said: take any C declaration. Now automatically analyze it so that you get the various elements of the declaration and their function. In particular, identify what is actually the identifier being declared (I'm of course not saying checking that any of the identifiers used in the declaration is an existing one or not). Would you consider that not part of syntax analysis? I do. It's easy with ultra-basic examples, but with complex ones, it's definitely not.

Now if, for this example, you're still certain identifying what is the identifier in a declaration is not part of syntax analysis, then we won't agree and that's ok. You just won't convince me. If you agree, I'm just saying that even this basic task IS difficult to implement properly for C, and much more so than for a Wirthian language. For examples of difficult-to-handle C declarations, I gave a web site, but you can find myriads all over the place, including in source code for actual projects.

Back to the OP's topic: to *me*, "0xE+1" interpreted as a single constant (that is then further considered having a syntax error, which is correct if it's considered a single constant) IS a parsing error, as I consider part of parsing the task of correctly identifying the "function" of each separate text element. The simple fact that the compiler spits out what looks clearly in the "syntax error" category makes it hard not to say it's related to the parsing itself. Now we can nitpick to no end if anyone feels like it.

As to the fact C has a context-free grammar or not, here is an article I personally second: https://eli.thegreenplace.net/2007/11/24/the-context-sensitivity-of-cs-grammar
Seems to be a nice source of debate and nitpicking overall!

So anyway, I still think it would be worth reporting this to GCC's team. At least see how *they* consider this issue. (And for people tempted to reply that GCC devs are dumbfucks that never listen to bug reports, I remember a thread a few months ago when I found an alignment bug in GCC for AArch64 targets, many told be GCC team would reject it, and it turned out they actually took it into account and fixed it, so I'm not too worried unless they can come up with a good rationale.)

#29 Reply
Posted by Cerebus on 23 Oct, 2020 18:00
Quote from: SiliconWizard on 23 Oct, 2020 17:26
@Cerebus: I think this is sort of getting fruitless and largely outside of the scope of this thread.

Agreed.

I'll just sign off by saying that I think you're still conflating syntax analysis - "this is an identifier, the string for which is 'fred' and it's part of a nonterminal arbitrary called 'declaration'" - (which is most definitely 'parsing', determining language sentence structure) with semantic analysis - "this is an integer declaration of 'fred'" (determining language meaning, which is not part of parsing proper).

Anyway, just at this precise moment I don't care what anybody thinks of anything - I have a hot BLT sandwich in my other hand and all is right with the world.

#30 Reply
Posted by brucehoult on 23 Oct, 2020 20:54
Quote from: Cerebus on 22 Oct, 2020 14:22
C was clearly designed with an expectation that it would be parsed by recursive descent, the Wurthian languages were most likely written with the assumption that a parser generator tool would be to hand, or at least more powerful parsing techniques than recursive descent.

That's exactly backwards.

Wirthian languages are designed to be parsed by very simple recursive-descent parsers, C assumes the use of a table-driven LALR parser.

#31 Reply
Posted by Nominal Animal on 24 Oct, 2020 13:13
Quote from: brucehoult on 23 Oct, 2020 20:54
C assumes the use of a table-driven LALR parser.
Which is rooted in C's origins, specifically lex and yacc. Even today, a lot of C compiler/interpreter projects use flex and bison, so one could say LALR parsers are kinda-sorta built-in to the C ecosystem; it is therefore not surprising that the standard (implicitly?) assumes one is used to parse C itself.

#32 Reply
Posted by Cerebus on 24 Oct, 2020 14:45
Quote from: Nominal Animal on 24 Oct, 2020 13:13
Quote from: brucehoult on 23 Oct, 2020 20:54
C assumes the use of a table-driven LALR parser.
Which is rooted in C's origins, specifically lex and yacc. Even today, a lot of C compiler/interpreter projects use flex and bison, so one could say LALR parsers are kinda-sorta built-in to the C ecosystem; it is therefore not surprising that the standard (implicitly?) assumes one is used to parse C itself.

An understandable mistake because of the association of them all with Unix, but there exists here (Ritchie's description) and here (just source) a "primeval" C compiler from circa 1972 the earliest working C compiler to be preserved, written by Dennis Ritchie, with a hand written parser and no trace of lex and yacc anywhere in it. Lex was written by Mike Lesk circa 1975, yacc by Johnson also circa 1975. Lex and Yacc were used for pcc the portable C compiler, but that came later. The original version of yacc was written in B, only later to be translated to C, so it was available to be used in bootstrapping a C compiler if required, but wasn't.

(Frank Deremer, wrote the paper "Practical Translators for LR(k) Languages" in 1969 while working on Project MAC. Also working on project MAC at the same time where all the key people from Bell Labs who came to be the Unix team, so they ought to have been aware of the possibilities, but didn't actually come to exploit them until later. Strangely it wasn't that association, but a conversation with Alfred Aho^*, also at Bell Labs, suggesting than Johnson looked at Knuth's work on LR grammars that led Johnson to use LR grammars as the basis for yacc.)

The nearest to a canonical account of the development of C that exists can be found here: The Development of the C Language - Dennis M. Ritchie

^*Contrary to your personal Finnish expectations he pronounces it "Ay-Ho", I know this because we once corresponded by email^** and I asked him explicitly.
^**Gone are the days where a stranger on the Internet could get the email address of a person who is one of "the names" in his field, write to them, and actually get a reply, not merely that but have a civilised conversation about that field with them.

#33 Reply
Posted by Nominal Animal on 24 Oct, 2020 15:04
Quote from: Cerebus on 24 Oct, 2020 14:45
An understandable mistake because of the association of them all with Unix, but there exists here (Ritchie's description) and here (just source) a "primeval" C compiler from circa 1972 the earliest working C compiler to be preserved, written by Dennis Ritchie, with a hand written parser and no trace of lex and yacc anywhere in it. Lex was written by Mike Lesk circa 1975, yacc by Johnson also circa 1975. Lex and Yacc were used for pcc the portable C compiler, but that came later. The original version of yacc was written in B, only later to be translated to C, so it was available to be used in bootstrapping a C compiler if required, but wasn't.
Right. However, when C started gaining traction, lex and yacc were commonly used to implement C compilers, which affected how the C standard (ANSI C) came to be. This was the "origin" I was referring to.

Quote from: Cerebus on 24 Oct, 2020 14:45
^*Contrary to your personal Finnish expectations he pronounces it "Ay-Ho", I know this because we once corresponded by email^** and I asked him explicitly.
Contrary to your personal British supremacist expectations, I don't expect people with Finnish surnames and origin, but a different native language, to pronounce their names like Finns do. Most Finns don't. They often say how that name would be pronounced if the person was Finnish, but do not expect them to pronounce it that way. Weird, eh?

#34 Reply
Posted by Cerebus on 24 Oct, 2020 15:45
Quote from: Nominal Animal on 24 Oct, 2020 15:04
Quote from: Cerebus on 24 Oct, 2020 14:45
^*Contrary to your personal Finnish expectations he pronounces it "Ay-Ho", I know this because we once corresponded by email^** and I asked him explicitly.
Contrary to your personal British supremacist expectations, I don't expect people with Finnish surnames and origin, but a different native language, to pronounce their names like Finns do. Most Finns don't. They often say how that name would be pronounced if the person was Finnish, but do not expect them to pronounce it that way. Weird, eh?

<Sarcasm on>Well I am so sorry for taking your perspective into consideration.<Sarcasm off> Generally, once one gets past the "angry potato" facade you come across as thoughtful, but you can be a real dick at times. Having worked with a fair number of Finns (I used to be operations manager for the UK arm of Saunalahti/Jippi), when I see "Aho" I expect the Finnish pronunciation - is it so foolish of me to expect a Finn would too? And how the hell do you get to "British supremacist expectations" from me expecting a Finnish name to be pronounced in the Finnish fashion - surely "British supremacist expectations" would be "You'll bloody well pronounce it the way we do, foreign scum!" rather than the other way around. Is that a chip I see on your shoulder?

Now, as you obviously want me to be a condescending Brit: Chip on one's Shoulder - "To have a chip on one's shoulder refers to the act of holding a grudge or grievance that readily provokes disputation.". There, feel better now? Prejudices suitably reinforced?

#35 Reply
Posted by Nominal Animal on 24 Oct, 2020 17:21
Quote from: Cerebus on 24 Oct, 2020 15:45
when I see "Aho" I expect the Finnish pronunciation - is it so foolish of me to expect a Finn would too? And how the hell do you get to "British supremacist expectations"
Exactly that. It's not foolish, just completely wrong. I believe you think so because English is the lingua franca on the internet, and naturally assume it extends to culture (general individual behaviour) as well. It does not.

There are threads on this forum that deal explicitly with how something is pronounced in English, and the one constant in those threads is the various English-speaking people claiming theirs is the right way, and everybody else is wrong. Jokingly in most instances, but still.

In Finland, kids learn English in school. They are taught to emulate British or American English pronunciation. It is so pervasive that many Finns who can speak Rally English just fine (think of things like directions to the nearest post office/store/pharmacy/bus stop), refuse to speak a word, because they're afraid of being laughed at because of their accent. This is more common among the elderly. Indeed, when Finns pick nicknames used in completely Finnish environments (as in Finnish web sites), they usually pick an English one; the more often the younger the person!

It took me over thirty years to get over that myself, and realize that Rally English is easier to understand, than the mangled emulation of American or British English I was taught to go for.

Now, this has nothing to do with Finnish per se, as in "Finnish is so different"; no. It is just that the pervasiveness of English has masked the variety in cultures, so much so that people tend to assume cultures don't differ that much – especially among those who speak English. Yet, people differ, a lot. (Even in Finland, you have a quite noticeable cultural difference between the "metropolitan" urban Finns and the Finns in smaller towns and rural areas. So much so, that it is becoming a political problem – as in, one side claiming the other must move to cities, because their desire to live outside urban environments is not only unreasonable and unfathomable, but also too costly.)

Quote from: Cerebus on 24 Oct, 2020 15:45
Chip on one's Shoulder
Yes, and it's a bad one (I mean literally, something I wish I didn't have): I vastly overreact when somebody claims I did or think something, when I definitely do not.

I used "supremacist" only to get a rise out of you, because "stuck up brit" is one of those stereotypes you often see in media, but in real life, is damned rare. (The ones I've met personally are definitely not, and calling them one would hurt them.)
I don't actually believe you are, if it matters.

If you knew your Finnish colleagues well enough, you'd realize that deep down, if they were told to welcome somebody named Alfred Aho to a tour of the company (or something similar), they'd worry most about their own pronouciation of Aho, and whether it would offend the prof or not. The concept of "but pronouncing it like a Finnish name is the correct one" would never occur to a typical Finn.

In some ways, it is funny (because it is so different to major cultures); in some ways, it is sad, because I actually think it is a lovely quirk. But I'm biased, of course.

#36 Reply
Posted by Cerebus on 24 Oct, 2020 17:50
Quote from: Nominal Animal on 24 Oct, 2020 17:21
Quote from: Cerebus on 24 Oct, 2020 15:45
when I see "Aho" I expect the Finnish pronunciation - is it so foolish of me to expect a Finn would too? And how the hell do you get to "British supremacist expectations"
Exactly that. It's not foolish, just completely wrong. I believe you think so because English is the lingua franca on the internet, and naturally assume it extends to culture (general individual behaviour) as well. It does not.

I get what you're saying, overall. But that bit above still seems arse about face - the "everybody speaks English" assumption, were one to apply Anglophone cultural imperialism, would be to assume that everybody would use the anglicised pronunciations (e.g. Aho => Ay-Ho, Braun => Brawn, Porche => Poorsh - score one for the Americans, who actually pronounce it like Ferdinand Porche did), not the other way around, making the effort to get the native pronunciation right.

The assumption that a native of a given language would pronounce a name from that language in the native fashion isn't some form of British cultural imperialism born out of the [typical] British inability to get 'foreign' names pronounced properly and therefore an assumption that the rest of the world would be equally crap at it, it's just a reasonable working assumption that a speaker of any language would make about natives of another country and its language.

#37 Reply
Posted by Nominal Animal on 24 Oct, 2020 18:14
Quote from: Cerebus on 24 Oct, 2020 17:50
Quote from: Nominal Animal on 24 Oct, 2020 17:21
Quote from: Cerebus on 24 Oct, 2020 15:45
when I see "Aho" I expect the Finnish pronunciation - is it so foolish of me to expect a Finn would too? And how the hell do you get to "British supremacist expectations"
Exactly that. It's not foolish, just completely wrong. I believe you think so because English is the lingua franca on the internet, and naturally assume it extends to culture (general individual behaviour) as well. It does not.

I get what you're saying, overall. But that bit above still seems arse about face - the "everybody speaks English" assumption, were one to apply Anglophone cultural imperialism, would be to assume that everybody would use the anglicised pronunciations (e.g. Aho => Ay-Ho, Braun => Brawn, Porche => Poorsh - score one for the Americans, who actually pronounce it like Ferdinand Porche did), not the other way around, making the effort to get the native pronunciation right.

The assumption that a native of a given language would pronounce a name from that language in the native fashion isn't some form of British cultural imperialism born out of the [typical] British inability to get 'foreign' names pronounced properly and therefore an assumption that the rest of the world would be equally crap at it, it's just a reasonable working assumption that a speaker of any language would make about natives of another country and its language.
I think that is plausible, although I did not see it that way. I rather saw it as "because you speak my native language, you obviously react to things just like I do". (As to arse about face, yeah, that happens to me often, even sober. The Finnish term for rock solid drunk is ass-over-the-shoulder, but I don't drink much nowadays.)

Consider it an overblown reaction, then; and that I wouldn't, nor would the majority of Finns I know, assume that a Canadian surname Aho should be pronounced Aho, or is "properly" pronounced Aho, just because the person has Finnish roots.

#38 Reply
Posted by Cerebus on 24 Oct, 2020 18:56
Quote from: Nominal Animal on 24 Oct, 2020 18:14
Consider it an overblown reaction, then; and that I wouldn't, nor would the majority of Finns I know, assume that a Canadian surname Aho should be pronounced Aho, or is "properly" pronounced Aho, just because the person has Finnish roots.

So noted. Alfred did give me quite a good potted history of the Aho family. From memory, so this may not be accurate (the emails are lost to about 3 mail server upgrades since), the family originally settled in Minnesota USA.

#39 Reply
Posted by Syntax Error on 24 Oct, 2020 19:40
For those still playing at home...

When MAXIMAL MUNCH attacks. "Due to maximal munch, hexadecimal integer literals ending in e and E, when followed by the operators + or -, must be separated from the operator with whitespace or parentheses in the source"

https://en.cppreference.com/w/cpp/language/translation_phases#maximal_munch

#40 Reply
Posted by Nominal Animal on 24 Oct, 2020 20:00
Quote from: Cerebus on 24 Oct, 2020 18:56
Quote from: Nominal Animal on 24 Oct, 2020 18:14
Consider it an overblown reaction, then; and that I wouldn't, nor would the majority of Finns I know, assume that a Canadian surname Aho should be pronounced Aho, or is "properly" pronounced Aho, just because the person has Finnish roots.
So noted. Alfred did give me quite a good potted history of the Aho family. From memory, so this may not be accurate (the emails are lost to about 3 mail server upgrades since), the family originally settled in Minnesota USA.
Makes sense; quite a lot of Finns emigrated to Minnesota Iron Range to become miners there. They were even recruited to emigrate from northern Norway by Michigan copper mining companies. Between 1870 and 1920, approximately 320,000 Finns emigrated to the United States, many working in mining and logging, the rest in farming. About 16% of people in the Upper Peninsula of Michigan have Finnish ancestry.

Of the Finnish mindset, a good example is Saint Urho's Day, a completely fictitious character, to be celebrated the day before St. Patrick's Day, to extend the festivities (invented by the emigrants, not celebrated in Finland). The closest Finns have to a saint is Lalli, an anti-hero of sorts, who killed Bishop Henry in 1156.

Quote from: Syntax Error on 24 Oct, 2020 19:40
When MAXIMAL MUNCH attacks. "Due to maximal munch, hexadecimal integer literals ending in e and E, when followed by the operators + or -, must be separated from the operator with whitespace or parentheses in the source"
I do believe that is a direct result of the typical lexical rules used with e.g. lex/flex when separating the input into preprocessing tokens.

#41 Reply
Posted by mikerj on 06 Nov, 2020 00:20
Quote from: hamster_nz on 20 Oct, 2020 02:54
Can you see what is wrong with this, without compiling it?

Code: [Select]
#include <stdio.h> int main(int argc,char *argv[]) { printf("The next number after 0xE is 0x%X\n", 0xE+1); }

No return value!

#42 Reply
Posted by golden_labels on 06 Nov, 2020 05:33
Quote from: mikerj on 06 Nov, 2020 00:20
No return value!
While this is code smell, in C this is a well defined function returning 0. main is a very special beast in C (and even weirder in C++). If main it returns int, reaching the terminating } is equivalent to calling exit(0).⁽¹⁾
____
⁽¹⁾ 5.1.2.2.3§1

#43 Reply
Posted by mikerj on 10 Nov, 2020 18:30
Quote from: golden_labels on 06 Nov, 2020 05:33
Quote from: mikerj on 06 Nov, 2020 00:20
No return value!
While this is code smell, in C this is a well defined function returning 0. main is a very special beast in C (and even weirder in C++). If main it returns int, reaching the terminating } is equivalent to calling exit(0).⁽¹⁾
____
⁽¹⁾ 5.1.2.2.3§1

As of C99 this is true. Previously having no specified return value would give an undefined termination status.

#44 Reply
Posted by golden_labels on 10 Nov, 2020 22:37
If we are talking about that archaic versions, this is even more complicated. While 9899:1990 indeed makes the value undefined⁽¹⁾, the Ritchie’s reference from 1975 has no return in main (p. 20).
____
⁽¹⁾ At least the final draft. I do not have access to the final 1990 publication.

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

There was an error while thanking

Thanking...

Go to page:

« 1 2 All

Full site Menu

Navigation

Powered by SMFPacks Advanced Attachments Uploader Mod