EEVblog Electronics Community Forum

Products => Computers => Programming => Topic started by: PerranOak on September 12, 2019, 03:38:09 pm

Title: C word
Post by: PerranOak on September 12, 2019, 03:38:09 pm
I've succumbed to the pressure to learn C to programme microcontrollers instead of using assembly.

It seems to be going well but I get stuck on, I guess, simple things. For example:

while (b!=0)
… and …
while (b==0)

I understood that "==" is a check of "equal to" but it took ages to realise that the "!" meant "not ". I'm sure I came across this earlier but evidently forgot.

So, is there a resource (online?) that I can go to for simple help like this?

Cheers.
Title: Re: C word
Post by: nugglix on September 12, 2019, 03:48:29 pm
The search term would be something along the lines of "operators in c".

This would reveal something like:
https://www.tutorialspoint.com/cprogramming/c_operators.htm

Exchanging operators with keywords in the search could lead to:
https://www.programiz.com/c-programming/c-keywords-identifier

Title: Re: C word
Post by: Bassman59 on September 12, 2019, 05:32:59 pm
Quote
So, is there a resource (online?) that I can go to for simple help like this?

Start here. (https://lmgtfy.com/?q=C+operators)
Title: Re: C word
Post by: obiwanjacobi on September 12, 2019, 05:51:54 pm
https://en.cppreference.com/w/c

[2c]
Title: Re: C word
Post by: golden_labels on September 12, 2019, 06:56:25 pm
PerranOak
I support obiwanjacobi’s suggestion: cppreference is the best-prepared reference for C and C++ I have seen in my life. In particular, it covers important topics that are often omitted even from most books (perhaps because the authors do not really understand them), like the strict aliasing rule.

If you are just doing this as a hobby, you may skip this post and just enjoy coding.

But beware: programming in C is like walking through a minefield. Programming in C without a copy of the latest standard at hand is like dancing drunk on a minefield under artillery fire. And I am not talking about the infamous manual memory management, buffer overflows and so on. The true trap is that an intuitive interpretation of what a given piece of C code does may differ wildly from what it actually does. It can catch anyone with their pants down. It even happens to seasoned programmers. To the point at which Linus Torvalds, the man behind Linux (the kernel), is blaming compilers for properly interpreting his code (properly in terms of what the language defines, but not in the way he believed it works).

If you want to go professional, and especially if safety or security will depend on your code, be fully aware that you should dig into that hard to understand mess of specifications spread across 650 pages. If you can’t get a copy of the latest C standard (and I do not suggest paying 180€ for it just to use the language), you may legally download the latest standard draft from the working group. The draft is nearly the same as the final version. You may also consider reading Stack Overflow often, in particular things related to Undefined Behaviour.

Don’t be fooled into thinking that C is “close to bare metal”. <justified-exaggeration> The only “bare metal” to which it is close is PDP-11 (https://en.wikipedia.org/wiki/PDP-11),</justified-exaggeration> which has been dead for over 30 years. Don’t assume that given code will have any specific representation in the compiled binary. In particular this may be a PITA on microcontrollers: through the years I’ve come across a few low-level constructs that are not reliably expressible in C. E.g. operation timing: while with proper code the compiler will not reorder operations like setting pins, it may reorder or move other operations, changing the relative time at which the pins are set.

Finally, a nice and famous example of why C may be hell:
Code:
unsigned short a = 65535u;
unsigned short b;

b = a * a; // UB on some platforms! :D
The marked line is undefined behaviour on platforms with 16-bit unsigned short and 32-bit int. To make things more puzzling: the UB is caused by signed overflow [sic! “signed”, not “unsigned”], because both operands are promoted to int before the multiplication and 65535 × 65535 does not fit in a 32-bit int. Compilers will likely convert that to code that does what most people would expect it to do, but that is accidental.
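
One way to sidestep the promotion to int, shown as a minimal sketch rather than the only fix: convert one operand to unsigned int before multiplying, so the arithmetic is unsigned and wraps instead of overflowing. The helper name mul_u16_wrap is made up for this illustration, and uint16_t is used in place of unsigned short to pin the width to 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: multiplies two 16-bit values and keeps the low
 * 16 bits of the product.
 * Casting one operand to unsigned int forces the multiplication to be done
 * in unsigned arithmetic, which is defined to wrap around. Without the cast,
 * both operands would be promoted to (signed) int on platforms where int is
 * wider than 16 bits, and the product could overflow. */
uint16_t mul_u16_wrap(uint16_t a, uint16_t b)
{
    unsigned int product = (unsigned int)a * b;
    return (uint16_t)product;   /* conversion to uint16_t is modulo 65536 */
}
```

With this helper, mul_u16_wrap(65535u, 65535u) yields 1, which is 65535 × 65535 reduced modulo 65536.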
Title: Re: C word
Post by: janoc on September 12, 2019, 07:36:20 pm
https://en.cppreference.com/w/ (There is both C and C++ reference, despite the domain name)

https://devdocs.io/ (Select the languages you want - fast reference doc access)

E.g. on the comparison operators:
https://devdocs.io/c/language/operator_comparison
https://en.cppreference.com/w/c/language/expressions
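
For the two loop conditions from the opening post, a minimal sketch of what ==, != and a lone ! do. All function names here are invented for the illustration.

```c
#include <stdbool.h>

/* "==" tests for equality: true only when b is exactly zero.
 * This is the condition used in: while (b == 0) */
bool is_zero(int b)
{
    return b == 0;
}

/* "!=" means "not equal to": true when b is anything other than zero.
 * This is the condition used in: while (b != 0) */
bool is_nonzero(int b)
{
    return b != 0;
}

/* A lone "!" is logical NOT: !b is true exactly when b == 0. */
bool is_zero_via_not(int b)
{
    return !b;
}

/* Counts how many times the loop body runs while b != 0 holds. */
unsigned count_down(unsigned b)
{
    unsigned n = 0;
    while (b != 0) {
        b--;
        n++;
    }
    return n;
}
```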
Title: Re: C word
Post by: rstofer on September 12, 2019, 09:19:20 pm
Quote
Don’t be fooled into thinking that C is “close to bare metal”. <justified-exaggeration> The only “bare metal” to which it is close is PDP-11 (https://en.wikipedia.org/wiki/PDP-11),</justified-exaggeration> which has been dead for over 30 years.

Maybe just on life support...  I have two of the PiDP11/70s running 2.11BSD Unix.  These are the original disks running on a PDP11/70 emulation by 'simh' on a Raspberry Pi.  I rather enjoy old-school and this emulation has 4 MB of RAM and runs faster than the real machine.

https://obsolescence.wixsite.com/obsolescence/pidp-11

There's a lot of interest in older systems, particularly those you can get your arms around.


Title: Re: C word
Post by: SiliconWizard on September 12, 2019, 10:48:01 pm
Just a suggestion here - maybe actually learn C?
 :-//
Title: Re: C word
Post by: digsys on September 12, 2019, 10:56:42 pm
They'll move me away from machine code over my dead cold body :-)
Title: Re: C word
Post by: hamster_nz on September 12, 2019, 11:04:17 pm
Find a quick reference card..

Google "ansi c quick reference filetype:pdf"
Title: Re: C word
Post by: golden_labels on September 13, 2019, 12:31:51 am
Skimming over that quick reference card, there are a few things to note.
On top of that there are issues that will not be a problem if someone already knows the language and uses the list as a cheatsheet, but if someone wants to learn from it, there are a few more catches:
Title: Re: C word
Post by: rstofer on September 13, 2019, 02:18:10 am
“The C Programming Language” ANSI Edition
Title: Re: C word
Post by: radioactive on September 13, 2019, 02:21:24 am
The bible for this question:  https://en.wikipedia.org/wiki/The_C_Programming_Language
Title: Re: C word
Post by: AndyC_772 on September 13, 2019, 09:57:49 am
Quote
On top of that there are issues that will not be a problem if someone already knows the language and uses the list as a cheatsheet, but if someone wants to learn from it, there are a few more catches:
  • The sizes for integers are the minimum requirements, not the actual values — yet there is no mention of that.

That's quite an interesting 'gotcha' for embedded programmers.

Not so long ago I was caught out writing an SPI driver for an STM32F7.

This particular device has a feature called 'packed writes' (or something along those lines). The way it works is:

- if you perform an 8-bit write to the SPI data register, it clocks out 8 bits.
- if you perform a 16 (or more) bit write to that exact same register, it clocks out 16 bits.

You've probably already guessed the problem - the driver inserted '0' bytes in between each data byte, because the compiled code used a 32-bit write instruction to copy each individual byte into the data register, and the hardware dutifully interpreted this as a 'packed' write, 16 bits wide, with bits 8..15 = 0.

Needless to say, it took a while to find that one, and a while longer to work out where exactly I needed to add (uint8_t) to force an 8-bit write.
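
The post does not show the actual fix, so the following is only a sketch of one common way to force a byte-wide store: access the register through a pointer to volatile uint8_t. The "register" here is an ordinary variable standing in for the hardware, and all names are made up; on a real part the address and register type come from the vendor header.

```c
#include <stdint.h>

/* Writes one byte with an 8-bit store by viewing the 32-bit register through
 * a pointer to volatile uint8_t. Only one byte of the register is touched.
 * (Assumes uint8_t is a character type, as it is on mainstream compilers.) */
void write_reg_byte(volatile uint32_t *reg, uint8_t value)
{
    *(volatile uint8_t *)reg = value;
}

/* Helper for the demo: counts how many of the four bytes in v are 0xFF. */
unsigned count_ff_bytes(uint32_t v)
{
    unsigned n = 0;
    for (int i = 0; i < 4; i++) {
        if (((v >> (8 * i)) & 0xFFu) == 0xFFu) {
            n++;
        }
    }
    return n;
}

/* Demo on a stand-in variable: start with all bits set, store one zero byte,
 * and report how many bytes were left untouched. */
unsigned demo_byte_write(void)
{
    uint32_t fake_reg = 0xFFFFFFFFu;
    write_reg_byte(&fake_reg, 0x00u);
    return count_ff_bytes(fake_reg);
}
```

Which byte changes depends on the byte order of the machine, but exactly one does, so three bytes remain 0xFF. A plain assignment through the 32-bit pointer would have replaced all four.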
Title: Re: C word
Post by: PerranOak on September 13, 2019, 11:35:16 am
Brilliant, thanks all.

SiliconWizard: yes, that's exactly what I am doing! It was literally "day 1" and I wondered if there was an additional resource available that I could use when my course book needed a helping hand … or I did(!)
Title: Re: C word
Post by: Karel on September 13, 2019, 11:36:36 am
http://shop.oreilly.com/product/9780596006976.do
Title: Re: C word
Post by: PerranOak on September 13, 2019, 11:41:37 am
BTW books on C (e.g. The C Programming Language by Kernighan and Ritchie): do they cover any peculiarities (are there any) when using C in programming microcontrollers?
Title: Re: C word
Post by: SiliconWizard on September 13, 2019, 02:30:17 pm
Quote
SiliconWizard: yes, that's exactly what I am doing! It was literally "day 1" and I wondered if there was an additional resource available that I could use when my course book needed a helping hand … or I did(!)

Yep, what I meant is, instead of trying to find tips, tricks and "cheat sheets" to learn C (as you were maybe implying, and also as some here seemed to even suggest!), just properly learn the language.
My comment was related to one of my posts in another thread about C and how it's often not learned/taught properly compared to other languages. So, I'm just hoping you will do it right. :)

Of course, the K&R book (best use latest edition) is an obvious start. Once you're done with that, and you're proficient enough, I also suggest reading the C standard to complement your knowledge in depth, especially to understand what could be undefined behaviors. The C99 standard is now easy to find online for free. Later versions less so, and buying them IS expensive. C11 adds a few niceties, but you can certainly do without it for the time being, and especially if you target programming on MCUs. I recommend writing C99-compliant code though. C99 adds a number of important new things, such as stdint.h, which defines standard types for integers of all supported widths, something important for portability, especially for embedded development. Earlier C versions relied on using implementation-dependent tricks to use integers of a specific width, which was frankly not pretty.
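
A small sketch of what stdint.h buys you: the exact-width types have the same width on every platform that provides them, while plain int only guarantees a minimum. The function names are made up for the illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Widths of the exact-width types, in bits. These are fixed by definition
 * on any platform that provides the types. */
unsigned bits_u8(void)  { return (unsigned)(sizeof(uint8_t)  * CHAR_BIT); }
unsigned bits_u16(void) { return (unsigned)(sizeof(uint16_t) * CHAR_BIT); }
unsigned bits_u32(void) { return (unsigned)(sizeof(uint32_t) * CHAR_BIT); }

/* Storage size of plain int in bits: at least 16, commonly 32, and it
 * differs between an 8-bit MCU toolchain and a desktop compiler. */
unsigned bits_int(void) { return (unsigned)(sizeof(int) * CHAR_BIT); }

/* An 8-bit counter wraps from 255 back to 0 on every platform. */
uint8_t next_u8(uint8_t x)
{
    return (uint8_t)(x + 1u);
}
```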

Quote
BTW books on C (e.g. The C Programming Language by Kernighan and Ritchie): do they cover any peculiarities (are there any) when using C in programming microcontrollers?

Not that I know of.
And any "peculiarities" would be documented in said MCUs' reference manuals, or specific books, rather than in a generic book.
If you're going to use a modern, 32-bit MCU with for instance an ARM core or RISC-V, there is really no specificity in general, they pretty much look like any 32-bit CPU from the language POV.

Of course you'll need to know some additional things, like how to define interrupts for instance, but then read books and example code for your preferred platform. I'd suggest doing that AFTER you've learned enough C.


Title: Re: C word
Post by: PerranOak on September 13, 2019, 02:38:52 pm
Excellent, thanks mate. I did OK with learning assembly so I'll do my best to learn it well.
Title: Re: C word
Post by: TK on September 13, 2019, 03:02:10 pm
No wonder why there is so much confusion to learn C today... too much misinformation, oversimplification... when I started with C, there was only one reference: K&R C Programming Language book.
Title: Re: C word
Post by: xrunner on September 13, 2019, 03:05:54 pm
Quote
No wonder why there is so much confusion to learn C today... too much misinformation, oversimplification... when I started with C, there was only one reference: K&R C Programming Language book.

Same here - still have it too.  :)
Title: Re: C word
Post by: janoc on September 13, 2019, 03:09:54 pm
Quote
On top of that there are issues that will not be a problem if someone already knows the language and uses the list as a cheatsheet, but if someone wants to learn from it, there are a few more catches:
  • The sizes for integers are the minimum requirements, not the actual values — yet there is no mention of that.

That's quite an interesting 'gotcha' for embedded programmers.


Not only embedded - there are tons of programmers who assume that int is 32 bits or that pointers are 32 bits (yes, still!). Then surprises happen.


Quote
Not so long ago I was caught out writing an SPI driver for an STM32F7.

This particular device has a feature called 'packed writes' (or something along those lines). The way it works is:

- if you perform an 8-bit write to the SPI data register, it clocks out 8 bits.
- if you perform a 16 (or more) bit write to that exact same register, it clocks out 16 bits.

You've probably already guessed the problem - the driver inserted '0' bytes in between each data byte, because the compiled code used a 32-bit write instruction to copy each individual byte into the data register, and the hardware dutifully interpreted this as a 'packed' write, 16 bits wide, with bits 8..15 = 0.

Needless to say, it took a while to find that one, and a while longer to work out where exactly I needed to add (uint8_t) to force an 8-bit write.

I got caught by this too, I think most STM32s do this and not only for SPI. I had this bug with I2C on STM32F0, if I remember right.
Title: Re: C word
Post by: rstofer on September 13, 2019, 03:34:30 pm
Quote
BTW books on C (e.g. The C Programming Language by Kernighan and Ritchie): do they cover any peculiarities (are there any) when using C in programming microcontrollers?

There are various documents scattered around the Internet but there's a problem.  When they show a peculiarity, the newcomer won't even understand the question, much less the answer.  These oddities are in the 'corners' of the language and you can spend a lot of time writing code and never bump into them.  This is particularly true in embedded programming where we're not trying to use C to solve world hunger.

What K&R does is serve as an example of good code without going exotic.  I like to steal the string and conversion functions from the original book and use them in my embedded work.  Why such old code?  It doesn't require a heap and I really don't like heaps colliding with stacks in limited memory uCs.

There are a few pitfalls here

https://pmihaylov.com/macros-in-c/

Pay attention to Pitfall 1, it will jump up and bite you in the butt.

Like most things, C is best learned by practice and mistakes.  As each mistake is diagnosed, it adds to experience.  It is naive to think that there is a single book, somewhere, that will turn the newcomer into a wizard by just placing it under the pillow at night.  This stuff takes time and effort.



Title: Re: C word
Post by: rstofer on September 13, 2019, 04:01:31 pm
How about the local community college?  Ours has an EXCELLENT lower division curriculum.  How telling that the first language they introduce is Pascal.  Learn with an excellent language before moving on to C/C++ or Java.  Sometimes community college scheduling works out with  having a paying job, other times, not so much.
Title: Re: C word
Post by: SiliconWizard on September 13, 2019, 04:12:37 pm
Quote
Learn with an excellent language before moving on to C/C++ or Java.

Oh, I really agree with this.

And for those that are still impatient to move on to embedded dev, they could look at Ultibo. That's Pascal for bare-metal programming of RPis...
(see there: https://www.eevblog.com/forum/embedded-computing/ultibo-bare-metal-programming-rpis/  ;D )
Title: Re: C word
Post by: rstofer on September 13, 2019, 04:15:26 pm
Quote
Excellent, thanks mate. I did OK with learning assembly so I'll do my best to learn it well.

In my view, assembly language is orders of magnitude easier to learn than C.  Sure, you have to memorize a LOT of mnemonics but, structurally, coding practices in assembly language are much simpler.  The level of detail is higher but nobody expects to write a few hundred lines of assembly code per day.

C allows you to express higher level ideas in a concise manner.  Things like structs and unions, pointers to an array of function pointers returning <whatever>.   If someone tells you they really understand pointers, it's ok to giggle.  The idea is simple, even trivial, but the application will be a challenge.  What I don't like is two levels (or more) of indirection.  A pointer to a pointer to <something>.

I hate this code but it comes up from time to time...

https://stackoverflow.com/questions/42715876/levels-of-indirection-in-c

If you REALLY understand this code, you might actually understand pointers.  I shouldn't giggle...
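
This is not the code from the linked question, just a small sketch of where a pointer to a pointer earns its keep: a function that needs to change the caller's pointer. The names find_max and demo_find_max are invented for the example.

```c
#include <stddef.h>

/* Makes *out point at the largest element of arr (or NULL if n is 0).
 * 'out' is a pointer to a pointer: the function modifies the caller's
 * pointer variable, not just the int it points to. */
void find_max(int *arr, size_t n, int **out)
{
    *out = NULL;
    for (size_t i = 0; i < n; i++) {
        /* *out  is the caller's pointer (one level of indirection),
         * **out is the int that pointer refers to (two levels). */
        if (*out == NULL || arr[i] > **out) {
            *out = &arr[i];
        }
    }
}

/* Usage: pass the address of a pointer, then read through it. */
int demo_find_max(void)
{
    int values[] = { 3, 9, 4 };
    int *biggest = NULL;
    find_max(values, 3, &biggest);
    return (biggest != NULL) ? *biggest : -1;
}
```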

Title: Re: C word
Post by: PerranOak on September 13, 2019, 04:51:44 pm
Cheers.
I have a "course" that is focussed on uC programming, which is my primary intent.

Going back to an earlier point, I just looked-up EEPROM writing/reading in the datasheet for the PIC16F1827 (this is the uC used in my course and the one I learned assembly on) and it gives code for writing/reading but only in assembly.

How on Earth do I know how to do this in C?  :-//

[True, my course may cover this later but as a general principle.]
Title: Re: C word
Post by: TK on September 13, 2019, 05:20:29 pm
Quote
Cheers.
I have a "course" that is focussed on uC programming, which is my primary intent.

Going back to an earlier point, I just looked-up EEPROM writing/reading in the datasheet for the PIC16F1827 (this is the uC used in my course and the one I learned assembly on) and it gives code for writing/reading but only in assembly.

How on Earth do I know how to do this in C?  :-//

[True, my course may cover this later but as a general principle.]
You should start looking for C examples for the specific uC or family and start from simple ones like blink, then advance to examples of each peripheral. 
Title: Re: C word
Post by: westfw on September 13, 2019, 07:37:22 pm
Quote
do [the usual references] cover any peculiarities (are there any) when using C in programming microcontrollers?
No, probably not.  There aren't that many:
Quote
I just looked-up EEPROM writing/reading in the datasheet for the PIC16F1827 and it gives code for writing/reading but only in assembly.

How on Earth do I know how to do this in C?
A key thing to understand is that the vendor will usually provide a set of definitions that permits the internal registers to be manipulated just like C variables.  (if they don't, the first thing to do is to figure out how to do this yourself, which will also not be covered in typical "intro to C" books.)
For example, the code from the PIC data sheet:
Code:
    BANKSEL EEADRL       ;
    MOVLW   DATA_EE_ADDR ;
    MOVWF   EEADRL       ;Data Memory Address to read
    BCF     EECON1, CFGS ;Deselect Config space
    BCF     EECON1, EEPGD;Point to DATA memory
    BSF     EECON1, RD   ;EE Read
    MOVF    EEDATL, W    ;W = EEDATL
Would be something like:
Code:
    EEADRL = data_ee_addr;        // set address to read
    EECON1 &= ~(CFGS|EEPGD);      // deselect config, point to data
    EECON1 |= RD;                 // start the EEPROM read
    result = EEDATL;              // fetch the byte that was read
(SOMETHING LIKE!  C abhors short names like "RD" that are likely to be ambiguous and collide with "something", so it'll probably actually be something like:
Title: Re: C word
Post by: westfw on September 13, 2019, 07:39:32 pm
Oops. Something like:
Code:
    EECON1 |= _EECON1_RD_MASK;
Or perhaps:
Code:
    EECON1bits.RD = 1;
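
To try the mask idiom on a PC, here is a sketch that uses an ordinary variable in place of the register. Everything prefixed SIM_ or sim_ is a stand-in invented for this example; on the real chip the register, the mask names and the bit positions come from the vendor header, and the positions used here are assumptions.

```c
#include <stdint.h>

/* Stand-in bit masks (assumed positions, for illustration only). */
#define SIM_RD_MASK    (1u << 0)
#define SIM_CFGS_MASK  (1u << 6)
#define SIM_EEPGD_MASK (1u << 7)

/* Stand-in for the control register; pretend CFGS and EEPGD start out set. */
static volatile uint8_t sim_eecon1 = 0xC0u;

/* Clears the two select bits, then sets the read bit, mirroring the
 * BCF/BCF/BSF sequence from the assembly listing. Returns the new value. */
uint8_t demo_start_read(void)
{
    sim_eecon1 &= (uint8_t)~(SIM_CFGS_MASK | SIM_EEPGD_MASK); /* clear bits */
    sim_eecon1 |= SIM_RD_MASK;                                /* set a bit  */
    return sim_eecon1;
}
```

The &= ~mask form clears bits and |= mask sets them, leaving every other bit in the register as it was.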
Title: Re: C word
Post by: Jeroen3 on September 13, 2019, 08:05:24 pm
Treat C like scripted assembly. You'll be fine. It doesn't do much magic like the newest C++, Java or other languages may perform.

First determine where to get information, since the company that made the chip might not be the company that made the CPU, and the compiler might come from yet another entity. For example: ST makes a chip with an ARM Cortex processor core that you can compile code for using a compiler from the GNU Project in an Eclipse IDE.

Then there are the architecture intricacies...
Those that involve the storage specifiers, volatile and static.
Also platform details, what memory you may access when and how wide.
Knowing what arithmetic your chip can do, or does slowly. How to accelerate it with the FPU or crypto engine. (this often isn't automatic)
How much stack there is left, you often can't ask the OS.
How to real-time debug, since at some point you can't breakpoint the chip anymore without hardware damage.
The performance impact of the volatile keyword, or pointers. At only 20 MHz these things count.
What functions your standard library offers, or not, or not optimized for the architecture.
Many little things....
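
As one concrete case from the list above, a sketch of the usual reason volatile shows up in embedded code: a flag that is changed outside the normal flow of the loop that reads it. The interrupt is simulated by an ordinary function call so the example runs on a PC; fake_isr and wait_for_data are made-up names.

```c
#include <stdbool.h>

/* Set by the (simulated) interrupt handler, read by the polling loop.
 * volatile tells the compiler that every read in the loop must really
 * happen, so the flag cannot be cached in a register. */
static volatile bool data_ready = false;

/* Stand-in for an interrupt service routine. */
void fake_isr(void)
{
    data_ready = true;
}

/* Polls the flag up to max_polls times and returns how many polls it took.
 * The "interrupt" is triggered by hand on the third poll. */
unsigned wait_for_data(unsigned max_polls)
{
    unsigned polls = 0;
    while (!data_ready && polls < max_polls) {
        polls++;
        if (polls == 3) {
            fake_isr();   /* on real hardware this would fire asynchronously */
        }
    }
    return polls;
}
```

Each volatile access costs a real load or store, which is the performance impact mentioned in the list.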

Take a copy of the standard so you can look up for yourself whether the people on the internet are correctly describing the intended behavior of the language.
Since I've seen a lot of different ways to describe what some keywords do.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

You can make it your job!
Title: Re: C word
Post by: westfw on September 14, 2019, 08:12:33 am
Quote
If you want to go professional — and especially if safety or security will depend on your code — be fully aware that you should dig into that hard to understand mess of specifications spread across 650 pages.
meh. Nonsense. A good programmer won't normally get very close to the edges, and even if they do, their code will still be more portable than if it were written in assembly language. The most common pitfalls are well-known and well-described outside of the standard, and some of them are ignored because they seem to be "compiler gurus getting in the way of what needs to work." If your work gets to be safety- or security- critical, you'll probably be at a large enough company where there is someone assigned to pay attention to such things, and the way you'll need to code will be a company standard.
That doesn't mean that it's not a good idea to be familiar with the standard, but it's not NECESSARY as an early step to learning C...

Quote
it took ages to realise that the "!" meant "not ".
Quote
Just a suggestion here - maybe actually learn C?
That said. The original message suggests a level of unfamiliarity with high level languages in general that is worrying, and a "real class" or teaching-oriented text would probably be a good idea, rather than just looking for a "summary sheet." (I always find it ... interesting..., the way that people with a pure assembly language and/or EE background have trouble getting started with high level languages. And vice versa, too!)
Title: Re: C word
Post by: SiliconWizard on September 14, 2019, 04:14:30 pm
Quote
If you want to go professional — and especially if safety or security will depend on your code — be fully aware that you should dig into that hard to understand mess of specifications spread across 650 pages.
Quote
meh. Nonsense. A good programmer won't normally get very close to the edges, and even if they do, their code will still be more portable than if it were written in assembly language. The most common pitfalls are well-known and well-described outside of the standard, and some of them are ignored because they seem to be "compiler gurus getting in the way of what needs to work." If your work gets to be safety- or security- critical, you'll probably be at a large enough company where there is someone assigned to pay attention to such things, and the way you'll need to code will be a company standard.
That doesn't mean that it's not a good idea to be familiar with the standard, but it's not NECESSARY as an early step to learning C...

I still personally think this is a good idea to go through the C standard at least once, but I agree this is not necessary. It's just a plus to get an in-depth view of C. And as I said above, it's not for when you're just learning. It's an interesting step only when you're already proficient.

And I also agree that reading the standard(s) is absolutely NOT required to write good C (it would be a lame excuse to be claiming that).

Just a point to answer the other person's point: the C standard documents are definitely not hard to understand nor a mess of specifications. They are rather clearly laid out.
Of course as we said above, they are not meant for beginners in the least. They are meant for advanced users and compiler designers.

Quote
it took ages to realise that the "!" meant "not ".

It should have taken about 2 minutes opening a C book (even a lousy one!) and looking up the "operators" section. Dang...

Quote
That said. The original message suggests a level of unfamiliarity with high level languages in general that is worrying, and a "real class" or teaching-oriented text would probably be a good idea, rather than just looking for a "summary sheet."

Yes, that was exactly my point. But it's not just worrying as a programming language learning thing - it's worrying as a general learning process. It's like some people don't want to learn. They are just looking for recipes.
Title: Re: C word
Post by: FreddieChopin on September 14, 2019, 06:39:45 pm
If you're working with PIC microcontrollers don't be fooled into buying CCS C compiler (http://www.ccsinfo.com/). It's a complete and utter turd that doesn't conform to any standards, breaks in the weirdest ways and generally is a pain in the ass to use.
Title: Re: C word
Post by: janoc on September 14, 2019, 07:40:12 pm
Quote
If you're working with PIC microcontrollers don't be fooled into buying CCS C compiler (http://www.ccsinfo.com/). It's a complete and utter turd that doesn't conform to any standards, breaks in the weirdest ways and generally is a pain in the ass to use.

Isn't that the one which the old PIC C compiler (pre xc8) was based on? If yes, then I concur wholeheartedly - that was an utter turd ...
Title: Re: C word
Post by: westfw on September 15, 2019, 12:59:55 am
Quote
If you're working with PIC microcontrollers don't be fooled into buying CCS C compiler (http://www.ccsinfo.com/).
Quote
Isn't that the one which the old PIC C compiler (pre xc8) was based on?

I don't think so.  CCS is one of the few remaining independent compiler vendors.

Arguably, a certain amount of "non-conformance" is a necessary thing for an 8-bit PIC compiler, if you want code that is even close to acceptable (performance and size-wise.)
Title: Re: C word
Post by: golden_labels on September 15, 2019, 12:52:33 pm
Since no one mentioned that yet: C-FAQ (http://c-faq.com/) contains a nice collection of common pitfalls.

Quote
meh. Nonsense. A good programmer won't normally get very close to the edges,
More often than you think. Not only have I shown a very trivial case of an expression that does something different from what people may expect, but bugs caused by violating the strict aliasing rule are also widespread.

Quote
and even if they do, their code will still be more portable than if it were written in assembly language.
That is right, but… non sequitur. A portable bug is still a bug. :D

Quote
The most common pitfalls are well-known and well-described outside of the standard, and some of them are ignored because they seem to be "compiler gurus getting in the way of what needs to work."
They are not(1). This is exactly the type of accusation heard from people who never actually learned C, but are only guessing what a given expression does. And then they are surprised when the compiler properly interprets their code and emits a binary with behaviour different than expected.

Quote
If your work gets to be safety- or security- critical, you'll probably be at a large enough company where there is someone assigned to pay attention to such things
As we can see with seemingly infinite train of IoT vulnerabilities… ;)

Note that I do not suggest reading the standard from cover to cover! That’s a waste of time and one will learn little from that. I am speaking about having it nearby, to determine whether some opinion is right, or if your interpretation of your code is right.
____
(1) Unless you can show they do not follow what the language requires them to do. This, of course, happens all the time. But those are normal bugs, that are off-topic.
Title: Re: C word
Post by: SiliconWizard on September 15, 2019, 02:22:24 pm
Quote
If your work gets to be safety- or security- critical, you'll probably be at a large enough company where there is someone assigned to pay attention to such things
As we can see with seemingly infinite train of IoT vulnerabilities… ;)

Oh I agree with you on the alarming number of bugs and vulnerabilities in IoT devices, but I don't quite agree with your "point" being valid within the context mentioned by westfw.

The overwhelming majority of companies designing IoT devices are nothing similar to the kind of companies westfw was referring to when saying "safety- or security- critical". So you're kind of twisting what he was on about. Most of them are just making gadgets with the mindset that comes with that.

Of course that doesn't mean that the current state of affairs with IoT stuff is right. But seriously, unless an IoT company becomes one that correctly deals with safety and security, their products should definitely NOT be considered either safe or secure by their customers. So in that respect, the customers who are foolish enough not to realize this would be almost as liable as the software developers designing the products.

Other than that, of course I agree with you on the merits of reading standards, as I was mentioning that earlier, but again that's not for beginners. So the basics need to be well known first, otherwise, the standard will just look like gibberish, like some have suggested here, which it's definitely not.
Title: Re: C word
Post by: golden_labels on September 15, 2019, 04:35:16 pm
I am continuing with the same security/safety I was talking about initially, so there is no deviation from that. Perhaps there was some communication problem and westfw has misinterpreted what I meant. Certainly I didn’t limit that topic to some specific subset of companies or products.

And yes, I do not recommend that for just learning the language. Formal specifications are never the best place to start learning things. In particular a language specification will not teach anyone programming: it could at most teach them the language syntax and semantics. Writing programs is a completely different beast. And no book, no teacher, not even gazing at an expert’s work 24/7 will let anyone acquire experience.
Title: Re: C word
Post by: PlainName on September 17, 2019, 09:40:07 am
Quote from: SiliconWizard
I still personally think this is a good idea to go through the C standard at least once

I think this is excellent advice. One of the problems in learning new stuff is that there is so much and little of it makes any sense to start with. It is easy to zero in on a subset that will do the biz most of the time and treat everything else as somewhat esoteric.

If you go through the entire thing you'll remember not the detail but the functionality, and it's that functionality that will help you to program. You might not recall the detail of how variadic functions work, for instance, or even what they are called, but you will remember what they do. Later, when you need that kind of functionality you'll hopefully remember that it can be done, and then it's just a matter of looking up the details.
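
For anyone who has not met them yet, variadic functions are the ones that take a variable number of arguments, printf being the best-known example. A minimal sketch using stdarg.h, with a made-up function name:

```c
#include <stdarg.h>

/* Adds up 'count' int arguments that follow the count itself.
 * The function cannot tell how many arguments were passed, so the caller
 * has to say so; passing the wrong count is undefined behaviour. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);              /* start after the last named parameter */
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);     /* fetch the next argument as an int */
    }
    va_end(args);                       /* always pair va_start with va_end */

    return total;
}
```

For instance, sum_ints(3, 1, 2, 3) returns 6.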
Title: Re: C word
Post by: SiliconWizard on September 17, 2019, 01:55:13 pm
Quote from: SiliconWizard
I still personally think this is a good idea to go through the C standard at least once
Quote from: PlainName
If you go through the entire thing you'll remember not the detail but the functionality, and it's that functionality that will help you to program. You might not recall the detail of how variadic functions work, for instance, or even what they are called, but you will remember what they do. Later, when you need that kind of functionality you'll hopefully remember that it can be done, and then it's just a matter of looking up the details.

That's actually true for any learning process IMO, and that's why, getting back to programming, I think going through a good book about the programming language you want to learn, as a first step, IS a good idea, so you get the general idea, the basic constructs and keywords. Sure you won't remember EVERYTHING by heart, but you will have a global view of the language and will know *where to look* when you don't know something.

This is what learning is all about.
Title: Re: C word
Post by: legacy on September 17, 2019, 03:39:48 pm
Learning Erlang has been *very* difficult. For me  :o
Title: Re: C word
Post by: golden_labels on September 17, 2019, 04:13:29 pm
The C standard, just like any similar document, is not linear. You can’t simply read it page by page. It would make as much sense as reading a microcontroller datasheet cover to cover. Skimming over the introductory paragraphs of each chapter to know what you are sitting on: useful. Learning characteristics of ADC inputs while your project is 100% digital: a waste of time. But it’s worse, as even if you memorized the C standard in detail you would be likely to miss most of the important information. That knowledge is hidden behind non-obvious interactions between many different sections.

This is why I said it is useful to have one nearby, not to make it into your bedtime/toilet read. It is just too big and complex to be useful if used that way. What is much more important is being able to know where to search when you are in doubt.

And, of course, there is an even more important thing: knowing your toolset. If you want to read any specification cover to cover, let it be the documentation of your tool.

The C standard defines how the language should work. It doesn’t mean it will always work exactly like that or that your tool has no facilities beyond the basics. For example in the AVR Libc float and double are the same type, which is against the standard, but this is how it is. On 8-bit microcontrollers C can’t even be implemented, because of the conflicting specification of int, which should be exactly 8-bit and at least 16-bit at the same time. int is also wrong on x86-64, because it is 32-bit while it should be 64-bit. The standard also gives some freedom to implementations, so char (without further specifiers) may be either signed or unsigned, and that has much more serious implications than just differences in range, as valid code written for the unsigned version may cause UB with the signed version. wchar_t is a beast that is so ill-defined that it is nearly useless in portable programs.
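
To make the char point concrete, here is a minimal sketch. The function name byte_histogram is made up for illustration, and the example assumes an input containing bytes above 0x7F. Indexing a table with a plain char works where char is unsigned, but where char is signed such a byte becomes a negative index and the access is out of bounds. Converting to unsigned char first works on both kinds of implementation.

```c
#include <limits.h>
#include <stddef.h>

/* Count how often each byte value occurs in a NUL-terminated string.
   counts must have room for UCHAR_MAX + 1 entries. */
static void byte_histogram(const char *s, size_t counts[UCHAR_MAX + 1])
{
    size_t i;

    for (i = 0; i <= UCHAR_MAX; i++)
        counts[i] = 0;

    for (; *s != '\0'; s++) {
        /* counts[*s] would use a negative index for bytes above 0x7F
           on implementations where plain char is signed. */
        counts[(unsigned char)*s]++;
    }
}
```

The same conversion is what the <ctype.h> functions expect for their argument, for the same reason.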
Title: Re: C word
Post by: PlainName on September 17, 2019, 04:49:20 pm
Quote
Learning characteristics of ADC inputs while your project is 100% digital: a waste of time.

That's not what I meant. As a for instance, you would take away that there is only one ADC but eight analog inputs, and you can specify your own vref (somehow - you don't even need to remember how). And you ain't doing this bit of learning just for the immediate project. The one three down the line might be branching out from your 100% digital safety.

Quote
That knowledge is hidden behind non-obvious interactions between many different sections

Much is, yes. But without the background it's very hard to put things together. It's the difference between a man with a hammer seeing every fastener as a nail, and a workman recalling that you can join things with a twisty wosname, then looking up screws et al. on the web.

Title: Re: C word
Post by: SiliconWizard on September 17, 2019, 08:39:48 pm
Learning Erlang has been *very* difficult. For me  :o

This is not surprising. Functional languages ARE hard, and Erlang can be a bit cryptic on top of that.
The fact that you will find little available source code doesn't help. It's kind of a niche language, and I haven't seen many open-source projects in Erlang...
Title: Re: C word
Post by: legacy on September 17, 2019, 09:27:09 pm
This is not surprising. Functional languages ARE hard, and Erlang can be a bit cryptic on top of that.
The fact that you will find little available source code doesn't help. It's kind of a niche language, and I haven't seen many open-source projects in Erlang...

Airports do prefer Mnesia as a database because transactions are really atomic, so they are sure there is no chance of a collision when they sell a ticket. There are also other reasons, but this is the most important.

So I was told  :-//

Mnesia is written in Erlang, as well as a lot of things made by Ericsson and used in Denmark for weird purposes.

I am using this
Code: [Select]
*  dev-lang/erlang-21.1.1
      Homepage:      https://www.erlang.org
      Description:   Erlang programming language, runtime environment and libraries (OTP)
      License:       Apache-2.0

and it's possible to write callbacks in C, which opens doors to hybrid applications  :D
Title: Re: C word
Post by: SiliconWizard on September 17, 2019, 09:32:12 pm
I don't really like functional languages, but Erlang is an interesting beast nonetheless.
Title: Re: C word
Post by: legacy on September 17, 2019, 09:48:42 pm
On 8-bit microcontrollers C can’t even be implemented, because of the conflicting specification of int, which should be exactly 8-bit and at least 16-bit at the same time.

Yup. On Gcc-hc11 (EOL >3.4.*) there is a user flag to set the size of "int" and the size of "short".
You can also set the size of all the "soft-registers" (registers implemented in ram).

That was the best choice ever.
Title: Re: C word
Post by: golden_labels on September 18, 2019, 04:25:56 am
I don’t know about gcc-hc11, but avr-gcc has the -mint8 flag:
Quote
Assume "int" to be 8-bit integer.  This affects the sizes of all types: a "char" is 1 byte, an "int" is 1 byte, a "long" is 2 bytes, and "long long" is 4 bytes.
That way one may choose, which of the standard guarantees are broken. You either get a 16-bit int, which breaks the requirement for int to be 8-bit on AVR8, or you get an 8-bit int, which breaks the requirement for the int to be at least 16 bits wide. The flag has, however, a drawback: it also breaks other primitive types:

    Type     │   Normal   │   -mint8   
═════════════╪════════════╪════════════
    char     │      8     │      8
    short    │     16     │      8 !
    int      │     16 !   │      8 !
    long     │     32     │     16 !
  long long  │     64     │     32 !


(‘!’ marks invalid behaviour)
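
If you want the build to fail loudly when someone flips such a flag, a guard like the following is one option. This is only a sketch: it assumes the toolchain's <limits.h> reflects the selected widths (I have not checked this for every avr-gcc version), and int_storage_bits is a made-up helper name.

```c
#include <limits.h>

/* Refuse to build when int or long is narrower than the minimum
   the C standard requires, as happens with -mint8. */
#if INT_MAX < 32767
#error "int is narrower than 16 bits; this code assumes a conforming int"
#endif

#if LONG_MAX < 2147483647L
#error "long is narrower than 32 bits; this code assumes a conforming long"
#endif

/* Hypothetical helper: storage size of int in bits
   (this counts padding bits too, if the platform has any). */
static unsigned int_storage_bits(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT);
}
```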
Title: Re: C word
Post by: legacy on September 18, 2019, 08:03:46 am
you can use { uint8_t , uint16_t , uint32_t } instead!
Title: Re: C word
Post by: AndyC_772 on September 18, 2019, 08:57:37 am
Indeed, why would you NOT choose to use these?

Surely if code is to be bug-free and portable, its meaning must first be predictable?

I've never liked the concept of "the width of this type is compiler dependent"; it's just asking for things to break when you port code between different architectures.
Title: Re: C word
Post by: PerranOak on September 18, 2019, 02:42:19 pm
BTW when the same compiler compiles a C-prog does it always create EXACTLY the same code?
That is, if you compile it, do nothing to the code then compile it again are those two files of machine-code the same?
Title: Re: C word
Post by: SiliconWizard on September 18, 2019, 02:47:48 pm
Of course. You should definitely use stdint types when you need integer types with a fixed, guaranteed size.
In cases when you don't, the base C types can still be adequate as long as you follow the minimum ranges the standard guarantees.

This is C99 though, and unfortunately, in a very few cases, you may not have access to a C99-compliant compiler. Either because your platform is too old and doesn't have any, or because allowed compilers can only be C89/90 compliant (which is still the case in a few safety-critical environments, fortunately not all!) In this case, the old and recommended way is still to define your own integer types in an implementation-specific manner in a separate header, and mark it clearly as non-portable. A few older compilers already did this, so if there's such a compiler-provided header, use it instead of your own, obviously!

As to the width of integers, this "issue" is absolutely not specific to C. In many high-level languages, the base "integer" type doesn't have any specific width either, and is implementation-specific. If the language is standardized, there may just be a minimum guaranteed width. The idea may be debatable of course, but it also goes with the fact they are high-level languages, so it kind of makes sense.

In any case, NEVER write code that assumes a specific width larger than the minimum guaranteed one, and if you must do so, use standard constructs (like stdint in C99+) or custom ones that are CLEARLY isolated and documented.
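
For the C89/90 case, such a separate header could look roughly like this. It is a sketch only: the file name my_ints.h and the my_* type names are invented for illustration, and it covers only the common widths.

```c
/* my_ints.h (hypothetical name)
   NON-PORTABLE fallback for compilers that lack <stdint.h>.
   Each width is picked by checking the limits the compiler reports. */
#ifndef MY_INTS_H
#define MY_INTS_H

#include <limits.h>

#if UCHAR_MAX == 0xFFUL
typedef unsigned char  my_u8;
typedef signed char    my_s8;
#else
#error "no 8-bit type found: adjust my_ints.h for this platform"
#endif

#if USHRT_MAX == 0xFFFFUL
typedef unsigned short my_u16;
typedef short          my_s16;
#elif UINT_MAX == 0xFFFFUL
typedef unsigned int   my_u16;
typedef int            my_s16;
#else
#error "no 16-bit type found: adjust my_ints.h for this platform"
#endif

#if UINT_MAX == 0xFFFFFFFFUL
typedef unsigned int   my_u32;
typedef int            my_s32;
#elif ULONG_MAX == 0xFFFFFFFFUL
typedef unsigned long  my_u32;
typedef long           my_s32;
#else
#error "no 32-bit type found: adjust my_ints.h for this platform"
#endif

#endif /* MY_INTS_H */
```

The #error branches are the point: on a platform the header was not written for, it stops the build instead of silently picking a wrong width.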
Title: Re: C word
Post by: SiliconWizard on September 18, 2019, 02:52:36 pm
BTW when the same compiler compiles a C-prog does it always create EXACTLY the same code?
That is, if you compile it, do nothing to the code then compile it again are those two files of machine-code the same?

With all the C compilers I have ever used, that was the case.
I don't think there is any strict guarantee on this based on standards, though. So a specific compiler may for instance implement some optimizations using some random factors inside some of the algorithms, which could yield a different object code every time it compiles. But that would be kind of weird and non-predictable, so I don't think compiler designers have been inclined to doing so. It would also make automatic testing of the compilers a lot harder, so that wouldn't really make sense.
Title: Re: C word
Post by: PlainName on September 18, 2019, 02:59:30 pm
Quote
That is, if you compile it, do nothing to the code then compile it again are those two files of machine-code the same?

Good question :)

I would expect them to be, yes. In fact, back when BC++3.1 was a thing for embedded C, we hit a situation when the same code was producing different binaries on different build PCs. There were only ever two versions (that is, some version of the code produced a binary A on one PC, B on another, but there was never a C), but it should have been the same and we spent some effort in finding out why they were different and then fixing it. Can't recall the cause now, though.
Title: Re: C word
Post by: janoc on September 18, 2019, 05:40:12 pm
Here is a good resource on the modern C language - e.g. did you know that C includes threads and atomics (nothing to do with bombs) as part of the standard library now?

http://modernc.gforge.inria.fr/ (http://modernc.gforge.inria.fr/)

IMO, this is much more relevant than the old K&R book, which describes a fairly obsolete standard of the language - the second edition covering the "ANSI C" (not the original K&R C) came out in 1988! C has had several revisions of the standard since and the book has not been updated.
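
As a small taste of the atomics part, here is a sketch of a shared event counter using C11 <stdatomic.h>. It assumes a C11 compiler that ships that header (it is optional in the standard), and the names event_count, record_event and events_so_far are invented for the example.

```c
#include <stdatomic.h>

/* Static storage duration, so the counter starts at zero. */
static atomic_uint event_count;

/* Atomically increment the counter; returns the value it had
   before the increment. Safe to call from several threads. */
static unsigned record_event(void)
{
    return atomic_fetch_add(&event_count, 1u);
}

/* Atomically read the current count. */
static unsigned events_so_far(void)
{
    return atomic_load(&event_count);
}
```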
Title: Re: C word
Post by: golden_labels on September 18, 2019, 06:17:57 pm
Indeed, why would you NOT choose to use these?
I do not see any relationship between the presented issue and using exact-width types. The sub-topic was about standard conformance of the toolsets and the code doesn’t affect that in any way.

However, let me answer your question nonetheless, because it’s another trap. Putting aside exotic architectures which don’t have those exact-width types(1), the problem is that those are merely aliases. Calculations are never performed in the exact-width domain and int size affects the calculations the same way as it would without using exact-width types. If you replace unsigned short with uint16_t in the undefined behaviour example (https://www.eevblog.com/forum/programming/c-word/msg2684055/#msg2684055) I’ve posted earlier, it will change nothing. It will still cause a signed(!) overflow in a 32-bit(!) variable.

This is the thing I was talking about earlier. People guess the meaning of their code using their intuition, instead of learning what it actually means. The misconception, that declaring a variable as (u)intN_t will make calculations limited to N bits and — for unsigned types — being performed as unsigned, is widespread for that reason. But C not only makes no such guarantee, but specifically makes that behaviour impossible. The only reason why it “works” (with strong emphasis on the quotation marks ;)) is a combination of two factors. First: compilers are allowed to do anything with code containing UB. As it happens, the easiest thing is ignoring the corner cases and emitting code that would work for valid code. Second: nearly all platforms you are dealing with nowadays are using two’s-complement representation for integers. The effect: wrong calculation is performed and yields an invalid result, but if it is later interpreted as its perceived type it “magically” has the value you expected it to have. But, technically, this is a pure accident.
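
A minimal sketch of that trap and one way around it, assuming a platform with 32-bit int (the function names are made up for illustration). The fix is to convert one operand to a wide unsigned type before multiplying, so the promotion to signed int never happens.

```c
#include <stdint.h>

/* With a 32-bit int, the expression x * x on two uint16_t operands
   promotes both to (signed) int. For x = 65535 the product does not
   fit in int: undefined behaviour, although every declared type is
   unsigned.
       uint16_t bad = x * x;
   Converting one operand first keeps the multiplication well defined. */

/* Full 32-bit result. */
static uint32_t square_u16(uint16_t x)
{
    return (uint32_t)x * x;
}

/* Result reduced modulo 65536, which is what many people expect
   the plain uint16_t expression to give. */
static uint16_t square_u16_wrapped(uint16_t x)
{
    return (uint16_t)((uint32_t)x * x);
}
```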

uint8_t, int8_t, int_least8_t and uint_least8_t are always useless(2), because the only type they could ever alias is (un)signed char.

I've never liked the concept of "the width of this type is compiler dependent"; it's just asking for things to break when you port code between different architectures.
This is not a random idea. It’s the effect of platforms supporting different types of data. The code will never break, as long as you write it properly, understanding what the code means — instead of guessing.
____
(1) All exact-width types are optional and in fact nearly all are never provided; that applies to all available toolsets. Even the paragraph 3 ones happen to be missing: a nice edge case is avr-gcc not having uint64_t with 8-bit int.
(2) Unless you want to save some keystrokes.
Title: Re: C word
Post by: magic on September 18, 2019, 08:03:18 pm
Here's a fun trick that will cure you from thinking that using stdint.h is a panacea :D
Code: [Select]
#include <stdint.h>

int main() {
        uint8_t a = 128;
        int8_t b = -128;
        volatile int8_t *c = 0;
        *c = a/b;
}
Compiled without -mint8 writes -1 to NULL.
Compiled with -mint8 writes +1 to NULL.
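
For anyone who wants to see why without an AVR toolchain, here is a sketch that reproduces both results on an ordinary compiler. The second function only emulates what I understand -mint8 to do (an 8-bit int cannot hold 128, so both operands end up as 8-bit unsigned values); the function names are invented.

```c
#include <stdint.h>

/* Normal case (int at least 16 bits): both operands are promoted
   to int, so the division is 128 / -128. */
static int div_promoted_to_int(uint8_t a, int8_t b)
{
    return a / b;
}

/* Emulation of the -mint8 case: the arithmetic effectively happens
   on 8-bit unsigned values, so -128 becomes 128 before dividing. */
static int div_as_8bit_unsigned(uint8_t a, int8_t b)
{
    return (uint8_t)a / (uint8_t)b;
}
```

Called with a = 128 and b = -128, the first returns -1 and the second returns 1.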
Title: Re: C word
Post by: legacy on September 18, 2019, 09:10:54 pm
I do not see any relationship between the presented issue and using exact-width types.

In your mind, you are more focused on the operation size if you write it explicitly, and this helps people match test cases: expected versus actual values.
Title: Re: C word
Post by: legacy on September 18, 2019, 09:21:07 pm
Calculations are never performed in the exact-width domain

On m68k, the ALU has specific instructions with the size {1,2,4} (={.b, .w, .l} in the Motorola assembler syntax), both available for signed and unsigned types, and the overflow and underflow flags are set according to the instruction.

RISCs ... are naughty boys, a completely different story  :D
Title: Re: C word
Post by: legacy on September 18, 2019, 09:29:28 pm
Anyway, even the C compiler "SierraC/m68k" (one of the most impressive in the 90s and 2000s) simply does not care, even though the m68k hardware could potentially help for free since the flags are already calculated, whereas on RISC (e.g. with MIPS) the ALU does not calculate them, and you would need to add a lot of extra instructions.

It's that ... it's the C language that is a bit rotten about that  :-//
Title: Re: C word
Post by: SiliconWizard on September 18, 2019, 09:55:50 pm
Note that here again, actually reading the standard helps.

Summing it up "quickly":

- stdint.h defines integer types that are guaranteed to be of an exact width, and others that are guaranteed to be at least some specified width. Those are respectively intN_t/uintN_t and int_leastN_t/uint_leastN_t (the latter are much less well known).
- None of the former are actually REQUIRED to be implemented. Your particular implementation may contain NONE, or just some of them. It's pretty rare that there be none, but the 64-bit ones may, for instance, not be implemented. The standard says: "these types are optional".
- For the latter, as far as I've gotten it, 8, 16, 32 and 64 are REQUIRED. Others are optional. So a conforming C99 compiler should actually define int_least64_t for instance, but may not define int64_t. The standard says: "The following types are required:" and lists int_leastN_t/uint_leastN_t for N=8, 16, 32, 64. (I would then conclude that any compiler NOT defining int_least64_t, for instance, IS non-conforming. But it may absolutely not define int64_t.)
- Some implementations may define other, more exotic widths such as 24 bits. This is allowed, but strictly optional of course, and in practice, a lot less common than the more ubiquitous 8, 16, 32, 64 (which again may not all be defined either...)
- There are yet other categories of integer types, such as intptr_t/uintptr_t, which are useful every time you need to cast a pointer to an integer. This is the only conforming way of doing it. But those types are also optional per the standard. They are commonly defined on implementations targeting platforms with non-segmented addresses, and may not be defined on platforms with segmented ones. This is of course just a technical constraint, nothing is written about that in the standard.
- AFAIK, nothing specific is said about arithmetic with stdint types, so those merely define 'storage', which is kinda obvious in a way, since in C, all a typedef can do is define a derived type, not how operations should be performed on them. Thus, we can infer from the standard that stdint types just inherit the rules from the base integer C types. Which is, admittedly, not very consistent (but that is, OTOH, understandable, as the committee decided to provide stdint's only through a standard header, and not through any extension of the language itself). This point is important. Ensuring proper arithmetic with those types is definitely not trivial. You need cautious casting of intermediate computations all over the place, and you need not to assume anything much. To understand this particular point, you need to have a (very) good understanding of the C language.
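
A sketch of how code can cope with the optional types from the list above. It relies on the rule that a limit macro such as INT64_MAX or UINTPTR_MAX is defined only when the corresponding type exists; the names wide_t, WIDE_IS_EXACT and pointer_bits are made up for this example.

```c
#include <stdint.h>

/* Prefer the exact-width type when the implementation provides it,
   otherwise fall back to the required "least" type. */
#ifdef INT64_MAX
typedef int64_t       wide_t;
#define WIDE_IS_EXACT 1
#else
typedef int_least64_t wide_t;
#define WIDE_IS_EXACT 0
#endif

/* uintptr_t is optional too, so only offer the helper where it exists. */
#ifdef UINTPTR_MAX
static uintptr_t pointer_bits(const void *p)
{
    return (uintptr_t)p;
}
#endif
```

Code that really needs exactly 64 bits can test WIDE_IS_EXACT and refuse to build; code that only needs the range can use wide_t either way.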

Of course, the moral of the story is NOT that you shouldn't use some particular standardized feature (like for instance stdint here) just because many points of it are optional per the standard. Of course, you always have to know your tools. If something is not available on your particular implementation, you will obviously not be able to use it. But you should definitely use all that's available, and find decently elegant ways of implementing the rest (as I said, in separate headers/source files).

And yes you also have to dig a little deeper... and stdint.h is actually a good example of why reading the standard will definitely help. You're not likely to find in-depth knowledge of how to use stdint.h types properly in any cheat sheet, and not that likely to find that in many books on C.

Finally, this is also an example of C being a high-level language (despite what some may say). stdint's are provided for data structures, storage and data exchange, not for fixed-width arithmetic. C is not assembly. C is not object-oriented either (at least per se), so you can't redefine operators for new types, which would have allowed guaranteed, fixed-width arithmetic for each integer type defined in stdint.h...

As I said before, many other high-level languages would exhibit the same "problem" with arithmetics. And yes, you can work around that with those that can redefine basic operators.
In C, it takes a lot more careful approach.
Title: Re: C word
Post by: legacy on September 18, 2019, 10:21:04 pm
all a typedef can do is define a derived type, not how operations should be performed on them

That's *the* problem, and why I said "rotten"  :D
Title: Re: C word
Post by: SiliconWizard on September 18, 2019, 10:39:25 pm
all a typedef can do is define a derived type, not how operations should be performed on them

That's *the* problem, and why I said "rotten"  :D

Just a matter of perspective. If you want to use a true OO language, use one. Now defining your own types and operations on them with standard operators is also a slippery slope, hiding what goes on behind the scenes.

The idiomatic C approach, in which you can't overload operators, is to define functions instead. May not look as nice syntactically, but it works, and it makes it a bit more clear what you are actually doing. You don't have to guess how a given operator is going to be implemented in a given statement... which can cause horror stories as well.
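
Something along these lines, as a rough sketch for 8-bit unsigned values (u8_add_wrap and u8_add_checked are names I just made up, not anything standard):

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrapping add: the sum is computed in unsigned int, then the
   conversion back to uint8_t reduces it modulo 256. */
static uint8_t u8_add_wrap(uint8_t a, uint8_t b)
{
    return (uint8_t)((unsigned)a + b);
}

/* Checked add: returns false and leaves *out untouched when the
   sum does not fit in 8 bits. */
static bool u8_add_checked(uint8_t a, uint8_t b, uint8_t *out)
{
    unsigned sum = (unsigned)a + b;

    if (sum > UINT8_MAX)
        return false;
    *out = (uint8_t)sum;
    return true;
}
```

The caller picks the behaviour by picking the function, so nothing is hidden behind a + sign.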

Edit: That said, and if you were thinking about that: I admit some languages handle this much better than C. Guess what. Ada for instance. ;D
Many people probably don't suspect that Ada actually allows you to define and use very low-level stuff in a clean and consistent way.
Integer ranges just blow away any C stdint.h attempt, and behave in a logical way.
Even bit fields are much easier and predictable in Ada.  ::)

How about that?
Code: [Select]
subtype Byte_t is Natural range 0..255;  -- There you have your [0..255] unsigned integer. But that's not all...
-- As is, a Byte_t can't be greater than 255. It won't roll-over. It will raise an error if you attempt to increment one that is 255.
-- Alternately, if you want Byte_t arithmetic to roll-over, as many low-level programmers would expect, declare it as so instead:
type Byte_t is mod 256;
for Byte_t'Size use 8;  -- Now it makes sure Byte_t has actually 8-bit storage on top of "acting" like an 8-bit unsigned integer...
-- Now some cherry on the cake, give Byte_t a default value:
type Byte_t is mod 256 with Default_Value => 0;
-- And I haven't even given examples of bit fields yet...

I love C, but the more I know Ada and the more it actually looks refreshing... ;D
Title: Re: C word
Post by: PlainName on September 19, 2019, 07:28:22 am
Quote
Here's a fun trick that will cure you from...

using inappropriate types. Should be a uint_least16_t in use there.
Title: Re: C word
Post by: magic on September 19, 2019, 08:15:33 am
No, should be int_least16_t because one is allowed to be negative ;)

Or even better, int_least9_t, in the event that there is some platform with efficient 9 or 12 bit ints :popcorn:

But of course we don't want to waste storage space, so both variables should be defined as they are and only cast to a wider type right before calculation.
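
A sketch of what that looks like, with an invented function name. Storage stays 8 bits wide; only the operands of the division are widened, so the result no longer depends on how wide int happens to be.

```c
#include <stdint.h>

/* Divide an 8-bit unsigned value by an 8-bit signed one.
   Both operands are converted to a signed type of at least 16 bits
   before dividing. The caller must make sure b is not zero. */
static int_least16_t div_u8_by_s8(uint8_t a, int8_t b)
{
    return (int_least16_t)((int_least16_t)a / (int_least16_t)b);
}
```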
Title: Re: C word
Post by: legacy on September 19, 2019, 10:44:53 am
Code: [Select]
boolean_t check_exactwidth()
{
     boolean_t ok;

     ok is True;
     ok And is { language'uint8_t'size , language'uint16_t'size , language'uint32_t'size }
                   NotEqualTo? { Undefined, Undefined, Undefined };
     ok And is { language'uint8_t'size , language'uint16_t'size , language'uint32_t'size }
                   UndefinedOrEqualTo? { 1, 2, 4 };
     ok And is { language'uint8_t'range , language'uint16_t'range , language'uint32_t'range }
                   UndefinedOrEqualTo? { machinelayer'uint8_t'range, machinelayer'uint16_t'range, machinelayer'uint32_t'range };
     ok And is machinelayer'property'overflowcareful EqualTo? True;
     ok And is machinelayer'property'underflowcareful EqualTo? True;
     ok And is machinelayer'property'carrycareful EqualTo? True;

     ans is ok;
}

void early_check()
{
    ...
    is check_exactwidth() EqualTo? False
    {
         do_panic(module, id, "exact_width not possible, the following algorithm won't work properly");
    }
    ...
}

This is how my Arise-v2 HL language operates. It's not C, and there is a clear separation among { language, machine layer, machine hardware } since the language is an abstract representation of the machine layer, which is how the language is implemented in the hardware.

So, you have the machine layer, where the implementation goes, and there is a mechanism able to report details to the user, who is using the language to express his/her needs.

is_exact_width() executed on an m68k will report True not because the hardware is able to perform exact_width calculations but rather because, and only if, the machine layer has been implemented to take advantage of it.

Of course, it's private research stuff, nothing serious, and I don't have the arrogance to reinvent any language used in the industry (and nobody in the squad has enough competence and time to make it serious); it's only how my team has had fun with our softcore :D
Title: Re: C word
Post by: SiliconWizard on September 19, 2019, 02:46:08 pm
(...)
This is how my Arise-v2 HL language operates. It's not C, and there is a clear separation among { language, machine layer, machine hardware } since the language is an abstract representation of the machine layer, which is how the language is implemented in the hardware.

That looks obviously a bit cryptic to me (without the manual  ;D ), but interesting.
Did you/your team write a full compiler for that? If so, did you write it 100% from scratch, or did you use any third-party tools to help?
Title: Re: C word
Post by: legacy on September 19, 2019, 03:10:26 pm
That looks obviously a bit cryptic to me

Oh, well ... one of the "designers" (so she calls herself) is a Bulgarian girl, who loves Erlang so much that ... she wanted to make everything Erlang-like. We had to work hard to convince her to accept ... *cough* *cough* ... "compromises". So, if you say that the above is "cryptic", well ... don't look at Erlang stuff, and better still, don't look at her earlier works, because they're even more cryptic :D


(without the manual  ;D )

ma.nu.what?  :D

Did you/your team write a full compiler for that? If so, did you write it 100% from scratch, or did you use any third-party tools to help?

Everything from scratch, written in C, so this is proof that we don't dislike the C language. Expressions are evaluated in RPN form, and we have an AST-approach.

The HL has been only partially implemented because the machine Layer of Arise-v2 is not stable, but rather prone to changes. This doesn't help.

-

*my* point was: we prefer to have details built-in the Compiler rather than in an external #include file, and we prefer to have a clear separation among { language, machine layer, and hardware }. This really helps.
Title: Re: C word
Post by: SiliconWizard on September 19, 2019, 03:34:52 pm
So, if you say that the above is "cryptic", well ... don't look at Erlang stuff, and still better, don't look at her earlier works, because it's even more cryptic :D

I took another look the other day at Erlang after we mentioned it. It turns out "simpler" than I remembered, but the all-functional approach and the debatable syntax don't really help indeed.
Now would I consider future developments in Erlang? Not really. ;D (Also because it basically runs on a virtual machine as far as I got, I hate those.)

*my* point was: we prefer to have details built-in the Compiler rather than in an external #include file, and we prefer to have a clear separation among { language, machine layer, and hardware }. This really helps.

I am actually currently working on some kind of extension of C (with modules, and other things), the parser/analyzer of which is written in C, so I can relate to this point. I considered the preprocessor-only approach to be clunky, not fail-safe, and very limited.
Title: Re: C word
Post by: legacy on September 19, 2019, 07:27:21 pm
I am actually currently working on some kind of extension of C (with modules, and other things)

using llvm?

what surprises me is ... why in 20 years nobody has already tried to *improve* the C language without making it *too much improved* like it happens with the C++ ?
Title: Re: C word
Post by: SiliconWizard on September 19, 2019, 07:44:45 pm
I am actually currently working on some kind of extension of C (with modules, and other things)
using llvm?

Not at the moment. I may consider this later on.

what surprises me is ... why in 20 years nobody has already tried to *improve* the C language without making it *too much improved* like it happens with the C++ ?

I don't know. Maybe, or probably, because there are too many different views of what a better C should be?
There have been MANY attempts, and not just C++. (D for instance, and a lot of others, more or less close to C.) But I agree those were almost always designed to be completely new languages, and not just improvements on C.

The C standard committee has been pretty conservative all in all. Not saying this is bad per se, it avoids too much "fragmentation". But then, it only improves C slowly and marginally...

Like, I'm adding support for "modules" (trying something clean, no more header files, no more namespace problems, and, generic modules). I'm suspecting that many people might not care for that, and would like some OO constructs instead (simpler than C++). Or built-in support for parallel computing (I might add some features about that later, that said). Or better type definition. Etc. So many things can be improved, and we probably almost all have a different idea of what it should be.

Title: Re: C word
Post by: Nominal Animal on September 20, 2019, 11:56:21 am
BTW when the same compiler compiles a C-prog does it always create EXACTLY the same code?
That is, if you compile it, do nothing to the code then compile it again are those two files of machine-code the same?
The machine code in the object files should be exactly the same (assuming neither the libraries, the compiler, nor the compiler options change in the meantime), but the object files often contain other data that does vary; for example, the compilation timestamp.

If the code contains something like
    const char  built[] = "Compiled on " __DATE__ " " __TIME__;
then the contents of that string vary based on when the code is compiled, and because it is marked as an immutable string, it might reside in the code section, depending on how it is linked.  (The linker can merge sections, if told to do so.)

For object files compiled using GNU tools, I might use
    diff <(objdump -d FILE1.o) <(objdump -d FILE2.o)
(for code comparison), or
    diff <(readelf -x .text -x .data FILE1.o) <(readelf -x .text -x .data FILE2.o)
(for code and initialized data comparison), or
    diff <(objdump -s FILE1.o | sed -e '/^$/d; /^[^ ]/d') <(objdump -s FILE2.o | sed -e '/^$/d; /^[^ ]/d')
(for comparing all sections, but ignoring file name and section names).

The <(command) redirection above is a Bash extension, that is expanded to a path to a pipe or file containing the output of command.
That is, diff <(cat FILE1) <(cat FILE2) is equivalent to diff FILE1 FILE2 .  If you happen to use Busybox or dash instead of Bash, use temporary files instead.
Title: Re: C word
Post by: FreddieChopin on September 20, 2019, 04:47:19 pm
I am actually currently working on some kind of extension of C (with modules, and other things)

using llvm?

what surprises me is ... why in 20 years nobody has already tried to *improve* the C language without making it *too much improved* like it happens with the C++ ?

Stick welding hasn't changed for decades either and yet gets the job done every day. You can't improve a perfect solution.
Title: Re: C word
Post by: Bassman59 on September 22, 2019, 07:43:15 pm

Edit: That said, and if you were thinking about that: I admit some languages handle this much better than C. Guess what. Ada for instance. ;D
Many people probably don't suspect that Ada actually allows you to define and use very low-level stuff in a clean and consistent way.
Integer ranges just blow away any C stdint.h attempt, and behave in a logical way.
Even bit fields are much easier and predictable in Ada.  ::)

How about that?
Code: [Select]
subtype Byte_t is Natural range 0..255;  -- There you have your [0..255] unsigned integer. But that's not all...
-- As is, a Byte_t can't be greater than 255. It won't roll-over. It will raise an error if you attempt to increment one that is 255.
-- Alternately, if you want Byte_t arithmetic to roll-over, as many low-level programmers would expect, declare it as so instead:
type Byte_t is mod 256;
for Byte_t'Size use 8;  -- Now it makes sure Byte_t has actually 8-bit storage on top of "acting" like an 8-bit unsigned integer...
-- Now some cherry on the cake, give Byte_t a default value:
type Byte_t is mod 256 with Default_Value => 0;
-- And I haven't even given examples of bit fields yet...

I love C, but the more I know Ada and the more it actually looks refreshing... ;D

It’s worth noting that VHDL has had this sort of range feature since forever. (VHDL borrows from Ada, of course.) It’s something Verilog never had — does SystemVerilog have such a feature?
Title: Re: C word
Post by: SiliconWizard on September 22, 2019, 08:16:22 pm
Yes of course, VHDL was directly derived from Ada, and actually borrows a lot from the Ada language. Much more so than Verilog "borrows" from C, which was often what could be heard (VHDL comes from Ada, Verilog from C: that's false! VHDL is a lot closer to Ada than Verilog ever was to C.)

As to SV, I don't know it much, so, hoping someone knowledgeable can answer your question.

And now just a bonus for bit fields with Ada:
Code: [Select]
-- The "high-level" definition of a record:
type Count7_t is mod 128;

type BitField_t is
record
Enable : Boolean;
Count : Count7_t;
Status : Natural range 0..255;
end record;

-- Now describe how it's laid out at the bit level:
for BitField_t use
record
Enable at 0 range 0..0;
Count at 0 range 1..7;
Status at 0 range 8..15;
end record;

-- Finally, ensure it's exactly 16-bit wide:
for BitField_t'Size use 16;

-- Define endian-ness:
for BitField_t'Bit_Order use Low_Order_First;
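
For comparison, a rough C sketch of the same 16-bit layout. C leaves the bit order and padding of bit-fields to the implementation, so this sketch assumes explicit shifts and masks instead. The names bitfield_pack, bitfield_enable, bitfield_count and bitfield_status are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout assumed from the Ada record above (bit 0 = least significant):
 *   bit  0      Enable
 *   bits 1..7   Count  (0..127)
 *   bits 8..15  Status (0..255)
 */
static uint16_t bitfield_pack(bool enable, unsigned count, unsigned status)
{
    /* C does not check the ranges for us, so check them by hand. */
    assert(count <= 127u);
    assert(status <= 255u);
    return (uint16_t)((enable ? 1u : 0u) | (count << 1) | (status << 8));
}

static bool bitfield_enable(uint16_t raw)
{
    return (raw & 1u) != 0;
}

static unsigned bitfield_count(uint16_t raw)
{
    return (raw >> 1) & 0x7Fu;
}

static unsigned bitfield_status(uint16_t raw)
{
    return (raw >> 8) & 0xFFu;
}
```

The range checks and the layout live in separate hand-written code here, which is exactly the part the Ada representation clause takes over.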
Title: Re: C word
Post by: westfw on September 23, 2019, 09:37:15 am
Quote
subtype Byte_t is Natural range 0..255;  -- There you have your [0..255] unsigned integer. But that's not all...
-- As is, a Byte_t can't be greater than 255. It won't roll-over. It will raise an error if you attempt to increment one that is 255.

You're saying that by default, Ada won't let you do math on 8bit variables without checking each result for overflow?
(unless your declaration gets more complicated...)

Ouch?  (I mean, I guess you define types the way you want them to behave, and then just use those all the time.  But still - ouch!)

(Isn't that exactly what crashed Ariane, too?  An overflow exception in code where no one thought they needed to care?)
Title: Re: C word
Post by: PlainName on September 23, 2019, 11:28:32 am
Quote
(Isn't that exactly what crashed Ariane, too?  An overflow exception in code where no one thought they needed to care?)

I don't know the details, but looking at this as failure to trap an exception...

It's a rock and a hard place kind of thing. If you don't trap the exception then you're in big trouble, but what if there was no exception? The options are to roll over or do nothing, and either could be as fatal as an unhandled exception. On the one hand they might coincidentally result in a non-fatal (at that point) value, but it's a wrong value nevertheless. On the other hand with the exception you at least get to know something is wrong and have a chance to do something about it before it goes further.
Title: Re: C word
Post by: legacy on September 23, 2019, 11:45:18 am
Isn't that exactly what crashed Ariane, too?  An overflow exception in code where no one thought they needed to care?

Hard to believe since exceptions have low-level constraints, hence to pass the DO178B/level A-C they MUST be checked in both normal and abnormal conditions. However, the worst-case cannot consider what might happen when the hardware gets too seriously damaged. Modules have two ways of redundancy, but not for everything.

Title: Re: C word
Post by: Nominal Animal on September 23, 2019, 01:46:22 pm
Higher-level languages with an actual runtime will have a hard time replacing really low-level languages like C, because of the engineering differences.

It is easy to miss how truly simple C really is.  Almost everything is part of the standard library, which is only provided in hosted environments; a freestanding environment is truly simple.

I have played with the notion of replacing C with something more appropriate for the tasks I have used it (from microcontrollers to kernels to low-level libraries to applications), and I've come to realize that the features C is missing or should replace with something, are rather simple.

For example, we really need a notion of compile-time only data structures, as a completely new concept.  As a practical example, consider the I/O pin configuration on a microcontroller.  We really do not care how they are initialized, just that they are initialized to a specific state.  Fortran FORALL constructs are a small step in this direction, decoupling the order in which iterations are done, making it possible for a compiler to easily parallelize such loops.  The fundamental idea is a way to represent final states, when intermediate states or order of operations or other side effects are unimportant.

Currently, C has two contexts: hosted and freestanding.  Splitting these into facilities would make a lot of sense.  For example, when using GCC on microcontrollers you actually use an interesting subset of C++.  The Linux kernel uses a subset of C that excludes floating-point math unless special precautions are taken.  And so on.

As of 2019, there are two approaches to synchronization primitives in hardware: load-link/store-conditional (https://en.wikipedia.org/wiki/Load-link/store-conditional) and compare-and-swap (https://en.wikipedia.org/wiki/Compare-and-swap).  If we consider synchronization primitives a facility, then LL/SC and CAS would be alternate implementations of that facility.

A lot of hardware can do atomic loads, stores, and even additions and subtractions.  These should be exposed as a facility.

As you can see, the above points are really about the "standard library", and modularizing it in a way that is different to now, but better corresponds to the actual use cases we can see in various software project types.  Some of the facilities are provided by the user, some by the environment, some by the compiler (consider e.g. udivdi3 helper when using GCC).

The language itself needs some new concepts.  Arrays with compile-time boundary checking, for example; arrays with run-time boundary checking; vector types.  One important concept we need is splitting casting into conversion and reinterpretation.  Conversion occurs when a value of one type is converted to the same or as similar value as possible in the other type; reinterpretation occurs when the storage bit pattern is used with a different type of the same size.
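
To make the distinction concrete, here is a minimal sketch of how the two operations are spelled in today's C, where a cast only ever means conversion and reinterpretation has to go through memcpy. The function names are hypothetical, and the sketch assumes a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Conversion: keep the value, as closely as the target type allows.
 * The fraction is discarded; the caller must keep f within int32_t range. */
static int32_t convert_float_to_i32(float f)
{
    return (int32_t)f;
}

/* Reinterpretation: keep the storage bits, change only the type. */
static uint32_t reinterpret_float_as_u32(float f)
{
    uint32_t bits;
    static_assert(sizeof bits == sizeof f, "float assumed to be 32 bits wide");
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

With these, 1.0f converts to 1 but reinterprets to 0x3F800000 on an IEEE 754 target, so the same source value yields two unrelated results depending on which operation is meant.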

I do not believe atomic types is the right direction, because atomicity is a property of the access and not of the variable itself.  I think a variable attribute indicating atomic access being required would be much better.  (Similarly, variable attributes specifying byte order would be useful.  As would be a way to specify data structures exactly.)

The sequence point semantics in C are too strong.  In many cases, the order of operations and their side effects really does not matter, and it would be better if the compiler could choose/optimize them as it sees fit.  A lot of initialization loops fall into this category.  Also, instead of global atomicity, it would be useful to give operations compile-time "tags", and provide primitives for compiler/hardware synchronization within each "tag".  An example of such use is dual counters for opportunistic read-only access for rarely-modified data structures: modifiers increment one before, and the other after, modifying the data.  An opportunistic reader gets the latter counter first, then the data, and then the first counter; the data is reliable only if the counter values match.  The order in which the counter values and their increments are visible is paramount, and is easily b0rked by caching strategies.
In practice, foreach-type order-ignoring loops, and some way to mark entire scopes as "side effect order and sequence points irrelevant", might suffice.
I suspect that turning the entire thing on its head, dropping sequence point order unless explicitly marked, would be better.
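
The dual-counter scheme described above can be sketched with C11 atomics roughly as follows. This is only an illustration under stated assumptions: a single writer, the default sequentially consistent ordering (stronger than strictly needed), and hypothetical names (guarded_pair, pair_write, pair_try_read).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct guarded_pair {
    atomic_uint before;   /* incremented before each modification */
    atomic_uint after;    /* incremented after each modification  */
    atomic_int  x, y;     /* the rarely-modified data             */
};

/* Single writer assumed; several writers would need their own lock. */
static void pair_write(struct guarded_pair *p, int x, int y)
{
    atomic_fetch_add(&p->before, 1u);
    atomic_store(&p->x, x);
    atomic_store(&p->y, y);
    atomic_fetch_add(&p->after, 1u);
}

/* Opportunistic read: returns true only if no modification was in
 * progress while the data was being read. */
static bool pair_try_read(struct guarded_pair *p, int *x, int *y)
{
    unsigned done = atomic_load(&p->after);     /* latter counter first */
    *x = atomic_load(&p->x);
    *y = atomic_load(&p->y);
    unsigned started = atomic_load(&p->before); /* first counter last   */
    return done == started;
}
```

A reader that gets false simply retries. Note that the ordering is requested globally here, which is the very thing the "tag" idea would narrow down.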

The biggest flaw in C is that it does not have a native facility exposing the status flags.  For example, an internal ABI would be much more efficient, if it used a status flag, say carry flag, to indicate failure.  This way, a function could have one return type for success, and completely another for failure cases.  In essence, the status flag would allow a kind of "exception" handling, except in a form native to basically all current hardware that can support C.

Even more interesting would be if the language allowed multiple ABIs in the same project -- it would have to do that on a per-callable basis.  It would help a lot in implementing low-level libraries for various higher-level programming languages.  Static introspection would be useful here too, for example with a new ELF section describing exported callables, so that higher-level languages could call these directly, without any special shims in between; much like ctypes for Python.

As a whole, these changes would neither bring it closer to assembly, nor add higher-level abstractions; it's a step sideways.
Title: Re: C word
Post by: SiliconWizard on September 23, 2019, 03:28:51 pm
Quote
subtype Byte_t is Natural range 0..255;  -- There you have your [0..255] unsigned integer. But that's not all...
-- As is, a Byte_t can't be greater than 255. It won't roll-over. It will raise an error if you attempt to increment one that is 255.

You're saying that by default, Ada won't let you do math on 8bit variables without checking each result for overflow?
(unless your declaration gets more complicated...)

You think the type definition I gave just below this is complicated? :o

It's not a matter of being complicated. It's just what a "range" is about. Basic language stuff. If said variable gets out of its range and that can be caught at compile-time, you'll get a compiler error or warning. If it can't, and that happens at run-time, you'll get an exception. Ranges add constraints, why would you want them to basically get ignored?

Again if you want an 8-bit type that behaves as you probably expect as a low-level programmer, just declare it as "mod 256". Make sure this is what you really want, because in real programs, assigning a value greater than 255 to an 8-bit variable, unless you want it truncated on PURPOSE, is really bad, and C in that respect has only very limited abilities to detect it (assigning from a larger width integer to a smaller one without a cast will get you a warning, provided you enabled the corresponding warning... which is often not the case by default. Now that is ouch, and very commonly seen in C code.)
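
A small C sketch of the point about silent truncation. The function names are hypothetical; the warning option mentioned in the comment is -Wconversion, which GCC and Clang do not enable through -Wall.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Implicit narrowing: this compiles without complaint unless -Wconversion
 * is enabled, and the value is reduced modulo 256. */
static uint8_t narrow_silently(unsigned value)
{
    uint8_t b = value;
    return b;
}

/* The hand-written alternative: returns 0 on success, -1 if the value
 * does not fit into 8 bits (in which case *out is left untouched). */
static int narrow_checked(unsigned value, uint8_t *out)
{
    assert(out != NULL);
    if (value > UINT8_MAX)
        return -1;
    *out = (uint8_t)value;
    return 0;
}
```

Here narrow_silently(300) quietly yields 44, while narrow_checked(300, &b) reports the problem, but only because someone remembered to write and call it.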

And as I stated after, ranges and size are two different things in Ada. Just because you declared a range of 0..255 doesn't mean you'll directly get an 8-bit variable either. You'll just get an integer the acceptable values of which are in this range. The size, I think, would depend on target and compiler decisions by default, unless you give it a clear spec with the "Size" attribute.

Ouch?  (I mean, I guess you define types the way you want them to behave, and then just use those all the time.  But still - ouch!)

Not sure what the ouch is about actually. If you define constrained types, you'll get the constraints that go with them instead of just getting random/"implementation-defined" stuff as often happens in C.
See above once again. If you want an 8-bit type that rolls over by itself, declare it "mod 256".

If OTOH, you'd like specific features, such as having the value "capped" to the minimum or maximum of its range instead of rolling over, you'll have to define a specific type, overloading the base operators (yes Ada is also an OO language.) Really not that complicated either.
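
For readers more at home in C, the "capped" behaviour would have to be spelled out as a helper function, since C has no operator overloading. A minimal sketch, with the hypothetical name add_saturating_u8:

```c
#include <assert.h>
#include <stdint.h>

/* Adds two 8-bit values and caps the result at 255 instead of rolling over. */
static uint8_t add_saturating_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;   /* cannot overflow: at most 510 */
    return (sum > UINT8_MAX) ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}
```

So add_saturating_u8(200, 100) gives 255 rather than 44, at the price of calling a function instead of writing a plain "+".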

(Isn't that exactly what crashed Ariane, too?  An overflow exception in code where no one thought they needed to care?)

I don't quite remember, but any unexpected overflow can lead to catastrophic results. It's kind of weird to think that not taking any action when one occurs would be better than raising an exception.
Of course you have to handle exceptions properly.
Had it not raised an exception, it could have led to bogus results, possibly controlling a key element of the system unexpectedly and make it crash as well.

The problem here was "no one thought they needed to care". Not that an exception was raised, or the language!

Some people may prefer programming in a "I don't know for sure what will happen in some cases, but hey, it compiles".
I remember we had a team joke in a job years ago. One embedded developer, when asked if his code was ready to be integrated, would sometimes say: "it compiles, so it must work". We ended up using this as some kind of meme when we needed a laugh.
 ;D

Of course ranges, and parameters integrity in general, can be enforced in other languages including C. You'll just have to do this manually, which can get tedious (but has also distinct advantages, I talked about parameters/variable checking in another thread...)

As to the merits of exceptions themselves, which are nothing specific to Ada, but are commonly implemented in most high-level languages these days (except C, and if I got it right, Rust?), this is yet another debate, and would actually be an interesting one.
Title: Re: C word
Post by: FreddieChopin on September 23, 2019, 05:05:29 pm
It's often useful to ignore variable range (allow overflow) in order to do fun bit hacks. For example, blinking an LED every three iterations of the main loop (software PWM):
Code: [Select]

static uint8_t blinkCounter = 0x00;

/* This is called on every loop in main() */
void MyApplication::LoopFunc(void)
{
        blinkCounter++;

        if(blinkCounter % 3)
              *port_io1 = 0U;
        else
              *port_io1 = 1U;
}

void main(void)
{
      TRISA = 0x00;
      ANSEL = 0x0000;
      LATA = 0x00;

      MyApplication app = MyApplication(&LATA);
 
      app.SetupFunc();

      while(1U)
      {
             app.LoopFunc();
             
             if(app.ShouldExitLoop())
                     break;
      }
}

Title: Re: C word
Post by: SiliconWizard on September 23, 2019, 07:39:01 pm
It's often useful to ignore variable range (allow overflow) in order to do fun bit hacks. For example, blinking an LED every three iterations of the main loop (software PWM):

In your example, you're actually neither ignoring range, nor exactly "allowing overflow" (which is a vague term, as it could well overflow in any way, and not just roll-over, if you just state that you're "allowing overflow".) And it's not really a "hack" actually...

You've in fact selected uint8_t so that the counter variable could hold values that are at least in the 0 to 2 range (which is guaranteed indeed as uint8_t is defined to be exactly 8-bit). You selected that type for a reason. You didn't ignore its range!

And then, you assumed that it will roll-over on overflow. I'd have to re-read the C standard (or C++ here, as that's what you used) to check that, but I'm still not 100% sure it guarantees that integers actually roll over. It's a perfectly reasonable assumption when one knows that those languages compile pretty close to the metal, and that myriads of programmers have assumed roll-over (including myself), I'm just saying I don't actually KNOW for sure it's even guaranteed! (Now if someone has enough courage to look that up in the standard right now, please have at it. ;D )
(And for the record, yes, some CPUs actually can be configured to do various things on overflow, and not just roll over! So the possibility is physically there.)

I gave enough info above, I think, so that you could write the exact same thing in Ada with as much ease.
Something like:
Code: [Select]
-- Assuming your blink counter is only used in the way you showed:
type BlinkCounter_t is mod 3 with Default_Value => 0;   -- The good thing with default values is that you don't need to actually initialize any variable of this type
...
blinkCounter : BlinkCounter_t ;
...
blinkCounter := blinkCounter  + 1;
if blinkCounter /= 0 then
...
else
...
end if;
...

If you are going to use the counter for something else and still want it to roll over exactly 8 bits, you can replace the above with:

Code: [Select]
type BlinkCounter_t is mod 256 with Default_Value => 0;
...
-- The 'if' is now:
if (blinkCounter mod 3) /= 0 then
....

I don't think it's either hard or much longer to write than the C counterpart, and the good thing is you have absolutely no assumption to make. This would be guaranteed to work on absolutely ANY target.

Title: Re: C word
Post by: boz on September 23, 2019, 09:34:22 pm

Code: [Select]
if(DefCon=5) LaunchNukes(); 

What's not to love about C  :)
Title: Re: C word
Post by: legacy on September 23, 2019, 09:58:06 pm
Why not hate C? Because I seriously *hate* people not using braces after an if-statement

if (expr) { foo(); }

Today I spent something like 3 hours debugging u-boot because of a *stupid* bug: someone had not put braces around the single statement following an "if", and ... when his colleague applied a patch, it added a line in the wrong place  :palm:
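
A minimal sketch of that kind of bug, not the actual u-boot code: a patch adds a second line with the same indentation after an unbraced "if". All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* After the patch, without braces: the indentation suggests that both
 * assignments belong to the "if", but only the first one does. */
static int patched_unbraced(int ready, int *logged)
{
    int started = 0;
    assert(logged != NULL);
    *logged = 0;
    if (ready)
        started = 1;
        *logged = 1;    /* runs even when ready == 0 */
    return started;
}

/* The same patch applied to a braced "if" does what it looks like. */
static int patched_braced(int ready, int *logged)
{
    int started = 0;
    assert(logged != NULL);
    *logged = 0;
    if (ready) {
        started = 1;
        *logged = 1;
    }
    return started;
}
```

Recent GCC versions can flag the first function with -Wmisleading-indentation, which helps, but only when the warning is enabled and actually read.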
Title: Re: C word
Post by: magic on September 23, 2019, 10:11:52 pm
Unsigned overflow works normally in C/C++.

Signed overflow is undefined behavior, meaning the compiler is free to replace your whole code with one drawing an ASCII d*ck on the screen and that's standard-compliant.

They want to mandate two's complement signed numbers in the next version of C++. Maybe then signed overflow will be defined, maybe not.
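
A short sketch of the unsigned case. The C standard defines unsigned arithmetic as wrapping modulo 2^N, so the roll-over of a uint8_t counter is guaranteed rather than assumed. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit counter: 255 + 1 is reduced modulo 256 and becomes 0. */
static uint8_t next_u8(uint8_t counter)
{
    return (uint8_t)(counter + 1u);
}

/* Counter that wraps at 3 instead, like the Ada "mod 3" type. */
static uint8_t next_mod3(uint8_t counter)
{
    return (uint8_t)((counter + 1u) % 3u);
}
```

One side effect worth knowing: since 256 is not a multiple of 3, a test such as counter % 3 on the 8-bit counter gives 0 for both 255 and the following 0, so the every-third-iteration pattern hiccups once per roll-over. The mod-3 counter does not have that problem.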
Title: Re: C word
Post by: SiliconWizard on September 23, 2019, 10:58:00 pm
Why not hate C? Because I seriously *hate* people not using braces after an if-statement

if (expr) { foo(); }

Today I spent something like 3 hours debugging u-boot because of a *stupid* bug: someone had not put braces around the single statement following an "if", and ... when his colleague applied a patch, it added a line in the wrong place  :palm:

That is a common pitfall. Some teams have rules about that.
I personally have a less strict rule in most cases: don't mix braced and non-braced expressions in a single if statement. That looks confusing and can be risky for maintenance. But I don't require braces in all cases.

For instance, I'll be ok with:
if (....)
    ...;
else
    ...;

but not with:
if (...)
{
    ....;
    ........
}
else
    ...;

Of course your rule of always using braces can be defended.

As to applying patches... tell me about it.
I'm severely against applying patches (diffs) blindly to an existing code base. I almost always do it by hand-merging. In team work, that's an opportunity for code reviews and explaining to others why you modified or added this or that. Automatic merging looks nice, but it's an infinite plague.
Title: Re: C word
Post by: westfw on September 24, 2019, 02:00:26 am
Quote
It's just what a "range" is about.
I guess I'm more boggled by defining your single-byte variables as "ranges" in the first place.  Is "for Byte_t'Size use 8;" a modifier, or a replacement for the "range"?
What happens with "subtype Byte_t is integer; for Byte_t'Size use 8;" ?


Quote
Unsigned overflow works normally in C/C++.
Signed overflow is undefined behavior, meaning the compiler is free to replace your whole code with one drawing an ASCII d*ck on the screen and that's standard-compliant.
nonsense.  Signed overflow is a runtime condition, so the compiler can't detect it.  I suppose that it could specifically detect a signed overflow and choose to do something weird, but...



Title: Re: C word
Post by: golden_labels on September 24, 2019, 06:32:02 am
nonsense.  Signed overflow is a runtime condition, so the compiler can't detect it.  I suppose that it could specifically detect a signed overflow and choose to do something weird, but...
I believe it’s pretty obvious that magic made a justified simplification here. But, if you want to be very precise, let me rephrase magic’s statement: signed overflow is undefined behavior, meaning the compiler is free to generate code that, if signed overflow occurs during runtime, will draw an ASCII d*ck on the screen and that's standard-compliant. ;)

What currently is observed, is that compilers generate code that happily calculates an invalid value if a signed overflow occurs. Since very often that value is later converted to unsigned range AND because two’s complement is widely used(1) AND most processors survive signed overflows(2), the final result “magically” matches the expected outcome. But that is a pure coincidence — it is not determined by what is written in the code. :)

____
(1) Therefore, for additive operations, there is no difference between unsigned and signed ranges.
(2) TBH I do not know hardware that doesn’t have well-defined signed overflow condition.
Title: Re: C word
Post by: legacy on September 24, 2019, 11:58:14 am
What currently is observed, is that compilers generate code that happily calculates an invalid value if a signed overflow occurs. Since very often that value is later converted to unsigned range AND because two’s complement is widely used(1) AND most processors survive signed overflows(2), the final result “magically” matches the expected outcome. But that is a pure coincidence — it is not determined by what is written in the code. :)

... in avionics you usually have redundancy and CBITs.

Redundancy is performed by a Voter as a trusted piece of hardware that is assumed to be "intrinsically safe" (yes, it's a hypothesis, but it's verified by severe grades), but this only has a crucial role regarding interrupt response and CPU cycles on the physical bus.

Hence, it monitors how each CPU physically operates on the BUS, measuring the integrity of the CPU from its behavior on the BUS.

But there is another grade to measure integrity, and this is where CBITs play their role.

Continuous built-in tests (CBITs) are scheduled as an interrupt-driven task. When the Voter requires a CBIT by firing an interrupt, each CPU must respond with the correct answer.

The Voter might ask "hey, CPU(i)? are you still alive? yes, because you have just answered my interrupt(i), but are you drunk or ready to rock? let's check out by submitting this simple question: let's calculate the sum of n over i x to the i y to the n minus i for i from nought [= zero] to n from tab1 (you have a copy in rom1), what is the result?"

This way the Voter can check both the ROM integrity and the full integrity of the ALU, and if the CPU cannot handle overflows, it will be embarrassing, because the given answer will be wrong, hence the Voter will think the CPU is drunk (or damaged), therefore it will exclude it from the bus.


The above is usually written in assembly (BPUM) and C(BSP), but in Ada .... well, we have some applications that must be written in a similar way, since they need to check their operative integrity.


Mission-critical means that if you have a system with m grades of redundancy, and n nodes report failure, then (m-n) must be greater than the minimal grade needed to assure the mission can succeed.

Otherwise, the mission is aborted, which usually means you cannot even take off if you are on the ground and the system calculates you need strong Maintenance, or ... even worse, when you are not on the ground, your aircraft is too damaged to go on.

In both cases, you need the software to be able to check overflows in the correct way, and it would be very stupid if a bug made the Voter think the hardware is defective when it's just a software bug.

You might be fired for this, or even worse, you will be forced to have the worst fifteen minutes of your life in Colonel Mustard's office.
Title: Re: C word
Post by: magic on September 24, 2019, 01:35:33 pm
Signed overflow is a runtime condition, so the compiler can't detect it.  I suppose that it could specifically detect a signed overflow and choose to do something weird, but...
Sometimes it can be predicted in advance and the compiler may choose to do very weird things indeed.

Real bugs have been found in the wild when compilers decided that certain variable having certain value leads to UB, therefore the variable will never have that value in any valid run of the program, therefore some code can be modified by assuming the variable must have different values, because it doesn't affect any valid run of the program. Then of course somebody calls the code with a wrong argument and a complete clusterfuck unfolds, in addition to the UB, which may or may not even be a real problem by itself.
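
A typical instance, sketched under the assumption of an optimizing compiler that exploits signed-overflow UB: a check that relies on wrap-around may be removed, because the compiler is allowed to assume the overflow never happens. The safe variant tests before adding. The name add_checked is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Fragile pattern, shown only as a comment:
 *
 *     if (a + b < a) { ...handle overflow... }    (for b > 0)
 *
 * For signed int the overflow itself is undefined, so an optimizer may
 * treat the condition as always false and drop the whole branch. */

/* Safe pattern: check the operands first, so no overflow ever occurs.
 * Returns false (and leaves *sum untouched) if a + b would not fit. */
static bool add_checked(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

The difference is that the second version never executes the undefined operation, so there is nothing for the compiler to reason away.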
Title: Re: C word
Post by: legacy on September 24, 2019, 02:05:05 pm
And today I am fixing this

Quote
In most cases we improve GCC to exploit well defined behaviors of the standard. In this case we created defined __builtin_constant_p with insufficient documentation to allow a user to reasonably predict the surprising behavior shown in this testcase.

GCC has created a path which will never be executed and used that to introduce a constant which does not exist in the source. Unless you know what jump-threading can do, this transformation isn't obvious.


gcc-7 has an "optimization" pass that completely screws up, and generates the code expansion for the (impossible) case of calling ilog2() with a zero constant, even when the code gcc compiles does not actually have a zero constant.

And we try to generate a compile-time error for anybody doing ilog2() on a constant where that doesn't make sense (be it zero or negative). So now gcc7 will fail the build due to our sanity checking, because it created that constant-zero case that didn't actually exist in the source code.

There's a whole long discussion on the kernel mailing about how to work around this gcc bug. The gcc people themselves have discussed their "feature" in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785

but it's all water under the bridge, because while it looked at one point like it would be solved by the time gcc7 was released, that was not to be.

So now we have to deal with this compiler braindamage.

And the only simple approach seems to be to just delete the code that tries to warn about bad uses of ilog2().

So now "ilog2()" will just return 0 not just for the value 1, but for any non-positive value too.

It's not like I can recall anybody having ever actually tried to use this function on any invalid value, but maybe the sanity check just meant that such code never made it out in public.


Customers have just updated their gcc compiler to v7 and a couple of their 4.9 kernels need to be ... patched  :scared:
Title: Re: C word
Post by: Nominal Animal on September 24, 2019, 02:36:30 pm
The reason C has Undefined Behaviour is itself an interesting tale; a relic from times when the compiler makers had no idea what kind of new hardware there would be, and they wanted to ensure they could support it as well.

Arithmetic rules are one of the "facilities" I'd like to control at compile time on an expression by expression basis.  I want to be able to choose between modular arithmetic with two's complement for negative values and bounded/signaling ranges.  The hardware can do this just fine (although in most cases the former is the "native" approach with zero overhead, and the latter involves generating explicit checks); it is just that C does not have a facility to express those.

If you look at e.g. C99 fenv() (http://man7.org/linux/man-pages/man3/fenv.3.html), the current approach is to treat everything "mathematical" as a thread/task property; this is an unnecessary abstraction that does not help in writing robust code.

(The reason I do not see types as a solution, is because that would force every expression combining variables from more than one type to use temporary variables or explicit conversions/casts.  IMNSHO, the arithmetic "rules" are not a type property, but a property of each expression.)

I've mentioned this before, but seeing what happened with C11, I have completely lost any faith in the direction the C standards committee and even GCC developers are pushing the language.  A large number of compiler developers are even pushing for C to be "merged" with C++ (perhaps not in a literal sense, but definitely in the sense of defining C concepts and approaches in terms of C++), which is kind of odd, but completely understandable when you realize that it is possible to consider C a subset of C++, even though it is factually incorrect.

The only true pushback against inanities (as in, "but the standard says it is UB so we CAN do nasal daemons", when asked to make the compiler do the only sane thing in a particular case instead of shitting itself) in GCC I have seen has come from Linux kernel developers.  I got so fed up, I haven't submitted GCC bug fixes since 2015 or so.
Title: Re: C word
Post by: magic on September 24, 2019, 02:40:20 pm
What happened in C11?

I thought it only added a bunch of libraries (based on new C++11 libs, indeed).
Title: Re: C word
Post by: Nominal Animal on September 24, 2019, 03:40:29 pm
What happened in C11?
For starters, C++ stuff included, Microsoft-only (optional) "safe" I/O variants added, but not even getline() from POSIX.1.  The committee is now stuffed with MS stooges, whose main focus is making sure MS can keep treating C as a subset of C++ and does not have to change their product.  The C language itself is not making any progress, unless you consider pushing for C to become a strict subset of C++ progress.

See, I've helped C programmers for years, and know how useful it would be for learners to use a subset of POSIX.1 in the learning process.  The key things are getline() (http://man7.org/linux/man-pages/man3/getline.3.html) (unlimited line lengths), nftw() (http://man7.org/linux/man-pages/man3/nftw.3.html) (directory tree exploration), fnmatch() (http://man7.org/linux/man-pages/man3/fnmatch.3.html) and glob() (http://man7.org/linux/man-pages/man3/glob.3.html) (file name matching and globbing), regcomp()/regexec() (http://man7.org/linux/man-pages/man3/regex.3.html) (regular expressions for text matching), fwide() (http://man7.org/linux/man-pages/man3/fwide.3.html)/fgetws() (http://man7.org/linux/man-pages/man3/fgetws.3.html) and so on (for wide input; even POSIX has dropped the ball on getwline()).  All of these are trivially supported across all currently-used operating systems; it's just that there is one vendor who does not want to, with enough chairs in the C standard committee, stopping any such efforts.  Yet, I don't want to be the asshole who shows them that the desktop OS they so dearly love is actually a well-designed trap to keep them tied to a single OS vendor.
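
As an illustration of the "unlimited line lengths" point, a minimal sketch using getline(), assuming a POSIX.1-2008 system. The helper name longest_line() is hypothetical; the only library calls are getline() and free().

```c
#define _POSIX_C_SOURCE 200809L  /* request the POSIX.1-2008 declarations */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return the length (not counting the newline) of the longest line in `in`.
   getline() grows the buffer as needed, so no fixed line limit applies. */
size_t longest_line(FILE *in)
{
    char   *line = NULL;   /* allocated and reallocated by getline() */
    size_t  size = 0;      /* current capacity of `line` */
    size_t  best = 0;
    ssize_t len;

    while ((len = getline(&line, &size, in)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            len--;                      /* ignore the trailing newline */
        if ((size_t)len > best)
            best = (size_t)len;
    }
    free(line);                         /* the caller of getline() owns the buffer */
    return best;
}
```

With fgets() the same loop needs a fixed buffer plus extra logic to stitch together lines that do not fit in it.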

Unless something changes drastically about the C standard committee, the C language itself is fucked: it is destined to become a strict subset of C++ because a single vendor finds it suits their product better, and has spent enough money to stuff the committee.
Title: Re: C word
Post by: SiliconWizard on September 24, 2019, 03:58:03 pm
I guess I'm more boggled by defining your single-byte variables as "ranges" in the first place.  Is "for Byte_t'Size use 8;" a modifier, or a replacement for the "range"?

Maybe you are too used to very low-level stuff?
Value ranges and storage size are two different things. This is a place where C can be completely confusing. (And again, of course you can implement/enforce ranges manually, but that's not part of the language itself.) By default, implementations assume that the storage size implies value range. We're used to that, but Ada gives you more.

The 'Size' attribute (Ada has attributes, as some other languages do) defines storage size in bits. It's called an attribute, but I guess you can see that as a "modifier". It's absolutely no replacement for the range. A range makes a "subtype", which inherits its parent's attributes (for instance, integer or natural here).

What happens with "subtype Byte_t is integer; for Byte_t'Size use 8;"?

That's an interesting question, which shows you have grasped the idea!
(Just a side note, your declaration wouldn't work, you can't declare a subtype as a base type (integer here), but that's just a technical Ada thing. I'll assume you declared a type that requires more bits than what you then defined in the Size attribute.)

The above won't work on most platforms (compilers will reject that). I'd have to brush up on the standard, but I think "integer" is implementation-defined, a bit like "int" in C. And as with C, in MOST cases, "integer" is wider than 8 bits, with a correspondingly larger range. (I don't think you'll ever run into a platform for which an Ada compiler will define "integer" as an 8-bit word!)

Thus, your declaration would be contradictory, and not allowed. GCC, for instance, would report: size for "Byte_t" too small, minimum allowed is xxx (xxx depending on how many bits are required for the integer type).

You can't define types with sizes too small to hold their ranges, but the other way around is perfectly possible.
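
For comparison, a rough C sketch of what "enforcing ranges manually" looks like. The type percent_t and the function percent_set() are made up for this example: the value range is 0..100, the storage is 8 bits, and the language only knows about the latter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "percent" type: range 0..100, stored in 8 bits.
   C tracks only the storage size; the range is a convention that the
   accessor below has to enforce by hand. */
typedef struct {
    uint8_t value;
} percent_t;

/* Store v if it lies within 0..100; otherwise leave *p untouched. */
bool percent_set(percent_t *p, int v)
{
    if (v < 0 || v > 100)
        return false;
    p->value = (uint8_t)v;
    return true;
}
```

Nothing stops other code from writing 200 into p->value directly, which is the gap the Ada range declaration closes.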

Quote
Unsigned overflow works normally in C/C++.
Signed overflow is undefined behavior, meaning the compiler is free to replace your whole code with one drawing an ASCII d*ck on the screen and that's standard-compliant.
nonsense.  Signed overflow is a runtime condition, so the compiler can't detect it.  I suppose that it could specifically detect a signed overflow and choose to do something weird, but...

As I said earlier, I actually was not sure what the standard had to say about integer overflow (signed or unsigned for that matter), and wasn't feeling like looking that up.
But, I just did. And this is what C99 says:
Quote
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

So, it defines the "roll-over" behavior we talked about for unsigned indeed. As it doesn't actually say anything about signed operands, we can conclude that it's by definition an undefined behavior indeed.
The compiler can't detect that an operation will overflow (unless its static analysis features are good enough and it can be deduced from static analysis only), but it can trivially detect that you are using signed integers in an operation that *could* overflow (such as + or -), and generate appropriate code to ensure a specific overflow behavior if overflow happens.
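
A small sketch of both halves of that, with made-up function names. next_unsigned() relies on the modulo reduction quoted above; add_saturating() shows one possible "specific overflow behavior" for signed values (clamping to the limits), written so that the signed addition is only performed when it cannot overflow.

```c
#include <limits.h>

/* Unsigned arithmetic is reduced modulo UINT_MAX + 1, so this is always
   well defined: passing UINT_MAX yields 0. */
unsigned next_unsigned(unsigned x)
{
    return x + 1u;
}

/* Signed addition that saturates at INT_MAX / INT_MIN instead of
   overflowing. The range checks use only operations that cannot overflow. */
int add_saturating(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;
    return a + b;
}
```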
Title: Re: C word
Post by: golden_labels on September 24, 2019, 06:58:09 pm
For starters, C++ stuff included, Microsoft-only (optional) "safe" I/O variants added, but not even getline() from POSIX.1.  The committee is now stuffed with MS stooges, whose main focus is making sure MS can keep treating C as a subset of C++ and does not have to change their product.  The C language itself is not making any progress, unless you consider pushing for C to become a strict subset of C++ progress.
Microsoft is not even supporting C anymore. They are about 20 years late, so they would have to put a lot of effort now to support even the current version of the language. So how can they want to keep anything, if there is nothing to keep? While Microsoft has introduced the _s versions of some functions, it is not even supporting them in full (see K.3.7.4.3 as an example). And the idea itself is not Microsoft’s: avoiding returning newly allocated resources was a normal part of best practices before C11 was released. I don’t like Microsoft myself, but you can’t blame them for everything.

See, I've helped C programmers for years, and know how useful it would be for learners to use a subset of POSIX.1 in the learning process.  The key things are getline() (http://man7.org/linux/man-pages/man3/getline.3.html) (unlimited line lengths),
This would go against the minimalism principle of C. It would essentially be a more convenient fgets. While the _s functions are also duplicates, they are not a part of C core specification and they bring significant gains by offering the proper syntax, which should’ve been used from the beginning.

nftw() (http://man7.org/linux/man-pages/man3/nftw.3.html) (directory tree exploration), fnmatch() (http://man7.org/linux/man-pages/man3/fnmatch.3.html) and glob() (http://man7.org/linux/man-pages/man3/glob.3.html) (file name matching and globbing),
That would require including a completely new concept to C: that of a file system and directory hierarchy. And that would be a very specific definition of it, which may become outdated very fast. Also see the next paragraph…

regcomp()/regexec() (http://man7.org/linux/man-pages/man3/regex.3.html) (regular expressions for text matching),
Writing a proper regex engine requires a lot of effort. Much more than the rest of the C library. For what? A few rare use cases, for which you would use a proper, 3rd party library anyway? That also applies to the filesystem-related suggestions. What would be the use of those in a C program? Those belong to the business logic, not to interfacing with the system or writing firmware for a microcontroller, and they have limited use even in general. Do not confuse your specific domain with the whole world.

Unless something changes drastically about the C standard committee, the C language itself is fucked: it is destined to become a strict subset of C++ because a single vendor finds it suits their product better, and has spent enough money to stuff the committee.
If it is, it’s because the language is low-level, but in fact it doesn’t correspond to the real hardware in 2019. It’s still PDP-11, relaxed a bit to allow different platforms. But there is no competition.
Title: Re: C word
Post by: hamster_nz on September 24, 2019, 10:10:50 pm
...it’s because the language is low-level, but in fact it doesn’t correspond to the real hardware in 2019. It’s still PDP-11, relaxed a bit to allow different platforms.

All programming languages and runtime environments (and you could also argue even ISAs, especially if they are microcoded) don't correspond to real hardware, but are pretty much abstracted virtual machines, with different levels of details hidden away from the programmer. The original PDP-11/20 was a 16-bit machine without any MMU, without hardware floating point, without H/W multiply, and with 32Kbyte of memory - almost below today's lowest-end micros.

Are you indirectly saying that C isn't suited to the following use cases:

- CPUs that use twos complement binary?
- CPUs with hardware multiply or divide?
- CPUs with 32-bit register machines?
- CPUs with 64-bit register machines?
- CPUs with an address space > 18-bits?
- CPUs with floating point H/W?
- CPUs that don't have a dedicated stack pointer (like most RISC designs)?
- CPUs with support for paging/virtual memory?
- Storage that can go > 1,000 MB/s (I got an NVMe disk in my Intel NUC!)
- CPUs with vector instructions? (well, you may have a point there)
- Multi-threaded/multi-core CPUs? (well, this is true too, I guess...)

Which bits of the PDP-11 is C boat-anchored onto?
Title: Re: C word
Post by: Nominal Animal on September 24, 2019, 10:15:47 pm
That also applies to the filesystem-related suggestions. What would be the use of those in a C program? Those belong to the business logic, not interfacing system or writing firmware for a microcontroller, and they have limited use even in general. Do not confuse your specific domain with the whole world.
You're the confused one: you don't use a hosted environment (with the standard library available) for microcontrollers or kernels and such anyway.  The standard C library is only really useful for portable applications.  Adding "safe" optional variants instead of making them actually useful is idiocy.

Note that I personally am quite happy with C99 and POSIX.1-2008.  Not all of C11 is bad, but it just isn't good enough in practice for one to use or rely on.  I'm just saying that seeing where the C standards committee went with C11, indicates there is no hope C will actually become any more useful to developers like myself.

In practice, the C implementations have already split the standard library, as the C compiler provides some of the functions instead of the C library.  (For GCC, see here (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html).)  Some function interfaces, especially the format strings of the printf() and scanf() families of functions, must be known to the compiler for it to check whether the arguments make any sense; yet, they are only defined in terms of a hosted environment.  None of these practical changes/developments have been acknowledged or considered by the C standards committee, which means they really aren't interested in how the C language is actually used, and what kind of changes or additions would make it even more useful.  They're just a circle jerk discussing what kind of changes make sense for their compiler products.
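
To show how much of the format-string knowledge lives in the compiler rather than the library, here is a sketch using the GCC/Clang format attribute, which is a compiler extension and not part of the C standard. The macro PRINTF_LIKE and the function log_format() are made up for this example.

```c
#include <stdarg.h>
#include <stdio.h>

/* GCC and Clang understand this attribute; other compilers get an empty macro. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt, first) __attribute__((format(printf, fmt, first)))
#else
#define PRINTF_LIKE(fmt, first)
#endif

/* Hypothetical helper: writes "log: " followed by the formatted message
   into buf. Argument 3 is the format string, the variadic arguments start
   at position 4. */
int log_format(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

/* Returns the number of characters written, or -1 if the result did not fit. */
int log_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int     prefix, body;

    prefix = snprintf(buf, size, "log: ");
    if (prefix < 0 || (size_t)prefix >= size)
        return -1;

    va_start(args, fmt);
    body = vsnprintf(buf + prefix, size - (size_t)prefix, fmt, args);
    va_end(args);

    if (body < 0 || (size_t)body >= size - (size_t)prefix)
        return -1;
    return prefix + body;
}
```

With the attribute in place, a call such as log_format(buf, sizeof buf, "%s", 42) should draw a format warning from GCC when -Wall is enabled, exactly as it would for printf() itself.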
Title: Re: C word
Post by: PlainName on September 24, 2019, 10:41:06 pm
Quote
only defined in terms of a hosted environment

What do you mean by a hosted environment?
Title: Re: C word
Post by: westfw on September 25, 2019, 01:09:07 am
Quote
The 'Size' attribute (Ada has attributes, as some other languages do)

yeah, I looked up "ada attributes" and got really confused, since they seem to merge what I'd normally consider constant definitions, attributes, and methods.

Code:
for Byte_t'Size use 8;   -- clearly like a C attribute; makes the type use 8 bits.
val := Variable'Size;    -- behaves more like sizeof (same attribute, different function!)
val := Variable'Last;    -- sort of like MAXINT, if variable is an int (sorta nice, actually)
str := Variable'image;   -- definitely method-like (doesn't work for user-defined types?)

Sigh.  And I guess this is the sort of stuff that never actually shows up in most "beginning" tutorials.  One of those things that you either need to read a lot of code, write a lot of code that is ruthlessly peer-reviewed, or work somewhere where "common usage" is well-defined (which is a combination of "read a lot"+"write a lot", really.)
Title: Re: C word
Post by: golden_labels on September 25, 2019, 01:31:39 am
You're the confused one: you don't use a hosted environment (with the standard library available) for microcontrollers or kernels and such anyway.  The standard C library is only really useful for portable applications.
C has little use outside kernels, microcontrollers, implementing other programming environments and writing some snippets to optimize fragments that do not perform well enough. Sure, I know there are many people who still try, but that doesn’t mean it has any sense. C is not suitable for that purpose in 2019.

In practice, the C implementations have already split the standard library, as the C compiler provides some of the functions instead of the C library.  (…)
But that is not required by the language. Those are implementation details.

Are you indirectly saying that C isn't suited to the following use cases: (…)
I have quite directly said “It’s still PDP-11, relaxed a bit to allow different platforms.”

Which bits of the PDP-11 is C boat-anchored onto?
____
(1) C doesn’t prohibit different models and explicitly defines pointers in a way that makes them possible, but provides no actual means of ever using that directly.
Title: Re: C word
Post by: legacy on September 25, 2019, 06:13:38 am
Poor support for multithreaded programs, multiprocessor machines and interrupts/signals.

Yup. C-threads cannot be a library, hence programming in C on machines like Parallela is ... well, if it's a hobby, there are better ways to get a headache  :D
Title: Re: C word
Post by: FreddieChopin on September 25, 2019, 06:31:27 am
You're the confused one: you don't use a hosted environment (with the standard library available) for microcontrollers or kernels and such anyway.  The standard C library is only really useful for portable applications.
C has little use outside kernels, microcontrollers, implementing other programming environments and writing some snippets to optimize fragments that do not perform well enough. Sure, I know there are many people who still try, but that doesn’t mean it has any sense. C is not suitable for that purpose in 2019.

Gr8 b8 m8. C is like catholicism or SQL - has its flaws but gets the job done. It existed before us, it exists now and it'll be fine after we're long gone. I can assure you that in 30 years from now Rust, JS or Go may be gone or at least change so much that current code won't even compile.
Title: Re: C word
Post by: legacy on September 25, 2019, 06:40:35 am
No way to control timing and nearly no control over ordering of operations.

On superscalar machines like PowerPC, this is usually managed with assembly-inline instructions { eieio, sync, isync, ... }. But you need to manually create barriers here and there, and in the end, the resulting code looks like you have spent your precious time at making things ugly and not portable.
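
For ordering between threads, C11 added <stdatomic.h>, which lets the compiler choose the barrier instructions for the target. The following is a minimal sketch of the usual publish/consume pattern with explicit fences; the names payload, ready, publish() and try_consume() are made up. As far as I can tell this only covers the memory model between threads, so device I/O ordering of the kind eieio is used for would still need platform-specific code.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int         payload;  /* plain data, written before it is published */
static atomic_bool ready;    /* publication flag; static storage starts as false */

/* Producer: write the data, then order that write before the flag store. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer: returns true and copies the data once it has been published.
   The acquire fence orders the flag load before the read of payload. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```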

But it's not so negative, since C-guys call it "black voodoo magic", with neither a precise science nor a code style. It's hipster and pop, but they love it because everyone can express their own style.

What is your proposal for supporting this directly by the language?  :D
Title: Re: C word
Post by: magic on September 25, 2019, 08:34:14 am
The whole point of C is that it is small, it is and it works.

I can't imagine who in his right mind would be interested in "evolving" C. If you need a bigger language which routinely receives every new feature invented in the industry, such a language already exists and it will even compile 99% of your existing C codebase with minor modification. For essentially the same language as C but with semantics more suitable for SIMD, see OpenCL.

Shared memory multithreading is indeed one area where C kinda sucks and could use improvement without turning it into something it was never meant to be. And it is slowly receiving improvements in that.

Pushing POSIX into C doesn't make sense, there is already a standard for POSIX and any half-decent OS today supports POSIX. If you have an issue with one particular vendor refusing to support POSIX, talk to your vendor :P
Or use a 3rd party implementation like the Cygwin libraries.
Title: Re: C word
Post by: hamster_nz on September 25, 2019, 09:06:06 am
Shared memory multithreading is indeed one area where C kinda sucks and could use improvement without turning it into something it was never meant to be. And it is slowly receiving improvements in that.

I would rephrase that... as "Shared memory / Multithreading is indeed one area in C where you really need to know what you are doing, and do good design work up front before you start coding."

However, if you want things to be completely safe, where all resources are properly fenced, the garbage is collected regularly, and everything is properly maintained by the system you will pay for it by needing lots of CPU and other overheads, and reduced performance.

A few months ago I wrote a REST API & server in less than 300 lines of Python, using WSGIserver - "WSGIserver is a high-speed, production ready, thread pooled, generic WSGI server with SSL support" - https://pypi.org/project/WSGIserver/.

It took 0.7s to process a simple transaction, so could only support 1.4 transactions per second, and was in a 600MB+ Docker container.

Now that the API design has settled down I re-wrote it in C with the same external interfaces (HTTPS, SQLite3, a SMB file share). It took 3,000 lines of C, but now can handle 120 transactions per second.  The new Docker container is a little over 100MB.

I guess I could have just stood up 25 quad-core instances in Azure to achieve the same scaling without any extra coding effort...
Title: Re: C word
Post by: Nominal Animal on September 25, 2019, 12:16:15 pm
What do you mean by a hosted environment?
I'm using the term as defined by the C standard.  See e.g. C99 (PDF) (http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf).

Basically, when you write code to run on a machine with an operating system that provides an implementation of the standard C library, you are using a hosted environment.  When you write code to run on bare metal, without the standard C library, you are using a freestanding environment.
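
The implementation tells you which one you have through the standard __STDC_HOSTED__ macro (1 for hosted, 0 for freestanding). A small sketch, with made-up function names is_hosted() and report():

```c
#include <stddef.h>   /* available in both hosted and freestanding implementations */

#if __STDC_HOSTED__
#include <stdio.h>    /* the full standard library is only guaranteed when hosted */
#endif

/* Returns 1 when compiled by a hosted implementation, 0 when freestanding. */
int is_hosted(void)
{
    return __STDC_HOSTED__;
}

/* Hypothetical reporting helper: prints the message on hosted builds and
   does nothing visible on freestanding ones. Returns the message length. */
size_t report(const char *msg)
{
    size_t len = 0;

    while (msg[len] != '\0')
        len++;              /* strlen() is not guaranteed when freestanding */
#if __STDC_HOSTED__
    fputs(msg, stdout);
#endif
    return len;
}
```

With GCC, compiling with -ffreestanding sets __STDC_HOSTED__ to 0.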

C has little use outside kernels, microcontrollers, implementing other programming environments and writing some snippets to optimize fragments that do not perform well enough. Sure, I know there are many people who still try, but that doesn’t mean it has any sense. C is not suitable for that purpose in 2019.
Your opinion does not match reality.

Most libraries are written in C, because that way the set of dependencies is minimized.  Some do use C++, but that poses several difficult to solve issues, because of the memory management differences, exception propagation, and the run-time dependency on the particular version of C++ libraries.

As a practical example, all libraries you use to read/write image files, XML, access USB devices, et cetera, are most likely written in C.  (All the portable ones are written in C, but there are of course libraries that only work on a single hardware architecture on a single version of a specific OS.)

One of my suggestions for creating portable GUI applications is to use Python 3 and Qt for the user interface (which does not contain any features that need protecting, because any such features are trivially replicated even without the source code), and C for the underlying computation and access.  This is the same model as used in e.g. NumPy (https://numpy.org/) and SciPy (https://www.scipy.org/) libraries/library collections.

In practice, the C implementations have already split the standard library, as the C compiler provides some of the functions instead of the C library.  (…)
But that is not required by the language. Those are implementation details.
That is exactly the simplistic view that kills the development of a programming language.

You see, programming languages do not evolve by the standard; the standard just codifies existing portable practices.  If the standard diverges from existing practices, it becomes unreliable; if the standard ignores changes in practices, it becomes stagnant.  Programmers read the standard to see what to expect from compilers, and compiler developers read the standard to understand what features the compiler should support.  The features I listed above are already available in all POSIXy operating systems' standard C libraries (Linux, Mac, BSDs), and as optional libraries in Windows and historical non-POSIXy Unixes.
Even including just getline() into the standard C library would solve an entire class of problems (line length limitations) for both new programmers and library implementors.

Currently, the best portable standard set is C99 with POSIX.1-2008.  The only major OS that does not support it natively by their C library (POSIX.1 is not a separate library, but a set of additional interfaces and functions provided by the standard C libraries the OS provides), is Windows; there, you need to use shims and whatnot.  It does not matter much, because only trivial code that runs on Windows runs on other OSes.

It mostly affects those learning C, because if they use Windows, they have a hard time separating what is C, and what are Microsoft extensions/weirdnesses.  (For example, fwide() does not work on Windows; you need to use a Microsoft-ism to do the same thing.  Why didn't MS just include their extension weirdness inside fwide() and make it work, is a mystery.)

So, when you train C programmers, you either use an OS other than Windows, or you spend an inordinate time explaining to them why the documentation they follow (from MS) is wrong, and why they need to go through all sorts of odd hoops to get their code working properly on Windows.  Or, you let them become Microsoft C programmers, who cannot write portable code.

The whole point of C is that it is small, it is and it works.
You are conflating the C standard library (hosted environment), and the C language itself (as used in a freestanding environment).

The C language itself is small, but lacks support for important features (the status flags register in particular).

What I am suggesting, is divided into two completely separate paths.  One is to include the POSIX.1 features that are already supported by all OSes in the standard C library proper, although in Windows via different interfaces.  This excludes things like signals, but includes things like getline().  Exactly what things to include should obviously be discussed; but since C99, it has not been discussed at all.  The other is to add features to the language itself.  Unlike changes to the C library, changing the language syntax itself creates a new language.  So, adding things like unordered loops or variable attributes, creates a new programming language, a C derivative.

These two paths are not mutually exclusive.  Updating the C standard library would not affect existing C code, but would allow future C code to be both portable, and effective without additional dependencies.  (Existing de-facto standards like opendir()/readdir()/closedir() have fatal flaws; in particular, the depth of a traversal built on them is limited by the number of DIR* handles a process can have open at any moment, and it is very difficult to make it behave correctly when the directory contents change while the directory is being scanned.  Having the C library implement glob() and nftw() allows OS-specific tricks to avoid both of these issues.  Instead of nftw(), one could also consider fts_open()/fts_read()/fts_children()/fts_set()/fts_close() from BSDs.)
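
For reference, a minimal nftw() sketch, assuming a POSIX system with the XSI option. The names count_entry() and count_regular_files() are made up. The third argument to nftw() caps how many directory descriptors it keeps open at once, so the tree depth is not bounded by it.

```c
#define _XOPEN_SOURCE 700   /* nftw() is an XSI interface */
#include <ftw.h>
#include <sys/stat.h>

/* nftw() has no user-data parameter, so the callback reports through a
   file-scope counter. This makes the sketch non-reentrant. */
static long regular_files;

static int count_entry(const char *path, const struct stat *sb,
                       int typeflag, struct FTW *ftwbuf)
{
    (void)path; (void)sb; (void)ftwbuf;
    if (typeflag == FTW_F)
        regular_files++;
    return 0;               /* a nonzero return would stop the walk */
}

/* Count regular files below `root`; returns -1 if the walk fails. */
long count_regular_files(const char *root)
{
    regular_files = 0;
    /* 16: maximum number of descriptors nftw() may hold open at once.
       FTW_PHYS: do not follow symbolic links. */
    if (nftw(root, count_entry, 16, FTW_PHYS) == -1)
        return -1;
    return regular_files;
}
```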

Adding features to the C standard library does not affect C freestanding environments at all (say, writing a kernel, or code for a microcontroller).  Because these features already exist in standard libraries (albeit with different function names and calling conventions in Windows), they would not "bloat" the standard C library either: this stuff already exists in them, I'd just want to standardize their interfaces, so that C code would be truly portable, and not need a mess of preprocessor directives to cater for the oddities in one.

Adding features to the C language itself is a much longer-term proposition; we'd be talking about one or two decades instead of years for its adoption, due to the sheer amount of existing C code.  For me, it is more like an interesting research topic: what kind of language would match our uses of the existing hardware even better than C, while being easier to optimize, and not require any kind of runtime?
Title: Re: C word
Post by: magic on September 25, 2019, 03:47:27 pm
Shared memory multithreading is indeed one area where C kinda sucks and could use improvement without turning it into something it was never meant to be. And it is slowly receiving improvements in that.

I would rephrase that... as "Shared memory / Multithreading is indeed one area in C where you really need to know what you are doing, and do good design work up front before you start coding."

However, if you want things to be completely safe, where all resources are properly fenced, the garbage is collected regularly, and everything is properly maintained by the system you will pay for it by needing lots of CPU and other overheads, and reduced performance.
It's not just that, but also some standard library functions which aren't thread-safe for stupid reasons or lack of portable atomics and memory barriers until recently. And of course fanboys of Rust or ATS would want a word with you too ;)

Title: Re: C word
Post by: SiliconWizard on September 25, 2019, 06:15:13 pm
Shared memory multithreading is indeed one area where C kinda sucks and could use improvement without turning it into something it was never meant to be. And it is slowly receiving improvements in that.

I would rephrase that... as "Shared memory / Multithreading is indeed one area in C where you really need to know what you are doing, and do good design work up front before you start coding."

However, if you want things to be completely safe, where all resources are properly fenced, the garbage is collected regularly, and everything is properly maintained by the system you will pay for it by needing lots of CPU and other overheads, and reduced performance.
It's not just that, but also some standard library functions which aren't thread-safe for stupid reasons or lack of portable atomics and memory barriers until recently. And of course fanboys of Rust or ATS would want a word with you too ;)

Do you mean that they wouldn't agree about the required overhead? That would just be stupid if so. Guarding shared resources requires overhead whatever you do.
One way of circumventing this would be never to use shared resources (such as only passing messages), but that incurs other limitations and other forms of overhead. Nothing is free.
Title: Re: C word
Post by: PlainName on September 25, 2019, 07:14:39 pm
Quote
Basically, when you write code to run on a machine with an operating system that provides an implementation of the standard C library, you are using a hosted environment.  When you write code to run on bare metal, without the standard C library, you are using a freestanding environment.

OK. Just that I have written code for bare metal that has the usual library functions supplied by the compiler vendor that would class it as hosted, so I'm not sure the differentiation in this discussion is useful.
Title: Re: C word
Post by: Nominal Animal on September 25, 2019, 08:23:20 pm
Quote
Basically, when you write code to run on a machine with an operating system that provides an implementation of the standard C library, you are using a hosted environment.  When you write code to run on bare metal, without the standard C library, you are using a freestanding environment.
OK. Just that I have written code for bare metal that has the usual library functions supplied by the compiler vendor that would class it as hosted
No, that is more properly called the HAL, hardware abstraction layer.  Usually, vendors also provide a subset of useful functions, especially things like snprintf(), but that does not make those environments hosted; they just provide a subset of the standard C library.

This is strikingly similar to using C++ on microcontrollers: you are restricted to a subset, typically excluding exceptions and other features requiring a runtime, because there simply aren't enough resources to provide a standards-conformant C++ environment.

I'm not sure the differentiation in this discussion is useful.
It matters a lot, but not in the obvious sense.

You see, because of lack of standardization, each microcontroller vendor (and environments like Arduino or PlatformIO) provides their own subset.  While this gives the vendors a lot of leeway, it makes it difficult to port C code between vendors.  They like it that way, just like Microsoft prefers that programmers who learned C on their platform, will find all other OSes odd or uncomfortable, due to the differences.  This is standard vendor lock-in practice.

Because of the stuffing of the C standard committee, and opinions like the ones golden_labels expressed (things being just "implementation details"), there is no way forward that helps developers (write portable code, and not locked-in to any specific vendor).  This affects both hosted and freestanding environments, but from the opposite ends of the same issue.

If we added features to the standard C library, from POSIX.1 or BSD, making it a complete systems programming language, low-level libraries would be much easier to port across OSes, and in many cases more efficient than they currently are.

At the same time, if we split the concept of "standard C library" into discoverable features, including features not useful in a hosted environment, we could unify the low-level interfaces used in C on top of bare metal.  No changes to the language are needed, as existing software uses the preprocessor to discover features (see e.g. Pre-Defined Compiler Macros (https://sourceforge.net/p/predef/wiki/Home/)); all we need is just agreement on the developer interfaces.
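
A rough sketch of what such preprocessor-based discovery looks like today. The HAVE_GETLINE and HAVE_C11_ATOMICS macros and the function line_reader_name() are made up for this example; _POSIX_VERSION, __STDC_VERSION__ and __STDC_NO_ATOMICS__ are the existing standard macros being tested.

```c
#include <stddef.h>

/* <unistd.h> defines _POSIX_VERSION on POSIX systems; it is only included
   where it can be expected to exist. */
#if defined(__unix__) || defined(__APPLE__)
#include <unistd.h>
#endif

/* Hypothetical feature macro: 1 where POSIX.1-2008 getline() can be expected. */
#if defined(_POSIX_VERSION) && _POSIX_VERSION >= 200809L
#define HAVE_GETLINE 1
#else
#define HAVE_GETLINE 0
#endif

/* C11 lets an implementation opt out of atomics via __STDC_NO_ATOMICS__. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#define HAVE_C11_ATOMICS 1
#else
#define HAVE_C11_ATOMICS 0
#endif

/* Name of the line-reading function this build would use. */
const char *line_reader_name(void)
{
    return HAVE_GETLINE ? "getline" : "fgets";
}
```

Every project currently grows its own version of this block, which is the duplication an agreed set of interfaces would remove.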

All this is exactly what should be discussed by the C standard folks.  They have zero interest in doing so, because they are only interested in how the standard affects the C compilers they provide.

That said, I do believe that in the long run, we will need to replace C with something that occupies the same position in the software stack development hierarchy, but better utilizes current and future hardware.  I don't think Harvard architectures will be going anywhere soon, and memory segregation even within a process makes a lot of sense, so multiple address spaces and similar variable and expression attributes are needed.  (Currently, they use compiler-specific extensions, and even GCC is trying to grow real support for multiple address spaces.)

It's not that C is "bad", it is just that the hardware has evolved so much there is now a clear niche, IMNSHO, for a language that fits that stratum better.
Title: Re: C word
Post by: PlainName on September 25, 2019, 08:28:19 pm
OK. Thanks for the considered (and lengthy) explanation  :-+
Title: Re: C word
Post by: SiliconWizard on September 25, 2019, 08:52:55 pm
Here is exactly what the standard says:
Quote
A conforming
hosted  implementation shall  accept  any strictly  conforming  program. A conforming
freestanding implementation shall accept any strictly conforming program that does not
use  complex types  and  in  which  the  use  of  the  features  specified  in  the  library  clause
(clause  7)  is  confined  to  the  contents  of  the  standard  headers <float.h>,
<iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and
<stdint.h>. A conforming implementation may have extensions (including additional
library  functions),  provided  they do not  alter  the  behavior  of  any strictly  conforming
program.

"Hosted" is also mentioned as to how the main() function is handled.

In practice, that basically means that a freestanding implementation 1/ doesn't provide ANY library (the standard headers mentioned above don't declare anything that's needed to be implemented as code outside the core language) and 2/ that it doesn't provide, by itself, any means of calling main(), thus you usually have to provide a low-level way of calling your entry function - most often in assembly code, some linkers may give you options to define a specific symbol as your entry point as well.

Title: Re: C word
Post by: legacy on September 25, 2019, 09:34:09 pm
It's not that C is "bad", it is just that

it's not bad, it's just that it allows people to write shit like this (https://raw.githubusercontent.com/technoblogy/ulisp/master/ulisp.ino).

Good luck at making the code run on PowerPC e500, which already has its problems with cache, pipeline, etc  :-DD

edit:
two days spent at sanitizing u-boot. Now it works as it should, and even the blasted PCI does no more cause weird behaviors to the Linux kernel, this because if the firmware didn't properly initialize things ... ah, that's the true importance of the BSP :D

... the Sonoko project is costing me a lot of time and effort, more than what I planned, but I have to say not so bad, in the end; at least Denx U-boot is of several orders of magnitude better than the piece of ugly and broken spaghetti code in the above link.
Title: Re: C word
Post by: SiliconWizard on September 25, 2019, 10:51:42 pm
Hmm, I have a problem with stack alignment on an AARCH64 processor. Anyone knowledgeable? ;D
Title: Re: C word
Post by: magic on September 26, 2019, 07:24:16 am
I suppose every call frame needs to be 16 bytes aligned like on x86-64?

As long as you use a competent compiler that shouldn't be a problem, so what did you mess up? :D
Title: Re: C word
Post by: legacy on September 26, 2019, 11:33:28 am
Code: [Select]
typedef struct object_t
{
    union value
    {
        long  mword;
        char* string;
        void* stream;
        struct object_t * (*pfn)(struct object_t *);
        struct object_t * (*psyn)(struct object_t *, struct object_t *);
    } value;
} object_t;

object_t object1;
ansi/C, the compiler considers "struct object_t" an incomplete type, and this trick can be used in declarations.

Code: [Select]
#include "types.h"

typedef struct
{
    union value
    {
        uint32_t mword;
        p_char_t string;
        p_that_t stream;
        p_that_t next;
        p_that_t (*pfn)(p_this_t);
        p_that_t (*psyn)(p_this_t, p_this_t);
    } value;
} object_t;
safe/C cannot use opaque-type, and each data MUST have its defined type, hence it introduces p_this_t and p_that_t as deferred-opaque (void*), hence the implementation needs unchecked converters to assign the deferred-opaque to object_t

Code: [Select]
object_t uc_obj_get(p_this_t p_this)
{
    object_t object;
    object=p_this;
    return object;
}

Code: [Select]
define type pfn_t as function
{
     in  is object_t;
     ans is object_t
};

define type psyn_t as function
{
     in  is object_t;
     in  is object_t;
     ans is object_t
};

define type object_t as struct
{
    value is union
    {
        mword  is uint32_t;
        string is string_t;
        stream is stream_t;
        pfn    is pfn_t;
        psyn   is psyn_t;
    };
};
Arise-v2 HL does three passes and automatically manages opaque types.

Doesn't it look better? :D
Title: Re: C word
Post by: SiliconWizard on September 26, 2019, 02:39:53 pm
I suppose every call frame needs to be 16 bytes aligned like on x86-64?

As long as you use a competent compiler that shouldn't be a problem, so what did you mess up? :D

It looks simple enough if you're not using assembly directly. But it's not necessarily. I'm using GCC. And writing low-level stuff in C.

As long as you never define any alignment constraint, the compiler handles it.

Accessing packed structs on the stack (local variables for instance...) royally sucks though, even when your accesses are perfectly aligned for each of the members you access (but obviously not necessarily to 16 bytes). Not sure if I messed up anything, if this is normal, or if there is any option I can configure. I tried the strict-align GCC option, but it didn't help. I would have at least expected some kind of warning from the compiler - but nothing. (Using the latest official ARM AARCH64 GCC compiler.)

I have a lot of experience now with the x86_64 architecture, but I'm kinda new to the AARCH64 one. (First time I develop something for a 64-bit ARM CPU.) Oh well, this is getting off-topic. :D
Title: Re: C word
Post by: SiliconWizard on September 26, 2019, 02:45:18 pm
ansi/C, the compiler considers "struct object_t" an incomplete type, and this trick can be used in declarations.

Yup. Not quite sure, but I think it was even possible before ANSI C?
This is not really a "trick". This is just akin to a forward declaration. If I'm not mistaken, you can also do this with typedef and struct/union.
Such as: "typedef struct Toto_t; " and later define it completely: "typedef struct { ... } Toto_t;"
IIRC, forward declarations were available in other languages, including Wirthian ones like Pascal, so I don't see them as particularly "unsafe".

safe/C cannot use opaque-type, and each data MUST have its defined type, hence it introduces p_this_t and p_that_t as deferred-opaque (void*), hence the implementation needs unchecked converters to assign the deferred-opaque to object_t

That sucks actually, because it kind of looks like it could make you shoot yourself in the foot much easier than with proper forward declarations.
Weird rationale.
Title: Re: C word
Post by: legacy on September 26, 2019, 03:04:18 pm
Such as: "typedef struct Toto_t; " and later define it completely: "typedef struct { ... } Toto_t;"

Like this?

Code: [Select]
typedef struct object_t; /* attempt to define an opaque type */

typedef struct
{
    union
    {
        uint32_t mword;
        p_char_t string;
        p_that_t stream;
        object_t next;
        object_t (*pfn)(object_t);
        object_t (*psyn)(object_t, object_t);
    } value;
} object_t;

it has never worked with GreenHills CC.

Code: [Select]
- compiling ex3 ... error 
line 3: useless storage class specifier in empty declaration
line 11: unknown type name 'object_t'
line 12: expected specifier-qualifier-list before 'object_t'

Everything containing deferred-opaques must be isolated into special sections, where you apply the unchecked converters, this way the source can be submitted to AI-driven programs for checking.

The AI is weak, and it needs this kind of "instrumentation".  :-//
Title: Re: C word
Post by: Nominal Animal on September 26, 2019, 03:05:01 pm
it's not bad, it's just that it allows people to write shit like this (https://raw.githubusercontent.com/technoblogy/ulisp/master/ulisp.ino).
By that metric, Perl is horrible, and PHP is the spawn of satan.  I disagree: I just think the people who write shit like that should be educated to do better, or bashed on the head (with a cluebat, I'm not advocating physical violence here) to stop writing shit like that altogether.  They are tools for different tasks, and any tool can be misused.

(Some tools are bad, like shovels with just a straight handle, without a crossbar grip at the end.  That grip makes a significant difference to the leverage a human can apply with the shovel, as well as the amount of energy a human spends to shovel a given amount of stuff from one place to another.  PHP is poorly designed as a language, because it tries to be everything for everybody, and ends up like a Swiss army knife with 537 different blades.  C is interesting as a language, because it truly is simple at its core; it is only when you include the standard C library, or other libraries or HALs, that make it an appropriate or inappropriate tool for a given task.  Because C does not need a runtime environment on most architectures, it is at the same time simpler, but also more versatile, than higher-level programming languages like C++.  Well, except that you can use a runtime-less subset of C++ also; that subset just isn't standardized in any way yet.)

at least Denx U-boot is of several orders of magnitude better than the piece of ugly and broken spaghetti code in the above link.
Have you noticed yet how much easier well-structured modular projects are to maintain?  I don't mean code with plug-in capabilities, I mean code that is structured and organized such that you can test or replace parts of it, with documentation explaining the purpose of each such part?  (Or dare I say, unit?  :P)

And that the true sign of a master developer is in how intuitively descriptive (of the idea, not the code) the comments are?

This is one part of why I like to help others learn to become better developers, on the web and elsewhere.  I myself am pretty good, but I am terribly slow, and detest writing code I see pointless (commercial programming work burns me out).  I've seen, in real life, how ... twisting? snapping? ... the point of view, their programming paradigm to be precise, for a new(ish) programmer, can give just enough boost to help them later create things I cannot, while incorporating those structural and documentation aspects; and that difference is definitely worth the effort.  Also definitely the occasional hurt: sometimes a sharp nudge is needed to keep eyes open to the reality.

All my suggestions on how we should add to the standard C library (in the near term), and how we could derive a better replacement programming language for the same niche (or perhaps slightly closer to the metal, in the longer term), is just a continuation of that.  I do not fight against windmills like Don Quixote; I just go around telling people who care how to build better ones.

Hmm, I have a problem with stack alignment on an AARCH64 processor. Anyone knowledgeable? ;D
That depends on the cause of the problem.

A few years ago, I did develop a workaround for using quadmath in GNU Fortran.  You see, on x86-64, malloc() returns memory aligned to 8 bytes, because that is the largest alignment required by any standard C data type.  However, quadmath uses SSE vector registers, and requires them to be aligned to 16 bytes.  The workaround is to override malloc() et al. (they are weak symbols, interposeable) with custom ones using memalign() et cetera.  Modifying gfortran frontend to generate memalign() calls instead of malloc() calls (i.e. intrinsic calls generated by the Fortran compiler) was too much of a spaghetti mess to unravel for me, compared to the workaround.
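
For anyone who just needs 16-byte aligned heap memory in their own code, without interposing malloc() as I did, a minimal sketch could look like the following. It uses C11 aligned_alloc() rather than memalign(); alloc16() and is_aligned16() are hypothetical helper names for this example only.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate at least `size` bytes aligned to 16 bytes.
   aligned_alloc() expects the size to be a multiple of the alignment,
   so the request is rounded up first.  Release the result with free(). */
void *alloc16(size_t size)
{
    size_t rounded = (size + 15u) & ~(size_t)15u;

    if (rounded < size)     /* rounding up overflowed */
        return NULL;
    if (rounded == 0)
        rounded = 16;       /* avoid a zero-sized request */

    return aligned_alloc(16, rounded);
}

/* Nonzero if p is aligned to a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}
```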

So, if your problem is GCC or another compiler not generating the proper function preamble, good luck.  Even if you get your fixes approved, it'll take 2-3 years for the fixes to arrive in production, unless you can rope in one of the core developers.

If, however, your problem is something like writing assembly code that interfaces to C functions correctly on AARCH64, or a process preamble that initializes the stack properly, I might be able to help.

Such as: "typedef struct Toto_t; " and later define it completely: "typedef struct { ... } Toto_t;"
Rather,
    typedef  struct Toto_struc  Toto_t;
    struct Toto_struc {
        /* Can use pointers to  Toto_t  or to  struct Toto_struc  here */
    };
I like to use double spaces in typedefs, to make it easier to see the type being forward-declared, and the name it is declared as.
Title: Re: C word
Post by: Nominal Animal on September 26, 2019, 03:07:39 pm
Like this?
No.  Try
Code: [Select]
/* Forward-declare the type. */
typedef  struct object_struc  object_t;

/* Define the structure itself. */
struct object_struc
{
    union
    {
        uint32_t mword;
        p_char_t string;
        p_that_t stream;
        object_t *next;
        object_t (*pfn)(object_t);
        object_t (*psyn)(object_t, object_t);
    } value;
};
Note that next member must be a pointer, as otherwise you get infinite recursion within the type itself (p == p.next.next.next.next.next...next).
Title: Re: C word
Post by: SiliconWizard on September 26, 2019, 03:22:09 pm
Rather,
    typedef  struct Toto_struc  Toto_t;
    struct Toto_struc {
        /* Can use pointers to  Toto_t  or to  struct Toto_struc  here */
    };

I forgot the struct name itself in the forward declaration. Writing what I did makes it grammatically ambiguous, because a "struct" keyword followed by an identifier is interpreted as the identifier in the "struct" namespace (oddity of C, probably coming from an old age... yes, struct, union and enum have their own namespace separated from the typedef namespace!) So with "typedef struct Toto_t;", the compiler would be expecting an additional identifier for the typedef.
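
A small sketch of the separate namespaces, using Toto as a made-up name for both the tag and the typedef: the two do not clash in C, because the tag is only ever looked up after the struct keyword.

```c
/* "Toto" as a tag: only visible after the keyword struct. */
struct Toto {
    int n;
};

/* "Toto" as an ordinary identifier (a typedef name).
   It lives in a different namespace, so there is no conflict in C. */
typedef struct Toto Toto;

/* Both spellings name the same type. */
int toto_sum(const struct Toto *a, const Toto *b)
{
    return a->n + b->n;
}
```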

The following forward declaration is perfectly correct though:

Code: [Select]
typedef struct Something Something_t;

typedef struct Something
{
    int n;
    Something_t *pNext;
} Something_t;

So you can 1/ reiterate the "typedef", although it's not strictly necessary and 2/ use the typedef'ed type for struct members in the full declaration.

Obviously you can't recursively use a struct in itself, so only derived types that are not recursive are allowed (typically a pointer!)

You can also use forward declarations as "opaque" types (from which, here again, only pointers to them can be manipulated) by not exporting (making visible to others) the full declaration, like in the following.
Code: [Select]
// In a .h:
typedef struct Something Something_t;

// In the implementation source file (.c):
typedef struct Something Something_t;

typedef struct Something
{...
}    Something_t;

you can of course omit the typedef here, and refer to it as "struct Something" everywhere, but I personally don't like that. It just looks inconsistent, and I "typedef" everything.

Anyway, the opacity is not their only use. As seen above, one very common way of using forward declarations is for declaring structures with members that use a pointer to them (either directly or as a parameter of a function pointer, etc.)

Note: you could of course do without all this, and use a "void *" instead, since all you can do is declare pointers to forward declared structs anyway. But you'd lose any type checking if you did this, which is unfortunate. So, I think forward declarations are pretty useful.
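
To illustrate the opaque use, here is a minimal sketch squeezed into one file. Counter_t and the counter_*() functions are invented for this example; in a real project the first part would sit in the .h and the second in the .c.

```c
#include <stdlib.h>

/* --- what the header would expose --- */
typedef struct Counter Counter_t;   /* incomplete type: users only get pointers */

Counter_t *counter_create(int start);
void       counter_increment(Counter_t *c);
int        counter_value(const Counter_t *c);
void       counter_destroy(Counter_t *c);

/* --- what only the implementation file would see --- */
struct Counter {
    int value;
};

Counter_t *counter_create(int start)
{
    Counter_t *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = start;
    return c;
}

void counter_increment(Counter_t *c)
{
    if (c != NULL)
        c->value++;
}

int counter_value(const Counter_t *c)
{
    return (c != NULL) ? c->value : 0;
}

void counter_destroy(Counter_t *c)
{
    free(c);
}
```

Passing, say, an int * to counter_increment() gets diagnosed by the compiler, which is exactly the type checking a "void *" parameter would throw away.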
Title: Re: C word
Post by: legacy on September 26, 2019, 03:48:59 pm
Code: [Select]
typedef struct object_st object_t;
typedef object_t* p_object_t;
typedef struct object_st
{
    union
    {
        uint32_t mword;
        p_char_t string;
        p_that_t stream;
        p_object_t next;
        p_object_t (*pfn)(p_object_t);
        p_object_t (*psyn)(p_object_t, p_object_t);
    } value;
} object_t;

Yup, this works on Gcc-v7-9 and GreenHills CC :D
Title: Re: C word
Post by: SiliconWizard on September 26, 2019, 04:06:04 pm
Yup, this works on Gcc-v7-9 and GreenHills CC :D

Well, yes. It's correct. My bad again for omitting one identifier in my first post. :P

Note that struct declarations with a struct identifier are implicit forward declarations, so you can do the following if you so wish:

Code: [Select]
typedef struct Something
{
    int n;
    struct Something *pNext;
} Something_t;

An additional forward declaration of "struct Something;" is unnecessary.
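
A short usage sketch of that self-referential struct as a singly linked list; something_sum() is a hypothetical helper added only for illustration.

```c
#include <stddef.h>

typedef struct Something
{
    int n;
    struct Something *pNext;    /* the tag is already usable here */
} Something_t;

/* Sums n over the list starting at pHead; an empty list gives 0. */
int something_sum(const Something_t *pHead)
{
    int total = 0;

    for (; pHead != NULL; pHead = pHead->pNext)
        total += pHead->n;

    return total;
}
```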
Title: Re: C word
Post by: SiliconWizard on September 26, 2019, 04:09:08 pm
Code: [Select]
typedef struct object_st object_t;
typedef object_t* p_object_t;
(...)

Just one remark on this piece of code. Several rules in safety-critical guidelines (including MISRA-C, and others, and myself  ;D ), recommend NOT to declare pointer types, which hide the fact that a given object is a pointer. So this: "typedef object_t* p_object_t;" would be a no-no. The fact you cared to use a "p_" prefix mitigates this somewhat, but still...

If you work under rules which allow this, or even ENFORCE this, well. All I can say is we are a bunch of people who would disagree. ;D
Title: Re: C word
Post by: legacy on September 26, 2019, 05:02:13 pm
Type pointer is the perfect way to instrument checkers and debuggers, so we largely use it; AI-driven checkers are weak but strong enough to strictly check if what follows the "p_" prefix matches with the referenced item.

And they are largely better than human beings.

That's why we consider it largely better than pushing a lot of "*" within the code; this especially if you consider that HOOD-ICEs might have problems understanding it, while they have no problem with a type-pointer. Long story on the why, but basically because even Stood has fewer problems this way.

We follow a mash-up between MISRA and DO178B, not just MISRA, hence everything that can automate and simplify the process and improve the management of a product during its life-cycle is preferred.
Title: Re: C word
Post by: magic on September 26, 2019, 07:03:47 pm
Accessing packed structs on the stack (local variables for instance...) royally sucks though, even when your accesses are perfectly aligned for each of the member you access (but obviously not necessarily to 16 bytes).
You know, some people actually pay money to get royally sucked off so further elaboration may be in order :-DD

Disassemble that crap (objdump -d xxx.o) and see what's generated. ARM assembly can't be rocket science, I suppose. There has to be some way of loading/storing 8 or 16 bits. OTOH, if you load/store 32bits unaligned, some ARMs supposedly may end up barrel shifting the word. Maybe wrong target core is specified?
Title: Re: C word
Post by: SiliconWizard on September 26, 2019, 07:12:09 pm
@legacy: Oh, I was sure you had a good reason. I just respectfully disagree. ;D
(I don't really see your point of having to push "*" or anything else; it's just part of the language. Are you complaining about having to push a lot of for's and while's? ...  )

Anyway, people can't agree on everything and that's alright.

Note: it doesn't come directly from MISRA-C, but from CERT-C.
Beyond the stylistic issue (hiding pointers), here is why it could go wrong: when you use qualifiers.

https://wiki.sei.cmu.edu/confluence/display/c/DCL05-C.+Use+typedefs+of+non-pointer+types+only
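
A minimal sketch of the qualifier problem, reusing the p_char_t name from earlier in the thread; p_cchar_t, set_first() and get_first() are made up for this example.

```c
typedef char *p_char_t;         /* pointer typedef, as used in the thread */
typedef const char *p_cchar_t;  /* hypothetical: const baked into the typedef */

/* "const p_char_t" means "char *const": the pointer itself is const,
   the characters it points to are NOT, so this write compiles fine. */
void set_first(const p_char_t s, char c)
{
    s[0] = c;
}

/* With p_cchar_t the pointee is const; writing s[0] here would be rejected. */
char get_first(p_cchar_t s)
{
    return s[0];
}
```

So anyone reading "const p_char_t" as "pointer to const char" gets no protection at all, which is the point of that CERT rule.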

Title: Re: C word
Post by: legacy on September 26, 2019, 08:54:40 pm
Beyond the stylistic issue (hiding pointers), here is why it could go wrong: when you use qualifiers.

Quote
Using type definitions (typedef) can often improve code readability. However, type definitions to pointer types can make it more difficult to write const-correct code because the const qualifier will be applied to the pointer type, not to the underlying declared type.
Title: Re: C word
Post by: legacy on September 26, 2019, 09:07:40 pm
double pointers are banned and ... what do I have to fix now?

Code: [Select]
    int var = 789; 
    int *ptr2;
    int **ptr1;  /* <---------- this is banned */
    ptr2 = &var;
    ptr1 = &ptr2;

Code: [Select]
    p_object_t* env; /* <------------- this is banned too */

A source FULL of double pointers ...  :palm: :palm: :palm:


note to me:
remember to ban-by-design any double pointers from any draft of the Arise-v2 HL, since tracking them on the HOOD-ICE is like putting the head into the microwave.
Title: Re: C word
Post by: legacy on September 26, 2019, 09:26:13 pm
Code: [Select]
int fwritemem
(
    void *cookie,
    char *buf,
    int size
)
{
    /* funopen write */
    char **position  = (char * *) cookie;
    char *mem        = *(char * *) position;
    int  num_written = 0;
    for (; num_written < size && *buf != '\0'; num_written++)
    {
        *mem++ = *buf++;
    }
    *position = mem;
    return num_written;
}

the above comes from a bsd tool.

    char **position  = (char * *) cookie;

 :palm: :palm: :palm:
Title: Re: C word
Post by: SiliconWizard on September 26, 2019, 11:12:54 pm
Disassemble that crap (objdump -d xxx.o) and see what's generated. ARM assembly can't be rocket science, I suppose. There has to be some way of loading/storing 8 or 16 bits. OTOH, if you load/store 32bits unaligned, some ARMs supposedly may end up barrel shifting the word. Maybe wrong target core is specified?

Oh, I did that. I also generate assembly from GCC (with -S option), as it produces more readable code with symbols. I use objdump on the final linked binary, as it lets me pinpoint which instruction is at which address. All I got from the exception handler was a few registers, FAR giving you the faulty address for mem access, and ELR the "link" address (which points you to the address at which the exception occurred).

I finally figured out that the culprit was this instruction: "ldr d0, [sp, 52]". Guess what? It points to an address that is 4-byte aligned (which is fine, since I was accessing a 4-byte aligned, 32-bit field of a struct): sp (16-byte aligned) + 52 => a 4-byte aligned address.

It was not a stack alignment issue per se, just happened to be on the stack (so at first that misled me, because a pure alignment issue from my code didn't seem to be possible otherwise.)

Problem is that GCC emits this ldr d0, ... instruction. Which is a 64-bit transfer. Why so? Well. because it wants to optimize accesses, and I was actually accessing two successive 32-bit struct members, so it decided to access both in one shot. Problem is that this creates a non-aligned access. Is that great or what?
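
The arithmetic, as a quick sketch: any 16-byte aligned sp plus 52 is a multiple of 4 but not of 8, so a 32-bit load is fine there and a 64-bit one is not. The sp value below is an arbitrary assumption, and is_aligned() and example_address() are hypothetical helpers.

```c
#include <stdint.h>

/* Nonzero if addr is a multiple of size (size must be a power of two). */
int is_aligned(uintptr_t addr, uintptr_t size)
{
    return (addr & (size - 1u)) == 0;
}

/* sp + 52 for a made-up, 16-byte aligned stack pointer value. */
uintptr_t example_address(void)
{
    const uintptr_t sp = 0x8000u;   /* assumption: any multiple of 16 works */
    return sp + 52u;
}
```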

If you dig a little deeper, you figure out that GCC allows itself to do this, because AARCH64 CPUs allow non-aligned accesses. So GCC optimizes away, and goes on. Now why am I getting an exception if this is allowed by the CPU? Because, it is, but not always. You have to enable the MMU, from what I gathered, to allow unaligned accesses. Otherwise, you get an exception. And at the stage my code runs, the MMU is not yet enabled... |O :-DD

But you know what's worst? After all, it's not trivial indeed, but you could still say it's just a case of RTFM. Which would be fair enough. Except that... I enabled the "-mstrict-align" option, which is supposed to PREVENT GCC from assuming it can generate unaligned accesses. But guess what, it doesn't do squat here, and it still generates this 64-bit access. I think this looks like a nice bug.

Hope this can be helpful for others. If you ever work on ARM CPUs in 64-bit mode, and at a very low level. This one took me a while, all the more that I'm not ultra familiar with AARCH64 assembly, so this 64-bit access was not immediately obvious to spot without reading the IS manual...

The only fix for now is to cut down the opt. level to -O1 max, and the code runs with no exception. No fucked-up unaligned access is generated.
The above instruction is duly replaced with "ldr w0, [sp, 52]" and later "ldr w0, [sp, 56]", which is what I expected.
For the record, d0 is a 64-bit register, w0 a 32-bit one.

Starting with -O2, it's fucked up and the  "-mstrict-align" option appears to be ignored.

First time I run into a GCC bug in a long time here. It had served me very well until now. Maybe the official ARM version (AARCH64-ELF 8.3) is borked? I'll have to try with a custom-built one off a newer GCC version (9.1/9.2)...

Thanks for your attention.  ;D
Title: Re: C word
Post by: SiliconWizard on September 27, 2019, 12:09:10 am
Code: [Select]
int fwritemem
(
    void *cookie,
    char *buf,
    int size
)
{
    /* funopen write */
    char **position  = (char * *) cookie;
    char *mem        = *(char * *) position;
    int  num_written = 0;
    for (; num_written < size && *buf != '\0'; num_written++)
    {
        *mem++ = *buf++;
    }
    *position = mem;
    return num_written;
}

the above comes from a bsd tool.

    char **position  = (char * *) cookie;

 :palm: :palm: :palm:

Well, the above code looks a bit hackish, but it's not THAT bad either. Although I would never write something like this. ::)
If you have a rule of no double pointers, sure this is a problem. This sounds pretty strict; IIRC MISRA-C, for instance, forbids MORE than 2 levels. You're forbidding more than 1... ;D

A double pointer here is just a table of pointers. Can be useful. Sure it would look nice to define intermediate types so there is only one level at a time.

The void * parameter is a bit weird here; since the function actually expects to get a table of pointers to char, the parameter should show this clearly. It's not like in just one-level cases in which void * can be useful (as long as there is no assumption about alignment in the function, meaning that only 8-bit accesses are allowed...), such as with the usual memcpy(), memset(), etc.

There are (admittedly confusing) but useless casts here:
char **position  = (char * *) cookie;
char *mem        = *(char * *) position;

Both are useless. You don't need to cast a void * as it converts implicitly to any object pointer type in C. This includes a double pointer, which is still a pointer. So much for the first line.
For the second line, they cast position to char **, position being already declared as a char **. It's just ridiculous. ;D

char **position = cookie;
char *mem = *position;

is all that's needed, and looks less clunky.
Of course, they could also have done some basic checks on the pointers being NULL for instance...
Title: Re: C word
Post by: magic on September 27, 2019, 07:11:43 am
Seems like a compiler bug then.
Mark the struct member volatile and see what happens :-DD
Title: Re: C word
Post by: SiliconWizard on September 27, 2019, 04:59:38 pm
Following up on the AARCH64 GCC bug.

After considering filing a bug report, I searched first and found it: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71727
Looks like it's been there for 3 years at least. From what someone says, it was supposed to get fixed in GCC 7, but the case was never closed, and I can confirm it's definitely NOT fixed. The bug report says at -O3, but I can confirm it's present at -O2 as well. Weird that this bug has remained rampant, with no clear status. You can see someone asked to close the case end of 2018, but IT IS NOT fixed, and the status has not been updated since 2016. That doesn't really bode well...

We must now kindly request an account to file for bugs, so I did. Hoping I'll get one. ::)

Others have recently run into the issue (but apparently not the bug, although they don't say if the option -mstrict-align, in their case, actually really fixed the problem, nor at which optimization level they build their tools...)

https://patchwork.kernel.org/patch/11133703/
Title: Re: C word
Post by: Nominal Animal on September 27, 2019, 05:31:32 pm
Aaand now you know why I no longer submit patch fixes to GCC (unless I could discuss it with someone with commit access, I guess).  :horse:
Title: Re: C word
Post by: SiliconWizard on September 27, 2019, 05:34:28 pm
(...)
the above comes from a bsd tool.

    char **position  = (char * *) cookie;

 :palm: :palm: :palm:

If you need to rewrite it and keep its interface intact (well, almost), this is how I would basically do it (using your pointer types style):

Code: [Select]
typedef void * p_GenericBuffer_t;
typedef char * p_szBuffer_t;
typedef const char * p_cszBuffer_t;

size_t fwritemem(p_GenericBuffer_t *pCookie, p_cszBuffer_t pBuffer, size_t nSize)
{
    char *pMem;
    size_t nBytesWritten;
   
    if ((pCookie == NULL) || (*pCookie == NULL) || (pBuffer == NULL))
        return 0;
   
    pMem = *pCookie;
   
    for (nBytesWritten = 0; (nBytesWritten < nSize) && (*pBuffer != '\0'); nBytesWritten++)
    {
        *pMem++ = *pBuffer++;
    }
   
    *pCookie = pMem;
   
    return nBytesWritten;
}

Note that the typedef'ed pointers need the qualifiers (ie.: const, volatile...) to be part of the typedef declaration to work as expected. This way, no problem (you can check here that the compiler won't let you write to the buffer pointed to by pBuffer). You just need one type by qualifier/set of qualifiers. And I'm even beginning to like your notation, or at least not dislike it as much as I used to. ;D

Note that thanks to the pCookie declaration you can't pass anything else to pCookie than a pointer to a pointer (to anything...), instead of just a pointer to anything, which was pretty bad.
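
A usage sketch, to show how the cookie acts as a moving write position. The function is the rewrite above with the typedefs expanded, repeated only so the example compiles on its own; append_two() and its per-call limit of 16 bytes are invented for illustration.

```c
#include <stddef.h>

/* Same logic as the rewrite above, with the typedefs expanded. */
size_t fwritemem(void **pCookie, const char *pBuffer, size_t nSize)
{
    char *pMem;
    size_t nBytesWritten;

    if ((pCookie == NULL) || (*pCookie == NULL) || (pBuffer == NULL))
        return 0;

    pMem = *pCookie;

    for (nBytesWritten = 0; (nBytesWritten < nSize) && (*pBuffer != '\0'); nBytesWritten++)
    {
        *pMem++ = *pBuffer++;
    }

    *pCookie = pMem;

    return nBytesWritten;
}

/* Hypothetical helper: writes a then b into out and terminates the result.
   out must have room for up to 16 + 16 bytes plus the terminator. */
size_t append_two(char *out, const char *a, const char *b)
{
    void *cursor = out;     /* the cookie: current write position */
    size_t total = 0;

    total += fwritemem(&cursor, a, 16);
    total += fwritemem(&cursor, b, 16);
    *(char *)cursor = '\0';

    return total;
}
```

Note that the cursor has to be declared as a void *: a char ** does not convert implicitly to a void **, so the caller cannot pass the address of a char * directly.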
Title: Re: C word
Post by: SiliconWizard on September 27, 2019, 05:35:34 pm
Aaand now you know why I no longer submit patch fixes to GCC (unless I could discuss it with someone with commit access, I guess).  :horse:

Well, that's good to know. I'm a bit disappointed. Does LLVM support AARCH64 targets? If so... I might consider it.
(For the record, I even built a cross GCC 9.2.0, and the bug is, unsurprisingly, still there...)

And maybe reporting it to ARM (since they distribute "official" GCC binaries...) could get us further than to the GCC team directly? I'm sure they'd have more incentive to get this corrected...
Title: Re: C word
Post by: magic on September 27, 2019, 07:38:26 pm
That's a different bug, involving vectorization of 64bit loads at O3. The fix has been committed and you can see the testcase is still there in 9.2.0 (https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/gcc.target/aarch64/pr71727.c;h=05eef3e91919289e224cebc0d9bee77a3edd1324;hb=3e7b85061947bdc7c7465743ba90734566860821).

I would file another one, stating specifically it concerns some different optimization which activates at O2 and combines 16b loads into 32b loads or whatever the problem is.

Also, have a short demo snippet ready and be sure not to look like a moron who doesn't know what he is doing ;D
That's sadly the reality open source projects have to deal with today. Commercial customer support frankly too, that's why they treat you like an enemy :D
Title: Re: C word
Post by: SiliconWizard on September 27, 2019, 07:54:17 pm
That's a different bug, involving vectorization of 64bit loads at O3.

No, it's not. It's the exact same bug: the fact that -mstrict-align doesn't always PREVENT unaligned accesses at certain optimization levels, which is what the overall bug is all about.
It is titled "-O3 -mstrict-align produces code which assumes unaligned vector accesses work" (it should add -O2, but whoever posted that probably didn't run into it or didn't bother trying.)
It's exactly the same root problem.

We don't fucking care which exact optimization is concerned. It should activate NONE that can lead to unaligned accesses when "-mstrict-align" is enabled. Period. And again it's not just at -O3. I think they completely missed the point because they didn't look at the whole picture, but only a small part of it (not saying it's easy, GCC code is a monster), which is exactly what you just did as well. And it's probably why the ticket is not closed. It may have opened a can of worms.

Title: Re: C word
Post by: magic on September 27, 2019, 08:17:04 pm
There is most likely no magic which can prevent a dozen optimizations from issuing unaligned loads. There is a global disable flag; each module has to handle it correctly for the whole thing to work.

Maybe we should let the original submitter speak about what his bug is or isn't about:
Quote
Hello all,

When using aarch64 gcc with -O3 the optimization tries to make use of 16byte vfp unit commands for accessing data structures containing 8byte data members, where possible, as shown in the example below.

I only told you what to do if you want it fixed :P
Title: Re: C word
Post by: Nominal Animal on September 27, 2019, 08:46:47 pm
I'm a bit disappointed.
When you have a patch that is tested to work, and just cannot get anyone with commit access to review and apply it, it becomes very frustrating.

If you are interested in workarounds, you could use
    #pragma GCC push_options
    #pragma GCC optimize("-O1")
    /* function implementation */
    #pragma GCC pop_options
around the affected functions, if -O1 stops GCC from miscompiling the code.
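As a rough sketch of how that looks around one complete function: the function name sum_halfwords and its body are made up for illustration, and whether -O1 really avoids the unaligned access would still have to be checked in the generated assembly for the actual target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Save the current optimization options, then drop to -O1 for
   everything up to the matching pop_options. */
#pragma GCC push_options
#pragma GCC optimize("-O1")

/* Hypothetical example function: sums 16-bit values stored in a
   byte buffer that may not be 2-byte aligned. memcpy() leaves the
   choice of a safe access to the compiler. */
uint32_t sum_halfwords(const unsigned char *src, size_t count)
{
    uint32_t total = 0;

    assert(src != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint16_t value;
        memcpy(&value, src + 2 * i, sizeof value);
        total += value;
    }
    return total;
}

/* Restore the options that were in effect before push_options. */
#pragma GCC pop_options
```

Only the functions between the push and the pop are affected; the rest of the translation unit keeps the command-line optimization level.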

I fully expect that if a maintainer looks at that bug, they will close it because both the C standard and the architecture ABI allow it.  That's what usually happens; thinking about whether the allowed behaviour makes sense or not is .. atypical? rare? odd?.  Unless the behaviour hits code they themselves use or maintain, of course.

On the other hand, if you stalk a bit, and find someone who has commit access, and has fixed similar issues before, contacting them about a possible fix (if you have one) can make a HUGE difference.  Be brief, to the point, and leave links to bugs you've fixed, so they can evaluate your input with minimal effort.  Consider GCC a horribly badly managed organisation, and find a connection to a suitable person, in other words.  (I couldn't do that, because dealing with people instead of issues when solving a purely technical problem is one of my buttons.)
Title: Re: C word
Post by: SiliconWizard on September 27, 2019, 10:56:59 pm
There is most likely no magic which can prevent a dozen optimizations from issuing unaligned loads. There is a global disable flag; each module has to handle it correctly for the whole thing to work.

No magic indeed, just hard work. Didn't say it was easy. Just saying that when some bug like this happens, it's often good measure for developers to analyze all the possible impacts.
Here, an option not being correctly handled in all cases smells fishy - it's good measure to anticipate that it may be worse than just in ONE particular case someone submitted. (Otherwise you work like in an agile team, strictly sticking to the small issue you're handed, and NEVER trying to see what could be behind it. Thus the same bug could be submitted again and again almost forever, every time with just a slight change - but the same root cause. Nice, that gives work to some people indefinitely.)

And that said, I got a prompt reply and my account was created  like just one hour later. I submitted a new bug to make sure it would not get lost, described it, and just added a reference to the other one so they can see a link between them.

I'll see how that goes.
Title: Re: C word
Post by: SiliconWizard on September 27, 2019, 11:07:41 pm
If you are interested in workarounds, you could use
    #pragma GCC push_options
    #pragma GCC optimize("-O1")
    /* function implementation */
    #pragma GCC pop_options
around the affected functions, if -O1 stops GCC from miscompiling the code.

Yeah, I know about that one... I just don't like that. Since the bug is there, there is no way of making sure it won't happen on any other part of the code. Thus sticking to -O1 for the whole code (at least the part that needs strict alignment) is the only option. Which wouldn't be horribly bad if I was actually SURE it won't happen with -O1. Or even -O0. Oh well...

I fully expect that if a maintainer looks at that bug, they will close it because both the C standard and the architecture ABI allow it.

I don't agree here. They have an option for that, "-mstrict-align". And many targets actually require strict alignment. The need is there. So either they remove the option, say fuck off that's allowed, and move on, or they fix it. This option is explicitly supported for aarch64. https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

And as I pointed out, there were other occurrences of similar bugs that were reported. The closest one I posted has never been closed...

Thing is, GCC is free and open source, so you get what you get. BUT it's the official compiler supported by ARM, which is a commercial company, so that's a bit more annoying there. If I'm using ARM products, which compiler(s) am I supposed to use and be confident in?
Title: Re: C word
Post by: westfw on September 28, 2019, 12:43:33 am
Quote
They have an option for that, "-mstrict-align". And many targets actually require strict alignment. The need is there. So either they remove the option, say fuck off that's allowed, and move on, or they fix it.
Obviously correct.  How "-mstrict-align" behaves is not defined inside the C standard; it's strictly a gcc option.  If it exists, it should work correctly.
Sometimes one gets the feeling that Compiler Gurus lose sight of things like "the need is there" while searching for some sort of academic purity or something.  :-(

Quote
That's a different bug, involving vectorization of 64bit loads at O3.
No it's not. It's the exact same bug: the fact that -mstrict-align doesn't always PREVENT unaligned accesses at certain optimization levels, which is what the overall bug is all about.
You won't get any faster response by claiming it's the same bug if they already actually fixed one instance, and the newly-found symptom requires patching a different section of code...
Go ahead and submit a new bug; if one of the developers thinks it's the same thing, they can mark it as so, but a new bug describing it as "similar to this other thing (see, I did search for it!)" is a better way to get a response...  (IMO, having been on both sides of similar arguments...)
Title: Re: C word
Post by: Nominal Animal on September 28, 2019, 04:00:04 am
Focusing the report on the fact that -mstrict-align does not work, and that this causes code to miscompile in situations where there are stricter alignment requirements than normally supported by the ABI, might work.  That said, if I lurked correctly, there seems to already be a dev with commit access (at least to the git mirror; not sure about the still-upstream SVN repo), which means there is hope to get this particular fix upstream.  To which version of GCC remains to be seen, though.

I must say that Steve Kargl did commit a gfortran bug fix the same day I posted it (with an existing test case) years ago, so GCC does have very good, very friendly developers, too.  I am just saying that getting their attention AND getting them to think about the bug and its fix, as a nobody off the interweb, just isn't easy.  Even when you get their attention, if the standard allows silly nasal daemons, their first reaction (in my very limited experience years ago) is usually to ignore the underlying problem, because the spec says they do not have to care.  That said, I suspect that people with better social skills have better luck with that than I had.

Years ago, I tried to get some discussion going for a new built-in complementary to memcpy()/memmove(), one that would repeat the initial data in the buffer to the rest of the buffer.  This would be particularly useful for floating-point fills, as the initial assignment would catch any signaling conditions, and the new built-in would handle the duplication of the value to the rest of the buffer.  It got no traction in glibc, because it really is something the C compiler should do internally when optimizing the C code; and no traction in GCC because the standard doesn't define one, and it isn't even in glibc.  Yet, if you look at linear algebra code, that function would make an actual difference on some processors; even changing the result of C:Fortran efficiency comparisons.  (Again, tests were done about a decade ago, so might not be true anymore.)
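To make the idea concrete, here is a minimal plain-C sketch of such a fill. The name fill_repeat and its signature are hypothetical (no such function exists in glibc or GCC), and a real built-in would presumably be far better optimized; this only shows the semantics, using the usual doubling trick.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a real glibc or GCC function.
   The caller has already stored the pattern in the first `unit`
   bytes of `buf`; this repeats it until `total` bytes are filled.
   Each pass copies the already filled prefix, so the filled region
   doubles on every iteration. */
void fill_repeat(void *buf, size_t unit, size_t total)
{
    unsigned char *bytes = buf;
    size_t filled = unit;

    assert(buf != NULL && unit > 0);
    while (filled < total) {
        size_t chunk = total - filled;
        if (chunk > filled)
            chunk = filled;
        /* Source [0, chunk) and destination [filled, filled + chunk)
           cannot overlap, because chunk <= filled. */
        memcpy(bytes + filled, bytes, chunk);
        filled += chunk;
    }
}
```

For a floating-point fill, the caller would assign v[0] = value; first (so any signaling happens in that ordinary assignment) and then call fill_repeat(v, sizeof v[0], sizeof v); to duplicate the bytes.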

If linear algebra in C is something one of you is interested in, check these (https://stackoverflow.com/a/34862940) data structures I outlined a few years ago.  It's basically a way to describe vectors and matrices as references to refcounted data, with any order.  You can have a matrix with a vector of its diagonal elements, say -- not copies; to the exact same data --, and since the data is refcounted, both that matrix and vector are "first-order" data types, unlike e.g. GSL "views".  As long as you remember to discard each matrix and vector when you no longer need (those particular variables), the data is managed for you, almost like garbage collection.  (In fact, it does support allocation pools, so instead of managing it individually, you can discard an entire pool at once, ignoring the reference counts.)  You don't need to discard them in any particular order, either; whenever you are sure you won't access that matrix/vector variable is fine.  I was thinking of expanding this into a full linear algebra library (BLAS/LAPACK/GSL), optionally using IMKL or ACML as a back end, maybe even FFTW3 for 1D and 2D transforms on the data, but kinda just lost steam due to lack of interest...
Title: Re: C word
Post by: magic on September 28, 2019, 07:49:34 am
No magic indeed, just hard work. Didn't say it was easy. Just saying that when some bug like this happens, it's often good measure for developers to analyze all the possible impacts.
Here, an option not being correctly handled in all cases smells fishy - it's good measure to anticipate that it may be worse than just in ONE particular case someone submitted.
Yeah, maybe. In a one man project - certainly.
But if GCC is anything like Linux, there may not even be one person familiar with all the codebase. Something was reported about the vectorizer, some vectorizer guy picked it up and pushed a fix, we are happy now.
Maybe he left the ticket open to remember about checking other modules at some later opportunity :D
Maybe he is just lazy and doesn't close fixed bugs.
Your problem may be so old that nobody even suspected it to exist given the lack of complaints.
Or on the contrary, your bug may have been created after 2016.
There are a million possible reasons.

agile
|O :scared: :-DD

Thing is, GCC is free and open source, so you get what you get. BUT it's the official compiler supported by ARM, which is a commercial company, so that's a bit more annoying there. If I'm using ARM products, which compiler(s) am I supposed to use and be confident in?
I think they offer some commercial compiler too if you don't mind the $$$. Then you presumably get some kind of official support.
And if you look around they may actually provide some means of reporting problems with their GCC build, particularly if it's some customized version like Linaro's.

Years ago, I tried to get some discussion going for a new built-in complementary to memcpy()/memmove(), one that would repeat the initial data in the buffer to the rest of the buffer.  This would be particularly useful for floating-point fills, as the initial assignment would catch any signaling conditions, and the new built-in would handle the duplication of the value to the rest of the buffer.
But you know, ignoring feature requests is not exactly the same as ignoring obvious bugs ;)
Title: Re: C word
Post by: Gandalf_Sr on September 28, 2019, 10:06:49 am
An interesting thread.  The OP was asking about learning C and I can recommend this website https://publications.gbdirect.co.uk//c_book/

I'm an embedded hardware+firmware engineer and started years ago using assembly; at one point I (masochistically) wrote an entire alarm clock program in assembly (I still have it) and I've since re-written it all in C as an exercise in realizing how much simpler things can be.  Having written in, and warmed up to C (GCC) over the last 5 years, I am happy to have made the transition although there are still aspects that I'm unsure of.  In particular, pointers and passing pointers as arguments has caused me confusion and grief.  I've also been caught out a couple of times by running outside the array bounds, especially when dealing with 'strings' as arrays of char e.g.

char mystring[128];

C will let you write code that tries to access mystring[300] and, if you write to such a location, your program will often crash or hang. Unfortunately, solving such issues is never as simple as finding a line with an obvious out of bounds index such as I just gave.
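One common habit that helps with char arrays is to never write to them directly, but always through a helper that is told the destination size. The sketch below is only an illustration; copy_checked is a made-up name, not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `src` into `dst` without ever writing
   past dst[dst_size - 1]. The result is always nul-terminated.
   Returns 0 if the whole string fit, -1 if it had to be truncated. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    const size_t len = strlen(src);
    const size_t n = (len < dst_size) ? len : dst_size - 1;

    memcpy(dst, src, n);
    dst[n] = '\0';
    return (n == len) ? 0 : -1;
}
```

With char mystring[128]; the call would be copy_checked(mystring, sizeof mystring, text); passing sizeof mystring instead of a hand-typed 128 keeps the limit correct if the array size changes later.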

If anyone wants to help in my evolving education, please feel free to offer reading subjects or advice.

In short, C is great when you're trying to control low-level stuff like ports and serial (I2C and SPI) devices but it can be a challenge to find bugs like the one above.

FWIW, I originally worked in Microchip's MPLAB (now MPLABX) but a few years ago, I moved to working with Cypress PSoC devices using Cypress' PSoC Creator (and now MODUS), all using the GCC (free) compiler.  I have to say I am VERY happy with PSoC Creator and the PSoC device family 4, 5, and now dual core 6; they are amazing devices.  My most complicated recent project is on PSoC 6 where I write 2 separate main.c programs, one for each core, with rules about which core starts first and gives the other core permission to start and allows the core to 'own' peripherals.  Setting up shared RAM was not as straightforward as it should have been but I figured it out and it works really well.
Title: Re: C word
Post by: legacy on September 28, 2019, 12:23:53 pm
char mystring[128];

C will let you write code that tries to access mystring[300] and, if you write to such a location, your program will often crash or hang

For my lib_tokenizer (http://www.downthebunker.com/reloaded/space/viewtopic.php?f=45&t=422) I created a dedicated library called "lib_safestring", whose data is  structured this way:

Code: [Select]
typedef struct
{
    p_char_t p_context;
    uint16_t position_curr;
    uint16_t len;
    uint16_t size;
} safestring_t;

So a string is really an "object_string", with its methods to access its data structure.

It adds more function-calls, but you are sure your algorithm is "string-safe" and not "off-by-one" regarding accessing strings.

Code: [Select]
safestring_ans_t safestring_context_assign
(
    p_safestring_t p_safestring,
    p_char_t p_context
)

Code: [Select]
p_safestring_t my_safestring;
char_t msg[]="hAllo world";
safestring_ans_t is_ok;

is_ok = safestring_context_assign(my_safestring, msg);

a single char can only be accessed via

Code: [Select]
safestring_ans_t safestring_geth_ch
(
    p_safestring_t p_safestring,
    uint16_t position
)

If your string is of the size M, and you try to access M+n, this function will panic, so you will be aware of the bug.
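The general shape of such a checked accessor can be sketched as follows. None of this is the actual lib_safestring code: the types and names are invented for illustration, and this version reports a status to the caller instead of panicking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not the lib_safestring API. */
typedef enum { STR_OK = 0, STR_OUT_OF_RANGE = 1 } str_status_t;

typedef struct {
    const char *text;  /* character storage */
    uint16_t    len;   /* number of valid characters in text */
} checked_str_t;

/* Stores the character at `position` in *out and returns STR_OK,
   or returns STR_OUT_OF_RANGE and leaves *out untouched. */
str_status_t checked_str_get(const checked_str_t *s, uint16_t position, char *out)
{
    assert(s != NULL && s->text != NULL && out != NULL);

    if (position >= s->len)
        return STR_OUT_OF_RANGE;

    *out = s->text[position];
    return STR_OK;
}
```

Because every read goes through the length check, an off-by-one index shows up as an explicit error at the call site rather than as a silent read past the buffer.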


Besides, there are new methods (well "functions") to
- append char to the string
- reverse the string
- get a substring
- search in the string
- compare two safestrings
- compare a safestring with an array of char
- hash the string
- test if the string is empty
- empty the string
- etc

You need to destroy a safestring once you no longer need it, in order to release its resources.
Title: Re: C word
Post by: Nominal Animal on September 28, 2019, 01:31:52 pm
But you know, ignoring feature requests is not exactly the same as ignoring obvious bugs ;)
Obviously! I included that anecdote to illustrate how getting a discussion going is the hard part.  If you do already have a bug fix at hand, just posting it to gcc-bugs, gcc-patches, or to the bugzilla entry is not enough; you need to find a dev with commit access and get them interested/involved, too.  You might even have to skate by a couple who are slightly too aggressive in pruning unnecessary requests and bug reports, though.

It is a social issue, not a technical one, and is rather common.  For example, the bugs in drivers/md/md-bitmap.c:md_bitmap_status(), fs/proc/task_mmu.c:show_map_vma(), fs/proc/task_mmu.c:show_numa_map(), fs/proc/task_nommu.c:nommu_vma_show(), and elsewhere in the Linux kernel -- that the set of escaped characters given as the third parameter for the seq_file_path() function should always be either empty, or include a backslash ("\\") to ensure all possible file paths are unambiguously represented --, still exist (just checked) and have actually even proliferated (copied to new places in the kernel), even though I reported these with a patch to LKML in 2016.  I got completely ignored, because I am a nobody.  (The bug allows a process to disguise the true path to its executable, as the incorrect escaping means paths with e.g. "\\012" or "\n" in them are both represented as the former.  It is especially nasty when the real full absolute path is longer than PAGE_SIZE in the escaped form, but shorter in the latter form, as it is darned hard to automatically detect then.)
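The ambiguity is easy to reproduce in user space. The function below is a hypothetical stand-in, not the kernel's implementation: it writes every character found in the escape set as a backslash followed by three octal digits, and copies everything else unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space illustration; not kernel code.
   Characters of `src` that appear in `esc` are written as a
   backslash plus three octal digits, all others are copied as-is.
   Returns 0 on success, -1 if `dst` (of `size` bytes) is too small. */
int escape_path(char *dst, size_t size, const char *src, const char *esc)
{
    size_t used = 0;

    assert(dst != NULL && src != NULL && esc != NULL);
    for (; *src != '\0'; src++) {
        if (strchr(esc, *src) != NULL) {
            if (size - used < 5)        /* 4 characters plus terminator */
                return -1;
            snprintf(dst + used, 5, "\\%03o", (unsigned)(unsigned char)*src);
            used += 4;
        } else {
            if (size - used < 2)        /* 1 character plus terminator */
                return -1;
            dst[used++] = *src;
        }
    }
    if (used >= size)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

With the escape set "\n", both a name containing a real newline and a name containing the four characters backslash, 0, 1, 2 come out as \012. With the escape set "\n\\", the second one becomes \134012 and the two stay distinguishable.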

Because this is detrimental to the community, the Linux kernel community grew kernel-janitors and kernel-newbies to avoid that.  I do not know of any similar effort with GCC.  I really should repost those bugs to kernel-janitors, but since being ignored is one of my personal buttons, I haven't.  It is one of my personality flaws, I know :-[.
Title: Re: C word
Post by: Nominal Animal on September 28, 2019, 01:38:35 pm
For my lib_tokenizer (http://www.downthebunker.com/reloaded/space/viewtopic.php?f=45&t=422) I created a dedicated library called "lib_safestring", whose data is  structured this way
Why uint16_t, and not size_t?
Or, if you need the ability to shrink the memory footprint by limiting the string length, a SLEN_T macro type, with
    #ifndef  SLEN_T
    #define  SLEN_T  size_t
    #endif
just before the structure definition?

Please don't be a Bill and tell me "65535 bytes is long enough for everyone"!  >:D

That said, I prefer the C99 flexible array member idiom, with slightly Pascal-y string definition:
Code: [Select]
typedef struct {
    size_t  size;  /* Number of bytes allocated for the data */
    size_t  used;  /* Number of bytes used; i.e., length; not including trailing nul char */
    char    data[];
} mystring;
and usually don't make a difference between a NULL mystring pointer, and pointer to a mystring with used==0.
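A minimal sketch of how such a string can be created with a single allocation; mystring_new is a hypothetical helper name, and the size/used bookkeeping shown is just one possible convention.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t  size;  /* Number of bytes allocated for the data */
    size_t  used;  /* Number of bytes used, not including the trailing nul char */
    char    data[];
} mystring;

/* Hypothetical constructor: one malloc() covers both the header and
   the character data. Returns NULL if the allocation fails. */
mystring *mystring_new(const char *src)
{
    assert(src != NULL);

    const size_t len = strlen(src);
    mystring *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;

    s->size = len + 1;
    s->used = len;
    memcpy(s->data, src, len + 1);  /* includes the trailing nul */
    return s;
}
```

Since header and data live in the same block, a single free(s) releases the whole string.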
Title: Re: C word
Post by: SiliconWizard on September 28, 2019, 02:13:02 pm
Just for the record - my bug report got accepted and confirmed in just under 3 hours from the moment I submitted it to the moment it got confirmed. So, I'm pleasantly surprised here, especially since it was already week-end time. ;D

Sure they asked for a minimal test case, which I had actually done to make sure I had correctly pinpointed the bug. I provided it, and then they confirmed the problem very quickly.

Now I don't know how long it will take for it to get fixed, and if it ever does, but at least they recognized the bug very promptly, admitting that the misalignment check was sort of borked.
Title: Re: C word
Post by: SiliconWizard on September 28, 2019, 02:29:00 pm
Why uint16_t, and not size_t?
Or, if you need the ability to shrink the memory footprint by limiting the string length, a SLEN_T macro type, with
    #ifndef  SLEN_T
    #define  SLEN_T  size_t
    #endif
just before the structure definition?

Agree.

That said, I prefer the C99 flexible array member idiom, with slightly Pascal-y string definition:
Code: [Select]
typedef struct {
    size_t  size;  /* Number of bytes allocated for the data */
    size_t  used;  /* Number of bytes used; i.e., length; not including trailing nul char */
    char    data[];
} mystring;

This construct can be handy. Before it got standardized, it was often already possible as an extension in many compilers.
It avoids having to allocate TWO objects for just ONE. (Like, if you want to dynamically allocate a "string", in legacy's approach you have to issue TWO allocations (one for the wrapping structure, one for the string buffer itself...), which is not that great performance-wise: multiplying small dynamic allocations is usually a bad idea with most allocators.) (And of course I'm already expecting a reply that he doesn't do dynamic allocations anyway, or that they do a very specific kind of allocations that are OK with this, etc. ;D )

And before extensions, the usual way of doing this was to declare just a one-item array for the last member, such as "char data[1];". When allocating the "dynamic" object, you would just take this extra byte into account for the allocation. The downside with that old approach was that any static checking for the indexes would yield warnings all over the place (but old compilers rarely had very clever static checkers anyway...), whereas with data[], static checkers won't care about indexes (which is, OTOH, not that much more useful, but you gotta know what you're doing in C...)

As to legacy's code, as I got it, we have to understand they seem to be heavily relying on specific analysis tools, and not on really standard stuff, making the usual advice and good practices (some of which I gave) largely irrelevant. (So in that respect, we tend not to be talking about exactly the same thing, as it seems legacy doesn't really care about standard C per se..., hence the possible misunderstandings.)

Title: Re: C word
Post by: legacy on September 28, 2019, 02:40:07 pm
Please don't be a Bill and tell me "65535 bytes is long enough for everyone"!  >:D

There is a hidden motivation for this: 16bit is the max I can allocate for a string-object, due to how the object-collector does its job under the hood.

One can replace the size as he/she wishes, but size_t is not defined in any of my environments, usually because it confuses HOOD-ICEs during debugging sessions.
Title: Re: C word
Post by: SiliconWizard on September 28, 2019, 03:44:10 pm
char mystring[128];

C will let you write code that tries to access mystring[300] and, if you write to such a location, your program will often crash or hang. Unfortunately, solving such issues is never as simple as finding a line with an obvious out of bounds index such as I just gave.

This is a very common case of buffer overflow. Sure C is known to allow this much more easily than some other languages, but you still have to realize that buffer overflows can also happen in other languages - even some that claim otherwise.

As a tip, again, use static analyzers as much as you can when writing C. They won't catch everything of course, but are still a very big plus as opposed to doing nothing at all, or expecting to catch everything "obvious" by eye. Some are very expensive, but even the free ones can be immensely helpful. Just run one on some of your code base, and see how many potential problems it can spot. You may be in for a surprise.

As an example - yes it's trivial, but static analyzers can do much better than this:
Code: [Select]
void Test(void)
{
    char DamnBuffer[128];
    int i;

    for (i = 0; i < 1000; i++)
        DamnBuffer[i] = i;
}

gcc -Wall will yield: "warning: iteration 128 invokes undefined behavior" ( 7 |   DamnBuffer[i] = i; )
cppcheck will yield: "error: Array 'DamnBuffer[128]' accessed at index 999, which is out of bounds." (note that cppcheck gives you the last index out of bounds, gcc the first, but it works in any case. If you replace the test with i <= 128, it will catch it as well.)

Surprisingly, clang-check (which is not that bad) doesn't catch anything here! Maybe it's an option thing? I'm not too familiar with it yet...

gcc, clang and cppcheck are all free. There are many other tools, some expensive, but the free ones can already get you significantly ahead. Try them.

And of course, there can be many cases of potential buffer overflows that can happen at run-time and that are almost impossible to spot with static analysis. Some tools exist for dynamic code analysis, but they are shit expensive... The additional thought about this, is that in many languages that have inherent "protection" against buffer overflows, attempting one may just yield an exception. Sure it's sometimes better than possible unexpected code execution, but exceptions, depending on how the programmer handles them, can still crash the program, or make it quit. It may be "safer" than random execution in some cases, not so much if the application must be guaranteed to run at all times. Also, some higher-level languages rely on heavy runtimes that, themselves, can have buffer overflows in some cases... (the more complex they are, and often the higher the probability you'll run into an issue with the runtime, on top of possible issues in your own code...)

Just saying: beware of silver bullets. When used as such, they may yield results just as frustrating as C will. Know your tools and use them properly, be they C or whatever else...
Title: Re: C word
Post by: magic on September 28, 2019, 04:30:01 pm
And of course, there can be many cases of potential buffer overflows that can happen at run-time and that are almost impossible to spot with static analysis. Some tools exist for dynamic code analysis, but they are shit expensive...
valgrind is free but you aren't going to run it on an MCU.
Title: Re: C word
Post by: legacy on September 28, 2019, 05:22:12 pm
Code: [Select]
struct header 
{
    size_t len;
    unsigned char *data;
};

   struct header *p;
   p = malloc(sizeof(*p) + len + 1 );
   p->data = (unsigned char*) (p + 1 );  // memory after p[0] is used for data

vs this approach
Title: Re: C word
Post by: magic on September 28, 2019, 05:23:52 pm
It is a social issue, not a technical one, and is rather common.  For example, the bugs in drivers/md/md-bitmap.c:md_bitmap_status(), fs/proc/task_mmu.c:show_map_vma(), fs/proc/task_mmu.c:show_numa_map(), fs/proc/task_nommu.c:nommu_vma_show(), and elsewhere in the Linux kernel -- that the set of escaped characters given as the third parameter for the seq_file_path() function should always be either empty, or include a backslash ("\\") to ensure all possible file paths are unambiguously represented --, still exist (just checked) and have actually even proliferated (copied to new places in the kernel), even though I reported these with a patch to LKML in 2016.
:palm:
I have a suspicion that nobody reads that linux-kernel behemoth anymore. I would rather address the maintainer of seq_file_path directly + whatever is the relevant mailing list and argue that this function should always escape \ because otherwise things become ambiguous. (Also, how exactly am I supposed to escape \0 if the list is null-terminated? :scared:) But that's gonna touch other subsystems and perhaps some userspace users so have fun with that. Yeah, it sucks.
Title: Re: C word
Post by: PlainName on September 28, 2019, 07:21:18 pm
Code: [Select]
struct header 
{
    size_t len;
    unsigned char *data;
};

   struct header *p;
   p = malloc(sizeof(*p) + len + 1 );
   p->data = (unsigned char*) (p + 1 );  // memory after p[0] is used for data

I think p->data might potentially point to the wrong place if it assumes sizeof(size_t) == sizeof(void *).
Title: Re: C word
Post by: SiliconWizard on September 28, 2019, 07:59:13 pm
Code: [Select]
struct header 
{
    size_t len;
    unsigned char *data;
};

   struct header *p;
   p = malloc(sizeof(*p) + len + 1 );
   p->data = (unsigned char*) (p + 1 );  // memory after p[0] is used for data

I think p->data might potentially point to the wrong place if it assumes sizeof(size_t) == sizeof(void *).

Nope. It's pointing to (p + 1), which, if you know your pointer arithmetic right, is right AFTER the whole struct header, p being a pointer to struct header. The malloc allocates the whole struct header ( sizeof(*p) ) PLUS the required additional buffer.

The only thing that makes me shiver, and not in a good way, is that no check of the return value of malloc() is made. If it runs out of memory, it will just write at address 0, which is rarely a good thing.
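For illustration, the same allocation with the return value checked could look like the sketch below. header_new is a hypothetical wrapper name; the struct is the one quoted above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct header {
    size_t len;
    unsigned char *data;
};

/* Hypothetical wrapper around the quoted allocation, with the
   malloc() result checked. Copies `len` bytes from `src` and adds a
   terminating zero byte. Returns NULL if the allocation fails. */
struct header *header_new(const void *src, size_t len)
{
    struct header *p;

    assert(src != NULL || len == 0);

    p = malloc(sizeof *p + len + 1);
    if (p == NULL)
        return NULL;

    p->len = len;
    /* p + 1 advances by one whole struct header, so data starts at
       the first byte after the header. */
    p->data = (unsigned char *)(p + 1);
    if (len > 0)
        memcpy(p->data, src, len);
    p->data[len] = '\0';
    return p;
}
```

The caller then has exactly one thing to test, if (p == NULL), before touching p->data.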


Title: Re: C word
Post by: Nominal Animal on September 28, 2019, 10:48:17 pm
Using GCC, pointers seem to generate more efficient code than indexing on x86-64 (the difference is small, and I last checked this on GCC 4.9, so take it with a pinch of salt), so the structure I've sometimes used for file or socket input parsers/chunkers is
Code: [Select]
typedef struct {
    unsigned char *next;  /* First buffered unread character */
    unsigned char *ends;  /* End of buffered data */
    unsigned char *data;  /* Dynamically allocated buffer */
    size_t         size;  /* Size of the dynamically allocated buffer */
    int            fd;    /* POSIX.1 file or socket descriptor */
    unsigned int   errs;  /* Error events, bitmask */
} inbuffer;
#define  INBUFFER_INIT  { NULL, NULL, NULL, 0, -1, 0 }
The buffer size is dynamically managed by an inbuffer_need(inbuffer *, size_t) function, which ensures that there are at least the specified number of bytes buffered, unless end-of-input is encountered (which is one of the events in the errs bitmask).  The same function obviously reads from the input stream when necessary, and before reallocating the buffer, moves existing data so that next==data.  This means that after consuming leading whitespace, the parser can do an inbuffer_need(&buf, MAX_TOKEN_LENGTH+1) call, and be assured that the entire token is in the buffer, starting at buf.next.

Data from the buffer is consumed using two helper functions, inbuffer_next(inbuffer *) and inbuffer_skip(inbuffer *, size_t):
Code: [Select]
static inline int  inbuffer_next(inbuffer *ib)
{
    if (!ib)
        return -1;
    else
    if (ib->next < ib->ends)
        return *(ib->next++);
    else
        return inbuffer_next_slow(ib);
}

static inline size_t  inbuffer_skip(inbuffer *ib, size_t n)
{
    if (!ib)
        return 0;
    else
    if (ib->next + n <= ib->ends) {
        ib->next += n;
        return n;
    } else
        return inbuffer_skip_slow(ib, n);
}
These use two internal "slow" helper functions,
Code: [Select]
enum {
    INBUFFER_EOF = 1<<0,
    INBUFFER_ENOMEM = 1<<1,
};

static int  inbuffer_next_slow(inbuffer *ib)
{
    if (ib->errs) {
        return -1;
    }

    if (ib->next < ib->ends) {
        if (ib->next > ib->data) {
            const size_t  have = ib->ends - ib->next;  /* Unread bytes to keep */
            memmove(ib->data, ib->next, have);
            ib->next = ib->data;
            ib->ends = ib->data + have;
        }
    } else {
        ib->next = ib->ends = ib->data;
    }

    if (ib->ends - ib->data >= ib->size) {
        /* Omitted: increase ib->size, realloc ib->data. */
    }

    /* Omitted: Read up to (ib->data + ib->size - ib->ends) bytes,
       increment ib->ends by the number of bytes read. */

    /* If successful, return *(ib->next++), otherwise -1. */
}

static size_t  inbuffer_skip_slow(inbuffer *ib, size_t n)
{
    size_t  skipped;

    if (ib->ends > ib->next) {
        skipped = ib->ends - ib->next;
        ib->next = ib->ends = ib->data;
    } else
        skipped = 0;

    /* Omitted: Skip up to (n - skipped) bytes of input. */

    /* Return the actual number of bytes skipped;
        this is either n, or smaller (in case of end of input). */
}
Depending on the C library version, this can be much faster than standard C fgetc() input.

When parsing binary data structures, like PNG chunks, you do an inbuffer_need(&png, 8); to ensure png.next points to the 4-byte big-endian length and 4-byte chunk type; then inbuffer_need(&png, 12 + length); to read the entire chunk in memory (assuming acceptable wrt. memory use).  To move to the next chunk, you do inbuffer_skip(&png, 12 + length);  I use accessor functions to obtain the 32-bit values from png.next + offset in big-endian byte order (cast and shift each unsigned char, then OR them together; usually generates pretty efficient code).

Funniest thing is, I still haven't found a really good way to make the reallocation and low-level read() chunk size policy "automatic".  Such policies are hard!  It is always a tradeoff between efficiency and excessive memory use.  The st_blksize field in the struct stat for the descriptor is a good start, but sometimes you want to use minimum amount of memory even if it means more system calls (and thus slower program); sometimes you know the user will have lots of RAM available anyway for the data to be parsed and there being lots of it, it is better to waste some memory but be as fast as possible.  I can't even leave the decision to the programmer (even if it is myself), because they cannot be arsed to think about how to find that out (from the user); much less implement e.g. suitable compile-time defaults and command-line override options.  I am leaning towards adding minsize, maxsize, and a function pointer to a resize() size policy function to the structure, with compile-time defaults.
Title: Re: C word
Post by: PlainName on September 29, 2019, 07:59:44 am
Quote
PLUS the required additional buffer

That's the size_t  len, isn't it?
Title: Re: C word
Post by: hamster_nz on September 29, 2019, 09:25:28 am
Code: [Select]
struct header 
{
    size_t len;
    unsigned char *data;
};

   struct header *p;
   p = malloc(sizeof(*p) + len + 1 );
   p->data = (unsigned char*) (p + 1 );  // memory after p[0] is used for data

vs this approach

I'm slightly wondering why you have the "unsigned char *data;" rather than "char data[1];" where you don't have the pointer, but can do much the same by allocating extra data:

Code: [Select]
struct str {
   int  len;
   char data[1];
};

struct str *str_new(char *text) {
   int len = strlen(text);
   struct str *s;

   s = (struct str *)malloc(sizeof(struct str)+len);
   if(s == NULL) return NULL;

   s->len = len;
   strcpy(s->data,text);

   return s;
}

I would much rather have a "char data[0]", so the allocation becomes much more like the standard C pattern of adding an extra byte for the terminator:

Code: [Select]
struct str {
   int  len;
   char data[0];
};

  ...
   s = (struct str *)malloc(sizeof(struct str)+len+1);
  ...

It avoids padding the structure, which adds extra bytes to the allocation, but a zero-element array throws warnings (at least with GCC).
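
For comparison, C99 also has the flexible array member, declared as data[] with no size at all, which should not trigger that warning. A minimal sketch, assuming a C99 compiler; str_new_fam is just an illustrative name, and I switched len to size_t to match strlen():

```c
#include <stdlib.h>
#include <string.h>

struct str {
    size_t len;
    char   data[];   /* C99 flexible array member; must be the last member */
};

/* Illustrative variant of str_new(): sizeof *s does not include data[],
   so we add len bytes for the text plus one for the terminator. */
struct str *str_new_fam(const char *text)
{
    const size_t len = strlen(text);
    struct str *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;

    s->len = len;
    memcpy(s->data, text, len + 1);   /* copies the '\0' too */
    return s;
}
```

The caller releases the whole thing with a single free(s), exactly as with the data[1] and data[0] versions.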
Title: Re: C word
Post by: SiliconWizard on September 29, 2019, 02:50:36 pm
I'm slightly wondering why you have the "unsigned char *data;" rather than "char data[1];" where you don't have the pointer, but can do much the same by allocating extra data:

I think you're going through the same mental process as I did... you have to realize that legacy uses specific tools to check their code, and I guess these tools would be completely baffled by the use of flexible array members... (legacy will probably confirm this...) So again, judging the code he posts with our standard C approach doesn't really work. (And even though it's always interesting to see how others work and what tools they use, I admit this can be confusing here to many that read legacy's posts and don't know or understand this fact.)

Title: Re: C word
Post by: SiliconWizard on September 29, 2019, 03:01:02 pm
Quote
PLUS the required additional buffer

That's the size_t  len, isn't it?

I'm not sure I'm following you, and even less what sizeof(size_t) would have anything to do with the correctness of his code?
He was just posting a short (and apparently incomplete) code piece. 'len' in the "p = malloc(sizeof(*p) + len + 1 );" statement is a variable or parameter that is not shown in the code he posted. It's certainly not the member of the 'struct header', which would be uninitialized here anyway. What exactly did you have in mind?

Here is a possible modified version which I think shows correctly and more completely what he meant:
Code: [Select]
struct header
{
    size_t len;
    unsigned char *data;
};

struct header * AllocateString(size_t len)
{
   struct header *p;

   p = malloc(sizeof(*p) + len + 1 );   // or "malloc(sizeof(struct header) + len + 1)": exactly the same, but I prefer the former version for code maintenance if you ever change the base type of p
   if (p == NULL)   // basic checks that should not be omitted IMO
      return NULL;
   
   p->len = len;
   p->data = (unsigned char*) (p + 1 );

   return p;
}
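
To show how the result might be used, here is a rough sketch of a caller. HeaderFromText is a name I made up for this example, and AllocateString is repeated only so the block compiles on its own:

```c
#include <stdlib.h>
#include <string.h>

struct header {
    size_t len;
    unsigned char *data;
};

/* Same allocation as above: header and data share one malloc() block. */
struct header *AllocateString(size_t len)
{
    struct header *p = malloc(sizeof(*p) + len + 1);
    if (p == NULL)
        return NULL;
    p->len = len;
    p->data = (unsigned char *)(p + 1);   /* first byte after the header */
    return p;
}

/* Hypothetical caller: copy a C string into a freshly allocated header. */
struct header *HeaderFromText(const char *text)
{
    size_t len = strlen(text);
    struct header *p = AllocateString(len);
    if (p == NULL)
        return NULL;
    memcpy(p->data, text, len + 1);       /* the + 1 covers the terminator */
    return p;                             /* a single free(p) releases everything */
}
```

Because the data lives in the same block as the header, there is only one allocation to check and one pointer to free.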
Title: Re: C word
Post by: Nominal Animal on September 29, 2019, 04:52:58 pm
I would rather address the maintainer of seq_file_path directly + whatever is the relevant mailing list
Yup, kernel-janitors, what git blame gives for the relevant lines (to ping users of seq_file_path() about the issue), plus Al Viro since he's the maintainer for fs/seq_file.c per MAINTAINERS (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/MAINTAINERS).

argue that this function should always escape \ because otherwise things become ambiguous.
No, because seq_file_path() (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/seq_file.c) is also used without any escaping at all.  That is, the backslash-escaping is optional, and only enabled if the escape string is non-empty.

Also, how exactly am I supposed to escape \0 if the list is null-terminated?
Nul is never escaped.  All strings in the Linux kernel are nul-terminated, and this particular interface is used to provide information in kernel-provided human-readable pseudofiles, as a single string; neither the content nor the paths have embedded nuls or escaped nuls.  This particular interface provides paths with characters escaped to make it easier to parse these files automatically; in things like the second field (in parentheses) in /proc/PID/stat and such.  The unescaped form is used for the /proc/PID/exe pseudo-symlink (and the bug allows a crafty process to execute a binary that fudges that to point to the incorrect file).

Not pushing for these bugs to be fixed really is kinda my fault, because I should have persisted, and pinged the relevant authors and kernel-janitors.  My point here is that for us not-very-social people, scaling the social hurdles is still quite a lot of work, and one should be prepared for that: it is normal and not personal.  These are communities, after all.

It was so much easier when being Politically Correct was not a requirement.  Nowadays, instead of asking a question like "okay, but why should we trust you? who the hell are you anyway?", it is much safer to just ignore submissions by nobodies, instead of react to them.  Many devs do not even read posts by people they don't know for that reason (unless CC'd directly).  The exception occurs when the dev-with-commit-access happens to take an interest in the subject, and immediately sees the technical merit in the post, and does not need to ask such questions.  Which has happened to me too before, and fortunately happened with SiliconWizard as mentioned here.
Title: Re: C word
Post by: PlainName on September 29, 2019, 06:27:53 pm
Quote
I'm not sure I'm following you, and even less what sizeof(size_t) would have anything to do with the correctness of his code?

Code: [Select]
p = malloc(sizeof(*p) + len + 1 )

p points to an array of memory which is the size of the header plus data length plus one extra for a null. OK so far.

Code: [Select]
p->data = (unsigned char*) (p + 1 )

The data member is now made to point to itself. p, remember, points to the start of the header, so that code says the data member is at the start plus the number of bytes in a pointer. The assumption here is that  it will be four bytes, and len is four bytes, so now p->data points to itself.

My suggestion is that size_t is not necessarily the same as the size of a pointer, so on a system where sizeof(size_t) != sizeof(void *) the code will be wrong. It's likely that a pointer would be smaller than size_t in that case (x86 segmented mode, perhaps?), thus p->data would point to the middle of p->len.

Title: Re: C word
Post by: westfw on September 29, 2019, 06:46:43 pm
Quote
p->data = (unsigned char*) (p + 1 )
The data member is now made to point to itself.
No.  Remember that (p+1) adds the length of the structure being pointed to to the actual value in p.  (Not 1.  Not "the size of a pointer.")
Thus the careful use of parentheses: "(unsigned char *)p + 1" would do something different.  So p->data ends up pointing to the section of memory just past the "header" (which was allocated).

Title: Re: C word
Post by: PlainName on September 29, 2019, 06:59:09 pm
Ah! Easy mistake to make  :palm:
Title: Re: C word
Post by: SiliconWizard on September 29, 2019, 07:40:49 pm
Quote
I'm not sure I'm following you, and even less what sizeof(size_t) would have anything to do with the correctness of his code?

Code: [Select]
p = malloc(sizeof(*p) + len + 1 )

p points to an array of memory which is the size of the header plus data length plus one extra for a null. OK so far.

Code: [Select]
p->data = (unsigned char*) (p + 1 )

The data member is now made to point to itself. p, remember, points to the start of the header, so that code says the data member is at the start plus the number of bytes in a pointer. The assumption here is that  it will be four bytes, and len is four bytes, so now p->data points to itself.

No! That's not how pointer arithmetic works!
p + 1 will point to exactly the first byte after the whole header struct, and the data member will NOT point to itself! To remember that, remember 'p+1' is EXACTLY equivalent to &p[1].

For a pointer to a given type:

BaseType *p;

Byte-wise, this is how things work:

(char *)(p + 1) is equal to ((char *) p) + sizeof(BaseType)

Again, to remember pointer arithmetic, think "arrays".

And some strict rules (safety-related) actually recommend against using pointer arithmetic (to avoid fuck-ups) and only use the array-equivalent approach, which is less confusing to many.

So in his example:
"p->data = (unsigned char*) &p[1];" for instance.

But I can assure you it's exactly the same.
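
If you want to convince yourself, here is a small sketch you can compile. The two function names are mine, made up for the demonstration; only the struct comes from the code discussed above.

```c
#include <stddef.h>

struct header {
    size_t len;
    unsigned char *data;
};

/* Byte distance between p and p + 1; only addresses are computed,
   nothing is dereferenced, so p just has to point at a real object. */
size_t step_in_bytes(struct header *p)
{
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Returns 1 if p + 1 and &p[1] are the same address (they always are). */
int same_as_array_form(struct header *p)
{
    return (p + 1) == &p[1];
}
```

For any struct header h, step_in_bytes(&h) gives sizeof(struct header), not 1 and not the size of a pointer.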
Title: Re: C word
Post by: Nominal Animal on September 29, 2019, 10:41:17 pm
For a pointer to a given type:

    BaseType *p;

Byte-wise, this is how things work:

    (char *)(p + 1) == ((char *) p) + sizeof(BaseType)
Yes!  And, since sizeof is a C operator that does not evaluate its argument expression except for its type, sizeof (BaseType) == sizeof *p, and thus
    (char *)(p + 1) == ((char *)p) + sizeof *p

Again, incrementing or decrementing a pointer by one, changes the memory address the pointer points to by the size of the type it points to.

Also, it may come as a surprise to some, but if you write e.g. sizeof(p++), it evaluates to the size of p, but p is not incremented. This is why I usually add a space after sizeof, to remind myself and others that it's not a function, but a special operator.
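
A tiny sketch of that surprise, with a made-up function name; some compilers will warn about the unevaluated side effect, which is rather the point:

```c
#include <stddef.h>

/* Returns how far p moved after taking sizeof (p++); the answer is 0,
   because the operand of sizeof is not evaluated here. */
ptrdiff_t moved_by_sizeof(void)
{
    int array[4] = { 0 };
    int *p = array;
    size_t size = sizeof (p++);   /* size of an int pointer; p++ never runs */
    (void)size;
    return p - array;
}
```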
Title: Re: C word
Post by: Nusa on September 30, 2019, 12:16:43 am
For a pointer to a given type:

    BaseType *p;

Byte-wise, this is how things work:

    (char *)(p + 1) == ((char *) p) + sizeof(BaseType)
Yes!  And, since sizeof is a C operator that does not evaluate its argument expression except for its type, sizeof (BaseType) == sizeof *p, and thus
    (char *)(p + 1) == ((char *)p) + sizeof *p

Again, incrementing or decrementing a pointer by one, changes the memory address the pointer points to by the size of the type it points to.

Also, it may come as a surprise to some, but if you write e.g. sizeof(p++), it evaluates to the size of p, but p is not incremented. This is why I usually add a space after sizeof, to remind myself and others that it's not a function, but a special operator.

Truth. But as long as we're doing confusing C code: Try sizeof(int[x++]). x does get incremented. Do you know why?
Title: Re: C word
Post by: Nominal Animal on September 30, 2019, 01:27:12 am
Try sizeof(int[x++]). x does get incremented. Do you know why?
Yes. The int[expression] part describes a type, an array of ints, and the expression part must be evaluated to determine the number of elements in that array.  The entire expression evaluates to the size of that array.

Bugger me if I know whether that is according to the C standard or not (i.e. whether the increment should be visible outside the expression or not); I gave up language-lawyerism a long time ago.  It is such a corner case I would not dare assume all C compilers get it right.  Seeing any kind of increment, decrement, or assignment in a sizeof expression is a red flag to me: something's afoot, and that expression must be fixed.
Title: Re: C word
Post by: legacy on September 30, 2019, 06:30:26 am
The above is also the best way to make a HOOD-ICE confused.

Thank god, the Greenhills CC reports an error and refuses to compile.

You canna do it, you wanna not do it, don't do it. PleaZe  :D

edit:
even DiabCC/PPC refuses to compile.
Title: Re: C word
Post by: Gandalf_Sr on September 30, 2019, 08:30:12 am
All of which leaves mere C mortals like me very confused.
Title: Re: C word
Post by: SiliconWizard on September 30, 2019, 01:29:30 pm
Try sizeof(int[x++]). x does get incremented. Do you know why?
Yes. The int[expression] part describes a type, an array of ints, and the expression part must be evaluated to determine the number of elements in that array.  The entire expression evaluates to the size of that array.

Bugger me if I know whether that is according to the C standard or not (i.e. whether the increment should be visible outside the expression or not); I gave up language-lawyerism a long time ago.  It is such a corner case I would not dare assume all C compilers get it right.  Seeing any kind of increment, decrement, or assignment in a sizeof expression is a red flag to me: something's afoot, and that expression must be fixed.

As Nominal said, in the atrocious case sizeof(int[x++]), the type that is passed here to sizeof is a variable-length array (something that was introduced in C99 and that I rarely ever use, but that's not the point), so the compiler needs to evaluate x. As x is post-incremented, sizeof would strictly NOT require the post-incrementation to occur to evaluate the size; but I'm willing to think it's undefined behavior territory here.

As nasty as this construct looks, it still piqued my curiosity, so I checked this out in the C99 standard.
And, actually, this is defined behavior! So when in doubt - check what the standard says.

Quoting:
Quote
The sizeof operator yields the size (in bytes) of its operand, which may be an
expression or the parenthesized name of a type. The size is determined from the type of
the operand. The result is an integer. If the type of the operand is a variable length array
type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an
integer constant.

Since the standard states that the operand is evaluated for VLAs, we can only assume that it's a FULL evaluation, not just a partial one for what would be strictly needed for sizeof (which could be hairier to implement than we may think anyway...)
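
For the curious, a minimal sketch of the VLA case, assuming a compiler with C99 variable length array support (gcc and clang accept it; as noted above, some other compilers refuse). The function name is made up.

```c
#include <stddef.h>

/* Returns the value of x after sizeof (int[x++]) when x starts at start.
   int[x++] is a variable length array type, so C99 evaluates the operand. */
int x_after_vla_sizeof(int start)
{
    int x = start;
    size_t size = sizeof (int[x++]);   /* start * sizeof (int), computed at run time */
    (void)size;
    return x;
}
```

With gcc and clang the function returns start + 1, i.e. the increment is visible after the sizeof. I still would not write this in real code.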
Title: Re: C word
Post by: Nominal Animal on September 30, 2019, 03:45:46 pm
Yes, fully agreed with SiliconWizard above.

Perhaps some more waffling about the sizeof operator would be useful?  :P
If you are interested, the C99 standard with all corrigenda included is available as a PDF here (http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf).

The sizeof operator has two forms:
    sizeof (type)
    sizeof expression

The first form evaluates to the size of the type type.  It must not be an incomplete type, except that a structure with a flexible array member as the last member is allowed; then it evaluates to the size of the structure without that member (but including any padding before that member).
If the type is a variable-length array, then the expression specifying the array length is evaluated (C99 6.5.3.4p2 like SiliconWizard already explained above), and any side effects of that array length evaluation are visible outside the expression.

The second form evaluates to the size of the value the expression yields, but it cannot be a bit field or a function.  (Function pointer is fine.)
The expression itself is only evaluated for its type (unless it involves a variable-length array, in which case that length sub-expression is fully evaluated).  This means that if you want to know the size of the type that pointer p points to, use sizeof *p as it is always safe, even when p is NULL or undefined.

Even if you have something as odd as say struct foo ***p, you can use sizeof *p == sizeof (struct foo **), sizeof **p == sizeof (struct foo *), and sizeof ***p == sizeof (struct foo).  Only the type matters.  No memory is ever examined, and the value of p is irrelevant.
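
A minimal sketch to check this yourself; struct foo's members and the function name are arbitrary, chosen only for the illustration:

```c
#include <stddef.h>

struct foo {
    double values[8];
    int    count;
};

/* Returns 1 if each sizeof on a dereference matches the corresponding type.
   p is deliberately left uninitialized: sizeof only looks at the type. */
int sizeof_follows_type(void)
{
    struct foo ***p;
    return sizeof *p   == sizeof (struct foo **)
        && sizeof **p  == sizeof (struct foo *)
        && sizeof ***p == sizeof (struct foo);
}
```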

As I already mentioned, I consider any increment, decrement, or assignment in the operand (right side) of a sizeof expression to be extremely suspicious: a sure sign of foul play.

There are three common patterns, that can confuse unaware C programmers.

Finally, when reading pointer types, split the definition at each *, read the type from the rightmost part leftwards, replacing each * with "is a pointer to", to get the correct English human-readable definition.  For example, if you see something nasty like
    volatile struct bar *const *p;
you read it as "p is a pointer to a const pointer to a volatile struct bar".  A const pointer is a pointer that the code won't try to modify to point elsewhere, and volatile means the value can be changed at any time so the compiler must not generate code that caches/remembers the value. So, p points to a pointer that the code won't try to modify, and that pointer points to a struct bar whose value can change unexpectedly.  Both the members in that struct, and p itself, can be modified.
Title: Re: C word
Post by: SiliconWizard on September 30, 2019, 04:04:06 pm
Oh, the last part is important (although some people on here don't seem to even care about using const... ::) )

Don't confuse:
const TYPE *p;
with:
TYPE * const p;

 :D

In the former case, you can't write to the location pointed by p.
In the latter, you can write to the location pointed by p, but you can't modify p itself.

Of course you can mix both:

const TYPE * const p;

Have we lost anyone?
Title: Re: C word
Post by: emece67 on September 30, 2019, 05:40:13 pm
.
Title: Re: C word
Post by: Gandalf_Sr on September 30, 2019, 08:12:43 pm
Oh, the last part is important (although some people on here don't seem to even care about using const... ::) )

Don't confuse:
const TYPE *p;
with:
TYPE * const p;

 :D

In the former case, you can't write to the location pointed by p.
In the latter, you can write to the location pointed by p, but you can't modify p itself.

Of course you can mix both:

const TYPE * const p;

Have we lost anyone?
You lost me, can you please add a bit more explanation?
Title: Re: C word
Post by: SiliconWizard on September 30, 2019, 08:22:35 pm
Well, when using pointers, there are two possible ways a qualifier (such as const) can be applied:
- to the pointer itself,
- to the data pointed to by the pointer.

To differentiate, the C grammar looks at whether the qualifier is BEFORE or AFTER the "*" symbol.
When it's after, it applies to the pointer itself. When it's before, it applies to what's being pointed to.

A common use of const with pointers can be found even in std functions such as memcpy():
typical prototype is like:
Code: [Select]
void * memcpy(void *pDest, const void *pSrc, size_t nLength);

The second parameter is 'const void *', which basically means the buffer pointed to by pSrc should be treated as read-only inside the function. For the implementers of the function, that means the compiler won't let you modify the data pointed to by pSrc. For the users of the function, that means you can safely assume it will never modify the buffer you pass as the second argument.
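
To make both cases concrete, here is a small sketch. The two functions are invented for the example; the commented-out lines are the ones a compiler would reject.

```c
#include <stddef.h>

/* Pointer to const data: the function may move p, but not write through it. */
size_t count_nonzero(const int *p, size_t n)
{
    size_t count = 0;
    while (n-- > 0) {
        if (*p != 0)
            count++;
        p++;            /* fine: p itself is not const */
        /* *p = 0;         error: the pointed-to int is const */
    }
    return count;
}

/* Const pointer to writable data: the function may write through p,
   but cannot make p point anywhere else. */
void fill(int *const p, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        p[i] = value;   /* fine: the data is not const */
    /* p++;                error: p itself is const */
}
```

A caller can pass an ordinary int array to both; const on a parameter restricts what the function does, not what the caller may pass.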

Title: Re: C word
Post by: Nominal Animal on September 30, 2019, 09:02:14 pm
Let's see:
    foo *p0;
    foo *const p1;
    const foo *p2;
    foo const *p3;
    const foo *const p4;
    foo const *const p5;

First of all, p2 and p3 have the exact same qualified type ("qualified" meaning the type including qualifiers like const or volatile), because you can put the const qualifier before or after the type name, and it'll still be the same thing.

Obviously, so have p4 and p5.

We cannot use foo *p6 const;, because the qualifiers must be listed before the variable name.

Pointer p1 is constant, but the thing it points to is not.  That is, p1 is a const pointer to foo.  This means that you cannot change the value of p1 , but you can modify the thing it points to.

Pointers p2 and p3 are pointers to const foo.  This means you can change the pointers (where they point to), but not the thing they point to.

Pointers p4 and p5 are const pointers to const foo.  This means you cannot change the pointers, nor the data they point to.



In function declarations and definitions, it is important to remember that in C, parameters are passed by value (but arrays decay to pointers), and any changes to the parameter values are only visible within the function, not to the caller.

To summarize, the differences between
    void  foo1(char *p) { ... }
    void  foo2(char *const p) { ... }
    void  foo3(const char *p) { ... }
    void  foo4(const char *const p) {...}
is that in the body part (...), foo2() and foo4() won't change the value of p; and foo3() and foo4() won't change the data pointed to by p.

You often see only foo1() and foo3() forms used, because current C compilers are smart enough to know when we don't try to modify the value of p, and do what they'd do if we had used foo2() and foo4() instead.  So, the compilers do not care much.  (This may not be true if you use a proprietary or old C compiler, though; you can check by compiling with and without, and comparing the generated object code.)

However, if the function is complicated, using foo2() and foo4() forms can help us humans, because then we know that the value of p (i.e, where it points to), will stay the same throughout the function; we know there isn't code that does say p++ or similar somewhere easily missed in the function body.
I personally like these even-numbered forms, because they help me, and I do not trust myself to not make an arse of myself now and then; such practices help me catch my own stupidity.  And others', too, of course.



One interesting feature of C99 variable length arrays is that a function declaration like
    double  polynomial(const double x, const size_t n, const double coeff[n]);
is perfectly standard.  Within the function body, the compiler knows coeff is an array of n doubles, with n specified as the second parameter.
You do need to have the parameters that affect the array size before the array in the function parameter list, though; but it can be an expression like eg. 2*n+1.
(You can fudge the parameter order with a preprocessor macro, or use a wrapper function, but it can be VERY confusing unless well documented.)

This particular function is usually implemented as
    double  polynomial(const double x, const size_t n, const double coeff[n])
    {
        double  result = 0.0;
        double  arg = 1.0;
        for (size_t i = 0; i < n; i++) {
            result += coeff[i] * arg;
            arg *= x;
        }
        return result;
    }
where we don't really need help from the compiler to check for array bounds; but this form allows a good compiler to check it at compile time, without run-time overhead!

It is much more useful when you do filters and such, accessing elements offset by an index, as a good compiler (gcc, clang) can usually detect if the indexes can pass outside the array bounds.  Also note how useful it is to see immediately that the function body won't modify x or n; they'll retain their values throughout the function.  (The coeff array is const, which means that the entries in it are not modified in the function.  Note, though, that an array parameter is adjusted to a pointer to its first element, so inside the function coeff is really a const double *, and coeff++ would still compile; if you want the compiler to reject that too, C99 lets you write const double coeff[const n].)

Furthermore, if you have say double *c; dynamically allocated for say k coefficients (k-1'th degree polynomial), you can call v = polynomial(x, k, c); because of how arrays decay to pointers, and the variable length array declaration like above "reconstitutes" it into an array for purposes of the called function.  Nice.
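
Put together, a minimal sketch of such a call might look like this, assuming a compiler with C99 variable length array support.  evaluate_example() and its coefficients are made up for the illustration; polynomial() is repeated so the block compiles on its own.

```c
#include <stdlib.h>

double polynomial(const double x, const size_t n, const double coeff[n])
{
    double result = 0.0;
    double arg = 1.0;
    for (size_t i = 0; i < n; i++) {
        result += coeff[i] * arg;   /* coeff[i] multiplies x to the power i */
        arg *= x;
    }
    return result;
}

/* Hypothetical caller: evaluates 1 + 2x + 3x^2 from a malloc()ed array. */
double evaluate_example(const double x)
{
    const size_t k = 3;
    double *c = malloc(k * sizeof *c);
    if (c == NULL)
        return 0.0;                 /* arbitrary fallback for this sketch */
    c[0] = 1.0; c[1] = 2.0; c[2] = 3.0;
    const double v = polynomial(x, k, c);   /* size before the array */
    free(c);
    return v;
}
```

Note the argument order in the call: the count k comes before the array c, matching the parameter list.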
Title: Re: C word
Post by: Nominal Animal on September 30, 2019, 09:19:43 pm
Code: [Select]
void * memcpy(void *pDest, const void *pSrc, size_t nLength);
But note that using Hungarian notation (https://en.wikipedia.org/wiki/Hungarian_notation) for your variable and parameter names is, well, less than useful.

If you want to find the type of a variable, just use your code editor for that.  It is much better to use descriptive names, and leave the type out.  You see, if you trust the variable name, you risk creating bugs, because sometimes backwards compatibility requires the type of a variable to be changed, while the variable name must stay the same.  After the change, when someone needs to use that variable in a computation, they use the incorrect type based on the variable name.

The 32-bit to 64-bit shift (from ILP32 to LP64 or LLP64) was full of such bugs.  They hurt.  (Nowadays, we should use uintptr_t or intptr_t for unsigned or signed integer representations of pointers; size_t for sizes or lengths of in-memory things, and off_t for file sizes and positions.)

It's the same thing as trying to memorize, or being proud of remembering, library function prototypes.  memset(void *dest, int c, size_t n) is a perfect example.  The middle parameter is the byte value to be used to fill the region with, the last parameter being the size of that region in bytes; but would you be surprised to know that even in the Linux kernel, developers sometimes mix those two?

So no, don't use Hungarian notation, and don't try to memorize stuff.  Use your editor, and library references.  I like Linux man pages online (http://man7.org/linux/man-pages/index.html) for Linux, C99, and POSIX.1 stuff.  (The pages have a Conforming to section, which tells you which standards or systems provide the interface.)
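
As a small illustration of the memset() mixup, here is a sketch with a made-up wrapper; the commented-out call is the swapped form, which compiles but does nothing useful:

```c
#include <string.h>

/* Clears a buffer; the fill value comes second, the byte count last. */
void clear_buffer(unsigned char *buffer, size_t size)
{
    memset(buffer, 0, size);
    /* memset(buffer, size, 0); compiles too, but fills zero bytes
       with the value size: the classic swapped-argument bug. */
}
```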
Title: Re: C word
Post by: SiliconWizard on September 30, 2019, 09:58:07 pm
This is yet another completely fruitless debate that serves no purpose whatsoever. This one is virtually endless, you'll find people with good arguments at either side, and it's all a matter of style. To each their own. Like the opening braces at the end of lines, that I find horrendous looking - but it's still the preferred style of many. (Oh, I think my style is actually close to Stallman's one - with what's happening recently, that's probably not "politically correct" either?  ;D )

Did it make my posted code less readable and my points less accurate? I don't think so. All that matters here.

I personally find it infinitely more readable, and any code not using this looks actually sloppy to me. It's absolutely NOT made to help writing code, but to help READing it afterwards, so letting that cause you bugs would be completely mind-fuckingly stupid. It's for readability questions. And if you use it, use it properly. I don't see any more odds of getting this wrong than adding a +1 when a -1 was required. A fuck-up is a fuck-up.

Of course everyone has different taste and views on what looks readable or not (mind you, most code I run into looks atrocious to me anyway). Readability has some level of subjectivity as well. Relying on a specific editor is nice, but it doesn't help readability. You don't read a book while inspecting each word in a dictionary, that would be awful, so, this way I get a better overall view. It's not a replacement for checking that I use types correctly either.

Frankly I absolutely don't care about what you think of it, or getting into the typical flame wars that have infested usenet before forums even existed. (I think this one was popular.) It has served me pretty well for years. Now if you like writing code in Eclipse with a dark theme, that would be also your choice. Not mine.
Title: Re: C word
Post by: emece67 on September 30, 2019, 11:58:03 pm
.
Title: Re: C word
Post by: Nominal Animal on October 01, 2019, 01:24:31 am
This is yet another completely fruitless debate that serves no purpose whatsoever.
Hey, if it works for you, go for it; no need to get your asbestos suit on.  :-+

As to coding style, including naming, I'm used to adjusting to the existing style guide; it definitely helps getting one's patches accepted.
My recommendation to those learning C is to try writing code using different coding styles, because that will help a lot when collaborating with others.
Hungarian notation (type prefix in variable names) is not at all rare in C code, so try it as well.

Much more important than that, is to write comments that describe the programmer intent -- what the code should accomplish --, instead of explaining what the code does.  Learning to write good, useful comments is an invaluable skill, and it is much easier to learn early than late.  Me myself, I'm still working on that.

Did it make my posted code less readable and my points less accurate? I don't think so. All that matters here.
It did not, that I agree; but, I think that when giving advice, we should consider the intuitive associations our advice is likely to generate.

For example, if you consider that space-after-sizeof thing I've suggested, it is not even a style issue, but a tool to internalize that particular oddity: sizeof being an operator, and not a function.

I don't see any more odds of getting this wrong than adding a +1 when a -1 was required. A fuck-up is a fuck-up.
I disagree, because of the bug pattern that I described.  Because of architecture/hardware changes, the type of the variable has to be changed, but because of compatibility reasons, the variable name cannot be changed; thus, the prefix and the real type become detached, which causes issues when one later on edits the code but does not realize that the prefix is incorrect.  That situation does not have any good fixes, in my opinion, other than not using type prefixes in the first place.

That said, if you find the type prefixes useful and don't see them as a maintenance/portability issue, do ignore me; I am only describing my own observations, and basing my advice on what kind of practices I believe lead to the most robust and maintainable code.
Title: Re: C word
Post by: rstofer on October 01, 2019, 01:31:37 am

Much more important than that, is to write comments that describe the programmer intent -- what the code should accomplish --, instead of explaining what the code does.  Learning to write good, useful comments is an invaluable skill, and it is much easier to learn early than late.  Me myself, I'm still working on that.

Comments?  Why do you think they call it code?
Title: Re: C word
Post by: Nusa on October 01, 2019, 01:43:25 am

Much more important than that, is to write comments that describe the programmer intent -- what the code should accomplish --, instead of explaining what the code does.  Learning to write good, useful comments is an invaluable skill, and it is much easier to learn early than late.  Me myself, I'm still working on that.

Comments?  Why do you think they call it code?

Even good comments are code to those that don't know the subject matter. It's like trying to understand medical terminology without being educated in the field.
Title: Re: C word
Post by: Nominal Animal on October 01, 2019, 01:44:13 am

Much more important than that, is to write comments that describe the programmer intent -- what the code should accomplish --, instead of explaining what the code does.  Learning to write good, useful comments is an invaluable skill, and it is much easier to learn early than late.  Me myself, I'm still working on that.

Comments?  Why do you think they call it code?
:P

No code is perfect.  Or even if it is perfect right now, it won't be tomorrow, or a week or a month or a year from now.  Because things change, we need to maintain code; either to fix it if things change enough to break it, or to extend or adapt it to fit our changing needs better.

Code implements algorithms.  If the implementation has a bug, it is easier to find if you have comments explaining what the intent, the underlying algorithm, of the code is.  Compilers ignore comments, but to us humans, they are like waypoints on a map, or sanity checks, that we can use to check the code against our/developer expectations.  Without comments, we must extrapolate the underlying algorithm or approach from the code, which is an added stage where human errors or misunderstandings can occur.

Thus, writing good comments describing the algorithm or intent behind what the code should accomplish makes it easier to maintain that code.
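
As a rough sketch of what I mean (the function is invented for this example), note how the comments state the intent and the assumptions, rather than repeating what each statement does:

```c
#include <stddef.h>

/* Intent: find the position where value would have to be inserted to keep
   the ascending array sorted. Uses binary search, so the array MUST already
   be sorted; the result is in the range 0..n inclusive. */
size_t insert_position(const int *sorted, size_t n, int value)
{
    size_t lo = 0, hi = n;      /* the answer always lies in [lo, hi] */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* written this way so lo + hi cannot overflow */
        if (sorted[mid] < value)
            lo = mid + 1;       /* everything up to mid is too small */
        else
            hi = mid;           /* mid itself may still be the answer */
    }
    return lo;
}
```

If someone later finds the function returning garbage, the first comment tells them to check whether the caller really passed a sorted array, before they start doubting the loop.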
Title: Re: C word
Post by: Gandalf_Sr on October 01, 2019, 09:41:39 am
Thanks guys (and girls?).  The explanations and examples you have walked me through provide incredibly useful insight into my clunky code writing.

Apologies to the OP who has probably run away to join a monastery by now.
Title: Re: C word
Post by: legacy on October 01, 2019, 10:33:36 am

Much more important than that, is to write comments that describe the programmer intent -- what the code should accomplish --, instead of explaining what the code does.  Learning to write good, useful comments is an invaluable skill, and it is much easier to learn early than late.  Me myself, I'm still working on that.

Comments?  Why do you think they call it code?

Do you know the good point of Prologic? You do not have to express a comment to describe your intent, your code is perfectly able to express it in a human form of language.

Ah, Prologic ... it was great, but it's currently the color of a TV tuned to a dead channel.

Title: Re: C word
Post by: Nominal Animal on October 01, 2019, 05:03:06 pm
Do you know the good point of Prologic? You do not have to express a comment to describe your intent, your code is perfectly able to express it in a human form of language.
Do you mean Prolog (https://en.wikipedia.org/wiki/Prolog)?

Ah, Prologic ... it was great, but it's currently the color of a TV tuned to a dead channel.
:o
Title: Re: C word
Post by: legacy on October 01, 2019, 05:40:22 pm
yup, Pro Logic, aka Prolog; the tool Stood is partially written in Prolog ;D

But Stood costs too much money, so a couple of weeks ago I got my copy of Turbo Prolog v1 and v2 (Borland). It works on a 486 guest-card under RiscOS v4.39; while GNU Prolog on Linux is rather 0xdeadbeaf ... a lot of stuff is broken, and it's easy to crash.

But Turbo Prolog is solid like a stone.
Title: Re: C word
Post by: legacy on October 01, 2019, 05:49:50 pm
Code: [Select]
        dev->resource[0].start = dev->resource[0].end = 0;
(linux kernel, in a driver)

Why do I hate C? Because it allows crazy lines like this.
Are you sure it's correct? Is it a bug? .... not easy to say.
Title: Re: C word
Post by: magic on October 01, 2019, 06:39:03 pm
I have no idea if it's correct or buggy but it's easy to say what it does :P
Title: Re: C word
Post by: SiliconWizard on October 01, 2019, 06:46:52 pm
Code: [Select]
        dev->resource[0].start = dev->resource[0].end = 0;
(linux kernel, in a driver)

Why do I hate C? Because it allows crazy lines like this.
Are you sure it's correct? Is it a bug? .... not easy to say.

You don't need to use that construct if you don't want to. Assignments are actually expressions whose value is the value that was assigned; that kind of makes sense actually. So that allows this kind of multiple assignment construct. If it doesn't work for you, don't use it.

I admit I very very rarely use this (if ever). One reason is maintainability: it assumes that two lvalues (or more) are to be assigned the same value. Your code may change at some point, where it's not true anymore, and you'll need to split this into two separate assignments. You may forget to do it if the assignments are written this way...

This is really a shortcut that should probably not be used anymore. One of the rationales (as with other shortcuts), IMO, was to save screen real estate, which was at a premium back in the day. Now we have large screens with huge resolution, but when you only had like 80 columns and 40 lines, shortening everything you could made sense...
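To make the "assignments are expressions" point concrete, here is a minimal sketch. The struct and function names are hypothetical stand-ins, not the kernel's; only the shape of the chained statement matches the line quoted above.

```c
/* Hypothetical stand-in for a resource range; not the kernel's struct resource. */
struct range {
    unsigned long start;
    unsigned long end;
};

/* Chained form: '=' associates right to left, so this parses as
 * r->start = (r->end = 0); the inner assignment yields the stored value. */
void clear_chained(struct range *r)
{
    r->start = r->end = 0;
}

/* Split form: same effect, but each assignment can later be changed on its own. */
void clear_split(struct range *r)
{
    r->end = 0;
    r->start = 0;
}

/* The value of an assignment expression is the value actually stored in the
 * left operand, after conversion to its type.  With the usual 8-bit
 * unsigned char, passing 300 stores 44 in narrow, and wide then gets 44. */
int chained_with_narrowing(int value)
{
    int wide;
    unsigned char narrow;

    wide = narrow = value;
    return wide;
}
```

The last function shows one more reason to be careful with the chained form: the outer variable receives the converted value of the inner one, not the original right-hand side.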
Title: Re: C word
Post by: SiliconWizard on October 01, 2019, 07:00:15 pm
For example, if you consider that space-after-sizeof thing I've suggested, it is not even style issue, but a tool to internalize that particular oddity: sizeof being an operator, and not a function.

I agree with this particular part, although I admit I don't care much in the case of sizeof: it's not a function, but it still "returns" (yields) a value, so writing it like a function call doesn't bother me much.
OTOH, I fully agree about all C constructs that are not function calls, such as if, while, for, etc. I ALWAYS put a white space between the keyword and the opening '(', whereas I never do when calling a function, so this looks a lot clearer and more logical. I've seen code doing the exact opposite of this, which looks insane. The natural way of writing a function call, if you have at least learned some maths, is "f(x)", not "f (x)". No whitespace, which could be ambiguously seen as an implicit multiplication in maths. Sure C is not maths, but you get the idea.
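A small sketch of the sizeof point, with made-up function names: the parentheses are only required when the operand is a type name, which is what makes it an operator rather than a function.

```c
#include <stddef.h>

/* Returns the element count of a local array, using sizeof as an operator. */
size_t element_count(void)
{
    double samples[8] = { 0 };

    /* No parentheses needed when the operand is an expression. */
    return sizeof samples / sizeof samples[0];
}

/* Returns 1 when the operator-style and call-style spellings agree. */
int spellings_agree(void)
{
    double samples[8] = { 0 };

    return sizeof samples == sizeof(samples)        /* same operator, extra parentheses */
        && sizeof samples == 8 * sizeof (double);   /* a type name must be parenthesised */
}
```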

That said, if you find the type prefixes as useful and don't see them as a maintenance/portability issue, do ignore me; I am only describing my own observations, and basing my advice on what kind of practices I believe lead to most robust and maintainable code.

I indeed do. I'm still not sure about your maintenance points; if I change the type of a variable or parameter to the point that its prefix should change, I'll change the prefix everywhere it needs to change. I don't see it as a problem actually. If you change a type, it's likely (at least not unlikely) to have impacts EVERYWHERE it's used, so looking that up is natural. For instance, I'll prefix a pointer with p. If suddenly I change types, and the identifier is NOT a pointer anymore, that's likely to be a major change that needs many other changes anyway.

Now maybe you're thinking of the abused way of using Hungarian notation, and taking it too far, which I don't do. Some abusive users of it have actually gone way too far (I remember Microsoft was a heavy user of it, and went a bit too far with it), coming up with prefixes for almost every possible type definition, even custom ones! This is not at all what I do. I keep it clean and simple, and this way maintenance is absolutely no issue. I'll prefix pointers with p. If a given identifier is not a pointer, it's not prefixed with p. End of story. If I change a function parameter that was a pointer and suddenly is not anymore, you bet you'll have to change many things, and the prefix will be the least of your problems here... So, I have a small set of rules for prefixing, and stick to it. I avoid going too far.

Title: Re: C word
Post by: legacy on October 01, 2019, 07:05:57 pm
You don't need to use that construct if you don't want to.

It was not written by me. That is the point. It was a mistake made by someone who edited the file making a mess without knowing it, and GCC silently accepted it.
Title: Re: C word
Post by: SiliconWizard on October 01, 2019, 07:11:21 pm
You don't need to use that construct if you don't want to.

It was not written by me. That is the point. It was a mistake made by someone who edited the file making a mess without knowing it, and GCC silently accepted it.

Well, as I said, it's a perfectly valid C construct, so of course it'll accept it. It's only a mistake if the assignment is not what was intended (here to initialize both lvalues to 0). It's not a mistake per se.

Again, I agree it's a bit unfortunate, and I gave the potential pitfalls and the probable initial rationale to use it.

And if it's strictly against your rules to use this construct (which I would second), just add a check for that in your static analysis tools.
Title: Re: C word
Post by: legacy on October 01, 2019, 07:17:57 pm
I have no idea if it's correct or buggy but it's easy to say what it does :P

Yup. There are similar places in the kernel source and they are correct. But in this specific case it was "disabling" a kernel driver as if it needed to be "skipped", which then actually made the probe fail.

Why does it happen? What's wrong? Is it a hardware failure? You have to investigate. And this opens the door to the slowest and most tedious way of finding a bad git commit. You check out some old commit, make sure the broken code isn't there, then check out a slightly older commit, check again, and repeat over and over until you find the flawed commit.

Code: [Select]
do
{
    branch_back();               /* check out an older commit */
    is_ok = does_it_work();      /* build it and test the driver */
}
while (!is_ok);                  /* keep going back until it works */

Which, in the end, was a stupid mistake: a line of code deleted and appended with the following, which "looks" correct because the C grammar allows A = B = 0  :palm: :palm: :palm:
Title: Re: C word
Post by: Nominal Animal on October 01, 2019, 07:18:28 pm
Now maybe you're thinking of the abused way of using Hungarian notation, and taking it too far, which I don't do. Some abusive users of it have actually gone way too far (I remember Microsoft was a heavy user of it, and went a bit too far with it), coming up with prefixes for almost every possible type definition, even custom ones! This is not at all what I do. I keep it clean and simple, and this way maintenance is absolutely no issue. I'll prefix pointers with p. If a given identifier is not a pointer, it's not prefixed with p. End of story.
You hit the nail on the head.  Looking at the Wikipedia article (https://en.wikipedia.org/wiki/Hungarian_notation), it's the Systems Hungarian I oppose, because I've seen issues with short/int/long and unsigned prefixes (and near/far pointers on code I haven't used but have looked at).
Apps Hungarian looks non-problematic to me.

Your variant, Silicon Hungarian, with just pointers marked for easier human parsing, has nothing objectionable to me.

Some other prefixes that might be useful would be n for counts, s for human-readable strings, and b for binary blobs.  The point is, these act like comments: they help humans parse the programmer intent.  One could claim that that's not the case with pointers, but because * and & are common in expressions, having an additional marker that a variable is a pointer, does help express programmer intent, and probably makes expressions easier to parse correctly for us humans.
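As a rough illustration of such light prefixes acting like comments, here is a sketch using p for pointers, n for counts and s for strings. The function and its parameters are invented for this example.

```c
#include <stddef.h>

/* Copies at most nmax - 1 characters of the string stext into the buffer
 * pdst and terminates it.  Returns the number of characters copied. */
size_t copy_label(char *pdst, size_t nmax, const char *stext)
{
    size_t ncopied = 0;

    if (nmax == 0)
        return 0;

    while (ncopied + 1 < nmax && stext[ncopied] != '\0') {
        pdst[ncopied] = stext[ncopied];
        ncopied++;
    }
    pdst[ncopied] = '\0';
    return ncopied;
}
```

Even without reading the declaration, an expression like `pdst[ncopied]` tells the reader which name is the buffer and which is the count.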
Title: Re: C word
Post by: Nominal Animal on October 01, 2019, 07:23:01 pm
I have no idea if it's correct or buggy but it's easy to say what it does :P
Yup. There are similar places in the kernel source and they are correct. But in this specific case it was "disabling" a kernel driver as if it needed to be "skipped", which then actually made the probe fail.

That's exactly the kind of thing I wrote about earlier, regarding comments and trying to remember the order of parameters for memset() without looking it up: even Linux kernel developers make mistakes like that surprisingly often.  Which just shows why expressing developer intent is so important for maintainability and bug-fixability.

(Which is kind of why I feel the same way about Obfuscated C contests as I do about freediving in sewers.)
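For reference, the memset() parameter order mentioned above is destination, fill value, byte count. A sketch of how swapping the last two goes unnoticed; the wrapper names are hypothetical.

```c
#include <string.h>

/* Correct order: memset(destination, fill value, byte count). */
void fill_correct(unsigned char *pbuf, int value, size_t nbytes)
{
    memset(pbuf, value, nbytes);
}

/* Swapped order: still valid C, because both arguments convert implicitly.
 * With value == 0 this asks for zero bytes to be filled, so the buffer is
 * left untouched. */
void fill_swapped(unsigned char *pbuf, int value, size_t nbytes)
{
    memset(pbuf, nbytes, value);
}
```

A short comment stating the intent ("zero the whole buffer") next to the call is what lets a reviewer notice that the swapped version does nothing of the sort.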
Title: Re: C word
Post by: SiliconWizard on October 01, 2019, 07:23:55 pm
(...)
Which, in the end, was a stupid mistake: a line of code deleted and appended with the following, which "looks" correct because the C grammar allows A = B = 0  :palm: :palm: :palm:

Oh, now I get it why you said it was a "mistake". Didn't know it was an accident and not written on purpose...

Well, again, I think this construct should indeed be banned from C, and I guess the std committee has probably raised that many times and never resolved to ban it; that would probably break a lot of legacy code.

The simple rule to add would be that an assignment expression would have no definite type instead of being an rvalue of the type of its lvalue. It could for instance have the "void" type, so such constructs would all be incorrect C in any case.

The example you gave is not the only one, there's also the very famous: "if (a = b)". It's correct C. Was "if (a == b)" meant instead? Who knows... but it's correct C for the same reason: "a = b" is an rvalue! If the "a = b" expression were of type void, neither "a = b = c" nor "if (a = b)" would compile. End of the story. So here is my proposal for the next std version...

And that said, the "if (a = b)" was so ubiquitous and problem-ridden that most decent C compilers will now give you a warning for that. (Such as: "did you mean a == b ?"), but will give you no warning whatsoever for "a = b = c = ..."
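The two spellings differ by one character, and both compile. A small sketch with made-up function names; the doubled parentheses in the first function are the conventional way to tell GCC and Clang that the assignment is deliberate (without them, their -Wparentheses warning fires).

```c
/* Assigns b to *a, then tests the stored value: true whenever b != 0. */
int assign_then_test(int *a, int b)
{
    if ((*a = b))
        return 1;
    return 0;
}

/* Compares only; *a is left untouched. */
int compare(const int *a, int b)
{
    if (*a == b)
        return 1;
    return 0;
}
```

Note that the first function does not compare anything: its result depends only on whether b is zero, and it overwrites *a as a side effect.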