-
If you have picked up plain old ANSI C in the last few years, how and why did you learn it?
I originally learnt C as it was the standard compiled 3GL for PC programming during the 16-bit era (along with maybe Pascal). I still use it often and respect it as the Swiss Army knife of programming languages.
What would drive people to take the leap from the highly abstracted world of things like Python to the 'madness' of C today?
It can't just be Arduino and embedded development...
(I'm currently writing a REST API service in C for only one reason - performance. Python gave 2 transactions per second, my C implementation gives 200... Most likely with a bug count to match!)
-
I learned C making scripts for video game maps.
I use it a lot. Pet projects and work -- only the other week I was facing 20 seconds of JSON parsing in Python vs 1.5 seconds of parsing (with an inefficient parser) in C. Waiting 20 seconds for every graph to plot was a nightmare.
Touchscreens never killed laptops. Laptops never killed desktops. No language will ever kill C. We end up with a bit of everything.
-
Luckily people still learn it. There is a lot of legacy code to be maintained, especially in the machine-building industry.
However, many modern well-paid IT jobs don't require it anymore.
Cloud programming is all about functional programming, Docker containers and all kinds of newer languages.
Non-IT-related jobs use Matlab, simulation tools, etc.
-
I don't see microcontroller companies ever supporting anything other than C (and sometimes C++), so I'll say it'll always have its use. I don't particularly think it's a good language though.
-
What would drive people to take the leap from the highly abstracted world of things like Python to the 'madness' of C today?
Simple. Just list the languages written in C. Start with Python. Then list the languages influenced by C. Start with Python.
-
I don't see microcontroller companies ever supporting anything other than C (and sometimes C++), so I'll say it'll always have its use.
This. If you work with microcontrollers then C is a must, even though there are things like MicroPython or GNAT ports for specific targets like some STM32 families.
I don't particularly think it's a good language though.
Meh, I like my hammer :) Other languages are just a curiosity to me when I work with micros: basically I can't find any support for anything that's not C/C++.
-
Well, Windows is written in C as is the Linux kernel and all of the utilities. As I said before, C runs the world. Fortran tells it how fast to go. Everything else is just passing through.
And Unix is written in C, so Mac OS X must be as well.
https://www.toptal.com/c/after-all-these-years-the-world-is-still-powered-by-c-programming
You really need to separate application languages like Java and C++ from system languages like C or math oriented languages like Fortran.
-
For a language with a LOT of legacy code and no new practitioners, think about COBOL. The banks are in serious trouble trying to find programmers. I would be surprised if ANY university offered a course. And the old guys will be retiring soon! The job market is going to explode!
Look at the list of businesses that still use COBOL:
https://www.quora.com/Is-COBOL-programming-language-still-in-use-in-the-year-2018
The language is still in use because it is auditable! And COBOL uses fixed point arithmetic so no round-off errors accumulate.
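To see why that matters for money, here is a small C sketch (not COBOL, just the same idea): accumulating a decimal fraction in binary floating point drifts, while counting in integer cents stays exact. The variable names are made up for illustration.

#include <stdio.h>

int main(void)
{
    double d = 0.0;
    long cents = 0;                      /* fixed point: count whole cents    */
    for (int i = 0; i < 10; i++) {
        d += 0.10;                       /* 0.10 has no exact binary encoding */
        cents += 10;
    }
    printf("float: %.17f\n", d);         /* prints a value slightly off 1.0   */
    printf("fixed: %ld cents\n", cents); /* prints exactly 100                */
    return 0;
}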
-
Well, Windows is written in C as is the Linux kernel and all of the utilities.
Windows is written in C++. The funky exception handling in C++ is said to be a major factor that kept Windows horribly buggy for years.
-
...
...
Touchscreens never killed laptops. Laptops never killed desktops. No language will ever kill C. We end up with a bit of everything.
...
Depends on what you mean by "kill C".
FORTRAN is still in use, COBOL is still in use. In today's world no one would think of FORTRAN or COBOL as a growth language, but FORTRAN and COBOL surely are not "killed", at least not yet.
I do think that C will lose popularity like FORTRAN or COBOL, at least in the "developed economies". India and China will likely be the powerhouses of "real" programming. "Real" in the sense that the "programmer" is not dragging pre-programmed icons around and using drop-down menus to check/uncheck boxes of what you want that icon/program to do, but actually developing the code behind those icons.
While waiting for my car tire repair, I was in the waiting room with a large TV set. I saw a TV show segment about kids learning "computer programming" using some Microsoft tools - dragging icons around and using the drop-down to refine the functions that icon/app offers. Microsoft paraded it as the next generation of programming AND as training the next generation of programmers. None of the show hosts knew any better. They were "amazed" that the tool could create the next generation of programmers, a new class of (verbatim quote:) "programmers making a good living."
It may well be a world in which, in the developed economies, we don't learn C or any real language to program. The macro-tools will be developed by the developing economies where wages are lower. The developed economies will use these higher-level tools to control the things that will do our actual production of value.
-
I'm not going to go so far as to say C will always be around, but I think it will be with us for the foreseeable future. C is universal in the embedded microcontroller world, and C++ is still heavily used in other areas of software development. I don't see this changing any time soon.
-
As someone making a living with embedded development in microcontrollers, DSPs and ARM Linux that absolutely breathe C everywhere, I am absolutely certain that C will not vanish in the foreseeable future as well.
-
Another one in embedded: it's pretty much either C, C++ or assembler, and in more projects than I would probably like, a mix of one of the C's and assembler.
For data analysis and odd-job stuff I still prefer Python, as it's easier for me to bodge things together in a short time frame. It's generally a lot slower than C, but I would rather have a program in 1 hour and burn 14 on processing than spend 4 hours making it and 3 hours processing. The total time is less for C, but I am free to do other things while Python runs.
-
I still think in ASM because it's the first thing I hardcore learned, and it's interesting to very precisely control the MCU and understand what the blocks are doing... and actually be able to figure out hardware bugs and semi-faulty chips, or do internal damage detection/evasion.
-
It's 2019, people!
C is obsolete, we gotta learn LISP Smalltalk Java Haskell Intercal Rust :scared:
-
Well, Windows is written in C as is the Linux kernel and all of the utilities.
Windows is written in C++. The funky exception handling in C++ is said to be a major factor that kept Windows horribly buggy for years.
I don't know what Windows is written in now, but since Microsoft started it around 1984 and released version 3.0 in 1990, while C++ wasn't generally available until about the same time -- it's 100% certain early Windows wasn't written in C++.
Microsoft wanted exception handling and invented their own version of it with a specific runtime implementation (SEH). At first I think programmers had to set it up manually in each function using macros, but gradually compilers got features to do it automatically.
It was not terribly compatible with standard C++ exceptions, once that appeared, and this caused untold problems.
-
Windows NT was written from scratch, so it has nothing to do with v3.0 or 9x.
Everything since 2000 was NT-derived, although it did receive some injections of 9x codebase here and there. I think the graphics stack used to be one example before it got rewritten in Vista(?).
NT kernel code was leaked at one point. If this repo is what it purports to be, it's written in C.
https://github.com/ZoloZiak/WinNT4
Userspace probably is a mix of C/C++ and I presume that they are moving onto C# in the latest versions.
The oldschool win32 API DLLs are all C.
-
...
...
Touchscreens never killed laptops. Laptops never killed desktops. No language will ever kill C. We end up with a bit of everything.
...
<snip>
While waiting for my car tire repair, I was in the waiting room with a large TV set. I saw a TV show segment about kids learning "computer programming" using some Microsoft tools - dragging icons around and using the drop-down to refine the functions that icon/app offers. Microsoft paraded it as the next generation of programming AND as training the next generation of programmers. None of the show hosts knew any better. They were "amazed" that the tool could create the next generation of programmers, a new class of (verbatim quote:) "programmers making a good living."
<snip>
That reminds me of an old Gary Larson Far Side cartoon which showed a child playing a Nintendo with 'hopeful parents' in the background imagining help-wanted ads in the year 2005: "Nintendo Expert Needed. $50,000 salary + bonus" or "Can You Save The Princess? We need skilled men and women, $75,000 + retirement."
-
It's 2019, people!
C is obsolete, we gotta learn LISP Smalltalk Java Haskell Intercal Rust :scared:
Why, in 2019, if they have to invent a new language, do they have to bring the fucking ";" along as a line terminator? chr(13) and chr(10) have been used for ages as carriage return and linefeed; the ";" is a relic brought down from C/C++ :palm: Everything else in C/C++ is a good thing except this (which makes C 99.99% my favourite language), so why?
-
Mecha, you are not the only one who mentions this, but I really don't understand the anger towards the ; character. It helps with extra-long lines and is visible as a delimiter. LF, CR or CR/LF are not visible (by default in a text editor) and are OS-dependent. Also, once you get it into your system, you use it without thinking.
Regarding Windows: the adoption was in stages. 3.11 and prior were entirely in C (although DOS was assembly for the most part, and you couldn't have Windows without it). Windows 95 was the first to start having functions written in C++ but, IIRC, Windows NT4 was the one advertised as having a significant portion developed in C++ (with Windows 2000 and XP increasing its spread even more). If Microsoft was ever fully out of C, it will certainly come back with the adoption of WSL.
-
It's 2019, people!
C is obsolete, we gotta learn LISP Smalltalk Java Haskell Intercal Rust :scared:
Why, in 2019, if they have to invent a new language, do they have to bring the fucking ";" along as a line terminator? chr(13) and chr(10) have been used for ages as carriage return and linefeed; the ";" is a relic brought down from C/C++ :palm: Everything else in C/C++ is a good thing except this (which makes C 99.99% my favourite language), so why?
The use of a statement delimiter predates C by a long way. It was introduced because of problems found in early languages, like Fortran, where the end of the line meant the end of the statement. Fortran fixed this with something really clunky - a special continuation mark in a special column on the following line allowed for more complex statements. We don't end an English statement at the end of a line. We end it with a full stop. The full stop being the same as the decimal point made ending statements with a full stop somewhat error-prone, so most languages chose a semicolon. The problem in C is not the use of a semicolon to end a language statement. It's the inconsistency that a special continuation marker ('\') is used to force statement continuation in pre-processor statements.
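For anyone who hasn't bumped into that inconsistency, a small C sketch of what is meant: an ordinary statement ends at the ';' wherever it falls, but a multi-line pre-processor definition needs the '\' continuation marker.

/* Pre-processor lines end at the newline unless continued with '\' : */
#define MAX(a, b)       \
    ((a) > (b) ? (a)    \
               : (b))

/* An ordinary statement needs no such marker; it simply ends at the ';'. */
static int biggest = MAX(3,
                         4);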
-
How and why did you learn it? I think it was with Microsoft QuickC, because it was a lot easier than MASM.
The second C compiler I tried was the Motorola DSP56K Optimizing C Compiler, which didn't support a fractional data type, making it useless.
-
For the hobbyist, C was the first higher level language available on early 8080 and Z80 machines. So, no, I didn't learn it in the last few years. I have been using it for nearly 40 years (starting around '80). I preferred Pascal but that came along a bit later.
I would still be inclined to use C as a language for embedded programming simply because I don't believe in dynamic memory allocation (heap) on memory limited embedded processors. I can avoid the standard string functions (that use a heap) and everything works out fine.
C and Fortran are the only languages I actually use. I have tried, off and on, over the last 20 years or so to develop an interest in C++ and I just can't quite 'get it'. It's not the mechanics of the language (well, ok, it is), it's the 'why' I can't wrap my head around. Same with Python. I want to like it, I admire the built-in structures, but then they bastardize the syntax with indent levels. Not that I don't indent code levels, I just want some kind of delimiters like '{' and '}'.
I'm just too old to learn new stuff and, given that the old stuff is still in use, have no real desire to push beyond my comfort zone. It's not like I need (or even want) a job.
-
NT kernel code was leaked at one point. If this repo is what it purports to be, it's written in C.
https://github.com/ZoloZiak/WinNT4
Userspace probably is a mix of C/C++ and I presume that they are moving onto C# in the latest versions.
The oldschool win32 API DLLs are all C.
Yep.
The base kernel is C, as is the Linux one and, AFAIK, the Mac OS X one (XNU).
When Windows introduced COM stuff, I think they wrote most of that in C++, but still kept an interface for C.
I'd guess a lot of higher-level components in Windows are written in C++, and as you said, some in C#, but the core is still C.
Although MS promotes other languages and tools, the Windows API is still mostly C, and you can still write full apps entirely in C on Windows directly using Windows SDK.
-
For the hobbyist, C was the first higher level language available on early 8080 and Z80 machines. So, no, I didn't learn it in the last few years. I have been using it for nearly 40 years (starting around '80). I preferred Pascal but that came along a bit later.
...
...
If you consider BASIC and FORTH high-level languages, both were available for CP/M on the 8080. The Z80 was a clone (a functional clone, not a hardware clone) of the 8080 with many enhancements over it, so the 8080 predates the Z80. I had BASIC on my SOL20 (8080) in 1977 before C was available for the 8080; CP/M BASIC was the standard then.
I don't rule out that C was out there for the 8080 before '77, but I was religiously reading the main available computer magazine at the time - Byte Magazine - and I don't recall any advertisement of C compilers. Anything available on CP/M would likely be before DOS.
Altair BASIC was used by Microsoft to make Altair-Microsoft BASIC in 1975, before DOS existed. Microsoft did not make MS-DOS until IBM contracted them to create DOS for the IBM PC (which was an 8088, a mixed 8/16-bit improvement of the 8-bit-only 8080). That was when MS purchased an OS from Seattle Computer Products and made it into DOS for the IBM PC. The IBM PC with PC DOS was introduced in 1981. So Microsoft BASIC pre-dates MS-DOS and PC-DOS.
Running in the under-64K of RAM that the PC could use for DOS, any C compiler back then was primitive. You cannot consider C then to be anything like C later, when "extended memory" came into play. I think the Greenleaf C compiler was the first "real" industry-standard C compiler for the PC. It was later purchased by Microsoft as the first version of Microsoft C, which then became the industry standard.
Now, if you are talking about "structured languages", I agree with you. BASIC and FORTH are high-level languages, but not structured languages. Borland Pascal came much later than C.
-
It's 2019, people!
C is obsolete, we gotta learn LISP Smalltalk Java Haskell Intercal Rust :scared:
c-sharp ??
-
It's 2019, people!
C is obsolete, we gotta learn LISP Smalltalk Java Haskell Intercal Rust :scared:
Why, in 2019, if they have to invent a new language, do they have to bring the fucking ";" along as a line terminator? chr(13) and chr(10) have been used for ages as carriage return and linefeed; the ";" is a relic brought down from C/C++ :palm: Everything else in C/C++ is a good thing except this (which makes C 99.99% my favourite language), so why?
The use of a statement delimiter predates C by a long way. It was introduced because of problems found in early languages, like Fortran, where the end of the line meant the end of the statement. Fortran fixed this with...
For a new language... no excuse, regardless... many new languages such as Python don't require ";", and the cute, lovely and sweet BASIC has not required it for decades... any OS will use some sort of CR and/or LF that can be interpreted as ";". The end of a program unit can be handled with a single "END" syntax (or whatever), not the thousands of ";" we have to type on every line... really inefficient... I can accept ";" as a separator between "lines" of code on a single text-editor line, or for clarity if anybody chooses to... but not a ";" + CR/LF requirement when I don't want it. ";" + CR/LF can easily be interpreted as a few extra empty line(s)...
ps: currently programming in RobotC for something serious... :P
-
It's 2019, people!
C is obsolete, we gotta learn LISP Smalltalk Java Haskell Intercal Rust :scared:
Why, in 2019, if they have to invent a new language, do they have to bring the fucking ";" along as a line terminator? chr(13) and chr(10) have been used for ages as carriage return and linefeed; the ";" is a relic brought down from C/C++ :palm: Everything else in C/C++ is a good thing except this (which makes C 99.99% my favourite language), so why?
The use of a statement delimiter predates C by a long way. It was introduced because of problems found in early languages, like Fortran, where the end of the line meant the end of the statement. Fortran fixed this with...
For a new language... no excuse, regardless... many new languages such as Python don't require ";", and the cute, lovely and sweet BASIC has not required it for decades... any OS will use some sort of CR and/or LF that can be interpreted as ";". The end of a program unit can be handled with a single "END" syntax (or whatever), not the thousands of ";" we have to type on every line... really inefficient... I can accept ";" as a separator between "lines" of code on a single text-editor line, or for clarity if anybody chooses to... but not a ";" + CR/LF requirement when I don't want it. ";" + CR/LF can easily be interpreted as a few extra empty line(s)...
ps: currently programming in RobotC for something serious... :P
The terminator for a C statement is ";", not ";" + CR/LF.
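A quick sketch to make that concrete - to the compiler a newline is just whitespace, so the ';' is what ends each statement:

void example(void)
{
    int a = 1; int b = 2;   /* two statements on one line               */
    int c =
        a +
        b;                  /* one statement spread over three lines,
                               no continuation character needed         */
    (void)c;                /* just to silence unused-variable warnings */
}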
-
With µCs, C is still pretty much standard. Due to GCC one can use ADA or maybe even FORTRAN if one really wants, but not many do that. One can use C++ on larger µCs, but it is still slightly tricky to really use the C++ features, rather than just getting the slightly more stringent checks while still writing C-like code.
Today's C99 or ISO C is quite different from the original C that was used to write early UNIX.
I don't like C very much, but it's a common evil one has to live with, just like Windows or x86. Still around, though technically full of :palm: :wtf: |O.
-
one can use ADA or maybe even FORTRAN if one really wants, but not many do that
ADA yes, FORTRAN, I've never seen it used on MCUs. If you have any related project in mind, I'd be curious to see that. ;D
I don't remember much of FORTRAN, but I'd guess it would not be easy to directly access memory/registers with it. If there is a way, again I'd be curious.
I don't like C very much, but it's a common evil one has to live with, just like Windows or x86. Still around, though technically full of :palm: :wtf: |O.
Well, not to get into yet another fruitless language war, but I don't really see the point in "still around". It's a tool and it does the job.
Forks are still around as eating utensils, should we use connected bluetooth things instead to put food inside our mouth? Hammers are still around to bang nails, does that make them obsolete? ;D
-
For the hobbyist, C was the first higher level language available on early 8080 and Z80 machines. So, no, I didn't learn it in the last few years. I have been using it for nearly 40 years (starting around '80). I preferred Pascal but that came along a bit later.
I would still be inclined to use C as a language for embedded programming simply because I don't believe in dynamic memory allocation (heap) on memory limited embedded processors. I can avoid the standard string functions (that use a heap) and everything works out fine.
C and Fortran are the only languages I actually use. I have tried, off and on, over the last 20 years or so to develop an interest in C++ and I just can't quite 'get it'. It's not the mechanics of the language (well, ok, it is), it's the 'why' I can't wrap my head around. Same with Python. I want to like it, I admire the built-in structures, but then they bastardize the syntax with indent levels. Not that I don't indent code levels, I just want some kind of delimiters like '{' and '}'.
I'm just too old to learn new stuff and, given that the old stuff is still in use, have no real desire to push beyond my comfort zone. It's not like I need (or even want) a job.
I can relate to the languages you describe in much the same way, being an 'old timer' myself. I also liked Fortran and Cobol when I tried them about 20 years ago.
However, C wasn't the first higher-level language available on early 8080 and Z80 machines (as also mentioned by Rick Law in a later post):
https://www.forth.com/resources/forth-programming-language/
In 1976, Robert O. Winder, of RCA’s Semiconductor Division engaged FORTH, Inc. to implement Forth on its new CDP-1802 8-bit microprocessor [Rather, 1976b], [Electronics,1976]. The new product, called “microFORTH,” was subsequently implemented on the Intel 8080, Motorola 6800 and Zilog Z80, and sold by FORTH, Inc. as an off-the-shelf product. microFORTH was successfully used in numerous embedded microprocessor instrumentation and control applications in the United States, Britain and Japan.
-
Long live C! Long live buffer overflows! :-DD
-
For the hobbyist, C was the first higher level language available on early 8080 and Z80 machines. So, no, I didn't learn it in the last few years. I have been using it for nearly 40 years (starting around '80). I preferred Pascal but that came along a bit later.
...
...
If you consider BASIC and FORTH high-level languages, both were available for CP/M on the 8080. The Z80 was a clone (a functional clone, not a hardware clone) of the 8080 with many enhancements over it, so the 8080 predates the Z80. I had BASIC on my SOL20 (8080) in 1977 before C was available for the 8080; CP/M BASIC was the standard then.
I don't rule out that C was out there for the 8080 before '77, but I was religiously reading the main available computer magazine at the time - Byte Magazine - and I don't recall any advertisement of C compilers. Anything available on CP/M would likely be before DOS.
Altair BASIC was used by Microsoft to make Altair-Microsoft BASIC in 1975, before DOS existed. Microsoft did not make MS-DOS until IBM contracted them to create DOS for the IBM PC (which was an 8088, a mixed 8/16-bit improvement of the 8-bit-only 8080). That was when MS purchased an OS from Seattle Computer Products and made it into DOS for the IBM PC. The IBM PC with PC DOS was introduced in 1981. So Microsoft BASIC pre-dates MS-DOS and PC-DOS.
Running in the under-64K of RAM that the PC could use for DOS, any C compiler back then was primitive. You cannot consider C then to be anything like C later, when "extended memory" came into play. I think the Greenleaf C compiler was the first "real" industry-standard C compiler for the PC. It was later purchased by Microsoft as the first version of Microsoft C, which then became the industry standard.
Now, if you are talking about "structured languages", I agree with you. BASIC and FORTH are high-level languages, but not structured languages. Borland Pascal came much later than C.
The first high-level language available on most early microprocessors was a stripped-down form of PL/I. PL/I was really gaining traction in the early 70s, but was too big and complex a language to be fully implemented on small machines. So, Motorola, Intel and a number of others came up with MPL, PL/M and various other names for their own stripped-down PL/I dialects.
-
Well, not to get into yet another fruitless language war, but I don't really see the point in "still around". It's a tool and it does the job.
Forks are still around as eating utensils, should we use connected bluetooth things instead to put food inside our mouth?
They both work but one has surely done more accidental damage than the other :D
The entirety of <string.h>, for example, should never have been created. I would really prefer it if everybody had their own string library; at least life would be harder for hackers. And maybe even a precious few of them wouldn't be as bad as the standard one.
Or the syntax of declarations - it's horrible. Just recently I thought I was very smart because I knew how to declare a function returning int[2], but it didn't work. "Expected something something", very helpful, thank you. I spent a minute trying different combinations of parentheses even though I was quite sure I got it right the first time. Finally gave up, went to the Internet; turns out that fixed-size arrays are the only type which cannot actually be returned from a function :wtf:
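For what it's worth, the usual C workarounds (a hedged sketch; the struct and function names are invented) are to wrap the array in a struct, which can be returned by value, or to have the caller supply the array:

#include <stdio.h>

struct pair { int v[2]; };          /* a struct containing the array... */

struct pair minmax(int a, int b)    /* ...can be returned by value      */
{
    struct pair p;
    p.v[0] = (a < b) ? a : b;
    p.v[1] = (a < b) ? b : a;
    return p;
}

void minmax_out(int a, int b, int out[2])  /* or let the caller own it  */
{
    out[0] = (a < b) ? a : b;
    out[1] = (a < b) ? b : a;
}

int main(void)
{
    struct pair p = minmax(7, 3);
    printf("%d %d\n", p.v[0], p.v[1]);     /* prints: 3 7 */
    return 0;
}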
-
PL/M was available to the well-financed, not to hobbyists. Digital Research did release PL/I for the CP/M system but it was a good deal after C. Think BDS C in '79 versus PL/I in 1983. That was long after I wrote an 8080 Assembler using PL/I for a grad school requirement ('75). IBM 360/45 - a truly grim machine but affordable at the time.
https://en.wikipedia.org/wiki/BDS_C
https://winworldpc.com/product/digital-research-pl-i-compiler/1x
FWIW, the PL/M compiler was written in Fortran as a cross compiler.
Yes, Altair Basic was available MUCH earlier - like when the machine was released in '75 - but Basic wasn't what I would consider a high level language or a reasonable application language (by definition, you had to release source although later on there were some true compilers). There was also the requirement to actually purchase overpriced MITS memory boards in order to qualify to purchase Altair Basic. I still have Bill Gates' article in a user group newsletter ranting about people stealing his software.
Let's not forget Li-Chen Wang's Palo Alto Tiny Basic ('75). When I brought up a new 8085 system, this was one of the first things I would port. The Intel Monitor was useful but I could get more done with Tiny Basic.
https://en.wikipedia.org/wiki/Li-Chen_Wang
UCSD Pascal came out around 1977 but I didn't get a copy until 1980. It was a much more practical language for business applications. It's still a terrific language and I'm happy to see it included in 2.11BSD for the PiDP11/70 project
Things were pretty fluid in the late '70s and early '80s. Byte Magazine was important but Dr Dobbs was a more definitive source for what was happening.
I got my Altair in '76, shortly before finishing grad school. A couple of years later I had a home-built floppy disk controller (based on Western Digital FD1771 chip) and dual drives. The datasheet says April '79, ok, I was an early adopter. I played with CP/M for a very long time, made some money writing custom BIOSes for folks and, in fact, I have it running on a 50 MHz Z80 with all 16 logical drives and most of the toys. I also got heavily involved with UCSD Pascal around '80. You would be amazed at how well CP/M 2.2 runs when clocked at 50 MHz!
It's been an interesting ride over the last 40 years.
All of the above in the context of microcomputers. For mainframes: PL/I was introduced in 1964, COBOL in 1959, Algol in 1958 and FORTRAN in 1957. C is a newcomer, first released in 1972. Just a "johnny come lately" entry to computer languages. Let's not give it more credit than it's due.
-
Long live C! Long live buffer overflows! :-DD
And the lazy bunch of programmers, aka codewarrior speed-demon wannabes, which can easily be overcome with a custom array/linked list/tree class. Python or Java just have overbloated versions of that... 2 transactions per second on a 6 GHz, 4-12 core CPU, cough cough. People ask for more power because their simulation and graphics are lagging, and the programmer gives the excuse that it's a math-intensive algorithm... long live the CPU ::)
-
I just joined a new company as a mixed hw / embedded firmware guy, and I was disappointed to find out their codebase is 100% straight C.
I've been using C++ for embedded projects for years, and though I don't really like C++ as an application programming language, it's actually pretty great for embedded. The key is to restrict yourself in what parts of C++ you use. Stay away from most of the STL and dynamic memory allocation in general. But templates instead of macros? Heck yeah. Default function arguments? Yes. Classes? Sure, when it is helpful. Consts rather than defines? For sure.
-
Long live C! Long live buffer overflows! :-DD
And the lazy bunch of programmers, aka codewarrior speed-demon wannabes, which can easily be overcome with a custom array/linked list/tree class. Python or Java just have overbloated versions of that... 2 transactions per second on a 6 GHz, 4-12 core CPU, cough cough. People ask for more power because their simulation and graphics are lagging, and the programmer gives the excuse that it's a math-intensive algorithm... long live the CPU ::)
Lazy? Buffer overflows are literally the #1 cause of vulnerabilities, have been for decades, and C is literally responsible for it. We can do a lot better and still be fast, e.g. Rust.
I'm not saying C is all bad, but when you start composing complex systems out of it (such as an OS kernel), you are asking for trouble. Because programmers, even the best ones...and this might blow your mind...are humans.
-
I just joined a new company as a mixed hw / embedded firmware guy, and I was disappointed to find out their codebase is 100% straight C.
I've been using C++ for embedded projects for years, and though I don't really like C++ as an application programming language, it's actually pretty great for embedded. The key is to restrict yourself in what parts of C++ you use. Stay away from most of the STL and dynamic memory allocation in general. But templates instead of macros? Heck yeah. Default function arguments? Yes. Classes? Sure, when it is helpful. Consts rather than defines? For sure.
I mostly agree with the "C with a few helpful bits from C++" philosophy - what are your thoughts on using virtual functions, virtual classes and base classes/inheritance in general for embedded work? I flip-flop between liking and hating it.
Dynamic memory allocation is pretty hard to avoid if you get passed external data to process (e.g. my current joy is JSON files). My own view is you are better off developing a nice "the way we do things here" standard, so everybody is in agreement about what patterns are good and what patterns are bad, allowing oversights to be spotted quickly. Also, having tools to see what is going on is good - e.g. a custom-wrapped malloc()/free() that lets you see everything during development is a good idea, IMO.
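Something along these lines is what I mean by a custom-wrapped malloc()/free() - purely a development-time sketch, and the dbg_* / MALLOC / FREE names are made up for illustration:

#include <stdio.h>
#include <stdlib.h>

static size_t live_allocs;   /* blocks currently outstanding */

static void *dbg_malloc(size_t n, const char *file, int line)
{
    void *p = malloc(n);
    if (p) live_allocs++;
    fprintf(stderr, "%s:%d malloc(%zu) -> %p (live blocks: %zu)\n",
            file, line, n, p, live_allocs);
    return p;
}

static void dbg_free(void *p, const char *file, int line)
{
    if (p) live_allocs--;
    fprintf(stderr, "%s:%d free(%p) (live blocks: %zu)\n",
            file, line, p, live_allocs);
    free(p);
}

/* Call these instead of malloc()/free() directly during development. */
#define MALLOC(n) dbg_malloc((n), __FILE__, __LINE__)
#define FREE(p)   dbg_free((p), __FILE__, __LINE__)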
-
I just joined a new company as a mixed hw / embedded firmware guy, and I was disappointed to find out their codebase is 100% straight C.
I would be glad to read your reasons for this. I usually read the assembly output from my code to "trust but verify" it is correct. How do you verify that your C++ output is correct? Is it your experience with the compiler that helps you see the patterns that you see as good code?
My limited experience with embedded C++ is with KDE, where one of our programmers had a performance issue. I looked into the code, and there were several similar, but slightly different code paths, and they busted the I-Cache. A simple C function with ONE code path would have reduced the cache problem.
In the embedded world, LOOK AT YOUR ASSEMBLY CODE when time is an issue. Of course, before writing embedded code, READ AND KNOW the architecture.
-
If you have picked up plain old ANSI C in the last few years, how and why did you learn it?
I originally learnt C as it was the standard compiled 3GL for PC programming during the 16-bit era (along with maybe Pascal). I still use it often and respect it as the Swiss Army knife of programming languages.
What would drive people to take the leap from the highly abstracted world of things like Python to the 'madness' of C today?
It can't just be Arduino and embedded development...
(I'm currently writing a REST API service in C for only one reason - performance. Python gave 2 transactions per second, my C implementation gives 200... Most likely with a bug count to match!)
Now that I have unloaded on some of the issues I disagreed with... good time to segue back to the OP.
I learned C by myself, but I learned FORTRAN in college.
Here is why:
By the time I learned C, I had learned a few other languages by myself, including assembler and ALGOL. I learned C because it has the clean structured-language constructs of ALGOL, but I couldn't find ALGOL for a small machine at a self-paid purchase price affordable to a student. I was really looking at only C, BASIC and FORTH as choices, due to budget. In my experience, structured languages are a lot easier to develop and maintain than unstructured ones, particularly when requirements change. With C being a structured language and light on resource requirements at run time, it was the best choice. My programming work was not directly work/study related but for fun. I did FORTH and BASIC just for the hack of it, but with no apps I wanted to keep.
By the time budget Pascal (Borland) became available, I had no need for it. Besides, I felt it was too resource-heavy.
I did a few years (5 < n < 10) of programming professionally (mostly in C) as a path to management. Programming continued to be something I did for fun when time permitted.
I learned C++ somewhere along my path, but frankly I found it too resource-wasteful. With my background (being used to resource constraints), I use C++ when I must, but avoid it when I can - and I include "reuse" and maintenance as part of the decision process of C vs C++. I think if you are a good C programmer, there is nothing to gain from using C++ over C.
-
On the note of structured/unstructured languages, I recall how cool it was the first time I saw Borland's Turbo Basic 1.0, although at that time I already had my feet deep in Pascal.
-
I just joined a new company as a mixed hw / embedded firmware guy, and I was disappointed to find out their codebase is 100% straight C.
I've been using C++ for embedded projects for years, and though I don't really like C++ as an application programming language, it's actually pretty great for embedded. The key is to restrict yourself in what parts of C++ you use. Stay away from most of the STL and dynamic memory allocation in general. But templates instead of macros? Heck yeah. Default function arguments? Yes. Classes? Sure, when it is helpful. Consts rather than defines? For sure.
I mostly agree with the "C with a few helpful bits from C++" philosophy - what are your thoughts on using virtual functions, virtual classes and base classes/inheritance in general for embedded work? I flip-flop between liking and hating it.
Dynamic memory allocation is pretty hard to avoid if you get passed external data to process (e.g. my current joy is JSON files). My own view is you are better off developing a nice "the way we do things here" standard, so everybody is in agreement about what patterns are good and what patterns are bad, allowing oversights to be spotted quickly. Also, having tools to see what is going on is good - e.g. a custom-wrapped malloc()/free() that lets you see everything during development is a good idea, IMO.
There are times you can't avoid it, but you can do stuff to restrain the pain. One thing I have done is force libraries to use a pool I have pre-allocated. At least when I'm done with that library, I can reclaim the pool. JSON is a pain, I agree, but I have to say it makes development on the "other" side so much better that I'm happy to take the hit on the embedded side. There are some tiny low-footprint JSON libraries out there. However, years ago, when memory was super tight, I wrote a simple library to parse JSON strings "in situ" with no tree structure created anywhere. Every time you wanted something from the JSON stream, you had to rescan the string, but in our case performance wasn't so important. The library is here, haven't used it in years. It's not heavily tested, but worked for my purposes. (https://github.com/djacobow/djs)
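A rough sketch of the "force the library to use a pre-allocated pool" idea - names invented, and there is deliberately no per-block free(); you reset the whole pool when the library is finished:

#include <stddef.h>
#include <stdint.h>

/* Carved out at build time, no heap; C11 _Alignas keeps it usable for any type. */
static _Alignas(max_align_t) uint8_t pool[4096];
static size_t pool_used;

static void *pool_alloc(size_t n)       /* hand this to the library */
{
    size_t start = (pool_used + 7u) & ~(size_t)7u;  /* 8-byte align */
    if (n > sizeof pool - start)
        return NULL;                    /* pool exhausted           */
    pool_used = start + n;
    return &pool[start];
}

static void pool_reset(void)            /* reclaim everything at once */
{
    pool_used = 0;
}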
I'm working on a personal project right now on 8bit AVR where I have made use of inheritance and virtual functions, and so far ... no regrets. The alternative was function pointers and this is just the same thing, but neater. Well, maybe not neater as I think the syntax is probably more typing, not less. But I get nice type checking and a warm feeling that everything fits together.
-
I just joined a new company as a mixed hw / embedded firmware guy, and I was disappointed to find out their codebase is 100% straight C.
I would be glad to read your reasons for this. I usually read the assembly output from my code to "trust but verify" it is correct. How do you verify that your C++ output is correct? Is it your experience with the compiler that helps you see the patterns that you see as good code?
My limited experience with embedded C++ is with KDE, where one of our programmers had a performance issue. I looked into the code, and there were several similar, but slightly different code paths, and they busted the I-Cache. A simple C function with ONE code path would have reduced the cache problem.
In the embedded world, LOOK AT YOUR ASSEMBLY CODE when time is an issue. Of course, before writing embedded code, READ AND KNOW the architecture.
I will sometimes compile with -S to see what the compiler is doing, and am more likely to do it for performance-critical code, but I'll be honest and say I don't do it that much these days. Most of the code is not performance critical, and my productivity is more important. Of course, you never know for sure that a compiler is doing what you told it to do (or even what you thought you told it to do), but I'm not sure that's a solid reason not to use compilers and more advanced languages. Of course, unit tests help build confidence in the output. I also will cross-compile on my host when I can and run unit tests in a hosted environment, if it's not too much trouble to get working. Of course, this helps you with the correctness of your code, but it doesn't help at all if the target compiler screws up something your host compiler doesn't.
-
...
The first high-level language available on most early microprocessors was a stripped-down form of PL/I. PL/I was really gaining traction in the early 70s, but was too big and complex a language to be fully implemented on small machines. So, Motorola, Intel and a number of others came up with MPL, PL/M and various other names for their own stripped-down PL/I dialects.
I have fond memories of developing embedded code in PLM80 on 80C51 family devices and variants. The compiler was very predictable, so, with experience, you could write the source in a way that gave the execution times you needed.
Right now I'm learning Python, for a quantitative finance application. C# and Python seem to be pervasive in that world.
I'd happily go back to C for embedded applications, if it felt like the right tool for the job.
-
All of the above in the context of microcomputers. For mainframes: PL/I was introduced in 1964, COBOL in 1959, Algol in 1958 and FORTRAN in 1957. C is a newcomer, first released in 1972. Just a "johnny come lately" entry to computer languages. Let's not give it more credit than it's due.
C deserves credit for being the first portable macro assembler and enabling probably the first portable operating system. It also did away with all those stupid special statements and commands which plague old imperative languages, replacing them with plain functions, written in C. You can implement the full C standard library in C; try that with Pascal's writeln. The languages you listed really are DSLs in comparison; I would never call COBOL or FORTRAN a "general purpose" language. Maybe Algol, if it's true what they say about similarities to Pascal, particularly the kinds of Pascal that have been adapted to low-level programming by adding pointers and whatnot.
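To illustrate that point with the obvious example - string handling isn't a language feature, it's just ordinary functions you could write yourself (my_strlen is a made-up name, of course):

#include <stddef.h>

size_t my_strlen(const char *s)     /* no special syntax, just pointers */
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}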
Buffer overflows are literally the #1 cause of vulnerabilities, have been for decades, and C is literally responsible for it. We can do a lot better and still be fast, e.g. Rust.
I'm not saying C is all bad, but when you start composing complex systems out of it (such as an OS kernel), you are asking for trouble.
As usual, no C thread is safe from the Rust Evangelism Strikeforce.
Here's a quick one for you fanboys: I have a billion short-lived heap objects which need to be indexed in a few long-lived trees/hashmaps - by reference, of course; I'm not going to make ten copies of each.
How does Rust help me prevent dangling references, other than "hire sufficiently competent programmers, use unsafe and employ some code review and unit testing, as you would in C++, but please teach them this new esoteric language because C++ is so last year"?
programmers [...] are humans
You would be surprised. Some are robots.
-
Well, not to get into yet another fruitless language war, but I don't really see the point in "still around". It's a tool and it does the job.
Forks are still around as eating utensils, should we use connected bluetooth things instead to put food inside our mouth?
They both work but one has surely done more accidental damage than the other :D
And you may be surprised which it is. :D
https://www.cpsc.gov/s3fs-public/hazard_housewares.pdf
-
Because programmers, even the best ones...and this might blow your mind...are humans.
You are right. The only reason why I still use C is that I can "do Rust" only on x86 (and maybe on ARM), while I can "do C" on MIPS (LLVM is weak), HPPA (LLVM is unsupported) and PPC (LLVM is experimental). That is my problem :-//
-
I'm not saying C is all bad, but when you start composing complex systems out of it (such as an OS kernel), you are asking for trouble. Because programmers, even the best ones...and this might blow your mind...are humans.
And, yet, C is used for all the main OS kernels on the planet. I don't keep up, I am just too old, but is there a single OS of any consequence written in Rust? I tend to think in terms of OS kernels being the most complex code around and Rust is purported to be a systems programming language. There should be some example OSes out there somewhere.
Buffer overflow was always due to sloppy code, not the C language and certainly not the standard libraries. Many string functions have a 'size' parameter to prevent overrun. It's up to the programmer to use the functions correctly. Use strncpy(), not strcpy().
Better yet, write your own versions and, among other things, they probably won't use the heap. Win-win! And you will know that they are thread-safe. Win-win-win!
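For instance, a home-grown bounded copy in the spirit of BSD's strlcpy (the name here is invented): it always NUL-terminates, never touches the heap, and tells you when it truncated.

#include <stddef.h>
#include <string.h>

size_t bounded_copy(char *dst, size_t dstsize, const char *src)
{
    size_t srclen = strlen(src);
    if (dstsize != 0) {
        size_t n = (srclen < dstsize - 1) ? srclen : dstsize - 1;
        memcpy(dst, src, n);            /* copy at most dstsize-1 bytes */
        dst[n] = '\0';                  /* always terminate             */
    }
    return srclen;                      /* >= dstsize means truncation  */
}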
Sometimes you have to wonder: How did the fellows at Bell Labs ever create Unix and C at the same time (more or less)?
-
Sometimes you have to wonder: How did the fellows at Bell Labs ever create Unix and C at the same time (more or less)?
The first Unix, compared to a modern Linux kernel, has only 1% of the complexity. The PDP-11, compared to a modern RISC (multi-core PPC), has just a fraction of the complexity, and it had no problems concerning unaligned access, cache coherence, etc. The datasheet for the Freescale e500 is 4000 pages!!! Clements' latest book about the m68k is no more than 400 pages including exercises and teaching material, and the first UNIX was not designed with SMP, AMP, threads, and all the complexity that we have nowadays. The latest Linux 5.3 for HPPA2 is 22 Mbytes with everything compiled "static"; XINU, which is a near heir of UNIX, is no more than 512 Kbytes on a 68020 machine.
-
There is something about C that is special; I've written in it for 25+ years and it will get the job done, usually with excellent performance. Like assembly, it does what you tell it, and if you mess up telling it what to do, you'll see the results. Buffer overruns are caused by bad programming. You can prevent them by writing proper code. There are things that modern languages make easier on the developer, but with every concession and automation, raw control is lost. Same as moving from assembly to C. I can appreciate how quickly one can put together a C# application, and how it protects you from yourself, but there are always trade-offs. I like some of the benefits of C++ and will cherry-pick them when they are useful. One thing about C# that annoys me is how many different ways there are to do the same thing - if they want to establish proper patterns, why not just have a single way to do something? I think it has too much complication, but I guess that is what keeps those C# devs in business. I'm still trying to work my way through C# 7.0 in a Nutshell and it is head-spinning...
-
What would drive people to take the leap from the highly abstracted world of things like Python to the 'madness' of C today?
Simple. Just list the languages written in C. Start with Python. Then list the languages influenced by C. Start with Python.
sadly very true.
that's why so many languages need that stupid semicolon at the end of a statement. semicolons move my colon...
-
It's 2019, people!
C is obsolete, we gotta learn LISP Smalltalk Java Haskell Intercal Rust :scared:
Why, in 2019, if they have to invent a new language, do they have to bring the fucking ";" along as a line terminator? chr(13) and chr(10) have been used for ages as carriage return and linefeed; the ";" is a relic brought down from C/C++ :palm: Everything else in C/C++ is a good thing except this (which makes C 99.99% my favourite language), so why?
The use of a statement delimiter predates C by a long way. It was introduced because of problems found in early languages, like Fortran, where the end of the line meant the end of the statement. Fortran fixed this with...
For a new language... no excuse, regardless... many new languages such as Python don't require ";"
Python, where python is not python and whitespace is critical. Seriously, no, not an advantage.
-
You can't compare python to c.
Apples and oranges
-
You can't compare python to c.
Apples and oranges
I can compare the insanity of whitespace dependency vs anger over having to terminate a statement with a semicolon...
-
you need to terminate somehow.
-
you need to terminate somehow.
Yup! Sooner or later...
Eliminating the semicolon is easy: Just come up with a way to separate multiple statements per line, multiple lines per statement and some way to tell the compiler where the end of a statement is (useful for error recovery) and you're all set!
Fortran didn't consider white space at all. It would interpret the following as a DO loop
DO 10 I=1,10
It would also accept as a DO loop
DO10I=1,10
Basically, the compiler didn't know, until it found the comma, whether it was creating a loop or a simple assignment. If it saw the comma, it had to back up and break the DO10I into pieces.
Not a very nice syntax to compile. Strictly ad-hoc, no clean recursive descent.
Fortran makes the assumption that every statement terminates at the end of the line unless column 6 in the following card has a mark (or number). The following statement extends over two lines although legally complete at the end of the first line:
area = 3.14159265358979
+ * r * r
What a mess to compile!
-
Perhaps it should also be easily readable by a human :-//
-
I don't really get the issue with the semicolon, once you've written a few dozen lines it becomes second nature.
I don't really have a problem with the whitespace dependency of Python either, as long as you use an editor that doesn't use nonstandard tabs. You get used to it and it's fine, at least I do.
Some people just need something to moan about I guess.
-
I don't really get the issue with the semicolon, once you've written a few dozen lines it becomes second nature.
I don't really have a problem with the whitespace dependency of Python either, as long as you use an editor that doesn't use nonstandard tabs. You get used to it and it's fine, at least I do.
Some people just need something to moan about I guess.
I moan about so many things. I'm really an expert moaner. Should be on my resume.
But neither semicolons nor syntactic whitespace bug me much. Actually, I have plenty of gripes about Python, but text formatting isn't one of them. As for tabs, I use tools to keep them out of my code entirely. I actually think the tab character was probably a bad idea from the start.
-
I've thought that as well, why do we even have a tab character that is not simply a defined number of space characters? I suppose it must have had some purpose, anyone know the history of that?
-
Tab isn't just 8 spaces; it aligns to the next multiple of 8 columns. It's useful for, wait for it, ... tables :)
1<tab>1
10<tab>10
comes out neatly aligned
1       1
10      10
-
you need to terminate somehow.
Yes: the CR/LF that is in the code file the moment you hit the return key on your keyboard. How's that for an elegant solution?
And actually you don't. The parser should be smart enough to figure it out: it knows when a statement is complete, so the next keyword is the beginning of a new statement.
And don't tell me about multi-line statements - then you use a continuation character.
It is madness to demand that the programmer tell the compiler where a line ends. It makes more sense to have a continuation character, as you will need fewer of those than you need line-end characters.
-
Because programmers, even the best ones...and this might blow your mind...are humans.
You are right. The only reason why I still use C is that I can "do Rust" only on x86 (and maybe on ARM), while I can "do C" on MIPS (LLVM is weak), HPPA (LLVM is unsupported) and PPC (LLVM is experimental). That is my problem :-//
RISC-V just went from experimental to mainstream in LLVM. The Rust people seem happy so far.
I guess I program mostly in what is effectively C, and I do more assembly language than C++.
-
I've thought that as well, why do we even have a tab character that is not simply a defined number of space characters? I suppose it must have had some purpose, anyone know the history of that?
Tab characters were introduced by FIELDATA, and later by ASCII.
-
I've thought that as well, why do we even have a tab character that is not simply a defined number of space characters? I suppose it must have had some purpose, anyone know the history of that?
Are you kidding us? Did you never play with your (grand)father's typewriter when you were a kid?
(http://i.imgur.com/mi0fg2Q.jpg)
Those are tab stops. You position them where you want them and when you press the tab key the carriage moves to the next stop.
Computerised word-processors from Wordstar on to modern MS Word implement the exact same feature.
Here's MacWrite in 1984. The hollow triangles on the ruler are the tab stops.
(https://i1.wp.com/9to5mac.com/wp-content/uploads/sites/6/2017/04/macintosh-emulator.jpg)
Computer terminals and printers such as the vt100 and la120 implemented tab stops. Assuming you wanted tab stops at columns 20 and 40 your program could output something like:
<esc>[3g clear all tab stops
<esc>[99d make sure we're at the first column
<esc>H set tab stop
<esc>[20C move right 20 columns
<esc>H set tab stop
-
I've thought that as well, why do we even have a tab character that is not simply a defined number of space characters? I suppose it must have had some purpose, anyone know the history of that?
Tab characters were introduced by FIELDATA, and later by ASCII.
The TAB key and tab stops were introduced in the 19th century.
-
I've thought that as well, why do we even have a tab character that is not simply a defined number of space characters? I suppose it must have had some purpose, anyone know the history of that?
Tab characters were introduced by FIELDATA, and later by ASCII.
The TAB key and tab stops were introduced in the 19th century.
We're not talking about the TAB key. We're talking about the TAB character, which was introduced by the above-mentioned codes.
-
What would drive people to take the leap from the highly abstracted world of things like Python to the 'madness' of C today?
[writing] in C for only one reason - performance. Python gave 2 transactions per second, my C implementation gives 200...
Did you really have to write that first question when you had that second sentence so soon after? 100x the performance (relatively trivially)?
The illusion that you NEED performance, either in terms of speed or program space, is horribly persistent. Even when you don't.
Also, C is essentially the first compiled language I expect to see "deployed" on any new chip that comes out. Because it can be implemented as essentially a fancy macro assembler, and offered without any libraries, and ... unusually ROMable, compared to many languages.
Python is not ever going to run on a $0.03 Padauk microcontroller, but it showed up with "Min-C", and is well on its way to having SDCC support...
-
We're talking about the TAB character
DEC, at least, made good use of the TAB character to save storage space.
When many fortran programs had lines that started with six spaces (5 digits of potential line number label, and a "C" line-continuation column), DEC was "<tab><statement> is fine, <tab>C<space>continuation works too." Their assemblers favored tabs as well.
At one time, that could've been important - that paper tape cost by the inch, you know!
-
you need to terminate somehow.
Yes: the CR/LF that is in the code file the moment you hit the return key on your keyboard. How's that for an elegant solution?
And actually you don't. The parser should be smart enough to figure it out: it knows when a statement is complete, so the next keyword is the beginning of a new statement.
Clearly, it can't. My Fortran example above shows a complete statement on one line just before the next card is read with a continuation character. What keywords? You can have multiple arithmetic assignment statements and the only symbol you can count on is the '=' sign. So I guess the syntax should say, in effect, if you see a second '=' sign, the previous statement might be complete except in the case of:
a=b=c=d....;
If there was any way in the world to eliminate the ';', the language designers, the smartest people in computer science, would have done it. Fortran doesn't have it but it's not a regular grammar and Python, well, Python is special. If there was a short bus for languages, Python would be on it. What's with all the ':' symbols?
-
you need to terminate somehow.
golang was created by Ken Thompson (the same Unix / C guru from Bell Labs) and they got rid of the semicolon
-
golang was created by Ken Thompson (the same Unix / C guru from Bell Labs) and they got rid of the semicolon
BCPL, which begat B, which begat C, didn't use semicolons either. However, due to storage limitations, the B compiler had to generate an output in a single pass. This technique was carried forward into C.
In fact, Thompson wanted to use BCPL, but created B as a stripped-down version of it, because the only computer he had at his disposal to write Unix at the time was an obsolete PDP-7 that was collecting dust in a corner of Bell Labs.
The result is that no one complains about how fast C compiles, despite the so-little-loved semicolon.
-
I've thought that as well, why do we even have a tab character that is not simply a defined number of space characters? I suppose it must have had some purpose, anyone know the history of that?
Are you kidding us? Did you never play with your (grand)father's typewriter when you were a kid?
(http://i.imgur.com/mi0fg2Q.jpg)
Those are tab stops. You position them where you want them and when you press the tab key the carriage moves to the next stop.
Computerised word-processors from Wordstar on to modern MS Word implement the exact same feature.
Here's MacWrite in 1984. The hollow triangles on the ruler are the tab stops.
(https://i1.wp.com/9to5mac.com/wp-content/uploads/sites/6/2017/04/macintosh-emulator.jpg)
Computer terminals and printers such as the VT100 and LA120 implemented tab stops. Assuming you wanted tab stops at columns 20 and 40, your program could output something like:
<esc>[3g clear all tab stops
<cr> make sure we're at the first column
<esc>H set tab stop
<esc>[20C move right 20 columns
<esc>H set tab stop
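For what it's worth, here's a minimal C sketch that emits those sequences (assuming a VT100-compatible terminal; the column choices are just for illustration):
#include <stdio.h>

/* Sketch: program hardware tab stops roughly 20 columns apart on a
 * VT100-compatible terminal. */
int main(void)
{
    printf("\033[3g");   /* TBC: clear all tab stops   */
    printf("\r");        /* return to the first column */
    printf("\033H");     /* HTS: set a tab stop here   */
    printf("\033[20C");  /* CUF: move right 20 columns */
    printf("\033H");     /* HTS: set another tab stop  */
    fflush(stdout);
    return 0;
}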
Well as someone else said already I was asking about the character, not the tab key or stops. That is fascinating though, I used to play with my mom's typewriter when I was a kid but I don't think I've touched one in 25 years and I don't remember ever being aware of tab stops on it. I'm just young enough to have never used a typewriter for actually typing something useful. I can see now why they got their name.
-
If there was a short bus for languages, Python would be on it. What's with all the ':' symbols?
It is also very self-ish.
-
Also, C is essentially the first compiled language I expect to see "deployed" on any new chip that comes out.
Which is madness in itself. C was developed for the PDP architecture and wants to allocate stack and heap. C fits like a wrench on a pig on anything that is not such an architecture...
-
If there were any way in the world to eliminate the ';', the language designers, the smartest people in computer science, would have done it.
The simplest, and one of the oldest, languages did that: BASIC. No need for semicolons or other line terminators, and it has a continuation character.
According to your statement, all those 'language designers' must be total idiots if they can't do something simple BASIC can...
-
However, due to storage limitations, the B compiler had to generate an output in a single pass. This technique was carried forward into C.
That was about 50 years ago ... can we PLEASE update the syntax a bit? It's nearly 2020... storage is no longer an issue.
Update the compiler so that semicolons are 'optional'.
-
However, due to storage limitations, the B compiler had to generate an output in a single pass. This technique was carried forward into C.
That was about 50 years ago ... can we PLEASE update the syntax a bit? It's nearly 2020... storage is no longer an issue.
Update the compiler so that semicolons are 'optional'.
No! Leave the historical relics alone!
Each semicolon is a time to pause and reflect on why on earth '\' was used as the DOS path separator!
-
;
-
The "tab" alias "	" alias "\t" alias 0x09 has always been forbidden and banned in every task I have ever done in avionics.
Before committing a source file to DOORS (it's like Git) I have to filter the file and replace each 0x09 with 0x20, and it has been so tedious that the first thing I did was write a C program that does the job, and also reformats the text according to a template.
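A minimal sketch of that kind of filter (the real tool is more elaborate; a proper version would expand to the next tab stop rather than substituting a single space):
#include <stdio.h>

/* Copy stdin to stdout, replacing each 0x09 (tab) with a 0x20 (space). */
int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
        putchar(c == '\t' ? ' ' : c);
    return 0;
}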
-
Yeah, C is not perfect, but it does what it was meant to do really well.
I do agree that they should get rid of the ';' and replace it with something like the '_' line-continuation character from Visual Basic, but apart from some extra syntax sugar here and there, what more would you want from C anyway?
The C language was meant to stay close to the hardware; as a result, while you are coding you have a good idea what sort of machine code is going to come out of it when compiled and how things are going to be arranged in memory. It's almost just assembly language turned into a form that is actually readable by humans. It does this really well in my opinion.
I'm not saying fancier higher-level languages are bad because they hide the hardware from you. In fact, when programming under an OS I much prefer C#. The real question is why the hell does C# have those damn semicolons;
-
Which is madness in itself. C was developed for the PDP architecture and wants to allocate stack and heap.
This depends on the "machine layer" of the compiler, not on the language itself.
On a RISC machine, you can use the stack as well as registers for passing parameters to a subroutine (function or procedure), and you can likewise use a register for the return value.
On the HC11 it's mandatory to use the stack (because the HC11 is a stack machine with only one general-purpose register), but on PowerPC (which has many general-purpose registers) it's optional.
-
C was developed for the PDP architecture and wants to allocate stack and heap. C fits like a wrench on a pig on anything that is not such an architecture.
Lots of C programs never go near the heap...
And - do you have an example "modern" language that doesn't want a stack? (That's actually a serious question. I used to use a Fortran compiler that used "store the current PC at the target memory location and resume execution at target+<PCsize>" and didn't allow recursion, but that was 40-odd years ago.)
-
but apart from some extra syntax sugar here, what more would you want from C anyway?
In avionics, we have to run each source file through MISRA C validators. This means that the standard C language has too many degrees of freedom, and too many of them are poison traps.
Besides, people usually abuse the degrees of freedom offered by the standard C language, and sources usually end up looking ugly and crappy.
A natively "MISRA-compliant C language" (we call it "safeC") would solve it.
I would also appreciate some new features grabbed from Ada (strong types, restricted counters, barriers, etc.) and a minimal mechanism to facilitate OOP, because the C++ language offers too many, and it's too bloated.
Oh, and even something to keep people from writing "goto err;". When you see it in the C source of a serious operating system (e.g. Integrity OS, for which you have to pay 20K euro) ... it means that the C language does not offer any better alternative. (A sketch of the pattern follows the list below.)
"goto err;" has been found in
- Linux
- VxWorks
- NewOS (BeOS clone)
- Integrity OS
- ucOS/2
- XINU
- ...
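For context, a minimal sketch of that "goto err" cleanup idiom (function and resource names are made up); plain C has no nicer construct for a single cleanup point:
#include <stdio.h>
#include <stdlib.h>

int load_thing(const char *path)
{
    char *buf = NULL;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        goto err;

    buf = malloc(4096);
    if (buf == NULL)
        goto err;

    if (fread(buf, 1, 4096, f) == 0)
        goto err;

    free(buf);
    fclose(f);
    return 0;

err:                   /* single cleanup point for every failure path */
    free(buf);         /* free(NULL) is a no-op                       */
    if (f != NULL)
        fclose(f);
    return -1;
}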
-
do you have an example "modern" language that doesn't want a stack?
It depends on what. Anyway, Erlang is a modern language and it's entirely based on a stack; that's because it's a recursion-oriented language, like LISP, so it wants a big stack to operate.
-
Also in terms of stack, how do you think pretty much all modern CPUs call a subroutine? They all have a register that is used as a stack pointer and the instruction for returning from the subroutine requires the return pointer to be sitting on the stack.
As for the heap, pretty much every programming language has something to do with it. As soon as you want to dynamically allocate memory you need it. If you are running on bare metal without an OS, then the heap is all of your leftover RAM. If you are running under an OS, then your heap is the leftover chunk of RAM that was allocated to you by the OS. When you call malloc(1) you don't get the OS giving you 1 byte of RAM; it just gives you a free address in the chunk of RAM it set aside for you beforehand as your heap, and if you run out, the OS assigns another chunk to you.
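A small experiment illustrating that point (the exact spacing of the addresses is allocator-specific; this only demonstrates the general idea, and it deliberately leaks a few bytes):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Consecutive tiny allocations normally come out of the same chunk
     * the allocator already reserved from the OS, so the returned
     * addresses land close together. */
    for (int i = 0; i < 4; i++)
        printf("malloc(1) -> %p\n", malloc(1));
    return 0;
}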
-
Concerning the semicolon ";": it's a statement separator, so I don't understand the complaint, because it simply signals the end of a statement, and this is pretty logical. The philosophy is that whitespace is irrelevant, and this allows flexibility in how the code is formatted, as formatting is not part of the semantic meaning.
I mean, some languages use the newline character as the statement separator, but languages which ignore all whitespace tend to use the semicolon, therefore an explicit statement separator is a logical choice.
-
Without a statement separator, "a tool" might not be able to recognize each statement; it could end up with at least two possible interpretations.
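A concrete sketch of the kind of ambiguity meant here (the variables are hypothetical, and both readings are legal C):
void demo(void)
{
    int a, b = 2, c = 1;

    /* Without the explicit ';', the tokens  a = b  followed by  - c
     * could be one statement or two.  The separator decides it: */
    a = b - c;      /* reading 1: one statement, a ends up as 1          */
    a = b; -c;      /* reading 2: two statements (the second is a legal,
                       if useless, expression statement), a ends up as 2 */
    (void)a;
}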
It might be easy to resolve in a "compiler", because whoever writes a compiler is usually a badass, super-smart computer-science type, but the point is that it is hard to design a programming language without a statement separator that is not ambiguous, and when you have to write "a tool" to operate on the source ... well ... you might not be badass enough for it.
I know, it's an ego problem, just face it.
-
Each semicolon is a time to pause and reflect on why on earth '\' was used as the DOS path separator!
I have got it! Balls of confusion when you type on the keyboard. Too many meanings for the same characters: thinking about Python, if you wish to extend your statement over more than one line, you have to use the special character '\' to say that the statement has not finished, and ... it's the same '\' char that has a different meaning when you try to type a file path for a DOS emulator (DOSBox) launched by a Python script ... oh, and 90% of the time I do it wrong :D
This is also a big problem when converting Windows file paths to Unix file paths ... ah, computer science
-
Yes the compiler needs to know where a statement begins and ends, but newlines can give it a pretty good idea.
Python is a good example. A lot of people may not know this, but Python also has semicolon statement separators. If you put ';' at the ends of your lines in Python it won't throw an error, and you can put multiple statements on one line with ';' between them and it works just fine, exactly like in C. Python won't throw an error if it gets to the end of the line and the statement makes sense.
But then what if you want to split up a line that is too long in Python? In C you just go to a new line and it doesn't care. Python can do much the same: inside open parentheses, brackets or braces it keeps reading across newlines until the statement is complete, and otherwise you can put a '\' at the end, which essentially escapes the newline so the compiler ignores it. And there we go: the compiler knows where the statements are without you constantly putting ';' everywhere.
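For the C side of that, a tiny sketch (identifiers are made up): the ';' lets one statement span as many lines as you like, with no continuation character at all.
double compute_total(double base_price, double shipping_fee, double sales_tax)
{
    double total;
    total = base_price       /* statement starts here...            */
          + shipping_fee     /* ...continues freely across lines... */
          + sales_tax;       /* ...and ends only at the ';'         */
    return total;
}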
Still, I don't quite like the idea of using tab/space whitespace indenting to define blocks of code, but it works fine and it forces careless people to properly structure their code, so it's not all bad.
-
And - do you have an example "modern" language that doesn't want a stack? (That's actually a serious question. I used to use a Fortran compiler that used "store the current PC at the target memory location and resume execution at target+<PCsize>" and didn't allow recursion, but that was 40-odd years ago.)
Yeah, I think free_electron has some explaining to do, lots of hardware is wired to use a call stack in RAM.
That being said, MIPS for example has a variant of the branch instruction which stores the old program counter in a designated general-purpose register. You could use it to implement a DIY stack or perhaps even some coroutines or whatever.
MIPS is still available in PIC32 and some embedded systems.
edit
Actually it's more than that: every "jump and link" instruction saves the PC to a register, but by default it is one particular register intended for it. So if you are a subroutine and you want to call another subroutine, you have to explicitly push that register onto the stack if you are ever to use it again to return to your caller.
-
Yes: the CR/LF that is in the code file the moment you hit the return key on your keyboard. How's that for an elegant solution?
Yes, but somehow the LF character was made invisible/blanked out somewhere in history before C was created.
This would lead to much confusion, especially nowadays with Windows needing a CRLF for a new line.
IIRC the reason for the semicolon originated from COBOL ending its statements with a space character (needs confirmation, not sure).
Since LF was invisible, just like the space character … well, you get the point.
And actually you don't. The parser should be smart enough to figure it out. It knows when a statement is complete, so the next keyword is the beginning of a new statement.
Oh hell no! Please.
Some programmers can already make code practically unreadable.
Google "best C one-liners" for instance and you'll get some spaghetti code that actually works.
It is madness to demand that the programmer tell the compiler where a line ends.
Yes, but it is not for the compiler; it is for your colleague who has to review your code.
Humans are pretty good at interpreting stuff, as you can see in the example below, but that does not mean you can't make it easier for them.
7H15 M3554G3
53RV35 7O PR0V3
H0W 0UR M1ND5 C4N
D0 4M4Z1NG 7H1NG5!
1MPR3551V3 7H1NG5!
1N 7H3 B3G1NN1NG
17 WA5 H4RD BU7
N0W, 0N 7H15 LIN3
Y0UR M1ND 1S
R34D1NG 17
4U70M471C4LLY
W17H 0U7 3V3N
7H1NK1NG 4B0U7 17,
B3 PROUD! 0NLY
C3R741N P30PL3 C4N
R3AD 7H15.
-
And another reason the ';' can be handy.
Some smart-ass programmer once put an easter egg into a line of code, after the 250th character of the line.
The code comparator we used back then only went to 250 characters, so it only reported the difference at the ';' character.
Still, we did not see it until it hit the customer, and only after two reviews did we find it :)
-
I remember an older paper; I know it exists but I cannot find it at the moment:
The main message is:
A Real Man uses FORTRAN.
Muesli eaters* use Pascal.
Only a Real Man can write and understand FORTRAN spaghetti code.
And a Real Man can write FORTRAN code in every language.
A gleam of hope is C, because it's possible to make the code unreadable with some cryptic pointer manipulations. :-+
* Maybe there is a better translation for the German word "Müslifresser", but I don't know it.
-
how do you think pretty much all modern CPUs call a subroutine?
I think that most modern (RISC) CPUs (ARM, MIPS, and RISC-V as examples) call a subroutine by putting the return address in a register, and loading a new value into the PC. The whole stack-based "call" instruction that modifies a stack pointer, does a store based on the SP, and changes the PC, is rather CISC-y (especially if the PC is wider than memory.) Any stack manipulation is done by the prologue and epilogue of the called function (usually as part of a vendor-defined standard API to-be-used for all languages. But you can get really zippy leaf subroutines that never touch the stack, too.)
(but then they go and implement interrupt controllers that automatically stack stuff. I'm not quite sure how that works, RISC-iness wise.)
As for semicolons... I don't think enough of you remember just how painful and ugly code looked that needed "line continuation" characters. The semicolon (from Algol, first?) was a major improvement. (Less so now that compilers are good enough to generate equally good code from a series of short statements as from a long expression, and functions with dozens of parameters have been replaced with structure references. But still...)
-
how do you think pretty much all modern CPUs call a subroutine?
I think that most modern (RISC) CPUs (ARM, MIPS, and RISC-V as examples) call a subroutine by putting the return address in a register, and loading a new value into the PC. The whole stack-based "call" instruction that modifies a stack pointer, does a store based on the SP, and changes the PC, is rather CISC-y (especially if the PC is wider than memory.) Any stack manipulation is done by the prologue and epilogue of the called function (usually as part of a vendor-defined standard API to-be-used for all languages. But you can get really zippy leaf subroutines that never touch the stack, too.)
(but then they go and implement interrupt controllers that automatically stack stuff. I'm not quite sure how that works, RISC-iness wise.)
Yeah, ARM as a good example of RISC does have a so-called link register that holds the return address, so you can call a function and return without using the stack. But you are also required to preserve register state, so you do need to use the "push" and "pop" instructions that seem more CISC-y. But yes, you can optimize it a bit and avoid pushing registers to the stack by doing all the calculations in the same register that is used to give back the return value of the function, and not calling other functions leaves the link register unchanged so it doesn't need to be put on the stack. Also, if the function has too many parameters to fit in the registers, then the compiler will instead choose to pass them in by putting them on the stack before the call. So for more complex functions the stack inevitably gets used for something, not just for holding the return address.
Exceptions to this are some old architectures like the PIC16F, where a small dedicated hardware return stack (separate from data RAM) holds the information needed to return from subroutines, and the CPU's internal logic uses it to magically return to the right spot when executing the return instruction. This means that the call depth is limited to a certain number of levels before that storage is used up and it crashes. This sort of concept lives on in interrupt handling, where the CPU typically has dedicated hardware to quickly restore its original state after an interrupt.
As for semicolons... I don't think enough of you remember just how painful and ugly code looked that needed "line continuation" characters. The semicolon (from Algol, first?) was a major improvement. (Less so now that compilers are good enough to generate equally good code from a series of short statements as from a long expression, and functions with dozens of parameters have been replaced with structure references. But still...)
I think I'd prefer to see something like a '\' on the end to tell me I haven't seen everything and I should keep on reading. I rarely see statements split across lines in C, and it's usually done when calling functions with a ton of parameters or in long constant definitions.
In practice you don't really want to stuff too many things into one line anyway; it tends to make code hard to read. After all, the very point of programming languages is to make it easy for humans to read code made for machines. You do get used to these ';' being everywhere and put them on the end of things without even thinking about it, but they really mostly serve to help the compiler, not the person reading the code. Putting the ';' in the wrong spot can make for some of the most confusing and embarrassing bugs, as people who read C code often don't see them anymore because their brains ignore them as just noise, but oh boy does the compiler see them.
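For what it's worth, C does already have a '\' line continuation, but only the preprocessor really needs it, e.g. in multi-line macros (a sketch; the macro name is made up):
/* The trailing '\' splices the next line onto this one before the
 * compiler proper ever sees the macro body. */
#define SWAP_INT(a, b) do {   \
        int swap_tmp_ = (a);  \
        (a) = (b);            \
        (b) = swap_tmp_;      \
    } while (0)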
-
Some of you guys really don't like the semicolon!
The best things about C are that it is portable and produces native code. I've taken an application written in C from a PC back down to an old NEC PC-6001 running a Z80 by compiling it with z88dk and put it on a virtual "cartridge" to run. It seems like a lot of people used assembly to get top performance back in the day, but without portability, trying to migrate to something new was very difficult or required a complete rewrite.
-
Concerning the semicolon ";": it's a statement separator, so I don't understand the complaint, because it simply signals the end of a statement, and this is pretty logical. The philosophy is that whitespace is irrelevant, and this allows flexibility in how the code is formatted, as formatting is not part of the semantic meaning.
I mean, some languages use the newline character as the statement separator, but languages which ignore all whitespace tend to use the semicolon, therefore an explicit statement separator is a logical choice.
Agree. I personally don't have any problem with semicolons (or TABs, which to me are pretty natural for code indentation, as they are for any text indentation really).
OTOH, relying on whitespace for anything other than separating identifiers/keywords ('tokens') is pretty slippery IMO. Not even talking about the potential issues when using some diff tools.
-
It seems like a lot of people used assembly to get top performance back in the day
Yup. If you love RISC OS by Acorn: the first releases of RISC OS were written entirely in ARM assembly, with a couple of scripts written in Acorn BASIC. The same applies to the first classic AmigaOS.
I have recently bought a book written by a few writers who usually wrote for Dr. Dobb's Journal (DDJ). It's about 200 pages, a true collection of routines written in 68k and HC11 assembly. Some are very nice, neat, and useful, and there is the full source of a gw-basic, which was originally written for the 68k Tutor board.
Those listings show there were some tips and tricks for organizing the code in a portable way, like the choice of a documented way to assign a meaning to each register, or to some special locations in RAM, or, even more interesting, a way to define system traps and system services. And, of course, there are also macros, even for the earlier Acorn DDE, which did support them.
Those programmers were better organized than me about "how you manage your code" :D
-
And a Real Man can write FORTRAN code in every language.
That's exactly what my grad school adviser said after I wrote an 8080 assembler in PL/I.
With an adequate amount of white space separating 'thoughts' of code and indentation of DO loops, Fortran can look pretty good. Most of it does not. Then again, there is going to be a lot of array subscripting, because we're basically doing math, and function calls look exactly the same as array references.
I still like 'simple' languages like Fortran77 (or earlier), I don't think I'll ever use the object features of later incantations. Do we really need to overload operators in Fortran? Apparently...
-
The "tab" alias "	" alias "\t" alias 0x09 has always been forbidden and banned in every task I have ever done in avionics.
Before committing a source file to DOORS (it's like Git) I have to filter the file and replace each 0x09 with 0x20, and it has been so tedious that the first thing I did was write a C program that does the job, and also reformats the text according to a template.
Yup. Historical reasons aside, the tab character just has no place in a modern text file! Once you are rid of it, indenting becomes easier, not harder.
-
I hate tabs much more than semicolons!!!
-
Notepad++ does a search and replace of EOL, Tabs to spaces, etc. on a single file.
It can also search and replace these special characters in all files in a subdirectory. I do this all the time:
DOS to *NIX EOL
Tab to three spaces
-
It seems like a lot of people used assembly to get top performance back in the day
Yup. If you love RISC OS by Acorn: the first releases of RISC OS were written entirely in ARM assembly, with a couple of scripts written in Acorn BASIC. The same applies to the first classic AmigaOS.
I have recently bought a book written by a few writers who usually wrote for Dr. Dobb's Journal (DDJ). It's about 200 pages, a true collection of routines written in 68k and HC11 assembly. Some are very nice, neat, and useful, and there is the full source of a gw-basic, which was originally written for the 68k Tutor board.
Those listings show there were some tips and tricks for organizing the code in a portable way, like the choice of a documented way to assign a meaning to each register, or to some special locations in RAM, or, even more interesting, a way to define system traps and system services. And, of course, there are also macros, even for the earlier Acorn DDE, which did support them.
Those programmers were better organized than me about "how you manage your code" :D
Writing good ASM code needs a lot of additional soft rules to organize things and comment things.
C also needs such additional coding conventions to keep the code from becoming unreadable and non-portable. The C language as enforced by the compiler is only a small part. The next part is the standard library. Finally, it takes some more or less conventional rules on how to name variables, write comments and indent things. Especially the usual plentiful use of #define in C code can cause quite some confusion if used differently than normal.
It may be more important to learn those more informal rules than the actual C language.
-
Notepad++ does a search and replace of EOL, Tabs to spaces, etc. on a single file.
It can also do a search and replace these special characters in all files in a subdirectory. I do this all the time:
Or just use the built in indentation handling of a lot of text editors, here is sublime for example:
(screenshot: Sublime Text's indentation menu)
You get the menu by simply clicking the indentation indicator in the status bar, letting you convert any sort of indentation you want, and it autodetects what kind of indent to use by analyzing the file when opening it. When editing, it doesn't matter if you use space or tab; the editor always puts in the correct indent type for whatever it is set to use.
-
I have to admit that every time I think something like "what the embedded ARM world needs is a port of avr-libc, because newlib-nano is just too huge and bloated (even though it's better than newlib)", or "I now know about six different ways to make Arduino's digitalWrite() a lot better, maybe I can sneak them into the new XXX core where there aren't any legacy issues", I censor myself and don't say anything, because I fear that everyone under 40 or so will laugh at the very idea!
(Well, crap. Make that "almost every time." ;D )
-
It may be more important to learn those more informal rules than the actual C language.
Definitely! This was the first thing in my head when I came across the VxWorks source. Wind River spent a lot of time and resources on documenting how to write stuff, and they are still paying people to keep the sources aligned with their templates.
If you randomly open a file in the BSP, or a part of the kernel, or even an application, you will always find things written strictly according to the template, which means that if you want to write your own driver for something (in my case I am on a USB-CAN driver for a customer) you have to know how to make it *compliant* with the template, which also tells you *how* you have to name variables and functions.
Abuse of #defines is not tolerated by the internal QA; otherwise the code can't be "VxWorks compliant".
-
I have to admit that every time I think something like "what the embedded ARM world needs is a port of avr-libc, because newlib-nano is just too huge and bloated (even though it's better than newlib)", or "I now know about six different ways to make Arduino's digitalWrite() a lot better, maybe I can sneak them into the new XXX core where there aren't any legacy issues", I censor myself and don't say anything, because I fear that everyone under 40 or so will laugh at the very idea!
(Well, crap. Make that "almost every time." ;D )
The real problem is that so many young engineers have only ever known bloated solutions and keep saying chip X can't do job Y when it just needs some attention to detail to make the job fit very nicely. If you are working on low volume products this attitude is OK, as simplifying development counts for more than minimum BOM. As the volumes rise it puts companies at a real competitive disadvantage.
-
The real problem is that so many young engineers have only ever known bloated solutions and keep saying chip X can't do job Y when it just needs some attention to detail to make the job fit very nicely. If you are working on low volume products this attitude is OK, as simplifying development counts for more than minimum BOM. As the volumes rise it puts companies at a real competitive disadvantage.
Has nothing to do with young engineers, has everything to do with management and stakeholders.
They want fast turn around development, each year something new so consumers buy again.
You can't have fast 4 month R&D cycles && cost optimization && (code) quality.
If you choose a platform based on a 4 year platform lifetime and you need to add features every half year that nibble away at the CPU and memory reserve, you'd better make sure your platform can handle that growth.
Twenty years ago there was a 2 yr cycle, one year development, half year production optimization and half year testing, everything to make it as robust and cost / production effective as possible.
Gone those days are.
-
You get the menu by simply clicking the indentation indicator in the status bar, letting you convert any sort of indentation you want, and it autodetects what kind of indent to use by analyzing the file when opening it. When editing, it doesn't matter if you use space or tab; the editor always puts in the correct indent type for whatever it is set to use.
Yes, the editors are quite advanced these days. However, when working with multiple people touching the same source code across multiple OSes, I usually want to be sure the files are actually using the same special/invisible characters. YMMV, but in my case I had enough compatibility problems in the past when consuming the same files on three OSes and having all sorts of weird errors.
-
The real problem is that so many young engineers have only ever known bloated solutions and keep saying chip X can't do job Y when it just needs some attention to detail to make the job fit very nicely. If you are working on low volume products this attitude is OK, as simplifying development counts for more than minimum BOM. As the volumes rise it puts companies at a real competitive disadvantage.
Has nothing to do with young engineers, has everything to do with management and stakeholders.
They want fast turn around development, each year something new so consumers buy again.
You can't have fast 4 month R&D cycles && cost optimization && (code) quality.
If you choose a platform based on a 4 year platform lifetime and you need to add features every half year that nibble away at the CPU and memory reserve, you'd better make sure your platform can handle that growth.
Twenty years ago there was a 2 yr cycle, one year development, half year production optimization and half year testing, everything to make it as robust and cost / production effective as possible.
Gone those days are.
This is small production volume thinking. If you are working on big production volume items the margins are usually so small you have to get every cent out of the BOM to survive.
-
You get the menu by simply clicking the indentation indicator in the status bar, letting you convert any sort of indentation you want, and it autodetects what kind of indent to use by analyzing the file when opening it. When editing, it doesn't matter if you use space or tab; the editor always puts in the correct indent type for whatever it is set to use.
Yes, the editors are quite advanced these days. However, when working with multiple people touching the same source code across multiple OSes, I usually want to be sure the files are actually using the same special/invisible characters. YMMV, but in my case I had enough compatibility problems in the past when consuming the same files on three OSes and having all sorts of weird errors.
Git does a very good job of normalizing line endings when files are checked in (given a suitable core.autocrlf or .gitattributes setting), and then returning them to the correct line endings for the system they are subsequently checked out on.
-
This is small production volume thinking. If you are working on big production volume items the margins are usually so small you have to get every cent out of the BOM to survive.
What do you call small? This was experience with +/- 600k to 1 million products/year @ BOM $15 a piece.
-
Also in terms of stack, how do you think pretty much all modern CPUs call a subroutine? They all have a register that is used as a stack pointer and the instruction for returning from the subroutine requires the return pointer to be sitting on the stack.
Hardly ANY popular ISA designed since 1980 requires a subroutine return address to be sitting on the stack!
It's certainly not true for SPARC, MIPS, ARM, PA-RISC, PowerPC, Alpha, Itanium, RISC-V, SuperH which all store the return address for a function call into a register, and the return instruction takes it from the same register.
AVR and MSP430 do store/take return addresses on the stack. Exceptions to the rule.
x86, m68k, 8080, 8051, z80, 6800, 6502, 6809, VAX all store return addresses on the stack. They were all designed in the 1970s -- 40+ years ago!
-
Yes, but the discussion was about MCUs, so AVR, PIC and 8051 are perhaps more relevant than DEC Alpha or Itanic :)
ARM, MIPS - sure.
The original complaint was that using C on those chips forces the use of a stack, as if a stack weren't already required for function calls by the ISA of most MCUs.
-
Also in terms of stack, how do you think pretty much all modern CPUs call a subroutine? They all have a register that is used as a stack pointer and the instruction for returning from the subroutine requires the return pointer to be sitting on the stack.
Hardly ANY popular ISA designed since 1980 requires a subroutine return address to be sitting on the stack!
It's certainly not true for SPARC, MIPS, ARM, PA-RISC, PowerPC, Alpha, Itanium, RISC-V, SuperH which all store the return address for a function call into a register, and the return instruction takes it from the same register.
AVR and MSP430 do store/take return addresses on the stack. Exceptions to the rule.
x86, m68k, 8080, 8051, z80, 6800, 6502, 6809, VAX all store return addresses on the stack. They were all designed in the 1970s -- 40+ years ago!
Okay, I did put it a bit broadly. Yeah, there are architectures that use things like a link register to hold the return address, but as I have explained later on, even those eventually end up pushing it on the stack as soon as functions become complex enough. You are typically required to restore registers to their previous state before finishing a subroutine. The easiest way to preserve the link register is to push it on the stack, do your stuff (including calling other subroutines), pop it back off the stack and return to it. So in more RISC-like architectures you end up doing the same as CISC does, just doing it step by step instead of with a magical single instruction. Yes, you can avoid putting it on the stack if your subroutine never calls other subroutines, but all practical programs do that at some point.
It's a bit of an optimization to skip stacking the return address in small simple functions, but it's even more of an optimization to simply tell the compiler to inline the function rather than calling it, and CISC-like systems that offer the magical stack instruction for returning can also be made to call and return in the RISC-like way of using a link register.
You can write programs without using a stack at all, but then again you can also compute everything with only a single accumulator register and a few bitwise instructions. But just because it can be done in a simpler way does not automatically mean it's a good way of doing it. And a stack is one of those things that makes life a lot easier.
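A small sketch of the leaf vs. non-leaf distinction being discussed (the functions are hypothetical): with optimization enabled, a typical RISC compiler keeps the leaf function entirely in registers, while the non-leaf one has to save its return address (link register) on the stack around the inner calls.
static int leaf(int x)          /* calls nothing: usually no stack traffic */
{
    return x * 2 + 1;
}

static int non_leaf(int x)      /* calls leaf(): must preserve its own
                                   return address across those calls      */
{
    return leaf(x) + leaf(x + 1);
}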
-
Also in terms of stack, how do you think pretty much all modern CPUs call a subroutine? They all have a register that is used as a stack pointer and the instruction for returning from the subroutine requires the return pointer to be sitting on the stack.
Hardly ANY popular ISA designed since 1980 requires a subroutine return address to be sitting on the stack!
It's certainly not true for SPARC, MIPS, ARM, PA-RISC, PowerPC, Alpha, Itanium, RISC-V, SuperH which all store the return address for a function call into a register, and the return instruction takes it from the same register.
AVR and MSP430 do store/take return addresses on the stack. Exceptions to the rule.
x86, m68k, 8080, 8051, z80, 6800, 6502, 6809, VAX all store return addresses on the stack. They were all designed in the 1970s -- 40+ years ago!
Okay, I did put it a bit broadly. Yeah, there are architectures that use things like a link register to hold the return address, but as I have explained later on, even those eventually end up pushing it on the stack as soon as functions become complex enough.
Yes of course, but that is typically far fewer than half of all function calls. Often it will be 10% or less as most functions that call other functions either call more than one of them or else call the same one repeatedly. Functions that only call one other function are I'd say more likely than not to be able to do that as a tail call (i.e. just a jump, no return) as they're more likely to be fiddling with the arguments than fiddling with the return result.
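A sketch of that tail-call point (made-up names): the call to next_stage() is the last thing stage() does, so an optimizing compiler may emit it as a plain jump that reuses the caller's return address instead of creating a new frame.
static int next_stage(int v)
{
    return v - 1;
}

static int stage(int v)
{
    v = v * 3 + 1;
    return next_stage(v);   /* tail call: just a jump, no extra return */
}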
You are typically required to restore registers to their previous state before finishing a subroutine.
Some of them. RISC-V, for example, gives you 15 registers (a0-a7, t0-t6) that a function can use without restoring them, which very often reduces the memory load of leaf functions (which is most functions, as pointed out above) to only the absolute essentials of explicitly accessing arrays or pointer structures in the actual source code.
The easiest way to preserve the link register is to push it on the stack, do your stuff (including calling other subroutines), pop it back off the stack and return to it. So in more RISC-like architectures you end up doing the same as CISC does, just doing it step by step instead of with a magical single instruction. Yes, you can avoid putting it on the stack if your subroutine never calls other subroutines, but all practical programs do that at some point.
Absolutely! If you do need to, which isn't actually all that often, dynamically. It's true that if you have a link register then you need extra instructions to save and restore the link register, but an advantage is you can do those any time you like, when it's convenient, for example re-loading the link register a few instructions before the actual return, so that you already have it when you need it. You can also avoid the save and re-load on any execution paths through the function that don't actually call another function.
It's a bit of an optimization to skip stacking the return address in small simple functions, but it's even more of an optimization to simply tell the compiler to inline the function rather than calling it, and CISC-like systems that offer the magical stack instruction for returning can also be made to call and return in the RISC-like way of using a link register.
Sure, but you have to be careful with inlining. On small systems total code size is often critical, and even on bigger ones excessive inlining destroys the effectiveness of the instruction cache.
Learning that avoiding the "magical" CISC instructions on VAX and IBM 370 actually made programs up to two or three times faster on the same machine was exactly how RISC got started :-)
You can write programs without using a stack at all, but then again you can also compute everything with only a single accumulator register and a few bitwise instructions. But just because it can be done in a simpler way does not automatically mean it's a good way of doing it. And a stack is one of those things that makes life a lot easier.
I would never suggest that! The stack is a fantastic invention, and the advent of instruction sets that made it natural and easy to use was one of the biggest advances in computer instruction sets. Making PIC (position-independent code) natural and simple (which x86 *still* doesn't do!) is another. And the development of first index registers (full absolute memory address in the code, and a small array index in a register) and then base registers (full memory address in a register, and a small offset in the code) is the third.
The trick is in finding the minimal instruction set that gives all the nice features you want (recursion, PIC, stack, heap, dynamic linking, virtual functions, switch statements) with maximum performance, minimum energy usage, and minimum code size.
-
Yep all valid points.
What I was trying to say is that it's very hard to have no stack at all in modern programming (to counter the argument that C heavily relies on the stack).
Yes, there are cases where you can optimize away the stack and even make things faster, but that doesn't cover all the use cases, so a stack is still needed for some of them. In some cases the stack reduces the memory footprint, since it provides a spot for functions to store local variables without statically allocating them somewhere (which would also break recursion). Since it's so prevalent, there is almost always a dedicated stack pointer register in CPUs, even if they are designed to avoid using the stack for certain things.
There are more arguments to be made against having a heap, but again, for some use cases it's a solution that works well.
I do have high hopes for RISC-V, since it's one of the very few architectures that does not drag along all sorts of legacy crap, so it can be designed from the ground up to fit our modern computing needs and efficiently use the abundance of transistors. On top of that, it's not chained down by heavy licensing fees from its creator.
-
Yes, there are cases where you can optimize away the stack and even make things faster, but that doesn't cover all the use cases, so a stack is still needed for some of them. In some cases the stack reduces the memory footprint, since it provides a spot for functions to store local variables without statically allocating them somewhere (which would also break recursion).
Sure, LIFO is a very common allocation pattern, and a stack is perfect for managing that.
Since it's so prevalent, there is almost always a dedicated stack pointer register in CPUs, even if they are designed to avoid using the stack for certain things.
Interestingly, in the base RISC-V instruction set (32 bit opcodes) there is no stack pointer and no link register. All registers (except x0) are identical and you can use any of them as a stack pointer or link register as you wish. Have multiple stacks! If you only have a limited depth of subroutine calling and no recursion (as some are advocating) then you can use a different link register for every level of function -- no need to save and restore them! And keep a few for function calling in interrupts too, if you want.
The optional compressed instructions do make assumptions about where the stack pointer and link register are. And some cores with branch prediction assume that x1 and x5 are used as link registers.
I do have high hopes for RISC-V, since it's one of the very few architectures that does not drag along all sorts of legacy crap, so it can be designed from the ground up to fit our modern computing needs and efficiently use the abundance of transistors. On top of that, it's not chained down by heavy licensing fees from its creator.
I'm kinda hoping it will be successful too :-)
-
You can't do without a stack unless you want your code to be a huge spaghetti monster.
What exactly would "no stack at all" mean? Try to imagine any program that's more than a dozen lines without any stack. That would basically mean no call/return (or just one level) and no dynamically allocated local variables (only fixed locations in memory, or registers only). Not something I'd want to deal with for any serious work. That's seriously limited. I'll leave that to whoever likes it. ;D
Now the ability to handle several "stacks" concurrently is interesting (but not necessarily any "safer", as it adds complexity.)
One simple approach that has been used on some MCUs/CPUs is to have a dedicated return stack. It basically prevents many of the potential stack corruption issues that lead to execution of unwanted code. I think mixing return and data stacks in a single stack was a serious mistake.
-
If you only have a limited depth of subroutine calling and no recursion (as some are advocating) then you can use a different link register for every level of function -- no need to save and restore them!
Oh yes, and each function needs to be written to work at one particular call depth and use the right link register. Screw something up and you are returning to a wrong address. Gonna be fun :)
-
You might also ask exactly what "having a stack" really means.
We mentioned ARM, MIPS, and RISC-V as not using the stack for the bottom level of subroutine linkage. But they each have general purpose registers with indexed addressing, so it's trivial to implement one or more stacks, and the ABI specifications call for doing so. Usually there is a designated SP register (even though other registers can be used as stacks, nearly as easily.) Being able to use the SP as an index register makes stack frames for local variables much easier.
At the other end of the scale, you have chips like the 8bit PIC microcontrollers, with a limited depth stack implemented in hardware, that is good ONLY for storing the return address of "call" instructions (and Interrupt return addresses.) Really annoying, if you're used to having better. And yet, plenty of useful products have been created using those...
I'm pretty sure I've used compilers (ie for Fortran) that didn't use stacks. Each subroutine had an area to store the return PC and any registers that needed to be preserved, and another area for input parameters. It works fine if you don't need recursion, and all of your memory is RAM, anyway...
-
If you only have a limited depth of subroutine calling and no recursion (as some are advocating) then you can use a different link register for every level of function -- no need to save and restore them!
Oh yes, and each function needs to be written to work at one particular call depth and use the right link register. Screw something up and you are returning to a wrong address. Gonna be fun :)
Of course this needs care! Basically calling each function needs to use Link Register #N where N is the maximum of the LR numbers for every function that calls it, plus 1. Trivial for a compiler to do.
It's far easier to just use a stack, of course.
The RISC-V ABI defines register x5 (aka t0) as a secondary Link Register. It is used for certain simple runtime library functions to not disturb the standard Link Register. One example is the subroutines that are used by gcc and clang if you give the -msave-restore option to save registers on the stack, or to restore them and return, instead of having a store-multiple instruction. It can also be used by things such as transcendental functions, or code to emulate multiply or divide, although it is not used for that at present.
-
I'm pretty sure I've used compilers (ie for Fortran) that didn't use stacks. Each subroutine had an area to store the return PC and any registers that needed to be preserved, and another area for input parameters. It works fine if you don't need recursion, and all of your memory is RAM, anyway...
In the far distant past (introduced in '65), the IBM1130 didn't have enough registers to even contemplate having a stack so...
The first word of every subroutine was used to store the return address. There was a BSI instruction (Branch and Store IAR) that placed the return address in the first word of the called subroutine, and execution continued with the next word. As a single-threaded machine this worked OK, as long as the interrupt handlers weren't re-entered, and that couldn't happen based on the hardware design of the interrupt system.
On exit from a subroutine, the code issued a BSC <indirect> instruction (Branch or Skip on Condition, indirect) with no condition, pointing at the saved return address.
The first word of each subroutine, by convention, was coded as
ENTRY: DC *-*
Define Constant of the Program Counter minus the Program Counter (obviously 0) but it was a nice flag. It told you right up front that something was going to be stored there. It wasn't simply a constant of 0.
I guess you had to be there to actually like the scheme. Of course, once DEC machines came along with more registers and more addressing schemes, you might want to throw rocks at the 1130. But it all worked and I still use my FPGA implementation to this day.
And the Fortran compiler had 27 phases (passes) and the entire source and resulting executable had to fit in memory. Intermediate files weren't used because the compiler had to work from paper tape on some of the smaller machines.
-
I guess you had to be there to actually like the scheme. Of course, once DEC machines came along with more registers and more addressing schemes, you might want to throw rocks at the 1130. But it all worked and I still use my FPGA implementation to this day.
When I got to university in 1981 the 1st year students learned Pascal or BASIC (for the business students) on a PDP 11/34 with 256 KB of RAM, two 5 MB disk packs (one for OS and software, one for home directories for several hundred students), 22 VT100 and VT52 and Visual100 terminals, and two LA120 (dot matrix) "line printers". As I recall you were allowed to use 64 KB or so of disk while you were logged on but had to reduce it to maybe 8 KB to log off. Good thing disk blocks were small. Everyone ran the same shared copies of a tiny custom CLI shell and line editor and could use something like 8 KB each for heap/stack/globals. The Pascal compiler (NBS, then later OMSI) ran one copy at a time in a batch queue, but got a whole 64 KB to romp around in. That thing was tiny but tuned to within an inch of its life and actually ran quite acceptably with compile times of just a few seconds for our tiny student programs.
There was an IBM 1130 in the corner, gathering dust.
2nd year and later students got to use the PDP 11/70 (and later a couple of VAX 11/780s) which actually had several MB of core.
And the Fortran compiler had 27 phases (passes) and the entire source and resulting executable had to fit in memory. Intermediate files weren't used because the compiler had to work from paper tape on some of the smaller machines.
I think they told us 28, but anyway something ridiculous for what it did. The Pascal compilers we were using on the PDP-11 were single pass (or 1.5, including patching offsets for linking)
-
I think they told us 28, but anyway something ridiculous for what it did. The Pascal compilers we were using on the PDP-11 were single pass (or 1.5, including patching offsets for linking)
Completely off topic but...
I have a couple of PiDP11/70 consoles running 2.11BSD. These are emulations using a Raspberry Pi and simh as the simulator. They have 4MB of RAM and run considerably faster than the real hardware.
I never used DEC machines but I bought these two specifically for Pascal, C and Fortran 77 programming. Even though I wind up using vi as the editor (and I'll eventually make peace with that) it is just plain fun to write F77 code.
https://obsolescence.wixsite.com/obsolescence/pidp-11
Both of them also run a web server. When I get time I'll try to create a decent home page.
So, my grandson is starting a Differential Equations class. The first project, of course, is the repeated solution of an exponential growth problem. How fun is that for an old Fortran programmer? I eat that stuff for breakfast!
MATLAB is cool, Fortran under BSD on a PDP11/70 is cooler!
-
MATLAB is cool, Fortran under BSD on a PDP11/70 is cooler!
"enable-languages=c,c++,fortran" has always been the default set for gcc-{v4.1.2 ... v8.2.0} on everything regarding Gentoo/Linux :D
-
MATLAB is cool, Fortran under BSD on a PDP11/70 is cooler!
Somehow Fortran isn't quite Fortran unless its on punched cards. :)
-
MATLAB is cool, Fortran under BSD on a PDP11/70 is cooler!
"enable-languages=c,c++,fortran" has always been the default set for gcc-{v4.1.2 ... v8.2.0} on everything regarding Gentoo/Linux :D
And on quite a few other distributions as well. Funnily enough, Ada support often comes as a separate package. That probably comes from the fact that Fortran is still used a lot in academic circles, whereas Ada sees comparatively little use. (*sigh*)
My own config when I hand-build GCC is more like: "enable-languages=c,c++,ada" or even sometimes just: "enable-languages=c,ada" ;D
-
MATLAB is cool, Fortran under BSD on a PDP11/70 is cooler!
Somehow Fortran isn't quite Fortran unless its on punched cards. :)
In all upper case and limited to the 64-character set. There was a time... The Control Data 6400 had a 60-bit word and could hold 10 characters in a word. In the early years what it lacked was the 'compare move' unit, and running COBOL applications was a drag because it was difficult to extract or even manipulate strings. That kind of work was fast on an IBM 360 but truly grim on the 6400. OTOH, if you wanted high speed calculation with good precision, the 6400 was the way to go. A lot of the space program was done on the 6400 (and higher capability machines like the 7600).
Those were great days in computing. Today it's all about cell phone apps and twits and snaps. Sounds like a breakfast cereal.
-
Even though I wind up using vi as the editor
Can you still find "elle" ("Elle Looks Like Emacs")? It'll run on an 11/70 (did it, back in the day!)
-
Damn... http://towo.net/mined/ (http://towo.net/mined/)
-
I originally learned C to make video games, but it ended up being useful still when doing low level Linux development.
-
Damn... http://towo.net/mined/ (http://towo.net/mined/)
Mined does look pretty darn capable on this FreeBSD workstation. It's in Ports.
-
If you want to produce a compiled code, C is quite enough. It is certainly far from perfect, but
- it is very flexible - you can do almost everything that could be done with assembler
- it is very simple - the description of the common syntax is only a few pages. If you don't use the heap, you typically get by with even a subset of the syntax - you could learn it just by looking at example code (I guess many people actually do this). Sometimes the syntax subset actually used in a project is so small that it doesn't even provide any advantages over assembler except portability.
- it is very widespread. Practically any platform supports C.
Therefore, when it comes to producing compiled code, I don't think there's a need for anything else except C. I don't consider any advantages of other languages to be sufficient to justify the associated bloat.
There will always be attempts to create something new (such as C#) which may even become very popular (a sales force has a strong influence on a weak mind), but I don't think they're going to be better than C.
-
Or the syntax of declarations, it's horrible. Just recently I thought I was very smart because I knew how to declare a function returning int[2], but it didn't work. "Expected something something", very helpful, thank you. I spent a minute trying different combinations of parentheses even though I was quite sure I got it right the first time. Finally gave up, went to the Internet, turns out that fixed size arrays are the only type which cannot actually be returned from a function :wtf:
Usually you would just pass in a buffer to be filled by the function, but you can return a fixed-size array if you encapsulate it in a struct:
#include <stdio.h>

typedef struct {
    int array[2];
} int2;

int2 func(void) {
    return (int2){1, 2};   /* compound literal wrapped in the struct */
}

int main(void) {
    printf("it is %d\r\n", func().array[1]);
    return 0;
}
In this case, the compiler will typically create a buffer on the caller's stack and pass a pointer to it into the function behind the scenes. This is much the same as if you created and passed the buffer to the function manually:
#include <stdio.h>

void func(int *array) {
    array[0] = 1;
    array[1] = 2;
}

int main(void) {
    int array[2];
    func(array);
    printf("it is %d\r\n", array[1]);
    return 0;
}
-
It is very widespread. Practically any platform supports C.
C does compile for i51, but i51 does not fully support C.
e.g. functions on i51 are not handled as functions due to the limitations imposed by the hw.
-
It is very widespread. Practically any platform supports C.
C does compile for i51, but i51 does not fully support C.
e.g. functions on i51 are not handled as functions due to the limitations imposed by the hw.
There are C compilers for the 8051 available commercially from Keil, or open source ones like SDCC.
There are even worse architectures for running C code; one of the very popular ones is the PIC16F family. There you get the same tiny addressable memory range that makes banking a must, but on top of it you get even fewer registers to work with, a weird hardware stack that's not part of RAM, and not even a multiply instruction. Yet there are lots of C compilers out there for it. Its manufacturer Microchip even claimed that the PIC16 family was not meant for high-level compilers and thus only offered an official assembler. But third parties got C compilers working on it, so later on even Microchip gave in and started offering an official C compiler for that family (they just bought a company that already made an 8-bit C compiler for it and rebranded it). I feel sorry for the people who had to develop this compiler, as it must have been hell.
EDIT: Oh, and of course you have to write your C code for these chips with the chip in mind. There are all sorts of issues if you try to straight-out compile a large complex C program made for a proper computer. There are some really bad architectural annoyances that even a good C compiler will struggle to work around.
-
It is very widespread. Practically any platform supports C.
C does compile for i51, but i51 does not fully support C.
e.g. functions on i51 are not handled as functions due to the limitations imposed by the hw.
There are C compilers for the 8051 available commercially from Keil or some open source ones like SSDC.
SDCC has limitations, but Keil and IAR have 8051 compilers that will correctly compile pretty much anything written in C. Some features of C produce huge and/or very slow code, but it works. 8051-cored devices get used in many applications where a complex protocol has to be thrown into the software mix. Typically protocol libraries are written in full-fat C, and they usually compile and run just fine with Keil or IAR. As long as the bloated, clunky binary code fits in the available memory and you don't need much speed, you are usually good to go.
-
The only C compiler I have used with the PIC16F family is CC5X
http://www.bknd.com/cc5x/ (http://www.bknd.com/cc5x/)
At the time SDCC didn't support mid-range PICs (10 years ago?) and GNU would never support them.
It's disheartening to look at the assembly code and see the amount of paging and banking going on. A huge percentage of the emitted code is involved with overcoming the architecture.
-
The only C compiler I have used with the PIC16F family is CC5X
http://www.bknd.com/cc5x/ (http://www.bknd.com/cc5x/)
At the time SDCC didn't support mid-range PICs (10 years ago?) and GNU would never support them.
It's disheartening to look at the assembly code and see the amount of paging and banking going on. A huge percentage of the emitted code is involved with overcoming the architecture.
The best C compiler for PIC and the 8051 was the HI-TECH compiler. It was built specifically with crude architectures in mind, and served them well. Microchip bought HI-TECH, closed down the 8051 version, and made the PIC version the basis of their current compiler offering for PIC.
-
I also mainly used the HI-TECH C compiler for the PIC16F family and it worked pretty well for me. I never really looked into the assembler it generated, but knowing how annoying the architecture is, I'd imagine it's a mess.
Technically you can make a C compiler for anything that is a Turing-complete machine and has enough memory for the job. It might not be pretty when you are missing things we consider essential in modern CPUs, but it will work... usually slowly.
-
turns out that fixed size arrays are the only type which cannot actually be returned from a function :wtf:
I'm sorry to say this, but this is very basic C here. As said above, a possible workaround is to put the array inside a struct and return that.
One very simple way of at least "trying" what you wanted to do, instead of having to look up the appropriate syntax, would have been to define an array type and use that as the return type. Like so:
typedef int MyArray_t[N];
(...)
MyArray_t SomeFunction(...) ...
Of course, then you get an appropriate error instead of a cryptic syntax error: "error: 'SomeFunction' declared as function returning an array".
That is something I also recommend for function pointer types, for instance: they are much easier to deal with through a typedef.
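For illustration, here is a minimal sketch of that idea (the names are made up, not from any real code base):

#include <stdio.h>

/* Hypothetical callback type: takes an int, returns an int. */
typedef int (*Transform_t)(int);

static int doubler(int x) { return 2 * x; }

/* Without the typedef this parameter would read as: int (*fn)(int) */
static int apply(Transform_t fn, int value) {
    return fn(value);
}

int main(void) {
    printf("%d\n", apply(doubler, 21));   /* prints 42 */
    return 0;
}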
And that said, returning an "array" (or any struct that is big enough for that matter), is pretty inefficient due to how C handles return values, so that's probably the main reason why the authors thought it was silly.
The associated C rule that makes it consistent is that you can't assign an array to an array. The corresponding error will be: "error: assignment to expression with array type".
So returning an array from a function wouldn't really make sense. Arrays are a special beast in C, designed merely for convenience of simple allocation. The "value" of an array expression in C is not the array of items itself: in most contexts it decays to a pointer to its first element. The main exception is sizeof(), which returns the size of the whole array. Yes, it seems a bit inconsistent, but again, this is very basic C.
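A tiny sketch of that decay behaviour and the sizeof() exception:

#include <stdio.h>

int main(void) {
    int a[4] = {1, 2, 3, 4};
    int *p = a;                  /* the array "decays" to a pointer here */

    printf("%zu\n", sizeof a);   /* whole array: 4 * sizeof(int) */
    printf("%zu\n", sizeof p);   /* just a pointer */

    /* a = p;  -- would not compile: "assignment to expression with array type" */
    return 0;
}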
Now why did the authors of C think returning a struct was interesting, and not an array? Well, in as much as they decided, again, that structs could be used as values. That was probably meant to be used with relatively small structs, as a convenience to avoid passing many parameters to a function instead of one, or likewise to return several scalar values from a function instead of just one. Arrays have been thought of as a convenience from the start; structures have always been a "proper" data structure.
And there is one of my biggest gripes, no offense meant to anyone, just something that I've seen often enough to think it's not just a random fact: C is very often seen as so "simple" that many people don't actually care to learn the language properly. An astounding number of C programmers actually *don't* know the language; they just appear to know enough to write code. And this is one of the biggest problems with C: not the language itself (even though there are things to be said there), but the way it's taught (or, most often, self-taught)... It's simple enough that many don't learn it right, yet flexible enough that you may think you don't *need* to learn it all, and then you can shoot yourself in the foot with it. Some languages are even simpler than C, such as Pascal or, even more so, Oberon, but they are so rigid that people have no choice but to learn them right. So even though those languages are too rigid for my taste, I definitely think Wirth was right about the learning factor.
And so to finally answer the OP's question: "does anybody learn C anymore?" Well, the right question, to me, would look more like: how many have actually learned C properly, and why would you think things had gotten any better? Most C courses are a complete disaster.
And obviously, the part of the question dealing with "is C still useful today", the answer is a gigantic YES.
-
So returning an array from a function wouldn't really make sense.
Yup. Precisely.
-
but Kiel and IAR have 8051 compilers that will correctly compile pretty much anything written in C.
It's not the compiler, it's the CPU that is limited, so "smart compilers" need to use a couple of tricks to *try* to work around the limitations, but if you don't take care, you will get surprises at run time.
In "regular" C, automatic variables are usually allocated on the stack, but one of the hardware limitations in 8051 is that it doesn't have any stack worth speaking of, so another approach is needed.
Smart compilers do use an "overlaying" approach, which is basically a static analysis of the calls-tree, and this solves the problem for a program under the strict hypotheses that:
- it doesn't use recursion
- it doesn't have reentrant calls
- it doesn't call the same function from both the main loop and an interrupt (this is usually catastrophic)
- it doesn't use function pointers, unless the compiler is guaranteed to always know which function a function pointer points to at any one point in the code
So basically you have to avoid all the cases where the compiler can't know which function a function pointer points to, because in those cases it can't compute a perfect call tree to see which variable scopes are alive in relation to which others.
Besides, the quality of that call tree is very important, so a compiler MUST be really smart, since the call tree controls how well the compiler+linker can manage to reuse memory between different function calls.
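To see why the no-recursion rule matters, here is a hand-written sketch (not any particular compiler's output): with overlaying, a local effectively lives at one fixed address chosen by the call-tree analysis, so a recursive call clobbers the caller's copy. Written out explicitly in plain C, this prints 1 instead of 120:

#include <stdio.h>

/* What an overlaying compiler effectively does: the "local" lives at a
 * fixed address chosen by the call-tree analysis, not on a stack. */
static int fact_local_n;   /* overlaid storage for 'n' */

int fact(int n) {
    fact_local_n = n;
    if (fact_local_n <= 1)
        return 1;
    /* The recursive call reuses the same fixed slot, overwriting our 'n'. */
    int r = fact(fact_local_n - 1);
    return fact_local_n * r;   /* fact_local_n is no longer the original n */
}

int main(void) {
    printf("%d\n", fact(5));   /* 1 with overlaid locals, not 120 */
    return 0;
}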
-
Smart compilers do use an "overlaying" approach, which is basically a static analysis of the calls-tree, and this solves the problem for a program under the strict hypotheses that:
(...)
I remember using SDCC quite a while ago with Cypress FX1/2 targets. Yep there were several limitations like this.
-
(http://www.downthebunker.com/chunk_of/stuff/public/boards/board-8051-elisa1.jpg)
(http://www.downthebunker.com/chunk_of/stuff/public/boards/board-8051-elisa2.jpg)
I made this board between 1995 and 1997, but I have always programmed it in Basic11 and assembly, thanks to the Basic routines that you can call as you wish (a sort of BIOS, if you remember how we programmed under DOS).
The Intel microcontroller book suggested a hardware trick to read/write/fetch from the same RAM. This was a good idea because the 8051 has two separate address spaces, one for code and one for data. I didn't have any good C compiler, and SmallC51 was too limited for my programming style. I learned C on the 68000 around 2001, before university, and I played with a commercial C compiler released cheaply for DOS.
Then, years later, I started university and bought a couple of other compilers: Turbo Pascal and Turbo C, both released by Borland for the DOS platform, both required by my university for the labs.
And it was fine, and a good experience. I regret just one thing: spending two years at university following my teachers' coding style, which turned out to be pretty crap when I started working in avionics. So I had to learn C again, from scratch.
-
And that said, returning an "array" (or any struct that is big enough for that matter), is pretty inefficient due to how C handles return values, so that's probably the main reason why the authors thought it was silly.
This is clearly stated in K&RC 2nd. ed. right in the introduction (page 2):
C provides no operations to deal directly with composite objects such as
character strings, sets, lists, or arrays. There are no operations that manipulate
an entire array or string, although structures may be copied as a unit.
And so to finally answer the OP's question: "does anybody learn C anymore?" Well, the right question, to me, would look more like: how many have actually learned C properly, and why would you think things had gotten any better? Most C courses are a complete disaster.
And obviously, the part of the question dealing with "is C still useful today", the answer is a gigantic YES.
+1000 :-+
-
Now why did the authors of C think returning a struct was interesting, and not an array: well as much as they decided, again, that structs could be used as values. That was probably meant to be used with relatively small structs as a convenience to avoid passing many parameters to functions instead of one, or likewise returning several scalar values from functions instead of just one. Arrays have been thought of as a convenience from the start, structures have always been a "proper" data structure.
I seem to recall that K&R C only allowed passing or returning a struct by value if it was the size of an int (i.e. register) or smaller. e.g. on a 32 bit machine you could have two shorts, or a short and two chars or something like that.
I think it was probably only with the introduction of ANSI C that you could (portably) pass or return larger structs. But you usually shouldn't -- if you want to return a struct it's almost always better for the caller to allocate a local variable for the result and then pass a pointer to it as an extra argument. Many other languages in fact do this for you automatically.
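As a sketch of that caller-allocates pattern (the names here are made up for illustration):

#include <stdio.h>

typedef struct {
    double x, y;
} Point;

/* The caller owns the storage; the function just fills it in. */
static void make_point(Point *out, double x, double y) {
    out->x = x;
    out->y = y;
}

int main(void) {
    Point p;                      /* allocated by the caller */
    make_point(&p, 1.0, 2.0);
    printf("%f %f\n", p.x, p.y);
    return 0;
}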
-
In "regular" C, automatic variables are usually allocated on the stack, but one of the hardware limitations in 8051 is that it doesn't have any stack worth speaking of, so another approach is needed.
That's only true on machines that are just *slightly* less limited than the 8051.
Ever since MIPS and SPARC and other RISC processors with 32 registers appeared 35 years ago, along with the advent of ABIs with a good number of both caller-save and callee-save registers and good register allocation algorithms, the vast vast majority of automatic variables never see the stack. Leaf functions (where most of the work is done) usually don't touch the stack at all, and non-leaf functions might use a lot of automatic variables but seldom save more than two or three registers to the stack.
The same is mostly true even with 16 register machines such as 32 bit ARM and 64 bit x86. It *could* have been true with 68k and VAX, but ABIs and register allocation philosophy and algorithms hadn't advanced enough at the time and they just used their quite plentiful registers as a cache for stack-based variables, just as 16 and 32 bit x86 are basically forced to by lack of registers.
If I was writing a compiler for a limited machine such as the 8051 or 6502 or things of that ilk (8080/z80 too) I would allocate 32 ints worth of global memory as RISC-style registers (preferably addressable using 8 bit or smaller addresses in each instruction e.g. 6502 "zero page"), assigned in a similar way to AArch64 or RISC-V registers (arguments / temps / saved). At function entry I'd copy (in non-leaf functions) only the return address and necessary saved registers to another memory area, managed as a stack. The only access that is necessary there is raw push/pop or an ability to store/load at an offset from a pointer (the same as fields in a struct allocated at a non-static address). On 8080/z80 I'd probably actually use push/pop (or maybe (IX+0xNN) (or IY)), and on 6502 (zp),y addressing from a pseudo stack pointer. On the 8051 you'll obviously need external RAM to run reasonable sized C programs, and will use DPTR to save "register" contents to a large stack there at the start of a non-leaf function, and restore them at the end.
-
I seem to recall that K&R C only allowed passing or returning a struct by value if it was the size of an int (i.e. register) or smaller. e.g. on a 32 bit machine you could have two shorts, or a short and two chars or something like that.
I think it was probably only with the introduction of ANSI C that you could (portably) pass or return larger structs.
That might be the only reasonable explanation for this absurdity.
But you usually shouldn't -- if you want to return a struct it's almost always better for the caller to allocate a local variable for the result and then pass a pointer to it as an extra argument. Many other languages in fact do this for you automatically.
C will do it automatically for you if you simply return a struct ;)
-
I seem to recall that K&R C only allowed passing or returning a struct by value if it was the size of an int (i.e. register) or smaller. e.g. on a 32 bit machine you could have two shorts, or a short and two chars or something like that.
I think it was probably only with the introduction of ANSI C that you could (portably) pass or return larger structs.
K&RC 1st edition (1978)
There are a number of restrictions on C structures. The essential rules are that the only operations that you can perform on a structure are take its address with &, and access one of its members. This implies that structures may not be assigned to or copied as a unit, and that they can not be passed to or returned from functions. (These restrictions will be removed in forthcoming versions.) Pointers to structures do not suffer these limitations, however, so structures and functions do work together comfortably. Finally, automatic structures, like automatic arrays, cannot be initialized; only external or static structures can.
K&RC 2nd edition (1989)
The only legal operations on a structure are copying it or assigning to it as a unit, taking its address with &, and accessing its members. Copy and assignment include passing arguments to functions and returning values from functions as well. Structures may not be compared. A structure may be initialized by a list of constant member values; an automatic structure may also be initialized by an assignment.
Let us investigate structures by writing some functions to manipulate points and rectangles. There are at least three possible approaches: pass components separately, pass an entire structure, or pass a pointer to it. Each has its good points and bad points.
That might be the only reasonable explanation for this absurdity.
"C is quirky, flawed, and an enormous success."
-- Dennis Ritchie
-
But you usually shouldn't -- if you want to return a struct it's almost always better for the caller to allocate a local variable for the result and then pass a pointer to it as an extra argument. Many other languages in fact do this for you automatically.
C will do it automatically for you if you simply return a struct ;)
That depends on the C compiler (or the platform ABI if there is a standard one), and the size of the struct.
For example in the standard RISC-V System V Unix ABI structs up to two pointers in size are passed or returned in registers, larger structs are passed by reference.
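As a rough source-level sketch of what that means (the exact cut-off is ABI-specific; the two-registers-worth rule is the RISC-V convention quoted above):

#include <stdint.h>
#include <stdio.h>

/* Small: two registers' worth, so it would typically come back in registers. */
typedef struct { uint64_t lo, hi; } pair_t;

/* Large: typically returned through a hidden pointer to caller-allocated space. */
typedef struct { uint64_t w[8]; } big_t;

static pair_t make_pair(void) { return (pair_t){1, 2}; }

static big_t make_big(void) {
    big_t b = {{0}};
    b.w[0] = 42;
    return b;   /* usually compiled as: fill in the caller-provided buffer */
}

int main(void) {
    pair_t p = make_pair();
    big_t  b = make_big();
    printf("%llu %llu %llu\n",
           (unsigned long long)p.lo,
           (unsigned long long)p.hi,
           (unsigned long long)b.w[0]);
    return 0;
}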
-
I seem to recall that K&R C only allowed passing or returning a struct by value if it was the size of an int (i.e. register) or smaller. e.g. on a 32 bit machine you could have two shorts, or a short and two chars or something like that.
I think it was probably only with the introduction of ANSI C that you could (portably) pass or return larger structs.
K&RC 1st edition (1978)
There are a number of restrictions on C structures. The essential rules
are that the only operations that you can perform on a structure are take its
address with c, and access one of its members. This implies that structures
may not be assigned to or copied as a unit, and that they can not be passed
to or returned from functions. (These restrictions will be removed in forth-
coming versions.) Pointers to structures do not suffer these limitations,
however, so structures and functions do work together comfortably. Finally,
automatic structures, like automatic arrays, cannot be initialized; only exter-
nal or static structures can.
K&RC 2nd edition (1989)
The only legal operations on a structure are copying it or assigning to it as a
unit, taking its address with &, and accessing its members. Copy and assign-
ment include passing arguments to functions and returning values from func-
tions as well. Structures may not be compared. A structure may be initialized
by a list of constant member values; an automatic structure may also be initial-
ized by an assignment.
Note that the first version of ANSI C was ratified in 1989 (and in 1990 by the ISO).
-
C is not the only language that developed into a few different "dialects" over the years; quite a lot of old languages had the same happen to them. For example, Verilog had some pretty major features added over the years, things like support for signed numbers (yes, it didn't have that way back when).
-
That depends on the C compiler (or the platform ABI if there is a standard one), and the size of the struct.
For example in the standard RISC-V System V Unix ABI structs up to two pointers in size are passed or returned in registers, larger structs are passed by reference.
But look, that's even better than creating an on-stack temporary :)
I am really not that convinced that returning structs by value in C is actually a bad thing. I mean, it is bad for all the known reasons, but not worse than the alternatives. Bummer that it doesn't work on arrays, but at least we can understand the historic reason.
edit
Under the condition that your compiler is smart enough to elide the copy in the return statement.
-
I think it was probably only with the introduction of ANSI C that you could (portably) pass or return larger structs. But you usually shouldn't -- if you want to return a struct it's almost always better for the caller to allocate a local variable for the result and then pass a pointer to it as an extra argument. Many other languages in fact do this for you automatically.
With SierraC 68K, the C compiler has no problem returning a large struct, since things are already allocated on the stack, so the compiler simply has to *adjust* it before the function returns.
------------------------------------------- caller space
push sizeof(return ans)
push sizeof(parameters vars)
push sizeof(local vars)
jsr (function)
------------------------------------------- called space
pop sizeof(local vars)
pop sizeof(parameters vars)
return
------------------------------------------- caller space
consume (ans)
pop sizeof(ans)
The problem arises with RISC machines, since they try to use registers rather than the stack, and forcing a large struct return would require a different approach, which turns out to be less performant than the "pass parameters via registers" and "return the answer via a couple of registers" approach.
------------------------------------------- caller space
put parameters into register { 1, 2, 3, 4 }
jsr (function)
------------------------------------------- called space
put ans into register { 1 }
return
------------------------------------------- caller space
-
This is an RPN evaluator, developed in 2017. It's built on lib_tokenizer, and it checks whether an expression is valid before trying to generate machine code for an "ijvm-modified" machine.
# defvar uint32_t a;
# defvar uint32_t b;
# defvar uint32_t c;
# defvar uint32_t d;
# defunc uint32_t Foo(uint32_t b, uint32_t c, uint32_t d);
In this example, it takes the expression "a + Foo( b, c, d )" and converts it into a "stack" of { variables, function-calls, operators }
# rpn a+Foo(b,c,d)
[rpn] kind3 3:1 token_StrictAlphaNum, type21
[a] kind3 3:2 token_StrictAlphaNum, type21
[+] kind2 3:3 token_Plus, type67
[Foo] kind3 3:4 token_StrictAlphaNum, type21
[(] kind2 3:5 token_OpenBracket, type84
[b] kind3 3:6 token_StrictAlphaNum, type21
[,] kind2 3:7 token_Comma, type9
[c] kind3 3:8 token_StrictAlphaNum, type21
[,] kind2 3:9 token_Comma, type9
[d] kind3 3:10 token_StrictAlphaNum, type21
[)] kind2 3:11 token_CloseBracket, type85
types analysis: PASSED
yards analysis: PASSED
stack analysis: PASSED
expr="a + Foo ( b , c , d ) " : PASSED
rpn=
{
0 1 cookie G0 [a] token_StrictAlphaNum, type21
4 1 f_cookie G1 [b] token_StrictAlphaNum, type21
6 1 f_cookie G1 [c] token_StrictAlphaNum, type21
8 1 f_cookie G1 [d] token_StrictAlphaNum, type21
2 -2 function G0 [Foo] token_StrictAlphaNum, type21
1 -1 operator G0 [+] token_Plus, type67
}
code gen, machine=ijvm-r2
---------------------------------------------- is_G=0
r1=a
---------------------------------------------- is_G=1
xxxx r2=b, push r2, SP=1
---------------------------------------------- is_G=1
xxxx r3=c, push r3, SP=2
---------------------------------------------- is_G=1
xxxx r4=d, push r4, SP=3
---------------------------------------------- is_G=0
push stack[4]=arg_n 3
push stack[5]=ret_addr
call Foo, SP=6
accessging arg0 stack[1]
accessging arg1 stack[2]
accessging arg2 stack[3]
f_ans: r2=stack[1]
---------------------------------------------- is_G=0
r1=r1 + r2
-----------------------------------
### SP=1
### PASSED
-----------------------------------
As you can see, the answer returned by the function can be of any size; even a super-large struct would be accepted if it passed the "(strong) types check", and at the low level there is absolutely no performance problem with this approach on linear stack machines, because that is the natural way their hardware operates.
The above comes from the description of the "ijvm" (integer Java virtual) machine designed by Andrew S. Tanenbaum. I have modified it a bit, but it's still a didactic toy, fun and useful for my research.
-
Note that the first version of ANSI C was ratified in 1989 (and in 1990 by the ISO).
Yes. I know. It's written on the book cover.
(https://images-na.ssl-images-amazon.com/images/I/51TGEPRTDNL._SX377_BO1,204,203,200_.jpg)
What I'm trying to show is that, before the standard, structures could not be returned from functions. A return is just a copy, and structures were not allowed to be copied.
After the standard, structures could be copied and therefore returned. Apparently there was never a limit on the size or depth of the structure, other than whatever the implementation imposes.
Arrays and strings could never be directly copied, before or after ANSI, except with functions like memcpy(), strcpy(), etc.
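A small sketch of that difference, struct assignment copying as a unit while arrays need memcpy():

#include <stdio.h>
#include <string.h>

typedef struct { int v[3]; } triple_t;

int main(void) {
    int a[3] = {1, 2, 3};
    int b[3];
    /* b = a;  -- not allowed: arrays cannot be assigned as a unit */
    memcpy(b, a, sizeof a);

    triple_t s = {{4, 5, 6}};
    triple_t t = s;              /* struct assignment copies the whole thing */

    printf("%d %d\n", b[2], t.v[2]);
    return 0;
}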
-
If I was writing a compiler for a limited machine such as the 8051
There are three memory models:
- Small: the total RAM is 128 bytes; all variables and parameter-passing segments are placed in the 8051's internal memory.
- Compact: the total RAM is 256 bytes off-chip plus 128 or 256 bytes on-chip; variables are stored in paged memory addressed via ports 0 and 2, using indirect addressing opcodes. On-chip registers are still used for locals and parameters.
- Large: the total RAM is up to 64 KB (full 16-bit access, physically multiplexed) plus 128 or 256 bytes on-chip; variables etc. are placed in external memory addressed via @DPTR. On-chip registers are still used for locals and parameters.
There are severe hw limitations for ISRs. Technically (especially on modern 51 cores, e.g. S390 and S400) you can force the code to be xdata/xcode only (large), but ...
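For what it's worth, 8051 compilers usually expose those memory spaces directly as non-standard storage qualifiers; a sketch in SDCC-style syntax (Keil spells them data/idata/pdata/xdata, without the underscores), which of course only builds with such a cross compiler:

/* Not standard C: 8051-specific storage qualifiers (SDCC spelling). */
__data  unsigned char fast_counter;      /* directly addressed internal RAM        */
__idata unsigned char more_internal;     /* upper 128 bytes, indirect addressing   */
__pdata unsigned char paged_buf[64];     /* "compact" model: one 256-byte ext page */
__xdata unsigned char big_buf[1024];     /* "large" model: external RAM via DPTR   */

void main(void)
{
    big_buf[0] = fast_counter;           /* compiler emits MOVX via DPTR for xdata */
    while (1) { }
}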
-
If I was writing a compiler for a limited machine such as the 8051 or 6502 or things of that ilk (8080/z80 too) I would allocate 32 ints worth of global memory as RISC-style registers (preferably addressable using 8 bit or smaller addresses in each instruction e.g. 6502 "zero page"), assigned in a similar way to AArch64 or RISC-V registers (arguments / temps / saved).
Having consumed all the available RAM for that purpose what would be your next step? :)
-
If I was writing a compiler for a limited machine such as the 8051
There are three memory models:
- Small: the total RAM is 128 bytes; all variables and parameter-passing segments are placed in the 8051's internal memory.
- Compact: the total RAM is 256 bytes off-chip plus 128 or 256 bytes on-chip; variables are stored in paged memory addressed via ports 0 and 2, using indirect addressing opcodes. On-chip registers are still used for locals and parameters.
- Large: the total RAM is up to 64 KB (full 16-bit access, physically multiplexed) plus 128 or 256 bytes on-chip; variables etc. are placed in external memory addressed via @DPTR. On-chip registers are still used for locals and parameters.
There are severe hw limitations for ISRs. Technically (especially on modern 51 cores, e.g. S390 and S400) you can force the code to be xdata/xcode only (large), but ...
Yes, I remember all that now. ;D
A PITA. But hey, it was workable.
-
C deserves credit for being the first portable macro assembler and enabling probably the first portable operating system. It also did away with all those stupid special statements and commands which plague old imperative languages, replacing them with plain functions, written in C. You can implement full C standard library in C, try that with Pascal's writeln. The languages you listed really are DSLs in comparison, I would never call COBOL or FORTRAN a "general purpose" language. Maybe Algol, if it's true what they say about similarities to Pascal, particularly the kinds of Pascal that have been adapted to low-level programming by adding pointers and whatnot.
IDK, FreePascal is programmed with FreePascal.
I have learned and forgotten C in the past 10 years. I really don't like it, in the sense that {} are a PITA to type on my local keyboard (I would need a US/UK keyboard just for C), and the case sensitivity is just plain stupid.
-
C deserves credit for being the first portable macro assembler and enabling probably the first portable operating system. It also did away with all those stupid special statements and commands which plague old imperative languages, replacing them with plain functions, written in C. You can implement full C standard library in C, try that with Pascal's writeln. The languages you listed really are DSLs in comparison, I would never call COBOL or FORTRAN a "general purpose" language. Maybe Algol, if it's true what they say about similarities to Pascal, particularly the kinds of Pascal that have been adapted to low-level programming by adding pointers and whatnot.
IDK, FreePascal is programmed with FreePascal.
I have learned and forgotten C in the past 10 years. I really don't like it, in the sense that {} are a PITA to type on my local keyboard (I would need a US/UK keyboard just for C), and the case sensitivity is just plain stupid.
The Pascal variants that became widely used are not really Pascal. They were extended to the point where they were more C like, and could actually be used to write their own run time library.
-
C deserves credit for being the first portable macro assembler and enabling probably the first portable operating system. It also did away with all those stupid special statements and commands which plague old imperative languages, replacing them with plain functions, written in C. You can implement full C standard library in C, try that with Pascal's writeln. The languages you listed really are DSLs in comparison, I would never call COBOL or FORTRAN a "general purpose" language. Maybe Algol, if it's true what they say about similarities to Pascal, particularly the kinds of Pascal that have been adapted to low-level programming by adding pointers and whatnot.
IDK, FreePascal is programmed with FreePascal.
I have learned and forgotten C in the past 10 years. I really don't like it, in the sense that {} are a PITA to type on my local keyboard (I would need a US/UK keyboard just for C), and the case sensitivity is just plain stupid.
The Pascal variants that became widely used are not really Pascal. They were extended to the point where they were more C like, and could actually be used to write their own run time library.
The C that came to be widely known is itself widely extended and doesn't really represent the hack job it first was. Besides, real PROGRAMMERS use boolean logic.
-
IDK, FreePascal is programmed with FreePascal.
UCSD Pascal was written in UCSD Pascal. The only machine-dependent code, written in assembly language, was the interpreter. This wasn't a lot of code and it was pretty easy to port to different machines. Circa '77... Consider that the Altair 8800 was introduced in '75 (I bought mine in early '76). I didn't play with UCSD Pascal until '80, but it was an important step on the way to modern machines. The IBM PC was introduced in '81. It was a slug compared to the 6 MHz Z80 machines running CP/M.
-
Besides real PROGRAMMERS use boolean logic.
And wire-wrap
-
C deserves credit for being the first portable macro assembler and enabling probably the first portable operating system. It also did away with all those stupid special statements and commands which plague old imperative languages, replacing them with plain functions, written in C. You can implement full C standard library in C, try that with Pascal's writeln. The languages you listed really are DSLs in comparison, I would never call COBOL or FORTRAN a "general purpose" language. Maybe Algol, if it's true what they say about similarities to Pascal, particularly the kinds of Pascal that have been adapted to low-level programming by adding pointers and whatnot.
IDK, FreePascal is programmed with FreePascal.
I have learned and forgotten C in the past 10 years. I really don't like it, in the sense that {} are a PITA to type on my local keyboard (I would need a US/UK keyboard just for C), and the case sensitivity is just plain stupid.
The Pascal variants that became widely used are not really Pascal. They were extended to the point where they were more C like, and could actually be used to write their own run time library.
The C that came to be widely known is itself widely extended and doesn't really represent the hack job it first was. :-DD
The original C was fully capable of being used to write its own run-time library, because it was very low-level focussed from the start. It was the higher-level stuff that was added over time. Standard Pascal always lacked low-level features: you can't even specify a variable's width, or do basic things like 'and' and 'or' on them. Turbo Pascal, and other popular Pascals, added those things. MS Pascal added far less, and never got very far in the market.
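In C terms, the kind of low-level thing being referred to is a one-liner (a trivial sketch):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t status = 0x5A;           /* an explicitly 8-bit variable */
    uint8_t masked = status & 0x0F;  /* bitwise AND */
    uint8_t set    = status | 0x80;  /* bitwise OR  */
    printf("%02X %02X\n", (unsigned)masked, (unsigned)set);
    return 0;
}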
-
IDK, FreePascal is programmed with FreePascal.
UCSD Pascal was written in UCSD Pascal. The only machine-dependent code, written in assembly language, was the interpreter. This wasn't a lot of code and it was pretty easy to port to different machines. Circa '77... Consider that the Altair 8800 was introduced in '75 (I bought mine in early '76). I didn't play with UCSD Pascal until '80, but it was an important step on the way to modern machines. The IBM PC was introduced in '81. It was a slug compared to the 6 MHz Z80 machines running CP/M.
Almost any language can be used to write a compiler to compile itself. That is trivial. However, fully developing a language in that language means writing all the run time support in that language. In most languages this is somewhere between difficult and impossible.
-
IDK, FreePascal is programmed with FreePascal.
Yeah. That's the case for many other compilers for other languages actually. But they just had to be bootstrapped at some point. And guess how? Usually using C. ;D
I have learned and forgotten C in the past 10 years. I really don't like it, in the sense that {} are a PITA to type on my local keyboard (I would need a US/UK keyboard just for C), and the case sensitivity is just plain stupid.
Of course just a matter of personal preference regarding the brackets. I personally find them alright. My keyboard also requires a modifier key for them (AltGr), but that's still a lot less pain than writing "begin" and "end" everywhere. But just preference...
Regarding the case sensitivity, I can understand your feeling annoyed, coming from Pascal, but I do not agree with you. It actually enforces a better and more consistent coding style (how I hate looking at code in which the same identifiers are alternately written in either case, or a mix of both... and I've seen it in Pascal, VHDL code... awful!), and Wirth, the author of Pascal, would agree with me as he later introduced case-sensitivity in the derivative languages he designed, such as Oberon.
-
IDK, FreePascal is programmed with FreePascal.
Yeah. That's the case for many other compilers for other languages actually. But they just had to be bootstrapped at some point. And guess how? Usually using C. ;D
You forget that even the almighty C was bootstrapped, probably with assembler or maybe even with direct machine code.
... And someone mentioned my path from Pascal to C. That was not the case; I think it was more MS-DOS batch scripts with 4DOS extensions, which allowed crude dynamic graphics, and then QBasic. But there was no internet and no general-purpose languages (on the PC I had access to). Besides, Begin and End are as much stupidity as {} in the typing sense; for clarity I do prefer them because there is more contrast, but both suck.
I would prefer some (still not invented) hybrid general-purpose language which would span from the visual (i.e. Grafcet) all the way down to machine code.
Edit: GrafSet -> Grafcet, because of the French origin... PS: this hypothetical general-purpose hybrid language would be called Monkey.
-
I really don't like it, in the sense that {} are a PITA to type on my local keyboard (I would need a US/UK keyboard just for C), and the case sensitivity is just plain stupid.
The use of braces {} and the case sensitivity is a direct consequence of the early enthusiasm with which the developers at the time adopted the newly arrived ASCII code, replacing the Baudot code.
The Baudot code was case insensitive and only had parenthesis.
Notice that the same thing happens to Unix and its derivatives: all case sensitive and using all those new graphic symbols.
-
The Pascal variants that became widely used are not really Pascal. They were extended to the point where they were more C like, and could actually be used to write their own run time library.
That's what makes them great :) People who like Pascal syntax can program in Pascal without lacking the power of C.
-
Besides, Begin and End are as much stupidity as {} in the typing sense
I have implemented a lib_tokenizer(1); it handles them as tokens. The only thing I can say is... well, "{" consumes fewer cycles than "begin", simply because it's a shorter string :D
(1) Long story as to why. It's parametric, and the same code can be reused to serve a shell as well as an interpreter as well as a compiler. It's implemented as a library; before using it, you define and pass a "dictionary" telling the library about each token (is it an operator? a separator? something special?), and you can also pass callbacks (pointers to functions) for special cases, e.g. the library is able to "learn" how to recognize floating-point notation and treat it as a single token.
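Purely to illustrate the idea (a hypothetical sketch, not the actual lib_tokenizer API), such a dictionary-plus-callbacks setup might look like:

#include <stddef.h>
#include <stdio.h>

typedef enum { TK_OPERATOR, TK_SEPARATOR, TK_KEYWORD, TK_SPECIAL } token_kind_t;

typedef struct {
    const char   *text;    /* e.g. "{", "begin", "+" */
    token_kind_t  kind;
} dict_entry_t;

/* Callback for tokens the dictionary can't describe statically, e.g. a floating-point
 * literal starting at 'src'; returns characters consumed, or 0 if there is no match. */
typedef size_t (*special_matcher_t)(const char *src);

typedef struct {
    const dict_entry_t *entries;
    size_t              count;
    special_matcher_t   match_float;   /* may be NULL */
} tokenizer_dict_t;

/* The same tokenizer core would be configured with a different dictionary
 * depending on whether it serves a shell, an interpreter or a compiler. */
static const dict_entry_t c_like[] = {
    { "{", TK_SEPARATOR }, { "}", TK_SEPARATOR }, { "+", TK_OPERATOR },
};
static const tokenizer_dict_t c_dict = { c_like, 3, NULL };

int main(void) {
    printf("dictionary has %zu entries\n", c_dict.count);
    return 0;
}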
-
If I was writing a compiler for a limited machine such as the 8051
There are three memory models:
- Small: the total RAM is 128 bytes; all variables and parameter-passing segments are placed in the 8051's internal memory.
- Compact: the total RAM is 256 bytes off-chip plus 128 or 256 bytes on-chip; variables are stored in paged memory addressed via ports 0 and 2, using indirect addressing opcodes. On-chip registers are still used for locals and parameters.
- Large: the total RAM is up to 64 KB (full 16-bit access, physically multiplexed) plus 128 or 256 bytes on-chip; variables etc. are placed in external memory addressed via @DPTR. On-chip registers are still used for locals and parameters.
I would have thought the last sentence in the paragraph you just quoted the first part of would have indicated that I was aware of what you just wrote?
"On the 8051 you'll obviously need external RAM to run reasonable sized C programs, and will use DPTR to save "register" contents to a large stack there at the start of a non-leaf function, and restore them at the end."
-
I really don't like it, in the sense that {} are a PITA to type on my local keyboard (I would need a US/UK keyboard just for C), and the case sensitivity is just plain stupid.
The use of braces {} and the case sensitivity is a direct consequence of the early enthusiasm with which the developers at the time adopted the newly arrived ASCII code, replacing the Baudot code.
The Baudot code was case insensitive and only had parenthesis.
Notice that the same thing happens to Unix and its derivatives: all case sensitive and using all those new graphic symbols.
Thanks, I had forgotten that the Baudot code existed. I also assume you mean 7-bit ASCII and not that 8-bit "ASCII" rubbish.
The machine ultimately doesn't know which kind of hieroglyph you used to spell out the functionality; case sensitivity was only justified by the limited memory of early systems (to avoid processing long strings) and to make hack jobs even quIckeR. This is naturally only my opinion.
-
If I was writing a compiler for a limited machine such as the 8051 or 6502 or things of that ilk (8080/z80 too) I would allocate 32 ints worth of global memory as RISC-style registers (preferably addressable using 8 bit or smaller addresses in each instruction e.g. 6502 "zero page"), assigned in a similar way to AArch64 or RISC-V registers (arguments / temps / saved).
Having consumed all the available RAM for that purpose what would be your next step? :)
How do you come to equate "32 ints" (i.e. 64 bytes) with "all the available RAM"?
Very few 6502 computers were sold with less than 16 KB of RAM, and the vast majority with 48 KB or 64 KB (e.g. "Commodore 64")
The 8051 supports 64 KB of external RAM. You're not going to be successful compiling reasonable-sized C programs for the 128 or 256 bytes of internal memory no matter what your code generation scheme is.
-
Besides, Begin and End are as much stupidity as {} in the typing sense
I have implemented a lib_tokenizer(1); it handles them as tokens. The only thing I can say is... well, "{" consumes fewer cycles than "begin", simply because it's a shorter string :D
(1) Long story as to why. It's parametric, and the same code can be reused to serve a shell as well as an interpreter as well as a compiler. It's implemented as a library; before using it, you define and pass a "dictionary" telling the library about each token (is it an operator? a separator? something special?), and you can also pass callbacks (pointers to functions) for special cases, e.g. the library is able to "learn" how to recognize floating-point notation and treat it as a single token.
That is just how RPL / Forth can be twisted around.
I once did something similar and replaced a reserved keyword in the Codesys environment [Structured Text (a Pascal/Ada derivative)], with a constant declaration IIRC. I was actually surprised that such a "hole" was left in it (given that the ST language's IEC definition is all about safety and clarity).
-
If I was writing a compiler for a limited machine such as the 8051 or 6502 or things of that ilk (8080/z80 too) I would allocate 32 ints worth of global memory as RISC-style registers (preferably addressable using 8 bit or smaller addresses in each instruction e.g. 6502 "zero page"), assigned in a similar way to AArch64 or RISC-V registers (arguments / temps / saved).
Having consumed all the available RAM for that purpose what would be your next step? :)
How do you come to equate "32 ints" (i.e. 64 bytes) with "all the available RAM"?
The 8051 supports 64 KB of external RAM. You're not going to be successful compiling reasonable-sized C programs for the 128 or 256 bytes of internal memory no matter what your code generation scheme is.
I exaggerated a little, but if you are serious about making a useful compiler for the 8051 you have to be tight about RAM usage. Half your RAM for working memory, some space for saving in ISRs, and you don't have much left. Many fairly complex programs need only a handful of variables. There are compilers for the 8051 which do a pretty good job of compiling C for applications like that, and have all variables sit in the 128 byte RAM area.
Nobody makes 8051 based devices with 64k of RAM. Some of the 8051 based USB peripheral controllers probably have the most RAM of any class of 8051 based devices. However, most recent 8051 based devices have maybe 1k to 4k of RAM. That gives you some room to move, but you still need to be tight about your RAM usage.
-
You forget that even the almighty C was bootstrapped, probably with assembler or maybe even with direct machine code.
... And someone mentioned my path from Pascal to C. That was not the case; I think it was more MS-DOS batch scripts with 4DOS extensions, which allowed crude dynamic graphics, and then QBasic. But there was no internet and no general-purpose languages (on the PC I had access to). Besides, Begin and End are as much stupidity as {} in the typing sense; for clarity I do prefer them because there is more contrast, but both suck.
I would prefer some (still not invented) hybrid general-purpose language which would span from the visual (i.e. Grafcet) all the way down to machine code.
Edit: GrafSet -> Grafcet, because of the French origin... PS: this hypothetical general-purpose hybrid language would be called Monkey.
C can bootstrap itself just fine.
Yes, you do need bootstrapping code when working in raw C to set up everything before it enters main(). But unlike in a lot of languages, this bootstrapping code can be put inside a *.c file and put through the compiler together with all the rest of the code. The only thing that makes that code special is that the linker is told to place it at the reset vector of the final program, so that when execution starts the CPU begins running at the start of the bootstrap code.
That being said, usually at least a part of the startup code tends to be written in assembly (using C before its environment is properly set up makes it very touchy about certain things, so it's more reliable to do it this way), but due to the low-level way C works it lives in synergy with assembler rather than sitting above it. The compiler will happily insert a block of assembler code anywhere in the middle of a C program if you ask it to. It will happily call assembler subroutines the same way it calls any other C function, and assembler subroutines can call any C function they like, including any C library. This free mixing of C and assembler is not just there to DIY-bootstrap it as some hack; it's an intended feature of C and is commonly used for hand-optimizing critical parts of code. This sprinkling of assembler is also used for writing operating systems in C, as it lets you do some funky, unusual things to a program, such as saving and restoring execution state in order to create a multitasking scheduler and similar OSy things.
This is why C does not need any external bootstrapping to get it going, it is its own bootstrap.
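As a very rough sketch of what such C-level startup code can look like on a bare-metal target (the symbol names below follow a common bare-metal linker-script convention; treat them as assumptions, not any specific vendor's names):

#include <stdint.h>

/* Symbols typically provided by the linker script (assumed names). */
extern uint32_t _sidata;            /* load address of .data in flash */
extern uint32_t _sdata, _edata;     /* run-time bounds of .data in RAM */
extern uint32_t _sbss, _ebss;       /* bounds of .bss */

extern int main(void);

/* Placed at the reset vector by the linker; runs before main(). */
void Reset_Handler(void)
{
    uint32_t *src = &_sidata;
    uint32_t *dst = &_sdata;

    while (dst < &_edata)           /* copy initialized data from flash to RAM */
        *dst++ = *src++;

    for (dst = &_sbss; dst < &_ebss; )
        *dst++ = 0;                 /* zero the .bss section */

    main();
    for (;;) { }                    /* trap here if main() ever returns */
}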
-
You forget that even the almighty C was bootstrapped, probably with assembler or maybe even with direct machine code.
... And someone mentioned my path from Pascal to C. That was not the case; I think it was more MS-DOS batch scripts with 4DOS extensions, which allowed crude dynamic graphics, and then QBasic. But there was no internet and no general-purpose languages (on the PC I had access to). Besides, Begin and End are as much stupidity as {} in the typing sense; for clarity I do prefer them because there is more contrast, but both suck.
I would prefer some (still not invented) hybrid general-purpose language which would span from the visual (i.e. Grafcet) all the way down to machine code.
Edit: GrafSet -> Grafcet, because of the French origin... PS: this hypothetical general-purpose hybrid language would be called Monkey.
C can bootstrap itself just fine.
Yes, you do need bootstrapping code when working in raw C to set up everything before it enters main(). But unlike in a lot of languages, this bootstrapping code can be put inside a *.c file and put through the compiler together with all the rest of the code. The only thing that makes that code special is that the linker is told to place it at the reset vector of the final program, so that when execution starts the CPU begins running at the start of the bootstrap code.
That being said, usually at least a part of the startup code tends to be written in assembly (using C before its environment is properly set up makes it very touchy about certain things, so it's more reliable to do it this way), but due to the low-level way C works it lives in synergy with assembler rather than sitting above it. The compiler will happily insert a block of assembler code anywhere in the middle of a C program if you ask it to. It will happily call assembler subroutines the same way it calls any other C function, and assembler subroutines can call any C function they like, including any C library. This free mixing of C and assembler is not just there to DIY-bootstrap it as some hack; it's an intended feature of C and is commonly used for hand-optimizing critical parts of code. This sprinkling of assembler is also used for writing operating systems in C, as it lets you do some funky, unusual things to a program, such as saving and restoring execution state in order to create a multitasking scheduler and similar OSy things.
This is why C does not need any external bootstrapping to get it going, it is its own bootstrap.
My point was, and I answered as much, that when the code monkeys at Bell Labs wrote the first C compiler, they had to bootstrap it, or write it (to solve the causality dilemma), with an existing language. Just as when you make a lathe: the axles are not made with a lathe, because such a tool doesn't exist yet.
FreePascal, as far as I know, is independent in just the way you describe for whichever C is in your example, give or take some inline assembly code here and there.
Programming languages are merely tools (for fools).
-
My point was, and I answered as much, that when the code monkeys at Bell Labs wrote the first C compiler, they had to bootstrap it, or write it (to solve the causality dilemma), with an existing language. Just as when you make a lathe: the axles are not made with a lathe, because such a tool doesn't exist yet.
FreePascal, as far as I know, is independent in just the way you describe for whichever C is in your example, give or take some inline assembly code here and there.
Programming languages are merely tools (for fools).
Sorry, I was talking more about the startup of a C program, where a C program requires its execution environment (stack, globals, heap...) to be set up by startup code before the actual C code runs. This startup code that bootstraps the compiled C program into execution can itself be written in C, so that the first instruction executed by the CPU actually comes from C code and is used to start the C code you wanted to run.
As for how the first working C compiler came into existence when there was no C compiler to compile it: well... it depends a bit on how you look at it. The way it happened is roughly this:
1) A compiler for a weird, cut-down, primordial version of C was written in NB ("new B", which was itself written in assembler) and compiled to machine code so it could be executed.
2) This compiler was then fed the "weird primordial C" source code of itself, and as a result generated an assembler translation of that source code.
3) That assembler source code was then fed through an assembler to turn it into machine code.
4) We now have machine code that can compile itself, by looping back to step 2.
Once this was working, the cycle from 2 to 4 was repeated in small steps, each step adding more features to C. Some steps supported multiple syntaxes for the same thing, so that a new syntax could be introduced yet still compile with the old compiler; once the new syntax was in the compiler, the old one could be deprecated and all traces of it removed from the source code. After enough of these steps you eventually get to something that would be considered real C code and will compile on modern compilers such as gcc (that first source code from step 1 won't compile on any of our existing C compilers, because it's too different from the C that today's compilers understand). So one way to interpret this is: "the first C compiler was written in a beta version of C".
But yes, if you go down the chain, the beta version of C had to be compiled first in NB. But NB had to be compiled from assembly code. But to compile the assembly code there had to be an assembler program, and that assembler had at some point to have been loaded into memory using switches, said switches being toggled by a human directly, or read from a set of holes in tape that at some point got punched by a human. So from that view even assembler needed to be bootstrapped by the ultimate bootstrapping tool: a human flipping bits in memory.
EDIT: Oh, and the first assembler program ever run on a given system was likely not written entirely in machine code toggled into memory by hand. Instead the engineer likely used another already-working computer to assemble it and then just transferred the machine code into its memory. Or, if they didn't have an already-working computer at their disposal, they would manually toggle a small tool into memory to make a crude "hex editor", so that they could easily manipulate memory and execute it using the computer itself. Perhaps they'd use the hex editor to add features to the hex editor itself, and once that was usable enough, start writing the assembler using the hex editor. The result would likely be a crappy, dumb assembler that would end up being used to write a smarter, better version of itself.
-
Well, machine language is sitting there in the ROM (or whichever non-volatile array of bits), put there by the designer(s) of the CPU, and assembly mnemonics are just how you point at it (when a computer is available), isn't that the case? That is the reason the assembly mnemonic sets differ from CPU to CPU (or MCU to MCU).
Isn't C often described as something like a machine-independent assembler, or words to that effect? Anyhow, in its original form (ANSI C) it is close to assembly, with only a minimum of complex structures (data types, branching, etc.) as far as I can remember. I still don't like the case sensitivity, the {}, and the religious twist around it.
- JMP for life!
-
The early K&R C was more cryptic than modern ANSI-C. I consider the steps from early ANSI C to C99 or other modern versions (not C++ and the like) to be relatively small in comparison.
The early K&R C in particular is in many respects quite close to the machine, and by some not really considered a high-level language. It is still quite a bit more than the usual assembler, especially as early assemblers had fewer features, such as macros.
There is also quite a bit going on in the preprocessor (all the defines), so it is not just the compiler. The early preprocessor may have been borrowed from an assembler or another language.
-
The early K&R C was more cryptic than modern ANSI-C.
Well, certainly, early, non-standardized C was very preliminary, even though the main ideas were all there. You shouldn't have to deal with it unless you're looking at VERY old source code.
-
#include <stdio.h>

/* K&R-style definition: the parameter types are declared between the
 * parameter list and the opening brace. */
int foo(i)
int i;
{
    return i + 1;
}

int main()
{
    int j;

    j = foo(14);
    printf("%d\r\n", j);
    return 0;
}
You might run into K&R syntax using the cc compiler included with 2.11BSD Unix running on a PiDP11 emulation of the PDP11/70.
Among other things, parameters are defined in the lines following the function definition but before the opening brace.
Parameters can not be 'void' but can be omitted.
The original 'The C Programming Language' book may help. The ANSI version? Not so much...
https://obsolescence.wixsite.com/obsolescence/pidp-11
-
The use of braces {} and the case sensitivity is a direct consequence of the early enthusiasm with which the developers at the time adopted the newly arrived ASCII code, replacing the Baudot code.
The Baudot code was case insensitive and only had parenthesis.
Oh come on. No one ever wrote programs in Baudot codes; they were pretty much long gone by the time C was being developed. ASCII came along ~1963, and most computer teleprinters were ASCII, EBCDIC, or something proprietary. Programming was done on cards, and keypunches used other codes (but not Baudot). Also, there were 7-bit Baudot-like codes used by some newswires, because people decided that lowercase was important!
Notice that the same thing happens to Unix and its derivatives: all case sensitive and using all those new graphic symbols.
We could have had the APL character set!
K&R may have been gloating a bit over new terminals and/or printers with more characters than an ASR33 or VT05, but I don't think you can blame it on the introduction of ASCII.
-
I've just committed to buying a Digital DEC VT525 with its original keyboard from the UK, and the price is a bit high (400 UKP shipped) because the unit is very rare and because the included keyboard is a special one.
It looks like a common PS/2 keyboard, but it has special keys: some are dedicated to the VT terminal functions, some are programmable shortcuts for ESC codes, and some still look too weird to me. Even the key scan is rather rare, because it's a true key-scan type 3, while PC keyboards use type 2.
It also seems that years ago, if you wanted a keyboard with a "layout for programmers", you had to hunt down a special keyboard, whereas nowadays you can cheaply buy a common (key-scan type 2) keyboard with the UK or USA layout and you are set.
Anyway, this keyboard is so special not only because it has a full "layout for programmers", specifically made to support Unix and C (so it has keys like <> \/ | ^ " ' {} &, etc.), but also because it allows you to manually enter a key-scan code, and it can be reprogrammed on the fly.
WOW. Now I can type C properly, and I can also reprogram some keys to type letters in Icelandic (oh well, this requires special support on the host side... something to map into Unicode) :D
-
Oh come on. No one ever wrote programs in Baudot codes; they were pretty much long gone by the time C was being developed.
The Baudot code, in several of its incarnations (ITA2, US-TTY, etc.), kept on being used long after the introduction of ASCII and the creation of C.
K&R may have been gloating a bit over new terminals and/or printers with more characters than an ASR33 or VT05, but I don't think you can blame it on the introduction of ASCII.
Never said ASCII was to blame. I said they got enthusiastic about its introduction.
We're lucky that Unicode wasn't around when C was created, otherwise we could be writing C like this:
int main()
🍔
printf( 👯Hello, world!👯 ) 🤮
💩
-
I've just committed to buying a Digital DEC VT525 with its original keyboard from the UK, and the price is a bit high (400 UKP shipped) because the unit is very rare and because the included keyboard is a special one.
Wow! a VT525m, color me green with envy!
Back in the late 80's we used to fight over the Wyse terminals when I was doing data communications, but we didn't have any DEC terminals; I've never seen one up close.
-
Well, there are many ways to skin a cat.
Brackets provide a nice, convenient way to enclose something, but non-US keyboards don't provide one-key access to most of them. The only one on my SI keyboard is <, while > [ ] { } ( ) all require a two-key combination. The ever-so-common ; is also two keys to get to, and yet it's at the end of almost every line of C code. Also, the most common bracket type, ( ), is not accessible with one key press even on US keyboards. So you get used to holding down the modifier keys quite quickly when programming.
The alternative to brackets is the old-school "begin"/"end" that still lives on today in Verilog, and I'll be honest, I hate it. I would much rather press a two-key combo than type it out; it also just makes the code look more cluttered by putting in more words than needed. So far the only bracket alternative I've found nice is the Python way, where you just indent the block and that's it (something you would do in C anyway to make it readable); the idea of using whitespace as a programming character is a bit disturbing, but it does work pretty well.
def main():
    print('Hello world')
As opposed to
function main(): integer;
begin
    print('Hello world');
end
As for case sensitivity, this is pretty much a standard feature in the majority of languages today. And it's there for a good reason: there is nothing to gain by letting the user write a variable as MyVar in one spot, then myVar somewhere else and myvar in another spot. It just leads to ugly and harder-to-read code (especially when more than one person works on the same program). But if you want C with no case sensitivity, all you have to do is add a step like "tr '[:upper:]' '[:lower:]' < input" to your makefile to convert your C files to all lower case before feeding them into the compiler.
There is no one right way to do programming syntax, but it's generally accepted that brackets and case sensitivity are a good way, so lots of languages use them today.
-
Back in the late 80's we used to fight over the Wyse terminals when I was doing data communications, but we didn't have any DEC terminals; I've never seen one up close.
We are in contact with a military installation located in Greenland and managed by the Danish Government. They have some weird equipment there, which needs to be repaired or replaced with something new.
One of these was a portable VT100 in laptop shape, made by Data General under the commercial name "Walkabout". The "SX" series was a common 386SX laptop running DOS + Kermit, but their version is not the "SX"; it's something that runs a dedicated OS and a dedicated program, with a ROM containing proprietary terminal protocols as well as the common DG, DEC VT220 and VT100 ones.
Oh, and once again the keyboard is better than most modern-day notebooks': it is very lightweight and sturdy. Unfortunately, it doesn't come with all the keys you need to work with Unix and to program in C, because I think they never programmed in C on that thing.
We cannot ask questions, so I really don't know what they used it for :-// ?
The only clear thing is: "C and Unix" did change the keyboard layout, on terminals, on general purpose computers, and even on laptops :D
-
We're lucky that Unicode wasn't around when C was created, otherwise we could be writing C like this:
I am not sure about this. Writing in Greenlandic is very difficult, and ASCII makes it even more difficult because you do not have the proper keys to press when you compose a word. The same applies to Danish and Icelandic, and probably to many more human languages which I have never experimented with directly, but I am 100% sure that Arctic polar computers are very happy when you have a Unicode extension in your box.
It applies to emails, notes, and the things that scientists(1) write daily to track their studies, while Unix and programming languages are more modeled around the English language.
(1) usually around the Arctic pole it's full of biologists, marine biologists, geologists, ... and a lot of volcanologists around Iceland.
-
So you get used to holding down the modifier keys quite quickly when programming.
Yeah! That holy magic key :D
On my DOS laptop, the key was managed by the operating system, so I lost it when I used Minix and RISC OS, because my keyboard was dumb and the OS didn't care about any extra "modifier key".
The "magic" thing about the VT525 is that its firmware directly handles the modifier key, and it directly sends out the correct ASCII code of the wanted char on the serial line!
I love it ;D
e.g.
you press the "modifier key" and the keys "9" and "1" on the numeric pad, and the Vterm sends out "[", while on DOS ... when you press "Alt"(1), "9", "1", and the keyboard sends the corresponding key-scan (which means press-and-released codes) for the pressed keys to keyboard-handler, which is managed by the operating system, which has to interpreter what you have typed and react.
(1) "Alt" was one of the common choices for the modifier key, because PCs keyboards do not have a dedicated key, so several implementations are possible.
-
I am not sure about this. Writing in Greenlandic is very difficult, and ASCII makes it even more difficult because you do not have the proper keys to press when you compose a word. The same applies to Danish and Icelandic, and probably to many more human languages which I have never experimented with directly, but I am 100% sure that Arctic polar computers are very happy when you have a Unicode extension in your box.
ASCII was developed for the telecommunications industry. Bell Labs was a subsidiary of AT&T, so it was only natural that the developers of C and Unix decided to adopt it. Multics, the father of Unix, was designed as a multitasking, time-sharing operating system to be accessed primarily via remote terminals. The intent was to sell computing time to subscribers like any other utility.
People outside the US usually complain that their keyboard layouts do not encompass all the printable ASCII characters. But that is not a problem with ASCII.
Many non-US keyboard layouts have all the characters you find in the US standard keyboard, plus their specific ones.
-
People outside the US usually complain that their keyboard layouts do not encompass all the printable ASCII characters. But that is not a problem with ASCII.
Printable ASCII chars don't have any Chinese chars, Japanese chars, Icelandic chars, Danish chars, etc.
(https://cdn.shopify.com/s/files/1/1014/5789/files/Standard-ASCII-Table_large.jpg)
The 7-bit set of printable ASCII chars has all the English letters, while the 8-bit extension introduces a few accented letters.
(https://cdn.shopify.com/s/files/1/1014/5789/files/Extended-ASCII-Table_large.jpg)
So, with the Extended ASCII set it's not a problem for French and Italian, because the extended chars cover all their accented letters; but other human languages have many more accented and different letters, so they have serious problems, and this is a big problem for word processors unless you introduce Unicode.
OrangeBox # myemerge-log sys-libs/ncurses:unicode
So, if you have a Unix system with the basic "sys-libs/curses" (note that it's not "sys-libs/ncurses"; the "n" stands for "new"), you have a problem if you try to type a letter in Icelandic. Anyway, you need to recompile the word processor as well as the system libraries.
If you don't find a char in the Extended ASCII table, well ... without Unicode, you have to "remap" the highest bank of the ASCII char set (from 128 to 255) to something that can serve you. This worked, somehow, but it was prone to chaotic results on different systems.
-
If you don't find a char in the Extended ASCII table, well ... without Unicode, you have to "remap" the highest bank of the ASCII char set (from 128 to 255) to something that can serve you.
Specifically, "Code page 861" is the canonical "remapped" code page used to write the Icelandic language.
-
I see the widespread use of ASCII as only a good thing. It means that computers of all sorts can read each other's text files.
You don't need a real US keyboard to have access to all of the standard ASCII characters from 32 to 127. Pretty much all keyboards have keys for them; it's just that some are moved behind modifier keys to make room for the extra characters that the language uses. There would be reason to complain if a programming language used the € character for something, since it's usually not found on keyboards outside of Europe.
It's simply not possible to fit all the characters the world might need into ASCII, even if you use all 256 codes. So for this reason Unicode came around, to extend the ASCII set with all these other strange characters and give each one a standardized code so that it renders correctly on any computer. Also there is the convenient UTF-8 encoding, which leaves plain ASCII text byte-for-byte unchanged, so even ASCII-only software handles Unicode text just fine except for the characters that are not in ASCII.
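A little C sketch to illustrate the point (the string is just an example; € is U+20AC, which UTF-8 encodes as the three bytes E2 82 AC):
#include <stdio.h>

int main(void)
{
    /* "price: 10€" - every byte of a multi-byte UTF-8 sequence has the
       high bit set, so a plain 7-bit ASCII byte never appears inside one.
       That's why ASCII-only tools pass UTF-8 text through untouched. */
    const unsigned char text[] = "price: 10\xE2\x82\xAC";

    for (const unsigned char *p = text; *p; p++)
        printf("0x%02X  %s\n", *p, *p < 0x80 ? "plain ASCII" : "multi-byte UTF-8");
    return 0;
}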
The thing that makes no sense is why we have QWERTY, QWERTZ and AZERTY layouts for the letters. It doesn't add any new letters, it just moves existing letters around, making keyboards around the world more different from each other for no real benefit.
-
So, with the Extended ASCII set it's not a problem ...
Not true, it is a big problem. First, formally there's no such thing as "extended ASCII".
Second, there's no reliable way to know which "extended ASCII" set has been used (there are more than 220 different ones).
Using "extended ASCII" is a recipe for disaster.
The only real ASCII set always has the msb set to zero (7 bits only).
https://en.wikipedia.org/wiki/Extended_ASCII
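And if in doubt, checking whether a buffer really is 7-bit ASCII is trivial; a minimal sketch:
#include <stdbool.h>
#include <stddef.h>

/* Real ASCII is 7 bits: every byte must have the most significant bit clear. */
static bool is_plain_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] & 0x80)
            return false;
    return true;
}

int main(void)
{
    const unsigned char ok[]  = "hello";
    const unsigned char bad[] = "caf\xC3\xA9";   /* "café" encoded as UTF-8 */
    return (is_plain_ascii(ok, 5) && !is_plain_ascii(bad, 6)) ? 0 : 1;
}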
-
People outside the US usually complain that their keyboard layouts do not encompass all the printable ASCII characters. But that is not a problem with ASCII.
ASCII does not encompass the entirety of the latin characters (áéóôüñ and so on), let alone the mostly different ones. Whoever lived in other countries had to fight a constant battle with MODE CODEPAGE PREPARE and CHCP in the DOS days. The printers then? Lots of fun with neverending streams of continuous feed printer paper or changes in formatting due to non-standard characters sent to the printer. :-DD
-
ASCII does not encompass the entirety of the latin characters (áéóôüñ and so on), let alone the mostly different ones. Whoever lived in other countries had to fight a constant battle with MODE CODEPAGE PREPARE and CHCP in the DOS days. The printers then? Lots of fun with neverending streams of continuous feed printer paper or changes in formatting due to non-standard characters sent to the printer. :-DD
With UTF-8 this is a thing of the, now remote, past. The first 128 characters of UTF-8 are identical to ASCII. Together with the next 128 code points, you have Latin-1, the most widely used "extended ASCII" set. And the remaining million-plus code points are there to encode essentially every glyph produced by humankind since the invention of writing by the ancient Sumerians.
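That is also why converting old Latin-1 text to UTF-8 is almost mechanical; a rough sketch, with no error handling:
#include <stdio.h>

/* One Latin-1 byte to UTF-8: code points below 0x80 stay a single byte,
   0x80..0xFF become a two-byte sequence (0xC2 or 0xC3, then a continuation). */
static int latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    if (c < 0x80) {
        out[0] = c;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (c >> 6));
    out[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
}

int main(void)
{
    unsigned char buf[2];
    int n = latin1_to_utf8(0xE9, buf);   /* Latin-1 'é' */
    for (int i = 0; i < n; i++)
        printf("0x%02X ", buf[i]);       /* prints 0xC3 0xA9 */
    printf("\n");
    return 0;
}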
Look at the characters I can get directly from my keyboard:
'1234567890-=qwertyuiop'[asdfghjklç~]zxcvbnm,.;\
"!@#$%"&*()_+QWERTYUIOP`{ASDFGHJKLÇ^}ZXCVBNM<>:| SHIFT
¬¹²³£¢¬{[]}\§/?€®ŧ←↓→øþ´ªæßðđŋħ ̉ĸł'~º«»©“”nµ · ̣º ALTGR
¬¡½¾¼⅜¨⅞™±°¿˛/?€®Ŧ¥↑ıØÞ`¯Æ§ÐªŊĦ &Ł˝^º<>©‘’Nµ×÷˙˘ ALTGR + SHIFT
And if I combine the dead keys with the appropriate characters above I get:
ÁÉÍÓÚáéíóúÀÈÌÒÙàèìòùÂÊÎÔÛâêîôûÃẼĨÕŨãẽĩõũṔṕÝýŚśĀĒĪŌŪāēīōūĂĔĬŎŬăĕĭŏŭṠȦĖİȮ, etc., etc., etc.
Chinese characters? No worries.
#include <stdio.h>

int main()
{
    printf( "欢迎来到中国\n" );
}
$ ./utf-8
欢迎来到中国
$
Welcome to the 21st century.
-
With UTF-8 this is a thing of the, now remote, past. The first 128 characters of UTF-8 are identical to ASCII. Together with the next 128 code points, you have Latin-1, the most widely used "extended ASCII" set. And the remaining million-plus code points are there to encode essentially every glyph produced by humankind since the invention of writing by the ancient Sumerians.
Yup. 16-bit Unicode should never have existed to begin with. It was a disgrace: it used unnecessary space and was a huge problem-maker for porting existing apps.
UTF-8 is great. You have almost nothing to do to support it, except when you need to delimit/count characters. And even that is pretty easy, with just a couple of rules to know and apply.
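For the record, the main rule really is just "continuation bytes look like 10xxxxxx"; a quick sketch of counting code points that way:
#include <stdio.h>
#include <stddef.h>

/* Count code points in a UTF-8 string by skipping continuation bytes,
   i.e. every byte of the form 10xxxxxx. */
static size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

int main(void)
{
    printf("%zu\n", utf8_count("欢迎来到中国"));   /* prints 6 */
    return 0;
}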
-
Yup. 16-bit Unicode should never have existed to begin with. It was a disgrace: it used unnecessary space and was a huge problem-maker for porting existing apps.
UTF-8 is great. You have almost nothing to do to support it, except when you need to delimit/count characters. And even that is pretty easy, with just a couple of rules to know and apply.
Perhaps 16 bit Unicode was a trick played on Microsoft management who looked at how many characters were in a Microsoft Chinese font, instead of looking at how many Chinese characters there really are. :)
-
Perhaps 16 bit Unicode was a trick played on Microsoft management who looked at how many characters were in a Microsoft Chinese font, instead of looking at how many Chinese characters there really are. :)
;D
Well, ahah. But probably not. Both approaches can be justified. 16-bit Unicode had the merits of having fixed-size characters, so that probably appeared to be much simpler to deal with (after all, it was just a matter of changing the size of a "char"). All code could be in theory reused just by redefining a type. In practice though, this change was often more of a burden than it initially appeared.
OTOH, complex parsers, or text editors, especially if not well written, could be a lot more hassle to port to UTF-8 than to 16-bit Unicode.
-
16-bit Unicode had the merits of having fixed-size characters, so that probably appeared to be much simpler to deal with (after all, it was just a matter of changing the size of a "char").
Except for composite symbols of course :)
-
16-bit Unicode had the merits of having fixed-size characters, so that probably appeared to be much simpler to deal with (after all, it was just a matter of changing the size of a "char").
Except for composite symbols of course :)
Well, isn't this more like UTF-16 than the original 16-bit Unicode that MS implemented? Not sure about that, just a question...
-
Well, isn't this more like UTF-16 than the original 16-bit Unicode that MS implemented? Not sure about that, just a question...
Unicode has code points for accented characters (for example 0x00e9 is e with "accent de gue"), but the same character may be composed, for example e (0x0065) followed by a "combining" accent de gue (0x0301).
The funniest application is Mac OS, where file names must be converted to canonical form (I think it's composed, but I don't remember exactly) before use. As a result, different UTF-8 strings may refer to the same file - you cannot just use strcmp().
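A tiny sketch of the trap, using the byte values above: both strings render as "é", but strcmp() says they differ, so without normalization they would be two different names on most filesystems, while Mac OS (as described above) treats them as the same file:
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *precomposed = "\xC3\xA9";    /* U+00E9, UTF-8: C3 A9             */
    const char *decomposed  = "e\xCC\x81";   /* U+0065 + U+0301, UTF-8: 65 CC 81 */

    printf("%s vs %s -> strcmp() = %d\n",
           precomposed, decomposed, strcmp(precomposed, decomposed));
    return 0;
}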
-
Well, isn't this more like UTF-16 than the original 16-bit Unicode that MS implemented? Not sure about that, just a question...
Unicode has code points for accented characters (for example 0x00e9 is e with "accent de gue"), but the same character may be composed, for example e (0x0065) followed by a "combining" accent de gue (0x0301).
The funniest application is Mac OS, where file names must be converted to canonical form (I think it's composed, but I don't remember exactly) before use. As a result, different UTF-8 strings may refer to the same file - you cannot just use strcmp().
Oh, I see! Well, those combinations remind me of using LaTeX with no babel package, or similar.
- Just a note: "acute" is "accent aigu" in French (if that's what you were trying to spell.) ;D -
-
- Just a note: "acute" is "accent aigu" in French (if that's what you were trying to spell.) ;D -
:) I'm sorry about that. I felt something was wrong. I should've gone with "accent grave".
-
People outside the US usually complain that their keyboard layouts do not encompass all the printable ASCII characters. But that is not a problem with ASCII.
ASCII does not encompass the entirety of the latin characters (áéóôüñ and so on), let alone the mostly different ones. Whoever lived in other countries had to fight a constant battle with MODE CODEPAGE PREPARE and CHCP in the DOS days. The printers then? Lots of fun with neverending streams of continuous feed printer paper or changes in formatting due to non-standard characters sent to the printer. :-DD
With UTF-8 this is a thing of the, now remote, past.
My point exactly. It was a problem with ASCII used by computer systems of yore.
-
I still don't quite see the point.
ASCII was clearly designed with the English language in mind (so no accents) and is kind of the lowest common denominator, as far as Latin letters and symbols go, that would fit within 7 bits of data. It was a limitation, but already a nice step forward.
-
With UTF-8 this is a thing of the, now remote, past.
My point exactly. It was a problem with ASCII used by computer systems of YORE.
TIFIFY ;)
-
ASCII was clearly designed with the English language in mind (so no accents)
Not entirely true. Some diacritical symbols are there `, ^, ~. Other punctuation symbols can double as accents: ', ". Remember that when ASCII came about, printers were the main machine-human interface, not video terminals. On a printer you can type LETTER+BACKSPACE+ACCENT, or ACCENT+LETTER (like on typewriters) if you configure accents as dead keys.
Do you need a c-cedilla to write "Ça va? Ça va bien, merci!"? Just print C, then backspace, then comma and you're good to go.
Tre[CTRL+H]`s chic.
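You can even reproduce the trick from C, assuming the output device really overstrikes (a teleprinter does; a modern terminal just overwrites):
#include <stdio.h>

int main(void)
{
    /* letter + backspace + accent: on a printing terminal both characters
       land on the same spot, giving a poor man's Ç and è. */
    printf("C\b,a va? Tre\b`s chic.\n");
    return 0;
}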
It was a limitation, but already a nice step forward.
No doubt.
-
I've sort-of been waiting for the first language/ide to allow user-specified mark-up of the source code. (no, not just some scheme done dynamically by the IDE. Actually IN the source code.) It would be ... interesting.
Hmm. Which is worse, punctuation-heavy languages (like C), or languages with many English keywords?
-
I've sort-of been waiting for the first language/ide to allow user-specified mark-up of the source code. (no, not just some scheme done dynamically by the IDE. Actually IN the source code.) It would be ... interesting.
Then you're probably waiting for "westfw-forth" ;-)
-
void foo()
{
    char_UTF8_t msg1[] = "欢迎来到中国";
    uint8_t msg2[] = "欢迎来到中国"; /* there was a warning here, it was somehow handled as 8-bit ASCII */
    uint32_t len1 = sizeof(msg1) - 1;
    uint32_t len2 = sizeof(msg2) - 1;
}
len1 = 24 bytes
len2 = 6 bytes
Houston, we have a problem :o
edit:
When I manually copied the piece of code, I forgot to add "-1" after each sizeof() in the example.
I have also just renamed "size" to "len".
-
void foo()
{
    char_UTF8_t msg1[] = "欢迎来到中国";
    uint8_t msg2[] = "欢迎来到中国"; /* there was a warning here, it was somehow handled as 8-bit ASCII */
    uint32_t size1 = sizeof(msg1);
    uint32_t size2 = sizeof(msg2);
}
size1 = 24 bytes
size2 = 6 bytes
Houston, we have a problem :o
I don't see any problem... uint8_t is 1 byte. Probably it will store the first 1.5 Chinese characters scattered through 6 bytes
-
legacy has a problem with some bullshit dinosaur-era compiler, as usual :P
This is a UTF-8 encoded string. GCC compiles it just fine on my system and it did for many years.
If you want to be standard-compliant, since C11 you type the literal as u8"欢迎来到中国" and every compiler is supposed to handle it correctly regardless of locale or anything.
By the way, I don't know WTF char_UTF8_t is or what that compiler is doing. The string as posted here on the forum is 18 bytes long; when using C, add 1 for null termination. That's still neither 24 nor 6.
-
If you want to be standard-compliant, since C11 you type the literal as u8"欢迎来到中国" and every compiler is supposed to handle it correctly regardless of locale or anything.
Yep.
Also note that you can insert UTF-8 characters in numeric form inside string literals using the \u or \U escape prefixes.
Eg: u8"\u03BC"
which is the small "mu" greek letter.
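A small self-contained example of both forms, assuming a C11-or-later compiler (variable names are arbitrary):
#include <stdio.h>

int main(void)
{
    /* u8"" literals are guaranteed to be UTF-8 encoded, regardless of the
       source or execution character set. */
    const char hello[] = u8"欢迎来到中国";

    /* Code points can also be written numerically: \u takes 4 hex digits,
       \U takes 8. */
    const char mu[]  = u8"\u03BC";        /* GREEK SMALL LETTER MU */
    const char sum[] = u8"\U00002211";    /* N-ARY SUMMATION       */

    printf("%s %s %s\n", hello, mu, sum);
    return 0;
}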
-
I don't see any problem
The problem is that warning message: the UTF-8 string somehow passed with an implicit "cast" even though uint8_t is not the right type.
I would have been happy to see the C compiler issue an error message so the user could fix the mistake.
-
legacy has a problem with some bullshit dinosaur-era compiler, as usual :P
Yup. My team supports EOL computers; we work on things that are about 20 years old, not older than that, but usually not more modern either. Anyway, I was considering the DDE used for classic RISC OS, whose C compiler is *supposed* to have some modern support for UTF.
18 bytes long
yup, this is a second problem: the string has *somehow* been handled as UTF-32 (each char is 4 bytes, always) rather than UTF-8 (variable length).
-
I don't see any problem
The problem is that warning message: the UTF-8 string somehow passed with an implicit "cast" even though uint8_t is not the right type.
I would have been happy to see the C compiler issue an error message so the user could fix the mistake.
I don't quite get your point. The compiler issued a warning, if I understood correctly? That's the appropriate behavior of a C compiler faced with fishy casts/conversions. (As was mentioned, your example doesn't show the right way of handling UTF-8 with modern C, but if your compiler doesn't support that, then it doesn't officially support UTF-8 either, or only in a way that's completely implementation-specific. As magic said, the char_UTF8_t is non-standard AFAIK.)
If you think a C compiler is too liberal issuing a warning here instead of an error, either use a stricter language, or set up a *zero warning* policy, which is what should be done in any serious development team (and I think is required in most "safe C" rules such as MISRA-C and many others). I personally don't tolerate ANY warning. If you don't trust yourself or others to follow that policy, many compilers have a flag to treat all warnings as errors. Enable it. If both approaches fail, stop using C.
-
char_UTF8_t is non-standard
I guess it's a "typedef", defined somewhere in the DDE ecosystem. I have to investigate.
-
I think is required in most "safe C" rules such as MISRA-C
In avionics, we have to pass external tools' validation.
(http://www.downthebunker.com/chunk_of/stuff/public/projects/arise-v2/tokener/pic/tokenizer.png)
(this funny image represents the lib_tokenizer v8)
Anyway, the HL compiler I have been designing for Arise-v2 is able to recognize a Unicode string at the token layer (as GCC does, I guess), since my lib_tokenizer is able to pass this information to the upper layers; but since I have banned every kind of "casting", the compiler would have issued a serious error due to the type mismatch.
# evaline char_UTF32_t msg1="欢迎来到中国";
[char_UTF32_t] kind3 4:1 token_StrictAlphaNum, type21
[msg1] kind3 4:2 token_StrictAlphaNum, type21
[=] kind2 4:3 token_Assign, type39
[欢迎来到中国] kind3 4:4 token_String_UTF32, type424
[;] kind2 4:5 token_Semicolon, type92
The lib_tokenizer passes the token list to the parser, and when the parser sees a "token_String_UTF" next to a "char_UTF32_t", it checks whether the data type matches the definition, and if not, it issues an error.
-
I think is required in most "safe C" rules such as MISRA-C
In avionics, we have to pass external tools' validation.
Of course. The "zero warning" policy is a general rule. Warnings may be issued from external tools, and/or from compilers. Same policy.
External tools will usually detect a lot more than the compilers, but conversely, in case the compiler detects a problem that the other tools didn't (which admittedly would be a rare event), the code should still be rejected.
As to using UTF-8 *inside* source code files, I'm against it, until I change my mind. Maybe someday. It's too slippery still. Only tolerated use is inside comments, but still not too fond of that.
Dealing with UTF-8 inputs and outputs, yes! But editing your source code as UTF-8, nope. Two very different things. It may look cute, but poses a number of problems in practice.
What it means is that the only acceptable UTF-8 literals, to me, are UTF-8 codes inserted in numeric form, as I gave an example of above. Of course it's mainly just for often-used symbols that don't exist in ASCII, not full sentences!
Full sentences should not be written as literals anyway, IMO. Most use cases for them are words/sentences in various natural languages, and there is a potential need for translating them, so putting those directly in the source as literals is not that great an idea. Or the default language should be English. I know it's biased, but we deal with it and move on.
Just my 2 cents.
-
Of course. The "zero warning" policy is a general rule. Warnings may be issued from external tools, and/or from compilers. Same policy.
External tools will usually detect a lot more than the compilers, but conversely, in case the compiler detects a problem that the other tools didn't (which admittedly would be a rare event), the code should still be rejected.
QA guys do not use the C compiler, it's not their job. They check the source with external tools, which have been certified for the validation. Those things are very expensive, usually 50K euro per license, so bugs inside the validator are *(1)supposed to be* very very rare.
The developing squad does use the C compiler and ... well ... it's a policy that AdaMULTI rejects the source if there is a single warning (-Wall), but this is not considered by the QA guys.
The testing squad also uses the C compiler, and "-Wall" warnings usually mean something smells bad, so the code is not rejected but re-addressed to some senior engineer to have a word about it.
Anyway, what actually makes code pass is the QA tool, plus a signature on the test report, plus someone (usually the team leader) who signs the commit with his name.
Moral of the story: it's not a formal rule, but code with "-Wall" warnings statistically does not pass :D
edit:
(1) fixed
-
only acceptable UTF-8 literals, to me, are UTF-8 codes inserted in numeric form
Just checked: the DDE's C compiler does accept it, and the result (sizeof(msg)-1) is now correctly shown by RISC OS Adjust. So it was something in the DDE ecosystem - between what you type and what the compiler sees - that didn't correctly understand the input.
Funny. Thanks for the tips! :D
-
As to using UTF-8 *inside* source code files, I'm against it, until I change my mind. Maybe someday. It's too slippery still. Only tolerated use is inside comments, but still not too fond of that.
Huge numbers of developers don't know English very well and comment their source code in their mother tongue. You may not like source code being regarded as UTF-8, and the C spec may say it's not valid. However, you are going to have a real hard time convincing someone whose English is poor to make their source code ASCII when UTF-8 is working just fine for them.
-
I'm actually not sure if C spec has anything against UTF-8 in comments. If it does, it probably also prohibits Windows-125x, ISO-8859 et al. Guess what, I have never seen a compiler which enforces that :)
-
I'm actually not sure if C spec has anything against UTF-8 in comments. If it does, it probably also prohibits Windows-125x, ISO-8859 et al. Guess what, I have never seen a compiler which enforces that :)
I didn't say there was. I just said I'm not fond of them. And yes, any compiler (unless it's a real dinosaur that rejects > 7-bit characters) will gladly just ignore them. UTF-8 or any 8-bit charset.
I'm not fond of comments in UTF-8 because they imply editing the source code as UTF-8, which in turn can be an incentive to use UTF-8 outside of comments. That's my rationale behind it: no editing in UTF-8.
One of the reasons is that opening a UTF-8 encoded file in an editor that doesn't support it can break everything badly in some cases. And shit happens. Another is that, outside of comments, using UTF-8 identifiers, for instance, could be an enormous disaster for code readability. I'm not even sure this is supported by C compilers, but if it ever becomes so, ouch. And yes, I know it's not fair to people whose languages use non-Latin letters, but we have enough problems with software engineering as it is. ;D
Windows-125x, ISO-8859 and similar charsets are a different beast. Opening those files in any editor will not break anything; you may just not see some characters as intended, but saving the files will NOT change them (unless the editor is very broken.)
Just of course my approach. As I said, I may change rules when UTF-8 becomes more ubiquitous and supported in a clear fashion in all the programming languages, and code editors, that I'm confronted with.
-
QA guys do not use the C compiler, it's not their job. They check the source with external tools, which have been certified for the validation. Those things are very expensive, usually 50K euro per license, so bugs inside the validator are *(1)supposed to be* very very rare.
(...)
Just a few thoughts about what you said. By no means are they hard truths, just my (and those of some teams I've worked with) opinion/rules/habits. I know every field, and potentially every company in each field, has a different approach and different policies.
- I think it would be a mistake for QA teams to deal only with source code and not with the build tools and the artefacts of the builds, which are every bit as important to check. So in that respect, QA people not even looking at the results of the build steps is not fully right IMO. Ultimately, the build artefacts end up in the final product, not the source code. So the whole chain is important.
- You didn't exactly say that, but if you think this would rather be a job for the test teams, I don't quite agree. Test teams are for testing, not for checking that the object code/binaries/whatever (the artefacts) are correctly built or whether there are any messages from the tools. I'd think that's more of a job for QA. Of course this is just an organizational detail, but it's important regarding who takes responsibility for what, or who even looks at what...
- Compilers and build tools in general are not just used by developers/engineers. They are ultimately used for the builds that are tested and eventually released. So it would be a mistake if only engineers cared about them and their outputs, as said above.
- Regarding source code only, no tools, however expensive they are, are perfect. So combining checks from different tools, including, why not, the compilers themselves, is always a good idea. Don't put all your eggs in the same basket.
- I again don't tolerate ANY warning from compilers and build tools in general. I've just checked the MISRA-C recommendations, and it's not really even mentioned there (I think it's more of a commonly seen company policy). But I think this is an important rule to enforce. Writing code that emits no warnings at all from compilers is NOT hard, and leaving warnings in, even if they are checked by someone at some point, is never a good idea. You end up having them pile up, unable to see which are false flags and which may be critical.
- Regarding the above point, you may think that expensive analysis tools run before builds can catch everything a compiler will, and much more. As I hinted earlier, most often this is true, but in some cases it isn't. And when that happens, it's never a good idea, IMO, to just ignore what the build tools say, because, you know, they must all be much dumber than the tools you spent an arm and a leg on. Beyond the reasons mentioned above, there is an additional one: even for perfectly conforming code, a given compiler could issue a warning that thus seems a false flag, but which indicates that this particular compiler sees that piece of code as ambiguous or problematic and could compile it incorrectly, maybe because the compiler itself has a bug or an implementation-defined behavior that is not caught by your expensive tools. So warnings should never be ignored, IMO. You may be biased by your experience of using almost exclusively certified compilers that have a very, very low probability of exhibiting what I just said, but that's not always the case in all settings...
- Finally, and I know those expensive tools are REALLY f*cking expensive, but I still think it's a good idea for developers to have access to them on a regular basis, and not just the QA guys. It saves a lot of time and tremendously helps them get better. You don't need a license for each workstation; you can have one on a server and run it regularly on committed code (say several times a day), which not only makes engineers better at what they do, but also saves a lot of time for everyone, including the QA people.
* Just a side comment regarding GCC warning flags:
Contrary to what it looks like, the -Wall option doesn't quite enable ALL warnings. For instance, last I checked, GCC with -Wall was not enabling the -Wconversion check, which you have to enable as an additional option. This check is especially important when writing embedded code. (Very recent versions of GCC may have changed this, but I'm not sure they have, probably because it would suddenly spit out a LOT of warnings for existing code with sloppy conversions...)
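A tiny example of the kind of thing -Wall alone stays silent about but -Wconversion flags (the function is made up, just for illustration):
#include <stdint.h>

/* The multiplication is done as int and then silently truncated back to
   8 bits on return.  gcc -Wall says nothing; adding -Wconversion warns
   that the conversion from 'int' to 'uint8_t' may change the value. */
uint8_t scale_duty(uint8_t duty)
{
    return duty * 3;
}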
-
I didn't say there was. I just said I'm not fond of them. And yes, any compiler (unless it's a real dinosaur that rejects > 7-bit characters) will gladly just ignore them. UTF-8 or any 8-bit charset.
put { SierraC/68k, Avocet/68k, MIPS/Pro } in the list :D
-
Contrary to what it looks like, the -Wall option doesn't quite enable ALL warnings.
-Wextra adds a bunch more.
-
Among the QA squad there are ATL guys, who are literally beyond any language except math. They usually focus their attention on how an error propagates, on how the software has been proven, by facts, to fail when it is supposed to fail in abnormal conditions... and all the bells and whistles they can read from the test report and from their tools.
For the whole QA team, the test report is more interesting than any source code, and the ATL modelling is more interesting than the C or Ada implementation, while their colleagues usually care about approving *how* a piece of code is tested, and whether all constraints and requirements have been properly tested. They also check whether a piece of code has passed everything; the most interesting feedback from them is: "it's ok, but ... this piece of code has too high a measured *cyclomatic complexity*, so ... it shall be rewritten".
-
For one reason or another, both the dev team and the test team have to recompile the project, and if some line of C issues a warning then we both usually look for a senior to have a word about it.
Seniors are the highest authorities of competence in both the test and dev teams, and the brainstorming is managed as internal and informal feedback, since the two squads are not decoupled but rather part of the same software life-cycle.
So, before someone commits the project and the test report to the QA team, feedback from the dev team to the test team, and vice versa, is more than welcome. It's also a good way to do team-working, since both teams learn something.
A completely different matter is when you commit something to DOORS, because that way it's exposed to the customer's QA.
We also receive internal feedback from the QA team, but ... it's at a higher layer: it's usually related to the software design and modeling, so more addressed to the dev team, unless the test team finds a serious problem, which usually sounds like "guys? that feature is great, but ... how can we test it?".
It's a rare event, but when it happens we have a super meeting with all three teams :D
We discuss things, and sometimes even the ATL guys participate, because some of them are involved with the purchase of new development/testing tools.
-
Contrary to what it looks like, the -Wall option doesn't quite enable ALL warnings.
-Wextra adds a bunch more.
Indeed. Note that it doesn't enable -Wconversion either though.
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#Warning-Options
-
- I again don't tolerate ANY warning from compilers and build tools in general. I've just checked the MISRA-C recommendations,
and it's not really even mentioned there (I think it's more of a commonly seen company policy). But I think this is an important rule to enforce.
Writing code that emits no warnings at all from compilers is NOT hard, and leaving warnings in, even if they are checked by someone at some point,
is never a good idea. You end up having them pile up, unable to see which are false flags and which may be critical.
I couldn't agree more :-+ ... and I always use these flags: -Wall -Wextra -Wshadow -Wformat-nonliteral -Wformat-security -Wtype-limits
-
I'm ambivalent on some warnings. The one that comes to mind is unused parameter, which bugs me when using three-parameter signal handlers (as the third one is usually not used). For Linux-specific code, or code compiled only with gcc or compatible compilers, I tend to use __attribute__((unused)) (via a macro, usually), but for portable code I tend to disable the warning instead (-Wno-unused-parameter). I would prefer there to be a portable unused-parameter attribute, as it'd make it clear to the human programmer (and not just the compiler) that the parameter is intentionally unused.
I like to avoid "defined but not used" warnings for accessor functions by declaring them static inline instead of static. Since these accessor functions tend to be trivial, telling the compiler that inlining them is desirable does no harm. (You can use -Wno-unused-function instead to suppress the warning, but I never do.)
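For instance, with a made-up struct just for illustration - put something like this in a header, and translation units that include it but never call the accessors won't get "defined but not used" warnings, since that warning only applies to non-inline static functions:
struct point { int x; int y; };

/* Trivial accessors: 'static inline' instead of plain 'static' keeps
   -Wunused-function quiet in files that never call them. */
static inline int point_x(const struct point *p) { return p->x; }
static inline int point_y(const struct point *p) { return p->y; }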
In all cases, I do not like leaving any warnings visible when compiling; just the way I address certain warnings varies. I always use at least -O2 -Wall, sometimes -O2 -Wall -pedantic -std=c99 (or -std=gnu99 for Linux-specific stuff, using built-ins), or -O2 -Wall -Wextra.
-
I'm ambivalent on some warnings. The one that comes to mind is unused parameter,
I agree. I rarely enable this one. That's why I don't use -Wextra with GCC - most of the added warnings I find not very useful, except '-Wsign-compare', which I add manually. Of course your own use case/environment may differ. I think some safety-critical software guidelines, for instance, require a "no unused parameter" policy, so if you have to follow that, you have no choice. The idea behind it is debatable, but it's not stupid either.
The standard way of dealing with it would then be to enable the "unused parameter" warning/rule. And in functions not using all their parameters (on purpose), you have to explicitly mark the ones that are unused to shut up the warning. It's annoying, but it can make sense to make sure you did that ON PURPOSE.
To mark an unused parameter or local variable, the standard C way is to add a statement like so: (void) xxx;
with xxx being the local variable or parameter.
You may encounter source code that defines a macro such as UNUSED_PARAMETER(xxx) or something like that, to make it clearer, but all it translates to is the above (and then there's often a rule not to replace standard C constructs with macros in safety-critical code anyway... so I don't recommend using a macro for that). There are also compiler-specific attributes, as you mentioned, which I don't recommend either, as they are non-standard. A cast to void works, and it's standard! (Even if it looks funny.)
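Putting the two together for the signal-handler case mentioned earlier, a rough POSIX/Linux sketch (handler and variable names are made up):
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t last_signal = 0;

/* sigaction()-style handler: only the first parameter is used here.
   The plain-C way to silence "unused parameter" warnings is a cast to
   void, which is portable and generates no code. */
static void on_signal(int sig, siginfo_t *info, void *context)
{
    (void)info;      /* intentionally unused */
    (void)context;   /* intentionally unused */
    last_signal = sig;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGINT, &sa, NULL);

    pause();                         /* wait for Ctrl-C */
    return (int)last_signal;
}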
The set of warnings I enable most often with GCC for C code is: '-Wall -Wconversion -Wsign-compare'
Of course, using additional static analysis tools is also very useful - even when you're not strictly REQUIRED to!
There are many out there. Open-source, actively maintained, easy-to-use ones include CppCheck and clang-check; I often use both. CppCheck is not bad and pretty fast; clang-check is slower but will catch more subtle potential problems. Of course there are commercial and expensive tools that do better, but unless you/your company can afford them and/or MUST use them, you'll have to stick to the free ones. ;)