Swearing and cursing. ::) Ah, but in what language?
Swearing and cursing. ::) Ah, but in what language?
I'm going to go with type inference as my favourite 'general' feature. It makes strongly typed languages so much nicer to use.
As for the 'coolest' feature, I'm going to go with Golang's goroutines and channels.
Separate compilation / linker.
Java has a separate compiler (javac) and “linker” (java).
So many "modern" programming languages don't bother with it (Java, .Net, Python, NodeJS). So you end up with Docker hell trying to manage all your dependencies.
A programming language ought to be able to produce a single executable without additional runtime libraries or installs. That isn't to say all applications should be a single executable, but they all ought to be able to use a linker to include dependencies (and exclude unwanted ones) at compile time.
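To make that concrete: with a traditional C toolchain, dependencies can be folded into one self-contained binary at link time. A minimal sketch (hypothetical file name; the exact flags are toolchain-specific, and fully static linking of glibc has its own caveats):
/* main.c -- hypothetical example.
 * Build a single self-contained executable, e.g. with the GNU toolchain:
 *     cc -static -o app main.c -lm
 * The resulting `app` needs no separate runtime or library install.
 */
#include <math.h>
#include <stdio.h>

int main(void) {
    printf("sqrt(2) = %f\n", sqrt(2.0));
    return 0;
}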
Template metaprogramming in C++. Too bad the error messages are unreadable.
Since Java has no executables, it also means the second paragraph would not apply. However: the closest it has to executables, JARs, allow you to pack anything you want — including all the dependencies. This is a horribly bad practice that will bite you just like any other failure at using modularity, but there is nothing that prevents you from doing so.
Java has a separate compiler (javac) and “linker” (java).
You'll never realize how much you appreciate braces surrounding (read: clearly, unmistakably delimited) code blocks in C/C++, until you work in a language like Python where (invisible) (whitespace) indent levels attempt to substitute for them.
Whoever dreamt that up should be professionally embarrassed for eternity. No other accomplishment can possibly offset the lunacy of that decision.
You'll never realize how much you appreciate braces surrounding (read: clearly, unmistakably delimited) code blocks in C/C++, until you work in a language like Python where (invisible) (whitespace) indent levels attempt to substitute for them.
Whoever dreamt that up should be professionally embarrassed for eternity. No other accomplishment can possibly offset the lunacy of that decision.
Ahah, I fully agree. This is wrong on so many levels, but one that's definitely worse than others is maintainability and code sharing.
More generally speaking, this is a pet peeve of mine actually: many people seem to be on a quest to minimize the number of characters they have to type when writing code, and blame overly "verbose" languages. C is already pretty lean, but some people still find it too verbose and want to get rid of even more symbols, like braces and parentheses.
I think this is completely dumb, and often say that if the number of characters you have to type to write a given piece of code really matters to you compared to all other design and implementation tasks, then the code you write must be either really simple or really lame.
Just my 2 cents. ::)
I think the missing delimiters and significance of whitespace bit me once or twice the first couple days I used Python. That ended after I banished tabs and turned on Python mode: in the same old editor I have been using for the last forty years.
Just to be clear, my Python code works just fine. And I use IDLE, straight out of the Python distribution, so it's not an "editor thing".
(...) And it's up to the user to line everything up, visually, through blank white space across potentially many lines on the screen, to avoid (best case) compile errors or (worst case) unintended code behavior.
All of which could have been easily eliminated with a closing block delimiter. And that wouldn't have even been a new concept in the world of programming languages. Unbelievable.
This meaningful blank space disaster always makes me think of the Whitespace language: https://en.wikipedia.org/wiki/Whitespace_%28programming_language%29
;D (note that Python is actually cited in this article, which is extra funny.)
It just looks like a decision from some hugely stubborn person, not entirely unlike a religious belief. The number of potential problems it introduces just to save one delimiter is, as you said, unbelievable! But as I said in an earlier thread, I don't think the author of Python ever expected it to become so popular either.
And ironically, one of the things people say they like about Python is the availability of off-the-shelf libraries. As if those were a whole new Python invention.
The history is summed up here: https://en.wikipedia.org/wiki/Python_(programming_language)
Whoa. From TFA:
I just had to learn Python for a small side project....
There is a difference between using a tool as a hammer and using it effectively, though.
Pascal code is simply elegant. Everything else is less elegant.
I kind of like the nested block structure of a language like Pascal where functions or procedures can be local to surrounding functions or procedures.
I kind of like the nested block structure of a language like Python where functions can be local to surrounding functions.
I also like 'sets' as provided by Pascal.
Pascal code is simply elegant. Everything else is less elegant.
def parameterized_multiplier(param):
def multiply_this(x):
return param * x
return multiply_this
mul7 = parameterized_multiplier(7)
print(f"Answer to the Ultimate Question of Life, the Universe, and Everything: {mul7(6)}")
I kind of like the nested block structure of a language like Python where functions can be local to surrounding functions.
I personally think nested functions/procedures are a bad idea both for code readability and code structure.
Heh, I tend to use nested functions precisely where (I think) it will improve code readability. Typically it will be where you have a function that does roughly the same thing a few times. Okay, DRY, so make a separate function out of it. But wait, this new separate function needs an awful lot of information that is only available within the scope of our existing function (the one we are refactoring). No problem, let's just pass them all as arguments. Sure, you can do that. But it does mean that you now have lines like this:
result = nice_non_nested_function(central_bit_of_information_for_this_particular_operation,
always_the_same_for_entire_scope_of_original_large_function1,
always_the_same_for_entire_scope_of_original_large_function2,
always_the_same_for_entire_scope_of_original_large_function3,
always_the_same_for_entire_scope_of_original_large_function4,
always_the_same_for_entire_scope_of_original_large_function5)
instead of simply:
result = evil_nested_function(central_bit_of_information_for_this_particular_operation)
The ternary operator. (https://en.wikipedia.org/wiki/%3F:)
I prefer a language which doesn't make the distinction between statement and expression and consequently doesn't need crutches like the ternary operator.
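For anyone who hasn't met it, a minimal C illustration of the ternary operator next to the if/else statement form it condenses:
#include <stdio.h>

int main(void) {
    int x = 7;
    /* Expression form: usable directly in an initializer. */
    const char *parity = (x % 2 == 0) ? "even" : "odd";

    /* Statement form: needs a separate assignment in each branch. */
    const char *parity2;
    if (x % 2 == 0)
        parity2 = "even";
    else
        parity2 = "odd";

    printf("%d is %s (%s)\n", x, parity, parity2);
    return 0;
}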
I kind of like the nested block structure of a language like Python where functions can be local to surrounding functions.
I personally think nested functions/procedures are a bad idea both for code readability and code structure. Wirth himself (the author of Pascal) got rid of them in later derivatives of Pascal, like Oberon.
I admit I liked them back when I used Pascal, but I changed my mind over time, just like Wirth did.
Modules are a much better way of structuring code and hiding/making things local than nested functions IMO. Unfortunately, modules are still supported in only a very small number of languages, but you can more or less emulate them in almost any language. In C, just declare the functions you would otherwise nest as static, and partition your code into as many source files as required to make the code clearer and isolate "helper" functions (see the sketch below).
One obvious downside of nesting functions is that it makes the functions containing other functions way too long overall, which hinders readability. They initially look like a good idea, but you end up realizing they're not.
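A minimal sketch of the static-helper approach mentioned above, with hypothetical file and function names; only the one non-static function is visible to other translation units:
/* shape.c -- one translation unit acting as a "module" (hypothetical names). */

/* File-local helpers: invisible outside this .c file thanks to `static`. */
static float square_area(float side)   { return side * side; }
static float circle_area(float radius) { return 3.14159f * radius * radius; }

/* The single "exported" function, declared in a matching header. */
float shape_area(int is_circle, float sz)
{
    return is_circle ? circle_area(sz) : square_area(sz);
}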
I personally think nested functions/procedures are a bad idea both for code readability and code structure. Wirth himself (the author of Pascal) got rid of them in later derivatives of Pascal, like Oberon.
From the Oberon report: "Since procedures may be declared as local objects too, procedure declarations may be nested."
I personally think nested functions/procedures are a bad idea both for code readability and code structure. Wirth himself (the author of Pascal) got rid of them in later derivatives of Pascal, like Oberon.
From the Oberon report: "Since procedures may be declared as local objects too, procedure declarations may be nested."
"Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible, and much Python 2 code does not run unmodified on Python 3."
Just throw most existing code under the bus?!? Yessir, THAT's a "professionally managed platform" that I'd want to use as the basis for commercial projects. NOT.
Pretty much like Java which every 18 months comes out with a new revision that breaks all the existing code.
This forces enterprise developers to include their own copy of Java in their product so it doesn't happen, but that has security vulnerabilities, etc.
This is one of the reasons for using standardized languages only. You at least get some minimal consistency.
"Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible, and much Python 2 code does not run unmodified on Python 3."
Just throw most existing code under the bus?!? Yessir, THAT's a "professionally managed platform" that I'd want to use as the basis for commercial projects. NOT.
Pretty much like Java which every 18 months comes out with a new revision that breaks all the existing code. This forces enterprise developers to include their own copy of Java in their product so it doesn't happen, but that has security vulnerabilities, etc.
DEATH TO JAVA
I'm so happy I managed to avoid "falling for" the Java hype originally.
I managed to avoid it until I had to write code for the Android platform. Tablets make incredibly convenient control panels for all sorts of things, and (at the time, maybe still) Apple would not give you access to the communication port without a license that I seem to recall cost $25K. Plus Android tablets use standard USB, which was a big plus. So I started working with Java to write human interfaces on Android tablets. It worked fine, but having to navigate Java's various minefields was a nightmare. Like C++, it felt like someone had just covered a language with a layer of Object Orientedness because it was fashionable.
"Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible, and much Python 2 code does not run unmodified on Python 3."
Just throw most existing code under the bus?!? Yessir, THAT's a "professionally managed platform" that I'd want to use as the basis for commercial projects. NOT.
Pretty much like Java which every 18 months comes out with a new revision that breaks all the existing code. This forces enterprise developers to include their own copy of Java in their product so it doesn't happen, but that has security vulnerabilities, etc.
DEATH TO JAVA
I'm so happy I managed to avoid "falling for" the Java hype originally. I always thought it was too slow, and I had already seen a similar approach fail in the past (the UCSD P-system).
When I last used Java to replace a telecoms C/C++ application, onlookers were gobsmacked at how much faster my Java code executed. All that was needed was imagination, and the understanding that comes from an enquiring mind plus experience.
That's what I've told my 18YO sophomore son at CalPoly. Architecture is crucially important. A good environment can be crippled by bad architecture, but good architecture can often compensate for a poor environment. True in hardware AND software.
It is exactly the same with C/C++; you have to create your code in a VM with a specific compiler version and library version, and always use those forevermore. Why? Because new optimisations and flags are prone to breaking programs that were previously OK.
When I last used Java to replace a telecoms C/C++ application, onlookers were gobsmacked at how much faster my Java code executed. All that was needed was imagination, and the understanding that comes from an enquiring mind plus experience.
That's what I've told my 18YO sophomore son at CalPoly. Architecture is crucially important. A good environment can be crippled by bad architecture, but good architecture can often compensate for a poor environment. True in hardware AND software.
It is exactly the same with C/C++; you have to create your code in a VM with a specific compiler version and library version, and always use those forevermore. Why? Because new optimisations and flags are prone to breaking programs that were previously OK.
Seeing you "have" to do that using such absolute terms is a sign you are likely working either in very incompetent teams, or doing something out of ordinary with external requirements for such practices. Or just have no idea how the C/C++ software development world actually runs.
After all, I'm quite certain 99% of the "the different compiler version broke my C program" problems are caused by actual, real bugs in the code, relying on undefined or implementation-defined behavior the programmer didn't understand or even consider when writing it. Which is completely understandable, we all make mistakes, but the key to reducing mistakes is to look for the root cause and learn from it, not to hide it by preventing the detection of said bug.
Portable (across compilers and exact standard library versions; even across CPU architectures) POSIX C code is being written and maintained every day. Bugs that are hidden by one (standard-compliant) compiler, then revealed by another (standard-compliant) compiler do happen, we are humans after all, but they are an almost meaningless, minuscule portion of the total bugs developers need to deal with, and definitely not absolute showstoppers. Even bugs caused by a buggy compiler do happen, although much more rarely than the average developer wants to believe.
Yes, I agree such total version lockdowns and virtual machines to prevent such breakage are sometimes needed, but that is really a very sucky way to do it, and claiming that you absolutely need to lock to exact versions forever only shows you have no idea what you are talking about, and are just screaming "I can't do this, I give up!" Extraordinary claims require extraordinary evidence; by that logic, most of the world, which runs on C code compiled with varying compiler and C standard library versions, is doing it "the wrong way". Yet this is how it works, and the majority of bugs that cause actual damage could not have been prevented by locking down to specific tool versions.
Bugs will always happen, and sidestepping the issues caused by different compiler versions will only help with those types of bugs, which are a stunningly small percentage of total bugs. Quite the opposite; if you test your software (using proper automated test suites, yeah?) in different environments, with different compilers, you are likely to catch more bugs, instead of hiding them by limiting to one compiler.
After all, porting to a new environment, or reusing part of your code in another project, compiled on a different compiler, will likely eventually happen. Your code is worth more when it's portable, reusable, and relatively bug-free, not some write-once kludge for one forever-locked environment.
And think about this: what if you have a difficult-to-reproduce bug that appears with 1% probability with compiler A, but with 100% probability with compiler B? Was locking down to compiler A the right thing? Was this choice really adding robustness to your project, or was it just an excuse to keep producing buggy code?
"Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible, and much Python 2 code does not run unmodified on Python 3."
Just throw most existing code under the bus?!? Yessir, THAT's a "professionally managed platform" that I'd want to use as the basis for commercial projects. NOT.
Pretty much like Java which every 18 months comes out with a new revision that breaks all the existing code. This forces enterprise developers to include their own copy of Java in their product so it doesn't happen, but that has security vulnerabilities, etc.
DEATH TO JAVA
I'm so happy I managed to avoid "falling for" the Java hype originally. I always thought it was too slow, and I had already seen a similar approach fail in the past (the UCSD P-system).
That's a common misapprehension for those that don't realise the progress that has been made in the past decades[1]. Of course, the easiest course of (in)action is usually to avoid learning how to use a tool effectively.
When I last used Java to replace a telecoms C/C++ application, onlookers were gobsmacked at how much faster my Java code executed. All that was needed was imagination, and the understanding that comes from an enquiring mind plus experience.
[1] Example... They struggle to comprehend why a C/C++ program running in an emulated processor can run as fast as the same program running on the unemulated processor. Those same techniques are used in HotSpot. That was only published 21 years ago, so it is a little too recent for people to have noticed. Isn't it?! FFI https://www.hpl.hp.com/techreports/1999/HPL-1999-78.html (https://www.hpl.hp.com/techreports/1999/HPL-1999-78.html)
"Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible, and much Python 2 code does not run unmodified on Python 3."
Just throw most existing code under the bus?!? Yessir, THAT's a "professionally managed platform" that I'd want to use as the basis for commercial projects. NOT.
Pretty much like Java which every 18 months comes out with a new revision that breaks all the existing code. This forces enterprise developers to include their own copy of Java in their product so it doesn't happen, but that has security vulnerabilities, etc.
DEATH TO JAVA
I'm so happy I managed to avoid "falling for" the Java hype originally. I always thought it was too slow, and I had already seen a similar approach fail in the past (the UCSD P-system).
That's a common misapprehension for those that don't realise the progress that has been made in the past decades[1]. Of course, the easiest course of (in)action is usually to avoid learning how to use a tool effectively.
When I last used Java to replace a telecoms C/C++ application, onlookers were gobsmacked at how much faster my Java code executed. All that was needed was imagination, and the understanding that comes from an enquiring mind plus experience.
[1] Example... They struggle to comprehend why a C/C++ program running in an emulated processor can run as fast as the same program running on the unemulated processor. Those same techniques are used in HotSpot. That was only published 21 years ago, so it is a little too recent for people to have noticed. Isn't it?! FFI https://www.hpl.hp.com/techreports/1999/HPL-1999-78.html (https://www.hpl.hp.com/techreports/1999/HPL-1999-78.html)
My bad experiences might have been bad code / older Java implementations, fair enough. I had the team re-write everything in C++ and all the problems disappeared... only to be replaced by new problems, obviously, but that's life! :D
I like:
-closures
-dynamic typing
-nested functions
-async/await
-event loops
-the (function () {})(); form
-the C syntax
-JSON
-impure functional
=> I like JavaScript
There's a problem though: it lets you build objects in a zillion different ways => there's no easy way for an IDE to keep track of what objects there are, what properties and methods they have (or not), or the hierarchy, and that makes it difficult to interoperate with other people's code/libraries. Most of which is solved by TypeScript. => I like TypeScript better (a superset that compiles to JS).
Is TypeScript the one that was designed by Anders Hejlsberg? He has done a lot of good work in the software industry (e.g. Delphi, C#).
It is exactly the same with C/C++; you have to create your code in a VM with a specific compiler version and library version, and always use those forevermore. Why? Because new optimisations and flags are prone to breaking programs that were previously OK.
Seeing you "have" to do that using such absolute terms is a sign you are likely working either in very incompetent teams, or doing something out of ordinary with external requirements for such practices. Or just have no idea how the C/C++ software development world actually runs.
Having been programming in C since 1982, I have a good idea of how it runs - and falls flat on its face.
Quote: After all, I'm quite certain 99% of the "the different compiler version broke my C program" problems are caused by actual, real bugs in the code, relying on undefined or implementation-defined behavior the programmer didn't understand or even consider when writing it. Which is completely understandable, we all make mistakes, but the key to reducing mistakes is to look for the root cause and learn from it, not to hide it by preventing the detection of said bug.
In one sense, that used by language lawyers, that is a correct answer - but to an unimportant question.
The important questions are
- how many and what proportion of C programmers understand the language well enough to avoid all nasal dæmons. The answer to that is simple: only a vanishingly small proportion
- how many programs in production contain UB, and could therefore "rm -rf /" when used with the next version of the compiler. Again, the answer is simple: too many and far more than people believe
For examples, nip over to the usenet group comp.arch and have a look at the ongoing "UB and C" thread. There you will see extremely competent and experienced people discussing the difficulties.
Quote: Portable (across compilers and exact standard library versions; even across CPU architectures) POSIX C code is being written and maintained every day. Bugs that are hidden by one (standard-compliant) compiler, then revealed by another (standard-compliant) compiler do happen, we are humans after all, but they are an almost meaningless, minuscule portion of the total bugs developers need to deal with, and definitely not absolute showstoppers. Even bugs caused by a buggy compiler do happen, although much more rarely than the average developer wants to believe.
And there you gloss over the problem you are attempting to address. It isn't just "another compiler", the problems can appear with the next version of the same compiler, classically when the next version contains new optimisations.
And that is why it is prudent to nail down the compiler and library in a VM.
The practical course of action is accepting that the world isn't perfect, bugs happen, and they need to be dealt with properly; and that the fact that the C standard leaves some things implementation-defined probably wasn't the best idea, but all in all, we can deal with this, and very few will find it a showstopper.
Going even more practical, this means: treat the bugs caused by C UB or implementation-defined behavior like any other bugs: analyze for root cause, fix your code not to rely on nasal daemons. Learn, and write better code in the future. In very rare cases of compiler bugs, communicate with the compiler team.
This ties in with SiliconWizard's original remark about getting "at least some minimal consistency" with "standardized languages". C fits that bill just right: bugs caused by gaps in standardization, and hence by differences between (all standard-compliant) compiler versions, do happen and are widely documented, but they are rare enough, and well documented enough, not to matter too much.
The very practical thing you can do is to learn these pitfalls and start writing code that does not rely on undefined or implementation-defined behavior. Your bug-free, portable, reusable code is more valuable than your locked-down kludge that was verified to work once in 1985.
OTOH, you propose a solution where there exist two processes for bugs: the "normal" process for most bugs, and then the "lock down the development environment to hide some UB bugs by preventing them from popping up" version for the bugs caused by writing non-standard-compliant code. Call this a strawman if you want to, but this is essentially what you propose, and it sounds so bad because it is so bad.
Lockdown to a particular compiler version and toolset environment could be warranted for a life-support device, for example. Even then, I wouldn't suggest making that choice prematurely. And in such projects, writing unambiguous and standard-compliant code to begin with becomes even more important! But such projects must be verified to be safe at the binary level, and you just don't go and make small changes to the codebase without verifying the binaries.
We all can probably agree we would like to have something that completely fixes problems of the C language, but giving up and requiring silly coping methods like "always lock down your development tool versions totally" is making things worse, not better.
[...] See the Linux kernel, for example: it's being compiled with new GCC versions all the time. In its history, there have been a handful of problems due to this, but the Linux kernel is fairly reliable while staying modern, is powering most of this modern world (like it or not), and is handling very critical and important tasks as well. [...]
It always seemed to me that nested functions fit cleanly into Wirth's Syntax Diagrams and the recursive descent approach of the Pascal compiler. The code looked a lot like the diagram.
[...]
I wonder why people so often choose to depict the interaction between UB and optimization in C in a manner that suggests compiler designers are intentionally breaking programs. It’s ridiculous. :/
It’s basically like not reading a datasheet, not knowing the maximum ratings, applying 300V to a 3.3V chip, and then crying that the IC designers are stupid because they should’ve made their device work up to 1000V, making it a PITA for everyone else.
While there are cases in which it goes the other way, most often the reasoning is not: find potential UB → use it to remove some code. It is: find a possible optimization → check if it works for all cases → if there are any UB cases, ignore them. What else do you people expect? That optimizations will be ruined for everyone just because there are people who don’t care enough to learn and understand the language? Or that compiler creators will spend their time identifying each UB, determining the “sane behaviour” for something that is meaningless in the first place, and spending resources maintaining that code? And making the compiler more complex, as if it weren’t a monster already? Sure, putting guards like that everywhere is not insane, but it’s not free, and I do not see people lining up to pay for it.
...
Another language feature I like(d) is from COBOL
MOVE CORRESPONDING identifier-1 TO identifier-2
Moves identically named fields of record identifier-1 to the fields of record identifier-2
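C has nothing equivalent, so a rough, hypothetical sketch of what MOVE CORRESPONDING saves you from writing by hand might look like this (record layouts invented for illustration):
#include <string.h>

/* Two hypothetical record layouts that share some field names. */
struct customer_in  { char name[32]; int id; char phone[16]; };
struct customer_out { char name[32]; int id; float balance; };

/* Copy the identically named fields by hand; COBOL's
   MOVE CORRESPONDING would do this name matching for you. */
void move_corresponding(const struct customer_in *in, struct customer_out *out)
{
    memcpy(out->name, in->name, sizeof out->name);
    out->id = in->id;
}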
Almost always when I found something annoying about Python, it was because I was slavishly following a design pattern, necessary in a prior language I had internalized, but now redundant in Python and agreeably forgotten.
I think the missing delimiters and significance of whitespace bit me once or twice the first couple days I used Python. That ended after I banished tabs and turned on Python mode: in the same old editor I have been using for the last forty years. The pretty printing I had always done, then mapped one-to-one to Python's indentation, and I promptly forgot about it.
I bet you are fighting the language in some way if its indentation is causing you grief. If you could post some examples (perhaps starting another thread), I would be happy to try to smooth things out.
I think the missing delimiters and significance of whitespace bit me once or twice the first couple days I used Python. That ended after I banished tabs and turned on Python mode: in the same old editor I have been using for the last forty years.
Just to be clear, my Python code works just fine. And I use IDLE, straight out of the Python distribution, so it's not an "editor thing".
It's in reading and (especially) debugging, and especially when it's someone else's code, that Python's block delimiter mistake becomes very apparent. There's just no way that the visual absence of something is as clear and quick to recognize. This is particularly bad with nested loops. One of the things I had to write in Python recently was some matrix manipulation code, and the nested loops were a pain for everyone on the team. I lost count of the number of times one or more of us was carefully examining a line of code in the wrong loop because the visibility of the indent level was affected by your angle to the screen - if you weren't straight in front of a monitor it became very easy to misjudge which indent level you were tracking by eye. Utter insanity! And so easily avoided, as has been done in other languages for decades. This was a completely avoidable error in language design.
It also doesn't help, if you're going to rely on whitespace, that the default indent is just four spaces. I like tight indenting when block delimiters are in use, but when whitespace on the screen is all you've got, four little spaces makes things just that much harder. Particularly in a team environment, where it's not YOUR screen and you're off-axis.
Finally, Python isn't even consistent in its elimination of block delimiters. It actually has an OPENING delimiter (the colon), which basically substitutes for (say) the open brace in C, C++, Java, etc. But Python's closing delimiter is - wait for it - a correct number of spaces from the left edge. Which is a number that varies depending upon where the associated colon happens to be (or, rather, where the line that contains that colon happens to start). And it's up to the user to line everything up, visually, through blank white space across potentially many lines on the screen, to avoid (best case) compile errors or (worst case) unintended code behavior.
All of which could have been easily eliminated with a closing block delimiter. And that wouldn't have even been a new concept in the world of programming languages. Unbelievable.
EDIT: A good related question to this topic is "What problem were they trying to solve by eliminating the closing block delimiter?" Honestly, how many times have programmers pounded their keyboard in frustration and shouted "My life would be so much better if there was a language out there that DIDN'T require a closing block delimiter!!!" I've said, and heard, that complaint precisely zero times.
%s/\t/ /g
Note that you can replace the standard allocator with a garbage collector in C and C++ if you like that. A well known, and relatively well-behaved one is the Boehm GC: https://www.hboehm.info/gc/ (https://www.hboehm.info/gc/)
By language:
Python:
Portability. I can write a program and it will run on any machine that can run the Python interpreter. This is the best feature of this language. No more recompiling for some fancy new architecture!
C:
Very simple and elegant. You can learn the entire syntax in 2 weeks. The rest is the standard library.
C++:
Smart pointers. They really simplify resource management.
Java:
Garbage collector.
VBA:
It has IP protection built in. Your code will be so horrible that no one will want to steal it. ;D
Note that you can replace the standard allocator with a garbage collector in C and C++ if you like that. A well known, and relatively well-behaved one is the Boehm GC: https://www.hboehm.info/gc/ (https://www.hboehm.info/gc/)
It makes a good attempt at reclaiming most garbage in programs that match its limitations.
Scarcely general purpose.
Note that you can replace the standard allocator with a garbage collector in C and C++ if you like that. A well known, and relatively well-behaved one is the Boehm GC: https://www.hboehm.info/gc/ (https://www.hboehm.info/gc/)
It makes a good attempt at reclaiming most garbage in programs that match its limitations.
Scarcely general purpose.
I've not seen it fail in any significant way. A number of programming languages use Boehm as their GC -- and many others *should*. For example Mono (open source C#) did for a long time -- they eventually wrote their own GC and it is the only example I know of where replacing Boehm by a custom GC improved performance.
On many or most C/C++ programs you can improve performance significantly (at the cost of a relatively minor increase in RAM usage) by compiling Boehm so that malloc becomes GC_malloc() and free() becomes a no-op and using LD_PRELOAD to replace the malloc() and free() in glibc.
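For completeness, the explicit (non-LD_PRELOAD) way of using it is just as simple. A minimal sketch, assuming the libgc headers are installed and the program is linked with -lgc:
#include <stdio.h>
#include <gc.h>   /* Boehm GC public header; packaged as <gc/gc.h> on some systems */

int main(void) {
    GC_INIT();    /* recommended before the first collected allocation */

    for (int i = 0; i < 100000; i++) {
        /* Allocate from the collected heap; note that nothing is ever freed. */
        int *p = GC_MALLOC(1024 * sizeof *p);
        p[0] = i;
    }

    /* Unreachable blocks get reclaimed automatically along the way. */
    printf("GC heap size: %lu bytes\n", (unsigned long)GC_get_heap_size());
    return 0;
}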
Note that you can replace the standard allocator with a garbage collector in C and C++ if you like that. A well known, and relatively well-behaved one is the Boehm GC: https://www.hboehm.info/gc/ (https://www.hboehm.info/gc/)
It makes a good attempt at reclaiming most garbage in programs that match its limitations.
Scarcely general purpose.
I've not seen it fail in any significant way. A number of programming languages use Boehm as their GC -- and many others *should*. For example Mono (open source C#) did for a long time -- they eventually wrote their own GC and it is the only example I know of where replacing Boehm by a custom GC improved performance.
I'm more interested in correctness than performance. After all, if it is permissible that my code can contain faults, I can increase performance by orders of magnitude :)
It can be the devil's own job demonstrating that a GC is at fault.
Quote: On many or most C/C++ programs you can improve performance significantly (at the cost of a relatively minor increase in RAM usage) by compiling Boehm so that malloc becomes GC_malloc() and free() becomes a no-op and using LD_PRELOAD to replace the malloc() and free() in glibc.
Presumably until garbage is collected!
I haven't looked at the topic for a long time. What are the Boehm GC's characteristics on SMP systems and/or with highly threaded code?
Note that you can replace the standard allocator with a garbage collector in C and C++ if you like that. A well known, and relatively well-behaved one is the Boehm GC: https://www.hboehm.info/gc/ (https://www.hboehm.info/gc/)
It makes a good attempt at reclaiming most garbage in programs that match its limitations.
Scarcely general purpose.
I've not seen it fail in any significant way. A number of programming languages use Boehm as their GC -- and many others *should*. For example Mono (open source C#) did for a long time -- they eventually wrote their own GC and it is the only example I know of where replacing Boehm by a custom GC improved performance.
I'm more interested in correctness than performance. After all, if it is permissible that my code can contain faults, I can increase performance by orders of magnitude :)
It can be the devil's own job demonstrating that a GC is at fault.
It's amazing how many programs have use-after-free bugs. They just get lucky because the same memory chunk hasn't been reallocated yet. GC makes that impossible -- indeed Boehm can help you find those.
Even more programs simply stop using objects (and nothing points to them) without freeing them. That is probably *the* major reason why you have to restart your web browser, Windows, your WIFI router etc on a regular basis. GC prevents that.
Note that while in theory conservative GC can fail to collect some objects, Hans Boehm has a paper that shows that this effect is bounded -- it's extremely unlikely to cause OOM in the way that memory leaks with malloc/free do.
"Smart" pointers in C++ are just a PITFA to use. They clutter the code and drastically reduce its clarity -- and then people *still* make mistakes.
Quote: On many or most C/C++ programs you can improve performance significantly (at the cost of a relatively minor increase in RAM usage) by compiling Boehm so that malloc becomes GC_malloc() and free() becomes a no-op and using LD_PRELOAD to replace the malloc() and free() in glibc.
Presumably until garbage is collected!
Including GCs of course. I'm talking about 10% to 100% more memory use than with the original malloc/free library. A big benefit of GC is you can tune your program to get the speed/memory use tradeoff you want (e.g. using the GC_FREE_SPACE_DIVISOR environment variable in Boehm)
If you simply disable free() while keeping standard malloc() then programs will work and will run fast, right up until they run out of memory. I've tried this in a professional setting. For example disabling free() during one LLVM optimization pass on one function and then releasing all the memory at the end in one hit can make sense -- it will probably use 5 MB. Disabling free() entirely in LLVM (or gcc, I've tried both) works OK for very small programs but for anything non-trivial you quickly find the compiler using tens or hundreds of GB of RAM.
Simply disabling free() on long running programs such as servers is completely impossible, of course.
Quote: I haven't looked at the topic for a long time. What are the Boehm GC's characteristics on SMP systems and/or with highly threaded code?
It's excellent for throughput-oriented tasks i.e. "how long does the program take to finish?" Each thread gets its own pool of space for new objects, so there is essentially no allocation contention between threads. When a GC is needed all threads are stopped during marking -- but all CPU cores are used for the marking. To whatever extent app threads were allocating private objects there is no contention between marker threads, but they are free to follow pointers to "other threads" object graphs as needed.
This "world stopped" marking is usually very fast. A few ms or even less.
Boehm always does the sweep phase incrementally. Each page of objects of a particular size (normally the same as a VM page) is swept at the moment that the first new object is about to be allocated from that page. That is, the mark bits for that page are scanned and any unmarked objects are added to the start of the free list for objects of that size (and the first one will be allocated immediately). Succeeding objects of that size allocated by the same app thread will be from that same memory page until the page is full.
In 2005-2008 I worked for a tiny company writing a Java native compiler for BREW mobile phones. One of my main tasks was modifying Boehm GC to run well on 1 MHz ARM machines with as little as 400 K of RAM (though 1 to 2 MB was more common). The vast majority of the users were porting games from Java phones to BREW, though there was the occasional medical or industrial app. We ported the same system to iPhone (before there was an SDK and you had to jailbreak and hack everything yourself) and a number of games and other apps early in the AppStore were actually written in (native compiled) Java. Apple never noticed and some e.g. Virtual Villagers were in the top 20 iPhone games for ages.
Note that you can replace the standard allocator with a garbage collector in C and C++ if you like that. A well known, and relatively well-behaved one is the Boehm GC: https://www.hboehm.info/gc/ (https://www.hboehm.info/gc/)
It makes a good attempt at reclaiming most garbage in programs that match its limitations.
Scarcely general purpose.
I've not seen it fail in any significant way. A number of programming languages use Boehm as their GC -- and many others *should*.
For long-lived processes, I like to use a pooled allocator.
(The locality of the allocations is not important at all. The simplest implementation is one that adds a single pointer to each allocation, so that allocations within a "pool" form a singly-linked list. Some use cases can benefit from having the "pools" form a tree as well, so that if an error occurs at a higher conceptual level, all dependent pools can be destroyed at once. Instead of freeing individual allocations, you free an entire pool (including its sub-pools). This is surprisingly robust, and the only "problem" is that you sometimes need to extract or copy individual allocations from one pool to another, when they need to outlive the pool.)
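A minimal sketch of that single-link-per-allocation scheme (hypothetical names; sub-pool trees and alignment beyond sizeof(void *) are ignored for brevity):
#include <stdlib.h>

/* Each allocation is prefixed with a link to the previous one,
   so a pool is just the head of a singly linked list. */
struct pool { void *head; };

static void *pool_alloc(struct pool *p, size_t size)
{
    void **block = malloc(sizeof(void *) + size);
    if (!block) return NULL;
    *block = p->head;      /* link to the previous allocation */
    p->head = block;
    return block + 1;      /* usable memory starts after the link */
}

/* No per-allocation free(): the whole pool is released in one go. */
static void pool_free_all(struct pool *p)
{
    void **block = p->head;
    while (block) {
        void **prev = *block;
        free(block);
        block = prev;
    }
    p->head = NULL;
}
Typical use is many pool_alloc() calls while servicing one request or phase, then a single pool_free_all() when it completes.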
Except back when I was studying, because it was textbook stuff, I have rarely implemented those with a lot of small (thus very fragmented) allocations linked with pointers. Even with good allocators, it eventually tends to fragment memory a lot and have bad locality properties, which is pretty bad for caches.
True, that's why I said it was just the simplest implementation.
I usually allocate largish chunks of memory and handle objects in them with indexing. In many cases this is quite effective and the resulting performance is often substantially better. This is a form of pooling.
Yes, and that is exactly how and why Python's NumPy implements its own array type, for example. I myself also use this in C a lot; so often that I no longer even think about it. (I worry more about the names of the variables/members; I like to use used for the number of elements currently in use, and size for the number of elements allocated for the area, with data as the pointer to the area... it soothes my OCD, I guess, but some people have found such names unintuitive.)
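A minimal sketch of that used/size/data pattern, with a hypothetical element type:
#include <stdlib.h>

struct item { float x, y; };      /* hypothetical element type */

struct item_array {
    struct item *data;            /* pointer to the allocated area        */
    size_t       size;            /* number of elements allocated         */
    size_t       used;            /* number of elements currently in use  */
};

/* Append one element, growing the chunk geometrically when it is full.
   Start from a zero-initialized struct item_array. */
static int item_array_push(struct item_array *arr, struct item it)
{
    if (arr->used >= arr->size) {
        size_t new_size = arr->size ? 2 * arr->size : 16;
        struct item *tmp = realloc(arr->data, new_size * sizeof *tmp);
        if (!tmp) return -1;      /* old data is still valid on failure */
        arr->data = tmp;
        arr->size = new_size;
    }
    arr->data[arr->used++] = it;
    return 0;
}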
Note that you can replace the standard allocator with a garbage collector in C and C++ if you like that. A well known, and relatively well-behaved one is the Boehm GC: https://www.hboehm.info/gc/ (https://www.hboehm.info/gc/)
It makes a good attempt at reclaiming most garbage in programs that match its limitations.
Scarcely general purpose.
I've not seen it fail in any significant way. A number of programming languages use Boehm as their GC -- and many others *should*.
I mostly agree with that. If someone is claiming the Boehm GC is flawed, please at least give details, and accurate facts and figures about it. It has been widely used indeed.
Note that I was merely pointing out an option for those that would be dead sure you can't have a GC with C or C++. I was not particularly advocating using one, as GCs have specific issues (however good they are) that you don't necessarily want to have to deal with, such as non-predictable execution times. I am also not 100% convinced about garbage collecting in general - it naturally leads to sloppy resource management. You may argue that it will almost always do a better job than even careful "manual" resource management, and I wouldn't necessarily disagree in the general case, but I'm still not sure it promotes completely healthy programming habits, or that it's even the best way of dealing with "automatic" resource (memory) management.
Not necessarily my favorite feature, but one which I occasionally miss dearly in lesser languages:
keywords (symbols which evaluate to themselves) as found in Common Lisp.
There seem to be only cumbersome and ugly solutions in C/C++ when one wants e.g. string representations of enums. A common recommendation is to maintain a separate file with the definitions from which the C/C++ code is generated using a script ...
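One in-language workaround, for what it's worth, is the X-macro pattern, which generates the enum and its string table from a single list. A minimal sketch with invented names:
#include <stdio.h>

/* Single source of truth: the list of enumerators. */
#define SHAPE_LIST(X) \
    X(SHAPE_SQUARE)   \
    X(SHAPE_CIRCLE)   \
    X(SHAPE_TRIANGLE)

/* Expand the list once as enum constants... */
enum shape {
#define X(name) name,
    SHAPE_LIST(X)
#undef X
};

/* ...and once more as the matching string table. */
static const char *const shape_names[] = {
#define X(name) #name,
    SHAPE_LIST(X)
#undef X
};

int main(void) {
    enum shape s = SHAPE_CIRCLE;
    printf("%s\n", shape_names[s]);   /* prints SHAPE_CIRCLE */
    return 0;
}
The cost is that heavy macro use can be hard on readers and debuggers, which is presumably why external code generators remain a common recommendation anyway.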
typedef void* Symbol;
Symbol square, circle;
float area(Symbol shape, float sz) {
    if (shape == &square) return sz * sz;
    if (shape == &circle) return 3.14/4 * sz * sz;
    return 0.0;
}
#include <stdio.h>
typedef void* Symbol;
Symbol circle;
float area(Symbol shape, float sz);
int main(void) {
    printf("Area = %f\n", area(&circle, 20.0));
    return 0;
}
Symbol square="square", circle="circle";
float area(Symbol shape, float sz) {
float res;
if (shape == &square) res = sz * sz;
if (shape == &circle) res = 3.14/4 * sz * sz;
printf("calculating area(%s, %f) = %f\n", *(char**)shape, sz, res);
return res;
}
calculating area(circle, 20.000000) = 314.000000
Area = 314.000000
I'll never forget the first time I looked at the Java standard libraries in 1996, and saw that they had effortlessly produced that which had eluded C++ for a decade - and gone far beyond. At that stage there wasn't a C++ string class, and there were endless (well, year-long at least) debates about what "const" ought to mean.
I'll never forget the first time I looked at the Java standard libraries in 1996, and saw that they had effortlessly produced that which had eluded C++ for a decade - and gone far beyond. At that stage there wasn't a C++ string class, and there were endless (well, year-long at least) debates about what "const" ought to mean.
Strings alone are enough reason to switch from C to C++, if you ask me. Anybody who's ever had to deal with lots of strings in C will immediately fall in love with automatic String objects.
[..] Well, pretty much all that maintain a log file do. That might very well be the majority of them.
Many applications of symbols don't require conversion to strings
and very very few require conversion from strings to symbols.
How about parsing config files? Might not be what the majority is doing, but hardly an exotic application.
Lisp symbols are *not* enums.
I made no such claim and I don't think I implied as much.
They don't have sequential or even well-defined values.
The value of a keyword symbol is the symbol itself. Can't get much more well-defined imho.
They don't have sequential or even well-defined values.
The value of a keyword symbol is the symbol itself. Can't get much more well-defined imho.
They don't have sequential or even well-defined values.
The value of a keyword symbol is the symbol itself. Can't get much more well-defined imho.
Nope. Symbols are *not* self-evaluating. If you write the bare name of a symbol then it is looked up in the current context and you get whatever it is bound to, not the symbol itself. That's equally true in Lisps and in the C implementation I gave. If you want to refer to a symbol by name -- as a symbol -- then you *have* to quote it by variously putting a ' or a : after it, or perhaps various other things e.g. a # in front if it in Dylan or a * in front of it in Perl or a \ in front of it in PostScript.
They don't have sequential or even well-defined values.
The value of a keyword symbol is the symbol itself. Can't get much more well-defined imho.
Nope. Symbols are *not* self-evaluating. If you write the bare name of a symbol then it is looked up in the current context and you get whatever it is bound to, not the symbol itself. That's equally true in Lisps and in the C implementation I gave. If you want to refer to a symbol by name -- as a symbol -- then you *have* to quote it by variously putting a ' or a : after it, or perhaps various other things e.g. a # in front if it in Dylan or a * in front of it in Perl or a \ in front of it in PostScript.
Keyword symbols are bound to themselves. You might want to peruse the standard: http://www.lispworks.com/documentation/HyperSpec/Body/t_kwd.htm (http://www.lispworks.com/documentation/HyperSpec/Body/t_kwd.htm)
Interning a symbol in the KEYWORD package has three automatic effects:
1. It causes the symbol to become bound to itself.
Note that you can replace the standard allocator with a garbage collector in C and C++ if you like that. A well known, and relatively well-behaved one is the Boehm GC: https://www.hboehm.info/gc/ (https://www.hboehm.info/gc/)
It makes a good attempt at reclaiming most garbage in programs that match its limitations.
Scarcely general purpose.
I've not seen it fail in any significant way. A number of programming languages use Boehm as their GC -- and many others *should*.
I mostly agree with that. If someone is claiming the Boehm GC is flawed, please at least give details, and accurate facts and figures about it. It has been widely used indeed.
Hans Boehm was completely open about what it couldn't do. Apply those limitations to arbitrary programs composed of components (e.g. libraries!) produced by people not working together, then try to ensure correct operation.
If there is strict control of all code in a particular environment (e.g. Apple's) then it becomes less intractable - although still difficult and somewhat limited.
Quote: Note that I was merely pointing out an option for those that would be dead sure you can't have a GC with C or C++. I was not particularly advocating using one, as GCs have specific issues (however good they are) that you don't necessarily want to have to deal with, such as non-predictable execution times. I am also not 100% convinced about garbage collecting in general - it naturally leads to sloppy resource management. You may argue that it will almost always do a better job than even careful "manual" resource management, and I wouldn't necessarily disagree in the general case, but I'm still not sure it promotes completely healthy programming habits, or that it's even the best way of dealing with "automatic" resource (memory) management.
You sound as if you think people are claiming GC is a silver bullet, which would be stupid.
Note that you can replace the standard allocator with a garbage collector in C and C++ if you like that. A well known, and relatively well-behaved one is the Boehm GC: https://www.hboehm.info/gc/ (https://www.hboehm.info/gc/)
It makes a good attempt at reclaiming most garbage in programs that match its limitations.
Scarcely general purpose.
I've not seen it fail in any significant way. A number of programming languages use Boehm as their GC -- and many others *should*.
I mostly agree with that. If someone is claiming the Boehm GC is flawed, please at least give details, and accurate facts and figures about it. It has been widely used indeed.
Hans Boehm was completely open about what it couldn't do. Apply those limitations to arbitrary programs composed of components (e.g. libraries!) produced by people not working together, then try to ensure correct operation.
If there is strict control of all code in a particular environment (e.g. Apple's) then it becomes less intractable - although still difficult and somewhat limited.
So you're basically saying that all projects in which it has worked well must fall into this category. Maybe so.
As Nominal Animal suggested, there are still many cases (obviously not ALL) in which just using it as a drop-in for malloc() mostly gives benefits without noticeable problems.
It should certainly be tested with care, though.
Another point, which may be part of what you are saying, is that ideally you must have access to ALL source code and recompile everything using the Boehm GC. I know there are ways to just kind of redirect malloc() calls at the executable level, but I would certainly suggest NOT doing this. Way too slippery.
But with your claims, it would have been nice to have more details on the exact limitations and how "difficult" it is to use properly. As it is, it's like we'll have to take your word for it.
I absolutely don't think it's perfect, but it would have been interesting to have more detailed information if anyone is interested in possibly using a GC with C or C++. (All the more so given that there are other options; this one is just one of the most popular.)
Now in the general sense, any allocator that is an integral part of a given language, rather than a third-party addition, is more likely to be more robust, easier to use, and more consistent. No real doubt about that, and if this is what you implied beyond the very case of the Boehm GC, then that's something to consider.
...
We can also note that depending on the language you use, you may not have a choice. Some languages rely on GC and don't give you alternatives for dynamic allocation. For people using such a language, whether they consider GC a silver bullet or not doesn't matter - they just have to do with it.
Sane people would just have started with TCP/IP, not SQL!
As Tony Hoare memorably put it "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."
It is obvious which category a product made with Boehm's GC falls into.
As Tony Hoare memorably put it "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."
It is obvious which category a product made with Boehm's GC falls into.
Any program which uses explicit malloc() and free() calls that are not in the same control flow block (if they were, it could just as well use alloca() or even a local variable) has already exited from the first category.
It is an enormous simplification and contribution to reliability to debug the GC once, not debug every malloc() and free() in every program.
GC is a huge aid in making separately developed libraries work together. As are exceptions.
Very simple libraries can work ok without those, but as soon as you have anything that uses callbacks (or calls methods on an object you pass it, which can be of a derived class of the class the library was written for) then you need both.
It's very hard to see how substituting GC for free() in a C program that is bug-free can produce a buggy program. Some objects that would be free()d may never be collected if pointers to them are not zeroed after free() or overwritten with a new pointer sometime after. This effect is strictly limited and can not lead to a continuous leak that does not exist in the malloc/free version.
In C++ making delete a no-op results in destructors not being called. The vast majority of destructors do nothing more than delete other objects, in which case it doesn't matter.
The case that matters is if a destructor deallocates or closes some external resource such as a file or network port or sends a message to another process etc. However this is a rare style of programming. Usually such things are done by explicit function calls controlled by program logic, by nested control structures, or by RAII.
The first paper on BDW GC was published in 1988: https://hboehm.info/spe_gc_paper/preprint.pdf
That's over 30 years of experience with this software, on many CPU architectures, dozens of compilers, and who knows how many thousands of programs. My own experience with Boehm GC is around 22 years.
If there were significant problems with the overall approach, in the face of C programmer habits or compiler optimizations, I think we'd know by now.
There *have* been problems found over the years, and they have been fixed or mitigated.
If you're going to write a program using dynamic memory allocation at all, I firmly believe if you use Boehm GC then it and its interaction with arbitrary C or C++ code is the LEAST LIKELY place you will encounter bugs, and using it instead of malloc/free or new/delete will VERY significantly decrease the overall probability of your having memory bugs.
Associative Arrays
Go channels are a winner for me. A nice concurrency primitive.