I am not dismissing anything as "merely syntactic sugar". I am pointing out that you can obtain the same functionality without relying on compiler/runtime-library "black magic", which is extremely relevant in the case of embedded code.
10 years ago, most certainly, a no-brainer even. But more recently, that's much more of a vague "sometimes" than it used to be. And almost always at the expense of readability, correctness, and debuggability, as someone else pointed out.
I used to feel much the same about asm over C, to the point of often using C mostly as a thin shell of convenience around my asm. Until one day I realised that the processors had gotten more complex, that the variations you needed to account for had grown broader, and that the sheer host of people putting development effort into these compilers was actually, collectively, smarter than me after all. It's a humbling realisation for any developer, and like most younger people I felt like I was the king of my domain too, but the world continues to turn and grow whether we like it or not, and we all have to grow up too, eventually.
In that case we have neither the CLR from C# nor the Python bytecode interpreter and its optimizations. What you are describing is relevant if you are developing for a desktop machine or a SoC with oodles of RAM and a normal OS, so you can have threads and whatnot. Not bare metal or an RTOS where you are memory/performance sensitive and the APIs are very basic.
I should point out that I was responding to other people's comments as well, which made certain assumptions not directly relevant to the more constrained 8-bitters. But I'd like to point out that, even more so, what the compiler does to the generated binary in the presence of some of those functions, such as yield (not so much with C#'s use of threading), would be a significant boon even in the more constrained 8-bit environments, and compilers are getting rather quickly to the point where you'd be hard pressed to do better by hand without deploying at least a small measure of that syntactic sugary goodness.
I do agree that exception handling and a runtime are excessive baggage in most cases, which makes these improved languages less useful on a small 8-bitter. But even simple devices are now sprouting smaller 32-bitters, because of the incessant need to connect the simplest of devices over the wild and woolly internet with its SSL and RSA and whatnot (think of the Amazon buttons, which I believe have to build those rather complex and crypto-heavy Amazon tokens and then send them over an SSL link). On such devices that issue is disappearing a lot faster than I've ever been particularly comfortable with, and heap management is in fact again a very real thing (if not often mandatory). Back on the 8-bitters, however, compiler support for yield-type functionality is still readily doable without the overhead of heap management, exceptions, or a runtime; it just seems that all the new development is being done in the bigger systems where these features are needed (rather than down the small end where they'd be merely useful), and over at the bigger end of town things like exceptions are pretty much mandatory, and having large monolithic runtimes can actually be helpful (especially if they can be shared between several processes).
A quick detour: over in D (which does have a rather large runtime, though I'd be willing to bet not nearly as nasty as C#'s), they've recently added support for designating functions that disallow anything which would utilise the garbage collector, such as automatic heap allocation (a big thing for D, since it uses a lot of it), through their @nogc function attribute and its compiler-directive version. More recently, they've gotten exceptions working again in the presence of @nogc (previously you had to also tag functions as nothrow for most vaguely useful code to compile, which cut your standard library options down to next to nothing), and I rather suspect that pattern will leak out into other aspects of the language in relatively short order. Now, again, yes, any of the bigger languages, including C# and D, do tend to assume that you have large gobs of memory just sitting there waiting to be used, so if you use them the same way you would on a desktop, you're generally in for a bad time. But as you noted already, an embedded systems developer will often forego the STL, and likewise in C#/D that means foregoing large chunks of the standard library. The hardest part I've found in learning D in particular is getting familiar with which of its various cat-skinning techniques to apply when: you can choose greedy or lazy processing, copying or non-copying semantics, high- or low-level functionality, and generally very finely craft the flow to meet your needs. There are also several people pushing D towards smaller and smaller devices, and as an embedded systems developer targeting a smallish device (it's far from 8-bit friendly still, although I think I remember reading that, with some considerable effort, someone did manage to hack out most of the runtime as an experiment, so it's at least possible), you'd essentially focus on the in-place, non-copying paths (much the same as you do any time you want good tight-loop efficiency, even on a desktop), and on maximising the compiler's ability to reason about (and hence optimise away) the generated binary. If they could get it to shed much of that runtime (if not automatically, then with a compiler directive), I'm fairly confident many an adamant C developer, with a decent and appropriately focused proficiency in D, would be surprised at the results it achieves.
It's also worth mentioning that other big-end features like CTFE have fairly recently made their way back into the C/C++ realms; though I've seen CTFE in C/C++, and it's still a rather hellish affair. I've not done particularly advanced C/C++ in a while, and even a simple example of CTFE took me a little while to reason my way through; but I could immediately see where it could have been applied with great success to several of my past projects. However, my point with Python wasn't so much the presence of the interpreter (and there are plenty of smallish 32-bit processors that run some form of either Python or BASIC), which makes certain things an awful lot simpler at the expense of memory usage, but rather the fact that things are not always quite what you assume them to be, which is the main point at which coroutines often become impractical in static languages. (Who here, honestly, would have assumed that a Python generator call was actually simpler than a regular function call? I know one university professor, whose YouTube clips teaching this stuff I was watching, certainly didn't; he went off on a merry chase trying to figure out where the complexity was buried, only to finally realise, even with me practically screaming at my monitor trying to point him towards it, as you do, that it was the simple function call case that was the more complex one.) What matters is the pattern of its functionality: there's really very little reason C/C++ compilers couldn't take the Python approach to generators, at least, and apply it at the C level also.
(Python generators don't let you mess with the C call stack, since Python function calls each get their own private stack independent of the C one; it's actually allocated as a simple array of PyObject pointers. And like the Python bytecode compiler, a C/C++ compiler is also capable of calculating a function's own stack-depth needs, or of just plain switching to alternative means.) A C++ yield could be implemented essentially by breaking the function up into a class (or more specifically, probably a struct), with the local variables and arguments that need to bridge function sections shifted into the class structure. At first glance you'd think you're going to have overhead, since all those variables are now indirect accesses; but keep in mind that even regular function locals are stack-relative accesses. Now they're just class-instance-relative instead, which is only really a problem if you don't have a spare index register available… Allocating the structure on the heap, or simply placing it on the stack far enough prior to the call that it's in scope for its whole lifetime, will generally make little difference from an efficiency perspective. Even better, if you're able to statically allocate that structure at compile time (which is effectively what you'll typically be doing if you're implementing all this by hand), it might achieve even better results with a decent CTFE/inlining compiler which can recognise that the only "this" pointer the construct ever sees is a constant, essentially converting it all back down to the same thing you'd be doing yourself, but a heck of a lot cleaner.
Apropos: generator-based coroutines in Python using yield have existed at least since Python 2.5 (PEP 342 made yield an expression, so values can be sent into a generator), I believe. Python 3 has only expanded the functionality.
That sounds about right. I'd like to see yield from ported back to 2.7 (or, even better, the cause of my still being on 2.7 stepping up to 3 instead), though, because doing the equivalent in current 2.7 is slightly tedious, and I have no doubt at least a little less efficient. Python 3's async functionality, however, looks awfully tasty; I do deeply miss TCL's coroutines at times.
I do also totally understand why compiled languages like C# go for threads over coroutines; interpreted languages like Python and TCL have the benefit here of an inherently split stack, where C/C++/D and their ilk generally do not. Coroutines in such languages end up as something more commonly called fibres: essentially cooperative threading. Generators, on the other hand, can readily be implemented by a compiler as an efficient state machine, requiring zero stack trickery. Once again, though, this does not reduce the applicability of the more basic higher-level functionalities such as generators (which are essentially a special case of coroutines here); classes and function decomposition can be used to encapsulate the necessary state, especially in these simple cases. It's what you do yourself when implementing a generator the hard way.