I hate that kind of thing
It also means it is best to pass data by value in registers, so that data already in registers does not have to be written to RAM just to call a function.
Passing data by reference is an utter pain.
[C] preprocessor that would be able to generate hygienic macros
I hate that kind of thing
macmini-G4 /projects/devchain-baremetal-2022 # ./mybuild info all
m6811-elf, binutils v2.23, gcc v3.3.6-s12x
m88k-coff, binutils v2.16.1, gcc v2.95.3
m68k-elf, binutils v2.34, gcc v4.1.2-core
mips-elf, binutils v2.24, gcc v4.1.2-core
mips64-elf, binutils v2.24, gcc v4.1.2-core
powerpc-elf, binutils v2.24, gcc v4.1.2-core
powerpc-eabi, binutils v2.24, gcc v4.1.2-core
arm-eabi, binutils v2.24, gcc v4.1.2-core
sh2-elf, binutils v2.24, gcc v4.1.2-core
buggy parts: cpp0, cpp, collect...
which uses a state object to record its own state, and the stack only for calling functions;
...
That's a very good technique, especially where there are many independent FSMs running simultaneously, e.g. telecom phone calls, or web connections.
Have a single FIFO containing events yet to be processed. Have one "worker thread" per core/processor dedicated to consuming an event, doing the actions, and creating another event. Each worker thread sucks the next event from the FIFO. Doesn't require co-routines. It all leads to good scalable, high-performance applications.
Miro Samek's freely available ebook "Practical UML Statecharts in C/C++, 2nd Ed: Event-Driven Programming for Embedded Systems"
https://www.state-machine.com/psicc2
describes an Active Object pattern decoupling the event producers and consumers:
https://www.state-machine.com/active-object
This Active Object pattern can be used as an alternative to, or together with, the system-wide event queue.
It also means it is best to pass data by value in registers, so that data already in registers does not have to be written to RAM just to call a function.
Passing data by reference is an utter pain.
Yup. The only good use is for objects, so you pass the pointer to a large structure.
If you look at efficient service daemons in Linux and Unix systems, nonblocking I/O using select()/poll() does something extremely similar:
Each connection has its own state object, and there is a simple "event" loop, handling all descriptors that are readable/writable in turn, and otherwise blocking in the select()/poll() call.
Expressing this pattern in a better way would be nice. This is also what I meant by having more than one event loop.
Python passes by assignment, which basically means that ints, floats, strs, tuples, and other immutable types behave as if passed by value, while dictionaries, lists, sets, and other mutable objects behave as if passed by reference. It doesn't seem to be a problem for Python programmers, which suggests to me that the passing scheme (by value, by reference, or by assignment as in Python) doesn't significantly affect the code; so simplicity should be favoured, for the reasons I already outlined.
But at least you realise your statement "Probably ... no But it would be WOW!" is unduly pessimistic.
I mean, I don't think I'm good enough to make my-c an event-oriented language.
do you really need to design a specific language for this? Apart from the sugar coating?
do you really need to design a specific language for this? Apart from the sugar coating?
I think the right question is: how can I reduce the amount of tricks behind the stage - aka assembly code - I have to write to support event programming in a language?
If we concentrate on the main "worker" engine doing most of the processing, and ignore separate event loops which create events....
Is there any performance value in having more than one event loop per core?
Is there any simplicity/clarity value in having more than one event loop per core?
I tend to ignore Python where high performance and scalability is required, since despite having "threads" it is "crippled" by its Global Interpreter Lock. Ditto Ruby. That may be unjust, but...
If we concentrate on the main "worker" engine doing most of the processing, and ignore separate event loops which create events....
Is there any performance value in having more than one event loop per core?
Is there any simplicity/clarity value in having more than one event loop per core?
Excellent questions. I don't know the answers, and the only way I know of to find out is to look at actual practical implementations.
In my opinion, this means that, because at the hardware level it makes a HUGE difference (passing by value being much, MUCH more efficient), passing by value is the superior approach for low-level languages.
    CALL PushLit
    12345
and
    CALL PushU2
    16bitUnsigned   ; pointer to the value
So both pass by value and pass by reference.
    CALL PrintChar  ; value is in a register
and
    CALL PrintStr
    PointToString
In my opinion, this means that, because at the hardware level it makes a HUGE difference (passing by value being much, MUCH more efficient), passing by value is the superior approach for low-level languages.
Don't you need both?
Although passing by value is efficient in terms of CPU cycles, it isn't the whole picture. Passing by reference has CPU overhead but needs less instruction memory.
If your language has a concept of pointer, then constructing a pointer to an object and passing that pointer by value gives you the effect of passing by reference.
There are 8-bit architectures that have very few registers and were designed to pass function arguments on the stack, but they're quite difficult to optimize code for. Most hardware today, from AVRs to ARMs to RISC-V, has many general-purpose registers, so targeting those makes more sense to me anyway.
There are 8-bit architectures that have very few registers and were designed to pass function arguments on the stack
Sun UltraSPARC
My reference is POWER10, with built-in tr-mem (transactional memory).
tr-mem costs silicon, thus money to us, POWER10 is *VERY* expensive, but ... I personally think it's worth it.
I mean, you'd better spend silicon on tr-mem rather than on sliding register windows plus complex out-of-order load/store reordering units (which go crazy, very crazy, with specific pipeline instructions, costing more silicon and more complexity on both the hardware and software sides).
Comparing the SysV ABI (application binary interface, at the hardware level, assembly language) on x86-64 to the most common x86 ABIs shows that passing by value, using registers to store values, and returning up to two 64-bit words yields superior performance (regardless of language) because it minimises call overhead.
One fundamental problem with transactional memory is that it doesn't scale
[...]
it worked fine with 4 chips, but despite best attempts it never worked well above that.
One fundamental problem with transactional memory is that it doesn't scale
[...]
it worked fine with 4 chips, but despite best attempts it never worked well above that.
ummm, talking about my MIPS5++ experimental board, the CPU module has only 4 cores attached to a single tr-mem.
POWER10 requires a special Talos mainboard, different from the first modern (post-2013) boards you can buy for POWER9 workstations and servers, and the latest POWER9 TL2WK2 comes with two 4-core IBM POWER9 v2 CPUs.
Perhaps *only 4 cores* per package are too few. I need a bigger workstation.
Another fundamental problem with transactional memory is what happens when transactions are used inappropriately or a transaction fails to complete. That can be due to programming errors, memory errors, processor errors, etc. Even if timeouts are implemented and work, how much unrelated activity is screwed up until then?
There are 8-bit architectures that have very few registers and were designed to pass function arguments on the stack, but they're quite difficult to optimize code for. Most hardware today, from AVRs to ARMs to RISC-V, has many general-purpose registers, so targeting those makes more sense to me anyway.
Not always true...
That was demonstrated very clearly with the Sun UltraSPARC Niagara T-series processors, which had up to 128 "cores" (and used them effectively!) in 2005-2011, when x86-64 had <8 cores.
If the values are passed on the stack, they have to be loaded from the stack before they're operated on. This is the overhead one can avoid by passing by value in registers, with pass by reference implemented via pointers.