Author Topic: The Imperium programming language - IPL (Read 70499 times)

Kalvin · « **Reply #600 on:** December 28, 2022, 11:09:47 am »

Quote from: tggzzz on December 28, 2022, 10:59:32 am

I hate that kind of thing

I know that this idea is a bit controversial. Extending the language with an intelligent preprocessor might help when implementing new frameworks and expressing FSMs etc.

DiTBho · « **Reply #601 on:** December 28, 2022, 11:11:31 am »

Quote from: Nominal Animal on December 28, 2022, 07:23:48 am

It also means it is best to pass data by value in registers, so that data already in registers does not have to be written to RAM just to call a function.
Passing data by reference is an utter pain.

Yup. The only good use is for objects, so you pass the pointer to a large structure.

kind of f(p_obj, value);

p_obj points to a struct of { methods, properties, data-context ... }
value is passed by register or by stack.
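As a rough illustration of the `f(p_obj, value)` shape described above, here is a minimal C sketch. The `counter` type and its function names are made up for this example; only the pattern (a pointer to a struct holding methods and data, plus a plain value argument) comes from the post.

```c
#include <stdint.h>

struct counter;  /* forward declaration so the method table can refer to it */

/* "methods": a table of function pointers (hypothetical names) */
struct counter_ops {
    void     (*add)(struct counter *self, uint32_t value);
    uint32_t (*get)(const struct counter *self);
};

/* The object: method table plus its own data context */
struct counter {
    const struct counter_ops *ops;
    uint32_t total;
};

/* p_obj arrives as a pointer; value arrives in a register or on the stack,
   depending on the ABI. */
static void counter_add(struct counter *self, uint32_t value)
{
    self->total += value;
}

static uint32_t counter_get(const struct counter *self)
{
    return self->total;
}

static const struct counter_ops counter_default_ops = { counter_add, counter_get };

void counter_init(struct counter *c)
{
    c->ops = &counter_default_ops;
    c->total = 0;
}
```

A call then looks like `c.ops->add(&c, 5);`, so only the pointer to the (possibly large) structure is copied, never the structure itself.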

DiTBho · « **Reply #602 on:** December 28, 2022, 11:19:20 am »

Quote from: tggzzz on December 28, 2022, 10:59:32 am

Quote from: Kalvin on December 28, 2022, 08:00:04 am
[C] preprocessor that would be able to generate hygienic macros

I hate that kind of thing

me too, indeed the preprocessor was the first thing I removed from my-c.

oh, its implementation also sucks in gcc, at least from gcc-v2.95 to gcc-v3.3.*
Yesterday I prepared a builder, and I had to *hack* gcc, because it stops working in modern Linux ecosystems.

Code: [Select]

macmini-G4 /projects/devchain-baremetal-2022 # ./mybuild info all
m6811-elf, binutils v2.23, gcc v3.3.6-s12x
m88k-coff, binutils v2.16.1, gcc v2.95.3
m68k-elf, binutils v2.34, gcc v4.1.2-core
mips-elf, binutils v2.24, gcc v4.1.2-core
mips64-elf, binutils v2.24, gcc v4.1.2-core
powerpc-elf, binutils v2.24, gcc v4.1.2-core
powerpc-eabi, binutils v2.24, gcc v4.1.2-core
arm-eabi, binutils v2.24, gcc v4.1.2-core
sh2-elf, binutils v2.24, gcc v4.1.2-core

bugged parts: cpp0, cpp, collect.

Whatever you do, don't do it that way

Nominal Animal · « **Reply #603 on:** December 28, 2022, 11:54:20 am »

Quote from: Kalvin on December 28, 2022, 11:02:27 am

Quote from: tggzzz on December 28, 2022, 10:48:37 am
Quote from: Nominal Animal on December 28, 2022, 07:23:48 am
...
which uses a state object to record its own state, and stack only for calling functions;
...

That's a very good technique, especially where there are many independent FSMs running simultaneously, e.g. telecom phone calls, or web connections.

Have a single FIFO containing events yet to be processed. Have one "worker thread" per core/processor dedicated to consuming an event, doing the actions, and creating another event. Each worker thread sucks the next event from the FIFO. Doesn't require co-routines. It all leads to good scaleable high performance applications.

Miro Samek's freely available ebook Practical UML Statecharts in C/C++, 2nd Ed Event-Driven Programming for Embedded Systems
https://www.state-machine.com/psicc2
describes an Active Object pattern decoupling the event producers and consumers:
https://www.state-machine.com/active-object

This Active Object-pattern can be used as an alternative to / with the system-wide event-queue.

If you look at efficient service daemons in Linux and Unix systems, nonblocking I/O using select()/poll() does something extremely similar:
Each connection has its own state object, and there is a simple "event" loop, handling all descriptors that are readable/writable in turn, and otherwise blocking in the select()/poll() call.

Expressing this pattern in a better way would be nice. This is also what I meant by having more than one event loop.
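For readers unfamiliar with the pattern, here is a minimal sketch of such a loop using POSIX poll(). The `conn` structure, `MAX_CONNS`, and the function names are assumptions made for this example, and the error handling is deliberately simplistic (any read() failure is treated as a closed connection).

```c
#include <poll.h>
#include <unistd.h>

#define MAX_CONNS 16   /* arbitrary limit for this sketch */

/* Per-connection state object (hypothetical layout) */
struct conn {
    int    fd;
    size_t bytes_seen;
    int    closed;
};

/* Advance one connection's state when its descriptor is readable. */
static void conn_on_readable(struct conn *c)
{
    char buf[256];
    ssize_t n = read(c->fd, buf, sizeof buf);
    if (n > 0)
        c->bytes_seen += (size_t)n;
    else
        c->closed = 1;   /* EOF or error: simplified for the sketch */
}

/* Runs until every connection is closed. Returns 0, or -1 on failure. */
int event_loop(struct conn *conns, int count)
{
    struct pollfd pfds[MAX_CONNS];

    if (count > MAX_CONNS)
        return -1;

    for (;;) {
        int open = 0;
        for (int i = 0; i < count; i++) {
            /* poll() ignores entries whose fd is negative */
            pfds[i].fd = conns[i].closed ? -1 : conns[i].fd;
            pfds[i].events = POLLIN;
            pfds[i].revents = 0;
            if (!conns[i].closed)
                open++;
        }
        if (open == 0)
            return 0;

        /* Block here until at least one descriptor has something to do */
        if (poll(pfds, (nfds_t)count, -1) < 0)
            return -1;

        for (int i = 0; i < count; i++)
            if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL))
                conn_on_readable(&conns[i]);
    }
}
```

All per-connection state lives in the `conn` objects; the loop itself keeps nothing on the stack between events except the descriptor table.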

Quote from: DiTBho on December 28, 2022, 11:11:31 am

Quote from: Nominal Animal on December 28, 2022, 07:23:48 am
It also means it is best to pass data by value in registers, so that data already in registers does not have to be written to RAM just to call a function.
Passing data by reference is an utter pain.
Yup. The only good use is for objects, so you pass the pointer to a large structure.

Or to an array, yes.

I do prefer the C way of explicitly defining references (pointers) as separate types, and believe the C++ way of letting the function signature describe whether it takes a value or a reference to a value is likelier to lead to programmer errors. The latter seems like added complexity just to save the programmer from having to write an extra address-of operator in each call.
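A tiny C sketch of that point, with made-up function names: the call site itself shows whether the callee can modify the caller's variable, because a pointer has to be constructed explicitly with `&`.

```c
/* Receives a copy; the caller's variable cannot change. */
int doubled(int x)
{
    x *= 2;
    return x;
}

/* Receives a pointer (itself passed by value); the caller's variable can change. */
void double_in_place(int *x)
{
    *x *= 2;
}
```

With `int a = 21;`, the call `doubled(a)` leaves `a` untouched, while `double_in_place(&a)` visibly hands out its address. In C++ a function declared as `void f(int &x)` would be called as `f(a)`, which looks identical to a by-value call.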

Python passes by assignment, which basically means that ints, floats, strs, tuples, and other immutable types are passed by value, but dictionaries, lists, sets, and mutable types and objects are passed by reference. It doesn't seem to be a problem for Python programmers, which indicates to me that the passing scheme (by value, by reference, or by assignment like in Python) doesn't seem to significantly affect the code; so simplicity should be favoured for the reasons I already outlined.

tggzzz · « **Reply #604 on:** December 28, 2022, 02:34:24 pm »

Quote from: Nominal Animal on December 28, 2022, 11:54:20 am

If you look at efficient service daemons in Linux and Unix systems, nonblocking I/O using select()/poll() does something extremely similar:
Each connection has its own state object, and there is a simple "event" loop, handling all descriptors that are readable/writable in turn, and otherwise blocking in the select()/poll() call.

Expressing this pattern in a better way would be nice. This is also what I meant by having more than one event loop.

If we concentrate on the main "worker" engine doing most of the processing, and ignore separate event loops which create events....

Is there any performance value in having more than one event loop per core?
Is there any simplicity/clarity value in having more than one event loop per core?

Quote

Python passes by assignment, which basically means that ints, floats, strs, tuples, and other immutable types are passed by value, but dictionaries, lists, sets, and mutable types and objects are passed by reference. It doesn't seem to be a problem for Python programmers, which indicates to me that the passing scheme (by value, by reference, or by assignment like in Python) doesn't seem to significantly affect the code; so simplicity should be favoured for the reasons I already outlined.

I tend to ignore Python where high performance and scalability is required, since despite having "threads" it is "crippled" by its Global Interpreter Lock. Ditto Ruby. That may be unjust, but...

SiliconWizard · « **Reply #605 on:** December 28, 2022, 08:20:15 pm »

Quote from: DiTBho on December 28, 2022, 12:55:55 am

Quote from: tggzzz on December 27, 2022, 10:04:40 pm
But at least you realise your statement "Probably ... no But it would be WOW!" is unduly pessimistic.

I mean I don't think I am good enough to make my-c an event-oriented language.

Question being, as I mentioned earlier, do you really need to design a specific language for this? Apart from the sugar coating?

One point that has been raised and would indeed require to be embedded in the language itself would be the possibility of not using a stack (although I have mixed feelings about that), which you can't usually control with existing languages (unless you use assembly directly.)

Admittedly, another point would be static analysis though. If the language is designed to be "event-oriented", then it becomes possible to statically analyze event handling and prove some level of correctness, while it's a lost cause with classic imperative languages.

DiTBho · « **Reply #606 on:** December 29, 2022, 12:49:36 pm »

Quote from: SiliconWizard on December 28, 2022, 08:20:15 pm

do you really need to design a specific language for this? Apart from the sugar coating?

I think the right question is: how can I reduce the amount of assembly code (and the behind-the-scenes tricks) I have to write to support event-driven programming in a language?

tggzzz · « **Reply #607 on:** December 29, 2022, 01:42:50 pm »

Quote from: DiTBho on December 29, 2022, 12:49:36 pm

Quote from: SiliconWizard on December 28, 2022, 08:20:15 pm
do you really need to design a specific language for this? Apart from the sugar coating?

I think the right question is: how can I reduce the amount of assembly code (and the behind-the-scenes tricks) I have to write to support event-driven programming in a language?

No more than is required for an RTOS with cooperative scheduling, preferably plus a keyword to indicate waiting for any of several events. I know that since I used one such environment in C on a Z80, 40 years ago. So that was K&R C as documented in the two available books.

For a suitable keyword, see the xC "select" statement as I outlined earlier - i.e. a switch where the cases are events that are awaited.

You could make it arbitrarily complicated, of course, but that would be inelegant (just like C++).
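To give a feel for "a switch where the cases are events that are awaited", here is a plain C approximation. This is not xC and not the Z80 environment mentioned above; the event names, the `pending` bitmask, and the functions are all invented for the sketch. A real cooperative RTOS would yield to other tasks where this code simply returns `EV_NONE`.

```c
#include <stdint.h>

/* Hypothetical event identifiers (bit positions in 'pending') */
enum { EV_NONE = -1, EV_TIMER = 0, EV_UART_RX = 1, EV_BUTTON = 2 };

static volatile uint32_t pending;      /* one bit per posted event */
static unsigned timer_ticks, rx_bytes; /* state touched by the handlers */

/* Called by producers, e.g. interrupt handlers; ev must be 0..31. */
void post_event(int ev)
{
    pending |= (uint32_t)1 << ev;
}

/* Take the lowest-numbered pending event that is in 'mask', or EV_NONE. */
int take_any(uint32_t mask)
{
    uint32_t ready = pending & mask;
    for (int ev = 0; ev < 32; ev++) {
        if (ready & ((uint32_t)1 << ev)) {
            pending &= ~((uint32_t)1 << ev);
            return ev;
        }
    }
    return EV_NONE;
}

/* One step of a task that awaits either of two events. */
int handle_one(void)
{
    int ev = take_any((1u << EV_TIMER) | (1u << EV_UART_RX));
    switch (ev) {
    case EV_TIMER:   timer_ticks++; break;
    case EV_UART_RX: rx_bytes++;    break;
    default:         break;  /* nothing this task waits for is pending */
    }
    return ev;
}
```

Events outside the task's mask (here `EV_BUTTON`) stay pending for whichever task does wait on them.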

Nominal Animal · « **Reply #608 on:** December 29, 2022, 11:04:35 pm »

Quote from: tggzzz on December 28, 2022, 02:34:24 pm

If we concentrate on the main "worker" engine doing most of the processing, and ignore separate event loops which create events....

Is there any performance value in having more than one event loop per core?
Is there any simplicity/clarity value in having more than one event loop per core?

Excellent questions. I don't know the answers, and the only way to find out I know of is to look at actual practical implementations.

Quote from: tggzzz on December 28, 2022, 02:34:24 pm

I tend to ignore Python where high performance and scalability is required, since despite having "threads" it is "crippled" by its Global Interpreter Lock. Ditto Ruby. That may be unjust, but...

Oh, I only meant wrt. syntax: that whether parameters are passed by value or by reference (or by assignment, a "mix" of the two, as in Python), does not seem to affect the number of bugs in the code.

In my opinion, this means that because at the hardware level it makes a HUGE difference, passing by value being much, MUCH more efficient, passing by value is the superior approach for low-level languages.

tggzzz · « **Reply #609 on:** December 30, 2022, 12:15:40 am »

Quote from: Nominal Animal on December 29, 2022, 11:04:35 pm

Quote from: tggzzz on December 28, 2022, 02:34:24 pm
If we concentrate on the main "worker" engine doing most of the processing, and ignore separate event loops which create events....

Is there any performance value in having more than one event loop per core?
Is there any simplicity/clarity value in having more than one event loop per core?
Excellent questions. I don't know the answers, and the only way to find out I know of is to look at actual practical implementations.

Well, I surprised everybody by the performance of a half-sync half-async telecom event processor running on a Sun Niagara T1 with about one worker thread per core. "About" because I left a few cores for the GC and input events and o/s stuff. It helped that the T-series processors have/had 64 SMT cores, each running at DRAM speed with no caches.

As for implementations in the embedded niche, look at the xCORE processors architecture and how it is used with xC.

MIS42N · « **Reply #610 on:** December 30, 2022, 01:12:18 am »

Quote from: Nominal Animal on December 29, 2022, 11:04:35 pm

In my opinion, this means that because at the hardware level it makes a HUGE difference, passing by value being much, MUCH more efficient, passing by value is the superior approach for low-level languages.

Don't you need both? Although I write assembler, I would think passing data in a higher level language would be similar. I have arithmetic stack operations, the two relevant subroutines to get stuff on the stack are:

Code: [Select]

CALL PushLit
12345

and

Code: [Select]

CALL PushU2
16bitUnsigned ; pointer to the value

So both pass by value and pass by reference.

For generating output the relevant subroutines are

Code: [Select]

CALL PrintChar ; value is in a register

and

Code: [Select]

CALL PrintStr
PointToString

Although passing by value is efficient in terms of CPU cycles, it isn't the whole picture. Passing by reference has CPU overhead but uses less instruction memory. Sometimes memory is the constraint. I don't think one can say one method is superior to the other; they both have advantages.

Nominal Animal · « **Reply #611 on:** December 30, 2022, 04:10:13 am »

Quote from: MIS42N on December 30, 2022, 01:12:18 am

Quote from: Nominal Animal on December 29, 2022, 11:04:35 pm
In my opinion, this means that because at the hardware level it makes a HUGE difference, passing by value being much, MUCH more efficient, passing by value is the superior approach for low-level languages.
Don't you need both?

If your language has a concept of pointer, then constructing a pointer to an object and passing that pointer by value gives you the effects of passing by reference.

It was OP who suggested passing everything by reference. This means that to pass a variable, one would need to store that variable in RAM first, before calling a function.

Comparing the SysV ABI (application binary interface, at the hardware level, assembly language) on x86-64 to most common x86 ABIs shows that passing by value, using registers to store values, and returning up to two 64-bit words, yields superior performance (regardless of language) because it minimises call overhead.
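A small C sketch of the "returning up to two 64-bit words" case. The `divmod` type and function name are made up for illustration. Under the SysV x86-64 ABI, a 16-byte structure of two integers like this one is returned in the RAX:RDX register pair, and the two arguments arrive in RDI and RSI, so no memory traffic is needed for the call itself.

```c
#include <stdint.h>

/* Two 64-bit words: small enough to be returned in registers on SysV x86-64 */
struct divmod {
    uint64_t quot;
    uint64_t rem;
};

/* Both inputs and both outputs are passed by value. The caller must ensure d != 0. */
struct divmod divmod_u64(uint64_t n, uint64_t d)
{
    struct divmod r = { n / d, n % d };
    return r;
}
```

On ABIs that return such structures through a hidden pointer instead, the same source code still works, but the result takes a round trip through memory.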

Quote from: MIS42N on December 30, 2022, 01:12:18 am

Although passing by value is efficient in terms of CPU cycles, it isn't the whole picture. Passing by reference has CPU overhead but uses less instruction memory.

No, that claim does not seem to be supported by practical evidence.

There are several calling conventions on x86 and x86-64, some that pass parameters on the stack, and some that pass parameters in registers. If your claim was true, then passing parameters on stack would yield shorter functions than passing parameters in registers. The opposite tends to be true. Passing parameters in registers is also significantly faster on both architectures, but that difference is only visible on hardware that has caches and memory access slower than register access.

(If we consider passing parameters in registers by reference instead of by value, then the difference is significant: passing by reference involves an extra indirection.)

There are 8-bit architectures that have very few registers and were designed to pass function arguments on the stack, but they're quite difficult to optimize code for. Most hardware today, from AVRs to ARMs to RISCV, have many general-purpose registers, so targeting those makes more sense to me anyway.

MIS42N · « **Reply #612 on:** December 30, 2022, 08:43:00 am »

Quote from: Nominal Animal on December 30, 2022, 04:10:13 am

If your language has a concept of pointer, then constructing a pointer to an object and passing that pointer by value gives you the effects of passing by reference.

You are right. If you take a pointer to be a value, then I pass everything by value.

tggzzz · « **Reply #613 on:** December 30, 2022, 10:21:09 am »

Quote from: Nominal Animal on December 30, 2022, 04:10:13 am

There are 8-bit architectures that have very few registers and were designed to pass function arguments on the stack, but they're quite difficult to optimize code for. Most hardware today, from AVRs to ARMs to RISCV, have many general-purpose registers, so targeting those makes more sense to me anyway.

Not always true...

If you match main memory speed to core speed, then

- that code optimisation advantage disappears
- the enormous amount of hardware devoted to caches, pipelining and superscalar operation simply evaporates into thin air. That means the silicon can be used for more productive computations
- simpler hardware is easier to get right
- much improved Watt/MIPS

That was demonstrated very clearly with the Sun UltraSPARC Niagara T series processors, which had up to 128 "cores" (and used them effectively!) in 2005-2011, when x86-64 had <8 cores.

It is also the case in the xCORE devices, which have 32 cores for ~£25 (one off price at DigiKey).

Of course, while such processors have superb aggregate throughput, their single-thread throughput is comparatively anaemic. But for "embarrassingly parallel" servers and embedded applications, that's a good trade-off.

DiTBho · « **Reply #614 on:** December 30, 2022, 10:58:16 am »

Quote from: Nominal Animal on December 30, 2022, 04:10:13 am

There are 8-bit architectures that have very few registers and were designed to pass function arguments on the stack

from 6800, to 6809, including 68hc11

DiTBho · « **Reply #615 on:** December 30, 2022, 11:04:03 am »

Quote from: tggzzz on December 30, 2022, 10:21:09 am

Sun UltraSPARC

My reference is POWER10(1). With built-in tr-mem.
tr-mem costs silicon, thus money to us, POWER10 is *VERY* expensive, but ... I personally think it's worth it.

I mean, you'd better use silicon for tr-mem rather than for sliding registers window, plus complex out-of-order-reordering load/store units (that goes crazy, very crazy with specific pipeline instructions, which costs more silicon, more complexity in both hardware and software sides)

edit:
tr-mem, removed in Power ISA >= v.3.1

tggzzz · « **Reply #616 on:** December 30, 2022, 11:13:47 am »

Quote from: DiTBho on December 30, 2022, 11:04:03 am

Quote from: tggzzz on December 30, 2022, 10:21:09 am
Sun UltraSPARC

My reference is POWER10. With built-in tr-mem.
tr-mem costs silicon, thus money to us, POWER10 is *VERY* expensive, but ... I personally think it's worth it.

I mean, you'd better use silicon for tr-mem rather than for sliding registers window, plus complex out-of-order-reordering load/store units (that goes crazy, very crazy with specific pipeline instructions, which costs more silicon, more complexity in both hardware and software sides)

One fundamental problem with transactional memory is that it doesn't scale. Sooner or later too much time is eaten up by transaction-related activity. That was clearly demonstrated with the cache snooping on Athlon systems: it worked fine with 4 chips, but despite best attempts it never worked well above that.

Another fundamental problem with transactional memory is what happens when transactions are used inappropriately or a transaction fails to complete. That can be due to programming errors, memory errors, processor errors etc. Even if timeouts are implemented and work, how much unrelated activity is screwed until then?

DiTBho · « **Reply #617 on:** December 30, 2022, 11:14:46 am »

Quote from: Nominal Animal on December 30, 2022, 04:10:13 am

Comparing the SysV ABI (application binary interface, at the hardware level, assembly language) on x86-64 to most common x86 ABIs shows that passing by value, using registers to store values, and returning up to two 64-bit words, yields superior performance (regardless of language) because it minimises call overhead.

It's the same with PowerPC(1) and MIPS(2). I can confirm it's largely used (even by Windriver's and Green Hills(1)' toolchains) because it minimises call overhead.

(1) { C, C++, Ada } Optimizing Compilers - Green Hills.
(2) Microsoft C/C++ Optimizing Compilers for Pulsar PDA devices (2000s, WindowsCE-1.*)

DiTBho · « **Reply #618 on:** December 30, 2022, 11:38:52 am »

Quote from: tggzzz on December 30, 2022, 11:13:47 am

One fundamental problem with transactional memory is that it doesn't scale
[...]
it worked fine with 4 chips, but despite best attempts it never worked well above that.

ummm, talking about my MIPS5++ experimental board, the CPU module has only 4 cores attached to a single tr-mem.

POWER10 requires a special Talos mainboard, different from the first modern (post-2013) one you can buy for POWER9 workstations and servers, and the last POWER9 TL2WK2 comes with two 4-core IBM POWER9 v2 CPUs.

Perhaps *only 4 cores* per package are too few. I need a bigger workstation

tggzzz · « **Reply #619 on:** December 30, 2022, 12:13:57 pm »

Quote from: DiTBho on December 30, 2022, 11:38:52 am

Quote from: tggzzz on December 30, 2022, 11:13:47 am
One fundamental problem with transactional memory is that it doesn't scale
[...]
it worked fine with 4 chips, but despite best attempts it never worked well above that.

ummm, talking about my MIPS5++ experimental board, the CPU module has only 4 cores attached to a single tr-mem.

POWER10 requires a special Talos mainboard, different from the first modern (post-2013) one you can buy for POWER9 workstations and servers, and the last POWER9 TL2WK2 comes with two 4-core IBM POWER9 v2 CPUs.

Perhaps *only 4 cores* per package are too few. I need a bigger workstation

Where the scalability cliff is depends on many fine details; there are no simple extrapolations. Nonetheless, the cliff is there, somewhere.

What about this other point?...

Quote from: tggzzz on December 30, 2022, 11:13:47 am

Another fundamental problem with transactional memory is what happens when transactions are used inappropriately or a transaction fails to complete. That can be due to programming errors, memory errors, processor errors etc. Even if timeouts are implemented and work, how much unrelated activity is screwed until then?

DiTBho · « **Reply #620 on:** December 30, 2022, 01:13:19 pm »

Quote from: tggzzz on December 30, 2022, 11:13:47 am

Another fundamental problem with transactional memory is what happens when transactions are used inappropriately or a transaction fails to complete. That can be due to programming errors, memory errors, processor errors etc. Even if timeouts are implemented and work, how much unrelated activity is screwed until then?

Talking about the MIPS5++ cpu board installed on the MIPS.inc Atlas board, the tr-mem is built-in the CPU module.

my-c supports native tr-mem, it's a true type, with a defined behavior and memory model. This "saves" from misuse.

(well, with "save", I mean ... necessary, not strictly sufficient, but decently sufficient if you don't do too weird things)

I followed all the points reported in the CPU documentation, and I did my best to achieve these points:

- Never cached, tr-mem is not cached. MIPS can kseg address things, bypassing the cache. I chose this working scheme because when caches are used, the tr-mem may introduce the risk of "false conflicts" due to the use of cache line granularity ... whose handling adds more complexities therefore problems and bugs
- Memory (parity) errors trigger hw exceptions, covered by the exception software side of the machine, covered by my-c
- Memory conflicts are managed by tr-mem, it's hardware stuff, if it fails ... it's hw severe failure
- Processor errors (except too severe) trigger hw exceptions, covered by the exception software side of the machine, covered by my-c.
- Severe processor errors (damaged hw (1)) trigger voter reaction. The CPU is forced into disable channel and halt.

At the moment the my-c policy for a failed tr-mem transaction is:
- do nothing if it fails, let the exceptions do their resume job (unless it's a too severe hw failure)
- abort and try again (it allows non-blocking mode)
- if, and only if, it's too severe (when voter reacts it's for a "severe hw failure"), propagate "hw severe failure" to the CPU port, wait4ever, disable channel and halt
This way you are guaranteed that a transaction will either be a success or a failure, never nothing in the middle, never nothing undefined.
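The "abort and try again, never anything in the middle" behaviour can be approximated in portable C with a compare-and-swap from C11 `<stdatomic.h>`. To be clear, this is a software analogy, not the tr-mem hardware and not my-c; the function names and the bounded retry count are assumptions made for the sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One attempt: either the whole update commits (true) or nothing changes (false). */
bool try_add(_Atomic uint32_t *cell, uint32_t delta)
{
    uint32_t seen = atomic_load(cell);
    /* Commits only if *cell still holds 'seen'; otherwise another writer won. */
    return atomic_compare_exchange_strong(cell, &seen, seen + delta);
}

/* Abort-and-retry policy with an upper bound, so the caller never blocks forever. */
bool add_with_retry(_Atomic uint32_t *cell, uint32_t delta, int max_tries)
{
    for (int i = 0; i < max_tries; i++)
        if (try_add(cell, delta))
            return true;
    return false;  /* reported as a failure; the cell was never half-updated */
}
```

The caller sees only two outcomes, committed or not committed, which mirrors the success-or-failure guarantee described above.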

(1) damaged hw is detected by two methods
- voter periodically cbit
- random internal cbit

cbit = continuous integrated tests. Sort of..."what is the result of (0xff + 0x01)|size32bit? ... I expect 0x100, right? no? your ALU is bad!!!". Done by machine exceptions.
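A minimal C sketch of the kind of ALU check described, using the 0xff + 0x01 example from the post. The function name is hypothetical, and the real checks are done by machine exceptions and a voter, not by a C function; the `volatile` qualifiers are only there so the compiler cannot fold the addition at compile time.

```c
#include <stdint.h>

/* Returns 1 if the 32-bit addition gives the expected result, 0 otherwise. */
int alu_add_selftest(void)
{
    /* volatile: the operands must be loaded at run time, so the add really executes */
    volatile uint32_t a = 0xffu;
    volatile uint32_t b = 0x01u;
    uint32_t sum = a + b;

    return sum == 0x100u;
}
```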

SiliconWizard · « **Reply #621 on:** December 30, 2022, 07:27:37 pm »

Intel has added support for transactional memory with their TSX extension: https://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions

It was added in Xeon processors years ago but was apparently defective, and the latest working version of TSX is in the current Skylake series. I don't know if any software/OS makes use of this extension yet.

Nominal Animal · « **Reply #622 on:** December 31, 2022, 12:47:23 am »

Quote from: tggzzz on December 30, 2022, 10:21:09 am

Quote from: Nominal Animal on December 30, 2022, 04:10:13 am
There are 8-bit architectures that have very few registers and were designed to pass function arguments on the stack, but they're quite difficult to optimize code for. Most hardware today, from AVRs to ARMs to RISCV, have many general-purpose registers, so targeting those makes more sense to me anyway.
Not always true...

Well, I do think 12 general purpose registers (xCORE-200 XS2 ISA (PDF)) is plenty!

But, sure, there are exceptions to any rule of thumb. XCore is definitely one.

One optimization difficulty is how data has to be shuffled around between function calls. When memory access is as fast as register access (so it doesn't matter whether you shuffle data in registers, or between registers and memory), then optimizing such shuffling becomes simple.

However, even in the xCORE XS2 ISA, arithmetic and logical operations are done in registers, with any of the 12 general purpose registers used as source and destination registers. A function that does say additions, multiplications, and divisions between a few values, can do so directly if the values are passed in registers. If the values are passed on stack, they have to be loaded from the stack before they're operated on. This is the overhead one can avoid by passing by value in registers, with passing by reference implemented via pointers.

Quote from: tggzzz on December 30, 2022, 10:21:09 am

That was demonstrated very clearly with the Sun UltraSPARC Niagara T series processors, which had up to 128 "cores" (and used them effectively!) in 2005-2011, when x86-64 had <8 cores.

Yep. I personally like the idea of asymmetric multiprocessing a lot, and would love to play with such hardware. Alas, the only ones I have are mixed Cortex-A cores (ARM big.LITTLE).

We can see from the increasing complexity of peripherals on even cheap ARM Cortex-M microcontrollers, and things like Raspberry Pi Pico/RP2040 (and of course TI AM335x with its PRUs) that we're slowly going that way anyway.

Personally, I'd love to have tiny programmable cores with just a basic ALU – addition, subtraction, comparison, and bit operations, with just a couple of run-time registers – and access to reading and writing from the main memory, and the ability to raise an interrupt in the main core. Heck, all the buses I use could be implemented with one.

I looked at XEF216-512-TQ128-C20A (23.16€ in singles at Digikey, in stock), and the problem I have with it is that it is too powerful!

I fully understand why XMOS developed xC, a variant of C with additions for tile computing; something like this would be perfect for developing a new event-based programming language, since it already has the hardware support for it.

For now, however, I have my own sights set much lower: replacing the standard C library with something more suitable for my needs, to be used in a mixed freestanding C/C++ environment. This is well defined in the standards, and there are more than one compiler I can use it with (specifically, GCC and Clang on different architectures, including 8-bit (AVR), 32-bit (ARM, x86), and 64-bit (x86-64, Aarch64/ARM64) ones).

MIS42N · « **Reply #623 on:** December 31, 2022, 01:00:37 am »

Is this going off topic? Xeon processors, MIPS, Athlon, POWER10?

I'm thinking a microcontroller has less CPU grunt but is augmented by a few onboard PWMs, extra timers, voltage references, comparators, ADC and DAC ability, USARTs, onboard USB, I2C, SPI, WDT, writable permanent memory, sleep modes and commonly low powered so it can be run off battery power. There may be some offloading of functions such as DMA and gated timers. And commonly just 1 CPU.

How do you then optimize for different architectures - AVR with multiple registers but transfers to memory are load and store - PIC with one working register but instructions like INCFSZ (increment memory and skip if zero) that doesn't touch the register at all. And maybe optimize for speed or optimize for minimum program instructions but in the same program.

Or was 'microcontroller' relegated to SYS$BLACKHOLE (sorry - my VMS background creeping in)?

Perhaps the best course is to wait for Sherlock to produce the language, and see what benefits (if any) it brings. And with something to work with, then make comments to improve it (if feasible).

DiTBho · « **Reply #624 on:** December 31, 2022, 02:04:18 am »

Quote from: Nominal Animal on December 31, 2022, 12:47:23 am

If the values are passed on stack, they have to be loaded from the stack before they're operated on. This is the overhead one can avoid by passing by value in registers, with passing by reference implemented via pointers.

a little hw-trick I see on the MIPS5++ CPU module on the Atlas board: it uses 128 Kbyte synchronous RAM which always takes 1 clock (@ 80 MHz) to complete every read/write operation. It's kseg mapped (not cached).

Nice trick! This way the stack is as fast as registers

