re: compact 68K code
I may be one of the few, but I like the idea of split register sets in the 68K. Compilers have no problems with A vs. D registers, and at worst it takes one extra move. With that, you can save one bit in each register operand specifier. It can all add up.
...and we know that a CISC ISA like the x86 can be decoded into RISC-like micro-ops, so I wonder what a highly tuned 68K, or for that matter PDP-11/VAX-11, micro-architecture could be like. We can throw away the flags ~_o if they make a difference, and add a couple of instructions as mentioned.
Quote
Before you get to your push multiple, the core has first read the vector table and fetched the first instruction of the ISR (prolog). Done automatically, it can often be done in parallel.

I'm not convinced. The CM0 is listed as a von Neumann architecture, with both flash and RAM connected to the same memory bus matrix. And it always has to save the PC anyway, so if it could do a simultaneous vector fetch (one word) and PC save (one word), it would be caught up by then, more or less. (And... I would tend to relocate the vector table to RAM anyway.)
Quote
I actually asked Microchip about it. They said it let them distribute binary libraries that worked across a range of chips (with identical peripherals at different locations).

That makes some sense - it's a good thing that disk space is cheap, with many vendors distributing close to one library per chip. (OTOH, I'm not entirely happy with the idea of binary-only libraries.)

Quote
Microchip was defining those symbols at link time

That's true. Although it's not a very good idea.
Quote
I like the idea of split register sets in the 68K.

It seems to work OK on the 68K, partially because there were a lot of them (8 each, right?). The crop of 8-bit chips with "we have TWO index registers! PLUS a stack!" was depressing... I'm not sure that it buys you much from a hardware-implementation PoV: can't you pretty much use the same instruction bit you used to pick "address or data" to address twice as many GP registers? (I don't quite remember which instructions were different between A and D registers.) Maybe some speed-up from having separate banks? There's an idea for an optimization: "we have 32 registers organized in 4 banks of 8; operations that use registers from different banks can be more easily parallelized..." Lots of CPUs have done this with memory. The Cray-1, for instance: "write your algorithm so that you access memory at 8-word intervals", or something like that. Or disk (remember "interleaving"?).
Quote
wonder what a highly tuned 68K or PDP-11 ... could be like.

Yeah. I wonder what the internal architecture of the more recent ColdFire chips is like; my impression is that that's about what they've done... The PDP-10 emulator "using an x86 for its microcode interpreter" apparently ran something like 6x faster than the fastest PDP-10 DEC ever built (and that was a decade or two ago, I think).
Quote
I may be one of the few, but I like the idea of split register sets in the 68K. Compilers have no problems with A vs. D registers, and at worst it takes one extra move. With that, you can save one bit in each register operand specifier. It can all add up.

Quote
...and we know that a CISC ISA like the x86 can be decoded into RISC-like micro-ops, so I wonder what a highly tuned 68K, or for that matter PDP-11/VAX-11, micro-architecture could be like. We can throw away the flags ~_o if they make a difference, and add a couple of instructions as mentioned.
Cortex-M3, M4, and M7 all have a 12-cycle interrupt latency (the M0 has 16). It's sitting there writing those eight registers out at one per clock cycle, exactly the same as you could do yourself in software.
Motorola junked so many processor architectures in the 2000s that it's not even funny. The 88K was one, and there's also M-CORE. By the looks of it, it should have been competitive, but when even their own phone division wouldn't use it, that was the end.
...It's that if you tie your company to them then you have a huge risk of being orphaned within a decade.
This, more than any technical superiority, is one of the things that makes RISC-V so attractive.
The 68K had ISA features, like double indirect addressing, which made it even worse than x86 when scaled up. The separate address and data registers were one of those features, although I no longer remember why.
Quote
Motorola had junked so many processor architectures in the 2000s that it's not even funny. 88K was one
The 88000 appeared too late in the marketplace, later than MIPS and SPARC, and since it was not compatible with the 68K it was not competitive at all: the classic Amiga? 68K! The Atari ST? 68K! The classic Macintosh? 68K!
In short, Motorola was not happy, because they had trouble selling the chip.
As far as I understand, IBM had been working on the S/370 for a long while, and their research was on the IBM 801 chip, which led to the first POWER chip.
If 2015 is not too late (or 2012 for Arm64) then 1990 was certainly not too late.
The IBM 801
The 801 was a proof of concept, made in 1974, but POWER and PowerPC are derived from the evolution of that proof of concept.
My IBM Red, Green, and Blue books point to this article.
Probably to underline that one of their men, Mr. Cocke, received the Turing Award in 1987, the US National Medal of Technology in 1991, and the US National Medal of Science in 1994.
See, one of our prestigious men received an award for having invented RISC before any H&P book started talking about it.
“The main idea is not to add any complexity to the machine unless it pays for itself by how frequently you would use it. And so, for example, a machine which was being used in a heavily scientific way, where floating point instructions were important, might make a different set of tradeoffs than another machine where that wasn't important. Similarly, one in which compatibility with other machines was important or in which certain types of networking was important would include different features. But in each case they ought to be done as the result of measurements of relative frequency of use and the penalty that you would pay for the inclusion or non-inclusion of a particular feature.”
Joel Birnbaum
FORMER DIRECTOR OF COMPUTER SCIENCES AT IBM
“Computer Chronicles: RISC Computers (1986),”
October 2, 1986
I compiled them the way they came.
000001b5 <main>:
1b5: e8 20 00 00 00 call 1da <__x86.get_pc_thunk.ax>
1ba: 05 3a 1e 00 00 add $0x1e3a,%eax
1bf: 8b 80 0c 00 00 00 mov 0xc(%eax),%eax
1c5: 81 88 94 00 00 00 80 orl $0x80,0x94(%eax)
1cc: 00 00 00
1cf: 81 88 c8 00 00 00 00 orl $0x1000,0xc8(%eax)
1d6: 10 00 00
1d9: c3 ret
000001da <__x86.get_pc_thunk.ax>:
1da: 8b 04 24 mov (%esp),%eax
1dd: c3 ret
00002000 <PORT>:
2000: 00 f0 add %dh,%al
08048450 <main>:
8048450: a1 c0 95 04 08 mov 0x80495c0,%eax
8048455: 83 48 30 20 orl $0x20,0x30(%eax)
8048459: 83 48 40 20 orl $0x20,0x40(%eax)
804845d: c3 ret
#include <stdio.h>
#include <stdint.h>
#define PORT_PINCFG_DRVSTR (1<<5)
struct {
    struct {
        struct {
            uint32_t reg;
        } PINCFG[16];
        struct {
            uint32_t reg;
        } DIRSET;
    } Group[10];
} *PORT = (void *)0xdecaf000;

void main() {
    PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
    PORT->Group[0].DIRSET.reg |= 1 << 5;
}
gcc a.c -o c -save-temps -O1 -fomit-frame-pointer -masm=intel
.file "a.c"
.intel_syntax noprefix
.text
.globl main
.type main, @function
main:
mov eax, DWORD PTR PORT
or DWORD PTR [eax+48], 32
or DWORD PTR [eax+64], 32
ret
.size main, .-main
.globl PORT
.data
.align 4
.type PORT, @object
.size PORT, 4
PORT:
.long -557125632
.ident "GCC: (GNU) 4.5.0"
.section .note.GNU-stack,"",@progbits
Quote
I compiled them the way they came.
It doesn't matter whether you deliberately tweaked the compiler options and offsets to make RISC-V look good, or they magically came out this way. The problem is that your tests do not reflect reality, but rather a jumble of inconsequential side effects.
Quote
If you tweak the offsets a different way
[CM0 and limitations on offsets/constants, making assembly unpleasant]
Oh come on. You not only change the data structure (which I freely admit I made up at random, as westfw didn't provide it) to be less than 128 bytes to suit your favourite ISA, you *ALSO* change the bit offsets in the constants to be less than 8 so the masks fit in a byte. If you hadn't done *both* of those, then your code would have 32-bit literals for both the offset and the bit mask, the same as mine, not 8-bit ones. You also changed the code compilation and linking model from that used by all the other ISAs, which would all benefit pretty much equally from the same change.
And you accuse me of bad faith?
Quote
[CM0 and limitations on offsets/constants, making assembly unpleasant]

(I did specifically choose offsets and bit values to be "beyond" what CM0 allows.)
As another example, I *think* that the assembly for my CM0 example (the actual data structure is from the Atmel SAMD21, but it's scattered across several files) can be improved by accessing the port as 8-bit registers instead of 32-bit ones. All I have to do is look really carefully at the datasheet (and test!) to see whether that actually works, rewrite or obfuscate the standard definitions in ways that would confuse everyone and perhaps not be CMSIS-compatible, and remember to make sure that it remains legal if I move to a slightly different chip.
Perhaps I have a high bar for what makes a pleasant assembly language.
The $1M question is: how is my tweaking any worse than yours?
You on the other hand worked backwards from a processor to make code that suited it.
Quote
You on the other hand worked backwards from a processor to make code that suited it.
Haven't you?
Isn't this the way it should be? When you compile for a CPU, you select the settings which maximize performance for that particular CPU, instead of using settings which produce bloat. As, by your own admission, you did for the Motorola.
If you haven't done this for RISC-V, why don't you tweak it so that it produces better code? Go ahead, try to beat my 14 bytes, or even get remotely close.