including the ones that are most difficult to implement
- pre increment?
- post increment?
- stack? (push/pop)?
about the first two instructions, ARM seems to have them, while m88K doesn't,
and currently I have no idea how to implement them in RISC terms
thanks
bit count instructions. like 'how many bits in this byte are 1'
i HATE RISC processors by the way. i want processors with massive single cycle instruction sets.
When i was still mucking inside silicon i built hardware instruction set expanders for an ARM7 in an FPGA.
i even had vector-like capabilities
bit count instructions. like 'how many bits in this byte are 1'
to be used for bitmap purposes?
I can implement an instruction like that; can you suggest an "opcode name" for it?
what should it be called?
bitcnt rt, rs1
(how many bits in register rs1 are "1"?, put the result int rt)
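The intended semantics of that bitcnt could be pinned down with a naive C reference model (my sketch of the behavior, not an implementation proposal; the function name just mirrors the proposed mnemonic):

```c
#include <stdint.h>

/* Reference model for: bitcnt rt, rs1
 * Counts how many bits in rs1 are "1"; the result goes to rt. */
uint32_t bitcnt(uint32_t rs1)
{
    uint32_t rt = 0;
    while (rs1) {
        rt += rs1 & 1u;   /* add the low bit */
        rs1 >>= 1;        /* shift the next bit down */
    }
    return rt;
}
```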
I can also offer bit manipulation
Advanced bit manipulation instructions or engine.
Simple things like reversing the order of all the bits, as well as more complex things like an arbitrarily programmable remapping of input bits to output bits. This is very slow to do in software on pretty much any CPU I can imagine:
8-bit value:
input bit -> output bit
0 -> 5
1 -> 7
2 -> 2
3 -> 0
4 -> 6
5 -> 1
6 -> 4
7 -> 3
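A software version of that exact table shows why it's slow: one loop iteration per bit, plus a table access each time (the map[] array and function name below are mine, just encoding the table above):

```c
#include <stdint.h>

/* Output bit positions for input bits 0..7, per the table above. */
static const uint8_t map[8] = { 5, 7, 2, 0, 6, 1, 4, 3 };

uint8_t remap_bits(uint8_t in)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++)
        if (in & (1u << i))                     /* is input bit i set?     */
            out |= (uint8_t)(1u << map[i]);     /* then set output bit map[i] */
    return out;
}
```

A hardware permutation network does all eight moves in parallel in one cycle, which is the whole argument for the instruction.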
When i was still mucking inside silicon i built hardware instruction set expanders for an ARM7 in an FPGA.
i even had vector-like capabilities
like the tensor processor unit (TPU) by SGI
?
(kidding)
do you think I should implement a MAC (multiply-accumulate, from the DSP world)?
saturated operation ALU engine?
something to accelerate matrix computation?
(exactly what?)
about pre-increment and post-decrement (and stack operations)
how are they implemented within the pipeline?
in which stage?
here, Arise-v2 (my soft core) currently has a strict RISC pipeline:
- fetch
- decode
- register_rd (2 registers)
- ALU/EA (multiply and division will take more cycles)
- load/store (dtack will take more cycles)
- register_wr(1 register), branch
which looks like the MIPS approach, and MIPS doesn't have them
bit count instructions. like 'how many bits in this byte are 1'
to be used for bitmap purposes?
I can implement an instruction like that; can you suggest an "opcode name" for it?
what should it be called?
The SPARC v9 architecture defines an instruction
popc rs, rd, meaning "population count". It wasn't actually implemented in hardware until the UltraSPARC IV. This is also called the "Hamming weight", so
hwt would be a suitable mnemonic. Using a combination of bitwise operations and popc, you can implement the FF1 and FF0 operations (Find First 1 or Find First 0), which are used in hash algorithms.
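One such combination: FF1 (the index of the lowest set bit) falls out of popc applied to the trailing-zero mask. A C sketch, with hwt() standing in for the hardware instruction (the helper is mine, just so the snippet is self-contained):

```c
#include <stdint.h>

/* Software stand-in for the hardware popc / hwt instruction. */
static uint32_t hwt(uint32_t x)
{
    uint32_t n = 0;
    while (x) { n += x & 1u; x >>= 1; }
    return n;
}

/* Find First 1: index of the lowest set bit (caller must ensure x != 0).
 * x & -x isolates the lowest set bit; subtracting 1 turns it into a mask
 * of the trailing zeros, whose population count is the bit index. */
uint32_t ff1(uint32_t x)
{
    return hwt((x & -x) - 1u);
}
```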
I want all the instructions from a modern ARM architecture (let's say ARMv8). It has all you need and some more.
But good luck implementing all of that; it is a nightmare.
how are they implemented within the pipeline?
in which stage?
The decoder issues multiple simple instructions. Push/pop typically take more cycles because of this.
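The cracked form can be modeled in C: each stack operation becomes one ALU micro-op plus one memory micro-op (a sketch assuming a full descending stack of 32-bit words; names are illustrative):

```c
#include <stdint.h>

/* push cracked by the decoder into two micro-ops. */
void push(uint32_t **sp, uint32_t x)
{
    *sp -= 1;       /* micro-op 1: pre-decrement the stack pointer (ALU)   */
    **sp = x;       /* micro-op 2: store through the updated pointer (mem) */
}

/* pop cracked the same way, in the opposite order. */
void pop(uint32_t **sp, uint32_t *rt)
{
    *rt = **sp;     /* micro-op 1: load (mem)                              */
    *sp += 1;       /* micro-op 2: post-increment the stack pointer (ALU)  */
}
```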
I want all the instructions from a modern ARM architecture (let's say ARMv8). It has all you need and some more.
But good luck implementing all of that; it is a nightmare.
hence we need a compromise
hence we need a compromise
No, you need a compromise, I'm perfectly happy using ARM
- pre increment?
- post increment?
- stack? (push/pop)?
I have no idea about how to implement them in RISC terms
When implemented for convenience, you crack them within the dispatcher. However, that will require instruction-group restrictions, which makes optimization more difficult.
As an example,
stwu sp,-stksz(sp)
would crack to an add & store, but there would be a FIG restriction. The store has a register dependency which can't be mitigated.
bit count instructions. like 'how many bits in this byte are 1'
i HATE RISC processors by the way. i want processors with massive single cycle instruction sets.
When i was still mucking inside silicon i built hardware instruction set expanders for an ARM7 in an FPGA.
i even had vector-like capabilities
It is a question of transistor count I suppose.
"Given a million transistors, is it better to have them arranged into a single CISC or several smaller RISCs?"
Transistor count is never the constraint, wiring and power dissipation are.
Transistor count is never the constraint, wiring and power dissipation are.
those are called "microarchitectural constraints", right
?
guys, and what about addressing modes ?
e.g. should I include "scaled indexed addressing mode" ?
(currently it takes +1 extra cycle)
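For reference, the effective-address computation that mode adds is just one extra shift feeding the adder — which is where the extra cycle goes in a simple EA stage (a sketch, with illustrative names):

```c
#include <stdint.h>

/* Scaled indexed addressing: EA = base + (index << scale).
 * The shift before the add is the extra work versus plain base+offset. */
uint32_t scaled_index_ea(uint32_t base, uint32_t index, unsigned scale)
{
    return base + (index << scale);
}
```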
e.g. should I include "scaled indexed addressing mode" ?
Sure, why not?
The point is, the more you include, the more people will use. But if the core is an FPGA-only soft core, then it must be small in the first place.
bit count instructions. like 'how many bits in this byte are 1'
If you have 256 nibbles free (which I know is a big if), this is a simple array lookup. Not sure if it deserves its own instruction?
Not sure if it deserves its own instruction?
Everything deserves its own instruction. Why not? Apart from being annoying for chip designers, there is really no downside. None of those small extensions significantly affect the price of the device.
Array lookups need memory access, and that's expensive; actually more expensive than counting with a simple, stupid loop.
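For comparison, here is the table version being debated, done with a 256-entry byte table for simplicity (my sketch; whether the memory traffic beats the loop depends entirely on the core and its memory system):

```c
#include <stdint.h>

static uint8_t popc_table[256];

/* Fill the table once: popcount(i) = popcount(i/2) + (low bit of i). */
static void init_table(void)
{
    for (int i = 1; i < 256; i++)
        popc_table[i] = (uint8_t)(popc_table[i / 2] + (i & 1));
}

/* Count set bits in a 32-bit word with four table lookups. */
uint32_t popc32(uint32_t x)
{
    return popc_table[x & 0xFF]
         + popc_table[(x >> 8) & 0xFF]
         + popc_table[(x >> 16) & 0xFF]
         + popc_table[x >> 24];
}
```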
For soft cores, look at MicroBlaze or NIOS-II. They have all you really need and nothing extra. Plus, they support extensions. I know it is much more fun to design cores, of course.
I want all the instructions from a modern ARM architecture (let's say ARMv8). It has all you need and some more.
But good luck implementing all of that; it is a nightmare.
hence we need a compromise
Such a compromise was available starting in the 70s from several computer companies, mini and mainframe types, that offered WCS (writable control store), which allowed end users to define and run their own machine instructions using the same microcode mechanism as the standard supplied instructions.
https://en.wikipedia.org/wiki/Microcode
Such a compromise was available starting in the 70s from several computer companies, mini and mainframe types, that offered WCS (writable control store), which allowed end users to define and run their own machine instructions using the same microcode mechanism as the standard supplied instructions.
Because the compilers of the day were getting a little too good at actually using the available instructions!
I have no idea about how to implement [autoincrement/etc] in RISC terms
Presumably, since an ALU operation happens in a single clock cycle and a memory access takes more than one clock, you just do the increment/etc. while you're waiting for the memory. That's where all those weird modes of RISC operands come from, right: "We have an ALU, a barrel shifter, and ...; they should all be able to do something each clock cycle!"
The less standard thing that I wish were more common is probably bitfield instructions (combining barrel shift and AND.)
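The shift-plus-AND pair that a bitfield-extract instruction (e.g. ARM's UBFX) fuses into one operation looks like this in C (a sketch with illustrative names):

```c
#include <stdint.h>

/* Extract 'len' bits starting at bit position 'lo': the barrel-shift
 * plus AND-mask pair that a single bitfield instruction would replace.
 * Assumes 0 < len <= 31. */
uint32_t extract_bits(uint32_t x, unsigned lo, unsigned len)
{
    return (x >> lo) & ((1u << len) - 1u);
}
```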
I wonder how the PDP11 or PDP10 instruction sets (both of which are nominally register/memory architectures) would work if you RISCified them and made it so only the load/store operations could access memory?
I want all the instructions from a modern ARM architecture
The 32-bit ARM instructions, I hope? The Cortex (thumbX only) encodings have lost any elegance, and the CM0 is almost painful in its weird limitations...