Author Topic: if U were a RISC assembly programmer which instructions would U want to have ? (Read 21143 times)

legacy · « **on:** February 29, 2016, 06:17:25 pm »

including the ones that are most difficult to implement

pre increment?
post increment?
stack? (push/pop)?

about the first two instructions, ARM seems to have them, while m88K hasn't
and currently I have no idea about how to implement them in RISC terms

thanks

free_electron · « **Reply #1 on:** February 29, 2016, 07:02:57 pm »

bit count instructions. like 'how many bits in this byte are 1'
i HATE RISC processors by the way. i want processors with massive single cycle instruction sets.

When i was still mucking inside silicon i built hardware instruction set expanders for an ARM7 in an FPGA.
i even had vector-like capabilities

legacy · « **Reply #2 on:** February 29, 2016, 07:08:12 pm »

Quote from: free_electron on February 29, 2016, 07:02:57 pm

bit count instructions. like 'how many bits in this byte are 1'

to be used for bitmap purpose ?

I can implement an instruction like that, can you suggest an "opcode name" ?
how should it be called ?

Code:

bitcnt rt, rs1

(how many bits in register rs1 are "1"? put the result into rt)

I can also offer bit manipulation

clear
set
get
toggle
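As a reference for what such instructions would compute, here is a minimal C sketch. The function names are made up for illustration, and taking the bit index modulo 32 is an assumption, not something stated in the post.

```c
#include <stdint.h>

/* Reference model of the proposed `bitcnt rt, rs1`:
 * returns how many bits of rs1 are 1. */
static inline uint32_t bitcnt(uint32_t rs1)
{
    uint32_t count = 0;
    while (rs1 != 0) {
        rs1 &= rs1 - 1;   /* clears the lowest set bit */
        count++;
    }
    return count;
}

/* Single-bit operations. The index n is reduced modulo 32 (assumption). */
static inline uint32_t bit_clear(uint32_t x, unsigned n)
{
    return x & ~(UINT32_C(1) << (n & 31u));
}

static inline uint32_t bit_set(uint32_t x, unsigned n)
{
    return x | (UINT32_C(1) << (n & 31u));
}

static inline uint32_t bit_get(uint32_t x, unsigned n)
{
    return (x >> (n & 31u)) & 1u;
}

static inline uint32_t bit_toggle(uint32_t x, unsigned n)
{
    return x ^ (UINT32_C(1) << (n & 31u));
}
```

The loop in `bitcnt` runs once per set bit, which is the kind of data-dependent latency a dedicated instruction would avoid.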

tesla500 · « **Reply #3 on:** February 29, 2016, 07:10:26 pm »

Advanced bit manipulation instructions or engine.

Simple things like reverse the order of all the bits, as well as more complex things like randomly programmable remapping of input to output bits. This is very slow to do in software in pretty much any CPU I can imagine:

8-bit value:
input bit -> result bit
0 -> 5
1 -> 7
2 -> 2
3 -> 0
4 -> 6
5 -> 1
6 -> 4
7 -> 3
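To show why this is slow in software, here is a rough C sketch of the mapping above done bit by bit, next to a fixed bit reversal. The names `perm`, `permute_bits` and `reverse_bits8` are hypothetical; the table values are the ones listed in the post.

```c
#include <stdint.h>

/* Mapping from the post: input bit i lands on result bit perm[i]. */
static const uint8_t perm[8] = { 5, 7, 2, 0, 6, 1, 4, 3 };

/* Arbitrary remapping: one test, one shift and one OR per input bit. */
static inline uint8_t permute_bits(uint8_t in, const uint8_t map[8])
{
    uint8_t out = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (in & (1u << i))
            out |= (uint8_t)(1u << map[i]);
    }
    return out;
}

/* Fixed reversal of an 8-bit value: swap nibbles, then pairs, then bits. */
static inline uint8_t reverse_bits8(uint8_t x)
{
    x = (uint8_t)((x >> 4) | (x << 4));
    x = (uint8_t)(((x & 0xCC) >> 2) | ((x & 0x33) << 2));
    x = (uint8_t)(((x & 0xAA) >> 1) | ((x & 0x55) << 1));
    return x;
}
```

In hardware the same remapping is only wiring plus a multiplexer per output bit, which is why it is attractive as an instruction or a small engine.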

legacy · « **Reply #4 on:** February 29, 2016, 07:17:36 pm »

Quote from: free_electron on February 29, 2016, 07:02:57 pm

When i was still mucking inside silicon i built hardware instruction set expanders for an ARM7 in an FPGA.
i even had vector-like capabilities

like the tensor processor unit (TPU) by SGI? (kidding)

do you think I'd better implement MAC (from DSP world)?
saturated operation ALU engine?

something to accelerate matrix computation?
(exactly what?)
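For reference, this is roughly what a MAC and a saturating add would compute, written as a C sketch. The operand widths (16x16 multiply, 64-bit accumulator, 32-bit saturation) and the function names are assumptions picked for illustration, not a proposal from the thread.

```c
#include <stdint.h>

/* Saturating signed add: clamps to the int32_t range instead of wrapping. */
static inline int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Multiply-accumulate: acc + a * b, with a 16x16 -> 32-bit product
 * added into a wide accumulator so partial sums do not overflow early. */
static inline int64_t mac16(int64_t acc, int16_t a, int16_t b)
{
    return acc + (int32_t)a * (int32_t)b;
}

/* Dot product, the inner loop of most matrix code: one MAC per element. */
static inline int64_t dot16(const int16_t *x, const int16_t *y, unsigned n)
{
    int64_t acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc = mac16(acc, x[i], y[i]);
    return acc;
}
```

A matrix multiply is just many `dot16`-style loops, so a single-cycle MAC is usually the first thing that helps there.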

legacy · « **Reply #5 on:** February 29, 2016, 07:18:40 pm »

about pre-increment and post-decrement (and stack operations)
how are they implemented within the pipeline?
in which stage?

here Arise-v2 (my soft core) currently has a strict RISC pipeline

fetch
decode
register_rd (2 registers)
ALU/EA (multiply and division will take more cycles)
load/store (dtack will take more cycles)
register_wr(1 register), branch

which looks like the MIPS approach, and MIPS doesn't have them

helius · « **Reply #6 on:** February 29, 2016, 07:33:26 pm »

Quote from: legacy on February 29, 2016, 07:08:12 pm

Quote from: free_electron on February 29, 2016, 07:02:57 pm
bit count instructions. like 'how many bits in this byte are 1'
to be used for bitmap purpose ?

I can implement an instruction like that, can you suggest an "opcode name" ?
how should it be called ?

The SPARC v9 architecture defines an instruction
popc rs, rd
meaning "population count". It wasn't actually implemented until the UltraSPARC IV. This is also called the "Hamming weight" so hwt would be a suitable mnemonic. Using a combination of bitwise operations and popc, you can implement the FF1 and FF0 operations (Find First 0 or Find First 1) which are used in hash algorithms.
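A minimal C sketch of that idea follows. It assumes "first" means the least significant bit for `ff1_lsb`/`ff0_lsb` (some ISAs count from the most significant end instead, so an MSB variant is included), and the helper names are hypothetical. `popc` here is a software stand-in for the instruction.

```c
#include <stdint.h>

/* Software stand-in for `popc`: parallel (SWAR) population count. */
static inline uint32_t popc(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}

/* Index of the lowest 1 bit; returns 32 when x == 0.
 * ~x & (x - 1) is a mask of the zeros below the lowest 1 bit. */
static inline uint32_t ff1_lsb(uint32_t x)
{
    return popc(~x & (x - 1u));
}

/* Index of the lowest 0 bit; returns 32 when x is all ones. */
static inline uint32_t ff0_lsb(uint32_t x)
{
    return ff1_lsb(~x);
}

/* Index of the highest 1 bit; returns -1 when x == 0.
 * The ORs smear the top 1 bit downwards, then popc counts the run. */
static inline int ff1_msb(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return (int)popc(x) - 1;
}
```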

ataradov · « **Reply #7 on:** February 29, 2016, 07:39:19 pm »

I want all the instructions from a modern ARM architecture (let's say ARMv8). It has all you need and some more.

But good luck implementing all of that, it is a nightmare.

ataradov · « **Reply #8 on:** February 29, 2016, 07:41:22 pm »

Quote from: legacy on February 29, 2016, 07:18:40 pm

how are they implemented within the pipeline?
in which stage?

Decoder issues multiple simple instructions. Push/pop typically take more cycles because of this.

legacy · « **Reply #9 on:** February 29, 2016, 07:54:49 pm »

Quote from: ataradov on February 29, 2016, 07:39:19 pm

I want all the instructions from a modern ARM architecture (let's say ARMv8). It has all you need and some more.

But good luck implementing all of that, it is a nightmare.

hence we need a compromise

legacy · « **Reply #10 on:** February 29, 2016, 07:56:45 pm »

Quote from: helius on February 29, 2016, 07:33:26 pm

Find First 0 or Find First 1) which are used in hash algorithms.

already implemented

ataradov · « **Reply #11 on:** February 29, 2016, 08:01:10 pm »

Quote from: legacy on February 29, 2016, 07:54:49 pm

hence we need a compromise

No, you need a compromise, I'm perfectly happy using ARM

blacksheeplogic · « **Reply #12 on:** February 29, 2016, 08:35:06 pm »

Quote from: legacy on February 29, 2016, 06:17:25 pm

pre increment?
post increment?
stack? (push/pop)?

I have no idea about how to implement them in RISC terms

When implemented for convenience you crack them within the dispatcher. However, that will require instruction group restrictions, which makes optimization more difficult.

As an example,
stwu sp,-stksz(sp)

Would crack to an add & store, but there would be a FIG restriction. The store has a register dependency which can't be mitigated.
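A toy C model of the cracking, as a sketch only: the two-micro-op split and its ordering are one plausible reading of the above, not taken from any particular core, and `cpu_t`, `uop_addi`, `uop_stw`, `uop_ldw` and `lw_postinc` are hypothetical names. Memory is a small word array to keep the model short.

```c
#include <stdint.h>
#include <string.h>

enum { NUM_REGS = 32, MEM_WORDS = 64 };

typedef struct {
    uint32_t r[NUM_REGS];
    uint32_t mem[MEM_WORDS];   /* word-addressed toy memory */
} cpu_t;

/* Micro-op: plain ALU add of a signed displacement. */
static inline uint32_t uop_addi(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;
}

/* Micro-ops: plain word store and load (addresses wrap inside the toy memory). */
static inline void uop_stw(cpu_t *c, uint32_t ea, uint32_t value)
{
    c->mem[(ea >> 2) % MEM_WORDS] = value;
}

static inline uint32_t uop_ldw(const cpu_t *c, uint32_t ea)
{
    return c->mem[(ea >> 2) % MEM_WORDS];
}

/* stwu rS, disp(rA): store with update, cracked into add + store. */
static inline void stwu(cpu_t *c, unsigned rs, unsigned ra, int32_t disp)
{
    uint32_t ea = uop_addi(c->r[ra], disp); /* uop 1: EA = rA + disp        */
    uop_stw(c, ea, c->r[rs]);               /* uop 2: needs EA from uop 1,  */
                                            /* stores the old value of rS   */
    c->r[ra] = ea;                          /* update: rA receives EA       */
}

/* Post-increment load, cracked the same way: load first, then add. */
static inline uint32_t lw_postinc(cpu_t *c, unsigned ra, int32_t step)
{
    uint32_t value = uop_ldw(c, c->r[ra]);  /* uop 1: load from old rA */
    c->r[ra] = uop_addi(c->r[ra], step);    /* uop 2: rA = rA + step   */
    return value;
}
```

With `r1` as the stack pointer, `stwu(&cpu, 1, 1, -16)` stores the old stack pointer at the new top of stack and moves `r1` down by 16, which is the `stwu sp,-stksz(sp)` case. The store cannot start before the add produces its address, which is the dependency that cannot be hidden.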

HAL-42b · « **Reply #13 on:** February 29, 2016, 08:55:51 pm »

Quote from: free_electron on February 29, 2016, 07:02:57 pm

bit count instructions. like 'how many bits in this byte are 1'
i HATE RISC processors by the way. i want processors with massive single cycle instruction sets.

When i was still mucking inside silicon i built hardware instruction set expanders for an ARM7 in an FPGA.
i even had vector-like capabilities

It is a question of transistor count I suppose.

"Given a million transistors, is it better to have them arranged into a single CISC or several smaller RISCs?"

helius · « **Reply #14 on:** February 29, 2016, 09:10:58 pm »

Transistor count is never the constraint, wiring and power dissipation are.

legacy · « **Reply #15 on:** February 29, 2016, 10:34:14 pm »

Quote from: helius on February 29, 2016, 09:10:58 pm

Transistor count is never the constraint, wiring and power dissipation are.

those are called "micro architectural constraints", right?

legacy · « **Reply #16 on:** February 29, 2016, 11:01:16 pm »

guys, and what about addressing modes ?
e.g. should I include "scaled indexed addressing mode" ?
(currently it takes +1 extra cycle)
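For anyone unsure what the mode computes, a short C sketch of the effective address. The 2-bit scale field (x1, x2, x4, x8) and the optional displacement are assumptions for illustration; the thread does not say how Arise-v2 encodes them.

```c
#include <stdint.h>

/* Scaled indexed addressing: EA = base + (index << scale) + disp.
 * scale_log2 is a 2-bit field here (assumption): 0..3 means x1, x2, x4, x8. */
static inline uint32_t ea_scaled_indexed(uint32_t base, uint32_t index,
                                         unsigned scale_log2, int32_t disp)
{
    uint32_t scaled = index << (scale_log2 & 3u); /* step 1: shift the index */
    return base + scaled + (uint32_t)disp;        /* step 2: address add     */
}
```

The shift feeding the add is one likely source of an extra cycle, unless the shifter sits in front of the ALU input in the same stage. The typical use is `array[i]` with 4-byte elements: `ea_scaled_indexed(array_base, i, 2, 0)`.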

ataradov · « **Reply #17 on:** February 29, 2016, 11:31:52 pm »

Quote from: legacy on February 29, 2016, 11:01:16 pm

e.g. should I include "scaled indexed addressing mode" ?

Sure, why not?

The point is, the more you include, the more people will use. But if the core is an FPGA-only soft core, then it must be small in the first place.

rs20 · « **Reply #18 on:** February 29, 2016, 11:36:16 pm »

Quote from: free_electron on February 29, 2016, 07:02:57 pm

bit count instructions. like 'how many bits in this byte are 1'

If you have 256 nibbles free (which I know is a big if), this is a simple array lookup. Not sure if it deserves its own instruction?

ataradov · « **Reply #19 on:** February 29, 2016, 11:39:30 pm »

Quote from: rs20 on February 29, 2016, 11:36:16 pm

Not sure if it deserves its own instruction?

Everything deserves its own instruction. Why not? Apart from being annoying for chip designers, there is really no downside. None of those small extensions significantly affect the price of the device.

Array lookups need memory access, and that's expensive. Actually more expensive than to count with a simple stupid loop.
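The two software options being compared look roughly like this in C. It is a sketch: the table is packed as 256 four-bit entries in 128 bytes to match the "256 nibbles" figure, it is filled at run time by a hypothetical `popcnt_table_init`, and all names are made up.

```c
#include <stdint.h>

/* 256 entries of 4 bits each (counts 0..8), packed two per byte. */
static uint8_t popcnt_nibbles[128];

/* The "simple stupid loop": test each bit of the byte in turn. */
static inline unsigned popcnt8_loop(uint8_t x)
{
    unsigned n = 0;
    for (; x != 0; x >>= 1)
        n += x & 1u;
    return n;
}

/* Fill the table once; must be called before popcnt8_table. */
static inline void popcnt_table_init(void)
{
    for (unsigned v = 0; v < 256; v++) {
        unsigned cnt = popcnt8_loop((uint8_t)v);
        /* even v goes in the low nibble, odd v in the high nibble */
        popcnt_nibbles[v >> 1] |= (uint8_t)(cnt << ((v & 1u) * 4u));
    }
}

/* Table version: one memory read plus a shift and a mask. */
static inline unsigned popcnt8_table(uint8_t x)
{
    return (popcnt_nibbles[x >> 1] >> ((x & 1u) * 4u)) & 0xFu;
}
```

Which one wins depends on the memory system: the table costs a load (and a cache line or block RAM), the loop costs up to eight iterations of pure ALU work.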

legacy · « **Reply #20 on:** March 01, 2016, 12:42:35 am »

Quote from: ataradov on February 29, 2016, 11:31:52 pm

FPGA-only soft core

yep, FPGA-only soft core

ataradov · « **Reply #21 on:** March 01, 2016, 12:43:00 am »

For soft cores look at Microblaze or NIOS-II. They have all you really need and nothing extra. Plus they support extensions. I know it is much more fun to design cores, of course.

retrolefty · « **Reply #22 on:** March 01, 2016, 01:02:43 am »

Quote from: legacy on February 29, 2016, 07:54:49 pm

Quote from: ataradov on February 29, 2016, 07:39:19 pm
I want all the instructions from a modern ARM architecture (let's say ARMv8). It has all you need and some more.

But good luck implementing all of that, it is a nightmare.

hence we need a compromise

Such a compromise was available starting in the 70s with several of the computer companies, mini and mainframe types, that offered WCS, writable control store, that allowed end users to define and run their own machine instructions using the same microcode method that the standard supplied instructions used.

https://en.wikipedia.org/wiki/Microcode

helius · « **Reply #23 on:** March 01, 2016, 01:08:34 am »

Quote from: retrolefty on March 01, 2016, 01:02:43 am

Such a compromise was available starting in the 70s with several of the computer companies, mini and mainframe types, that offered WCS, writable control store, that allowed end users to define and run their own machine instructions using the same microcode method that the standard supplied instructions used.

Because the compilers of the day were getting a little too good at actually using the available instructions!

westfw · « **Reply #24 on:** March 01, 2016, 01:58:52 am »

Quote

I have no idea about how to implement [autoincrement/etc] in RISC terms

Presumably, since an ALU operation happens in a single clock cycle, and a memory access takes more than one clock, you just do the increment/etc while you're waiting for the memory. That's where all those weird modes of RISC operands come from, right: "We have an ALU, a barrel shifter, and ...; they should all be able to do something each clock cycle!"

The less standard thing that I wish were more common is probably bitfield instructions (combining barrel shift and AND.)
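As a sketch of what such bitfield instructions would compute, in C: the names `bf_extract` and `bf_insert` are hypothetical, and the code assumes `lsb < 32` and `lsb + width <= 32`.

```c
#include <stdint.h>

/* Mask with the low `width` bits set; width >= 32 gives all ones. */
static inline uint32_t bf_mask(unsigned width)
{
    return width >= 32u ? 0xFFFFFFFFu : ((UINT32_C(1) << width) - 1u);
}

/* Extract: barrel shift right, then AND with the width mask. */
static inline uint32_t bf_extract(uint32_t x, unsigned lsb, unsigned width)
{
    return (x >> lsb) & bf_mask(width);
}

/* Insert: clear the field in x, then OR in the shifted new value. */
static inline uint32_t bf_insert(uint32_t x, uint32_t field,
                                 unsigned lsb, unsigned width)
{
    uint32_t m = bf_mask(width) << lsb;
    return (x & ~m) | ((field << lsb) & m);
}
```

Without a dedicated instruction each of these costs two to four ALU operations, for example when unpacking hardware register fields or packed pixel formats.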

I wonder how the PDP11 or PDP10 instruction sets (both of which are nominally register/memory architectures) would work if you RISCified them and made it so only the load/store operations could access memory?

Quote

I want all the instructions from a modern ARM architecture

The 32-bit ARM instructions, I hope? The Cortex (thumbX only) encodings have lost any elegance, and the CM0 is almost painful in its weird limitations...

