Author Topic: if U were a RISC assembly programmer which instructions would U want to have ?  (Read 21143 times)


Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
including the ones that are most difficult to implement
  • pre increment?
  • post increment?
  • stack? (push/pop)?

about the first two: ARM seems to have them, while the m88K doesn't,
and currently I have no idea how to implement them in RISC terms

thanks  :D
 

Offline free_electron

  • Super Contributor
  • ***
  • Posts: 8518
  • Country: us
    • SiliconValleyGarage
bit count instructions.  like 'how many bits in this byte are 1'
i HATE RISC processors by the way. i want processors with massive single cycle instruction sets.

When i was still mucking inside silicon i built hardware instruction set expanders for an ARM7 in an FPGA.
i even had vector-like capabilities
Professional Electron Wrangler.
Any comments, or points of view expressed, are my own and not endorsed , induced or compensated by my employer(s).
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Quote
bit count instructions.  like 'how many bits in this byte are 1'

to be used for bitmap purpose ?

I can implement an instruction like it, can you suggest an "opcode name" for it?
what should it be called?

Code: [Select]
bitcnt rt, rs1
(how many bits in register rs1 are "1"? put the result into rt)


I can also offer bit manipulation

  • clear
  • set
  • get
  • toggle
 

Offline tesla500

  • Regular Contributor
  • *
  • Posts: 149
Advanced bit manipulation instructions or engine.

Simple things like reverse the order of all the bits, as well as more complex things like randomly programmable remapping of input to output bits. This is very slow to do in software in pretty much any CPU I can imagine:

8-bit value:
Input   result
bit      bit
0  ->    5
1  ->    7
2  ->    2
3  ->    0
4  ->    6
5  ->    1
6  ->    4
7  ->    3
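
For comparison, here is roughly what that 8-bit remapping costs in plain C. This is only a minimal sketch: the array name `bit_map` and the function name `permute8` are made up for illustration, and the mapping is the table above hard-coded. It takes a shift, a mask, another shift and an OR for every bit, which is why a dedicated instruction or engine looks attractive.

```c
#include <stdint.h>

/* bit_map[i] = destination bit for input bit i (taken from the table above). */
const uint8_t bit_map[8] = { 5, 7, 2, 0, 6, 1, 4, 3 };

/* Move every input bit to its mapped output position, one bit at a time. */
uint8_t permute8(uint8_t in)
{
    uint8_t out = 0;
    for (unsigned i = 0; i < 8; i++) {
        unsigned bit = (in >> i) & 1u;          /* isolate input bit i */
        out |= (uint8_t)(bit << bit_map[i]);    /* place it at its destination */
    }
    return out;
}
```

In hardware the same remapping is just wiring plus a multiplexer per output bit, so a programmable version is mostly a question of how the map is loaded.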

 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Quote
When i was still mucking inside silicon i built hardware instruction set expanders for an ARM7 in an FPGA.
i even had vector-like capabilities

like the tensor processor unit (TPU) by SGI  :D ?
(kidding)


do you think I should implement MAC (multiply-accumulate, from the DSP world)?
a saturating-arithmetic ALU engine?

something to accelerate matrix computation?
(exactly what?)
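
To make the two candidates concrete, here is a C sketch of what a saturating add and a MAC would compute. The names `sat_add32` and `mac32`, the 32-bit operand width and the 64-bit accumulator are all assumptions for illustration, not anything taken from Arise-v2.

```c
#include <stdint.h>

/* Signed 32-bit add that clamps to INT32_MIN/INT32_MAX instead of wrapping. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* widen so the sum cannot overflow */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Multiply-accumulate: acc + a*b, using a wide (here 64-bit) accumulator. */
int64_t mac32(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}
```

A dot product (the inner loop of a matrix multiply) is then just `acc = mac32(acc, x[i], y[i])` repeated over the row and column.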

 
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
about pre-increment and post-decrement (and stack operations)
how are they implemented within the pipeline?
in which stage?

Arise-v2 (my soft core) currently has a strict RISC pipeline

  • fetch
  • decode
  • register_rd (2 registers)
  • ALU/EA (multiply and division will take more cycles)
  • load/store (dtack will take more cycles)
  • register_wr(1 register), branch

which looks like the MIPS approach, and MIPS doesn't have them
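
To pin down what is being asked, here is a C sketch of the semantics of a load with post-increment and with pre-increment. The type `core_t`, the function names and the toy word-addressed memory are hypothetical, chosen only for illustration. Note that both forms write two registers (the loaded value and the updated base), which is the awkward part for a pipeline whose register_wr stage writes a single register.

```c
#include <stdint.h>

#define WORDS 64

typedef struct {
    uint32_t r[32];        /* register file */
    uint32_t mem[WORDS];   /* toy memory, indexed by byte address / 4 */
} core_t;

/* Post-increment: use the old base as the address, then bump the base. */
void ld_postinc(core_t *c, int rt, int ra, uint32_t step)
{
    uint32_t ea = c->r[ra];
    uint32_t value = c->mem[(ea / 4) % WORDS];
    c->r[ra] = ea + step;     /* register write 1: updated base */
    c->r[rt] = value;         /* register write 2: loaded value */
}

/* Pre-increment: bump the base first, then use the new base as the address. */
void ld_preinc(core_t *c, int rt, int ra, uint32_t step)
{
    uint32_t ea = c->r[ra] + step;
    uint32_t value = c->mem[(ea / 4) % WORDS];
    c->r[ra] = ea;            /* register write 1: updated base */
    c->r[rt] = value;         /* register write 2: loaded value */
}
```

Decrement variants are the same with a negative step; push and pop are the special case where `ra` is the stack pointer.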
 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3644
  • Country: us
Quote
bit count instructions.  like 'how many bits in this byte are 1'
Quote
to be used for bitmap purpose ?

I can implement an instruction like it, can you suggest an "opcode name" for it?
what should it be called?
The SPARC v9 architecture defines an instruction
popc rs, rd
meaning "population count". It wasn't actually implemented until the UltraSPARC IV. This is also called the "Hamming weight" so hwt would be a suitable mnemonic. Using a combination of bitwise operations and popc, you can implement the FF1 and FF0 operations (Find First 0 or Find First 1) which are used in hash algorithms.
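
As a sketch of that last point, here is one way to build find-first operations out of a population count in C. The function names are made up, `popc32` is a plain software stand-in for the instruction, and "first" is assumed to mean the lowest-numbered bit (counting from the LSB); a most-significant-first definition would need a different bit trick.

```c
#include <stdint.h>

/* Software stand-in for popc: parallel (SWAR) bit count of a 32-bit word. */
uint32_t popc32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                  /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  /* 8-bit sums */
    return (x * 0x01010101u) >> 24;                    /* add the four bytes */
}

/* Index of the lowest set bit, or 32 if x == 0.
   (x & -x) isolates the lowest set bit; subtracting 1 turns it into a mask
   of all the bits below it, and counting that mask gives the index. */
uint32_t ff1(uint32_t x)
{
    uint32_t lowest = x & (0u - x);
    return popc32(lowest - 1u);
}

/* Index of the lowest clear bit, or 32 if there is none. */
uint32_t ff0(uint32_t x)
{
    return ff1(~x);
}
```

For x == 0 the mask wraps around to all ones, so the "not found" result of 32 falls out without a special case.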
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11281
  • Country: us
    • Personal site
I want all the instructions from a modern ARM architecture (let's say ARMv8). It has all you need and some more.

But good luck implementing all of that, it is a nightmare.
Alex
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11281
  • Country: us
    • Personal site
Quote
how are they implemented within the pipeline?
in which stage?
Decoder issues multiple simple instructions. Push/pop typically take more cycles because of this.
Alex
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Quote
I want all the instructions from a modern ARM architecture (let's say ARMv8). It has all you need and some more.

But good luck implementing all of that, it is a nightmare.

hence we need a compromise  :D
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Quote
(Find First 0 or Find First 1) which are used in hash algorithms.

already implemented :D
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11281
  • Country: us
    • Personal site
Quote
hence we need a compromise  :D
No, you need a compromise, I'm perfectly happy using ARM :)
Alex
 

Offline blacksheeplogic

  • Frequent Contributor
  • **
  • Posts: 532
  • Country: nz
Quote
  • pre increment?
  • post increment?
  • stack? (push/pop)?

I have no idea about how to implement them in RISC terms

When implemented for convenience you crack them within the dispatcher. However, that will require instruction group restrictions, which make optimization more difficult.

As an example,
         stwu sp,-stksz(sp)

         Would crack to an add & store, but there would be a FIG restriction. The store has a register dependency which can't be mitigated.
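
A toy C model of cracking a store-with-update into two simple micro-ops may help. Everything here is hypothetical (the `cpu_t` type, the `uop_*` names, the word-addressed memory), and the ordering is my own assumption: the store goes first and the add second, so that the store still sees the old register value when rS and rA are the same register, as in the `stwu sp,-stksz(sp)` case. A real dispatcher may order, rename or group the pieces differently.

```c
#include <stdint.h>

#define MEM_WORDS 256

typedef struct {
    uint32_t r[32];            /* register file */
    uint32_t mem[MEM_WORDS];   /* toy memory, indexed by byte address / 4 */
} cpu_t;

/* Micro-op 1: ordinary displacement store, MEM[r[ra] + d] = r[rs]. */
void uop_store(cpu_t *c, int rs, int ra, int32_t d)
{
    uint32_t ea = c->r[ra] + (uint32_t)d;
    c->mem[(ea / 4) % MEM_WORDS] = c->r[rs];
}

/* Micro-op 2: ordinary add-immediate, r[rt] = r[ra] + d. */
void uop_addi(cpu_t *c, int rt, int ra, int32_t d)
{
    c->r[rt] = c->r[ra] + (uint32_t)d;
}

/* Store word with update, cracked into the two micro-ops above. */
void stwu(cpu_t *c, int rs, int ra, int32_t d)
{
    uop_store(c, rs, ra, d);   /* reads the old base register */
    uop_addi(c, ra, ra, d);    /* then the base register is updated */
}
```

Each micro-op fits the usual two-reads, one-write pipeline, but the pair has to stay together and in order, which is the kind of grouping restriction mentioned above.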
 

Offline HAL-42b

  • Frequent Contributor
  • **
  • Posts: 423
Quote
bit count instructions.  like 'how many bits in this byte are 1'
i HATE RISC processors by the way. i want processors with massive single cycle instruction sets.

When i was still mucking inside silicon i built hardware instruction set expanders for an ARM7 in an FPGA.
i even had vector-like capabilities

It is a question of transistor count I suppose.

"Given a million transistors, is it better to have them arranged into a single CISC or several smaller RISCs?"
 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3644
  • Country: us
Transistor count is never the constraint, wiring and power dissipation are.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Quote
Transistor count is never the constraint, wiring and power dissipation are.

those are called "micro architectural constraints", right :D ?
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
guys, and what about addressing modes ?
e.g. should I include "scaled indexed addressing mode" ?
(currently it takes 1 extra cycle)
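
For anyone unsure what the mode computes, a one-line C sketch of the effective address. The function name and the 2-bit scale field are assumptions for illustration; the actual Arise-v2 encoding is not shown in this thread.

```c
#include <stdint.h>

/* EA = base + (index << scale); scale 0..3 selects 1, 2, 4 or 8 byte elements. */
uint32_t ea_scaled_indexed(uint32_t base, uint32_t index, unsigned scale)
{
    return base + (index << (scale & 3u));
}
```

With scale = 2 this is exactly the address of `array[index]` for an array of 32-bit words, which is the case compilers hit all the time.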
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11281
  • Country: us
    • Personal site
Quote
e.g. should I include "scaled indexed addressing mode" ?
Sure, why not?

The point is, the more you include, the more people will use. But if the core is an FPGA-only softcore, then it must be small in the first place.
Alex
 

Offline rs20

  • Super Contributor
  • ***
  • Posts: 2320
  • Country: au
Quote
bit count instructions.  like 'how many bits in this byte are 1'

If you have 256 nibbles free (which I know is a big if), this is a simple array lookup. Not sure if it deserves its own instruction?
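
A sketch of that lookup in C, taking "256 nibbles" literally: 256 four-bit counts packed two per byte into 128 bytes. The names `popcnt_nibbles`, `popcnt_table_init` and `popcnt8_lookup` are made up for illustration, and the table is filled at start-up here only to keep the listing short; in practice it would be a constant in ROM.

```c
#include <stdint.h>

/* 256 four-bit entries, two per byte: even index in the low nibble,
   odd index in the high nibble. */
uint8_t popcnt_nibbles[128];

/* Fill the table once; each entry is the number of 1 bits in its index. */
void popcnt_table_init(void)
{
    for (unsigned i = 0; i < 256; i++) {
        unsigned n = 0;
        for (unsigned v = i; v != 0; v >>= 1)
            n += v & 1u;
        if (i & 1u)
            popcnt_nibbles[i >> 1] |= (uint8_t)(n << 4);
        else
            popcnt_nibbles[i >> 1] |= (uint8_t)n;
    }
}

/* Bit count of one byte: a single table read plus nibble selection. */
uint8_t popcnt8_lookup(uint8_t b)
{
    uint8_t packed = popcnt_nibbles[b >> 1];
    return (b & 1u) ? (uint8_t)(packed >> 4) : (uint8_t)(packed & 0x0Fu);
}
```

The largest count is 8, which still fits in four bits, so the nibble packing works for a full byte.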
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11281
  • Country: us
    • Personal site
Quote
Not sure if it deserves its own instruction?
Everything deserves its own instruction. Why not? Apart from being annoying for chip designers, there is really no downside. None of those small extensions significantly affect the price of the device.

Array lookups need memory access, and that's expensive. Actually more expensive than to count with a simple stupid loop.
Alex
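
For reference, the "simple stupid loop" could look like this in C. It is a minimal sketch with a made-up function name; it touches no memory at all, only registers, at the cost of up to eight iterations per byte.

```c
#include <stdint.h>

/* Count the 1 bits in a byte by shifting them out one at a time. */
uint8_t popcnt8_loop(uint8_t b)
{
    unsigned count = 0;
    while (b != 0) {
        count += b & 1u;   /* add the lowest bit */
        b >>= 1;           /* move on to the next bit */
    }
    return (uint8_t)count;
}
```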
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Quote
FPGA-only softcore

yep, FPGA-only soft core
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11281
  • Country: us
    • Personal site
For soft cores look at Microblaze or NIOS-II. They have all you really need and nothing extra. Plus they support extensions. I know it is much more fun to design cores, of course.
Alex
 

Offline retrolefty

  • Super Contributor
  • ***
  • Posts: 1648
  • Country: us
  • measurement changes behavior
Quote
I want all the instructions from a modern ARM architecture (let's say ARMv8). It has all you need and some more.

But good luck implementing all of that, it is a nightmare.
Quote
hence we need a compromise  :D

Such a compromise was available starting in the 70s: several of the computer companies, mini and mainframe types, offered WCS (writable control store), which allowed end users to define and run their own machine instructions using the same microcode method that the standard supplied instructions used.

https://en.wikipedia.org/wiki/Microcode
 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3644
  • Country: us
Quote
Such a compromise was available starting in the 70s: several of the computer companies, mini and mainframe types, offered WCS (writable control store), which allowed end users to define and run their own machine instructions using the same microcode method that the standard supplied instructions used.
Because the compilers of the day were getting a little too good at actually using the available instructions!
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4206
  • Country: us
Quote
I have no idea about how to implement [autoincrement/etc] in RISC terms
Presumably, since an ALU operation happens in a single clock cycle, and a memory access takes more than one clock, you just do the increment/etc while you're waiting for the memory.  That's where all those weird modes of RISC operands come from, right: "We have an ALU, a barrel shifter, and ...; they should all be able to do something each clock cycle!"

The less standard thing that I wish were more common is probably bitfield instructions (combining barrel shift and AND.)
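
What such bitfield instructions would compute can be sketched in C as below. The names `bf_mask`, `bf_extract` and `bf_insert` and the (lsb, width) operand form are assumptions for illustration; the sketch expects lsb < 32 and lsb + width <= 32.

```c
#include <stdint.h>

/* Mask with the low `width` bits set (width 32 gives all ones). */
uint32_t bf_mask(unsigned width)
{
    return width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Extract: barrel shift right, then AND with the field mask. */
uint32_t bf_extract(uint32_t x, unsigned lsb, unsigned width)
{
    return (x >> lsb) & bf_mask(width);
}

/* Insert: clear the field in x, then OR in the shifted new value. */
uint32_t bf_insert(uint32_t x, unsigned lsb, unsigned width, uint32_t field)
{
    uint32_t m = bf_mask(width) << lsb;
    return (x & ~m) | ((field << lsb) & m);
}
```

Without a dedicated instruction, each extract is two operations plus a mask constant, and each insert is four or five.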

I wonder how the PDP11 or PDP10 instruction sets (both of which are nominally register/memory architectures) would work if you RISCified them and made it so only the load/store operations could access memory?

Quote
I want all the instruction from a modern ARM architecture
The 32-bit ARM instructions, I hope?  The Cortex (thumbX only) encodings have lost any elegance, and the CM0 is almost painful in its weird limitations...
 

