Topic: your own ISA, what do you think about having 64 registers ?

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
your own ISA, what do you think about having 64 registers ?
« on: October 14, 2016, 10:43:20 pm »
currently I have 32 registers: two are special (i.e. r0 and ra),
16 registers hold the context inside a function,
and the remaining 14 are used to pass parameters on a function call
(so you don't have to use the stack, which costs extra
load/store operations --> the external RAM is slower)

I can easily increase that number to 64 registers

is it good, bad, useful, or useless?
 

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
Re: your own ISA, what do you think about having 64 registers ?
« Reply #1 on: October 14, 2016, 10:52:03 pm »
(oops, I forgot to mention: each CPU register is 32 bits wide)
 

Sal Ammoniac

  • Super Contributor
  • Posts: 1672
  • Country: us
Re: your own ISA, what do you think about having 64 registers ?
« Reply #2 on: October 14, 2016, 11:19:56 pm »
Too many. Task context switch times increase when you have to save/restore that many registers.
Complexity is the number-one enemy of high-quality code.
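Here is a rough back-of-the-envelope sketch of why the register count matters when a context switch is done in software. It assumes every register is saved and later restored with one external-memory access each. It also borrows the ">10 clock ticks per external load/store at 33 MHz" figure that legacy gives later in the thread. Both are assumptions, not measurements.

```python
# Rough cost of a software context switch that saves and restores every
# register through external memory. All figures are illustrative assumptions.

def context_switch_cycles(num_regs: int, cycles_per_access: int) -> int:
    """Cycles to save all registers and later restore them (one access each)."""
    return 2 * num_regs * cycles_per_access

def cycles_to_us(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to microseconds at the given clock."""
    return cycles / clock_hz * 1e6

CLOCK_HZ = 33e6          # assumed 33 MHz core clock
CYCLES_PER_ACCESS = 10   # assumed cost of one external load/store

for regs in (16, 32, 64):
    cycles = context_switch_cycles(regs, CYCLES_PER_ACCESS)
    print(f"{regs:2d} registers: {cycles:5d} cycles, "
          f"{cycles_to_us(cycles, CLOCK_HZ):.1f} us")
```

Under these assumptions, going from 32 to 64 registers doubles the switch cost from roughly 19 us to roughly 39 us.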
 

helius

  • Super Contributor
  • Posts: 3642
  • Country: us
Re: your own ISA, what do you think about having 64 registers ?
« Reply #3 on: October 14, 2016, 11:20:47 pm »
Having a lot of registers increases the cost of context switches, unless you have physical shadow register files for all of them. One thing that has worked well in the past is to use a base pointer into the register file, like the AMD 29000.
 

westfw

  • Super Contributor
  • Posts: 4199
  • Country: us
Re: your own ISA, what do you think about having 64 registers ?
« Reply #4 on: October 14, 2016, 11:44:22 pm »
What they said.  Too much context.  I've noticed this particularly on Atmel AVRs.  ~37 bytes of context, 2k of memory, 16MB/s memory access - not a great combination.
 

C

  • Super Contributor
  • Posts: 1346
  • Country: us
Re: your own ISA, what do you think about having 64 registers ?
« Reply #5 on: October 15, 2016, 02:46:25 am »

My first thought: slow stack memory? Fix the stack memory.
Does your external memory save you time if you read/write a block of locations?
Done right, a cache for the stack would make most CPU accesses very fast while doing block reads/writes to external memory:
Stack uses block read/write
Instructions use block read
Data uses normal memory read/write.
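This sketch shows why block accesses can help. It assumes each bus transaction pays a fixed setup cost (opening the bus cycle, propagating the address) plus a per-word transfer cost. It also assumes the memory and controller can move several words per transaction. The cycle counts are hypothetical.

```python
# Compare isolated accesses with one block (burst) transfer, assuming the
# address/bus setup cost is paid once per transaction. Numbers are hypothetical.

def single_access_cycles(words: int, setup: int, per_word: int) -> int:
    """Every word is its own bus cycle: setup is paid for each word."""
    return words * (setup + per_word)

def burst_access_cycles(words: int, setup: int, per_word: int) -> int:
    """One transaction moves all words: setup is paid only once."""
    return setup + words * per_word

SETUP = 8      # hypothetical cycles to open a bus cycle / propagate the address
PER_WORD = 2   # hypothetical cycles per data word once the cycle is open

for n in (1, 4, 16):
    print(n, single_access_cycles(n, SETUP, PER_WORD),
          burst_access_cycles(n, SETUP, PER_WORD))
```

The larger the setup cost is relative to the per-word cost, the more a stack cache that spills and fills in blocks gains.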

 

ale500

  • Frequent Contributor
  • Posts: 415
Re: your own ISA, what do you think about having 64 registers ?
« Reply #6 on: October 15, 2016, 05:54:54 am »
There are not that many examples of working architectures with 64 registers. One of the newest is the Epiphany processor found on the Parallella board. It has two opcode sizes, 16 and 32 bits: accessing the upper 56 registers requires 32-bit opcodes, while the lower 8 registers can be manipulated with 16-bit opcodes. It supports C through GCC. Each processor has a 32 KB pool of local memory, and that's all.
The 4-bit Saturn processor, used in the HP48 among others, has 8 64-bit registers. Memory access is slow, so math routines are implemented using only registers, reading constants from memory.

One architecture I like is the SuperH: 16-bit opcodes and 16 GPRs. Small constants can be used directly; larger constants need to be read from memory.

Different uses have different needs; I'd like to know what uses you have in mind for 64 registers.
 

richardman

  • Frequent Contributor
  • Posts: 427
  • Country: us
Re: your own ISA, what do you think about having 64 registers ?
« Reply #7 on: October 15, 2016, 07:35:54 am »
Not only is it expensive for context switching, it also affects your ISA encoding. It takes 6 bits to encode 64 registers, and that's just too much overhead.
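The encoding cost can be made concrete with a short calculation. It assumes a fixed 32-bit instruction word with three register operands, which is a common layout but not necessarily legacy's (his instructions are variable-length). The sketch counts how many bits are left for the opcode and immediates.

```python
import math

def register_field_bits(num_regs: int) -> int:
    """Bits needed to name one register out of num_regs."""
    return math.ceil(math.log2(num_regs))

def bits_left(instr_bits: int, num_regs: int, reg_operands: int) -> int:
    """Bits remaining for opcode and immediates after the register fields."""
    return instr_bits - reg_operands * register_field_bits(num_regs)

for regs in (16, 32, 64):
    print(regs, register_field_bits(regs), bits_left(32, regs, 3))
```

Going from 32 to 64 registers costs one bit per operand field, so a three-operand instruction gives up 3 of its 17 remaining bits.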
// richard http://imagecraft.com/
 

andersm

  • Super Contributor
  • Posts: 1198
  • Country: fi
Re: your own ISA, what do you think about having 64 registers ?
« Reply #8 on: October 15, 2016, 08:59:00 am »
Do what real ISA designers do, and analyze real code to see how registers are actually used. There's an unprecedented amount of research material available. Port a compiler to your proposed architecture, and see how it is able to use the registers. Does your average function take fourteen parameters? If not, they're wasted, unless you're implementing some kind of register window scheme, in which case, are there enough? Non-leaf functions will need to save the contents of those registers, so they will be heavily disfavoured by compilers. How does it affect your code density? Latencies?

AFAIK, current research shows that around 16-32 registers is optimal, depending on whether you're using two- or three-operand instructions.

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
Re: your own ISA, what do you think about having 64 registers ?
« Reply #9 on: October 15, 2016, 09:17:20 am »
Quote from: Sal Ammoniac
Too many. Task context switch times increase
when you have to save/restore that many registers.

the context switch is performed in hardware:
it costs 1 instruction and 4 clock ticks
 

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
Re: your own ISA, what do you think about having 64 registers ?
« Reply #10 on: October 15, 2016, 09:19:50 am »
Quote from: C
My first thought is: slow stack memory, fix stack memory.

the external memory is an asynchronous static RAM:
the RAM controller needs about 100 ns
to propagate the address,
then it needs more time to address the cell,
perform the I/O operation, and close the bus cycle

that means a delay of 100 ns plus some clock ticks;
currently every load/store on external memory needs more
than 10 clock ticks on my spartan6e @ 33 MHz,
whereas BRAM takes 1 clock tick: it's at least 10x slower

conclusion: the less you use the stack, the faster you go

unfortunately I can't use the BRAM, it's already in use

Quote from: C
Done right, a cache for the stack would make most CPU accesses very fast
while doing block reads/writes to external memory

ah, by "fix" you mean caching... well yes,
I can add write-through D and I caches;
it's not tested at the moment, but I can do it
« Last Edit: October 15, 2016, 10:15:15 am by legacy »
 

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
Re: your own ISA, what do you think about having 64 registers ?
« Reply #11 on: October 15, 2016, 09:25:43 am »
I said "write-through" is the preferred method
because it allows fast retrieval on demand,
while keeping the same data in main memory ensures
that nothing gets lost if a crash or other
disruption occurs: data loss cannot be tolerated
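Below is a minimal model of the write-through policy described above. It assumes a direct-mapped cache with one word per line and write-allocate on writes; the class and its counters are hypothetical. Every write goes to the backing memory, so main memory always holds the current data. Only reads benefit from hits.

```python
class WriteThroughCache:
    """Direct-mapped, one word per line, write-through cache (a sketch)."""

    def __init__(self, memory: dict, num_lines: int = 8):
        self.memory = memory              # backing store: address -> value
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # each entry: (tag, value) or None
        self.mem_reads = 0
        self.mem_writes = 0

    def _index_tag(self, addr: int):
        return addr % self.num_lines, addr // self.num_lines

    def read(self, addr: int) -> int:
        idx, tag = self._index_tag(addr)
        line = self.lines[idx]
        if line is not None and line[0] == tag:
            return line[1]                # hit: no external access
        self.mem_reads += 1               # miss: fetch from slow memory
        value = self.memory.get(addr, 0)
        self.lines[idx] = (tag, value)
        return value

    def write(self, addr: int, value: int) -> None:
        idx, tag = self._index_tag(addr)
        self.lines[idx] = (tag, value)    # keep the cached copy current
        self.memory[addr] = value         # write-through: memory always updated
        self.mem_writes += 1
```

The trade-off is visible in the counters: reads that hit cost nothing external, but every write still pays the full external-memory cost.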
 

Kleinstein

  • Super Contributor
  • Posts: 14202
  • Country: de
Re: your own ISA, what do you think about having 64 registers ?
« Reply #12 on: October 15, 2016, 09:38:41 am »
Usually one does not need that many registers in typical algorithms, so the only use would be parameter passing. Even then, that many registers with a static address are not needed: you would have to move the data out of the parameter-passing registers before you can do another function call. So, if anything, that many registers might be more useful if they are used as on the SPARC CPU: a small, fixed, directly accessible register file plus more registers that are shifted out on function calls. This is somewhere in between many static registers and a hardware-implemented stack.
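The SPARC-style scheme can be sketched as overlapping register windows: the caller's "out" registers become the callee's "in" registers, so parameters are passed without copying. This is a simplified, SPARC-like model. It assumes 8 windows, r0..r7 as shared globals, r8..r15 as outs, r16..r23 as locals and r24..r31 as ins. It omits the window overflow/underflow traps a real SPARC uses to spill windows to memory.

```python
NWINDOWS = 8     # assumed number of windows
NGLOBALS = 8     # r0..r7 are shared by all windows

def physical_index(reg: int, cwp: int) -> int:
    """Map architectural register r0..r31 to a physical slot (SPARC-like)."""
    if not 0 <= reg < 32:
        raise ValueError("register out of range")
    if reg < NGLOBALS:
        return reg                                # globals: not windowed
    offset = reg - 8                              # outs 0..7, locals 8..15, ins 16..23
    return NGLOBALS + (cwp * 16 + offset) % (NWINDOWS * 16)

class WindowedRegs:
    def __init__(self):
        self.phys = [0] * (NGLOBALS + NWINDOWS * 16)
        self.cwp = 0                              # current window pointer

    def read(self, reg: int) -> int:
        return self.phys[physical_index(reg, self.cwp)]

    def write(self, reg: int, value: int) -> None:
        if reg != 0:                              # r0 stays zero, as on SPARC
            self.phys[physical_index(reg, self.cwp)] = value

    def save(self) -> None:
        """On call: move to a fresh window; caller's outs become callee's ins."""
        self.cwp = (self.cwp - 1) % NWINDOWS

    def restore(self) -> None:
        """On return: go back to the caller's window."""
        self.cwp = (self.cwp + 1) % NWINDOWS
```

For example, a value written to r8 before `save()` reads back as r24 in the callee, while the callee's locals are private.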

A faster (buffered) stack might be the more useful way. Stack buffering could be different from (and simpler than) a classic memory cache, as there is only one defined way of access (push/pop at the top), no random access.
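A stack buffer like the one described above can be sketched as follows. The top few stack words live on chip, and only when the buffer overflows or runs dry is a block moved to or from external memory. The capacity and block size are hypothetical parameters, and the class is an illustration rather than a hardware design.

```python
class StackBuffer:
    """Keep the top `capacity` stack words on chip; spill/fill in blocks (sketch)."""

    def __init__(self, capacity: int = 16, block: int = 4):
        if capacity % block != 0 or capacity < 2 * block:
            raise ValueError("capacity must be a multiple of block and >= 2*block")
        self.capacity = capacity
        self.block = block
        self.onchip = []          # fast buffer; last element is the top of stack
        self.memory = []          # slow external stack (older entries)
        self.block_transfers = 0

    def push(self, value) -> None:
        if len(self.onchip) == self.capacity:
            # buffer full: spill the oldest `block` words in one block write
            self.memory.extend(self.onchip[:self.block])
            del self.onchip[:self.block]
            self.block_transfers += 1
        self.onchip.append(value)

    def pop(self):
        if not self.onchip:
            if not self.memory:
                raise IndexError("pop from empty stack")
            # buffer empty: refill up to `block` words in one block read
            n = min(self.block, len(self.memory))
            self.onchip = self.memory[-n:]
            del self.memory[-n:]
            self.block_transfers += 1
        return self.onchip.pop()
```

Because the access pattern is strictly LIFO, no tags or address comparisons are needed, which is why it can be simpler than a general cache.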

It also depends on what the CPU is used for: context switching is only an issue with a multitasking OS.
 

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
Re: your own ISA, what do you think about having 64 registers ?
« Reply #13 on: October 15, 2016, 09:59:19 am »
Quote from: andersm
Port a compiler to your proposed architecture,
and see how it is able to use the registers.

I have studied something similar with register transfer language (RTL):
it's used internally by C compilers such as GCC before committing
to machine code generation. In this phase you can have
an unlimited number of registers, which are then reduced
according to the target's capability; this is also where the optimizer
runs, and the smarter it is, the less it uses the stack
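The step "unlimited virtual registers reduced to the target's registers" is register allocation. The sketch below uses linear scan, a simple, well-known allocator; GCC uses a more elaborate one, so this is a swapped-in technique for illustration only. Each virtual register has a live interval. When more intervals overlap than there are physical registers, the one that lives longest is spilled to the stack.

```python
def linear_scan(intervals: dict, num_regs: int) -> dict:
    """Linear-scan register allocation (sketch).

    intervals: virtual register -> (start, end) live range.
    Returns virtual register -> physical register index, or "spill".
    """
    assignment = {}
    active = []                          # (end, vreg), kept sorted by end
    free = list(range(num_regs))
    for vreg, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # release registers whose intervals ended before this one starts
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(assignment[old])
        if free:
            assignment[vreg] = free.pop()
            active.append((end, vreg))
            active.sort()
        else:
            # no free register: spill whichever live interval ends last
            last_end, last_vreg = active[-1]
            if last_end > end:
                assignment[vreg] = assignment[last_vreg]
                assignment[last_vreg] = "spill"
                active[-1] = (end, vreg)
                active.sort()
            else:
                assignment[vreg] = "spill"
    return assignment
```

With more physical registers, fewer intervals spill. This is exactly the trade-off the thread is about: past the point where nothing spills, extra registers buy nothing.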


I am designing my own compiler; it's not that smart :D

Quote from: andersm
Does your average function take fourteen parameters?

looking at my own C programming style, I never use more than 6 parameters;
if I need more, I usually pass a pointer to a struct

Quote from: andersm
If not, they're wasted, unless you're implementing some kind of register window scheme,
in which case, are there enough?
Non-leaf functions will need to save the contents of those registers,
so they will be heavily disfavoured by compilers.
How does it affect your code density? Latencies?

in the current ISA I have 32 registers and a hardware context switch:

14 registers are not saved; they are used to pass parameters on function calls
16 registers are saved in hardware (it costs 4 clock ticks) and are usable for
local variables, local loop indexes, local data, etc.

exceptions and interrupts have their own private space:
when interrupted, the hardware switches the whole bank of registers
(both the parameter pool and the context pool), so you can call a function
from the ISR and it won't clobber the parameters once you return
from the exception
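The bank switch described above can be modelled in a few lines. There are two complete 32-register banks, one for normal code and one for the interrupt handler. Interrupt entry flips to the second bank and RTI flips back, so nothing needs saving. The sketch assumes r0 reads as zero, which the thread does not state, and it refuses nesting, matching the "one bank at a time" behaviour legacy describes.

```python
NUM_REGS = 32

class BankedRegisterFile:
    """Two full register banks: bank 0 for normal code, bank 1 for the ISR (sketch)."""

    def __init__(self):
        self.banks = [[0] * NUM_REGS, [0] * NUM_REGS]
        self.active = 0

    def read(self, reg: int) -> int:
        return 0 if reg == 0 else self.banks[self.active][reg]  # assumed: r0 reads as 0

    def write(self, reg: int, value: int) -> None:
        if reg != 0:
            self.banks[self.active][reg] = value

    def enter_interrupt(self) -> None:
        if self.active == 1:
            raise RuntimeError("no nesting: only one interrupt bank")
        self.active = 1          # swap the whole bank: parameters and context

    def return_from_interrupt(self) -> None:
        self.active = 0          # the interrupted code's registers are untouched
```

An ISR can freely use the parameter registers to call functions because it is writing a different physical bank.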

the mechanism is very comfortable from the programmer's point of view,
at the price of making the pipeline (HDL implementation) more complex


Quote from: andersm
current research shows that around 16-32 registers is optimal,
depending on whether you're using two- or three-operand instructions.

currently both the ALU and the ISA are designed with 3 sources:
instructions can take three registers, or 1 register and 2 immediates;
the ALU always outputs 1 register
(plus a hidden register, e.g. used by multiplication and division)

I can have up to 15 coprocessors (Cops); Cop0 is reserved for exceptions and interrupts,
Cop1 is used for the DSP engine, and Cop2 is used for CORDIC

Cop1 has 3 inputs and 2 outputs
Cop2 has 3 inputs and 3 outputs


Quote from: andersm
How does it affect your code density? Latencies?

switching from 32 to 64 registers
doesn't change the code density
and doesn't increase the latencies

the code density isn't optimal anyway:
an instruction's length usually ranges
from 4 bytes to 12 bytes

applications are usually about 2x bloated :palm:
 

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
Re: your own ISA, what do you think about having 64 registers ?
« Reply #14 on: October 15, 2016, 10:16:42 am »
Quote from: ale500
math routines are implemented using only registers, reading constants from memory

currently I am handling expression evaluation with an RPN approach
that tries to map onto registers instead of doing pushes and pops on memory,
even though reverse Polish notation is itself a stack approach
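One common way to map RPN onto registers is to treat stack depth d as register r(first_reg + d). Each push becomes a load into the next register, and each binary operator becomes a three-operand instruction on the top two. The sketch below emits such code. The mnemonics (`ld`, `li`, `add`, `sub`, `mul`) and the register numbering are hypothetical, and it raises an error where a real compiler would spill to the stack.

```python
OPS = {"+": "add", "-": "sub", "*": "mul"}

def rpn_to_regs(tokens, first_reg: int = 1, num_regs: int = 8):
    """Translate RPN into 3-operand register code (stack depth d -> r(first_reg+d))."""
    code, depth = [], 0
    for tok in tokens:
        if tok in OPS:
            if depth < 2:
                raise ValueError("malformed RPN")
            dst, src = first_reg + depth - 2, first_reg + depth - 1
            code.append(f"{OPS[tok]} r{dst}, r{dst}, r{src}")
            depth -= 1
        else:
            if depth == num_regs:
                raise ValueError("expression too deep: would need a spill to the stack")
            mnemonic = "li" if tok.isdigit() else "ld"   # immediate vs. variable load
            code.append(f"{mnemonic} r{first_reg + depth}, {tok}")
            depth += 1
    if depth != 1:
        raise ValueError("malformed RPN")
    return code
```

For example, `rpn_to_regs("a b + 3 *".split())` yields `ld r1, a`, `ld r2, b`, `add r1, r1, r2`, `li r2, 3`, `mul r1, r1, r2`. The number of registers needed equals the maximum stack depth of the expression, which is usually small.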
 

andersm

  • Super Contributor
  • Posts: 1198
  • Country: fi
Re: your own ISA, what do you think about having 64 registers ?
« Reply #15 on: October 15, 2016, 10:59:38 am »
Quote from: legacy
14 registers are not saved; they are used to pass parameters on function calls

Functions often call other functions.

EDIT: Register banking helps with interrupt latency (but you need several banks for nested interrupts), but not OS task switching.
« Last Edit: October 15, 2016, 11:09:31 am by andersm »
 

Kalvin

  • Super Contributor
  • Posts: 2145
  • Country: fi
  • Embedded SW/HW.
Re: your own ISA, what do you think about having 64 registers ?
« Reply #16 on: October 15, 2016, 11:28:45 am »
The instruction set needs to be as orthogonal as possible so that compiler construction is straightforward. As the register count increases, the instruction word needs more bits to address the registers. If you implement the processor in an FPGA as a soft core, you could possibly create an architecture with a sliding register window using the FPGA's internal fast RAM blocks. With a sliding register window you do not have to save and restore the registers during interrupts, as you just slide the register window to a new context. You should have a few dedicated, hardcoded registers: R0 always reads as 0, R1 could be dedicated as the base address of the sliding register window, R2 could be used as a generic stack pointer, and R3 could be used as a generic register (for example, Ada uses a second stack for string operations). The other N registers are then accessed through the sliding register window. At FPGA compile time you just decide how deep the register window should be, which determines how many nested function calls and interrupts you can have.
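The decode step of that scheme might look like this. R0..R3 are fixed registers; every other register number is an offset from the window base held in R1, indexing into on-chip RAM. Sliding the window fails once the RAM, sized at synthesis time, is exhausted. `WINDOW_REGS`, `DEPTH` and the function names are hypothetical choices for illustration.

```python
FIXED = 4            # R0..R3 are not windowed (R0 = 0, R1 = window base, R2 = SP, R3 = spare)
WINDOW_REGS = 12     # hypothetical: R4..R15 are windowed
DEPTH = 64           # hypothetical number of windows, fixed at synthesis time
RAM_WORDS = DEPTH * WINDOW_REGS

def decode(reg: int, window_base: int) -> tuple:
    """Return ("fixed", n) or ("ram", address) for an architectural register."""
    if reg < FIXED:
        return ("fixed", reg)
    if reg >= FIXED + WINDOW_REGS:
        raise ValueError("no such register")
    return ("ram", window_base + (reg - FIXED))

def slide(window_base: int) -> int:
    """Advance to a fresh window on call/interrupt; fail when the RAM is exhausted."""
    nxt = window_base + WINDOW_REGS
    if nxt + WINDOW_REGS > RAM_WORDS:
        raise OverflowError("window RAM exhausted: too many nested calls/interrupts")
    return nxt
```

The synthesis-time `DEPTH` directly sets the maximum nesting of calls and interrupts, as described above.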
 

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
Re: your own ISA, what do you think about having 64 registers ?
« Reply #17 on: October 15, 2016, 11:48:44 am »
Quote from: andersm
Functions often call other functions

there is no problem with that, since before calling it's just a matter of
copying the (parameter) register's content into a (context) register,
which consumes fewer clock ticks than putting things on the stack

the alternative consists of a special class of instructions
able to load/save things from/to the local context to/from the next one

(I don't call it a sliding register window, but the concept is almost the same)

in this case parameters are entirely private
and all the registers are saved by the hardware

therefore, every function owns its context, parameters included

the problem with this approach: it adds more complexity
and it costs a whole class of special instructions

the limit of this approach is the size of the BRAM:
currently I can serve 1024 contexts of 16 registers each;
extending the saved context to 32 registers,
the hardware would be able to serve half as many
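The BRAM figures above can be checked with a small calculation, using the 32-bit register width stated earlier in the thread. The function names are illustrative only.

```python
REG_BITS = 32   # register width from the thread

def context_storage_bits(contexts: int, regs_per_context: int) -> int:
    """Total bits needed to hold `contexts` saved register sets."""
    return contexts * regs_per_context * REG_BITS

def contexts_that_fit(budget_bits: int, regs_per_context: int) -> int:
    """How many saved register sets fit in a fixed storage budget."""
    return budget_bits // (regs_per_context * REG_BITS)

budget = context_storage_bits(1024, 16)         # the current configuration
print(budget, budget // 8 // 1024, "KiB")       # 524288 bits = 64 KiB
print(contexts_that_fit(budget, 32))            # 512 contexts of 32 registers
```

The same 64 KiB that holds 1024 contexts of 16 registers holds only 512 contexts of 32 registers. With 64 registers per context it would drop to 256.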

I have also implemented these features in a branch of the ISA
(I say "branch" because it's a git branch of VHDL files);
the multi-cycle version works as expected, but I have some trouble with
the pipelined version :palm:


Quote from: andersm
Register banking helps with interrupt latency
(but you need several banks for nested interrupts)

currently I have no support for nested interrupts,
since during those events the hardware can serve only one bank at a time

that means if an interrupt arrives while the CPU is already serving an interrupt,
the new interrupt is added to a queue, marked as "pending",
and it will be served only after the RTI instruction completes
(return from interrupt)

edit:
the queue is first in, first out:
interrupts are not sorted or served by priority (Cop0 is not smart at all),
and from the programming point of view, interrupts are disabled during an ISR
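The non-nesting, first-in-first-out behaviour just described can be sketched as a tiny controller model. The class is hypothetical. While an ISR runs, new requests are queued with no priority ordering, and each RTI starts the oldest pending request.

```python
from collections import deque

class SimpleInterruptController:
    """Non-nesting controller: one ISR at a time, pending requests served FIFO."""

    def __init__(self):
        self.pending = deque()
        self.current = None
        self.served = []               # log of ISRs in the order they ran

    def request(self, irq: int) -> None:
        if self.current is None:
            self._start(irq)
        else:
            self.pending.append(irq)   # no priority: just queue it

    def rti(self) -> None:
        """Return from interrupt: finish the current ISR, start the oldest pending one."""
        self.current = None
        if self.pending:
            self._start(self.pending.popleft())

    def _start(self, irq: int) -> None:
        self.current = irq
        self.served.append(irq)
```

A consequence of plain FIFO ordering is that a latency-critical interrupt can wait behind every request queued before it.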
« Last Edit: October 15, 2016, 12:01:02 pm by legacy »
 

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
Re: your own ISA, what do you think about having 64 registers ?
« Reply #18 on: October 15, 2016, 11:51:38 am »
@Kalvin
dude, something similar is already designed and implemented;
I have to decide the details: how many registers, and which branch
I'd better continue with
 

Kalvin

  • Super Contributor
  • Posts: 2145
  • Country: fi
  • Embedded SW/HW.
Re: your own ISA, what do you think about having 64 registers ?
« Reply #19 on: October 15, 2016, 12:01:09 pm »
Quote from: legacy
@Kalvin
dude, something similar is already designed and implemented;
I have to decide the details: how many registers, and which branch
I'd better continue with

Yes, I know it has already been implemented. I just addressed the problem of context switching and how the register count may affect the instruction width. There are studies that have determined the optimal register count for different architectures (just google a bit). Of course, you can ignore those and do whatever you want. A single-instruction architecture (MOVE only) could be very simple and flexible, too.
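A MOVE-only machine, often called a transport-triggered architecture, has data moves as its only instructions. Operations happen as side effects of writing to special "trigger" ports of functional units. The toy interpreter below illustrates the idea. The port names (`add.operand`, `add.trigger`, `add.result`) and the register names are hypothetical.

```python
def run_move_program(program):
    """Execute a MOVE-only program. Each instruction is (source, destination).

    A source is an int literal or a location name. Writing to 'add.trigger'
    fires the adder: add.result = add.operand + written value.
    """
    loc = {"add.operand": 0, "add.result": 0, "r0": 0, "r1": 0}
    for src, dst in program:
        value = src if isinstance(src, int) else loc[src]
        if dst == "add.trigger":
            loc["add.result"] = loc["add.operand"] + value   # side effect of the move
        else:
            loc[dst] = value
    return loc

# r0 = 2 + 3, expressed purely as moves
prog = [(2, "r0"), (3, "r1"),
        ("r0", "add.operand"), ("r1", "add.trigger"),
        ("add.result", "r0")]
```

The decoder becomes trivial, since every instruction is just a source and a destination, and the complexity moves into the interconnect and the compiler's scheduling.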
 

Kalvin

  • Super Contributor
  • Posts: 2145
  • Country: fi
  • Embedded SW/HW.
Re: your own ISA, what do you think about having 64 registers ?
« Reply #20 on: October 15, 2016, 12:12:40 pm »
If you can fit the complete application code inside the FPGA's memory, you don't have to worry about instruction width and you can use a VLIW ISA, so you can have as many registers as you want. Pretty simple.
 

andersm

  • Super Contributor
  • Posts: 1198
  • Country: fi
Re: your own ISA, what do you think about having 64 registers ?
« Reply #21 on: October 15, 2016, 12:31:51 pm »
Maybe you should rephrase the question, since you're clearly not just designing an ISA, but instead have some fixed hardware and are designing around that. Which is fine, as long as you recognize that many of your design decisions don't make sense outside of that context.

amyk

  • Super Contributor
  • Posts: 8275
Re: your own ISA, what do you think about having 64 registers ?
« Reply #22 on: October 15, 2016, 12:41:49 pm »
I agree with everyone else here who says that's far too many... in fact you're probably going to have trouble getting a compiler to use them all effectively. x86 has only 8 (more like 7 in practice) and yet I often see compilers using only 3-4 of them at once.

That is provided we are talking about addressable registers, and not physical registers if you're doing out-of-order execution.
 

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
Re: your own ISA, what do you think about having 64 registers ?
« Reply #23 on: October 15, 2016, 01:54:52 pm »
Quote from: amyk
That is provided we are talking about addressable registers,
and not physical registers if you're doing out-of-order execution.

good point :-+

mmm, in the end I think I will go back to the 32-register branch
 

technix

  • Super Contributor
  • Posts: 3507
  • Country: cn
  • From Shanghai With Love
Re: your own ISA, what do you think about having 64 registers ?
« Reply #24 on: October 16, 2016, 02:22:36 pm »
From what I have seen, I think 64 orthogonal GPRs may be a bit too many. However, if some of the 64 32-bit registers served purposes other than general-purpose ones, I would consider it a better design. For example, you can have 28 orthogonal GPRs; your program counter, stack pointer, program status word and return address register, all accessible just like normal registers; and 16 128-bit floating-point and SIMD registers, each accessible as four 32-bit registers using a different instruction (or a subset of the memory access instructions).

This would sit somewhere between your 32-register branch and your 64-register idea. For a processor core that is not optimized for DSP/SIMD/multimedia purposes you can leave the FPU/SIMD registers unimplemented or limited, while if the core is DSP-oriented you could even allow the 28 main integer registers to be used as 7 FPU/SIMD registers.
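The aliasing idea above can be sketched as follows. A 128-bit register is viewed as four 32-bit lanes, and groups of four GPRs form one 128-bit register (28 GPRs give 7 quads). The lane ordering (lane 0 in the least significant bits) and the function names are assumptions for illustration, not a proposed encoding.

```python
def split_lanes(value128: int) -> list:
    """View a 128-bit register as four 32-bit lanes (lane 0 = least significant)."""
    return [(value128 >> (32 * i)) & 0xFFFFFFFF for i in range(4)]

def join_lanes(lanes) -> int:
    """Pack four 32-bit lanes back into one 128-bit value."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & 0xFFFFFFFF) << (32 * i)
    return value

def gpr_quad(gprs, simd_index: int) -> int:
    """Treat GPRs 4k..4k+3 as one 128-bit register (28 GPRs -> 7 quads)."""
    if not 0 <= simd_index < len(gprs) // 4:
        raise IndexError("no such quad")
    return join_lanes(gprs[4 * simd_index: 4 * simd_index + 4])

def simd_add32(a: int, b: int) -> int:
    """Lane-wise 32-bit wrap-around add on two 128-bit values."""
    return join_lanes([(x + y) & 0xFFFFFFFF
                       for x, y in zip(split_lanes(a), split_lanes(b))])
```

In hardware this aliasing costs only wiring, because the same storage is read either as one wide word or as four narrow ones.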
 

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
Re: your own ISA, what do you think about having 64 registers ?
« Reply #25 on: October 16, 2016, 02:30:43 pm »
Quote
to storing several X,Y,Z coordinates, R,G,B,A colors, etc. 3D transform matrix coefficients, pattern / texture bitmap vectors, etc.) and then have quite many (thousands maybe) of parallel instances of these thread execution units that all, for the most part, execute from registers with occasional copying to/from RAM for the finished calculation.

In other algorithms like say 8x8 2D-DCT discrete cosine transform as similarly used almost everywhere in JPEG/video encoding you need to operate on an [8][8] = 64 pixel matrix of variable input pixels

exactly the point :D
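To make the 8x8 DCT example concrete, here is a naive 2-D DCT-II with the JPEG normalisation. Every output coefficient depends on all 64 input pixels, which is why a machine that can keep the whole block in registers avoids repeated memory traffic. This direct O(N^4) form is only for illustration; real codecs use separable, fast factorizations.

```python
import math

N = 8

def dct_8x8(block):
    """Naive 2-D DCT-II of an 8x8 block, JPEG normalisation (sketch)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):          # every output reads all 64 inputs
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out
```

For a flat block of value 10, all energy lands in the DC coefficient (0.25 x 0.5 x 640 = 80), and every other coefficient is zero.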
 

Kalvin

  • Super Contributor
  • Posts: 2145
  • Country: fi
  • Embedded SW/HW.
Re: your own ISA, what do you think about having 64 registers ?
« Reply #26 on: October 16, 2016, 03:27:18 pm »
It is very difficult for a human to use a large number of registers efficiently and to optimize the code locally and globally so that register usage is as close to optimal as possible, not to mention what a nightmare it would be to maintain. Similarly, it is very difficult for a human to hand-write efficient code for architectures with multiple parallel execution units, keeping the pipeline and the parallel execution units as full as possible; the TI C6000 is one example of this. Therefore, I would say that the optimal register count is around 16 registers if you write asm code by hand. An optimizing high-level language compiler could possibly benefit from a large register array, but even then the gains may be quite marginal. Of course, one can point to specific applications that really benefit from a large register array, but in general the benefits of a large register array will not be that great.
« Last Edit: October 16, 2016, 03:31:25 pm by Kalvin »
 

