Port a compiler to your proposed architecture,
and see how it is able to use the registers.
I have studied something similar: register transfer language (RTL).
It is used internally by C compilers before machine code is generated.
In this phase you can have an unlimited number of registers,
which are then reduced according to the target's capabilities.
The optimizer also runs in this phase: the smarter it is, the less it uses the stack.
I am designing my own compiler, and it's not that smart.
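To make the "unlimited registers reduced to the target's registers" step concrete, here is a minimal sketch in C. It is a greedy first-fit pass over live ranges, a simplification of linear-scan allocation, and not what any particular compiler does. The names LiveRange, allocate_registers and SPILLED are made up for this example; a spilled value is one that would have to live on the stack.

```c
#include <assert.h>
#include <stddef.h>

#define SPILLED (-1)

/* Live range of one virtual register, as instruction indexes (hypothetical type). */
typedef struct {
    int start;  /* first instruction that uses the value */
    int end;    /* last instruction that uses the value */
    int phys;   /* assigned physical register, or SPILLED */
} LiveRange;

/*
 * Greedy first-fit assignment of physical registers to virtual ones.
 * ranges must be sorted by start. Returns the number of spilled ranges,
 * i.e. values that would have to go to the stack.
 */
int allocate_registers(LiveRange *ranges, size_t count, int num_phys)
{
    enum { MAX_PHYS = 64 };
    int free_at[MAX_PHYS];  /* free_at[r]: first index at which r is free again */
    int spills = 0;

    assert(num_phys > 0 && num_phys <= MAX_PHYS);
    for (int r = 0; r < num_phys; r++)
        free_at[r] = 0;

    for (size_t i = 0; i < count; i++) {
        assert(i == 0 || ranges[i - 1].start <= ranges[i].start);
        ranges[i].phys = SPILLED;
        for (int r = 0; r < num_phys; r++) {
            if (free_at[r] <= ranges[i].start) {
                ranges[i].phys = r;               /* reuse a free register */
                free_at[r] = ranges[i].end + 1;   /* busy until the range ends */
                break;
            }
        }
        if (ranges[i].phys == SPILLED)
            spills++;
    }
    return spills;
}
```

With more physical registers the same input spills less, which is the "less stack" effect described above; a smarter allocator would also choose which range to spill instead of always spilling the newest one.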
Does your average function take fourteen parameters?
looking at my own C programming style, I never use more than 6 parameters
if I need more, I usually pass a pointer to a struct
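As an illustration of that style, a small sketch in C: eight values travel through one pointer, so only one parameter register is needed. RectParams and rect_outer_area are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter block: eight values that would otherwise be eight arguments. */
typedef struct {
    int x, y, width, height;
    int border, margin, color, flags;
} RectParams;

/* Only one pointer is passed, so only one parameter register is used. */
int rect_outer_area(const RectParams *p)
{
    assert(p != NULL);
    int pad = 2 * (p->border + p->margin);  /* padding on both sides */
    return (p->width + pad) * (p->height + pad);
}
```

The caller fills the struct once and passes its address; the cost is an extra memory access per field inside the callee.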
If not, they're wasted, unless you're implementing some kind of register window scheme,
in which case are there enough?
Non-leaf functions will need to save the contents of those registers,
so they will be heavily disfavoured by compilers.
How does it affect your code density? Latencies?
in the current ISA I have 32 registers, and a hardware context switch
14 registers are not saved; they are used to pass parameters on function calls
16 registers are saved in hardware, which costs 4 clock ticks; they are usable for
local variables, local loop indexes, local data, etc.
exceptions and interrupts have their own private space
on an interrupt, the hardware switches the whole bank of registers
(both the parameter pool and the context pool), so you can call a function
from the ISR and it won't clobber the parameters once you return
from the exception
the mechanism is very comfortable from the programmer's point of view,
at the price of making the pipelining (HDL implementation) more complex
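A rough software model of that scheme, in C, may make the behaviour easier to follow. The register counts and the 4-tick save cost are the figures given above; everything else is an assumption made for the sketch: the Cpu and RegisterBank types, the function names, a fixed-depth save area, and charging ticks only on the save (the cost of the restore is not stated, so it is left out).

```c
#include <assert.h>
#include <string.h>

enum {
    NUM_PARAM_REGS   = 14, /* from the post: not saved across calls */
    NUM_CONTEXT_REGS = 16, /* from the post: saved by hardware on a call */
    SAVE_COST_TICKS  = 4,  /* from the post */
    MAX_CALL_DEPTH   = 8   /* assumption for this sketch only */
};

typedef struct {
    unsigned param[NUM_PARAM_REGS];
    unsigned context[NUM_CONTEXT_REGS];
} RegisterBank;

typedef struct {
    RegisterBank normal;     /* bank seen by ordinary code */
    RegisterBank exception;  /* private bank for exceptions and interrupts */
    RegisterBank *active;    /* bank currently visible to instructions */
    unsigned saved[MAX_CALL_DEPTH][NUM_CONTEXT_REGS];
    int depth;
    unsigned long ticks;
} Cpu;

void cpu_init(Cpu *cpu)
{
    memset(cpu, 0, sizeof *cpu);
    cpu->active = &cpu->normal;
}

/* Function call: the 16 context registers are saved, the parameters are not. */
void cpu_call(Cpu *cpu)
{
    assert(cpu->depth < MAX_CALL_DEPTH);
    memcpy(cpu->saved[cpu->depth++], cpu->active->context,
           sizeof cpu->active->context);
    cpu->ticks += SAVE_COST_TICKS;
}

/* Function return: the caller's context registers come back. */
void cpu_return(Cpu *cpu)
{
    assert(cpu->depth > 0);
    memcpy(cpu->active->context, cpu->saved[--cpu->depth],
           sizeof cpu->active->context);
}

/* Interrupt entry and exit: the whole bank is switched, nothing is copied. */
void cpu_enter_exception(Cpu *cpu) { cpu->active = &cpu->exception; }
void cpu_leave_exception(Cpu *cpu) { cpu->active = &cpu->normal; }
```

In this model an ISR can write parameter registers and call functions freely, and the interrupted code still finds its own parameters untouched after cpu_leave_exception.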
current research shows that around 16-32 registers is optimal,
depending on whether you're using two- or three-operand instructions.
currently both the ALU and the ISA are designed with 3 sources
instructions can take three registers, or 1 register and 2 immediate values
the ALU always outputs 1 register
(plus a hidden register, e.g. used by multiply and divide)
I can have up to 15 coprocessors (Cops); Cop0 is reserved for exceptions and interrupts,
Cop1 is used for the DSP engine, and Cop2 is used by CORDIC
Cop1 has 3 inputs and 2 outputs
Cop2 has 3 inputs and 3 outputs
How does it affect your code density? Latencies?
switching from 32 registers to 64 registers
doesn't change the code density,
and doesn't increase the latencies
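One possible way to see why the register count need not change the instruction size is to count selector bits. The sketch below is plain arithmetic in C; the function names are made up, and the idea that all four selectors (3 sources plus 1 destination) sit in a single 32-bit word is an assumption, not something stated about this ISA.

```c
#include <assert.h>

/* Bits needed to select one of num_regs registers, i.e. ceil(log2(num_regs)). */
unsigned reg_field_bits(unsigned num_regs)
{
    assert(num_regs > 0 && num_regs <= 65536u);
    unsigned bits = 0;
    while ((1u << bits) < num_regs)
        bits++;
    return bits;
}

/* Bits spent on register selectors in one instruction with num_fields selectors. */
unsigned operand_bits(unsigned num_regs, unsigned num_fields)
{
    return reg_field_bits(num_regs) * num_fields;
}
```

Under that assumption, operand_bits(32, 4) is 20 and operand_bits(64, 4) is 24: both fit in a 32-bit word, but the room left for the opcode shrinks from 12 bits to 8.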
the code density itself is not optimal:
instructions are usually
4 to 12 bytes long,
and applications are usually about 2 times bloated
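The 4 to 12 byte range can be reproduced with a simple size model, sketched here in C. The layout is a guess and not the real encoding: a 4-byte base word plus 4 bytes for each immediate operand, with at most 2 immediates per instruction. BASE_BYTES, IMMEDIATE_BYTES and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { BASE_BYTES = 4, IMMEDIATE_BYTES = 4 };  /* assumed layout, not the real one */

/* Length of one instruction carrying num_immediates immediate operands (0..2). */
size_t instruction_bytes(unsigned num_immediates)
{
    assert(num_immediates <= 2);
    return BASE_BYTES + (size_t)num_immediates * IMMEDIATE_BYTES;
}

/* Total size of a program, given the immediate count of each instruction. */
size_t program_bytes(const unsigned *immediates, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += instruction_bytes(immediates[i]);
    return total;
}
```

In this model a program that averages one immediate per instruction comes out at 8 bytes per instruction, twice the size of an encoding where every instruction is 4 bytes.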