EEVblog Electronics Community Forum

Topic started by: DiTBho on December 20, 2020, 02:26:50 pm

Title: RISC with two RA: why?
Post by: DiTBho on December 20, 2020, 02:26:50 pm
I spent yesterday working with a superscalar processor implemented in C as a software simulator. A week ago a guy sent me an archive, but he is not the author (perhaps someone who worked at IBM?); the author is unknown, and nobody knows any of the inner details, which makes this tale ... a great mystery! Anyway, the ISA is somewhat similar to a PowerPC with four condition code registers and two return address registers separated from the main 32-entry register file  :wtf:


So there are examples of RISC designs with two RA-like registers. Of course, the two RA do not get used at the same time; rather, it seems it is sometimes handy to have a depth of two for register-based routine returns without having to stack addresses. But that (two dedicated RA, named "ra0" and "ra1") is really too weird for me!

In a comment I can read that "the first level routine returns using ra0 then the next level routine returns using ra1", so having two registers allows an alternate return path to be specified in a function.

But why would one register not be sufficient? I do not know. Can't the one register just be reloaded with values?!?

Ummm, reading up on RISC-V, it seems the second link register is used to enable an inner layer of "milli-code", namely "something" used for function prologues and function epilogues.

Confused  :-//
Title: Re: RISC with two RA: why?
Post by: SiliconWizard on December 20, 2020, 06:08:04 pm
Not sure I completely get your point. I can't speak of all existing RISC ISAs, of course, but I can for RISC-V that I'm beginning to know pretty well.

RISC-V, like MIPS (I think), has an instruction to jump to a given address while storing the next PC (the address immediately following the jump instruction) in a register. So this is how "calls" are typically handled.

The RISC-V *ABI* reserves "ra" as the main register for this (which is x1), and defines one alternate register (t0/x5?). But this is just a convention to make the ABI consistent. You can actually use absolutely any other register for this if you don't care about following the ABI.
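
To make the "any register can be the link register" point concrete, here is a minimal C sketch of what JAL and JALR do to the architectural state on RV32. The `Hart` struct and the function names are made up for illustration; only the semantics (rd gets pc + 4, x0 stays zero, JALR clears bit 0 of the target) come from the ISA.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of an RV32 hart: 32 integer registers plus the PC.
   The struct and function names are illustrative, not from any real simulator. */
typedef struct {
    uint32_t x[32];
    uint32_t pc;
} Hart;

/* Write a register; x0 is hard-wired to zero, so writes to it are dropped. */
static void write_reg(Hart *h, unsigned rd, uint32_t value) {
    assert(rd < 32);
    if (rd != 0) h->x[rd] = value;
}

/* JAL rd, offset: rd = pc + 4, then pc = pc + offset. */
static void jal(Hart *h, unsigned rd, int32_t offset) {
    uint32_t link = h->pc + 4;
    h->pc += (uint32_t)offset;
    write_reg(h, rd, link);
}

/* JALR rd, rs1, offset: rd = pc + 4, then pc = (x[rs1] + offset) with bit 0 cleared. */
static void jalr(Hart *h, unsigned rd, unsigned rs1, int32_t offset) {
    assert(rs1 < 32);
    uint32_t target = (h->x[rs1] + (uint32_t)offset) & ~(uint32_t)1;
    uint32_t link = h->pc + 4;   /* computed before pc changes */
    h->pc = target;
    write_reg(h, rd, link);
}
```

A call through the ABI link register is then `jal(&h, 1, offset)` and the matching return is `jalr(&h, 0, 1, 0)`; replacing 1 with 5 gives the alternate link register, and nothing in the model (or the hardware) treats the two differently.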

Obviously, and somehow related to the other thread, using more registers than just 1 for return addresses can avoid having to save it to the stack in some specific situations. The rest is very similar to the discussion in the other thread.
Title: Re: RISC with two RA: why?
Post by: DiTBho on December 20, 2020, 08:07:15 pm
Quote
the ISA is somewhat similar to a PowerPC with four condition code registers and two return address registers separated from the main 32-entry register file 

This is the part that is unique, unexpected, and that makes me perplexed.
Title: Re: RISC with two RA: why?
Post by: DiTBho on December 20, 2020, 08:21:05 pm
First doubt, and quick question: how does a pair of return address registers get used?

using more registers than just 1 for return addresses can avoid having to save it to the stack in some specific situations

RISC-V related, but that's a good confirmation. Just why would someone and IBM need to spend precious silicon to implement two RA separated from the register file (which costs not only two extra latches but also extra instructions to manage them) rather than a simpler solution like in RISC-V?

That's still an open question. Maybe there is no answer, but I can't see any serious advantage at the moment, except, perhaps, my speculation about some hypothetical, super-specific ISR code (RTOS or something?) that needs to nest calls two deep while always being as fast as a leaf call.
Title: Re: RISC with two RA: why?
Post by: DiTBho on December 20, 2020, 08:27:11 pm
Code:
critical_code
   call func0
        call func1
             j (ra1)         # the second level routine returns using ra1
        j (ra0)              # the first level routine returns using ra0
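
The same idea as a small C model, just a sketch: the `Cpu` struct, the `call`/`ret` helpers, and the 4-byte instruction size are assumptions for illustration, since the real encoding of the mystery ISA is not known here. Each nesting level links through its own register, so nothing needs to be stacked for a depth of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine state: two link registers kept outside the GPR file. */
typedef struct {
    uint32_t pc;
    uint32_t ra[2];   /* ra[0] models "ra0", ra[1] models "ra1" */
} Cpu;

/* Call at a given nesting level: save the address of the next instruction
   (4-byte instructions assumed) in that level's link register, then jump. */
static void call(Cpu *c, unsigned level, uint32_t target) {
    assert(level < 2);
    c->ra[level] = c->pc + 4;
    c->pc = target;
}

/* Return from a given nesting level: jump to that level's link register. */
static void ret(Cpu *c, unsigned level) {
    assert(level < 2);
    c->pc = c->ra[level];
}
```

With `call(&c, 0, func0)` followed by `call(&c, 1, func1)`, `ret(&c, 1)` lands back inside func0 and `ret(&c, 0)` lands back in the critical code, with no memory traffic at all. A third level of nesting would have to spill one of the two registers.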
Title: Re: RISC with two RA: why?
Post by: SiliconWizard on December 20, 2020, 09:32:24 pm
Just why would someone and IBM need to spend precious silicon to implement two RA separated from the register file (which costs not only two extra latches but also extra instructions to manage them) rather than a simpler solution like in RISC-V?

OK, I didn't get this exact question from the way you phrased it in your OP (or maybe it was just the way I understood it). That makes it clearer, at least to me!

I don't really know either. What I can just say - probably also related to the question of security we talked about in the other thread - is that using dedicated register(s) for the return address, just like using a dedicated return stack (only here a very small one), is more secure. Now maybe you can consider it's only marginally more secure, but at least it certainly prevents any other instruction than the ones dedicated to calls and returns from accessing those registers. Whether you find this a real benefit, or whether it can be proven, on real-life code, that it makes a difference, is another story.

Just my 2 cents. The designers' rationale may have been different.

Oh, and regarding "precious silicon", only on extremely small, or ultra-low power chips (or on very old processes, that had "large" features and comparatively cost a lot) would that really matter. Two extra registers on any moderately complex CPU? That's a no-brainer IMHO.

As I mentioned in the other thread, the 8-bit PIC MCUs, which were pretty small and low-cost and are now quite old, even dedicated a whole *8*-entry return stack. Not just 2. That was not much of an issue, and those processors were certainly ultra-simple compared to a powerful superscalar processor.
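
For illustration, a dedicated hardware return stack of that size can be modelled in a few lines of C. This is only a sketch: the names are made up, and the wrap-around on overflow is an assumption about how such a small stack might behave (a ninth nested call silently overwrites the oldest entry), not a statement about any particular part.

```c
#include <assert.h>
#include <stdint.h>

#define RS_DEPTH 8u   /* eight entries, as in the example above */

/* Return stack that only call/return can touch; no data instruction sees it. */
typedef struct {
    uint16_t slot[RS_DEPTH];
    unsigned top;              /* index of the next free slot, always < RS_DEPTH */
} ReturnStack;

/* On a call: store the return address and advance, wrapping when full (assumed). */
static void rs_push(ReturnStack *s, uint16_t return_addr) {
    assert(s->top < RS_DEPTH);
    s->slot[s->top] = return_addr;
    s->top = (s->top + 1u) % RS_DEPTH;
}

/* On a return: step back (with the same wrap-around) and fetch the address. */
static uint16_t rs_pop(ReturnStack *s) {
    assert(s->top < RS_DEPTH);
    s->top = (s->top + RS_DEPTH - 1u) % RS_DEPTH;
    return s->slot[s->top];
}
```

The state is eight narrow registers plus a 3-bit pointer, and since ordinary loads and stores cannot reach it, a buffer overrun in data memory cannot redirect a return.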
Title: Re: RISC with two RA: why?
Post by: DiTBho on December 20, 2020, 09:52:04 pm
Oh, and regarding "precious silicon", only on extremely small

Dunno, but what I learned from the IBM Red and Green books is that in a superscalar design like PowerPC, if you add a register it may be duplicated for each active pipeline, and it certainly adds more circuitry for its management, so I have the feeling that it's not something you can put in the basket for free unless it's really useful for something.
Title: Re: RISC with two RA: why?
Post by: David Hess on December 20, 2020, 10:10:16 pm
Could there be some advantage for the microarchitecture with separate return address registers not relying on ports to the register file serving the ALUs?  It seems like this could be the case if jumps are handled by a different execution unit.

Last year I was considering something similar for a register file which holds the condition codes.
Title: Re: RISC with two RA: why?
Post by: SiliconWizard on December 20, 2020, 10:46:03 pm
Oh, and regarding "precious silicon", only on extremely small

Dunno, but what I learned from the IBM Red and Green books is that in a superscalar design like PowerPC, if you add a register it may be duplicated for each active pipeline, and it certainly adds more circuitry for its management, so I have the feeling that it's not something you can put in the basket for free unless it's really useful for something.

Always beware of dogmatic rules. The answer almost invariably depends on the actual design in question. If it costs a lot in a particular architecture, then it costs a lot. Obviously. If not...

And here, dedicated return address registers, only accessible to a limited number of instructions and in a limited way, will probably NOT require the same amount of logic to handle as general-purpose registers anyway.
Title: Re: RISC with two RA: why?
Post by: brucehoult on December 20, 2020, 11:04:50 pm
The two situations are a bit different.

On PowerPC the PC, LR, and CTR are off in a little execution unit of their own, which can potentially run ahead of the rest of the machine on counted loops.  I *think* (but it's been a while) the ability to jump to the address in CTR is used for calling functions through a pointer and for jump tables in dense switch statements. It's quick and easy to move from a general purpose register to CTR (as this is needed for counted loops anyway). I don't recall whether there are obstacles in moving from a GPR to LR.

On RISC-V in the base instruction set (32 bit opcodes), as far as the hardware goes, you can use *any* register to hold a return address. The C extension only works with ra (x1) as the return address, but that only affects code size reduction (or not), not functionality.

In a really tiny embedded use, you could potentially use a RISC-V CPU with no RAM at all, if you limited the function call depth and didn't use recursion. You'd have to manually (or in a special linker) partition the registers used by functions called at different depths, including where they receive their return address.

In a more normal system, x5 (aka t0) is used as a return address by certain special compiler or runtime library functions that might be called before a function has had a chance to save ra or after ra has been restored. This could include certain kinds of thunks, shims, dynamic linking etc, all of which are free to clobber any or all t0-t6 registers as neither the calling function nor the called function are allowed to care about their values being preserved.

One common use of x5 as a return register is if you give gcc/clang the -msave-restore option. In this case the first instruction of each function is a call to a compiler library routine that makes a stack frame and saves ra and a certain number of s0-s11 before returning. The last instruction of the function then jumps to a compiler library routine that reloads ra and the same number of s0-s11, removes the stack frame, and returns to the just-restored ra. This is a substitute for ARM's push and pop instructions that adds a small time overhead (three jumps) to each function and 96 bytes of code for the library routines, in exchange for a simpler CPU implementation.

As an optimization, some RISC-V CPUs have a small (maybe 2 to 8 levels) integrated return address prediction stack that allows prefetching instructions after a function return even before ra has been reloaded from the stack. The return address prediction stack is referred to, pushed, or popped when ra or t0 are used in particular patterns (listed in the ISA manual) by the JAL and JALR instructions.
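
Those patterns are simple enough to write down as a decision function. The following C sketch encodes the return-address stack hint table from the unprivileged ISA manual as I understand it; the enum and function names are made up for illustration, and "link" means x1 (ra) or x5 (t0).

```c
#include <assert.h>
#include <stdbool.h>

/* What a predictor should do with its return-address stack (RAS). */
typedef enum { RAS_NONE, RAS_PUSH, RAS_POP, RAS_POP_THEN_PUSH } RasAction;

/* x1 (ra) and x5 (t0) are the two registers the hints treat as link registers. */
static bool is_link(unsigned reg) { return reg == 1 || reg == 5; }

/* JAL rd, offset: push the return address only when rd is a link register. */
static RasAction ras_hint_jal(unsigned rd) {
    assert(rd < 32);
    return is_link(rd) ? RAS_PUSH : RAS_NONE;
}

/* JALR rd, rs1, offset: the action depends on which operands are link registers. */
static RasAction ras_hint_jalr(unsigned rd, unsigned rs1) {
    assert(rd < 32 && rs1 < 32);
    bool rd_link = is_link(rd);
    bool rs1_link = is_link(rs1);
    if (!rd_link) return rs1_link ? RAS_POP : RAS_NONE;  /* plain return or plain jump */
    if (!rs1_link) return RAS_PUSH;                      /* indirect call */
    /* Both are link registers: the same register means a call through ra/t0,
       different registers mean a coroutine-style swap. */
    return (rd == rs1) ? RAS_PUSH : RAS_POP_THEN_PUSH;
}
```

So an ordinary `ret` (`jalr x0, x1, 0`) pops, an indirect call pushes, and `jalr x5, x1, 0` or `jalr x1, x5, 0` pops and then pushes, which is what lets the prediction stay accurate across the t0-linked helper routines described above.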
Title: Re: RISC with two RA: why?
Post by: DiTBho on December 22, 2020, 10:37:01 am
thank you @brucehoult  :D

Yesterday I investigated deeply, and the mysterious software processor is actually a true PowerPC, just with a reduced instruction set! It's focused on the instructions the Clang/LLVM compiler might use, and there are several notes around the project saying that "eventually" the author would have liked to add all the instructions, but he didn't due to "code stability and bugs", which are always an issue with this type of project ...

Anyway, it's very interesting that even in a subset of the ISA there are instructions that take advantage of the two RA.