I haven't looked at it in detail yet, but the "FENCE" class looks interesting!
Radical personalization is the present and future of many industries. Now it's available in CPUs too.
EIEIO. And related :-)
Expensive? 60 USD is nothing for a board.
There is no great *technical* advantage over ARM or MIPS, but also no disadvantage. Compare code size, compare Dhrystone or Coremark or SPEC ... it's a photo finish in most cases. MIPS code is the biggest (and microMIPS doesn't help as much as Thumb or rvc), rv32i is comparable to ARM, rv32ic to Thumb2. In 64 bit, rv64ic is much smaller than anything else (ARM didn't see fit to duplicate Thumb in 64 bit!).
In case anyone is interested in instruction encodings...
I just went over to ARM to get a sense of the size of their instruction set. Somehow, I think they have moved beyond Reduced Instruction Set with the latest designs. There certainly are some 'interesting' instructions but I wonder which opcodes GCC actually uses.
QuoteRadical personalization is the present and future of many industries. Now it's available in CPUs too.
So if I understand correctly, RISC-V is more intended for high volume SoC type customers who want to make specialized cores? (i.e. the Western Digital use case).
I looked through the SiFive site and it seems the message is that you can get your own custom SoC made quicker.
QuoteI haven't looked at it in detail yet, but the "FENCE" class looks interesting!
QuoteEIEIO. And related :-)
This will add more "fun" to every superscalar implementation of RISC-V. EIEIO + isync + sync on our PowerPC 460 can stir up great emotions: people hammering their heads on the desk and wanting to throw the target board out of the window ... which is ... love ... in reverse order
Another interesting point I see: like MIPS and PowerPC, RISC-V uses LL/SC to emulate CAS; that is, LL/SC is used to write a tiny code sequence that loads a target memory address, compares it to a comparand, and then writes back a swap value to the target if the comparand and target values are equal.
It would be interesting to know how LL monitors an address (say, a semaphore), and how SC does its job.
A senior here said that x86/x64 is better because it implements DWCAS (a sort of CAS, but more complex) instead of LL/SC ... dunno, I have ZERO experience with Intel x86.
Thinking about the smallest FPGA incarnation: does RISC-V make sense as a general-purpose drop-in core? Maybe there is a project where the CPU just handles details (console I/O or file I/O, say) but the majority of the project is some kind of hardware thing (even including another CPU) that just needs a little high-level help; that is, the full hardware description is too ugly to contemplate and a programmable core would smooth things out.
It should be an interesting winter.
- strict separation of computation from data transfer (load/store)
- enough registers that you don't touch memory much. Arguments for most functions fit in registers, and the return address too (the otherwise RISC AVR8 violates this).
- no instruction can cause more than one cache or TLB miss, or two adjacent lines/pages if unaligned access is supported (and this case might be trapped and emulated)
- each instruction modifies at most one register.
- integer instructions read at most two registers. This is ultra-purist :-) A number of RISC ISAs break it in order to have e.g. a register plus (scaled) register addressing mode, or conditional select. But no more than three!
- no microcode or hardware sequencing. Each instruction executes in a small and fixed number of clock cycles (usually one). Load/Store multiple are the main offenders in both ARM and PowerPC. They help with code size, but it's interesting that ARM didn't put them in Aarch64 and is deprecating them in 32 bit as well, providing the much less offensive load/store pair.
What a huge number of instructions *does* do is make very small low end implementations impossible. And puts a big burden of work on every hardware and every emulator implementer.
QuoteThere is no great *technical* advantage over ARM or MIPS, but also no disadvantage. Compare code size, compare Dhrystone or Coremark or SPEC ... it's a photo finish in most cases.
A lack of flags that increases code size by 4 times and requires 2 extra registers to detect various conditions sure seems like a disadvantage. That extra code and register pressure also make the caches effectively smaller. And having to execute an ALU operation twice or more cannot help power efficiency.
Technically, only flags that represent changes in state, like carry and overflow, are required; zero, negative, and parity can be computed at any time. What I would like to see is a design where the flags that require state are stored in a small register dedicated to each destination register. That avoids the hazard of a single flags register as in x86, and it avoids a flags-register operand, which would cost extra instruction bits.
Some ISAs do this to track whether a register has been used in the current execution context so that the entire register set does not need to be saved on a context switch. The first use of a register is just another bit of state to save.
I've been tinkering with RISC-V in my spare time, and I have to say that the 32-bit integer instruction set is quite nice for hardware implementation:
- The source and destination registers are always encoded in the same place.
- The most significant bit of any constant is always in the same place (makes for easy sign extension)
- The privileged instructions (ones that need to be trapped for OS / Hypervisor) are all nicely contained
The only thing I find awkward is the encoding of the offsets in the jump instructions: fine for hardware, but painful to decode in a naive software emulator.
Quote- each instruction modifies at most one register.
That is pretty standard but how then do you handle integer multiplies and divides? Break them up into two instructions?
If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[ S ]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies.
Is there a preferred or recommended memory map for a RISC-V environment?
The ISA spec doesn't have much to say, apart from that the ISA is set up to be helpful for generating relocatable code. Is there a guide of "common/best practice" for where you put your memory-mapped I/O, bootstrap ROMs, and so on?
For my software emulator I was thinking of trying something like the FE310-G000:Quote00000000:00000FFF Debug address space
00001000:01FFFFFF On-chip Non volatile memory
02000000:1FFFFFFF I/O
20000000:7FFFFFFF Off-chip Non volatile memory
80000000:FFFFFFFF On-chip volatile memory
And after a reset, execution starts at 0x00001000.
Does that sound sane?
/dts-v1/;
/ {
    model = "SiFive,FE310G-0000-Z0";
    compatible = "sifive,fe300";
    /include/ 0x20004;
};
0x01_0000_0000:0x0F_FFFF_FFFF Peripherals
0x10_0000_0000:0x1F_FFFF_FFFF System
0x20_0000_0000:0x3F_FFFF_FFFF RAM
You could probably re-code it on MIPS one-to-one, except for LUI (when not followed by XORI or ADDI), which would require an extra instruction: a very simple hardware emulator.
Why does every instruction have "11" at the end? That way it only uses 1/4 of the code-point space.
It would be an interesting experiment to implement this. And this is EXACTLY what RISC-V enables you to do for low cost in time and money. Modify your favourite FPGA implementation to have your new instructions, modify gcc or llvm to generate them, and run dhrystone/coremark/SPEC/your favourite benchmark suite with and without using the new instructions. Publish the results with execution time, energy use, area cost, and any effect on MHz. We all learn something!
QuoteSome ISAs do this to track whether a register has been used in the current execution context so that the entire register set does not need to be saved on a context switch. The first use of a register is just another bit of state to save.
I haven't seen a bit for every register, but it's common for FPUs or vector units to have a single bit for the whole unit, as many programs don't use FP or vectors at all.
Back in June, Intel disclosed a "Lazy FPU State Restore" bug in all Core-based processors. Microsoft and others fixed the bug by disabling the use of the FPU dirty bit and just saving and restoring everything on every context switch. The effect on performance was basically unmeasurable.
Again, worth trying, though context switches are very rare on normal systems.
QuoteIt would be an interesting experiment to implement this. And this is EXACTLY what RISC-V enables you to do for low cost in time and money.
It would be too big of a change to RISC-V. It alters the basic ISA and architecture and then a new code generator would be required anyway. It goes against the design principles of RISC-V.
Intel has a lot of performance problems with their vector units, so much so that they had to issue a directive not to use them for things like memory copies.