Anyone want to show off their hand coding?
Operating systems and compiler runtime libraries are always going to have a little bit of assembler in them -- at least as long as there exist CSRs that are not memory-mapped.
That's really a sign of a bad design in 2018.
Anyone want to show off their hand coding?
Why not. dsPIC33:
dec w0,w0
add #0,w0 ; clear carry
do w0,loop
mov [w1++],w4
loop: addc w4,[w2++],[w3++]
5 instructions, 15 bytes, n+3 instruction cycles (where n is the number of bytes)
Anyone want to show off their hand coding?
Of course, Intel is no slouch either.
32-bit:
sub eax,eax
loop: mov edx,[esi+eax] ; edx as scratch, since LOOP counts in ecx
adc edx,[ebx+eax]
mov [edi+eax],edx
lea eax,[eax+4]
loop loop
6 instructions, 16 bytes, probably below 1 cycle per byte.
64-bit:
sub eax,eax
loop: mov rdx,[rsi+rax] ; rdx as scratch, since LOOP counts in rcx
adc rdx,[rbx+rax]
mov [rdi+rax],rdx
lea rax,[rax+8]
loop loop
This requires 4 REX bytes, so it's 20 bytes total, but that's where the 64-bitness helps. It'll probably run at about 0.4 cycles per byte.
MIPS doesn't have any Control and Status Register, and it's fine this way
Multiword math.
MIPS doesn't have any Control and Status Register, and it's fine this way
Sure it does. They live in Coprocessor #0 and are accessed using special MTC0 and MFC0 instructions.
Exactly: Cop0 is not the CPU, it's a coprocessor, so it's "external" to the ISA.
@ataradov appears to be unhappy that someone -- the operating system or at least runtime library writer -- should have to write a few lines of machine-dependent assembly language to set this stuff up.
PA-RISC
"Coprocessor 0 (also known as the CP0 or system control coprocessor) is a required coprocessor part of the MIPS32 and MIPS64 ISA which provides the facilities needed for an operating system."
There are no other good reasons to miss the old RISC machines. Except for the manual (and I do mean THE manual) of the 88K (eighty-eight-K, by Motorola), which is super marvelous!
I'm afraid I don't understand how those work.
sub eax,eax
loop: mov edx,[esi+eax]
adc edx,[ebx+eax]
mov [edi+eax],edx
lea eax,[eax+4]
loop loop
sub eax,eax
loop: mov rdx,[rsi+rax]
adc rdx,[rbx+rax]
mov [rdi+rax],rdx
lea rax,[rax+8]
loop loop
Also, where is the "ret", even if nothing else is needed?
clc
loop: lodsd
adc eax,[ebx]
lea ebx,[ebx+4]
stosd
loop loop
Why not. dsPIC33:
Very nice.
How does gcc do on it?
I'm actually very disappointed that manufacturers of machines with condition codes don't seem to have added recognition of the C idiom for the carry flag and generated "add with carry" from it. gcc on every machine does recognise idioms for things such as rotate and generate rotate instructions even though C doesn't have an operator for it. Or maybe I just didn't find the correct idiom? Can anyone assist with that?
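For reference, the idiom in question usually looks something like the sketch below: the carry out of an unsigned add is recovered by comparing the sum with one of its operands. The name `add_multiword` and the little-endian limb order are assumptions made for illustration. Whether a given compiler turns this into add-with-carry is exactly the open question.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: dst = a + b over n 32-bit limbs, least significant
 * limb first. Returns the carry out of the top limb (0 or 1). */
static uint32_t add_multiword(uint32_t *dst, const uint32_t *a,
                              const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t s  = a[i] + carry;  /* wraps only if a[i] is all ones and carry is 1 */
        uint32_t c1 = s < carry;     /* carry out of the first add */
        uint32_t t  = s + b[i];
        uint32_t c2 = t < s;         /* carry out of the second add */

        dst[i] = t;
        carry  = c1 | c2;            /* the two can never both be set */
    }
    return carry;
}
```

On a machine with a carry flag, each loop iteration would ideally collapse to a load, one add-with-carry and a store, like the hand-written loops earlier in the thread.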
LOL - "risc-v-will-stop-hackers-dead-from-getting-into-your-computer" - said someone in this article on hackaday
You did it in two days?
struct opcode_entry {
char *spec;
int (*func)(void);
uint32_t value;
uint32_t mask;
} opcodes[] = {
{"-------------------------0010111", op_auipc},
{"-------------------------0110111", op_lui},
{"-------------------------1101111", op_jal},
{"-----------------000-----1100111", op_jalr},
{"-----------------000-----1100011", op_beq},
{"-----------------001-----1100011", op_bne},
{"-----------------100-----1100011", op_blt},
{"-----------------101-----1100011", op_bge},
{"-----------------110-----1100011", op_bltu},
{"-----------------111-----1100011", op_bgeu},
{"-----------------000-----0000011", op_lb},
{"-----------------001-----0000011", op_lh},
{"-----------------010-----0000011", op_lw},
{"-----------------100-----0000011", op_lbu},
{"-----------------101-----0000011", op_lhu},
{"-----------------000-----0100011", op_sb},
{"-----------------001-----0100011", op_sh},
{"-----------------010-----0100011", op_sw},
{"-----------------000-----0010011", op_addi},
{"-----------------010-----0010011", op_slti},
{"-----------------011-----0010011", op_sltiu},
{"-----------------100-----0010011", op_xori},
{"-----------------110-----0010011", op_ori},
{"-----------------111-----0010011", op_andi},
{"0000000----------001-----0010011", op_slli},
{"0000000----------101-----0010011", op_srli},
{"0100000----------101-----0010011", op_srai},
{"0000000----------000-----0110011", op_add},
{"0100000----------000-----0110011", op_sub},
{"0000000----------001-----0110011", op_sll},
{"0000000----------010-----0110011", op_slt},
{"0000000----------011-----0110011", op_sltu},
{"0000000----------100-----0110011", op_xor},
{"0000000----------101-----0110011", op_srl},
{"0100000----------101-----0110011", op_sra},
{"0000000----------110-----0110011", op_or},
{"0000000----------111-----0110011", op_and},
{"0000--------00000000000000001111", op_fence},
{"00000000000000000001000000001111", op_fence_i},
{"00000000000000000000000001110011", op_ecall},
{"00000000000100000000000001110011", op_ebreak},
{"-----------------001-----1110011", op_csrrw},
{"-----------------010-----1110011", op_csrrs},
{"-----------------011-----1110011", op_csrrc},
{"-----------------101-----1110011", op_csrrwi},
{"-----------------110-----1110011", op_csrrsi},
{"-----------------111-----1110011", op_csrrci},
{"--------------------------------", op_unknown} // Catches all the others
};
/****************************************************************************/
static void decode(uint32_t instr) {
    int32_t broffset_12_12, broffset_11_11, broffset_10_05, broffset_04_01;
    int32_t jmpoffset_20_20, jmpoffset_19_12, jmpoffset_11_11, jmpoffset_10_01;

    /* Register fields and simple immediates */
    rs1     = (instr >> 15) & 0x1f;
    rs2     = (instr >> 20) & 0x1f;
    rd      = (instr >> 7) & 0x1f;
    csrid   = (instr >> 20);
    uimm    = (instr >> 15) & 0x1f;
    shamt   = (instr >> 20) & 0x1f;
    upper20 = instr & 0xFFFFF000;
    imm12   = ((int32_t)instr) >> 20;

    /* J-type offset: imm[20|10:1|11|19:12], sign-extended from bit 31 */
    jmpoffset_20_20 = (int32_t)(instr & 0x80000000) >> 11;
    jmpoffset_19_12 = (instr & 0x000FF000);
    jmpoffset_11_11 = (instr & 0x00100000) >> 9;
    jmpoffset_10_01 = (instr & 0x7FE00000) >> 20;
    jmpoffset = jmpoffset_20_20 | jmpoffset_19_12 | jmpoffset_11_11 | jmpoffset_10_01;

    /* B-type offset: imm[12|10:5] in the top bits, imm[4:1|11] next to the opcode */
    broffset_12_12 = (int32_t)(instr & 0x80000000) >> 19;
    broffset_11_11 = (instr & 0x00000080) << 4;
    broffset_10_05 = (instr & 0x7E000000) >> 20;
    broffset_04_01 = (instr & 0x00000F00) >> 7;
    broffset = broffset_12_12 | broffset_11_11 | broffset_10_05 | broffset_04_01;

    /* S-type (store) offset: imm[11:5] in the top bits, imm[4:0] in the rd field */
    imm12wr = instr; /* Note - becomes signed */
    imm12wr >>= 20;
    imm12wr &= ~0x1f; /* was 0xFFFE0, which dropped the sign bits of negative offsets */
    imm12wr |= (instr >> 7) & 0x1f;

    current_instr = instr;
}
/****************************************************************************/
static int op_beq(void) {
    trace("BEQ\tr%i, r%i, %i", rs1, rs2, broffset);
    if(regs[rs1] == regs[rs2]) {
        pc += broffset;
    } else {
        pc += 4;
    }
    return 1;
}
/****************************************************************************/
static int do_op(void) {
    uint32_t instr;
    int i;

    if((pc & 3) != 0) {
        display_log("Attempt to execute unaligned code");
        return 0;
    }

    /* Fetch */
    if(!memorymap_read(pc,4, &instr)) {
        display_log("Unable to fetch instruction");
        return 0;
    }

    /* Decode */
    decode(instr);

    /* Execute */
    for(i = 0; i < sizeof(opcodes)/sizeof(struct opcode_entry); i++) {
        if((instr & opcodes[i].mask) == opcodes[i].value) {
            return opcodes[i].func();
        }
    }
    return 0;
}
My little software RISC-V emulator seems to be alive! Has churned through 3,047 instructions of a HiFive 'blink' binary. Maybe a couple of evenings play to get it this far - you couldn't do that with x86... the actual RISC-V code is < 800 lines.
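The table only fills in `spec` and `func`, while `do_op()` matches on `mask` and `value`, so those two fields presumably get derived from the spec strings at start-up. That step isn't shown in the post. A minimal sketch of how it could be done follows; `spec_to_mask_value` is a hypothetical name, not something from the emulator itself.

```c
#include <stdint.h>

/* Hypothetical helper: turns a 32-character spec made of '0', '1' and '-'
 * (most significant bit first) into a mask/value pair, such that
 * (instr & mask) == value exactly when instr matches the spec.
 * Returns 1 on success, 0 if the spec is malformed. */
static int spec_to_mask_value(const char *spec, uint32_t *mask, uint32_t *value)
{
    uint32_t m = 0, v = 0;
    int i;

    for (i = 0; i < 32; i++) {
        m <<= 1;
        v <<= 1;
        switch (spec[i]) {
        case '1': m |= 1; v |= 1; break;  /* bit must be 1 */
        case '0': m |= 1; break;          /* bit must be 0 */
        case '-': break;                  /* don't care */
        default:  return 0;               /* bad char, or string too short */
        }
    }
    if (spec[32] != '\0')
        return 0;                         /* string too long */

    *mask  = m;
    *value = v;
    return 1;
}
```

At start-up one would loop over `opcodes[]` and call this once per entry. Since the first match wins in `do_op()`, the all-dashes catch-all has to stay last in the table.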
Instruction execution seems easy to code, the pipeline will be a lot more complex (I think...).
Instruction execution seems easy to code, the pipeline will be a lot more complex (I think...).
For the level I am aiming at I don't think it will be too complex. All that is needed is a way to indicate if the instructions in the pipeline are no longer valid because the program counter was updated rather than incremented.
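As a rough behavioural model of that idea (in C rather than HDL), each pipeline slot can carry a valid flag that gets cleared when an older instruction redirects the program counter. The three-stage depth, the struct layout and the name `pipeline_step` are all assumptions for the sake of the sketch.

```c
#include <stdint.h>

#define STAGES 3  /* assumed depth: fetch, decode, execute */

struct slot {
    int      valid;  /* 0 = bubble, must not change any state */
    uint32_t pc;     /* address the instruction was fetched from */
};

struct pipeline {
    struct slot stage[STAGES];  /* stage[0] is the youngest (fetch) */
    uint32_t    fetch_pc;       /* next address to fetch */
};

/* Advance one clock. 'taken' and 'target' describe what the instruction
 * currently in the last stage does to the program counter. */
static void pipeline_step(struct pipeline *p, int taken, uint32_t target)
{
    int i;
    int redirect = taken && p->stage[STAGES - 1].valid;

    /* Move every instruction one stage along */
    for (i = STAGES - 1; i > 0; i--)
        p->stage[i] = p->stage[i - 1];

    if (redirect) {
        /* PC was updated rather than incremented: younger slots are stale */
        for (i = 1; i < STAGES; i++)
            p->stage[i].valid = 0;
        p->fetch_pc = target;
    }

    /* Fetch the next instruction */
    p->stage[0].valid = 1;
    p->stage[0].pc    = p->fetch_pc;
    p->fetch_pc      += 4;
}
```

With a taken branch resolved in the last stage, this model loses two slots per branch. In hardware the same flag would simply gate the register and memory write enables.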
Unaligned memory accesses (which will need two cycles to execute) will add a bit of complexity though, as they will stall the pipeline rather than requiring that it gets flushed.
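To illustrate where the second cycle comes from: if the memory can only deliver aligned 32-bit words, an unaligned load has to be stitched together from two of them. The sketch below assumes a little-endian, word-wide memory. `read_aligned_word` and `load_word` are made-up names for illustration.

```c
#include <stdint.h>

/* Hypothetical word-wide memory: only aligned 32-bit reads are possible. */
static uint32_t read_aligned_word(const uint32_t *mem, uint32_t addr)
{
    return mem[addr >> 2];
}

/* Little-endian 32-bit load from any byte address.
 * *cycles is set to the number of aligned accesses needed (1 or 2). */
static uint32_t load_word(const uint32_t *mem, uint32_t addr, int *cycles)
{
    uint32_t shift = (addr & 3) * 8;  /* bit offset within the word */
    uint32_t lo = read_aligned_word(mem, addr & ~3u);
    uint32_t hi;

    if (shift == 0) {
        *cycles = 1;                  /* aligned: a single access is enough */
        return lo;
    }

    /* Unaligned: the upper bytes live in the next word */
    hi = read_aligned_word(mem, (addr & ~3u) + 4);
    *cycles = 2;
    return (lo >> shift) | (hi << (32 - shift));
}
```

The pipeline only has to hold its younger stages in place while the second access completes, since nothing already fetched becomes invalid.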
Humm....
What I really need is a reference book for the RISC-V that covers all the hardware details. Not just at 10,000 feet up but right down in the dirt. Something I can convert from text to HDL or, better yet, maybe the HDL is given.
Are there any such references?