RISC-V assembly language programming tutorial on YouTube

#125 Reply
Posted by brucehoult on 15 Dec, 2018 07:53
Quote from: westfw on 15 Dec, 2018 07:27
Quote
The MSP430 code gcc produced for your example is depressingly bad.
Here's what I get for a hand-written MSP430 version.

And that's what I'd expect, looking at the instruction set. The mystery is why gcc so completely fails to do that, when it can for other ISAs.

It's the gcc I get by doing apt-get on Ubuntu 18.04. It's a little old, from 2012:

msp430-gcc (GCC) 4.6.3 20120301 (mspgcc LTS 20120406 unpatched)

But, still ... that vintage gcc could do this stuff on other ISAs. SH4, for example. Do people use gcc for msp430, or something else?

Looking at some of those, I can certainly understand why people still like to use assembly language quite often. It's hard to understand why they'd put up with compiler results like that at all.

#126 Reply
Posted by hamster_nz on 15 Dec, 2018 09:16
If anybody is interested, I've put my RISC-V toy up on Github - https://github.com/hamsternz/emulate-risc-v - I've even added a little colour.

Does anybody know where I can find the encoding for the RV32M extensions? I've got to the point where the binary I am using uses DIVU...

I can find this, but it is a bit obscure for me!

Code: [Select]
mul     rd rs1 rs2 31..25=1 14..12=0 6..2=0x0C 1..0=3
mulh    rd rs1 rs2 31..25=1 14..12=1 6..2=0x0C 1..0=3
mulhsu  rd rs1 rs2 31..25=1 14..12=2 6..2=0x0C 1..0=3
mulhu   rd rs1 rs2 31..25=1 14..12=3 6..2=0x0C 1..0=3
div     rd rs1 rs2 31..25=1 14..12=4 6..2=0x0C 1..0=3
divu    rd rs1 rs2 31..25=1 14..12=5 6..2=0x0C 1..0=3
rem     rd rs1 rs2 31..25=1 14..12=6 6..2=0x0C 1..0=3
remu    rd rs1 rs2 31..25=1 14..12=7 6..2=0x0C 1..0=3
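
For what it's worth, that listing can be read as bit-field constraints: 31..25=1 is the funct7 field, 14..12 is funct3, and 6..2=0x0C together with 1..0=3 gives the 7-bit opcode 0x33. Here is a minimal Python sketch of a decoder, assuming the standard R-type field layout from the ISA manual. The names `decode_rv32m`, `divu` and `remu` are made up for illustration and are not taken from the emulator.

```python
# Mnemonics indexed by funct3 (bits 14..12), in the order of the listing.
RV32M_NAMES = ["mul", "mulh", "mulhsu", "mulhu", "div", "divu", "rem", "remu"]

def decode_rv32m(insn):
    """Return (mnemonic, rd, rs1, rs2) for an RV32M instruction, else None."""
    opcode = insn & 0x7F           # bits 6..0: 6..2=0x0C and 1..0=3 -> 0x33
    rd = (insn >> 7) & 0x1F        # bits 11..7
    funct3 = (insn >> 12) & 0x7    # bits 14..12 select the operation
    rs1 = (insn >> 15) & 0x1F      # bits 19..15
    rs2 = (insn >> 20) & 0x1F      # bits 24..20
    funct7 = (insn >> 25) & 0x7F   # bits 31..25 must be 1 for the M extension
    if opcode != 0x33 or funct7 != 1:
        return None
    return (RV32M_NAMES[funct3], rd, rs1, rs2)

def divu(a, b):
    """Unsigned 32-bit divide; division by zero yields all ones per the spec."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    return 0xFFFFFFFF if b == 0 else a // b

def remu(a, b):
    """Unsigned 32-bit remainder; division by zero yields the dividend."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    return a if b == 0 else a % b
```

As a check, 0x02C5D533 should decode as divu with rd=10 (a0), rs1=11 (a1) and rs2=12 (a2).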

#127 Reply
Posted by westfw on 15 Dec, 2018 10:10
Quote
Do people use gcc for msp430, or something else?
There is gcc, now maintained by someone else and distributed by TI, and there is TI's CCS compiler.

The version I have that was distributed with CCS8 is "v7.3.1.24 (Mitto Systems Limited)", and produces significantly different (but still not very good) code. Here's the loop (down to 20 instructions!)
Code: [Select]
fd4e: 27 4d       mov @r13, r7 ;
fd50: 08 47       mov r7, r8 ;
fd52: 28 5e       add @r14, r8 ;
fd54: 09 48       mov r8, r9 ;
fd56: 09 5a       add r10, r9 ;
fd58: 8c 49 00 00 mov r9, 0(r12) ;
fd5c: 4b 46       mov.b r6, r11 ;
fd5e: 08 97       cmp r7, r8 ;
fd60: 01 28       jnc $+4 ;abs 0xfd64
fd62: 4b 45       mov.b r5, r11 ;
fd64: 48 46       mov.b r6, r8 ;
fd66: 09 9a       cmp r10, r9 ;
fd68: 01 28       jnc $+4 ;abs 0xfd6c
fd6a: 48 45       mov.b r5, r8 ;
fd6c: 4b d8       bis.b r8, r11 ;
fd6e: 4a 4b       mov.b r11, r10 ;
fd70: 2d 53       incd r13 ;
fd72: 2e 53       incd r14 ;
fd74: 2c 53       incd r12 ;
fd76: 0f 9d       cmp r13, r15 ;
fd78: ea 23       jnz $-42 ;abs 0xfd4e
TI's compiler does a bit better (17 instructions). It manages to use the autoincrement address modes, and actually doesn't look too bad for a faithful translation of the source algorithm (without using the available carry flag):
Code: [Select]
 c: 38 4d       mov @r13+, r8
 e: 3b 4e       mov @r14+, r11
10: 0b 58       add r8, r11
12: 0a 43       clr r10
14: 0b 98       cmp r8, r11
16: 01 2c       jc $+4 ;abs 0x1a
18: 1a 43       mov #1, r10 ;r3 As==01
1a: 0b 59       add r9, r11
1c: 2c 53       incd r12
1e: 8c 4b fe ff mov r11, -2(r12) ; 0xfffe
22: 08 43       clr r8
24: 0b 99       cmp r9, r11
26: 01 2c       jc $+4 ;abs 0x2a
28: 18 43       mov #1, r8 ;r3 As==01
2a: 09 48       mov r8, r9
2c: 09 da       bis r10, r9
2e: 1f 83       dec r15
30: ed 23       jnz $-36 ;abs 0xc
(! 12 of those instructions are faking the carry status, which is the sort of thing that makes assembly programmers curse at HLLs...)
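
The carry faking that both compilers are translating can be sketched like this. It is only a Python model of the idea, assuming 16-bit limbs as on the MSP430. The name `bignum_add` is illustrative, and the original C source is not reproduced here.

```python
MASK16 = 0xFFFF  # assumed limb width: 16 bits, as on the MSP430

def bignum_add(a, b, mask=MASK16):
    """Add two equal-length lists of limbs (least significant first).

    The carry is recovered with unsigned comparisons, the way C code has to
    do it when it cannot see the hardware carry flag.
    """
    result = []
    carry = 0
    for x, y in zip(a, b):
        s = (x + y) & mask              # add the two limbs, wrapping around
        c1 = 1 if s < x else 0          # wrapped below an operand: carry out
        t = (s + carry) & mask          # now add the incoming carry
        c2 = 1 if t < carry else 0      # the second add can also wrap
        result.append(t)
        carry = c1 | c2                 # at most one of the two can be set
    return result, carry
```

The two compare-and-set sequences in the listings correspond to `c1` and `c2`, and the `bis` instruction corresponds to the final OR. A hand-written version would replace all of that with a single add-with-carry instruction.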

#128 Reply
Posted by westfw on 15 Dec, 2018 10:36
Huh. I was going to complain about CM0, since it has a bunch of unpleasant surprises for the assembly programming, but it actually did really well! 15 instructions in the loop, and only 46 bytes total - significantly shorter than the thumb2 code, slightly beating the RISCV.

Code: [Select]
 e: 594b ldr r3, [r1, r5]
10: 5954 ldr r4, [r2, r5]
12: 191c adds r4, r3, r4
14: 19a7 adds r7, r4, r6
16: 42b7 cmp r7, r6
18: 41b6 sbcs r6, r6
1a: 429c cmp r4, r3
1c: 41a4 sbcs r4, r4
1e: 5147 str r7, [r0, r5]
20: 4264 negs r4, r4
22: 4276 negs r6, r6
24: 3504 adds r5, #4
26: 4326 orrs r6, r4
28: 45ac cmp ip, r5
2a: d1f0 bne.n e <bignumAdd+0xe>

#129 Reply
Posted by brucehoult on 15 Dec, 2018 10:55
Quote from: hamster_nz on 15 Dec, 2018 09:16
If anybody is interested, I've put my RISC-V toy up on Github - https://github.com/hamsternz/emulate-risc-v - I've even added a little colour.

Nice!

Quote
Does anybody know where I can find the encoding for the RV32M extensions? I've got to the point where the binary I am using uses DIVU...

Sure, but you don't need them. If you're using freedom-e-sdk then just use a build command like:

Code: [Select]
make software PROGRAM=hello RISCV_ARCH=rv32i
Everything is in the "RV32/64G Instruction Set Listings" of the ISA manual. https://github.com/riscv/riscv-isa-manual/blob/master/release/riscv-spec-v2.2.pdf

#130 Reply
Posted by josip on 15 Dec, 2018 12:05
Quote from: westfw on 15 Dec, 2018 10:36
I was going to complain about CM0, since it has a bunch of unpleasant surprises for the assembly programming

I am coding CM0+ in assembler, and haven't found any unpleasant surprises so far. I'm coming from MSP430 (20-bit CPUvX2) assembler.

Also, when comparing code that executes on different MCUs, what matters is the number of cycles, not the number of instructions.

#131 Reply
Posted by brucehoult on 15 Dec, 2018 13:02
Quote from: josip on 15 Dec, 2018 12:05
Quote from: westfw on 15 Dec, 2018 10:36
I was going to complain about CM0, since it has a bunch of unpleasant surprises for the assembly programming

I am coding CM0+ in assembler, and haven't found any unpleasant surprises so far. I'm coming from MSP430 (20-bit CPUvX2) assembler.

Definitely Thumb1 (which is what CM0 basically is) is not awful. I spent three years programming the ARM7TDMI in assembly language and we did 95+% of the code in Thumb and ARM only where necessary because of things missing in Thumb.

Mostly it's just a bit short of registers that can be used by all instructions, and it's tricky to incorporate the hi registers.

Quote
Also, when comparing code that executes on different MCUs, what matters is the number of cycles, not the number of instructions.

Number of clock cycles depends not only on the instruction set but on the implementation, for example single or multiple issue, in-order or out-of-order.

Also, even within, say, single-issue in-order implementations you have effects such as a CPU with a 2-stage pipeline might use slightly fewer clock cycles than a CPU with a 5-stage pipeline because fewer cycles are wasted in pipeline flushes after conditional branches. *But* the CPU with a 2-stage pipeline will almost certainly be capable of a lower maximum MHz than the CPU with the 5-stage pipeline, given the same manufacturing technology for both.

There are also instruction set features that allow programs in one ISA to use fewer instructions and clock cycles than programs in another ISA, but that increase the work required within each clock cycle enough to limit the MHz to lower than the other ISA.

These days you also have to consider the silicon area used by a CPU, and the energy consumed in executing a complete program.
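
To put rough numbers on the 2-stage versus 5-stage trade-off, here is a toy Python model. Every figure in it (instruction count, branch count, flush penalties, clock speeds) is invented purely for illustration and does not describe any real core. The function names are made up too.

```python
def total_cycles(instructions, taken_branches, flush_penalty):
    """Cycles for a single-issue in-order core: one per instruction plus a
    fixed flush penalty for every taken branch (a deliberately crude model)."""
    return instructions + taken_branches * flush_penalty

def run_time_us(cycles, clock_mhz):
    """Wall-clock time in microseconds: cycles divided by MHz."""
    return cycles / clock_mhz

# Invented numbers, for illustration only.
short_pipe = total_cycles(1000, 100, flush_penalty=1)  # hypothetical 2-stage core
long_pipe = total_cycles(1000, 100, flush_penalty=3)   # hypothetical 5-stage core

print(short_pipe, run_time_us(short_pipe, 50))   # 1100 cycles at an assumed 50 MHz
print(long_pipe, run_time_us(long_pipe, 100))    # 1300 cycles at an assumed 100 MHz
```

With these made-up figures the short pipeline wins on cycle count (1100 against 1300) but loses on elapsed time (22 us against 13 us), which is why cycle counts alone don't settle the comparison.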

#132 Reply
Posted by legacy on 15 Dec, 2018 14:44
Quote from: brucehoult on 15 Dec, 2018 13:02
Number of clock cycles depends not only on the instruction set but on the implementation

The best example is the div unit.

(old-school 8bit traditional division algorithm)

Intel developed a super fast Newton-Raphson-ish method that converges quickly to the result, while other methods take 1 clock cycle per bit plus a little overhead, so a 32-bit DIV (unsigned or signed) is computed in, say, 33-34 clock cycles. Newton-Raphson-ish methods converge in a quarter of the cycles or less.

The pipeline needs to be stalled during computation.
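
A minimal Python sketch of the one-bit-per-cycle approach (restoring shift-and-subtract division), where each loop iteration stands in for one clock cycle. The name `divu_restoring` is illustrative, and real hardware adds a cycle or two of setup that this model ignores.

```python
def divu_restoring(dividend, divisor, bits=32):
    """Unsigned restoring division producing one quotient bit per iteration.

    Returns (quotient, remainder, iterations); the iteration count models the
    number of clock cycles a simple sequential divider would need.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    remainder = 0
    quotient = 0
    iterations = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit (MSB first) into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor   # subtraction succeeds: quotient bit is 1
            quotient |= 1
        iterations += 1
    return quotient, remainder, iterations
```

For a 32-bit operand the loop always runs 32 times regardless of the values, which is where the "1 clock cycle per bit" figure comes from.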

#133 Reply
Posted by legacy on 15 Dec, 2018 14:48
OT:
a bit of humor about acronyms used for instruction-set

#134 Reply
Posted by NorthGuy on 15 Dec, 2018 15:10
Every ISA has some sort of history. It was designed for the conditions and tasks that were important at the time. Then it evolved to meet new requirements. While doing so, the designers had to maintain backward compatibility. Thus most existing ISAs have lots of ugly details where the old design decisions didn't fit the new requirements.

RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA which would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Without any doubt, you can create a CISC ISA which will provide better code density, the same way as Huffman compression will always take less space than plain text. Or you can create a totally different CISC ISA for high deterministic performance. I don't see anything wrong with comparing RISC and CISC code. Such comparisons show the differences very well, even though it's hard to come up with formal criteria.

#135 Reply
Posted by rstofer on 15 Dec, 2018 16:48
Quote from: westfw on 14 Dec, 2018 02:44
Or, in the case of ARM, it's nice that the ABI and the hardware agree on which registers get saved, so that ISR functions and normal C functions are indistinguishable. I guess. Other times I wish the ISRs in C code were more easily distinguishable, and that the HW interrupt entry was quicker...

Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?

#136 Reply
Posted by rstofer on 15 Dec, 2018 17:04
Quote from: hamster_nz on 14 Dec, 2018 00:32
Quote from: rstofer on 13 Dec, 2018 23:56
What I really need is a reference book for the RISC-V that covers all the hardware details. Not just at 10,000 feet up but right down in the dirt. Something I can convert from text to HDL or, better yet, maybe the HDL is given.

Are there any such references?

I think that this is the key of the RISC-V ethos - it is just the ISA specification. What you do with it is up to you.

As long as your hardware runs the RISC-V RV32I (+ whatever extensions) you don't have to worry too much about the software tooling.

RISC-V isn't a hardware specification - it is a specification of the interface between the software layer and digital logic layers. If you build a CPU that implements RISC-V, you have a ready-made software layer.

I think I am coming at this from the other end. I don't particularly care about the ISA, I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks. But, as long as I'm implementing something, it might as well be for a modern ISA. The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

In some ways, it's like the 8086 I designed using AMD Am2900 series logic for a class I took back in the early '80s. It looked great on paper (well, it was more like 'adequate') but I will never know if it actually worked. Microcode, all the way!

All those with a copy of Mick and Brick raise your hands! Nobody remembers the title of the book but they sure remember who wrote it!

#137 Reply
Posted by NorthGuy on 15 Dec, 2018 17:54
Quote from: rstofer on 15 Dec, 2018 17:04
I think I am coming at this from the other end. I don't particularly care about the ISA, I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks. But, as long as I'm implementing something, it might as well be for a modern ISA. The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

It will be easier to implement RISC-V ISA and you're likely to make it run at faster clock speeds. Of course, you can probably do better if you design your own RISC ISA which is specifically suited for your particular hardware (such as specific FPGA), but not by much, and with RISC-V you get free software tools.

#138 Reply
Posted by rstofer on 15 Dec, 2018 18:10
Quote from: NorthGuy on 15 Dec, 2018 17:54
Quote from: rstofer on 15 Dec, 2018 17:04
I think I am coming at this from the other end. I don't particularly care about the ISA, I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks. But, as long as I'm implementing something, it might as well be for a modern ISA. The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

It will be easier to implement RISC-V ISA and you're likely to make it run at faster clock speeds. Of course, you can probably do better if you design your own RISC ISA which is specifically suited for your particular hardware (such as specific FPGA), but not by much, and with RISC-V you get free software tools.

I think the software tools are the whole idea. There are lots of interesting CPUs to emulate (think CDC 6400) but unless the software is out in the wild, the CPU is useless.

The LC3 project has an assembler and C compiler so it is actually a reasonable project. The documentation for the project makes no attempt at pipelining and, since it is an undergrad project, that's as it should be.

I have the "Reader" book and it's quite good. I've read about 1/3 of it.

The other day I was reading something about generic RISC architectures and it went into great detail about hazards. Yes, the taken branch is one example but it's trivial - flush the pipeline and restart. The more interesting problems are hazards where a register is being written at one stage and is an operand for an instruction in the pipeline. There are many examples where the datapath needs to pass results backwards in the pipeline. Detecting and controlling the path is the design issue that concerns me.
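
The detection side of that can be sketched in a few lines. This is a Python model of the textbook forwarding and load-use checks, assuming a classic 5-stage pipeline (IF/ID/EX/MEM/WB), which is my assumption and not something the material I was reading prescribes for RISC-V. The function and signal names are hypothetical.

```python
def forward_source(src_reg, ex_mem_rd, ex_mem_writes, mem_wb_rd, mem_wb_writes):
    """Pick where an EX-stage operand should come from.

    src_reg is the register the instruction in EX wants to read; the other
    arguments describe the two older instructions still in flight.
    """
    if src_reg == 0:
        return "regfile"       # x0 is hard-wired to zero, never forwarded
    if ex_mem_writes and ex_mem_rd == src_reg:
        return "ex_mem"        # newest result wins: forward from EX/MEM
    if mem_wb_writes and mem_wb_rd == src_reg:
        return "mem_wb"        # otherwise forward from MEM/WB
    return "regfile"           # no hazard: the register file value is current

def load_use_stall(id_ex_is_load, id_ex_rd, rs1, rs2):
    """True when the instruction in ID needs a load result that is not yet
    available, so forwarding alone cannot help and one bubble is required."""
    return id_ex_is_load and id_ex_rd != 0 and id_ex_rd in (rs1, rs2)
```

In HDL the same comparisons become a pair of multiplexer selects in front of the ALU inputs plus a stall signal for the front of the pipeline.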

It would be pretty easy to design a multi-cycle version of the RISC-V and that's probably where I will start but the end goal is a fully pipelined CPU. Hamster_nz's work will be a good start.

My HiFive1 board showed up today and the diagnostic screen comes up in PuTTY.

#139 Reply
Posted by rstofer on 15 Dec, 2018 21:53
I have VS Code and PlatformIO installed and I can build the blinking LED example from the videos. What I haven't tumbled to is how to get Debug to work. If I attempt to debug, the .elf file is created, a bunch of messages pour out on the terminal then, after a few second timeout, I get an error dialog that says the connection was refused.

I wandered through PlatformIO's site and while they extol the virtues of the debugger, I can't seem to find PHD type instructions (Push Here Dummy). There doesn't seem to be much help on the SiFive site either. Or, I missed it...

Any hints?

#140 Reply
Posted by brucehoult on 15 Dec, 2018 23:13
Quote from: rstofer on 15 Dec, 2018 21:53
I have VS Code and PlatformIO installed and I can build the blinking LED example from the videos. What I haven't tumbled to is how to get Debug to work. If I attempt to debug, the .elf file is created, a bunch of messages pour out on the terminal then, after a few second timeout, I get an error dialog that says the connection was refused.

I wandered through PlatformIO's site and while they extol the virtues of the debugger, I can't seem to find PHD type instructions (Push Here Dummy). There doesn't seem to be much help on the SiFive site either. Or, I missed it...

Any hints?

The videos at the start of this thread show exactly how to use the debugger in PlatformIO.

Sadly, you have to get "pro" and pay $10/month for the privilege -- or at least sign up for the 30 day free trial.

SiFive's Eclipse-based "Freedom Studio" does debugging for free. Or you can use gdb on the command line. The secret there is to open OpenOCD in one terminal and gdb in another. The HiFive1 Getting Started document shows how.

#141 Reply
Posted by brucehoult on 15 Dec, 2018 23:45
Quote from: NorthGuy on 15 Dec, 2018 15:10
RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA which would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

The minor 32 bit or 64 bit ISAs are a different matter. I think they're dead. Andes have shipped billions of cores using their proprietary nds32 ISA, and it just recently got accepted into the main Linux kernel repository, but they're switching to RISC-V. The same with C-SKY. Pretty much everyone using ARC or Xtensa is likely to switch to RISC-V on their next major redesign or for new projects. I wouldn't be surprised to see MicroChip convert their 32 bit PIC line from MIPS to RISC-V.

Quote
Without any doubt, you can create a CISC ISA which will provide better code density, the same way as Huffman compression will always take less space than plain text.

I think that ignores two things:

1) modern RISC ISAs such as Thumb2 and RISC-V are already Huffman encoded.

2) 8086 is nowhere near Huffman encoded. It's encoded as "if it doesn't need any arguments then it gets a short encoding". Just look at AAA, AAD, AAM, AAS, ADC, CLC, CLD, CLI, CMC, DAA, DAS, HLT, IN, INT, INTO, IRET, JNP, JO, JP, JPE, JPO, LAHF, OUT, RCL, RCR, SAHF, SBB, STC, STD, STI, XLATB. That's 31 instructions -- almost 1/8th of the opcode space -- taken up by instructions that are either statistically never used (especially now), or that even in 8086 days were not used often enough to justify a 1-byte encoding (plus offset for the Jumps). Most of them probably do need to exist (or did) but the effect on program size or speed if they'd been hidden away in a secondary opcode page would be minuscule. And those opcodes could have been used for something useful.

The same with VAX. *Every* instruction gets a 1-byte opcode, followed by the arguments. The length of the instructions is decided by the number and size of arguments, not by the frequency of use of the instruction.
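
To make "Huffman encoded" concrete: a Huffman code gives the shortest encodings to the most frequently used symbols. A small Python sketch follows. The instruction frequencies in it are invented for illustration and are not measurements of any real program, and `huffman_code_lengths` is just a name picked for this example.

```python
import heapq

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    # Heap entries are (weight, tiebreak, {symbol: depth}); the integer
    # tiebreak keeps Python from ever comparing the dictionaries.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)   # the two least frequent subtrees
        w2, _, d2 = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical usage percentages, for illustration only.
example = {"mov": 50, "add": 24, "jmp": 15, "aaa": 11}
print(huffman_code_lengths(example))
```

With these made-up weights the common "mov" gets a 1-bit code and the rare "aaa" a 3-bit one. An encoding that hands every instruction the same 1-byte opcode ignores exactly that frequency information.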

#142 Reply
Posted by westfw on 16 Dec, 2018 00:01
Quote
Quote
CM0 ... has a bunch of unpleasant surprises
I am coding CM0+ in assembler, and haven't found any unpleasant surprises

It's mostly the lack of "op2" and the limited range of literal values in the instructions that still have them.

My surprises show up when initializing peripherals. I expected code like:
Code: [Select]
PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
PORT->Group[0].DIRSET.reg |= 1<<12;
To be implementable with code something like:
Code: [Select]
ldr r1, =(PORT + <offset of GROUP[0]>)
ldr r2, [r1, #<offset of PINCFG[12]>]
orr r2, #PORT_PINCFG_DRVSTR
str r2, [r1, #<offset of PINCFG[12]>]
ldr r2, [r1, #PORT_DIRSET]
orr r2, #4096
str r2, [r1, #PORT_DIRSET]
Instead, you run into "orr doesn't have immediate arguments any more" and "PINCFG is beyond the range allowed by the [r, #const] encoding", so the code takes an extra 5 instructions and two additional registers. The extra instructions may be a wash with the 32bit forms on the v7m chips, but having to use the extra registers (out of the limited set available) is ... annoying.

Now, what Bruce's example code seems to demonstrate is that the "peripheral initialization" is essentially a degenerate case and that the issues I'm complaining about show up less in the "meat" of a real program. That could be, and it's an interesting result.

(I was impressed by the RV32i summary that was posted, WRT the impressive array of "immediate" operands. But I haven't looked too carefully to see if it does the things I want.)

#143 Reply
Posted by legacy on 16 Dec, 2018 00:20
Quote from: brucehoult on 15 Dec, 2018 23:45
Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

*They* have Acorn (where Arm was born) and RISC-PC computers, manufactured and used in the UK. I love my R/600, it comes with a 586 hardware emulator (it's called "guest PC card") so I can also run DOS programs as well as RISC-OS applications

The best and most interesting part is the Desktop Development Environment (DDE), a full-featured development suite of tools required to build Applications for RISCOS (mine is v4.39 Adjust/classic). It dates back to the days when Acorn developed RISC-OS and is derived from the in-house development tools. It includes:
- C compiler optimised for producing efficient ARM code
- ARM assembler, more powerful and advanced than any current Open Source ARM assembler
- Makefile utility
- Desktop debugger
- GUI resource file editor
- Object compression/decompression tools
- Intelligent ARM disassembler
- ABC (Archimedes BASIC compiler) to convert BBC BASIC source into machine code
- ARM Cortex A8 instruction timing simulator
- Comprehensive full documentation

It's great for both classic machines (RiscPC/600 with StrongArm, 26bit-space) and newer ones (misc/Cortex A8, 32bit-space), suitable for running on and producing both 26 & 32-bit versions of RISC-OS.

I think RISC-V would be more interesting if a similar solution (a RISC-V workstation + RISC-V/OS and DDE) existed

Besides, another great motivation for Arm is ... the Nintendo GBA with its low-cost development kit (200 euro all inclusive): yet again RISC-V would be more interesting if a mini-video-game portable console existed.

#144 Reply
Posted by legacy on 16 Dec, 2018 00:26
(NUMWorks, ARM-based)

Probably I will buy a tiny RISC-V board to develop a pocket calculator. This idea sounds really intriguing to me

I have already reverse engineered a CASIO Graphics calculator, thus I can re-use the keyboards, I just need a proper LCD ... and a motherboard. The software can be derived from the NUMWorks's project (opensource).

#145 Reply
Posted by NorthGuy on 16 Dec, 2018 00:45
Quote from: brucehoult on 15 Dec, 2018 23:45
Quote from: NorthGuy on 15 Dec, 2018 15:10
RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA which would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

I'm sure MIPS is not that much worse, but everyone chooses ARM. Do you really think Xilinx used ARM cores in Zynq because of the technical merit? I don't think so. It's pure marketing. Popularity. People want ARM, Xilinx gives them ARM. But popularity comes and goes. When the next popular thing emerges, the old one dies very quickly.

Quote from: brucehoult on 15 Dec, 2018 23:45
I wouldn't be surprised to see MicroChip convert their 32 bit PIC line from MIPS to RISC-V.

After their failure with MIPS and PIC32, I'm sure they won't want to miss the opportunity with RISC-V.

Quote from: brucehoult on 15 Dec, 2018 23:45
1) modern RISC ISAs such as Thumb2 and RISC-V are already Huffman encoded.

This only applies to single instructions. If you analyze the real code generated by compilers, you can find multi-instruction frequent combinations. For example, in your RV32I ISA, setting a single bit in memory takes 3 instructions - 12 bytes. IMHO, in real life the Huffman code for this action would be much shorter.

Quote from: brucehoult on 15 Dec, 2018 23:45
2) 8086 is nowhere near Huffman encoded. It's encoded as "if it doesn't need any arguments then it gets a short encoding". Just look at AAA, AAD, AAM, AAS, ADC, CLC, CLD, CLI, CMC, DAA, DAS, HLT, IN, INT, INTO, IRET, JNP, JO, JP, JPE, JPO, LAHF, OUT, RCL, RCR, SAHF, SBB, STC, STD, STI, XLATB. That's 31 instructions -- almost 1/8th of the opcode space -- taken up by instructions that are either statistically never used (especially now), or that even in 8086 days were not used often enough to justify a 1-byte encoding (plus offset for the Jumps). Most of them probably do need to exist (or did) but the effect on program size or speed if they'd been hidden away in a secondary opcode page would be minuscule. And those opcodes could have been used for something useful.

Of course, it has long history, so the coding is far from perfect. I'm sure, if they started from scratch now, they would have much better encoding in terms of numbers of bytes.

Many things, such as ENTER, LEAVE, LODS, STOS, SCAS, CMPS do save lots of bytes, but are not efficient, so nobody uses them.

BTW: JP and JPE are the same code (also JNP is the same as JPO).

#146 Reply
Posted by westfw on 16 Dec, 2018 02:12
Quote
If RISC-V spreads, it should outcompete ARM fairly quickly.
I think you underestimate the effectiveness and importance of a large marketing, sales, and support organization...

#147 Reply
Posted by NorthGuy on 16 Dec, 2018 03:55
Quote from: westfw on 16 Dec, 2018 02:12
Quote
If RISC-V spreads, it should outcompete ARM fairly quickly.
I think you underestimate the effectiveness and importance of a large marketing, sales, and support organization...

Yes, I'm bad at marketing.

But, if Apple (or Google) decides that their phones' batteries can last 30% longer with RISC-V, it'll get all the marketing it needs. Of course, this may not happen, and RISC-V gets forgotten. Impossible to see, the future is.

#148 Reply
Posted by brucehoult on 16 Dec, 2018 04:43
Quote from: westfw on 16 Dec, 2018 00:01
My surprises show up when initializing peripherals. I expected code like:
Code: [Select]
PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
PORT->Group[0].DIRSET.reg |= 1<<12;
To be implementable with code something like:
Code: [Select]
ldr r1, =(PORT + <offset of GROUP[0]>)
ldr r2, [r1, #<offset of PINCFG[12]>]
orr r2, #PORT_PINCFG_DRVSTR
str r2, [r1, #<offset of PINCFG[12]>]
ldr r2, [r1, #PORT_DIRSET]
orr r2, #4096
str r2, [r1, #PORT_DIRSET]
Instead, you run into "orr doesn't have immediate arguments any more" and "PINCFG is beyond the range allowed by the [r, #const] encoding", so the code takes an extra 5 instructions and two additional registers. The extra instructions may be a wash with the 32bit forms on the v7m chips, but having to use the extra registers (out of the limited set available) is ... annoying.

I guess there are two options: 1) let the C compiler figure it out, or 2) do something like

Code: [Select]
ldr r1, =(PORT + <offset of GROUP[0]> + #<offset of PINCFG[12]>)
ldr r2, [r1]
ldr r3, #PORT_PINCFG_DRVSTR
orr r2, r3
str r2, [r1]
ldr r1, =(PORT + <offset of GROUP[0]> + #PORT_DIRSET)
ldr r2, [r1]
ldr r3, #4096
orr r2, r3
str r2, [r1]
One extra register and three extra instructions. And four 32-bit values in a nearby constant pool instead of the three you'd have in ARM/Thumb2 mode, if that code was actually valid (I didn't check too hard)

So:
A32 is a total of 7*4 + 3*4 = 40 bytes
T16 is a total of 10*2 + 4*4 = 36 bytes

Some size savings, but not a lot. I *think* T32 would be the same size as the A32.

Quote
Now, what Bruce's example code seems to demonstrate is that the "peripheral initialization" is essentially a degenerate case and that the issues I'm complaining about show up less in the "meat" of a real program. That could be, and it's an interesting result.

Sure. Computations with values that are already in registers are where 16 bit opcodes shine. That's equally true with PDP11, M68k, Thumb1, RISC-V C, MSP430, SH4. Or even x86 with opcode + ModR/M byte for reg-reg operations, until it starts needing prefix bytes to set the operand size.

Quote
(I was impressed by the RV32i summary that was posted, WRT the impressive array of "immediate" operands. But I haven't looked too carefully to see if it does the things I want.)

12 bit immediates and offsets on everything. It's often enough, but you can't do your #4096 as an immediate (only -2048...+2047 is covered). You can do it as LUI t0, #00001. In general you can make any 32 bit constant with LUI t0,#nnnnn; ADDI t0,t0,#nnn, or any 32-bit offset from the PC with AUIPC t0,#nnnnn; ADDI t0,t0,#nnn. Or you can load or store to any 32 bit absolute or PC-relative address with an LUI or AUIPC followed by a load or store with an offset.

As with ARM's LDR, there are assembler pseudo-ops (li and la on RISC-V) so you don't have to worry about the exact instructions used in a particular case.
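
The one wrinkle in the LUI + ADDI split is that ADDI sign-extends its 12-bit immediate, so when bit 11 of the constant is set the upper part has to be bumped by one to compensate. A Python sketch of the arithmetic an assembler has to do (the helper names `split_hi_lo` and `rebuild` are made up for this example):

```python
def split_hi_lo(value):
    """Split a 32-bit constant into (hi20, lo12) for an LUI + ADDI pair.

    lo12 is returned as a signed value in -2048..2047 because ADDI
    sign-extends its immediate; hi20 absorbs the resulting correction.
    """
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000                      # ADDI will see this as negative
    hi = ((value - lo) >> 12) & 0xFFFFF   # LUI immediate, wraps modulo 2**20
    return hi, lo

def rebuild(hi, lo):
    """What LUI hi followed by ADDI lo leaves in the register (mod 2**32)."""
    return ((hi << 12) + lo) & 0xFFFFFFFF
```

For 4096 this gives (1, 0), so the ADDI can be dropped and a lone LUI is enough. For 0x12345FFF it gives (0x12346, -1): the upper part is one higher than the top 20 bits of the constant, and the ADDI subtracts one to land on the right value.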

RISC-V is allergic to constant pools. They are ok in low end processors, but as soon as you get an instruction cache you have the problem that the constant pools will likely get into the instruction cache, but be useless there. And if you have a data cache then instructions around the constant pool get into the data cache, and are useless there. Maybe the compiler/linker could arrange for the constant pools to be in different cache lines to instructions, but I haven't seen that happen.

So RISC-V, along with MIPS, Alpha, and ARM64 prefers using inline code to load constants, even if it needs several instructions to do it.

#149 Reply
Posted by brucehoult on 16 Dec, 2018 04:55
Quote from: legacy on 16 Dec, 2018 00:26
Probably I will buy a tiny RISC-V board to develop a pocket calculator. This idea sounds really intriguing to me

I have already reverse engineered a CASIO Graphics calculator, thus I can re-use the keyboards, I just need a proper LCD ... and a motherboard. The software can be derived from the NUMWorks's project (opensource).

You could try the LoFive: https://store.groupgets.com/products/lofive-risc-v

Note: you need a JTAG interface to program it. Most people use the Olimex ARM-USB-TINY-H, but others should work as long as OpenOCD can find them.

But for this low performance task you'd do it just as well using a soft RISC-V core in a small FPGA.

The TinyFPGA A2 *might* just about be big enough, but the BX certainly is and lots of people use them for this purpose.

https://www.crowdsupply.com/tinyfpga/tinyfpga-bx/updates/tinyfpga-b2-and-bx-projects

https://tinyfpga.com/
