Author Topic: RISC-V assembly language programming tutorial on YouTube (Read 53931 times)

brucehoult · « **Reply #125 on:** December 15, 2018, 07:53:55 am »

Quote from: westfw on December 15, 2018, 07:27:32 am

Quote
The MSP430 code gcc produced for your example is depressingly bad.
Here's what I get for a hand-written MSP430 version.

And that's what I'd expect, looking at the instruction set. The mystery is why gcc so completely fails to do that, when it can for other ISAs.

It's the gcc I get by doing apt-get on Ubuntu 18.04. It's a little old, from 2012:

msp430-gcc (GCC) 4.6.3 20120301 (mspgcc LTS 20120406 unpatched)

But, still ... that vintage gcc could do this stuff on other ISAs. SH4, for example. Do people use gcc for msp430, or something else?

Looking at some of those, I can certainly understand why people still like to use assembly language quite often. It's hard to understand why they'd put up with compiler results like that at all.

hamster_nz · « **Reply #126 on:** December 15, 2018, 09:16:03 am »

If anybody is interested, I've put my RISC-V toy up on Github - https://github.com/hamsternz/emulate-risc-v - I've even added a little colour.

Does anybody know where I can find the encoding for the RV32M extensions? I've got to the point where the binary I am using uses DIVU...

I can find this, but it is a bit obscure for me!

Code: [Select]

mul     rd rs1 rs2 31..25=1 14..12=0 6..2=0x0C 1..0=3
mulh    rd rs1 rs2 31..25=1 14..12=1 6..2=0x0C 1..0=3
mulhsu  rd rs1 rs2 31..25=1 14..12=2 6..2=0x0C 1..0=3
mulhu   rd rs1 rs2 31..25=1 14..12=3 6..2=0x0C 1..0=3
div     rd rs1 rs2 31..25=1 14..12=4 6..2=0x0C 1..0=3
divu    rd rs1 rs2 31..25=1 14..12=5 6..2=0x0C 1..0=3
rem     rd rs1 rs2 31..25=1 14..12=6 6..2=0x0C 1..0=3
remu    rd rs1 rs2 31..25=1 14..12=7 6..2=0x0C 1..0=3
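For reference, these rows appear to use the riscv-opcodes field notation. Bits 31..25 (funct7) = 1 and bits 14..12 (funct3) select the operation. Bits 6..2 = 0x0C with bits 1..0 = 3 give major opcode 0x33, the same R-type OP opcode as ADD/SUB. Below is a minimal C sketch of decoding and executing them in an emulator. The macro and function names are hypothetical and are not taken from the linked emulator. RV32M division never traps, so the divide-by-zero and overflow results shown are the values the ISA manual specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors for the R-type layout used by the RV32M rows above. */
#define OPCODE(i) ((i) & 0x7Fu)          /* bits 6..0: (0x0C << 2) | 3 = 0x33 */
#define RD(i)     (((i) >> 7) & 0x1Fu)   /* bits 11..7  */
#define FUNCT3(i) (((i) >> 12) & 0x07u)  /* bits 14..12 */
#define RS1(i)    (((i) >> 15) & 0x1Fu)  /* bits 19..15 */
#define RS2(i)    (((i) >> 20) & 0x1Fu)  /* bits 24..20 */
#define FUNCT7(i) (((i) >> 25) & 0x7Fu)  /* bits 31..25 */

/* Nonzero if insn is MUL..REMU: major opcode 0x33 with funct7 == 1. */
int is_rv32m(uint32_t insn)
{
    return OPCODE(insn) == 0x33u && FUNCT7(insn) == 1u;
}

/* Result of the RV32M operation selected by funct3, where a is the rs1
 * value and b is the rs2 value. The special cases are spec-defined. */
uint32_t exec_rv32m(uint32_t funct3, uint32_t a, uint32_t b)
{
    int32_t sa = (int32_t)a, sb = (int32_t)b;
    assert(funct3 < 8);
    switch (funct3) {
    case 0: /* mul: low 32 bits, identical for signed and unsigned */
        return a * b;
    case 1: /* mulh: high 32 bits of signed x signed */
        return (uint32_t)((uint64_t)((int64_t)sa * sb) >> 32);
    case 2: /* mulhsu: high 32 bits of signed rs1 x unsigned rs2 */
        return (uint32_t)((uint64_t)((int64_t)sa * (int64_t)b) >> 32);
    case 3: /* mulhu: high 32 bits of unsigned x unsigned */
        return (uint32_t)(((uint64_t)a * b) >> 32);
    case 4: /* div: rounds toward zero */
        if (b == 0) return 0xFFFFFFFFu;               /* quotient -1 */
        if (sa == INT32_MIN && sb == -1) return a;     /* overflow */
        return (uint32_t)(sa / sb);
    case 5: /* divu */
        return b == 0 ? 0xFFFFFFFFu : a / b;
    case 6: /* rem: sign follows the dividend */
        if (b == 0) return a;
        if (sa == INT32_MIN && sb == -1) return 0;
        return (uint32_t)(sa % sb);
    default: /* 7: remu */
        return b == 0 ? a : a % b;
    }
}
```

As an example, `divu a0, a0, a1` encodes as 0x02B55533: funct3 = 5, rd = rs1 = 10, rs2 = 11.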

westfw · « **Reply #127 on:** December 15, 2018, 10:10:52 am »

Quote

Do people use gcc for msp430, or something else?

There is gcc, now maintained by someone else and distributed by TI, and there is TI's CCS compiler.

The version I have that was distributed with CCS8 is "v7.3.1.24 (Mitto Systems Limited)", and produces significantly different (but still not very good) code. Here's the loop (down to 20 instructions!)

Code: [Select]

    fd4e:    27 4d           mov    @r13,    r7    ;
    fd50:    08 47           mov    r7,    r8    ;
    fd52:    28 5e           add    @r14,    r8    ;
    fd54:    09 48           mov    r8,    r9    ;
    fd56:    09 5a           add    r10,    r9    ;
    fd58:    8c 49 00 00     mov    r9,    0(r12)    ;
    fd5c:    4b 46           mov.b    r6,    r11    ;
    fd5e:    08 97           cmp    r7,    r8    ;
    fd60:    01 28           jnc    $+4          ;abs 0xfd64
    fd62:    4b 45           mov.b    r5,    r11    ;
    fd64:    48 46           mov.b    r6,    r8    ;
    fd66:    09 9a           cmp    r10,    r9    ;
    fd68:    01 28           jnc    $+4          ;abs 0xfd6c
    fd6a:    48 45           mov.b    r5,    r8    ;
    fd6c:    4b d8           bis.b    r8,    r11    ;
    fd6e:    4a 4b           mov.b    r11,    r10    ;
    fd70:    2d 53           incd    r13        ;
    fd72:    2e 53           incd    r14        ;
    fd74:    2c 53           incd    r12        ;
    fd76:    0f 9d           cmp    r13,    r15    ;
    fd78:    ea 23           jnz    $-42         ;abs 0xfd4e

TI's compiler does a bit better (17 instructions.) It manages to use the autoincrement address modes, and actually doesn't look too bad for a faithful translation of the source algorithm (without using the available carry flag):

Code: [Select]

   c:   38 4d           mov     @r13+,  r8
   e:   3b 4e           mov     @r14+,  r11
  10:   0b 58           add     r8,     r11
  12:   0a 43           clr     r10
  14:   0b 98           cmp     r8,     r11
  16:   01 2c           jc      $+4             ;abs 0x1a
  18:   1a 43           mov     #1,     r10     ;r3 As==01
  1a:   0b 59           add     r9,     r11
  1c:   2c 53           incd    r12
  1e:   8c 4b fe ff     mov     r11,    -2(r12) ; 0xfffe
  22:   08 43           clr     r8
  24:   0b 99           cmp     r9,     r11
  26:   01 2c           jc      $+4             ;abs 0x2a
  28:   18 43           mov     #1,     r8      ;r3 As==01
  2a:   09 48           mov     r8,     r9
  2c:   09 da           bis     r10,    r9
  2e:   1f 83           dec     r15
  30:   ed 23           jnz     $-36            ;abs 0xc

(12 of those instructions are faking the carry status, which is the sort of thing that makes assembly programmers curse at HLLs...)
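This part of the thread doesn't quote the C source being compiled. The sketch below is a hypothetical reconstruction inferred from the disassembly: a multi-word add that recovers each carry with unsigned compares, because C gives no way to read the carry flag. Those compares are what the clr/cmp/jc/mov #1 sequences above are emulating. The name `bignum_add`, its signature and the 32-bit word size are assumptions. The MSP430 listings step by 2 bytes, so they use 16-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* res = a + b for n-word little-endian bignums; returns the final carry.
 * Each carry is recovered by checking whether an unsigned add wrapped. */
uint32_t bignum_add(uint32_t *res, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t partial = a[i] + b[i];   /* may wrap modulo 2^32 */
        uint32_t c1 = partial < a[i];     /* wrapped: carry out of a + b */
        uint32_t sum = partial + carry;
        uint32_t c2 = sum < carry;        /* wrapped: carry out of + carry */
        res[i] = sum;
        carry = c1 | c2;                  /* at most one of them can be 1 */
    }
    return carry;
}
```

With a real carry flag, each loop iteration collapses to a load, an add-with-carry and a store. The hand-written MSP430 version earlier in the thread relies on exactly that.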

westfw · « **Reply #128 on:** December 15, 2018, 10:36:35 am »

Huh. I was going to complain about CM0, since it has a bunch of unpleasant surprises for the assembly programmer, but it actually did really well! 15 instructions in the loop, and only 46 bytes total: significantly shorter than the Thumb2 code, and slightly beating RISC-V.

Code: [Select]

   e:   594b            ldr     r3, [r1, r5]
  10:   5954            ldr     r4, [r2, r5]
  12:   191c            adds    r4, r3, r4
  14:   19a7            adds    r7, r4, r6
  16:   42b7            cmp     r7, r6
  18:   41b6            sbcs    r6, r6
  1a:   429c            cmp     r4, r3
  1c:   41a4            sbcs    r4, r4
  1e:   5147            str     r7, [r0, r5]
  20:   4264            negs    r4, r4
  22:   4276            negs    r6, r6
  24:   3504            adds    r5, #4
  26:   4326            orrs    r6, r4
  28:   45ac            cmp     ip, r5
  2a:   d1f0            bne.n   e <bignumAdd+0xe>

brucehoult · « **Reply #129 on:** December 15, 2018, 10:55:54 am »

Quote from: hamster_nz on December 15, 2018, 09:16:03 am

If anybody is interested, I've put my RISC-V toy up on Github - https://github.com/hamsternz/emulate-risc-v - I've even added a little colour.

Nice!

Quote

Does anybody know where I can find the encoding for the RV32M extensions? I've got to the point where the binary I am using uses DIVU...

Sure, but you don't need them. If you're using freedom-e-sdk then just use a build command like:

Code: [Select]

make software PROGRAM=hello RISCV_ARCH=rv32i

Everything is in the "RV32/64G Instruction Set Listings" of the ISA manual. https://github.com/riscv/riscv-isa-manual/blob/master/release/riscv-spec-v2.2.pdf

josip · « **Reply #130 on:** December 15, 2018, 12:05:15 pm »

Quote from: westfw on December 15, 2018, 10:36:35 am

I was going to complain about CM0, since it has a bunch of unpleasant surprises for the assembly programming

I am coding CM0+ in assembler, and haven't found any unpleasant surprises so far. I'm coming from MSP430 (20-bit CPUXv2) assembler.

Also, when comparing code that executes on different MCUs, what matters is the number of cycles, not the number of instructions.

brucehoult · « **Reply #131 on:** December 15, 2018, 01:02:43 pm »

Quote from: josip on December 15, 2018, 12:05:15 pm

Quote from: westfw on December 15, 2018, 10:36:35 am
I was going to complain about CM0, since it has a bunch of unpleasant surprises for the assembly programming

I am coding CM0+ in assembler, and haven't found any unpleasant surprises so far. I'm coming from MSP430 (20-bit CPUXv2) assembler.

Definitely Thumb1 (which is what CM0 basically is) is not awful. I spent three years programming the ARM7TDMI in assembly language and we did 95+% of the code in Thumb and ARM only where necessary because of things missing in Thumb.

Mostly it's just a bit short of registers that can be used by all instructions, and it's tricky to incorporate the hi registers.

Quote

Also, when comparing code that executes on different MCUs, what matters is the number of cycles, not the number of instructions.

Number of clock cycles depends not only on the instruction set but on the implementation, for example single or multiple issue, in-order or out-of-order.

Also, even within, say, single-issue in-order implementations you have effects such as a CPU with a 2-stage pipeline might use slightly fewer clock cycles than a CPU with a 5-stage pipeline because fewer cycles are wasted in pipeline flushes after conditional branches. *But* the CPU with a 2-stage pipeline will almost certainly be capable of a lower maximum MHz than the CPU with the 5-stage pipeline, given the same manufacturing technology for both.

There are also instruction set features that allow programs in one ISA to use fewer instructions and clock cycles than programs in another ISA, but that increase the work required within each clock cycle enough to limit the MHz to lower than the other ISA.
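The trade-off described here is the standard textbook "iron law" of processor performance. The formula does not appear in the post. Execution time is instruction count times average cycles per instruction (CPI), divided by clock frequency. A core can therefore win on cycle count and still lose on wall-clock time if its clock is slower. The two cores below are made-up numbers for illustration only.

```latex
T = \frac{N_{\text{instr}} \times \text{CPI}}{f_{\text{clk}}}

% Hypothetical: the same program, N_instr = 10^6, on two cores
T_A = \frac{10^6 \times 1.0}{100\ \text{MHz}} = 10\ \text{ms}
\qquad
T_B = \frac{10^6 \times 0.8}{70\ \text{MHz}} \approx 11.4\ \text{ms}
```

Core B needs 20% fewer clock cycles but still finishes later.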

These days you also have to consider the silicon area used by a CPU, and the energy consumed in executing a complete program.

legacy · « **Reply #132 on:** December 15, 2018, 02:44:49 pm »

Quote from: brucehoult on December 15, 2018, 01:02:43 pm

Number of clock cycles depends not only on the instruction set but on the implementation

The best example is the div unit.

(old-school 8bit traditional division algorithm)

Intel developed a very fast Newton-Raphson-style method that converges quickly to the result. Other methods take one clock cycle per bit plus a residual, so a 32-bit DIV (signed or unsigned) takes, say, 33-34 clock cycles. Newton-Raphson-style methods converge in a quarter of the cycles or less.

The pipeline needs to be stalled during computation.
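Here is a software model of both approaches that makes the cycle-count difference concrete. It is a sketch only and is not how any particular CPU, Intel's or otherwise, implements its divider. The restoring loop produces one quotient bit per iteration, which corresponds to roughly one clock per bit in a simple hardware divider. Each Newton-Raphson step on the reciprocal roughly doubles the number of correct bits. The linear seed constants 48/17 and 32/17 are the usual textbook choice for a divisor pre-scaled into [0.5, 1). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Restoring (shift-and-subtract) division: one quotient bit per step,
 * so a 32-bit divide always takes 32 iterations. */
uint32_t restoring_divu(uint32_t dividend, uint32_t divisor, uint32_t *rem_out)
{
    assert(divisor != 0);
    uint64_t rem = 0;            /* partial remainder, < 2 * divisor */
    uint32_t quot = 0;
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u);  /* bring down next bit */
        quot <<= 1;
        if (rem >= divisor) {                        /* trial subtraction */
            rem -= divisor;
            quot |= 1u;
        }
    }
    *rem_out = (uint32_t)rem;
    return quot;
}

/* Newton-Raphson reciprocal: x <- x * (2 - d * x). If e = 1 - d*x, each
 * step squares e. The seed's error is at most 1/17 (about 4 bits), so
 * 3 steps give about 32 bits and 4 steps exceed double precision. */
double nr_reciprocal(double d, int iterations)
{
    assert(d >= 0.5 && d < 1.0);
    double x = 48.0 / 17.0 - (32.0 / 17.0) * d;
    for (int k = 0; k < iterations; k++)
        x = x * (2.0 - d * x);
    return x;
}
```

Each Newton-Raphson step costs two multiplies, so the saving depends on having a fast multiplier. A final multiply by the dividend, plus a correction step, is still needed to get an exact integer quotient.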

legacy · « **Reply #133 on:** December 15, 2018, 02:48:17 pm »

OT:
a bit of humor about acronyms used for instruction-set

NorthGuy · « **Reply #134 on:** December 15, 2018, 03:10:09 pm »

Every ISA has some sort of history. It was designed for the conditions and tasks that were important back then. Then it evolved to meet new requirements, and while doing so the designers had to maintain backward compatibility. Thus most existing ISAs have lots of ugly details where the old standards didn't fit the new requirements.

RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA that would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Without any doubt, you can create a CISC ISA that provides better code density, in the same way that Huffman compression will always take less space than plain text. Or you can create a totally different CISC ISA for high deterministic performance. I don't see anything wrong with comparing RISC and CISC code. Such comparisons show the differences very well, even though it's hard to come up with formal criteria.

rstofer · « **Reply #135 on:** December 15, 2018, 04:48:39 pm »

Quote from: westfw on December 14, 2018, 02:44:51 am

Or, in the case of ARM, it's nice that the ABI and the hardware agree on which registers get saved, so that ISR functions and normal C functions are indistinguishable. I guess. Other times I wish the ISRs in C code were more easily distinguishable, and that the HW interrupt entry was quicker...

Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?

rstofer · « **Reply #136 on:** December 15, 2018, 05:04:07 pm »

Quote from: hamster_nz on December 14, 2018, 12:32:18 am

Quote from: rstofer on December 13, 2018, 11:56:47 pm
What I really need is a reference book for the RISC-V that covers all the hardware details. Not just at 10,000 feet up but right down in the dirt. Something I can convert from text to HDL or, better yet, maybe the HDL is given.

Are there any such references?

I think that this is the key of the RISC-V ethos - it is just the ISA specification. What you do with it is up to you.

As long as your hardware runs the RISC-V RV32I (+ whatever extensions) you don't have to worry too much about the software tooling.

RISC-V isn't a hardware specification. It is a specification of the interface between the software layer and the digital logic layer. If you build a CPU that implements RISC-V, you have a ready-made software layer.

I think I am coming at this from the other end. I don't particularly care about the ISA; I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks. But, as long as I'm implementing something, it might as well be for a modern ISA. The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

In some ways, it's like the 8086 I designed using AMD Am2900 series logic for a class I took back in the early '80s. It looked great on paper (well, it was more like 'adequate') but I will never know if it actually worked. Microcode, all the way!

All those with a copy of Mick and Brick raise your hands! Nobody remembers the title of the book but they sure remember who wrote it!

NorthGuy · « **Reply #137 on:** December 15, 2018, 05:54:05 pm »

Quote from: rstofer on December 15, 2018, 05:04:07 pm

I think I am coming at this from the other end. I don't particularly care about the ISA; I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks. But, as long as I'm implementing something, it might as well be for a modern ISA. The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

It will be easier to implement the RISC-V ISA, and you're likely to make it run at faster clock speeds. Of course, you can probably do better if you design your own RISC ISA specifically suited to your particular hardware (such as a specific FPGA), but not by much, and with RISC-V you get free software tools.

rstofer · « **Reply #138 on:** December 15, 2018, 06:10:06 pm »

Quote from: NorthGuy on December 15, 2018, 05:54:05 pm

Quote from: rstofer on December 15, 2018, 05:04:07 pm
I think I am coming at this from the other end. I don't particularly care about the ISA; I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks. But, as long as I'm implementing something, it might as well be for a modern ISA. The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

It will be easier to implement the RISC-V ISA, and you're likely to make it run at faster clock speeds. Of course, you can probably do better if you design your own RISC ISA specifically suited to your particular hardware (such as a specific FPGA), but not by much, and with RISC-V you get free software tools.

I think the software tools are the whole idea. There are lots of interesting CPUs to emulate (think CDC 6400), but unless the software is out in the wild, the CPU is useless.

The LC3 project has an assembler and C compiler so it is actually a reasonable project. The documentation for the project makes no attempt at pipelining and, since it is an undergrad project, that's as it should be.

I have the "Reader" book and it's quite good. I've read about 1/3 of it.

The other day I was reading something about generic RISC architectures, and it went into great detail about hazards. Yes, the taken branch is one example, but it's trivial: flush the pipeline and restart. The more interesting problems are hazards where a register is being written at one stage and is an operand for an instruction still in the pipeline. There are many cases where the datapath needs to pass results backwards in the pipeline. Detecting those cases and controlling the forwarding path is the design issue that concerns me.
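The forwarding and stall conditions described here are the classic ones for a 5-stage in-order pipeline, as computer-architecture textbooks usually present them. The sketch below is a minimal C model of that selection logic. The struct, enum and function names are hypothetical. A real design would also check whether the instruction in ID actually reads rs2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pipeline-latch fields for a 5-stage in-order pipeline
 * (IF, ID, EX, MEM, WB): only what the hazard logic looks at. */
typedef struct {
    bool    reg_write;  /* instruction writes a destination register */
    bool    mem_read;   /* instruction is a load */
    uint8_t rd;         /* destination register number */
} Latch;

typedef enum { FWD_REGFILE, FWD_FROM_EX_MEM, FWD_FROM_MEM_WB } FwdSel;

/* Choose where an EX-stage operand register rs comes from. The younger
 * result in EX/MEM takes priority over the older one in MEM/WB, and x0
 * is never forwarded because it is hardwired to zero. */
FwdSel forward_select(uint8_t rs, Latch ex_mem, Latch mem_wb)
{
    assert(rs < 32);
    if (ex_mem.reg_write && ex_mem.rd != 0 && ex_mem.rd == rs)
        return FWD_FROM_EX_MEM;
    if (mem_wb.reg_write && mem_wb.rd != 0 && mem_wb.rd == rs)
        return FWD_FROM_MEM_WB;
    return FWD_REGFILE;
}

/* Load-use hazard: a load in EX has its data only at the end of MEM,
 * too late to forward to the instruction now in ID, so stall that
 * instruction for one cycle (insert a bubble). Conservative: assumes
 * the ID instruction reads both rs1 and rs2. */
bool load_use_stall(Latch id_ex, uint8_t id_rs1, uint8_t id_rs2)
{
    return id_ex.mem_read && id_ex.rd != 0 &&
           (id_ex.rd == id_rs1 || id_ex.rd == id_rs2);
}
```

In hardware, the two functions become a pair of comparators and a priority mux per operand, plus a stall signal that freezes the PC and the IF/ID latch.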

It would be pretty easy to design a multi-cycle version of the RISC-V and that's probably where I will start but the end goal is a fully pipelined CPU. Hamster_nz's work will be a good start.

My HiFive1 board showed up today and the diagnostic screen comes up in PuTTY.

rstofer · « **Reply #139 on:** December 15, 2018, 09:53:32 pm »

I have VS Code and PlatformIO installed and I can build the blinking LED example from the videos. What I haven't tumbled to is how to get Debug to work. If I attempt to debug, the .elf file is created and a bunch of messages pour out on the terminal. Then, after a timeout of a few seconds, I get an error dialog that says the connection was refused.

I wandered through PlatformIO's site, and while they extol the virtues of the debugger, I can't seem to find PHD-type instructions (Push Here Dummy). There doesn't seem to be much help on the SiFive site either. Or I missed it...

Any hints?

brucehoult · « **Reply #140 on:** December 15, 2018, 11:13:30 pm »

Quote from: rstofer on December 15, 2018, 09:53:32 pm

I have VS Code and PlatformIO installed and I can build the blinking LED example from the videos. What I haven't tumbled to is how to get Debug to work. If I attempt to debug, the .elf file is created and a bunch of messages pour out on the terminal. Then, after a timeout of a few seconds, I get an error dialog that says the connection was refused.

I wandered through PlatformIO's site, and while they extol the virtues of the debugger, I can't seem to find PHD-type instructions (Push Here Dummy). There doesn't seem to be much help on the SiFive site either. Or I missed it...

Any hints?

The videos at the start of this thread show exactly how to use the debugger in PlatformIO.

Sadly, you have to get "pro" and pay $10/month for the privilege -- or at least sign up for the 30 day free trial.

SiFive's Eclipse-based "Freedom Studio" does debugging for free. Or you can use gdb on the command line. The secret there is to run OpenOCD in one terminal and gdb in another. The HiFive1 Getting Started document shows how.

brucehoult · « **Reply #141 on:** December 15, 2018, 11:45:04 pm »

Quote from: NorthGuy on December 15, 2018, 03:10:09 pm

RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA that would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

The minor 32-bit or 64-bit ISAs are a different matter. I think they're dead. Andes have shipped billions of cores using their proprietary nds32 ISA, and it only recently got accepted into the main Linux kernel repository, but they're switching to RISC-V. The same goes for C-SKY. Pretty much everyone using ARC or Xtensa is likely to switch to RISC-V on their next major redesign or for new projects. I wouldn't be surprised to see Microchip convert their 32-bit PIC line from MIPS to RISC-V.

Quote

Without any doubt, you can create a CISC ISA that provides better code density, in the same way that Huffman compression will always take less space than plain text.

I think that ignores two things:

1) modern RISC ISAs such as Thumb2 and RISC-V are already Huffman encoded.

2) 8086 is nowhere near Huffman encoded. It's encoded as "if it doesn't need any arguments then it gets a short encoding". Just look at AAA, AAD, AAM, AAS, ADC, CLC, CLD, CLI, CMC, DAA, DAS, HLT, IN, INT, INTO, IRET, JNP, JO, JP, JPE, JPO, LAHF, OUT, RCL, RCR, SAHF, SBB, STC, STD, STI, XLATB. That's 31 instructions -- almost 1/8th of the opcode space -- taken up by instructions that are either statistically never used (especially now), or that even in 8086 days were not used often enough to justify a 1-byte encoding (plus offset for the Jumps). Most of them probably do need to exist (or did) but the effect on program size or speed if they'd been hidden away in a secondary opcode page would be minuscule. And those opcodes could have been used for something useful.

The same with VAX. *Every* instruction gets a 1-byte opcode, followed by the arguments. The length of the instructions is decided by the number and size of arguments, not by the frequency of use of the instruction.
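On point 1, you can see RISC-V's length scheme in the low bits of the first 16-bit parcel. The compressed (C extension) 16-bit forms get the three patterns where bits 1..0 are not 11. The 32-bit forms and the reserved longer formats sit behind escape patterns. This gives the frequent instructions the short encodings, rather than giving them to argument-less ones. The sketch below is a length decoder based on the instruction-length encoding figure in the ISA manual. The function name is hypothetical, and formats of 80 bits and longer are not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction length in bytes, determined from the first (lowest-address)
 * 16-bit parcel. Returns 0 for the 80-bit-and-longer or reserved forms. */
int riscv_insn_length(uint16_t parcel)
{
    if ((parcel & 0x03u) != 0x03u) return 2;  /* bits 1..0 != 11: compressed */
    if ((parcel & 0x1Cu) != 0x1Cu) return 4;  /* bits 4..2 != 111: 32-bit */
    if ((parcel & 0x3Fu) == 0x1Fu) return 6;  /* bits 5..0 == 011111: 48-bit */
    if ((parcel & 0x7Fu) == 0x3Fu) return 8;  /* bits 6..0 == 0111111: 64-bit */
    return 0;                                 /* longer or reserved */
}
```

An instruction fetch unit can therefore find every instruction boundary by looking at no more than 7 bits of each parcel. That keeps this "Huffman-style" encoding cheap to decode, unlike x86's prefix and ModR/M parsing.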

westfw · « **Reply #142 on:** December 16, 2018, 12:01:55 am »

Quote

Quote
CM0 ... has a bunch of unpleasant surprises
I am coding CM0+ in assembler, and haven't found any unpleasant surprises

It's mostly the lack of "op2" and the limited range of literal values in the instructions that still have them.

My surprises show up when initializing peripherals. I expected code like:

Code: [Select]

       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;

To be implementable with code something like:

Code: [Select]

       ldr r1, =(PORT + <offset of GROUP[0]>)
       ldr r2, [r1, #<offset of PINCFG[12]>]
       orr r2, #PORT_PINCFG_DRVSTR
       str r2, [r1, #<offset of PINCFG[12]>]
       ldr r2, [r1, #PORT_DIRSET]
       orr r2, #4096
       str r2, [r1, #PORT_DIRSET]

Instead, you run into "orr doesn't have immediate arguments any more" and "PINCFG is beyond the range allowed by the [r, #const] encoding", so the code takes an extra 5 instructions and two additional registers. The extra instructions may be a wash with the 32bit forms on the v7m chips, but having to use the extra registers (out of the limited set available) is ... annoying.

Now, what Bruce's example code seems to demonstrate is that the "peripheral initialization" is essentially a degenerate case and that the issues I'm complaining about show up less in the "meat" of a real program. That could be, and it's an interesting result.

(I was impressed by the RV32i summary that was posted, WRT the impressive array of "immediate" operands. But I haven't looked too carefully to see if it does the things I want.)

legacy · « **Reply #143 on:** December 16, 2018, 12:20:30 am »

Quote from: brucehoult on December 15, 2018, 11:45:04 pm

Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

*They* have Acorn (where Arm was born) and the RiscPC computers, manufactured and used in the UK. I love my R/600: it comes with a 586 hardware emulator (it's called a "guest PC card"), so I can run DOS programs as well as RISC-OS applications.

The best and most interesting part is the Desktop Development Environment (DDE), a full-featured suite of the development tools required to build applications for RISC-OS (mine is v4.39 Adjust/classic). It dates back to the days when Acorn developed RISC-OS and is derived from their in-house development tools. It includes:
- C compiler optimised for producing efficient ARM code
- ARM assembler, more powerful and advanced than any current Open Source ARM assembler
- Makefile utility
- Desktop debugger
- GUI resource file editor
- Object compression/decompression tools
- Intelligent ARM disassembler
- ABC (Archimedes BASIC compiler) to convert BBC BASIC source into machine code
- ARM Cortex A8 instruction timing simulator
- Comprehensive full documentation

It's great for both classic machines (RiscPC/600 with StrongArm, 26bit-space) and newer ones (misc/Cortex A8, 32bit-space), suitable for running on and producing both 26 & 32-bit versions of RISC-OS.

I think RISC-V would be more interesting if a similar solution (a RISC-V workstation + RISC-V/OS and DDE) existed.

Besides, another great motivation for Arm is ... the Nintendo GBA with its low-cost development kit (200 euro all inclusive): yet again RISC-V would be more interesting if a mini-video-game portable console existed.

legacy · « **Reply #144 on:** December 16, 2018, 12:26:18 am »

(NUMWorks, ARM-based)

I will probably buy a tiny RISC-V board to develop a pocket calculator. This idea sounds really intriguing to me.

I have already reverse engineered a CASIO graphics calculator, so I can reuse the keyboards; I just need a proper LCD ... and a motherboard. The software can be derived from the NUMWorks project (open source).

NorthGuy · « **Reply #145 on:** December 16, 2018, 12:45:20 am »

Quote from: brucehoult on December 15, 2018, 11:45:04 pm

Quote from: NorthGuy on December 15, 2018, 03:10:09 pm
RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA that would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

I'm sure MIPS is not that much worse, but everyone chooses ARM. Do you really think Xilinx used ARM cores in Zynq because of technical merit? I don't think so. It's pure marketing. Popularity. People want ARM, so Xilinx gives them ARM. But popularity comes and goes. When the next popular thing emerges, the old one dies very quickly.

Quote from: brucehoult on December 15, 2018, 11:45:04 pm

I wouldn't be surprised to see Microchip convert their 32-bit PIC line from MIPS to RISC-V.

After their failure with MIPS and PIC32, I'm sure they won't want to miss the opportunity with RISC-V.

Quote from: brucehoult on December 15, 2018, 11:45:04 pm

1) modern RISC ISAs such as Thumb2 and RISC-V are already Huffman encoded.

This only applies to single instructions. If you analyze the real code generated by compilers, you can find multi-instruction frequent combinations. For example, in your RV32I ISA, setting a single bit in memory takes 3 instructions - 12 bytes. IMHO, in real life the Huffman code for this action would be much shorter.

Quote from: brucehoult on December 15, 2018, 11:45:04 pm

2) 8086 is nowhere near Huffman encoded. It's encoded as "if it doesn't need any arguments then it gets a short encoding". Just look at AAA, AAD, AAM, AAS, ADC, CLC, CLD, CLI, CMC, DAA, DAS, HLT, IN, INT, INTO, IRET, JNP, JO, JP, JPE, JPO, LAHF, OUT, RCL, RCR, SAHF, SBB, STC, STD, STI, XLATB. That's 31 instructions -- almost 1/8th of the opcode space -- taken up by instructions that are either statistically never used (especially now), or that even in 8086 days were not used often enough to justify a 1-byte encoding (plus offset for the Jumps). Most of them probably do need to exist (or did) but the effect on program size or speed if they'd been hidden away in a secondary opcode page would be minuscule. And those opcodes could have been used for something useful.

Of course, it has long history, so the coding is far from perfect. I'm sure, if they started from scratch now, they would have much better encoding in terms of numbers of bytes.

Many instructions, such as ENTER, LEAVE, LODS, STOS, SCAS and CMPS, do save lots of bytes, but they are not efficient, so nobody uses them.

BTW: JP and JPE are the same opcode (likewise, JNP is the same as JPO).

westfw · « **Reply #146 on:** December 16, 2018, 02:12:14 am »

Quote

If RISC-V spreads, it should outcompete ARM fairly quickly.

I think you underestimate the effectiveness and importance of a large marketing, sales, and support organization...

NorthGuy · « **Reply #147 on:** December 16, 2018, 03:55:57 am »

Quote from: westfw on December 16, 2018, 02:12:14 am

Quote
If RISC-V spreads, it should outcompete ARM fairly quickly.
I think you underestimate the effectiveness and importance of a large marketing, sales, and support organization...

Yes, I'm bad at marketing.

But, if Apple (or Google) decides that their phones batteries can last 30% longer with RISC-V, it'll get all the marketing it needs. Of course, this may not happen, and RISC-V gets forgotten. Impossible to see the future is

brucehoult · « **Reply #148 on:** December 16, 2018, 04:43:14 am »

Quote from: westfw on December 16, 2018, 12:01:55 am

My surprises show up when initializing peripherals. I expected code like:
Code: [Select]
       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;
To be implementable with code something like:
Code: [Select]
       ldr r1, =(PORT + <offset of GROUP[0]>)
       ldr r2, [r1, #<offset of PINCFG[12]>]
       orr r2, #PORT_PINCFG_DRVSTR
       str r2, [r1, #<offset of PINCFG[12]>]
       ldr r2, [r1, #PORT_DIRSET]
       orr r2, #4096
       str r2, [r1, #PORT_DIRSET]
Instead, you run into "orr doesn't have immediate arguments any more" and "PINCFG is beyond the range allowed by the [r, #const] encoding", so the code takes an extra 5 instructions and two additional registers. The extra instructions may be a wash with the 32bit forms on the v7m chips, but having to use the extra registers (out of the limited set available) is ... annoying.

I guess there are two options: 1) let the C compiler figure it out, or 2) do something like

Code: [Select]

ldr r1, =(PORT + <offset of GROUP[0]> + <offset of PINCFG[12]>)
ldr r2, [r1]
ldr r3, =PORT_PINCFG_DRVSTR
orr r2, r3
str r2, [r1]
ldr r1, =(PORT + <offset of GROUP[0]> + PORT_DIRSET)
ldr r2, [r1]
ldr r3, =4096
orr r2, r3
str r2, [r1]

One extra register and three extra instructions. And four 32-bit values in a nearby constant pool instead of the three you'd have in ARM/Thumb2 mode, if that code was actually valid (I didn't check too hard).

So:
A32 is a total of 7*4 + 3*4 = 40 bytes
T16 is a total of 10*2 + 4*4 = 36 bytes

Some size savings, but not a lot. I *think* T32 would be the same size as the A32.

Quote

Now, what Bruce's example code seems to demonstrate is that the "peripheral initialization" is essentially a degenerate case and that the issues I'm complaining about show up less in the "meat" of a real program. That could be, and it's an interesting result.

Sure. Computations with values that are already in registers are where 16-bit opcodes shine. That's equally true with PDP11, M68k, Thumb1, RISC-V C, MSP430 and SH4. Or even x86 with opcode + ModR/M byte for reg-reg operations, until it starts needing prefix bytes to set the operand size.

Quote

(I was impressed by the RV32i summary that was posted, WRT the impressive array of "immediate" operands. But I haven't looked too carefully to see if it does the things I want.)

12-bit immediates and offsets on everything. It's often enough, but you can't do your #4096 as an immediate (only -2048...+2047 is covered). You can do it as LUI t0, #00001. In general you can make any 32-bit constant with LUI t0,#nnnnn; ADDI t0,t0,#nnn, or any 32-bit offset from the PC with AUIPC t0,#nnnnn; ADDI t0,t0,#nnn. Or you can load or store to any 32-bit absolute or PC-relative address with an LUI or AUIPC followed by a load or store with an offset.
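The LUI/ADDI pair has one subtlety. ADDI sign-extends its 12-bit immediate, so when bit 11 of the constant is set, the low part comes out negative and the upper 20 bits must be rounded up by one. The C sketch below shows the split that an assembler's `li` expansion has to perform. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit constant so that  LUI t0,hi20 ; ADDI t0,t0,lo12
 * rebuilds it. Adding 0x800 before the shift rounds hi20 up exactly
 * when the sign-extended low 12 bits will be negative. */
void split_hi_lo(uint32_t value, uint32_t *hi20, int32_t *lo12)
{
    uint32_t hi = ((value + 0x800u) >> 12) & 0xFFFFFu;
    int32_t lo = (int32_t)(value & 0xFFFu);
    if (lo >= 0x800)
        lo -= 0x1000;                 /* sign-extend the 12-bit field */
    *hi20 = hi;
    *lo12 = lo;
    /* LUI puts hi in bits 31..12; ADDI then adds lo modulo 2^32. */
    assert((uint32_t)((hi << 12) + (uint32_t)lo) == value);
}
```

For 4096 this gives hi20 = 1 and lo12 = 0, so the ADDI can be dropped, which matches the single LUI t0, #00001 above. The same split applies to a PC-relative offset, with AUIPC in place of LUI.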

As with ARM, there are assembler pseudo ops like LDR so you don't have to worry about the exact instructions used in a particular case.

RISC-V is allergic to constant pools. They are ok in low end processors, but as soon as you get an instruction cache you have the problem that the constant pools will likely get into the instruction cache, but be useless there. And if you have a data cache then instructions around the constant pool get into the data cache, and are useless there. Maybe the compiler/linker could arrange for the constant pools to be in different cache lines to instructions, but I haven't seen that happen.

So RISC-V, along with MIPS, Alpha, and ARM64 prefers using inline code to load constants, even if it needs several instructions to do it.

brucehoult · « **Reply #149 on:** December 16, 2018, 04:55:45 am »

Quote from: legacy on December 16, 2018, 12:26:18 am

I will probably buy a tiny RISC-V board to develop a pocket calculator. This idea sounds really intriguing to me.

I have already reverse engineered a CASIO graphics calculator, so I can reuse the keyboards; I just need a proper LCD ... and a motherboard. The software can be derived from the NUMWorks project (open source).

You could try the LoFive: https://store.groupgets.com/products/lofive-risc-v

Note: you need a JTAG interface to program it. Most people use the Olimex ARM-USB-TINY-H, but others should work as long as OpenOCD can find them.

But for this low performance task you'd do it just as well using a soft RISC-V core in a small FPGA.

The TinyFPGA A2 *might* just about be big enough, but the BX certainly is and lots of people use them for this purpose.

https://www.crowdsupply.com/tinyfpga/tinyfpga-bx/updates/tinyfpga-b2-and-bx-projects

https://tinyfpga.com/

