Author Topic: RISC-V assembly language programming tutorial on YouTube  (Read 14814 times)


Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
RISC-V assembly language programming tutorial on YouTube
« on: December 08, 2018, 02:37:44 am »
Western Digital (WD) has just posted a 12-part YouTube series in which CTO Martin Fink (!!) presents assembly language programming for RISC-V, using a SiFive HiFive1 with VS Code.

Sadly, they don't seem to be linked to each other.

 
The following users thanked this post: hans, obiwanjacobi, newbrain, lucazader, cepwin

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #1 on: December 08, 2018, 02:42:25 am »
Not linked, but there is a playlist
« Last Edit: December 08, 2018, 02:48:07 am by brucehoult »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #2 on: December 08, 2018, 02:58:37 pm »
Anyone spotted the deliberate (?) error in his assembly language code?

Answer: 729de934392445a122503b40747a83e50b3c4a20
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #3 on: December 08, 2018, 03:23:46 pm »
2nd more minor bug: when MTIME wraps around after about 18 hours, while(MTIME < targetTime){} can misbehave, giving a zero delay. Should use while((int64_t)(targetTime - MTIME) > 0){} instead.
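For anyone who wants to see the wrap in action, here's a minimal sketch in plain C. It's illustrative only: a hypothetical 32-bit counter is used so the wrap is easy to demonstrate (MTIME itself is a 64-bit register, but the arithmetic is identical).

```c
#include <stdint.h>

/* Illustrative only: a 32-bit counter stands in for the real 64-bit
 * MTIME register so the wraparound is easy to demonstrate. */

/* The buggy test: reads as "MTIME < targetTime" in the loop.
 * If targetTime has wrapped past zero, this is true immediately,
 * so the delay loop exits with zero delay. */
static int delay_elapsed_broken(uint32_t mtime, uint32_t target_time) {
    return mtime >= target_time;
}

/* Wrap-safe test: the signed difference is correct across the wrap,
 * as long as the delay is less than half the counter's range. */
static int delay_elapsed(uint32_t mtime, uint32_t target_time) {
    return (int32_t)(target_time - mtime) <= 0;
}
```

The same signed-difference trick applies to any free-running hardware counter.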
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #4 on: December 09, 2018, 01:19:50 am »
Interesting presentation!  I installed the tools and, I must say, I like Visual Studio Code.  It looks a lot like Eclipse with a few more buttons but it works really well.

To add the Debug functionality, you need an account and it seems that something is only temporary because they talk about something happening in about a month.  It costs about $10/month for the Professional version which includes the Unified Debugger and some other features.  Without the debugger option, the IDE is free to use.  Actually, you give up a lot of functionality in the free version.

Well, I couldn't get the debugger to work.  I probably need the board to do that.  The demo doesn't say anything about the board while discussing the debugger.

The HiFive1 board is fairly expensive at around $60 and has no ADCs.  Maybe just use an SPI ADC on a Shield.

Here's where I go off the rails:  Following the demo with 10 lines of C and 3 tiny assembly language files (total 38 instructions), the compiled output is 4,624 bytes of RAM and 53,710 bytes of flash.  To flash an LED!

There seems to be a lot of library code included in the build but since I can't find the linker map file (or how to turn it on), I have no idea how to fix this.  The size doesn't seem to change much when I comment out #include <stdio.h> nor do I know why it was included in the project.

The FE310-G000 chip is FAST - upwards of 320 MHz.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #5 on: December 09, 2018, 02:19:31 am »
Interesting presentation!  I installed the tools and, I must say, I like Visual Studio Code.  It looks a lot like Eclipse with a few more buttons but it works really well.

Yeah, I'm allergic to JavaScript, but I installed VS Code a while ago and it seems OK. Pretty quick (which Eclipse isn't). I didn't actually realise you could add that kind of embedded programming functionality to it.

Quote
To add the Debug functionality, you need an account and it seems that something is only temporary because they talk about something happening in about a month.  It costs about $10/month for the Professional version which includes the Unified Debugger and some other features.  Without the debugger option, the IDE is free to use.  Actually, you give up a lot of functionality in the free version.

Ugh. I'd pay $10, but not $10/month. I guess I'll stick to using gdb, with or without emacs.

Quote
Well, I couldn't get the debugger to work.  I probably need the board to do that.  The demo doesn't say anything about the board while discussing the debugger.

Sure, he's using the board. You should be able to run the compiled code on qemu, but you're not going to flash a LED that way :-)

Quote
The HiFive1 board is fairly expensive at around $60 and has no ADCs.  Maybe just use an SPI ADC on a Shield.

SiFive didn't have access to analogue IP two years ago, when the HiFive1 shipped. They do now.

People have been using:

MCP3008 (SPI) https://www.adafruit.com/product/856
ADS1015 (I2C) https://www.adafruit.com/product/1083

Quote
Here's where I go off the rails:  Following the demo with 10 lines of C and 3 tiny assembly language files (total 38 instructions), the compiled output is 4,624 bytes of RAM and 53,710 bytes of flash.  To flash an LED!

There seems to be a lot of library code included in the build but since I can't find the linker map file (or how to turn it on), I have no idea how to fix this.  The size doesn't seem to change much when I comment out #include <stdio.h> nor do I know why it was included in the project.

There's a tension in supplying IDEs in that beginners just want code they grab from who knows where to work and like to have fully featured libraries, while pros want small code. We definitely default to the former as much as possible. Pros have the knowledge needed to cut the size down if required. And this board has 16 *mega*bytes of flash for the program, so it's not a big issue for beginners.

I haven't tried following these videos yet and don't know exactly what toolchain they're using, but the size seems normal for Newlib. Newlib is really designed for PC use rather than embedded. Even Newlib Nano doesn't help much. We're aware of and working on this...

It doesn't help that ARM's C library gets much smaller sizes by using a software FP library that doesn't meet IEEE 754 requirements, while ours does.

Using SiFive's freedom-e-sdk I get the following text sizes:

51498 HelloWorld with printf
 2042 program with main() just "return 0"
 6924 Dhrystone using a custom minimal printf https://github.com/sifive/freedom-e-sdk/blob/master/software/dhrystone/dhry_printf.c

Basically, the Newlib printf(), among other things, is pretty bloaty.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #6 on: December 09, 2018, 05:51:52 am »
To add the Debug functionality, you need an account and it seems that something is only temporary because they talk about something happening in about a month.  It costs about $10/month for the Professional version which includes the Unified Debugger and some other features.  Without the debugger option, the IDE is free to use.  Actually, you give up a lot of functionality in the free version.

Now I've followed through the presentation actually doing it :-) Yes, using their debugger needs the "pro" version of PlatformIO at $10/month. You do get a free 30 day trial.

I guess the good news is that if you pay that, you can work with 550 or so other boards as well: AVRs, PICs, any number of ARM boards...

It *looks* as though you should be able to use gdb directly. It's given as an option in the debugger menu.

 

Offline lucazader

  • Regular Contributor
  • *
  • Posts: 119
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #7 on: December 09, 2018, 08:58:08 am »
Thanks for the post Bruce.
Definitely keen to see where all this RISC-V stuff is headed.
Especially that new core/chip designer stuff you are rolling out at SiFive.

As far as VS Code goes, PIO is great if you are a beginner not wanting to set up toolchains etc.
However, if you're slightly more advanced, using makefiles and GDB within VS Code for no cost is super easy! No need to install the great hulk that is PIO.
Just the specific gcc for your MCU (pre-built binaries are available from the SiFive website), and then GDB with the correct script, which you can probably find on the internet for pretty much any board.

Would definitely be interested in looking into some dev boards and MCUs from the E20 series as they come out, especially with integrated ADCs etc...
 

Offline FlyingDutch

  • Contributor
  • Posts: 23
  • Country: pl
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #8 on: December 09, 2018, 09:06:59 am »
Hello,

very interesting subject, but this HiFive1 board is very expensive at $59. I would like to evaluate the RISC-V architecture, but the HiFive1 board is too expensive for me.
I am wondering if it is possible to evaluate the RISC-V architecture using an FPGA board and a RISC-V IP core? I have a few FPGA boards (the biggest an Artix-7 with 17,000 logic cells). I found a link to RISC-V cores:

https://github.com/riscv/riscv-wiki/wiki/RISC-V-Cores-and-SoCs

Has anybody tried a RISC-V IP core running on an FPGA board? Any hints before I start?

Kind Regards
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #9 on: December 09, 2018, 11:46:01 am »
As far as VS Code goes, PIO is great if you are a beginner not wanting to set up toolchains etc.
However, if you're slightly more advanced, using makefiles and GDB within VS Code for no cost is super easy! No need to install the great hulk that is PIO.
Just the specific gcc for your MCU (pre-built binaries are available from the SiFive website), and then GDB with the correct script, which you can probably find on the internet for pretty much any board.

I completely agree. I almost never use an IDE myself, just emacs, gcc, openocd (or avrdude or whatever, as appropriate), gdb. However obviously this tutorial is aimed at beginners, and Western Digital happened to choose VS Code with PIO for it .. so I figured I'd better find out something about it.

Our command-line freedom-e-sdk (which PIO downloads and uses internally) is fairly easy to use itself. For one of the provided projects just type...

Code: [Select]
make software PROGRAM=led_fade
make upload PROGRAM=led_fade

That's it! Now it's running on the board. If you want to do your own program just duplicate one of the example programs, change the name, and you're away.

I personally find it much more understandable to work with standard tools such as shell commands, make, my favourite editor etc than to learn the latest IDE every couple of years.

We do, however, provide a completely free Eclipse-based IDE, "Freedom Studio", that supports GUI debugging much as PIO does: https://sifive.cdn.prismic.io/sifive%2F08af66c3-f408-4ffd-8e92-0428e5b8011a_freedomstudio_manual.v1p3.pdf

Quote
Would definitely be interested in looking into some dev boards and MCU's from the E20 series as they come out, especially with integrated ADC's etc...

SiFive's business model is to enable -- to ENCOURAGE -- *other people* to decide that there will be demand for a chip with a particular core (or cores), memory, and peripherals, and to make it themselves (or SiFive can organise the volume manufacturing, put your logo on the chips, etc). With the IP partners added during 2018 this now includes analogue peripherals such as ADCs.

The FE310 and FU540 won't be the last chips SiFive makes, but don't expect to see dozens or hundreds of them, in every possible configuration.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #10 on: December 09, 2018, 12:01:21 pm »
very interesting subject, but this HiFive1 board is very expensive at $59. I would like to evaluate the RISC-V architecture, but the HiFive1 board is too expensive for me.

The 3rd-party designed "LoFive" uses the same CPU chip as the HiFive1 and programs are compatible, but it's been available for $25 to $30 at various times. https://store.groupgets.com/products/lofive-risc-v Note that it does require a JTAG programmer, whereas the HiFive1 is programmed using USB, so if you don't already have one you won't see much savings.

You can also buy bare FE310 chips yourself for $25 for 5 on the HiFive1 CrowdSupply page and build your own board. Complete design files for the LoFive are available on github https://github.com/mwelling/lofive

Quote
I am wondering if it is possible to evaluate the RISC-V architecture using an FPGA board and a RISC-V IP core? I have a few FPGA boards (the biggest an Artix-7 with 17,000 logic cells). I found a link to RISC-V cores:

https://github.com/riscv/riscv-wiki/wiki/RISC-V-Cores-and-SoCs

Has anybody tried a RISC-V IP core running on an FPGA board? Any hints before I start?

Sure! There are literally dozens of core (and core-plus-peripherals) designs for putting into FPGAs from a wide variety of sources.

Probably one of the easiest to get going (and using 100% open-source software) is picorv32/picosoc running on a TinyFPGA BX https://discourse.tinyfpga.com/t/riscv-example-project-on-tinyfpga-bx/451

VexRiscv is also very much worth checking out. It works on a wide range of FPGAs from different vendors, including Artix 7: https://github.com/SpinalHDL/VexRiscv
 
The following users thanked this post: FlyingDutch

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #11 on: December 09, 2018, 03:48:21 pm »

Here's where I go off the rails:  Following the demo with 10 lines of C and 3 tiny assembly language files (total 38 instructions), the compiled output is 4,624 bytes of RAM and 53,710 bytes of flash.  To flash an LED!

There seems to be a lot of library code included in the build but since I can't find the linker map file (or how to turn it on), I have no idea how to fix this.  The size doesn't seem to change much when I comment out #include <stdio.h> nor do I know why it was included in the project.

There's a tension in supplying IDEs in that beginners just want code they grab from who knows where to work and like to have fully featured libraries, while pros want small code. We definitely default to the former as much as possible. Pros have the knowledge needed to cut the size down if required. And this board has 16 *mega*bytes of flash for the program, so it's not a big issue for beginners.

Basically, the Newlib printf(), among other things, is pretty bloaty.

Yes, newlib is a pig!  But in the demo project, no library function is ever called.  Yes, I imagine some C library functions might be called but it seems unlikely given the code that was written.  I can't see where newlib comes into this project.

I need to look around in the IDE and find the compile/link options.  They have to be there somewhere.  I want to see the assembly output and the link map.

The really good news:  The referenced book, 'The RISC-V Reader', is less than $20, shipped from Amazon.  The other two books are pricey, but the author of the videos refers to the 'Reader' as THE book to buy.  So I did...

The interesting part of RISC ISAs is how the hardware handles hazards.  The deeper the pipeline, the more complex it gets.  Throwing away partially executed instructions because a branch was taken but couldn't be predicted because the condition code hadn't been updated from an arithmetic operation that hadn't completed - it gets complicated!

Next question:  How many of the more esoteric op codes does the compiler use?  The problem with building an FPGA version is to ensure you have implemented enough of the instruction set to allow the compiler to generate workable code.  Or, be prepared to rewrite the code generation of the C compiler!

At the moment, RISC-V is in its infancy.  It's been around a while, but there is strong competition from ARM, and ARM covers quite a broad spectrum of computing.  In the meantime, it's worth knowing how RISC-V works.  If you like knowing that kind of stuff...
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #12 on: December 09, 2018, 04:13:35 pm »
but this HiFive1 board is very expensive at $59

Expensive? 60 USD is nothing for a board.
 
The following users thanked this post: Kilrah

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #13 on: December 09, 2018, 05:21:57 pm »
but this HiFive1 board is very expensive at $59

Expensive? 60 USD is nothing for a board.

It is essentially a stripped-down Arduino, at least in form factor, that runs a lot faster.  The Arduino UNO is $19 at Amazon.  It also has no ADCs, which limits its applicability.  What it does have is a LOT of memory.

I'm vacillating.  I want to know more about the ISA as a matter of general interest and, truly, $60 isn't a number that concerns me.  But I have to be aware that a) I don't need the board and b) even if I did, it lacks certain features.

So, the book gets here Tuesday and after I skim through it, I'll probably buy the board just to play with it.  Compared to the FPGA boards I have bought, $60 is nothing!

Pipelined processors are interesting and RISC-V is definitely approachable at the hardware level (as opposed to just buying an ARM <whatever>).  If I so desire, I can pick up one of the FPGA cores and get right down in the dirt of hardware design.  I like hardware design!

Still vacillating...
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #14 on: December 09, 2018, 06:40:48 pm »
It is essentially a stripped down Arduino, at least in form factor, that runs a lot faster.

An Infineon 4500 (an ARM core with CORDIC) costs something like 45 euros, so 60 is definitely not too much.


 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #15 on: December 09, 2018, 06:53:34 pm »
It also has no ADCs which limits its applicability.  What it does have is a LOT of memory.

My MIPS ATLAS board costs 600 euros. I got a second-hand one for 120 euros. It comes with no ADC, DAC, SPI, etc. ... just the CPU, 32 MB of RAM, a PCI bus, two serial lines, a couple of timers, LA headers and EJTAG.

It's not a problem. It's simply not the kind of target I will ever use to interface to a CNC or a 3D printer. That's not its job, and there are cheaper and better boards for that (Sanguino, for example).

I am enjoying my ATLAS board for XINU and other software that runs more comfortably there. Besides, the EJTAG makes the debugging experience more comfortable.

Still vacillating...

No doubt! And there's no need to compare a true 32-bit RISC processor to ... the Arduino 1 (at least you should consider the Arduino 2, or the Arduino Zero ... or an Infineon 4500)!

Anyway, if you like RISC-V, go and buy the board  :D
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #16 on: December 09, 2018, 10:16:53 pm »
Anyway, if you like RISC-V, go and buy the board  :D

I decided to buy myself a Christmas present - $75 including tax and shipping.  It'll probably be here next week.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #17 on: December 09, 2018, 10:58:42 pm »
Yes, newlib is a pig!  But in the demo project, no library function is ever called.  Yes, I imagine some C library functions might be called but it seems unlikely given the code that was written.  I can't see where newlib comes into this project.

.platformio/packages/framework-freedom-e-sdk/env/freedom-e300-hifive1/init.c:225:25
Code: [Select]
printf("core freq at %d Hz\n", get_cpu_freq());

On newer versions I've replaced that with a custom itoa() and three puts() which cuts the bloat hugely.
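That replacement could look something like this. To be clear, it's my sketch of the idea rather than the actual SDK code, and the name u32_to_dec is made up for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: format an unsigned value in decimal without
 * pulling in newlib's full printf() formatting machinery. */
static char *u32_to_dec(unsigned long v, char *buf) {
    char tmp[16];
    int i = 0;
    do {                          /* emit digits least-significant first */
        tmp[i++] = (char)('0' + (v % 10));
        v /= 10;
    } while (v != 0);
    char *p = buf;
    while (i > 0)                 /* reverse into the caller's buffer */
        *p++ = tmp[--i];
    *p = '\0';
    return buf;
}

/* The init.c printf() call can then become three cheap output calls:
 *     char buf[16];
 *     fputs("core freq at ", stdout);
 *     fputs(u32_to_dec(get_cpu_freq(), buf), stdout);
 *     fputs(" Hz\n", stdout);
 */
```

Since nothing else in the startup path needs format strings, the whole printf/vfprintf family can then be dropped by the linker.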

Quote
I need to look around in the IDE and find the compile/link options.  They have to be there somewhere.  I want to see the assembly output and the link map.

PlatformIO isn't supplying all the tools; for example, there is no objdump :-(

If you want to poke more deeply you'll be better off using the command-line freedom-e-sdk direct from SiFive. I recommend building it -- it takes about 25 minutes on a quad core (but there are precompiled binaries too).

https://github.com/sifive/freedom-e-sdk

Quote
The interesting part of RISC ISAs is how the hardware handles hazards.  The deeper the pipeline, the more complex it gets.  Throwing away partially executed instructions because a branch was taken but couldn't be predicted because the condition code hadn't been updated from an arithmetic operation that hadn't completed - it gets complicated!

No problem in an in-order CPU, which includes everything anyone has shipped so far. Instructions after the branch have been fetched and decoded and their operands fetched, but they don't *execute* until after the branch does .. at which point it is known whether it will branch or not. If the prediction was wrong then execution of already-fetched instructions is squashed and fetch/decode starts again at the correct place.

Quote
Next question:  How many of the more esoteric op codes does the compiler use?  The problem with building an FPGA version is to ensure you have implemented enough of the instruction set to allow the compiler to generate workable code.  Or, be prepared to rewrite the code generation of the C compiler!

All of them!

If you want to build your own CPU in an FPGA then you only need to implement RV32I, which has only and exactly what is needed to compile C code ... but omitting multiply and divide. gcc will happily emit code for this, using library functions for multiply and divide.
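To illustrate what such a library routine does, here is the shift-and-add idea behind a software multiply, written as a C sketch (libgcc's __mulsi3 does essentially this, in assembly, when the target has no MUL instruction):

```c
#include <stdint.h>

/* Shift-and-add multiplication: the technique a soft-multiply library
 * routine uses when the core lacks a hardware MUL instruction.
 * Each set bit of b adds a correspondingly shifted copy of a. */
static uint32_t soft_mul(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;   /* this bit of b contributes a << position */
        a <<= 1;           /* next bit of b is worth twice as much */
        b >>= 1;
    }
    return result;         /* low 32 bits of the product */
}
```

At most 32 iterations of add/shift/branch, all of which are plain RV32I instructions.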

The complete RV32I instruction set:

Code: [Select]
Registers r0 ... r31. r0 is always 0, can be used to discard unwanted results (e.g. jal/jalr)
PC is separate. There is no dedicated SP or LR at the hardware level.
All immediate values are signed.

OP rd, rs1, rs2  ; OP = add/sub, slt/sltu (rd=1 if less than, else 0), and/or/xor, sll/srl/sra (shifts)
OPi rd, rs1, 0xNNN ; OP = add, slt/sltu, and/or/xor, sll/srl/sra

lui rd, 0xNNNNN ; load immediate<<12 into rd
auipc rd, 0xNNNNN ; add immediate<<12 to the PC and store in rd

jal rd, 0xNNNNN ; add immediate<<1 to the PC. Store old PC in rd
jalr rd, 0xNNN(rs1) ; add immediate to rs1, clear the low bit, store in the PC. Store old PC in rd

bOP rs1, rs2, 0xNNN ; if rs1 OP rs2 is true add immediate<<1 to PC; OP = eq/ne/lt/ltu/ge/geu

sSZ rs2, 0xNNN(rs1); add immediate to rs1, use as address to store from rs2, SZ = b/h/w
lSZ rd, 0xNNN(rs1); load to rd. SZ = b/bu/h/hu/w (u zero extends, others sign extend)

ecall/ebreak ; no arguments. Call OS or debugger.

There's not a lot of fat there.

Optional: implement only r0 .. r15 and then use -march=rv32e to gcc ("e" for embedded)

Optional: implement multiply/divide -march=rv32im (or em)

Optional: implement atomic operations for SMP -march=rv32a

Optional: implement 16 bit opcodes duplicating the most common 32 bit opcodes for code density comparable to Thumb2 instead of comparable to ARM -march=rv32ic

Optional: implement floating point -march=rv32if or rv32ifd

You can compile any normal C/C++ code and newlib using rv32i or rv32e. The Linux kernel and glibc require at least rv32ia (actually I think they require rv64ia at present, but that is being fixed).

Quote
At the moment, RISC-V is in its infancy.  It's been around a while, but there is strong competition from ARM, and ARM covers quite a broad spectrum of computing.  In the meantime, it's worth knowing how RISC-V works.  If you like knowing that kind of stuff...

Yes, it's early days, but momentum is building.

There is no great *technical* advantage over ARM or MIPS, but also no disadvantage. Compare code size, compare Dhrystone or Coremark or SPEC ... it's a photo finish in most cases. MIPS code is the biggest (and microMIPS doesn't help as much as Thumb or rvc), rv32i is comparable to ARM, rv32ic to Thumb2. In 64 bit, rv64ic is much smaller than anything else (ARM didn't see fit to duplicate Thumb in 64 bit!).

The advantage is that absolutely anyone is free to implement their own CPU (FPGA, ASIC, emulator), call it "RISC-V" if it passes the conformance tests, and use it, sell it, or give it away as they please. You are not going to get any nasty lawyers' letters.

The advantage over implementing your own instruction set is that someone else already wrote/ported a huge amount of software for you.
« Last Edit: December 10, 2018, 01:38:55 am by brucehoult »
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #18 on: December 09, 2018, 11:24:57 pm »
Yes, newlib is a pig!  But in the demo project, no library function is ever called.  Yes, I imagine some C library functions might be called but it seems unlikely given the code that was written.  I can't see where newlib comes into this project.

.platformio/packages/framework-freedom-e-sdk/env/freedom-e300-hifive1/init.c:225:25
Code: [Select]
printf("core freq at %d Hz\n", get_cpu_freq());

On newer versions I've replaced that with a custom itoa() and three puts() which cuts the bloat hugely.

Yup!  I found the code.  Of course, there is also a lot of framework code to initialize the CPU, and there is a bunch of space for vectors.  This is typical of modern processors.  In a lot of ways, the vector table feels a lot like the ARM7TDMI's.  So does 'start.S'.

I would probably modify the code to just eliminate displaying the CPU frequency.  Maybe I'll try that just to see what happens.

Quote

PlatformIO isn't supplying all the tools, for example there is no objdump :-(

If you want to poke more deeply you'll be better off using the command-line freedom-e-sdk direct from SiFive. I recommend building it -- it's 25 minutes on a quad core. (but there are precompiled binaries too)

https://github.com/sifive/freedom-e-sdk


I'll try to build the tools next week.  Not a big deal - my main workstation dual-boots and it has plenty of horsepower (i7-7700).

I have great expectations of the "RISC-V Reader" book.  I imagine many of my questions will be answered.
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #19 on: December 09, 2018, 11:36:09 pm »
Yes, it's early days, but momentum is building.

There is no great *technical* advantage over ARM or MIPS, but also no disadvantage. Compare code size, compare Dhrystone or Coremark or SPEC ... it's a photo finish in most cases. MIPS code is the biggest (and microMIPS doesn't help as much as Thumb or rvc), rv32i is comparable to ARM, rv32ic to Thumb2. In 64 bit, rv64ic is much smaller than anything else (ARM didn't see fit to duplicate Thumb in 64 bit!).

The advantage is that absolutely anyone is free to implement their own CPU (FPGA, ASIC, emulator), call it "RISC-V" if it passes the conformance tests, and use it, sell it, give it away as you please. You are not going to get any nasty lawyers letters.

The advantage over implementing your own instruction set is that someone else already wrote/ported a huge amount of software for you.

I have always been of the opinion that having software to run on a CPU is more important than the CPU itself. 

In the same way, I am looking at microcontrollers as opposed to microcomputers.  I want a bunch of peripherals.  I don't much care about the CPU architecture; if I want to apply the chip to something, it needs peripherals.

I'll be looking at one of the FPGA implementations of the RISC-V to see if I can actually bring up a working CPU.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #20 on: December 10, 2018, 04:48:29 am »
In case anyone is interested in instruction encodings...

 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #21 on: December 10, 2018, 12:22:31 pm »
Oh, it seems there is (or will be) support for Lauterbach's debuggers :o
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #22 on: December 10, 2018, 01:42:08 pm »
Oh, it seems there is (or will be) support for Lauterbach's debuggers :o

Already, for a year. And Segger too.
 

Offline ehughes

  • Frequent Contributor
  • **
  • Posts: 359
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #23 on: December 10, 2018, 01:56:25 pm »
What is the value proposition for this core?  It seems that the value might be for IC vendors to avoid licensing fees from ARM, but that is a drop in the bucket compared to everything else.  The instruction set doesn't seem all that interesting.  There are a handful of interesting instructions for the ARM CM4 (such as SMLAL), and there were interesting features such as bit-banding and simple interrupt handling.

I am not seeing anything that is particularly useful here.  Or am I missing something?

 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #24 on: December 10, 2018, 02:44:26 pm »
What is the value proposition for this core?

It's not a core, it's an instruction set, like x86 or ARM, that can be implemented in many different cores. There are already dozens of different core designs from both commercial and non-commercial sources.

Quote
It seems that the value might be for IC vendors to avoid licensing fees from ARM, but that is a drop in the bucket compared to everything else.

It's about "Free as in speech, not free as in beer". Yes, you can design your own core or download a free one from github. But that's a lot of work and there are no guarantees or support. Companies such as SiFive, Andes, Syntacore, Esperanto etc are going to want licensing fees just as ARM are (perhaps less, perhaps not).

Quote
The instruction set doesn't seem all that interesting.

It's not intended to be interesting. It's intended to be extremely boring and simple, enabling both very small implementations and complex high-performance implementations, as technically effective as the others, and patent- and license-free.

The standard parts of RISC-V very deliberately tread only well worn and established paths that are either provably public-domain in the first place or else covered by expired patents.

Quote
There are a handful of interesting instructions for the ARM CM4 (such as SMLAL), and there were interesting features such as bit-banding and simple interrupt handling.

I am not seeing anything that is particularly useful here.  Or am I missing something?

If you want SMLAL in RISC-V you can add it yourself. Or ask your chip vendor to add it for you. You don't have to persuade the trademark/patent/copyright holder that it's a useful instruction for you and hopefully lots of other users, and wait for them to incorporate it in the next version of their standard and eventually produce chips.

Ok, ARM already did SMLAL in Cortex M4. Cool. But what if you want it in a small microarchitecture such as CM0? You're out of luck. If you want an equivalent to SMLAL in a CM0-sized core such as the SiFive E20 or the Pulpino ZERO RISCY -- no problem, just add it, or have them add it for you.
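For reference, the operation being discussed (SMLAL is ARM's 32x32->64 signed multiply-accumulate) is easy to state in C; a hypothetical custom RISC-V instruction would simply fuse this sequence (`smlal_equiv` is an illustrative name, not a real intrinsic):

```c
#include <stdint.h>

/* What SMLAL computes: a 64-bit accumulator plus the full signed
 * product of two 32-bit operands. On plain RV32IM this takes a short
 * mul/mulh/add sequence; a custom instruction could do it in one. */
int64_t smlal_equiv(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}
```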

ARM and Intel have smart people, but will they think of every possible instruction that might make someone's program go 10x or 100x faster? No, they can't. And, worse, if they do add it they're going to add it for everyone -- for your competition as well as for you. Everyone gets any feature that might be useful to anyone .. or else no one gets it. So at the same time you can't get chips with features that you really want, AND you get bloated chips with a lot of features that you don't want.

Radical personalization is the present and future of many industries. Now it's available in CPUs too.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #25 on: December 10, 2018, 02:47:40 pm »
I haven't yet looked at it in detail, but the "FENCE" class looks interesting!
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #26 on: December 10, 2018, 02:53:01 pm »
I haven't yet looked at it in detail, but the "FENCE" class looks interesting!

EIEIO. And related :-)
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #27 on: December 10, 2018, 03:16:28 pm »
I just went over to ARM to get a sense of the size of their instruction set.  Somehow, I think they have moved beyond Reduced Instruction Set with the latest designs.  There certainly are some 'interesting' instructions but I wonder which opcodes GCC actually uses.
 

Offline ehughes

  • Frequent Contributor
  • **
  • Posts: 359
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #28 on: December 10, 2018, 03:29:34 pm »
Quote
Radical personalization is the present and future of many industries. Now it's available in CPUs too.

So if I understand correctly, RISC-V is more intended for high volume SoC type customers who want to make specialized cores? (i.e. the Western Digital use case).

I looked through the SiFive site and it seems the message is that you can get your own custom SoC made quicker.     
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #29 on: December 10, 2018, 03:30:54 pm »
I haven't yet looked at it in detail, but the "FENCE" class looks interesting!

EIEIO. And related :-)

This will add more "fun" to every superscalar implementation of the RISC-V. EIEIO + isync + sync on our PowerPC460 is able to cause great emotions, like people hammering their heads on the desk and going to throw the target-board out of the window ... which is ... love ... in reverse order :D


another interesting point I see: like MIPS and PowerPC, RISC-V also uses LL/SC to implement CAS, which is to say, LL/SC is used to write a tiny bit of code which loads a target memory address, compares it to a comparand, and then writes a swap value back to the target if the comparand and target values are equal.

It would be interesting ... how LL monitors an address (say, a semaphore), and how SC does its job.


A senior here said that x86/x64 is better because it implements DWCAS (a sort of CAS, but more complex) instead of LL/SC ... dunno, I have ZERO experience with Intel x86  :-//
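The CAS-from-LL/SC construction described above can be sketched with C11 atomics; on a RISC-V target with the A extension, compilers lower `atomic_compare_exchange_strong` to exactly such an LR/SC retry loop (the assembly in the comment shows the shape of the sequence, not any particular compiler's output):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Compare-and-swap built on the standard atomics. On RISC-V this
 * becomes an LR/SC loop along the lines of:
 *
 *   retry: lr.w  t0, (a0)        # load-reserved the target word
 *          bne   t0, a1, done    # comparand mismatch: give up
 *          sc.w  t1, a2, (a0)    # store-conditional the swap value
 *          bnez  t1, retry       # reservation lost: try again
 *   done:  ...
 */
bool cas_int(atomic_int *target, int comparand, int swap)
{
    return atomic_compare_exchange_strong(target, &comparand, swap);
}
```

(The C operation also reports the observed value on failure; the LL/SC sequence gets that for free, since the loaded value is already sitting in a register.)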
 

Offline FlyingDutch

  • Contributor
  • Posts: 23
  • Country: pl
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #30 on: December 10, 2018, 06:57:50 pm »

expensive? 60 USD are nothing for a board.

Hello,

comparing for example to these FPGA boards (I have both of them):

https://numato.com/product/mimas-v2-spartan-6-fpga-development-board-with-ddr-sdram

https://store.digilentinc.com/cmod-a7-breadboardable-artix-7-fpga-module/

Yes, it seems expensive to me ;)

Regards
« Last Edit: December 10, 2018, 07:03:04 pm by FlyingDutch »
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #31 on: December 10, 2018, 08:27:44 pm »
Thinking about the smallest FPGA incantation, does the RISC-V make sense as a general purpose drop-in core?  Maybe there is a project where the CPU is just handling details (maybe console IO or file IO) but the majority of the project is some kind of hardware thing (even including another CPU) that just needs a little high level help - that is, the full hardware description is too ugly to contemplate and a programmable core would smooth things out.

OK, I'll fess up!  I would rather write C code than HDL when it comes to things like a microSD driver.

Assuming adequate resources, of course.

Board has shipped, toolchain has been built.  One thing about a fresh Linux install: there are a bunch of dependent tools that need to be built or just installed.  Among my favorites: MPFR, MPC, GMP.  Nothing is as simple as it seems it would be!

This thread has links to amazing resources.  Once I get to play with the HiFive1 board for a while, I am almost certain to be looking at an Artix incantation of the core.  I have a couple of Arty 7 boards and that Nexys 4 board I have uses an Artix 100T chip.  Lots of resources!

It should be an interesting winter.

 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 9902
  • Country: us
  • DavidH
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #32 on: December 10, 2018, 08:34:37 pm »
There is no great *technical* advantage over ARM or MIPS, but also no disadvantage. Compare code size, compare Dhrystone or Coremark or SPEC ... it's a photo finish in most cases. MIPS code is the biggest (and microMIPS doesn't help as much as Thumb or rvc), rv32i is comparable to ARM, rv32ic to Thumb2. In 64 bit, rv64ic is much smaller than anything else (ARM didn't see fit to duplicate Thumb in 64 bit!).

Lack of flags, increasing code size by 4 times and requiring 2 extra registers to detect various conditions, sure seems like a disadvantage.  That extra code and register pressure also has the effect of making the caches effectively smaller.  Having to effectively execute an ALU operation twice or more cannot help power efficiency.

Technically only flags which represent changes in state like carry and overflow are required; for instance zero, negative, and parity can be computed at any time.  What I would like to see is a design where flags requiring state are stored in a register dedicated to each destination register which avoids the hazard of having a single flags register like in x86 or requiring a flags register operand which would require extra instruction bits.

Some ISAs do this to track whether a register has been used in the current execution context so that the entire register set does not need to be saved on a context switch.  The first use of a register is just another bit of state to save.
 
The following users thanked this post: rhodges

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #33 on: December 10, 2018, 09:01:28 pm »
In case anyone is interested in instruction encodings...



I've been tinkering with RISC-V in my spare time, and I have to say that the 32-bit integer instruction set is quite nice for hardware implementation:

- The source and destination registers are always encoded in the same place.
- The most significant bit of any constant is always in the same place (makes for easy sign extension)
- The privileged instructions (ones that need to be trapped for OS / Hypervisor) are all nicely contained

The only thing I find awkward is the encoding of the offsets on the jump instructions - fine for H/W but painful to decode for a naive software emulation.

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #34 on: December 10, 2018, 11:27:09 pm »
I just went over to ARM to get a sense of the size of their instruction set.  Somehow, I think they have moved beyond Reduced Instruction Set with the latest designs.  There certainly are some 'interesting' instructions but I wonder which opcodes GCC actually uses.

Exactly what "RISC" means has always been and remains the subject of some debate :-)

For me, I think the most important characteristics are:

- strict separation of computation from data transfer (load/store)

- enough registers that you don't touch memory much. Arguments for most functions fit in registers, and the return address too (the otherwise RISC AVR8 violates this).

- general purpose registers rather than special purpose.

- no instruction can cause more than one cache or TLB miss, or two adjacent lines/pages if unaligned access is supported (and this case might be trapped and emulated)

- the instruction length can be determined easily (combinatorial circuit with few gates) by examining only the initial part of the instruction.

- each instruction modifies at most one register.

- integer instructions read at most two registers. This is ultra-purist :-) A number of RISC ISAs break it in order to have e.g. a register plus (scaled) register addressing mode, or conditional select. But no more than three!

- no microcode or hardware sequencing. Each instruction executes in a small and fixed number of clock cycles (usually one). Load/Store multiple are the main offenders in both ARM and PowerPC. They help with code size, but it's interesting that ARM didn't put them in Aarch64 and is deprecating them in 32 bit as well, providing the much less offensive load/store pair.


Something that I think is *not* necessary in order to be "RISC" is to have a small number of instructions. Yes, PowerPC has a huge number of instructions, as does modern ARM and Aarch64. This does not disqualify it from being RISC as long as each instruction follows the above rules.

What a huge number of instructions *does* do is make very small low end implementations impossible. And puts a big burden of work on every hardware and every emulator implementer.
 
The following users thanked this post: TNorthover, rhodges

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #35 on: December 11, 2018, 12:04:35 am »
Quote
Radical personalization is the present and future of many industries. Now it's available in CPUs too.

So if I understand correctly, RISC-V is more intended for high volume SoC type customers who want to make specialized cores? (i.e. the Western Digital use case).

I don't think you can distil one single thing that a standard backed by 100+ companies is "intended for".

It's a free and open standard that software can be written to, and that anyone is free to implement in any way they choose: hardware, software (interpret/JIT), or FPGA.

It is intended that nothing in RISC-V disqualifies it from being applicable to everything from the smallest (32 bit) microcontroller to the largest supercomputer, and everything in between. See for example the European Processor Initiative, which is developing processors for supercomputers based on the RISC-V ISA.

Quote
I looked through the SiFive site and it seems the message is that you can get your own custom SoC made quicker.   

SiFive is just one company of many doing things with RISC-V. It happens to be one of the first out of the starting gate (founded in September 2015) and therefore currently one of the most visible.

SiFive's business model is indeed not to be a chip vendor, but to enable others to make chips.

At the moment, most of the RISC-V activity has been people (often individuals) making soft cores for FPGAs and large companies who are already making SoCs putting a RISC-V processor in one corner.

What a lot of people on this forum want is to be able to go to digikey/mouser/element14 and choose a microcontroller with a CPU (and they don't really care what CPU) and the selection of peripherals they need for some task.

Those don't exist now, but they will start to in 2019, from a number of vendors.

The first off the block appears to be NXP, with an SoC (RV32M1) with two RISC-V cores (not SiFive ones), two ARM cores, and a bunch of peripherals including Bluetooth, USB, ADC, RTC, uSDHC, crypto acceleration. They have a web site where you can get a board with this for free http://open-isa.org/order/ and they gave away a few hundred boards at the RISC-V Summit last week.

MicroSemi/Microchip have announced a version of their PolarFire FPGA with an embedded SiFive FU540 complex (five 64 bit cores, four with FPU&MMU).

There will be a *lot* more to follow during 2019.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #36 on: December 11, 2018, 12:34:45 am »
I haven't yet looked at it in detail, but the "FENCE" class looks interesting!

EIEIO. And related :-)

This will add more "fun" to every superscalar implementation of the RISC-V. EIEIO + isync + sync on our PowerPC460 is able to cause great emotions, like people hammering their heads on the desk and going to throw the target-board out of the window ... which is ... love ... in reverse order :D

Alas, if you want to do out of order CPUs then you absolutely need a well thought-out system for ensuring memory consistency. Hopefully RISC-V has got that right -- there's been a committee with very very experienced people in this field (industry and academics) who've spent over a year on this. The PowerPC/Alpha/ARM experience has hopefully been learned from -- certainly it's not lack of will or effort.

The RISC-V spec also allows TSO (like SPARC, x86) as an optional feature. That will inevitably be a little lower performance, especially when scaled to large numbers of cores (dozens, hundreds, thousands), but it does make life easier for programmers. Standard RISC-V code written for a weak memory model will run correctly on systems with TSO, but not vice-versa.

Quote
another interesting point I see: like MIPS and PowerPC, even RISC-V uses LL/SC to emulate CAS, which is to say, LL/SC is used to write a tiny bit of code which loads a target memory address, compares it to a comparand, and then writes back a swap value to the target if the comparand and target values are equal.

Right. You can also implement many other interesting things using LL/SC.

RISC-V also has a number of Atomic Memory Operations (AMOs) which take one integer argument (not two like CAS) and an address, atomically do ... something ... with the integer and the memory contents, and return an integer. The allowed operations are swap (unconditional), add, and/or/xor, min/max (signed and unsigned).

This can be done by bringing the data into the CPU and sending the new value back, but the TileLink protocol (from Berkeley, which SiFive and others use) supports pushing these out to be executed in a cache controller, or on another CPU or peripheral that owns the address.

I see AMBA 5 got similar capability this year, although it also includes a remote CAS, which TileLink doesn't.
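As a concrete example of an AMO, `atomic_fetch_add` in C11 maps to a single amoadd.w on an RV32A/RV64A target (a sketch; `bump` is an illustrative name):

```c
#include <stdatomic.h>

/* Atomic fetch-and-add: one AMO instruction on RISC-V, no LR/SC loop:
 *   amoadd.w a0, a1, (a2)   # a0 = old *a2; *a2 = old + a1, atomically
 * A TileLink-style fabric may even execute the add out at the cache
 * controller rather than bringing the line into the CPU. */
int bump(atomic_int *counter, int delta)
{
    return atomic_fetch_add(counter, delta);  /* returns the old value */
}
```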

Quote
It would be interesting ... how LL monitors an address (say, a semaphore), and how SC does its job.

That is entirely up to whoever implements an individual core/memory system. The most common way would be for the CPU to take exclusive ownership of a cache line, and then the SC checks that it still has exclusive ownership. There is a small limit on the number (and type) of instructions you are allowed to execute between the LL and SC if you want to guarantee forward progress. One way this might work is that the CPU might ... delay ... its response to other CPUs' requests to read or take ownership of that cache line for a few clock cycles.

Quote
A senior here said that X86/x64 is better because it implements DWCAS (sort of CAS, but more complex) instead of LL/SC ... dunno, I have ZERO experience with Intel x86  :-//

DCAS is useful for some things, such as manipulating queues without the expense of using a full semaphore. But it's not cheap to implement and has its limitations. e.g. see http://www.cs.tau.ac.il/~shanir/nir-pubs-web/Papers/DCAS.pdf

The RISC-V community is interested in adopting some more powerful mechanism than LL/SC, but I think it's more likely to be a form of LL/SC that accepts a small number (2 to 5) of addresses rather than something as specific as DCAS ... and rather than something as general as full STM (which Intel has had numerous bugs trying to implement).
 
The following users thanked this post: TNorthover

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #37 on: December 11, 2018, 12:44:25 am »
Thinking about the smallest FPGA incantation, does the RISC-V make sense as a general purpose drop-in core?  Maybe there is a project where the CPU is just handling details (maybe console IO or file IO) but the majority of the project is some kind of hardware thing (even including another CPU) that just needs a little high level help - that is, the full hardware description is too ugly to contemplate and a programmable core would smooth things out.

Sure. That's one of the major uses of RISC-V right now. Some of the stripped down RV32I cores are using around 300 LUT6's! In fact the winner of a recent contest, engine-V, uses only 306 LUT4's -- amazing.

https://riscv.org/2018/12/risc-v-softcpu-contest-winners-demonstrate-cutting-edge-risc-v-implementations-for-fpgas/
https://github.com/micro-FPGA/engine-V

PicoRV32 and VexRiscv are also worth checking out.

Quote
It should be an interesting winter.

Have fun!
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 9902
  • Country: us
  • DavidH
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #38 on: December 11, 2018, 12:59:07 am »
- strict separation of computation from data transfer (load/store)

On the other hand, allowing ALU instructions to have one memory operand acts as a type of instruction set compression, lowers register pressure, and seems to have little disadvantage when out-of-order execution allows long load-to-use latencies from cache.

Quote
- enough registers that you don't touch memory much. Arguments for most functions fit in registers, and the return address too (the otherwise RISC AVR8 violates this).

But if the register set is too large, it means more state to save.  There are other solutions for this though.

Quote
- no instruction can cause more than one cache or TLB miss, or two adjacent lines/pages if unaligned access is supported (and this case might be trapped and emulated)

The lack of hardware support for unaligned access always seems to end up being a performance problem once a processor gets deployed into the real world.

Weak memory ordering which seems like it should be an advantage also becomes a liability.

Quote
- each instruction modifies at most one register.

That is pretty standard but how then do you handle integer multiplies and divides?  Break them up into two instructions?

Quote
- integer instructions read at most two registers. This is ultra-purist :-) A number of RISC ISAs break it in order to have e.g. a register plus (scaled) register addressing mode, or conditional select. But no more than three!

Internally it seems like this sort of thing and modifying more than one register should be broken up into separate micro-operations so that the register file has a lower number of read and write ports.  The alternative is having to decode more instructions which clog up the front end once an out-of-order implementation is desired.

On the other hand, this means discarding the performance advantages of the FMA instruction.

Quote
- no microcode or hardware sequencing. Each instruction executes in a small and fixed number of clock cycles (usually one). Load/Store multiple are the main offenders in both ARM and PowerPC. They help with code size, but it's interesting that ARM didn't put them in Aarch64 and is deprecating them in 32 bit as well, providing the much less offensive load/store pair.

Maybe more interesting is why ARM even included them in the first place.  Load and store multiple took advantage of fast-page-mode DRAM access when ARM's instruction pipeline was closely linked with DRAM access.

Should stack instructions be broken up as well?

Quote
What a huge number of instructions *does* do is make very small low end implementations impossible. And puts a big burden of work on every hardware and every emulator implementer.

I do not know about that.  Multiple physical designs covering a wide performance range are possible with the same ISA.  Microcode is convenient to handle seldom used instructions.  Vector operations can be broken up into instructions which fit the micro-architecture's ALU width while allowing support for the same vector instruction set across a wide range of implementations.

Or you can use instruction set extensions every time you want to support a different vector length.  How many FPU ISAs has ARM gone through now?
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #39 on: December 11, 2018, 01:07:18 am »
There is no great *technical* advantage over ARM or MIPS, but also no disadvantage. Compare code size, compare Dhrystone or Coremark or SPEC ... it's a photo finish in most cases. MIPS code is the biggest (and microMIPS doesn't help as much as Thumb or rvc), rv32i is comparable to ARM, rv32ic to Thumb2. In 64 bit, rv64ic is much smaller than anything else (ARM didn't see fit to duplicate Thumb in 64 bit!).

Lack of flags increasing code size by 4 times and requiring 2 extra registers to detect various conditions sure seems like a disadvantage.  That extra code and register pressure also has the effect of making the caches effectively smaller.  Having to effectively execute an ALU operation twice or more cannot help power efficiency.

That happens only with the overflow flag, which is used ... well ... never ... in standard software. C/C++ code does not use or require it.

Carry flag requires *one* instruction to branch on it (same as in an ISA with condition codes), or *one* instruction to set a register to 0 or 1 (one more than an ISA with condition codes). It's also extremely rarely needed -- mostly in bignum libraries, which are going to be limited by memory load/store anyway.
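That one-instruction carry recovery is visible even from C: an unsigned add wrapped iff the result is smaller than either operand, which on RV32I is a single sltu after the add (a sketch; `add_with_carry_out` is an illustrative name):

```c
#include <stdint.h>

/* Add two words and recover the carry-out without a flags register.
 * On RV32I this compiles to something like:
 *   add  a0, a0, a1      # sum
 *   sltu a1, a0, a1      # carry = (sum < b), i.e. the add wrapped
 * -- one extra instruction versus a condition-code ISA. */
uint32_t add_with_carry_out(uint32_t a, uint32_t b, uint32_t *carry)
{
    uint32_t sum = a + b;   /* wraps modulo 2^32 */
    *carry = sum < b;       /* 1 iff a + b overflowed 32 bits */
    return sum;
}
```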

Quote
Technically only flags which represent changes in state like carry and overflow are required; for instance zero, negative, and parity can be computed at any time.  What I would like to see is a design where flags requiring state are stored in a register dedicated to each destination register which avoids the hazard of having a single flags register like in x86 or requiring a flags register operand which would require extra instruction bits.

That would be better for OoO implementations than a single flags register, yes. You'd need BVC, BVS, BCC, BCS instructions that took a full register number as well as the branch offset.

It would be an interesting experiment to implement this. And this is EXACTLY what RISC-V enables you to do at low cost in time and money. Modify your favourite FPGA implementation to have your new instructions, modify gcc or llvm to generate them, and run dhrystone/coremark/SPEC/your favourite benchmark suite with and without the new instructions. Publish the results with execution time, energy use, area cost, and any effect on MHz. We all learn something!

Quote
Some ISAs do this to track whether a register has been used in the current execution context so that the entire register set does not need to be saved on a context switch.  The first use of a register is just another bit of state to save.

I haven't seen a bit for every register, but it's common for FPUs or vector units to have a single bit for the whole unit, as many programs don't use FP or vectors at all.

Again, worth trying, though context switches are very rare on normal systems.

Back in June, Intel disclosed a "Lazy FPU State Restore" bug in all Core-based processors. Microsoft and others fixed the bug by disabling the use of the FPU dirty bit and just saving and restoring everything on every context switch. The effect on performance was basically unmeasurable.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #40 on: December 11, 2018, 01:34:29 am »
I've been tinkering with RISC-V in my spare time, and I have to say that the 32-bit integer instruction set is quite nice for hardware implementation:

- The source and destination registers are always encoded in the same place.
- The most significant bit of any constant is always in the same place (makes for easy sign extension)
- The privileged instructions (ones that need to be trapped for OS / Hypervisor) are all nicely contained

Yes, all of those were deliberate goals when Krste Asanovic designed the encoding -- or should I say, redesigned it after experience implementing an earlier version.

Something that almost everyone else does is put the data register for load and store instructions in the same place, even though for a load it's the destination but for a store it's a source!

Quote
The only thing I find awkward is the encoding of the offsets on the jump instructions - fine for H/W but painful to decode for a naive software emulation.

Yes, I find it painful for trying to mentally decode instructions too. It's the result of the interaction of three things:

- always having the sign bit in the same place (as you noted)

- wanting to scale branch offsets to increase reach as you don't need byte-addressing (except for J(AL)R, which is almost always either paired with a LUI/AUIPC making increased reach unnecessary OR has a zero offset)

- minimising the number of places in the opcode where each bit of a literal or offset can come from and *not* requiring mass shifters. I *think* it's the case that only bit 11 can come from three places in the instruction while all the other bits can come from at most two places in the instruction (and bits 13 to 19 only one). A constant 0 or sign extension is also possible as well of course.
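To see what that scatter means for a software decoder, here is the J-type (JAL) immediate reassembled from its fields, written against the RV32I encoding table (a sketch; `decode_jal_imm` is an illustrative name):

```c
#include <stdint.h>

/* Reassemble the scrambled J-type immediate:
 *   imm[20]    <- inst[31]   (always the sign bit)
 *   imm[19:12] <- inst[19:12]
 *   imm[11]    <- inst[20]
 *   imm[10:1]  <- inst[30:21]
 *   imm[0]     =  0          (jump targets are 2-byte aligned)
 */
int32_t decode_jal_imm(uint32_t inst)
{
    uint32_t imm = ((inst >> 31) & 0x1)   << 20
                 | ((inst >> 12) & 0xFF)  << 12
                 | ((inst >> 20) & 0x1)   << 11
                 | ((inst >> 21) & 0x3FF) << 1;
    return (int32_t)(imm << 11) >> 11;   /* sign-extend from bit 20 */
}
```

For example, 0x0000006F decodes to 0 (the classic `jal x0, 0` spin loop) and 0xFFDFF06F to -4.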
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #41 on: December 11, 2018, 02:44:40 am »
Quote
- each instruction modifies at most one register.

That is pretty standard but how then do you handle integer multiplies and divides?  Break them up into two instructions?

Yes, exactly this.

From the ISA spec (https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf):

Quote
If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[ S ]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies.
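In C the full product is just a widening multiply, and on an RV32IM target the compiler emits the recommended pair for it (a sketch; the comment shows the shape of the sequence, not verbatim compiler output):

```c
#include <stdint.h>

/* Full 64-bit product of two 32-bit signed operands. On RV32IM this
 * lowers to the recommended, fusable sequence:
 *   mulh a2, a0, a1    # high 32 bits of the product
 *   mul  a0, a0, a1    # low 32 bits of the product
 * Each instruction still writes exactly one register. */
int64_t full_product(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}
```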
« Last Edit: December 11, 2018, 02:49:30 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #42 on: December 11, 2018, 02:52:36 am »
Video #1 has been replaced today. I don't know what changed. I wonder if Western Digital will update the videos that show buggy code (at least with a text "oops.." overlay as they already did for a few things).
« Last Edit: December 11, 2018, 03:00:54 am by brucehoult »
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #43 on: December 11, 2018, 04:34:15 am »
Is there a preferred or recommended memory map for a RISC-V environment?

The ISA spec doesn't have much to say, apart from that the ISA is set up to be helpful for generating relocatable code. Is there a guide of "common/best practice" for where to put your memory mapped I/O, bootstrap ROMs, and so on?

For my software emulator I was thinking of trying something like the FE310-G000:

Quote
00000000:00000FFF Debug address space
00001000:01FFFFFF On-chip Non volatile memory
02000000:1FFFFFFF I/O
20000000:7FFFFFFF Off-chip Non volatile memory
80000000:FFFFFFFF On-chip volatile memory

And after a reset, execution starts at 0x00001000.

Does that sound sane?
« Last Edit: December 11, 2018, 04:56:31 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #44 on: December 11, 2018, 08:23:25 am »
Is there a preferred or recommended memory map for a RISC-V environment?

The ISA spec doesn't have much to say, apart from that the ISA is set up to be helpful for generating relocatable code. Is there a guide of "common/best practice" for where to put your memory mapped I/O, bootstrap ROMs, and so on?

For my software emulator I was thinking of trying something like the FE310-G000:

Quote
00000000:00000FFF Debug address space
00001000:01FFFFFF On-chip Non volatile memory
02000000:1FFFFFFF I/O
20000000:7FFFFFFF Off-chip Non volatile memory
80000000:FFFFFFFF On-chip volatile memory

And after a reset, execution starts at 0x00001000.

Does that sound sane?

Looks ok to me.

Neither the RISC-V user mode architecture nor the privileged mode architecture define a memory map. SiFive follows rocket-chip and I think other prior Berkeley practice with the memory map. I don't know whether other vendors do too.

You're supposed to read the configuration string to figure out where things are. The FE310 has a config string at 0x100C in the mask ROM (just after the reset vector) containing:

Code: [Select]
/cs-v1/;
/{
  model = \"SiFive,FE310G-0000-Z0\";
  compatible = \"sifive,fe300\";
  /include/ 0x20004;
};

And then 0x20004 is in the OTP.

For Linux, the Boot Loader creates a deviceTree somehow (for example from config string, or it could be a DTB on disk) and passes it to the kernel when it starts it.

The bottom 4 GB on the FU540 are very similar to the FE310 (including RAM at 0x8000_0000:0xFFFF_FFFF), and then above 4 GB you have:

Code: [Select]
0x01_0000_0000:0x0F_FFFF_FFFF Peripherals
0x10_0000_0000:0x1F_FFFF_FFFF System
0x20_0000_0000:0x3F_FFFF_FFFF RAM
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #45 on: December 11, 2018, 02:17:23 pm »
You probably can re-code it on MIPS one-to-one, except for LUI (if not followed by XORI or ADDI), which would require an extra instruction - very simple hardware emulator :)

Why does every instruction have "11" at the end? This way it only uses 1/4 of the code point space.
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #46 on: December 11, 2018, 09:01:39 pm »
You probably can re-code it on MIPS one-to-one, except for LUI (if not followed by XORI or ADDI), which would require an extra instruction - very simple hardware emulator :)

Why does every instruction have "11" at the end? This way it only uses 1/4 of the code point space.

This is just the RV32I (32-bit integer) instructions - the minimal set you need to support. On top of this are the mult/div extensions, the floating point, compressed instructions and so on.

It is encoded this way to make life easier (i.e. smaller, faster, simpler) for the instruction fetch/decode logic.

The ISA supports quite a few different instruction lengths.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 9902
  • Country: us
  • DavidH
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #47 on: December 11, 2018, 09:27:29 pm »
It would be an interesting experiment to implement this. And this is EXACTLY what RISC-V enables you to do for low cost in time and money. Modify your favourite FPGA implementation to have your new instructions, modify gcc or llvm to generate them, and run dhrystone/coremark/SPEC/your favourite benchmark suite with and without using the new instructions. Publish the results with execution time, energy use, area cost, and any effect on MHz. We all learn something!

It would be too big of a change to RISC-V.  It alters the basic ISA and architecture and then a new code generator would be required anyway.  It goes against the design principles of RISC-V.

Quote
Quote
Some ISAs do this to track whether a register has been used in the current execution context so that the entire register set does not need to be saved on a context switch.  The first use of a register is just another bit of state to save.

I haven't seen a bit for every register, but it's common for FPUs or vector units to have a single bit for the whole unit, as many programs don't use FP or vectors at all.

Back in June, Intel disclosed a "Lazy FPU State Restore" bug in all Core-based processors. Microsoft and others fixed the bug by disabling the use of the FPU dirty bit and just saving and restoring everything on every context switch. The effect on performance was basically unmeasurable.

That is what I was thinking of.  Intel of course managed to screw it up.  It had a limited effect on performance because of its limited applicability; task switching the vector instructions was already a performance problem.

Intel has a lot of performance problems with their vector units and so much so that they had to issue a directive not to use them for things like memory copies.

It would be more appropriate for a design intended to take advantage of it, such as the old ARM's load and store multiple.

Quote
Again, worth trying, though context switches are very rare on normal systems.

But subroutine calls are not.

Stack dumps would be marvelous in a bad way I think with a feature like this but I would want it anyway.
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #48 on: December 11, 2018, 10:03:10 pm »
It would be an interesting experiment to implement this. And this is EXACTLY what RISC-V enables you to do for low cost in time and money. Modify your favourite FPGA implementation to have your new instructions, modify gcc or llvm to generate them, and run dhrystone/coremark/SPEC/your favourite benchmark suite with and without using the new instructions. Publish the results with execution time, energy use, area cost, and any effect on MHz. We all learn something!

It would be too big of a change to RISC-V.  It alters the basic ISA and architecture and then a new code generator would be required anyway.  It goes against the design principles of RISC-V.

I sort of think that Bruce's use is fully in line with the design principles of RISC-V...
"RISC-V (pronounced “risk-five”) is a new instruction set architecture (ISA) that was originally
designed to support computer architecture research and education..."

"An ISA separated into a small base integer ISA, usable by itself as a base for customized
accelerators or for educational purposes, and optional standard extensions, to support general purpose
software development."
To me, a RISC-V RV32I core, with a custom hyperconverged-blockchain-crypto-quantum-shoelace-tying extension sounds perfectly in line with the design principles.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #49 on: December 11, 2018, 10:57:07 pm »
Intel has a lot of performance problems with their vector units and so much so that they had to issue a directive not to use them for things like memory copies.

I definitely need to read this. I'm one of those who have been using them for memory copying for 20 years or so, and over the years I always saw performance increases in my tests when I moved to the next bigger register size. Do you have any reference for the document?
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 9902
  • Country: us
  • DavidH
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #50 on: December 12, 2018, 02:40:52 am »
Intel has a lot of performance problems with their vector units and so much so that they had to issue a directive not to use them for things like memory copies.

I definitely need to read this. I'm one of those who have been using them for memory copying for 20 years or so, and over the years I always saw performance increases in my tests when I moved to the next bigger register size. Do you have any reference for the document?

The discussion was in the RWT (Real World Technologies) forums months ago.  The problem was library routines or periodically executed code which blindly uses AVX for string copies thereby triggering lower core clock rates when certain AVX instructions are used.  The result was lower scalar performance.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #51 on: December 12, 2018, 05:33:22 am »
At the outset here, I want to point out that my previous post was describing what characteristics are generally associated with the term "RISC", not whether they are individually or collectively good or bad.

I think the collective goodness or badness is sufficiently addressed by the simple fact that since 1980 there has been no (successful) new CISC instruction set, and the ones that have survived have done so by translating to a more RISCy form internally.

CISC doesn't even win for compactness of programs in the 32 bit and 64 bit eras. RISC architectures with a simple mix of 2-byte and 4-byte instructions do. VAX and x86 both allow instructions with lengths from 1 byte to 15 or 16 bytes in 1 byte increments. VAX allows even bigger in some extreme instructions, but let's stick to common things such as ADDL3. In the 64 bit era, x86's 15 byte limitation is by fiat -- the syntax easily allows longer instructions.

- strict separation of computation from data transfer (load/store)

On the other hand, allowing ALU instructions to have one memory operand acts as a type of instruction set compression, lowers register pressure, and seems to have little disadvantage when out-of-order execution allows long load-to-use latencies from cache.

Lowers register pressure yes. "Little disadvantage" with OoO: yes. But a significant disadvantage with simpler in-order implementations, such as microcontrollers.

Program compression: it seems logical, but is not borne out by empirical data.

It's logical that not having to mention a temporary register name (twice!) in a load and subsequent arithmetic instruction could make for smaller programs. But no one seems to have been able to achieve this in practice.

I think the problem is that in all of PDP11, VAX, 68k, x86, each operand that can potentially be in memory is accompanied by an addressing mode field, which is *wasted* in the more common case (if you have enough registers) when the operand is actually in a register. PDP11 and 68k pack into their 16 bit opcodes two 3 bit register fields plus two 3 bit addressing mode fields. VAX has a 4 bit register field plus a 4 bit addressing mode for each operand. x86 limits one operand to be a register but effectively wastes three bits out of 16 for reg-reg operations (1 bit in the opcode to select reg-mem or mem-reg, plus 2 bits in the modr/m byte).

If not for those often-wasted addressing mode fields, all of PDP11, 68k and x86 could perhaps have instead fit 3-address instructions into a 16 bit opcode (as Thumb does for add&sub), saving a lot of extra instructions to copy one operand to where the result is required first.

Quote
Quote
- enough registers that you don't touch memory much. Arguments for most functions fit in registers, and the return address too (the otherwise RISC AVR8 violates this).

But if the register set is too large, it means more state to save.  There are other solutions for this though.

Sure. It's possible to go overboard. Four registers doesn't seem to be enough. 128 is too many. Sixteen seems to be a good number if you have complex addressing modes (big immediates and offsets, register plus scaled register), and thirty two if you don't. Eight isn't really enough, even with complex addressing modes, but two sets of eight seems workable e.g. 8 address plus 8 data (68k) or 8 general purpose lo registers plus 8 more limited hi (Thumb). Or 8 directly addressable, plus 8 that need an extra prefix byte to address (x86_64).

Quote
Quote
- no instruction can cause more than one cache or TLB miss, or two adjacent lines/pages if unaligned access is supported (and this case might be trapped and emulated)

The lack of hardware support for unaligned access always seems to end up being a performance problem once a processor gets deployed into the real world.

Weak memory ordering which seems like it should be an advantage also becomes a liability.

Unaligned access, I agree.

Weak memory ordering ... I think languages, compilers, and programmers have got that under control now. It took a while. I don't think TSO scales well to 100+ cores .. or even 50. We'll really start to see this bite (or not) in the next five years.

Quote
Quote
- each instruction modifies at most one register.

That is pretty standard but how then do you handle integer multiplies and divides?  Break them up into two instructions?

Yes. The high part of multiplies and the remainder from divisions are very rarely needed. Better to define separate instructions for them. Higher-end processors can notice both are being calculated and combine them, if that's profitable.

Quote
Quote
- integer instructions read at most two registers. This is ultra-purist :-) A number of RISC ISAs break it in order to have e.g. a register plus (scaled) register addressing mode, or conditional select. But no more than three!

Internally it seems like this sort of thing and modifying more than one register should be broken up into separate micro-operations so that the register file has a lower number of read and write ports.  The alternative is having to decode more instructions which clog up the front end once an out-of-order implementation is desired.

Breaking complex instructions up into several microops for execution is a valid thing to do, especially on a lower end implementation. Intel obviously does this at large scale, but even ARM does it quite a lot, including in Aarch64.

Recognising adjacent simple operations and dynamically combining them into a more powerful single operation is also a valid thing to do. Modern x86 does this, for example, when there is a compare followed by a conditional branch. Future high end RISC-V CPUs are also expected to do this heavily.

The former puts a burden on to low end implementations. The latter keeps low end implementations as simple as possible, while putting a burden onto high end implementations, which can perhaps more easily afford it.

Quote
On the other hand, this means discarding the performance advantages of the FMA instruction.

I was explicitly talking about *integer* instructions. Most floating point code uses FMA very heavily (if -ffast-math is enabled) and IEEE 754-2008 mandates it. It's a big performance advantage to have three read ports on the FP register file, worth the expense. It would be wasted most of the time on the integer register file.

Quote
Quote
- no microcode or hardware sequencing. Each instruction executes in a small and fixed number of clock cycles (usually one). Load/Store multiple are the main offenders in both ARM and PowerPC. They help with code size, but it's interesting that ARM didn't put them in Aarch64 and is deprecating them in 32 bit as well, providing the much less offensive load/store pair.

Maybe more interesting is why ARM even included them in the first place.  Load and store multiple took advantage of fast-page-mode DRAM access when ARMs instruction pipeline was closely linked with DRAM access.

Yes. Modern caches achieve the same effect with individual stores.

Load/store multiple do make programs significantly smaller, especially in function prologue and epilogue.

RISC-V gcc has an option "-msave-restore" (this might get included in -Os later) that calls one of several special library routines as the first instruction in functions to create the stack frame and save the return address plus s0-s2, or ra&s0-s6 or ra&s0-s10 or ra&s0-s11. Function return is then done by tail-calling the corresponding return function.

This replaces anything up to 29 instructions (116 bytes with rv32i or rv64i, 58 bytes with the C extension) with two instructions (8 bytes with rv32i or rv64i, 6 bytes with the C extension though I have a plan to reduce that to 4). Of course the average case is a lot less than that -- most functions that create a stack frame at all use three or fewer callee-save registers, so you're usually replacing 5/7/9/11 by 2 instructions.

The time cost is three extra jump instructions, plus sometimes a couple of unneeded loads and stores because not every size is provided in order to keep the library size down.

In the current version, the total size of the save/restore library functions is 96 bytes with RVC enabled.

I made a couple of tests on a HiFive1 board with gcc 7.2.0.

Dhrystone w/rv32im, -msave-restore made the program 4 bytes smaller, and 5% slower (1.66 vs 1.58).

CoreMark w/rv32imac, -msave-restore made the program 252 bytes (0.35%) smaller, and 0.4% FASTER (2.676 vs 2.687, with +/- 0.001 variance on different runs).

I attribute the faster speed for CoreMark to greater effectiveness of the 16 KB instruction cache.

Both of these are very small programs of course (especially Dhrystone). Bigger programs will show more difference.

Quote
Should stack instructions be broken up as well?

Yes. This is done almost universally in RISC ISAs. A function starts by decrementing the stack pointer (once) and ends by incrementing it. Each register (or sometimes register pair) is saved and restored with an individual instruction -- which on a superscalar processor might run multiple per clock cycle.

The x86's push/pop instructions are very hard on OoO machinery, and recent ones have a special stack engine IN THE DECODER (keeping track of the stack pointer) that translates each push or pop in a series into a store or load at an offset from the stack pointer as it was at the start of the sequence.

Quote
Quote
What a huge number of instructions *does* do is make very small low end implementations impossible. And puts a big burden of work on every hardware and every emulator implementer.

I do not know about that.  Multiple physical designs covering a wide performance range are possible with the same ISA.  Microcode is convenient to handle seldom used instructions.  Vector operations can be broken up into instructions which fit the micro-architecture's ALU width while allowing support for the same vector instruction set across a wide range of implementations.

Or you can use instruction set extensions every time you want to support a different vector length.  How many FPU ISAs has ARM gone through now?

Seldom-used operations are handled just as easily by library routines as by microcode -- especially ones which are inherently slow. This is commonly done in every CPU family for floating point operations. Sometimes the compiler generates a binary with instructions replaced by library calls. Sometimes the instruction is generated but on execution traps into the operating system which then calls the appropriate library function.

Microcode made sense when ROM within the processor was faster than RAM, but since about 1980 SRAM has actually been faster than ROM. You could copy the microcode into SRAM at boot -- and even allow the user to write some custom microcode, as the VAX did. Or you can use that SRAM as an icache and create instruction sets that don't need microcode. Then real functions are just as fast to call as microcode ones.

When Dave Patterson (who invented the terms "RISC" and later "RAID" and is currently involved with RISC-V and Google's TPU) was on sabbatical at DEC he discovered that even on the VAX, using a series of simple instructions was faster than using the complex microcoded instructions such as CALLS and RET (which automatically saved and restored registers for you, according to a bitmask at the start of the function).

John Cocke at IBM independently discovered the same fact about the IBM 370 at about the same time (in fact a couple of years earlier).

I agree with you about vectors. ARM has gone through several vector/multimedia instruction sets, and Intel is up to at least number 4 (depending on what you count the different iterations of SSE as).

The RISC-V vector instruction set which is being finalised at the moment (I'm on the Working Group) -- and which has been in development and testing in various iterations for at least ten years (it's the main reason the RISC-V project was started in the first place) -- is vector length agnostic. The same code will run on hardware with any vector length.

ARM's new SVE is of course similar.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #52 on: December 12, 2018, 05:56:53 am »
You probably can re-code it on MIPS one-to-one, except for LUI (if not followed by XORI or ADDI), which would require an extra instruction - very simple hardware emulator :)

Sure, it's very very similar to MIPS.

MIPS has 16 bits for both LUI and for XORI/ADDI (and all other immediates and offsets), while RISC-V has 20 bits for LUI/AUIPC/JAL and 12 bits for everything else. This is the major thing that buys a lot of spare instruction encoding space in RISC-V compared to MIPS.

Quote
Why does every instruction have "11" at the end? This way it only uses 1/4 of the code point space.

Aaaand .. that's how a little of the extra instruction encoding space is spent :-)

Instructions with 11 in the LSBs are 32 bit instructions (30 bits available to be used).
Instructions with 00/01/10 in the LSBs are 16 bit instructions (49152 (48k) encodings available)

Instructions with 11111 in the LSBs are reserved for instructions longer than 32 bits. So actually there are only 939,524,096 possible 32 bit opcodes not 1,073,741,824.


You can compare this to Thumb2 where instructions with 111 in the MSBs of the first 16 bit packet are 32 bit instructions and all others are 16 bit instructions.
 

Offline obiwanjacobi

  • Frequent Contributor
  • **
  • Posts: 915
  • Country: nl
  • What's this yippee-yayoh pin you talk about!?
    • Marctronix Blog
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #53 on: December 12, 2018, 06:07:35 am »
I don't get why you would talk about RISC-V asm and then spend 6 episodes taking beginners by the hand.
I would rather see videos that assume a certain level - close to the material at hand - and provide resources on where to get up to speed.

Other than that it was a pretty nice series.

BTW, is debugging on PlatformIO free these days? Last I looked at it you needed a pro (paid) account...
Arduino Template Library | Zalt Z80 Computer
Wrong code should not compile!
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #54 on: December 12, 2018, 07:21:43 am »
It would be an interesting experiment to implement this. And this is EXACTLY what RISC-V enables you to do for low cost in time and money. Modify your favourite FPGA implementation to have your new instructions, modify gcc or llvm to generate them, and run dhrystone/coremark/SPEC/your favourite benchmark suite with and without using the new instructions. Publish the results with execution time, energy use, area cost, and any effect on MHz. We all learn something!

It would be too big of a change to RISC-V.  It alters the basic ISA and architecture and then a new code generator would be required anyway.  It goes against the design principles of RISC-V.

I disagree. Your exact suggestion is a little outside the scope of what SiFive is at present allowing for automated generation of CPU cores with custom instructions specified by customers: those will, at least at first, be restricted to "A = B op C" where op is specified by customer-supplied HDL. However, if you take the source code of an existing core it would be trivial to enlarge each register and data bus by 1 bit, and add new branch instructions based on that bit.

However I'd suggest another plan. Just create a new instruction "SETV a,b,c" that sets a to 1 if b+c has a signed overflow and to 0 otherwise. And/or create an instruction "BVS b,c,label" that branches to label if b+c overflows. And BVC if you want. Or TRAPV b,c, or ADDV a,b,c (that traps if b+c overflows). It's up to you. All of those would fit into existing instruction formats and pipelines no problem at all. Well .. the trapping ones would take a little more work. But they're all easier than your original suggestion.

I note that MIPS "ADD" and "ADDI" instructions trap on overflow, but virtually no one ever used them, using the "unsigned add" instructions even for signed values, and they are now deprecated.

Quote
Quote
Again, worth trying, though context switches are very rare on normal systems.

But subroutine calls are not.

But the vast majority of subroutine calls save and restore very few registers.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #55 on: December 12, 2018, 07:32:29 am »
BTW, is debugging on PlatformIO free these days? Last I looked at it you needed a pro (paid) account...

Yep, $10/month for pro, with a 30 day free trial.

I guess that's my biggest problem with the series. It could have used SiFive's free Eclipse-based "Freedom Studio", which of course works with the SiFive board.

The VS Code plus PlatformIO setup works very nicely, but .... yeah.
 
The following users thanked this post: obiwanjacobi

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 9902
  • Country: us
  • DavidH
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #56 on: December 12, 2018, 03:11:59 pm »
I disagree. Your exact suggestion is a little outside the scope of what SiFive is at present allowing for automated generation of CPU cores with custom instructions specified by customers: those will, at least at first, be restricted to "A = B op C" where op is specified by customer-supplied HDL. However, if you take the source code of an existing core it would be trivial to enlarge each register and data bus by 1 bit, and add new branch instructions based on that bit.

It is more than that because the flags are stored in a separate independently accessed register file so no additional ports need to be added to the regular register file.  This does not matter so much in low performance implementations but it is a performance limiting problem with superscalar designs.

Quote
However I'd suggest another plan. Just create a new instruction "SETV a,b,c" that sets a to 1 if b+c has a signed overflow and to 0 otherwise. And/or create an instruction "BVS b,c,label" that branches to label if b+c overflows. And BVC if you want. Or TRAPV b,c, or ADDV a,b,c (that traps if b+c overflows). It's up to you. All of those would fit into existing instruction formats and pipelines no problem at all. Well .. the trapping ones would take a little more work. But they're all easier than your original suggestion.

This gets back to the reason to store the flags in the first place.  It avoids having to execute the same ALU operation again which is a waste of power.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #57 on: December 12, 2018, 06:00:15 pm »
Sure. It's possible to go overboard. Four registers doesn't seem to be enough. 128 is too many. Sixteen seems to be a good number if you have complex addressing modes (big immediates and offsets, register plus scaled register), and thirty two if you don't. Eight isn't really enough, even with complex addressing modes ...

So it seems. x86 in 32-bit mode has 8 registers, x64 has 16. Should be a huge improvement, right? But in practice, there's none. If you compile the same program for 32-bit and for 64-bit and run it on the same computer, there's no increase in speed whatsoever.
« Last Edit: December 12, 2018, 06:04:50 pm by NorthGuy »
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #58 on: December 12, 2018, 08:25:08 pm »
Video #1 has been replaced today. I don't know what changed. I wonder if Western Digital will update the videos that show buggy code (at least with a text "oops.." overlay as they already did for a few things).

I watched the new version and no changes jumped out at me.  It is just an introduction with very little technical content other than describing the tools and 3 books.  Still, it's a pretty good introduction!

 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #59 on: December 12, 2018, 08:26:22 pm »
Sure. It's possible to go overboard. Four registers doesn't seem to be enough. 128 is too many. Sixteen seems to be a good number if you have complex addressing modes (big immediates and offsets, register plus scaled register), and thirty two if you don't. Eight isn't really enough, even with complex addressing modes ...

So it seems. x86 in 32-bit mode has 8 registers, x64 has 16. Should be a huge improvement, right? But in practice, there's none. If you compile the same program for 32-bit and for 64-bit and run it on the same computer, there's no increase in speed whatsoever.

Is this because the code generator doesn't even bother with the extra registers?
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #60 on: December 12, 2018, 09:20:24 pm »
Is this because the code generator doesn't even bother with the extra registers?

I think it does. It has to because the calling conventions are different.

It would be a good experiment to run benchmarks on RISC-V with different number of registers and see how the performance depends on the number of registers. Either GCC or other compilers may allow this.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #61 on: December 13, 2018, 01:43:15 am »
Quote
However I'd suggest another plan. Just create a new instruction "SETV a,b,c" that sets a to 1 if b+c has a signed overflow and to 0 otherwise. And/or create an instruction "BVS b,c,label" that branches to label if b+c overflows. And BVC if you want. Or TRAPV b,c, or ADDV a,b,c (that traps if b+c overflows). It's up to you. All of those would fit into existing instruction formats and pipelines no problem at all. Well .. the trapping ones would take a little more work. But they're all easier than your original suggestion.

This gets back to the reason to store the flags in the first place.  It avoids having to execute the same ALU operation again which is a waste of power.

Where and when are you going to use this new oVerflow flag, and how often?

99.9999% of when it gets used on an x86 or ARM is because you don't have a direct "branch to label if A < B, signed" like on RISC-V but only a "CMP" instruction which does a subtraction and sets the flags but throws away the result. And then the conditional branch needs to reconstruct what happened -- and only the conditional branch knows whether you wanted a signed or unsigned comparison.

Given that you have "branch if A < B, signed", writing an overflow flag that's never going to be looked at will be a bigger waste of energy than computing an addition twice instead of once in the one instruction in a million that actually checks for overflow.

What is your use-case where this overflow flag is so critical?
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 9902
  • Country: us
  • DavidH
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #62 on: December 13, 2018, 01:51:02 am »
Where and when are you going to use this new oVerflow flag, and how often?

99.9999% of when it gets used on an x86 or ARM is because you don't have a direct "branch to label if A < B, signed" like on RISC-V but only a "CMP" instruction which does a subtraction and sets the flags but throws away the result. And then the conditional branch needs to reconstruct what happened -- and only the conditional branch knows whether you wanted a signed or unsigned comparison.

Given that you have "branch if A < B, signed", writing an overflow flag that's never going to be looked at from a million instructions will be a bigger waste of energy than computing an addition twice instead of once, one instruction in a million.

What is your use-case where this overflow flag is so critical?

Multiword math.
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #63 on: December 13, 2018, 02:06:49 am »
Quote
The simple fact that since 1980 there has been no (successful) new CISC instruction set
What about the MSP430?

So ... what prompts the development of a new RISC instruction set, anyway?
You'd think that by the time things were "reduced" enough, there wouldn't be all that much room for innovation or improvement. Do you learn from mistakes in other vendors' instruction sets? (I've got to say that the more I look at it, the more unpleasant I find the ARM v6m instruction set (CM0: Thumb-16 only).) Do advances in hardware (what's "standard" in an FPGA, for instance, or the growing popularity of QSPI memory) or SW issues (security) drive things?
I guess RISC-V is somewhat motivated by wanting to provide an open-source instruction set.  But is that all?

 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #64 on: December 13, 2018, 02:20:11 am »
So it seems. x86 in 32-bit mode has 8 registers, x64 has 16. Should be a huge improvement, right? But in practice, there's none. If you compile the same program for 32-bit and for 64-bit and run it on the same computer, there's no increase in speed whatsoever.

There are several differences between 32 bit code and 64 bit code, all of which will have an effect:

- 16 registers instead of 8. It does in fact make it faster for non-trivial programs :-)

- pointers are 64 bit instead of 32 bit. This makes data structures bigger and programs slower. Except when your program won't fit into 4 GB (or 3 GB or whatever) and so won't run at *all*.

- calling convention is to use registers for arguments, not stack. Does in fact make the program faster.

- arithmetic is 64 bit instead of 32 bit. Has no effect if your program only uses 32 bit variables, a significant effect if you use 64 bit variables.

It's pretty hard to separate out and test these factors individually. The Linux kernel and gcc support the `-mx32` flag that uses 64 bit registers, 16 registers, and arguments in registers, but uses only 32 bit pointers. It runs faster than standard 64 bit code, and a LOT faster than i686.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #65 on: December 13, 2018, 02:25:07 am »
Is this because the code generator doesn't even bother with the extra registers?

I think it does. It has to because the calling conventions are different.

It would be a good experiment to run benchmarks on RISC-V with different number of registers and see how the performance depends on the number of registers. Either GCC or other compilers may allow this.

They do. "rv32e" is a standard alternative for very small embedded systems that is the same as "rv32i" but only has 16 registers. gcc supports it.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #66 on: December 13, 2018, 02:27:48 am »
Where and when are you going to use this new oVerflow flag, and how often?

99.9999% of the time it gets used on an x86 or ARM it's because you don't have a direct "branch to label if A < B, signed" like on RISC-V, but only a "CMP" instruction which does a subtraction and sets the flags but throws away the result. And then the conditional branch needs to reconstruct what happened -- and only the conditional branch knows whether you wanted a signed or unsigned comparison.

Given that you have "branch if A < B, signed", updating an overflow flag on a million instructions when it's never going to be looked at will be a bigger waste of energy than computing an addition twice instead of once on the one instruction in a million that needs it.

What is your use-case where this overflow flag is so critical?

Multiword math.

Multiword math uses carry, not overflow. Carry can be detected by simply testing if the result is less than one of the arguments (it doesn't matter which one), and either branch on that (using BLTU) or set a register to the value of the carry (using SLTU). Pick your poison.

It's also extremely rare and not a performance influencer outside of specialised domains.
« Last Edit: December 13, 2018, 02:32:20 am by brucehoult »
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #67 on: December 13, 2018, 02:52:21 am »
IP checksum can make use of a carry flag: a one's complement add does an end-around carry, and you can fold that into a series of "add with carry" on the widest word your CPU implements, to achieve an improvement over the methods that are (now) "traditionally" used in C.
(eg twice the speed and half the size on AVR: https://github.com/WestfW/Duino-hacks/blob/master/ipchecksum_test/ipchecksum_test.ino )
I'm not sure I'd call that "common" enough to justify a carry flag bit, except that ... today's CPUs tend to do an awful lot of IP checksumming!
(also: limited by memory speed fetching the data to-be-checksummed, so probably irrelevant on faster RISC chips.  It takes a little getting used to when a valid answer to "but it takes more instructions" is "so what?")
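For reference, the "traditional" portable-C baseline being compared against looks roughly like this -- a hedged sketch of the usual RFC 1071 style fold, not the AVR add-with-carry version from the link above (the function name is illustrative):

```c
#include <stddef.h>
#include <stdint.h>

// One's-complement checksum over 16-bit words, the way it's (now)
// "traditionally" done in portable C: accumulate into a wider register,
// then fold the end-around carries back in at the end.
uint16_t ip_checksum(const uint16_t *words, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    while (sum >> 16)                        // fold end-around carries
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;                   // one's complement of the sum
}
```

The add-with-carry trick replaces the widening accumulate plus final fold with a chain of ADC instructions, which is where the 2x speed/size win on AVR comes from.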
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #68 on: December 13, 2018, 03:49:33 am »
(also: limited by memory speed fetching the data to-be-checksummed, so probably irrelevant on faster RISC chips.  It takes a little getting used to when a valid answer to "but it takes more instructions" is "so what?")

Precisely.

While you were posting that, I did a little experiment and whipped up a quick and dirty multi-word add in C.

Code: [Select]
typedef unsigned int uint;

void bignumAdd(uint *res, uint *a, uint *b, int len){
    uint carry = 0;
    for (int i=0; i<len; ++i){
        uint t = a[i] + b[i];
        uint carryOut = t < a[i];
        t += carry;
        res[i] = t;
        carry = (t<carry) | carryOut;
    }
}

Here's the code for 32 bit RISC-V with gcc -O2

Code: [Select]
00000000 <bignumAdd>:
   0:   02d05763                blez    a3,2e <.L1>
   4:   068a                    slli    a3,a3,0x2
   6:   00d588b3                add     a7,a1,a3
   a:   4781                    li      a5,0

0000000c <.L3>:
   c:   4198                    lw      a4,0(a1)
   e:   4214                    lw      a3,0(a2)
  10:   0591                    addi    a1,a1,4
  12:   0611                    addi    a2,a2,4
  14:   96ba                    add     a3,a3,a4
  16:   00f68833                add     a6,a3,a5
  1a:   01052023                sw      a6,0(a0)
  1e:   00e6b733                sltu    a4,a3,a4
  22:   00f837b3                sltu    a5,a6,a5
  26:   8fd9                    or      a5,a5,a4
  28:   0511                    addi    a0,a0,4
  2a:   feb891e3                bne     a7,a1,c <.L3>

0000002e <.L1>:
  2e:   8082                    ret

Here's for Thumb2:

Code: [Select]
00000000 <bignumAdd>:
   0:   2b00            cmp     r3, #0
   2:   dd19            ble.n   38 <bignumAdd+0x38>
   4:   3a04            subs    r2, #4
   6:   eb01 0383       add.w   r3, r1, r3, lsl #2
   a:   3804            subs    r0, #4
   c:   b4f0            push    {r4, r5, r6, r7}
   e:   2500            movs    r5, #0

  10:   f851 4b04       ldr.w   r4, [r1], #4
  14:   2700            movs    r7, #0
  16:   f852 6f04       ldr.w   r6, [r2, #4]!
  1a:   19a4            adds    r4, r4, r6
  1c:   bf28            it      cs
  1e:   2701            movcs   r7, #1
  20:   1964            adds    r4, r4, r5
  22:   f840 4f04       str.w   r4, [r0, #4]!
  26:   bf2c            ite     cs
  28:   2401            movcs   r4, #1
  2a:   2400            movcc   r4, #0
  2c:   428b            cmp     r3, r1
  2e:   ea47 0504       orr.w   r5, r7, r4
  32:   d1ed            bne.n   10 <bignumAdd+0x10>

  34:   bcf0            pop     {r4, r5, r6, r7}
  36:   4770            bx      lr
  38:   4770            bx      lr
  3a:   bf00            nop

48 bytes for RISC-V, with 12 instructions in the loop.
58 bytes for ARM, with 14 instructions in the loop. (I'm not counting the nop for alignment)

It seems pretty obvious you could improve the ARM one by hand coding in assembly language, but few people are going to do that -- they just use gcc and take what they get.

The RISC-V one can't be improved by hand coding assembly language. I like that. Who doesn't want to get the best results possible just by coding in C?

By the way, I swear I wrote the C code at the start, compiled it for both, and did not adjust the C in any way.
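Since the same C feeds every compiler here, a quick sanity check of the carry chain doesn't hurt (function reproduced verbatim so the snippet stands alone; the test values are illustrative):

```c
typedef unsigned int uint;

// bignumAdd exactly as above: the carry out of each limb is detected
// with the (t < a[i]) unsigned-compare idiom, which is what compiles
// down to SLTU on RISC-V.
void bignumAdd(uint *res, uint *a, uint *b, int len){
    uint carry = 0;
    for (int i=0; i<len; ++i){
        uint t = a[i] + b[i];
        uint carryOut = t < a[i];
        t += carry;
        res[i] = t;
        carry = (t<carry) | carryOut;
    }
}
```

Adding 1 to a two-limb 0xFFFFFFFF_FFFFFFFF should ripple the carry through both limbs and leave {0, 0}.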

Let's try 64 bit ARM!

Code: [Select]
0000000000000000 <bignumAdd>:
   0:   7100007f        cmp     w3, #0x0
   4:   5400020d        b.le    44 <bignumAdd+0x44>
   8:   d2800004        mov     x4, #0x0                        // #0
   c:   52800005        mov     w5, #0x0                        // #0

  10:   b8647828        ldr     w8, [x1, x4, lsl #2]
  14:   b8647846        ldr     w6, [x2, x4, lsl #2]
  18:   0b060106        add     w6, w8, w6
  1c:   0b0500c7        add     w7, w6, w5
  20:   6b06011f        cmp     w8, w6
  24:   1a9f97e6        cset    w6, hi  // hi = pmore
  28:   b8247807        str     w7, [x0, x4, lsl #2]
  2c:   6b0500ff        cmp     w7, w5
  30:   91000484        add     x4, x4, #0x1
  34:   1a9f27e5        cset    w5, cc  // cc = lo, ul, last
  38:   6b04007f        cmp     w3, w4
  3c:   2a0500c5        orr     w5, w6, w5
  40:   54fffe8c        b.gt    10 <bignumAdd+0x10>

  44:   d65f03c0        ret

So that's 72 bytes of code (by far the biggest) and 13 instructions in the loop (one more than RISC-V, one less than Thumb2).

Maybe x86_64?

Code: [Select]
0000000000000000 <bignumAdd>:
   0:   85 c9                   test   %ecx,%ecx
   2:   7e 39                   jle    3d <bignumAdd+0x3d>
   4:   8d 41 ff                lea    -0x1(%rcx),%eax
   7:   45 31 c0                xor    %r8d,%r8d
   a:   31 c9                   xor    %ecx,%ecx
   c:   4c 8d 14 85 04 00 00    lea    0x4(,%rax,4),%r10
  13:   00
  14:   0f 1f 40 00             nopl   0x0(%rax)

  18:   45 31 c9                xor    %r9d,%r9d
  1b:   8b 04 0a                mov    (%rdx,%rcx,1),%eax
  1e:   03 04 0e                add    (%rsi,%rcx,1),%eax
  21:   72 1c                   jb     3f <bignumAdd+0x3f>
  23:   44 01 c0                add    %r8d,%eax
  26:   41 0f 92 c0             setb   %r8b
  2a:   89 04 0f                mov    %eax,(%rdi,%rcx,1)
  2d:   48 83 c1 04             add    $0x4,%rcx
  31:   45 0f b6 c0             movzbl %r8b,%r8d
  35:   45 09 c8                or     %r9d,%r8d
  38:   49 39 ca                cmp    %rcx,%r10
  3b:   75 db                   jne    18 <bignumAdd+0x18>

  3d:   f3 c3                   repz retq

  3f:   41 b9 01 00 00 00       mov    $0x1,%r9d
  45:   eb dc                   jmp    23 <bignumAdd+0x23>

71 bytes of code, 12 instructions in the loop but 2 more outside (the compiler introduces conditional branches where there were none).

How about 32 bit x86?

Code: [Select]
00000000 <bignumAdd>:
   0:   55                      push   %ebp
   1:   57                      push   %edi
   2:   56                      push   %esi
   3:   53                      push   %ebx
   4:   8b 44 24 20             mov    0x20(%esp),%eax
   8:   8b 7c 24 14             mov    0x14(%esp),%edi
   c:   8b 5c 24 18             mov    0x18(%esp),%ebx
  10:   8b 6c 24 1c             mov    0x1c(%esp),%ebp
  14:   85 c0                   test   %eax,%eax
  16:   7e 29                   jle    41 <bignumAdd+0x41>
  18:   31 c9                   xor    %ecx,%ecx
  1a:   31 d2                   xor    %edx,%edx
  1c:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi

  20:   31 f6                   xor    %esi,%esi
  22:   8b 44 8d 00             mov    0x0(%ebp,%ecx,4),%eax
  26:   03 04 8b                add    (%ebx,%ecx,4),%eax
  29:   72 1b                   jb     46 <bignumAdd+0x46>
  2b:   01 d0                   add    %edx,%eax
  2d:   0f 92 c2                setb   %dl
  30:   89 04 8f                mov    %eax,(%edi,%ecx,4)
  33:   83 c1 01                add    $0x1,%ecx
  36:   0f b6 d2                movzbl %dl,%edx
  39:   09 f2                   or     %esi,%edx
  3b:   39 4c 24 20             cmp    %ecx,0x20(%esp)
  3f:   75 df                   jne    20 <bignumAdd+0x20>

  41:   5b                      pop    %ebx
  42:   5e                      pop    %esi
  43:   5f                      pop    %edi
  44:   5d                      pop    %ebp
  45:   c3                      ret   
  46:   be 01 00 00 00          mov    $0x1,%esi
  4b:   eb de                   jmp    2b <bignumAdd+0x2b>

77 bytes of code, and the same 12-14 instructions in the loop.

Motorola 68000?

Code: [Select]
00000000 <bignumAdd>:
   0:   4e56 0000       linkw %fp,#0
   4:   48e7 3030       moveml %d2-%d3/%a2-%a3,%sp@-
   8:   262e 0014       movel %fp@(20),%d3
   c:   6f36            bles 44 <bignumAdd+0x44>
   e:   206e 000c       moveal %fp@(12),%a0
  12:   266e 0010       moveal %fp@(16),%a3
  16:   246e 0008       moveal %fp@(8),%a2
  1a:   e58b            lsll #2,%d3
  1c:   d688            addl %a0,%d3
  1e:   4281            clrl %d1

  20:   2418            movel %a0@+,%d2
  22:   2002            movel %d2,%d0
  24:   d09b            addl %a3@+,%d0
  26:   2240            moveal %d0,%a1
  28:   d3c1            addal %d1,%a1
  2a:   24c9            movel %a1,%a2@+
  2c:   b082            cmpl %d2,%d0
  2e:   55c0            scs %d0
  30:   4400            negb %d0
  32:   b289            cmpl %a1,%d1
  34:   52c1            shi %d1
  36:   4401            negb %d1
  38:   8200            orb %d0,%d1
  3a:   0281 0000 00ff  andil #255,%d1
  40:   b1c3            cmpal %d3,%a0
  42:   66dc            bnes 20 <bignumAdd+0x20>

  44:   4cdf 0c0c       moveml %sp@+,%d2-%d3/%a2-%a3
  48:   4e5e            unlk %fp
  4a:   4e75            rts

76 bytes of code, 16 instructions in the loop.

msp430?

Code: [Select]
00000000 <bignumAdd>:
   0:   0b 12           push    r11             
   2:   0a 12           push    r10             
   4:   09 12           push    r9             
   6:   08 12           push    r8             
   8:   07 12           push    r7             
   a:   06 12           push    r6             
   c:   1c 93           cmp     #1,     r12     ;r3 As==01
   e:   1f 38           jl      $+64            ;abs 0x4e
  10:   29 4e           mov     @r14,   r9     
  12:   28 4d           mov     @r13,   r8     
  14:   0c 5c           rla     r12             
  16:   0b 43           clr     r11             
  18:   0a 43           clr     r10             
  1a:   06 3c           jmp     $+14            ;abs 0x28

  1c:   09 4e           mov     r14,    r9     
  1e:   09 5b           add     r11,    r9     
  20:   29 49           mov     @r9,    r9     
  22:   08 4d           mov     r13,    r8     
  24:   08 5b           add     r11,    r8     
  26:   28 48           mov     @r8,    r8     
  28:   08 59           add     r9,     r8     
  2a:   07 48           mov     r8,     r7     
  2c:   07 5a           add     r10,    r7     
  2e:   06 4f           mov     r15,    r6     
  30:   06 5b           add     r11,    r6     
  32:   86 47 00 00     mov     r7,     0(r6)   ;0x0000(r6)
  36:   16 43           mov     #1,     r6      ;r3 As==01
  38:   07 9a           cmp     r10,    r7     
  3a:   01 28           jnc     $+4             ;abs 0x3e
  3c:   06 43           clr     r6             
  3e:   1a 43           mov     #1,     r10     ;r3 As==01
  40:   08 99           cmp     r9,     r8     
  42:   01 28           jnc     $+4             ;abs 0x46
  44:   0a 43           clr     r10             
  46:   0a d6           bis     r6,     r10     
  48:   2b 53           incd    r11             
  4a:   0b 9c           cmp     r12,    r11     
  4c:   e7 23           jnz     $-48            ;abs 0x1c

  4e:   36 41           pop     r6             
  50:   37 41           pop     r7             
  52:   38 41           pop     r8             
  54:   39 41           pop     r9             
  56:   3a 41           pop     r10             
  58:   3b 41           pop     r11
  5a:   30 41           ret                     

92 bytes of code, and 24 instructions in the loop.

sh4:

Code: [Select]
00000000 <bignumAdd>:
   0:   15 47           cmp/pl  r7
   2:   13 8b           bf      2c <bignumAdd+0x2c>
   4:   73 63           mov     r7,r3
   6:   08 43           shll2   r3
   8:   fc 73           add     #-4,r3
   a:   09 43           shlr2   r3
   c:   00 e1           mov     #0,r1
   e:   01 73           add     #1,r3

  10:   56 60           mov.l   @r5+,r0
  12:   66 62           mov.l   @r6+,r2
  14:   0c 32           add     r0,r2
  16:   23 67           mov     r2,r7
  18:   26 30           cmp/hi  r2,r0
  1a:   1c 37           add     r1,r7
  1c:   29 02           movt    r2
  1e:   76 31           cmp/hi  r7,r1
  20:   29 01           movt    r1
  22:   72 24           mov.l   r7,@r4
  24:   10 43           dt      r3
  26:   04 74           add     #4,r4
  28:   f2 8f           bf.s    10 <bignumAdd+0x10>

  2a:   2b 21           or      r2,r1
  2c:   0b 00           rts     
  2e:   09 00           nop     

46 bytes of code and 13 instructions in the loop. I like it!

To summarise .. bytes of code and instructions in the loop for all

48 12 RISC-V
58 14 Thumb2
72 13 Aarch64
71 14 x86_64
77 14 i386
76 16 m68k
92 24 msp430
46 13 sh4

sh4 and RISC-V the clear winners, Thumb2 not too far behind.

x86_64, aarch64, m68k, i386 in a close bunch

msp430 bringing up the rear by some distance.

Once again, this is just one arbitrary silly example, made (I think) somewhat fair by using gcc for everything, as most people do. You could improve a lot of these by hand coding, at least for code size -- maybe or maybe not for execution speed.

I *like* a processor where you can do everything in C.

Anyone want to show off their hand coding?
« Last Edit: December 13, 2018, 03:52:45 am by brucehoult »
 
The following users thanked this post: oPossum

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #69 on: December 13, 2018, 04:38:44 am »
I *like* a processor where you can do everything in C.
So do I. How do I do startup code and interrupt handlers in C with RISC-V?
Alex
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #70 on: December 13, 2018, 04:50:06 am »
Quote
The simple fact that since 1980 there has been no (successful) new CISC instruction set

What about the MSP430?

Yes, it's a little bit CISCy with memory-to-memory moves and adds. It falls into the same PDP11-influenced design space as the M68000.

Based on the example I just tried, the code isn't very compact! At least as generated from C by gcc.

Quote
So ... what prompts the development of a new RISC instruction set, anyway?
You'd think that by the time things were "reduced" enough, there wouldn't be all that much room for innovation or improvement.   Do you learn from mistakes in other vendors' instruction sets? (I've got to say that the more I look at it, the more unpleasant I find the ARM v6m instruction set. (CM0: thumb-16 only))  Do advances in hardware (what's "standard" in an FPGA, for instance, or the growing popularity of QSPI memory) or SW issues (security) drive things?
I guess RISC-V is somewhat motivated by wanting to provide an open-source instruction set.  But is that all?

RISC-V was motivated originally by wanting something as simple as possible (while still being effective) for Berkeley students to use to:

- learn assembly language programming (undergrad)
- design and implement their own processor core (masters?)
- design and implement experimental instruction set extensions (PhD)

Their previous experience was mostly with MIPS, but:

- only 32 bit could be used freely
- it's got annoying warts
- there is very little spare opcode space to put experimental extensions in .. and no easy way to go variable-length and have longer instructions.

ARM and x86 were considered as bases, but:

- x86 was far too complex for students to implement, and legally completely impossible to get permission to use, especially if you wanted to distribute results of your work.

- ARM also very complex, 32 bit only (at that time), mostly impossible to get permission to use, and again then not publishable.

OpenRISC and LEON were also considered, but again the things you could use were 32-bit only, and they also lacked spare opcode space.

I've looked closely at OpenRISC and it's very nicely done, as an instruction set pitched at one point in time, one fixed set of capabilities.

RISC-V is intended as a simple base that is sufficient for standard software (Linux or similar kernel and applications (Windows or MacOS/iOS would be fine too), compiled from C/C++ and similar languages) and that base is fixed and software built for it will work forever. But there is room and mechanism to add any number of future extensions targeting as yet unthought of application areas.

Whether that will be successful or not, I don't know, but anyway it's trying :-)

At the very least, it's not going to disappear without trace when the company that owns it goes out of business or loses interest. That's a serious problem. How much software has been lost as a result of the demise of the PDP11, VAX, Alpha, Nova, Eclipse, PA-RISC, Itanium (Kittson is the last generation ever), SPARC (Oracle has cancelled future development, though Fujitsu is soldiering on for the moment)?

Linux and the GNU toolchain and environment help to keep software alive over changes in instruction set. RISC-V means it may not be necessary to have another instruction set in future, unless and until computers change fundamentally from the way they've been since at least the 1960s. The Linux kernel maintainers even speculated that RISC-V and C-SKY might be the last hardware ISAs ever added to Linux, as everyone not using ARM/x86/POWER is switching to RISC-V for future chips -- including C-SKY and Andes (nds32).

https://www.phoronix.com/scan.php?page=news_item&px=C-SKY-Approved-Last-Arch
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #71 on: December 13, 2018, 05:07:20 am »
I *like* a processor where you can do everything in C.
So do I. How do I do startup code and interrupt handlers in C with RISC-V?

All algorithms found in user applications, I mean.

Operating systems and compiler runtime libraries are always going to have a little bit of assembler in them -- at least as long as there exist CSRs that are not memory-mapped.

However that's out of scope for both the C language and the RISC-V Unprivileged ISA (which is the thing that is portable and hopefully forever).

Interrupt handlers are no problem. You can either use hardware vectoring and add __attribute__((interrupt)) to your C function (in which case it will save all the registers it uses), or else, with the CLIC (which is up for ratification as a RISC-V standard), you can use absolutely standard C functions along with a small firmware software-vectoring function that can live in mask ROM (making it little different to microcode). When using the upcoming EABI instead of the Linux ABI, the latency to get to a standard C function is very similar to that on ARM Cortex-M.
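A hedged sketch of the first option -- hardware vectoring plus the interrupt attribute -- with the memory-mapped timer register simulated so the snippet is self-contained (real MTIMECMP addresses are SoC-specific, and the tick count is illustrative):

```c
#include <stdint.h>

// On RISC-V gcc, __attribute__((interrupt)) makes the compiler save and
// restore every register the handler touches and return with MRET, so
// the handler body is ordinary C. On other hosts this demo builds it as
// a plain function so the logic can still be exercised.
#if defined(__riscv)
#define ISR __attribute__((interrupt))
#else
#define ISR
#endif

// Stand-in for the memory-mapped MTIMECMP register; the real address is
// SoC-specific and NOT part of the portable ISA.
static volatile uint64_t fake_mtimecmp;
#define MTIMECMP (&fake_mtimecmp)

enum { TICKS_PER_INTERRUPT = 32768 };  // e.g. one second at 32.768 kHz

ISR void timer_isr(void) {
    // Advancing MTIMECMP schedules the next tick and clears the pending
    // timer interrupt; no assembly needed in the handler itself.
    *MTIMECMP += TICKS_PER_INTERRUPT;
}
```

The point is that nothing in the handler body is machine-dependent; only the vector setup (and the real register address) is.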

Details at https://github.com/sifive/clic-spec/blob/master/clic.adoc#c-abi-trampoline-code

I notice your company's management have said they are 100% behind your subsidiary's move to put RISC-V into their FPGAs.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #72 on: December 13, 2018, 05:09:34 am »
Operating systems and compiler runtime libraries are always going to have a little bit of assembler in them -- at least as long as there exist CSRs that are not memory-mapped.
That's really a sign of a bad design in 2018.

I notice your company's management have said they are 100% behind your subsidiary's move to put RISC-V into their FPGAs.
So?

Current RISC-V implementations are heavily slanted towards MPUs. Which is fine, but has to be acknowledged. Trying to shove the same design in an MCU will not go over well.
« Last Edit: December 13, 2018, 05:11:34 am by ataradov »
Alex
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #73 on: December 13, 2018, 05:35:47 am »
Anyone want to show off their hand coding?

Why not. dsPIC33:

Code: [Select]
      dec w0,w0
      add #0,w0 ; clear carry
      do w0,loop
      mov [w1++],w4
loop: addc w4,[w2++],[w3++]

5 instructions, 15 bytes, n+3 instruction cycles (where n is the number of bytes)
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #74 on: December 13, 2018, 05:38:52 am »
They do. "rv32e" is a standard alternative for very small embedded systems that is the same as "rv32i" but only has 16 registers. gcc supports it.

Were there any performance comparisons between these two?
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #75 on: December 13, 2018, 06:26:35 am »
Anyone want to show off their hand coding?

Of course, Intel is no slouch either.

32-bit:

Code: [Select]
      sub eax,eax

loop: mov ecx,[esi+eax]
      adc ecx,[ebx+eax]
      mov [edi+eax],ecx
      lea eax,[eax+4]
      loop loop

6 instructions, 16 bytes, probably below 1 cycle per byte.

64-bit:

Code: [Select]
      sub eax,eax

loop: mov rcx,[rsi+rax]
      adc rcx,[rbx+rax]
      mov [rdi+rax],rcx
      lea rax,[rax+8]
      loop loop

This requires 4 REX bytes, so it's 20 bytes total, but that's where the 64-bitness helps. It'll probably run at about 0.4 bytes per cycle.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #76 on: December 13, 2018, 07:00:45 am »
Operating systems and compiler runtime libraries are always going to have a little bit of assembler in them -- at least as long as there exist CSRs that are not memory-mapped.
That's really a sign of a bad design in 2018.

I disagree.

There are very good reasons that you don't want settings that control the deep modes of operation of a processor to be alterable by a store instruction whose address can be arbitrarily computed. Things such as enabling or disabling MMUs, setting the base address for an interrupt vector, or changing the register width and instruction set between 32 bits and 64 bits should all be recognised as special instructions early in the pipeline, with the "name" of the affected register/setting hardcoded in the instruction -- not something that is discovered many cycles later, possibly only after waiting hundreds of cycles for a memory load or other operation. You really don't want later instructions to have already been executed out of order -- possibly with an entirely different instruction set or opcode encoding.

Maybe you can get away with that in a single issue in-order tiny microcontroller -- and people implementing those are *welcome* to memory map everything they want -- but in that case you're stepping outside what is standard and can be depended on across *all* RISC-V implementations.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #77 on: December 13, 2018, 07:07:46 am »
Anyone want to show off their hand coding?

Why not. dsPIC33:

Code: [Select]
      dec w0,w0
      add #0,w0 ; clear carry
      do w0,loop
      mov [w1++],w4
loop: addc w4,[w2++],[w3++]

5 instructions, 15 bytes, n+3 instruction cycles (where n is the number of bytes)

Very nice.

How does gcc do on it?

I'm actually very disappointed that manufacturers of machines with condition codes don't seem to have added recognition of the C idiom for the carry flag, generating "add with carry" from it. gcc on every machine does recognise idioms for things such as rotates, and generates rotate instructions even though C doesn't have an operator for them. Or maybe I just didn't find the correct idiom? Can anyone assist with that?
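One partial answer (an intrinsic rather than idiom recognition, so it only half counts): gcc and clang provide __builtin_add_overflow, which backends for flag machines can lower to add-with-carry sequences. A sketch of the bignumAdd example from earlier rewritten that way (the function name is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

// Multiword add expressed with __builtin_add_overflow (gcc >= 5, clang).
// This tells the compiler exactly where the carries are, instead of
// hoping it recognises the (t < a[i]) unsigned-compare idiom.
void bignumAddBuiltin(uint32_t *res, const uint32_t *a,
                      const uint32_t *b, int len) {
    bool carry = false;
    for (int i = 0; i < len; i++) {
        uint32_t t;
        // For unsigned types, "overflow" here means wraparound, i.e. carry out.
        bool c1 = __builtin_add_overflow(a[i], b[i], &t);
        bool c2 = __builtin_add_overflow(t, (uint32_t)carry, &res[i]);
        carry = c1 | c2;
    }
}
```

Whether a given backend actually emits ADC from this varies by target and compiler version, so it's a hint, not a guarantee.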
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #78 on: December 13, 2018, 07:13:25 am »
Anyone want to show off their hand coding?

Of course, Intel is no slouch either.

32-bit:

Code: [Select]
      sub eax,eax

loop: mov ecx,[esi+eax]
      adc ecx,[ebx+eax]
      mov [edi+eax],ecx
      lea eax,[eax+4]
      loop loop

6 instructions, 16 bytes, probably below 1 cycle per byte.

64-bit:

Code: [Select]
      sub eax,eax

loop: mov rcx,[rsi+rax]
      adc rcx,[rbx+rax]
      mov [rdi+rax],rcx
      lea rax,[rax+8]
      loop loop

This requires 4 REX bytes, so it's 20 bytes total, but that's where the 64-bitness helps. It'll probably run at about 0.4 bytes per cycle.

I'm afraid I don't understand how those work.

I thought "loop" decrements cx/ecx/rcx and loops if it's not zero. But you're using cx as a temporary to hold the result of the adc.

Also, where is the "ret", even if nothing else is needed?
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #79 on: December 13, 2018, 08:08:53 am »
MIPS doesn't have any Control and Status Register, and it's fine this way  :D
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #80 on: December 13, 2018, 08:22:23 am »
MIPS doesn't have any Control and Status Register, and it's fine this way  :D

Sure it does. They live in Coprocessor #0 and are accessed using special  MTC0 and MFC0 instructions.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #81 on: December 13, 2018, 08:24:11 am »
Multiword math.

What do you need exactly? 128bit math? and for what?
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #82 on: December 13, 2018, 08:25:27 am »
MIPS doesn't have any Control and Status Register, and it's fine this way  :D

Sure it does. They live in Coprocessor #0 and are accessed using special  MTC0 and MFC0 instructions.

exactly: CP0 is not the CPU, it's a coprocessor, thus it's "external" to the ISA  :D
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #83 on: December 13, 2018, 08:50:26 am »
MIPS doesn't have any Control and Status Register, and it's fine this way  :D

Sure it does. They live in Coprocessor #0 and are accessed using special  MTC0 and MFC0 instructions.

exactly: CP0 is not the CPU, it's a coprocessor, thus it's "external" to the ISA  :D

"Coprocessor 0 (also known as the CP0 or system control coprocessor) is a required coprocessor part of the MIPS32 and MIPS64 ISA which provides the facilities needed for an operating system."

It's exactly equivalent to the RISC-V CSRs and dedicated instructions distinct from memory load/store to move values to and from the CSRs.

And you're right .. this stuff exists outside the normal portable standardised "User" ISA. @ataradov appears to be unhappy that someone -- the operating system or at least runtime library writer -- should have to write a few lines of machine-dependent assembly language to set this stuff up. It only has to be done once per SoC (at most) and Joe Average applications programmer doesn't have to know it exists.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #84 on: December 13, 2018, 08:53:43 am »
@ataradov appears to be unhappy that someone -- the operating system or at least runtime library writer -- should have to write a few lines of machine-dependent assembly language to set this stuff up.
No, I'm unhappy with the unnecessary proliferation of ways to access the hardware. This just makes things harder for no real benefit.
Alex
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #85 on: December 13, 2018, 09:00:48 am »
PA-RISC

On DTB we are still supporting Linux/HPPA-v2; everything else can R.I.P. and nobody (I can assure you, nobody) cares, except ... those who still have business with HP-UX 11i v2, but frankly that's just a matter of how much money they invested in it. I am specifically talking about avionics now.

If you paid 50K euro for a license (e.g. for VectorCAST & Co.) and got a binary for HPPA, you still need to run it on HPPA until you find someone (in the circle of those who make the decisions = managers) who is willing to pay for the new version. But that's all there is to it, and in the meantime VectorCAST & Co. have been ported to Intel x86  :D

The same applies to software designed for SGI/MIPS.  I have a couple of friends who love using Autodesk software on IRIX. They do video editing, but ... they use this software simply because they want to play the retro-collector game. Autodesk 2008 is 10 years obsolete, and far from modern video-editing standards.

Besides, a modern PC consumes less electricity than - say, an SGI Tezro or SGI Origin - and produces better results.

In short, there is no regret.  :D

We support HPPAv2 for two specific reasons: -1- the hardware is very cheap, and -2- the PCI on HPPA doesn't come with all the shit that IBM put in the BIOS in order to support ancient video cards. ISA cards??? Yes, there is still support for them in modern BIOSes, and this makes everything hell when you want to develop your own PCI FPGA card.

My HP C3600 comes with a neat BIOS, there is no legacy shit, and it's cleaner than what you find on an Apple PowerMac, where the OpenFirmware is a mess. Look at the Linux source for the PCI.

About SGI MIPS we only support IP30: this machine runs Linux, and it's the only MIPS4 machine available since modern MIPS are all MIPS32 or MIPS64. Besides, IP30 comes with a crossbar matrix, and this is interesting.

There are no other good reasons to regret old RISC machines. Except the manual (and I am saying THE manual) of the 88K (the "eighty-eight-K", by Motorola), which is super marvelous!  ;D
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #86 on: December 13, 2018, 09:12:34 am »
Quote from: brucehoult
"Coprocessor 0 (also known as the CP0 or system control coprocessor) is a required coprocessor part of the MIPS32 and MIPS64 ISA which provides the facilities needed for an operating system."

The Cop0 is not covered by the ISA; it's *theoretically* optional. In fact, SPIM doesn't have it(1), and neither does MARS, and you can implement a MIPS R2K without the Cop0, and it works.

Of course, SPIM, MARS and such simplified CPUs don't handle interrupts, thus they are useless for any practical application, except giving students a laboratory based on software simulators.

Yes, SPIM and MARS are widely used in universities. I used them a lot during my Erasmus at Oxford (2007-2008).

But that's not the point. The point is: keep the CSR stuff *OUT* of the ISA - do not implement any specific instructions like "move to/from CSR"; let a Cop handle it.

In m68k ... it was a mess when the 68010 redefined MOVE from SR as a privileged instruction, while on the 68000 it was not privileged, and this caused a lot of trouble on Amiga computers.



(1) SPIM can be compiled with or without experimental Cop0 support.
« Last Edit: December 13, 2018, 09:24:11 am by legacy »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #87 on: December 13, 2018, 09:14:58 am »
There are no other good reasons to regret old RISC machines. Except the manual (and I am saying THE manual) of the 88K (the "eighty-eight-K", by Motorola), which is super marvelous!  ;D

88K is a very nice ISA, and pleasingly obsessive about sticking to the 2-read-1-write integer instruction model. There are a couple of features in it that I'm gently trying to steal for future RISC-V extensions, mostly in the "Bit Manipulation" working group.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #88 on: December 13, 2018, 02:56:34 pm »
I'm afraid I don't understand how those work.

Good catch, we'll use DX instead of CX then.

32-bit:

Code: [Select]
      sub eax,eax

loop: mov edx,[esi+eax]
      adc edx,[ebx+eax]
      mov [edi+eax],edx
      lea eax,[eax+4]
      loop loop

64-bit:

Code: [Select]
      sub eax,eax

loop: mov rdx,[rsi+rax]
      adc rdx,[rbx+rax]
      mov [rdi+rax],rdx
      lea rax,[rax+8]
      loop loop

Surprisingly, it doesn't change the byte count.

Also, where is the "ret", even if nothing else is needed?

It's inlined. In assembler, you do not have to follow calling conventions :)

And if you really worry about byte count, you can add more CISC'iness and reduce byte count to 10, although it'll be slower.

Code: [Select]
      clc

loop: lodsd
      adc eax,[ebx]
      lea ebx,[ebx+4]
      stosd
      loop loop

 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #89 on: December 13, 2018, 03:15:23 pm »
Why not. dsPIC33:

Very nice.

How does gcc do on it?

As everywhere. Works hard but doesn't produce any magic.

C does very well on MIPS (and RISC-V too, of course), but this is not because C produces something magical, but because the instruction set is such that a human cannot improve much on what the C compiler has done.

I'm actually very disappointed that manufacturers of machines with condition codes don't seem to have added recognition of the C idiom for the carry flag and generated "add with carry" from it. gcc on every machine does recognise idioms for things such as rotate and generate rotate instructions even though C doesn't have an operator for it. Or maybe I just didn't find the correct idiom? Can anyone assist with that?

There are many places like that, such as rotations. In such cases C is more difficult to code in than assembler.

I look at C as a tool to convert text to assembler code. Sometimes it works well (for long expressions, for example). Sometimes it doesn't (as with long additions). But that's OK. You can write a few lines in assembler.

If you work with wood, you can do nice long cuts with a table saw, but there are small things which can only be done with a chisel. Should we re-design the table saw so that it can do everything? Probably not. Same with C.
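For concreteness, here is the carry idiom the thread keeps circling back to, written out as a minimal multiword-add sketch (the function name and limb layout are illustrative, not from any post above). This is the pattern one would hope a compiler could map onto a chain of add-with-carry instructions:

```c
#include <assert.h>
#include <stdint.h>

/* Multiword addition using the portable C carry idiom:
   after s = x + y (mod 2^32), a carry out occurred iff s < x. */
static void add_multiword(uint32_t *dst, const uint32_t *a,
                          const uint32_t *b, int n)
{
    uint32_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint32_t s = a[i] + carry;
        uint32_t c1 = s < carry;     /* carry from adding the incoming carry */
        dst[i] = s + b[i];
        carry = c1 | (dst[i] < s);   /* carry out of this limb */
    }
}
```

With a chisel-grade `adc`, the whole loop body collapses to one instruction per limb; in portable C it takes the comparisons above.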
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #90 on: December 13, 2018, 08:35:06 pm »
LOL - "risc-v-will-stop-hackers-dead-from-getting-into-your-computer" - said someone in this article on hackaday

 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #91 on: December 13, 2018, 09:00:11 pm »
LOL - "risc-v-will-stop-hackers-dead-from-getting-into-your-computer" - said someone in this article on hackaday

Up there with "Linux doesn't need antivirus".... :-)
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #92 on: December 13, 2018, 09:27:52 pm »
My little software RISC-V emulator seems to be alive! It has churned through 3,047 instructions of a HiFive 'blink' binary. Maybe a couple of evenings' play to get it this far - you couldn't do that with x86... the actual RISC-V code is < 800 lines.

(If I don't point it out in advance, somebody will point me at https://github.com/adriancable/8086tiny, which has an 8086 in 760 lines - but that's not a 32-bit CPU)

I had to build dummy hardware for the "AON" (Always On) Peripheral, and the "PRCI" (used for clocking control) Peripheral, and it gets as far as attempting to configure the QSPI interface on address 0x10014000.

I am sure somebody is about to ask "but why when there are so many already?"... I'm doing it because it is easier to build an understanding and verify what should be happening in hardware, rather than writing and simulating HDL, and then learning that I really didn't understand what should be happening in the 1st place.

I want to get the emulator C code to the point it models the HDL pipeline, so I can verify against it that things are as expected. Maybe I could even use it for HLS  :-//
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: brucehoult

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #93 on: December 13, 2018, 09:33:59 pm »
You did in two days?  :o :o :o
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #94 on: December 13, 2018, 09:56:58 pm »
You did in two days?  :o :o :o
... of spare time between the boy going to bed, and me going to bed.

It's not much to look at:

A lookup table to decode opcode:
Code: [Select]
struct opcode_entry {
  char *spec;
  int (*func)(void);
  uint32_t value;
  uint32_t mask;
} opcodes[] = {
   {"-------------------------0010111", op_auipc},
   {"-------------------------0110111", op_lui},
   {"-------------------------1101111", op_jal},
   {"-----------------000-----1100111", op_jalr},

   {"-----------------000-----1100011", op_beq},
   {"-----------------001-----1100011", op_bne},
   {"-----------------100-----1100011", op_blt},
   {"-----------------101-----1100011", op_bge},
   {"-----------------110-----1100011", op_bltu},
   {"-----------------111-----1100011", op_bgeu},

   {"-----------------000-----0000011", op_lb},
   {"-----------------001-----0000011", op_lh},
   {"-----------------010-----0000011", op_lw},
   {"-----------------100-----0000011", op_lbu},
   {"-----------------101-----0000011", op_lhu},

   {"-----------------000-----0100011", op_sb},
   {"-----------------001-----0100011", op_sh},
   {"-----------------010-----0100011", op_sw},


   {"-----------------000-----0010011", op_addi},
   {"-----------------010-----0010011", op_slti},
   {"-----------------011-----0010011", op_sltiu},
   {"-----------------100-----0010011", op_xori},
   {"-----------------110-----0010011", op_ori},
   {"-----------------111-----0010011", op_andi},
   {"0000000----------001-----0010011", op_slli},
   {"0000000----------101-----0010011", op_srli},
   {"0100000----------101-----0010011", op_srai},

   {"0000000----------000-----0110011", op_add},
   {"0100000----------000-----0110011", op_sub},
   {"0000000----------001-----0110011", op_sll},
   {"0000000----------010-----0110011", op_slt},
   {"0000000----------011-----0110011", op_sltu},
   {"0000000----------100-----0110011", op_xor},
   {"0000000----------101-----0110011", op_srl},
   {"0100000----------101-----0110011", op_sra},
   {"0000000----------110-----0110011", op_or},
   {"0000000----------111-----0110011", op_and},

   {"0000--------00000000000000001111", op_fence},
   {"00000000000000000001000000001111", op_fence_i},

   {"00000000000000000000000001110011", op_ecall},
   {"00000000000100000000000001110011", op_ebreak},

   {"-----------------001-----1110011", op_csrrw},
   {"-----------------010-----1110011", op_csrrs},
   {"-----------------011-----1110011", op_csrrc},
   {"-----------------101-----1110011", op_csrrwi},
   {"-----------------110-----1110011", op_csrrsi},
   {"-----------------111-----1110011", op_csrrci},

   {"--------------------------------", op_unknown}  // Catches all the others
};
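The `spec` strings above presumably get turned into the `value`/`mask` pairs at startup; that step isn't shown, so here is one plausible sketch of it (`compile_spec` is a hypothetical name): a '0' or '1' contributes to both mask and value, a '-' is a don't-care, and the first character is bit 31.

```c
#include <assert.h>
#include <stdint.h>

/* Compile a 32-character spec string into a value/mask pair.
   Matching is then (instr & mask) == value. */
static void compile_spec(const char *spec, uint32_t *value, uint32_t *mask)
{
    *value = 0;
    *mask = 0;
    for (int i = 0; i < 32; i++) {
        uint32_t bit = 1u << (31 - i);    /* spec[0] is instruction bit 31 */
        if (spec[i] == '0' || spec[i] == '1') {
            *mask |= bit;                 /* this bit participates in matching */
            if (spec[i] == '1')
                *value |= bit;
        }
    }
}
```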

A function to break the instruction into fields (the ugly bit):
Code: [Select]
/****************************************************************************/
static void decode(uint32_t instr) {
  int32_t broffset_12_12, broffset_11_11, broffset_10_05, broffset_04_01;
  int32_t jmpoffset_20_20, jmpoffset_19_12, jmpoffset_11_11, jmpoffset_10_01;
  rs1     = (instr >> 15) & 0x1f ;
  rs2     = (instr >> 20) & 0x1F;
  rd      = (instr >> 7)  & 0x1f;
  csrid   = (instr >> 20);
  uimm    = (instr >> 15) & 0x1f;
  shamt   = (instr >> 20) & 0x1f;
  upper20 = instr & 0xFFFFF000;
  imm12   = ((int32_t)instr) >> 20;

  jmpoffset_20_20 = (int32_t)(instr & 0x80000000)>>11;
  jmpoffset_19_12 = (instr & 0x000FF000);
  jmpoffset_11_11 = (instr & 0x00100000) >>  9;
  jmpoffset_10_01 = (instr & 0x7FE00000) >> 20;
  jmpoffset = jmpoffset_20_20 | jmpoffset_19_12 | jmpoffset_11_11 | jmpoffset_10_01;

  broffset_12_12 = (int)(instr & 0x80000000) >> 19;
  broffset_11_11 = (instr & 0x00000080) << 4;
  broffset_10_05 = (instr & 0x7E000000) >> 20;
  broffset_04_01 = (instr & 0x00000F00) >> 7;
  broffset = broffset_12_12 | broffset_11_11 | broffset_10_05 | broffset_04_01;

  imm12wr   =  instr; /* Note - becomes signed */
  imm12wr  >>= 20;
  imm12wr  &= 0xFFFFFFE0; /* keep the sign extension from the shift */
  imm12wr  |= (instr >> 7)  & 0x1f;
  current_instr = instr;
}
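As a sanity check on the bit shuffling in `decode()`, the J-type immediate reassembly can be pulled out into a standalone function and tested against hand-encoded `jal` words (imm[20] sits in instruction bit 31, imm[10:1] in bits 30:21, imm[11] in bit 20, imm[19:12] in bits 19:12):

```c
#include <assert.h>
#include <stdint.h>

/* Standalone copy of the J-immediate extraction used in decode() above. */
static int32_t jimm(uint32_t instr)
{
    int32_t off = (int32_t)(instr & 0x80000000) >> 11; /* imm[20], sign-extended */
    off |= instr & 0x000FF000;                         /* imm[19:12] */
    off |= (instr & 0x00100000) >> 9;                  /* imm[11] */
    off |= (instr & 0x7FE00000) >> 20;                 /* imm[10:1] */
    return off;
}
```

`0x0080006F` is `jal x0, .+8` and `0xFFDFF06F` is `jal x0, .-4`, which make handy spot checks.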

And some small functions to actually execute the instructions:
Code: [Select]
/****************************************************************************/
static int op_beq(void) {     trace("BEQ\tr%i, r%i, %i", rs1, rs2, broffset);
  if(regs[rs1] == regs[rs2]) {
    pc += broffset;
  } else {
    pc += 4;
  }
  return 1;
}

...and then the code to actually run an instruction:

Code: [Select]
/****************************************************************************/
static int do_op(void) {
  uint32_t instr;
  int i;
  if((pc & 3) != 0) {
    display_log("Attempt to execute unaligned code");
    return 0;
  }

  /* Fetch */
  if(!memorymap_read(pc,4, &instr)) {
    display_log("Unable to fetch instruction");
    return 0;
  }
  /* Decode */
  decode(instr);
 
  /* Execute */
  for(i = 0; i < sizeof(opcodes)/sizeof(struct opcode_entry); i++) {
     if((instr & opcodes[i].mask) == opcodes[i].value) {
       return opcodes[i].func();
     }
  }
  return 0;
}
« Last Edit: December 13, 2018, 10:09:27 pm by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #95 on: December 13, 2018, 10:30:51 pm »
My little software RISC-V emulator seems to be alive! It has churned through 3,047 instructions of a HiFive 'blink' binary. Maybe a couple of evenings' play to get it this far - you couldn't do that with x86... the actual RISC-V code is < 800 lines.


Your intention, at the moment, seems to be to emulate all of the instructions.  This seems like a great approach because instruction execution needs to be understood before even thinking about the other issues.

You mention coding the pipeline.  Is that where you are headed?  If so, I hope you'll publish your code as you go along, either here or on your web site.

Instruction execution seems easy to code, the pipeline will be a lot more complex (I think...).
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #96 on: December 13, 2018, 11:26:36 pm »
Instruction execution seems easy to code, the pipeline will be a lot more complex (I think...).
For the level I am aiming at I don't think it will be too complex. All that is needed is a way to indicate if the instructions in the pipeline are no longer valid because the program counter was updated rather than incremented.

Unaligned memory accesses (which will need two cycles to execute) will add a bit of complexity though, as they will stall the pipeline rather than requiring that it gets flushed.

Humm....
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #97 on: December 13, 2018, 11:56:47 pm »
Instruction execution seems easy to code, the pipeline will be a lot more complex (I think...).
For the level I am aiming at I don't think it will be too complex. All that is needed is a way to indicate if the instructions in the pipeline are no longer valid because the program counter was updated rather than incremented.

Unaligned memory accesses (which will need two cycles to execute) will add a bit of complexity though, as they will stall the pipeline rather than requiring that it gets flushed.

Humm....

I was wondering if you planned to implement the various registers that save state through the pipeline.  I am interested in detecting and overcoming hazards.

What I really need is a reference book for the RISC-V that covers all the hardware details.  Not just at 10,000 feet up but right down in the dirt.  Something I can convert from text to HDL or, better yet, maybe the HDL is given.

Are there any such references?
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #98 on: December 14, 2018, 12:32:18 am »
What I really need is a reference book for the RISC-V that covers all the hardware details.  Not just at 10,000 feet up but right down in the dirt.  Something I can convert from text to HDL or, better yet, maybe the HDL is given.

Are there any such references?

I think that this is the key to the RISC-V ethos - it is just the ISA specification. What you do with it is up to you.

As long as your hardware runs the RISC-V RV32I (+ whatever extensions) you don't have to worry too much about the software tooling.

RISC-V isn't a hardware specification - it is a specification of the interface between the software layer and the digital logic layer. If you build a CPU that implements RISC-V, you have a ready-made software layer.

And if you build software that targets RISC-V, you have ready-made hardware implementations.

So that is why I am using the SiFive HiFive's FE310 as a reference for my hacks:

- I have the real hardware on my desk https://www.sifive.com/boards/hifive1 (one of the early team signature edition boards, no less!), to clear up any of my misunderstandings

- the GCC RISC-V toolchain is all there, installed on a Linux VM that I used when playing with the HiFive

- Everything about the chip is well-documented at https://www.sifive.com/chip-designer#fe310

So now all I need to do is run 'objcopy' to convert the ELF image to a binary file, then load it into my emulator's ROM, and I am ready to debug :).

« Last Edit: December 14, 2018, 12:34:00 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: brucehoult

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #99 on: December 14, 2018, 12:35:54 am »
Unaligned memory accesses (which will need two cycles to execute) will a bit of complexity though, as it will stall the pipeline rather than requiring that it gets flushed.

Since this is on FPGA, an easy solution to this is using multi-port BRAM, which can fetch two consecutive words at a time (for example one port reads address (a&~3) while the next one reads address ((a&~3)+4)). Concatenating them together you get 8 bytes starting at (a&~3). Using a 4:1 mux controlled by (a&3), you can then read any unaligned 32-bit word. The mux will add some delay, but since you're going to have bigger delays elsewhere (such as in the ALU), this probably doesn't matter. Of course, it'll only work if you use BRAM as your memory.
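The two-port fetch plus byte mux can be modelled in C for the emulator. A minimal sketch, assuming little-endian byte order (as on RISC-V) and a word-addressed memory array standing in for the BRAM:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the dual-port BRAM trick: fetch the two aligned words that
   straddle the address, concatenate them into 8 bytes, and mux out the
   32-bit value selected by the low two address bits. */
static uint32_t read_unaligned(const uint32_t *mem, uint32_t addr)
{
    uint32_t lo = mem[addr >> 2];        /* port A: word at addr & ~3 */
    uint32_t hi = mem[(addr >> 2) + 1];  /* port B: word at (addr & ~3) + 4 */
    uint32_t shift = (addr & 3) * 8;     /* the 4:1 byte mux */
    if (shift == 0)
        return lo;                       /* avoid the undefined hi << 32 */
    return (lo >> shift) | (hi << (32 - shift));
}
```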
 
The following users thanked this post: hamster_nz

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #100 on: December 14, 2018, 01:06:59 am »
Quote
[MSP430 is] a little bit CISCy with memory-to-memory moves and adds. It falls into the PDP11-design M68000 space.

Based on the example I just tried, the code isn't very compact! At least as generated from C by gcc.
Variable instruction length and execution time.  Definitely CISCy.  Although of "elegantly minimal" form rather than the "we're going to implement cobol in microcode" form.  With twice the registers of a PDP11 and half the 68k, I think it qualifies as different enough to be "new."  And "relatively" successful.
The MSP430 code gcc produced for your example is depressingly bad.  It fails to refactor the array access into pointer-based accesses, dutifully incrementing the index and adding it to each array base on each loop, when it could have used auto-incrementing indexed addressing, I think.  (I thought that was an optimization that gcc would do before even getting to cpu-specific code generation.  I guess not.)

Quote
I'm actually very disappointed that manufacturers of machines with condition codes don't seem to have added recognition of the C idiom for the carry flag and generated "add with carry" from it. gcc on every machine does recognise idioms for things such as rotate and generate rotate instructions
Adding a "rotate" is relatively easy because it's a single instruction.  Supporting Carry means retaining awareness of state that isn't part of the C model of how things work.  "Which carry were they talking about?"  For example, the really short examples that people are posting are all based on having "loop" instructions that don't change the carry bit.  We saw how that restricts register choice on x86.  MSP430 doesn't have any such looping instructions (that I recall or see in summaries.)  So the compiler would have to decide that some math is different than other math, and ... it makes my brain hurt just thinking about it.  (ARM has the "S" suffix for instructions to specify that they should update the flags, which is cute, I guess.  But I'm not sure it's worth spending a bit on (and indeed, it's not there in Thumb-16)

Quote
Quote
Operating systems and compiler runtime libraries are always going to have a little bit of assembler in them...
   
That's really a sign of a bad design in 2018.
I don't know.  In some senses, having actual assembler modules seems "cleaner" than some of the things that compilers get forced into these days.  (Consider the whole "sfr |= bitmask;" optimization in AVR...)


Quote
[RISCV is] not going to disappear without trace when the company that owns it goes out of business or loses interest. That's a serious problem ... How much software has been lost as a result of the demise of PDP11, VAX, Alpha, Nova, Eclipse, PA-RISC, [etc]
PDP10.  "loses interest."  Sigh.

 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #101 on: December 14, 2018, 01:30:28 am »
I don't know.  In some senses, having actual assembler modules seems "cleaner" than some of the things that compilers get forced into these days.  (Consider the whole "sfr |= bitmask;" optimization in AVR...)

Memory mapped registers give you cleaner code. The reason AVR is hard is that it has limited address space. This is not a problem on 32-bit systems.

Essentially what I'm asking for is SCB on Cortex-M devices. It is still standard and defined by the architecture specification, so all vendors have to implement it to get a compliant core.

Special registers result in weird code where you move stuff to/from general purpose registers. And I don't see how this is any better from an implementation point of view. If you are writing the value into the special register, you still have to wait until the register fetch stage. And at that point you know the target address of a store operation and can break the pipeline if necessary.
Alex
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #102 on: December 14, 2018, 02:44:51 am »
Quote
seems "cleaner"
Well, for instance, since "rotate" has been mentioned...
It's nice the compile can be made smart enough to see:

Code: [Select]
  ((x << n) | (x >> (opsize - n)));

and perhaps generate a "rotate left" instruction.  But I'd really rather a rotl(x,n); statement that I KNOW generates the appropriate assembly.

Or, in the case of ARM, it's nice that the ABI and the hardware agree on which registers get saved, so that ISR functions and normal C functions are indistinguishable.   I guess.  Other times I wish the ISRs in C code were more easily distinguishable, and that the HW interrupt entry was quicker...
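The rotate idiom quoted above has a wrinkle: for n == 0 the `x >> (opsize - n)` term is undefined behaviour in C. A `rotl` written with masked shift counts sidesteps that, and gcc and clang are generally still able to recognise the pattern and emit a single rotate where the target has one (a sketch, not a standard library function):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n (0..31) without undefined behaviour at n == 0:
   masking the shift counts keeps both shifts in the 0..31 range. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << (n & 31)) | (x >> ((32 - n) & 31));
}
```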
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #103 on: December 14, 2018, 02:52:50 am »
But I'd really rather a rotl(x,n); statement that I KNOW generates the appropriate assembly.
Me too. But that's a question to the compiler/standard library creators. Such things can either be defined as part of the standard (no way it realistically will happen for C) or as part of the library in a form of intrinsics. Intrinsics are easier, but they simply reflect instruction set with all its limitations on types of arguments.

Or, in the case of ARM, it's nice that the ABI and the hardware agree on which registers get saved, so that ISR functions and normal C functions are indistinguishable.   I guess.  Other times I wish the ISRs in C code were more easily distinguishable, and that the HW interrupt entry was quicker...
That's a matter for future improvement.  I'll take ARM's system any day of the week over what we had before. Now, to stay competitive, they need to make it better. For example, have a register in the NVIC that defines a bit mask of registers to save/restore: your choice to stay with the default and be compatible with the ABI, or do something manually.

RISC-V still has to catch up to what ARM has in this respect. And unfortunately I see no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs.
Alex
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #104 on: December 14, 2018, 03:20:22 am »
RISC-V still has to catch up to what ARM has in this respect. An unfortunately I see no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs.

I don't think ARM is somehow targeted at MCUs. Compared to ARM, RISC-V seems cleaner and better, and it is also free. There's no reason to choose ARM over RISC-V.

With MCUs, a big problem is that instructions are fetched from flash. Flash fetching is slow, so you can only fetch so many instructions per unit of time. A natural way to improve performance is to make your instructions wider, so that every single instruction can do more, which is CISC, totally different from what you see in either ARM or RISC-V. However, such an approach doesn't seem to be very popular. Everybody wants ARM. Perhaps, 5 years from now everybody will want RISC-V, which is definitely a good thing.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #105 on: December 14, 2018, 03:23:24 am »
I don't think ARM is somehow targeted to MCUs.
And what is Cortex-M0+ then?

There's no reason to choose ARM over RISC-V.
What is the interrupt latency on the RISC-V?

Perhaps, 5 years from now everybody will want RISC-V, which is definitely a good thing.
Quite likely, but not without effort on RISC-V part.
Alex
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #106 on: December 14, 2018, 03:56:59 am »
You did in two days?  :o :o :o
... of spare time between the boy going to bed, and me going to bed.

It's not much to look at:

Good work!

One of my current work tasks is helping extend binutils (assembler, disassembler) and Spike to understand the proposed Vector instruction set. Very similar stuff. And then on to llvm...
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #107 on: December 14, 2018, 04:00:19 am »
I don't think ARM is somehow targeted to MCUs.
And what is Cortex-M0+ then?

An existing architecture used for a purpose which wasn't intended during original design.

There's no reason to choose ARM over RISC-V.
What is the interrupt latency on the RISC-V?

You don't know that. This is an ISA, not a microarchitecture. You can design your MCU with very low interrupt latency, or you can design an MCU with a long pipeline and bad interrupt latency. That is actually where the benefit is: anyone can design their own CPU with the design characteristics they want, and all of them can use the same ISA. Such things were completely impossible with ARM because the core was proprietary and you had to live with what they gave you.


 

Offline lucazader

  • Regular Contributor
  • *
  • Posts: 119
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #108 on: December 14, 2018, 04:25:34 am »
Continuing from NorthGuy's comments about interrupt latency being implementation independent:

If you look at SiFive's design for the E20 core, which is targeted at the same level as the M0+ core by the looks of their marketing material, the interrupt latency is 6 cycles to a C handler, whereas the M0+'s is 15 cycles.
https://www.sifive.com/cores/e20

Now, sure, there is a note there that this is when using the CLIC vectored mode, but the M0+ also has a vectored interrupt controller.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #109 on: December 14, 2018, 04:27:39 am »
the interrupt latency is 6-cycle to a c handler whereas an M0+ is 15-cycles.
Except that a Cortex-M saves registers in those 15 cycles, and RISC-V only arrives at the register-saving code after those 6.

EDIT: I don't see how it can do 6 cycles to the C code. Maybe someone can point out what it is doing in those 6 cycles?
Alex
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #110 on: December 14, 2018, 04:37:10 am »
I had to build dummy hardware for the "AON" (Always On) peripheral and the "PRCI" (used for clocking control) peripheral, and it gets as far as attempting to configure the QSPI interface at address 0x10014000.

You'd have an easier time making a generic binary rather than a HiFive1 one, and using stdin/stdout emulation.

Code: [Select]
$ cd freedom-e-sdk/software/hello
$ riscv64-unknown-elf-gcc -O -march=rv32i -mabi=ilp32 hello.c -o hello

objdump that and you'll find main calling puts calling _puts_r calling {strlen, __sinit, __sfvwrite_r}.
Eventually you'll find yourself (after all the bollocks in Newlib) down in _write()

Code: [Select]
00012cd0 <_write>:
   12cd0:       ff010113                addi    sp,sp,-16
   12cd4:       00112623                sw      ra,12(sp)
   12cd8:       00812423                sw      s0,8(sp)
   12cdc:       00000693                li      a3,0
   12ce0:       00000713                li      a4,0
   12ce4:       00000793                li      a5,0
   12ce8:       04000893                li      a7,64
   12cec:       00000073                ecall
   12cf0:       00050413                mv      s0,a0
   12cf4:       00055a63                bgez    a0,12d08 <_write+0x38>
   12cf8:       40800433                neg     s0,s0
   12cfc:       08c000ef                jal     ra,12d88 <__errno>
   12d00:       00852023                sw      s0,0(a0)
   12d04:       fff00413                li      s0,-1
   12d08:       00040513                mv      a0,s0
   12d0c:       00c12083                lw      ra,12(sp)
   12d10:       00812403                lw      s0,8(sp)
   12d14:       01010113                addi    sp,sp,16
   12d18:       00008067                ret

You need to have your emulator implement ecall, and do the right thing based on the code in a7:

57: _close
62: _lseek
63: _read
64: _write
80: _fstat
93: _exit
214: _sbrk

Except for _fstat (which needs a struct remapped), most of those are easy to just pass directly on to your host OS.
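As a minimal sketch of that dispatch in C (the Cpu struct, the flat guest-memory model, and the handle_ecall name are my own assumptions for illustration, not from any particular emulator):

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical emulator state: x[10..17] are a0..a7, mem is guest RAM
   mapped flat into a host buffer (a real emulator would translate and
   bounds-check guest addresses). */
typedef struct { uint32_t x[32]; uint8_t *mem; } Cpu;

/* Dispatch an ecall: syscall number in a7, args in a0..a2, result to a0. */
static void handle_ecall(Cpu *cpu)
{
    uint32_t num = cpu->x[17];                              /* a7 */
    uint32_t a0 = cpu->x[10], a1 = cpu->x[11], a2 = cpu->x[12];

    switch (num) {
    case 57: cpu->x[10] = (uint32_t)close((int)a0); break;  /* _close */
    case 63: cpu->x[10] = (uint32_t)read((int)a0, cpu->mem + a1, a2); break;
    case 64: cpu->x[10] = (uint32_t)write((int)a0, cpu->mem + a1, a2); break;
    case 93: exit((int)a0);                                 /* _exit */
    default: cpu->x[10] = (uint32_t)-1;                     /* unimplemented */
    }
}
```

_lseek, _fstat, and _sbrk would slot into the same switch; as noted above, _fstat needs its struct remapped rather than passed straight through.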
 
The following users thanked this post: oPossum

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #111 on: December 14, 2018, 04:49:58 am »
You'd have an easier time making a generic binary rather than a HiFive1 one, and using stdin/stdout emulation.

And if you cut your program down to ...

Code: [Select]
void _write(int fd, char *s, int len);

int main()
{
    _write(0, "hello world!\n", 13);
    return 0;
}

... you'll have a much smaller binary (I get 1532 bytes, 383 instructions) that still works fine on qemu user:

Code: [Select]
$ riscv64-unknown-elf-gcc -O -march=rv32i -mabi=ilp32 hello.c -o hello
$ size hello
   text    data     bss     dec     hex filename
   1532    1084      28    2644     a54 hello
$ qemu-riscv32 hello
hello world!

Make that work on yours and you'll be sweet :-)

ok .. you'll need a link map something like the one HiFive1 uses to let you extract the ELF to a raw binary. I'm sure you can handle that.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #112 on: December 14, 2018, 05:11:35 am »
What I really need is a reference book for the RISC-V that covers all the hardware details.  Not just at 10,000 feet up but right down in the dirt.

RISC-V overview and tutorial: The RISC-V Reader: An Open Architecture Atlas — https://www.amazon.com/RISC-V-Reader-Open-Architecture-Atlas/dp/0999249118

RISC-V instruction set reference, work in progress (user ISA and privileged): https://github.com/riscv/riscv-isa-manual/releases/latest

Computer architecture textbook for undergrads, using RISC-V: https://www.amazon.com/Computer-Organization-Design-RISC-V-Architecture/dp/0128122757

Quote
Something I can convert from text to HDL or, better yet, maybe the HDL is given.

Are there any such references?

There is no such thing as The HDL. RISC-V is an ISA specification that anyone can implement any way they want.

And many people have already!

If you want concrete, open, HDL implementations, here is a selection you can study, build, put in an FPGA, and run: https://github.com/riscv/riscv-wiki/wiki/RISC-V-Cores-and-SoCs

Rocket is the original core design from Berkeley. Many other projects are based on it, including SiFive's "freedom" platform, its commercial 3-series and 5-series cores, and BOOM.

PULPino is from ETH Zurich and is what has been used in the new NXP microcontroller SoC (with a RI5CY core and a Zero RISCY core, as well as an M0 and an M4F).

PicoRV32 and VexRiscv are very popular for use in small FPGAs.

ReonV is based on the old LEON SPARC open-source implementation, with the opcodes changed.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #113 on: December 14, 2018, 05:43:28 am »
- I have the real hardware on my desk https://www.sifive.com/boards/hifive1 (one of the early team signature edition boards, no less!), to clear up any of my misunderstandings

Yup. I ordered one of the Signature Edition boards back in Dec 2016 (late Jan by the time it arrived in Moscow) https://twitter.com/BruceHoult/status/824965355755991041

I was pretty impressed that a company with about ten people had got the chip taped out and back, working at 320 MHz, and made the boards and software (including porting the Arduino libraries), within 18 months of being founded.

Two weeks later, someone on the support forums posted a video of playing the Dr Who theme on a square-wave software synthesizer they'd written for the HiFive1. I responded by spending about six hours on a quick hack to play the same thing from a proper WAV file using straight Arduino digitalWrite():



And then a Queen song (kinda topical right now!):



I think SiFive noticed these videos at the time -- Megan Wachs just asked me about them a couple of weeks ago and got me to resurrect the code for one of the demos in the SiFive booth at the RISC-V Summit last week.

Long story short ... after Michael Clark and I presented the rv8 simulator at CARRV (Workshop on Computer Architecture Research with RISC-V) in Boston in October 2017 various SiFive people took me to dinner and bars and suggested that I might like to come and work for them. As I was already impressed by the HiFive1 and they were at that time already taping-out the FU540 for the linux board it was a pretty easy sell as being more interesting than what I was doing at Samsung :-)
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #114 on: December 14, 2018, 05:56:58 am »
I was wondering if you planned to implement the various registers that save state through the pipeline.  I am interested in detecting and overcoming hazards.

When a customer asked me to help him debug his pipelined CPU (VHDL), we found a CPU which was basically working fine, except for some registers mysteriously getting corrupted during execution.

Digging in, I found there were a couple of bugs in how the pipeline was stalling the ALU during divisions and multiplications, plus another bug of the same kind in the load/store.

The pipeline was not being stalled correctly, and this caused the data corruption.

Books usually don't cover anything at this level of detail, and it makes sense because courses are already too heavy.

Anyway, my customer's CPU has eight stages. Although the instruction and data memory accesses occupy multiple cycles, they are fully pipelined, so that a new instruction can start on every clock. The function of each stage is as follows (NOTE: the stages described below are different from those of the MIPS R2K):

  • IF - First half of instruction fetch. Program Counter selection actually happens here, together with the initiation of instruction cache access
  • IS - Second half of instruction fetch, complete instruction cache access. Note that it's assumed that instruction accesses always hit the instruction cache
  • ID - Instruction decode, hazard checking (this stage is critical)
  • RF - RegisterFile Fetch
  • EX - Execution, which includes effective address calculation, ALU operation, and branch-target computation and condition evaluation (this stage is critical)
  • DF - Data fetch first half of data cache access
  • DS - Second half of data fetch, completion of data cache access. Note that it's assumed that data accesses always hit the data cache.
  • WB - Write back for loads and register-register operations.


This working scheme is immediately defective as described, because EX, DF and DS are assumed to take one clock cycle each, while there are scenarios where they take more than one (e.g. multiplication, division, data not in cache --> access to the RAM --> n wait-states --> n + m clock cycles).

Therefore the pipeline needs to be stalled properly. This is usually not considered in books, but it's what you find in reality.
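The two stall conditions described above can be sketched in C for illustration (the struct and field names here are invented; in a real design this is a combinational check in the ID stage's HDL):

```c
/* A register index of 0 means "no pending load" (convenient on RISC-style
   machines where register 0 is hardwired to zero anyway). */
typedef struct {
    int busy_cycles;    /* >0 while div/mul or a cache miss occupies EX/DF/DS */
    int load_dest;      /* dest register of a load still in flight, 0 if none */
    int id_rs1, id_rs2; /* source registers of the instruction in ID */
} Pipe;

/* Returns nonzero when the front of the pipeline must stall this cycle. */
static int must_stall(const Pipe *p)
{
    if (p->busy_cycles > 0)
        return 1;                          /* multi-cycle op or wait-states */
    if (p->load_dest != 0 &&
        (p->id_rs1 == p->load_dest || p->id_rs2 == p->load_dest))
        return 1;                          /* load-use hazard */
    return 0;
}
```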
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #115 on: December 14, 2018, 05:58:50 am »
The MSP430 code gcc produced for your example is depressingly bad.  It fails to refactor the array access into pointer-based accesses, dutifully incrementing the index and adding it to each array base on each loop, when it could have used auto-incrementing indexed addressing, I think.  (I thought that was an optimization that gcc would do before even getting to cpu-specific code generation.  I guess not.)

Yes, I don't know why. gcc is perfectly capable of doing this on other ISAs.

Quote
Quote
I'm actually very disappointed that manufacturers of machines with condition codes don't seem to have added recognition of the C idiom for the carry flag and generated "add with carry" from it. gcc on every machine does recognise idioms for things such as rotate and generate rotate instructions
Adding a "rotate" is relatively easy because it's a single instruction.  Supporting Carry means retaining awareness of state that isn't part of the C model of how things work.  "Which carry were they talking about?"  For example, the really short examples that people are posting are all based on having "loop" instructions that don't change the carry bit.  We saw how that restricts register choice on x86.  MSP430 doesn't have any such looping instructions (that I recall or see in summaries.)  So the compiler would have to decide that some math is different than other math, and ... it makes my brain hurt just thinking about it.

Yes.

I did a little googling and found that one idiom that's said to produce near-optimal code with some compilers and ISAs is to do the arithmetic in double-width precision and then cast/mask/shift the result back down to normal precision:

Code: [Select]
long tmp = (long)a + b + carryIn;
int sum = (int)tmp;
int carry = (int)(tmp >> (sizeof(int)*8));

I've had luck with the same kind of approach to generate instructions such as "give me the high bits of a multiply" in the past, but I haven't checked this idiom for carry myself yet.
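Expanded into a full multi-word add, with an explicit 64-bit intermediate (uint64_t) so the shift is well-defined even where long is only 32 bits -- the function name and unsigned types are my choice:

```c
#include <stddef.h>
#include <stdint.h>

/* c[0..n-1] = a[0..n-1] + b[0..n-1], least significant word first.
   Each step is done in 64-bit arithmetic; the low half is the sum word
   and the high half is the carry into the next word. */
static void bignum_add(uint32_t *c, const uint32_t *a,
                       const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] + b[i] + carry;  /* cannot overflow */
        c[i]  = (uint32_t)t;
        carry = (uint32_t)(t >> 32);                 /* 0 or 1 */
    }
}
```

Whether a given compiler turns this into add-with-carry is exactly the question being discussed; the point is only that the idiom is expressible in portable C.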

Quote
ARM has the "S" suffix for instructions to specify that they should update the flags, which is cute, I guess.  But I'm not sure it's worth spending a bit on (and indeed, it's not there in Thumb-16)

PowerPC does the same thing with a "." suffix on opcodes to update the condition codes in cr0. (cr1..cr7 are updated only by cmp instructions).
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #116 on: December 14, 2018, 06:24:28 am »
I don't know.  In some senses, having actual assembler modules seems "cleaner" than some of the things that compilers get forced into these days.  (Consider the whole "sfr |= bitmask;" optimization in AVR...)

Memory mapped registers give you cleaner code. The reason AVR is hard is that it has limited address space. This is not a problem on 32-bit systems.

HC11 (8-bit register machine, 16-bit address space) comes with soft registers; basically, gcc uses the first 256 bytes of CPU internal RAM for this.

But are you sure that this makes cleaner code?
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #117 on: December 14, 2018, 06:44:41 am »
But are you sure that this makes cleaner code?
We are talking about different things. Memory-mapped general purpose registers are a horrible idea.

I'm talking about special registers that use special commands to access them (like co-processors on ARM and MIPS) vs just mapping the same special registers into the regular address space where regular load/store instructions can get to them.
Alex
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #118 on: December 14, 2018, 08:23:56 am »
RISC-V still has to catch up to what ARM has in this respect. And unfortunately I see no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs.

Then you're not looking.

There is an Embedded ABI being worked on, with fewer volatile registers -- probably a0..a3 and t0..t1 instead of a0..a7 and t0..t6 for Linux, thus cutting down the number of registers that need to be saved on an interrupt to six (plus ra) instead of fifteen. Also, all registers from 16..31 become callee-save, making the ABI identical between rv32i and rv32e. This is being worked on by people in the embedded community.

There is the CLIC (Core Local Interrupt Controller), a backward-compatible enhancement that was *specifically* designed with the needs and input of the embedded community. I've already pointed you to this. It provides direct vectoring to C functions decorated with an attribute, and with a very small amount of ROMable code that can be built into a processor it provides vectoring to *standard* ABI C functions along with features such as interrupt chaining (dispatching to the next handler without restoring and re-saving registers), late dispatch (if a higher priority interrupt comes in while registers are being saved), and latencies similar to those ARM cores provide.

SiFive has developed a small 2-stage pipeline processor core *specifically* for deeply embedded real-time applications. There is no branch prediction .. all taken branches take 2 cycles. Other suppliers such as Syntacore and PULPino have similar cores, for example the Zero RISCy in the new NXP chip. SiFive has also developed an extension for the 3-series and 5-series cores to disable branch prediction for embedded real-time tasks (and of course turning part of the icache into instruction scratchpad).

The Vector Extension working group has been bending over backwards to accommodate the wishes and needs of the embedded community and make the very lowest-end implementations simpler and better performing. As a simple example, the high-end people such as Esperanto, Barcelona Supercomputing Center, and Krste at SiFive wanted predicated-off vector lanes to be set to zero, i.e. vadd dst,src1,src2 should not have to read dst. The embedded guys wanted the predicated-off vector lanes to be left untouched. In this and in several other areas the design has been modified to better suit small embedded cores.

The Bit Manipulation Working Group is almost exclusively looking at things that primarily the embedded community want.

The claim that there is "no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs" is so far from the clearly obvious truth that it just about has to be trolling.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #119 on: December 14, 2018, 08:58:30 am »
But are you sure that this makes cleaner code?
We are talking about different things. Memory-mapped general purpose registers are a horrible idea.

I agree with this. It's very important for execution pipelines that registers have "names" not "numbers". That is, the register an instruction refers to must be specified explicitly in the instruction, and not be subject to modification or calculation.

Quote
I'm talking about special registers that use special commands to access them (lie co-processors on ARM and MIPS) vs just mapping the same special registers into the regular address space where regular load/store instructions can get to them.

For peripherals, fine.

But *exactly* the same reasons that apply to general purpose registers not being memory mapped also apply to registers that affect the execution environment of the machine. And more.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #120 on: December 14, 2018, 09:05:34 am »
You'd have an easier time making a generic binary rather than a HiFive1 one, and using stdin/stdout emulation.

And if you cut your program down to ...

Even simpler, of course, you could just allocate a few bytes at some address in low memory (non RAM) --  call it STDIO_BASE perhaps. Then make loads from STDIO_BASE read a character from the host OS stdin, stores to STDIO_BASE+1 write a character to the host OS stdout, and (optionally) stores to STDIO_BASE+2 write a character to the host OS stderr.

That's dead easy both to implement in the simulator and to write programs for.

Bonus points: write implementations of _read(), _write() that do that, so the rest of <stdio> Just Works.
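A sketch of what those implementations might look like (the stdio_base pointer and the exact +0/+1/+2 layout are assumptions following the scheme above; in the simulator you'd fix it at whatever magic address you choose, e.g. (volatile unsigned char *)0x100):

```c
#include <stddef.h>

/* Base of the memory-mapped stdio "device": +0 = stdin, +1 = stdout,
   +2 = stderr.  Left as a settable pointer here; on the target it would
   be a constant address.  On real hardware each load of stdio_base[0]
   dequeues a fresh character from the host. */
static volatile unsigned char *stdio_base;

int _read(int fd, char *buf, int len)
{
    (void)fd;                                /* only stdin is supported */
    for (int i = 0; i < len; i++)
        buf[i] = (char)stdio_base[0];        /* one load per char from host stdin */
    return len;
}

int _write(int fd, const char *buf, int len)
{
    volatile unsigned char *reg = stdio_base + (fd == 2 ? 2 : 1);
    for (int i = 0; i < len; i++)
        *reg = (unsigned char)buf[i];        /* one store per char to stdout/stderr */
    return len;
}
```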
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #121 on: December 14, 2018, 11:22:54 am »
I'd just like to say thanks to all who have contributed to this thread (and I doubt it's dead yet): rstofer, lucazader, legacy, ehughes, DavidH, hamster_nz, NorthGuy, westfw, ataradov, obiwanjacobi, FlyingDutch

Cheers, guys :-)
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #122 on: December 15, 2018, 07:27:32 am »
 
Quote
The MSP430 code gcc produced for your example is depressingly bad.

Here's what I get for a hand-written MSP430 version.  It's sort-of interesting the way there ends up being a "local variable" for the carry, but I still get to use the addc instruction, thanks to the status register also being available as a regular register...


(Now, it's been a while since I did any MSP430 assembly, I'm not sure I fully understand the C ABI, and I didn't actually compile or test this.  But I think it should be pretty close.  OTOH, I think some of those auto-incrementing mov instructions may turn out to be 32 bits.  But... SO MUCH better than gcc did :-( )


Edit: I fixed it up to the point where it will at least go through the assembler OK...  (auto-increment indexed addressing doesn't work for a destination...)  (maybe I shouldn't clear ALL the flags...)

Code: [Select]
bignumAdd:
        push    sum
        push    savSR

        cmp     #1, cnt         ; if (cnt <= 0) return
        jl      exit
        clrc                    ; clear carry to start (is this already clear?)
        mov     SR, savSR       ; save the carry-clear flags

loop:   mov     savSR, SR       ; get carry from sum, not cnt decrement.
        mov     @a+, sum        ; get a[n]
        addc    @b+, sum        ;  add b[n]
        mov     sum, 0(c)       ;   store c[n]
        mov     SR, savSR       ; save the carry info
        incd    sum             ;  increment destination (by 2)
        dec     cnt             ; decrement count
        jnz     loop            ; next word.
exit:
        pop     savSR
        pop     sum
        ret
« Last Edit: December 15, 2018, 08:54:25 am by westfw »
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #123 on: December 15, 2018, 07:34:10 am »
Are there any books/curricula written on "comparative assembly language"?
Kids today are barely exposed to one, I think, and even "back in the day" when we had both the IBM 360 and the PDP-11, we didn't really compare them...
I guess some of that gets covered in "computer architecture" classes, but I remember those being a lot more hardware-oriented...
Maybe you can't compare them without a hardware orientation?  But it seems like it ought to be possible.  I mean, *I* enjoy comparing instruction sets, and my background gives only a pretty vague handwave to the actual implementation...

 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #124 on: December 15, 2018, 07:39:31 am »
Maybe you can't compare them without a hardware orientation?
That is 100% the case. At least if you are actually comparing for performance. You can compare for size easily.

There are a number of fun examples for the Cortex-M7 where rearranging the order of a couple of absolutely independent instructions changes the speed of execution by a factor of two. This is because the CM7 has a dual-issue pipeline and integer and floating point instructions can essentially execute at the same time. It is still the same instruction set as the Cortex-M4, but how you write the code now matters.

And comparing any of this to modern X86 is just silly.
« Last Edit: December 15, 2018, 07:42:30 am by ataradov »
Alex
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #125 on: December 15, 2018, 07:53:55 am »
Quote
The MSP430 code gcc produced for your example is depressingly bad.
Here's what I get for a hand-written MSP430 version.

And that's what I'd expect, looking at the instruction set. The mystery is why gcc so completely fails to do that, when it can for other ISAs.

It's the gcc I get by doing apt-get on Ubuntu 18.04. It's a little old, from 2012:

msp430-gcc (GCC) 4.6.3 20120301 (mspgcc LTS 20120406 unpatched)

But, still ... that vintage gcc could do this stuff on other ISAs. SH4, for example. Do people use gcc for msp430, or something else?

Looking at some of those, I can certainly understand why people still like to use assembly language quite often. It's hard to understand why they'd put up with compiler results like that at all.
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #126 on: December 15, 2018, 09:16:03 am »
If anybody is interested, I've put my RISC-V toy up on Github - https://github.com/hamsternz/emulate-risc-v - I've even added a little colour.

Does anybody know where I can find the encoding for the RV32M extensions? I've got to the point where the binary I am using uses DIVU...

I can find this, but it is a bit obscure for me!

Code: [Select]
mul     rd rs1 rs2 31..25=1 14..12=0 6..2=0x0C 1..0=3
mulh    rd rs1 rs2 31..25=1 14..12=1 6..2=0x0C 1..0=3
mulhsu  rd rs1 rs2 31..25=1 14..12=2 6..2=0x0C 1..0=3
mulhu   rd rs1 rs2 31..25=1 14..12=3 6..2=0x0C 1..0=3
div     rd rs1 rs2 31..25=1 14..12=4 6..2=0x0C 1..0=3
divu    rd rs1 rs2 31..25=1 14..12=5 6..2=0x0C 1..0=3
rem     rd rs1 rs2 31..25=1 14..12=6 6..2=0x0C 1..0=3
remu    rd rs1 rs2 31..25=1 14..12=7 6..2=0x0C 1..0=3

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: oPossum

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #127 on: December 15, 2018, 10:10:52 am »
Quote
Do people use gcc for msp430, or something else?
There is gcc, now maintained by someone else and distributed by TI, and there is TI's CCS compiler.

The version I have that was distributed with CCS8 is "v7.3.1.24 (Mitto Systems Limited)", and produces significantly different (but still not very good) code. Here's the loop (down to 20 instructions!):
Code: [Select]
    fd4e:    27 4d           mov    @r13,    r7    ;
    fd50:    08 47           mov    r7,    r8    ;
    fd52:    28 5e           add    @r14,    r8    ;
    fd54:    09 48           mov    r8,    r9    ;
    fd56:    09 5a           add    r10,    r9    ;
    fd58:    8c 49 00 00     mov    r9,    0(r12)    ;
    fd5c:    4b 46           mov.b    r6,    r11    ;
    fd5e:    08 97           cmp    r7,    r8    ;
    fd60:    01 28           jnc    $+4          ;abs 0xfd64
    fd62:    4b 45           mov.b    r5,    r11    ;
    fd64:    48 46           mov.b    r6,    r8    ;
    fd66:    09 9a           cmp    r10,    r9    ;
    fd68:    01 28           jnc    $+4          ;abs 0xfd6c
    fd6a:    48 45           mov.b    r5,    r8    ;
    fd6c:    4b d8           bis.b    r8,    r11    ;
    fd6e:    4a 4b           mov.b    r11,    r10    ;
    fd70:    2d 53           incd    r13        ;
    fd72:    2e 53           incd    r14        ;
    fd74:    2c 53           incd    r12        ;
    fd76:    0f 9d           cmp    r13,    r15    ;
    fd78:    ea 23           jnz    $-42         ;abs 0xfd4e

TI's compiler does a bit better (17 instructions). It manages to use the auto-increment addressing modes, and actually doesn't look too bad for a faithful translation of the source algorithm (without using the available carry flag):
Code: [Select]
   c:   38 4d           mov     @r13+,  r8
   e:   3b 4e           mov     @r14+,  r11
  10:   0b 58           add     r8,     r11
  12:   0a 43           clr     r10
  14:   0b 98           cmp     r8,     r11
  16:   01 2c           jc      $+4             ;abs 0x1a
  18:   1a 43           mov     #1,     r10     ;r3 As==01
  1a:   0b 59           add     r9,     r11
  1c:   2c 53           incd    r12
  1e:   8c 4b fe ff     mov     r11,    -2(r12) ; 0xfffe
  22:   08 43           clr     r8
  24:   0b 99           cmp     r9,     r11
  26:   01 2c           jc      $+4             ;abs 0x2a
  28:   18 43           mov     #1,     r8      ;r3 As==01
  2a:   09 48           mov     r8,     r9
  2c:   09 da           bis     r10,    r9
  2e:   1f 83           dec     r15
  30:   ed 23           jnz     $-36            ;abs 0xc
(12 of those instructions are faking the carry status, which is the sort of thing that makes assembly programmers curse at HLLs...)
 
The following users thanked this post: oPossum

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #128 on: December 15, 2018, 10:36:35 am »
Huh.   I was going to complain about CM0, since it has a bunch of unpleasant surprises for assembly programming, but it actually did really well!  15 instructions in the loop, and only 46 bytes total - significantly shorter than the Thumb2 code, slightly beating the RISC-V.

Code: [Select]
   e:   594b            ldr     r3, [r1, r5]
  10:   5954            ldr     r4, [r2, r5]
  12:   191c            adds    r4, r3, r4
  14:   19a7            adds    r7, r4, r6
  16:   42b7            cmp     r7, r6
  18:   41b6            sbcs    r6, r6
  1a:   429c            cmp     r4, r3
  1c:   41a4            sbcs    r4, r4
  1e:   5147            str     r7, [r0, r5]
  20:   4264            negs    r4, r4
  22:   4276            negs    r6, r6
  24:   3504            adds    r5, #4
  26:   4326            orrs    r6, r4
  28:   45ac            cmp     ip, r5
  2a:   d1f0            bne.n   e <bignumAdd+0xe>
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #129 on: December 15, 2018, 10:55:54 am »
If anybody is interested, I've put my RISC-V toy up on Github - https://github.com/hamsternz/emulate-risc-v - I've even added a little colour.

Nice!

Quote
Does anybody know where I can find the encoding for the RV32M extensions? I've got to the point where the binary I am using uses DIVU...

Sure, but you don't need them. If you're using freedom-e-sdk then just use a build command like:

Code: [Select]
make software PROGRAM=hello RISCV_ARCH=rv32i

Everything is in the "RV32/64G Instruction Set Listings" of the ISA manual. https://github.com/riscv/riscv-isa-manual/blob/master/release/riscv-spec-v2.2.pdf
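For what it's worth, the field listing hamster_nz quoted translates directly into a decoder check: bits 1..0 = 3 together with bits 6..2 = 0x0C make opcode 0x33 (OP), bits 31..25 = 1 marks the M extension, and bits 14..12 select MUL..REMU in table order. A small sketch (function names are mine):

```c
#include <stdint.h>

/* True when insn is one of the eight RV32M instructions. */
static int is_rv32m(uint32_t insn)
{
    uint32_t opcode = insn & 0x7f;   /* bits 6..0: 0x33 = OP */
    uint32_t funct7 = insn >> 25;    /* bits 31..25: 1 = M extension */
    return opcode == 0x33 && funct7 == 1;
}

/* 0=MUL 1=MULH 2=MULHSU 3=MULHU 4=DIV 5=DIVU 6=REM 7=REMU */
static uint32_t m_op(uint32_t insn)
{
    return (insn >> 12) & 7;         /* bits 14..12 (funct3) */
}
```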
« Last Edit: December 15, 2018, 12:41:04 pm by brucehoult »
 

Offline josip

  • Regular Contributor
  • *
  • Posts: 69
  • Country: hr
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #130 on: December 15, 2018, 12:05:15 pm »
I was going to complain about CM0, since it has a bunch of unpleasant surprises for assembly programming

I am coding the CM0+ in assembler, and haven't found any unpleasant surprises so far. Coming from MSP430 (20-bit CPUXv2) assembler.

Also, when comparing code executing on different MCUs, what's relevant is the number of cycles, not the number of instructions.
« Last Edit: December 15, 2018, 12:07:32 pm by josip »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #131 on: December 15, 2018, 01:02:43 pm »
I was going to complain about CM0, since it has a bunch of unpleasant surprises for assembly programming

I am coding the CM0+ in assembler, and haven't found any unpleasant surprises so far. Coming from MSP430 (20-bit CPUXv2) assembler.

Definitely Thumb1 (which is what CM0 basically is) is not awful. I spent three years programming the ARM7TDMI in assembly language and we did 95+% of the code in Thumb and ARM only where necessary because of things missing in Thumb.

Mostly it's just a bit short of registers that can be used by all instructions, and it's tricky to incorporate the hi registers.

Quote
Also, when comparing code executing on different MCUs, what's relevant is the number of cycles, not the number of instructions.

Number of clock cycles depends not only on the instruction set but on the implementation, for example single or multiple issue, in-order or out-of-order.

Also, even within, say, single-issue in-order implementations you have effects such as: a CPU with a 2-stage pipeline might use slightly fewer clock cycles than a CPU with a 5-stage pipeline, because fewer cycles are wasted in pipeline flushes after conditional branches. *But* the CPU with the 2-stage pipeline will almost certainly be capable of a lower maximum MHz than the CPU with the 5-stage pipeline, given the same manufacturing technology for both.

There are also instruction set features that allow programs in one ISA to use fewer instructions and clock cycles than programs in another ISA, but that increase the work required within each clock cycle enough to limit the MHz to lower than the other ISA.

These days you also have to consider the silicon area used by a CPU, and the energy consumed in executing a complete program.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #132 on: December 15, 2018, 02:44:49 pm »
Number of clock cycles depends not only on the instruction set but on the implementation

The best example is the div unit.


(old-school 8bit traditional division algorithm)

Intel developed a super fast Newton-Raphson-ish method that converges quickly to the result, while other methods take 1 clock cycle per bit plus a residual, so a signed/unsigned 32-bit DIV is computed in 33-34 clock cycles. Newton-Raphson-ish methods converge in a quarter of those cycles or fewer.

The pipeline needs to be stalled during computation.
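
The idea can be sketched in C (the seed choice and iteration count here are arbitrary for the example; a real hardware divider works on the significand with a table-lookup seed and far fewer iterations):

```c
#include <assert.h>
#include <math.h>

/* Newton-Raphson reciprocal: x' = x * (2 - d*x).  The relative error
 * is squared each step, so the number of correct bits roughly doubles
 * per iteration -- versus one bit per cycle for the classic
 * shift-and-subtract loop. */
static float nr_recip(float d) {           /* assumes 1.0 <= d < 2.0 */
    float x = 0.5f;                        /* crude seed */
    for (int i = 0; i < 5; i++)            /* 5 doublings: >24 good bits */
        x = x * (2.0f - d * x);
    return x;
}
```

Division is then just a multiply: `n / d` becomes `n * nr_recip(d)`, which is why only the reciprocal step needs the fancy hardware.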
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #133 on: December 15, 2018, 02:48:17 pm »
OT:
a bit of humor about acronyms used for instruction sets  :D
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #134 on: December 15, 2018, 03:10:09 pm »
Every ISA has some sort of history. It was designed for the conditions and tasks which were important back then. Then it evolved to meet new requirements. While doing so, the designers had to maintain backward compatibility. Thus most existing ISAs have lots of ugly details where the old designs didn't mesh with new requirements.

RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA which would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Without any doubt, you can create a CISC ISA which will provide better code density, the same way as Huffman compression will always take less space than plain text. Or, you can create a totally different CISC ISA for highly deterministic performance. I don't see anything wrong with comparing RISC and CISC code. Such comparisons show the differences very well, even though it's hard to come up with formal criteria.
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #135 on: December 15, 2018, 04:48:39 pm »
Or, in the case of ARM, it's nice that the ABI and the hardware agree on which registers get saved, so that ISR functions and normal C functions are indistinguishable.   I guess.  Other times I wish the ISRs in C code were more easily distinguishable, and that the HW interrupt entry was quicker...

Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #136 on: December 15, 2018, 05:04:07 pm »
What I really need is a reference book for the RISC-V that covers all the hardware details.  Not just at 10,000 feet up but right down in the dirt.  Something I can convert from text to HDL or, better yet, maybe the HDL is given.

Are there any such references?

I think that this is the key of the RISC-V ethos - it is just the ISA specification. What you do with it is up to you.

As long as your hardware runs the RISC-V RV32I (+ whatever extensions) you don't have to worry too much about the software tooling.

RISC-V it isn't a hardware specification - it is a specification of the interface between the software layer and digital logic layers. If you build a CPU that implements RISC-V, you have a ready-made software layer.

I think I am coming at this from the other end.  I don't particularly care about the ISA; I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks.  But, as long as I'm implementing something, it might as well be for a modern ISA.  The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

In some ways, it's like the 8086 I designed using AMD Am2900 series logic for a class I took back in the early '80s.  It looked great on paper (well, it was more like 'adequate') but I will never know if it actually worked.  Microcode, all the way!

All those with a copy of Mick and Brick raise your hands!  Nobody remembers the title of the book but they sure remember who wrote it!
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #137 on: December 15, 2018, 05:54:05 pm »
I think I am coming at this from the other end.  I don't particularly care about the ISA; I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks.  But, as long as I'm implementing something, it might as well be for a modern ISA.  The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

It will be easier to implement the RISC-V ISA and you're likely to make it run at faster clock speeds. Of course, you can probably do better if you design your own RISC ISA specifically suited to your particular hardware (such as a specific FPGA), but not by much, and with RISC-V you get free software tools.
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #138 on: December 15, 2018, 06:10:06 pm »
I think I am coming at this from the other end.  I don't particularly care about the ISA; I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks.  But, as long as I'm implementing something, it might as well be for a modern ISA.  The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

It will be easier to implement the RISC-V ISA and you're likely to make it run at faster clock speeds. Of course, you can probably do better if you design your own RISC ISA specifically suited to your particular hardware (such as a specific FPGA), but not by much, and with RISC-V you get free software tools.

I think the software tools are the whole idea.  There are lots of interesting CPUs to emulate (think CDC 6400) but unless the software is out in the wild, the CPU is useless.

The LC3 project has an assembler and C compiler so it is actually a reasonable project.  The documentation for the project makes no attempt at pipelining and, since it is an undergrad project, that's as it should be.

I have the "Reader" book and it's quite good.  I've read about 1/3 of it.

The other day I was reading something about generic RISC architectures and it went into great detail about hazards.  Yes, the taken branch is one example but it's trivial - flush the pipeline and restart.  The more interesting problems are hazards where a register is being written at one stage and is an operand for an instruction already in the pipeline.  There are many examples where the datapath needs to pass results backwards in the pipeline.  Detecting and controlling that path is the design issue that concerns me.
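
As a sketch, the textbook forwarding conditions for a classic 5-stage pipeline look like this (struct layout and names are mine, just for illustration; register 0 is the RISC-V hardwired zero):

```c
#include <assert.h>
#include <stdbool.h>

/* Forwarding (bypass) detection for a classic 5-stage in-order RISC
 * pipeline: route a result still sitting in the EX/MEM or MEM/WB
 * pipeline register back to the ALU input when the instruction now in
 * EX reads that register.  Register 0 never forwards (hardwired zero). */
typedef struct {
    bool reg_write;   /* will this older instruction write a register? */
    int  rd;          /* which register it writes */
} PipeReg;

static bool forward_from_ex_mem(PipeReg exmem, int rs) {
    return exmem.reg_write && exmem.rd != 0 && exmem.rd == rs;
}

static bool forward_from_mem_wb(PipeReg exmem, PipeReg memwb, int rs) {
    /* EX/MEM has priority: it holds the most recent write of rd. */
    return memwb.reg_write && memwb.rd != 0 && memwb.rd == rs
        && !forward_from_ex_mem(exmem, rs);
}
```

For `add x5,x1,x2` followed immediately by `sub x6,x5,x3`, the first condition fires and the ALU result is forwarded EX-to-EX with no stall. (A load followed immediately by a use of the loaded register still needs one bubble; this sketch covers only the forwarding-mux control, not the load-use stall.)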

It would be pretty easy to design a multi-cycle version of the RISC-V and that's probably where I will start but the end goal is a fully pipelined CPU. Hamster_nz's work will be a good start.

My HiFive1 board showed up today and the diagnostic screen comes up in PuTTY.
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 6351
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #139 on: December 15, 2018, 09:53:32 pm »
I have VS Code and PlatformIO installed and I can build the blinking LED example from the videos.  What I haven't tumbled to is how to get Debug to work.  If I attempt to debug, the .elf file is created, a bunch of messages pour out on the terminal then, after a few second timeout, I get an error dialog that says the connection was refused.

I wandered through PlatformIO's site and while they extol the virtues of the debugger, I can't seem to find PHD type instructions (Push Here Dummy).  There doesn't seem to be much help on the SiFive site either.  Or, I missed it...

Any hints?
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #140 on: December 15, 2018, 11:13:30 pm »
I have VS Code and PlatformIO installed and I can build the blinking LED example from the videos.  What I haven't tumbled to is how to get Debug to work.  If I attempt to debug, the .elf file is created, a bunch of messages pour out on the terminal then, after a few second timeout, I get an error dialog that says the connection was refused.

I wandered through PlatformIO's site and while they extol the virtues of the debugger, I can't seem to find PHD type instructions (Push Here Dummy).  There doesn't seem to be much help on the SiFive site either.  Or, I missed it...

Any hints?

The videos at the start of this thread show exactly how to use the debugger in PlatformIO.

Sadly, you have to get "pro" and pay $10/month for the privilege -- or at least sign up for the 30 day free trial.

SiFive's Eclipse-based "Freedom Studio" does debugging for free. Or you can use gdb on the command line. The trick there is to run OpenOCD in one terminal and gdb in another. The HiFive1 Getting Started document shows how.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #141 on: December 15, 2018, 11:45:04 pm »
RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA which would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

The minor 32 bit or 64 bit ISAs are a different matter. I think they're dead. Andes have shipped billions of cores using their proprietary nds32 ISA, and it just recently got accepted into the main Linux kernel repository, but they're switching to RISC-V. The same with C-SKY. Pretty much everyone using ARC or Xtensa is likely to switch to RISC-V on their next major redesign or for new projects. I wouldn't be surprised to see Microchip convert their 32 bit PIC line from MIPS to RISC-V.

Quote
Without any doubts, you can create CISC ISA which will provide better code density, the same way as Huffman compression will always take less space than plain text.

I think that ignores two things:

1) modern RISC ISAs such as Thumb2 and RISC-V are already Huffman encoded.

2) 8086 is nowhere near Huffman encoded. It's encoded as "if it doesn't need any arguments then it gets a short encoding". Just look at AAA, AAD, AAM, AAS, ADC, CLC, CLD, CLI, CMC, DAA, DAS, HLT, IN, INT, INTO, IRET, JNP, JO, JP, JPE, JPO, LAHF, OUT, RCL, RCR, SAHF, SBB, STC, STD, STI, XLATB. That's 31 instructions -- almost 1/8th of the opcode space -- taken up by instructions that are either statistically never used (especially now), or that even in 8086 days were not used often enough to justify a 1-byte encoding (plus offset for the Jumps). Most of them probably do need to exist (or did) but the effect on program size or speed if they'd been hidden away in a secondary opcode page would be minuscule. And those opcodes could have been used for something useful.

The same with VAX. *Every* instruction gets a 1-byte opcode, followed by the arguments. The length of the instructions is decided by the number and size of arguments, not by the frequency of use of the instruction.
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #142 on: December 16, 2018, 12:01:55 am »
Quote
Quote
CM0 ... has a bunch of unpleasant surprises
I am coding CM0+ in assembler, and haven't found any unpleasant surprises

It's mostly the lack of "op2" and the limited range of literal values in the instructions that still have them.


My surprises show up when initializing peripherals.  I expected code like:
Code: [Select]
       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;
       
To be implementable with code something like:
Code: [Select]
       ldr r1, =(PORT + <offset of GROUP[0]>)
       ldr r2, [r1, #<offset of PINCFG[12]>]
       orr r2, #PORT_PINCFG_DRVSTR
       str r2, [r1, #<offset of PINCFG[12]>]
       ldr r2, [r1, #PORT_DIRSET]
       orr r2, #4096
       str r2, [r1, #PORT_DIRSET]

Instead, you run into "orr doesn't have immediate arguments any more" and "PINCFG is beyond the range allowed by the [r, #const] encoding", so the code takes an extra 5 instructions and two additional registers.  The extra instructions may be a wash with the 32bit forms on the v7m chips, but having to use the extra registers (out of the limited set available) is ... annoying.

Now, what Bruce's example code seems to demonstrate is that the "peripheral initialization" is essentially a degenerate case and that the issues I'm complaining about show up less in the "meat" of a real program.  That could be, and it's an interesting result.

(I was impressed by the RV32i summary that was posted, WRT the impressive array of "immediate" operands.  But I haven't looked too carefully to see if it does the things I want.)
« Last Edit: December 16, 2018, 02:10:00 am by westfw »
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #143 on: December 16, 2018, 12:20:30 am »
Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

*They* have Acorn (where Arm was born) and RISC-PC computers, manufactured and used in the UK. I love my R/600, it comes with a 586 hardware emulator (it's called "guest PC card") so I can also run DOS programs as well as RISC-OS applications  :D

The best and most interesting part is the Desktop Development Environment (DDE), a full-featured suite of development tools for building applications for RISC OS (mine is v4.39 Adjust/classic). It dates back to the days when Acorn developed RISC OS and is derived from the in-house development tools. It includes:
- C compiler optimised to produce efficient ARM code
- ARM assembler, more powerful and advanced than any current Open Source ARM assembler
- Makefile utility
- Desktop debugger
- GUI resource file editor
- Object compression/decompression tools
- Intelligent ARM disassembler
- ABC (Archimedes BASIC compiler) to convert BBC BASIC source into machine code
- ARM Cortex A8 instruction timing simulator
- Comprehensive full documentation

It's great for both classic machines (RiscPC/600 with StrongArm, 26-bit space) and newer ones (misc/Cortex A8, 32-bit space), suitable for running on and producing both 26 & 32-bit versions of RISC OS.

I think RISC-V would be more interesting if a similar solution (a RISC-V workstation + RISC-V/OS and DDE) existed  :D



Besides, another great motivation for Arm is ... the Nintendo GBA with its low-cost development kit (200 euro all inclusive): yet again, RISC-V would be more interesting if a portable mini video game console existed.

 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #144 on: December 16, 2018, 12:26:18 am »

(NUMWorks, ARM-based)

I will probably buy a tiny RISC-V board to develop a pocket calculator. This idea sounds really intriguing to me  :D

I have already reverse engineered a CASIO graphics calculator, thus I can re-use the keyboard; I just need a proper LCD ... and a motherboard. The software can be derived from the NumWorks project (open source).
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #145 on: December 16, 2018, 12:45:20 am »
RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA which would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

I'm sure MIPS is not that much worse, but everyone chooses ARM. Do you really think Xilinx used ARM cores in Zynq because of technical merit? I don't think so. It's pure marketing. Popularity. People want ARM, Xilinx gives them ARM. But popularity comes and goes. When the next popular thing emerges, the old one dies very quickly.

I wouldn't be surprised to see Microchip convert their 32 bit PIC line from MIPS to RISC-V.

After their failure with MIPS and PIC32, I'm sure they won't want to miss the opportunity with RISC-V.

1) modern RISC ISAs such as Thumb2 and RISC-V are already Huffman encoded.

This only applies to single instructions. If you analyze the real code generated by compilers, you can find frequent multi-instruction combinations. For example, in your RV32I ISA, setting a single bit in memory takes 3 instructions - 12 bytes. IMHO, in real life the Huffman code for this action would be much shorter.
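
To put a number on the Huffman point, here is a minimal code-length computation over an invented frequency table of instruction "idioms" (the frequencies are made up; nothing here reflects real ISA statistics):

```c
#include <assert.h>

/* Toy illustration of the "CISC as Huffman coding" idea: assign each
 * frequent instruction or multi-instruction idiom (like "set one bit
 * in memory") a code length matching its frequency.  NSYM and the
 * algorithm below are just for this sketch. */
#define NSYM 8

/* Compute Huffman code lengths (in bits) for NSYM symbol weights. */
static void huffman_lengths(const long freq[NSYM], int len[NSYM]) {
    long w[2 * NSYM];
    int parent[2 * NSYM], alive[2 * NSYM];
    for (int i = 0; i < NSYM; i++) { w[i] = freq[i]; parent[i] = -1; alive[i] = 1; }
    int nodes = NSYM;
    for (int merges = 0; merges < NSYM - 1; merges++) {
        int a = -1, b = -1;                 /* find the two lightest live nodes */
        for (int i = 0; i < nodes; i++) {
            if (!alive[i]) continue;
            if (a < 0 || w[i] < w[a]) { b = a; a = i; }
            else if (b < 0 || w[i] < w[b]) { b = i; }
        }
        w[nodes] = w[a] + w[b];             /* merge them into a new internal node */
        parent[nodes] = -1; alive[nodes] = 1;
        parent[a] = nodes; parent[b] = nodes;
        alive[a] = alive[b] = 0;
        nodes++;
    }
    for (int i = 0; i < NSYM; i++) {        /* tree depth = code length in bits */
        int d = 0;
        for (int p = parent[i]; p >= 0; p = parent[p]) d++;
        len[i] = d;
    }
}
```

With a skewed table such as {40, 20, 15, 10, 7, 4, 3, 1}, the most frequent idiom gets a 1-bit opcode and the rarest get 6 bits, saving roughly 18% over a flat 3-bit encoding of the same eight idioms — which is the sense in which an entropy-coded CISC encoding can always beat a fixed-width one.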

2) 8086 is nowhere near Huffman encoded. It's encoded as "if it doesn't need any arguments then it gets a short encoding". Just look at AAA, AAD, AAM, AAS, ADC, CLC, CLD, CLI, CMC, DAA, DAS, HLT, IN, INT, INTO, IRET, JNP, JO, JP, JPE, JPO, LAHF, OUT, RCL, RCR, SAHF, SBB, STC, STD, STI, XLATB. That's 31 instructions -- almost 1/8th of the opcode space -- taken up by instructions that are either statistically never used (especially now), or that even in 8086 days were not used often enough to justify a 1-byte encoding (plus offset for the Jumps). Most of them probably do need to exist (or did) but the effect on program size or speed if they'd been hidden away in a secondary opcode page would be minuscule. And those opcodes could have been used for something useful.

Of course, it has a long history, so the encoding is far from perfect. I'm sure, if they started from scratch now, they would have a much better encoding in terms of number of bytes.

Many things, such as ENTER, LEAVE, LODS, STOS, SCAS, CMPS do save lots of bytes, but are not efficient, so nobody uses them.

BTW: JP and JPE are the same opcode (and JNP is the same as JPO).
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #146 on: December 16, 2018, 02:12:14 am »
Quote
If RISC-V spreads, it should outcompete ARM fairly quickly.
I think you underestimate the effectiveness and importance of a large marketing, sales, and support organization...
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #147 on: December 16, 2018, 03:55:57 am »
Quote
If RISC-V spreads, it should outcompete ARM fairly quickly.
I think you underestimate the effectiveness and importance of a large marketing, sales, and support organization...

Yes, I'm bad at marketing.

But, if Apple (or Google) decides that their phones' batteries can last 30% longer with RISC-V, it'll get all the marketing it needs. Of course, this may not happen, and RISC-V gets forgotten. Impossible to see the future is :)
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #148 on: December 16, 2018, 04:43:14 am »
My surprises show up when initializing peripherals.  I expected code like:
Code: [Select]
       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;
       
To be implementable with code something like:
Code: [Select]
       ldr r1, =(PORT + <offset of GROUP[0]>)
       ldr r2, [r1, #<offset of PINCFG[12]>]
       orr r2, #PORT_PINCFG_DRVSTR
       str r2, [r1, #<offset of PINCFG[12]>]
       ldr r2, [r1, #PORT_DIRSET]
       orr r2, #4096
       str r2, [r1, #PORT_DIRSET]

Instead, you run into "orr doesn't have immediate arguments any more" and "PINCFG is beyond the range allowed by the [r, #const] encoding", so the code takes an extra 5 instructions and two additional registers.  The extra instructions may be a wash with the 32bit forms on the v7m chips, but having to use the extra registers (out of the limited set available) is ... annoying.

I guess there are two options: 1) let the C compiler figure it out, or 2) do something like

Code: [Select]
ldr r1, =(PORT + <offset of GROUP[0]> + #<offset of PINCFG[12]>)
ldr r2, [r1]
ldr r3, =PORT_PINCFG_DRVSTR
orr r2, r3
str r2, [r1]
ldr r1, =(PORT + <offset of GROUP[0]> + #PORT_DIRSET)
ldr r2, [r1]
ldr r3, =4096
orr r2, r3
str r2, [r1]

One extra register and three extra instructions. And four 32-bit values in a nearby constant pool instead of the three you'd have in ARM/Thumb2 mode, if that code was actually valid (I didn't check too hard).

So:
A32 is a total of 7*4 + 3*4 = 40 bytes
T16 is a total of 10*2 + 4*4 = 36 bytes

Some size savings, but not a lot. I *think* T32 would be the same size as the A32.

Quote
Now, what Bruce's example code seems to demonstrate is that the "peripheral initialization" is essentially a degenerate case and that the issues I'm complaining about show up less in the "meat" of a real program.  That could be, and it's an interesting result.

Sure. Computations with values that are already in registers are where 16 bit opcodes shine. That's equally true of PDP11, M68k, Thumb1, RISC-V C, MSP430, SH4. Or even x86 with opcode + ModR/M byte for reg-reg operations, until it starts needing prefix bytes to set the operand size.

Quote
(I was impressed by the RV32i summary that was posted, WRT the impressive array of "immediate" operands.  But I haven't looked too carefully to see if it does the things I want.)

12 bit immediates and offsets on everything. It's often enough, but you can't do your #4096 as an immediate (only -2048...+2047 is covered). You can do it as LUI t0, #00001. In general you can make any 32 bit constant with LUI t0,#nnnnn; ADDI t0,t0,#nnn, or any 32-bit offset from the PC with AUIPC t0,#nnnnn; ADDI t0,t0,#nnn. Or you can load or store to any 32 bit absolute or PC-relative address with an LUI or AUIPC followed by a load or store with an offset.
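
The LUI+ADDI split can be sketched in C. The wrinkle is that ADDI sign-extends its 12-bit immediate, so the low part is taken as a signed value in [-2048, 2047] and the LUI part is bumped by one page when bit 11 of the constant is set (function name is mine, for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* How an assembler splits a 32-bit constant into LUI + ADDI on RISC-V.
 * Returns what the two-instruction sequence would actually compute,
 * so callers can check it round-trips to the original constant. */
static uint32_t split_lui_addi(uint32_t k, uint32_t *lui_imm, int32_t *addi_imm) {
    int32_t lo = (int32_t)((k & 0xFFF) ^ 0x800) - 0x800;  /* sign-extend low 12 bits */
    uint32_t hi = k - (uint32_t)lo;                       /* what LUI must produce */
    *lui_imm  = hi >> 12;                                 /* 20-bit LUI immediate */
    *addi_imm = lo;                                       /* signed 12-bit ADDI immediate */
    return (*lui_imm << 12) + (uint32_t)*addi_imm;        /* what the CPU computes */
}
```

For 4096 this gives LUI t0,1 with ADDI 0, matching the #4096 case above; for a constant like 0xDEADBEEF the low part comes out negative (-273) and the LUI immediate is rounded up to compensate.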

As with ARM, there are assembler pseudo ops like LDR so you don't have to worry about the exact instructions used in a particular case.

RISC-V is allergic to constant pools. They are ok in low end processors, but as soon as you get an instruction cache you have the problem that the constant pools will likely get into the instruction cache, but be useless there. And if you have a data cache then instructions around the constant pool get into the data cache, and are useless there. Maybe the compiler/linker could arrange for the constant pools to be in different cache lines to instructions, but I haven't seen that happen.

So RISC-V, along with MIPS, Alpha, and ARM64 prefers using inline code to load constants, even if it needs several instructions to do it.
« Last Edit: December 16, 2018, 04:59:52 am by brucehoult »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #149 on: December 16, 2018, 04:55:45 am »
Probably I will buy a tiny RISC-V board to develop a pocket calculator. This idea sounds really intriguing to me  :D

I have already reverse engineered a CASIO Graphics calculator, thus I can re-use the keyboards, I just need a proper LCD ... and a motherboard. The software can be derived from the NUMWorks's project (opensource).

You could try the LoFive: https://store.groupgets.com/products/lofive-risc-v

Note: you need a JTAG interface to program it. Most people use the Olimex ARM-USB-TINY-H, but others should work as long as OpenOCD can find them.

But for this low performance task you'd do it just as well using a soft RISC-V core in a small FPGA.

The TinyFPGA A2 *might* just about be big enough, but the BX certainly is and lots of people use them for this purpose.

https://www.crowdsupply.com/tinyfpga/tinyfpga-bx/updates/tinyfpga-b2-and-bx-projects

https://tinyfpga.com/
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #150 on: December 16, 2018, 10:45:02 am »
Quote
Quote
I wish the ISRs in C code [on ARM] were more easily distinguishable, and that the HW interrupt entry was quicker...
Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?
Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.


Quote
Pretty much everyone using ARC or Xtensa is likely to switch to RISC-V
Espressif too?  Is there any indication that the "mostly China" manufacturers would switch?


Quote
[complaints about CM0 code]I guess there are two options: 1) let the C compiler figure
That's where I got the 4-register version.  Offsets larger than 32 get converted into a MOV of an offset into the 4th register, and "LDR r1,[r2,r3]" addressing mode.  In assembly language, I could presumably add/sub manually from the base register or something, at the expense of ... unpleasantness and cryptic code.


Computations with values that are already in registers are where 16 bit opcodes shine.
I think the big thing I was missing is that in simple assembly programs, arrays might be addressed as "[Rindex, #constantSymbolAddress]", while in an only slightly more complex program, they'll be passed around as pointers, and the double-index-register addressing modes will work just fine.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #151 on: December 16, 2018, 12:40:34 pm »
You could try the LoFive: https://store.groupgets.com/products/lofive-risc-v

Yup, of this size  :D

A little MPU can handle the keyboard (the key-matrix is 9x10), interfacing serially to the CPU, and a small LCD is usually SPI. It sounds like something that can be done.

 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #152 on: December 16, 2018, 01:15:15 pm »
My surprises show up when initializing peripherals.  I expected code like:
Code: [Select]
       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;

Just for fun, I made a couple of definitions so your code would be compilable and tried it on a few things.

Code: [Select]
#include <stdint.h>

#define PORT_PINCFG_DRVSTR (1<<7)

struct {
    struct {
        struct {
            uint32_t foo;
            uint32_t reg;
            uint32_t bar;
        } PINCFG[16];
        struct {
            uint64_t baz;
            uint32_t reg;
        } DIRSET;
    } Group[10];
} *PORT = (void*)0xdecaf000;

void main(){
    PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
    PORT->Group[0].DIRSET.reg |= 1<<12;
}

And I checked it with for example:

Code: [Select]
arm-linux-gnueabihf-gcc -O initPorts.c -o initPorts -nostartfiles && \
arm-linux-gnueabihf-objdump -D initPorts | expand | less -p'<main>'

So ... ARMv7 (Thumb2):

Code: [Select]
000001c0 <main>:
 1c0:   4b07            ldr     r3, [pc, #28]   ; (1e0 <main+0x20>)
 1c2:   447b            add     r3, pc
 1c4:   681b            ldr     r3, [r3, #0]
 1c6:   f8d3 2094       ldr.w   r2, [r3, #148]  ; 0x94
 1ca:   f042 0280       orr.w   r2, r2, #128    ; 0x80
 1ce:   f8c3 2094       str.w   r2, [r3, #148]  ; 0x94
 1d2:   f8d3 20c8       ldr.w   r2, [r3, #200]  ; 0xc8
 1d6:   f442 5280       orr.w   r2, r2, #4096   ; 0x1000
 1da:   f8c3 20c8       str.w   r2, [r3, #200]  ; 0xc8
 1de:   4770            bx      lr
 1e0:   00010e3a        andeq   r0, r1, sl, lsr lr

00011000 <PORT>:
   11000:       decaf000        cdple   0, 12, cr15, cr10, cr0, {0}

Arm32:

Code: [Select]
000001c0 <main>:
 1c0:   e59f3020        ldr     r3, [pc, #32]   ; 1e8 <main+0x28>
 1c4:   e08f3003        add     r3, pc, r3
 1c8:   e5933000        ldr     r3, [r3]
 1cc:   e5932094        ldr     r2, [r3, #148]  ; 0x94
 1d0:   e3822080        orr     r2, r2, #128    ; 0x80
 1d4:   e5832094        str     r2, [r3, #148]  ; 0x94
 1d8:   e59320c8        ldr     r2, [r3, #200]  ; 0xc8
 1dc:   e3822a01        orr     r2, r2, #4096   ; 0x1000
 1e0:   e58320c8        str     r2, [r3, #200]  ; 0xc8
 1e4:   e12fff1e        bx      lr
 1e8:   00010e34        andeq   r0, r1, r4, lsr lr

00011000 <PORT>:
   11000:       decaf000        cdple   0, 12, cr15, cr10, cr0, {0}

Thumb1:

Code: [Select]
000001c0 <main>:
 1c0:   4b07            ldr     r3, [pc, #28]   ; (1e0 <main+0x20>)
 1c2:   447b            add     r3, pc
 1c4:   681b            ldr     r3, [r3, #0]
 1c6:   2194            movs    r1, #148        ; 0x94
 1c8:   2280            movs    r2, #128        ; 0x80
 1ca:   5858            ldr     r0, [r3, r1]
 1cc:   4302            orrs    r2, r0
 1ce:   505a            str     r2, [r3, r1]
 1d0:   3134            adds    r1, #52 ; 0x34
 1d2:   2280            movs    r2, #128        ; 0x80
 1d4:   0152            lsls    r2, r2, #5
 1d6:   5858            ldr     r0, [r3, r1]
 1d8:   4302            orrs    r2, r0
 1da:   505a            str     r2, [r3, r1]
 1dc:   4770            bx      lr
 1de:   46c0            nop                     ; (mov r8, r8)
 1e0:   00010e3a        andeq   r0, r1, sl, lsr lr

 00011000 <PORT>:
   11000:       decaf000        cdple   0, 12, cr15, cr10, cr0, {0}

Arm64

Code: [Select]
00000000000002ac <main>:
 2ac:   b0000080        adrp    x0, 11000 <PORT>
 2b0:   f9400000        ldr     x0, [x0]
 2b4:   b9409401        ldr     w1, [x0, #148]
 2b8:   32190021        orr     w1, w1, #0x80
 2bc:   b9009401        str     w1, [x0, #148]
 2c0:   b940c801        ldr     w1, [x0, #200]
 2c4:   32140021        orr     w1, w1, #0x1000
 2c8:   b900c801        str     w1, [x0, #200]
 2cc:   d65f03c0        ret

0000000000011000 <PORT>:
   11000:       decaf000        .word   0xdecaf000
   11004:       00000000        .word   0x00000000

RISC-V rv32ic (without the C extension it is identical except that every instruction takes 4 bytes; 64-bit is identical except for an "ld" to get <PORT> and the pointer being 8 bytes instead of 4):

Code: [Select]
00010074 <main>:
   10074:       67c5                    lui     a5,0x11
   10076:       0947a783                lw      a5,148(a5) # 11094 <PORT>
   1007a:       6685                    lui     a3,0x1
   1007c:       0947a703                lw      a4,148(a5)
   10080:       08076713                ori     a4,a4,128
   10084:       08e7aa23                sw      a4,148(a5)
   10088:       0c87a703                lw      a4,200(a5)
   1008c:       8f55                    or      a4,a4,a3
   1008e:       0ce7a423                sw      a4,200(a5)
   10092:       8082                    ret

00011094 <PORT>:
   11094:       f000                    fsw     fs0,32(s0)
   11096:       deca                    sw      s2,124(sp)

M68k:

Code: [Select]
800001ac <main>:
800001ac:       2079 8000 400c  moveal 8000400c <PORT>,%a0
800001b2:       0068 0080 0096  oriw #128,%a0@(150)
800001b8:       0068 1000 00ca  oriw #4096,%a0@(202)
800001be:       4e75            rts

8000400c <PORT>:
8000400c:       deca            addaw %a2,%sp
8000400e:       f000

i686:

Code: [Select]
000001b5 <main>:
 1b5:   e8 20 00 00 00          call   1da <__x86.get_pc_thunk.ax>
 1ba:   05 3a 1e 00 00          add    $0x1e3a,%eax
 1bf:   8b 80 0c 00 00 00       mov    0xc(%eax),%eax
 1c5:   81 88 94 00 00 00 80    orl    $0x80,0x94(%eax)
 1cc:   00 00 00
 1cf:   81 88 c8 00 00 00 00    orl    $0x1000,0xc8(%eax)
 1d6:   10 00 00
 1d9:   c3                      ret   

000001da <__x86.get_pc_thunk.ax>:
 1da:   8b 04 24                mov    (%esp),%eax
 1dd:   c3                      ret   

00002000 <PORT>:
    2000:       00 f0                   add    %dh,%al
    2002:       ca                      .byte 0xca
    2003:       de                      .byte 0xde

SH4:

Code: [Select]
004001b0 <main>:
  4001b0:       07 d1           mov.l   4001d0 <main+0x20>,r1   ! 411000 <PORT>
  4001b2:       12 61           mov.l   @r1,r1
  4001b4:       13 62           mov     r1,r2
  4001b6:       7c 72           add     #124,r2
  4001b8:       26 50           mov.l   @(24,r2),r0
  4001ba:       80 cb           or      #-128,r0
  4001bc:       06 12           mov.l   r0,@(24,r2)
  4001be:       05 92           mov.w   4001cc <main+0x1c>,r2   ! bc
  4001c0:       2c 31           add     r2,r1
  4001c2:       13 52           mov.l   @(12,r1),r2
  4001c4:       03 93           mov.w   4001ce <main+0x1e>,r3   ! 1000
  4001c6:       3b 22           or      r3,r2
  4001c8:       0b 00           rts     
  4001ca:       23 11           mov.l   r2,@(12,r1)
  4001cc:       bc 00           mov.b   @(r0,r11),r0
  4001ce:       00 10           mov.l   r0,@(0,r0)
  4001d0:       00 10           mov.l   r0,@(0,r0)
  4001d2:       41 00           .word 0x0041

00411000 <PORT>:
  411000:       00 f0           .word 0xf000
  411002:       ca de           mov.l   41132c <__bss_start+0x31c>,r14

Instr  Code  Data  Total  ISA
   10    32     8     40  Thumb2
   10    40     8     48  Arm32
   15    30    10     40  Thumb1
    9    36     8     44  Arm64
   10    32     8     40  RISC-V rv64ic
   10    32     4     36  RISC-V rv32ic
   10    40     4     44  RISC-V rv32i
    4    20     4     24  M68k
    8    41     4     45  i686
   13    26    14     40  SH4

Good old Motorola 68000 wins by miles on both number of instructions and total number of bytes!

Thumb1 and SH4 use a lot of instructions, but are the next smallest in code size after m68k. They're just middle of the pack once you include .data.

rv32i is slightly smaller than Arm32 and rv32ic is slightly smaller than Thumb2 in total size. The number of instructions is identical for all of them, and rv32i/Arm32 and rv32ic/Thumb2 have the same code size as each other.

rv64ic has one instruction more than Arm64, but the code is 4 bytes smaller. Both have to load a 64 bit pointer from the .data section, costing 4 bytes, but they don't need an intermediate pointer at the end of the function code, saving 4 bytes.
« Last Edit: December 16, 2018, 01:33:52 pm by brucehoult »
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #153 on: December 16, 2018, 04:06:14 pm »
My surprises show up when initializing peripherals.  I expected code like:
Code: [Select]
       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;

Just for fun, I made a couple of definitions so your code would be compilable and tried it on a few things.

Code: [Select]
#include <stdint.h>

#define PORT_PINCFG_DRVSTR (1<<7)

struct {
    struct {
        struct {
            uint32_t foo;
            uint32_t reg;
            uint32_t bar;
        } PINCFG[16];
        struct {
            uint64_t baz;
            uint32_t reg;
        } DIRSET;
    } Group[10];
} *PORT = (void*)0xdecaf000;

void main(){
    PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
    PORT->Group[0].DIRSET.reg |= 1<<12;
}


In SAM, "Group" represents a group of registers 128 bytes long and everything below is just unions. "PORT" would be a fixed location in memory space. So, what the code actually does is setting 2 bits at the fixed memory location.

There's no pointer loading (which takes a whopping 50% in Motorola, and 49% in Intel, which you decided to compile as position-independent code). Moreover, when someone builds an MCU with RISC-V, they will probably provide some way of setting bits without reading registers, as Atmel did here:

Code: [Select]
PORT->Group[0].DIRSET.reg = 1<<12; // no need for "|="
The register is called DIRSET because writing to it only sets the bits (and the bits which are written "0" remain unchanged), and there's an opposite register called DIRCLR which clears the bits, and also DIRTGL which xors.

The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area, so the whole thing boils down to this:

Code: [Select]
6685                    lui     a3,0x1
0ce7a423                sw      a3,200(a5) ; replace "200" with correct offset from a5

<edit>Can't help it. In dsPIC33 you get:

Code: [Select]
bset LATA,#12
one instruction and 3 bytes (50% compared to RISC-V).

« Last Edit: December 16, 2018, 04:17:30 pm by NorthGuy »
 

Offline lucazader

  • Regular Contributor
  • *
  • Posts: 119
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #154 on: December 16, 2018, 06:25:52 pm »
Quote
Pretty much everyone using ARC or Xtensa is likely to switch to RISC-V
Espressif too?  Is there any indication that the "mostly China" manufacturers would switch?

Yeah, they are a member of the RISC-V Foundation, a "Founding Gold" member, whatever that means.
https://riscv.org/members-at-a-glance/

Judging from the timing of when they would have started development on an ESP32 successor, I'd put it at about a 50% chance of them switching over to RISC-V in the next chip, but a lot higher in the chip after that.
 

Online rhodges

  • Regular Contributor
  • *
  • Posts: 133
  • Country: us
  • Available for embedded projects.
    • My STM8 libraries
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #155 on: December 16, 2018, 06:45:45 pm »

Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.
I have really been enjoying this discussion  :-+

A decade and a half ago, I had the pleasure of working with a VLIW processor, the Trimedia/Philips PNX1302. It dispatched up to 5 operations per instruction word at 200 MHz. It had 128 32-bit registers, and the convention was that the bottom 64 belonged to user code and the top 64 could be used by the ISR. No saving required. Further, an interrupt only happens when the user code makes a jump. So user code could (with care) use the top 64 between jumps. An interesting and useful side-effect is that user code could assume no interrupts while doing code that needs to be atomic.

I just thought some might find this interesting.
Currently developing STM8. Past includes 6809, Z80, 8086, PIC, MIPS, PNX1302, and some 8748 and 6805.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #156 on: December 16, 2018, 08:09:43 pm »
It had 128 32-bit registers, and the convention was that the bottom 64 belonged to user code and the top 64 could be used by the ISR. No saving required.

Some modern MCUs have multiple register sets. When an interrupt happens, the new set gets loaded. When it quits, the old one gets restored. It doesn't take any additional time and thus decreases the interrupt latency by a lot. If you have a separate register set for every interrupt level, you never need to save anything.

However, I think in the future, as everything moves to multi-cores, things may get even better. If you assign a designated core to an interrupt, then the core can simply sit there waiting for the interrupt to happen. Then there's no latency except for the short period necessary to synchronize the interrupt signal to the CPU clock.

 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 1433
  • Country: dk
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #157 on: December 16, 2018, 08:40:28 pm »
Quote
Quote
I wish the ISRs in C code [on ARM] that the HW interrupt entry was quicker...
Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?
Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.

slower in the rare case you need to do something in a few cycles with no registers, likely faster in the majority of cases
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #158 on: December 16, 2018, 08:51:26 pm »
However, I think in the future, as everything moves to multi-cores, things may get even better. If you assign a designated core to an interrupt, then the core can simply sit there waiting for the interrupt to happen. Then there's no latency except for the short period necessary to synchronize the interrupt signal to the CPU clock.
The limiting factor here will be memory. You either need to have a dedicated memory per core, which will make the maximum size of the handler inflexible, or deal with concurrent access by multiple cores, which will slow down everything.
Alex
 

Online andersm

  • Super Contributor
  • ***
  • Posts: 1044
  • Country: fi
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #159 on: December 16, 2018, 09:00:04 pm »
Some modern MCUs have multiple register sets. When an interrupt happens, the new set gets loaded. When it quits, the old one gets restored. It doesn't take any additional time and thus decreases the interrupt latency by a lot. If you have a separate register set for every interrupt level, you never need to save anything.
Register banks do make code that needs to access registers across priority levels a whole lot messier (e.g. task switching using a low-priority interrupt, as is usually done on Cortex-M MCUs, or exception handlers). I guess with modern manufacturing processes the extra state required by the additional register banks isn't a big deal anymore (e.g. 31 32-bit registers by 8 banks is a bit less than 1000 bytes: 31 × 8 × 4 = 992).

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 9902
  • Country: us
  • DavidH
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #160 on: December 16, 2018, 09:20:08 pm »
Register banks do make code that needs to access registers across priority levels a whole lot messier (e.g. task switching using a low-priority interrupt, as is usually done on Cortex-M MCUs, or exception handlers). I guess with modern manufacturing processes the extra state required by the additional register banks isn't a big deal anymore (e.g. 31 32-bit registers by 8 banks is a bit less than 1000 bytes).

It does not cost as much due to area now but the register bank is within the critical timing path for the pipeline so it limits performance in an aggressive design.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #161 on: December 16, 2018, 09:32:49 pm »
However, I think in the future, as everything moves to multi-cores, things may get even better. If you assign a designated core to an interrupt, then the core can simply sit there waiting for the interrupt to happen. Then there's no latency except for the short period necessary to synchronize the interrupt signal to the CPU clock.
The limiting factor here will be memory. You either need to have a dedicated memory per core, which will make the maximum size of the handler inflexible, or deal with concurrent access by multiple cores, which will slow down everything.

I have ideas for this too. Most of the cores should have very limited amount of dedicated regular memory, but they will have one or more deep hardware FIFOs. The other end of the FIFOs may be muxed to other cores, which provides wide address-less communication channels between cores. This removes bus congestion altogether. The central core (or cores), in contrast, will have bigger memory so they can process data.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #162 on: December 16, 2018, 09:36:01 pm »
I have ideas for this too. Most of the cores should have very limited amount of dedicated regular memory, but they will have one or more deep hardware FIFOs. The other end of the FIFOs may be muxed to other cores, which provides wide address-less communication channels between cores. This removes bus congestion altogether. The central core (or cores), in contrast, will have bigger memory so they can process data.
That does not address code memory.
Alex
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #163 on: December 16, 2018, 10:01:03 pm »
That does not address code memory.

Doesn't have to. Code memory can be made completely separate from data memory. Each peripheral core has its own limited amount of code memory which can be programmed by the central core as needed. Small memories can be made very fast. This ensures very fast deterministic execution for the peripheral cores. In contrast, the central core doesn't have to be deterministic - may have caches, pipelines - if it ever needs access to data, it all gets smoothed out by FIFOs.

 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #164 on: December 16, 2018, 10:03:38 pm »
Code memory can be made completely separate from data memory.
That's exactly what I'm talking about. You will essentially limit what your "interrupt" handler can do by defining the amount of code memory it has. I think this will be enough of a limitation to make this system impractical. At least for common microcontroller uses. It may be useful in an MPU environment. Kind of like ARM's big.LITTLE stuff.
Alex
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #165 on: December 16, 2018, 10:05:51 pm »
... Further, an interrupt only happens when the user code makes a jump... An interesting and useful side-effect is that user code could assume no interrupts while doing code that needs to be atomic.

I just thought some might find this interesting.
I found that very interesting!
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #166 on: December 16, 2018, 10:17:36 pm »
Quote
[ARM Cortex NVIC register stacking] likely faster in the majority of cases

I'm not convinced.  We're talking register stacking, probably limited by memory speed, and taking all of 1 instruction (push multiple) in the ISR to save exactly the registers you need...


Quote
The register is called DIRSET because writing to it only sets the bits

Yeah, ....DIRSET |= bitmask; was not the best example.


Quote
The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area

Maybe.  32-bit processors tend to really spread those IO registers out, perhaps occupying more than even a reasonable offset constant for indexed addressing. And constant-folding the upper bits of an address might be too much to ask of a compiler.   I remember looking at PIC32 code (MIPS), which loads 32-bit constants half-at-a-time (LUI/ORI), and being disappointed that it kept re-loading the same upper value.  OTOH, I think Microchip was defining those symbols at link time rather than in C source, so there wasn't much choice...  (This was quite a while ago.  Maybe now, with LTO and similar, it does better.)
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #167 on: December 16, 2018, 10:36:09 pm »
Code memory can be made completely separate from data memory.
That's exactly what I'm talking about. You will essentially limit what your "interrupt" handler can do by defining the amount of code memory it has. I think this will be enough of a limitation to make this system impractical. At least for common microcontroller uses.

You do not need a lot of memory for peripheral cores - you need speed and determinism. And that is what MCUs are lacking now. You always can have a central core with enormous amount of memory to do any kind of processing.

The approach where you have a single memory bus for both data and code, which is accessed simultaneously by the CPU and 15 DMA channels through the bus arbiter, is not very suitable for real-time applications.
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 1433
  • Country: dk
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #168 on: December 16, 2018, 10:47:00 pm »
Quote
[ARM Cortex NVIC register stacking] likely faster in the majority of cases

I'm not convinced.  We're talking register stacking, probably limited by memory speed, and taking all of 1 instruction (push multiple) in the ISR to save exactly which ones you need...


but before you get to your push multiple, the core first has to read the vector table and fetch the first instruction of the ISR (prolog); done automatically in hardware, this can often happen in parallel


 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #169 on: December 16, 2018, 11:16:19 pm »
Quote
The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area
Maybe.  32-bit processors tend to really spread those IO registers out, perhaps occupying more than even a reasonable offset constant for indexed addressing. And constant-folding the upper bits of an address might be too much to ask of a compiler.   I remember looking at PIC32 code (MIPS), which loads 32-bit constants half-at-a-time (LUI/ORI), and being disappointed that it kept re-loading the same upper value.

Microchip went overboard with spreading the registers all over the place in PIC32. There's no reason for that. In PIC24, everything fits into 2048 bytes quite nicely, even with space to spare. RISC-V offsets have only 4096 bytes of reach, but I think this is OK for hardware registers.

If you locate all your peripheral registers at the beginning of the memory space, you already have the zero register, which gives you a free zero base. So you have 2048 bytes which are easily accessible. A good place for hardware registers.

It would be the full 4096 bytes, but RISC-V went the traditional sign-extended (instead of the more reasonable zero-extended) road for offsets. Although addresses 0xfffff000 to 0xffffffff may be used for peripheral registers too.

OTOH, I think Microchip was defining those symbols at link time rather than in C source, so there wasn't much choice... 

That's true. Although it's not a very good idea. I remember I had to copy definitions from the linker scripts to the inc files when I was working with PIC24.

 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #170 on: December 17, 2018, 12:25:56 am »
In SAM, "Group" represents a group of registers 128 bytes long and everything below is just unions.

I don't suppose the exact sizes matter much, as long as you stay within what can be done with a simple offset.

Quote
"PORT" would be a fixed location in memory space. So, what the code actually does is setting 2 bits at the fixed memory location.

Setting two bits at fixed locations .. yep .. that's what I compiled.

Quote
There's no pointer loading (which takes whopping 50% in Motorola, and 49% in Intel which you decided to compile as position-independent code).

I compiled them the way they came. None of the other ISAs have problems using PC-relative addressing.

You need to get the address of the hardware registers *somehow*. Now, it's true that you'd probably get slightly smaller code using the address of "PORT" as a #define instead of as a global variable, but that's the same for all ISAs and doesn't favour one over another.

Quote
Moreover, when someone builds an MCU with RISC-V, they will probably provide some way of setting bits without reading registers, as Atmel did here:

Code: [Select]
PORT->Group[0].DIRSET.reg = 1<<12; // no need for "|="
The register is called DIRSET because writing to it only sets the bits (and the bits which are written "0" remain unchanged), and there's an opposite register called DIRCLR which clears the bits, and also DIRTGL which xors.

I took the C code exactly as given by westfw, which also matches the ARM assembly language he gave in loading, ORing, and storing.

Incidentally, RISC-V *does* have a way to change bits without bringing the data to the CPU and back, but it seemed unfair to use it. I'm concentrating here on compiled C code.

AMOOR.W res,addr,val

This sends a message with the address, value, and operation out over the TileLink bus. If all the channels of the bus go as far as the peripheral, then the peripheral itself will do the OR operation locally and report back the new value. If at some point on the way to the peripheral the bus narrows to just a simple read/write bus then the controller at that point will do the read/modify/write and report the result back to the CPU.

Quote
The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area, so the whole thing boils down to this:

Code: [Select]
6685                    lui     a3,0x1
0ce7a423                sw      a3,200(a5) ; replace "200" with correct offset from a5

Sure, of course. But that value has to *get* into a5 somehow, and I showed that.

If I'd chosen to put the code into a function that took PORT as an argument then *all* of the ISAs would show shorter code.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #171 on: December 17, 2018, 12:39:43 am »
A decade and a half ago, I had the pleasure of working with a VLIW processor, the Trimedia/Philips PNX1302. It dispatched up to 5 operations per instruction word at 200 MHz. It had 128 32-bit registers, and the convention was that the bottom 64 belonged to user code and the top 64 could be used by the ISR. No saving required.

You can do this on any CPU with a reasonably large number of registers. It's just a matter of documenting it and making sure the compiler (and/or assembly language programmers) know about it.

Even three or four registers is enough for many interrupt routines, so you could reasonably do this on machines with 16 registers -- but 32 would be better.

Quote
Further, an interrupt only happens when the user code makes a jump. So user code could (with care) use the top 64 between jumps. An interesting and useful side-effect is that user code could assume no interrupts while doing code that needs to be atomic.

This is a nice property. I've worked on a machine that (potentially) switched threads after every "block" of code -- not quite a basic block as there was a way to do if/then/else and small loops within a block, but there was a limit on the number of instructions executed in the block. Once you were in a block you were guaranteed NOT to be interrupted. And there was a bank of 8 fast registers (1 cycle latency) that could be used within a block but went *poof* at the end of the block. The 256 global registers had several cycles more latency than that.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #172 on: December 17, 2018, 12:47:40 am »
It had 128 32-bit registers, and the convention was that the bottom 64 belonged to user code and the top 64 could be used by the ISR. No saving required.

Some modern MCUs have multiple register sets. When an interrupt happens, the new set gets loaded. When it quits, the old one gets restored. It doesn't take any additional time and thus decreases the interrupt latency by a lot. If you have a separate register set for every interrupt level, you never need to save anything.

I don't know about "modern". The Z80 did this. Old ARM chips had a set of registers for every privilege level (not necessarily a whole set). And SPARC and Itanium had register windows that were used not only by interrupts, but also by function calls.

There are two problems with this that explain why no one does it any more:

1) at some point you run out and want three sets instead of two, or seventeen sets instead of sixteen. And then you have a whole lot of delay while you swap stuff. And you have to swap the entire set of registers even if the function/task using them is only using a small proportion of them.

2) it's just a huge waste of hardware resources that, in the end, is not actually used all that effectively. You're better off spending those transistors on something else -- such as a cache or write buffer that can absorb manually saved registers quickly on interrupts, but also makes your normal code run faster the rest of the time as well.

« Last Edit: December 17, 2018, 01:07:45 am by brucehoult »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #173 on: December 17, 2018, 12:58:33 am »
Quote
Quote
I wish the ISRs in C code [on ARM] that the HW interrupt entry was quicker...
Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?
Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.

slower in the rare case you need to do something in a few cycles with no registers, likely faster in the majority of cases

Not faster. If the hardware managed to write those 8 words to memory (or at least to a write buffer or something) in one or two clock cycles then it would be faster. But it doesn't. Cortex M3, M4, and M7 all have a 12-cycle interrupt latency (the M0 has 16). It's sitting there writing those eight registers out at one per clock cycle, exactly the same as you could do yourself in software.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #174 on: December 17, 2018, 01:00:58 am »
Register banks do make code that needs to access registers across priority levels a whole lot messier (e.g. task switching using a low-priority interrupt, as is usually done on Cortex-M MCUs, or exception handlers). I guess with modern manufacturing processes the extra state required by the additional register banks isn't a big deal anymore (e.g. 31 32-bit registers by 8 banks is a bit less than 1000 bytes).

It does not cost as much due to area now but the register bank is within the critical timing path for the pipeline so it limits performance in an aggressive design.

Yes.

Also, there are other ways to use that 1 KB worth of transistors that give more bang for the buck, more of the time.
 

Offline richardman

  • Frequent Contributor
  • **
  • Posts: 427
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #175 on: December 17, 2018, 02:40:34 am »
re: compact 68K code

I may be one of the few, but I like the idea of split register sets in the 68K. Compilers have no problems with A vs. D registers, and at worst it takes one extra move. With that, you can save one bit per register operand specifier. It can all add up.

...and we know that CISC ISAs like the x86 can be decoded into micro-RISC ops, so I wonder what a highly tuned 68K, or for that matter, PDP-11/VAX-11 micro-architecture could be like. We can throw away the flags ~_o if they make a difference and add a couple of instructions as mentioned.
// richard http://imagecraft.com/
JumpStart C++ for Cortex (compiler/IDE/debugger): the fastest easiest way to get productive on Cortex-M.
Smart.IO: phone App for embedded systems with no app or wireless coding
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #176 on: December 17, 2018, 03:11:46 am »
re: compact 68K code

I may be one of the few, but I like the idea of split register sets in the 68K. Compilers have no problems with A vs. D registers, and at worst it takes one extra move. With that, you can save one bit per register operand specifier. It can all add up.

Yes, it's quite a natural split, worked well, and seldom caused any problems.

The main problem is just that it pre-determines that every program will want a 50/50 split of data values and pointers and that's not usually the case -- you usually want a lot fewer pointers. It's more or less ok with 8 of each, especially as a couple of address registers get used up by the stack pointer and maybe a frame pointer and a pointer to globals. But I don't think 16 of each would work well.

Quote
...and we know that CISC ISA like the x86 can be decoded into micro-RISC-ops, so wonder what a highly tuned 68K, or for that matter, PDP-11/VAX-11 micro-architecture could be like. We can throw away the flags ~_o if  they make a difference and add a couple instructions as mentioned.

You might be interested in:

http://www.apollo-core.com/

The basic 68000 (or at least 68010) instruction set is good.

The main problem it had was that they went in a bad direction with complexity in the 68020 just because, y'know, it's microcode and you can do anything. They had to back away from that in the 68040 and 68060.

Well, maybe the main problem it had was that it was proprietary and owned by a company that stopped caring about it enough to put in the necessary investment. And then Motorola did that *again* with the PowerPC, not putting in the investment necessary to give Apple mobile chips competitive with the Centrino -> Core 2, and forcing Apple into Intel's arms. (IBM's G5 and successors were just fine for professional desktop systems.)

Is ColdFire still a thing? It doesn't seem to have had any love since about 2010.

Wikipedia says it topped out at 300 MHz, and it does around 1.58 Dhrystone VAX MIPS/MHz (slightly less than Rocket-based RISC-V).

OK, element14 is showing me 76 SKUs with a maximum 250 MHz but mostly at 50 or 66 MHz. So it's still a thing.
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #177 on: December 17, 2018, 03:24:47 am »
Quote
before you get to your push multiple, first the core has to read the vector table and fetch the first instruction of the ISR (prolog); done automatically, it can often be done in parallel
I'm not convinced.  CM0 is listed as a von Neumann architecture with both flash and RAM connected to the same memory bus matrix.  And it always has to save the PC anyway, so if it could do a simultaneous vector fetch (one word) and PC save (one word), it would be caught up by then, more or less.  (And ... I would tend to relocate the vector table to RAM anyway.)

Quote
Quote
Microchip was defining those symbols at link time
That's true. Although it's not a very good idea.
I actually asked Microchip about it.  They said it let them distribute binary libraries that worked across a range of chips (with identical peripherals at different locations).  That makes some sense - it's a good thing disk space is cheap, with many vendors distributing close to one library per chip.  (OTOH, I'm not entirely happy with the idea of binary-only libraries.)
Quote
I like the idea of split register sets in the 68K.
It seems to work OK on the 68k.  Partially because there were a lot of them (16 each, right?)  The crop of 8-bit chips with "we have TWO index registers!  PLUS a stack!" was depressing... I'm not sure it buys you much from a hardware implementation PoV - can't you pretty much use the same instruction bit you used to pick "Address or Data" to address twice as many GP registers?  (I don't quite remember which instructions were different between A/D registers.)  Maybe some speed-up from having separate banks?  (There's an idea for an optimization: "we have 32 registers organized in 4 banks of 8.  Operations that use registers from different banks can be more easily parallelized..."  Lots of CPUs have done this with memory - the Cray-1, for instance: "write your algorithm so that you access memory at 8-word intervals", or something like that.  Or disk - remember "interleaving"?)
Quote
wonder what a highly tuned 68K or PDP-11 ... could be like.
Yeah.     I wonder what the internal architecture of the more recent ColdFire chips is like; my impression is that that's about what they've done... The PDP-10 emulator "using an x86 for its microcode interpreter" apparently ran something like 6x faster than the fastest DEC ever built.  (And that was a decade or two ago, I think.)
 

Offline richardman

  • Frequent Contributor
  • **
  • Posts: 427
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #178 on: December 17, 2018, 04:34:33 am »
I don't think there has been any new ColdFire implementation in YEARS. Possibly a process shrink, if that.

Once all the old HP printer models that used ColdFire are EOL'ed, that will probably be the end of the line... Oh wait, they are also used in automotive, and those go on forever as well. Heck, Tesla *might* have used the CPU12 in their cars.

re: 68K registers
It's 8 registers each for address and data.

Motorola junked so many processor architectures in the 2000s that it's not even funny. 88K was one, and there's also mCore. By the look of it, it should have been competitive, but when even their own phone division wouldn't use it, that was the end.
// richard http://imagecraft.com/
JumpStart C++ for Cortex (compiler/IDE/debugger): the fastest easiest way to get productive on Cortex-M.
Smart.IO: phone App for embedded systems with no app or wireless coding
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 9902
  • Country: us
  • DavidH
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #179 on: December 17, 2018, 04:48:00 am »
I may be one of the few, but I like the idea of split register sets in the 68K. Compilers have no problem with A vs. D registers, and at worst it takes one extra move. With that, you can save one bit in the register operand specifier. It can all add up.

Quote
...and we know that CISC ISAs like the x86 can be decoded into micro-RISC-ops, so I wonder what a highly tuned 68K, or for that matter PDP-11/VAX-11, micro-architecture could be like. We can throw away the flags ~_o if they make a difference, and add a couple of instructions as mentioned.

The 68K had ISA features like double indirect addressing which made it even worse than x86 when scaled up.  The separate address and data registers were one of those features, although I do not remember why now.
« Last Edit: December 17, 2018, 08:12:15 pm by David Hess »
 

Online andersm

  • Super Contributor
  • ***
  • Posts: 1044
  • Country: fi
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #180 on: December 17, 2018, 05:49:44 am »
Cortex M3, M4, M7 all have 12 cycle interrupt latency (M0 has 16). It's sitting there writing those eight registers out at one per clock cycle, exactly the same as you could do yourself in software.
The Cortex-M hardware prologue also sets up nested interrupts. So while they have relatively long interrupt latencies, you also get a decent amount of functionality out of those cycles.
« Last Edit: December 17, 2018, 06:44:24 am by andersm »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #181 on: December 17, 2018, 06:01:36 am »
Motorola junked so many processor architectures in the 2000s that it's not even funny. 88K was one, and there's also mCore. By the look of it, it should have been competitive, but when even their own phone division wouldn't use it, that was the end.

I just took a close look at the ISA. It's a pretty clean, very RISC design with fixed-length 16-bit opcodes. No addressing modes at all beyond register plus (very) short displacement, but it has special instructions designed to help create effective addresses quickly, e.g. rd = rd + 4*rs.

Chinese company C-SKY makes a series of CK6nn chips that use the M-CORE instruction set.

They also have a CK8nn series of chips that use a 16/32 bit opcode ISA called C-SKY V2. I'm not sure if it's just an extension of the 600-series ISA.

Anyway, they're switching to RISC-V.

The problem with Motorola ISAs -- 68k, 88k, PowerPC (with IBM), M-CORE -- isn't a technical one. It's that if you tie your company to them then you have a huge risk of being orphaned within a decade.

This, more than any technical superiority, is one of the things that makes RISC-V so attractive.
 

Offline richardman

  • Frequent Contributor
  • **
  • Posts: 427
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #182 on: December 17, 2018, 08:00:26 am »
...It's that if you tie your company to them then you have a huge risk of being orphaned within a decade.

This, more than any technical superiority, is one of the things that makes RISC-V so attractive.

No disagreement from me. I think if some companies backed RISC-V-based MCUs, that would give ARM serious competition. Of course, finding such a company could be difficult. A Chinese company may be a possibility.
// richard http://imagecraft.com/
JumpStart C++ for Cortex (compiler/IDE/debugger): the fastest easiest way to get productive on Cortex-M.
Smart.IO: phone App for embedded systems with no app or wireless coding
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #183 on: December 17, 2018, 08:38:42 am »
The 68K had ISA features like double indirect addressing which made it even worse than x86 when scaled up.  The separate address and data registers were one of those features, although I do not remember why now.

Not in the 68000. The 68020 did that, and even Motorola later realised it was a mistake.

Having memory-to-memory arithmetic is a problem for fast implementations though, even in the base 68000. x86 stops at reg-mem and mem-reg.

Of course neither one is a problem on big high end implementations that can break it into a bunch of uops and let OoO machinery chew on them.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #184 on: December 17, 2018, 11:37:01 am »
Motorola junked so many processor architectures in the 2000s that it's not even funny. 88K was one

Data General built a few computers based on the 88K, and a couple of companies located in Japan also made 88K computers (see OpenBSD supported machines), but they were a niche, and their orders didn't amount to much in terms of money.

The 88000 appeared too late on the marketplace, later than MIPS and SPARC, and since it was not compatible with the 68K it was not competitive at all: Amiga/classic? 68k! Atari ST? 68k! Macintosh/classic? 68k!

In short, Motorola was not happy because they had problems selling the chip.

Now I know that the 88K was abandoned after the Dash prototype, when Motorola was collaborating with Stanford University. It sounds like the last chance to put a foot into the supercomputer field, which was a niche but with a lot of money involved, and yet again ... bad luck, since for some obscure reason someone preferred to go on with MIPS instead of the 88K.

Was it the last lost occasion? Definitely YES, since someone with a lot of money, someone like Silicon Graphics, chose to use the Dash technology combined with MIPS, and this was the beginning of the CrayLink2,3,4, ... SGI supercomputers - yet again, a lot of money back.

In such a scenario there was no choice for Motorola: 88k project dropped!

As far as I have understood, IBM had been working on the S/370 for a long while, and their research was on the IBM 801 chip, which was the first POWER chip, so ... to make money, Motorola promoted a collaboration with Apple and IBM, which then developed the first PowerPC chip: the MPC601 appeared in 1992, a sort of hybrid chip between the POWER1 spec and the new PowerPC spec.

This way the managers at Motorola were happy. Anyway, this didn't last long; these companies then dropped the collaboration.

Now IBM is on POWER9, which is funded by DARPA, which means a lot of money for IBM. POWER9 workstations and servers are very expensive. Say the entry level for the low-spec workstation is no less than 5K USD  :palm:
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #185 on: December 17, 2018, 12:07:33 pm »
The 88000 appeared too late on the marketplace, later than MIPS and SPARC, and since it was not compatible with the 68K it was not competitive at all: Amiga/classic? 68k! Atari ST? 68k! Macintosh/classic? 68k!

If 2015 is not too late (or 2012 for Arm64) then 1990 was certainly not too late.

88000 is an excellent ISA, even today, and if someone put good engineers on to making chips and good marketers on to selling it then it could be competitive.

Particular chips have a short lifespan, but a good ISA can be good for 50 years. The main thing is to *start* with a plan for compatible 16, 32, 64, 128 bit pointer/integer successors.

If I've done my arithmetic correctly, if you could somehow store 1 bit on every atom of silicon (or carbon, or ...), then 2^128 bytes of storage would need 100,000,000,000 tonnes of silicon. That's a cube a bit under 4 km on a side.

128 bits is probably going to be enough for a while, even with sparse address spaces.

Quote
In short, Motorola was not happy because they had problems at selling the chip.

No one's fault but Motorola. Great engineers, awful management.

Quote
As far as I have understood, IBM was working on S/370 since a long while, and their researching was on the IBM 801 chip, which was the first POWER chip

No, that's not correct. The IBM 801 was the world's first RISC chip (though that name wasn't invented by Dave Patterson until several years later, when he independently came up with the concept), but it's very different from POWER/PowerPC. For a start, it had both 16-bit and 32-bit opcodes that could be freely mixed, an important code density feature that didn't find its way back into RISC until Thumb2 in 2003.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #186 on: December 17, 2018, 12:27:33 pm »
If 2015 is not too late (or 2012 for Arm64) then 1990 was certainly not too late.

It was too late for RISC workstations, due to the SPARC and MIPS ones already promoted and in use before Motorola released the 88k, and it was too late for supercomputers, yet again due to MIPS at SGI.

If the "Dash/88k" project at Stanford University or the MIT "T/88110MP" project hadn't failed (at the management level, not at the technical level) ... but they did.

This is a fact!

The IBM 801

The 801 was a proof of concept, made in 1974. But POWER and PowerPC are derived from the evolution of this POC. Directly and indirectly, since, of course, in 1974 "RISC" was not what we know today, but the idea was already there in the simulator of the first 801. It's written in every Red, Green, and Blue book published by IBM.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #187 on: December 17, 2018, 12:44:34 pm »
What would happen to IBM's POWER9 if DARPA hadn't funded it?
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #188 on: December 17, 2018, 01:10:37 pm »
The IBM 801

801 was a proof of concept, made in 1974. But POWER and PowerPC are derived from the evolution of this POC.

Derived, certainly. But very different.

Btw, the project formally started in October 1975 though some investigation work had been done before that. The first running hardware was in 1978.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #189 on: December 17, 2018, 01:18:48 pm »
My IBM Red, Green, and Blue books (a sort of encyclopedia about POWER and PowerPC) point to this article.

Probably to underline that one of their men, Mr. Cocke, received the Turing Award in 1987, the US National Medal of Science in 1994, and the US National Medal of Technology in 1991  ;D

To me, it sounds sort of like: "hey? we are IBM, you might know us for the ugliest thing ever invented - the IBM-PeeeeCeeeeee - PersonalComputers and IBM-PC-compatible computers - which are really shitty, but we also do serious stuff. Don't you believe our words? See that one of our prestigious men received an award for having invented the RISC before any H&P book started talking about it".

IBM is really funny :D
« Last Edit: December 17, 2018, 01:27:27 pm by legacy »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #190 on: December 17, 2018, 01:37:48 pm »
My IBM Red, Green, and Blue books point to this article.

Yup, that ties in with my sources. In 1974 they wanted to make faster telephone exchanges (my sources said they wanted to handle 300 calls per second, and decided 12 MIPS was needed for that); they did some thinking, wrote effectively a white paper, did some preliminary design on an instruction set, and then got approval and funding, and the 801 project formally kicked off in October 1975.

Your article says first hardware was 1980, compared to my previous message that says 1978 (and your message that the 801 was "made in 1974"). I believe 1980 was the first production hardware for deployment, or possibly the 2nd prototype after they got experience with the first one and made changes.

One of the changes was dropping the variable length 16/32 bit instructions and going with 32 bit only -- mostly because they needed to support virtual memory in the production model and didn't want to have to support instructions crossing a VM page boundary. The 2nd version also increased the number of registers from 16 to 32, and increased the register size (and addresses) from 24 bits to 32 bits. They also changed from destructive 2-address instructions to 3-address, so although instructions increased in size from an average of about 3 bytes each (common for Thumb2 and RISC-V these days too) to exactly four bytes each, programs needed fewer instructions so the increase in program size was less than 33%.

Quote
Probably to underline that one of their men, mr.Cocke, received the Turing Award in 1987, the US National Medal of Science in 1994, and the US National Medal of Technology in 1991  ;D

Indeed he did, and very well deserved.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #191 on: December 17, 2018, 01:39:47 pm »
See that one of our prestigious men received an award for having invented the RISC before any H&P book started talking about it

I see you edited your message while I was replying to it.

H&P had to wait until 2017 to receive their Turing Awards.

Quote
“The main idea is not to add any complexity to the machine unless it pays for itself by how frequently you would use it. And so, for example, a machine which was being used in a heavily scientific way, where floating point instructions were important, might make a different set of tradeoffs than another machine where that wasn't important. Similarly, one in which compatibility with other machines was important or in which certain types of networking was important would include different features. But in each case they ought to be done as the result of measurements of relative frequency of use and the penalty that you would pay for the inclusion or non-inclusion of a particular feature.”

Joel Birnbaum
FORMER DIRECTOR OF COMPUTER SCIENCES AT IBM
“Computer Chronicles: RISC Computers (1986),”
October 2, 1986

Now there is a guy absolutely on the same page as H&P. (And the people who invented RISC-V: namely P and his students, and his students' students. H is a fan too.)


« Last Edit: December 17, 2018, 01:54:55 pm by brucehoult »
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #192 on: December 17, 2018, 06:05:18 pm »
I compiled them the way they came.

It doesn't matter whether you deliberately tweaked the compiler options and offsets to make RISC-V look good, or they magically came out this way. The problem is that your tests do not reflect reality, but rather a jumble of inconsequential side effects.

If you tweak the offsets a different way, and use the default Makefile from my computer, the whole thing goes from this:

Code: [Select]
000001b5 <main>:
 1b5:   e8 20 00 00 00          call   1da <__x86.get_pc_thunk.ax>
 1ba:   05 3a 1e 00 00          add    $0x1e3a,%eax
 1bf:   8b 80 0c 00 00 00       mov    0xc(%eax),%eax
 1c5:   81 88 94 00 00 00 80    orl    $0x80,0x94(%eax)
 1cc:   00 00 00
 1cf:   81 88 c8 00 00 00 00    orl    $0x1000,0xc8(%eax)
 1d6:   10 00 00
 1d9:   c3                      ret   

000001da <__x86.get_pc_thunk.ax>:
 1da:   8b 04 24                mov    (%esp),%eax
 1dd:   c3                      ret   

00002000 <PORT>:
    2000:       00 f0                   add    %dh,%al

to this:

Code: [Select]
08048450 <main>:
 8048450: a1 c0 95 04 08        mov    0x80495c0,%eax
 8048455: 83 48 30 20          orl    $0x20,0x30(%eax)
 8048459: 83 48 40 20          orl    $0x20,0x40(%eax)
 804845d: c3                    ret   

For what it's worth, it's now 14 bytes for i386 (plus 4 bytes of data, of course), which is now the leader, way better than Motorola, and leaving RISC-V absolutely in the dust.

Here's the tweaked C code:

Code: [Select]
#include <stdio.h>
#include <stdint.h>

 #define PORT_PINCFG_DRVSTR (1<<5)

struct {
    struct {
        struct {
            uint32_t reg;
        } PINCFG[16];
        struct {
            uint32_t reg;
        } DIRSET;
    } Group[10];
} *PORT = (void*)0xdecaf000;

void main(){
    PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
    PORT->Group[0].DIRSET.reg |= 1<<5;
}

Here's the line from the Makefile:

Code: [Select]
gcc a.c -o c -save-temps -O1 -fomit-frame-pointer -masm=intel

Here's the assembler output

Code: [Select]
.file "a.c"
.intel_syntax noprefix
.text
.globl main
.type main, @function
main:
mov eax, DWORD PTR PORT
or DWORD PTR [eax+48], 32
or DWORD PTR [eax+64], 32
ret
.size main, .-main
.globl PORT
.data
.align 4
.type PORT, @object
.size PORT, 4
PORT:
.long -557125632
.ident "GCC: (GNU) 4.5.0"
.section .note.GNU-stack,"",@progbits




 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #193 on: December 18, 2018, 02:01:52 am »
I compiled them the way they came.

It doesn't matter whether you deliberately tweaked the compiler options and offsets to make RISC-V look good, or they magically came out this way. The problem is that your tests do not reflect reality, but rather a jumble of inconsequential side effects.

If you tweak the offsets a different way

Oh come on. You not only changed the data structure (which I freely admit I made up at random, as westfw didn't provide it) to be less than 128 bytes to suit your favourite ISA, you *ALSO* changed the bit offsets in the constants to be less than 8 so the masks fit in a byte. If you hadn't done *both* of those then your code would have 32-bit literals for both offset and bit mask, the same as mine, not 8-bit. You also changed the code compilation and linking model from that used by all the other ISAs, which would all benefit pretty much equally from the same change.

And you accuse me of bad faith?
« Last Edit: December 18, 2018, 02:05:28 am by brucehoult »
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3023
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #194 on: December 18, 2018, 02:37:12 am »
Quote
[CM0 and limitations on offsets/constants, making assembly unpleasant]
(I did specifically choose offsets and bit values to be "beyond" what CM0 allows.)

As another example, I *think* that the assembly for my CM0 example (the actual data structure is from the Atmel SAMD21, but it's scattered across several files) can be improved by accessing the port as 8-bit registers instead of 32-bit.  All I have to do is look really carefully at the datasheet (and test!) to see if that actually works, rewrite or obfuscate the standard definitions in ways that would confuse everyone and perhaps not be CMSIS-compatible, and remember to make sure that it remains legal if I move to a slightly different chip.

Perhaps I have a high bar for what makes a pleasant assembly language.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #195 on: December 18, 2018, 03:50:18 am »
Oh come on. You not change only change the data structure (which I freely admit I made up at random, as westfw didn't provide it) to be less than 128 bytes to suit your favourite ISA, you *ALSO* change the bit offsets in the constants to be less than 8 so the masks fit in a byte. If you hadn't done *both* of those then your code would have 32 bit literals for both offset and bit mask, the same as mine, not 8 bit. You also changed the code compilation and linking model from that used by all the other ISAs, which would all benefit pretty much equally from the same change.

I restored the offsets to where they were in the original code. I restored the linkage to normal. The masks I admit to. But the masks are not important, because you can achieve the same effect with byte access, thus the mask never needs to be more than 8 bits. Of course, the superoptimized C compiler couldn't figure that out, so I had to nudge the masks a bit. When we get better compilers, there will be no need to tweak masks, right?

The $1M question is: how is my tweaking any worse than yours?

And you accuse me of bad faith?

Of course not.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #196 on: December 18, 2018, 04:05:39 am »
Quote
[CM0 and limitations on offsets/constants, making assembly unpleasant]
(I did specifically choose offsets and bit values to be "beyond" what CM0 allows.)

As another example, I *think* that the assembly for my CM0 example (the actual data structure is from the Atmel SAMD21, but it's scattered across several files) can be improved by accessing the port as 8-bit registers instead of 32-bit.  All I have to do is look really carefully at the datasheet (and test!) to see if that actually works, rewrite or obfuscate the standard definitions in ways that would confuse everyone and perhaps not be CMSIS-compatible, and remember to make sure that it remains legal if I move to a slightly different chip.

Perhaps I have a high bar for what makes a pleasant assembly language.

x86, 68k and VAX were all designed at a time when maximizing the productivity of the assembly language programmer was seen as one of the highest (if not actual highest) priorities. They'd gone past simply trying to make a computer that worked and even making the fastest computer and come to a point that computers were not only fast *enough* for many applications but had hit a speed plateau. (It's hard to believe now that Apple sold 1 MHz 6502 machines for over *seventeen* years, and the Apple //e alone for 11 years.)

What they had was a "software crisis". The machines had quirky instruction sets that were unpleasant for assembly language programmers -- and next to impossible for the compilers of the time to generate efficient code for.

The x86, 68k and VAX were all vastly easier for the assembly language programmer than their predecessors the 8080, 6800, and PDP-11 (or PDP-10). They also were better for compilers, though people still didn't trust them.

The RISC people came along and said "If you simplify the hardware in *this* way then you can build faster machines cheaper, compilers actually have an easier time making optimal code, and everyone will be using high level languages in future anyway".

I remember the time when RISC processors were regarded as being next to impossible (certainly impractical) to program in assembly language!

A lot of that was because you had to calculate instruction latencies yourself and put dependent instructions far enough away that the result of the previous instruction was already available -- and not doing it meant not just that your program was not as efficient as possible, but that it didn't work at all! Fortunately, that stage didn't last long, for two reasons: 1) your next generation CPU would have different latencies (sometimes longer as pipeline lengths increased), meaning old binaries would not work, and 2) as CPUs increased in MHz faster than memory did, caches were introduced, and then you couldn't predict whether a load would take 2 cycles or 10, and the same code had to be able to cope with 10 but run faster when you got a cache hit.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #197 on: December 18, 2018, 04:35:16 am »
The $1M question is: how is my tweaking any worse than yours?

That's easy. I'm taking code provided by someone else without any reference to a specific processor and then using default compiler settings (adding only -O, and -fomit-frame-pointer for the m68k, as it's the only one that generated a frame otherwise) and seeing how it works out.

You on the other hand worked backwards from a processor to make code that suited it.

If westfw had provided the definitions for the structure he was accessing then I would have used that, as is. But he didn't so I had to come up with something in order to have compilable code.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #198 on: December 18, 2018, 02:31:05 pm »
You on the other hand worked backwards from a processor to make code that suited it.

Haven't you?

Isn't this the way it should be? When you compile for a CPU you select the settings which maximize performance for that particular CPU, instead of using settings which produce bloat. As, by your own admission, you did for Motorola.

If you haven't done this for RISC-V, why don't you tweak it so that it produces better code? Go ahead, try to beat my 14 bytes, or even get remotely close.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #199 on: December 18, 2018, 03:11:04 pm »
You on the other hand worked backwards from a processor to make code that suited it.

Haven't you?

No.

Quote
Isn't this the way it should be. When you compile for a CPU you select the settings which maximize performance for this particular CPU instead of using settings which produce the bloat. As, by your own admission, you did for Motorola.

If you haven't done this for RISC-V, why don't you tweak it so that it produces better code? Go ahead, try to beat my 14 bytes, or even get remotely close.

Not interested in winning some dick size competition. If RISC-V ends up in the middle of the pack and competitive on measures such as code size or number of instructions just by compiling straightforward C code in a wide variety of situations with no special effort then I'm perfectly content. Other factors are then more important.

Everyone is going to "win" at some comparison. x86 can OR a constant with a memory location in a single instruction. Cool. So can the dsPIC33. Awesome. That has approximately zero chance of being the deciding factor on which processor is used by anyone.

You didn't change the compiler settings. You changed the semantics of the code -- you changed what problem is being solved.
« Last Edit: December 18, 2018, 03:33:22 pm by brucehoult »
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 1770
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #200 on: December 18, 2018, 06:20:32 pm »
Not interested in winning some dick size competition. If RISC-V ends up in the middle of the pack and competitive on measures such as code size or number of instructions just by compiling straightforward C code in a wide variety of situations with no special effort then I'm perfectly content.

"interested", "content", "bad faith". I don't think in these categories. These are emotions. The reality exists independent of them, and independent from what you (or me) think. Similarly, the truth cannot be voted upon by customers (although, if anything, Intel has way more of them than SiFive).

All things being equal, the CISC approach creates better code density than RISC, because the ability to use more information allows for better compression. This is pure mathematics. If empirical tests show otherwise, the only explanation is faulty methodology.

 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 9902
  • Country: us
  • DavidH
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #201 on: December 18, 2018, 11:09:27 pm »
x86, 68k and VAX were all designed at a time when maximizing the productivity of the assembly language programmer was seen as one of the highest (if not the actual highest) priorities. They'd gone past simply trying to make a computer that worked, and even making the fastest computer, and come to a point where computers were not only fast *enough* for many applications but had hit a speed plateau. (It's hard to believe now that Apple sold 1 MHz 6502 machines for over *seventeen* years, and the Apple //e alone for 11 years.)

They were also designed at a time when memory access time was still long and memory width was still small so tiny instruction lengths and complex instructions were advantageous.  ARM was unusual in being designed to specifically take advantage of the fast page mode memory which had become available leading to instructions like load and store multiple.  I would argue that not blindly adhering to RISC is what made ARM successful in the long run.

Quote
The x86, 68k and VAX were all vastly easier for the assembly language programmer than their predecessors the 8080, 6800, and PDP-11 (or PDP-10). They also were better for compilers, though people still didn't trust them.

I went up the 8080 and Z80 route and loved the 8086, but only dabbled in the 6502, which seemed primitive compared to the 8080.  Later I became proficient in accumulator-centric designs like the 68HC11 and PIC and learned to love macro assemblers even more.

Quote
The RISC people came along and said "If you simplify the hardware in *this* way then you can build faster machines cheaper, compilers actually have an easier time making optimal code, and everyone will be using high level languages in future anyway".

The people actually producing commercial RISC designs had a conflict of interest.  What made RISC popular so quickly is that a small design team could do it, so suddenly everybody had a 32 bit RISC processor available and was happy to proclaim that RISC was the future.  Where this fell apart is that Intel's development budget was already much greater than the sum of all of the RISC efforts combined.  It did not matter that equivalent performance could be produced for a fraction of the development cost, because Intel could afford any development effort.

Development of ARM is slowed by the same problem.  All of the separate ARM development efforts do not join to become Voltron-ARM.  Intel only has to beat the best of them but I expect at some point ARM will catch up if only because Intel has become so dysfunctional.

Quote
A lot of that was because you had to calculate instruction latencies yourself and put dependent instructions far enough away that the result of the previous instruction was already available -- and not doing it meant not just that your program was not as efficient as possible but that it didn't work at all! Fortunately, that stage didn't last long, for two reasons: 1) your next generation CPU would have different latencies (sometimes longer as pipeline lengths increased), meaning old binaries would not work, and 2) as CPUs increased in MHz faster than memory did caches were introduced and then you couldn't predict whether a load would take 2 cycles or 10 and the same code had to be able to cope with 10 but run faster when you got a cache hit.

The lack of interlocks, in features like branch delay slots, just ended up being a millstone around the neck of performance.  "Just recompile your software" should be included with "the policeman is your friend" and "the check is in the mail".  Maybe this will change with ubiquitous just-in-time compiling.
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2028
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #202 on: December 19, 2018, 12:43:07 am »
Hey... anybody know if 'The Mill' is still grinding away?

https://millcomputing.com/

:popcorn:
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 5833
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #203 on: December 19, 2018, 12:48:36 am »
They probably are, but I imagine at this point it is mostly a VC money sucking enterprise.
Alex
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #204 on: December 19, 2018, 10:09:17 am »
Hey... anybody know if 'The Mill' still grinding away?

https://millcomputing.com/

:popcorn:

They are.

Ivan gave some hints in a recent comp.arch posting.

https://groups.google.com/forum/#!original/comp.arch/bGBeaNjAKvc/zQcA-R6FAgAJ
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4207
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #205 on: December 19, 2018, 12:46:34 pm »
ARM was unusual in being designed to specifically take advantage of the fast page mode memory which had become available, leading to instructions like load and store multiple.

ARM grew out of a small company called Acorn, maker of some of the earliest home computers (initially used by the BBC as kits for kids and students), into one of the world's most important semiconductor designers, providing the brains for Apple's must-have iPhones and iPads.

Back to the beginning: Acorn Computers was established in Cambridge in 1978, and produced computers which were particularly successful in the UK. Acorn's BBC Micro was the most widely used computer in British schools in the 1980s.

In the same year, Motorola was preparing to release the 68000, from their MACSS program, which engineers at Acorn later (1981-82?) took into consideration for the next generation of their computers.


Sophie Wilson, a British computer scientist and software engineer.

This woman is definitely a superheroine, and, as if by weird coincidence (a lot of computer science events happened in 1978?!? there should be a scientific reason for this), it was exactly in 1978 that Sophie Wilson joined Acorn Computers Ltd. She designed the Acorn Microcomputer, and reportedly debugged and re-soldered a prototype while watching the wedding of Charles, Prince of Wales, and Lady Diana Spencer (that was 1981) on a small portable television (made by Mr. Clive Sinclair, a rival of Acorn). And it worked!

OMG !!! WOW !!!  :D :D :D

The prototype was then released as the Proton, a microcomputer that became the BBC Micro, and its BASIC evolved into BBC BASIC, which was then used to develop the CPU simulator for the next generation. In October 1983, Wilson began designing the instruction set for one of the first RISC processors, the Acorn RISC Machine; the ARM v1 was delivered on 26 April 1985 and went on to be a worldwide success!

She said the 68000 had been taken into consideration but was rejected due to its long latencies, especially in reacting to interrupts; fast interrupt response was a must-have feature for a new computer where everything is done in software. She also said the new DRAM integrated circuits had to be sourced directly from Hitachi because the project needed really, really fast RAM.

Computers like the Amiga used the 68000 with the help of specialized chips for graphics and sound, while Acorn's ARM computers did everything in software, so the CPU had to be super fast at I/O and super fast at reacting to interrupts.



The last machine developed by Acorn was the Risc PC, with a StrongARM CPU at 200 MHz.
 
The following users thanked this post: NorthGuy

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 1097
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #206 on: December 19, 2018, 01:01:40 pm »
This woman is definitely a superheroine

Roger that, job very well done. It's stood up well for nearly 35 years. I remember being quite jealous of a friend with an Archimedes. An 8 MHz ARM2 was pretty good in 1987, standing up very well against a much more expensive 16 MHz 68020 or 80386.
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 1433
  • Country: dk
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 9902
  • Country: us
  • DavidH