EEVblog Electronics Community Forum

Electronics => Microcontrollers => Topic started by: brucehoult on December 08, 2018, 02:37:44 am

Title: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 08, 2018, 02:37:44 am
Western Digital (WD) has just posted a 12-part YouTube series in which CTO Martin Fink (!!) presents assembly language programming for RISC-V, using a SiFive HiFive1 with VS Code.

Sadly, they don't seem to be linked to each other.

https://www.youtube.com/watch?v=eR50d6CDOys (https://www.youtube.com/watch?v=eR50d6CDOys)
https://www.youtube.com/watch?v=daGHhrkF41U (https://www.youtube.com/watch?v=daGHhrkF41U)
https://www.youtube.com/watch?v=k3tpNwXEWhU (https://www.youtube.com/watch?v=k3tpNwXEWhU)
https://www.youtube.com/watch?v=MnWI9qplfvA (https://www.youtube.com/watch?v=MnWI9qplfvA)
https://www.youtube.com/watch?v=nqXRzUFnM9w (https://www.youtube.com/watch?v=nqXRzUFnM9w)
https://www.youtube.com/watch?v=tthKXGxAUjY (https://www.youtube.com/watch?v=tthKXGxAUjY)
https://www.youtube.com/watch?v=90udyEHBiwg (https://www.youtube.com/watch?v=90udyEHBiwg)
https://www.youtube.com/watch?v=Xmes__VpfiA (https://www.youtube.com/watch?v=Xmes__VpfiA)
https://www.youtube.com/watch?v=PMLqqRHpbsQ (https://www.youtube.com/watch?v=PMLqqRHpbsQ)
https://www.youtube.com/watch?v=6K1FZK1Kc5w (https://www.youtube.com/watch?v=6K1FZK1Kc5w)
https://www.youtube.com/watch?v=edzX3c2r0YQ (https://www.youtube.com/watch?v=edzX3c2r0YQ)
https://www.youtube.com/watch?v=C16UE8oTZY0 (https://www.youtube.com/watch?v=C16UE8oTZY0)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 08, 2018, 02:42:25 am
Not linked, but there is a playlist (https://www.youtube.com/playlist?list=PL6noQ0vZDAdh_aGvqKvxd0brXImHXMuLY)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 08, 2018, 02:58:37 pm
Anyone spotted the deliberate (?) error in his assembly language code?

Answer: 729de934392445a122503b40747a83e50b3c4a20
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 08, 2018, 03:23:46 pm
A second, more minor bug: when MTIME wraps around after about 18 hours, while(MTIME < targetTime){} can misbehave, giving a zero delay. It should be while((int32_t)(targetTime - MTIME) > 0){}.
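A minimal C sketch of the wrap-safe version (the MTIME address here is the FE310 CLINT mtime location, shown purely for illustration):

Code: [Select]
#include <stdint.h>

/* Memory-mapped timer register -- address shown for illustration only. */
#define MTIME (*(volatile uint32_t *)0x0200BFF8)

void delay_ticks(uint32_t ticks)
{
    uint32_t target = MTIME + ticks;           /* may wrap past 2^32 */
    while ((int32_t)(target - MTIME) > 0) { }  /* signed difference survives the wrap */
}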
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 09, 2018, 01:19:50 am
Interesting presentation!  I installed the tools and, I must say, I like Visual Studio Code.  It looks a lot like Eclipse with a few more buttons but it works really well.

To add the Debug functionality, you need an account and it seems that something is only temporary because they talk about something happening in about a month.  It costs about $10/month for the Professional version which includes the Unified Debugger and some other features.  Without the debugger option, the IDE is free to use.  Actually, you give up a lot of functionality in the free version.

Well, I couldn't get the debugger to work.  I probably need the board to do that.  The demo doesn't say anything about the board while discussing the debugger.

The HiFive1 board is fairly expensive at around $60 and has no ADCs.  Maybe just use an SPI ADC on a Shield.

Here's where I go off the rails:  Following the demo with 10 lines of C and 3 tiny assembly language files (total 38 instructions), the compiled output is 4,624 bytes of RAM and 53,710 bytes of flash.  To flash an LED!

There seems to be a lot of library code included in the build but since I can't find the linker map file (or how to turn it on), I have no idea how to fix this.  The size doesn't seem to change much when I comment out #include <stdio.h> nor do I know why it was included in the project.

The FE310-G000 chip is FAST - upwards of 320 MHz.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 09, 2018, 02:19:31 am
Interesting presentation!  I installed the tools and, I must say, I like Visual Studio Code.  It looks a lot like Eclipse with a few more buttons but it works really well.

Yeah, I'm allergic to Javascript, but I installed VS Code a while ago and it seems ok. Pretty quick (which Eclipse isn't). I didn't actually realise you could add that kind of embedded programming functionality to it.

Quote
To add the Debug functionality, you need an account and it seems that something is only temporary because they talk about something happening in about a month.  It costs about $10/month for the Professional version which includes the Unified Debugger and some other features.  Without the debugger option, the IDE is free to use.  Actually, you give up a lot of functionality in the free version.

Ugh. I'd pay $10, but not $10/month. I guess I'll stick to using gdb, with or without emacs.

Quote
Well, I couldn't get the debugger to work.  I probably need the board to do that.  The demo doesn't say anything about the board while discussing the debugger.

Sure, he's using the board. You should be able to run the compiled code on qemu, but you're not going to flash a LED that way :-)

Quote
The HiFive1 board is fairly expensive at around $60 and has no ADCs.  Maybe just use an SPI ADC on a Shield.

SiFive didn't have access to analogue IP two years ago when the HiFive1 shipped. They do now.

People have been using:

MCP3008 (SPI) https://www.adafruit.com/product/856 (https://www.adafruit.com/product/856)
ADS1015 (I2C) https://www.adafruit.com/product/1083 (https://www.adafruit.com/product/1083)

Quote
Here's where I go off the rails:  Following the demo with 10 lines of C and 3 tiny assembly language files (total 38 instructions), the compiled output is 4,624 bytes of RAM and 53,710 bytes of flash.  To flash an LED!

There seems to be a lot of library code included in the build but since I can't find the linker map file (or how to turn it on), I have no idea how to fix this.  The size doesn't seem to change much when I comment out #include <stdio.h> nor do I know why it was included in the project.

There's a tension in supplying IDEs in that beginners just want code they grab from who knows where to work and like to have fully featured libraries, while pros want small code. We definitely default to the former as much as possible. Pros have the knowledge needed to cut the size down if required. And this board has 16 *mega*bytes of flash for the program, so it's not a big issue for beginners.

I haven't tried following these videos yet and don't know exactly what toolchain they're using, but the size seems normal for Newlib. Newlib is really designed for PC use rather than embedded. Even Newlib Nano doesn't help much. We're aware of and working on this...

It doesn't help that ARM's C library gets a much smaller size by using a software FP library that doesn't meet IEEE 754 requirements, while ours does.

Using SiFive's freedom-e-sdk I get the following text sizes:

51498 HelloWorld with printf
 2042 program with main() just "return 0"
 6924 Dhrystone using a custom minimal printf https://github.com/sifive/freedom-e-sdk/blob/master/software/dhrystone/dhry_printf.c (https://github.com/sifive/freedom-e-sdk/blob/master/software/dhrystone/dhry_printf.c)

Basically, the Newlib printf(), among other things, is pretty bloaty.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 09, 2018, 05:51:52 am
To add the Debug functionality, you need an account and it seems that something is only temporary because they talk about something happening in about a month.  It costs about $10/month for the Professional version which includes the Unified Debugger and some other features.  Without the debugger option, the IDE is free to use.  Actually, you give up a lot of functionality in the free version.

Now I've followed through the presentation actually doing it :-) Yes, using their debugger needs the "pro" version of PlatformIO at $10/month. You do get a free 30 day trial.

I guess the good news is that if you pay that, you can work with 550 or so other boards as well: AVRs, PICs, any number of ARM boards...

It *looks* as though you should be able to use gdb directly. It's given as an option in the debugger menu.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: lucazader on December 09, 2018, 08:58:08 am
Thanks for the post Bruce.
Definitely keen to see where all this RISC-V stuff is headed.
Especially that new core/chip designer stuff you are rolling out at SiFive.

As far as VScode goes, PIO is great if you are a beginner not wanting to setup tool-chains etc.
However, if you're slightly more advanced, using makefiles and GDB within vscode for no cost is super easy! No need to install the great hulk that is PIO.
Just the specific gcc for your MCU (of which you can get pre-built binaries from the SiFive website), and then GDB with the correct script, which you can probably find on the internet for pretty much any board.

Would definitely be interested in looking into some dev boards and MCUs from the E20 series as they come out, especially with integrated ADCs etc...
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: FlyingDutch on December 09, 2018, 09:06:59 am
Hello,

Very interesting subject, but this HiFive1 board is very expensive at $59. I would like to evaluate the RISC-V architecture, but the HiFive1 board is too expensive for me.
I am wondering if it is possible to evaluate the RISC-V architecture using an FPGA board and a RISC-V IP core? I have a few FPGA boards (the biggest one an Artix7 with 17000 logic cells). I found a link for RISC-V cores:

https://github.com/riscv/riscv-wiki/wiki/RISC-V-Cores-and-SoCs (https://github.com/riscv/riscv-wiki/wiki/RISC-V-Cores-and-SoCs)

Has anybody tried a RISC-V IP core running on an FPGA board? Any hints before starting?

Kind Regards
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 09, 2018, 11:46:01 am
As far as VScode goes, PIO is great if you are a beginner not wanting to setup tool-chains etc.
However, if you're slightly more advanced, using makefiles and GDB within vscode for no cost is super easy! No need to install the great hulk that is PIO.
Just the specific gcc for your MCU (of which you can get pre-built binaries from the SiFive website), and then GDB with the correct script, which you can probably find on the internet for pretty much any board.

I completely agree. I almost never use an IDE myself, just emacs, gcc, openocd (or avrdude or whatever, as appropriate), gdb. However obviously this tutorial is aimed at beginners, and Western Digital happened to choose VS Code with PIO for it .. so I figured I'd better find out something about it.

Our command-line freedom-e-sdk (which PIO downloads and uses internally) is fairly easy to use itself. For one of the provided projects just type...

Code: [Select]
make software PROGRAM=led_fade
make upload PROGRAM=led_fade

That's it! Now it's running on the board. If you want to do your own program just duplicate one of the example programs, change the name, and you're away.

I personally find it much more understandable to work with standard tools such as shell commands, make, my favourite editor etc than to learn the latest IDE every couple of years.

We do however provide a completely free Eclipse-based IDE "Freedom Studio" that supports GUI debugging, much as PIO does https://sifive.cdn.prismic.io/sifive%2F08af66c3-f408-4ffd-8e92-0428e5b8011a_freedomstudio_manual.v1p3.pdf

Quote
Would definitely be interested in looking into some dev boards and MCU's from the E20 series as they come out, especially with integrated ADC's etc...

SiFive's business model is to enable -- to ENCOURAGE -- *other people* to decide that there will be demand for a chip with a particular core (or cores), memory, peripherals and make it themselves (or SiFive can organise the volume manufacturing, put your logo on the chips etc). With the IP partners added during 2018 this now includes analogue peripherals such as ADCs.

The FE310 and FU540 won't be the last chips SiFive makes, but don't expect to see dozens or hundreds of them, in every possible configuration.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 09, 2018, 12:01:21 pm
Very interesting subject, but this HiFive1 board is very expensive at $59. I would like to evaluate the RISC-V architecture, but the HiFive1 board is too expensive for me.

The 3rd-party designed "LoFive" uses the same CPU chip as the HiFive1 and programs are compatible, but it's been available for $25 to $30 at various times. https://store.groupgets.com/products/lofive-risc-v Note that it does require a JTAG programmer, whereas the HiFive1 is programmed using USB, so if you don't already have one you won't see much savings.

You can also buy bare FE310 chips yourself for $25 for 5 on the HiFive1 CrowdSupply page and build your own board. Complete design files for the LoFive are available on github https://github.com/mwelling/lofive

Quote
I am wondering if it is possible to evaluate the RISC-V architecture using an FPGA board and a RISC-V IP core? I have a few FPGA boards (the biggest one an Artix7 with 17000 logic cells). I found a link for RISC-V cores:

https://github.com/riscv/riscv-wiki/wiki/RISC-V-Cores-and-SoCs (https://github.com/riscv/riscv-wiki/wiki/RISC-V-Cores-and-SoCs)

Has anybody tried a RISC-V IP core running on an FPGA board? Any hints before starting?

Sure! There are literally dozens of cores and cores with peripherals designs for putting into FPGAs from a wide variety of manufacturers.

Probably one of the easiest to get going (and using 100% open-source software) is picorv32/picosoc running on a TinyFPGA BX https://discourse.tinyfpga.com/t/riscv-example-project-on-tinyfpga-bx/451

VexRiscv is also very much worth checking out. It works on a wide range of FPGAs from different vendors, including Artix 7: https://github.com/SpinalHDL/VexRiscv
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 09, 2018, 03:48:21 pm

Here's where I go off the rails:  Following the demo with 10 lines of C and 3 tiny assembly language files (total 38 instructions), the compiled output is 4,624 bytes of RAM and 53,710 bytes of flash.  To flash an LED!

There seems to be a lot of library code included in the build but since I can't find the linker map file (or how to turn it on), I have no idea how to fix this.  The size doesn't seem to change much when I comment out #include <stdio.h> nor do I know why it was included in the project.

There's a tension in supplying IDEs in that beginners just want code they grab from who knows where to work and like to have fully featured libraries, while pros want small code. We definitely default to the former as much as possible. Pros have the knowledge needed to cut the size down if required. And this board has 16 *mega*bytes of flash for the program, so it's not a big issue for beginners.

Basically, the Newlib printf(), among other things, is pretty bloaty.

Yes, newlib is a pig!  But in the demo project, no library function is ever called.  Yes, I imagine some C library functions might be called but it seems unlikely given the code that was written.  I can't see where newlib comes into this project.

I need to look around in the IDE and find the compile/link options.  They have to be there somewhere.  I want to see the assembly output and the link map.

The really good news:  The referenced book, the 'RISC-V Reader', is less than $20, shipped from Amazon.  The other 2 books are pricey but the author of the videos refers to the 'Reader' as THE book to buy.  So I did...

The interesting part of RISC ISAs is how the hardware handles hazards.  The deeper the pipeline, the more complex it gets.  Throwing away partially executed instructions because a branch was taken but couldn't be predicted because the condition code hadn't been updated from an arithmetic operation that hadn't completed - it gets complicated!

Next question:  How many of the more esoteric op codes does the compiler use?  The problem with building an FPGA version is to ensure you have implemented enough of the instruction set to allow the compiler to generate workable code.  Or, be prepared to rewrite the code generation of the C compiler!

At the moment, RISC-V is in its infancy.  It's been around a while but there is strong competition from ARM, and ARM covers quite a broad spectrum of computing.  In the meantime, it's worth knowing how RISC-V works.  If you like knowing that kind of stuff...
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 09, 2018, 04:13:35 pm
but this HiFive1 board is very expensive at $59

Expensive? 60 USD is nothing for a board.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 09, 2018, 05:21:57 pm
but this HiFive1 board is very expensive at $59

Expensive? 60 USD is nothing for a board.

It is essentially a stripped-down Arduino, at least in form factor, that runs a lot faster.  The Arduino UNO is $19 at Amazon.  It also has no ADCs, which limits its applicability.  What it does have is a LOT of memory.

I'm vacillating: I want to know more about the ISA as a matter of general interest and, truly, $60 isn't a number that concerns me.  But I have to be aware that a) I don't need the board and b) even if I did, it lacks certain features.

So, the book gets here Tuesday and after I skim through it, I'll probably buy the board just to play with it.  Compared to the FPGA boards I have bought, $60 is nothing!

Pipelined processors are interesting and RISC-V is definitely approachable at the hardware level (as opposed to just buying an ARM <whatever>).  If I so desire, I can pick up one of the FPGA cores and get right down in the dirt of hardware design.  I like hardware design!

Still vacillating...
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 09, 2018, 06:40:48 pm
It is essentially a stripped-down Arduino, at least in form factor, that runs a lot faster.

An Infineon 4500 (ARM core with CORDIC) costs something like 45 Euro, so 60 Euro is definitely not too much.


Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 09, 2018, 06:53:34 pm
It also has no ADCs, which limits its applicability.  What it does have is a LOT of memory.

my MIPS ATLAS board costs 600 Euro. I got a second-hand one for 120 Euro. It comes with no ADC, DAC, SPI, etc ... just the CPU, 32 MB of RAM, a PCI bus, two serial lines, a couple of timers, LA headers and EJTAG.

It's not a problem. It's simply not the kind of target I will ever use to interface to a CNC or a 3D printer. That's not its job, and there are cheaper and better boards for this (Sanguino, for example).

I am enjoying my ATLAS board for XINU and other software stuff that runs more comfortably there. Besides, the EJTAG makes the experience more comfortable from the debugging point of view.

Still vacillating...

No doubt! And no need to compare a true 32-bit RISC processor to ... Arduino1 (at least you should consider Arduino2, or Arduino/Zero ... or Infineon 4500)!

Anyway, if you like RISC-V, go and buy the board  :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 09, 2018, 10:16:53 pm
Anyway, if you like RISC-V, go and buy the board  :D

I decided to buy myself a Christmas present - $75 including tax and shipping.  It'll probably be here next week.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 09, 2018, 10:58:42 pm
Yes, newlib is a pig!  But in the demo project, no library function is ever called.  Yes, I imagine some C library functions might be called but it seems unlikely given the code that was written.  I can't see where newlib comes into this project.

.platformio/packages/framework-freedom-e-sdk/env/freedom-e300-hifive1/init.c:225:25
Code: [Select]
printf("core freq at %d Hz\n", get_cpu_freq());

On newer versions I've replaced that with a custom itoa() and three puts() which cuts the bloat hugely.
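Roughly like this (a sketch, not the actual SDK code; uart_puts is a hypothetical stand-in for whatever raw string output the SDK provides, assumed not to append a newline):

Code: [Select]
#include <stdint.h>

/* Hypothetical low-level output routine standing in for the SDK's own. */
extern void uart_puts(const char *s);

/* Minimal unsigned-to-decimal conversion -- enough to avoid newlib's printf. */
static char *utoa10(uint32_t v, char buf[static 11])
{
    char *p = buf + 10;
    *p = '\0';
    do { *--p = '0' + (v % 10); v /= 10; } while (v);
    return p;
}

void print_cpu_freq(uint32_t hz)
{
    char buf[11];
    uart_puts("core freq at ");
    uart_puts(utoa10(hz, buf));
    uart_puts(" Hz\n");
}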

Quote
I need to look around in the IDE and find the compile/link options.  They have to be there somewhere.  I want to see the assembly output and the link map.

PlatformIO doesn't supply all the tools; for example, there is no objdump :-(

If you want to poke more deeply you'll be better off using the command-line freedom-e-sdk direct from SiFive. I recommend building it -- it takes about 25 minutes on a quad core (but there are precompiled binaries too).

https://github.com/sifive/freedom-e-sdk

Quote
The interesting part of RISC ISAs is how the hardware handles hazards.  The deeper the pipeline, the more complex it gets.  Throwing away partially executed instructions because a branch was taken but couldn't be predicted because the condition code hadn't been updated from an arithmetic operation that hadn't completed - it gets complicated!

No problem in an in-order CPU, which includes everything anyone has shipped so far. Instructions after the branch have been fetched and decoded and their operands fetched, but they don't *execute* until after the branch does .. at which point it is known whether it will branch or not. If the prediction was wrong then execution of already-fetched instructions is squashed and fetch/decode starts again at the correct place.

Quote
Next question:  How many of the more esoteric op codes does the compiler use?  The problem with building an FPGA version is to ensure you have implemented enough of the instruction set to allow the compiler to generate workable code.  Or, be prepared to rewrite the code generation of the C compiler!

All of them!

If you want to build your own CPU in an FPGA then you only need to implement RV32I, which has only and exactly what is needed to compile C code ... but omitting multiply and divide. gcc will happily emit code for this, using library functions for multiply and divide.
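For example (a sketch; __mulsi3 and __divsi3 are the libgcc helper routines):

Code: [Select]
#include <stdint.h>

/* Compiled with -march=rv32i (no M extension), gcc emits calls to the
   libgcc helpers __mulsi3 and __divsi3 here instead of mul/div opcodes. */
uint32_t scale(uint32_t x)
{
    return (x * 100u) / 7u;
}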

The complete RV32I instruction set:

Code: [Select]
Registers x0 ... x31. x0 is always 0, can be used to discard unwanted results (e.g. jal/jalr)
PC is separate. There is no dedicated SP or LR at the hardware level.
All immediate values are signed.

OP rd, rs1, rs2  ; OP = add/sub, slt/sltu (rd=1 if less than, else 0), and/or/xor, sll/srl/sra (shifts)
OPi rd, rs1, 0xNNN ; OP = add, slt/sltu, and/or/xor, sll/srl/sra

lui rd, 0xNNNNN ; load immediate<<12 into rd
auipc rd, 0xNNNNN ; add immediate<<12 to the PC and store in rd

jal rd, 0xNNNNN ; add immediate<<1 to the PC. Store the return address (old PC+4) in rd
jalr rd, 0xNNN(rs1) ; add immediate to rs1, clear the low bit, store in the PC. Store the return address (old PC+4) in rd

bOP rs1, rs2, 0xNNN ; if rs1 OP rs2 is true add immediate<<1 to PC; OP = eq/ne/lt/ltu/ge/geu

sSZ rs2, 0xNNN(rs1) ; add immediate to rs1, use as address to store from rs2, SZ = b/h/w
lSZ rd, 0xNNN(rs1) ; load to rd. SZ = b/bu/h/hu/w (u zero extends, others sign extend)

ecall/ebreak ; no arguments. Call OS or debugger.

There's not a lot of fat there.

Optional: implement only x0 .. x15 and then use -march=rv32e to gcc ("e" for embedded)

Optional: implement multiply/divide -march=rv32im (or em)

Optional: implement atomic operations for SMP -march=rv32a

Optional: implement 16 bit opcodes duplicating the most common 32 bit opcodes for code density comparable to Thumb2 instead of comparable to ARM -march=rv32ic

Optional: implement floating point -march=rv32if or rv32ifd

You can compile any normal C/C++ code and newlib using rv32i or rv32e. The Linux kernel and glibc require at least rv32ia (actually I think they require rv64ia at present, but that is being fixed).

Quote
At the moment, RISC-V is in its infancy.  It's been around a while but there is strong competition from ARM, and ARM covers quite a broad spectrum of computing.  In the meantime, it's worth knowing how RISC-V works.  If you like knowing that kind of stuff...

Yes, it's early days, but momentum is building.

There is no great *technical* advantage over ARM or MIPS, but also no disadvantage. Compare code size, compare Dhrystone or Coremark or SPEC ... it's a photo finish in most cases. MIPS code is the biggest (and microMIPS doesn't help as much as Thumb or rvc), rv32i is comparable to ARM, rv32ic to Thumb2. In 64 bit, rv64ic is much smaller than anything else (ARM didn't see fit to duplicate Thumb in 64 bit!).

The advantage is that absolutely anyone is free to implement their own CPU (FPGA, ASIC, emulator), call it "RISC-V" if it passes the conformance tests, and use it, sell it, give it away as you please. You are not going to get any nasty lawyers' letters.

The advantage over implementing your own instruction set is that someone else already wrote/ported a huge amount of software for you.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 09, 2018, 11:24:57 pm
Yes, newlib is a pig!  But in the demo project, no library function is ever called.  Yes, I imagine some C library functions might be called but it seems unlikely given the code that was written.  I can't see where newlib comes into this project.

.platformio/packages/framework-freedom-e-sdk/env/freedom-e300-hifive1/init.c:225:25
Code: [Select]
printf("core freq at %d Hz\n", get_cpu_freq());

On newer versions I've replaced that with a custom itoa() and three puts() which cuts the bloat hugely.

Yup!  I found the code.  Of course there is also a lot of framework code to initialize the CPU and there is a bunch of space for vectors.  This is typical of modern processors.  In a lot of ways, the vector table feels a lot like ARM7-TDMI.  So does 'start.S'.

I would probably modify the code to just eliminate displaying the CPU frequency.  Maybe I'll try that just to see what  happens.

Quote

PlatformIO doesn't supply all the tools; for example, there is no objdump :-(

If you want to poke more deeply you'll be better off using the command-line freedom-e-sdk direct from SiFive. I recommend building it -- it takes about 25 minutes on a quad core (but there are precompiled binaries too).

https://github.com/sifive/freedom-e-sdk


I'll try to build the tools next week.  Not a big deal - my main workstation dual boots and it has plenty of horsepower (i7-7700).

I have great expectations of the "RISC-V Reader" book.  I imagine many of my questions will be answered.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 09, 2018, 11:36:09 pm
Yes, it's early days, but momentum is building.

There is no great *technical* advantage over ARM or MIPS, but also no disadvantage. Compare code size, compare Dhrystone or Coremark or SPEC ... it's a photo finish in most cases. MIPS code is the biggest (and microMIPS doesn't help as much as Thumb or rvc), rv32i is comparable to ARM, rv32ic to Thumb2. In 64 bit, rv64ic is much smaller than anything else (ARM didn't see fit to duplicate Thumb in 64 bit!).

The advantage is that absolutely anyone is free to implement their own CPU (FPGA, ASIC, emulator), call it "RISC-V" if it passes the conformance tests, and use it, sell it, give it away as you please. You are not going to get any nasty lawyers' letters.

The advantage over implementing your own instruction set is that someone else already wrote/ported a huge amount of software for you.

I have always been of the opinion that having software to run on a CPU is more important than the CPU itself. 

In the same way, I am looking at microcontrollers as opposed to microcomputers.  I want a bunch of peripherals.  I don't much care about the CPU architecture; if I want to apply the chip to something, it needs peripherals.

I'll be looking at one of the FPGA implementations of the RISC-V to see if I can actually bring up a working CPU.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 10, 2018, 04:48:29 am
In case anyone is interested in instruction encodings...

(http://hoult.org/rv32i.png)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 10, 2018, 12:22:31 pm
Oh, it seems there is/will be support for Lauterbach (http://www2.lauterbach.com/pdf/debugger_riscv.pdf)'s debuggers  :o
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 10, 2018, 01:42:08 pm
Oh, it seems there is/will be support for Lauterbach (http://www2.lauterbach.com/pdf/debugger_riscv.pdf)'s debuggers  :o

Already for a year https://www.youtube.com/watch?v=Yw9OdJafACU (https://www.youtube.com/watch?v=Yw9OdJafACU)
And Segger https://www.youtube.com/watch?v=f0Wya-ujyRs (https://www.youtube.com/watch?v=f0Wya-ujyRs)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ehughes on December 10, 2018, 01:56:25 pm
What is the value proposition for this core?  It seems that the value might be for IC vendors to avoid licensing fees from ARM, but that is a drop in the bucket compared to everything else.  The instruction set doesn't seem all that interesting.  There are a handful of interesting instructions for the ARM CM4 (such as SMLAL) and there were interesting features such as bit-banding and simple interrupt handling.

I am not seeing anything that is particularly useful here?  Or am I missing something?

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 10, 2018, 02:44:26 pm
What is the value proposition for this core?

It's not a core, it's an instruction set, like x86 or ARM that can be implemented in many different cores. There are already dozens of different core designs from both commercial and non-commercial sources.

Quote
It seems that the value might be for IC vendors to avoid licensing fees from ARM, but that is a drop in the bucket compared to everything else.

It's about "Free as in speech, not free as in beer". Yes, you can design your own core or download a free one from github. But that's a lot of work and there are no guarantees or support. Companies such as SiFive, Andes, Syntacore, Esperanto etc are going to want licensing fees just as ARM are (perhaps less, perhaps not).

Quote
The instruction set doesn't seem all that interesting.

It's not intended to be interesting. It's intended to be extremely boring, simple, enabling of both very simple and small implementations and complex high performance implementations, as technically effective as the others, and patent and license-free.

The standard parts of RISC-V very deliberately tread only well worn and established paths that are either provably public-domain in the first place or else covered by expired patents.

Quote
There are a handful of interesting instructions for the ARM CM4 (such as SMLAL) and there were interesting features such as bit-banding and simple interrupt handling.

I am not seeing anything that is particularly useful here?  Or am I missing something?

If you want SMLAL in RISC-V you can add it yourself. Or ask your chip vendor to add it for you. You don't have to persuade the trademark/patent/copyright holder that it's a useful instruction for you and hopefully lots of other users, and wait for them to incorporate it in the next version of their standard and eventually produce chips.

Ok, ARM already did SMLAL in Cortex M4. Cool. But what if you want it in a small microarchitecture such as CM0? You're out of luck. If you want an equivalent to SMLAL in a CM0-sized core such as the SiFive E20 or the Pulpino ZERO RISCY -- no problem, just add it, or have them add it for you.

ARM and Intel have smart people, but will they think of every possible instruction that might make someone's program go 10x or 100x faster? No, they can't. And, worse, if they do add it they're going to add it for everyone -- for your competition as well as for you. Everyone gets any feature that might be useful to anyone .. or else no one gets it. So at the same time you can't get chips with features that you really want, AND you get bloated chips with a lot of features that you don't want.

Radical personalization is the present and future of many industries. Now it's available in CPUs too.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 10, 2018, 02:47:40 pm
I haven't yet looked it in detail, but the "FENCE" class looks interesting!
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 10, 2018, 02:53:01 pm
I haven't yet looked it in detail, but the "FENCE" class looks interesting!

EIEIO. And related :-)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 10, 2018, 03:16:28 pm
I just went over to ARM to get a sense of the size of their instruction set.  Somehow, I think they have moved beyond Reduced Instruction Set with the latest designs.  There certainly are some 'interesting' instructions but I wonder which opcodes GCC actually uses.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ehughes on December 10, 2018, 03:29:34 pm
Quote
Radical personalization is the present and future of many industries. Now it's available in CPUs too.

So if I understand correctly, RISC-V is more intended for high volume SoC type customers who want to make specialized cores? (i.e. the Western Digital use case).

I looked through the SiFive site and it seems the message is that you can get your own custom SoC made quicker.     
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 10, 2018, 03:30:54 pm
I haven't yet looked it in detail, but the "FENCE" class looks interesting!

EIEIO. And related :-)

This will add more "fun" to every superscalar implementation of the RISC-V. EIEIO + isync + sync on our PowerPC460 is able to cause great emotions, like people hammering their heads on the desk and wanting to throw the target board out of the window ... which is ... love ... in reverse order :D


another interesting point I see: like MIPS and PowerPC, even RISC-V uses LL/SC to emulate CAS, which is to say, LL/SC is used to write a tiny bit of code which loads a target memory address, compares it to a comparand, and then writes back a swap value to the target if the comparand and target values are equal.

It would be interesting to see how LL monitors an address (say, a semaphore), and how SC does its job.


A senior here said that x86/x64 is better because it implements DWCAS (a sort of CAS, but more complex) instead of LL/SC ... dunno, I have ZERO experience with Intel x86  :-//
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: FlyingDutch on December 10, 2018, 06:57:50 pm

Expensive? 60 USD is nothing for a board.

Hello,

comparing for example to these FPGA boards (I have both of them):

https://numato.com/product/mimas-v2-spartan-6-fpga-development-board-with-ddr-sdram (https://numato.com/product/mimas-v2-spartan-6-fpga-development-board-with-ddr-sdram)

https://store.digilentinc.com/cmod-a7-breadboardable-artix-7-fpga-module/ (https://store.digilentinc.com/cmod-a7-breadboardable-artix-7-fpga-module/)

Yes, it seems expensive to me ;)

Regards
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 10, 2018, 08:27:44 pm
Thinking about the smallest FPGA incarnation, does RISC-V make sense as a general-purpose drop-in core?  Maybe there is a project where the CPU is just handling details (maybe console IO or file IO) but the majority of the project is some kind of hardware thing (even including another CPU) that just needs a little high-level help - that is, the full hardware description is too ugly to contemplate and a programmable core would smooth things out.

OK, I'll fess up!  I would rather write C code than HDL when it comes to things like a microSD driver.

Assuming adequate resources, of course.

Board has shipped, toolchain has been built.  One thing about a fresh Linux install: there are a bunch of dependent tools that need to be built or just installed.  Among my favorites: GMP, MPFR, MPC.  Nothing is as simple as it seems it would be!

This thread has links to amazing resources.  Once I get to play with the HiFive1 board for a while, I am almost certain to be looking at an Artix incarnation of the core.  I have a couple of Arty 7 boards and that Nexys 4 board I have uses an Artix 100T chip.  Lots of resources!

It should be an interesting winter.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on December 10, 2018, 08:34:37 pm
There is no great *technical* advantage over ARM or MIPS, but also no disadvantage. Compare code size, compare Dhrystone or Coremark or SPEC ... it's a photo finish in most cases. MIPS code is the biggest (and microMIPS doesn't help as much as Thumb or rvc), rv32i is comparable to ARM, rv32ic to Thumb2. In 64 bit, rv64ic is much smaller than anything else (ARM didn't see fit to duplicate Thumb in 64 bit!).

Lack of flags increasing code size by 4 times and requiring 2 extra registers to detect various conditions sure seems like a disadvantage.  That extra code and register pressure also has the effect of making the caches effectively smaller.  Having to effectively execute an ALU operation twice or more cannot help power efficiency.

Technically only flags which represent changes in state like carry and overflow are required; for instance zero, negative, and parity can be computed at any time.  What I would like to see is a design where flags requiring state are stored in a register dedicated to each destination register which avoids the hazard of having a single flags register like in x86 or requiring a flags register operand which would require extra instruction bits.

Some ISAs do this to track whether a register has been used in the current execution context so that the entire register set does not need to be saved on a context switch.  The first use of a register is just another bit of state to save.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 10, 2018, 09:01:28 pm
In case anyone is interested in instruction encodings...

(http://hoult.org/rv32i.png)

I've been tinkering with RISC-V in my spare time, and I have to say that the 32-bit integer instruction set is quite nice for hardware implementation:

- The source and destination registers are always encoded in the same place.
- The most significant bit of any constant is always in the same place (makes for easy sign extension)
- The privileged instructions (ones that need to be trapped for OS / Hypervisor) are all nicely contained

The only thing I find awkward is the encoding of the offsets on the jump instructions - fine for H/W but painful to decode for a naive software emulation.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 10, 2018, 11:27:09 pm
I just went over to ARM to get a sense of the size of their instruction set.  Somehow, I think they have moved beyond Reduced Instruction Set with the latest designs.  There certainly are some 'interesting' instructions but I wonder which opcodes GCC actually uses.

Exactly what "RISC" means has always been and remains the subject of some debate :-)

For me, I think the most important characteristics are:

- strict separation of computation from data transfer (load/store)

- enough registers that you don't touch memory much. Arguments for most functions fit in registers, and the return address too (the otherwise RISC AVR8 violates this).

- general purpose registers rather than special purpose.

- no instruction can cause more than one cache or TLB miss, or two adjacent lines/pages if unaligned access is supported (and this case might be trapped and emulated)

- the instruction length can be determined easily (combinatorial circuit with few gates) by examining only the initial part of the instruction.

- each instruction modifies at most one register.

- integer instructions read at most two registers. This is ultra-purist :-) A number of RISC ISAs break it in order to have e.g. a register plus (scaled) register addressing mode, or conditional select. But no more than three!

- no microcode or hardware sequencing. Each instruction executes in a small and fixed number of clock cycles (usually one). Load/Store multiple are the main offenders in both ARM and PowerPC. They help with code size, but it's interesting that ARM didn't put them in Aarch64 and is deprecating them in 32 bit as well, providing the much less offensive load/store pair.


Something that I think is *not* necessary in order to be "RISC" is to have a small number of instructions. Yes, PowerPC has a huge number of instructions, as does modern ARM and Aarch64. This does not disqualify from being RISC as long as each instruction follows the above rules.

What a huge number of instructions *does* do is make very small low end implementations impossible. And puts a big burden of work on every hardware and every emulator implementer.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 11, 2018, 12:04:35 am
Quote
Radical personalization is the present and future of many industries. Now it's available in CPUs too.

So if I understand correctly, RISC-V is more intended for high volume SoC type customers who want to make specialized cores? (i.e. the Western Digital use case).

I don't think you can distil one single thing that a standard backed by 100+ companies is "intended for".

It's a free and open standard that software can be written to, and that anyone is free to implement in any way they choose: hardware, software (interpret/JIT), or FPGA.

It is intended that nothing in RISC-V disqualify it from being applicable to everything from the smallest (32 bit) microcontroller to the largest supercomputer, and everything in between. See for example the European Processor Initiative, which is developing processors for supercomputers. Based on the RISC-V ISA.

Quote
I looked through the SiFive site and it seems the message is that you can get your own custom SoC made quicker.   

SiFive is just one company of many doing things with RISC-V. It happens to be one of the first out of the starting gate (founded in September 2015) and therefore currently one of the most visible.

SiFive's business model is indeed not to be a chip vendor, but to enable others to make chips.

At the moment, most of the RISC-V activity has been people (often individuals) making soft cores for FPGAs and large companies who are already making SoCs putting a RISC-V processor in one corner.

What a lot of people on this forum want is to be able to go to digikey/mouser/element14 and choose a microcontroller with a CPU (and they don't really care what CPU) and the selection of peripherals they need for some task.

Those don't exist now, but they will start to in 2019, from a number of vendors.

The first off the block appears to be NXP, with an SoC (RV32M1) with two RISC-V cores (not SiFive ones), two ARM cores, and a bunch of peripherals including Bluetooth, USB, ADC, RTC, uSDHC, and crypto acceleration. They have a web site where you can get a board with this for free http://open-isa.org/order/ (http://open-isa.org/order/) and they gave away a few hundred boards at the RISC-V Summit last week.

Microsemi/Microchip have announced a version of their PolarFire FPGA with an embedded SiFive FU540 complex (five 64 bit cores, four with FPU&MMU).

There will be a *lot* more to follow during 2019.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 11, 2018, 12:34:45 am
I haven't yet looked it in detail, but the "FENCE" class looks interesting!

EIEIO. And related :-)

This will add more "fun" to every superscalar implementation of the RISC-V. EIEIO + isync + sync on our PowerPC460 is able to cause great emotions, like people hammering their heads on the desk and wanting to throw the target board out of the window ... which is ... love ... in reverse order :D

Alas, if you want to do out of order CPUs then you absolutely need a well thought-out system for ensuring memory consistency. Hopefully RISC-V has got that right -- there's been a committee with very very experienced people in this field (industry and academics) who've spent over a year on this. The PowerPC/Alpha/ARM experience has hopefully been learned from -- certainly it's not lack of will or effort.

The RISC-V spec also allows TSO (like SPARC, x86) as an optional feature. That will inevitably be a little lower performance, especially when scaled to large numbers of cores (dozens, hundreds, thousands), but it does make life easier for programmers. Standard RISC-V code written for a weak memory model will run correctly on systems with TSO, but not vice-versa.
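To make the programmer-visible difference concrete, a sketch in C11: under the weak model the release ordering below typically costs a fence instruction on RISC-V (gcc emits one before the flag store), while on a TSO machine ordinary stores already have that ordering.

Code: [Select]
#include <stdatomic.h>

int data;          /* payload published to another hart */
atomic_int ready;  /* flag guarding it */

void publish(int v)
{
    data = v;
    /* Release ordering keeps the store to 'data' before the store to
       'ready'; on RISC-V's weak model this compiles to a fence + store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

int try_consume(void)
{
    /* Acquire pairs with the release above. */
    if (atomic_load_explicit(&ready, memory_order_acquire))
        return data;
    return -1;
}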

Quote
another interesting point I see: like MIPS and PowerPC, even RISC-V uses LL/SC to emulate CAS, which is to say, LL/SC is used to write a tiny bit of code which loads a target memory address, compares it to a comparand, and then writes back a swap value to the target if the comparand and target values are equal.

Right. You can also implement many other interesting things using LL/SC.

RISC-V also has a number of Atomic Memory Operations (AMOs) which take one integer argument (not two like CAS) and an address, atomically do ... something ... with the integer and the memory contents, and return an integer. The allowed operations are swap (unconditional), add, and/or/xor, min/max (signed and unsigned).

This can be done by bringing the data into the CPU and sending the new value back, but the TileLink protocol (from Berkeley, which SiFive and others use) supports pushing these out to be executed in a cache controller, or on another CPU or peripheral that owns the address.

I see AMBA 5 got similar capability this year, although it also includes a remote CAS, which TileLink doesn't.
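As a concrete illustration (a sketch using the standard gcc __atomic builtin; compiled with the A extension, e.g. -march=rv32ia, this becomes a single amoadd.w):

Code: [Select]
#include <stdint.h>

/* Fetch-and-add ticket counter: gcc maps this builtin straight onto
   the amoadd.w AMO when the A extension is available. */
uint32_t take_ticket(uint32_t *counter)
{
    return __atomic_fetch_add(counter, 1u, __ATOMIC_ACQ_REL);
}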

Quote
It would be interesting to see how LL monitors an address (say, a semaphore), and how SC does its job.

That is entirely up to whoever implements an individual core/memory system. The most common way would be for the CPU to take exclusive ownership of a cache line, and then the SC checks that it still has exclusive ownership. There is a small limit on the number (and type) of instructions you are allowed to execute between the LL and SC if you want to guarantee forward progress. One way this might work is that the CPU might ... delay ... its response to other CPUs requests to read or take ownership of that cache line for a few clock cycles.
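From C you get exactly the CAS-from-LL/SC pattern with the standard gcc builtin, which on RISC-V expands into an lr.w/sc.w retry loop of the kind described above (a sketch):

Code: [Select]
#include <stdbool.h>
#include <stdint.h>

/* Compare-and-swap built from LL/SC: gcc expands this into an
   lr.w / sc.w retry loop (requires the A extension). */
bool cas32(uint32_t *addr, uint32_t expected, uint32_t desired)
{
    return __atomic_compare_exchange_n(addr, &expected, desired,
                                       false /* strong CAS */,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}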

Quote
A senior here said that x86/x64 is better because it implements DWCAS (a sort of CAS, but more complex) instead of LL/SC ... dunno, I have ZERO experience with Intel x86  :-//

DCAS is useful for some things, such as manipulating queues without the expense of using a full semaphore. But it's not cheap to implement and has its limitations. e.g. see http://www.cs.tau.ac.il/~shanir/nir-pubs-web/Papers/DCAS.pdf (http://www.cs.tau.ac.il/~shanir/nir-pubs-web/Papers/DCAS.pdf)

The RISC-V community is interested in adopting some more powerful mechanism than LL/SC, but I think it's more likely to be a form of LL/SC that accepts a small number (2 to 5) of addresses rather than something as specific as DCAS ... and rather than something as general as full STM (which Intel has had numerous bugs trying to implement).
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 11, 2018, 12:44:25 am
Thinking about the smallest FPGA incarnation, does RISC-V make sense as a general-purpose drop-in core?  Maybe there is a project where the CPU is just handling details (maybe console IO or file IO) but the majority of the project is some kind of hardware thing (even including another CPU) that just needs a little high-level help - that is, the full hardware description is too ugly to contemplate and a programmable core would smooth things out.

Sure. That's one of the major uses of RISC-V right now. Some of the stripped-down RV32I cores use around 300 LUT6s! In fact the winner of a recent contest, engine-V, uses only 306 LUT4s -- amazing.

https://riscv.org/2018/12/risc-v-softcpu-contest-winners-demonstrate-cutting-edge-risc-v-implementations-for-fpgas/
https://github.com/micro-FPGA/engine-V

PicoRV32 and VexRiscv are also worth checking out.

Quote
It should be an interesting winter.

Have fun!
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on December 11, 2018, 12:59:07 am
- strict separation of computation from data transfer (load/store)

On the other hand, allowing ALU instructions to have one memory operand acts as a type of instruction set compression, lowers register pressure, and seems to have little disadvantage when out-of-order execution allows long load-to-use latencies from cache.

Quote
- enough registers that you don't touch memory much. Arguments for most functions fit in registers, and the return address too (the otherwise RISC AVR8 violates this).

But if the register set is too large, it means more state to save.  There are other solutions for this though.

Quote
- no instruction can cause more than one cache or TLB miss, or two adjacent lines/pages if unaligned access is supported (and this case might be trapped and emulated)

The lack of hardware support for unaligned access always seems to end up being a performance problem once a processor gets deployed into the real world.

Weak memory ordering, which seems like it should be an advantage, also becomes a liability.

Quote
- each instruction modifies at most one register.

That is pretty standard but how then do you handle integer multiplies and divides?  Break them up into two instructions?

Quote
- integer instructions read at most two registers. This is ultra-purist :-) A number of RISC ISAs break it in order to have e.g. a register plus (scaled) register addressing mode, or conditional select. But no more than three!

Internally it seems like this sort of thing and modifying more than one register should be broken up into separate micro-operations so that the register file has a lower number of read and write ports.  The alternative is having to decode more instructions which clog up the front end once an out-of-order implementation is desired.

On the other hand, this means discarding the performance advantages of the FMA instruction.

Quote
- no microcode or hardware sequencing. Each instruction executes in a small and fixed number of clock cycles (usually one). Load/Store multiple are the main offenders in both ARM and PowerPC. They help with code size, but it's interesting that ARM didn't put them in Aarch64 and is deprecating them in 32 bit as well, providing the much less offensive load/store pair.

Maybe more interesting is why ARM even included them in the first place.  Load and store multiple took advantage of fast-page-mode DRAM access when ARM's instruction pipeline was closely linked with DRAM access.

Should stack instructions be broken up as well?

Quote
What a huge number of instructions *does* do is make very small low end implementations impossible. And puts a big burden of work on every hardware and every emulator implementer.

I do not know about that.  Multiple physical designs covering a wide performance range are possible with the same ISA.  Microcode is convenient to handle seldom used instructions.  Vector operations can be broken up into instructions which fit the micro-architecture's ALU width while allowing support for the same vector instruction set across a wide range of implementations.

Or you can use instruction set extensions every time you want to support a different vector length.  How many FPU ISAs has ARM gone through now?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 11, 2018, 01:07:18 am
There is no great *technical* advantage over ARM or MIPS, but also no disadvantage. Compare code size, compare Dhrystone or Coremark or SPEC ... it's a photo finish in most cases. MIPS code is the biggest (and microMIPS doesn't help as much as Thumb or rvc), rv32i is comparable to ARM, rv32ic to Thumb2. In 64 bit, rv64ic is much smaller than anything else (ARM didn't see fit to duplicate Thumb in 64 bit!).

Lack of flags increasing code size by 4 times and requiring 2 extra registers to detect various conditions sure seems like a disadvantage.  That extra code and register pressure also has the effect of making the caches effectively smaller.  Having to effectively execute an ALU operation twice or more cannot help power efficiency.

That happens only with the overflow flag, which is used ... well ... never ... in standard software. C/C++ code does not use or require it.

Carry flag requires *one* instruction to branch on it (same as in an ISA with condition codes), or *one* instruction to set a register to 0 or 1 (one more than an ISA with condition codes). It's also extremely rarely needed -- mostly in bignum libraries, which are going to be limited by memory load/store anyway.
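For the bignum case, a sketch of what that looks like in C -- each recovered carry is one unsigned compare, which gcc turns into a single sltu:

Code: [Select]
#include <stdint.h>

/* Multi-word addition without a carry flag: the carry out of each limb
   is recovered with unsigned compares (one sltu each on RISC-V). */
void bignum_add(uint32_t *r, const uint32_t *a, const uint32_t *b, int n)
{
    uint32_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint32_t s = a[i] + carry;
        uint32_t c1 = s < carry;     /* carry out of adding the carry-in */
        r[i] = s + b[i];
        carry = c1 | (r[i] < s);     /* carry out of adding b[i] (both can't be set) */
    }
}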

Quote
Technically only flags which represent changes in state like carry and overflow are required; for instance zero, negative, and parity can be computed at any time.  What I would like to see is a design where flags requiring state are stored in a register dedicated to each destination register which avoids the hazard of having a single flags register like in x86 or requiring a flags register operand which would require extra instruction bits.

That would be better for OoO implementations than a single flags register, yes. You'd need BVC, BVS, BCC, BCS instructions that took a full register number as well as the branch offset.

It would be an interesting experiment to do to implement this. And this is EXACTLY what RISC-V enables you to do for low cost in time and money. Modify your favourite FPGA implementation to have your new instructions, modify gcc or llvm to generate them, and run dhrystone/coremark/SPEC/your favourite benchmark suite without and without using the new instructions. Publish the results with execution time, energy use, area cost, and any effect on MHz. We all learn something!

Quote
Some ISAs do this to track whether a register has been used in the current execution context so that the entire register set does not need to be saved on a context switch.  The first use of a register is just another bit of state to save.

I haven't seen a bit for every register, but it's common for FPUs or vector units to have a single bit for the whole unit, as many programs don't use FP or vectors at all.

Again, worth trying, though context switches are very rare on normal systems.

Back in June, Intel disclosed a "Lazy FPU State Restore" bug in all Core-based processors. Microsoft and others fixed the bug by disabling the use of the FPU dirty bit and just saving and restoring everything on every context switch. The effect on performance was basically unmeasurable.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 11, 2018, 01:34:29 am
I've been tinkering with RISC-V in my spare time, and I have to say that the 32-bit integer instruction set is quite nice for hardware implementation:

- The source and destination registers are always encoded in the same place.
- The most significant bit of any constant is always in the same place (makes for easy sign extension)
- The privileged instructions (ones that need to be trapped for OS / Hypervisor) are all nicely contained

Yes, all of those were deliberate goals when Krste Asanovic designed the encoding -- or should I say, redesigned it after experience implementing an earlier version.

Something that almost everyone else does is put the data register for load and store instructions in the same place, even though for a load it's the destination but for a store it's a source!

Quote
The only thing I find awkward is the encoding of the offsets on the jump instructions - fine for H/W but painful to decode for a naive software emulation.

Yes, I find it painful for trying to mentally decode instructions too. It's the result of the interaction of three things:

- always having the sign bit in the same place (as you noted)

- wanting to scale branch offsets to increase reach as you don't need byte-addressing (except for J(AL)R, which is almost always either paired with a LUI/AUIPC making increased reach unnecessary OR has a zero offset)

- minimising the number of places in the opcode where each bit of a literal or offset can come from and *not* requiring mass shifters. I *think* it's the case that only bit 11 can come from three places in the instruction while all the other bits can come from at most two places in the instruction (and bits 13 to 19 only one). A constant 0 or sign extension is also possible as well of course.
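For anyone writing such an emulator, reassembling the J-type (JAL) immediate looks something like this (a sketch; the bit layout imm[20|10:1|11|19:12] sits in instruction bits 31..12):

Code: [Select]
#include <stdint.h>

/* Decode the scrambled J-type immediate from a 32-bit instruction word. */
int32_t decode_j_imm(uint32_t insn)
{
    uint32_t imm = (insn & 0x80000000u) ? 0xFFF00000u : 0u; /* imm[20], sign-extended */
    imm |= ((insn >> 21) & 0x3FFu) << 1;                    /* imm[10:1]  */
    imm |= ((insn >> 20) & 0x1u) << 11;                     /* imm[11]    */
    imm |= insn & 0xFF000u;                                 /* imm[19:12] */
    return (int32_t)imm;                                    /* always even */
}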
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 11, 2018, 02:44:40 am
Quote
- each instruction modifies at most one register.

That is pretty standard but how then do you handle integer multiplies and divides?  Break them up into two instructions?

Yes, exactly this.

From the ISA spec (https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf):

Quote
If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[ S ]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies.
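
In C you never write the pair by hand. A minimal sketch of the source pattern the spec is describing -- a compiler targeting RV32IM is expected to emit the MULH/MUL pair for this, which the hardware may then fuse:

Code: [Select]
#include <stdint.h>

/* Full 64-bit product of two signed 32-bit values on RV32:
   the widening multiply becomes "mulh rdh,a,b ; mul rdl,a,b". */
int64_t full_product(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}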
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 11, 2018, 02:52:36 am
Video #1 has been replaced today. I don't know what changed. I wonder if Western Digital will update the videos that show buggy code (at least with a text "oops.." overlay as they already did for a few things).
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 11, 2018, 04:34:15 am
Is there a preferred or recommended memory map for a RISC-V environment?

The ISA spec doesn't have much to say, apart from noting that the ISA is set up to be helpful for generating relocatable code. Is there a guide of common/best practice for where to put memory-mapped I/O, bootstrap ROMs, and so on?

For my software emulator I was thinking of trying something like the FE310-G000:

Quote
00000000:00000FFF Debug address space
00001000:01FFFFFF On-chip Non volatile memory
02000000:1FFFFFFF I/O
20000000:7FFFFFFF Off-chip Non volatile memory
80000000:FFFFFFFF On-chip volatile memory

And after a reset, execution starts at 0x0000_1000.
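
As a concrete sketch of what that map means for the emulator's address decoder (the region names below are my own, not from any spec):

Code: [Select]
#include <stdint.h>

typedef enum { R_DEBUG, R_ONCHIP_NV, R_IO, R_OFFCHIP_NV, R_RAM } region_t;

/* Hypothetical address decode for the FE310-style map above. */
static region_t decode_region(uint32_t addr)
{
    if (addr <= 0x00000FFFu) return R_DEBUG;
    if (addr <= 0x01FFFFFFu) return R_ONCHIP_NV;
    if (addr <= 0x1FFFFFFFu) return R_IO;
    if (addr <= 0x7FFFFFFFu) return R_OFFCHIP_NV;
    return R_RAM;   /* 0x80000000:0xFFFFFFFF */
}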

Does that sound sane?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 11, 2018, 08:23:25 am
Is there a preferred or recommended memory map for a RISC-V environment?

The ISA spec doesn't have much to say, apart from noting that the ISA is set up to be helpful for generating relocatable code. Is there a guide of common/best practice for where to put memory-mapped I/O, bootstrap ROMs, and so on?

For my software emulator I was thinking of trying something like the FE310-G000:

Quote
00000000:00000FFF Debug address space
00001000:01FFFFFF On-chip Non volatile memory
02000000:1FFFFFFF I/O
20000000:7FFFFFFF Off-chip Non volatile memory
80000000:FFFFFFFF On-chip volatile memory

And after a reset, execution starts at 0x0000_1000.

Does that sound sane?

Looks ok to me.

Neither the RISC-V user mode architecture nor the privileged mode architecture define a memory map. SiFive follows rocket-chip and I think other prior Berkeley practice with the memory map. I don't know whether other vendors do too.

You're supposed to read the configuration string to figure out where things are. The FE310 has a config string at 0x100C in the mask ROM (just after the reset vector) containing:

Code: [Select]
/cs-v1/;
/{
  model = \"SiFive,FE310G-0000-Z0\";
  compatible = \"sifive,fe300\";
  /include/ 0x20004;
};

And then 0x20004 is in the OTP.

For Linux, the boot loader creates a devicetree somehow (for example from the config string, or it could be a DTB on disk) and passes it to the kernel when it starts it.

The bottom 4 GB on the FU540 are very similar to the FE310 (including RAM at 0x8000_0000:0xFFFF_FFFF), and then above 4 GB you have:

Code: [Select]
0x01_0000_0000:0x0F_FFFF_FFFF Peripherals
0x10_0000_0000:0x1F_FFFF_FFFF System
0x20_0000_0000:0x3F_FFFF_FFFF RAM
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 11, 2018, 02:17:23 pm
You probably can re-code it on MIPS one-to-one, except for LUI (if not followed by XORI or ADDI), which would require an extra instruction - very simple hardware emulator :)

Why does every instruction have "11" at the end? This way it only uses 1/4 of the code point space.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 11, 2018, 09:01:39 pm
You probably can re-code it on MIPS one-to-one, except for LUI (if not followed by XORI or ADDI), which would require an extra instruction - very simple hardware emulator :)

Why does every instruction have "11" at the end? This way it only uses 1/4 of the code point space.

This is just the RV32I (32-bit integer) instructions - the minimal set you need to support. On top of this are the mult/div extensions, the floating point, compressed instructions and so on.

It is encoded this way to make life easier (i.e. smaller, faster, simpler) for instruction fetch and decode.

The ISA supports quite a few different instruction lengths - 16, 32, 48, 64 bits and longer - signalled by the low-order bits of the first 16-bit parcel.
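
Expressed as C, the length decode from the spec's encoding scheme looks something like this (a sketch of the scheme, not tested against any particular core):

Code: [Select]
#include <stdint.h>

/* Instruction length in bytes from the low bits of the first
   16-bit parcel, per the RISC-V variable-length encoding scheme. */
static int insn_length(uint16_t parcel)
{
    if ((parcel & 0x03) != 0x03) return 2;   /* aa != 11: 16-bit       */
    if ((parcel & 0x1F) != 0x1F) return 4;   /* bbb11, bbb != 111      */
    if ((parcel & 0x3F) == 0x1F) return 6;   /* 011111: 48-bit         */
    if ((parcel & 0x7F) == 0x3F) return 8;   /* 0111111: 64-bit        */
    return 0;                                /* longer/reserved        */
}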
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on December 11, 2018, 09:27:29 pm
It would be an interesting experiment to implement this. And this is EXACTLY what RISC-V enables you to do at low cost in time and money. Modify your favourite FPGA implementation to add your new instructions, modify gcc or llvm to generate them, and run dhrystone/coremark/SPEC/your favourite benchmark suite with and without the new instructions. Publish the results with execution time, energy use, area cost, and any effect on MHz. We all learn something!

It would be too big of a change to RISC-V.  It alters the basic ISA and architecture and then a new code generator would be required anyway.  It goes against the design principles of RISC-V.

Quote
Quote
Some ISAs do this to track whether a register has been used in the current execution context so that the entire register set does not need to be saved on a context switch.  The first use of a register is just another bit of state to save.

I haven't seen a bit for every register, but it's common for FPUs or vector units to have a single bit for the whole unit, as many programs don't use FP or vectors at all.

Back in June, Intel disclosed a "Lazy FPU State Restore" bug in all Core-based processors. Microsoft and others fixed the bug by disabling the use of the FPU dirty bit and just saving and restoring everything on every context switch. The effect on performance was basically unmeasurable.

That is what I was thinking of.  Intel of course managed to screw it up.  It had a limited effect on performance because of its limited applicability; task switching the vector state was already a performance problem.

Intel has a lot of performance problems with their vector units, so much so that they had to issue a directive not to use them for things like memory copies.

It would be more appropriate for a design intended to take advantage of it, like old ARM's load and store multiple.

Quote
Again, worth trying, though context switches are very rare on normal systems.

But subroutine calls are not.

Stack dumps with a feature like this would be marvelous in a bad way, I think, but I would want it anyway.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 11, 2018, 10:03:10 pm
It would be an interesting experiment to implement this. And this is EXACTLY what RISC-V enables you to do at low cost in time and money. Modify your favourite FPGA implementation to add your new instructions, modify gcc or llvm to generate them, and run dhrystone/coremark/SPEC/your favourite benchmark suite with and without the new instructions. Publish the results with execution time, energy use, area cost, and any effect on MHz. We all learn something!

It would be too big of a change to RISC-V.  It alters the basic ISA and architecture and then a new code generator would be required anyway.  It goes against the design principles of RISC-V.

I sort of think that Bruce's use is fully in line with the design principles of RISC-V...
"RISC-V (pronounced “risk-five”) is a new instruction set architecture (ISA) that was originally
designed to support computer architecture research and education..."

"An ISA separated into a small base integer ISA, usable by itself as a base for customized
accelerators or for educational purposes, and optional standard extensions, to support general purpose
software development."
To me, a RISC-V RV32I core, with a custom hyperconverged-blockchain-crypto-quantum-shoelace-tying extension, sounds perfectly in line with the design principles.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 11, 2018, 10:57:07 pm
Intel has a lot of performance problems with their vector units, so much so that they had to issue a directive not to use them for things like memory copies.

I definitely need to read this. I'm one of those who have been using them for memory copying for 20 years or so, and I always saw performance increases in my tests whenever I moved to the next bigger register size. Do you have any reference for the document?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on December 12, 2018, 02:40:52 am
Intel has a lot of performance problems with their vector units, so much so that they had to issue a directive not to use them for things like memory copies.

I definitely need to read this. I'm one of those who have been using them for memory copying for 20 years or so, and I always saw performance increases in my tests whenever I moved to the next bigger register size. Do you have any reference for the document?

The discussion was in the RWT (Real World Technologies) forums months ago.  The problem was library routines or periodically executed code blindly using AVX for string copies, thereby triggering lower core clock rates when certain AVX instructions are used.  The result was lower scalar performance.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 12, 2018, 05:33:22 am
At the outset here, I want to point out that my previous post was describing what characteristics are generally associated with the term "RISC", not whether they are individually or collectively good or bad.

I think the collective goodness or badness is sufficiently addressed by the simple fact that since 1980 there has been no (successful) new CISC instruction set, and the ones that have survived have done so by translating to a more RISCy form internally.

CISC doesn't even win for compactness of programs in the 32 bit and 64 bit eras. RISC architectures with a simple mix of 2-byte and 4-byte instructions do. VAX and x86 both allow instructions with lengths from 1 byte to 15 or 16 bytes in 1 byte increments - VAX even bigger for some extreme instructions - but let's stick to common things such as ADDL3. In the 64 bit era, x86's 15 byte limitation is by fiat -- the encoding syntax easily allows longer instructions.

- strict separation of computation from data transfer (load/store)

On the other hand, allowing ALU instructions to have one memory operand acts as a type of instruction set compression, lowers register pressure, and seems to have little disadvantage when out-of-order execution allows long load-to-use latencies from cache.

Lowers register pressure yes. "Little disadvantage" with OoO: yes. But a significant disadvantage with simpler in-order implementations, such as microcontrollers.

Program compression: it seems logical, but is not borne out by empirical data.

It's logical that not having to mention a temporary register name (twice!) in a load and subsequent arithmetic instruction could make for smaller programs. But no one seems to have been able to achieve this in practice.

I think the problem is that in all of PDP11, VAX, 68k, x86, each operand that can potentially be in memory is accompanied by an addressing mode field, which is *wasted* in the more common case (if you have enough registers) when the operand is actually in a register. PDP11 and 68k have, in their 16 bit opcodes, two 3 bit register fields plus two 3 bit addressing mode fields. VAX has a 4 bit register field plus a 4 bit addressing mode for each operand. x86 limits one operand to be a register but effectively wastes three bits out of 16 for reg-reg operations (1 bit in the opcode to select reg-mem or mem-reg, plus 2 bits in the modr/m byte).

If not for those often-wasted addressing mode fields, all of PDP11, 68k and x86 could perhaps have instead fit 3-address instructions into a 16 bit opcode (as Thumb does for add&sub), saving a lot of extra instructions to copy one operand to where the result is required first.

Quote
Quote
- enough registers that you don't touch memory much. Arguments for most functions fit in registers, and the return address too (the otherwise RISC AVR8 violates this).

But if the register set is too large, it means more state to save.  There are other solutions for this though.

Sure. It's possible to go overboard. Four registers doesn't seem to be enough. 128 is too many. Sixteen seems to be a good number if you have complex addressing modes (big immediates and offsets, register plus scaled register), and thirty two if you don't. Eight isn't really enough, even with complex addressing modes, but two sets of eight seems workable e.g. 8 address plus 8 data (68k) or 8 general purpose lo registers plus 8 more limited hi (Thumb). Or 8 directly addressable, plus 8 that need an extra prefix byte to address (x86_64).

Quote
Quote
- no instruction can cause more than one cache or TLB miss, or two adjacent lines/pages if unaligned access is supported (and this case might be trapped and emulated)

The lack of hardware support for unaligned access always seems to end up being a performance problem once a processor gets deployed into the real world.

Weak memory ordering which seems like it should be an advantage also becomes a liability.

Unaligned access, I agree.

Weak memory ordering ... I think languages, compilers, and programmers have got that under control now. It took a while. I don't think TSO scales well to 100+ cores .. or even 50. We'll really start to see this bite (or not) in the next five years.

Quote
Quote
- each instruction modifies at most one register.

That is pretty standard but how then do you handle integer multiplies and divides?  Break them up into two instructions?

Yes. The high part of multiplies and the remainder from divisions are very rarely needed. Better to define separate instructions for them. Higher-end processors can notice both are being calculated and combine them, if that's profitable.

Quote
Quote
- integer instructions read at most two registers. This is ultra-purist :-) A number of RISC ISAs break it in order to have e.g. a register plus (scaled) register addressing mode, or conditional select. But no more than three!

Internally it seems like this sort of thing and modifying more than one register should be broken up into separate micro-operations so that the register file has a lower number of read and write ports.  The alternative is having to decode more instructions which clog up the front end once an out-of-order implementation is desired.

Breaking complex instructions up into several microops for execution is a valid thing to do, especially on a lower end implementation. Intel obviously does this at large scale, but even ARM does it quite a lot, including in Aarch64.

Recognising adjacent simple operations and dynamically combining them into a more powerful single operation is also a valid thing to do. Modern x86 does this, for example, when there is a compare followed by a conditional branch. Future high end RISC-V CPUs are also expected to do this heavily.

The former puts a burden on to low end implementations. The latter keeps low end implementations as simple as possible, while putting a burden onto high end implementations, which can perhaps more easily afford it.

Quote
On the other hand, this means discarding the performance advantages of the FMA instruction.

I was explicitly talking about *integer* instructions. Most floating point code uses FMA very heavily (if -ffast-math is enabled) and IEEE 754-2008 mandates it. It's a big performance advantage to have three read ports on the FP register file, worth the expense. It would be wasted most of the time on the integer register file.

Quote
Quote
- no microcode or hardware sequencing. Each instruction executes in a small and fixed number of clock cycles (usually one). Load/Store multiple are the main offenders in both ARM and PowerPC. They help with code size, but it's interesting that ARM didn't put them in Aarch64 and is deprecating them in 32 bit as well, providing the much less offensive load/store pair.

Maybe more interesting is why ARM even included them in the first place.  Load and store multiple took advantage of fast-page-mode DRAM access back when ARM's instruction pipeline was closely linked to DRAM access.

Yes. Modern caches achieve the same effect with individual stores.

Load/store multiple do make programs significantly smaller, especially in function prologue and epilogue.

RISC-V gcc has an option "-msave-restore" (this might get included in -Os later) that calls one of several special library routines as the first instruction in a function, to create the stack frame and save ra plus s0-s2, or ra plus s0-s6, or ra plus s0-s10, or ra plus s0-s11. Function return is then done by tail-calling the corresponding restore routine.

This replaces anything up to 29 instructions (116 bytes with rv32i or rv64i, 58 bytes with the C extension) with two instructions (8 bytes with rv32i or rv64i, 6 bytes with the C extension though I have a plan to reduce that to 4). Of course the average case is a lot less than that -- most functions that create a stack frame at all use three or fewer callee-save registers, so you're usually replacing 5/7/9/11 by 2 instructions.

The time cost is three extra jump instructions, plus sometimes a couple of unneeded loads and stores because not every size is provided in order to keep the library size down.

In the current version, the total size of the save/restore library functions is 96 bytes with RVC enabled.

I made a couple of tests on a HiFive1 board with gcc 7.2.0.

Dhrystone w/rv32im, -msave-restore made the program 4 bytes smaller, and 5% slower (1.66 vs 1.58).

CoreMark w/rv32imac, -msave-restore made the program 252 bytes (0.35%) smaller, and 0.4% FASTER (2.676 vs 2.687, with +/- 0.001 variance on different runs).

I attribute the faster speed for CoreMark to greater effectiveness of the 16 KB instruction cache.

Both of these are very small programs of course (especially Dhrystone). Bigger programs will show more difference.

Quote
Should stack instructions be broken up as well?

Yes. This is done almost universally in RISC ISAs. A function starts by decrementing the stack pointer (once) and ends by incrementing it. Each register (or sometimes register pair) is saved and restored with an individual instruction -- and on a superscalar processor several of those might run in each clock cycle.

The x86's push/pop instructions are very hard on OoO machinery, and recent ones have a special push/pop execution (keeping track of the stack pointer) IN THE DECODER, translating each push or pop in a series to a store or load at an offset from the stack pointer as it was at the start of the sequence.

Quote
Quote
What a huge number of instructions *does* do is make very small low end implementations impossible. And puts a big burden of work on every hardware and every emulator implementer.

I do not know about that.  Multiple physical designs covering a wide performance range are possible with the same ISA.  Microcode is convenient to handle seldom used instructions.  Vector operations can be broken up into instructions which fit the micro-architecture's ALU width while allowing support for the same vector instruction set across a wide range of implementations.

Or you can use instruction set extensions every time you want to support a different vector length.  How many FPU ISAs has ARM gone through now?

Seldom-used operations are handled just as easily by library routines as by microcode -- especially ones which are inherently slow. This is commonly done in every CPU family for floating point operations. Sometimes the compiler generates a binary with instructions replaced by library calls. Sometimes the instruction is generated but on execution traps into the operating system which then calls the appropriate library function.

Microcode made sense when ROM within the processor was faster than RAM, but since about 1980 SRAM has actually been faster than ROM. You could copy the microcode into SRAM at boot -- and even allow the user to write some custom microcode, as the VAX did. Or you can use that SRAM as an icache and create instruction sets that don't need microcode. Then real functions are just as fast to call as microcode ones.

When Dave Patterson (who invented the terms "RISC" and later "RAID" and is currently involved with RISC-V and Google's TPU) was on sabbatical at DEC he discovered that even on the VAX, using a series of simple instructions was faster than using the complex microcoded instructions such as CALLS and RET (which automatically saved and restored registers for you, according to a bitmask at the start of the function).

John Cocke at IBM independently discovered the same fact about the IBM 370 at about the same time (in fact a couple of years earlier).

I agree with you about vectors. ARM has gone through several vector/multimedia instruction sets, and Intel is up to at least number 4 (depending on what you count the different iterations of SSE as).

The RISC-V vector instruction set which is being finalised at the moment (I'm on the Working Group) -- and which has been in development and testing in various iterations for at least ten years (it's the main reason the RISC-V project was started in the first place) -- is vector length agnostic. The same code will run on hardware with any vector length.

ARM's new SVE is of course similar.
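
The flavour of it in a scalar C sketch -- vector_length() here is a stand-in of my own for something like a vsetvl instruction; the point is just that the loop never hard-codes a vector width:

Code: [Select]
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the hardware granting a vector length each iteration;
   HW_VLEN is an arbitrary assumption for the sketch. */
static size_t vector_length(size_t remaining)
{
    const size_t HW_VLEN = 8;
    return remaining < HW_VLEN ? remaining : HW_VLEN;
}

void vadd(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; ) {
        size_t vl = vector_length(n - i);   /* hardware decides   */
        for (size_t j = 0; j < vl; ++j)     /* one "vector op"    */
            dst[i + j] = a[i + j] + b[i + j];
        i += vl;                            /* strip-mine forward */
    }
}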
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 12, 2018, 05:56:53 am
You probably can re-code it on MIPS one-to-one, except for LUI (if not followed by XORI or ADDI), which would require an extra instruction - very simple hardware emulator :)

Sure, it's very very similar to MIPS.

MIPS has 16 bits for both LUI and for XORI/ADDI (and all other immediates and offsets), while RISC-V has 20 bits for LUI/AUIPC/JAL and 12 bits for everything else. This is the major thing that buys a lot of spare instruction encoding space in RISC-V compared to MIPS.

Quote
Why does every instruction have "11" at the end? This way it only uses 1/4 of the code point space.

Aaaand .. that's how a little of the extra instruction encoding space is spent :-)

Instructions with 11 in the LSBs are 32 bit instructions (30 bits available to be used).
Instructions with 00/01/10 in the LSBs are 16 bit instructions (49152 (48k) encodings available)

Instructions with 11111 in the LSBs are reserved for instructions longer than 32 bits. So actually there are only 939,524,096 (2^30 minus 2^27) possible 32 bit opcodes, not 1,073,741,824.


You can compare this to Thumb2, where instructions with 111 in the MSBs of the first 16 bit packet (except 11100, the 16 bit unconditional branch) are 32 bit instructions and all others are 16 bit instructions.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: obiwanjacobi on December 12, 2018, 06:07:35 am
I don't get why you would talk about RISC-V asm and then spend 6 episodes taking beginners by the hand.
I would rather see videos that assume a certain level - close to the material at hand - and provide resources for getting up to speed.

Other than that it was a pretty nice series.

BTW, is debugging on PlatformIO free these days? Last I looked at it you needed a pro (paid) account...
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 12, 2018, 07:21:43 am
It would be an interesting experiment to implement this. And this is EXACTLY what RISC-V enables you to do at low cost in time and money. Modify your favourite FPGA implementation to add your new instructions, modify gcc or llvm to generate them, and run dhrystone/coremark/SPEC/your favourite benchmark suite with and without the new instructions. Publish the results with execution time, energy use, area cost, and any effect on MHz. We all learn something!

It would be too big of a change to RISC-V.  It alters the basic ISA and architecture and then a new code generator would be required anyway.  It goes against the design principles of RISC-V.

I disagree. Your exact suggestion is a little outside the scope of what SiFive is at present allowing for automated generation of CPU cores with custom instructions specified by customers: those will, at least at first, be restricted to "A = B op C" where op is specified by customer-supplied HDL. However, if you take the source code to an existing core, it would be trivial to enlarge each register and data bus by 1 bit, and add new branch instructions based on that bit.

However I'd suggest another plan. Just create a new instruction "SETV a,b,c" that sets a to 1 if b+c has a signed overflow and to 0 otherwise. And/or create an instruction "BVS b,c,label" that branches to label if b+c overflows. And BVC if you want. Or TRAPV b,c, or ADDV a,b,c (that traps if b+c overflows). It's up to you. All of those would fit into existing instruction formats and pipelines no problem at all. Well .. the trapping ones would take a little more work. But they're all easier than your original suggestion.
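
For reference, here's what such a SETV would compute, written out in portable C (a sketch of mine; signed overflow of b+c happens exactly when the operands agree in sign and the sum doesn't):

Code: [Select]
#include <stdint.h>

/* 1 if b+c overflows signed 32-bit arithmetic, else 0. */
static uint32_t setv(int32_t b, int32_t c)
{
    uint32_t sum = (uint32_t)b + (uint32_t)c;      /* wraps, no UB      */
    return (~((uint32_t)b ^ (uint32_t)c)           /* same sign in...   */
            & ((uint32_t)b ^ sum)) >> 31;          /* ...different out  */
}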

I note that MIPS "ADD" and "ADDI" instructions trap on overflow, but virtually no one ever used them -- the "unsigned add" instructions get used even for signed values -- and they are now deprecated.

Quote
Quote
Again, worth trying, though context switches are very rare on normal systems.

But subroutine calls are not.

But the vast majority of subroutine calls save and restore very few registers.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 12, 2018, 07:32:29 am
BTW, is debugging on PlatformIO free these days? Last I looked at it you needed a pro (paid) account...

Yep, $10/month for pro, with a 30 day free trial.

I guess that's my biggest problem with the series. It could have used SiFive's free Eclipse-based "Freedom Studio", which of course works with the SiFive board.

The VS Code plus PlatformIO setup works very nicely, but .... yeah.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on December 12, 2018, 03:11:59 pm
I disagree. Your exact suggestion is a little outside the scope of what SiFive is at present allowing for automated generation of CPU cores with custom instructions specified by customers: those will, at least at first, be restricted to "A = B op C" where op is specified by customer-supplied HDL. However, if you take the source code to an existing core, it would be trivial to enlarge each register and data bus by 1 bit, and add new branch instructions based on that bit.

It is more than that, because the flags would be stored in a separate, independently accessed register file, so no additional ports need to be added to the regular register file.  This does not matter so much in low performance implementations, but it is a performance limiting problem in superscalar designs.

Quote
However I'd suggest another plan. Just create a new instruction "SETV a,b,c" that sets a to 1 if b+c has a signed overflow and to 0 otherwise. And/or create an instruction "BVS b,c,label" that branches to label if b+c overflows. And BVC if you want. Or TRAPV b,c, or ADDV a,b,c (that traps if b+c overflows). It's up to you. All of those would fit into existing instruction formats and pipelines no problem at all. Well .. the trapping ones would take a little more work. But they're all easier than your original suggestion.

This gets back to the reason to store the flags in the first place.  It avoids having to execute the same ALU operation again which is a waste of power.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 12, 2018, 06:00:15 pm
Sure. It's possible to go overboard. Four registers doesn't seem to be enough. 128 is too many. Sixteen seems to be a good number if you have complex addressing modes (big immediates and offsets, register plus scaled register), and thirty two if you don't. Eight isn't really enough, even with complex addressing modes ...

So it seems. x86 in 32-bit mode has 8 registers, x64 has 16. Should be a huge improvement, right? But in practice, there's none. If you compile the same program for 32-bit and for 64-bit and run it on the same computer, there's no increase in speed whatsoever.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 12, 2018, 08:25:08 pm
Video #1 has been replaced today. I don't know what changed. I wonder if Western Digital will update the videos that show buggy code (at least with a text "oops.." overlay as they already did for a few things).

I watched the new version and no changes jumped out at me.  It is just an introduction with very little technical content other than describing the tools and 3 books.  Still, it's a pretty good introduction!

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 12, 2018, 08:26:22 pm
Sure. It's possible to go overboard. Four registers doesn't seem to be enough. 128 is too many. Sixteen seems to be a good number if you have complex addressing modes (big immediates and offsets, register plus scaled register), and thirty two if you don't. Eight isn't really enough, even with complex addressing modes ...

So it seems. x86 in 32-bit mode has 8 registers, x64 has 16. Should be a huge improvement, right? But in practice, there's none. If you compile the same program for 32-bit and for 64-bit and run it on the same computer, there's no increase in speed whatsoever.

Is this because the code generator doesn't even bother with the extra registers?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 12, 2018, 09:20:24 pm
Is this because the code generator doesn't even bother with the extra registers?

I think it does. It has to because the calling conventions are different.

It would be a good experiment to run benchmarks on RISC-V with different numbers of registers and see how performance depends on the number of registers. GCC or other compilers may allow this.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 01:43:15 am
Quote
However I'd suggest another plan. Just create a new instruction "SETV a,b,c" that sets a to 1 if b+c has a signed overflow and to 0 otherwise. And/or create an instruction "BVS b,c,label" that branches to label if b+c overflows. And BVC if you want. Or TRAPV b,c, or ADDV a,b,c (that traps if b+c overflows). It's up to you. All of those would fit into existing instruction formats and pipelines no problem at all. Well .. the trapping ones would take a little more work. But they're all easier than your original suggestion.

This gets back to the reason to store the flags in the first place.  It avoids having to execute the same ALU operation again which is a waste of power.

Where and when are you going to use this new oVerflow flag, and how often?

99.9999% of the time it gets used on an x86 or ARM it is because you don't have a direct "branch to label if A < B, signed" like on RISC-V, but only a "CMP" instruction which does a subtraction and sets the flags but throws away the result. And then the conditional branch needs to reconstruct what happened -- and only the conditional branch knows whether you wanted a signed or unsigned comparison.

Given that you have "branch if A < B, signed", writing an overflow flag that is almost never looked at will be a bigger waste of energy than computing an addition twice instead of once in the one instruction in a million that actually wants the overflow.

What is your use-case where this overflow flag is so critical?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on December 13, 2018, 01:51:02 am
Where and when are you going to use this new oVerflow flag, and how often?

99.9999% of the time it gets used on an x86 or ARM it is because you don't have a direct "branch to label if A < B, signed" like on RISC-V, but only a "CMP" instruction which does a subtraction and sets the flags but throws away the result. And then the conditional branch needs to reconstruct what happened -- and only the conditional branch knows whether you wanted a signed or unsigned comparison.

Given that you have "branch if A < B, signed", writing an overflow flag that is almost never looked at will be a bigger waste of energy than computing an addition twice instead of once in the one instruction in a million that actually wants the overflow.

What is your use-case where this overflow flag is so critical?

Multiword math.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 13, 2018, 02:06:49 am
Quote
The simple fact that since 1980 there has been no (successful) new CISC instruction set
What about the MSP430?

So ... what prompts the development of a new RISC instruction set, anyway?
You'd think that by the time things were "reduced" enough, there wouldn't be all that much room for innovation or improvement.   Do you learn from mistakes in other vendors' instruction sets? (I've got to say that the more I look at it, the more unpleasant I find the ARM v6m instruction set. (CM0: thumb-16 only))  Do advances in hardware (what's "standard" in an FPGA, for instance, or the growing popularity of QSPI memory) or SW issues (security) drive things?
I guess RISC-V is somewhat motivated by wanting to provide an open-source instruction set.  But is that all?

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 02:20:11 am
So it seems. x86 in 32-bit mode has 8 registers, x64 has 16. Should be a huge improvement, right? But in practice, there's none. If you compile the same program for 32-bit and for 64-bit and run it on the same computer, there's no increase in speed whatsoever.

There are several differences between 32 bit code and 64 bit code, all of which will have an effect:

- 16 registers instead of 8. It does in fact make it faster for non-trivial programs :-)

- pointers are 64 bit instead of 32 bit. This makes data structures bigger and programs slower. Except when your program won't fit into 4 GB (or 3 GB or whatever) and so won't run at *all*.

- calling convention is to use registers for arguments, not stack. Does in fact make the program faster.

- arithmetic is 64 bit instead of 32 bit. Has no effect if your program only uses 32 bit variables, a significant effect if you use 64 bit variables.

It's pretty hard to separate out and test these factors individually. The Linux kernel and gcc support the `-mx32` flag that uses 64 bit registers, 16 registers, and arguments in registers, but uses only 32 bit pointers. It runs faster than standard 64 bit code, and a LOT faster than i686.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 02:25:07 am
Is this because the code generator doesn't even bother with the extra registers?

I think it does. It has to because the calling conventions are different.

It would be a good experiment to run benchmarks on RISC-V with different numbers of registers and see how performance depends on the number of registers. GCC or other compilers may allow this.

They do. "rv32e" is a standard alternative for very small embedded systems that is the same as "rv32i" but only had 16 registers. gcc supports it.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 02:27:48 am
Where and when are you going to use this new oVerflow flag, and how often?

99.9999% of the time it gets used on an x86 or ARM it is because you don't have a direct "branch to label if A < B, signed" like on RISC-V, but only a "CMP" instruction which does a subtraction and sets the flags but throws away the result. And then the conditional branch needs to reconstruct what happened -- and only the conditional branch knows whether you wanted a signed or unsigned comparison.

Given that you have "branch if A < B, signed", writing an overflow flag that is almost never looked at will be a bigger waste of energy than computing an addition twice instead of once in the one instruction in a million that actually wants the overflow.

What is your use-case where this overflow flag is so critical?

Multiword math.

Multiword math uses carry, not overflow. Carry can be detected by simply testing whether the result is less than one of the arguments (it doesn't matter which one), then either branching on that (using BLTU) or setting a register to the value of the carry (using SLTU). Pick your poison.

It's also extremely rare and not a performance influencer outside of specialised domains.
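
That idiom in C, for the record (on RISC-V the compare becomes a single SLTU):

Code: [Select]
#include <stdint.h>

/* Carry out of an unsigned 32-bit add, with no flags register. */
static uint32_t add_carry_out(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;   /* wraps modulo 2^32         */
    return sum < a;         /* 1 iff the add carried out */
}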
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 13, 2018, 02:52:21 am
IP checksum can make use of a carry flag: a one's complement add does an end-around carry, and you can fold that into a series of "add with carry" on the widest word your CPU implements, to achieve an improvement over the methods that are (now) "traditionally" used in C.
(eg twice the speed and half the size on AVR: https://github.com/WestfW/Duino-hacks/blob/master/ipchecksum_test/ipchecksum_test.ino (https://github.com/WestfW/Duino-hacks/blob/master/ipchecksum_test/ipchecksum_test.ino) )
I'm not sure I'd call that "common" enough to justify a carry flag bit, except that ... today's CPUs tend to do an awful lot of IP checksumming!
(Also: it's limited by memory speed fetching the data to be checksummed, so probably irrelevant on faster RISC chips.  It takes a little getting used to when a valid answer to "but it takes more instructions" is "so what?")
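
For comparison, the plain-C version being beaten there looks something like this (a minimal sketch; the folding loop at the end is the end-around carry applied after the fact):

Code: [Select]
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 style one's-complement checksum in plain C. */
uint16_t ip_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (; len > 1; data += 2, len -= 2)
        sum += ((uint32_t)data[0] << 8) | data[1];
    if (len)                               /* odd trailing byte    */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)                      /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}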
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 03:49:33 am
(Also: it's limited by memory speed fetching the data to be checksummed, so probably irrelevant on faster RISC chips.  It takes a little getting used to when a valid answer to "but it takes more instructions" is "so what?")

Precisely.

While you were posting that, I did a little experiment and whipped up a quick and dirty multi-word add in C.

Code: [Select]
typedef unsigned int uint;

/* res = a + b over len words, propagating carry between words using
   unsigned compares instead of a carry flag */
void bignumAdd(uint *res, uint *a, uint *b, int len){
    uint carry = 0;
    for (int i=0; i<len; ++i){
        uint t = a[i] + b[i];
        uint carryOut = t < a[i];      // carry out of the first add
        t += carry;
        res[i] = t;
        carry = (t<carry) | carryOut;  // carry out of adding the carry in
    }
}

Here's the code for 32 bit RISC-V with gcc -O2

Code: [Select]
00000000 <bignumAdd>:
   0:   02d05763                blez    a3,2e <.L1>
   4:   068a                    slli    a3,a3,0x2
   6:   00d588b3                add     a7,a1,a3
   a:   4781                    li      a5,0

0000000c <.L3>:
   c:   4198                    lw      a4,0(a1)
   e:   4214                    lw      a3,0(a2)
  10:   0591                    addi    a1,a1,4
  12:   0611                    addi    a2,a2,4
  14:   96ba                    add     a3,a3,a4
  16:   00f68833                add     a6,a3,a5
  1a:   01052023                sw      a6,0(a0)
  1e:   00e6b733                sltu    a4,a3,a4
  22:   00f837b3                sltu    a5,a6,a5
  26:   8fd9                    or      a5,a5,a4
  28:   0511                    addi    a0,a0,4
  2a:   feb891e3                bne     a7,a1,c <.L3>

0000002e <.L1>:
  2e:   8082                    ret

Here's for Thumb2:

Code: [Select]
00000000 <bignumAdd>:
   0:   2b00            cmp     r3, #0
   2:   dd19            ble.n   38 <bignumAdd+0x38>
   4:   3a04            subs    r2, #4
   6:   eb01 0383       add.w   r3, r1, r3, lsl #2
   a:   3804            subs    r0, #4
   c:   b4f0            push    {r4, r5, r6, r7}
   e:   2500            movs    r5, #0

  10:   f851 4b04       ldr.w   r4, [r1], #4
  14:   2700            movs    r7, #0
  16:   f852 6f04       ldr.w   r6, [r2, #4]!
  1a:   19a4            adds    r4, r4, r6
  1c:   bf28            it      cs
  1e:   2701            movcs   r7, #1
  20:   1964            adds    r4, r4, r5
  22:   f840 4f04       str.w   r4, [r0, #4]!
  26:   bf2c            ite     cs
  28:   2401            movcs   r4, #1
  2a:   2400            movcc   r4, #0
  2c:   428b            cmp     r3, r1
  2e:   ea47 0504       orr.w   r5, r7, r4
  32:   d1ed            bne.n   10 <bignumAdd+0x10>

  34:   bcf0            pop     {r4, r5, r6, r7}
  36:   4770            bx      lr
  38:   4770            bx      lr
  3a:   bf00            nop

48 bytes for RISC-V, with 12 instructions in the loop.
58 bytes for ARM, with 14 instructions in the loop. (I'm not counting the nop for alignment)

It seems pretty obvious you could improve the ARM one by hand coding in assembly language, but few people are going to do that -- they just use gcc and take what they get.

The RISC-V one can't be improved by hand coding assembly language. I like that. Who doesn't want to get the best results possible just by coding in C?

By the way, I swear I wrote the C code at the start, compiled it for both, and did not adjust the C in any way.

Let's try 64 bit ARM!

Code: [Select]
0000000000000000 <bignumAdd>:
   0:   7100007f        cmp     w3, #0x0
   4:   5400020d        b.le    44 <bignumAdd+0x44>
   8:   d2800004        mov     x4, #0x0                        // #0
   c:   52800005        mov     w5, #0x0                        // #0

  10:   b8647828        ldr     w8, [x1, x4, lsl #2]
  14:   b8647846        ldr     w6, [x2, x4, lsl #2]
  18:   0b060106        add     w6, w8, w6
  1c:   0b0500c7        add     w7, w6, w5
  20:   6b06011f        cmp     w8, w6
  24:   1a9f97e6        cset    w6, hi  // hi = pmore
  28:   b8247807        str     w7, [x0, x4, lsl #2]
  2c:   6b0500ff        cmp     w7, w5
  30:   91000484        add     x4, x4, #0x1
  34:   1a9f27e5        cset    w5, cc  // cc = lo, ul, last
  38:   6b04007f        cmp     w3, w4
  3c:   2a0500c5        orr     w5, w6, w5
  40:   54fffe8c        b.gt    10 <bignumAdd+0x10>

  44:   d65f03c0        ret

So that's 72 bytes of code (by far the biggest) and 13 instructions in the loop (one more than RISC-V, one less than Thumb2).

Maybe x86_64?

Code: [Select]
0000000000000000 <bignumAdd>:
   0:   85 c9                   test   %ecx,%ecx
   2:   7e 39                   jle    3d <bignumAdd+0x3d>
   4:   8d 41 ff                lea    -0x1(%rcx),%eax
   7:   45 31 c0                xor    %r8d,%r8d
   a:   31 c9                   xor    %ecx,%ecx
   c:   4c 8d 14 85 04 00 00    lea    0x4(,%rax,4),%r10
  13:   00
  14:   0f 1f 40 00             nopl   0x0(%rax)

  18:   45 31 c9                xor    %r9d,%r9d
  1b:   8b 04 0a                mov    (%rdx,%rcx,1),%eax
  1e:   03 04 0e                add    (%rsi,%rcx,1),%eax
  21:   72 1c                   jb     3f <bignumAdd+0x3f>
  23:   44 01 c0                add    %r8d,%eax
  26:   41 0f 92 c0             setb   %r8b
  2a:   89 04 0f                mov    %eax,(%rdi,%rcx,1)
  2d:   48 83 c1 04             add    $0x4,%rcx
  31:   45 0f b6 c0             movzbl %r8b,%r8d
  35:   45 09 c8                or     %r9d,%r8d
  38:   49 39 ca                cmp    %rcx,%r10
  3b:   75 db                   jne    18 <bignumAdd+0x18>

  3d:   f3 c3                   repz retq

  3f:   41 b9 01 00 00 00       mov    $0x1,%r9d
  45:   eb dc                   jmp    23 <bignumAdd+0x23>

71 bytes of code, 12 instructions in the loop but 2 more outside (the compiler introduces conditional branches where there were none).

How about 32 bit x86?

Code: [Select]
00000000 <bignumAdd>:
   0:   55                      push   %ebp
   1:   57                      push   %edi
   2:   56                      push   %esi
   3:   53                      push   %ebx
   4:   8b 44 24 20             mov    0x20(%esp),%eax
   8:   8b 7c 24 14             mov    0x14(%esp),%edi
   c:   8b 5c 24 18             mov    0x18(%esp),%ebx
  10:   8b 6c 24 1c             mov    0x1c(%esp),%ebp
  14:   85 c0                   test   %eax,%eax
  16:   7e 29                   jle    41 <bignumAdd+0x41>
  18:   31 c9                   xor    %ecx,%ecx
  1a:   31 d2                   xor    %edx,%edx
  1c:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi

  20:   31 f6                   xor    %esi,%esi
  22:   8b 44 8d 00             mov    0x0(%ebp,%ecx,4),%eax
  26:   03 04 8b                add    (%ebx,%ecx,4),%eax
  29:   72 1b                   jb     46 <bignumAdd+0x46>
  2b:   01 d0                   add    %edx,%eax
  2d:   0f 92 c2                setb   %dl
  30:   89 04 8f                mov    %eax,(%edi,%ecx,4)
  33:   83 c1 01                add    $0x1,%ecx
  36:   0f b6 d2                movzbl %dl,%edx
  39:   09 f2                   or     %esi,%edx
  3b:   39 4c 24 20             cmp    %ecx,0x20(%esp)
  3f:   75 df                   jne    20 <bignumAdd+0x20>

  41:   5b                      pop    %ebx
  42:   5e                      pop    %esi
  43:   5f                      pop    %edi
  44:   5d                      pop    %ebp
  45:   c3                      ret   
  46:   be 01 00 00 00          mov    $0x1,%esi
  4b:   eb de                   jmp    2b <bignumAdd+0x2b>

77 bytes of code, and the same 12-14 instructions in the loop.

Motorola 68000?

Code: [Select]
00000000 <bignumAdd>:
   0:   4e56 0000       linkw %fp,#0
   4:   48e7 3030       moveml %d2-%d3/%a2-%a3,%sp@-
   8:   262e 0014       movel %fp@(20),%d3
   c:   6f36            bles 44 <bignumAdd+0x44>
   e:   206e 000c       moveal %fp@(12),%a0
  12:   266e 0010       moveal %fp@(16),%a3
  16:   246e 0008       moveal %fp@(8),%a2
  1a:   e58b            lsll #2,%d3
  1c:   d688            addl %a0,%d3
  1e:   4281            clrl %d1

  20:   2418            movel %a0@+,%d2
  22:   2002            movel %d2,%d0
  24:   d09b            addl %a3@+,%d0
  26:   2240            moveal %d0,%a1
  28:   d3c1            addal %d1,%a1
  2a:   24c9            movel %a1,%a2@+
  2c:   b082            cmpl %d2,%d0
  2e:   55c0            scs %d0
  30:   4400            negb %d0
  32:   b289            cmpl %a1,%d1
  34:   52c1            shi %d1
  36:   4401            negb %d1
  38:   8200            orb %d0,%d1
  3a:   0281 0000 00ff  andil #255,%d1
  40:   b1c3            cmpal %d3,%a0
  42:   66dc            bnes 20 <bignumAdd+0x20>

  44:   4cdf 0c0c       moveml %sp@+,%d2-%d3/%a2-%a3
  48:   4e5e            unlk %fp
  4a:   4e75            rts

76 bytes of code, 16 instructions in the loop.

msp430?

Code: [Select]
00000000 <bignumAdd>:
   0:   0b 12           push    r11             
   2:   0a 12           push    r10             
   4:   09 12           push    r9             
   6:   08 12           push    r8             
   8:   07 12           push    r7             
   a:   06 12           push    r6             
   c:   1c 93           cmp     #1,     r12     ;r3 As==01
   e:   1f 38           jl      $+64            ;abs 0x4e
  10:   29 4e           mov     @r14,   r9     
  12:   28 4d           mov     @r13,   r8     
  14:   0c 5c           rla     r12             
  16:   0b 43           clr     r11             
  18:   0a 43           clr     r10             
  1a:   06 3c           jmp     $+14            ;abs 0x28

  1c:   09 4e           mov     r14,    r9     
  1e:   09 5b           add     r11,    r9     
  20:   29 49           mov     @r9,    r9     
  22:   08 4d           mov     r13,    r8     
  24:   08 5b           add     r11,    r8     
  26:   28 48           mov     @r8,    r8     
  28:   08 59           add     r9,     r8     
  2a:   07 48           mov     r8,     r7     
  2c:   07 5a           add     r10,    r7     
  2e:   06 4f           mov     r15,    r6     
  30:   06 5b           add     r11,    r6     
  32:   86 47 00 00     mov     r7,     0(r6)   ;0x0000(r6)
  36:   16 43           mov     #1,     r6      ;r3 As==01
  38:   07 9a           cmp     r10,    r7     
  3a:   01 28           jnc     $+4             ;abs 0x3e
  3c:   06 43           clr     r6             
  3e:   1a 43           mov     #1,     r10     ;r3 As==01
  40:   08 99           cmp     r9,     r8     
  42:   01 28           jnc     $+4             ;abs 0x46
  44:   0a 43           clr     r10             
  46:   0a d6           bis     r6,     r10     
  48:   2b 53           incd    r11             
  4a:   0b 9c           cmp     r12,    r11     
  4c:   e7 23           jnz     $-48            ;abs 0x1c

  4e:   36 41           pop     r6             
  50:   37 41           pop     r7             
  52:   38 41           pop     r8             
  54:   39 41           pop     r9             
  56:   3a 41           pop     r10             
  58:   3b 41           pop     r11             
  5a:   30 41           ret                     

92 bytes of code, and 24 instructions in the loop.

sh4:

Code: [Select]
00000000 <bignumAdd>:
   0:   15 47           cmp/pl  r7
   2:   13 8b           bf      2c <bignumAdd+0x2c>
   4:   73 63           mov     r7,r3
   6:   08 43           shll2   r3
   8:   fc 73           add     #-4,r3
   a:   09 43           shlr2   r3
   c:   00 e1           mov     #0,r1
   e:   01 73           add     #1,r3

  10:   56 60           mov.l   @r5+,r0
  12:   66 62           mov.l   @r6+,r2
  14:   0c 32           add     r0,r2
  16:   23 67           mov     r2,r7
  18:   26 30           cmp/hi  r2,r0
  1a:   1c 37           add     r1,r7
  1c:   29 02           movt    r2
  1e:   76 31           cmp/hi  r7,r1
  20:   29 01           movt    r1
  22:   72 24           mov.l   r7,@r4
  24:   10 43           dt      r3
  26:   04 74           add     #4,r4
  28:   f2 8f           bf.s    10 <bignumAdd+0x10>

  2a:   2b 21           or      r2,r1
  2c:   0b 00           rts     
  2e:   09 00           nop     

46 bytes of code and 13 instructions in the loop. I like it!

To summarise .. bytes of code and instructions in the loop for each:

bytes  loop insns  ISA
 48        12      RISC-V
 58        14      Thumb2
 72        13      Aarch64
 71        14      x86_64
 77        14      i386
 76        16      m68k
 92        24      msp430
 46        13      sh4

sh4 and RISC-V are the clear winners, Thumb2 not too far behind.

x86_64, aarch64, m68k, i386 in a close bunch

msp430 bringing up the rear by some distance.

Once again, this is just one arbitrary silly example, made (I think) somewhat fair by using gcc for everything, as most people do. You could improve a lot of these by hand coding, at least for code size -- maybe or maybe not for execution speed.

I *like* a processor where you can do everything in C.

Anyone want to show off their hand coding?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 13, 2018, 04:38:44 am
I *like* a processor where you can do everything in C.
So do I. How do I do startup code and interrupt handlers in C with RISC-V?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 04:50:06 am
Quote
The simple fact that since 1980 there has been no (successful) new CISC instruction set

What about the MSP430?

Yes, it's a little bit CISCy, with memory-to-memory moves and adds. It falls into the PDP11 / M68000 design space.

Based on the example I just tried, the code isn't very compact! At least as generated from C by gcc.

Quote
So ... what prompts the development of a new RISC instruction set, anyway?
You'd think that by the time things were "reduced" enough, there wouldn't be all that much room for innovation or improvement.   Do you learn from mistakes in other vendors' instruction sets? (I've got to say that the more I look at it, the more unpleasant I find the ARM v6m instruction set. (CM0: thumb-16 only))  Do advances in hardware (what's "standard" in an FPGA, for instance, or the growing popularity of QSPI memory) or SW issues (security) drive things?
I guess RISC-V is somewhat motivated by wanting to provide an open-source instruction set.  But is that all?

RISC-V was motivated originally by wanting something as simple as possible (while still being effective) for Berkeley students to use to:

- learn assembly language programming (undergrad)
- design and implement their own processor core (masters?)
- design and implement experimental instruction set extensions (PhD)

Their previous experience was mostly with MIPS, but:

- only 32 bit could be used freely
- it's got annoying warts
- there is very little spare opcode space to put experimental extensions in .. and no easy way to go variable-length and have longer instructions.

ARM and x86 were considered as bases, but:

- x86 was far too complex for students to implement, and legally completely impossible to get permission to use, especially if you wanted to distribute results of your work.

- ARM also very complex, 32 bit only (at that time), mostly impossible to get permission to use, and again then not publishable.

OpenRISC and LEON were also considered, but again the things you could use were 32-bit only, and they also lacked spare opcode space.

I've looked closely at OpenRISC and it's very nicely done, as an instruction set pitched at one point in time, one fixed set of capabilities.

RISC-V is intended as a simple base that is sufficient for standard software (Linux or similar kernel and applications (Windows or MacOS/iOS would be fine too), compiled from C/C++ and similar languages) and that base is fixed and software built for it will work forever. But there is room and mechanism to add any number of future extensions targeting as yet unthought of application areas.

Will that be successful or not, I don't know, but anyway it's trying :-)

At the very least, it's not going to disappear without trace when the company that owns it goes out of business or loses interest. That's a serious problem: how much software has been lost as a result of the demise of the PDP11, VAX, Alpha, Nova, Eclipse, PA-RISC, Itanium (Kittson is the last generation ever), and SPARC (Oracle has cancelled future development, though Fujitsu is soldiering on for the moment)?

Linux and the GNU toolchain and environment help to keep software alive over changes in instruction set. RISC-V means it may not be necessary to have another instruction set in future, unless and until computers change fundamentally from the way they've been since at least the 1960s. The Linux kernel maintainers even speculated that RISC-V and C-SKY might be the last hardware ISAs ever added to Linux, as everyone not using ARM/x86/POWER is switching to RISC-V for future chips -- including C-SKY and Andes (nds32).

https://www.phoronix.com/scan.php?page=news_item&px=C-SKY-Approved-Last-Arch (https://www.phoronix.com/scan.php?page=news_item&px=C-SKY-Approved-Last-Arch)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 05:07:20 am
I *like* a processor where you can do everything in C.
So do I. How do I do startup code and interrupt handlers in C with RISC-V?

All algorithms found in user applications, I mean.

Operating systems and compiler runtime libraries are always going to have a little bit of assembler in them -- at least as long as there exist CSRs that are not memory-mapped.

However that's out of scope for both the C language and the RISC-V Unprivileged ISA (which is the thing that is portable and hopefully forever).

Interrupt handlers are no problem. You can either use hardware vectoring and add __attribute__((interrupt)) to your C function (in which case it will save all the registers it uses), or else, with the CLIC which is up for ratification as a RISC-V standard, you can use absolutely standard C functions along with a small firmware software-vectoring function that can live in mask ROM (making it little different from microcode). When using the upcoming EABI instead of the Linux ABI, the latency to get to a standard C function is very similar to that on ARM Cortex-M.

Details at https://github.com/sifive/clic-spec/blob/master/clic.adoc#c-abi-trampoline-code
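
The hardware-vectored flavour really is just C (a minimal sketch; the handler name and body are hypothetical):

Code: [Select]
/* With RISC-V gcc, the interrupt attribute makes the compiler
   save/restore every register the handler touches and return
   with mret instead of ret. */
void __attribute__((interrupt)) timer_handler(void)
{
    /* acknowledge the interrupt and do the work here */
}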

I notice your company's management have said they are 100% behind your subsidiary's move to put RISC-V into their FPGAs.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 13, 2018, 05:09:34 am
Operating systems and compiler runtime libraries are always going to have a little bit of assembler in them -- at least as long as there exist CSRs that are not memory-mapped.
That's really a sign of a bad design in 2018.

I notice your company's management have said they are 100% behind your subsidiary's move to put RISC-V into their FPGAs.
So?

Current RISC-V implementations are heavily slanted towards MPUs. Which is fine, but it has to be acknowledged. Trying to shove the same design into an MCU will not go over well.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 13, 2018, 05:35:47 am
Anyone want to show off their hand coding?

Why not. dsPIC33:

Code: [Select]
      dec w0,w0
      add #0,w0 ; clear carry
      do w0,loop
      mov [w1++],w4
loop: addc w4,[w2++],[w3++]

5 instructions, 15 bytes, n+3 instruction cycles (where n is the number of bytes)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 13, 2018, 05:38:52 am
They do. "rv32e" is a standard alternative for very small embedded systems that is the same as "rv32i" but only had 16 registers. gcc supports it.

Were there any performance comparisons between these two?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 13, 2018, 06:26:35 am
Anyone want to show off their hand coding?

Of course, Intel is no slouch either.

32-bit:

Code: [Select]
      sub eax,eax

loop: mov ecx,[esi+eax]
      adc ecx,[ebx+eax]
      mov [edi+eax],ecx
      lea eax,[eax+4]
      loop loop

6 instructions, 16 bytes, probably below 1 cycle per byte.

64-bit:

Code: [Select]
      sub eax,eax

loop: mov rcx,[rsi+rax]
      adc rcx,[rbx+rax]
      mov [rdi+rax],rcx
      lea rax,[rax+8]
      loop loop

This requires 4 REX bytes, so it's 20 bytes total, but that's where the 64-bitness helps. It'll probably run at about 0.4 cycles per byte.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 07:00:45 am
Operating systems and compiler runtime libraries are always going to have a little bit of assembler in them -- at least as long as there exist CSRs that are not memory-mapped.
That's really a sign of a bad design in 2018.

I disagree.

There are very good reasons that you don't want settings that control the deep modes of operation of a processor to be alterable by a store instruction whose address can be arbitrarily computed. Things such as enabling or disabling MMUs, or setting the base address for an interrupt vector, or changing the register width and instruction set between 32 bits and 64 bits. These should all be recognised as special instructions early in the pipeline, with the "name" of the affected register/setting hardcoded in the instruction, not something that is discovered many cycles later, and possibly only after waiting hundreds of cycles for a memory load or other operation. You really don't want later instructions to have already been speculatively executed out of order -- possibly with an entirely different instruction set or opcode encoding.

Maybe you can get away with that in a tiny single-issue in-order microcontroller -- and people implementing those are *welcome* to memory map everything they want -- but in that case you're stepping outside what is standard and can be depended on across *all* RISC-V implementations.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 07:07:46 am
Anyone want to show off their hand coding?

Why not. dsPIC33:

Code: [Select]
      dec w0,w0
      add #0,w0 ; clear carry
      do w0,loop
      mov [w1++],w4
loop: addc w4,[w2++],[w3++]

5 instructions, 15 bytes, n+3 instruction cycles (where n is the number of bytes)

Very nice.

How does gcc do on it?

I'm actually very disappointed that manufacturers of machines with condition codes don't seem to have added recognition of the C idiom for the carry flag and generated "add with carry" from it. gcc on every machine does recognise idioms for things such as rotate and generate rotate instructions even though C doesn't have an operator for it. Or maybe I just didn't find the correct idiom? Can anyone assist with that?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 07:13:25 am
Anyone want to show off their hand coding?

Of course, Intel is no slouch either.

32-bit:

Code: [Select]
      sub eax,eax

loop: mov ecx,[esi+eax]
      adc ecx,[ebx+eax]
      mov [edi+eax],ecx
      lea eax,[eax+4]
      loop loop

6 instructions, 16 bytes, probably below 1 cycle per byte.

64-bit:

Code: [Select]
      sub eax,eax

loop: mov rcx,[rsi+rax]
      adc rcx,[rbx+rax]
      mov [rdi+rax],rcx
      lea rax,[rax+8]
      loop loop

This requires 4 REX bytes, so it's 20 bytes total, but that's where the 64-bitness helps. It'll probably run at about 0.4 cycles per byte.

I'm afraid I don't understand how those work.

I thought "loop" decrements cx/ecx/rcx and loops if it's not zero. But you're using cx as a temporary to hold the result of the adc.

Also, where is the "ret", even if nothing else is needed?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 13, 2018, 08:08:53 am
MIPS doesn't have any Control and Status Register, and it's fine this way  :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 08:22:23 am
MIPS doesn't have any Control and Status Register, and it's fine this way  :D

Sure it does. They live in Coprocessor #0 and are accessed using special  MTC0 and MFC0 instructions.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 13, 2018, 08:24:11 am
Multiword math.

What do you need exactly? 128-bit math? And for what?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 13, 2018, 08:25:27 am
MIPS doesn't have any Control and Status Register, and it's fine this way  :D

Sure it does. They live in Coprocessor #0 and are accessed using special  MTC0 and MFC0 instructions.

exactly: Cop0 is not the CPU, it's a coprocessor, thus it's "external" to the ISA  :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 08:50:26 am
MIPS doesn't have any Control and Status Register, and it's fine this way  :D

Sure it does. They live in Coprocessor #0 and are accessed using special  MTC0 and MFC0 instructions.

exactly: Cop0 is not the CPU, it's a coprocessor, thus it's "external" to the ISA  :D

"Coprocessor 0 (also known as the CP0 or system control coprocessor) is a required coprocessor part of the MIPS32 and MIPS64 ISA which provides the facilities needed for an operating system."

It's exactly equivalent to the RISC-V CSRs and dedicated instructions distinct from memory load/store to move values to and from the CSRs.

And you're right .. this stuff exists outside the normal portable standardised "User" ISA. @ataradov appears to be unhappy that someone -- the operating system or at least runtime library writer -- should have to write a few lines of machine-dependent assembly language to set this stuff up. It only has to be done once per SoC (at most) and Joe Average applications programmer doesn't have to know it exists.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 13, 2018, 08:53:43 am
@ataradov appears to be unhappy that someone -- the operating system or at least runtime library writer -- should have to write a few lines of machine-dependent assembly language to set this stuff up.
No, I'm unhappy with the unnecessary proliferation of ways to access the hardware. This just makes things harder for no real benefit.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 13, 2018, 09:00:48 am
PA-RISC

on DTB we are still supporting Linux/HPPA-v2; everything else can R.I.P. and nobody (I can assure you, nobody) cares, except ... those who still have business with HPUX v11i2, and frankly that is just a matter of how much money they have invested in it. I am specifically talking about avionics here.

If you paid 50K euro for a license (e.g. for VectorCast & Co.) and got a binary for HPPA, you still need to run it on HPPA until you find someone (in the circle of those who make the decisions = managers) who is willing to pay for the new version. And that's all there is to it, even though in the meantime VectorCast & Co. have been ported to Intel x86  :D

The same applies to software designed for SGI/MIPS. I have a couple of friends who love using Autodesk software on IRIX. They do video editing, but ... they use this software simply because they want to play the retro-collector game. Autodesk 2008 is 10 years obsolete and well behind modern video-editing standards.

Besides, a modern PC consumes less electricity than - say, an SGI Tezro or SGI Origin - and produces better results.

In short, there is no regret.  :D

We support HPPAv2 for two specific reasons: -1- the hardware is very cheap, and -2- the PCI on HPPA doesn't come with all the shit that IBM put in the BIOS in order to support ancient video cards. ISA cards??? Yes, there is still support in modern BIOSes, and this makes everything hell when you want to develop your own PCI FPGA card.

My HP C3600 comes with a neat BIOS, there is no legacy shit, and it's cleaner than what you find on an Apple PowerMac, where the OpenFirmware is a mess. Look at the Linux source for the PCI.

About SGI MIPS we only support the IP30: this machine runs Linux, and it's the only MIPS4 machine available since modern MIPS are all MIPS32 or MIPS64. Besides, the IP30 comes with a crossbar matrix, and that is interesting.

There are no other good reasons to regret old RISC machines. Except the manual (and I am saying THE manual) of the 88K (Motorola's 88000), which is super marvelous!  ;D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 13, 2018, 09:12:34 am
Quote from: brucehoult
"Coprocessor 0 (also known as the CP0 or system control coprocessor) is a required coprocessor part of the MIPS32 and MIPS64 ISA which provides the facilities needed for an operating system."

Cop0 is not covered by the ISA; it's *theoretically* optional. In fact, SPIM doesn't have it(1), and neither does MARS, and you can implement a MIPS R2K without Cop0, and it works.

Of course, SPIM, MARS and such simplified CPUs don't handle interrupts, thus they are useless for any practical application, except giving students a laboratory based on software simulators.

Yes, SPIM and MARS are widely used in universities. I used them a lot during my Erasmus at Oxford (2007-2008).

But that's not the point. The point is: keep the CSR stuff *OUT* of the ISA, so do not implement any specific instructions like "move to/from CSR"; let a coprocessor handle it.

With the m68k ... it was a mess when the 68010 redefined MOVE from SR as a privileged instruction, while on the 68000 it was not privileged, and this caused a lot of trouble on Amiga computers.



(1) can be compiled with/without Cop0 experimental support.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 13, 2018, 09:14:58 am
There are no other good reasons to regret old RISC machines. Except the manual (and I am saying THE manual) of the 88K (Motorola's 88000), which is super marvelous!  ;D

88K is a very nice ISA, and pleasingly obsessive about sticking to the 2-read-1-write integer instruction model. There are a couple of features in it that I'm gently trying to steal for future RISC-V extensions, mostly in the "Bit Manipulation" working group.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 13, 2018, 02:56:34 pm
I'm afraid I don't understand how those work.

Good catch, we'll use DX instead of CX then.

32-bit:

Code: [Select]
      sub eax,eax

loop: mov edx,[esi+eax]
      adc edx,[ebx+eax]
      mov [edi+eax],edx
      lea eax,[eax+4]
      loop loop

64-bit:

Code: [Select]
      sub eax,eax

loop: mov rdx,[rsi+rax]
      adc rdx,[rbx+rax]
      mov [rdi+rax],rdx
      lea rax,[rax+8]
      loop loop

Surprisingly, it doesn't change the byte count.

Also, where is the "ret", even if nothing else is needed?

It's inlined. In assembler, you do not have to follow calling conventions :)

And if you really worry about byte count, you can add more CISC'iness and reduce the byte count to 10, although it'll be slower.

Code: [Select]
      clc

loop: lodsd
      adc eax,[ebx]
      lea ebx,[ebx+4]
      stosd
      loop loop

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 13, 2018, 03:15:23 pm
Why not. dsPIC33:

Very nice.

How does gcc do on it?

As everywhere. Works hard but doesn't produce any magic.

C does very well on MIPS (and RISC-V too, of course), but this is not because C produces something magical; it's because the instruction set is such that a human cannot improve much on what the C compiler has done.

I'm actually very disappointed that manufacturers of machines with condition codes don't seem to have added recognition of the C idiom for the carry flag and generated "add with carry" from it. gcc on every machine does recognise idioms for things such as rotate and generate rotate instructions even though C doesn't have an operator for it. Or maybe I just didn't find the correct idiom? Can anyone assist with that?

There are many places like that, such as rotations. In such cases C is more difficult to code in than assembler.

I look at C as a tool to convert text to assembler code. Sometimes it works well (for long expressions, for example). Sometimes it doesn't (as with long additions). But that's OK. You can write a few lines in assembler.

If you work with wood, you can do nice long cuts with a table saw, but there are small things which can only be done with a chisel. Should we re-design the table saw so that it can do everything? Probably not. Same with C.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 13, 2018, 08:35:06 pm
LOL - "risc-v-will-stop-hackers-dead-from-getting-into-your-computer" - said someone in this (https://hackaday.com/2018/12/13/risc-v-will-stop-hackers-dead-from-getting-into-your-computer/) article on hackaday

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 13, 2018, 09:00:11 pm
LOL - "risc-v-will-stop-hackers-dead-from-getting-into-your-computer" - said someone in this (https://hackaday.com/2018/12/13/risc-v-will-stop-hackers-dead-from-getting-into-your-computer/) article on hackaday

Up there with "Linux doesn't need antivirus".... :-)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 13, 2018, 09:27:52 pm
My little software RISC-V emulator seems to be alive! It has churned through 3,047 instructions of a HiFive 'blink' binary. Maybe a couple of evenings' play to get it this far - you couldn't do that with x86... the actual RISC-V code is < 800 lines.

(If I don't point it out in advance, somebody will point me at https://github.com/adriancable/8086tiny, which has an 8086 in 760 lines, but that's not a 32-bit CPU.)

I had to build dummy hardware for the "AON" (Always On) Peripheral, and the "PRCI" (used for clocking control) Peripheral, and it gets as far as attempting to configure the QSPI interface on address 0x10014000.

I am sure somebody is about to ask "but why, when there are so many already?"... I'm doing it because it is easier to build an understanding and verify what should be happening this way than by writing and simulating HDL and then learning that I really didn't understand what should be happening in the first place.

I want to get the emulator C code to the point it models the HDL pipeline, so I can verify against it that things are as expected. Maybe I could even use it for HLS  :-//
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 13, 2018, 09:33:59 pm
You did it in two days?  :o :o :o
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 13, 2018, 09:56:58 pm
You did it in two days?  :o :o :o
... of spare time between the boy going to bed, and me going to bed.

It's not much to look at:

A lookup table to decode opcode:
Code: [Select]
struct opcode_entry {
  char *spec;          /* 32-char bit pattern, MSB first; '-' bits are don't-cares */
  int (*func)(void);
  uint32_t value;      /* derived from spec when the table is set up */
  uint32_t mask;       /* derived from spec when the table is set up */
} opcodes[] = {
   {"-------------------------0010111", op_auipc},
   {"-------------------------0110111", op_lui},
   {"-------------------------1101111", op_jal},
   {"-----------------000-----1100111", op_jalr},

   {"-----------------000-----1100011", op_beq},
   {"-----------------001-----1100011", op_bne},
   {"-----------------100-----1100011", op_blt},
   {"-----------------101-----1100011", op_bge},
   {"-----------------110-----1100011", op_bltu},
   {"-----------------111-----1100011", op_bgeu},

   {"-----------------000-----0000011", op_lb},
   {"-----------------001-----0000011", op_lh},
   {"-----------------010-----0000011", op_lw},
   {"-----------------100-----0000011", op_lbu},
   {"-----------------101-----0000011", op_lhu},

   {"-----------------000-----0100011", op_sb},
   {"-----------------001-----0100011", op_sh},
   {"-----------------010-----0100011", op_sw},


   {"-----------------000-----0010011", op_addi},
   {"-----------------010-----0010011", op_slti},
   {"-----------------011-----0010011", op_sltiu},
   {"-----------------100-----0010011", op_xori},
   {"-----------------110-----0010011", op_ori},
   {"-----------------111-----0010011", op_andi},
   {"0000000----------001-----0010011", op_slli},
   {"0000000----------101-----0010011", op_srli},
   {"0100000----------101-----0010011", op_srai},

   {"0000000----------000-----0110011", op_add},
   {"0100000----------000-----0110011", op_sub},
   {"0000000----------001-----0110011", op_sll},
   {"0000000----------010-----0110011", op_slt},
   {"0000000----------011-----0110011", op_sltu},
   {"0000000----------100-----0110011", op_xor},
   {"0000000----------101-----0110011", op_srl},
   {"0100000----------101-----0110011", op_sra},
   {"0000000----------110-----0110011", op_or},
   {"0000000----------111-----0110011", op_and},

   {"0000--------00000000000000001111", op_fence},
   {"00000000000000000001000000001111", op_fence_i},

   {"00000000000000000000000001110011", op_ecall},
   {"00000000000100000000000001110011", op_ebreak},

   {"-----------------001-----1110011", op_csrrw},
   {"-----------------010-----1110011", op_csrrs},
   {"-----------------011-----1110011", op_csrrc},
   {"-----------------101-----1110011", op_csrrwi},
   {"-----------------110-----1110011", op_csrrsi},
   {"-----------------111-----1110011", op_csrrci},

   {"--------------------------------", op_unknown}  // Catches all the others
};

A function to break the instruction into fields (the ugly bit):
Code: [Select]
/****************************************************************************/
static void decode(uint32_t instr) {
  int32_t broffset_12_12, broffset_11_11, broffset_10_05, broffset_04_01;
  int32_t jmpoffset_20_20, jmpoffset_19_12, jmpoffset_11_11, jmpoffset_10_01;
  rs1     = (instr >> 15) & 0x1f ;
  rs2     = (instr >> 20) & 0x1F;
  rd      = (instr >> 7)  & 0x1f;
  csrid   = (instr >> 20);
  uimm    = (instr >> 15) & 0x1f;
  shamt   = (instr >> 20) & 0x1f;
  upper20 = instr & 0xFFFFF000;
  imm12   = ((int32_t)instr) >> 20;

  jmpoffset_20_20 = (int32_t)(instr & 0x80000000)>>11;
  jmpoffset_19_12 = (instr & 0x000FF000);
  jmpoffset_11_11 = (instr & 0x00100000) >>  9;
  jmpoffset_10_01 = (instr & 0x7FE00000) >> 20;
  jmpoffset = jmpoffset_20_20 | jmpoffset_19_12 | jmpoffset_11_11 | jmpoffset_10_01;

  broffset_12_12 = (int)(instr & 0x80000000) >> 19;
  broffset_11_11 = (instr & 0x00000080) << 4;
  broffset_10_05 = (instr & 0x7E000000) >> 20;
  broffset_04_01 = (instr & 0x00000F00) >> 7;
  broffset = broffset_12_12 | broffset_11_11 | broffset_10_05 | broffset_04_01;

  imm12wr   =  instr;       /* Note - becomes signed */
  imm12wr  >>= 20;          /* arithmetic shift sign-extends imm[11:5] */
  imm12wr  &= ~0x1F;        /* keep the sign extension, clear imm[4:0] */
  imm12wr  |= (instr >> 7)  & 0x1f;
  current_instr = instr;
}

And some small functions to actually execute the instructions:
Code: [Select]
/****************************************************************************/
static int op_beq(void) {     trace("BEQ\tr%i, r%i, %i", rs1, rs2, broffset);
  if(regs[rs1] == regs[rs2]) {
    pc += broffset;
  } else {
    pc += 4;
  }
  return 1;
}

...and then the code to actually run an instruction:

Code: [Select]
/****************************************************************************/
static int do_op(void) {
  uint32_t instr;
  int i;
  if((pc & 3) != 0) {
    display_log("Attempt to execute unaligned code");
    return 0;
  }

  /* Fetch */
  if(!memorymap_read(pc,4, &instr)) {
    display_log("Unable to fetch instruction");
    return 0;
  }
  /* Decode */
  decode(instr);
 
  /* Execute */
  for(i = 0; i < sizeof(opcodes)/sizeof(struct opcode_entry); i++) {
     if((instr & opcodes[i].mask) == opcodes[i].value) {
       return opcodes[i].func();
     }
  }
  return 0;
}
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 13, 2018, 10:30:51 pm
My little software RISC-V emulator seems to be alive! It has churned through 3,047 instructions of a HiFive 'blink' binary. Maybe a couple of evenings' play to get it this far - you couldn't do that with x86... the actual RISC-V code is < 800 lines.


Your intention, at the moment, seems to be to emulate all of the instructions.  This seems like a great approach because instruction execution needs to be understood before even thinking about the other issues.

You mention coding the pipeline.  Is that where you are headed?  If so, I hope you'll publish your code as you go along. Either here or on your web site.

Instruction execution seems easy to code, the pipeline will be a lot more complex (I think...).
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 13, 2018, 11:26:36 pm
Instruction execution seems easy to code, the pipeline will be a lot more complex (I think...).
For the level I am aiming at I don't think it will be too complex. All that is needed is a way to indicate if the instructions in the pipeline are no longer valid because the program counter was updated rather than incremented.

Unaligned memory accesses (which will need two cycles to execute) will add a bit of complexity though, as they will stall the pipeline rather than requiring that it gets flushed.

Humm....
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 13, 2018, 11:56:47 pm
Instruction execution seems easy to code, the pipeline will be a lot more complex (I think...).
For the level I am aiming at I don't think it will be too complex. All that is needed is a way to indicate if the instructions in the pipeline are no longer valid because the program counter was updated rather than incremented.

Unaligned memory accesses (which will need two cycles to execute) will add a bit of complexity though, as they will stall the pipeline rather than requiring that it gets flushed.

Humm....

I was wondering if you planned to implement the various registers that save state through the pipeline.  I am interested in detecting and overcoming hazards.

What I really need is a reference book for the RISC-V that covers all the hardware details.  Not just at 10,000 feet up but right down in the dirt.  Something I can convert from text to HDL or, better yet, maybe the HDL is given.

Are there any such references?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 14, 2018, 12:32:18 am
What I really need is a reference book for the RISC-V that covers all the hardware details.  Not just at 10,000 feet up but right down in the dirt.  Something I can convert from text to HDL or, better yet, maybe the HDL is given.

Are there any such references?

I think that this is the key to the RISC-V ethos - it is just the ISA specification. What you do with it is up to you.

As long as your hardware runs the RISC-V RV32I (+ whatever extensions) you don't have to worry too much about the software tooling.

RISC-V it isn't a hardware specification - it is a specification of the interface between the software layer and digital logic layers. If you build a CPU that implements RISC-V, you have a ready-made software layer.

And if you build software that targets RISC-V, you have ready-made hardware implementations.

So that is why I am using the SiFive HiFive's FE310 as a reference for my hacks:

- I have the real hardware on my desk https://www.sifive.com/boards/hifive1 (https://www.sifive.com/boards/hifive1) (one of the early team signature edition boards, no less!), to clear up any of my misunderstandings

- the GCC RISC-V toolchain is all there, installed on a Linux VM that I used for playing with the HiFive

- Everything about the chip is well-documented at https://www.sifive.com/chip-designer#fe310 (https://www.sifive.com/chip-designer#fe310)

So now all I need to do is run 'objcopy' to convert the ELF image to a binary file, then load it into my emulator's ROM, and I am ready to debug :).

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 14, 2018, 12:35:54 am
Unaligned memory accesses (which will need two cycles to execute) will add a bit of complexity though, as they will stall the pipeline rather than requiring that it gets flushed.

Since this is on an FPGA, an easy solution is to use multi-port BRAM, which can fetch two consecutive words at a time (for example, one port reads address (a & ~3) while the other reads address (a & ~3) + 4). Concatenating them together you get 8 bytes starting at (a & ~3). Using a 4:1 mux controlled by (a & 3), you can then read any unaligned 32-bit word. The mux will add some delay, but since you're going to have bigger delays elsewhere (such as in the ALU), this probably doesn't matter. Of course, it'll only work if you use BRAM as your memory.
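
As a little C model of that datapath, in the same style as the emulator code earlier in the thread (my sketch, not NorthGuy's HDL; bram[] stands in for the FPGA block RAM):

Code: [Select]
#include <stdint.h>

static uint32_t bram[1024];       /* the dual-port block RAM, word-addressed */

/* Port A fetches the word at (a & ~3), port B the word at (a & ~3) + 4;
   the byte offset (a & 3) drives the 4:1 mux that selects the unaligned word. */
uint32_t read_unaligned32(uint32_t a)
{
    uint32_t lo    = bram[(a & ~3u) >> 2];
    uint32_t hi    = bram[((a & ~3u) + 4) >> 2];
    uint32_t shift = (a & 3u) * 8;
    return shift ? (lo >> shift) | (hi << (32 - shift)) : lo;  /* little-endian */
}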
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 14, 2018, 01:06:59 am
Quote
[MSP430 is] a little bit CISCy with memory-to-memory moves and adds. It falls into the PDP11/M68000 design space.

Based on the example I just tried, the code isn't very compact! At least as generated from C by gcc.
Variable instruction length and execution time.  Definitely CISCy.  Although of "elegantly minimal" form rather than the "we're going to implement cobol in microcode" form.  With twice the registers of a PDP11 and half the 68k, I think it qualifies as different enough to be "new."  And "relatively" successful.
The MSP430 code gcc produced for your example is depressingly bad.  It fails to refactor the array access into pointer-based accesses, dutifully incrementing the index and adding it to each array base on each loop, when it could have used auto-incrementing indexed addressing, I think.  (I thought that was an optimization that gcc would do before even getting to cpu-specific code generation.  I guess not.)

Quote
I'm actually very disappointed that manufacturers of machines with condition codes don't seem to have added recognition of the C idiom for the carry flag and generated "add with carry" from it. gcc on every machine does recognise idioms for things such as rotate and generate rotate instructions
Adding a "rotate" is relatively easy because it's a single instruction.  Supporting Carry means retaining awareness of state that isn't part of the C model of how things work.  "Which carry were they talking about?"  For example, the really short examples that people are posting are all based on having "loop" instructions that don't change the carry bit.  We saw how that restricts register choice on x86.  MSP430 doesn't have any such looping instructions (that I recall or see in summaries.)  So the compiler would have to decide that some math is different than other math, and ... it makes my brain hurt just thinking about it.  (ARM has the "S" suffix for instructions to specify that they should update the flags, which is cute, I guess.  But I'm not sure it's worth spending a bit on (and indeed, it's not there in Thumb-16)

Quote
Quote
Operating systems and compiler runtime libraries are always going to have a little bit of assembler in them...
   
That's really a sign of a bad design in 2018.
I don't know.  In some senses, having actual assembler modules seems "cleaner" than some of the things that compilers get forced into these days.  (Consider the whole "sfr |= bitmask;" optimization in AVR...)


Quote
[RISCV is] not going to disappear without trace when the company that owns it goes out of business or loses interest. That's a serious problem ... How much software has been lost as a result of the demise of PDP11, VAX, Alpha, Nova, Eclipse, PA-RISC, [etc]
PDP10.  "loses interest."  Sigh.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 14, 2018, 01:30:28 am
I don't know.  In some senses, having actual assembler modules seems "cleaner" than some of the things that compilers get forced into these days.  (Consider the whole "sfr |= bitmask;" optimization in AVR...)

Memory mapped registers give you cleaner code. The reason AVR is hard is that it has limited address space. This is not a problem on 32-bit systems.

Essentially what I'm asking for is SCB on Cortex-M devices. It is still standard and defined by the architecture specification, so all vendors have to implement it to get a compliant core.

Special registers result in weird code where you move stuff to/from a general purpose register. And I don't see how this is any better from an implementation point of view. If you are writing a value into a special register, you still have to wait until the register fetch stage. And at that point you know the target address of a store operation and can break the pipeline if necessary.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 14, 2018, 02:44:51 am
Quote
seems "cleaner"
Well, for instance, since "rotate" has been mentioned...
It's nice that the compiler can be made smart enough to see:

Code: [Select]
  ((x << n) | (x >> (opsize - n)));

and perhaps generate a "rotate left" instruction.  But I'd really rather have a rotl(x,n); statement that I KNOW generates the appropriate assembly.
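
For what it's worth, the portable way to get that today is a wrapper around the idiom with the shift counts kept in range (a sketch; current gcc and clang both collapse this to a single rotate instruction where the ISA has one):

Code: [Select]
#include <stdint.h>

static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;                                   /* keep both shifts in 0..31 */
    return (x << n) | (x >> ((32 - n) & 31));  /* n == 0 stays well-defined */
}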

Or, in the case of ARM, it's nice that the ABI and the hardware agree on which registers get saved, so that ISR functions and normal C functions are indistinguishable.   I guess.  Other times I wish the ISRs in C code were more easily distinguishable, and that the HW interrupt entry was quicker...
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 14, 2018, 02:52:50 am
But I'd really rather have a rotl(x,n); statement that I KNOW generates the appropriate assembly.
Me too. But that's a question for the compiler/standard library creators. Such things can either be defined as part of the standard (no way that will realistically happen for C) or as part of the library in the form of intrinsics. Intrinsics are easier, but they simply reflect the instruction set with all its limitations on the types of arguments.

Or, in the case of ARM, it's nice that the ABI and the hardware agree on which registers get saved, so that ISR functions and normal C functions are indistinguishable.   I guess.  Other times I wish the ISRs in C code were more easily distinguishable, and that the HW interrupt entry was quicker...
That's a matter of future improvement.  I'll take ARM's system any day of the week over what we had before. Now, to stay competitive, they need to make it better. For example, have a register in the NVIC that defines a bit mask of registers to save/restore. Your choice to stay with the default and be compatible with the ABI, or to do something manually.

RISC-V still has to catch up to what ARM has in this respect. And unfortunately I see no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 14, 2018, 03:20:22 am
RISC-V still has to catch up to what ARM has in this respect. And unfortunately I see no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs.

I don't think ARM is somehow targeted at MCUs. Compared to ARM, RISC-V seems cleaner and better, and it is also free. There's no reason to choose ARM over RISC-V.

With MCUs, a big problem is that the instructions are fetched from flash. Flash fetching is slow, so you can only fetch so many instructions per unit of time. A natural way to improve the performance is to make your instructions wider, so that every single instruction can do more, which is CISC -- totally different from what you see in either ARM or RISC-V. However, such an approach doesn't seem to be very popular. Everybody wants ARM. Perhaps, 5 years from now everybody will want RISC-V, which is definitely a good thing.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 14, 2018, 03:23:24 am
I don't think ARM is somehow targeted at MCUs.
And what is Cortex-M0+ then?

There's no reason to choose ARM over RISC-V.
What is the interrupt latency on the RISC-V?

Perhaps, 5 years from now everybody will want RISC-V, which is definitely a good thing.
Quite likely, but not without effort on RISC-V part.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 14, 2018, 03:56:59 am
You did it in two days?  :o :o :o
... of spare time between the boy going to bed, and me going to bed.

It's not much to look at:

Good work!

One of my current work tasks is helping extend binutils (assembler, disassembler) and Spike to understand the proposed Vector instruction set. Very similar stuff. And then on to llvm...
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 14, 2018, 04:00:19 am
I don't think ARM is somehow targeted at MCUs.
And what is Cortex-M0+ then?

An existing architecture used for a purpose which wasn't intended in the original design.

There's no reason to choose ARM over RISC-V.
What is the interrupt latency on the RISC-V?

You don't know that. This is an ISA, not an architecture. You can design your MCU with very low interrupt latency. Or you can design an MCU with a long pipeline and bad interrupt latency. That is actually where the benefit is. Anyone can design their own CPU with the design characteristics they want, and all of them can use the same ISA. Such things were completely impossible with ARM because the core was copyrighted and you had to live with what they gave you.


Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: lucazader on December 14, 2018, 04:25:34 am
Continuing from NorthGuy's comments about interrupt latency being implementation independent:

If you look at SiFive's core design for the E20 core, which is targeted at the same level as the M0+ core by the looks of their marketing material, the interrupt latency is 6 cycles to a C handler, whereas an M0+ is 15 cycles.
https://www.sifive.com/cores/e20 (https://www.sifive.com/cores/e20)

Now, sure, there is a note on there that this is when using the CLIC vectored mode. But the M0+ also has a vectored interrupt controller.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 14, 2018, 04:27:39 am
the interrupt latency is 6 cycles to a C handler, whereas an M0+ is 15 cycles.
Except that the Cortex-M saves registers in those 15 cycles, and RISC-V only arrives at the register-saving code after those 6.

EDIT: I don't see how it can do 6 cycles to the C code. Maybe someone can point out what it is doing in those 6 cycles?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 14, 2018, 04:37:10 am
I had to build dummy hardware for the "AON" (Always On) Peripheral, and the "PRCI" (used for clocking control) Peripheral, and it gets as far as attempting to configure the QSPI interface on address 0x10014000.

You'd have an easier time making a generic binary rather than a HiFive1 one, and using stdin/stdout emulation.

Code: [Select]
$ cd freedom-e-sdk/software/hello
$ riscv64-unknown-elf-gcc -O -march=rv32i -mabi=ilp32 hello.c -o hello

objdump that and you'll find main calling puts calling _puts_r calling {strlen, __sinit, __sfvwrite_r}.
Eventually you'll find yourself (after all the bollocks in Newlib) down in _write()

Code: [Select]
00012cd0 <_write>:
   12cd0:       ff010113                addi    sp,sp,-16
   12cd4:       00112623                sw      ra,12(sp)
   12cd8:       00812423                sw      s0,8(sp)
   12cdc:       00000693                li      a3,0
   12ce0:       00000713                li      a4,0
   12ce4:       00000793                li      a5,0
   12ce8:       04000893                li      a7,64
   12cec:       00000073                ecall
   12cf0:       00050413                mv      s0,a0
   12cf4:       00055a63                bgez    a0,12d08 <_write+0x38>
   12cf8:       40800433                neg     s0,s0
   12cfc:       08c000ef                jal     ra,12d88 <__errno>
   12d00:       00852023                sw      s0,0(a0)
   12d04:       fff00413                li      s0,-1
   12d08:       00040513                mv      a0,s0
   12d0c:       00c12083                lw      ra,12(sp)
   12d10:       00812403                lw      s0,8(sp)
   12d14:       01010113                addi    sp,sp,16
   12d18:       00008067                ret

You need to have your emulator implement ecall, and do the right thing based on the code in a7:

57: _close
62: _lseek
63: _read
64: _write
80: _fstat
93: _exit
214: _sbrk

Except for _fstat (which needs a struct remapped), most of those are easy to just pass directly on to your host OS.
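
In emulator terms that's just a switch on a7, something like this sketch in the style of the op_ functions above (regs[] and pc are the emulator's globals; guest_ptr(), which turns a guest address into a host pointer, is an assumed helper):

Code: [Select]
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

extern uint32_t regs[32], pc;      /* the emulator's state, as above */
void *guest_ptr(uint32_t addr);    /* assumed: guest address -> host pointer */

/* Newlib convention: syscall number in a7 (x17), arguments in
   a0..a2 (x10..x12), result returned in a0. */
static int op_ecall(void) {
  switch (regs[17]) {
    case 64:                                  /* _write(fd, buf, len) */
      regs[10] = write(regs[10], guest_ptr(regs[11]), regs[12]);
      break;
    case 63:                                  /* _read(fd, buf, len) */
      regs[10] = read(regs[10], guest_ptr(regs[11]), regs[12]);
      break;
    case 93:                                  /* _exit(code) */
      exit(regs[10]);
    default:
      regs[10] = (uint32_t)-1;                /* not implemented: error */
  }
  pc += 4;                                    /* ecall still advances the PC */
  return 1;
}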
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 14, 2018, 04:49:58 am
You'd have an easier time making a generic binary rather than a HiFive1 one, and using stdin/stdout emulation.

And if you cut your program down to ...

Code: [Select]
void _write(int fd, char *s, int len);

int main()
{
    _write(0, "hello world!\n", 13);
    return 0;
}

... you'll have a much smaller binary (I get 1532 bytes, 383 instructions) that still works fine on qemu user:

Code: [Select]
$ riscv64-unknown-elf-gcc -O -march=rv32i -mabi=ilp32 hello.c -o hello
$ size hello
   text    data     bss     dec     hex filename
   1532    1084      28    2644     a54 hello
$ qemu-riscv32 hello
hello world!

Make that work on yours and you'll be sweet :-)

ok .. you'll need a linker script something like the one the HiFive1 uses, to let you extract the ELF to a raw binary. I'm sure you can handle that.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 14, 2018, 05:11:35 am
What I really need is a reference book for the RISC-V that covers all the hardware details.  Not just at 10,000 feet up but right down in the dirt.

RISC-V overview and tutorial: The RISC-V Reader: An Open Architecture Atlas: https://www.amazon.com/RISC-V-Reader-Open-Architecture-Atlas/dp/0999249118 (https://www.amazon.com/RISC-V-Reader-Open-Architecture-Atlas/dp/0999249118)

RISC-V instruction set reference Work In Progress (User ISA, and privileged): https://github.com/riscv/riscv-isa-manual/releases/latest (https://github.com/riscv/riscv-isa-manual/releases/latest)

Computer architecture textbook for undergrads, using RISC-V: https://www.amazon.com/Computer-Organization-Design-RISC-V-Architecture/dp/0128122757 (https://www.amazon.com/Computer-Organization-Design-RISC-V-Architecture/dp/0128122757)

Quote
Something I can convert from text to HDL or, better yet, maybe the HDL is given.

Are there any such references?

There is no such thing as The HDL. RISC-V is an ISA specification that anyone can implement any way they want.

And many people have already!

If you want concrete, open, HDL implementations, here is a selection you can study, build, put in an FPGA, and run: https://github.com/riscv/riscv-wiki/wiki/RISC-V-Cores-and-SoCs (https://github.com/riscv/riscv-wiki/wiki/RISC-V-Cores-and-SoCs)

Rocket is the original core design from Berkeley. Many other projects are based on it including SiFive's "freedom" and commercial x3n and x5n cores and BOOM.

PULPino is from ETH Zurich and is what has been used in the new NXP microcontroller SoC (with a RI5CY core and a Zero RISCY core as well as an M0 and an M4F)

PicoRV32 and VexRiscv are very popular for use in small FPGAs.

ReonV is based on the old LEON SPARC open-source implementation, with the opcodes changed.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 14, 2018, 05:43:28 am
- I have the real hardware on my desk https://www.sifive.com/boards/hifive1 (https://www.sifive.com/boards/hifive1) (one of the early team signature edition boards, no less!), to clear up any of my misunderstandings

Yup. I ordered one of the Signature Edition boards back in Dec 2016 (late Jan by the time it arrived in Moscow) https://twitter.com/BruceHoult/status/824965355755991041 (https://twitter.com/BruceHoult/status/824965355755991041)

I was pretty impressed that a company of about ten people had got the chip taped out, and back, and working at 320 MHz, and made the boards and software (including porting the Arduino libraries), within 18 months of being founded.

Two weeks later, someone on the support forums posted a video of playing the Dr Who theme on a square-wave software synthesizer they'd written for the HiFive1. I responded by spending about six hours on a quick hack to play the same thing from a proper WAV file using straight Arduino digitalWrite():

https://www.youtube.com/watch?v=0eDS6pGYsCE (https://www.youtube.com/watch?v=0eDS6pGYsCE)

And then a Queen song (kinda topical right now!):

https://www.youtube.com/watch?v=RxPvWCQY5iA (https://www.youtube.com/watch?v=RxPvWCQY5iA)

I think SiFive noticed these videos at the time -- Megan Wachs just asked me about them a couple of weeks ago and got me to resurrect the code for one of the demos in the SiFive booth at the RISC-V Summit last week.

Long story short ... after Michael Clark and I presented the rv8 simulator at CARRV (Workshop on Computer Architecture Research with RISC-V) in Boston in October 2017 various SiFive people took me to dinner and bars and suggested that I might like to come and work for them. As I was already impressed by the HiFive1 and they were at that time already taping-out the FU540 for the linux board it was a pretty easy sell as being more interesting than what I was doing at Samsung :-)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 14, 2018, 05:56:58 am
I was wondering if you planned to implement the various registers that save state through the pipeline.  I am interested in detecting and overcoming hazards.

When a customer asked me to help him debug his pipelined CPU (VHDL), we found a CPU which was basically working fine, except for some registers being mysteriously corrupted during execution.

Digging in, I found a couple of bugs in how the pipeline stalled the ALU during divisions and multiplications, plus another bug of the same kind in the load/store path.

The pipeline was not being stalled correctly, and this caused the data corruption.

Books usually don't cover anything at this level of detail, and that makes sense because courses are already too heavy.

Anyway, my customer's CPU has eight stages. Although the instruction and data memory accesses occupy multiple cycles, they are fully pipelined, so that a new instruction can start on every clock. (NOTE: the stages are different from those of the MIPS R2K.)



That working scheme is immediately defective, because the EX, DF and FS stages are assumed to take one clock edge, while there are scenarios where they take more than one (e.g. multiplication, division, data not in cache --> access to RAM --> n wait-states --> n + m clock edges).

Therefore the pipeline needs to be stalled properly. This is usually not considered in books, but it's what you find in reality.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 14, 2018, 05:58:50 am
The MSP430 code gcc produced for your example is depressingly bad.  It fails to refactor the array access into pointer-based accesses, dutifully incrementing the index and adding it to each array base on each loop, when it could have used auto-incrementing indexed addressing, I think.  (I thought that was an optimization that gcc would do before even getting to cpu-specific code generation.  I guess not.)

Yes, I don't know why. gcc is perfectly capable of doing this on other ISAs.

Quote
Quote
I'm actually very disappointed that manufacturers of machines with condition codes don't seem to have added recognition of the C idiom for the carry flag and generated "add with carry" from it. gcc on every machine does recognise idioms for things such as rotate and generate rotate instructions
Adding a "rotate" is relatively easy because it's a single instruction.  Supporting Carry means retaining awareness of state that isn't part of the C model of how things work.  "Which carry were they talking about?"  For example, the really short examples that people are posting are all based on having "loop" instructions that don't change the carry bit.  We saw how that restricts register choice on x86.  MSP430 doesn't have any such looping instructions (that I recall or see in summaries.)  So the compiler would have to decide that some math is different than other math, and ... it makes my brain hurt just thinking about it.

Yes.

I did a little googling and found that one idiom said to produce near-optimal code with some compilers and ISAs is to do the arithmetic in double precision and then cast/mask/shift the result back down to normal precision:

Code: [Select]
long tmp = (long)a + b + carryIn;   /* assumes long is wider than int */
int sum = (int)tmp;
int carry = (int)(tmp >> (sizeof(int)*8));

I've had luck with the same kind of approach to generate instructions such as "give me the high bits of a multiply" in the past, but I haven't checked this idiom for carry myself yet.
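
Wrapped into the multiword add this thread keeps hand-coding, the idiom looks like this (my sketch, with exact-width types so the widening is explicit):

Code: [Select]
#include <stdint.h>

/* c[] = a[] + b[] over n words: do each digit in 64-bit arithmetic,
   keep the low 32 bits, and take the carry out from bit 32. */
void bignum_add(uint32_t *c, const uint32_t *a, const uint32_t *b, int n)
{
    uint64_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        c[i]  = (uint32_t)t;
        carry = t >> 32;
    }
}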

Quote
(ARM has the "S" suffix for instructions to specify that they should update the flags, which is cute, I guess.  But I'm not sure it's worth spending a bit on (and indeed, it's not there in Thumb-16)

PowerPC does the same thing with a "." suffix on opcodes to update the condition codes in cr0. (cr1..cr7 are updated only by cmp instructions).
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 14, 2018, 06:24:28 am
I don't know.  In some senses, having actual assembler modules seems "cleaner" than some of the things that compilers get forced into these days.  (Consider the whole "sfr |= bitmask;" optimization in AVR...)

Memory mapped registers give you cleaner code. The reason AVR is hard is that it has limited address space. This is not a problem on 32-bit systems.

HC11 (8-bit register machine, 16-bit address space) comes with soft registers; basically, gcc uses the first 256 bytes of CPU-internal RAM for this.

But are you sure that this makes cleaner code?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 14, 2018, 06:44:41 am
But are you sure that this makes cleaner code?
We are talking about different things. Memory-mapped general purpose registers are a horrible idea.

I'm talking about special registers that use special commands to access them (like co-processors on ARM and MIPS) vs just mapping the same special registers into the regular address space where regular load/store instructions can get to them.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 14, 2018, 08:23:56 am
RISC-V still has to catch up to what ARM has in this respect. An unfortunately I see no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs.

Then you're not looking.

There is an Embedded ABI being worked on, with fewer volatile registers -- probably a0..a3 and t0..t1 instead of a0..a7 and t0..t6 for Linux, thus cutting down the number of registers that need to be saved on an interrupt to six (plus ra) instead of fifteen. Also, all registers from 16..31 become callee-save, making the ABI identical between rv32i and rv32e. This is being worked on by people in the embedded community.

There is the CLIC (Core Local Interrupt Controller), a backward-compatible enhancement that was *specifically* designed with the needs and input of the embedded community. I've already pointed you to this. It provides direct vectoring to C functions decorated with an attribute, plus, with a very small amount of ROMable code that can be built into a processor, it provides vectoring to *standard* ABI C functions along with features such as interrupt chaining (dispatching to the next handler without restoring and re-saving registers), late dispatch (if a higher priority interrupt comes in while registers are being saved), and similar latencies to those ARM cores provide.

SiFive has developed a small 2-stage pipeline processor core *specifically* for deeply embedded real-time applications. There is no branch prediction .. all taken branches take 2 cycles. Other suppliers such as Syntacore and PULPino have similar cores, for example the Zero RISCy in the new NXP chip. SiFive has also developed an extension for the 3-series and 5-series cores to disable branch prediction for embedded real-time tasks (and of course turning part of the icache into instruction scratchpad).

The Vector Extension working group has been bending over backwards to accommodate the wishes and needs of the embedded community and make the very lowest-end implementations simpler and better performing. As a simple example, the high-end people such as Esperanto, Barcelona Supercomputing Centre, and Krste at SiFive wanted predicated-off vector lanes to be set to zero, i.e. vadd dst,src1,src2 should not have to read dst. The embedded guys wanted the predicated-off vector lanes to be "left untouched". In this and in several other areas the design has been modified to better suit small embedded cores.

The Bit Manipulation Working Group is almost exclusively looking at things that primarily the embedded community want.

The claim that there is "no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs" is so far from the clearly obvious truth that it just about has to be trolling.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 14, 2018, 08:58:30 am
But are you sure that this makes cleaner code?
We are talking about different things. Memory-mapped general purpose registers are a horrible idea.

I agree with this. It's very important for execution pipelines that registers have "names" not "numbers". That is, the register an instruction refers to must be specified explicitly in the instruction, and not be subject to modification or calculation.

Quote
I'm talking about special registers that use special commands to access them (like co-processors on ARM and MIPS) vs just mapping the same special registers into the regular address space where regular load/store instructions can get to them.

For peripherals, fine.

But *exactly* the same reasons that apply to general purpose registers not being memory mapped also apply to registers that affect the execution environment of the machine. And more.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 14, 2018, 09:05:34 am
You'd have an easier time making a generic binary rather than a HiFive1 one, and using stdin/stdout emulation.

And if you cut your program down to ...

Even simpler, of course, you could just allocate a few bytes at some address in low memory (non-RAM) -- call it STDIO_BASE perhaps. Then make loads from STDIO_BASE read a character from the host OS stdin, stores to STDIO_BASE+1 write a character to the host OS stdout, and (optionally) stores to STDIO_BASE+2 write a character to the host OS stderr.

That's dead easy both to implement in the simulator and to write programs for.

Bonus points: write implementations of _read(), _write() that do that, so the rest of <stdio> Just Works.
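
Guest-side, the stubs for that scheme are tiny. A sketch (the STDIO_BASE value is made up -- use whatever address you wire into the simulator's memory map):

Code: [Select]
#define STDIO_BASE 0x00001000u   /* made-up address; match your memory map */

#define HOST_STDIN  (*(volatile char *)(STDIO_BASE + 0))
#define HOST_STDOUT (*(volatile char *)(STDIO_BASE + 1))

int _read(int fd, char *s, int len) {
    (void)fd;
    for (int i = 0; i < len; i++)
        s[i] = HOST_STDIN;       /* each load pulls one char from host stdin */
    return len;
}

int _write(int fd, char *s, int len) {
    (void)fd;
    for (int i = 0; i < len; i++)
        HOST_STDOUT = s[i];      /* each store pushes one char to host stdout */
    return len;
}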
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 14, 2018, 11:22:54 am
I'd just like to say thanks to all who have contributed to this thread (and I doubt it's dead yet): rstofer, lucazader, legacy, ehughes, DavidH, hamster_nz, NorthGuy, westfw, ataradov, obiwanjacobi, FlyingDutch

Cheers, guys :-)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 15, 2018, 07:27:32 am
 
Quote
The MSP430 code gcc produced for your example is depressingly bad.

Here's what I get for a hand-written MSP430 version.   It's sort-of interesting the way there ends up being a "local variable" for the carry, but I still get to use the addc instruction, thanks to the status also being available as a register...


(Now, it's been a while since I did any MSP430 assembly, I'm not sure I fully understand the C ABI, and I didn't actually compile or test this.  But I think it should be pretty close.  OTOH, I think some of those auto-incrementing mov instructions may turn out to be 32 bits.  But... SO MUCH better than gcc did :-( )


Edit: I fixed it up to the point where it will at least go through the assembler OK...  (auto-increment indexed addressing doesn't work for a destination...)  (maybe I shouldn't clear ALL the flags...)

Code: [Select]
bignumAdd:
        push    sum
        push    savSR

        cmp     #1, cnt         ; if (cnt <= 0) return
        jl      exit
        clrc                     ; is this already clear?
        mov     SR, savSR       ; clear carry to start

loop:   mov     savSR, SR       ; get carry from sum, not cnt decrement.
        mov     @a+, sum        ; get a[n]
        addc    @b+, sum        ;  add b[n]
        mov     sum, 0(c)       ;   store c[n]
        mov     SR, savSR       ; save the carry info
        incd    c               ;  increment destination pointer (by 2)
        dec     cnt             ; decrement count
        jnz     loop            ; next word.
exit:
        pop     r10
        pop     sum
        ret
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 15, 2018, 07:34:10 am
Are there any books/curricula written on "comparative assembly language"?
Kids today are barely exposed to one, I think, and even "back in the day" when we had both IBM360 and PDP11, we didn't really compare them...
I guess some of that gets covered in "computer architecture" classes, but I remember those being a lot more hardware-oriented...
Maybe you can't compare them without a hardware orientation?  But it seems like it ought to be possible.  I mean, *I* enjoy comparing instruction sets, and my background gives only a pretty vague handwave to the actual implementation...

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 15, 2018, 07:39:31 am
Maybe you can't compare them without a hardware orientation?
That is 100% the case. At least if you are actually comparing for performance. You can compare for size easily.

There are a number of fun examples for the Cortex-M7 where rearranging the order of a couple of absolutely independent instructions changes the speed of execution by a factor of two. This is because the CM7 has a dual-issue pipeline, and integer and floating point instructions can essentially execute at the same time. It is still the same instruction set as the Cortex-M4, but how you write the code now matters.

And comparing any of this to modern X86 is just silly.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 15, 2018, 07:53:55 am
Quote
The MSP430 code gcc produced for your example is depressingly bad.
Here's what I get for a hand-written MSP430 version.

And that's what I'd expect, looking at the instruction set. The mystery is why gcc so completely fails to do that, when it can for other ISAs.

It's the gcc I get by doing apt-get on Ubuntu 18.04. It's a little old, from 2012:

msp430-gcc (GCC) 4.6.3 20120301 (mspgcc LTS 20120406 unpatched)

But, still ... that vintage gcc could do this stuff on other ISAs. SH4, for example. Do people use gcc for msp430, or something else?

Looking at some of those, I can certainly understand why people still like to use assembly language quite often. It's hard to understand why they'd put up with compiler results like that at all.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 15, 2018, 09:16:03 am
If anybody is interested, I've put my RISC-V toy up on Github - https://github.com/hamsternz/emulate-risc-v - I've even added a little colour.

Does anybody know where I can find the encoding for the RV32M extensions? I've got to the point where the binary I am using uses DIVU...

I can find this, but it is a bit obscure for me!

Code: [Select]
mul     rd rs1 rs2 31..25=1 14..12=0 6..2=0x0C 1..0=3
mulh    rd rs1 rs2 31..25=1 14..12=1 6..2=0x0C 1..0=3
mulhsu  rd rs1 rs2 31..25=1 14..12=2 6..2=0x0C 1..0=3
mulhu   rd rs1 rs2 31..25=1 14..12=3 6..2=0x0C 1..0=3
div     rd rs1 rs2 31..25=1 14..12=4 6..2=0x0C 1..0=3
divu    rd rs1 rs2 31..25=1 14..12=5 6..2=0x0C 1..0=3
rem     rd rs1 rs2 31..25=1 14..12=6 6..2=0x0C 1..0=3
remu    rd rs1 rs2 31..25=1 14..12=7 6..2=0x0C 1..0=3

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 15, 2018, 10:10:52 am
Quote
Do people use gcc for msp430, or something else?
There is gcc, now maintained by someone else and distributed by TI, and there is TI's CCS compiler.

The version I have that was distributed with CCS8 is "v7.3.1.24 (Mitto Systems Limited)", and produces significantly different (but still not very good) code.   Here's the loop (down to 20 instructions!)
Code: [Select]
    fd4e:    27 4d           mov    @r13,    r7    ;
    fd50:    08 47           mov    r7,    r8    ;
    fd52:    28 5e           add    @r14,    r8    ;
    fd54:    09 48           mov    r8,    r9    ;
    fd56:    09 5a           add    r10,    r9    ;
    fd58:    8c 49 00 00     mov    r9,    0(r12)    ;
    fd5c:    4b 46           mov.b    r6,    r11    ;
    fd5e:    08 97           cmp    r7,    r8    ;
    fd60:    01 28           jnc    $+4          ;abs 0xfd64
    fd62:    4b 45           mov.b    r5,    r11    ;
    fd64:    48 46           mov.b    r6,    r8    ;
    fd66:    09 9a           cmp    r10,    r9    ;
    fd68:    01 28           jnc    $+4          ;abs 0xfd6c
    fd6a:    48 45           mov.b    r5,    r8    ;
    fd6c:    4b d8           bis.b    r8,    r11    ;
    fd6e:    4a 4b           mov.b    r11,    r10    ;
    fd70:    2d 53           incd    r13        ;
    fd72:    2e 53           incd    r14        ;
    fd74:    2c 53           incd    r12        ;
    fd76:    0f 9d           cmp    r13,    r15    ;
    fd78:    ea 23           jnz    $-42         ;abs 0xfd4e

TI's compiler does a bit better (17 instructions.)  It manages to use the autoincrement address modes, and actually doesn't look too bad, for a faithful translation of the source algorithm (without using the available carry flag):
Code: [Select]
   c:   38 4d           mov     @r13+,  r8
   e:   3b 4e           mov     @r14+,  r11
  10:   0b 58           add     r8,     r11
  12:   0a 43           clr     r10
  14:   0b 98           cmp     r8,     r11
  16:   01 2c           jc      $+4             ;abs 0x1a
  18:   1a 43           mov     #1,     r10     ;r3 As==01
  1a:   0b 59           add     r9,     r11
  1c:   2c 53           incd    r12
  1e:   8c 4b fe ff     mov     r11,    -2(r12) ; 0xfffe
  22:   08 43           clr     r8
  24:   0b 99           cmp     r9,     r11
  26:   01 2c           jc      $+4             ;abs 0x2a
  28:   18 43           mov     #1,     r8      ;r3 As==01
  2a:   09 48           mov     r8,     r9
  2c:   09 da           bis     r10,    r9
  2e:   1f 83           dec     r15
  30:   ed 23           jnz     $-36            ;abs 0xc
(12 of those instructions are faking the carry status, which is the sort of thing that makes assembly programmers curse at HLLs...)
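For reference, the kind of C that forces that carry-faking is roughly the following (my reconstruction, not necessarily the exact source posted earlier in the thread):

Code: [Select]
#include <stdint.h>

/* Reconstruction of the multi-word add: c[] = a[] + b[] with a manually
   propagated carry, since C has no way to name the hardware carry flag. */
void bignumAdd(uint32_t *c, const uint32_t *a, const uint32_t *b, int cnt)
{
    uint32_t carry = 0;
    for (int i = 0; i < cnt; i++) {
        uint32_t t = a[i] + b[i];   /* may wrap: carry out #1 */
        uint32_t s = t + carry;     /* may wrap: carry out #2 */
        c[i] = s;
        carry = (t < a[i]) | (s < t);
    }
}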
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 15, 2018, 10:36:35 am
Huh.  I was going to complain about CM0, since it has a bunch of unpleasant surprises for assembly programming, but it actually did really well!  15 instructions in the loop, and only 46 bytes total - significantly shorter than the Thumb2 code, slightly beating the RISC-V.

Code: [Select]
   e:   594b            ldr     r3, [r1, r5]
  10:   5954            ldr     r4, [r2, r5]
  12:   191c            adds    r4, r3, r4
  14:   19a7            adds    r7, r4, r6
  16:   42b7            cmp     r7, r6
  18:   41b6            sbcs    r6, r6
  1a:   429c            cmp     r4, r3
  1c:   41a4            sbcs    r4, r4
  1e:   5147            str     r7, [r0, r5]
  20:   4264            negs    r4, r4
  22:   4276            negs    r6, r6
  24:   3504            adds    r5, #4
  26:   4326            orrs    r6, r4
  28:   45ac            cmp     ip, r5
  2a:   d1f0            bne.n   e <bignumAdd+0xe>
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 15, 2018, 10:55:54 am
If anybody is interested, I've put my RISC-V toy up on Github - https://github.com/hamsternz/emulate-risc-v - I've even added a little colour.

Nice!

Quote
Does anybody know where I can find the encoding for the RV32M extensions? I've got to the point where the binary I am using uses DIVU...

Sure, but you don't need them. If you're using freedom-e-sdk then just use a build command like:

Code: [Select]
make software PROGRAM=hello RISCV_ARCH=rv32i

Everything is in the "RV32/64G Instruction Set Listings" of the ISA manual. https://github.com/riscv/riscv-isa-manual/blob/master/release/riscv-spec-v2.2.pdf
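If it helps, the fields in that listing map onto the instruction word as funct7 (bits 31..25), funct3 (bits 14..12) and the major opcode (bits 6..0; the 6..2=0x0C plus 1..0=3 is just opcode 0x33). A minimal decode sketch in C, with my own names rather than anything from an official header:

Code: [Select]
#include <stdint.h>
#include <stdio.h>

/* Return the RV32M mnemonic for an instruction word, or NULL. */
static const char *rv32m_name(uint32_t insn)
{
    uint32_t opcode = insn & 0x7f;         /* bits 6..0  */
    uint32_t funct3 = (insn >> 12) & 0x7;  /* bits 14..12 */
    uint32_t funct7 = (insn >> 25) & 0x7f; /* bits 31..25 */
    static const char *names[8] = {
        "mul", "mulh", "mulhsu", "mulhu", "div", "divu", "rem", "remu"
    };
    if (opcode == 0x33 && funct7 == 1)  /* OP major opcode, M extension */
        return names[funct3];
    return NULL;
}

int main(void)
{
    uint32_t insn = 0x02c5d533;  /* divu a0,a1,a2 (hand-assembled) */
    const char *n = rv32m_name(insn);
    printf("%s\n", n ? n : "not RV32M");
    return 0;
}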
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: josip on December 15, 2018, 12:05:15 pm
I was going to complain about CM0, since it has a bunch of unpleasant surprises for assembly programming

I am coding CM0+ in assembler, and haven't found any unpleasant surprises so far. Coming from MSP430 (20-bit CPUvX2) assembler.

Also, when comparing code that executes on different MCUs, what matters is the number of cycles, not the number of instructions.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 15, 2018, 01:02:43 pm
I was going to complain about CM0, since it has a bunch of unpleasant surprises for assembly programming

I am coding CM0+ in assembler, and haven't found any unpleasant surprises so far. Coming from MSP430 (20-bit CPUvX2) assembler.

Definitely Thumb1 (which is what CM0 basically is) is not awful. I spent three years programming the ARM7TDMI in assembly language and we did 95+% of the code in Thumb and ARM only where necessary because of things missing in Thumb.

Mostly it's just a bit short of registers that can be used by all instructions, and it's tricky to incorporate the hi registers.

Quote
Also, when comparing code that executes on different MCUs, what matters is the number of cycles, not the number of instructions.

Number of clock cycles depends not only on the instruction set but on the implementation, for example single or multiple issue, in-order or out-of-order.

Also, even among, say, single-issue in-order implementations you see effects like this: a CPU with a 2-stage pipeline might use slightly fewer clock cycles than a CPU with a 5-stage pipeline, because fewer cycles are wasted in pipeline flushes after conditional branches. *But* the CPU with the 2-stage pipeline will almost certainly be capable of a lower maximum MHz than the CPU with the 5-stage pipeline, given the same manufacturing technology for both.

There are also instruction set features that allow programs in one ISA to use fewer instructions and clock cycles than programs in another ISA, but that increase the work required within each clock cycle enough to limit the MHz to lower than the other ISA.

These days you also have to consider the silicon area used by a CPU, and the energy consumed in executing a complete program.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 15, 2018, 02:44:49 pm
Number of clock cycles depends not only on the instruction set but on the implementation

The best example is the div unit.

(http://www.downthebunker.xyz/wonderland/chunk_of/stuff/public/projects/arise-v2/poc-division.png)
(old-school 8bit traditional division algorithm)

Intel developed a super fast Newton-Raphson-ish method that converges quickly to the result, while other methods take 1 clock cycle per bit plus a residual; so, say, a 32-bit DIV (signed or unsigned) is computed in 33-34 clock cycles. Newton-Raphson-ish methods converge in a quarter of the cycles or less.

The pipeline needs to be stalled during computation.
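For reference, the old-school 1-bit-per-cycle approach corresponds to this software loop (a sketch assuming d != 0; one shift and trial-subtract per quotient bit, which is where the 33-34 cycle figure comes from):

Code: [Select]
#include <stdint.h>

/* Classic restoring division: 32 iterations for a 32-bit quotient. */
static uint32_t divu32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint64_t r = 0;   /* partial remainder */
    uint32_t q = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);  /* shift in the next dividend bit */
        if (r >= d) {                   /* trial subtract succeeds? */
            r -= d;
            q |= 1u << i;               /* set this quotient bit */
        }
    }
    *rem = (uint32_t)r;
    return q;
}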
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 15, 2018, 02:48:17 pm
OT:
a bit of humor (http://www.downthebunker.xyz/wonderland/chunk_of/stuff/public/projects/arise-v2/instruction-set--humor.txt) about acronyms used for instruction-set  :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 15, 2018, 03:10:09 pm
Every ISA has some sort of history. It was designed for the conditions and tasks which were important back then. Then it evolved to meet new requirements. While doing so, the designers had to maintain backward compatibility. Thus most existing ISAs have lots of ugly details where the old decisions didn't mesh with the new requirements.

RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA which would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Without any doubt, you can create a CISC ISA which will provide better code density, the same way Huffman compression will always take less space than plain text. Or, you can create a totally different CISC ISA for highly deterministic performance. I don't see anything wrong with comparing RISC and CISC code. Such comparisons show the differences very well, even though it's hard to come up with formal criteria.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 15, 2018, 04:48:39 pm
Or, in the case of ARM, it's nice that the ABI and the hardware agree on which registers get saved, so that ISR functions and normal C functions are indistinguishable.   I guess.  Other times I wish the ISRs in C code were more easily distinguishable, and that the HW interrupt entry was quicker...

Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?
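For reference, this is what the attribute looks like in GCC (a sketch; on ARM the body of a naked function must be inline asm, since no prologue or epilogue is emitted at all):

Code: [Select]
/* With 'naked', GCC emits no prologue/epilogue; plain C statements are
   not supported in the body on ARM, only asm. */
__attribute__((naked)) void fast_handler(void)
{
    __asm__ volatile ("bx lr");  /* return immediately, just to illustrate */
}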
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 15, 2018, 05:04:07 pm
What I really need is a reference book for the RISC-V that covers all the hardware details.  Not just at 10,000 feet up but right down in the dirt.  Something I can convert from text to HDL or, better yet, maybe the HDL is given.

Are there any such references?

I think that this is the key of the RISC-V ethos - it is just the ISA specification. What you do with it is up to you.

As long as your hardware runs the RISC-V RV32I (+ whatever extensions) you don't have to worry too much about the software tooling.

RISC-V isn't a hardware specification - it is a specification of the interface between the software layer and the digital logic layer. If you build a CPU that implements RISC-V, you have a ready-made software layer.

I think I am coming at this from the other end.  I don't particularly care about the ISA; I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks.  But, as long as I'm implementing something, it might as well be for a modern ISA.  The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

In some ways, it's like the 8086 I designed using AMD Am2900 series logic for a class I took back in the early '80s.  It looked great on paper (well, it was more like 'adequate') but I will never know if it actually worked.  Microcode, all the way!

All those with a copy of Mick and Brick raise your hands!  Nobody remembers the title of the book but they sure remember who wrote it!
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 15, 2018, 05:54:05 pm
I think I am coming at this from the other end.  I don't particularly care about the ISA; I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks.  But, as long as I'm implementing something, it might as well be for a modern ISA.  The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

It will be easier to implement the RISC-V ISA and you're likely to make it run at faster clock speeds. Of course, you can probably do better if you design your own RISC ISA which is specifically suited to your particular hardware (such as a specific FPGA), but not by much, and with RISC-V you get free software tools.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 15, 2018, 06:10:06 pm
I think I am coming at this from the other end.  I don't particularly care about the ISA; I am primarily interested in implementing pipelined hardware that implements the/an ISA in some minimal number of clocks.  But, as long as I'm implementing something, it might as well be for a modern ISA.  The two are tied together, without doubt, but an ISA without hardware is pretty meaningless.

It will be easier to implement the RISC-V ISA and you're likely to make it run at faster clock speeds. Of course, you can probably do better if you design your own RISC ISA which is specifically suited to your particular hardware (such as a specific FPGA), but not by much, and with RISC-V you get free software tools.

I think the software tools are the whole idea.  There are lots of interesting CPUs to emulate (think CDC 6400) but unless the software is out in the wild, the CPU is useless.

The LC3 project has an assembler and C compiler so it is actually a reasonable project.  The documentation for the project makes no attempt at pipelining and, since it is an undergrad project, that's as it should be.

I have the "Reader" book and it's quite good.  I've read about 1/3 of it.

The other day I was reading something about generic RISC architectures and it went into great detail about hazards.  Yes, the taken branch is one example but it's trivial - flush the pipeline and restart.  The more interesting problems are hazards where a register is being written at one stage and is an operand for an instruction already in the pipeline.  There are many cases where the datapath needs to pass results backwards in the pipeline.  Detecting and controlling that path is the design issue that concerns me.
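Written out in C, the textbook forwarding check is quite small. A sketch of the standard EX/MEM and MEM/WB bypass conditions (names are mine, not any particular core's HDL; the rd != 0 test is the RISC-V "writes to x0 don't count" rule):

Code: [Select]
/* Per-stage bookkeeping for the forwarding unit. */
typedef struct {
    int rd;         /* destination register of the instruction in this stage */
    int reg_write;  /* does it actually write a register? */
} StageInfo;

/* Where should an ALU source operand come from?
   0 = register file, 1 = forward from MEM/WB, 2 = forward from EX/MEM. */
int forward_select(StageInfo ex_mem, StageInfo mem_wb, int rs)
{
    if (ex_mem.reg_write && ex_mem.rd != 0 && ex_mem.rd == rs)
        return 2;   /* the newest in-flight result wins */
    if (mem_wb.reg_write && mem_wb.rd != 0 && mem_wb.rd == rs)
        return 1;
    return 0;
}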

It would be pretty easy to design a multi-cycle version of the RISC-V and that's probably where I will start but the end goal is a fully pipelined CPU. Hamster_nz's work will be a good start.

My HiFive1 board showed up today and the diagnostic screen comes up in PuTTY.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 15, 2018, 09:53:32 pm
I have VS Code and PlatformIO installed and I can build the blinking LED example from the videos.  What I haven't tumbled to is how to get Debug to work.  If I attempt to debug, the .elf file is created, a bunch of messages pour out on the terminal then, after a few second timeout, I get an error dialog that says the connection was refused.

I wandered through PlatformIO's site and while they extol the virtues of the debugger, I can't seem to find PHD type instructions (Push Here Dummy).  There doesn't seem to be much help on the SiFive site either.  Or, I missed it...

Any hints?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 15, 2018, 11:13:30 pm
I have VS Code and PlatformIO installed and I can build the blinking LED example from the videos.  What I haven't tumbled to is how to get Debug to work.  If I attempt to debug, the .elf file is created, a bunch of messages pour out on the terminal then, after a few second timeout, I get an error dialog that says the connection was refused.

I wandered through PlatformIO's site and while they extol the virtues of the debugger, I can't seem to find PHD type instructions (Push Here Dummy).  There doesn't seem to be much help on the SiFive site either.  Or, I missed it...

Any hints?

The videos at the start of this thread show exactly how to use the debugger in PlatformIO.

Sadly, you have to get "pro" and pay $10/month for the privilege -- or at least sign up for the 30 day free trial.

SiFive's Eclipse-based "Freedom Studio" does debugging for free. Or you can use gdb on the command line. The secret there is to run OpenOCD in one terminal and gdb in another. The HiFive1 Getting Started document shows how.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 15, 2018, 11:45:04 pm
RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA which would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

The minor 32-bit or 64-bit ISAs are a different matter. I think they're dead. Andes have shipped billions of cores using their proprietary nds32 ISA, and it just recently got accepted into the main Linux kernel repository, but they're switching to RISC-V. The same with C-SKY. Pretty much everyone using ARC or Xtensa is likely to switch to RISC-V on their next major redesign or for new projects. I wouldn't be surprised to see Microchip convert their 32-bit PIC line from MIPS to RISC-V.

Quote
Without any doubts, you can create CISC ISA which will provide better code density, the same way as Huffman compression will always take less space than plain text.

I think that ignores two things:

1) modern RISC ISAs such as Thumb2 and RISC-V are already Huffman encoded.

2) 8086 is nowhere near Huffman encoded. It's encoded as "if it doesn't need any arguments then it gets a short encoding". Just look at AAA, AAD, AAM, AAS, ADC, CLC, CLD, CLI, CMC, DAA, DAS, HLT, IN, INT, INTO, IRET, JNP, JO, JP, JPE, JPO, LAHF, OUT, RCL, RCR, SAHF, SBB, STC, STD, STI, XLATB. That's 31 instructions -- almost 1/8th of the opcode space -- taken up by instructions that are either statistically never used (especially now), or that even in 8086 days were not used often enough to justify a 1-byte encoding (plus offset for the Jumps). Most of them probably do need to exist (or did) but the effect on program size or speed if they'd been hidden away in a secondary opcode page would be minuscule. And those opcodes could have been used for something useful.

The same with VAX. *Every* instruction gets a 1-byte opcode, followed by the arguments. The length of the instructions is decided by the number and size of arguments, not by the frequency of use of the instruction.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 16, 2018, 12:01:55 am
Quote
Quote
CM0 ... has a bunch of unpleasant surprises
I am coding CM0+ in assembler, and haven't found any unpleasant surprises

It's mostly the lack of "op2" and the limited range of literal values in the instructions that still have them.


My surprises show up when initializing peripherals.  I expected code like:
Code: [Select]
       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;
       
To be implementable with code something like:
Code: [Select]
       ldr r1, =(PORT + <offset of GROUP[0]>)
       ldr r2, [r1, #<offset of PINCFG[12]>]
       orr r2, #PORT_PINCFG_DRVSTR
       str r2, [r1, #<offset of PINCFG[12]>]
       ldr r2, [r1, #PORT_DIRSET]
       orr r2, #4096
       str r2, [r1, #PORT_DIRSET]

Instead, you run into "orr doesn't have immediate arguments any more" and "PINCFG is beyond the range allowed by the [r, #const] encoding", so the code takes an extra 5 instructions and two additional registers.  The extra instructions may be a wash with the 32bit forms on the v7m chips, but having to use the extra registers (out of the limited set available) is ... annoying.

Now, what Bruce's example code seems to demonstrate is that the "peripheral initialization" is essentially a degenerate case and that the issues I'm complaining about show up less in the "meat" of a real program.  That could be, and it's an interesting result.

(I was impressed by the RV32i summary that was posted, WRT the impressive array of "immediate" operands.  But I haven't looked too carefully to see if it does the things I want.)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 16, 2018, 12:20:30 am
Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

*They* have Acorn (where Arm was born) and RISC-PC computers, manufactured and used in the UK. I love my R/600; it comes with a 586 hardware emulator (it's called a "guest PC card") so I can run DOS programs as well as RISC-OS applications  :D

The best and most interesting part is the Desktop Development Environment (DDE), a full-featured suite of the development tools required to build applications for RISCOS (mine is v4.39 Adjust/classic). It dates back to the days when Acorn developed RISC-OS and is derived from the in-house development tools. It includes:
- C compiler optimised to produce efficient ARM code
- ARM assembler, more powerful and advanced than any current Open Source ARM assembler
- Makefile utility
- Desktop debugger
- GUI resource file editor
- Object compression/decompression tools
- Intelligent ARM disassembler
- ABC (Archimedes BASIC compiler) to convert BBC BASIC source into machine code
- ARM Cortex A8 instruction timing simulator
- Comprehensive full documentation

It's great for both classic machines (RiscPC/600 with StrongArm, 26bit-space) and newer ones (misc/Cortex A8, 32bit-space),  suitable for running on and producing both 26 & 32-bit versions of RISC-OS.

I think RISC-V would be more interesting if a similar solution (a RISC-V workstation + RISC-V/OS and DDE) existed  :D



Besides, another great motivation for Arm is ... the Nintendo GBA with its low-cost development kit (200 euro all inclusive): yet again, RISC-V would be more interesting if a mini portable video-game console existed.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 16, 2018, 12:26:18 am
(https://images.ctfassets.net/q092pc69zo4z/4TU3LpVrFY0OyACCGuywKq/ffab64c1bc5459af3fa01306e4864639/calc1.png)
(NUMWorks, ARM-based)

Probably I will buy a tiny RISC-V board to develop a pocket calculator. This idea sounds really intriguing to me  :D

I have already reverse engineered a CASIO graphics calculator, so I can re-use the keyboard; I just need a proper LCD ... and a motherboard. The software can be derived from the NUMWorks project (open source).
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 16, 2018, 12:45:20 am
RISC-V is at the very beginning. They have chosen RISC. For the RISC approach, it is designed exceptionally well, and I don't think it's possible to create a RISC ISA which would be substantially better. If it spreads, it should outcompete ARM fairly quickly.

Hmm ... I will be surprised. RISC-V might take 10% of ARM's market in the next five years, but ARM is awfully entrenched. They have a good product, refined over many years.

I'm sure MIPS is not that much worse, but everyone chooses ARM. Do you really think Xilinx used ARM cores in Zynq because of the technical merit? I don't think so. It's pure marketing. Popularity. People want ARM, Xilinx gives them ARM. But popularity comes and goes. When the next popular thing emerges, the old one dies very quickly.

I wouldn't be surprised to see Microchip convert their 32-bit PIC line from MIPS to RISC-V.

After their failure with MIPS and PIC32, I'm sure they won't want to miss the opportunity with RISC-V.

1) modern RISC ISAs such as Thumb2 and RISC-V are already Huffman encoded.

This only applies to single instructions. If you analyze real code generated by compilers, you can find frequent multi-instruction combinations. For example, in your RV32I ISA, setting a single bit in memory takes 3 instructions - 12 bytes. IMHO, in real life the Huffman code for this action would be much shorter.

2) 8086 is nowhere near Huffman encoded. It's encoded as "if it doesn't need any arguments then it gets a short encoding". Just look at AAA, AAD, AAM, AAS, ADC, CLC, CLD, CLI, CMC, DAA, DAS, HLT, IN, INT, INTO, IRET, JNP, JO, JP, JPE, JPO, LAHF, OUT, RCL, RCR, SAHF, SBB, STC, STD, STI, XLATB. That's 31 instructions -- almost 1/8th of the opcode space -- taken up by instructions that are either statistically never used (especially now), or that even in 8086 days were not used often enough to justify a 1-byte encoding (plus offset for the Jumps). Most of them probably do need to exist (or did) but the effect on program size or speed if they'd been hidden away in a secondary opcode page would be minuscule. And those opcodes could have been used for something useful.

Of course, it has a long history, so the coding is far from perfect. I'm sure, if they started from scratch now, they would have a much better encoding in terms of number of bytes.

Many things, such as ENTER, LEAVE, LODS, STOS, SCAS, CMPS do save lots of bytes, but are not efficient, so nobody uses them.

BTW: JP and JPE are the same opcode (and JNP is the same as JPO).
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 16, 2018, 02:12:14 am
Quote
If RISC-V spreads, it should outcompete ARM fairly quickly.
I think you underestimate the effectiveness and importance of a large marketing, sales, and support organization...
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 16, 2018, 03:55:57 am
Quote
If RISC-V spreads, it should outcompete ARM fairly quickly.
I think you underestimate the effectiveness and importance of a large marketing, sales, and support organization...

Yes, I'm bad at marketing.

But, if Apple (or Google) decides that their phones' batteries can last 30% longer with RISC-V, it'll get all the marketing it needs. Of course, this may not happen, and RISC-V gets forgotten. Impossible to see the future is :)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 16, 2018, 04:43:14 am
My surprises show up when initializing peripherals.  I expected code like:
Code: [Select]
       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;
       
To be implementable with code something like:
Code: [Select]
       ldr r1, =(PORT + <offset of GROUP[0]>)
       ldr r2, [r1, #<offset of PINCFG[12]>]
       orr r2, #PORT_PINCFG_DRVSTR
       str r2, [r1, #<offset of PINCFG[12]>]
       ldr r2, [r1, #PORT_DIRSET]
       orr r2, #4096
       str r2, [r1, #PORT_DIRSET]

Instead, you run into "orr doesn't have immediate arguments any more" and "PINCFG is beyond the range allowed by the [r, #const] encoding", so the code takes an extra 5 instructions and two additional registers.  The extra instructions may be a wash with the 32bit forms on the v7m chips, but having to use the extra registers (out of the limited set available) is ... annoying.

I guess there are two options: 1) let the C compiler figure it out, or 2) do something like

Code: [Select]
ldr r1, =(PORT + <offset of GROUP[0]> + #<offset of PINCFG[12]>)
ldr r2, [r1]
ldr r3, =PORT_PINCFG_DRVSTR
orr r2, r3
str r2, [r1]
ldr r1, =(PORT + <offset of GROUP[0]> + #PORT_DIRSET)
ldr r2, [r1]
ldr r3, =4096
orr r2, r3
str r2, [r1]

One extra register and three extra instructions. And four 32-bit values in a nearby constant pool instead of the three you'd have in ARM/Thumb2 mode, if that code was actually valid (I didn't check too hard)

So:
A32 is a total of 7*4 + 3*4 = 40 bytes
T16 is a total of 10*2 + 4*4 = 36 bytes

Some size savings, but not a lot. I *think* T32 would be the same size as the A32.

Quote
Now, what Bruce's example code seems to demonstrate is that the "peripheral initialization" is essentially a degenerate case and that the issues I'm complaining about show up less in the "meat" of a real program.  That could be, and it's an interesting result.

Sure. Computations with values that are already in registers are where 16 bit opcodes shine. That's equally true with PDP11, M68k, Thumb1, RISC-V C, MSP430, SH4. Or even x86 with opcode + ModR/M byte for reg-reg operations, until it starts needing prefix bytes to set the operand size.

Quote
(I was impressed by the RV32i summary that was posted, WRT the impressive array of "immediate" operands.  But I haven't looked too carefully to see if it does the things I want.)

12 bit immediates and offsets on everything. It's often enough, but you can't do your #4096 as an immediate (only -2048...+2047 is covered). You can do it as LUI t0, #00001. In general you can make any 32 bit constant with LUI t0,#nnnnn; ADDI t0,t0,#nnn, or any 32-bit offset from the PC with AUIPC t0,#nnnnn; ADDI t0,t0,#nnn. Or you can load or store to any 32 bit absolute or PC-relative address with an LUI or AUIPC followed by a load or store with an offset.
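The one subtlety is that ADDI sign-extends its 12-bit immediate, so the LUI half has to be rounded up whenever bit 11 of the constant is set. A sketch of the split (the standard %hi/%lo rule, assuming arithmetic right shift as on gcc/clang):

Code: [Select]
#include <stdint.h>

/* Split a 32-bit constant x into the LUI/ADDI pair:
   x == (hi20 << 12) + lo12  (mod 2^32), with lo12 sign-extended. */
static void split_const(uint32_t x, uint32_t *hi20, int32_t *lo12)
{
    *hi20 = (x + 0x800) >> 12;          /* LUI  rd, hi20 */
    *lo12 = (int32_t)(x << 20) >> 20;   /* ADDI rd, rd, lo12 */
}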

As with ARM, there are assembler pseudo ops like LDR so you don't have to worry about the exact instructions used in a particular case.

RISC-V is allergic to constant pools. They are ok in low end processors, but as soon as you get an instruction cache you have the problem that the constant pools will likely get into the instruction cache, but be useless there. And if you have a data cache then instructions around the constant pool get into the data cache, and are useless there. Maybe the compiler/linker could arrange for the constant pools to be in different cache lines to instructions, but I haven't seen that happen.

So RISC-V, along with MIPS, Alpha, and ARM64 prefers using inline code to load constants, even if it needs several instructions to do it.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 16, 2018, 04:55:45 am
Probably I will buy a tiny RISC-V board to develop a pocket calculator. This idea sounds really intriguing to me  :D

I have already reverse engineered a CASIO graphics calculator, so I can re-use the keyboard; I just need a proper LCD ... and a motherboard. The software can be derived from the NUMWorks project (open source).

You could try the LoFive: https://store.groupgets.com/products/lofive-risc-v (https://store.groupgets.com/products/lofive-risc-v)

Note: you need a JTAG interface to program it. Most people use the Olimex ARM-USB-TINY-H, but others should work as long as OpenOCD can find them.

But for this low performance task you'd do it just as well using a soft RISC-V core in a small FPGA.

The TinyFPGA A2 *might* just about be big enough, but the BX certainly is and lots of people use them for this purpose.

https://www.crowdsupply.com/tinyfpga/tinyfpga-bx/updates/tinyfpga-b2-and-bx-projects (https://www.crowdsupply.com/tinyfpga/tinyfpga-bx/updates/tinyfpga-b2-and-bx-projects)

https://tinyfpga.com/ (https://tinyfpga.com/)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 16, 2018, 10:45:02 am
Quote
Quote
I wish the HW interrupt entry for ISRs in C code [on ARM] was quicker...
Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?
Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.
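Which is why a Cortex-M ISR can be a plain C function. A minimal sketch (the SysTick_Handler name follows the usual CMSIS vector-table convention):

Code: [Select]
#include <stdint.h>

static volatile uint32_t tick_count;

/* An ordinary C function works as the handler: the NVIC has already
   stacked r0-r3, r12, lr, pc and xPSR (the caller-saved set), so the
   compiler emits no interrupt-specific prologue. */
void SysTick_Handler(void)
{
    tick_count++;
}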


Quote
Pretty much everyone using ARC or Xtensa is likely to switch to RISC-V
Espressif too?  Is there any indication that the "mostly China" manufacturers would switch?


Quote
[complaints about CM0 code]I guess there are two options: 1) let the C compiler figure
That's where I got the 4-register version.  Offsets larger than 32 get converted into a MOV of an offset into the 4th register, and "LDR r1,[r2,r3]" addressing mode.  In assembly language, I could presumably add/sub manually from the base register or something, at the expense of ... unpleasantness and cryptic code.


Computations with values that are already in registers are where 16 bit opcodes shine.
I think the big thing I was missing is that in simple assembly programs, arrays might be addressed as "[Rindex, #constantSymbolAddress]", while in an only slightly more complex program, they'll be passed around as pointers, and the double-index-register addressing modes will work just fine.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 16, 2018, 12:40:34 pm
You could try the LoFive: https://store.groupgets.com/products/lofive-risc-v (https://store.groupgets.com/products/lofive-risc-v)

Yup, of this size  :D

A little MPU can handle the keyboard (the key-matrix is 9x10), interfacing serially to the CPU, and a small LCD is usually SPI. It sounds like something that can be done.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 16, 2018, 01:15:15 pm
My surprises show up when initializing peripherals.  I expected code like:
Code: [Select]
       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;

Just for fun, I made a couple of definitions so your code would be compilable and tried it on a few things.

Code: [Select]
#include <stdint.h>

#define PORT_PINCFG_DRVSTR (1<<7)

struct {
    struct {
        struct {
            uint32_t foo;
            uint32_t reg;
            uint32_t bar;
        } PINCFG[16];
        struct {
            uint64_t baz;
            uint32_t reg;
        } DIRSET;
    } Group[10];
} *PORT = (void*)0xdecaf000;

void main(){
    PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
    PORT->Group[0].DIRSET.reg |= 1<<12;
}

And I checked it with for example:

Code: [Select]
arm-linux-gnueabihf-gcc -O initPorts.c -o initPorts -nostartfiles && \
arm-linux-gnueabihf-objdump -D initPorts | expand | less -p'<main>'

So ... ARMv7 (Thumb2):

Code: [Select]
000001c0 <main>:
 1c0:   4b07            ldr     r3, [pc, #28]   ; (1e0 <main+0x20>)
 1c2:   447b            add     r3, pc
 1c4:   681b            ldr     r3, [r3, #0]
 1c6:   f8d3 2094       ldr.w   r2, [r3, #148]  ; 0x94
 1ca:   f042 0280       orr.w   r2, r2, #128    ; 0x80
 1ce:   f8c3 2094       str.w   r2, [r3, #148]  ; 0x94
 1d2:   f8d3 20c8       ldr.w   r2, [r3, #200]  ; 0xc8
 1d6:   f442 5280       orr.w   r2, r2, #4096   ; 0x1000
 1da:   f8c3 20c8       str.w   r2, [r3, #200]  ; 0xc8
 1de:   4770            bx      lr
 1e0:   00010e3a        andeq   r0, r1, sl, lsr lr

00011000 <PORT>:
   11000:       decaf000        cdple   0, 12, cr15, cr10, cr0, {0}

Arm32:

Code: [Select]
000001c0 <main>:
 1c0:   e59f3020        ldr     r3, [pc, #32]   ; 1e8 <main+0x28>
 1c4:   e08f3003        add     r3, pc, r3
 1c8:   e5933000        ldr     r3, [r3]
 1cc:   e5932094        ldr     r2, [r3, #148]  ; 0x94
 1d0:   e3822080        orr     r2, r2, #128    ; 0x80
 1d4:   e5832094        str     r2, [r3, #148]  ; 0x94
 1d8:   e59320c8        ldr     r2, [r3, #200]  ; 0xc8
 1dc:   e3822a01        orr     r2, r2, #4096   ; 0x1000
 1e0:   e58320c8        str     r2, [r3, #200]  ; 0xc8
 1e4:   e12fff1e        bx      lr
 1e8:   00010e34        andeq   r0, r1, r4, lsr lr

00011000 <PORT>:
   11000:       decaf000        cdple   0, 12, cr15, cr10, cr0, {0}

Thumb1:

Code: [Select]
000001c0 <main>:
 1c0:   4b07            ldr     r3, [pc, #28]   ; (1e0 <main+0x20>)
 1c2:   447b            add     r3, pc
 1c4:   681b            ldr     r3, [r3, #0]
 1c6:   2194            movs    r1, #148        ; 0x94
 1c8:   2280            movs    r2, #128        ; 0x80
 1ca:   5858            ldr     r0, [r3, r1]
 1cc:   4302            orrs    r2, r0
 1ce:   505a            str     r2, [r3, r1]
 1d0:   3134            adds    r1, #52 ; 0x34
 1d2:   2280            movs    r2, #128        ; 0x80
 1d4:   0152            lsls    r2, r2, #5
 1d6:   5858            ldr     r0, [r3, r1]
 1d8:   4302            orrs    r2, r0
 1da:   505a            str     r2, [r3, r1]
 1dc:   4770            bx      lr
 1de:   46c0            nop                     ; (mov r8, r8)
 1e0:   00010e3a        andeq   r0, r1, sl, lsr lr

 00011000 <PORT>:
   11000:       decaf000        cdple   0, 12, cr15, cr10, cr0, {0}

Arm64

Code: [Select]
00000000000002ac <main>:
 2ac:   b0000080        adrp    x0, 11000 <PORT>
 2b0:   f9400000        ldr     x0, [x0]
 2b4:   b9409401        ldr     w1, [x0, #148]
 2b8:   32190021        orr     w1, w1, #0x80
 2bc:   b9009401        str     w1, [x0, #148]
 2c0:   b940c801        ldr     w1, [x0, #200]
 2c4:   32140021        orr     w1, w1, #0x1000
 2c8:   b900c801        str     w1, [x0, #200]
 2cc:   d65f03c0        ret

0000000000011000 <PORT>:
   11000:       decaf000        .word   0xdecaf000
   11004:       00000000        .word   0x00000000


RISC-V rv32ic (without the C extension it's identical except that all instructions take 4 bytes; the 64-bit version is identical except for an "ld" to get <PORT> and the pointer being 8 bytes instead of 4):

Code: [Select]
00010074 <main>:
   10074:       67c5                    lui     a5,0x11
   10076:       0947a783                lw      a5,148(a5) # 11094 <PORT>
   1007a:       6685                    lui     a3,0x1
   1007c:       0947a703                lw      a4,148(a5)
   10080:       08076713                ori     a4,a4,128
   10084:       08e7aa23                sw      a4,148(a5)
   10088:       0c87a703                lw      a4,200(a5)
   1008c:       8f55                    or      a4,a4,a3
   1008e:       0ce7a423                sw      a4,200(a5)
   10092:       8082                    ret

00011094 <PORT>:
   11094:       f000                    fsw     fs0,32(s0)
   11096:       deca                    sw      s2,124(sp)

M68k:

Code: [Select]
800001ac <main>:
800001ac:       2079 8000 400c  moveal 8000400c <PORT>,%a0
800001b2:       0068 0080 0096  oriw #128,%a0@(150)
800001b8:       0068 1000 00ca  oriw #4096,%a0@(202)
800001be:       4e75            rts

8000400c <PORT>:
8000400c:       deca            addaw %a2,%sp
8000400e:       f000

i686:

Code: [Select]
000001b5 <main>:
 1b5:   e8 20 00 00 00          call   1da <__x86.get_pc_thunk.ax>
 1ba:   05 3a 1e 00 00          add    $0x1e3a,%eax
 1bf:   8b 80 0c 00 00 00       mov    0xc(%eax),%eax
 1c5:   81 88 94 00 00 00 80    orl    $0x80,0x94(%eax)
 1cc:   00 00 00
 1cf:   81 88 c8 00 00 00 00    orl    $0x1000,0xc8(%eax)
 1d6:   10 00 00
 1d9:   c3                      ret   

000001da <__x86.get_pc_thunk.ax>:
 1da:   8b 04 24                mov    (%esp),%eax
 1dd:   c3                      ret   

00002000 <PORT>:
    2000:       00 f0                   add    %dh,%al
    2002:       ca                      .byte 0xca
    2003:       de                      .byte 0xde

SH4:

Code: [Select]
004001b0 <main>:
  4001b0:       07 d1           mov.l   4001d0 <main+0x20>,r1   ! 411000 <PORT>
  4001b2:       12 61           mov.l   @r1,r1
  4001b4:       13 62           mov     r1,r2
  4001b6:       7c 72           add     #124,r2
  4001b8:       26 50           mov.l   @(24,r2),r0
  4001ba:       80 cb           or      #-128,r0
  4001bc:       06 12           mov.l   r0,@(24,r2)
  4001be:       05 92           mov.w   4001cc <main+0x1c>,r2   ! bc
  4001c0:       2c 31           add     r2,r1
  4001c2:       13 52           mov.l   @(12,r1),r2
  4001c4:       03 93           mov.w   4001ce <main+0x1e>,r3   ! 1000
  4001c6:       3b 22           or      r3,r2
  4001c8:       0b 00           rts     
  4001ca:       23 11           mov.l   r2,@(12,r1)
  4001cc:       bc 00           mov.b   @(r0,r11),r0
  4001ce:       00 10           mov.l   r0,@(0,r0)
  4001d0:       00 10           mov.l   r0,@(0,r0)
  4001d2:       41 00           .word 0x0041

00411000 <PORT>:
  411000:       00 f0           .word 0xf000
  411002:       ca de           mov.l   41132c <__bss_start+0x31c>,r14

#Instr  Code  Data  Total  ISA
    10    32     8     40  Thumb2
    10    40     8     48  Arm32
    15    30    10     40  Thumb1
     9    36     8     44  Arm64
    10    32     8     40  RISC-V rv64ic
    10    32     4     36  RISC-V rv32ic
    10    40     4     44  RISC-V rv32i
     4    20     4     24  M68k
     8    41     4     45  i686
    13    26    14     40  SH4

Good old Motorola 68000 wins by miles on both number of instructions and total number of bytes!

Thumb1 and SH4 use a lot of instructions, but are the next smallest in code size after m68k. They're just middle of the pack once you include .data

rv32i is slightly smaller than Arm32 and rv32ic is slightly smaller than Thumb2 in total size. The number of instructions is identical for all of them, and rv32i/Arm32 and rv32ic/Thumb2 have the same code size as each other.

rv64ic has one instruction more than Arm64, but the code is 4 bytes smaller. Both have to load a 64 bit pointer from the .data section, costing 4 bytes, but they don't need an intermediate pointer at the end of the function code, saving 4 bytes.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 16, 2018, 04:06:14 pm
My surprises show up when initializing peripherals.  I expected code like:
Code: [Select]
       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;

Just for fun, I made a couple of definitions so your code would be compilable and tried it on a few things.

Code: [Select]
#include <stdint.h>

#define PORT_PINCFG_DRVSTR (1<<7)

struct {
    struct {
        struct {
            uint32_t foo;
            uint32_t reg;
            uint32_t bar;
        } PINCFG[16];
        struct {
            uint64_t baz;
            uint32_t reg;
        } DIRSET;
    } Group[10];
} *PORT = (void*)0xdecaf000;

void main(){
    PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
    PORT->Group[0].DIRSET.reg |= 1<<12;
}


In SAM, "Group" represents a group of registers 128 bytes long and everything below is just unions. "PORT" would be a fixed location in memory space. So, what the code actually does is setting 2 bits at the fixed memory location.

There's no pointer loading (which takes a whopping 50% in Motorola, and 49% in Intel, which you decided to compile as position-independent code). Moreover, when someone builds an MCU with RISC-V, they will probably provide some way of setting bits without reading registers, as Atmel did here:

Code: [Select]
PORT->Group[0].DIRSET.reg = 1<<12; // no need for "|="
The register is called DIRSET because writing to it only sets the bits (and the bits which are written "0" remain unchanged), and there's an opposite register called DIRCLR which clears the bits, and also DIRTGL which xors.

The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area, so the whole thing boils down to this:

Code: [Select]
6685                    lui     a3,0x1
0ce7a423                sw      a3,200(a5) ; replace "200" with correct offset from a5

<edit>Can't help it. In dsPIC33 you get:

Code: [Select]
bset LATA,#12
one instruction and 3 bytes (50% compared to RISC-V).

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: lucazader on December 16, 2018, 06:25:52 pm
Quote
Pretty much everyone using ARC or Xtensa is likely to switch to RISC-V
Espressif too?  Is there any indication that the "mostly China" manufacturers would switch?

Yea they are a member of the RISC-V foundation, a "Founding Gold" member, whatever that means.
https://riscv.org/members-at-a-glance/

Judging from the timing of when they would have started development on an ESP32 successor, I'd put it at about a 50% chance of them switching over to RISC-V in the next chip, but a lot higher for the chip after that.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rhodges on December 16, 2018, 06:45:45 pm

Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.
I have really been enjoying this discussion  :-+

A decade and a half ago, I had the pleasure of working with a VLIW processor, the Trimedia/Philips PNX1302. It dispatched up to 5 operations per instruction word at 200 MHz. It had 128 32-bit registers, and the convention was that the bottom 64 belonged to user code and the top 64 could be used by the ISR. No saving required. Further, an interrupt only happens when the user code makes a jump. So user code could (with care) use the top 64 between jumps. An interesting and useful side-effect is that user code could assume no interrupts while doing code that needs to be atomic.

I just thought some might find this interesting.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 16, 2018, 08:09:43 pm
It had 128 32-bit registers, and the convention was that the botttom 64 belonged to user code and the top 64 could be used by the ISR. No saving required.

Some modern MCUs have multiple register sets. When an interrupt happens, the new set gets loaded. When it quits, the old one gets restored. It doesn't take any additional time and thus decreases the interrupt latency by a lot. If you have a separate register set for every interrupt level, you never need to save anything.

However, I think in the future, as everything moves to multi-cores, things may get even better. If you assign a designated core to an interrupt, then the core can simply sit there waiting for the interrupt to happen. Then there's no latency except for the short period necessary to synchronize the interrupt signal to the CPU clock.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: langwadt on December 16, 2018, 08:40:28 pm
Quote
Quote
I wish the HW interrupt entry for ISRs in C code [on ARM] was quicker...
Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?
Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.

slower in the rare case you need to do something in a few cycles with no registers, likely faster in the majority of cases
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 16, 2018, 08:51:26 pm
However, I think in the future, as everything moves to multi-cores, things may get even better. If you assign a designated core to an interrupt, then the core can simply sit there waiting for the interrupt to happen. Then there's no latency except for the short period necessary to synchronize the interrupt signal to the CPU clock.
The limiting factor here will be memory. You either need to have a dedicated memory per core, which will make the maximum size of the handler inflexible, or deal with concurrent access by multiple cores, which will slow down everything.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: andersm on December 16, 2018, 09:00:04 pm
Some modern MCUs have multiple register sets. When an interrupt happens, the new set gets loaded. When it quits, the old one gets restored. It doesn't take any additional time and thus decreases the interrupt latency by a lot. If you have a separate register set for every interrupt level, you never need to save anything.
Register banks do make code that needs to access registers across priority levels a whole lot messier (eg. task switching using a low-priority interrupt, as is usually done on Cortex-M MCUs, or exception handlers). I guess with modern manufacturing processes the extra state required by the additional register banks isn't a big deal anymore (eg. 31 32-bit registers by 8 banks is a bit less than 1000 bytes).
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on December 16, 2018, 09:20:08 pm
Register banks do make code that needs to access registers across priority levels a whole lot messier (eg. task switching using a low-priority interrupt, as is usually done on Cortex-M MCUs, or exception handlers). I guess with modern manufacturing processes the extra state required by the additional register banks isn't a big deal anymore (eg. 31 32-bit registers by 8 banks is a bit less than 1000 bytes).

It does not cost as much area now, but the register bank is in the critical timing path for the pipeline, so it limits performance in an aggressive design.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 16, 2018, 09:32:49 pm
However, I think in the future, as everything moves to multi-cores, things may get even better. If you assign a designated core to an interrupt, then the core can simply sit there waiting for the interrupt to happen. Then there's no latency except for the short period necessary to synchronize the interrupt signal to the CPU clock.
The limiting factor here will be memory. You either need to have a dedicated memory per core, which will make the maximum size of the handler inflexible, or deal with concurrent access by multiple cores, which will slow down everything.

I have ideas for this too. Most of the cores should have a very limited amount of dedicated regular memory, but they will have one or more deep hardware FIFOs. The other end of the FIFOs may be muxed to other cores, which provides wide address-less communication channels between cores. This removes bus congestion altogether. The central core (or cores), in contrast, will have bigger memory so they can process data.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 16, 2018, 09:36:01 pm
I have ideas for this too. Most of the cores should have a very limited amount of dedicated regular memory, but they will have one or more deep hardware FIFOs. The other end of the FIFOs may be muxed to other cores, which provides wide address-less communication channels between cores. This removes bus congestion altogether. The central core (or cores), in contrast, will have bigger memory so they can process data.
That does not address code memory.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 16, 2018, 10:01:03 pm
That does not address code memory.

Doesn't have to. Code memory can be made completely separate from data memory. Each peripheral core has its own limited amount of code memory which can be programmed by the central core as needed. Small memories can be made very fast. This ensures very fast deterministic execution for the peripheral cores. In contrast, the central core doesn't have to be deterministic - it may have caches and pipelines - and if it ever needs access to data, it all gets smoothed out by FIFOs.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 16, 2018, 10:03:38 pm
Code memory can be made completely separate from data memory.
That's exactly what I'm talking about. You will essentially limit what your "interrupt" handler can do by defining the amount of code memory it has. I think this will be enough of a limitation to make this system impractical. At least for common microcontroller uses. It may be useful in an MPU environment. Kind of like ARM's big.LITTLE stuff.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 16, 2018, 10:05:51 pm
... Further, an interrupt only happens when the user code makes a jump... An interesting and useful side-effect is that user code could assume no interrupts while doing code that needs to be atomic.

I just thought some might find this interesting.
I found that very interesting!
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 16, 2018, 10:17:36 pm
Quote
[ARM Cortex NVIC register stacking] likely faster in the majority of cases

I'm not convinced.  We're talking register stacking, probably limited by memory speed, and taking all of 1 instruction (push multiple) in the ISR to save exactly which ones you need...


Quote
The register is called DIRSET because writing to it only sets the bits

Yeah, ....DIRSET |= bitmask; was not the best example.


Quote
The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area

Maybe.  32-bit processors tend to really spread those IO registers out, perhaps occupying more than even a reasonable offset constant for indexed addressing. And constant-folding the upper bits of an address might be too much to ask of a compiler.   I remember looking at PIC32 code (MIPS), which loads 32-bit constants half-at-a-time (LUI/ORI), and being disappointed that it kept re-loading the same upper value.  OTOH, I think Microchip was defining those symbols at link time rather than in C source, so there wasn't much choice...  (This was quite a while ago.  Maybe now, with LTO and similar, it does better.)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 16, 2018, 10:36:09 pm
Code memory can be made completely separate from data memory.
That's exactly what I'm talking about. You will essentially limit what your "interrupt" handler can do by defining the amount of code memory it has. I think this will be enough of a limitation to make this system impractical. At least for common microcontroller uses.

You do not need a lot of memory for peripheral cores - you need speed and determinism. And that is what MCUs are lacking now. You can always have a central core with an enormous amount of memory to do any kind of processing.

The approach where you have a single memory bus for both data and code, accessed simultaneously by the CPU and 15 DMA channels through a bus arbiter, is not very suitable for real-time applications.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: langwadt on December 16, 2018, 10:47:00 pm
Quote
[ARM Cortex NVIC register stacking] likely faster in the majority of cases

I'm not convinced.  We're talking register stacking, probably limited by memory speed, and taking all of 1 instruction (push multiple) in the ISR to save exactly which ones you need...


but before you get to your push multiple, the core first has to read the vector table and fetch the first instruction of the ISR (prolog);
done automatically in hardware, that can often happen in parallel


Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 16, 2018, 11:16:19 pm
Quote
The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area
Maybe.  32bit processors tend to really spread those IO registers out, perhaps occupying more than even a reasonable offset constant for indexed addressing.  And constant-folding upper bits of an address might be too much to ask of a compiler.   I remember looking at PIC32 code (MIPS), which loads 32bit constants half-at-a-time (LUI/ORI), and being disappointed that it kept re-loading the same upper value.

Microchip went overboard with spreading the registers all over the place in PIC32. There's no reason for that. In PIC24, everything fits into 2048 bytes quite nicely, even with space to spare. RISC-V offsets have only 4096 bytes of reach, but I think this is OK for hardware registers.

If you locate all your peripheral registers at the beginning of the memory space, the zero register gives you a free zero base. So you have 2048 bytes which are easily accessible. A good place for hardware registers.

It would be the full 4096 bytes, but RISC-V went the traditional sign-extended (instead of the more reasonable zero-extended) road for offsets. Although addresses 0xfffff000 to 0xffffffff may be used for peripheral registers too.
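A minimal sketch of that zero-base trick (the address and register name are assumptions, purely for illustration):

Code: [Select]
#include <stdint.h>

/* Hypothetical peripheral register placed in the first 2 KiB of the
   address space; the address is an assumption for illustration. */
#define GPIO_DIRSET (*(volatile uint32_t *)0x7F0u)

void led_enable(void)
{
    /* With the register below 2048, a RISC-V compiler can address it as
       "sw a5, 0x7F0(zero)", using x0 as a free base register -- no
       lui/auipc needed to build the address first. */
    GPIO_DIRSET = 1u << 5;
}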

OTOH, I think Microchip was defining those symbols at link time rather than in C source, so there wasn't much choice... 

That's true. Although it's not a very good idea. I remember I had to copy definitions from the linker scripts to the inc files when I was working with PIC24.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 12:25:56 am
In SAM, "Group" represents a group of registers 128 bytes long and everything below is just unions.

I don't suppose the exact sizes matter much, as long as you stay within what can be done with a simple offset.

Quote
"PORT" would be a fixed location in memory space. So, what the code actually does is setting 2 bits at the fixed memory location.

Setting two bits at fixed locations .. yep .. that's what I compiled.

Quote
There's no pointer loading (which takes whopping 50% in Motorola, and 49% in Intel which you decided to compile as position-independent code).

I compiled them the way they came. None of the other ISAs have problems using PC-relative addressing.

You need to get the address of the hardware registers *somehow*. Now, it's true that you'd probably get slightly smaller code using the address of "PORT" as a #define instead of as a global variable, but that's the same for all ISAs and doesn't favour one over another.

Quote
Moreover, when someone builds an MCU with RISC-V, they will probably provide some way of setting bits without reading registers, as Atmel did here:

Code: [Select]
PORT->Group[0].DIRSET.reg = 1<<12; // no need for "|="
The register is called DIRSET because writing to it only sets the bits (and the bits which are written "0" remain unchanged), and there's an opposite register called DIRCLR which clears the bits, and also DIRTGL which xors.

I took the C code exactly as given by westfw, which also matches the ARM assembly language he gave in loading, ORing, and storing.

Incidentally, RISC-V *does* have a way to change bits without bringing the data to the CPU and back, but it seemed unfair to use it. I'm concentrating here on compiled C code.

AMOOR.W res, val, (addr)

This sends a message with the address, value, and operation out over the TileLink bus. If all the channels of the bus go as far as the peripheral, then the peripheral itself will do the OR operation locally and report back the new value. If at some point on the way to the peripheral the bus narrows to just a simple read/write bus then the controller at that point will do the read/modify/write and report the result back to the CPU.
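For instance, a hedged sketch: with the A extension, GCC's atomic builtins can compile a fetch-or into a single amoor.w (the register name and address below are made up):

Code: [Select]
#include <stdint.h>

/* Hypothetical set-bits register; the address is made up. */
static volatile uint32_t *const GPIO_OUTPUT_EN =
        (volatile uint32_t *)0x10012008u;

void output_enable(uint32_t mask)
{
    /* With the A extension this can compile to a single
       "amoor.w zero, a1, (a0)" -- the old value is discarded by
       targeting the zero register. */
    (void)__atomic_fetch_or(GPIO_OUTPUT_EN, mask, __ATOMIC_RELAXED);
}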

Quote
The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area, so the whole thing boils down to this:

Code: [Select]
6685                    lui     a3,0x1
0ce7a423                sw      a3,200(a5) ; replace "200" with correct offset from a5

Sure, of course. But that value has to *get* into a5 somehow, and I showed that.

If I'd chosen to put the code into a function that took PORT as an argument then *all* of the ISAs would show shorter code.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 12:39:43 am
A decade and a half ago, I had the pleasure of working with a VLIW processor, the Trimedia/Philips PNX1302. It dispatched up to 5 operations per instruction word at 200 MHz. It had 128 32-bit registers, and the convention was that the bottom 64 belonged to user code and the top 64 could be used by the ISR. No saving required.

You can do this on any CPU with a reasonably large number of registers. It's just a matter of documenting it and making sure the compiler (and/or assembly language programmers) know about it.

Even three or four registers is enough for many interrupt routines, so you could reasonably do this on machines with 16 registers -- but 32 would be better.
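As a minimal sketch of that convention using GCC features (-ffixed-reg and global register variables are real GCC mechanisms, but the particular registers and names here are assumptions):

Code: [Select]
#include <stdint.h>

/* Build ALL application code with -ffixed-t5 -ffixed-t6 (RISC-V GCC)
   so the compiler never allocates these registers to user state; the
   interrupt side can then use them with no save/restore. */
register uint32_t isr_events  asm("t5");  /* lives permanently in t5 */
register uint32_t isr_scratch asm("t6");  /* spare, also reserved */

void on_timer_tick(void)   /* called from the interrupt path */
{
    isr_events++;          /* no stacking: t5 is ours by convention */
}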

Quote
Further, an interrupt only happens when the user code makes a jump. So user code could (with care) use the top 64 between jumps. An interesting and useful side-effect is that user code could assume no interrupts while doing code that needs to be atomic.

This is a nice property. I've worked on a machine that (potentially) switched threads after every "block" of code -- not quite a basic block as there was a way to do if/then/else and small loops within a block, but there was a limit on the number of instructions executed in the block. Once you were in a block you were guaranteed NOT to be interrupted. And there was a bank of 8 fast registers (1 cycle latency) that could be used within a block but went *poof* at the end of the block. The 256 global registers had several cycles more latency than that.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 12:47:40 am
It had 128 32-bit registers, and the convention was that the bottom 64 belonged to user code and the top 64 could be used by the ISR. No saving required.

Some modern MCUs have multiple register sets. When an interrupt happens, the new set gets loaded. When it quits, the old one gets restored. It doesn't take any additional time and thus decreases the interrupt latency by a lot. If you have a separate register set for every interrupt level, you never need to save anything.

I don't know about "modern". The Z80 did this. Old ARM chips had a set of registers for every privilege level (not necessarily a whole set). And SPARC and Itanium had register windows that were used not only by interrupts, but by function calls.

There are two problems with this that explain why no one does it any more:

1) at some point you run out and want three sets instead of two, or seventeen sets instead of sixteen. And then you have a whole lot of delay while you swap stuff. And you have to swap the entire set of registers even if the function/task using them is only using a small proportion of them.

2) it's just a huge waste of hardware resources that, in the end, is not actually used all that effectively. You're better off spending those transistors on something else -- such as a cache or write buffer that can absorb manually saved registers quickly on interrupts, but also makes your normal code run faster the rest of the time as well.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 12:58:33 am
Quote
Quote
I wish, for ISRs in C code [on ARM], that the HW interrupt entry was quicker...
Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?
Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.

slower in the rare case you need to do something in a few cycles with no registers, likely faster in the majority of cases

Not faster. If the hardware managed to write those 8 words to memory (or at least to a write buffer or something) in one or two clock cycles then it would be faster. But it doesn't. Cortex M3, M4, M7 all have 12 cycle interrupt latency (M0 has 16). It's sitting there writing those eight registers out at one per clock cycle, exactly the same as you could do yourself in software.
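For comparison, a sketch of the software-saved approach using RISC-V GCC's interrupt attribute, where the generated prologue stacks only the registers the handler actually touches (the CLINT address is an assumption):

Code: [Select]
#include <stdint.h>

#define MTIMECMP (*(volatile uint64_t *)0x02004000u)  /* assumed CLINT address */

/* GCC's RISC-V interrupt attribute generates the save/restore itself:
   a trivial handler like this stacks only the few temporaries it
   actually uses, rather than a fixed eight words. */
void __attribute__((interrupt)) mtimer_isr(void)
{
    MTIMECMP += 32768u;   /* schedule the next tick */
}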
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 01:00:58 am
Register banks do make code that need to access registers across priority levels a whole lot messier (eg. task switching using a low-priority interrupt, like is usually done on Cortex-M MCUs, or exception handlers). I guess with modern manufacturing processes the extra state required by the additional register banks isn't a big deal anymore (eg. 31 32-bit registers by 8 banks is a bit less than 1000 bytes).

It does not cost as much due to area now but the register bank is within the critical timing path for the pipeline so it limits performance in an aggressive design.

Yes.

Also, there are other ways to use that 1 KB worth of transistors that give more bang for the buck, more of the time.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: richardman on December 17, 2018, 02:40:34 am
re: compact 68K code

I may be one of the few, but I like the idea of split register sets in the 68K. Compilers have no problems with A vs. D registers, and at worst it takes one extra move. With that, you can save one bit per register operand specifier. It can all add up.

...and we know that CISC ISA like the x86 can be decoded into micro-RISC-ops, so wonder what a highly tuned 68K, or for that matter, PDP-11/VAX-11 micro-architecture could be like. We can throw away the flags ~_o if  they make a difference and add a couple instructions as mentioned.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 03:11:46 am
re: compact 68K code

I may be one of the few, but I like the idea of split register sets in the 68K. Compilers have no problems with A vs. D registers, and at worst it takes one extra move. With that, you can save one bit per register operand specifier. It can all add up.

Yes, it's quite a natural split, worked well, and seldom caused any problems.

The main problem is just that it pre-determines that every program will want a 50/50 split of data values and pointers and that's not usually the case -- you usually want a lot fewer pointers. It's more or less ok with 8 of each, especially as a couple of address registers get used up by the stack pointer and maybe a frame pointer and a pointer to globals. But I don't think 16 of each would work well.

Quote
...and we know that CISC ISA like the x86 can be decoded into micro-RISC-ops, so wonder what a highly tuned 68K, or for that matter, PDP-11/VAX-11 micro-architecture could be like. We can throw away the flags ~_o if  they make a difference and add a couple instructions as mentioned.

You might be interested in:

http://www.apollo-core.com/ (http://www.apollo-core.com/)

The basic 68000 (or at least 68010) instruction set is good.

The main problem it had was they went in a bad direction with complexity in the 68020 just because, y'know, it's microcode and you can do anything. They had to back away from that in the 68040 and 68060.

Well, maybe the main problem it had was that it was proprietary and owned by a company that stopped caring about it enough to put in the necessary investment. And then Motorola did that *again* with the PowerPC, not putting in the investment necessary to give Apple mobile chips competitive with the Centrino -> Core 2 and forcing Apple into Intel's arms. (IBM's G5 and successors are just fine for professional desktop systems)

Is ColdFire still a thing? It doesn't seem to have had any love since about 2010.

Wikipedia says it topped out at 300 MHz, and it does around 1.58 Dhrystone VAX MIPS/MHz (slightly less than Rocket-based RISC-V).

OK, element14 is showing me 76 SKUs with a maximum 250 MHz but mostly at 50 or 66 MHz. So it's still a thing.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 17, 2018, 03:24:47 am
Quote
before you get to your push multiple, the core first has to read the vector table and fetch the first instruction of the ISR (the prologue); done automatically in hardware, that can often happen in parallel
I'm not convinced.  CM0 is listed as a von Neumann architecture with both flash and RAM connected to the same memory bus matrix.  And it always has to save the PC anyway, so if it could do a simultaneous vector fetch (one word) and PC save (one word), it would be caught up by then, more or less.  (And ... I would tend to relocate the vector table to RAM, anyway.)
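(On cores that do implement VTOR - M0+ and up; plain CM0 needs a vendor RAM-remap instead - relocation is just a copy plus one register write. A sketch, with an assumed vector count:)

Code: [Select]
#include <stdint.h>

#define SCB_VTOR (*(volatile uint32_t *)0xE000ED08u)  /* architectural VTOR address */

/* 48 vectors assumed; VTOR needs the table aligned to the next power
   of two >= its size, hence 256 bytes here. */
static uint32_t ram_vectors[48] __attribute__((aligned(256)));

void vectors_to_ram(void)
{
    const volatile uint32_t *flash = (const volatile uint32_t *)0u;
    for (unsigned i = 0; i < 48; i++)
        ram_vectors[i] = flash[i];    /* copy the flash table at address 0 */
    SCB_VTOR = (uint32_t)ram_vectors;
}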

Quote
Quote
Microchip was defining those symbols at link time
That's true. Although it's not a very good idea.
I actually asked Microchip about it.  They said it let them distribute binary libraries that worked across a range of chips (with identical peripherals at different locations.)  That makes some sense - it's a good thing that disk space is cheap with many vendors distributing close to one library per chip.  (OTOH, not entirely happy with the idea of binary-only libraries.)
Quote
I like the idea of split register sets in the 68K.
It seems to work OK on the 68k, partially because there were a lot of them (16 each, right?).  The crop of 8bit chips with "we have TWO index registers!  PLUS a stack!" was depressing...

I'm not sure that it buys you much from a hardware implementation PoV - can't you pretty much use the same instruction bit you used to pick "Address or Data" to address twice as many GP registers?  (I don't quite remember which instructions were different between A/D registers.)

Maybe some speed-up from having separate banks?  (There's an idea for optimization: "we have 32 registers organized in 4 banks of 8.  Operations that use registers from different banks can be more easily parallelized..."  Lots of CPUs have done this with memory - the Cray-1, for instance: "write your algorithm so that you access memory at 8-word intervals", or something like that.  Or disk - remember "interleaving"?)
Quote
wonder what a highly tuned 68K or PDP-11 ... could be like.
Yeah.     I wonder what the internal architecture of the more recent ColdFire chips is like; my impression is that that's about what they've done...  The PDP-10 emulator "using an x86 for its microcode interpreter" apparently ran something like 6x faster than the fastest DEC ever built.  (And that was a decade or two ago, I think.)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: richardman on December 17, 2018, 04:34:33 am
I don't think there has been any new ColdFire implementation in YEARS. Possibly a process shrink, if that.

Once all the old HP printer models that used ColdFire are EOL'ed, then that will probably be the end of the line... Oh wait, they are also used in automotive, and those go forever as well. Heck, Tesla *might* have used CPU12 in their cars.

re: 68K registers
It's 8 registers each for address and data.

Motorola had junked so many processor architectures in the 2000s that it's not even funny. 88K was one, and there's also mCore. By the look of it, it should have been competitive, but when even their own phone division wouldn't use it, that's the end.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on December 17, 2018, 04:48:00 am
I may be one of the few, but I like the idea of split register sets in the 68K. Compilers have no problems with A vs. D registers, and at worst it takes one extra move. With that, you can save one bit for register operand specifier. It all can add up.

Quote
...and we know that CISC ISA like the x86 can be decoded into micro-RISC-ops, so wonder what a highly tuned 68K, or for that matter, PDP-11/VAX-11 micro-architecture could be like. We can throw away the flags ~_o if  they make a difference and add a couple instructions as mentioned.

The 68K had ISA features like double indirect addressing which made it even worse than x86 when scaled up.  The separate address and data registers were one of those features, although I do not remember why now.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: andersm on December 17, 2018, 05:49:44 am
Cortex M3, M4, M7 all have 12 cycle interrupt latency (M0 has 16). It's sitting there writing those eight registers out at one per clock cycle, exactly the same as you could do yourself in software.
The Cortex-M hardware prologue also sets up nested interrupts. So while they have relatively long interrupt latencies, you also get a decent amount of functionality out of those cycles.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 06:01:36 am
Motorola had junked so many processor architectures in the 2000s that it's not even funny. 88K was one, and there's also mCore. By the look of it, it should have been competitive, but when even their own phone division wouldn't use it, that's the end.

I just took a close look at the ISA. It's a pretty clean, very pure RISC with fixed-length 16 bit opcodes. No addressing modes at all past register plus (very) short displacement, but it has special instructions designed to help create effective addresses quickly, e.g. rd = rd + 4*rs.

Chinese company C-SKY makes a series of CK6nn chips that use the M-CORE instruction set.

They also have a CK8nn series of chips that use a 16/32 bit opcode ISA called C-SKY V2. I'm not sure if it's just an extension of the 600-series ISA.

Anyway, they're switching to RISC-V.

The problem with Motorola ISAs -- 68k, 88k, PowerPC (with IBM), M-CORE -- isn't a technical one. It's that if you tie your company to them then you have a huge risk of being orphaned within a decade.

This, more than any technical superiority, is one of the things that makes RISC-V so attractive.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: richardman on December 17, 2018, 08:00:26 am
...It's that if you tie your company to them then you have a huge risk of being orphaned within a decade.

This, more than any technical superiority, is one of the things that makes RISC-V so attractive.

No disagreement from me. I think if some companies backed RISC-V based MCUs, that would give ARM serious competition. Of course, finding such a company could be difficult. A Chinese company may be a possibility.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 08:38:42 am
The 68K had ISA features like double indirect addressing which made it even worse than x86 when scaled up.  The separate address and data registers was one of those features although I do not remember why now.

Not in the 68000. The 68020 did that, and even Motorola later realised it was a mistake.

Having memory-to-memory arithmetic is a problem for fast implementations though even in base 68000. x86 stops at reg-mem and mem-reg.

Of course neither one is a problem on big high end implementations that can break it into a bunch of uops and let OoO machinery chew on them.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 17, 2018, 11:37:01 am
Motorola had junked so many processor architectures in the 2000s that it's not even funny. 88K was one

Data General built a few computers based on the 88K, and a couple of companies in Japan also made 88K computers (see OpenBSD supported machines), but they were a niche, and their orders weren't so significant in terms of money.

The 88000 appeared too late on the marketplace, later than MIPS and SPARC, and since it was not compatible with 68K it was not competitive at all: Amiga/classic? 68k! Atari ST? 68k! Macintosh/classic? 68k!

In short, Motorola was not happy because they had problems selling the chip.

Now I know that the 88K was abandoned after the Dash prototype, when Motorola was collaborating with Stanford University. It sounds like the last chance to put a foot into the supercomputer field, which was niche but with a lot of money involved, and yet again ... bad luck, since for some obscure reason someone preferred to go on with MIPS instead of the 88K.

Was it the last lost occasion? Definitely YES, since someone with a lot of money, someone like Silicon Graphics, chose to use the Dash technology combined with MIPS, and this was the beginning of the CrayLink 2, 3, 4, ... SGI supercomputers - yet again, a lot of money back.

In such a scenario there was no choice for Motorola: 88k project dropped!

As far as I have understood, IBM had been working on S/370 for a long while, and their research was on the IBM 801 chip, which was the first POWER chip, so ... to make money, Motorola promoted a collaboration with Apple and IBM, which then developed the first PowerPC chip: the MPC601 appeared in 1992, a sort of hybrid chip between the POWER1 spec and the new PowerPC spec.

This way managers at Motorola were happy. Anyway, it didn't last long; these companies later dropped the collaboration.

Now IBM is on POWER9, which is funded by DARPA, which means a lot of money for IBM. POWER9 workstations and servers are very expensive. Say, the entry level for a low-spec workstation is no less than 5K USD  :palm:
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 12:07:33 pm
The 88000 appeared too late on the marketplace, later than MIPS and SPARC, and since it was not compatible with 68K it was not competitive at all: Amiga/classic? 68k! Atari ST? 68k! Macintosh/classic? 68k!

If 2015 is not too late (or 2012 for Arm64) then 1990 was certainly not too late.

88000 is an excellent ISA, even today, and if someone put good engineers on to making chips and good marketers on to selling it then it could be competitive.

Particular chips have a short lifespan, but a good ISA can be good for 50 years. The main thing is to *start* with a plan for compatible 16, 32, 64, 128 bit pointer/integer successors.

If I've done my arithmetic correctly, if you could somehow store 1 bit on every atom of silicon (or carbon, or ...), then 2^128 bytes of storage would need 100,000,000,000 tonnes of silicon. That's a cube a bit under 4 km on a side.
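A quick back-of-envelope check of that figure, assuming silicon at 28 g/mol and 2.33 g/cm^3:

Code: [Select]
#include <stdio.h>
#include <math.h>

int main(void)
{
    double bits   = ldexp(1.0, 131);     /* 2^128 bytes = 2^131 bits, one atom each */
    double moles  = bits / 6.022e23;     /* Avogadro's number */
    double grams  = moles * 28.09;       /* molar mass of silicon */
    double tonnes = grams / 1e6;
    double m3     = grams / 2.33 / 1e6;  /* density 2.33 g/cm^3, volume in m^3 */
    printf("%.1e tonnes, cube %.1f km on a side\n", tonnes, cbrt(m3) / 1e3);
    return 0;                            /* ~1.3e11 tonnes, ~3.8 km */
}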

128 bits is probably going to be enough for a while, even with sparse address spaces.

Quote
In short, Motorola was not happy because they had problems selling the chip.

No one's fault but Motorola. Great engineers, awful management.

Quote
As far as I have understood, IBM had been working on S/370 for a long while, and their research was on the IBM 801 chip, which was the first POWER chip

No, that's not correct. The IBM 801 was the world's first RISC chip (though that name wasn't invented by Dave Patterson until several years later when he independently came up with the concept) but it's very different to POWER/PowerPC. For a start, it had both 16 bit and 32 bit opcodes that could be freely mixed, an important code density feature that didn't find its way back into RISC until Thumb2 in 2003.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 17, 2018, 12:27:33 pm
If 2015 is not too late (or 2012 for Arm64) then 1990 was certainly not too late.

It was too late for RISC workstations, because SPARC and MIPS ones were already promoted and in use before Motorola released the 88k, and it was too late for supercomputers, yet again due to MIPS at SGI.

If the "Dash/88k" project at Standford University or the MIT "T/88110MP" project hadn't had failed (at the management level, not at the technical level) ... but they did.

This is a fact!

The IBM 801

801 was a proof of concept, made in 1974. But POWER and PowerPC are derived from the evolution of this POC. Directly and indirectly, since, of course, in 1974 "RISC" was not what we know today, but the idea was already there in the simulator of the first 801. It's written in every Red, Green, and Blue book published by IBM.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 17, 2018, 12:44:34 pm
What would happen to IBM-POWER9 if DARPA didn't fund it?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 01:10:37 pm
The IBM 801

801 was a proof of concept, made in 1974. But POWER and PowerPC are derived from the evolution of this POC.

Derived, certainly. But very different.

Btw, the project formally started in October 1975 though some investigation work had been done before that. The first running hardware was in 1978.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 17, 2018, 01:18:48 pm
My IBM Red, Green, and Blue books (a sort of encyclopedia about POWER and PowerPC) point to this (https://www.ibm.com/ibm/history/ibm100/us/en/icons/risc/) article.

Probably to underline that one of their men, mr.Cocke, received the Turing Award in 1987, the US National Medal of Science in 1994, and the US National Medal of Technology in 1991  ;D

To me, it sounds sort of like "hey? we are IBM, you might know us for the ugliest thing ever invented - IBM-PeeeeCeeeeee - PersonalComputers and IBM-PC-compatible computers - which are really shitty, but we also do serious stuff. Don't you believe our words? See, one of our prestigious men received an award for having invented RISC before any H&P book started talking about it".

IBM is really funny :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 01:37:48 pm
My IBM Red, Green, and Blue books point to this (https://www.ibm.com/ibm/history/ibm100/us/en/icons/risc/) article.

Yup, that ties in with my sources. In 1974 they wanted to make faster telephone exchanges (my sources said they wanted to handle 300 calls per second, and decided 12 MIPS was needed for that), they did some thinking and wrote effectively a White Paper, did some preliminary design on an instruction set, and then got approval and funding and the 801 project formally kicked off in October 1975.

Your article says first hardware was 1980 compared to my previous message that says 1978 (and your message that 801 was "made in 1974"). I believe 1980 was first production hardware for deployment, or possibly the 2nd prototype after they got experience with the first one and made changes.

One of the changes was dropping the variable length 16/32 bit instructions and going with 32 bit only -- mostly because they needed to support virtual memory in the production model and didn't want to have to support instructions crossing a VM page boundary. The 2nd version also increased the number of registers from 16 to 32, and increased the register size (and addresses) from 24 bits to 32 bits. They also changed from destructive 2-address instructions to 3-address, so although instructions increased in size from an average of about 3 bytes each (common for Thumb2 and RISC-V these days too) to exactly four bytes each, programs needed fewer instructions so the increase in program size was less than 33%.

Quote
Probably to underline that one of their men, mr.Cocke, received the Turing Award in 1987, the US National Medal of Science in 1994, and the US National Medal of Technology in 1991  ;D

Indeed he did, and very well deserved.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 17, 2018, 01:39:47 pm
See, one of our prestigious men received an award for having invented RISC before any H&P book started talking about it

I see you edited your message while I was replying to it.

H&P had to wait until 2017 to receive their Turing Awards.

Quote
“The main idea is not to add any complexity to the machine unless it pays for itself by how frequently you would use it. And so, for example, a machine which was being used in a heavily scientific way, where floating point instructions were important, might make a different set of tradeoffs than another machine where that wasn't important. Similarly, one in which compatibility with other machines was important or in which certain types of networking was important would include different features. But in each case they ought to be done as the result of measurements of relative frequency of use and the penalty that you would pay for the inclusion or non-inclusion of a particular feature.”

Joel Birnbaum
FORMER DIRECTOR OF COMPUTER SCIENCES AT IBM
“Computer Chronicles: RISC Computers (1986),”
October 2, 1986

Now there is a guy absolutely on the same page as H&P. (And the people who invented RISC-V: namely P and his students, and his students' students. H is a fan too.)

https://www.youtube.com/watch?v=3LVeEjsn8Ts (https://www.youtube.com/watch?v=3LVeEjsn8Ts)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 17, 2018, 06:05:18 pm
I compiled them the way they came.

It doesn't matter if you deliberately tweaked the compiler options and offsets to make RISC-V look good, or they magically came out this way. The problem is that your tests do not reflect reality, but rather a jumble of inconsequential side effects.

If you tweak the offsets a different way, and use the default Makefile from my computer, the whole thing goes from this:

Code: [Select]
000001b5 <main>:
 1b5:   e8 20 00 00 00          call   1da <__x86.get_pc_thunk.ax>
 1ba:   05 3a 1e 00 00          add    $0x1e3a,%eax
 1bf:   8b 80 0c 00 00 00       mov    0xc(%eax),%eax
 1c5:   81 88 94 00 00 00 80    orl    $0x80,0x94(%eax)
 1cc:   00 00 00
 1cf:   81 88 c8 00 00 00 00    orl    $0x1000,0xc8(%eax)
 1d6:   10 00 00
 1d9:   c3                      ret   

000001da <__x86.get_pc_thunk.ax>:
 1da:   8b 04 24                mov    (%esp),%eax
 1dd:   c3                      ret   

00002000 <PORT>:
    2000:       00 f0                   add    %dh,%al

to this:

Code: [Select]
08048450 <main>:
 8048450: a1 c0 95 04 08        mov    0x80495c0,%eax
 8048455: 83 48 30 20          orl    $0x20,0x30(%eax)
 8048459: 83 48 40 20          orl    $0x20,0x40(%eax)
 804845d: c3                    ret   

For what it's worth, it's now 14 bytes for i386 (plus 4 bytes of data, of course), which is now the leader, way better than Motorola, and leaving RISC-V absolutely in the dust.

Here's the tweaked C code:

Code: [Select]
#include <stdio.h>
#include <stdint.h>

#define PORT_PINCFG_DRVSTR (1<<5)

struct {
    struct {
        struct {
            uint32_t reg;
        } PINCFG[16];
        struct {
            uint32_t reg;
        } DIRSET;
    } Group[10];
} *PORT = (void*)0xdecaf000;

void main(){
    PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
    PORT->Group[0].DIRSET.reg |= 1<<5;
}

Here's the line from the Makefile:

Code: [Select]
gcc a.c -o c -save-temps -O1 -fomit-frame-pointer -masm=intel

Here's the assembler output

Code: [Select]
.file "a.c"
.intel_syntax noprefix
.text
.globl main
.type main, @function
main:
mov eax, DWORD PTR PORT
or DWORD PTR [eax+48], 32
or DWORD PTR [eax+64], 32
ret
.size main, .-main
.globl PORT
.data
.align 4
.type PORT, @object
.size PORT, 4
PORT:
.long -557125632
.ident "GCC: (GNU) 4.5.0"
.section .note.GNU-stack,"",@progbits




Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 18, 2018, 02:01:52 am
I compiled them the way they came.

It doesn't matter if you deliberately tweaked the compiler options and offsets to make RISC-V look good, or they magically came out this way. The problem is that your tests do not reflect reality, but rather a blunder of inconsequential side effects.

If you tweak the offsets a different way

Oh come on. You not only change the data structure (which I freely admit I made up at random, as westfw didn't provide it) to be less than 128 bytes to suit your favourite ISA, you *ALSO* change the bit offsets in the constants to be less than 8 so the masks fit in a byte. If you hadn't done *both* of those then your code would have 32 bit literals for both offset and bit mask, the same as mine, not 8 bit. You also changed the code compilation and linking model from that used by all the other ISAs, which would all benefit pretty much equally from the same change.

And you accuse me of bad faith?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 18, 2018, 02:37:12 am
Quote
[CM0 and limitations on offsets/constants, making assembly unpleasant]
(I did specifically choose offsets and bitvalues to be "beyond" what CM0 allows.)

As another example, I *think* that the assembly for my CM0 example (the actual data structure is from Atmel SAMD21, but it's scattered across several files) can be improved by accessing the port as 8bit registers instead of 32bit.  All I have to do is look really carefully at the datasheet (and test!) to see if that actually works, rewrite or obfuscate the standard definitions in ways that would confuse everyone and perhaps not be CMSIS-compatible, and remember to make sure that it remains legal if I move to a slightly different chip.

Perhaps I have a high bar for what makes a pleasant assembly language.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 18, 2018, 03:50:18 am
Oh come on. You not only change the data structure (which I freely admit I made up at random, as westfw didn't provide it) to be less than 128 bytes to suit your favourite ISA, you *ALSO* change the bit offsets in the constants to be less than 8 so the masks fit in a byte. If you hadn't done *both* of those then your code would have 32 bit literals for both offset and bit mask, the same as mine, not 8 bit. You also changed the code compilation and linking model from that used by all the other ISAs, which would all benefit pretty much equally from the same change.

I restored the offsets to where they were in the original code. I restored the linkage to normal. Masks I admit. But the masks are not important, because you can achieve the same effect with byte access, thus the mask should never be more than 8 bits. Of course, the superoptimized C compiler couldn't figure that out, so I had to nudge the masks a bit. When we get better compilers, there will be no need to tweak masks, right?

The $1M question is: how is my tweaking any worse than yours?

And you accuse me of bad faith?

Of course not.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 18, 2018, 04:05:39 am
Quote
[CM0 and limitations on offsets/constants, making assembly unpleasant]
(I did specifically choose offsets and bitvalues to be "beyond" what CM0 allows.)

As another example, I *think* that the assembly for my CM0 example (the actual data structure is from Atmel SAMD21, but it's scattered across several files) can be improved by accessing the port as 8bit registers instead of 32bit.  All I have to do is look really carefully at the datasheet (and test!) to see if that actually works, rewrite or obfuscate the standard definitions in ways that would confuse everyone and perhaps not be CMSIS-compatible, and remember to make sure that it remains legal if I move to a slightly different chip.

Perhaps I have a high bar for what makes a pleasant assembly language.

x86, 68k and VAX were all designed at a time when maximizing the productivity of the assembly language programmer was seen as one of the highest (if not actual highest) priorities. They'd gone past simply trying to make a computer that worked and even making the fastest computer and come to a point that computers were not only fast *enough* for many applications but had hit a speed plateau. (It's hard to believe now that Apple sold 1 MHz 6502 machines for over *seventeen* years, and the Apple //e alone for 11 years.)

What they had was a "software crisis". The machines had quirky instruction sets that were unpleasant for assembly language programmers -- and next to impossible for the compilers of the time to generate efficient code for.

The x86, 68k and VAX were all vastly easier for the assembly language programmer than their predecessors the 8080, 6800, and PDP-11 (or PDP-10). They also were better for compilers, though people still didn't trust them.

The RISC people came along and said "If you simplify the hardware in *this* way then you can build faster machines cheaper, compilers actually have an easier time making optimal code, and everyone will be using high level languages in future anyway".

I remember the time when RISC processors were regarded as being next to impossible (certainly impractical) to program in assembly language!

A lot of that was because you had to calculate instruction latencies yourself and put dependent instructions far enough away that the result of the previous instruction was already available -- and not doing it meant not just that your program was not as efficient as possible but that it didn't work at all! Fortunately, that stage didn't last long, for two reasons: 1) your next generation CPU would have different latencies (sometimes longer as pipeline lengths increased), meaning old binaries would not work, and 2) as CPUs increased in MHz faster than memory did caches were introduced and then you couldn't predict whether a load would take 2 cycles or 10 and the same code had to be able to cope with 10 but run faster when you got a cache hit.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 18, 2018, 04:35:16 am
The $1M question is: how is my tweaking any worse than yours?

That's easy. I'm taking code provided by someone else without any reference to a specific processor and then using default compiler settings (adding only -O, and -fomit-frame-pointer for the m68k as it's the only one that generated a frame otherwise) and seeing how it works out.

You on the other hand worked backwards from a processor to make code that suited it.

If westfw had provided the definitions for the structure he was accessing then I would have used that, as is. But he didn't so I had to come up with something in order to have compilable code.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 18, 2018, 02:31:05 pm
You on the other hand worked backwards from a processor to make code that suited it.

Haven't you?

Isn't this the way it should be? When you compile for a CPU, you select the settings which maximize performance for this particular CPU instead of settings which produce bloat. As, by your own admission, you did for Motorola.

If you haven't done this for RISC-V, why don't you tweak it so that it produces better code? Go ahead, try to beat my 14 bytes, or even get remotely close.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 18, 2018, 03:11:04 pm
You on the other hand worked backwards from a processor to make code that suited it.

Haven't you?

No.

Quote
Isn't this the way it should be? When you compile for a CPU, you select the settings which maximize performance for this particular CPU instead of settings which produce bloat. As, by your own admission, you did for Motorola.

If you haven't done this for RISC-V, why don't you tweak it so that it produces better code? Go ahead, try to beat my 14 bytes, or even get remotely close.

Not interested in winning some dick size competition. If RISC-V ends up in the middle of the pack and competitive on measures such as code size or number of instructions just by compiling straightforward C code in a wide variety of situations with no special effort then I'm perfectly content. Other factors are then more important.

Everyone is going to "win" at some comparison. x86 can OR a constant with a memory location in a single instruction. Cool. So can dsPIC33. Awesome. That has approximately zero chance of being the deciding factor on which processor is used by anyone.

You didn't change the compiler settings. You changed the semantics of the code -- you changed what problem is being solved.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 18, 2018, 06:20:32 pm
Not interested in winning some dick size competition. If RISC-V ends up in the middle of the pack and competitive on measures such as code size or number of instructions just by compiling straightforward C code in a wide variety of situations with no special effort then I'm perfectly content.

"interested", "content", "bad faith". I don't think in these categories. These are emotions. The reality exists independent of them, and independent from what you (or me) think. Similarly, the truth cannot be voted upon by customers (although, if anything, Intel has way more of them than SiFive).

All things being equal, the CISC approach creates better code density than RISC, because the ability to use more information allows for better compression. This is pure mathematics. If empirical tests show otherwise, the only explanation is faulty methodology.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on December 18, 2018, 11:09:27 pm
x86, 68k and VAX were all designed at a time when maximizing the productivity of the assembly language programmer was seen as one of the highest (if not actual highest) priorities. They'd gone past simply trying to make a computer that worked and even making the fastest computer and come to a point that computers were not only fast *enough* for many applications but had hit a speed plateau. (It's hard to believe now that Apple sold 1 MHz 6502 machines for over *seventeen* years, and the Apple //e alone for 11 years.)

They were also designed at a time when memory access time was still long and memory width was still small so tiny instruction lengths and complex instructions were advantageous.  ARM was unusual in being designed to specifically take advantage of the fast page mode memory which had become available leading to instructions like load and store multiple.  I would argue that not blindly adhering to RISC is what made ARM successful in the long run.

Quote
The x86, 68k and VAX were all vastly easier for the assembly language programmer than their predecessors the 8080, 6800, and PDP-11 (or PDP-10). They also were better for compilers, though people still didn't trust them.

I went up the 8080 and Z80 route and loved the 8086, but only dabbled in the 6502, which seemed primitive compared to the 8080.  Later I became proficient in accumulator-centric designs like 68HC11 and PIC and learned to love macro assemblers even more.

Quote
The RISC people came along and said "If you simplify the hardware in *this* way then you can build faster machines cheaper, compilers actually have an easier time making optimal code, and everyone will be using high level languages in future anyway".

The people actually producing commercial RISC designs had a conflict of interest.  What made RISC popular so quickly is that a small design team could do it, so suddenly everybody had a 32 bit RISC processor available and was happy to proclaim that RISC is the future.  Where this fell apart is that Intel's development budget was already much greater than the sum of all of the RISC efforts combined.  It did not matter that equivalent performance could be produced for a fraction of the development cost, because Intel could afford any development effort.

Development of ARM is slowed by the same problem.  All of the separate ARM development efforts do not join to become Voltron-ARM.  Intel only has to beat the best of them but I expect at some point ARM will catch up if only because Intel has become so dysfunctional.

Quote
A lot of that was because you had to calculate instruction latencies yourself and put dependent instructions far enough away that the result of the previous instruction was already available -- and not doing it meant not just that your program was not as efficient as possible but that it didn't work at all! Fortunately, that stage didn't last long, for two reasons: 1) your next generation CPU would have different latencies (sometimes longer as pipeline lengths increased), meaning old binaries would not work, and 2) as CPUs increased in MHz faster than memory did caches were introduced and then you couldn't predict whether a load would take 2 cycles or 10 and the same code had to be able to cope with 10 but run faster when you got a cache hit.

The lack of interlocking like branch delay slots just ended up being a millstone around the neck of performance.  "Just recompile your software" should be included with "the policeman is your friend" and "the check is in the mail".  Maybe this will change with ubiquitous just-in-time compiling.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 19, 2018, 12:43:07 am
Hey... anybody know if 'The Mill' still grinding away?

https://millcomputing.com/

:popcorn:
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 19, 2018, 12:48:36 am
They probably are, but I imagine at this point it is mostly a VC money sucking enterprise.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 19, 2018, 10:09:17 am
Hey... anybody know if 'The Mill' still grinding away?

https://millcomputing.com/

:popcorn:

They are.

Ivan gave some hints in a recent comp.arch posting.

https://groups.google.com/forum/#!original/comp.arch/bGBeaNjAKvc/zQcA-R6FAgAJ
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 19, 2018, 12:46:34 pm
ARM was unusual in being designed to specifically take advantage of the fast page mode memory which had become available, leading to instructions like load and store multiple.

ARM has grown from a small company called Acorn - maker of some of the earliest home computers, initially used by the BBC as kits for kids and students - into one of the world's most important designers of semiconductors, providing the brains for Apple's must-have iPhones and iPads.

Back to the beginning: in 1978, Acorn Computers was established in Cambridge, and produced computers which were particularly successful in the UK. Acorn's BBC Micro was the most widely-used computer in British schools in the 1980s.

In the same year, Motorola was about to release the 68000 from their MACSS program, which engineers at Acorn later (1981-82?) took into consideration for the next generation of their computers.

(https://upload.wikimedia.org/wikipedia/commons/b/b3/Sophie_Wilson.jpg)
Sophie Wilson, a British computer scientist and software engineer.

This woman is definitely a superheroine, and as if it were a weird coincidence (a lot of computer science events happened in 1978?!? there should be a scientific reason for this), exactly in 1978 Sophie Wilson joined Acorn Computers Ltd. She designed the Acorn Micro-Computer, and she watched the wedding of Charles, Prince of Wales, and Lady Diana Spencer on a small portable television (made by Mr. Clive Sinclair, a rival of Acorn) while attempting to debug and re-solder a prototype. And it worked!

OMG !!! WOW !!!  :D :D :D

The prototype was then released as "The Proton", a microcomputer that became the BBC Micro, and its BASIC evolved into BBC BASIC, which was then used to develop the CPU simulator for the next generation. In October 1983, Wilson began designing the instruction set for one of the first RISC processors, the Acorn RISC Machine, and the ARM v1 was delivered on 26 April 1985 - a worldwide success!

She said the 68000 had been taken into consideration but was then rejected due to its long latency, especially in reacting to interrupts, which was a must-have feature for a new computer where everything is done in software. She also said new DRAM integrated circuits needed to be sourced directly from Hitachi, because the project needed something really, really fast for the RAM.

Computers like the Amiga used the 68000 with the help of specialized chips for graphics and sound, while Acorn's ARM computers did everything in software, so the CPU had to be super fast at I/O and super fast at reacting to interrupts.

https://www.youtube.com/watch?v=XI4pfjTCs3o (https://www.youtube.com/watch?v=XI4pfjTCs3o)

The last machine developed by Acorn was the RiscPC, with a StrongARM CPU @ 200 MHz.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 19, 2018, 01:01:40 pm
This woman is definitely a superheroine

Roger that, job very well done. It's stood up well for nearly 35 years. I remember being quite jealous of a friend with an Archimedes. An 8 MHz ARM2 was pretty good in 1987, standing up very well against a much more expensive 16 MHz 68020 or 80386.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: langwadt on December 19, 2018, 01:11:32 pm
This woman is definitely a superheroine

Roger that, job very well done.

;) http://www.computinghistory.org.uk/det/6615/Sophie-Wilson/ (http://www.computinghistory.org.uk/det/6615/Sophie-Wilson/)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on December 19, 2018, 01:18:02 pm
The Computer History Museum has a great transcript of an interview with Sophie Wilson about the development of the ARM processor here:

https://www.computerhistory.org/collections/catalog/102746190 (https://www.computerhistory.org/collections/catalog/102746190)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 22, 2018, 11:40:10 am
(http://93.55.217.0/wonderland/chunk_of/stuff/public/books/book-arm-system-architecture.jpg)

Latest purchase. This book covers the ARM architecture before Cortex. Excellent book!
Now I need to buy something similar for the RISC-V  :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 23, 2018, 10:02:58 pm
https://www.youtube.com/watch?v=XXBxV6-zamM (https://www.youtube.com/watch?v=XXBxV6-zamM)

This is a free movie about the origin of Acorn: wow, there is also Mr. Sinclair  :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 26, 2018, 06:58:45 pm
in the meanwhile (today, 2 hours), I added a couple of features to the simulator, but ... the endianness is really irritating  :palm:

Code: [Select]
# regs
reg00: 0x00000000
reg01: 0xdeadbeaf
reg02: 0x00000000
reg03: 0x00000000
reg04: 0x00000000
reg05: 0x00000000
reg06: 0x00000000
reg07: 0x00000000
reg08: 0x00000000
reg09: 0x00000000
reg10: 0x00000000
reg11: 0x00000000
reg12: 0x00000000
reg13: 0x00000000
reg14: 0x00000000
reg15: 0x00000000
reg16: 0x00000000
reg17: 0x00000000
reg18: 0x00000000
reg19: 0x00000000
reg20: 0x00000000
reg21: 0x00000000
reg22: 0x00000000
reg23: 0x00000000
reg24: 0x00000000
reg25: 0x00000000
reg26: 0x00000000
reg27: 0x00000000
reg28: 0x00000000
reg29: 0x00000000
reg30: 0x00000000
reg31: 0x00000000

# md 0xf1000000
f1000000..f10007ff       2048 byte I00:0 mem:1 hd:1 magic1 bin/data_cpu1reg.bin
showing memory @ 0xf1000000
0xf1000000 .. 0xf10007ff
f1000000: 00000000 afbeadde 00000000 00000000 [................]
f1000010: 00000000 00000000 00000000 00000000 [................]
f1000020: 00000000 00000000 00000000 00000000 [................]
f1000030: 00000000 00000000 00000000 00000000 [................]
f1000040: 00000000 00000000 00000000 00000000 [................]
f1000050: 00000000 00000000 00000000 00000000 [................]
f1000060: 00000000 00000000 00000000 00000000 [................]
f1000070: 00000000 00000000 00000000 00000000 [................]
f1000080: 00000000 00000000 00000000 00000000 [................]
f1000090: 00000000 00000000 00000000 00000000 [................]
f10000a0: 00000000 00000000 00000000 00000000 [................]
f10000b0: 00000000 00000000 00000000 00000000 [................]
f10000c0: 00000000 00000000 00000000 00000000 [................]
f10000d0: 00000000 00000000 00000000 00000000 [................]
f10000e0: 00000000 00000000 00000000 00000000 [................]
f10000f0: 00000000 00000000 00000000 00000000 [................]
#


Inside the simulator, registers are also mapped to a chunk of RAM, so they can be accessed, but ... the target is big-endian, the host is little-endian, and ... things need more features to be properly managed.
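One way out is to funnel every target access through fixed-endian accessors; a minimal sketch (function names invented):

Code: [Select]
#include <stdint.h>

/* Read/write 32-bit big-endian target words in simulator memory,
   independent of host byte order; names are invented. */
static inline uint32_t target_read32(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr]     << 24) |
           ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] <<  8) |
            (uint32_t)mem[addr + 3];
}

static inline void target_write32(uint8_t *mem, uint32_t addr, uint32_t v)
{
    mem[addr]     = (uint8_t)(v >> 24);   /* most significant byte first */
    mem[addr + 1] = (uint8_t)(v >> 16);
    mem[addr + 2] = (uint8_t)(v >>  8);
    mem[addr + 3] = (uint8_t)v;
}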


I wonder WTF was in the head of Intel when they wanted to use LittleEndian ... it's unnatural for humans

see, you have a 32bit number 0x12345678, four bytes, in bigEndian it's 0x12, 0x34, 0x56, 0x78, thus in an 8bit memory, you find

0x12
0x34
0x56
0x78

perfect!!!

Whereas on a damn LittleEndian Machine you see

0x78
0x56
0x34
0x12

so 0xdeadbeaf becomes 0xafbeadde  :palm:
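The effect is easy to demonstrate on any host with a tiny self-contained check:

Code: [Select]
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t v = 0x12345678u;
    const unsigned char *p = (const unsigned char *)&v;
    /* prints "12 34 56 78" on a big-endian host,
       "78 56 34 12" on a little-endian one */
    printf("%02x %02x %02x %02x\n", p[0], p[1], p[2], p[3]);
    return 0;
}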


more to come ...
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 26, 2018, 07:20:51 pm
Well, it is a matter of opinion, isn't it? I would never use anything that is big-endian. Little-endian naturally casts between bytes, halfwords and words without the need to move things around. And how things are physically located in memory is mostly irrelevant.
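A small example of what "naturally casts" means in practice (little-endian assumed in the assert):

Code: [Select]
#include <stdint.h>
#include <assert.h>

int main(void)
{
    uint32_t word = 42;   /* value fits in one byte */
    /* On a little-endian machine the pointed-to byte IS the low-order
       byte, so re-reading the same address at a narrower width "just
       works"; on a big-endian machine this reads 0 instead. */
    uint8_t byte = *(const uint8_t *)&word;
    assert(byte == 42);
    return 0;
}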
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 26, 2018, 07:50:11 pm
strings (arrays of 8bit chars) are naturally managed in the "BE" way: the first byte is the first char

which makes things more irritating now, because I have to "patch" a couple of points in the simulator
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: Nominal Animal on December 27, 2018, 01:08:52 am
I don't mind either byte order.

What burns my goat is the way some documentation insists on labeling bits in decreasing order of importance: most significant bit 0.  The only bit labeling that makes any sense to me is the mathematical one; for unsigned integers, bit i corresponding to value 2^i.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rstofer on December 27, 2018, 01:37:59 am
I don't mind either byte order.

What burns my goat is the way some documentation insists on labeling bits in decreasing order of importance: most significant bit 0.  The only bit labeling that makes any sense to me is the mathematical one; for unsigned integers, bit i corresponding to value 2^i.

That numbering scheme was pretty common with IBM and is clearly the case for the IBM 1130.  As a Fortran programmer, it made no difference unless the program was reading the Console Entry Switches where the odd numbering mattered.

I definitely prefer the power of two numbering from right to left.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 27, 2018, 01:44:02 am
Quote
I wonder WTF was in the head of Intel when they wanted to use LittleEndian ... it's unnatural for humans
Copying the DEC PDP11, as were pretty much all the microcontroller manufacturers at the time.  (Although the 68000, with an arguably much-more-PDP11-like instruction set, is big endian.)

Internet protocols are largely big-endian.  https://www.ietf.org/rfc/ien/ien137.txt (https://www.ietf.org/rfc/ien/ien137.txt)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: chickenHeadKnob on December 27, 2018, 01:46:19 am
I wonder WTF was in the head of Intel when they wanted to use LittleEndian ... it's unnatural for humans

In my first job after uni I worked with some Israeli born engineers, who would argue about anything. With one in particular I would often have knock down drag-em out fights over the stupidest stuff  ;D. He would never admit he was wrong. He was the old school type of engineer who mostly learned about computers in a  self taught way, me a recent CompSci grad who learned electronics in a self taught way.

One day I mentioned endianness and he asked what that was. So I told him. He only knew Intel processors, 8080/8086, and thought I was shitting him when I said Motorola is big endian. Nobody would design a big endian machine, he claimed. His argument was that with an 8 bit ALU and 16 bit addresses you want to get the low order byte first to start the addition right away on a computed effective address. I told him the Motorola 6800 didn't, and it didn't matter as it takes multiple clocks to get the bytes in anyway. He looked at me like I was a retard and not to be trusted with anything sharp.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on December 27, 2018, 01:53:54 am
Little-endian values naturally cast between bytes, halfwords, and words without the need to move things around. And how things are physically located in memory is mostly irrelevant.

That is how I see it.  The address points to the low-order byte, so you can start add and add-with-carry (or subtract-with-borrow) operations immediately, without indexing backwards to find the low-order byte.
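
A minimal sketch of that property (my illustration, assuming a little-endian host; memcpy keeps the compiler honest about aliasing):
Code: [Select]
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint32_t word = 42;  /* small value: fits entirely in the low-order byte */
    uint8_t  byte;

    /* On little-endian, the lowest-addressed byte IS the low-order byte,
       so a narrower read through the same address sees the same value. */
    memcpy(&byte, &word, 1);
    printf("%u %u\n", (unsigned)word, (unsigned)byte);  /* "42 42" on LE */
    return 0;
}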
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 27, 2018, 03:42:57 am
Quote
I wonder WTF was in the head of Intel when they wanted to use LittleEndian ... it's unnatural for humans
Copying the DEC PDP11, as did pretty much all the microprocessor manufacturers at the time. (Although the 68000, with an arguably much-more-PDP11-like instruction set, is big-endian.)

I prefer big-endian, but not enough to fight about it.

It's a minor convenience to have character strings and numbers stored in the same order.

It's a minor convenience to have numbers easily readable in hex dumps -- VAX/VMS printed hex dumps from right to left for that reason.

I can't see any reason to care that storing a 32 bit integer at an address and then reading an 8 bit integer from that address will give the same value, if the 32 bit integer was small. Why on earth would you want to *do* that? Manipulating values in registers is easier and faster anyway.

The designers of RISC-V chose little-endian not because they think it is better (I happen to know they don't) but because x86 dominates servers and ARM dominates mobile and both are little endian, so why make problems for people porting badly-written software to RISC-V?

MIPS, SPARC, Power all started as the more sensible big-endian, but switched (or became bi-endian) to give fewer problems porting x86 and/or ARM software.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 27, 2018, 05:32:02 am
Compilers should be able to support endianness-tagged data pretty effectively, especially on RISC CPUs with a byteswap instruction.  The "load, byteswap" sequence is only a tiny fraction slower than a mere load. Intel's C compiler supports this.  I don't know if anything else does :-(
(I think the standard "high-level" implementations of byte swapping end up being pretty difficult for a compiler to recognize, though.  :-( )
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 27, 2018, 05:38:30 am
How exactly do you tag data?

There are a lot of things that compilers can't do. There are some design decisions that affect how you actually want to store data. When working with networking stuff on an LE system, most of the time you want to convert the data when it enters/leaves the MCU. I don't want it to do the conversion before each operation.

Also, more recent networking standards, like IEEE 802.15.4 and ZigBee, are all LE.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: Nominal Animal on December 27, 2018, 07:39:38 am
If you have specific-endian data in a buffer, current C compilers know how to optimize e.g.
Code: [Select]
static inline uint32_t  unpack_u32le(const unsigned char *const data)
{
    return ((uint32_t)data[0])
         | ((uint32_t)data[1] << 8)
         | ((uint32_t)data[2] << 16)
         | ((uint32_t)data[3] << 24);
}

static inline uint32_t  unpack_32be(const unsigned char *const data)
{
    return ((uint32_t)data[0] << 24)
         | ((uint32_t)data[1] << 16)
         | ((uint32_t)data[2] << 8)
         | ((uint32_t)data[3]);
}
depending on the surrounding code. That's why I don't mind.

It is details like bit order on the wire, or documentation labeling bits with 0 as the most significant bit (so that you need to know the width of the register involved to calculate the corresponding numeric value), that trip me up.

How exactly do you tag data?
GCC uses __attribute__((scalar_storage_order (byte-order))) (https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-scalar_005fstorage_005forder-type-attribute) as a structure type attribute to define the byte order of the scalar members, but I don't like it; I like to have the byte order conversions explicitly visible.
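
For reference, a minimal sketch of how that attribute is applied (GCC 6 or later; the struct and member names are made up for illustration):
Code: [Select]
#include <stdint.h>

/* Scalar members of this struct are stored big-endian in memory regardless
   of the host byte order; GCC inserts the byte swaps on every access. */
struct __attribute__((scalar_storage_order("big-endian"))) wire_header {
    uint32_t magic;
    uint16_t version;
    uint16_t length;
};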
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on December 27, 2018, 09:34:56 am
Quote
I like to have the byte order conversions explicitly visible.
"We" should have learned by now to put the appropriate ntohl() calls in all the appropriate places. Except it's a dangerously ambiguous "standard."  (I guess ideally you need ip_ntohl(), zb_ntohl(), etc.)  And some code uses it to mean "byteswap" (assuming the endianness of the host.)

Quote
most of the time you want to convert the data when it enters/leaves the MCU.
That might be nice, but you seldom know which data needs to be byteswapped (and at what length) until after you've inspected the packet pretty deeply.  Maybe controllers are smarter now, but I never found the network controllers that offered to do byteswapping during DMA to be useful.

Quote
I don't want it to do the conversion before each operation.
Sure; but it turns out that having the compiler do it for each load from memory is not very painful. Intel implemented an attribute, I think, and we had them add capabilities to say "everything defined in .h files with a particular prefix is big-endian."  (We had a huge amount of big-endian code that we wanted to see if we could run on Intel CPUs, without excessive "pain.")  And pragmas.  And defaults.
https://software.intel.com/en-us/node/628923  (it's been a long time, actually.  Stuff might have changed.)

Quote
current C compilers know how to optimize e.g. static inline uint32_t  unpack_32be ...
Which compilers?  gcc-arm didn't optimize it "at all" (for CM4), nor does XCode LLVM  :-(
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 27, 2018, 10:43:23 am
Quote
current C compilers know how to optimize e.g. static inline uint32_t  unpack_32be ...
Which compilers?  gcc-arm didn't optimize it "at all" (for CM4), nor does XCode LLVM  :-(

Using || instead of | isn't going to help.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: Nominal Animal on December 27, 2018, 12:03:20 pm
Which compilers?  gcc-arm didn't optimize it "at all" (for CM4), nor does XCode LLVM  :-(
The generic case yields something like
Code: [Select]
00000000 <unpack_u32le>:
   0: e5d01002 ldrb r1, [r0, #2]
   4: e5d0c001 ldrb ip, [r0, #1]
   8: e5d02000 ldrb r2, [r0]
   c: e1a03801 lsl r3, r1, #16
  10: e183340c orr r3, r3, ip, lsl #8
  14: e1830002 orr r0, r3, r2
  18: e1800c01 orr r0, r0, r1, lsl #24
  1c: e12fff1e bx lr

00000020 <unpack_32be>:
  20: e5d02001 ldrb r2, [r0, #1]
  24: e5d03000 ldrb r3, [r0]
  28: e5d01003 ldrb r1, [r0, #3]
  2c: e5d00002 ldrb r0, [r0, #2]
  30: e1a02802 lsl r2, r2, #16
  34: e1823c03 orr r3, r2, r3, lsl #24
  38: e1833001 orr r3, r3, r1
  3c: e1830400 orr r0, r3, r0, lsl #8
  40: e12fff1e bx lr
which is not optimal, sure, but not absolutely horrible either.

Yes, I do know that on Cortex-M4 the optimal versions are
Code: [Select]
get_native_u32:
    ldr r0, [r0]
    bx  lr

get_byteswap_u32:
    ldr r0, [r0]
    rev r0, r0
    bx lr
since ldr on Cortex-M4 can handle unaligned accesses just fine; but you need something like
Code: [Select]
static inline uint32_t get_native_u32(const unsigned char *const src)
{
    return *(const uint32_t *)src;
}

static inline uint32_t get_byteswap_u32(const unsigned char *const src)
{
    uint32_t  result = *(const uint32_t *)src;
    result = ((result & 0x0000FFFFu) << 16) | ((result >> 16) & 0x0000FFFFu);
    result = ((result & 0x00FF00FFu) << 8)  | ((result >> 8)  & 0x00FF00FFu);
    return result;
}

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define  get_u32le(src)  get_native_u32(src)
#define  get_u32be(src)  get_byteswap_u32(src)
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define  get_u32le(src)  get_byteswap_u32(src)
#define  get_u32be(src)  get_native_u32(src)
#else
#error Unsupported byte order
#endif
to get ARM-GCC to emit that.  Personally, I prefer the first one for readability, but will switch to the latter if it makes a measurable difference at run time.

Using || instead of | isn't going to help.
You meanie. :palm:  Fixed now.

I've actually used such conversion code when storing huge amounts of molecular dynamics data in a binary format. It makes sense to allow the nodes to save the data to local storage in native byte order, with prototype values for each numeric type used to detect the byte order.  When the data is read (and usually filtered/selected), the "slow" byte order conversion is completely masked/hidden by the slow storage I/O bottleneck (spinning disks).
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 27, 2018, 11:59:59 pm
Again, it is a matter of opinion. I personally will not use anything BE. It is just not worth my time. In most cases I want to interoperate with x86 and ARM systems, and I don't want to think about it ever again. No htonl()-like BS for me.

There is probably no reason to hard-switch existing established systems, but if the same thing gets carried over from design to design, then the switch will not hurt in the long run.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 28, 2018, 01:17:12 am
If a CPU has instructions which operate on operands of different sizes located in memory, then LE is certainly more efficient, because fetching/storing of the low byte is the same regardless of the size of the operands. In contrast, BE would require different fetching/storing logic for every supported size.

If the CPU doesn't operate on memory operands directly (load/store architectures such as RISC-V or ARM), then LE/BE doesn't matter. The gain for superscalar CPUs is probably too small anyway. But for a small MCU, such as the dsPIC33, LE is a natural choice.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: Nominal Animal on December 28, 2018, 03:07:33 am
Does byte order make any difference in the complexity of the VHDL/Verilog code? Especially the ALU, or when loading/storing unaligned multibyte values?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 28, 2018, 03:08:35 am
Does byte order make any difference in the complexity of the VHDL/Verilog code? Especially the ALU, or when loading/storing unaligned multibyte values?
Zero difference.

But it is not about convenience of an implementer. It is about convenience of a programmer.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: Nominal Animal on December 28, 2018, 03:25:17 am
(I don't care about convenience, really; I only care if/how it affects the efficiency [resource use] of implementations, and whether one choice allows more alternative implementations than the other.)

The only case I can think of where the byte order actually matters (as in, makes a difference in complexity of implementation), is when accessing arbitrary precision number limbs, or bit strings.

In little-endian byte order, you can use any unsigned integer type to access the i-th bit in the string. That is, if map is sufficiently aligned and large enough,
Code: [Select]
unsigned int get_bit64(const uint64_t *map, const size_t bit)
{
    return !!(map[bit/64] & ((uint64_t)1 << (bit & 63)));
}

unsigned int get_bit8(const uint8_t *map, const size_t bit)
{
    return !!(map[bit / 8] & (1 << (bit & 7)));
}
then you always have get_bit64(map, i) == get_bit8(map, i).  (Ignore any typos in the above code, if you find one.)  Not so with big-endian byte order, where you must use a specific word size to access the bit map.  Granted, it only matters in some rather odd cases, like when different operations wish to access the binary data in different-sized chunks.

Other than that, the byte order really does not seem to affect me as a programmer much.  The fact that there is more than one byte order in use, does, but I guess I'm used to that, having dealt with so many binary data formats with differing byte orders.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: hamster_nz on December 28, 2018, 03:33:21 am
Does byte order make any difference in the complexity of the VHDL/Verilog code? Especially the ALU, or when loading/storing unaligned multibyte values?
Zero difference.

But it is not about convenience of an implementer. It is about convenience of a programmer.

A tiny (but significant) difference is that block memory initialisation values are usually written as long vectors or hex number strings, so avoiding logical byte swaps is helpful for staying sane.

So bit 8 should be logically to the left of bit 7, and bit 0 of the second 32-bit word should be to the left of bit 31 of the first.

The 64-bit vector x"deadbeef12345678" as 32 bit words is {0x12345678, 0xdeadbeef}, and as 8-bit words it is {0x78, 0x56, 0x34, 0x12, 0xef, 0xbe, 0xad, 0xde}. It doesn't HAVE to be that way, but that is the way that causes me least logical confusion.
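
A quick C sanity check of that layout (my sketch; it assumes a little-endian host, which is what makes the numbers come out this way):
Code: [Select]
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint64_t v = 0xdeadbeef12345678ull;
    uint32_t w[2];
    uint8_t  b[8];

    memcpy(w, &v, sizeof v);  /* view the same bytes as 32-bit words */
    memcpy(b, &v, sizeof v);  /* ... and as individual bytes */

    printf("words: 0x%08x 0x%08x\n", (unsigned)w[0], (unsigned)w[1]);
    for (int i = 0; i < 8; i++)       /* 78 56 34 12 ef be ad de */
        printf("%02x ", b[i]);
    printf("\n");
    return 0;
}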
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 28, 2018, 03:35:35 am
That's FPGA specific. And I personally use scripts to generate memory files from binaries, so it makes no difference to me anyway.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on December 28, 2018, 04:18:48 am
Does byte order make any difference in the complexity of the VHDL/Verilog code? Especially the ALU, or when loading/storing unaligned multibyte values?

The ALU doesn't have any byte order, so obviously there's no difference for the ALU.

Loading/storing unaligned multi-byte values requires certain complexity - depending on the actual alignment the memory may need to be accessed twice, then the results need to be synchronized. Thus, if you're doing all this stuff, there will be quite a bit of logic to deal with the alignment, which will dwarf any benefits of LE.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 28, 2018, 01:04:29 pm
What I would really, really like to see is a tensor processing unit implemented to help with the A.I. stuff, as well as the calculus concerning space-time. New theories seem to imply extra dimensions (more than four ... e.g. for supersymmetric particles, aka "SUSY"), and that means a lot of computation on bigger matrices: we definitely need a tensor processing unit like the one implemented by Google, or an even better one.

For Google, it's a sort of special add-on: hardware which looks like a GPU attached to a PCIe bus, therefore external to the CPU.

Maybe someone in the near future will implement one directly coupled to the CPU, a sort of super-coprocessor with the proper instruction set and accommodations.

This would boost performance by several orders of magnitude, which matters, considering that CERN is going to build a new particle accelerator, bigger than the LHC, with a 100 km circumference and an energy of 100 TeV, which means 10x the data we have to process today.

In Japan, they are considering something similar, smaller in energy and dedicated to the Higgs boson, which the LHC currently produces only about once per million interactions, while tracking particles with a precision of microns, and only in 4-dimensional matrices.

This produces a daily stream of 20 petabytes. In the near future (by 2024?), we will have to handle 10x that value. At least.

Anyway ... what is considered the most advanced computer for scientific purposes? DARPA is promoting and funding IBM's (supermassive, many-core) POWER9, for which they have resurrected the old AIX UNIX; so what is CERN using to process 20 petabytes of daily streams?

Looking at pics of CERN's computer room ... I see ... a cluster of boring Xeon x86 PeeeeeCeeees :palm:
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 28, 2018, 05:05:07 pm
All the things described above are only problems if you think that BE is better. In that case the other option will create problems. If you think that LE is better, then there are no such problems.

Xeons are the highest-performing general-purpose processors on the market right now, so why wouldn't CERN use them?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 28, 2018, 05:14:19 pm
All the things described above are only problems if you think that BE is better. In that case

In that case, I have valid points, whereas you have ... none.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 28, 2018, 05:17:22 pm
Sure, why not. You can continue using  BE while the rest of the world is clearly switching to LE.

I personally don't really care which one to use; I will use whatever is used by all (or the majority) of the platforms I care about. And all of them happen to be LE.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 28, 2018, 05:44:05 pm
Xeons are the highest-performing general-purpose processors on the market right now, so why wouldn't CERN use them?

Well, on the LHC they are managing equipment with currents up to 11 kA, so they are clearly promoting technological innovation. Applied to computer science, that means they should promote a better computer architecture; therefore CERN should use the new POWER9, which is more interesting and better performing, and easier to modify for an embedded tensor processor, not to mention the built-in supercomputing capabilities that make it easier to aggregate CPUs in a multiprocessing scenario with a sort of NUMAlink. This implies mechanisms in the hardware, and specific instructions ... all things that Intel does not have ready, since their recent attempts to fix this are still bugged.

Besides, x86 sucks and it's a very shitty architecture. But it's cheap and massively produced and used, which makes it even cheaper.

The POWER9 is currently the most advanced architecture on the planet; it's neat and strong, and well designed. Unfortunately ... IBM and DARPA have an exclusive agreement, which means we cannot buy and use their supermassive POWER9, only machines with deliberately reduced performance :palm: :palm: :palm:

Our *little* P9Ty2 cost 4500 euros + FedEx S/H from the US + import tax = a lot of money, and you can imagine that even this doesn't help: the performance/cost ratio is clearly worse than Intel x86's, for which the software is also more stable.

Linux on PPC64/LE (whose profile is a subset of the POWER9's) is ... not exactly *stable*. Talking about Gentoo ... it's really, really "experimental", and that doesn't help either.

At CERN they have strict budget and time limits, and a lot of computers to buy and manage, so they treat the performance/cost ratio as the first constraint to be respected.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 28, 2018, 05:46:33 pm
Oh, about that: if a RISC-V board with PCIe is manufactured, and the guys at Gentoo decide to support it, I will be happy to support it too  :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on December 28, 2018, 05:46:51 pm
CERN has to do science and use whatever equipment works for that purpose. Nothing else.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 28, 2018, 06:03:38 pm
CERN *literally* invented the Web(1) and now has a good occasion to design a real tensor processor. Otherwise (my speculation) we will only end up using what Google did with their TPU (v3?), which was made for a different purpose: specifically speeding up A.I. algorithms.

(1) as a visitor, I saw a self-congratulatory plaque signed by the authors. WoW  :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 28, 2018, 11:14:41 pm
(http://olavea.com/wp-content/uploads/2014/10/Tim-berners-lee.png)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 30, 2018, 11:17:44 am
Is there a document about all the requirements and specs for making Linux run on RISC-V?
Is there a RISC-V board with PCIe or PCI?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 30, 2018, 11:21:18 am
Quote
U54-MC
The SiFive U54-MC Standard Core is the world’s first RISC-V application processor, capable of supporting full-featured operating systems such as Linux.

The U54-MC has 4x 64-bit U5 cores and 1x 64-bit S5 core—providing high performance with maximum efficiency. This core is an ideal choice for low-cost Linux applications such as IoT nodes and gateways, storage, and networking.

it seems *this* is the big-guy  :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 30, 2018, 11:37:11 am
Quote
RISC-V system emulator supporting the RV128IMAFDQC base ISA (user level ISA version 2.2, privileged architecture version 1.10) including:
32/64/128 bit integer registers
32/64/128 bit floating point instructions

Who is willing to use 128-bit integer and FP registers? And for which applications?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 30, 2018, 11:48:03 am
Quote
4) Technical notes
------------------

4.1) 128 bit support

The RISC-V specification does not define all the instruction encodings for the 128 bit integer and floating point operations. The missing ones were interpolated from the 32 and 64 ones.

Unfortunately there is no RISC-V 128 bit toolchain nor OS now (volunteers for the Linux port ?), so rv128test.bin may be the first 128 bit code for RISC-V !


So it's completely ... experimental. But I still wonder WHO needs 128-bit registers, and for what  :-//
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 30, 2018, 02:58:39 pm
Is there a document about all the requirements and specs for making Linux run on RISC-V?

The Linux kernel supports RISC-V. There are I suppose at least half a dozen Linux distributions that support RISC-V, the most heavily used probably being Debian, Fedora, and Buildroot.

The main requirement of a board maker is to implement a bootloader and the SBI (Supervisor Binary Interface). The most commonly used bootloader at the moment is BBL (Berkeley BootLoader, which also implements SBI), but it's pretty crude so there is a lot of work going into others such as Das U-Boot and coreboot.

Quote
Is there a RISC-V board with PCIe or PCI?

I don't know of a board at the moment with PCI built in. That will come during 2019, probably at an under-$500 all-up price.

Right now the usual way to get PCI is to attach a HiFive Unleashed ($1000) board's FMC connector to either a Xilinx VC707 board ($3500) with https://github.com/sifive/fpga-shells/tree/master/xilinx/vc707 loaded in the FPGA, or else a MicroSemi HiFive Unleashed Expansion Board ($2000) with https://github.com/sifive/fpga-shells/tree/master/microsemi loaded in the FPGA.

So that's kinda expensive for amateurs or hobbyists at the moment, but if a company is paying an engineer to work on RISC-V Linux stuff then that's a week or so of salary, so not a big deal.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rhodges on December 30, 2018, 03:04:20 pm
So it's completely ... experimental. But I still wonder WHO needs 128bit registers, and for what  :-//
Maybe it would be useful for Single Instruction Multiple Data, without having separate instructions? The programmer or compiler would have to avoid carry or overflow between the words. That would suggest the desire for a flag to suppress carry and overflow between 8-, 16-, and 32-bit data.
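
For what it's worth, the carry-suppression idea can be faked today in plain C by masking, so carries never cross a lane boundary ("SWAR", SIMD within a register). A minimal sketch of a packed 8-bit add, mine for illustration and not tied to any ISA:
Code: [Select]
#include <stdint.h>

/* Add four packed 8-bit lanes held in one 32-bit word, without letting
   carries cross the 8-bit lane boundaries. */
uint32_t add_u8x4(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; no carry can escape a lane. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Fold the top bit of each lane back in with XOR (add without carry-out). */
    return low ^ ((a ^ b) & 0x80808080u);
}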
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 30, 2018, 03:12:28 pm
The Linux kernel supports RISC-V. There are I suppose at least half a dozen Linux distributions that support RISC-V, the most heavily used probably being Debian, Fedora, and Buildroot.

The main requirement of a board maker is to implement a bootloader and the SBI (Supervisor Binary Interface). The most commonly used bootloader at the moment is BBL (Berkeley BootLoader, which also implements SBI), but it's pretty crude so there is a lot of work going into others such as Das U-Boot and coreboot.

Gentoo is not on the list, and I am a supporter: it means I am not interested in anything else except a config file for Catalyst, plus a profile for building a reasonable stage3. This, unfortunately, does not yet exist, and it's not clear to me what I have to support. Specifically, when I say "doc and spec", I mean the MMU/TLB and the ISA extensions, since all I got was an emulator which ... -1- is OS-less (and I can't see any MMU/TLB code/doc/whatever), and -2- does not compile  :palm:

So, what do I have to support for the MMU and the "privileged" rings?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 30, 2018, 03:15:56 pm
Maybe it would be useful for Single Instruction Multiple Data, without having separate instructions?

I've never come across the need: examples from the real world?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 30, 2018, 03:26:38 pm
Quote
4) Technical notes
------------------

4.1) 128 bit support

The RISC-V specification does not define all the instruction encodings for the 128 bit integer and floating point operations. The missing ones were interpolated from the 32 and 64 ones.

Unfortunately there is no RISC-V 128 bit toolchain nor OS now (volunteers for the Linux port ?), so rv128test.bin may be the first 128 bit code for RISC-V !

So it's completely ... experimental. But I still wonder WHO needs 128-bit registers, and for what  :-//

Right now it's just about designing RISC-V so as to make sure there won't be any nasty surprises in transitioning from 64 bit to 128 bit at some time in the future. Space has been left for opcodes, and there's a pretty obvious way that it could work, just filling in the obvious blanks.

As you'll know, x86 suffered some pretty big compatibility and other problems figuring out how to go from 16 bit to 32 bit to 64 bit, as did MIPS, SPARC and others. PowerPC was designed with 64 bit in mind from the start. ARM changed their ISA totally for 64 bit and appear to have made a successful transition despite this. Alpha and Itanic on the other hand were also incompatible with their companies' previous 32 bit architectures and failed to gain traction.

Demanding technical users started to transition from 32 bits to 64 bits in the early 90s with the MIPS III R4000 in 1991 and the DEC Alpha in 1992. Home and business computers went to 64 bit starting with the Athlon 64 in 2003, Xeons in 2004, and Core 2 in 2006. Mobile phones went to 64 bit starting with the iPhone 5s in late 2013 and the Galaxy S6 in early 2015.

I think 128 bit may start to be used in large datacentres by 2030, but not in homes or offices until maybe 2050. Some high-security users might want it sooner in order to use some address bits for tags, or to provide a sparse address space (a super-ASLR).

I expect someone will start making experimental 128 bit RISC-V chips within the next two years (it's pretty trivial to do .. just somewhat pointless at present), just to seed them to academia and certain TLAs to get experience.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 30, 2018, 03:31:53 pm
The Linux kernel supports RISC-V. There are I suppose at least half a dozen Linux distributions that support RISC-V, the most heavily used probably being Debian, Fedora, and Buildroot.

The main requirement of a board maker is to implement a bootloader and the SBI (Supervisor Binary Interface). The most commonly used bootloader at the moment is BBL (Berkeley BootLoader, which also implements SBI), but it's pretty crude so there is a lot of work going into others such as Das U-Boot and coreboot.

Gentoo is not on the list, and I am a supporter: it means I am not interested in anything else except a config file for Catalyst, plus a profile for building a reasonable stage3. This, unfortunately, does not yet exist, and it's not clear to me what I have to support. Specifically, when I say "doc and spec", I mean the MMU/TLB and the ISA extensions, since all I got was an emulator which ... -1- is OS-less (and I can't see any MMU/TLB code/doc/whatever), and -2- does not compile  :palm:

So, what do I have to support for the MMU and the "privileged" rings?

The MMU/TLB is explicitly *not* defined by the RISC-V specification. That's entirely up to the individual chip manufacturer.

The page table layout in memory is specified. How that makes its way into a TLB (software or hardware PT walker) or how the TLB works is not. There are SBI calls to do things such as update or flush the TLB. It's up to the board/chip vendor to write those. The OS simply uses them.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: rhodges on December 30, 2018, 04:41:51 pm
Maybe it would be useful for Single Instruction Multiple Data, without having separate instructions?
I've never come across the need: examples from the real world?
Close. The VLIW PNX1302 had 128 32-bit registers. They could do ordinary arithmetic, and there were also instructions for doing 4 8-bit or 2 16-bit operations on them. Think of the carry bit as being suppressed at the 8-bit or 16-bit boundaries. There was also the option of "saturation", where overflow resulted in the maximum or minimum values.
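
For comparison, the scalar equivalent of one saturating lane, sketched in C (my illustration; the packed hardware instruction effectively does this for every lane at once):
Code: [Select]
#include <stdint.h>

/* Unsigned 8-bit add that clamps at 255 instead of wrapping around. */
static inline uint8_t add_u8_saturate(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}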

This is similar to the SIMD extensions to x86 (and probably all SIMD), but with the main register set. I believe x86 SIMD uses separate (FP?) registers.

This CPU was intended for media processing, so maybe it would be good to think of it as a SIMD processor that also did ordinary CPU functions.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: Nominal Animal on December 31, 2018, 08:12:38 am
I believe x86 SIMD uses separate registers.
Correct. SSE2 has 16 128-bit arithmetic registers XMM0 - XMM15, that can be treated as two double-precision floating-point numbers, four single-precision floating-point numbers, two 64-bit integers, four 32-bit integers, eight 16-bit integers, or sixteen 8-bit integers, signed or unsigned, depending on the instruction.  These are distinct from 387 floating-point registers. AVX renames those to YMM0-YMM15, extending them to 256 bits, AVX2 adding 128-bit and 256-bit integer operations. AVX512 renames them to ZMM0-ZMM31, not just doubling their number but extending them to 512 bits.

These are completely separate from the normal AMD64 (x86-64) general-purpose registers (rax, rbx, rcx, rdx, rsi, rdi, rbp, r8, r9, r10, r11, r12, r13, r14, and r15), and use a completely different set of instructions.

Single-precision floating point vectors are widely used in image and geometry processing (including wavelet transforms and such, unless done using a GPU), and also heavy sound analysis (single-precision FFT/DFT/Hartley transforms and such). Double-precision floating point vectors are heavily used in computational physics -- basically both ab-initio (quantum mechanics; vasp et cetera) and classical (potentials; lammps, gromacs et cetera).  Using the binary operations on the floating-point values is also surprisingly common (absolute values, min/max, masking/conditionals).  The major use for the various integer operations is speeding up cryptographic operations, which nowadays are absolutely ubiquitous; not just in securing socket communications, but in internal kernel operations (like ensuring unpredictability of kernel random number sources).

As to the underlying microcode and hardware implementation, it looks like Intel and AMD implementations do differ quite a bit. Mathematically their results are identical, but how different operations pipeline, and how efficient vector-intensive operations are, varies a lot between processor families.

For accelerating cryptographic operations, double-width unsigned integer multiplication is an absolute must. (Meaning, you really need a multiplication operation C = B × A where C is a pair of registers, or double the width of A and B. Apologies for the poor terminology; me fail English today worse.) The size of the unsigned integers we deal with will only increase; right now ordinary workstations do a surprising amount of work on 2048-bit and larger unsigned integers.  So, it is not just the size of the registers that matters; the basic operations (addition, subtraction, multiplication, and, or, xor, not) must also be fast/efficient enough to warrant their use.
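
To make that concrete: a sketch (mine, for illustration) of what a full 64×64→128-bit unsigned multiply costs in portable C when no double-width multiply or __int128 is available. A single mulhu-style instruction replaces essentially all of it:
Code: [Select]
#include <stdint.h>

/* Full 64x64 -> 128-bit unsigned multiply built from 32-bit halves;
   the high word lands in *hi, the low word in *lo. */
void mul_u64_wide(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;  /* four partial products, each fits 64 bits */
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Carry out of the middle 32-bit column. */
    uint64_t carry = ((p0 >> 32) + (uint32_t)p1 + (uint32_t)p2) >> 32;

    *lo = p0 + (p1 << 32) + (p2 << 32);  /* wraps mod 2^64, as it should */
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + carry;
}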

(It turns out that at least some Intel implementations of AVX2 and AVX512 are not really worth the extra cost when mostly using double-precision floating-point vectors. Bummer.  But that is the reason CERN and others doing heavy physics computations really do not want to run on the very newest hardware, but on hardware chosen based on the amount of computation per euro achieved.  Theoretical gains look nice on paper, but practice trumps theory.  That said, a lot of the existing simulation software and surrounding services (the CERN data is structured, not just "flat files") have horribly inefficient designs, and a lot more could be done to fix that... but don't get me started on that.  And yes, I have been an admin of an HPC cluster used to munch on terabytes of CERN data. I even built an auto-evaluation Linux USB stick with actual simulations for vendors, to measure the performance of vendor offerings for a new cluster acquisition, once.)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: JPortici on December 31, 2018, 09:13:59 am
Maybe it would be useful for Single Instruction Multiple Data, without having separate instructions?

I've never come across the need: examples from the real world?


Audio manipulation of any kind
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 31, 2018, 12:32:18 pm
Quote
Moreover, tinyemu assumes a little endian host so it has no chance to work on a PPC machine.

So TinyEMU is LE-only. It emulates x86 and RISC-V (32-, 64-, and 128-bit).
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: LapTop006 on December 31, 2018, 12:56:23 pm
... Alpha and Itanic on the other hand were also incompatible with their companies' previous 32 bit architectures and failed to gain traction.

I disagree on the Alpha side: it had great traction; however, two big things killed it early.

1. Windows 2000 killed Alpha support (late in the betas), cutting off a significant market. Windows support was also 32-bit-mode only, with the benefits fading away as x86 machines were improving around that time.
2. When HP bought Compaq, Alpha, like PA-RISC, was put on minimal life support with hopes pinned on Itanium.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on December 31, 2018, 01:15:52 pm
I have been an admin of an HPC cluster used to munch on terabytes of CERN data

I see there was a plan for introducing IBM POWER9 HPC: maybe it will be reconsidered for 2019? I hope  :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: Nominal Animal on December 31, 2018, 02:30:17 pm
I see there was a plan for introducing IBM POWER9 HPC: maybe it will be reconsidered for 2019? I hope  :D
I've been out of touch for a few years now (I'm no longer on the mailing lists etc.), so I don't know; but I definitely hope different systems are still tested here and there, and considered for wider adoption.

I'm sure you're perfectly aware that it isn't just the hardware; the full software stack needs to be there too to take advantage of it, and having a few SW engineers ensure compilers have good hardware support, and researchers port simulators to different architectures, leads to a better field of competition and more bang for the buck for users.  Unfortunately, that also tends to uncover software and simulation model bugs, and established professors don't like that, because it means they'd need to issue corrections to past publications, possibly ones with lots of citations; bad for reputation.  So that is only done when you have professors more interested in the research than in possible small dings to their own reputation, but still crafty enough to talk money out of politicians. Rare.

The overall data processing model is tiered, and individual clusters, especially at Tier-2, are not tied to any particular hardware (although users do need a dev environment to compile their simulators for each HW architecture).  Putting up a testbed POWER9 cluster at some university, with both local and CERN computation tasks, would be a perfect opportunity to fine-tune the Linux support, compiler options, and so on, giving practical real-world data as to the capabilities and efficiency of POWER9 in physics/chemistry HPC use.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on December 31, 2018, 11:50:49 pm
... Alpha and Itanic on the other hand were also incompatible with their companies' previous 32 bit architectures and failed to gain traction.

I disagree on the Alpha side: it had great traction; however, two big things killed it early.

1. Windows 2000 killed Alpha support (late in the betas), cutting off a significant market. Windows support was also 32-bit-mode only, with the benefits fading away as x86 machines were improving around that time.
2. When HP bought Compaq, Alpha, like PA-RISC, was put on minimal life support with hopes pinned on Itanium.

If it had great traction then neither of those things would have happened!

I'm sure there were a number of people who loved Alpha and invested heavily into systems and software using it, and that they were very upset when it was axed. I thought it was a wonderful design myself, with a future ahead of it much longer than the 25 years DEC said they designed it for.

If NT4 on Alpha was tearing up the market and people were throwing out their Pentium Pro/II/III all over the place then you can be sure Microsoft would have supported Alpha in Windows 2000.

The *huge* FUD campaign about Itanic resulting in the corporate deaths of the perfectly good Alpha, PA-RISC and others is the MAIN REASON I'm such a big fan of RISC-V. It's not that it's technically superior, it's that if I (or you) invest in it no one can take it away from us. One or many RISC-V companies may fail in the heat of competition, but others can continue.

By far the biggest thing wrong with Alpha was that it was owned by a company that failed.

Alpha did have technical problems. Mostly just that the program code was too big -- I think they simply didn't realise that once speeds went significantly over 1 GHz (which Alpha never did) it wasn't going to be possible to continue scaling up the L1 icache size while maintaining quick access. 21064 and 21164 had 8 KB icache and 21264 had 64 KB. Pretty much everyone these days has stopped at (or fallen back to) 32 KB for L1 icache and it's important to be able to fit as much program code as possible into that.

The minor thing wrong with Alpha was the lack of 8 and 16 bit loads and stores. That's -- again -- more of a code size problem than a speed problem, but anyway they fixed it in the 2nd (21164) generation.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 01, 2019, 04:32:32 pm
Why do we need 128 bits for audio algorithms?
And specifically, why 128-bit integers, rather than fixed-point or floating-point?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on January 01, 2019, 06:08:56 pm
Why do we need 128 bits for audio algorithms?
And specifically, why 128-bit integers, rather than fixed-point or floating-point?

You don't need big integers for audio/video, but big SIMD registers holding several smaller integers are very handy for audio/video because they let you do operations in parallel.

Big integers are good for public key cryptography (such as RSA or DH) where you currently deal with 2048-bit integers, which are likely to get even bigger as the algorithm strength frenzy moves forward. Having 128x128 multiplication would make these algorithms more efficient. 256x256 multiplication would be even better. I don't think this cause justifies 128-bit CPUs though.
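
To sketch where that multiply sits (my illustration; assumes a compiler providing unsigned __int128, as GCC and Clang do on 64-bit targets), here is one inner row of schoolbook big-number multiplication. Each 128-bit intermediate is exactly the double-width product being discussed:
Code: [Select]
#include <stddef.h>
#include <stdint.h>

/* dst[0..n] += a[0..n-1] * b, over 64-bit limbs. */
void mul_add_row(uint64_t *dst, const uint64_t *a, size_t n, uint64_t b)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned __int128 t = (unsigned __int128)a[i] * b + dst[i] + carry;
        dst[i] = (uint64_t)t;          /* keep the low limb */
        carry  = (uint64_t)(t >> 64);  /* propagate the high limb */
    }
    dst[n] += carry;
}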
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: cepwin on January 20, 2019, 12:10:45 am
After hearing PlatformIO discussed on a podcast, I decided to check it out this afternoon.   I too am disappointed that it requires a subscription simply to use the debugger.   To me, that's a basic feature they're putting behind a paywall.    I am going to check out the Freedom IDE.
Update:  Well, since FreedomStudio is only for SiFive-based processors and won't work with Atmel boards, it's not what I'm looking for now.   Paying for the debugger in PlatformIO bugs me, but if one can use one IDE with RISC-V and Atmel-based boards (as well as many others), it might be worth it.   The alternative is to use a separate IDE for boards based on different companies' chips.....
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on January 20, 2019, 01:07:02 am
The platformio guy has done a nice job but I think he's being just a little too greedy on the licensing. I've heard he's been approached by people willing to buy the rights and give the software away but he thinks he's going to get rich doing it how he is. If it's just one guy then at $10/month he only needs 100 customers to have a pretty nice income in Ukraine or 1000 customers for a pretty nice income anywhere in the world.

If it was $1 a month I wouldn't hesitate.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: cepwin on January 20, 2019, 01:32:38 am
I have to agree with you, Bruce.....given that most IDEs are free (your company's FreedomStudio, Atmel Studio, etc) charging $10/mo is a bit much.  He can argue he has a community edition, but community editions should have all the basic functions, and that includes debugging.   Of course Arduino has no debugging.

I decided to go on their forums and respectfully present my impressions including the debugger paywall issue.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on January 20, 2019, 01:53:01 am
You might if you're lucky get $100k to $1m revenue a year selling dev tools, but if you give away the dev tools you can get $100m to $1b+ revenue a year selling chips or finished products.

That seems to me like an easy calculation for any Atmel, Microchip, SiFive, Xilinx, Apple, Microsoft...
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on January 20, 2019, 03:45:02 am
You might if you're lucky get $100k to $1m revenue a year selling dev tools, but if you give away the dev tools you can get $100m to $1b+ revenue a year selling chips or finished products.

That seems to me like an easy calculation for any Atmel, Microchip, SiFive, Xilinx, Apple, Microsoft...

I didn't know SiFive was making $100m to $1b+ selling chips.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on January 20, 2019, 04:56:12 am
You might if you're lucky get $100k to $1m revenue a year selling dev tools, but if you give away the dev tools you can get $100m to $1b+ revenue a year selling chips or finished products.

That seems to me like an easy calculation for any Atmel, Microchip, SiFive, Xilinx, Apple, Microsoft...

I didn't know SiFive was making $100m to $1b+ selling chips.

The RISC-V business isn't yet, but the overall revenue including other chips and IP may be -- Glassdoor thinks it is, but Crunchbase doesn't.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 20, 2019, 12:56:49 pm
.....given that most IDEs are free (your company's FreedomStudio, Atmel Studio, etc) charging $10/mo is a bit much.

I do find it irritating, to be honest. Atmel is a large company that does a lot of business selling chips and services, while a freelancer can only be engaged for paid jobs. So, supposing you are a developer and you have developed a high-quality softcore with a built-in debugger: are you really willing to put it on the internet for free, just for the glory?

There are now those who happily release their tools for free and then ask people to contribute via crowdfunding on Patreon. Some are really smart and do cool stuff, some are ... not, but this model doesn't work very well; it only works sometimes. Anyway, people on YouTube make money by releasing videos with the precise aim of driving their view counters up, so they can get the attention of a sponsor who will pay them to advertise products in their videos.

This always works, and those who release products/services for 1 USD/month are usually in this basket, and their quality is usually low, unless they are big companies, like Atmel, or great artists like Benjamin J. Heckendorn (https://en.wikipedia.org/wiki/Benjamin_Heckendorn), who promotes Element14 on YouTube.

I have a friend who is a professional filmmaker; she said that making a decent video takes her 3 working days, full time, with skills in Premiere, Final Cut, Autodesk Inferno, and manual painting on a graphics tablet. This is usually charged at 200 USD/day in a decent filmmaking studio, while she doesn't get a cent for her work when she releases something on YouTube for free, except ... she gets credit from followers, and sometimes gadgets (free comics, free t-shirts, free mugs).

Now she has got her view counter up to a decent value, so she is able to catch the attention of sponsors who are happy to pay her for advertising. This also applies to websites promoting "open source" projects, since you have to find a way to earn a living.

So what I find really irritating is the assumption that, because someone wants to release his/her projects for free, we should all do the same; and, worse still, I find even more irritating the assumption that, because Atmel can release something for free, we should do the same.

FSK, really  |O
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 20, 2019, 01:03:13 pm
You might if you're lucky get $100k to $1m revenue a year selling dev tools, but if you give away the dev tools you can get $100m to $1b+ revenue a year selling chips or finished products.

That seems to me like an easy calculation for any Atmel, Microchip, SiFive, Xilinx, Apple, Microsoft...

Yup, precisely.
Not yet sure about SiFive, anyway.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on January 20, 2019, 02:02:11 pm
So what I find really irritating is the assumption that, because someone wants to release his/her projects for free, we should all do the same; and, worse still, I find even more irritating the assumption that, because Atmel can release something for free, we should do the same.

You're absolutely right. With lots of free crap around, it's incredibly hard to sell software nowadays. If this guy can do it and make a profit, I admire him.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: cepwin on January 20, 2019, 02:52:11 pm
I have to stand corrected to some extent.    You can debug without the unified debugger... it's just a bit more work (according to the pio forum).  In this case the additional ease *is* a professional feature, and quite frankly, after wrestling with Eclipse last night (as an alternative), I am strongly considering paying for it.  As I mentioned when I posted on their forum, it is an impressive product.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: cepwin on January 20, 2019, 03:38:33 pm
Someone on an unrelated site I'm on talks a lot about the fact that the way to make money in a business is to solve people's specific problems.  Clearly he has solved, to a large extent, the problem of needing separate environments for different types of chips, as well as the difficulty of getting up and running in platforms such as Eclipse.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on January 21, 2019, 08:20:11 pm
You might if you're lucky get $100k to $1m revenue a year selling dev tools, but if you give away the dev tools you can get $100m to $1b+ revenue a year selling chips or finished products.

That seems to me like an easy calculation for any Atmel, Microchip, SiFive, Xilinx, Apple, Microsoft...

Yup, precisely.
Not yet sure about SiFive, anyway.

I'm not 100% certain either, but it seemed worth taking a punt!
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: langwadt on January 21, 2019, 08:32:21 pm
So what I find really irritating is the assumption that, because someone wants to release his/her projects for free, we should all do the same; and, worse still, I find even more irritating the assumption that, because Atmel can release something for free, we should do the same.

FSK, really  |O

That's the free market: you can set your price to whatever you want, and people decide whether they think it is worth paying that price.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: Nominal Animal on January 21, 2019, 09:04:45 pm
Incorrect assumptions do tend to be infuriating.

But.. how do you Frequency Shift Key assumptions?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: KL27x on January 22, 2019, 12:48:01 am
Quote
So what I find really irritating is the assumption that, because someone wants to release his/her projects for free, we should all do the same; and, worse still, I find even more irritating the assumption that, because Atmel can release something for free, we should do the same.
This is even more annoying when the person saying it makes $$ from YT videos, and the only creations/inventions they have so generously shared might have been good for some views and hits but ultimately belong only in a dumpster.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: cepwin on January 23, 2019, 02:21:44 am
I finally got to watch the videos... very interesting.  Cool board too.    I also thought his presentation was very clear.   My only complaint was that the red used for the instruction codes was very hard to read.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: David Hess on January 23, 2019, 07:18:23 pm
The minor thing wrong with Alpha was the lack of 8 and 16 bit loads and stores. That's -- again -- more of a code size problem than a speed problem, but anyway they fixed it in the 2nd (21164) generation.

A major thing, and this will sound familiar for Itanium, Mill, and perhaps RISC-V, was that Alpha was designed assuming an alternative to out-of-order execution which so far has not been found.

Another serious problem was the weak memory ordering; it seems great in theory but sure makes things difficult.  PowerPC and ARM suffer from this as well.  But everybody loves to recompile for every new implementation.  Right?  RIGHT?  Hello?

PS - I really hate doing searches on subjects like these and finding posts from 13 years ago (!) where I discussed them.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 23, 2019, 07:50:48 pm
The worst effects of memory reordering can only be observed when lock-free programming techniques are used.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on January 23, 2019, 10:02:22 pm
A major thing, and this will sound familiar for Itanium, Mill, and perhaps RISC-V, was that Alpha was designed assuming an alternative to out-of-order execution which so far has not been found.

I don't understand what you mean by this.

Certainly, Itanium and Mill are both something like VLIW, with "instructions" containing a number of operations that have been proven to be safe to execute at the same time.

The semantics of RISC-V are that the program must appear to other code on the same core as if the instructions were executed sequentially. I'd have thought Alpha was the same.

Quote
Another serious problem was the weak memory ordering; it seems great in theory but sure makes things difficult.  PowerPC and ARM suffer from this as well.  But everybody loves to recompile for every new implementation.  Right?  RIGHT?  Hello?

You don't have to recompile for a new implementation if you followed the published memory consistency rules.

If you just hacked things until your program seemed to work then you might need to.

The RISC-V memory model has been developed by a group of industry and academic experts who are very familiar with the problems with the ARM and Alpha models. There is some almost two year old information here: https://riscv.org/2017/04/risc-v-memory-consistency-model/ The experts have since finished their work and the RISC-V memory model has been ratified -- in fact it's the first thing to be ratified, even before the base instruction set.

I think this is a great example of the strength of the RISC-V governance approach. Things take a bit longer than just having half a dozen people in a room at some company have a meeting and make a decision, but it's also much more likely to be correct.

Other good examples are the work of the "Fast Interrupts" working group and the Vector Extension working group.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 25, 2019, 01:09:06 pm
PowerPC and ARM suffer from this as well.  But everybody loves to recompile for every new implementation.  Right?  RIGHT?  Hello?

OT:
I have just wasted two weeks of my time updating { HPPA{2.0BE}, PPC{32BE}, MIPS{32LE, 32BE, 3BE } } Linux stage4s, because new packages and libraries need to use the same C++ ABI.

Code: [Select]

 [4] gcc-4.1.2 (needed for legacy reasons)
 [5] gcc-4.3.6 (needed for legacy reasons)
 [6] gcc-4.4.7 (needed for legacy reasons)
 [7] gcc-4.5.4 (needed for legacy reasons)
 [8] gcc-6.4.0 <----- compiled with this
 [9] gcc-7.3.0 <----- now I am with this

Things compiled by GCC v6.4.0 are completely incompatible with things compiled by GCC v7.3.0.
It basically means stages 1-3 and 4 need to be completely wiped and rebuilt from scratch.

Electricity, effort, time, and money wasted  :palm: :palm: :palm:

Digging deeper, I see changes in the C++ ABI, so things compiled by different versions of the compiler cannot work together.



There is a similar reason for being obliged to recompile things using the "memory barrier" support offered by C++11/$version, since $version_A is different from $version_B, and this is a very serious problem when you switch from a multi-CPU SMP machine to a multi-core SMP machine  :palm:


edit: when you update the C++ compiler ... you might see mismatches like this
Code: [Select]
Mismatch between the program and library build versions detected.
The library used 3.0 (wchar_t,compiler with C++ ABI 1010,wx containers,compatible with 2.8),
and your program used 3.0 (wchar_t,compiler with C++ ABI 1011,wx containers,compatible with 2.8).
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 25, 2019, 01:44:41 pm
You don't have to recompile for a new implementation if you followed the published memory consistency rules.

sure, you have to!

The powerpc440 and powerpc460 have different instructions for handling memory ordering. Besides, when your code uses POSIX semaphores to coordinate the beginning and end of each loop, sometimes the code uses asm volatile("" ::: "memory") to prevent compiler reordering (which would otherwise certainly make a mess); sometimes that is okay and enough for the job, but sometimes (usually in multicore SMP environments) it is not enough to avoid confusion, so you explicitly need to prevent memory reordering with a StoreLoad barrier instruction, *IF* and only *IF* one is available, e.g.  asm volatile("sync" ::: "memory");

This one has a different implementation on PowerPC vs PowerPC-embedded: e.g. vs ppc-e500  :palm:
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: Nominal Animal on January 25, 2019, 05:07:21 pm
(An aside: the STT_GNU_IFUNC extension to the ELF standard is very useful if you have that kind of variant -- functions that need to be implemented slightly differently depending on small variations in the processor or architecture.  They're pretty easy to use (https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes), too; just implement each variant and one resolver function that returns the pointer to the variant to be used.  The dynamic linker calls that resolver function at runtime, so there is no extra indirection cost either, just a tiny startup delay cost of running those resolver functions once.)
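A minimal sketch of the mechanism (GCC on GNU/Linux; my_copy, the variants, and the capability flag are made-up names, and the probe is a placeholder):
Code: [Select]
#include <stddef.h>
#include <string.h>

/* Two interchangeable variants of the same function. */
static void *copy_generic(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *copy_tuned(void *d, const void *s, size_t n) { return memcpy(d, s, n); }

/* Resolver: run once by the dynamic linker; returns the variant to use. */
static void *(*resolve_copy(void))(void *, const void *, size_t)
{
    int cpu_has_fast_path = 0;        /* placeholder capability probe */
    return cpu_has_fast_path ? copy_tuned : copy_generic;
}

/* Callers just call my_copy(); the choice is made once, at load time. */
void *my_copy(void *d, const void *s, size_t n)
    __attribute__((ifunc("resolve_copy")));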
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on January 26, 2019, 05:35:45 am
You don't have to recompile for a new implementation if you followed the published memory consistency rules.

sure, you have to!

powerpc440 and powerpc460 have different instructions for handling memory ordering. Besides, when your code uses POSIX semaphores to coordinate the beginning and end of each loop, sometimes the code uses asm volatile("" ::: "memory") to prevent compiler reordering (which would otherwise make a mess for sure). Sometimes that is OK and enough for the job, but sometimes (usually in multi-core SMP environments) it is not enough to avoid confusion, so you need to explicitly prevent memory reordering with a StoreLoad barrier instruction, *IF* and only *IF* one is available, e.g. asm volatile("memfence" ::: "memory").

This one has a different implementation on classic PowerPC vs PowerPC-embedded, e.g. the ppc-e500  :palm:

The subject was RISC-V, not PowerPC. I fully accept that other architectures have handled these things badly in the past, and we've tried to learn from that. And hopefully done a good job. Time will tell.

If you follow the published RISC-V memory consistency rules then you will never have to recompile software for a new processor. The fence instruction is part of the base instruction set, will be accepted by every CPU, and is a zero or one cycle no-op on CPUs where no action is necessary.
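For example, the standard C11 message-passing idiom needs nothing beyond those fences. A sketch (the instruction comments show the typical GCC mapping on RISC-V, not a guarantee of any particular implementation):
Code: [Select]
#include <stdatomic.h>

int data;
atomic_int flag;

void producer(void)
{
    data = 42;
    /* typically compiles to: fence rw,w ; sw */
    atomic_store_explicit(&flag, 1, memory_order_release);
}

int consumer(void)
{
    /* typically compiles to: lw ; fence r,rw */
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;                  /* spin until the flag is published */
    return data;           /* guaranteed to read 42 */
}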
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 26, 2019, 06:20:11 am
The subject was RISC-V, not PowerPC. I fully accept that other architectures have handled these things badly in the past

With all due respect, IBM is IBM, a company that has been fruitful in computer science since the beginning of computer science itself; they have made, and are still making, its history. RISC-V is nothing similar, with just a fraction of IBM's experience and competence.

so I trust more what IBM recommends about POWER9: don't expect that everything in a family's spec will be written in stone. Certain things do change, so you'd best assume that things will always need to be recompiled (e.g. AIX needs specific patch media installed to operate on POWER9), otherwise be prepared to suffer in your hex editor.

so my point is: I don't expect that in the looong term (10 years?) RISC-V will do better than what IBM has done in twenty years of PowerPC experience (from the PPC601 to the PPC970, including several embedded 4xx cores). Certain things do change, so I am already prepared to accept that I will have to spend time recompiling things, and I expect to have/get (to get = to pay for obtaining it) the sources of everything instead of just the binaries (because the price is one-tenth if you don't request the sources).

This is the big mistake I made with PowerPC, when I purchased only the binaries.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 26, 2019, 07:17:07 am
I am just now finalizing the stage4 for HPPA2. This architecture, made by HP, is the most stable in terms of changes, but the last PA-RISC CPU, the PA8900, introduces a few changes relative to the PA8700, and they are related to the "multi-core" nature of the PA8900, which causes problems in Linux and in HPUX11 (which needs specific patches installed) exactly because it's unexpected: previous CPUs were capable of multi-CPU SMP, but they were not multi-core.

What have I learned from these experiences? I have learned that every processor family has different habits when it comes to memory reordering, and those habits can only be observed in multi-core or multi-processor configurations. Given that multicore is now mainstream, it's worth assuming that the market has some familiarity with them, so new products are now developed with multiple cores that offer a certain compatibility with their predecessors. But don't assume that all the processors in a family behave the same way in SMP, because they do not: there are many types of memory reordering, and not all types occur equally often.

It all depends on the processor, on its implementation, on the environment you're targeting, and on the toolchain you're using for development (e.g. Java uses a different approach than C++11).

This is captured by the "memory model", which tells you, for a given processor && toolchain, exactly what types of memory reordering to expect at runtime relative to a given source code listing. Keep in mind that the effects (and differences) of memory reordering can only be observed when lock-free programming techniques are used.
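A classic illustration (a sketch; imagine writer0 and writer1 running concurrently on two threads):
Code: [Select]
/* Store-buffering litmus test: each thread writes one flag, then reads
   the other.  With no barriers, r0 == r1 == 0 is observable on x86 (TSO)
   and on weaker machines, because a load may pass the earlier store. */
int x, y, r0, r1;

void writer0(void) { x = 1; r0 = y; }
void writer1(void) { y = 1; r1 = x; }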

What I mean is that we have three kinds of memory models:
  • kind-A: you have CPUs that are ONLY sequentially consistent, and this is the ONLY way they can operate in SMP
  • kind-B: you might have CPUs that are usually strong, implementing explicit acquire and release, TSO. This usually works in multi-core SMP at the cost of degraded performance, but ... sometimes it might not work correctly on multi-core, while it for sure always works in multi-CPU SMP
  • kind-C: you might also have multi-cores that are weak with data dependency reordering. This is assumed to be working in multi-CPU/multi-core SMP

All three are hardware memory models that tell you what kind of memory ordering to expect at runtime relative to the assembly code (and here you can expect other problems with the C compiler ... C is not thread-safe, so you have to tell the compiler explicitly what it has to do. C++11 helps a lot with this; C doesn't).

Now, talking about hardware: within both the HPPA and the PowerPC families you find that certain members are kind-B and certain are kind-C, but in embedded PowerPC you also find members that are kind-A (because they are based on the oldest/simplest/safest/most conservative CPU model; e.g. military PPCs need redundancy, which requires kind-A).

Even x86/64 can be both kind-A (in i386 emulation mode) and kind-B.

Besides, on the software side: Java is only kind-A oriented, C++11's default atomics are kind-A, but the new C++11/20xx low-level atomics tend to be kind-C.


You cannot say what will happen in the future: new kind-D? new kind-E?

Be prepared  :-//
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on January 26, 2019, 11:39:19 pm
What I mean is that we have three kinds of memory models:
  • kind-A: you have CPUs that are ONLY sequentially consistent, and this is the ONLY way they can operate in SMP
  • kind-B: you might have CPUs that are usually strong, implementing explicit acquire and release, TSO. This usually works in multi-core SMP at the cost of degraded performance, but ... sometimes it might not work correctly on multi-core, while it for sure always works in multi-CPU SMP
  • kind-C: you might also have multi-cores that are weak with data dependency reordering. This is assumed to be working in multi-CPU/multi-core SMP

All three are hardware memory models that tell you what kind of memory ordering to expect at runtime relative to the assembly code (and here you can expect other problems with the C compiler ... C is not thread-safe, so you have to tell the compiler explicitly what it has to do. C++11 helps a lot with this; C doesn't).

RISC-V has been designed from the start for kind-C. Every RISC-V CPU has the ordering instructions ("fence") needed by kind-C for C++11 (including acquire and release semantics, and distinguishing memory from I/O) as well as Java and C# and other currently-known languages. On low-end CPUs that will never have multiple cores, "fence" is still recognised as an instruction but is a no-op.

Note that programs written correctly for kind-C systems are guaranteed to run fine on kind-B and kind-A systems, and programs written correctly for kind-B systems are guaranteed to run fine on kind-A systems. As long as the instructions are recognised as valid instructions. Which they are.

The two year old FE310 32 bit RISC-V microcontroller test chip (in the HiFive1) implements the fence instruction, as well as a full set of AMO (Atomic Memory Operation) instructions designed to natively implement C++11 semantics.
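For instance, a single AMO performs an atomic read-modify-write in one instruction. A sketch in GCC inline asm (real code would normally let the compiler's atomic builtins emit this):
Code: [Select]
/* amoadd.w atomically adds 'v' to *p and returns the old value.
   .aq/.rl suffixes can be added when acquire/release ordering is needed. */
static inline int atomic_fetch_add_32(volatile int *p, int v)
{
    int old;
    __asm__ volatile("amoadd.w %0, %2, (%1)"
                     : "=&r"(old)
                     : "r"(p), "r"(v)
                     : "memory");
    return old;
}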

RISC-V is by default kind-C, but you can as an extension build a system as TSO (kind-B). That makes programs a little simpler to write correctly, but you can then *only* run those programs on CPUs that implement the TSO extension. Normal programs run fine on TSO too.


Quote
You cannot say what will happen in the future: new kind-D? new kind-E?

Be prepared  :-//

New things are always a possibility.

But there is a *lot* of experience of old things that people have done wrong that can be learned from, from microcontrollers up to supercomputers.

What is sad is when people have a few decades of experience available to study and FAIL to learn from it.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 27, 2019, 02:57:51 pm
I have read many, and I found this (https://www.amazon.com/gp/product/0123973376/ref=as_li_ss_tl?ie=UTF8&tag=preshonprogr-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=0123973376) and this (https://www.amazon.com/gp/product/1933988770/ref=as_li_ss_tl?ie=UTF8&tag=preshonprogr-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=1933988770) to be useful books concerning multiprocessor programming.

Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: Nominal Animal on January 27, 2019, 04:20:37 pm
a full set of AMO (Atomic Memory Operation) instructions designed to natively implement C++11 semantics
Do you know/remember the maximum data size the LL/SC ops support?

For a C programmer (say, low-level libraries and such), the built-ins (https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html) (that GCC, clang, and Intel CC all support) are extremely useful, but the variation in the maximum size supported is a bit of a pickle.
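For example (a sketch; these builtins and their signatures are the documented GCC ones, and on RISC-V they compile down to an AMO or an LR/SC retry loop):
Code: [Select]
#include <stdint.h>

uint32_t counter;

/* Lock-free increment via compare-and-swap. */
void increment(void)
{
    uint32_t old = __atomic_load_n(&counter, __ATOMIC_RELAXED);
    /* On failure 'old' is refreshed with the current value: just retry. */
    while (!__atomic_compare_exchange_n(&counter, &old, old + 1,
                                        1 /* weak */,
                                        __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
        ;
}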

What is sad is when people have a few decades of experience available to study and FAIL to learn from it.
Yes. It is one thing to not know and stumble, but refusing to learn from others is just baffling.

(I'm not referring to anything in this thread. I only mean I see that way too often in real life, and cannot wrap my head around it.  I can see why people repeat mistakes they didn't know about, but scientists and engineers whose entire job description is to build on top of existing knowledge? Weird.)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: NorthGuy on January 27, 2019, 05:06:36 pm
But there is a *lot* of experience of old things that people have done wrong that can be learned from, from microcontrollers up to supercomputers.

https://www.youtube.com/watch?v=HW4Q0IeTYbA (https://www.youtube.com/watch?v=HW4Q0IeTYbA)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on January 28, 2019, 02:57:11 pm
a full set of AMO (Atomic Memory Operation) instructions designed to natively implement C++11 semantics
Do you know/remember the maximum data size the LL/SC ops support?

XLEN.

i.e. 32 bits on a 32 bit machine, and 64 bits on a 64 bit machine.

In particular there is no native double width CAS AMO and no way to make one using LL/SC -- that needs to use an actual lock.
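For reference, here is what a word-sized CAS built from LR/SC looks like (a sketch in GCC inline asm, using plain lr.w/sc.w; .aq/.rl suffixes are available when ordering is needed):
Code: [Select]
/* Returns 1 if *p was 'expected' and has been replaced by 'desired'. */
static inline int cas32(volatile int *p, int expected, int desired)
{
    int old, fail;
    __asm__ volatile(
        "1: lr.w  %0, (%2)\n"       /* load-reserved               */
        "   bne   %0, %3, 2f\n"     /* value changed -> give up    */
        "   sc.w  %1, %4, (%2)\n"   /* store-conditional           */
        "   bnez  %1, 1b\n"         /* reservation lost -> retry   */
        "2:"
        : "=&r"(old), "=&r"(fail)
        : "r"(p), "r"(expected), "r"(desired)
        : "memory");
    return old == expected;
}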

There is significant interest in more powerful lock-free programming primitives, and something will probably happen in the next year or two. I think that's most likely to be an extension allowing something between nested LL/SC and a very restricted STM. It's notable that others have stumbled over STM, so it will be good to learn from that.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: Nominal Animal on January 28, 2019, 08:42:42 pm
In particular there is no native double width CAS AMO and no way to make one using LL/SC -- that needs to use an actual lock.
That is a minor pain with signal handlers in C, because the only async-signal-safe locking primitive is sem_post().  I end up having to work around it by using a dedicated thread to receive signals via sigwaitinfo(), rather than using signal handlers.
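The pattern looks roughly like this (a sketch; error handling mostly omitted):
Code: [Select]
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

/* All signal work happens synchronously in this thread, so none of the
   async-signal-safety restrictions of a real handler apply here. */
static void *sig_thread(void *arg)
{
    sigset_t *set = arg;
    siginfo_t info;
    for (;;)
        if (sigwaitinfo(set, &info) > 0)
            printf("got signal %d from pid %ld\n",
                   info.si_signo, (long)info.si_pid);
    return NULL;
}

int main(void)
{
    static sigset_t set;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, NULL);   /* inherited by new threads */
    pthread_create(&tid, NULL, sig_thread, &set);
    /* ... rest of the program ... */
    pthread_join(tid, NULL);
    return 0;
}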

I wonder if anyone is experimenting with cacheline-wide CAS.  That is, instead of registers, entire cache lines are compared and atomically swapped.  Since partial address tags have already proven to be a security risk, it seems to me that swapping just the cacheline address tags might work.  Even without CAS, an atomic cacheline swap would be useful for atomic structure updates.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 29, 2019, 12:22:38 am
I was googling for microdrive (micro harddrive), and I found risc-v (https://www.westerndigital.com/company/innovations#risc-v), LOL  ;D
(by westerndigital)
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on January 29, 2019, 01:09:43 am
I was googling for microdrive (micro harddrive), and I found risc-v (https://www.westerndigital.com/company/innovations#risc-v), LOL  ;D
(by westerndigital)

What exactly is it you find amusing there?
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: westfw on January 29, 2019, 07:08:09 am
Quote
I finally got to watch the videos...very interesting.  I also thought his presentation was very clear.
Yes, me too! (finally actually watched the videos.) It was a really well-done intro to setting up an embedded development environment and writing your first simple program.  But, based on the discussion that's popped up here, I was expecting a lot more detail about the RISC-V instruction set itself.  The videos were pretty "generic."
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 29, 2019, 09:58:27 am
What exactly is it you find amusing there?

It's nice to see that even Western Digital has an interest in RISC-V.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on January 29, 2019, 11:45:37 am
Quote
I finally got to watch the videos...very interesting.  I also thought his presentation was very clear.
Yes, me too! (finally actually watched the videos.) It was a really well-done intro to setting up an embedded development environment and writing your first simple program.  But, based on the discussion that's popped up here, I was expecting a lot more detail about the RISC-V instruction set itself.  The videos were pretty "generic."

You're absolutely right, but I think it's probably the best thing to do. The instruction set can be easily learned from a book or even reference card. It's getting an environment set up and blinky running that is the stumbling block for most people.

He showed enough of the instruction set to get started, even though there are a couple of bugs in his code.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: brucehoult on January 29, 2019, 11:51:51 am
What exactly is it you find amusing there?

It's nice to see that even Western Digital has an interest in RISC-V.

It's 14 months since WD announced they would be converting all of their 1+ billion cores a year to RISC-V. In April last year WD was announced as one of the major investors in a $50.6m Series C round raised by SiFive, and at the same time it was announced that WD had signed a multi-year license for SiFive's "Freedom Platform".

And of course the videos that are the subject of this thread are narrated by WD's Chief Technical Officer.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: legacy on January 29, 2019, 01:06:41 pm
And of course the videos that are the subject of this thread are narrated by WDs Chief Technical Officer.

I haven't watched the videos yet; I haven't had the time, and I don't have the embedded video plugin enabled in my browser.

My knowledge of WD is more about their legacy hard drives, because I happen to buy a thousand legacy HDs (pATA and the like) per year in order to assist my customers. A lot of internet links to documentation about HDs are now gone, and Google only has a dead cache of them, but WD seems to have something backed up backstage, so I have mainly cared about that rather than their new homepage, which points to their participation in RISC-V.

Anyway, even with a delay of fourteen months in becoming aware of what is happening ... well, that's super nice and awesome news :D
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: asmi on January 29, 2019, 06:23:57 pm
Is there a RISC-V assembler and C compiler for Windows? Preferably as standalone executables which don't require any external components.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on January 29, 2019, 06:28:39 pm
Is there a RISC-V assembler and C compiler for Windows? Preferably as standalone executables which don't require any external components.
Here you go https://github.com/gnu-mcu-eclipse/riscv-none-gcc/releases

And some corresponding documentation https://gnu-mcu-eclipse.github.io/toolchain/riscv/install/
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: asmi on January 29, 2019, 07:21:56 pm
Here you go https://github.com/gnu-mcu-eclipse/riscv-none-gcc/releases

And some corresponding documentation https://gnu-mcu-eclipse.github.io/toolchain/riscv/install/
Beautiful, thank you! Now I don't have to write my own ;D Just gotta figure out how to make the toolchain output a file suitable for the $readmemh command...
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on January 29, 2019, 07:27:55 pm
Beautiful, thank you! Now I don't have to write my own ;D Just gotta figure out how to make the toolchain output a file suitable for the $readmemh command...

Here is a simple Makefile I use https://github.com/ataradov/riscv/blob/master/firmware/Makefile It includes a target for generating a *.mem file, which works with $readmemh().
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: asmi on January 29, 2019, 08:02:00 pm
Here is a simple Makefile I use https://github.com/ataradov/riscv/blob/master/firmware/Makefile It includes a target for generating a *.mem file, which works with $readmemh().
What is "od" in that file?
Also - I was able to print it in a format that $readmemh understands using "objcopy -O verilog --reverse-bytes=4", but the problem is my memory is 4 bytes wide, and $readmemh assumes that each byte is in fact a full 32-bit value. So far I couldn't find a way to get objcopy to output data in 32-bit chunks. I mean, of course I can write a simple exe to do that, but I'm hoping there is some "built-in" way of doing so... Or maybe I can redesign my memory to be byte-wide.
Title: Re: RISC-V assembly language programming tutorial on YouTube
Post by: ataradov on January 29, 2019, 09:11:56 pm
What is "od" in that file?
It is this utility http://man7.org/linux/man-pages/man1/od.1.html (http://man7.org/linux/man-pages/man1/od.1.html)

I mean, of course I can write a simple exe to do that, but I'm hoping there is some "built-in" way of doing so...
This is probably the way to go. When I needed to generate a MIF file for Altera devices, I just created this https://github.com/ataradov/genmif (https://github.com/ataradov/genmif) after trying to cobble something together from the standard tools.
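For what it's worth, such a converter is only a handful of lines of C (a sketch; "bin2mem" and the file names are placeholders). It reads a little-endian binary and prints one 32-bit hex word per line, which $readmemh() loads directly:
Code: [Select]
/* bin2mem: usage: bin2mem firmware.bin > firmware.mem */
#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s file.bin\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror(argv[1]); return 1; }

    uint8_t b[4];
    size_t n;
    while ((n = fread(b, 1, 4, f)) > 0) {
        while (n < 4)
            b[n++] = 0;                 /* zero-pad a trailing partial word */
        uint32_t w = (uint32_t)b[0] | (uint32_t)b[1] << 8
                   | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
        printf("%08x\n", w);            /* one 32-bit word per line */
    }
    fclose(f);
    return 0;
}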