Author Topic: RISC-V assembly language programming tutorial on YouTube (Read 54719 times)

legacy · « **Reply #250 on:** December 30, 2018, 03:15:56 pm »

Quote from: rhodges on December 30, 2018, 03:04:20 pm

Maybe it would be useful for Single Instruction Multi Data, without having separate instructions?

I've never run into the need: any examples from the real world?

brucehoult · « **Reply #251 on:** December 30, 2018, 03:26:38 pm »

Quote from: legacy on December 30, 2018, 11:48:03 am

Quote
4) Technical notes
------------------

4.1) 128 bit support

The RISC-V specification does not define all the instruction encodings for the 128 bit integer and floating point operations. The missing ones were interpolated from the 32 and 64 ones.

Unfortunately there is no RISC-V 128 bit toolchain nor OS now (volunteers for the Linux port ?), so rv128test.bin may be the first 128 bit code for RISC-V !

So it's completely ... experimental. But I still wonder WHO needs 128bit registers, and for what

Right now it's just about designing RISC-V so as to make sure there won't be any nasty surprises in transitioning from 64 bit to 128 bit at some time in the future. Space has been left for opcodes and there's a pretty obvious way that it could work just filling in the obvious blanks.

As you'll know, x86 suffered some pretty big compatibility and other problems figuring out how to go from 16 bit to 32 bit to 64 bit, as did MIPS, SPARC and others. PowerPC was designed with 64 bit in mind from the start. ARM changed their ISA totally for 64 bit and appear to have made a successful transition despite this. Alpha and Itanic on the other hand were also incompatible with their companies' previous 32 bit architectures and failed to gain traction.

Demanding technical users started to transition from 32 bits to 64 bits in the early 90s with the MIPS III R4000 in 1991 and DEC Alpha in 1992. Home and business computers went to 64 bit starting with the Athlon64 in 2003, Xeons in 2004, and Core 2 in 2006. Mobile phones went to 64 bit starting with the iPhone 5s in late 2013 and Galaxy S6 in early 2015.

I think 128 bit may start to be used in large datacentres by 2030, but not in homes or offices until maybe 2050. Some high-security users might want it sooner in order to use some address bits for tags, or to provide a sparse address space (a super-ASLR).

I expect someone will start making experimental 128 bit RISC-V chips within the next two years (it's pretty trivial to do .. just somewhat pointless at present), just to seed them to academia and certain TLAs to get experience.

brucehoult · « **Reply #252 on:** December 30, 2018, 03:31:53 pm »

Quote from: legacy on December 30, 2018, 03:12:28 pm

Quote from: brucehoult on December 30, 2018, 02:58:39 pm
The Linux kernel supports RISC-V. There are I suppose at least half a dozen Linux distributions that support RISC-V, the most heavily used probably being Debian, Fedora, and Buildroot.

The main requirement of a board maker is to implement a bootloader and the SBI (Supervisor Binary Interface). The most commonly used bootloader at the moment is BBL (Berkeley BootLoader, which also implements SBI), but it's pretty crude so there is a lot of work going into others such as Das U-Boot and coreboot.

Gentoo is not on the list, and I am a supporter: it means I am not interested in anything else except a config file for Catalyst, plus a profile for building a reasonable stage3. This, unfortunately, does not exist yet, and it's not clear to me what I have to support. Specifically when I say "doc and spec", I mean MMU/TLB and the ISA's extensions, since as retirement I got an emulator which ... -1- it's OS-less (and can't see any MMU/TLB code/doc/whatever code), and -2- does not compile

So, what do I have to support for the MMU and the "privileged" rings?

The MMU/TLB is explicitly *not* defined by the RISC-V specification. That's entirely up to the individual chip manufacturer.

The page table layout in memory is specified. How that makes its way into a TLB (software or hardware PT walker) or how the TLB works is not. There are SBI calls to do things such as update or flush the TLB. It's up to the board/chip vendor to write those. The OS simply uses them.
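
To make the "page table layout is specified" part concrete, here is a minimal Python sketch that splits an Sv39 virtual address and a page table entry into the fields laid out in the privileged spec. The helper names are made up for illustration. How a given chip caches these entries in a TLB is exactly the part the spec leaves open, so nothing here models that.

```python
# Sv39 field layout per the RISC-V privileged spec; helper names are illustrative only.
PTE_FLAGS = ["V", "R", "W", "X", "U", "G", "A", "D"]  # PTE bits 0..7

def split_sv39_va(va):
    """Split a 39-bit virtual address into (vpn2, vpn1, vpn0, page_offset)."""
    offset = va & 0xFFF            # bits 11:0
    vpn0 = (va >> 12) & 0x1FF      # bits 20:12
    vpn1 = (va >> 21) & 0x1FF      # bits 29:21
    vpn2 = (va >> 30) & 0x1FF      # bits 38:30
    return vpn2, vpn1, vpn0, offset

def decode_sv39_pte(pte):
    """Return (ppn, set_of_flag_names) for a 64-bit Sv39 page table entry."""
    flags = {name for bit, name in enumerate(PTE_FLAGS) if (pte >> bit) & 1}
    ppn = (pte >> 10) & ((1 << 44) - 1)   # bits 53:10
    return ppn, flags

def is_leaf(flags):
    """A valid PTE with any of R/W/X set maps a page; otherwise it points to the next level."""
    return "V" in flags and bool(flags & {"R", "W", "X"})
```

A software page table walker would use `vpn2`, `vpn1`, `vpn0` as indices into three levels of 512-entry tables, stopping at the first leaf entry.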

rhodges · « **Reply #253 on:** December 30, 2018, 04:41:51 pm »

Quote from: legacy on December 30, 2018, 03:15:56 pm

Quote from: rhodges on December 30, 2018, 03:04:20 pm
Maybe it would be useful for Single Instruction Multi Data, without having separate instructions?
I've never run into the need: any examples from the real world?

Close. The VLIW PNX1302 had 128 32-bit registers. They could do ordinary arithmetic, and there were also instructions for doing 4 8-bit or 2 16-bit operations on them. Think of the carry bit suppressed on the 8-bit or 16-bit boundaries. There was also the option of "saturation", where overflow resulted in the maximum or minimum values.

This is similar to the SIMD extensions to x86 (and probably all SIMD), but with the main register set. I believe x86 SIMD uses separate (FP?) registers.

This CPU was intended for media processing, so maybe it would be good to think of it as a SIMD processor that also did ordinary CPU functions.
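
A rough Python model of the "carry suppressed on the 8-bit boundaries" idea, for four 8-bit lanes packed into one 32-bit word, plus a saturating variant. This is only a sketch of the general technique; the function names are mine and it is not meant to mirror the actual PNX1302 instructions.

```python
MASK32 = 0xFFFFFFFF

def add4x8_wrap(a, b):
    """Add four packed unsigned 8-bit lanes; carries do not cross lane boundaries."""
    low7 = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)        # add the low 7 bits of each lane
    return (low7 ^ ((a ^ b) & 0x80808080)) & MASK32   # fold in the top bits, dropping carry-out

def add4x8_sat(a, b):
    """Same lanes, but each unsigned lane clamps at 0xFF instead of wrapping."""
    out = 0
    for shift in (0, 8, 16, 24):
        lane = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        out |= min(lane, 0xFF) << shift
    return out
```

For comparison, a plain 32-bit add of the same two words lets a carry out of one byte ripple into its neighbour, which is exactly what the packed forms avoid.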

Nominal Animal · « **Reply #254 on:** December 31, 2018, 08:12:38 am »

Quote from: rhodges on December 30, 2018, 04:41:51 pm

I believe x86 SIMD uses separate registers.

Correct. SSE2 has 16 128-bit arithmetic registers XMM0-XMM15 (in 64-bit mode; only eight in 32-bit mode), that can be treated as two double-precision floating-point numbers, four single-precision floating-point numbers, two 64-bit integers, four 32-bit integers, eight 16-bit integers, or sixteen 8-bit integers, signed or unsigned, depending on the instruction. These are distinct from 387 floating-point registers. AVX renames those to YMM0-YMM15, extending them to 256 bits, AVX2 adding 128-bit and 256-bit integer operations. AVX512 renames them to ZMM0-ZMM31, not just doubling their number but extending them to 512 bits.

These are completely separate from the normal AMD64 (x86-64) general-purpose registers (rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, r8, r9, r10, r11, r12, r13, r14, and r15), and use a completely different set of instructions.
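
To illustrate the "same 128 bits, different lanes depending on the instruction" point, here is a small Python sketch that reinterprets one 16-byte value in each of the layouts listed above. It uses only the standard `struct` module; `lane_views` is a made-up helper, not anything from an intrinsics library.

```python
import struct

def lane_views(raw16):
    """Reinterpret the same 16 bytes the way different SSE2 instructions would."""
    assert len(raw16) == 16
    return {
        "2 x f64": struct.unpack("<2d", raw16),
        "4 x f32": struct.unpack("<4f", raw16),
        "2 x u64": struct.unpack("<2Q", raw16),
        "4 x u32": struct.unpack("<4I", raw16),
        "8 x u16": struct.unpack("<8H", raw16),
        "16 x u8": struct.unpack("<16B", raw16),
    }

# Four single-precision floats, viewed through every other lane layout.
views = lane_views(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))
```

The register itself carries no type; only the instruction decides whether those bits are four floats or sixteen bytes.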

Single-precision floating point vectors are widely used in image and geometry processing (including wavelet transforms and such, unless done using a GPU), and also heavy sound analysis (single-precision FFT/DFT/Hartley transforms and such). Double-precision floating point vectors are heavily used in computational physics -- basically both ab-initio (quantum mechanics; vasp et cetera) and classical (potentials; lammps, gromacs et cetera). Using the binary operations on the floating-point values is also surprisingly common (absolute values, min/max, masking/conditionals). The major use for the various integer operations is speeding up cryptographic operations, which nowadays are absolutely ubiquitous; not just in securing socket communications, but in internal kernel operations (like ensuring unpredictability of kernel random number sources).

As to the underlying microcode and hardware implementation, it looks like Intel and AMD implementations do differ quite a bit. Mathematically their results are identical, but how different operations pipeline, and how efficient vector-intensive operations are, varies a lot between processor families.

For accelerating cryptographic operations, double-width unsigned integer multiplication is an absolute must. (Meaning, you really need a multiplication operation C = B × A where C is a pair of registers, or double the width of A and B. Apologies for the poor terminology; me fail English today worse.) The size of the unsigned integers we deal with will only increase; right now ordinary workstations do a surprising amount of work on 2048-bit and higher unsigned integers. So, it is not just the size of the registers that matters, the basic operations (addition, subtraction, multiplication, and, or, xor, not) must also be fast/efficient enough to warrant their use.
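
A minimal Python sketch of what "C is a pair of registers" means for 64-bit operands. The first function models a hardware double-width multiply (on RISC-V the two halves would come from MULHU and MUL); the second builds the same result from 32×32→64 partial products, which is roughly the extra work needed when no wide multiply is available. Function names are my own.

```python
MASK64 = (1 << 64) - 1
MASK32 = 0xFFFFFFFF

def mul_wide_u64(a, b):
    """Return (hi, lo) halves of the 128-bit unsigned product of two 64-bit values."""
    p = (a & MASK64) * (b & MASK64)
    return p >> 64, p & MASK64

def mul_wide_from_halves(a, b):
    """Same (hi, lo) result using only 32x32->64 partial products."""
    a0, a1 = a & MASK32, (a >> 32) & MASK32
    b0, b1 = b & MASK32, (b >> 32) & MASK32
    p00, p01, p10, p11 = a0 * b0, a0 * b1, a1 * b0, a1 * b1
    mid = (p00 >> 32) + (p01 & MASK32) + (p10 & MASK32)   # bits 32..63 plus carries
    lo = (p00 & MASK32) | ((mid & MASK32) << 32)
    hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32)
    return hi & MASK64, lo
```

One wide multiply replaces four narrow multiplies plus the carry bookkeeping, which is why it matters so much for big-integer code.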

(It turns out that at least some Intel implementations of AVX2 and AVX512 are not really worth the extra cost when mostly using double-precision floating-point vectors. Bummer. But that is the reason CERN and others doing heavy physics computations really do not want to be running on the very newest hardware, but on hardware chosen based on the amount of computation achieved per euro. Theoretical gains look nice on paper, but practice trumps theory. That said, a lot of the existing simulation software and surrounding services (the CERN data is structured, not just "flat files") is of horribly inefficient design, and a lot more could be done to fix that... but don't get me started on that. And yes, I have been an admin of a HPC cluster used to munch on terabytes of CERN data. Even built an auto-evaluation Linux USB stick with actual simulations for vendors to measure the performance of vendor offerings for a new cluster acquisition, once.)

JPortici · « **Reply #255 on:** December 31, 2018, 09:13:59 am »

Quote from: legacy on December 30, 2018, 03:15:56 pm

Quote from: rhodges on December 30, 2018, 03:04:20 pm
Maybe it would be useful for Single Instruction Multi Data, without having separate instructions?

I've never run into the need: any examples from the real world?

Audio manipulation of any kind

legacy · « **Reply #256 on:** December 31, 2018, 12:32:18 pm »

Quote

Moreover, tinyemu assumes a little endian host so it has no chance to work on a PPC machine.

So tinyemu is LE only. It emulates x86 and RISCV (32, 64, 128bit),

LapTop006 · « **Reply #257 on:** December 31, 2018, 12:56:23 pm »

Quote from: brucehoult on December 30, 2018, 03:26:38 pm

... Alpha and Itanic on the other hand were also incompatible with their companies' previous 32 bit architectures and failed to gain traction.

I disagree on the Alpha side, it had great traction, however two big things killed it early.

1. Windows 2000 killed Alpha support (late in the betas), cutting off a significant market. Windows support was also 32-bit mode only, with the benefits fading away as x86 machines were improving around that time.
2. When HP bought Compaq, Alpha, like PA-RISC, was put on minimal life support, with hopes pinned on Itanium.

legacy · « **Reply #258 on:** December 31, 2018, 01:15:52 pm »

Quote from: Nominal Animal on December 31, 2018, 08:12:38 am

I have been an admin of a HPC cluster used to munch on terabytes of CERN data

I see there was a plan for introducing IBM POWER9 HPC: maybe reconsidered for 2019? I hope

Nominal Animal · « **Reply #259 on:** December 31, 2018, 02:30:17 pm »

Quote from: legacy on December 31, 2018, 01:15:52 pm

I see there was a plan for introducing IBM POWER9 HPC: maybe reconsidered for 2019? I hope

I've been out of touch for a few years now (I'm no longer on the mailing lists etc.), so I don't know; but I definitely hope different systems are still tested here and there, and considered for wider adoption.

I'm sure you're perfectly aware that it isn't just the hardware; the full software stack needs to be there too to take advantage of it. Having a few SW engineers ensure compilers have good hardware support, and researchers port simulators to different architectures, leads to a better field of competition and more bang for the buck for users. Unfortunately, that also tends to uncover software and simulation model bugs, and established professors don't like that, because it means they'd need to issue corrections to past publications with possibly lots of citations; bad for reputation. So that is only done when you have professors more interested in the research than possible small dings to their own reputation, but still crafty enough to talk money out of politicians. Rare.

The overall data processing model is tiered, and individual clusters, especially at Tier-2, are not tied to any particular hardware (although users do need a dev environment to compile their simulators for each HW architecture). Putting up a testbed POWER9 cluster at some university, with both local and CERN computation tasks, would be a perfect opportunity to fine-tune the Linux support, compiler options, and so on, giving practical real-world data as to the capabilities and efficiency of POWER9 in physics/chemistry HPC use.

brucehoult · « **Reply #260 on:** December 31, 2018, 11:50:49 pm »

Quote from: LapTop006 on December 31, 2018, 12:56:23 pm

Quote from: brucehoult on December 30, 2018, 03:26:38 pm
... Alpha and Itanic on the other hand were also incompatible with their companies' previous 32 bit architectures and failed to gain traction.

I disagree on the Alpha side, it had great traction, however two big things killed it early.

1. Windows 2000 killed Alpha support (late in the betas), cutting off a significant market. Windows support was also 32-bit mode only, with the benefits fading away as x86 machines were improving around that time.
2. When HP bought Compaq, Alpha, like PA-RISC, was put on minimal life support, with hopes pinned on Itanium.

If it had great traction then neither of those things would have happened!

I'm sure there were a number of people who loved Alpha and invested heavily into systems and software using it, and that they were very upset when it was axed. I thought it was a wonderful design myself, with a future ahead of it much longer than the 25 years DEC said they designed it for.

If NT4 on Alpha was tearing up the market and people were throwing out their Pentium Pro/II/III all over the place then you can be sure Microsoft would have supported Alpha in Windows 2000.

The *huge* FUD campaign about Itanic resulting in the corporate deaths of the perfectly good Alpha, PA-RISC and others is the MAIN REASON I'm such a big fan of RISC-V. It's not that it's technically superior, it's that if I (or you) invest in it no one can take it away from us. One or many RISC-V companies may fail in the heat of competition, but others can continue.

By far the biggest thing wrong with Alpha was that it was owned by a company that failed.

Alpha did have technical problems. Mostly just that the program code was too big -- I think they simply didn't realise that once speeds went significantly over 1 GHz (which Alpha never did) it wasn't going to be possible to continue scaling up the L1 icache size while maintaining quick access. 21064 and 21164 had 8 KB icache and 21264 had 64 KB. Pretty much everyone these days has stopped at (or fallen back to) 32 KB for L1 icache and it's important to be able to fit as much program code as possible into that.

The minor thing wrong with Alpha was the lack of 8 and 16 bit loads and stores. That's -- again -- more of a code size problem than a speed problem, but anyway they fixed it in the 2nd (21164) generation.

legacy · « **Reply #261 on:** January 01, 2019, 04:32:32 pm »

why do we need 128bit for audio algorithms?
and specifically, why 128bit integer, rather than fixed-point or floating point?

NorthGuy · « **Reply #262 on:** January 01, 2019, 06:08:56 pm »

Quote from: legacy on January 01, 2019, 04:32:32 pm

why do we need 128bit for audio algorithms?
and specifically, why 128bit integer, rather than fixed-point or floating point?

You don't need big integers for audio/video, but big SIMD registers holding several smaller integers are very handy for audio/video because they let you do operations in parallel.

Big integers are good for public key cryptography (such as RSA or DH) where you currently deal with 2048-bit integers, which are likely to get even bigger as the algorithm strength frenzy moves forward. Having 128x128 multiplication would make these algorithms more efficient. 256x256 multiplication would be even better. I don't think this cause justifies 128-bit CPUs though.
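
A back-of-the-envelope Python sketch of why wider multipliers help here: a plain schoolbook multiply of two 2048-bit numbers, counting how many limb-by-limb hardware multiplies it takes for a given limb width. This is an illustration under simple assumptions (schoolbook only, hypothetical helper names); real bignum libraries use smarter algorithms on top.

```python
def to_limbs(x, limb_bits, n):
    """Split x into n little-endian limbs of limb_bits each."""
    mask = (1 << limb_bits) - 1
    return [(x >> (i * limb_bits)) & mask for i in range(n)]

def schoolbook_mul(x, y, total_bits=2048, limb_bits=64):
    """Multiply via limb-by-limb partial products; returns (product, limb_multiplies)."""
    n = total_bits // limb_bits
    xs, ys = to_limbs(x, limb_bits, n), to_limbs(y, limb_bits, n)
    acc, count = 0, 0
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            # Each xi * yj stands for one double-width hardware multiply.
            acc += (xi * yj) << ((i + j) * limb_bits)
            count += 1
    return acc, count
```

With 64-bit limbs that is 32 × 32 = 1024 multiplies per 2048-bit product, with 128-bit limbs 256, and with 256-bit limbs 64, so each doubling of the multiplier width cuts the count by four.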

cepwin · « **Reply #263 on:** January 20, 2019, 12:10:45 am »

After hearing platformio discussed on a podcast I decided to check it out this afternoon. I too am disappointed that it requires a subscription to simply use the debugger. To me that's a basic feature they're putting behind a paywall. I am going to check out the Freedom IDE.
Update: Well, since FreedomStudio is only for SiFive-based processors and won't work with Atmel boards, it's not what I'm looking for right now. Paying for the debugger in platformio bugs me, but if one can use one IDE with the RISC-V and the Atmel-based boards (as well as many others) it might be worth it. The alternative is to use a separate IDE for boards based on different companies' chips.....

brucehoult · « **Reply #264 on:** January 20, 2019, 01:07:02 am »

The platformio guy has done a nice job but I think he's being just a little too greedy on the licensing. I've heard he's been approached by people willing to buy the rights and give the software away but he thinks he's going to get rich doing it how he is. If it's just one guy then at $10/month he only needs 100 customers to have a pretty nice income in Ukraine or 1000 customers for a pretty nice income anywhere in the world.

If it was $1 a month I wouldn't hesitate.

cepwin · « **Reply #265 on:** January 20, 2019, 01:32:38 am »

I have to agree with you Bruce.....given that most IDEs are free (your company's FreedomStudio, Atmel Studio, etc) charging $10/mo is a bit much. He can argue he has a community edition but community editions should have all the basic functions and that includes debugging. Of course Arduino has no debugging.

I decided to go on their forums and respectfully present my impressions including the debugger paywall issue.

brucehoult · « **Reply #266 on:** January 20, 2019, 01:53:01 am »

You might if you're lucky get $100k to $1m revenue a year selling dev tools, but if you give away the dev tools you can get $100m to $1b+ revenue a year selling chips or finished products.

That seems to me like an easy calculation for any Atmel, Microchip, SiFive, Xilinx, Apple, Microsoft...

NorthGuy · « **Reply #267 on:** January 20, 2019, 03:45:02 am »

Quote from: brucehoult on January 20, 2019, 01:53:01 am

You might if you're lucky get $100k to $1m revenue a year selling dev tools, but if you give away the dev tools you can get $100m to $1b+ revenue a year selling chips or finished products.

That seems to me like an easy calculation for any Atmel, Microchip, SiFive, Xilinx, Apple, Microsoft...

I didn't know SiFive was making $100m to $1b+ selling chips.

brucehoult · « **Reply #268 on:** January 20, 2019, 04:56:12 am »

Quote from: NorthGuy on January 20, 2019, 03:45:02 am

Quote from: brucehoult on January 20, 2019, 01:53:01 am
You might if you're lucky get $100k to $1m revenue a year selling dev tools, but if you give away the dev tools you can get $100m to $1b+ revenue a year selling chips or finished products.

That seems to me like an easy calculation for any Atmel, Microchip, SiFive, Xilinx, Apple, Microsoft...

I didn't know SiFive was making $100m to $1b+ selling chips.

The RISC-V business isn't yet, but the overall revenue including other chips and IP may be -- Glassdoor thinks it is, but Crunchbase doesn't.

legacy · « **Reply #269 on:** January 20, 2019, 12:56:49 pm »

Quote from: cepwin on January 20, 2019, 01:32:38 am

.....given that most IDEs are free (your company's FreedomStudio, Atmel Studio, etc) charging $10/mo is a bit much.

I do find it irritating, to be honest. Atmel is a large company that makes a lot of business by selling chips and services, while a freelancer is only able to be engaged for a paid job. So, supposing you are a developer and you have developed a high-quality softcore with a built-in debugger: are you really willing to put it on the internet for free, just for the glory?

There are now those who are happy to release their tools for free and then ask people to contribute on Patreon for crowdfunding. Some are really smart and do cool stuff, some are ... not, but this model doesn't work very well. It simply works sometimes; anyway, people on YouTube make money by releasing videos with the precise aim of increasing their view counters, so they can get the attention of a sponsor who will pay them to advertise products in their videos.

This always works, and those who release products/service for 1USD/month are usually in this basket, and their quality is usually low, except if they are big companies, like Atmel, or great artists like Benjamin J. Heckendorn who promotes Element14 on YouTube.

I have a friend who is a professional filmmaker; she said that to make a decent video she needs 3 working days, full time, with skills in Premiere, Final Cut, Autodesk Inferno, and manual painting on a graphics tablet. This is usually charged at 200 USD/day in a decent filmmaking studio, while she doesn't get a cent for her work when she releases something on YouTube for free, except ... she gets credit from followers, and sometimes gadgets (free comics, free t-shirts, free mugs).

Now she has got her page-visualization-counter to a decent value, so she is able to catch the attention of sponsors who are happy to pay her for advertising. This also applies to websites promoting "opensource" projects, since you have to find a way to get the money for a living.

So what I find really irritating is the assumption that, because someone wants to release his/her projects for free then we should do the same, and, worse still, I do find even more irritating the assumption that, because Atmel can release something for free, then we should do the same.

FSK, really

legacy · « **Reply #270 on:** January 20, 2019, 01:03:13 pm »

Quote from: brucehoult on January 20, 2019, 01:53:01 am

You might if you're lucky get $100k to $1m revenue a year selling dev tools, but if you give away the dev tools you can get $100m to $1b+ revenue a year selling chips or finished products.

That seems to me like an easy calculation for any Atmel, Microchip, ~~SiFive~~, Xilinx, Apple, Microsoft...

Yup, precisely.
Not yet sure about SiFive, anyway.

NorthGuy · « **Reply #271 on:** January 20, 2019, 02:02:11 pm »

Quote from: legacy on January 20, 2019, 12:56:49 pm

So what I find really irritating is the assumption that, because someone wants to release his/her projects for free then we should do the same, and, worse still, I do find even more irritating the assumption that, because Atmel can release something for free, then we should do the same.

You're absolutely right. With lots of free crap around, it's incredibly hard to sell software nowadays. If this guy can do it and make profit, I admire him.

cepwin · « **Reply #272 on:** January 20, 2019, 02:52:11 pm »

I have to stand corrected to some extent. You can debug without the unified debugger...just a bit more work (according to the pio forum.) In this case the additional ease *is* a professional feature and quite frankly after wrestling with Eclipse last night (as an alternative) I am strongly considering paying for it. As I mentioned when I posted on their forum, it is an impressive product.

cepwin · « **Reply #273 on:** January 20, 2019, 03:38:33 pm »

Someone on an unrelated site I'm on talks a lot about the fact that the way to make money in a business is to solve people's specific problems. Clearly he has solved to a large extent the problem of needing separate environments for different types of chips, as well as the difficulty of getting up and running on platforms such as Eclipse.

brucehoult · « **Reply #274 on:** January 21, 2019, 08:20:11 pm »

Quote from: legacy on January 20, 2019, 01:03:13 pm

Quote from: brucehoult on January 20, 2019, 01:53:01 am
You might if you're lucky get $100k to $1m revenue a year selling dev tools, but if you give away the dev tools you can get $100m to $1b+ revenue a year selling chips or finished products.

That seems to me like an easy calculation for any Atmel, Microchip, ~~SiFive~~, Xilinx, Apple, Microsoft...

Yup, precisely.
Not yet sure about SiFive, anyway.

I'm not 100% certain either, but it seemed worth taking a punt!

