Not interested in winning some dick size competition. If RISC-V ends up in the middle of the pack and competitive on measures such as code size or number of instructions just by compiling straightforward C code in a wide variety of situations with no special effort then I'm perfectly content.
x86, 68k and VAX were all designed at a time when maximizing the productivity of the assembly language programmer was seen as one of the highest (if not actual highest) priorities. They'd gone past simply trying to make a computer that worked and even making the fastest computer and come to a point that computers were not only fast *enough* for many applications but had hit a speed plateau. (It's hard to believe now that Apple sold 1 MHz 6502 machines for over *seventeen* years, and the Apple //e alone for 11 years.)
The x86, 68k and VAX were all vastly easier for the assembly language programmer than their predecessors the 8080, 6800, and PDP-11 (or PDP-10). They also were better for compilers, though people still didn't trust them.
The RISC people came along and said "If you simplify the hardware in *this* way then you can build faster machines cheaper, compilers actually have an easier time making optimal code, and everyone will be using high level languages in future anyway".
A lot of that was because you had to calculate instruction latencies yourself and place dependent instructions far enough apart that the result of the previous instruction was already available -- and not doing so meant not just that your program was less efficient than it could be, but that it didn't work at all! Fortunately, that stage didn't last long, for two reasons: 1) your next-generation CPU would have different latencies (sometimes longer, as pipeline lengths increased), meaning old binaries would not work, and 2) as CPUs increased in MHz faster than memory did, caches were introduced, and then you couldn't predict whether a load would take 2 cycles or 10 -- the same code had to be able to cope with 10 but run faster when you got a cache hit.
Hey... anybody know if 'The Mill' is still grinding away?
https://millcomputing.com/
ARM was unusual in being designed specifically to take advantage of the fast page mode memory which had become available, leading to instructions such as load multiple and store multiple.
This woman is definitely a superheroine
Roger that, job very well done.
# regs
reg00: 0x00000000
reg01: 0xdeadbeaf
reg02: 0x00000000
reg03: 0x00000000
reg04: 0x00000000
reg05: 0x00000000
reg06: 0x00000000
reg07: 0x00000000
reg08: 0x00000000
reg09: 0x00000000
reg10: 0x00000000
reg11: 0x00000000
reg12: 0x00000000
reg13: 0x00000000
reg14: 0x00000000
reg15: 0x00000000
reg16: 0x00000000
reg17: 0x00000000
reg18: 0x00000000
reg19: 0x00000000
reg20: 0x00000000
reg21: 0x00000000
reg22: 0x00000000
reg23: 0x00000000
reg24: 0x00000000
reg25: 0x00000000
reg26: 0x00000000
reg27: 0x00000000
reg28: 0x00000000
reg29: 0x00000000
reg30: 0x00000000
reg31: 0x00000000
# md 0xf1000000
f1000000..f10007ff 2048 byte I00:0 mem:1 hd:1 magic1 bin/data_cpu1reg.bin
showing memory @ 0xf1000000
0xf1000000 .. 0xf10007ff
f1000000: 00000000 afbeadde 00000000 00000000 [................]
f1000010: 00000000 00000000 00000000 00000000 [................]
f1000020: 00000000 00000000 00000000 00000000 [................]
f1000030: 00000000 00000000 00000000 00000000 [................]
f1000040: 00000000 00000000 00000000 00000000 [................]
f1000050: 00000000 00000000 00000000 00000000 [................]
f1000060: 00000000 00000000 00000000 00000000 [................]
f1000070: 00000000 00000000 00000000 00000000 [................]
f1000080: 00000000 00000000 00000000 00000000 [................]
f1000090: 00000000 00000000 00000000 00000000 [................]
f10000a0: 00000000 00000000 00000000 00000000 [................]
f10000b0: 00000000 00000000 00000000 00000000 [................]
f10000c0: 00000000 00000000 00000000 00000000 [................]
f10000d0: 00000000 00000000 00000000 00000000 [................]
f10000e0: 00000000 00000000 00000000 00000000 [................]
f10000f0: 00000000 00000000 00000000 00000000 [................]
#
I don't mind either byte order.
What burns my goat is the way some documentation insists on labeling bits in decreasing order of importance: most significant bit 0. The only bit labeling that makes any sense to me is the mathematical one: for unsigned integers, bit i corresponding to value 2^i.
I wonder WTF was in the head of Intel when they wanted to use LittleEndian ... it's unnatural for humans
Little-endian naturally casts between bytes, halves and words without the need to move things around. And how things are physically located in memory is mostly irrelevant.
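A small sketch of that point, assuming a little-endian host (true on x86 and most ARM configurations): the least significant byte sits at the lowest address, so a narrower read from the *same* address yields the low part of the value, with no pointer adjustment. The helper names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Read the low byte of a 32-bit value by reading from its address.
   On a little-endian host no offset is needed; on a big-endian host
   you would have to add sizeof(word) - 1. */
static uint8_t low_byte(uint32_t word)
{
    uint8_t b;
    memcpy(&b, &word, sizeof b);
    return b;
}

static uint16_t low_half(uint32_t word)
{
    uint16_t h;
    memcpy(&h, &word, sizeof h);
    return h;
}
```

On a little-endian machine, `low_byte(0x11223344)` gives `0x44` and `low_half(0x11223344)` gives `0x3344`, which is what makes narrowing "casts" free.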
Quote: I wonder WTF was in the head of Intel when they wanted to use LittleEndian ... it's unnatural for humans

Copying the DEC PDP11, as were pretty much all the microcontroller manufacturers at the time. (Although the 68000, with an arguably much-more-PDP11-like instruction set, is big endian.)
#include <stdint.h>

static inline uint32_t unpack_u32le(const unsigned char *const data)
{
    return ((uint32_t)data[0])
         | ((uint32_t)data[1] << 8)
         | ((uint32_t)data[2] << 16)
         | ((uint32_t)data[3] << 24);
}

static inline uint32_t unpack_32be(const unsigned char *const data)
{
    return ((uint32_t)data[0] << 24)
         | ((uint32_t)data[1] << 16)
         | ((uint32_t)data[2] << 8)
         | ((uint32_t)data[3]);
}
depending on the surrounding code. That's why I don't mind.

How exactly do you tag data?
I like to have the byte order conversions explicitly visible.
most of the time you want to convert the data when it enters/leaves the MCU.
I don't want it to do the conversion before each operation.
Current C compilers know how to optimize e.g. static inline uint32_t unpack_32be ...
Quote: current C compilers know how to optimize e.g. static inline uint32_t unpack_32be ...

Which compilers? gcc-arm didn't optimize it "at all" (for CM4), nor does XCode LLVM :-(
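If a given compiler/target fails to fuse the shift-or idiom into a single load, one common workaround is to do the load explicitly and byte-swap with a builtin. This is a sketch, not a guarantee for any particular toolchain: `load_32be` is a made-up name, and `__builtin_bswap32` is a GCC/Clang extension rather than standard C. On Cortex-M4 this pattern typically compiles to LDR + REV.

```c
#include <stdint.h>
#include <string.h>

/* One word-sized load via memcpy (which also handles unaligned
   pointers safely), then a byte swap only on little-endian hosts. */
static inline uint32_t load_32be(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);   /* single REV/BSWAP instruction */
#endif
    return v;
}
```

Whether this beats the shift-or version is compiler- and flag-dependent, so it's worth checking the generated assembly either way.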