Author Topic: RISC-V assembly language programming tutorial on YouTube


Offline westfw

  • Super Contributor
  • ***
  • Posts: 3027
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #150 on: December 16, 2018, 10:45:02 am »
Quote
Quote
I wish the ISRs in C code [on ARM] that the HW interrupt entry was quicker...
Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?
Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.
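
For anyone who wants to see what those 8 words are, here is a minimal sketch of the basic (no FPU) Cortex-M exception frame as a C struct. The layout is my understanding of the ARMv6-M/ARMv7-M stacking order; the type and function names are made up for illustration and do not come from any vendor header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic (no FPU) exception frame as pushed by Cortex-M hardware on
 * interrupt entry, lowest address first. Names are illustrative only. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* return address */
    uint32_t xpsr;  /* program status register */
} exception_frame_t;

/* Number of 32-bit words the hardware stacks for this frame. */
static inline size_t exception_frame_words(void) {
    return sizeof(exception_frame_t) / sizeof(uint32_t);
}

/* Return address of the interrupted code, given a pointer to the frame. */
static inline uint32_t interrupted_pc(const exception_frame_t *frame) {
    assert(frame != NULL);
    return frame->pc;
}
```

r0-r3, r12 and lr are the registers a called C function may clobber, which is why an ISR can be an ordinary C function with no special prolog.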


Quote
Pretty much everyone using ARC or Xtensa is likely to switch to RISC-V
Espressif too?  Is there any indication that the "mostly China" manufacturers would switch?


Quote
[complaints about CM0 code]I guess there are two options: 1) let the C compiler figure
That's where I got the 4-register version.  Offsets larger than 32 get converted into a MOV of an offset into the 4th register, and "LDR r1,[r2,r3]" addressing mode.  In assembly language, I could presumably add/sub manually from the base register or something, at the expense of ... unpleasantness and cryptic code.
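
A rough sketch of the rule behind this, assuming I have the Thumb-1 encoding right: a word-sized "LDR Rt,[Rn,#imm]" carries a 5-bit immediate scaled by 4, so only 32 word slots (byte offsets 0..124) fit in the opcode. Anything beyond that needs the offset in a register. The function and enum names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    ADDR_IMM5,       /* ldr rt, [rn, #imm] : offset lives in the opcode */
    ADDR_REG_OFFSET  /* ldr rt, [rn, rm]   : offset needs its own register */
} thumb1_addr_mode_t;

/* True when a byte offset fits the 5-bit, scaled-by-4 immediate field. */
static bool fits_thumb1_word_imm(uint32_t byte_offset) {
    return (byte_offset % 4u == 0u) && (byte_offset <= 124u);
}

/* Addressing mode a compiler would have to pick for a word access. */
static thumb1_addr_mode_t pick_addr_mode(uint32_t byte_offset) {
    return fits_thumb1_word_imm(byte_offset) ? ADDR_IMM5 : ADDR_REG_OFFSET;
}

/* Offsets 148 and 200 are past the limit; 124 is the last one that fits. */
void thumb1_offsets_example(void) {
    assert(pick_addr_mode(148u) == ADDR_REG_OFFSET);
    assert(pick_addr_mode(200u) == ADDR_REG_OFFSET);
    assert(pick_addr_mode(124u) == ADDR_IMM5);
}
```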


Computations with values that are already in registers are where 16 bit opcodes shine.
I think the big thing I was missing is that in simple assembly programs, arrays might be addressed as "[Rindex, #constantSymbolAddress]", while in an only slightly more complex program, they'll be passed around as pointers, and the double-index-register addressing modes will work just fine.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4224
  • Country: ch
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #151 on: December 16, 2018, 12:40:34 pm »
You could try the LoFive: https://store.groupgets.com/products/lofive-risc-v

Yup, of this size  :D

A little MPU can handle the keyboard (the key-matrix is 9x10), interfacing serially to the CPU, and a small LCD is usually SPI. It sounds like something that can be done.

 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1099
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #152 on: December 16, 2018, 01:15:15 pm »
My surprises show up when initializing peripherals.  I expected code like:
Code: [Select]
       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;

Just for fun, I made a couple of definitions so your code would be compilable and tried it on a few things.

Code: [Select]
#include <stdint.h>

 #define PORT_PINCFG_DRVSTR (1<<7)

struct {
    struct {
        struct {
            uint32_t foo;
            uint32_t reg;
            uint32_t bar;
        } PINCFG[16];
        struct {
            uint64_t baz;
            uint32_t reg;
        } DIRSET;
    } Group[10];
} *PORT = (void*)0xdecaf000;

void main(){
    PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
    PORT->Group[0].DIRSET.reg |= 1<<12;
}

And I checked it with for example:

Code: [Select]
arm-linux-gnueabihf-gcc -O initPorts.c -o initPorts -nostartfiles && \
arm-linux-gnueabihf-objdump -D initPorts | expand | less -p'<main>'

So ... ARMv7 (Thumb2):

Code: [Select]
000001c0 <main>:
 1c0:   4b07            ldr     r3, [pc, #28]   ; (1e0 <main+0x20>)
 1c2:   447b            add     r3, pc
 1c4:   681b            ldr     r3, [r3, #0]
 1c6:   f8d3 2094       ldr.w   r2, [r3, #148]  ; 0x94
 1ca:   f042 0280       orr.w   r2, r2, #128    ; 0x80
 1ce:   f8c3 2094       str.w   r2, [r3, #148]  ; 0x94
 1d2:   f8d3 20c8       ldr.w   r2, [r3, #200]  ; 0xc8
 1d6:   f442 5280       orr.w   r2, r2, #4096   ; 0x1000
 1da:   f8c3 20c8       str.w   r2, [r3, #200]  ; 0xc8
 1de:   4770            bx      lr
 1e0:   00010e3a        andeq   r0, r1, sl, lsr lr

00011000 <PORT>:
   11000:       decaf000        cdple   0, 12, cr15, cr10, cr0, {0}

Arm32:

Code: [Select]
000001c0 <main>:
 1c0:   e59f3020        ldr     r3, [pc, #32]   ; 1e8 <main+0x28>
 1c4:   e08f3003        add     r3, pc, r3
 1c8:   e5933000        ldr     r3, [r3]
 1cc:   e5932094        ldr     r2, [r3, #148]  ; 0x94
 1d0:   e3822080        orr     r2, r2, #128    ; 0x80
 1d4:   e5832094        str     r2, [r3, #148]  ; 0x94
 1d8:   e59320c8        ldr     r2, [r3, #200]  ; 0xc8
 1dc:   e3822a01        orr     r2, r2, #4096   ; 0x1000
 1e0:   e58320c8        str     r2, [r3, #200]  ; 0xc8
 1e4:   e12fff1e        bx      lr
 1e8:   00010e34        andeq   r0, r1, r4, lsr lr

00011000 <PORT>:
   11000:       decaf000        cdple   0, 12, cr15, cr10, cr0, {0}

Thumb1:

Code: [Select]
000001c0 <main>:
 1c0:   4b07            ldr     r3, [pc, #28]   ; (1e0 <main+0x20>)
 1c2:   447b            add     r3, pc
 1c4:   681b            ldr     r3, [r3, #0]
 1c6:   2194            movs    r1, #148        ; 0x94
 1c8:   2280            movs    r2, #128        ; 0x80
 1ca:   5858            ldr     r0, [r3, r1]
 1cc:   4302            orrs    r2, r0
 1ce:   505a            str     r2, [r3, r1]
 1d0:   3134            adds    r1, #52 ; 0x34
 1d2:   2280            movs    r2, #128        ; 0x80
 1d4:   0152            lsls    r2, r2, #5
 1d6:   5858            ldr     r0, [r3, r1]
 1d8:   4302            orrs    r2, r0
 1da:   505a            str     r2, [r3, r1]
 1dc:   4770            bx      lr
 1de:   46c0            nop                     ; (mov r8, r8)
 1e0:   00010e3a        andeq   r0, r1, sl, lsr lr

 00011000 <PORT>:
   11000:       decaf000        cdple   0, 12, cr15, cr10, cr0, {0}

Arm64:

Code: [Select]
00000000000002ac <main>:
 2ac:   b0000080        adrp    x0, 11000 <PORT>
 2b0:   f9400000        ldr     x0, [x0]
 2b4:   b9409401        ldr     w1, [x0, #148]
 2b8:   32190021        orr     w1, w1, #0x80
 2bc:   b9009401        str     w1, [x0, #148]
 2c0:   b940c801        ldr     w1, [x0, #200]
 2c4:   32140021        orr     w1, w1, #0x1000
 2c8:   b900c801        str     w1, [x0, #200]
 2cc:   d65f03c0        ret

0000000000011000 <PORT>:
   11000:       decaf000        .word   0xdecaf000
   11004:       00000000        .word   0x00000000

RISC-V rv32ic: (rv32i without the C extension is identical except that all instructions take 4 bytes. rv64ic is identical except for an "ld" to get <PORT>, and the pointer is 8 bytes instead of 4)

Code: [Select]
00010074 <main>:
   10074:       67c5                    lui     a5,0x11
   10076:       0947a783                lw      a5,148(a5) # 11094 <PORT>
   1007a:       6685                    lui     a3,0x1
   1007c:       0947a703                lw      a4,148(a5)
   10080:       08076713                ori     a4,a4,128
   10084:       08e7aa23                sw      a4,148(a5)
   10088:       0c87a703                lw      a4,200(a5)
   1008c:       8f55                    or      a4,a4,a3
   1008e:       0ce7a423                sw      a4,200(a5)
   10092:       8082                    ret

00011094 <PORT>:
   11094:       f000                    fsw     fs0,32(s0)
   11096:       deca                    sw      s2,124(sp)

M68k:

Code: [Select]
800001ac <main>:
800001ac:       2079 8000 400c  moveal 8000400c <PORT>,%a0
800001b2:       0068 0080 0096  oriw #128,%a0@(150)
800001b8:       0068 1000 00ca  oriw #4096,%a0@(202)
800001be:       4e75            rts

8000400c <PORT>:
8000400c:       deca            addaw %a2,%sp
8000400e:       f000

i686:

Code: [Select]
000001b5 <main>:
 1b5:   e8 20 00 00 00          call   1da <__x86.get_pc_thunk.ax>
 1ba:   05 3a 1e 00 00          add    $0x1e3a,%eax
 1bf:   8b 80 0c 00 00 00       mov    0xc(%eax),%eax
 1c5:   81 88 94 00 00 00 80    orl    $0x80,0x94(%eax)
 1cc:   00 00 00
 1cf:   81 88 c8 00 00 00 00    orl    $0x1000,0xc8(%eax)
 1d6:   10 00 00
 1d9:   c3                      ret   

000001da <__x86.get_pc_thunk.ax>:
 1da:   8b 04 24                mov    (%esp),%eax
 1dd:   c3                      ret   

00002000 <PORT>:
    2000:       00 f0                   add    %dh,%al
    2002:       ca                      .byte 0xca
    2003:       de                      .byte 0xde

SH4:

Code: [Select]
004001b0 <main>:
  4001b0:       07 d1           mov.l   4001d0 <main+0x20>,r1   ! 411000 <PORT>
  4001b2:       12 61           mov.l   @r1,r1
  4001b4:       13 62           mov     r1,r2
  4001b6:       7c 72           add     #124,r2
  4001b8:       26 50           mov.l   @(24,r2),r0
  4001ba:       80 cb           or      #-128,r0
  4001bc:       06 12           mov.l   r0,@(24,r2)
  4001be:       05 92           mov.w   4001cc <main+0x1c>,r2   ! bc
  4001c0:       2c 31           add     r2,r1
  4001c2:       13 52           mov.l   @(12,r1),r2
  4001c4:       03 93           mov.w   4001ce <main+0x1e>,r3   ! 1000
  4001c6:       3b 22           or      r3,r2
  4001c8:       0b 00           rts     
  4001ca:       23 11           mov.l   r2,@(12,r1)
  4001cc:       bc 00           mov.b   @(r0,r11),r0
  4001ce:       00 10           mov.l   r0,@(0,r0)
  4001d0:       00 10           mov.l   r0,@(0,r0)
  4001d2:       41 00           .word 0x0041

00411000 <PORT>:
  411000:       00 f0           .word 0xf000
  411002:       ca de           mov.l   41132c <__bss_start+0x31c>,r14

#Instr  Code  Data  Total  ISA
10      32    8     40     Thumb2
10      40    8     48     Arm32
15      30    10    40     Thumb1
9       36    8     44     Arm64
10      32    8     40     RISC-V rv64ic
10      32    4     36     RISC-V rv32ic
10      40    4     44     RISC-V rv32i
4       20    4     24     M68k
8       41    4     45     i686
13      26    14    40     SH4

Good old Motorola 68000 wins by miles on both number of instructions and total number of bytes!

Thumb1 and SH4 use a lot of instructions, but are the next smallest in code size after m68k. They're just middle of the pack once you include .data

rv32i is slightly smaller than Arm32 and rv32ic is slightly smaller than Thumb2 in total size. The number of instructions is identical for all of them and rv32i/Arm32 and rv32ic/Thumb2 have the same code size as each other.

rv64ic has one instruction more than Arm64, but the code is 4 bytes smaller. Both have to load a 64 bit pointer from the .data section, costing 4 bytes, but they don't need an intermediate pointer at the end of the function code, saving 4 bytes.
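
The totals in the table are simply code bytes plus data bytes. A small sketch that re-derives them and picks the smallest, using the numbers as I read the columns (instructions, code, data); the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the size table: instruction count, code bytes, data bytes. */
typedef struct {
    const char *isa;
    unsigned instrs;
    unsigned code_bytes;
    unsigned data_bytes;
} size_row_t;

static const size_row_t size_table[] = {
    {"Thumb2",        10, 32,  8},
    {"Arm32",         10, 40,  8},
    {"Thumb1",        15, 30, 10},
    {"Arm64",          9, 36,  8},
    {"RISC-V rv64ic", 10, 32,  8},
    {"RISC-V rv32ic", 10, 32,  4},
    {"RISC-V rv32i",  10, 40,  4},
    {"M68k",           4, 20,  4},
    {"i686",           8, 41,  4},
    {"SH4",           13, 26, 14},
};

#define SIZE_TABLE_ROWS (sizeof size_table / sizeof size_table[0])

/* Total footprint of one row: code plus data. */
static unsigned total_bytes(const size_row_t *row) {
    assert(row != NULL);
    return row->code_bytes + row->data_bytes;
}

/* Index of the row with the smallest total footprint. */
size_t smallest_total_index(void) {
    size_t best = 0;
    for (size_t i = 1; i < SIZE_TABLE_ROWS; i++) {
        if (total_bytes(&size_table[i]) < total_bytes(&size_table[best])) {
            best = i;
        }
    }
    return best;
}
```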
« Last Edit: December 16, 2018, 01:33:52 pm by brucehoult »
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 1775
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #153 on: December 16, 2018, 04:06:14 pm »
My surprises show up when initializing peripherals.  I expected code like:
Code: [Select]
       PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
       PORT->Group[0].DIRSET.reg |= 1<<12;

Just for fun, I made a couple of definitions so your code would be compilable and tried it on a few things.

Code: [Select]
#include <stdint.h>

 #define PORT_PINCFG_DRVSTR (1<<7)

struct {
    struct {
        struct {
            uint32_t foo;
            uint32_t reg;
            uint32_t bar;
        } PINCFG[16];
        struct {
            uint64_t baz;
            uint32_t reg;
        } DIRSET;
    } Group[10];
} *PORT = (void*)0xdecaf000;

void main(){
    PORT->Group[0].PINCFG[12].reg |= PORT_PINCFG_DRVSTR;
    PORT->Group[0].DIRSET.reg |= 1<<12;
}


In SAM, "Group" represents a group of registers 128 bytes long and everything below is just unions. "PORT" would be a fixed location in memory space. So what the code actually does is set 2 bits at a fixed memory location.

There's no pointer loading (which takes a whopping 50% in Motorola, and 49% in Intel, which you decided to compile as position-independent code). Moreover, when someone builds an MCU with RISC-V, they will probably provide some way of setting bits without reading registers, as Atmel did here:

Code: [Select]
PORT->Group[0].DIRSET.reg = 1<<12; // no need for "|="
The register is called DIRSET because writing to it only sets the bits (and the bits which are written "0" remain unchanged), and there's an opposite register called DIRCLR which clears the bits, and also DIRTGL which xors.
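
To make the set/clear/toggle behaviour concrete, here is a host-side model of it. This is only a sketch: on the real part the peripheral does this in hardware, and the struct and function names here are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a direction register with set/clear/toggle aliases.
 * Real hardware does this inside the peripheral; here "dir" is plain memory. */
typedef struct {
    uint32_t dir;
} port_model_t;

/* Writing DIRSET sets the bits written as 1; bits written as 0 are unchanged. */
static void write_dirset(port_model_t *p, uint32_t value) {
    assert(p != NULL);
    p->dir |= value;
}

/* Writing DIRCLR clears the bits written as 1; bits written as 0 are unchanged. */
static void write_dirclr(port_model_t *p, uint32_t value) {
    assert(p != NULL);
    p->dir &= ~value;
}

/* Writing DIRTGL XORs the written value into the register. */
static void write_dirtgl(port_model_t *p, uint32_t value) {
    assert(p != NULL);
    p->dir ^= value;
}
```

The point is that the CPU side is a single store in each case; the read-modify-write happens at the register, not in the core.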

The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area, so the whole thing boils down to this:

Code: [Select]
6685                    lui     a3,0x1
0ce7a423                sw      a3,200(a5) ; replace "200" with correct offset from a5

<edit>Can't help it. In dsPIC33 you get:

Code: [Select]
bset LATA,#12
one instruction and 3 bytes (50% compared to RISC-V).

« Last Edit: December 16, 2018, 04:17:30 pm by NorthGuy »
 

Offline lucazader

  • Regular Contributor
  • *
  • Posts: 119
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #154 on: December 16, 2018, 06:25:52 pm »
Quote
Pretty much everyone using ARC or Xtensa is likely to switch to RISC-V
Espressif too?  Is there any indication that the "mostly China" manufacturers would switch?

Yea they are a member of the RISC-V foundation, a "Founding Gold" member, whatever that means.
https://riscv.org/members-at-a-glance/

Judging from timing on when they would have started development on an ESP32 successor, I'd put it at about 50% chance of them switching over to RISC-V in the next chip, but a lot higher in the chip after that.
 

Online rhodges

  • Regular Contributor
  • *
  • Posts: 135
  • Country: us
  • Available for embedded projects.
    • My STM8 libraries
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #155 on: December 16, 2018, 06:45:45 pm »

Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.
I have really been enjoying this discussion  :-+

A decade and a half ago, I had the pleasure of working with a VLIW processor, the Trimedia/Philips PNX1302. It dispatched up to 5 operations per instruction word at 200 MHz. It had 128 32-bit registers, and the convention was that the bottom 64 belonged to user code and the top 64 could be used by the ISR. No saving required. Further, an interrupt only happens when the user code makes a jump. So user code could (with care) use the top 64 between jumps. An interesting and useful side-effect is that user code could assume no interrupts while doing code that needs to be atomic.

I just thought some might find this interesting.
Currently developing STM8. Past includes 6809, Z80, 8086, PIC, MIPS, PNX1302, and some 8748 and 6805.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 1775
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #156 on: December 16, 2018, 08:09:43 pm »
It had 128 32-bit registers, and the convention was that the bottom 64 belonged to user code and the top 64 could be used by the ISR. No saving required.

Some modern MCUs have multiple register sets. When an interrupt happens, the new set gets loaded. When it quits, the old one gets restored. It doesn't take any additional time and thus decreases the interrupt latency by a lot. If you have a separate register set for every interrupt level, you never need to save anything.

However, I think in the future, as everything moves to multi-cores, things may get even better. If you assign a designated core to an interrupt, then the core can simply sit there waiting for the interrupt to happen. Then there's no latency except for the short period necessary to synchronize the interrupt signal to the CPU clock.

 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 1448
  • Country: dk
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #157 on: December 16, 2018, 08:40:28 pm »
Quote
Quote
I wish the ISRs in C code [on ARM] that the HW interrupt entry was quicker...
Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?
Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.

slower in the rare case you need to do something in a few cycles with no registers, likely faster in the majority of cases
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 5853
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #158 on: December 16, 2018, 08:51:26 pm »
However, I think in the future, as everything moves to multi-cores, things may get even better. If you assign a designated core to an interrupt, then the core can simply sit there waiting for the interrupt to happen. Then there's no latency except for the short period necessary to synchronize the interrupt signal to the CPU clock.
The limiting factor here will be memory. You either need to have a dedicated memory per core, which will make the maximum size of the handler inflexible, or deal with concurrent access by multiple cores, which will slow down everything.
Alex
 

Online andersm

  • Super Contributor
  • ***
  • Posts: 1050
  • Country: fi
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #159 on: December 16, 2018, 09:00:04 pm »
Some modern MCUs have multiple register sets. When an interrupt happens, the new set gets loaded. When it quits, the old one gets restored. It doesn't take any additional time and thus decreases the interrupt latency by a lot. If you have a separate register set for every interrupt level, you never need to save anything.
Register banks do make code that needs to access registers across priority levels a whole lot messier (e.g. task switching using a low-priority interrupt, as is usually done on Cortex-M MCUs, or exception handlers). I guess with modern manufacturing processes the extra state required by the additional register banks isn't a big deal anymore (e.g. 31 32-bit registers by 8 banks is a bit less than 1000 bytes).
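
The back-of-envelope figure works out as below. A trivial sketch; the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of state needed for `banks` copies of `regs_per_bank` registers,
 * each `reg_bits` wide (must be a whole number of bytes). */
static uint32_t bank_state_bytes(uint32_t regs_per_bank,
                                 uint32_t reg_bits,
                                 uint32_t banks) {
    assert(reg_bits % 8u == 0u);
    return regs_per_bank * (reg_bits / 8u) * banks;
}
```

With 31 registers of 32 bits and 8 banks this gives 31 * 4 * 8 = 992 bytes.
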

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 9902
  • Country: us
  • DavidH
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #160 on: December 16, 2018, 09:20:08 pm »
Register banks do make code that needs to access registers across priority levels a whole lot messier (e.g. task switching using a low-priority interrupt, as is usually done on Cortex-M MCUs, or exception handlers). I guess with modern manufacturing processes the extra state required by the additional register banks isn't a big deal anymore (e.g. 31 32-bit registers by 8 banks is a bit less than 1000 bytes).

It does not cost as much due to area now but the register bank is within the critical timing path for the pipeline so it limits performance in an aggressive design.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 1775
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #161 on: December 16, 2018, 09:32:49 pm »
However, I think in the future, as everything moves to multi-cores, things may get even better. If you assign a designated core to an interrupt, then the core can simply sit there waiting for the interrupt to happen. Then there's no latency except for the short period necessary to synchronize the interrupt signal to the CPU clock.
The limiting factor here will be memory. You either need to have a dedicated memory per core, which will make the maximum size of the handler inflexible, or deal with concurrent access by multiple cores, which will slow down everything.

I have ideas for this too. Most of the cores should have very limited amount of dedicated regular memory, but they will have one or more deep hardware FIFOs. The other end of the FIFOs may be muxed to other cores, which provides wide address-less communication channels between cores. This removes bus congestion altogether. The central core (or cores), in contrast, will have bigger memory so they can process data.
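
To illustrate the address-less channel idea, here is a software model of one such FIFO between a producer core and a consumer core. It is a single-threaded sketch with a hypothetical depth of 8 words; a hardware FIFO would handle the synchronisation itself, and all names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 8u  /* hypothetical depth; must be a power of two */

typedef struct {
    uint32_t slots[FIFO_DEPTH];
    uint32_t head;  /* total words popped by the consumer core */
    uint32_t tail;  /* total words pushed by the producer core */
} core_fifo_t;

/* Producer side: returns false when the FIFO is full (word is dropped). */
static bool fifo_push(core_fifo_t *f, uint32_t word) {
    assert(f != NULL);
    if (f->tail - f->head == FIFO_DEPTH) {
        return false;
    }
    f->slots[f->tail % FIFO_DEPTH] = word;
    f->tail++;
    return true;
}

/* Consumer side: returns false when the FIFO is empty. */
static bool fifo_pop(core_fifo_t *f, uint32_t *word) {
    assert(f != NULL && word != NULL);
    if (f->tail == f->head) {
        return false;
    }
    *word = f->slots[f->head % FIFO_DEPTH];
    f->head++;
    return true;
}
```

Neither side ever presents an address to a shared bus; each core only sees "push" and "pop", which is what keeps the cores from contending for memory.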
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 5853
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #162 on: December 16, 2018, 09:36:01 pm »
I have ideas for this too. Most of the cores should have very limited amount of dedicated regular memory, but they will have one or more deep hardware FIFOs. The other end of the FIFOs may be muxed to other cores, which provides wide address-less communication channels between cores. This removes bus congestion altogether. The central core (or cores), in contrast, will have bigger memory so they can process data.
That does not address code memory.
Alex
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 1775
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #163 on: December 16, 2018, 10:01:03 pm »
That does not address code memory.

Doesn't have to. Code memory can be made completely separate from data memory. Each peripheral core has its own limited amount of code memory which can be programmed by the central core as needed. Small memories can be made very fast. This ensures very fast deterministic execution for the peripheral cores. In contrast, the central core doesn't have to be deterministic - may have caches, pipelines - if it ever needs access to data, it all gets smoothed out by FIFOs.

 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 5853
  • Country: us
    • Personal site
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #164 on: December 16, 2018, 10:03:38 pm »
Code memory can be made completely separate from data memory.
That's exactly what I'm talking about. You will essentially limit what your "interrupt" handler can do by defining the amount of code memory it has. I think this will be enough of a limitation to make this system impractical. At least for common microcontroller uses. It may be useful in an MPU environment. Kind of like ARM's big.LITTLE stuff.
Alex
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2035
  • Country: nz
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #165 on: December 16, 2018, 10:05:51 pm »
... Further, an interrupt only happens when the user code makes a jump... An interesting and useful side-effect is that user code could assume no interrupts while doing code that needs to be atomic.

I just thought some might find this interesting.
I found that very interesting!
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 3027
  • Country: us
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #166 on: December 16, 2018, 10:17:36 pm »
Quote
[ARM Cortex NVIC register stacking] likely faster in the majority of cases

I'm not convinced.  We're talking register stacking, probably limited by memory speed, and taking all of 1 instruction (push multiple) in the ISR to save exactly which ones you need...


Quote
The register is called DIRSET because writing to it only sets the bits

Yeah, ....DIRSET |= bitmask; was not the best example.


Quote
The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area

Maybe.  32bit processors tend to really spread those IO registers out, perhaps occupying more than even a reasonable offset constant for indexed addressing. And constant-folding upper bits of an address might be too much to ask of a compiler.  I remember looking at PIC32 code (MIPS), which loads 32bit constants half-at-a-time (LUI/ORI), and being disappointed that it kept re-loading the same upper value.  OTOH, I think Microchip was defining those symbols at link time rather than in C source, so there wasn't much choice...  (This was quite a while ago.  Maybe now, with LTO and similar, it does better.)
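
A small sketch of what "half-at-a-time" means, and why re-loading the upper half is wasted work when two registers sit in the same 64 KB block. The addresses are hypothetical, picked only to look like memory-mapped registers, and the helper names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper 16 bits of a constant, as loaded by MIPS "lui". */
static uint16_t hi16(uint32_t addr) { return (uint16_t)(addr >> 16); }

/* Lower 16 bits, as OR-ed in by "ori" (zero-extended). */
static uint16_t lo16(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }

/* Rebuild the constant the way the LUI/ORI pair does. */
static uint32_t lui_ori(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) | lo;
}

/* True when two addresses could share a single "lui" result. */
static bool same_upper_half(uint32_t a, uint32_t b) {
    return hi16(a) == hi16(b);
}

/* Two hypothetical registers 16 bytes apart share the same upper half. */
void lui_ori_example(void) {
    const uint32_t reg_a = 0xBF886000u;  /* hypothetical address */
    const uint32_t reg_b = 0xBF886010u;  /* hypothetical address */
    assert(lui_ori(hi16(reg_a), lo16(reg_a)) == reg_a);
    assert(same_upper_half(reg_a, reg_b));
}
```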
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 1775
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #167 on: December 16, 2018, 10:36:09 pm »
Code memory can be made completely separate from data memory.
That's exactly what I'm talking about. You will essentially limit what your "interrupt" handler can do by defining the amount of code memory it has. I think this will be enough of a limitation to make this system impractical. At least for common microcontroller uses.

You do not need a lot of memory for peripheral cores - you need speed and determinism. And that is what MCUs are lacking now. You can always have a central core with an enormous amount of memory to do any kind of processing.

The approach where you have a single memory bus for both data and code which is accessed simultaneously by CPU and 15 DMA channels through the bus arbiter, is not very suitable for real-time applications.
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 1448
  • Country: dk
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #168 on: December 16, 2018, 10:47:00 pm »
Quote
[ARM Cortex NVIC register stacking] likely faster in the majority of cases

I'm not convinced.  We're talking register stacking, probably limited by memory speed, and taking all of 1 instruction (push multiple) in the ISR to save exactly which ones you need...


but before you get to your push multiple, the core first has to read the vector table and fetch the first instruction of the ISR (prolog)
done automatically, the stacking can often happen in parallel with that


 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 1775
  • Country: ca
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #169 on: December 16, 2018, 11:16:19 pm »
Quote
The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area
Maybe.  32bit processors tend to really spread those IO registers out, perhaps occupying more than even a reasonable offset constant for indexed addressing. And constant-folding upper bits of an address might be too much to ask of a compiler.  I remember looking at PIC32 code (MIPS), which loads 32bit constants half-at-a-time (LUI/ORI), and being disappointed that it kept re-loading the same upper value.

Microchip went overboard with spreading the registers all over the place in PIC32. There's no reason for that. In PIC24, everything fits into 2048 bytes quite nicely, even with space to spare. RISC-V has only a 4096-byte reach, but I think this is OK for hardware registers.

If you locate all your peripheral registers at the beginning of the memory space, you already have the zero register which creates free zero base for you. So, you have 2048 bytes which are easily accessible. Good place for hardware registers.

It would be full 4096 bytes, but RISC-V went the traditional sign-extended (instead of more reasonable zero-extended) road for offsets. Although addresses 0xfffff000 to 0xffffffff may be used for peripheral registers too.
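
The two reachable windows fall straight out of the sign extension. A minimal sketch for RV32, with the zero register as base; the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the 12-bit immediate of an RV32 load/store to 32 bits. */
static int32_t sext12(uint32_t imm12) {
    assert(imm12 <= 0xFFFu);
    return (imm12 & 0x800u) ? (int32_t)imm12 - 0x1000 : (int32_t)imm12;
}

/* Effective address of "lw rd, imm(rs1)" on RV32: base plus the
 * sign-extended offset, wrapping modulo 2^32. */
static uint32_t effective_address(uint32_t base, uint32_t imm12) {
    return base + (uint32_t)sext12(imm12);
}
```

With base 0, immediates 0x000..0x7FF reach addresses 0..2047, and immediates 0x800..0xFFF wrap around to 0xFFFFF800..0xFFFFFFFF.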

OTOH, I think Microchip was defining those symbols at link time rather than in C source, so there wasn't much choice... 

That's true. Although it's not a very good idea. I remember I had to copy definitions from the linker scripts to the inc files when I was working with PIC24.

 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1099
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #170 on: December 17, 2018, 12:25:56 am »
In SAM, "Group" represents a group of registers 128 bytes long and everything below is just unions.

I don't suppose the exact sizes matter much, as long as you stay within what can be done with a simple offset.

Quote
"PORT" would be a fixed location in memory space. So, what the code actually does is setting 2 bits at the fixed memory location.

Setting two bits at fixed locations .. yep .. that's what I compiled.

Quote
There's no pointer loading (which takes a whopping 50% in Motorola, and 49% in Intel, which you decided to compile as position-independent code).

I compiled them the way they came. None of the other ISAs have problems using PC-relative addressing.

You need to get the address of the hardware registers *somehow*. Now, it's true that you'd probably get slightly smaller code using the address of "PORT" as a #define instead of as a global variable, but that's the same for all ISAs and doesn't favour one over another.
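
To show the two options side by side, here is a host-runnable sketch. The two-register peripheral and all names are hypothetical, and a static struct stands in for the memory-mapped block so it can run anywhere; on a real MCU the constant would be something like a cast of a fixed address from the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-register peripheral. */
typedef struct {
    volatile uint32_t dirset;
    volatile uint32_t pincfg;
} regs_t;

/* Stand-in for the memory-mapped block so the sketch runs on a host. */
static regs_t fake_peripheral;

/* Variant 1: a global pointer variable. The compiler typically has to
 * load the pointer from memory before it can reach the registers. */
regs_t *port_var = &fake_peripheral;

/* Variant 2: a constant expression. The address is known at compile or
 * link time, so no pointer load from .data is needed. */
#define PORT_CONST (&fake_peripheral)

void init_via_variable(void) { port_var->dirset |= 1u << 12; }
void init_via_constant(void) { PORT_CONST->pincfg |= 1u << 7; }

/* Both variants end up touching the same registers. */
void init_example(void) {
    init_via_variable();
    init_via_constant();
    assert(fake_peripheral.dirset == (1u << 12));
    assert(fake_peripheral.pincfg == (1u << 7));
}
```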

Quote
Moreover, when someone builds an MCU with RISC-V, they will probably provide some way of setting bits without reading registers, as Atmel did here:

Code: [Select]
PORT->Group[0].DIRSET.reg = 1<<12; // no need for "|="
The register is called DIRSET because writing to it only sets the bits (and the bits which are written "0" remain unchanged), and there's an opposite register called DIRCLR which clears the bits, and also DIRTGL which xors.

I took the C code exactly as given by westfw, which also matches the ARM assembly language he gave in loading, ORing, and storing.

Incidentally, RISC-V *does* have a way to change bits without bringing the data to the CPU and back, but it seemed unfair to use it. I'm concentrating here on compiled C code.

AMOOR.W res,addr,val

This sends a message with the address, value, and operation out over the TileLink bus. If all the channels of the bus go as far as the peripheral, then the peripheral itself will do the OR operation locally and report back the original value. If at some point on the way to the peripheral the bus narrows to just a simple read/write bus then the controller at that point will do the read/modify/write and report the result back to the CPU.
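
From C, the usual way to ask for this kind of operation is C11 atomic_fetch_or. A minimal sketch follows; the wrapper name is made up, and I would expect (not guarantee) GCC or Clang targeting a RISC-V core with the A extension to turn it into a single amoor.w.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically OR `mask` into `*word` and return the value it held before.
 * How this is lowered (one AMO instruction or a retry loop) depends on
 * the compiler and target. */
static uint32_t atomic_or_word(_Atomic uint32_t *word, uint32_t mask) {
    assert(word != NULL);
    return atomic_fetch_or(word, mask);
}
```

Note that the value handed back is the one from before the OR, so the caller can tell which bits were already set.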

Quote
The compiler may be clever enough to keep one of the registers permanently pointing to the IO registers area, so the whole thing boils down to this:

Code: [Select]
6685                    lui     a3,0x1
0ce7a423                sw      a3,200(a5) ; replace "200" with correct offset from a5

Sure, of course. But that value has to *get* into a5 somehow, and I showed that.

If I'd chosen to put the code into a function that took PORT as an argument then *all* of the ISAs would show shorter code.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 1099
  • Country: nz
  • Currently at SiFive, previously Samsung R&D
Re: RISC-V assembly language programming tutorial on YouTube
« Reply #171 on: December 17, 2018, 12:39:43 am »
A decade and a half ago, I had the pleasure of working with a VLIW processor, the Trimedia/Philips PNX1302. It dispatched up to 5 operations per instruction word at 200 MHz. It had 128 32-bit registers, and the convention was that the bottom 64 belonged to user code and the top 64 could be used by the ISR. No saving required.

You can do this on any CPU with a reasonably large number of registers. It's just a matter of documenting it and making sure the compiler (and/or assembly language programmers) know about it.

Even three or four registers is enough for many interrupt routines, so you could reasonably do this on machines with 16 registers -- but 32 would be better.

Quote
Further, an interrupt only happens when the user code makes a jump. So user code could (with care) use the top 64 between jumps. An interesting and useful side-effect is that user code could assume no interrupts while doing code that needs to be atomic.

This is a nice property. I've worked on a machine that (potentially) switched threads after every "block" of code -- not quite a basic block as there was a way to do if/then/else and small loops within a block, but there was a limit on the number of instructions executed in the block. Once you were in a block you were guaranteed NOT to be interrupted. And there was a bank of 8 fast registers (1 cycle latency) that could be used within a block but went *poof* at the end of the block. The 256 global registers had several cycles more latency than that.
 

Offline brucehoult

Re: RISC-V assembly language programming tutorial on YouTube
« Reply #172 on: December 17, 2018, 12:47:40 am »
It had 128 32-bit registers, and the convention was that the bottom 64 belonged to user code and the top 64 could be used by the ISR. No saving required.

Some modern MCUs have multiple register sets. When an interrupt happens, the new set gets loaded. When it quits, the old one gets restored. It doesn't take any additional time and thus decreases the interrupt latency by a lot. If you have a separate register set for every interrupt level, you never need to save anything.

I don't know about "modern". The Z80 did this. Old ARM chips had a set of registers for every privilege level (not necessarily a whole set). And SPARC and Itanium had register windows that were used not only by interrupts, but by function calls.

There are two problems with this that explain why no one does it any more:

1) at some point you run out and want three sets instead of two, or seventeen sets instead of sixteen. And then you have a whole lot of delay while you swap stuff. And you have to swap the entire set of registers even if the function/task using them is only using a small proportion of them.

2) it's just a huge waste of hardware resources that, in the end, is not actually used all that effectively. You're better off spending those transistors on something else -- such as a cache or write buffer that can absorb manually saved registers quickly on interrupts, but also makes your normal code run faster the rest of the time as well.

« Last Edit: December 17, 2018, 01:07:45 am by brucehoult »
 

Offline brucehoult

Re: RISC-V assembly language programming tutorial on YouTube
« Reply #173 on: December 17, 2018, 12:58:33 am »
Quote
Quote
I wish the ISRs in C code [on ARM] that the HW interrupt entry was quicker...
Doesn't the 'naked' attribute of the function definition remove the prolog and epilog?
Not for ARM Cortex.  The NVIC hardware saves exactly the same registers that the C ABI says must be saved, so effectively there is NO extra prolog for ISRs.  But the NVIC hardware stacks 8 words of context, so it's slower than it could be if the choice was left to the programmer.

Slower in the rare case you need to do something in a few cycles with no registers, likely faster in the majority of cases.

Not faster. If the hardware managed to write those 8 words to memory (or at least to a write buffer or something) in one or two clock cycles then it would be faster. But it doesn't. Cortex-M3, M4, and M7 all have 12-cycle interrupt latency (M0 has 16). It's sitting there writing those eight registers out at one per clock cycle, exactly the same as you could do yourself in software.
 

Offline brucehoult

Re: RISC-V assembly language programming tutorial on YouTube
« Reply #174 on: December 17, 2018, 01:00:58 am »
Register banks do make code that needs to access registers across priority levels a whole lot messier (eg. task switching using a low-priority interrupt, as is usually done on Cortex-M MCUs, or exception handlers). I guess with modern manufacturing processes the extra state required by the additional register banks isn't a big deal anymore (eg. 31 32-bit registers by 8 banks is a bit less than 1000 bytes).

It does not cost as much due to area now but the register bank is within the critical timing path for the pipeline so it limits performance in an aggressive design.

Yes.

Also, there are other ways to use that 1 KB worth of transistors that give more bang for the buck, more of the time.
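The "a bit less than 1000 bytes" / "1 KB" figure can be checked directly: 31 registers of 4 bytes each, times 8 banks, is 992 bytes. A tiny sketch of the arithmetic (the function name is hypothetical, and it counts only the register storage, not any extra muxing or decode logic):

```c
#include <stddef.h>

/* Bytes of storage for `banks` copies of a register file.
 * Assumes bits_per_reg is a multiple of 8. */
static size_t banked_regfile_bytes(size_t regs_per_bank,
                                   size_t bits_per_reg,
                                   size_t banks)
{
    return regs_per_bank * (bits_per_reg / 8) * banks;
}
```

With the numbers from the quote, `banked_regfile_bytes(31, 32, 8)` gives 992.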
 

