Author Topic: CH32V003 - first assembler program.. and loop speed.  (Read 1532 times)


Offline pancio (Topic starter)

  • Contributor
  • Posts: 14
  • Country: pl
    • SystemEmbedded.eu
CH32V003 - first assembler program.. and loop speed.
« on: November 14, 2024, 10:16:47 am »
Hi,

I want to use the CH32V003 as a simple replacement for a small FPGA (or rather a CPLD), and I know that if I need to have good timing I must write program in assembler. I found some cool, usable tools and a compiler, and tried blinking the LED on my nanoCH32V003 board. Everything works fine, but unfortunately the maximum speed is much lower than I expected... Can anybody explain what is going on (maybe the clock is set to the wrong value)?

The program was taken from:
https://github.com/sahasradal/CH32V003-Blinky-RISCV-assembly-language-code

I made some modifications to adapt it to my board:

Code: [Select]
#the core is RISCV32EC, only 16 registers(0-15) ,basic integer opcode only
# x0 = zero
# x1 = ra Caller
# x2 = sp Callee
#x5-x7= t0,t1,t2
#x8 s0/fp Save register/frame pointer Callee
#x9 s1 Save register Callee
#x10-11= a0-1= Function parameters/return values Caller
#x12-15 a2-5 Function parameters Caller
#The Caller attribute in the above table means that the called procedure does not save the register value, and
#the Callee attribute means that the called procedure saves the register

# 2kb sram

include CH32V003_reg1.asm # file with all address defines

fclk = 24000000    # 24Mhz RCO internal



main:

sp_init:
   
    li sp, STACK     # load stack pointer with stack end address

li x10,R32_RCC_APB2PCENR     # load address of APB2PCENR register to x10 ,for enabling GPIO A,D,C peripherals
lw x11,0(x10)         # load contents from peripheral register R32_RCC_APB2PCENR pointed by x10
li x7,((1<<2)|(1<<4)|(1<<5)) # 1<<IOPA_EN,1<<IOPC_EN,1<<IOPD_EN
or x11,x11,x7 # or values
sw x11,0(x10) # store modified enable values in R32_RCC_APB2PCENR

li x10,R32_GPIOD_CFGLR # load pointer x10 with address of R32_GPIOD_CFGLR , GPIO configuration register
lw x11,0(x10) # load contents from register pointed by x10
# li x7,~(0xf<<16) # we need to setup PD4 (led pin of board). clear PD4 config bits with mask 0xfff0ffff or ~(F<<16)
li x7,0xf0ffffff # we need to set up PD6 (LED pin used here). clear PD6 config bits with mask 0xf0ffffff or ~(0xF<<24)
   
and x11,x11,x7 # clear PD6 mode and cnf bits for selected pin D6
# li x7,(0x3<<16) # 00: Universal push-pull output mode.|11: Output mode, maximum speed 50MHz = 0011 (0x3 shifted to bit 16 of reg)
li x7,(0x3<<24) # 00: Universal push-pull output mode.|11: Output mode, maximum speed 50MHz = 0011 (0x3 shifted to bit 24 of reg)
or x11,x11,x7 # OR value to register
sw x11,0(x10) # store in R32_GPIOD_CFGLR

PD6_ON:
li x10,R32_GPIOD_BSHR # R32_GPIOD_BSHR register sets and resets GPIOD pins, load address into pointer x10
lw x11,0(x10) # load contents to x11
li x7,(1<<0x06) # set PD6 by shifting 1 to bit position 6
or x11,x11,x7 # OR with x11
sw x11,0(x10) # store x11 to R32_GPIOD_BSHR 

call delay # delay subroutine

PD6_OFF:
li x10,R32_GPIOD_BSHR # R32_GPIOD_BSHR register sets and resets GPIOD pins, load address into pointer x10
lw x11,0(x10) # load contents to x11
li x7,(1<<0x16)     # reset PD6 by shifting 1 into bit position 22 (0x16) of R32_GPIOD_BSHR
or x11,x11,x7 # OR with x11
sw x11,0(x10) # store x11 to R32_GPIOD_BSHR

call delay # delay subroutine

j PD6_ON # jump to label PD6_ON and loop



delay: # delay routine
li x6,1         # load an arbitrary value 1 to t1 register
dloop:
addi x6,x6,-1 # subtract 1 from t1
bne x6,zero,dloop # if t1 not equal to 0 branch to label dloop
ret


The 'delay' subroutine determines the time between switching the LED on and off, and it is set to the shortest value.

Is it possible that the rest of the program flow takes so much time that the maximum switching frequency is only ~211 kHz?



SystemEmbedded.eu - Power without the price! :-)
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4774
  • Country: nz
Re: CH32V003 - first assembler program.. and loop speed.
« Reply #1 on: November 14, 2024, 10:59:41 am »
Quote
I know that if I need to have good timing I must write program in assembler.

You know wrongly. You can easily write code in C that will run 5 or 10 times faster than the assembly language you show here.

With assembly language you can make things predictable and controllable, down to the clock cycle, on a CPU like the CH32V003, but you are far from that territory.

I count 23 instructions per loop. If you're running at 24 MHz or the instructions are 4-byte instructions then that's pretty much exactly 1 MHz. Still, that's significantly faster than the 211 KHz you say you are getting.
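
As a rough check of that estimate, here is a minimal sketch in C. It assumes every instruction retires in one clock, which ignores flash wait states and branch penalties, so the result is an upper bound. The name `loop_rate_hz` is made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: loop iterations per second, assuming one clock
 * per instruction. Each iteration of the blink loop produces one full
 * ON/OFF period, so this is also the pin frequency in Hz. */
uint32_t loop_rate_hz(uint32_t clk_hz, uint32_t instr_per_loop)
{
    return clk_hz / instr_per_loop;
}
```

With a 24 MHz clock and 23 instructions per loop, `loop_rate_hz(24000000u, 23u)` comes out at roughly 1.04 MHz.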

Many MCUs have a maximum pin toggle speed well below the clock speed because of clock domain crossing or various other things. But I'd expect a CH32V003 to be able to do a lot better than that -- you need 1 MHz (with fairly precise duty cycle) to drive RGB LED strings, for example. And an 8 bit AVR can do that.

On a HiFive1 (first 32 bit RISC-V chip, from 2016) I've been able to get a 17 MHz output on a GPIO, from a 256 MHz CPU.

You can get your loop down to just ...

Code: [Select]
    li x10,R32_GPIOD_BSHR
    li x11,(1<<0x06)
    li x12,(1<<0x16)
loop:
    sw x11,(x10)
    sw x12,(x10)
    j loop

... which at least will try to run just as fast as possible (unrolling would make it a little faster)

I don't know for sure, but I don't think reading from R32_GPIOD_BSHR does anything useful -- I've never tried such a thing, but I'd imagine it always returns 0s. Similarly, the OR instructions are useless at best.

And loading the address and the constants inside the loop is just silly.

That's also a very bad way to get a delay. Much better to loop until the mcycle CSR is >= a calculated value.
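
A sketch of that idea in C, under stated assumptions: the counter is passed in as a function pointer because which free-running counter is available on the CH32V003 (mcycle or something like a SysTick counter) is not confirmed here. This variant compares elapsed ticks rather than an absolute target value, so it stays correct when the counter wraps. `counter_fn`, `delay_cycles` and `fake_counter` are hypothetical names, and the fake counter exists only so the logic can be tried on a PC.

```c
#include <stdint.h>

/* A function that returns a free-running 32-bit tick counter.
 * On hardware this would read a cycle counter (assumption). */
typedef uint32_t (*counter_fn)(void);

/* Busy-wait until at least `cycles` ticks have elapsed.
 * Unsigned subtraction gives the right elapsed count even if the
 * counter wraps past 0xFFFFFFFF during the wait. */
void delay_cycles(counter_fn now, uint32_t cycles)
{
    uint32_t start = now();
    while ((uint32_t)(now() - start) < cycles) {
        /* spin */
    }
}

/* Simulated counter for testing on a PC: starts just below the wrap
 * point and advances by 3 ticks on every read. */
uint32_t fake_ticks = 0xFFFFFFF0u;

uint32_t fake_counter(void)
{
    fake_ticks += 3u;
    return fake_ticks;
}
```

Unlike a counted-down loop, the delay length here does not depend on how many instructions the loop body compiles to.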
« Last Edit: November 14, 2024, 11:03:18 am by brucehoult »
 
The following users thanked this post: pancio

Offline pancio (Topic starter)

  • Contributor
  • Posts: 14
  • Country: pl
    • SystemEmbedded.eu
Re: CH32V003 - first assembler program.. and loop speed.
« Reply #2 on: November 14, 2024, 11:45:54 am »
Thank you for such a detailed answer!

Quote
You know wrongly. You can easily write code in C that will run 5 or 10 times faster than the assembly language you show here.

I expressed myself badly... I meant that C code can be of different lengths and it is more difficult to predict the execution time of its individual parts, especially since optimization might have an influence...

Quote
Many MCUs have a maximum pin toggle speed well below the clock speed because of clock domain crossing or various other things. But I'd expect a CH32V003 to be able to do a lot better than that -- you need 1 MHz (with fairly precise duty cycle) to drive RGB LED strings, for example. And an 8 bit AVR can do that.

That's why I'm almost certain it can be done faster.. maybe DMA but I'm still a long way off...

Code: [Select]
    li x10,R32_GPIOD_BSHR
    li x11,(1<<0x06)
    li x12,(1<<0x16)
loop:
    sw x11,(x10)
    sw x12,(x10)
    j loop

I'll try to modify the code as you proposed - I see great potential in it! :-)
 
Quote
I don't know for sure, but I don't think reading from R32_GPIOD_BSHR does anything useful -- I've never tried such a thing, but I'd imagine it always returns 0s. Similarly, the OR instructions are useless at best.

You are totally right. BSHR returns 0x00000000 when read, and its SET/RESET bits are write-only. There is no need to read it before modifying...

I need some time to test all your advice :-)
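
To make the set/reset layout concrete, here is a small C sketch of the two mask values used in this thread. It assumes the layout shown in the code above: bits 0..15 set a pin and bits 16..31 reset it. The helper names are invented for this example.

```c
#include <stdint.h>

/* Value to write to BSHR to drive `pin` high (bit n sets pin n). */
uint32_t bshr_set_mask(unsigned pin)
{
    return (uint32_t)1u << pin;
}

/* Value to write to BSHR to drive `pin` low (bit n+16 resets pin n). */
uint32_t bshr_reset_mask(unsigned pin)
{
    return (uint32_t)1u << (pin + 16u);
}
```

For PD6 these give 0x40 and 0x400000, the same constants as `(1<<0x06)` and `(1<<0x16)`. Because writing a 0 bit changes nothing, a plain store of the mask is enough.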


A few moments later...

I changed the code, dropped the old loop and used your method... result: 1.5 MHz, pretty nice!

Many thanks!


A few (few) moments later...

I tried to do the same task in C language:
Code: [Select]
#include "ch32v00x_rcc.h"
#include <stdio.h>


#define GPIO_CNF_OUT_PP 0

int main()
{
    SystemInit();

    // Enable GPIOs
    RCC->APB2PCENR |= RCC_APB2Periph_GPIOD;

    // GPIO D6 Push-Pull
    GPIOD->CFGLR &= ~(0xf<<(4*6));
    GPIOD->CFGLR |= (GPIO_Speed_10MHz | GPIO_CNF_OUT_PP)<<(4*6);

    while(1)
    {
        // Turn on GPIO
        GPIOD->BSHR = (1<<0x6);
        // Turn off GPIO
        GPIOD->BSHR = (1<<0x16);

    }
}


I'm so surprised... 6 MHz! How is that possible?
It seems more and more to me that it is a matter of initializing the microcontroller correctly... but I don't know how to do that in assembler.

BR,
pancio



« Last Edit: November 14, 2024, 01:18:25 pm by pancio »
 

Offline HwAoRrDk

  • Super Contributor
  • ***
  • Posts: 1610
  • Country: gb
Re: CH32V003 - first assembler program.. and loop speed.
« Reply #3 on: November 14, 2024, 02:42:21 pm »
Quote
I'm so surprised... 6 MHz! How is that possible?
It seems more and more to me that it is a matter of initializing the microcontroller correctly... but I don't know how to do that in assembler.

Why not look at the assembly generated by the C compiler? You can generate that from the binary ELF executable:

riscv-none-elf-objdump --section-headers --disassemble-all --source --visualize-jumps "input.elf" > "output.lst"

Edit:

Or, take a look at it in Godbolt Compiler Explorer: https://godbolt.org/z/6Tv3GGqa9

Assembly output from that copied below for convenience:

Code: [Select]
main:
        li      a4,1073876992
        lw      a5,24(a4)
        li      a3,-251658240
        addi    a3,a3,-1
        ori     a5,a5,32
        sw      a5,24(a4)
        li      a5,1073811456
        addi    a5,a5,1024
        lw      a4,0(a5)
        and     a4,a4,a3
        sw      a4,0(a5)
        lw      a4,0(a5)
        li      a3,16777216
        or      a4,a4,a3
        sw      a4,0(a5)
        li      a3,64
        li      a4,4194304
.L2:
        sw      a3,16(a5)
        sw      a4,16(a5)
        j       .L2

Call to SystemInit() has been omitted for brevity.

You can see that it's essentially the same as brucehoult's suggested assembly code - the 'on' and 'off' BSHR register values are pre-computed outside the loop, and the loop just consists of a couple of store-word instructions writing those values to the BSHR register. So I don't know why you were only able to get the pin toggling at 1.5 MHz with that code compared to the C version.
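
The decimal constants in that listing are easier to follow in hex: `-251658240 - 1` is 0xF0FFFFFF (the mask that clears the PD6 field), 16777216 is `0x1 << 24`, 64 is `1 << 6` and 4194304 is `1 << 22`. As a sketch, the CFGLR update the compiler performs can be written as one helper. `cfglr_with_pin` is a hypothetical name, and the example register value in the note below is only an illustration.

```c
#include <stdint.h>

/* Replace the 4-bit configuration field of one pin in a CFGLR value.
 * Each pin occupies 4 bits, so the field for pin n starts at bit 4*n. */
uint32_t cfglr_with_pin(uint32_t cfglr, unsigned pin, uint32_t field)
{
    uint32_t shift = 4u * pin;

    cfglr &= ~((uint32_t)0xFu << shift);   /* clear the old field */
    cfglr |= (field & 0xFu) << shift;      /* insert the new field */
    return cfglr;
}
```

For example, `cfglr_with_pin(0x44444444u, 6, 0x1u)` yields 0x41444444: only the nibble belonging to pin 6 changes.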

I forget exactly what SystemInit() in WCH's HAL code does, so maybe it's configuring the main clock for 48 MHz? But that wouldn't explain why essentially the same machine code is running 4 times quicker rather than merely twice as quick...
« Last Edit: November 14, 2024, 03:16:12 pm by HwAoRrDk »
 

Offline HwAoRrDk

  • Super Contributor
  • ***
  • Posts: 1610
  • Country: gb
Re: CH32V003 - first assembler program.. and loop speed.
« Reply #4 on: November 14, 2024, 03:39:28 pm »
Quote
I forget exactly what SystemInit() in WCH's HAL code does, so maybe it's configuring the main clock for 48 MHz? But that wouldn't explain why essentially the same machine code is running 4 times quicker rather than merely twice as quick...

Hmm, after looking into what SystemInit() does... Would the HB Bus Peripheral Clock (HCLK) speed affect GPIO speeds?

By default, whether you're running at 24 or 48 MHz, the SystemInit() code sets HPRE in the RCC_CFGR0 register to 0b0000, setting the HCLK prescaler to divide by 1 (so HCLK = SYSCLK). However, the register default value for HPRE is 0b0010, for divide by 3 - which is what OP would get using their assembly code (because it doesn't initialise the RCC_CFGR0 register).

So, it's likely there is a difference in the speed of HCLK between OP's assembly and C examples, so we're not quite comparing apples to apples.
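
A minimal C sketch of the arithmetic behind this, covering only the two HPRE encodings mentioned above. The position of HPRE at bits 7:4 of RCC_CFGR0 is an assumption to check against the reference manual, and all names here are made up for illustration.

```c
#include <stdint.h>

#define HPRE_SHIFT 4u                      /* assumed field position */
#define HPRE_MASK  (0xFu << HPRE_SHIFT)

/* Divider for the two HPRE encodings discussed in this thread.
 * Other encodings are not covered by this sketch and return 0. */
uint32_t hpre_divider(uint32_t hpre)
{
    switch (hpre) {
    case 0x0u: return 1u;   /* 0b0000: HCLK = SYSCLK */
    case 0x2u: return 3u;   /* 0b0010: divide by 3 (reset default) */
    default:   return 0u;   /* not covered here */
    }
}

/* HCLK for a given SYSCLK and RCC_CFGR0 value, or 0 if unknown. */
uint32_t hclk_hz(uint32_t sysclk_hz, uint32_t cfgr0)
{
    uint32_t div = hpre_divider((cfgr0 & HPRE_MASK) >> HPRE_SHIFT);
    return div ? sysclk_hz / div : 0u;
}

/* RCC_CFGR0 value with HPRE cleared to 0b0000 (divide by 1). */
uint32_t cfgr0_hpre_div1(uint32_t cfgr0)
{
    return cfgr0 & ~HPRE_MASK;
}
```

With a 24 MHz SYSCLK and HPRE left at 0b0010, this gives an HCLK of 8 MHz, and clearing the field brings it back to 24 MHz. In assembly the equivalent would be a load, an AND with the inverted mask, and a store to RCC_CFGR0.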
 
The following users thanked this post: pancio

Offline pancio (Topic starter)

  • Contributor
  • Posts: 14
  • Country: pl
    • SystemEmbedded.eu
Re: CH32V003 - first assembler program.. and loop speed.
« Reply #5 on: November 14, 2024, 04:12:35 pm »
Thank you for the tip, HwAoRrDk,

It generated strangely long code, but I found the part with the while(1) loop:

Code: [Select]
    while(1)
    {
        // Turn on GPIO
        GPIOD->BSHR = (1<<0x6);
 2b6:     40078793          addi a5,a5,1024
    GPIOD->CFGLR &= ~(0xf<<(4*6));
 2ba:     8f75                and a4,a4,a3
 2bc:     c398                sw a4,0(a5)
    GPIOD->CFGLR |= (GPIO_Speed_50MHz | GPIO_CNF_OUT_PP)<<(4*6);
 2be:     4398                lw a4,0(a5)
 2c0:     030006b7          lui a3,0x3000
 2c4:     8f55                or a4,a4,a3
 2c6:     c398                sw a4,0(a5)
        GPIOD->BSHR = (1<<0x6);
 2c8:     04000693          li a3,64
        // Turn off GPIO
        GPIOD->BSHR = (1<<0x16);
 2cc:     00400737          lui a4,0x400
        GPIOD->BSHR = (1<<0x6);
 2d0: /-> cb94                sw a3,16(a5)
        GPIOD->BSHR = (1<<0x16);
 2d2: |   cb98                sw a4,16(a5)
 2d4: \-- bff5                j 2d0 <main+0x3a>

This means that in the loop we have only:
Code: [Select]
2d0:
sw a3,16(a5)
sw a4,16(a5)
j 2d0



It's very similar to the code from @brucehoult:

Code: [Select]
loop:
    sw x11,(x10)
    sw x12,(x10)
    j loop

I need to check how the clock is initialized in the C code...

So far I have bricked two devices - probably because I touched the DIO pin while configuring PORTD... :-)

 
« Last Edit: November 14, 2024, 04:18:29 pm by pancio »
 

Offline pancio (Topic starter)

  • Contributor
  • Posts: 14
  • Country: pl
    • SystemEmbedded.eu
Re: CH32V003 - first assembler program.. and loop speed.
« Reply #6 on: November 14, 2024, 04:24:28 pm »

Quote
Hmm, after looking into what SystemInit() does... Would the HB Bus Peripheral Clock (HCLK) speed affect GPIO speeds?

By default, whether you're running at 24 or 48 MHz, the SystemInit() code sets HPRE in the RCC_CFGR0 register to 0b0000, setting the HCLK prescaler to divide by 1 (so HCLK = SYSCLK). However, the register default value for HPRE is 0b0010, for divide by 3 - which is what OP would get using their assembly code (because it doesn't initialise the RCC_CFGR0 register).

So, it's likely there is a difference in the speed of HCLK between OP's assembly and C examples, so we're not quite comparing apples to apples.

Yes, I'll update the assembly code with the clock configuration and confirm whether that's the missing part for full speed :-)
Unfortunately I need to assemble a new board... or two. Do you know if it's possible to unbrick dead modules?

BR,
pancio
 
 

Offline pancio (Topic starter)

  • Contributor
  • Posts: 14
  • Country: pl
    • SystemEmbedded.eu
Re: CH32V003 - first assembler program.. and loop speed.
« Reply #7 on: November 15, 2024, 04:10:23 am »
Hi,

Yesterday evening I unbricked my modules using 'minichlink' from @cnlohr's ch32v003fun. A very useful tool.

https://github.com/cnlohr/ch32v003fun/tree/master/minichlink

Additionally, I checked similar GPIO-switching code in the ch32v003fun environment, and the speed is 6 MHz. I tried to understand how the clock initialization is done, but it's too complicated for me. I would need to spend a lot of time porting it to pure assembler...

Best Regards,
pancio



 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4354
  • Country: us
Re: CH32V003 - first assembler program.. and loop speed.
« Reply #8 on: November 15, 2024, 11:07:53 am »
Quote
Tried to understand how CLK initialization was done but it's so complicated for me.


I was going to comment that I didn’t see any clock initialization code in your example, but I wasn’t sure it was actually needed on the 003.


Most new 32bit chips have complex clock systems that are a pain to get past when you’re first learning to use the chip, especially when compared to the 8bit chips they’re trying to replace.  Annoying!
 

Offline pancio (Topic starter)

  • Contributor
  • Posts: 14
  • Country: pl
    • SystemEmbedded.eu
Re: CH32V003 - first assembler program.. and loop speed.
« Reply #9 on: November 15, 2024, 05:38:53 pm »
That's right. Even such a small and "simple" microcontroller as 003 has a complicated way of managing clocks. Just look at the C code and you can get a headache. As a beginner in RV32 assembly, I probably won't be able to handle this task too quickly. By the way, it's very sad that WCH doesn't show any examples of assembly programming...
 

