Topic: CH32v003 gpio speed

Heindal (Topic starter)
« on: December 16, 2024, 04:09:37 pm »
So I am trying to get nanosecond-level timing on the CH32V003 for a NeoPixel. The minimum time step at 48 MHz is 20.8 ns, which is exactly one "nop" instruction. My first problem is that the maximum GPIO toggle rate I can get is 4.80 MHz, measured with an oscilloscope, with the toggle code inside setup(). Inside loop() it drops to 1.86 MHz. The second issue is that the low level on the GPIO lasts far longer than it should: the high state is 100 ns (80 ns for the direct register write + 20 ns for the nop), but the low state is irregular at 300-400 ns. I know the Adafruit library works on this chip, but the Adafruit library and UART on the CH32V003 cannot coexist (UART breaks the timing).

 GPIOD->BSHR = GPIO_BSHR_BS3;  // Set PD3 high
 Waitns(); // This code is inlined and only contains a single nop instruction
 GPIOD->BSHR = GPIO_BSHR_BR3;  // Set PD3 low
 Waitns(); // This code is inlined and only contains a single nop instruction

I know loop() also affects GPIO speed, but this problem also happens in setup() (Arduino, btw). I'm using a TSSOP-20 adapter board, and PD3 is connected to the NeoPixel.
Btw, I have succeeded in getting the NeoPixel to work by more or less manually making a block of code take, for example, 400 ns (asm loops and nops). But that only works at 48 MHz. As for using timers... I'm not sure whether UART or SPI will use them, because I will be using UART + SPI + NeoPixel all in the same sketch. The MCU will receive commands via UART, and the NeoPixel will indicate what the MCU is doing, for example reading flash.
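
The cycle arithmetic behind those numbers can be sketched in plain C. This is only a host-side sketch, not part of any WCH or Arduino API: `ns_to_cycles` and `cycle_time_ps` are hypothetical helper names, and it assumes every counted cycle really costs one core clock, which flash wait states and peripheral bus accesses can break.

```c
#include <stdint.h>

/* Hypothetical helper: convert a delay in nanoseconds to a whole number
 * of core clock cycles, rounding to the nearest cycle. */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t clock_hz)
{
    uint64_t scaled = (uint64_t)ns * clock_hz;           /* ns * Hz */
    return (uint32_t)((scaled + 500000000ull) / 1000000000ull);
}

/* Hypothetical helper: duration of one clock cycle in picoseconds,
 * truncated (48 MHz gives 20833 ps, i.e. about 20.8 ns). */
static uint32_t cycle_time_ps(uint32_t clock_hz)
{
    return (uint32_t)(1000000000000ull / clock_hz);
}
```

At 48 MHz a 400 ns delay works out to 19 cycles, while at 24 MHz the same 400 ns rounds to 10 cycles, which is why a hand-tuned block of nops only works at the clock it was tuned for.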
 

Siwastaja
« Reply #1 on: December 16, 2024, 04:27:18 pm »
This is a regular discussion; in a nutshell,

1) A write to GPIOD->BSHR takes more than one instruction. Check the compiler output to see exactly what is generated, but it usually involves loading the constant from memory and then storing it to another address.

2) Not all memory is equally fast. On higher-end MCUs, peripheral registers usually sit behind a synchronization barrier, so an access takes more clock cycles than a normal memory access.

The conclusion is always the same: try to use peripherals (like SPI) for timing-sensitive things if at all possible.
 

Heindal (Topic starter)
« Reply #2 on: December 16, 2024, 04:49:02 pm »
If I only needed to drive the NeoPixel then yeah, SPI could be used, but SPI is currently used by the external flash.
 

ataradov
« Reply #3 on: December 16, 2024, 05:00:57 pm »
You may also want to try relocating time-critical code into SRAM. That way you will not incur the flash wait-state penalty.
Alex
 

Siwastaja
« Reply #4 on: December 16, 2024, 05:18:31 pm »
Quote from: Heindal
    If I only needed to drive the NeoPixel then yeah, SPI could be used, but SPI is currently used by the external flash.

Really, get a microcontroller with two SPIs. They start at some tens of cents. Working with very limited, underperforming microcontrollers can be "fun" but prepare for a lot of learning experiences.
 

voltsandjolts
« Reply #5 on: December 16, 2024, 05:25:15 pm »
Quote from: Heindal
    If I only needed to drive the NeoPixel then yeah, SPI could be used, but SPI is currently used by the external flash.

If you really can't change to a device with two spi peripherals, then use the one hardware spi for neopixel and bit-bang the flash - it's fine with slow spi.
 

HwAoRrDk
« Reply #6 on: December 16, 2024, 07:13:35 pm »
I suspect one factor behind how fast GPIO can run is what the peripheral clock is running at. The peripheral clock for GPIOD runs from APB2, which sources its clock from AHB, a.k.a. HCLK. This might be subdivided from the main SYSCLK, and not running at the full speed. We don't know what OP's clock configuration is, but it seems they are using the Arduino framework, so if they're running the official WCH one, then we can possibly eliminate this as a problem because the WCH HAL code, when setting up for 48 MHz operation (regardless of whether HSI or HSE oscillator source), sets HPRE, the AHB prescaler, to divide-by-1 (i.e. undivided). So OP may already have HCLK as fast as it can go.

Quote from: Siwastaja
    1) A write to GPIOD->BSHR takes more than one instruction. Check the compiler output to see exactly what is generated, but it usually involves loading the constant from memory and then storing it to another address.

Assuming the GPIOD base address has already been loaded into a register, yes, it takes two instructions to set BSHR: one to load the literal value into a register, and a second to store that value to BSHR.

Code: [Select]
li a1,0x8
sw a1,16(a0)    # GPIOD base address already in a0

However, if the compiler is being sensible, then with a simple scenario like OP's benchmark where a pin is simply being toggled in a loop by assigning constant values to BSHR, the compiler will put the li loading instructions outside the loop, so in effect only a single instruction is needed to set the GPIO pin. I seem to recall there was a thread discussing this recently.
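
A rough sketch of that kind of loop is below. `pulse_pin` is a hypothetical helper, not something from the WCH headers; it takes the register address and both masks as parameters, so a compiler that inlines it with constant arguments is free to keep the two values in registers for the whole loop. Whether it actually does so has to be checked in the disassembly.

```c
#include <stdint.h>

/* Hypothetical helper: emit `pulses` high/low pulses by writing the
 * set mask and then the reset mask to a BSHR-style register. With the
 * masks held in registers, each write is a single store instruction. */
static inline void pulse_pin(volatile uint32_t *bshr, uint32_t set_mask,
                             uint32_t reset_mask, uint32_t pulses)
{
    while (pulses--) {
        *bshr = set_mask;    /* pin high */
        *bshr = reset_mask;  /* pin low  */
    }
}
```

On the target the call would presumably look like `pulse_pin(&GPIOD->BSHR, GPIO_BSHR_BS3, GPIO_BSHR_BR3, n)`. The loop branch still costs cycles on the low side of each pulse, so high and low times will not be equal.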

Quote from: ataradov
    You may also want to try relocating time-critical code into SRAM. That way you will not incur the flash wait-state penalty.

Yeah, running at 48 MHz necessitates 1 wait state for flash access.
 

ataradov
« Reply #7 on: December 16, 2024, 07:40:41 pm »
Not relying on the compiler and writing the critical parts by hand may also improve things. If you write the whole LED update loop by hand, you will have the best possible control over the timing.
Alex
 

HwAoRrDk
« Reply #8 on: December 16, 2024, 07:50:37 pm »
By the way, there is also a technique for driving NeoPixels (a.k.a. WS2812) involving timers and DMA. I forget the exact details right now, but it involves using PWM mode of a timer to output the bit stream by toggling the duty cycle appropriately for the 1s and 0s of each bit. DMA is used to load a pre-computed buffer of compare values representing each bit into the compare value register of the timer. The timer triggers a DMA transfer of the next value every time the timer counter cycles (i.e. at 800 kHz). For the NeoPixel reset/latch period, either handle the DMA 'transfer complete' IRQ and disable timer/DMA and wait the reset period by some other means, or I suppose you could tag on a bunch of 'null' pixels on to the end of the buffer to simulate the reset period.
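
Pre-computing the compare buffer for one pixel could look roughly like this. It is a sketch only: `ws_fill_compare_buffer` is a hypothetical name, and the compare values are passed in because they depend on the timer clock. For example, with a 48 MHz timer clock and a 60-tick period (800 kHz), something like 19 ticks for a '0' and 38 for a '1' would be a plausible starting point, but those numbers are assumptions, not tested values. WS2812 data goes out in G-R-B order, most significant bit first.

```c
#include <stddef.h>
#include <stdint.h>

#define WS_BITS_PER_PIXEL 24u

/* Hypothetical helper: expand one 24-bit GRB colour into 24 timer
 * compare values, MSB first. Returns the number of entries written. */
static size_t ws_fill_compare_buffer(uint32_t grb, uint16_t cmp0,
                                     uint16_t cmp1, uint16_t *buf)
{
    for (size_t i = 0; i < WS_BITS_PER_PIXEL; i++) {
        uint32_t bit = (grb >> (WS_BITS_PER_PIXEL - 1u - i)) & 1u;
        buf[i] = bit ? cmp1 : cmp0;   /* long high time for 1, short for 0 */
    }
    return WS_BITS_PER_PIXEL;
}
```

The DMA channel would then be pointed at `buf`, with the timer's update event as the trigger; that part is hardware setup and is not shown here.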

However, this technique isn't really usable on something with limited memory like the CH32V003 (2KB) if you want to dynamically change the pixel patterns, because of how much space the array of compare values for each bit takes up. Timer compare values are 16-bit, so each bit of a 24-bit pixel colour value requires 2 bytes. Assuming it'd even be possible to utilise the entire 2048 bytes of RAM, you'd only be able to handle (2048/2)/24 = 42 pixels. You can use flash for DMA source, but that means pixel patterns would have to be fixed. So, basically, this technique is really only useful on MCUs with larger amounts of memory (e.g. with 16KB you can handle 341 pixels).

(Hmm, side thought: would one actually need to write the entire 16-bit value to the CHnCVR register? If the compare values for '1' and '0' work out to <256, then store as a single byte and only make 8-bit transfers with DMA? Would that work?)
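
The capacity arithmetic can be written out as a tiny helper (`ws_max_pixels` is a hypothetical name). It just divides the available RAM by the bytes needed per pixel, so it also shows what single-byte compare values would buy if 8-bit DMA transfers turn out to work, which is not confirmed here.

```c
#include <stdint.h>

/* Hypothetical helper: how many 24-bit pixels fit in `ram_bytes` when
 * every pixel bit needs one compare value of `bytes_per_value` bytes. */
static uint32_t ws_max_pixels(uint32_t ram_bytes, uint32_t bytes_per_value)
{
    if (bytes_per_value == 0u) {
        return 0u;                       /* avoid dividing by zero */
    }
    return ram_bytes / (bytes_per_value * 24u);
}
```

With 16-bit values this gives 42 pixels for 2048 bytes and 341 for 16 KB, matching the figures above; with 8-bit values 2048 bytes would hold 85 pixels.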
« Last Edit: December 16, 2024, 08:07:06 pm by HwAoRrDk »
 

Heindal (Topic starter)
« Reply #9 on: December 16, 2024, 08:09:56 pm »
Yeah, I'm running the official WCH one, with the latest source, so I have the clock config menu, which is set to 48 MHz. I know that on ESP32-series MCUs you use IRAM_ATTR to put code in IRAM, but what is the attribute for WCH?

I don't need to drive a lot of NeoPixels, just one; it is only used as a status indicator. Also, the CH32V003 does have about 1.5k to 1.7k of free RAM.
« Last Edit: December 16, 2024, 08:16:12 pm by Heindal »
 

HwAoRrDk
« Reply #10 on: December 16, 2024, 09:04:21 pm »
Quote from: Heindal
    I know that on ESP32-series MCUs you use IRAM_ATTR to put code in IRAM, but what is the attribute for WCH?

ESP-IDF's IRAM_ATTR is simply a convenience macro for a GCC function attribute:

Code: [Select]
#define IRAM_ATTR __attribute__((section(".iram1")))

Basically, it tells the compiler that "this function should reside in section <x>". Then you have an entry in the linker script to define that this section exists within RAM, and its contents should be loaded there from flash at startup. The startup code needs to specifically copy this section from flash to RAM.

You would use the same approach for pretty much any microcontroller that you're compiling/linking code for with GCC.

It seems in latest versions of WCH's HAL code, they have catered to this by defining a ".highcode" section in linker script and startup code. You'd use it by just adding __attribute__((section(".highcode"))) as an attribute to a function. However, this is not (currently) present in the Arduino core's linker script and startup code.

Instead you might be able to just use a ".data" sub-section. I don't really see any difference in how the two sections - .highcode and .data - are defined (in terms of alignment, etc.) in the linker script.

Code: [Select]
__attribute__((section(".data.my_function")))
void my_function(int blah, int foo) {
    // etc...
}

Everything in .data, and .data.*, is already copied from flash to RAM at reset by the startup code.
 

Heindal (Topic starter)
« Reply #11 on: December 16, 2024, 09:38:58 pm »
I tried putting the code in the .data section like you said. There's no difference. So I'm guessing they haven't added the code to support it yet in the 1.0.4 source.
 

ataradov
« Reply #12 on: December 16, 2024, 10:42:17 pm »
Find the linker script and see what sections are defined in your case. Or just add what you need.

It is not guaranteed to improve things, but you should verify that the code is indeed placed in SRAM by looking at the map file or the disassembly.
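
Besides the map file, a quick runtime sanity check is possible. The sketch below assumes the CH32V003's 2 KB of SRAM starts at 0x20000000; `addr_in_sram` and the two macros are hypothetical names, not from the WCH headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define SRAM_BASE_ADDR  0x20000000u  /* assumed SRAM start address */
#define SRAM_SIZE_BYTES 2048u        /* assumed 2 KB of SRAM */

/* Hypothetical helper: true if `addr` lies inside the SRAM window. */
static bool addr_in_sram(uintptr_t addr)
{
    return addr >= SRAM_BASE_ADDR &&
           (addr - SRAM_BASE_ADDR) < SRAM_SIZE_BYTES;
}
```

On the target, something like `addr_in_sram((uintptr_t)my_function)` would tell you where the function ended up. Casting a function pointer to an integer is implementation-defined, but it works with GCC on this kind of target.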
Alex
 

HwAoRrDk
« Reply #13 on: December 17, 2024, 01:33:22 am »
Quote from: Heindal
    I tried putting the code in the .data section like you said. There's no difference. So I'm guessing they haven't added the code to support it yet in the 1.0.4 source.

The point of putting it in the .data section was that you shouldn't need to do anything extra for it to work - linker script and startup code should already handle things as-is.

I tried it (albeit, not with Arduino core - but my linker script and startup code are identical in how they handle .data), and it does work.

In fact, it intrigued me whether there is actually any measurable difference in execution speed between running code from flash and from RAM. So I put together a small benchmark program:

Code: [Select]
/* includes, init functions, etc snipped for brevity */

static void systick_init(void) {
    SysTick->CNT = 0;
    SysTick->SR = 0;
    SysTick->CMP = UINT32_MAX;
    SysTick->CTLR = STK_STRE | STK_STCLK;
}

static inline void systick_start(void) {
    SysTick->CNT = 0;
    SysTick->SR = 0;
    SysTick->CTLR |= STK_STE;
}

static inline uint32_t systick_stop(void) {
    SysTick->CTLR &= ~STK_STE;
    return SysTick->CNT;
}

#define TEST_ITERATIONS 200000
#define TEST_FUNC_BODY(i) \
    do { \
        volatile uint32_t count = (i); \
        while(count-- > 0); \
        while(count++ < (i)); \
        while(count-- > 0); \
    } while(0)

__attribute__((noinline)) static void test_func_flash(const uint32_t iters) {
    TEST_FUNC_BODY(iters);
}

__attribute__((section(".data.test_func_ram"), noinline)) static void test_func_ram(const uint32_t iters) {
    TEST_FUNC_BODY(iters);
}

int main(void) {
    uint32_t ticks_flash, ticks_ram;

    clock_init();
    gpio_init();
    systick_init();
    uart_init(HSI_VALUE, UART_BAUD_RATE);

    printf("----------------------------------------\n");
    printf("RAM EXEC TEST\n");
    printf("----------------------------------------\n");

    printf("test_func_flash() address: %p\n", test_func_flash);
    printf("test_func_ram() address: %p\n", test_func_ram);

    systick_start();
    test_func_flash(TEST_ITERATIONS);
    ticks_flash = systick_stop();

    printf("test_func_flash() execution: %lu ticks\n", ticks_flash);

    systick_start();
    test_func_ram(TEST_ITERATIONS);
    ticks_ram = systick_stop();

    printf("test_func_ram() execution: %lu ticks\n", ticks_ram);

    while(true);
}

The linker map shows:

Code: [Select]
.text.test_func_flash
                0x00000828       0x28 obj\Release\main.o

 .data.test_func_ram
                0x20000000       0x28 obj\Release\main.o

And the output I get is:

Code: [Select]
----------------------------------------
RAM EXEC TEST
----------------------------------------
test_func_flash() address: 0x828
test_func_ram() address: 0x20000000
test_func_flash() execution: 1600035 ticks
test_func_ram() execution: 1600038 ticks

This is with the CH32V003 running at 24 MHz, with SYSCLK = HSI, HCLK = SYSCLK/1, and zero flash wait states. SysTick runs from HCLK/1, so should also be counting at 24 MHz.

As you can see, no meaningful difference. In fact, execution time from RAM actually consistently seems to be 3-4 ticks slower than executing from flash for some reason. :-//

Edit: Oh, I see why there's a difference. The code isn't quite identical between the two test cases. The RAM function is called slightly differently: after the timer is started, there's an extra auipc instruction and jalr is used to call the function. That's probably where the extra few ticks comes from.

Code: [Select]
sw zero,8(s0)                                      sw zero,8(s0)
sw zero,4(s0)                                      sw zero,4(s0)
lw a5,0(s0)                                        lw a5,0(s0)
lui a0,0x31                                         lui a0,0x31
addi a0,a0,-704 # 30d40 <_data_lma+0x301a8>      addi a0,a0,-704 # 30d40 <_data_lma+0x301a8>
ori a5,a5,1                                         ori a5,a5,1
sw a5,0(s0)                                        sw a5,0(s0)
jal 828 <test_func_flash>                           auipc ra,0x1ffff
                                                    jalr 1690(ra) # 20000000 <test_func_ram>

I shall try with 48 MHz HSI and 1 wait state and see what the results are for that - but I need to figure out the clock configuration code for that first. :P
« Last Edit: December 17, 2024, 01:46:15 am by HwAoRrDk »
 

SiliconWizard
« Reply #14 on: December 17, 2024, 01:58:11 am »
Quote from: Heindal
    If I only needed to drive the NeoPixel then yeah, SPI could be used, but SPI is currently used by the external flash.

I would suggest considering using a timer with PWM output or another appropriate output compare mode with DMA.
 

Heindal (Topic starter)
« Reply #15 on: December 17, 2024, 03:02:24 am »
Maybe in my case putting it in RAM would not improve anything, since a nop would still take 20 ns and the memory access might be bottlenecked by something else. Has anyone gotten GPIO to work at 30/50 MHz in output mode? I don't know; the datasheet's Port Configuration Register (GPIOx_CFGLR) description says the max is 30 MHz, but the WCH chip diagram shows the HB bus Fmax as 50 MHz.

Since I'm only getting 4.8 MHz, I assume that by default it's set to 10 MHz, since manually toggling would halve the frequency.
« Last Edit: December 17, 2024, 03:04:53 am by Heindal »
 

HwAoRrDk
« Reply #16 on: December 17, 2024, 04:35:55 am »
Results of my experiment for 48 MHz HSI (i.e. SYSCLK = PLL, PLL fed by HSI) and 1 flash wait state:

Code: [Select]
----------------------------------------
RAM EXEC TEST
----------------------------------------
test_func_flash() address: 0x828
test_func_ram() address: 0x20000000
test_func_flash() execution: 2000039 ticks
test_func_ram() execution: 1600041 ticks

So, conclusion is that running code from RAM is only faster when running with 1 flash wait state necessitated by clock being greater than 24 MHz.
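
To put those tick counts in more familiar units, here is a small conversion sketch. `ticks_to_us` and `slowdown_permille` are hypothetical helper names, and the conversion assumes SysTick counts at the full 48 MHz HCLK in this run, as it did at 24 MHz in the earlier one.

```c
#include <stdint.h>

/* Hypothetical helper: convert a tick count to whole microseconds
 * (truncated) for a counter running at `tick_hz`. */
static uint32_t ticks_to_us(uint32_t ticks, uint32_t tick_hz)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / tick_hz);
}

/* Hypothetical helper: how much longer `slow_ticks` is than
 * `fast_ticks`, in tenths of a percent (truncated). */
static uint32_t slowdown_permille(uint32_t slow_ticks, uint32_t fast_ticks)
{
    if (fast_ticks == 0u || slow_ticks <= fast_ticks) {
        return 0u;
    }
    return (uint32_t)((((uint64_t)slow_ticks - fast_ticks) * 1000u) / fast_ticks);
}
```

Under that assumption the flash run takes about 41.7 ms and the RAM run about 33.3 ms, so executing from flash with 1 wait state costs roughly 25% more time for this particular loop.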

Quote from: Heindal
    Has anyone gotten GPIO to work at 30/50 MHz in output mode? I don't know; the datasheet's Port Configuration Register (GPIOx_CFGLR) description says the max is 30 MHz, but the WCH chip diagram shows the HB bus Fmax as 50 MHz.

Those GPIO output mode settings of 2/10/30 MHz are setting the drive strength of the pin's output. Different drive strengths will affect the slew rate of the voltage output by the pin - i.e. how fast the transitions between high/low are. It doesn't dictate the actual frequency with which the pin can be toggled. They describe the settings in terms of MHz because those are the maximum signal frequencies which those drive strengths are suitable for. Whether you can actually generate an output signal of such frequency from the GPIO is another matter. Typically only special-purpose peripherals like SPI, PWM, etc. will be outputting signals of such high frequencies, so you only really need a high drive strength in those scenarios. Otherwise, for EMI reasons you should typically always select the lowest drive strength.

Not sure which diagram you're referring to that mentions 50 MHz. ???
 

NorthGuy
« Reply #17 on: December 17, 2024, 08:20:40 pm »
WS2812? Use UART. You can encode 3 bit values per a transmitted byte.
 

mikeselectricstuff
« Reply #18 on: December 17, 2024, 08:48:52 pm »
Quote from: NorthGuy
    WS2812? Use UART. You can encode 3 bit values per a transmitted byte.

Or the PWM peripheral.
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

brucehoult
« Reply #19 on: December 17, 2024, 09:24:04 pm »
Quote from: NorthGuy
    WS2812? Use UART. You can encode 3 bit values per a transmitted byte.

How do you hide the start and stop bits? You must have to use something like 2.5 Mbps, which probably very few UARTs can do -- that's 22x the highest commonly used speed! (115200)

Think you'd also need an inverter.
 

NorthGuy
« Reply #20 on: December 17, 2024, 10:05:02 pm »
Quote from: brucehoult
    Quote from: NorthGuy
        WS2812? Use UART. You can encode 3 bit values per a transmitted byte.

    How do you hide the start and stop bits? You must have to use something like 2.5 Mbps, which probably very few UARTs can do -- that's 22x the highest commonly used speed! (115200)

    Think you'd also need an inverter.

Yes, you need to invert the output, which I suppose any modern UART module can do.

Before inversion, to encode a 1 you need to transmit the bit sequence 001; to encode a 0, use 011.

The leading 0 of the first group coincides with the UART start bit, followed by 2 data bits; then come 3 data bits for the second datum and 3 for the third. The stop bit will give you a pause.

Something like this

Code: [Select]
#define BIT1_0 0x03
#define BIT1_1 0x02
#define BIT2_0 0x18
#define BIT2_1 0x10
#define BIT3_0 0xc0
#define BIT3_1 0x80

combine these as needed with or and send. For example

Code: [Select]
BIT1_0 | BIT2_1 | BIT3_1
will send 0 - 1 - 1 to WS2812

or

Code: [Select]
#define BASE 0x92
#define BIT1 0x01
#define BIT2 0x08
#define BIT3 0x40

As to the speed, you need around 3 Mbaud. For 24 MHz clock, it is CLK/8. Shouldn't be a problem for an UART module. You don't need very precise baud rate as WS2812 has a huge margin.
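
Packing a whole pixel with the first table might look like the sketch below. The function names are hypothetical, and it assumes an 8N1 UART sending LSB first at roughly 3 Mbaud, with the TX line inverted somewhere before it reaches the LED. One 24-bit GRB value becomes 8 UART bytes, most significant WS2812 bit first.

```c
#include <stdint.h>

/* Hypothetical helper: pack three WS2812 bits (each 0 or 1) into one
 * UART data byte using the BITn_0 / BITn_1 values from the table. */
static uint8_t ws_uart_encode3(unsigned b1, unsigned b2, unsigned b3)
{
    return (uint8_t)((b1 ? 0x02u : 0x03u) |
                     (b2 ? 0x10u : 0x18u) |
                     (b3 ? 0x80u : 0xC0u));
}

/* Hypothetical helper: encode one 24-bit GRB colour into 8 UART bytes. */
static void ws_uart_encode_pixel(uint32_t grb, uint8_t out[8])
{
    for (unsigned i = 0; i < 8u; i++) {
        unsigned shift = 21u - 3u * i;           /* lowest bit of this group */
        unsigned group = (grb >> shift) & 0x7u;  /* three bits, MSB first */
        out[i] = ws_uart_encode3((group >> 2) & 1u,
                                 (group >> 1) & 1u,
                                 group & 1u);
    }
}
```

Three 1 bits give 0x92, which is the BASE value of the second table, so with that table OR-ing in a BITn constant turns the corresponding WS2812 bit into a 0.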

<edit>corrected errors in numbers
« Last Edit: December 17, 2024, 10:23:15 pm by NorthGuy »
 

mikeselectricstuff
« Reply #21 on: December 17, 2024, 10:22:19 pm »
Quote from: brucehoult
    Quote from: NorthGuy
        WS2812? Use UART. You can encode 3 bit values per a transmitted byte.

    How do you hide the start and stop bits? You must have to use something like 2.5 Mbps, which probably very few UARTs can do -- that's 22x the highest commonly used speed! (115200)

    Think you'd also need an inverter.

Most UARTS can do at least clk/16, some up to clk/4.
 

mikeselectricstuff
« Reply #22 on: December 17, 2024, 10:24:35 pm »
Quote from: NorthGuy
    As to the speed, you need around 3 Mbaud. For 24 MHz clock, it is CLK/8. Shouldn't be a problem for an UART module. You don't need very precise baud rate as WS2812 has a huge margin.

Something else that can be useful to know: although bit-to-bit timing is fairly critical, many (all?) WS2812-style chips will tolerate inter-byte gaps of up to a couple of hundred µs before treating the gap as a frame reset.
 

brucehoult
« Reply #23 on: December 17, 2024, 10:39:37 pm »
Quote from: mikeselectricstuff
    Something else that can be useful to know: although bit-to-bit timing is fairly critical, many (all?) WS2812-style chips will tolerate inter-byte gaps of up to a couple of hundred µs before treating the gap as a frame reset.

With 3 bits per UART byte, you need to send 8 bytes -- a full RGB -- before WS2812 bytes line up with UART bytes again.

All this shifting and masking and extracting 3 bit fields that span bytes and reassembling doesn't look like much less work than just sampling a bit and then toggling a GPIO twice with a few NOPs in between (which an 8 MHz AVR can do no problem).

The only advantage would be if your CPU is significantly faster than needed then it can preload a buffer (either in the UART itself, or for DMA) and then get on with doing something else instead of waiting around.
 

NorthGuy
« Reply #24 on: December 17, 2024, 10:40:10 pm »
Quote from: mikeselectricstuff
    Quote from: brucehoult
        Quote from: NorthGuy
            WS2812? Use UART. You can encode 3 bit values per a transmitted byte.

        How do you hide the start and stop bits? You must have to use something like 2.5 Mbps, which probably very few UARTs can do -- that's 22x the highest commonly used speed! (115200)

        Think you'd also need an inverter.

    Most UARTS can do at least clk/16, some up to clk/4.

I have just looked at the datasheet  - it does CLK/16, and it supports rates up to 3 MBaud, which would imply 48 MHz clock.

However, unless I missed it, the chip doesn't seem to offer control of output polarity :(
 

