Author Topic: CH32v003 gpio speed  (Read 3610 times)


Offline Heindal (Topic starter)

  • Contributor
  • Posts: 16
  • Country: us
CH32v003 gpio speed
« on: December 16, 2024, 04:09:37 pm »
So I am trying to get nanosecond timing on the CH32V003 for a NeoPixel. The minimum time it can do at 48 MHz is 20.8 ns, which is exactly one "nop" instruction. My first problem is that the maximum speed I can get out of GPIO is 4.80 MHz, measured with an oscilloscope with the toggle code inside setup(). Inside loop() it drops to 1.86 MHz. The second issue is that the low level on the GPIO stays for far longer than it should: the high state is 100 ns (80 ns for the direct memory access + 20 ns for the nop), but the low state is irregular at 300-400 ns. I know the Adafruit library works with it, but the Adafruit library and UART on the CH32V003 are unable to coexist (UART breaks the timing).

 GPIOD->BSHR = GPIO_BSHR_BS3;  // Set PD3 high
 Waitns(); // This code is inlined and only contains a single nop instruction
 GPIOD->BSHR = GPIO_BSHR_BR3;  // Set PD3 low
 Waitns(); // This code is inlined and only contains a single nop instruction

I know loop() also affects GPIO speed, but this problem also happens in setup() (Arduino, btw). I'm using a TSSOP-20 adapter board, and PD3 is connected to the NeoPixel.
Btw, I have succeeded in getting the NeoPixel to work by more or less manually making a block of code take, for example, 400 ns (asm loops and nops), but that only works at 48 MHz. As for using timers... I'm not sure if UART or SPI will use them, because I will be using UART + SPI + NeoPixel all in the same sketch. The MCU will receive commands via UART, then the NeoPixel will indicate what the MCU is doing, for example reading flash.
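
For reference, the cycle arithmetic behind these numbers can be sketched in plain C. This is only an illustration: `cycles_for_ns` is a made-up helper name, not something from the Arduino core or the WCH HAL, and it assumes a delay is built from single-cycle steps.

```c
#include <stdint.h>

/* Hypothetical helper: number of whole CPU cycles needed to cover at
 * least `ns` nanoseconds at a core clock of `f_hz` (rounds up). */
static uint32_t cycles_for_ns(uint32_t f_hz, uint32_t ns)
{
    uint64_t product = (uint64_t)f_hz * ns;   /* cycles scaled by 1e9 */
    return (uint32_t)((product + 999999999ull) / 1000000000ull);
}
```

At 48 MHz one cycle is about 20.8 ns, so a 400 ns block works out to 20 cycles (19.2 rounded up). Computing the count from the clock frequency rather than hard-coding it is one way to avoid a delay that only works at 48 MHz.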
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 9495
  • Country: fi
Re: CH32v003 gpio speed
« Reply #1 on: December 16, 2024, 04:27:18 pm »
This is a regular discussion; in a nutshell,

1) write to GPIOD->BSHR takes more than one instruction, check the compiler output to see exactly what; but it usually involves loading the constant from memory and writing it to another address

2) not all memory is equally fast, peripheral registers are usually behind a synchronization barrier on higher-end MCUs so access takes more clock cycles than a normal memory access.

Conclusion is always, try to use peripherals (like SPI) to do timing-sensitive things if at all possible.
 

Offline Heindal (Topic starter)

  • Contributor
  • Posts: 16
  • Country: us
Re: CH32v003 gpio speed
« Reply #2 on: December 16, 2024, 04:49:02 pm »
if i only needed to use neopixel then yeh spi could be used but spi is currently used by external flash.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11947
  • Country: us
    • Personal site
Re: CH32v003 gpio speed
« Reply #3 on: December 16, 2024, 05:00:57 pm »
You may also want to try relocating time-critical code into the SRAM. This way you will not be incurring flash wait state penalty.
Alex
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 9495
  • Country: fi
Re: CH32v003 gpio speed
« Reply #4 on: December 16, 2024, 05:18:31 pm »
if i only needed to use neopixel then yeh spi could be used but spi is currently used by external flash.

Really, get a microcontroller with two SPIs. They start at some tens of cents. Working with very limited, underperforming microcontrollers can be "fun" but prepare for a lot of learning experiences.
 
The following users thanked this post: thm_w

Online voltsandjolts

  • Supporter
  • ****
  • Posts: 2603
  • Country: gb
Re: CH32v003 gpio speed
« Reply #5 on: December 16, 2024, 05:25:15 pm »
if i only needed to use neopixel then yeh spi could be used but spi is currently used by external flash.

If you really can't change to a device with two spi peripherals, then use the one hardware spi for neopixel and bit-bang the flash - it's fine with slow spi.
 

Online HwAoRrDk

  • Super Contributor
  • ***
  • Posts: 1619
  • Country: gb
Re: CH32v003 gpio speed
« Reply #6 on: December 16, 2024, 07:13:35 pm »
I suspect one factor behind how fast GPIO can run is what the peripheral clock is running at. The peripheral clock for GPIOD runs from APB2, which sources its clock from AHB, a.k.a. HCLK. This might be subdivided from the main SYSCLK, and not running at the full speed. We don't know what OP's clock configuration is, but it seems they are using the Arduino framework, so if they're running the official WCH one, then we can possibly eliminate this as a problem because the WCH HAL code, when setting up for 48 MHz operation (regardless of whether HSI or HSE oscillator source), sets HPRE, the AHB prescaler, to divide-by-1 (i.e. undivided). So OP may already have HCLK as fast as it can go.

1) write to GPIOD->BSHR takes more than one instruction, check the compiler output to see exactly what; but it usually involves loading the constant from memory and writing it to another address

Assuming the GPIOD base address has already been loaded into a register, yes, it takes two instructions to set BSHR: one to load the literal value into a register, and a second to write that value to the register.

Code: [Select]
li a1,0x8
sw a1,16(a0)    # GPIOD base addr already in a0

However, if the compiler is being sensible, then with a simple scenario like OP's benchmark where a pin is simply being toggled in a loop by assigning constant values to BSHR, the compiler will put the li loading instructions outside the loop, so in effect only a single instruction is needed to set the GPIO pin. I seem to recall there was a thread discussing this recently.
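
A minimal sketch of such a benchmark loop, written so that the two constants live in locals and each write to BSHR can become a single store. The function name `toggle_pin` is hypothetical, and the register is passed in as a pointer so the code can also be exercised on a host; on the target one would pass the address of GPIOD->BSHR. Whether the compiler really keeps the constants in registers depends on the optimization level, so check the disassembly.

```c
#include <stdint.h>

/* Toggle one pin `count` times through a BSHR-style register.
 * Bits 0..15 of BSHR set a pin, bits 16..31 reset it. */
static void toggle_pin(volatile uint32_t *bshr, unsigned pin, unsigned count)
{
    const uint32_t set_mask   = 1u << pin;          /* 0x8 for pin 3     */
    const uint32_t reset_mask = 1u << (pin + 16u);  /* 0x80000 for pin 3 */

    while (count--) {
        *bshr = set_mask;    /* pin high */
        *bshr = reset_mask;  /* pin low  */
    }
}
```

Note that the loop counter and branch still cost cycles on every iteration, so the low time will be longer than the high time unless the loop is unrolled or padded.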

You may also want to try relocating time-critical code into the SRAM. This way you will not be incurring flash wait state penalty.

Yeah, running at 48 MHz necessitates 1 wait state for flash access.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11947
  • Country: us
    • Personal site
Re: CH32v003 gpio speed
« Reply #7 on: December 16, 2024, 07:40:41 pm »
And not relying on the compiler and writing critical things by hand may also improve things. If you write the whole LED update loop by hand, you will have the best possible control over the timing.
Alex
 

Online HwAoRrDk

  • Super Contributor
  • ***
  • Posts: 1619
  • Country: gb
Re: CH32v003 gpio speed
« Reply #8 on: December 16, 2024, 07:50:37 pm »
By the way, there is also a technique for driving NeoPixels (a.k.a. WS2812) involving timers and DMA. I forget the exact details right now, but it involves using PWM mode of a timer to output the bit stream by toggling the duty cycle appropriately for the 1s and 0s of each bit. DMA is used to load a pre-computed buffer of compare values representing each bit into the compare value register of the timer. The timer triggers a DMA transfer of the next value every time the timer counter cycles (i.e. at 800 kHz). For the NeoPixel reset/latch period, either handle the DMA 'transfer complete' IRQ and disable timer/DMA and wait the reset period by some other means, or I suppose you could tag on a bunch of 'null' pixels on to the end of the buffer to simulate the reset period.

However, this technique isn't really usable on something with limited memory like the CH32V003 (2KB) if you want to dynamically change the pixel patterns, because of how much space the array of compare values for each bit takes up. Timer compare values are 16-bit, so each bit of a 24-bit pixel colour value requires 2 bytes. Assuming it'd even be possible to utilise the entire 2048 bytes of RAM, you'd only be able to handle (2048/2)/24 = 42 pixels. You can use flash for DMA source, but that means pixel patterns would have to be fixed. So, basically, this technique is really only useful on MCUs with larger amounts of memory (e.g. with 16KB you can handle 341 pixels).

(Hmm, side thought: would one actually need to write the entire 16-bit value to the CHnCVR register? If the compare values for '1' and '0' work out to <256, then store as a single byte and only make 8-bit transfers with DMA? Would that work?)
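
To make the memory cost concrete, here is a rough sketch of how the compare-value buffer could be filled and sized. The compare values are assumptions (a timer counting at 48 MHz with a 60-tick period for 800 kHz, and duty cycles picked for illustration); the function names are hypothetical and the real values should come from the LED datasheet.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values: 60-tick period at 48 MHz, short/long high times. */
enum { WS_CMP_0 = 19, WS_CMP_1 = 38, WS_BITS_PER_PIXEL = 24 };

/* Expand one pixel into 24 compare values, green first, MSB first. */
static void ws_fill_compare(uint16_t out[WS_BITS_PER_PIXEL],
                            uint8_t g, uint8_t r, uint8_t b)
{
    uint32_t grb = ((uint32_t)g << 16) | ((uint32_t)r << 8) | b;
    for (int i = 0; i < WS_BITS_PER_PIXEL; i++) {
        out[i] = (grb & (1ul << (23 - i))) ? WS_CMP_1 : WS_CMP_0;
    }
}

/* How many pixels fit in `ram_bytes` if each bit costs `bytes_per_value`. */
static size_t ws_max_pixels(size_t ram_bytes, size_t bytes_per_value)
{
    return ram_bytes / bytes_per_value / WS_BITS_PER_PIXEL;
}
```

With 16-bit compare values this gives 42 pixels for 2048 bytes and 341 for 16 KB, matching the figures above. A single status pixel needs only 48 bytes, plus whatever is used for the reset period.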
« Last Edit: December 16, 2024, 08:07:06 pm by HwAoRrDk »
 

Offline Heindal (Topic starter)

  • Contributor
  • Posts: 16
  • Country: us
Re: CH32v003 gpio speed
« Reply #9 on: December 16, 2024, 08:09:56 pm »
Yeah, I'm running the official WCH one, with the latest source, so I have the clock config menu, which is set to 48 MHz.  I know with esp32 series mcus you use IRAM_ATTR to put the code in iram but what is the attr for wch.

I don't need to use it for a lot of NeoPixels, just one. All the NeoPixel is used for is a status indicator. Also, the CH32V003 does have about 1.5k to 1.7k of free RAM.
« Last Edit: December 16, 2024, 08:16:12 pm by Heindal »
 

Online HwAoRrDk

  • Super Contributor
  • ***
  • Posts: 1619
  • Country: gb
Re: CH32v003 gpio speed
« Reply #10 on: December 16, 2024, 09:04:21 pm »
I know with esp32 series mcus you use IRAM_ATTR to put the code in iram but what is the attr for wch.

ESP-IDF's IRAM_ATTR is simply a convenience macro for a GCC function attribute:

Code: [Select]
#define IRAM_ATTR __attribute__((section(".iram1")))

Basically, it tells the compiler that "this function should reside in section <x>". Then you have an entry in the linker script to define that this section exists within RAM, and its contents should be loaded there from flash at startup. The startup code needs to specifically copy this section from flash to RAM.

You would use the same approach for pretty much any microcontroller that you're compiling/linking code for with GCC.

It seems in latest versions of WCH's HAL code, they have catered to this by defining a ".highcode" section in linker script and startup code. You'd use it by just adding __attribute__((section(".highcode"))) as an attribute to a function. However, this is not (currently) present in the Arduino core's linker script and startup code.

Instead you might be able to just use a ".data" sub-section. I don't really see any difference in how the two sections - .highcode and .data - are defined (in terms of alignment, etc.) in the linker script.

Code: [Select]
__attribute__((section(".data.my_function")))
void my_function(int blah, int foo) {
    // etc...
}

Everything in .data, and .data.*, is already copied from flash to RAM at reset by the startup code.
 

Offline Heindal (Topic starter)

  • Contributor
  • Posts: 16
  • Country: us
Re: CH32v003 gpio speed
« Reply #11 on: December 16, 2024, 09:38:58 pm »
Tried putting the code in the .data section like you said. There's no difference, so I'm guessing they haven't added the code to support it yet in the 1.0.4 source.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11947
  • Country: us
    • Personal site
Re: CH32v003 gpio speed
« Reply #12 on: December 16, 2024, 10:42:17 pm »
Find the linker script and see what sections are defined in your case. Or just add what you need.

It is not guaranteed to improve things, but you should verify that the code is indeed placed in the SRAM by looking at the map file or disassembly.
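
Besides the map file, a quick run-time sanity check is to print or test the function's address. The sketch below assumes the SRAM starts at 0x20000000 and is 2 KB long; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define WS_SRAM_START 0x20000000u  /* assumed SRAM base address */
#define WS_SRAM_SIZE  2048u        /* assumed SRAM size in bytes */

/* True if `addr` lies inside the assumed SRAM window. */
static bool addr_in_sram(uint32_t addr)
{
    return addr >= WS_SRAM_START && addr < WS_SRAM_START + WS_SRAM_SIZE;
}
```

On the target this could be called as `addr_in_sram((uint32_t)(uintptr_t)&my_function)`; a result of false means the function is still executing from flash.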
Alex
 

Online HwAoRrDk

  • Super Contributor
  • ***
  • Posts: 1619
  • Country: gb
Re: CH32v003 gpio speed
« Reply #13 on: December 17, 2024, 01:33:22 am »
Tried putting the code in the .data section like you said. There's no difference, so I'm guessing they haven't added the code to support it yet in the 1.0.4 source.

The point of putting it in the .data section was that you shouldn't need to do anything extra for it to work - linker script and startup code should already handle things as-is.

I tried it (albeit, not with Arduino core - but my linker script and startup code are identical in how they handle .data), and it does work.

In fact, it intrigued me whether there is actually any measurable difference in execution speed between running code from flash and from RAM. So I put together a small benchmark program:

Code: [Select]
/* includes, init functions, etc snipped for brevity */

static void systick_init(void) {
    SysTick->CNT = 0;
    SysTick->SR = 0;
    SysTick->CMP = UINT32_MAX;
    SysTick->CTLR = STK_STRE | STK_STCLK;
}

static inline void systick_start(void) {
    SysTick->CNT = 0;
    SysTick->SR = 0;
    SysTick->CTLR |= STK_STE;
}

static inline uint32_t systick_stop(void) {
    SysTick->CTLR &= ~STK_STE;
    return SysTick->CNT;
}

#define TEST_ITERATIONS 200000
#define TEST_FUNC_BODY(i) \
    do { \
        volatile uint32_t count = (i); \
        while(count-- > 0); \
        while(count++ < (i)); \
        while(count-- > 0); \
    } while(0)

__attribute__((noinline)) static void test_func_flash(const uint32_t iters) {
    TEST_FUNC_BODY(iters);
}

__attribute__((section(".data.test_func_ram"), noinline)) static void test_func_ram(const uint32_t iters) {
    TEST_FUNC_BODY(iters);
}

int main(void) {
    uint32_t ticks_flash, ticks_ram;

    clock_init();
    gpio_init();
    systick_init();
    uart_init(HSI_VALUE, UART_BAUD_RATE);

    printf("----------------------------------------\n");
    printf("RAM EXEC TEST\n");
    printf("----------------------------------------\n");

    printf("test_func_flash() address: %p\n", test_func_flash);
    printf("test_func_ram() address: %p\n", test_func_ram);

    systick_start();
    test_func_flash(TEST_ITERATIONS);
    ticks_flash = systick_stop();

    printf("test_func_flash() execution: %lu ticks\n", ticks_flash);

    systick_start();
    test_func_ram(TEST_ITERATIONS);
    ticks_ram = systick_stop();

    printf("test_func_ram() execution: %lu ticks\n", ticks_ram);

    while(true);
}

The linker map shows:

Code: [Select]
.text.test_func_flash
                0x00000828       0x28 obj\Release\main.o

 .data.test_func_ram
                0x20000000       0x28 obj\Release\main.o

And the output I get is:

Code: [Select]
----------------------------------------
RAM EXEC TEST
----------------------------------------
test_func_flash() address: 0x828
test_func_ram() address: 0x20000000
test_func_flash() execution: 1600035 ticks
test_func_ram() execution: 1600038 ticks

This is with the CH32V003 running at 24 MHz, with SYSCLK = HSI, HCLK = SYSCLK/1, and zero flash wait states. SysTick runs from HCLK/1, so should also be counting at 24 MHz.

As you can see, no meaningful difference. In fact, execution time from RAM actually consistently seems to be 3-4 ticks slower than executing from flash for some reason. :-//

Edit: Oh, I see why there's a difference. The code isn't quite identical between the two test cases. The RAM function is called slightly differently: after the timer is started, there's an extra auipc instruction and jalr is used to call the function. That's probably where the extra few ticks comes from.

Code: [Select]
sw zero,8(s0)                                   sw zero,8(s0)
sw zero,4(s0)                                   sw zero,4(s0)
lw a5,0(s0)                                     lw a5,0(s0)
lui a0,0x31                                     lui a0,0x31
addi a0,a0,-704 # 30d40 <_data_lma+0x301a8>     addi a0,a0,-704 # 30d40 <_data_lma+0x301a8>
ori a5,a5,1                                     ori a5,a5,1
sw a5,0(s0)                                     sw a5,0(s0)
jal 828 <test_func_flash>                       auipc ra,0x1ffff
                                                jalr 1690(ra) # 20000000 <test_func_ram>

I shall try with 48 MHz HSI and 1 wait state and see what the results are for that - but I need to figure out the clock configuration code for that first. :P
« Last Edit: December 17, 2024, 01:46:15 am by HwAoRrDk »
 
The following users thanked this post: edavid

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15984
  • Country: fr
Re: CH32v003 gpio speed
« Reply #14 on: December 17, 2024, 01:58:11 am »
if i only needed to use neopixel then yeh spi could be used but spi is currently used by external flash.

I would suggest considering using a timer with PWM output or another appropriate output compare mode with DMA.
 

Offline Heindal (Topic starter)

  • Contributor
  • Posts: 16
  • Country: us
Re: CH32v003 gpio speed
« Reply #15 on: December 17, 2024, 03:02:24 am »
Maybe in my case putting it in RAM would not improve anything, since a nop would still take 20 ns and memory access might be bottlenecked by something else. Has anyone gotten GPIO to work at 30/50 MHz in output mode? The datasheet's Port Configuration Register (GPIOx_CFGLR) description says the max is 30 MHz, but the WCH chip diagram shows the HB bus Fmax as 50 MHz.

Since I'm only getting 4.8 MHz, I assume by default it's at 10 MHz, since manually toggling would halve the frequency.
« Last Edit: December 17, 2024, 03:04:53 am by Heindal »
 

Online HwAoRrDk

  • Super Contributor
  • ***
  • Posts: 1619
  • Country: gb
Re: CH32v003 gpio speed
« Reply #16 on: December 17, 2024, 04:35:55 am »
Results of my experiment for 48 MHz HSI (i.e. SYSCLK = PLL, PLL fed by HSI) and 1 flash wait state:

Code: [Select]
----------------------------------------
RAM EXEC TEST
----------------------------------------
test_func_flash() address: 0x828
test_func_ram() address: 0x20000000
test_func_flash() execution: 2000039 ticks
test_func_ram() execution: 1600041 ticks

So, conclusion is that running code from RAM is only faster when running with 1 flash wait state necessitated by clock being greater than 24 MHz.
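
The size of the penalty can be worked out from the tick counts with a little integer arithmetic. A small sketch, assuming SysTick counts at 48 MHz in this run (the helper names are made up for illustration):

```c
#include <stdint.h>

/* Convert a tick count to microseconds (truncating) for a counter at f_hz. */
static uint32_t ticks_to_us(uint32_t ticks, uint32_t f_hz)
{
    return (uint32_t)(((uint64_t)ticks * 1000000ull) / f_hz);
}

/* How much larger `slow` is than `fast`, in tenths of a percent (rounded). */
static uint32_t extra_permille(uint32_t slow, uint32_t fast)
{
    uint64_t diff = (uint64_t)(slow - fast) * 1000ull;
    return (uint32_t)((diff + fast / 2u) / fast);
}
```

For the numbers above this gives roughly 41.7 ms from flash against 33.3 ms from RAM, i.e. the flash version takes about 25% longer for this particular loop.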

Has anyone gotten GPIO to work at 30/50 MHz in output mode? The datasheet's Port Configuration Register (GPIOx_CFGLR) description says the max is 30 MHz, but the WCH chip diagram shows the HB bus Fmax as 50 MHz.

Those GPIO output mode settings of 2/10/30 MHz are setting the drive strength of the pin's output. Different drive strengths will affect the slew rate of the voltage output by the pin - i.e. how fast the transitions between high/low are. It doesn't dictate the actual frequency with which the pin can be toggled. They describe the settings in terms of MHz because those are the maximum signal frequencies which those drive strengths are suitable for. Whether you can actually generate an output signal of such frequency from the GPIO is another matter. Typically only special-purpose peripherals like SPI, PWM, etc. will be outputting signals of such high frequencies, so you only really need a high drive strength in those scenarios. Otherwise, for EMI reasons you should typically always select the lowest drive strength.

Not sure which diagram you're referring to that mentions 50 MHz. ???
 
The following users thanked this post: whitehorsesoft

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3305
  • Country: ca
Re: CH32v003 gpio speed
« Reply #17 on: December 17, 2024, 08:20:40 pm »
WS2812? Use UART. You can encode 3 bit values per a transmitted byte.
 

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 14135
  • Country: gb
    • Mike's Electric Stuff
Re: CH32v003 gpio speed
« Reply #18 on: December 17, 2024, 08:48:52 pm »
WS2812? Use UART. You can encode 3 bit values per a transmitted byte.
or the PWM peripheral
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4853
  • Country: nz
Re: CH32v003 gpio speed
« Reply #19 on: December 17, 2024, 09:24:04 pm »
WS2812? Use UART. You can encode 3 bit values per a transmitted byte.

How do you hide the start and stop bits? You must have to use something like 2.5 Mbps, which probably very few UARTs can do -- that's 22x the highest commonly used speed! (115200)

Think you'd also need an inverter.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3305
  • Country: ca
Re: CH32v003 gpio speed
« Reply #20 on: December 17, 2024, 10:05:02 pm »
WS2812? Use UART. You can encode 3 bit values per a transmitted byte.

How do you hide the start and stop bits? You must have to use something like 2.5 Mbps, which probably very few UARTs can do -- that's 22x the highest commonly used speed! (115200)

Think you'd also need an inverter.

Yes, you need to invert the output, which I suppose any modern UART module can do.

Before inversion, to encode 1 you need to transmit the following bauds: 001, to encode 0 use 011

Both patterns start with 0, which coincides with the start bit. So the start bit plus 2 bits of data carry the first datum, then 3 bits for the second datum, and then 3 bits for the third. The stop bit will give you a pause.

Something like this

Code: [Select]
#define BIT1_0 0x03
#define BIT1_1 0x02
#define BIT2_0 0x18
#define BIT2_1 0x10
#define BIT3_0 0xc0
#define BIT3_1 0x80

combine these as needed with or and send. For example

Code: [Select]
BIT1_0 | BIT2_1 | BIT3_1
will send 0 - 1 - 1 to WS2812

or

Code: [Select]
#define BASE 0x92
#define BIT1 0x01
#define BIT2 0x08
#define BIT3 0x40

As to the speed, you need around 3 Mbaud. For 24 MHz clock, it is CLK/8. Shouldn't be a problem for an UART module. You don't need very precise baud rate as WS2812 has a huge margin.
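
A minimal sketch of the BASE-plus-mask variant as a function, assuming 8 data bits sent LSB first and an inverted TX line. The `WS_UART_*` names and `ws_uart_encode3` are hypothetical; the third mask is called BIT3 here.

```c
#include <stdint.h>

#define WS_UART_BASE 0x92u  /* data bits for three WS2812 '1' bits      */
#define WS_UART_BIT1 0x01u  /* OR in to turn the first bit into a '0'   */
#define WS_UART_BIT2 0x08u  /* same for the second bit                  */
#define WS_UART_BIT3 0x40u  /* same for the third bit                   */

/* Build the UART data byte (before inversion) for three WS2812 bits. */
static uint8_t ws_uart_encode3(int b1, int b2, int b3)
{
    uint8_t byte = WS_UART_BASE;
    if (!b1) byte |= WS_UART_BIT1;
    if (!b2) byte |= WS_UART_BIT2;
    if (!b3) byte |= WS_UART_BIT3;
    return byte;
}
```

For example, `ws_uart_encode3(0, 1, 1)` yields 0x93, the same value as BIT1_0 | BIT2_1 | BIT3_1 from the first table.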

<edit>corrected errors in numbers
« Last Edit: December 17, 2024, 10:23:15 pm by NorthGuy »
 

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 14135
  • Country: gb
    • Mike's Electric Stuff
Re: CH32v003 gpio speed
« Reply #21 on: December 17, 2024, 10:22:19 pm »
WS2812? Use UART. You can encode 3 bit values per a transmitted byte.

How do you hide the start and stop bits? You must have to use something like 2.5 Mbps, which probably very few UARTs can do -- that's 22x the highest commonly used speed! (115200)

Think you'd also need an inverter.
Most UARTS can do at least clk/16, some up to clk/4.
 

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 14135
  • Country: gb
    • Mike's Electric Stuff
Re: CH32v003 gpio speed
« Reply #22 on: December 17, 2024, 10:24:35 pm »
WS2812? Use UART. You can encode 3 bit values per a transmitted byte.

How do you hide the start and stop bits? You must have to use something like 2.5 Mbps, which probably very few UARTs can do -- that's 22x the highest commonly used speed! (115200)

Think you'd also need an inverter.

Yes, you need to invert the output, which I suppose any modern UART module can do.

Before inversion, to encode 1 you need to transmit the following bauds: 001, to encode 0 use 011

Both patterns start with 0, which coincides with the start bit. So the start bit plus 2 bits of data carry the first datum, then 3 bits for the second datum, and then 3 bits for the third. The stop bit will give you a pause.

Something like this

Code: [Select]
#define BIT1_0 0x03
#define BIT1_1 0x02
#define BIT2_0 0x18
#define BIT2_1 0x10
#define BIT3_0 0xc0
#define BIT3_1 0x80

combine these as needed with or and send. For example

Code: [Select]
BIT1_0 | BIT2_1 | BIT3_1
will send 0 - 1 - 1 to WS2812

or

Code: [Select]
#define BASE 0x92
#define BIT1 0x01
#define BIT2 0x08
#define BIT3 0x40

As to the speed, you need around 3 Mbaud. For 24 MHz clock, it is CLK/8. Shouldn't be a problem for an UART module. You don't need very precise baud rate as WS2812 has a huge margin.
Something else that can be useful to know is that although bit-to-bit speed is fairly critical, many ( all?) WS2812 style chips will tolerate inter-byte gaps of up to a couple of hundred uS before seeing the gap as a frame reset
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4853
  • Country: nz
Re: CH32v003 gpio speed
« Reply #23 on: December 17, 2024, 10:39:37 pm »
Something else that can be useful to know is that although bit-to-bit speed is fairly critical, many ( all?) WS2812 style chips will tolerate inter-byte gaps of up to a couple of hundred uS before seeing the gap as a frame reset

With 3 bits per UART byte, you need to send 8 bytes -- a full RGB -- before WS2812 bytes line up with UART bytes again.

All this shifting and masking and extracting 3 bit fields that span bytes and reassembling doesn't look like much less work than just sampling a bit and then toggling a GPIO twice with a few NOPs in between (which an 8 MHz AVR can do no problem).

The only advantage would be if your CPU is significantly faster than needed then it can preload a buffer (either in the UART itself, or for DMA) and then get on with doing something else instead of waiting around.
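
To give a feel for how much shifting and masking is involved, here is a rough sketch that expands one pixel into the 8 UART bytes. It assumes the 3-bits-per-byte encoding discussed earlier in the thread (base pattern 0x92, with 0x01, 0x08 and 0x40 ORed in for '0' bits); the function names are hypothetical.

```c
#include <stdint.h>

/* Encode three WS2812 bits into one UART data byte (before inversion). */
static uint8_t ws_uart_encode3(unsigned b1, unsigned b2, unsigned b3)
{
    return (uint8_t)(0x92u | (b1 ? 0u : 0x01u)
                           | (b2 ? 0u : 0x08u)
                           | (b3 ? 0u : 0x40u));
}

/* Expand one pixel (green, red, blue; MSB first) into 8 UART bytes. */
static void ws_uart_pack_pixel(uint8_t out[8], uint8_t g, uint8_t r, uint8_t b)
{
    uint32_t grb = ((uint32_t)g << 16) | ((uint32_t)r << 8) | b;
    for (int i = 0; i < 8; i++) {
        /* Take the next 3 bits, starting from bit 23. */
        unsigned three = (unsigned)(grb >> (21 - 3 * i)) & 0x7u;
        out[i] = ws_uart_encode3(three & 4u, three & 2u, three & 1u);
    }
}
```

Treating the pixel as one 24-bit word sidesteps the problem of 3-bit fields spanning colour bytes, at the cost of one 32-bit shift per output byte.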
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3305
  • Country: ca
Re: CH32v003 gpio speed
« Reply #24 on: December 17, 2024, 10:40:10 pm »
WS2812? Use UART. You can encode 3 bit values per a transmitted byte.

How do you hide the start and stop bits? You must have to use something like 2.5 Mbps, which probably very few UARTs can do -- that's 22x the highest commonly used speed! (115200)

Think you'd also need an inverter.
Most UARTS can do at least clk/16, some up to clk/4.

I have just looked at the datasheet  - it does CLK/16, and it supports rates up to 3 MBaud, which would imply 48 MHz clock.

However, unless I missed it, the chip doesn't seem to offer control of output polarity :(
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3305
  • Country: ca
Re: CH32v003 gpio speed
« Reply #25 on: December 17, 2024, 10:59:03 pm »
All this shifting and masking and extracting 3 bit fields that span bytes and reassembling doesn't look like much less work than just sampling a bit and then toggling a GPIO twice with a few NOPs in between (which an 8 MHz AVR can do no problem).

Assuming you want to do other things too, you would need to prepare a buffer, fill it with data, let it sail and free the CPU.

For every bit you want to send, you need to

- with PWM - select one or the other duty cycle and store the result
- with UART - if the bit is 0 then do "or" and store after every 3 bits

If you want to save processing time, generating PWM buffers will be somewhat faster (unless there are too many bus wait cycles) than UART.

If you want to prefill more buffers, UART will let you store 3 times more data (6 times if PWM duty cycle is 16-bit long).
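
The 3x and 6x figures follow directly from the bytes needed per pixel, which can be sketched as below (helper names are made up for illustration):

```c
#include <stddef.h>

enum { WS_BITS = 24 };  /* bits per pixel */

/* UART: 3 WS2812 bits per transmitted byte. */
static size_t uart_bytes_per_pixel(void)
{
    return WS_BITS / 3;
}

/* PWM: one compare value per WS2812 bit. */
static size_t pwm_bytes_per_pixel(size_t bytes_per_value)
{
    return WS_BITS * bytes_per_value;
}
```

That is 8 bytes per pixel for UART against 24 bytes with 8-bit compare values or 48 bytes with 16-bit ones.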
 

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 14135
  • Country: gb
    • Mike's Electric Stuff
Re: CH32v003 gpio speed
« Reply #26 on: December 18, 2024, 12:03:51 am »
All this shifting and masking and extracting 3 bit fields that span bytes and reassembling doesn't look like much less work than just sampling a bit and then toggling a GPIO twice with a few NOPs in between (which an 8 MHz AVR can do no problem).

Assuming you want to do other things too, you would need to prepare a buffer, fill it with data, let it sail and free the CPU.

For every bit you want to send, you need to

- with PWM - select one or the other duty cycle and store the result
- with UART - if the bit is 0 then do "or" and store after every 3 bits

If you want to save processing time, generating PWM buffers will be somewhat faster (unless there are too many bus wait cycles) than UART.

If you want to prefill more buffers, UART will let you store 3 times more data (6 times if PWM duty cycle is 16-bit long).
Can the CH32V003 do DMA to a PIO port ? if so, creating a bit pattern in memory and DMAing it could be another option, maybe 1 byte at a time if your ws2812's are OK with inter-byte gaps.
 

Offline Heindal (Topic starter)

  • Contributor
  • Posts: 16
  • Country: us
Re: CH32v003 gpio speed
« Reply #27 on: December 18, 2024, 01:20:37 pm »
In my case UART and SPI cannot be used because they already serve a function. UART is needed to receive instructions over the serial port, while SPI is connected to the external chip. The SPI code to read flash already takes up space; if I were to bit-bang SPI, it would take up a lot more. My end goal is to make a flash programmer. The NeoPixel is just for status purposes: one color would represent an erase, another a read or write, etc.

If I use the Adafruit library, I would have to stop UART, update the NeoPixel, then start UART again to display or send instructions. But then I would be limited by the CH32V003's flash again because of UART + SPI + Adafruit lib. And assuming I can trim down the required flash size, I'd probably lose instructions sent over UART while the NeoPixel is being updated.

So I'd have to use one of my more powerful MCUs like the CH32V203, RP2040 or ESP series. But that would be overkill and not much of a learning experience. This is my first project with the CH32V003, so I kind of want to see it through.

That's why I am looking for a better way to use the NeoPixel. I think PWM is still available, but I'm not sure if it is possible using that.

Attached is the circuit I have now. The external flash will be removed later so I can use any 3.3 V flash, and some form of mounting will be made. There is a USB-to-serial chip under the PCB (CH340C, 16-pin). The PCB has contacts on only one side, so wires had to be used.
« Last Edit: December 18, 2024, 01:52:19 pm by Heindal »
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3305
  • Country: ca
Re: CH32v003 gpio speed
« Reply #28 on: December 18, 2024, 03:01:49 pm »
So I'd have to use one of my more powerful MCUs like the CH32V203, RP2040 or ESP series. But that would be overkill and not much of a learning experience. This is my first project with the CH32V003, so I kind of want to see it through.

When choosing MCU, look at periphery first, assign the tasks to the periphery, then decide on the MCU. An MCU with a suitable set of periphery goes a long way.

If it's only one WS2812 for status, simply bit-bang it.

It may be beneficial to use bit-banging for the flash too (using CPU or perhaps DMA). This way you can use QSPI mode which is much faster.
 

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 14135
  • Country: gb
    • Mike's Electric Stuff
Re: CH32v003 gpio speed
« Reply #29 on: December 18, 2024, 03:13:03 pm »

If it's only one WS2812 for status, simply bit-bang it.

or use a vanilla RGB LED
 

Offline rhodges

  • Frequent Contributor
  • **
  • Posts: 355
  • Country: us
  • Available for embedded projects.
    • My public libraries, code samples, and projects for STM8.
Re: CH32v003 gpio speed
« Reply #30 on: December 19, 2024, 06:52:36 pm »
For what it's worth, I use SPI and DMA for the WS2812B. Each LED uses 9 bytes: 24 color bits encoded as 3 SPI bits each. I set the CPU to 48 MHz and divide by 5, with an SPI divisor of 4, to get an SPI clock of 2.4 MHz.
Here is my code to create the SPI bit table, if anyone is interested.
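
The attachment is not reproduced here. As a rough sketch of the idea only (not the attached code), one colour byte can be expanded into 3 SPI bytes as follows, assuming a '1' is sent as 110 and a '0' as 100, MSB first; the function name is hypothetical.

```c
#include <stdint.h>

/* Expand one colour byte into 3 SPI bytes: '1' -> 110, '0' -> 100. */
static void ws_spi_encode_byte(uint8_t value, uint8_t out[3])
{
    uint32_t bits = 0;
    for (int i = 7; i >= 0; i--) {
        bits = (bits << 3) | (((value >> i) & 1u) ? 0x6u : 0x4u);
    }
    out[0] = (uint8_t)(bits >> 16);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)bits;
}
```

At 2.4 MHz each SPI bit lasts about 417 ns, so three of them make up one 1.25 µs WS2812 bit. Encoding on the fly like this trades CPU time for RAM compared with a precomputed 256-entry lookup table.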
Currently developing embedded RISC-V. Recently STM32 and STM8. All are excellent choices. Past includes 6809, Z80, 8086, PIC, MIPS, PNX1302, and some 8748 and 6805. Check out my public code on github. https://github.com/unfrozen
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4361
  • Country: us
Re: CH32v003 gpio speed
« Reply #31 on: December 20, 2024, 01:46:42 am »
Hmm.
CH32v003 datasheet: 2KB volatile data storage area SRAM

SPI-based code:
Code: [Select]
    uint8_t    pixtab[256][3];
Of course, with the 32v003 prices, you might be best off just using an entire chip just for controlling the LEDs
 

Offline ocelot

  • Contributor
  • Posts: 12
  • Country: gb
Re: CH32v003 gpio speed
« Reply #32 on: December 22, 2024, 08:35:04 am »
On an STM32 project with the WS2812, I used SPI with DMA, and because I could not get the SPI clock rate precisely correct for the WS2812, I oversampled the data by a factor of about 2.5. The WS2812 can accept some jitter in the width of the narrow and wide pulses.
So the process was to build up the message in RAM, encoding the pattern as repeated 1 and 0 bits, then point the DMA controller at it, with the SPI data register as the destination.
 

Offline pastaclub

  • Contributor
  • Posts: 48
  • Country: th
Re: CH32v003 gpio speed
« Reply #33 on: January 12, 2025, 06:42:18 am »
Timing is tight to distinguish a 0 from a 1, but between the data bits you have up to 5 µs of time. So you can do all the data fetching, shifting and branching between the transmission of the bits, and then you have short blocks of assembly with fixed, known execution times for transmitting a 0 and for transmitting a 1. No DMA, no peripheral, no worries about which RAM is used, just bit banging.

People have done this with 8 MHz MCUs, so for sure it can't be impossible on a 48 MHz MCU.

For timing, check this:
https://wp.josh.com/2014/05/13/ws2812-neopixels-are-not-so-finicky-once-you-get-to-know-them/
 

Offline DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6438
  • Country: es
Re: CH32v003 gpio speed
« Reply #34 on: January 12, 2025, 07:03:55 am »
Commands normally use very low bandwidth, so why not make a software UART for them? 9600 baud should be easy.
Then use the hw uart for the neopixel.

Or, can't you remap the spi pin?
Change it on the fly before and after the neopixel transfer.
The flash won't care about anything with CS inactive.
Create a busy flag so nothing else uses the spi.

And use DMA!

Also check these fast rgb expander routines for spi/uart:
https://www.eevblog.com/forum/microcontrollers/more-fun-with-less-4k-ram-100-x-ws2812bs/msg4525124/#msg4525124
« Last Edit: January 12, 2025, 07:25:21 am by DavidAlfa »
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 

