Author Topic: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?  (Read 9174 times)


Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5890
  • Country: es
ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« on: May 24, 2022, 12:56:05 pm »
Just got one of these to play with.
Perhaps I'm missing some details about bus interconnection/speeds, but toggling a GPIO in the simplest way, with the code placed in IRAM, reaches only 8MHz, which seems way too slow for a fecking 240MHz CPU.
Also, if I run a second thread on Core 1 toggling a different pin, the effective rate halves. Not impressed at all.
Additionally, there is periodic jitter due to the ESP system interrupts, which is expected; I have yet to find out whether it's possible to disable all that and have full control of the system.
Code: [Select]
#define LED GPIO_NUM_4
void setup(){
  pinMode(LED, OUTPUT);
}
void IRAM_ATTR loop(){  // Unrolled loop to avoid cache miss / branches
  while(1){
    WRITE_PERI_REG(GPIO_OUT_W1TS_REG,1<<LED); // Set led pin high. GPIO_OUT_W1TS / GPIO_OUT_W1TC work like STM32 BSRR registers (Set/Reset mask)
    WRITE_PERI_REG(GPIO_OUT_W1TC_REG,1<<LED); // Set led pin low
    WRITE_PERI_REG(GPIO_OUT_W1TS_REG,1<<LED); // And so on
    WRITE_PERI_REG(GPIO_OUT_W1TC_REG,1<<LED);
    WRITE_PERI_REG(GPIO_OUT_W1TS_REG,1<<LED);
    WRITE_PERI_REG(GPIO_OUT_W1TC_REG,1<<LED);
    WRITE_PERI_REG(GPIO_OUT_W1TS_REG,1<<LED);
    WRITE_PERI_REG(GPIO_OUT_W1TC_REG,1<<LED);
  }
}
So 240MHZ dual-core, but performs like a 45HP car with 300KG in the trunk and a brick under the gas pedal... but hey, with "Sport", "Turbo" and "V6" stickers  :-DD
« Last Edit: May 24, 2022, 01:13:51 pm by DavidAlfa »
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #1 on: May 24, 2022, 01:21:30 pm »
I think this is the same issue I found here
https://www.eevblog.com/forum/microcontrollers/32f417-spi-running-at-one-third-the-speed-it-should/

These ARM32 processors use an ARM32 core (which they bought in from ARM as a "block") running at 100000GHz and then they tack on the various in-house designed (or bought-in) peripherals which not only do not run at anywhere near the CPU speed (they run off a "peripheral clock" which is the CPU clock divided by 2^N, and with quite a low max e.g. 50MHz) but also need multiple clocks (of this slow peripheral clock) to de-metastable data syncing.

So stuff like general I/O, reading the status registers of UARTs, etc, runs very slowly relative to what you may expect from the CPU clock speed.

The solution is to use DMA for as much as possible. Even for generating a fast waveform on a GPIO pin it may pay to use DMA to pick values out of a circular RAM buffer (which incidentally will avoid most of the jitter due to interrupt servicing).
« Last Edit: May 24, 2022, 01:23:12 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5890
  • Country: es
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #2 on: May 24, 2022, 01:33:24 pm »
Yeah, those were my thoughts. I haven't gone in depth into the architecture yet.
Makes sense: targeting IoT, the SoC might be oriented to data processing, not I/O power.
That's a bit sad; with faster peripherals it would blow the STM32 out of the water: 8MB SPI flash/PSRAM, 512K SRAM, HW hashing/crypto...
 

Online hans

  • Super Contributor
  • ***
  • Posts: 1636
  • Country: nl
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #3 on: May 24, 2022, 02:21:33 pm »
I was looking at the ESP32 chipsets as a SPI WiFi/BLE bridge. There is some hosted firmware for it available (esp-hosted), but I was looking at the low-level details myself.. specifically SPI master/slave: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/spi_master.html

Look up "Transaction Duration" and be amazed to find out that the overhead to set up a 1-byte "transaction" is 25us. At 240MHz that is 6000 cycles |O. I bet there is some serious IDF overhead in there.. as I can't imagine the RTL code being *that* slow.

Regarding GPIO toggle speed.. IIRC an STM32H7 is not too dissimilar. The GPIOs are tucked away on a separate peripheral bus, which must be accessed via several bus bridges, each of which requires bus arbitration, setup of an AXI bus (or similar) transaction, handshaking, etc. It's all to make the CPU go fast while letting the peripherals operate at a lower frequency, either because of speed limitations of that logic (e.g. the peripheral bus with possibly a couple dozen slaves) or for power consumption.

You'll probably find the complexity of said peripherals goes up exponentially to handle more complicated orchestras of pin wiggling using DMA without CPU interrupts.
 

Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5890
  • Country: es
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #4 on: May 24, 2022, 02:50:29 pm »
Checking the reference manual:
https://espressif.com/sites/default/files/documentation/esp32-s3_technical_reference_manual_en.pdf

I've seen that the APB bus runs at 80MHz max, and that's the bus the GPIO sits on.
An effective 16MHz write rate from 80MHz (each IO cycle is 2 writes, high/low) might mean something like this:
- 1 cycle for CPU fetching pointer address
- 1 cycle for CPU write to pointer address
- 1 cycle for APB sync?
- 1 cycle for APB transfer?

But I didn't find the details.
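
As a sanity check on the numbers in this thread, here is a minimal sketch of the cycle budget per register write. The helper name `cycles_per_write` is made up for illustration; the only inputs are the clock and the measured square wave frequency, with two writes (high, low) per output period.

```c
#include <stdint.h>

/* Hypothetical helper: clock cycles that elapse per GPIO register write.
 * One square wave period needs two writes (set high, set low). */
static uint32_t cycles_per_write(uint32_t clock_hz, uint32_t square_wave_hz)
{
    uint32_t writes_per_second = 2u * square_wave_hz;
    return clock_hz / writes_per_second;
}
```

With the measured 8MHz square wave, `cycles_per_write(80000000, 8000000)` gives 5 APB cycles per write (one more than the four steps guessed above), and `cycles_per_write(240000000, 8000000)` gives 15 CPU cycles.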
And yes, there's a lot of overhead, but it's sometimes understandable, as your code runs as a process managed by the Espressif OS (FreeRTOS IIRC), and it must take care of everything.
But 6K cycles seems a lot.
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #5 on: May 24, 2022, 02:53:36 pm »
A colleague is using the ESP32 also.

Pluses:
Lots of bang for the buck; much cheaper than STM32F4, because the "Western" vendors rip everyone off.
The ETH libs (including TLS) actually work because, reportedly, the company rented one of the "social media prominent" coders and paid him to sort out their libs, whereas ST just put out a load of sh*it written by a random collection of employees passing through ST, leaving users to spend months googling for bug fixes.
Can use SPI RAM chips, which is a big thing because many apps are either RAM-limited or can be made much more powerful if you have lots of RAM.

Minuses:
It is Chinese so a) will have far fewer "serious" users e.g. motor vehicles etc that drive long term mfg life; b) some "political risk" (sanction potential, etc, so basically forget getting any if China was ever dumb enough to get adventurous over Taiwan).

Pin waggling using DMA is really trivial. I have a waveform generator done that way, and once you have it running (I confess to paying someone to write the basic stuff) then it is just a few lines, and to wiggle a pin with a square wave you just need a circular buffer of 2 values. The period is set up with a timer. I posted the source code here...
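
To make the idea concrete without tying it to any particular chip, here is a host-side model of a timer-paced DMA channel in circular mode. It is only a sketch: `circular_dma_t` and `dma_on_timer_tick` are invented names, and on real hardware the DMA controller does this step by itself on every timer request, with no CPU code involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a DMA channel in circular mode (names are hypothetical). */
typedef struct {
    const uint32_t *buf;  /* waveform samples, i.e. values for the port register */
    size_t len;           /* number of samples in the buffer */
    size_t pos;           /* index of the next sample to transfer */
} circular_dma_t;

/* One timer tick: move the next sample to the output register and
 * wrap around at the end of the buffer. */
static void dma_on_timer_tick(circular_dma_t *dma, volatile uint32_t *out_reg)
{
    *out_reg = dma->buf[dma->pos];
    dma->pos = (dma->pos + 1) % dma->len;
}
```

For a square wave the buffer holds just two values, for example `{1u << 4, 0}` to toggle pin 4, and the timer period sets half the output period.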

Having been involved in this project for 1-2 years I reckon that 99% of embedded "IOT" doesn't need any ETH performance. It goes out over ADSL, or even 3G/4G. It just needs a solid code library. It talks to a private server anyway; running a public-facing HTTPS server will always be a long term disaster in an embedded product.
 

Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5890
  • Country: es
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #6 on: May 24, 2022, 03:41:12 pm »
Yes, I know, I wasn't trying to make any waveform, just checking some basics.
I've also tried Timer DMA->GPIO on STM32, saving a lot of space by using uint8 instead of uint32 and setting the DMA size to byte for both src and dst.
Only tried pointing to the GPIO base address, so this byte would write to pins 0-7, but I guess it could address any of the higher bytes of the 32-bit GPIO register.
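
A small host-side model of what a byte-sized write at an offset does to a 32-bit register. This is a sketch under assumptions: a little-endian target, and a peripheral that accepts byte accesses at all (check the reference manual). `write_byte_lane` is a made-up name.

```c
#include <stdint.h>
#include <string.h>

/* Model a byte-wide write into one lane of a 32-bit register.
 * lane 0 covers bits 0-7, lane 1 bits 8-15, and so on.
 * Assumes little-endian byte order. */
static uint32_t write_byte_lane(uint32_t reg, unsigned lane, uint8_t value)
{
    uint8_t bytes[sizeof reg];
    memcpy(bytes, &reg, sizeof reg);  /* view the register as 4 bytes */
    bytes[lane & 3u] = value;         /* the only byte such a write touches */
    memcpy(&reg, bytes, sizeof reg);
    return reg;
}
```

So pointing the DMA destination at base address + 1 would drive pins 8-15 in this model, leaving the other bits alone.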

Edit: Well, it seems like the second core doesn't really hurt the first one. It's definitely the peripheral bus bandwidth.
Made some cpu performance tests handling memory/floats : www.jdoodle.com/ia/rjS

Code: [Select]
Only Core 0: 4220ms
Only Core 1: 4273ms
Core 0: 4226ms
Core 1: 4235ms
Core 0: 4226ms
Core 1: 4235ms
« Last Edit: May 24, 2022, 07:22:31 pm by DavidAlfa »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14431
  • Country: fr
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #7 on: May 24, 2022, 08:44:46 pm »
IO toggling rate as a measure of a CPU performance? Really? :-DD
 
The following users thanked this post: Siwastaja, m98, Buriedcode, cgroen, JPortici

Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5890
  • Country: es
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #8 on: May 24, 2022, 10:20:11 pm »
This is not a CPU, but a MCU, targetting external circuitry, so yes, IO speed is very important.
You might want to interface some external device that can't be driven with any existing hardware peripheral, and as you may know, 8MHz isn't lightning fast; most basic PICs can toggle a pin faster while running a lot slower, so depending on the application, you might waste a lot of cycles.

I just found it strange that it took 15 CPU cycles to toggle a bit, directly accessing the register, and 30 when running dual core. I wasn't sure if this was a memory bottleneck or what, until I later found that the APB bus runs at 80MHz.
 
The following users thanked this post: eugene

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #9 on: May 25, 2022, 04:05:20 am »
The ESP32 is primarily about the WiFi, the GPIO is somewhat limited but in practice that is not a problem for the sort of stuff it is intended to be used in. Certainly I've never needed to toggle GPIO at 8MHz on one, mostly I use SPI or I2C for more advanced peripherals and use the regular GPIO for stuff like LEDs.
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #10 on: May 25, 2022, 06:23:10 am »
There is a general emphasis on not making GPIO too fast because of EMC. A 500MHz CPU with a 25MHz external xtal radiates almost nothing but high slew rate GPIO is a nightmare. They could have programmable slew rate GPIO but I don't think they do. It's not easy to do. You can have multiple size mosfets and select which size you use; not sure if this is done.
 

Offline Doctorandus_P

  • Super Contributor
  • ***
  • Posts: 3341
  • Country: nl
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #11 on: May 25, 2022, 07:27:30 am »
So 240MHZ dual-core, but performs like a 45HP car with 300KG in the trunk and a brick under the gas pedal... but hey, with "Sport", "Turbo" and "V6" stickers  :-DD

That is a very bad analogy.
The days of assessing a uC's performance by how fast it can toggle an I/O pin are over.
The days of squeezing the most out of your uC by hand optimised assembly and cycle counting to get "perfect" performance are also over.

And it's all because of a combination of progress and physical limitations.
Even microcontrollers are getting things like caches to speed stuff up, or "flash accelerators" that access the Flash in 512 bit wide chunks because the flash is much slower than the processor itself.

And quick I/O toggling is also seldom needed for a uC. Take for example a CNC controller running GRBL. Generating step frequencies of some 200kHz is adequate, but it needs a lot of buffering and processing of the text strings to translate G-code to stepper motor timings, and all that background processing is not timing critical.

If you really need fast I/O, then use an FPGA, or use a microcontroller with a dedicated fast peripheral that suits your application.

The weird thing is though:
Why does an anal-ogy and ass-essing remind me of a full bridge rectum fire?
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8167
  • Country: fi
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #12 on: May 25, 2022, 07:53:10 am »
This is not a CPU, but a MCU, targetting external circuitry, so yes, IO speed is very important.

No, GPIO toggling rate is almost never important.

Do you have any actual application in mind?

I mean, I have used MCUs to bitbang fast protocols. Program logic is always the problem. For example, a 400MHz core and 100MHz IO bus play along just fine. Compared to a 10MHz PIC or AVR, 400MHz core makes interrupt latency just disappear. It means you don't have to think about a few logical operations and maybe an if-else. Maybe you spend 40 cycles on that ISR + IO generation logic, but it's equivalent to 1 cycle(!!!) on that 10MHz AVR. Plus another AVR-equivalent cycle for the IO.

So you have the IO performance of the PIC/AVR which was perfectly fine for almost everything, but also get the CPU performance which allows you to just write normal applications and utilize ISRs, instead of some hacker-level hand assembly trickery.

IO latency of 1/16MHz = 62.5 ns is really fine for almost everything, but of course you can't bitbang (R)MII for 100M Ethernet, for example. Do you want to do something comparable to this, or what do you have in mind?
 

Online hans

  • Super Contributor
  • ***
  • Posts: 1636
  • Country: nl
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #13 on: May 25, 2022, 08:08:48 am »
The only measure of MCU performance is the one of your application. You may find yourself in a very tight IRQ where GPIO performance may matter, but that is indeed "almost never".

Then again you could also load a Coremark benchmark, and stare yourself blind at an AVR vs Cortex-M3 comparison. But the Coremark benchmark does a matrix multiplication; is that indicative of your MCU application? If not, maybe that AVR is just fine.

The IO performance of fast CPUs is, relatively speaking, appalling. But absolutely speaking, there is nothing wrong with a 50ns or 100ns pin toggle rate. It will likely only cause problems for e.g. toggling the CS of an SPI device too fast.
And if you want the last bit/s of performance you'll need to resort to DMA anyway, and hope that the manufacturer has sophisticated enough peripherals that it can play your orchestra of pin toggles to complete the transactions without 'too much' CPU intervention.
And if that's not possible.. it's very common and relatively easy to connect an FPGA (like a Lattice ICE40 or bigger) to such an MCU.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #14 on: May 25, 2022, 09:22:54 am »
Quote
These ARM32 processors use an ARM32 core (which they bought in from ARM as a "block") ...

I just thought I'd point out that none of the ESP processors use an ARM core. Most use an Xtensa core from "Tensilica" (same principles apply; Tensilica just isn't as successful as ARM.)  Some of the newer ESP chips use a RISC-V core.

A chip that doesn't specifically target "8bit replacement" is extremely likely to have very slow IO, compared to internal CPU clock rate.  On top of that, who knows what goes on in the SDK/OS that helps allow "easy" use of WiFi.  I mean, the code looks OK:
Code: [Select]
void IRAM_ATTR loop(){                                                    // Unrolled loop to avoid cache miss / branches
40375144:       004136          entry   a1, 32
  while(1){
    WRITE_PERI_REG(GPIO_OUT_W1TS_REG,1<<LED); // Set led pin high. GPIO_OUT_W1TS / GPIO_OUT_W1TC work like STM32 BSRR registers (Set/Reset mask)
40375147:       fcafa1          l32r    a10, 40374404 <_iram_text_start>
    WRITE_PERI_REG(GPIO_OUT_W1TC_REG,1<<LED); // Set led pin low
4037514a:       fcaf91          l32r    a9, 40374408 <_iram_text_start+0x4>
    WRITE_PERI_REG(GPIO_OUT_W1TS_REG,1<<LED); // Set led pin high. GPIO_OUT_W1TS / GPIO_OUT_W1TC work like STM32 BSRR registers (Set/Reset mask)
4037514d:       081c            movi.n  a8, 16
4037514f:       0020c0          memw
40375152:       0a89            s32i.n  a8, a10, 0
    WRITE_PERI_REG(GPIO_OUT_W1TC_REG,1<<LED); // Set led pin low
40375154:       0020c0          memw
40375157:       0989            s32i.n  a8, a9, 0
    WRITE_PERI_REG(GPIO_OUT_W1TS_REG,1<<LED); // And so on
40375159:       0020c0          memw
4037515c:       0a89            s32i.n  a8, a10, 0

But it wouldn't be entirely surprising if "memw" to specific memory regions caused a trap to OS code that carefully manages access by multiple CPUs/etc.  :-(

(alas, I find the xtensa instruction set particularly difficult to understand without having studied it.)
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8167
  • Country: fi
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #15 on: May 25, 2022, 10:09:58 am »
Quote
These ARM32 processors use an ARM32 core (which they bought in from ARM as a "block") ...
But it wouldn't be entirely surprising if "memw" to specific memory regions cause a trap to OS code that carefully managed access by multiple CPUs/etc.  :-(

I don't think so - it would be much slower.

Note that DavidAlfa's "8MHz" notation sounds slower than it actually is because if I understood correctly, this was square wave frequency, i.e., one IO latency is 1/16MHz. Not so bad at all - it's going to be much faster than 8-bit AVR running cbi/sbi for 2 cycles at 20MHz - i.e. 5MHz square wave.
 

Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5890
  • Country: es
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #16 on: May 27, 2022, 03:19:39 pm »
Yes, I already said 16MHz effective rate! Not terrible, but I think the 80MHz bus is a bit silly when having 2x 240MHz cores.

The mouth-watering 8MB PSRAM sounds great, but in real-life is rather limited.
memcpy tests copying 32K uint8_t, interleaving two buffers to avoid caching:
Code: [Select]
TX SZ:          67108864 Bytes (64MB)
SRAM->SRAM:     176ms (363MB/s)
PSRAM->SRAM:    2428ms (26MB/s)
PSRAM->PSRAM:   7061ms (9MB/s)

No idea why SRAM is achieving 363MB/s.
I expected memcpy to copy one byte at a time, so at best 240MB/s for a 240MHz CPU.
Changing the buffer to uint32_t gave the same results.
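
For anyone checking the figures in the log, a minimal sketch of the throughput arithmetic. It assumes "MB" in the log means MiB (1048576 bytes) and that the value is truncated rather than rounded; `mib_per_second` is a made-up helper, not the code from the actual test.

```c
#include <stdint.h>

/* Whole MiB per second for `bytes` moved in `elapsed_ms` milliseconds,
 * truncated (not rounded) to an integer. */
static uint32_t mib_per_second(uint64_t bytes, uint64_t elapsed_ms)
{
    return (uint32_t)((bytes * 1000u) / (elapsed_ms * 1048576u));
}
```

Under those assumptions, 67108864 bytes in 176ms, 2428ms and 7061ms come out as 363, 26 and 9, matching the log.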

Also, of the 512KB, only 295KB were available to the user, with BT, WiFi... everything disabled. That was a bit of a disappointment.
« Last Edit: May 27, 2022, 03:56:49 pm by DavidAlfa »
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8167
  • Country: fi
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #17 on: May 27, 2022, 03:56:47 pm »
It isn't silly at all. Fast cores are pipelined exactly to break down critical paths (largest number of logic gates between two flip-flops) into smaller ones. But a bus, by definition, needs to go "everywhere". It can't run fast, given the same silicon process node, power consumption requirement, and style of design.

That's why there are usually multiple buses. Some are faster, some slower. For the fastest interfaces, single point-to-point links are used instead of shared bus.

80MHz for an IO bus is nothing surprising.

You are just seeing the effects that you can't scale up speed arbitrarily. Some parts scale easier, and are also more important - hence CPU is made faster, IO kept slower. This is fine because 99.99% of use cases need this, because to decide what IO operation you want to do, you usually spend dozens of instructions on the CPU.

Library or compiler supplied memcpy is always highly optimized and of course will use the full memory bandwidth (here, 32-bit moves). memcpy can see the alignment and the size from its arguments, so can start or end the copy with single-byte moves but do the bulk in full word writes.
« Last Edit: May 27, 2022, 03:59:18 pm by Siwastaja »
 
The following users thanked this post: SiliconWizard

Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5890
  • Country: es
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #18 on: May 27, 2022, 05:34:39 pm »
You can always allow full speed and leave the performance/watt selection to the user, just like any stm32.
I suspected it was doing 32-bit transfers under the hood, but wasn't sure, thanks for clarifying.
I guess I could force 8-bit transfers by using unaligned addresses?
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8167
  • Country: fi
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #19 on: May 27, 2022, 05:53:52 pm »
If addresses are unaligned, memcpy would move the first/last few bytes as 8 or 16 bit moves, but move the aligned parts with 32 bit moves. A typical memcpy would thus have a few checks taking a few extra clock cycles to get started, but these pay back quickly. The compiler might even be able to do some optimizations at compile time today, I'm sure.
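
A rough sketch of how such a split could be planned, to show why an odd start address does not force byte moves for the whole buffer. This is one simple policy with invented names (`copy_plan_t`, `plan_copy`), not the code of any real libc: it uses word moves only when source and destination share the same misalignment and falls back to bytes otherwise. Real implementations differ in that last case.

```c
#include <stddef.h>
#include <stdint.h>

/* How a copy of n bytes is split into moves (hypothetical type). */
typedef struct {
    size_t lead_bytes;   /* byte moves until dst reaches a 4-byte boundary */
    size_t words;        /* 32-bit moves in the aligned middle part */
    size_t trail_bytes;  /* leftover byte moves at the end */
} copy_plan_t;

static copy_plan_t plan_copy(uintptr_t dst, uintptr_t src, size_t n)
{
    copy_plan_t p = {0, 0, 0};

    /* Different misalignment: src and dst are never word-aligned at the
     * same time, so this simple policy copies everything as bytes. */
    if (((dst ^ src) & 3u) != 0) {
        p.lead_bytes = n;
        return p;
    }

    size_t lead = (size_t)((4u - (dst & 3u)) & 3u);
    if (lead > n) {
        lead = n;
    }
    p.lead_bytes = lead;
    p.words = (n - lead) / 4u;
    p.trail_bytes = (n - lead) % 4u;
    return p;
}
```

For a 32768-byte copy with both addresses one byte past a word boundary, this plan is 3 byte moves, 8191 word moves and 1 byte move, so the bulk still goes at full width.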

99% of the time, "stock" memcpy is nearly the fastest and definitely least-effort way to copy arbitrary data with any size and alignment.

If you want to test 8-bit moves, just write your own for(int i=0; i<1234; i++) u8_thing[ i ] = another_u8_thing[ i ];
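
One caveat with a plain loop like that: at higher optimization levels the compiler may recognize it and widen it or replace it with a memcpy call. A minimal sketch that avoids this by using volatile-qualified pointers (the function name `copy_bytes` is just for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes one at a time. Because both pointers are volatile-qualified,
 * the compiler must emit one byte load and one byte store per element and
 * cannot merge them into wider moves or a library call. */
static void copy_bytes(volatile uint8_t *dst, const volatile uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

Timing this against memcpy on the same buffers should show the real cost of 8-bit moves; the disassembly will confirm what was actually emitted.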

And even if you don't enjoy writing assembly, it's a good idea to start reading the compiler listing / disassembly.
« Last Edit: May 27, 2022, 05:56:44 pm by Siwastaja »
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8167
  • Country: fi
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #20 on: May 27, 2022, 06:00:03 pm »
You can always allow full speed and leave the performance/watt selection to the user, just like any stm32.

But critical path delay is fixed in the design. Not "any STM32" supports bus speeds over 80MHz, either, although the high-end models support a 240MHz AHB (half of the 480MHz CPU clock!), which is where the GPIOs are.
 

Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5890
  • Country: es
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #21 on: May 27, 2022, 06:50:07 pm »
You're right, APB1 clock is usually half the core.
That's a good implementation of memcpy then. That's also what I thought when forcing unaligned access, but again, most of the time these things seem to be plain stupid, having to control everything yourself, so one never knows!
Anyways, I'm hating developing on this thing, probably going to the "misc" box soon. For some reason, I detest any arduino-thing!
« Last Edit: May 27, 2022, 06:52:07 pm by DavidAlfa »
 

Offline rteodor

  • Regular Contributor
  • *
  • Posts: 122
  • Country: ro
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #22 on: May 27, 2022, 08:37:23 pm »
The IOMUX GPIO matrix might be the bottleneck. Ethernet Rx/Tx pins (25 or 50MHz) specifically go around the IOMUX GPIO matrix because it is too slow. I remember seeing this explanation in some of Espressif documents.

Later edit: it is GPIO matrix not IOMUX.
The wording they used in "ESP32 Technical Reference manual", Chapter 5.1 was: "Some high-speed digital functions (Ethernet, SDIO, SPI, JTAG, UART) can bypass the GPIO Matrix for better high-frequency digital performance. In this case, the IO_MUX is used to connect these pads directly to the peripheral."
But I think there are other mentions.
« Last Edit: May 27, 2022, 09:04:19 pm by rteodor »
 
The following users thanked this post: tooki

Offline DavidAlfaTopic starter

  • Super Contributor
  • ***
  • Posts: 5890
  • Country: es
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #23 on: May 27, 2022, 09:39:10 pm »
IOMUX is a direct connection to the peripheral, so there's no need to access the slow bus, thus avoiding the delay.
But to manually toggle a pin, you must use the matrix.
« Last Edit: May 27, 2022, 09:40:58 pm by DavidAlfa »
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1717
  • Country: se
Re: ESP32-S3 (Dual-core 240MHz blah blah) crappy performance?
« Reply #24 on: May 27, 2022, 10:58:17 pm »
Compiler might be even able to do some optimizations on compile-time today, I'm sure.

With -Ofast, gcc will inline fixed length memcpy with size up to 64 bytes, clang up to 16 bytes (using target cortex-m7).

As for memcpy itself, the full glibc (generic target) will split the copy in  the three phases (unaligned, aligned, leftovers) over a threshold of 16 and use byte copy otherwise.

But for MCUs, newlib-nano 1, 2 and derivatives such as picolibc  will copy word by word (long, in fact) with some loop unrolling if the source and destination addresses are both word aligned, but if not, it will revert to byte by byte copy.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

