Author Topic: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6  (Read 7508 times)

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3669
  • Country: gb
  • Doing electronics since the 1960s...
Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« on: May 06, 2021, 06:02:46 pm »
Here is one
https://www.st.com/en/microcontrollers-microprocessors/stm32f407vg.html

Here is the other
https://www.gigadevice.com/microcontroller/gd32f407vgt6/

Obviously the 2nd is meant to be a replacement for the first. My quick and dirty observations are:

1) Pinout is the same, except the two decoupling capacitor pins which the ST uses for its internal VCC are N/C on the GD.

2) The ST has 1MB flash while the GD has 512k+512k (code+data) which is rather strange - why? The data sheet contains no explanation.


3) At 168MHz, the ST needs 5 wait states on code running from flash (although they claim their "accelerator" prefetch thingy makes this effectively zero) while the GD says "zero wait states".

4) Their PCLK1/PCLK2 allocation to different peripherals is different (I wonder why)

5) GD has a slower SPI1 (30MHz versus 42MHz) - irrelevant for most apps

6) ST has absolute max 4V; GD is 3.6V, and some related diffs e.g. max on a 5VT pin

7) GD has 4x faster DAC (4msps)

Clearly the GD is not binary compatible especially on the peripherals.

The GD has a 69 page data sheet versus 201 pages for the ST :)

The ST data sheet is initial 2011; the GD data sheet is dated 2016.



« Last Edit: May 06, 2021, 06:06:50 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #1 on: May 06, 2021, 06:31:03 pm »
Code area is shadowed in the SRAM. So execution from that section is much faster (there are no wait states). They did not want to shadow the whole flash, since SRAM is expensive. And 512 KB is enough for a lot of applications to contain both the code and the data.

There are also minor differences in PLL configuration. So yes, it is not binary compatible with ST.

Note that on GD, the flash is actually a second die in the same package. The code for the first 512 KB is loaded from that flash into the shadow SRAM on reset. But if you have to hit the flash for the data or code, it would be pretty slow.
« Last Edit: May 06, 2021, 06:34:54 pm by ataradov »
Alex
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3669
  • Country: gb
  • Doing electronics since the 1960s...
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #2 on: May 06, 2021, 08:03:26 pm »
Interesting. I would expect RAM based code to be less robust in embedded systems, though of course all "PC" hardware runs like that.

I looked for pricing. They list Arrow, which is a horrible company to deal with. Digikey is listed but doesn't find the P/N. https://lcsc.com/ lists it at a price similar to the ST, which surprises me; I would think it ought to be quite a bit cheaper. But maybe it is in volume.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #3 on: May 06, 2021, 08:26:09 pm »
Why would it be less robust exactly? If anything flash is much harder to make stable over temperature variations.

Current prices on semiconductors are not a reflection of any reality. Distributors just charge whatever they want and customers are buying whatever they can get.
Alex
 
The following users thanked this post: harerod

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3669
  • Country: gb
  • Doing electronics since the 1960s...
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #4 on: May 06, 2021, 09:16:43 pm »
I would think RAM is more easily corrupted by EM. Is the code copy done on-chip, or in startup code (along with the usual zeroing of the RAM etc)?

Re prices, I've been in this business 40+ years and it has always gone through feast/famine cycles. I remember 74LS245 going from 25p to £2.50 and back to 25p in one year :) What happens is that prices (of commodity parts; smart designers use commodity parts for everything possible) naturally fall, and eventually they fall too far, then the distis start spreading the dreaded "a" word (allocation), then big company buyers start getting worried and start placing long orders which exhaust the supply pipeline, so prices rise, then everybody panics and buys everything they can, and 6 months later there is a bloodbath, and things return to normal :)

Would you say Gigadevice is a reputable company? I have been dealing with Chinese for 20+ years and they have gone from mostly sort of respectably behaving, to make a fast buck and shamelessly screw everybody you can as much as you can and to hell with the consequences. Now I just buy simple stuff like cables, moulded parts, etc, from them, a year or two's stock at a time and usually the company is gone before the next order, and finished item production is all back in the UK.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ttt

  • Regular Contributor
  • *
  • Posts: 87
  • Country: us
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #5 on: May 06, 2021, 09:58:14 pm »
Would you say Gigadevice is a reputable company? I have been dealing with Chinese for 20+ years and they have gone from mostly sort of respectably behaving, to make a fast buck and shamelessly screw everybody you can as much as you can and to hell with the consequences. Now I just buy simple stuff like cables, moulded parts, etc, from them, a year or two's stock at a time and usually the company is gone before the next order, and finished item production is all back in the UK.

I've done a project with a GD32F107 18 months back and had no issue backordering 300 qfp parts on a tray through lcsc.com. They came in a week later. Now, without lcsc I am not sure it would have been that easy and that's a concern. On the other hand the hardware/register set up is similar enough to where switching back and forth between a ST32 and GD32 part would not kill me software engineering wise. I see that as a plus as you have effectively a second source. How long ST will be around before they get swallowed up is anyone's guess. And there is always the risk that ST will finally do something about the obvious design rip-off.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #6 on: May 06, 2021, 10:15:25 pm »
I would think RAM is more easily corrupted by EM. Is the code copy done on-chip, or in startup code (along with the usual zeroing of the RAM etc)?
The copy is in the hardware. For the user it is totally transparent. You don't even have a write access to that SRAM, it just appears as normal flash, just very fast.

If your SRAM is corrupted, then your main SRAM would be corrupted too. SRAM is one of the most stable things inside the MCU. If there is EMI strong enough to corrupt it, you have bigger problems.

Would you say Gigadevice is a reputable company?
Yes, absolutely. They have been in the business for a long time. They were mostly focused on flash devices, and MCUs are a relatively new thing for them, but they are doing it right.
« Last Edit: May 06, 2021, 10:17:55 pm by ataradov »
Alex
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3669
  • Country: gb
  • Doing electronics since the 1960s...
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #7 on: May 08, 2021, 08:47:31 am »
I wonder what the aim of this chip is?

I would think that most designers will go for the "established player" i.e. ST as a default, so GD will have to sell theirs for a lot less e.g. half the price.

One factor working against that is a cultural one: Chinese designers may prefer a locally made chip.

At 1M+ volumes the prices paid are unknown and nothing like what one sees on mouser.com (£5.51 1k+ so probably £4 from a normal distributor). I reckon down to £2 at 1M. But at high volumes the chip cost (of a uC) is hardly a factor in such applications.

At 1k qty they seem similarly priced so why use GD? It has no tech advantage that I can see.

To succeed in business, against an established incumbent, you need to offer something special.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #8 on: May 08, 2021, 04:58:33 pm »
I wonder what the aim of this chip is?
Getting into the market faster and making it easier to switch for customers that want to try them. Their new chips are diverging from ST more and more.

At 1k qty they seem similarly priced so why use GD? It has no tech advantage that I can see.
They were quite a bit cheaper before the semiconductor shortage. And they may still be cheaper on lcsc.com. Western distributors get away with very high prices.

Also, their performance is way higher because of that SRAM-based flash emulation. But SRAM is expensive, so in the newer chips they are scaling that back too. So that's a competitive edge.
« Last Edit: May 08, 2021, 05:00:24 pm by ataradov »
Alex
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3669
  • Country: gb
  • Doing electronics since the 1960s...
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #9 on: May 09, 2021, 10:38:57 am »
" their performance is way higher because of that SRAM-based flash emulation. "

ST claim their prefetch and cache system delivers perf equivalent to a zero wait state flash.

"Getting into the market faster and making it easier to switch for customers that want to try them."

Faster than with ST, whose chips are available? I don't understand.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #10 on: May 09, 2021, 04:59:11 pm »
ST claim their prefetch and cache system delivers perf equivalent to a zero wait state flash.
They can claim anything they want. Measure this on a real application and you will quickly see that it is not the case. Also, performance of the code is one thing, but if your code fetches data from the same flash, then that would be slow too.

Faster than with ST, whose chips are available? I don't understand.
Well, assuming your goal is to get into the market, this strategy lets you do it faster.

Following your logic there would never be two companies doing the same thing.
Alex
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3507
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #11 on: May 10, 2021, 03:20:17 am »
ST claim their prefetch and cache system delivers perf equivalent to a zero wait state flash.
If you read into how ST implemented their prefetch system, you can see it is really just a pitifully small cache with a stupid replacement policy that will result in multiple cache miss penalties for every function call, every loop iteration and hell forbid every constant operand fetch on Cortex-M0/M23 parts. When a cache fails basic memory-intensive C library functions like memcpy, memset and strlen, it is a failed cache system. Cortex-M3 and above just have flat slow constant fetches.

GD's shadow RAM solution is basically a huge cache that never needs to be replaced ever, giving you true and consistent zero wait states throughout the run.

Faster than with ST, whose chips are available? I don't understand.
As of now nobody's chip is available, at least within China. GD has redirected virtually all their chips to its domestic market in China, while ST is dealing with uncertainties caused by geopolitical tensions when entering its foreign market in China. Before all this chip shortage, their products were widely available worldwide, especially through LCSC.

To succeed in business, against an established incumbent, you need to offer something special.
With their pin-compatible portfolio, their products have at least a slightly faster, revised main CPU core, for example GD32F103 uses a 96MHz Cortex-M3 r2p1 core, GD32E103 uses a 120MHz Cortex-M4F core, GD32VF103 uses a 108MHz RISC-V core, while STM32F103 uses a 72MHz Cortex-M3 r1p1 core. Also, as above, their shadow RAM architecture. Then they have their non-pin-compatible portfolio.

The copy is in the hardware. For the user it is totally transparent. You don't even have a write access to that SRAM, it just appears as normal flash, just very fast.
IMO they really should implement a bit in their shadow RAM interface to allow this shadow RAM to be written while disabling in-application programming of the Flash until the next reset. This means for applications that do not need IAP, they can just put their writable data sections and heap allocations in the Flash address space, instead of wasting time on doing another memcpy from the shadow RAM into the main RAM. Better if that shadow RAM write control is available on a per Flash sector basis.

Not that on GD, the flash is actually a second die in the same package. The code for the first 512 KB is loaded from that flash into the shadow SRAM on reset. But if you have to hit the flash for the data or code, it would be pretty slow.
That second die is really just a QSPI Flash and their Flash interface is just a QSPI controller with hard-coded parameters. GD's newer chips now do use integrated on-die Flash, yet they are still including their famous shadow RAM since it brings performance benefits.

There is this brand called Artery that also makes STM32 pin-compatible microcontrollers using this exact architecture, they even make one of their built-in hard-coded QSPI controllers accessible over the pins.
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3669
  • Country: gb
  • Doing electronics since the 1960s...
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #12 on: May 10, 2021, 07:14:24 am »
Isn't this the old debate about benchmarking...

In most cases, most of the time is spent in loops, and most loops are fairly tight. One could write code which is specifically small enough to fit into the cache. And one could put constants in RAM: declaring variables but (AIUI) not assigning them a value initially places them into RAM, e.g.

uint32_t fred = 0;

places it into flash

uint32_t fred;
and later
fred=0;

places it into RAM.

But one would not bother with this except in some tight loop where fred is referenced frequently.

According to this  https://www.st.com/content/ccc/resource/training/technical/product_training/group0/7d/83/8c/1f/3a/1c/43/1e/STM32H7-System-Adaptive_Real-Time_Accelerator_ART/files/STM32H7-System-Adaptive_Real-Time_Accelerator_ART.pdf/_jcr_content/translations/en.STM32H7-System-Adaptive_Real-Time_Accelerator_ART.pdf

the ART is 64 lines of 256 bits each, which is not bad. It is basically 512 32-bit words. Another description here

https://eda360insider.wordpress.com/2011/09/22/ingenious-architectural-features-allow-st-micro-to-extract-maximum-performance-from-new-microcontroller-family-based-on-arm-cortex-m4-cost-less-than-6-bucks-in-1000s/

« Last Edit: May 10, 2021, 07:24:41 am by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #13 on: May 10, 2021, 07:36:54 am »
I did benchmark the GD, it is faster than ST. If you personally think that ST is better - go for ST.

ART is a low performance cache with a cool marketing name, nothing more.

There are a ton of constants that are placed by the compiler and you have no control over on the C level. Those things will go into the flash unless you just move the whole program into SRAM. Which is what GD did already for you on a hardware level.
Alex
 
The following users thanked this post: Jacon

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3669
  • Country: gb
  • Doing electronics since the 1960s...
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #14 on: May 10, 2021, 08:21:28 am »
Does the 5 wait flash mean that each 1 cycle instruction (which is most of them) actually takes 6 cycles?

That would make the 168MHz CPU effectively run at ~30MHz.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3507
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #15 on: May 10, 2021, 09:45:44 am »
Does the 5 wait flash mean that each 1 cycle instruction (which is most of them) actually takes 6 cycles?

That would make the 168MHz CPU effectively run at ~30MHz.
ART allows long sequences of instructions to reach 168MHz, but only if there are no branches and no constant fetches. However, in a tight loop like memcpy(3), memset(3) and strlen(3) it would be 5 additional cycles per iteration, and in the case of those two-instruction loops, 7 cycles for 2 instructions for the tightest part of memset(3).
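
The arithmetic in the last few posts can be sketched in a few lines of C. This is only a back-of-envelope model built from the numbers quoted above (168MHz core, 5 wait states, a two-instruction loop taking 7 cycles); the function name effective_mhz is made up for illustration and nothing here is measured on hardware.

```c
#include <assert.h>

/* Equivalent instruction rate in MHz when `instructions` retire in `cycles`.
   Purely arithmetic: it models no cache, prefetch or pipeline behaviour. */
double effective_mhz(double core_mhz, unsigned instructions, unsigned cycles)
{
    assert(cycles > 0); /* a zero-cycle run makes no sense */
    return core_mhz * (double)instructions / (double)cycles;
}
```

With these figures, effective_mhz(168.0, 1, 6) gives 28 (every instruction paying all 5 wait states, the "~30MHz" case) and effective_mhz(168.0, 2, 7) gives 48 (the two-instruction loop paying the 5-cycle penalty once per iteration).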
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3669
  • Country: gb
  • Doing electronics since the 1960s...
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #16 on: May 10, 2021, 11:05:15 am »
That suggests that any branch flushes the cache completely.

I don't see that it does. Say you have

fred1: some code
 goto fred1

that can run out of one or maybe two cache lines, no?
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Online newbrain

  • Super Contributor
  • ***
  • Posts: 1714
  • Country: se
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #17 on: May 10, 2021, 11:38:42 am »
And one could put constants in RAM: declaring variables but (AIUI) not assigning them a value initially places them into RAM, e.g.

uint32_t fred = 0;

places it into flash

uint32_t fred;
and later
fred=0;

places it into RAM.
Depends what you mean by 'it'.

If fred is defined at file scope, it has by default external linkage and static duration.
If no storage-class specifier is used, and no other definition is found (simplifying, bear with me...), the tentative definition becomes a definition: using = 0 or not makes no difference, as static duration objects are initialized as if they were assigned 0.
So it will end up being initialized to 0 by the runtime startup code - no flash is directly used.

If fred is declared in a block scope and with automatic duration (i.e. no external or static), it does not make much difference for the compiler if an initialization value (also 0, in this case) is provided at declaration or later on.

The variable fred, if declared const, might end up in flash, but that depends on arch, compiler (it does for arm & gcc) and sometimes other magic incantations (e.g. for avr). Note that C++ has different rules about const.

Not using const is enough to make sure that the variable will end up in RAM, and will be read from there (if needed) - independently from the way it was initialized, which I think is the goal you are aiming at.
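
A minimal sketch of the cases above, for illustration only. The section names in the comments (.bss, .data, .rodata) are the usual ones for arm + gcc with a typical linker script and are an assumption here; the real placement depends on toolchain, flags and linker script. The extra names (fred_plain, fred_zero, fred_init, fred_const, sum_freds) are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* File scope, no initializer: zeroed by the startup code, typically .bss (RAM). */
uint32_t fred_plain;

/* File scope with "= 0": same result, still a zero-initialized RAM variable. */
uint32_t fred_zero = 0;

/* Non-zero initializer: the variable lives in RAM (typically .data); only the
   initial value is kept in flash and copied once by the startup code. */
uint32_t fred_init = 1234;

/* const at file scope: a candidate for .rodata, i.e. flash on arm + gcc. */
const uint32_t fred_const = 5678;

uint32_t sum_freds(void)
{
    /* Automatic duration: lives in a register or on the stack, never in flash. */
    uint32_t fred_local = 0;

    /* Both start out as 0 whether or not "= 0" was written. */
    assert(fred_plain == fred_zero);

    fred_local += fred_plain + fred_zero + fred_init + fred_const;
    return fred_local;
}
```

Looking at the map file (or the output of nm on the object) for such a file is the quickest way to see where a given toolchain really puts each symbol.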

Edit: Lunch break leftover removed.
« Last Edit: May 10, 2021, 12:13:48 pm by newbrain »
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3669
  • Country: gb
  • Doing electronics since the 1960s...
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #18 on: May 10, 2021, 02:28:57 pm »
OK; I do understand all that (to my surprise, as a novice to C ;) ).

Fred has to end up in RAM in any case (if not const or static) otherwise you could not assign it a value.

So that is an easy workaround if one is concerned about data being read out of flash with loads of wait states. And it will be true for anything inside a function because all that ends up on the stack.

Which leaves us with the debate re whether a branch flushes the cache. I can see it renders that cache line useless (the line containing the code which had the branch) and then where you branch will take the hit on wait states (to fill in the new cache line) but after you have been around that loop once, you now have all the code contained within two cache lines (possibly more but let's take a case of a tight loop like memcpy) and there should not be more wait states.

What am I missing?

I remember the Z280, 1987, which (apart from having been hand designed at transistor level, apparently, by a subcontractor to Zilog ;) ) had a 256 byte cache, and prefetch, and that would run loops entirely out of the cache. I could write

label: ld c, ioaddress
         ld a, 1
         out (c), a
         xor a
         out (c), a
         jp label

and put a scope on the EPROM /CS and would see nowt, while seeing pulses coming out of bit 0 of that I/O port.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3507
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #19 on: May 10, 2021, 02:42:08 pm »
OK; I do understand all that (to my surprise, as a novice to C ;) ).
In order to understand the benefit of GD's shadow RAM over ST's ART, you will need deeper levels of understanding of computer organization and assembly language.

Fred has to end up in RAM in any case (if not const or static) otherwise you could not assign it a value.

So that is an easy workaround if one is concerned about data being read out of flash with loads of wait states. And it will be true for anything inside a function because all that ends up on the stack.
STM32F4 has a block of SRAM that is not accessible by DMA and not executable called CCM, which is intended just for such frequently accessed data. You initialize CCM with a memcpy(3) at boot time. (And I like to also put the stack there so there is no way an accidental DMA buffer overrun can corrupt my stack.) STM32F3 and STM32G0 made that memory block executable so I put the vector table there too for those chips.
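
A host-runnable sketch of that boot-time copy, under stated assumptions: on a real part the three pointers would come from linker-script symbols for the section placed in CCM (the symbol and section names vary between linker scripts, so none are shown), and the data would be tagged with something like a GCC section attribute. Here they are plain arguments, and copy_load_image is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy an initialized-data image from its load address (flash) to its run
   address (e.g. CCM). Returns the number of bytes copied.
   Assumption: [load_start, load_end) is the image as laid out by the linker
   and run_start points at a RAM region at least that large. */
size_t copy_load_image(uint8_t *run_start,
                       const uint8_t *load_start,
                       const uint8_t *load_end)
{
    assert(load_end >= load_start);
    size_t len = (size_t)(load_end - load_start);
    memcpy(run_start, load_start, len);
    return len;
}
```

The call would sit in the startup code next to the usual .data copy and .bss zeroing, before anything reads the CCM variables.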

Which leaves us with the debate re whether a branch flushes the cache. I can see it renders that cache line useless (the line containing the code which had the branch) and then where you branch will take the hit on wait states (to fill in the new cache line) but after you have been around that loop once, you now have all the code contained within two cache lines (possibly more but let's take a case of a tight loop like memcpy) and there should not be more wait states.
It differs between the ART and the regular cache the STM32F4 also has. ART on STM32F1 is a FIFO, so once an instruction is fetched it is gone from the cache. For STM32F4 it seems to me that ART would be a single cache line, so once you branch away the single cache line is gone, and jumping back means refetching everything with all the wait states. STM32F4 also has a separate multi-line cache for its Flash interface, but its capacity is pitiful as well (1kB I$ and 128 bytes D$ in 16-byte cache lines.)
« Last Edit: May 10, 2021, 02:57:10 pm by technix »
 

Online newbrain

  • Super Contributor
  • ***
  • Posts: 1714
  • Country: se
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #20 on: May 10, 2021, 03:17:14 pm »
OK; I do understand all that (to my surprise, as a novice to C ;) ).

Fred has to end up in RAM in any case (if not const or static) otherwise you could not assign it a value.
Glad I was clear enough. Just one thing:
static, as extern, can affect both linkage (visibility of symbol) and duration (lifetime) for variables depending on where the definition or declaration is.
It will not change whether the variable ends up in RAM or flash.

As for ART, I think it works OK for loops that can fit in cache, as it uses a simple LRU policy to discard cache lines:
Quote
Once all the instruction cache memory lines have been filled, the LRU
(least recently used) policy is used to determine the line to replace in the instruction memory
cache. This feature is particularly useful in case of code containing loops.
The cache line containing the jump will not in general be immediately reused, as it is still very 'fresh'.
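
To make the two readings in this thread concrete, here is a toy model, not ST's actual design: a fully associative instruction cache with 16-byte lines and LRU replacement (full associativity is my assumption). With 64 lines it corresponds to the quoted LRU description; with 1 line it corresponds to the single-line reading. All names (icache_t, icache_fetch, loop_misses) are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 16u  /* 128-bit lines, per the figures in this thread */
#define MAX_LINES  64u

typedef struct {
    uint32_t tag[MAX_LINES];
    uint32_t last_used[MAX_LINES]; /* 0 means the line is still empty */
    uint32_t clock;
    unsigned lines;                /* lines actually modelled, 1..MAX_LINES */
} icache_t;

void icache_init(icache_t *c, unsigned lines)
{
    assert(lines >= 1 && lines <= MAX_LINES);
    c->clock = 0;
    c->lines = lines;
    for (unsigned i = 0; i < MAX_LINES; i++) {
        c->tag[i] = 0;
        c->last_used[i] = 0;
    }
}

/* Returns 1 on a miss (line filled from flash), 0 on a hit. */
int icache_fetch(icache_t *c, uint32_t addr)
{
    uint32_t tag = addr / LINE_BYTES;
    unsigned victim = 0;

    c->clock++;
    for (unsigned i = 0; i < c->lines; i++) {
        if (c->last_used[i] != 0 && c->tag[i] == tag) {
            c->last_used[i] = c->clock; /* hit: mark as most recently used */
            return 0;
        }
        if (c->last_used[i] < c->last_used[victim])
            victim = i;                 /* empty or least recently used so far */
    }
    c->tag[victim] = tag;               /* miss: replace the LRU (or empty) line */
    c->last_used[victim] = c->clock;
    return 1;
}

/* Run `iterations` passes over a loop body occupying [start, end) and
   count the misses, fetching one 16-bit Thumb instruction at a time. */
unsigned loop_misses(unsigned lines, uint32_t start, uint32_t end,
                     unsigned iterations)
{
    icache_t c;
    unsigned misses = 0;

    icache_init(&c, lines);
    for (unsigned n = 0; n < iterations; n++)
        for (uint32_t pc = start; pc < end; pc += 2)
            misses += (unsigned)icache_fetch(&c, pc);
    return misses;
}
```

For an 8-byte loop straddling a line boundary, loop_misses(64, 0x0800000C, 0x08000014, 1000) yields 2 misses in total, while loop_misses(1, 0x0800000C, 0x08000014, 1000) yields 2000, i.e. two refills on every iteration. Which of the two the real part resembles is exactly what is being argued here; the model only shows why the answer matters.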

I see technix considers the ART as only the prefetch part - in my reading ART includes both the prefetch FIFO and the (pitifully small, as they say) D and I caches, e.g. from STM32F405/7 DS:
Quote
[...]the accelerator implements an instruction prefetch queue and branch cache, which increases program execution speed from the 128-bit Flash memory.
Or the picture here.

That said, I don't disagree at all with the other considerations.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #21 on: May 10, 2021, 03:55:11 pm »
Does the 5 wait flash mean that each 1 cycle instruction (which is most of them) actually takes 6 cycles?
Without accelerators and other architectural improvements - yes. That's why on anything faster than 48 MHz you see a real cache, or at least accelerator of some sort.

The core itself fetches 32 bits at a time on the instruction bus. Most Thumb-2 instructions are 16 bits, so that alone prefetches the next instruction. Most of the time maximum flash speed is about 25-35 MHz. That's why you can have full performance most of the time on Cortex-M0+ running at 48 MHz with 1 wait state. Flash fetch would take 2 cycles, but you are also fetching two instructions at the same time on average.

On fast MCUs the flash is also typically organized as a 64-bit wide memory internally, and the result of fetching is stored in the flash controller, so fully sequential code would be optimized quite a bit. You would be fetching 4 instructions at a time on average, and the new read could be pending while those 4 instructions are executed.
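
As a rough sketch of that reasoning: if every flash read takes 1 + wait-state cycles and delivers fetch_bits / 16 Thumb instructions, the best-case sequential rate is easy to estimate. The model assumes all instructions are 16 bits, reads issued back to back, and a core that retires at most one instruction per cycle; the function name sequential_ipc is hypothetical.

```c
#include <assert.h>

/* Peak sequential instructions per core cycle, capped at 1.
   fetch_bits: width of one flash read; wait_states: extra cycles per read. */
double sequential_ipc(unsigned fetch_bits, unsigned wait_states)
{
    assert(fetch_bits >= 16);
    double per_fetch = fetch_bits / 16.0;         /* 16-bit instructions per read */
    double ipc = per_fetch / (1.0 + wait_states); /* one read takes 1 + WS cycles */
    return ipc > 1.0 ? 1.0 : ipc;
}
```

sequential_ipc(32, 1) gives 1.0, the Cortex-M0+ at 48 MHz case above. sequential_ipc(64, 5) gives about 0.67, and sequential_ipc(128, 5) is back at the 1.0 cap, which is why wide flash lines help so much for straight-line code and not at all once a branch lands on a line that is not already fetched.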

And there are other ways to optimize things further. For example Atmel/Microchip SAM V7x have code loop optimization, which also remembers the flash line (64-bits) which was used last as a jump destination. So if you are in a long loop, the first fetch of a branch would be already ready. This optimization alone provides quite a performance boost. This is like a branch cache with one entry.
« Last Edit: May 10, 2021, 03:56:42 pm by ataradov »
Alex
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3507
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #22 on: May 10, 2021, 04:22:32 pm »
Without accelerators and other architectural improvements - yes. That's why on anything faster than 48 MHz you see a real cache, or at least accelerator of some sort.
From what I read, the traditional STM32F1-ish ART means only one 128-bit cache line on STM32F4. STM32F4 has a separate I$ and D$ system on top of the STM32F1-ish ART, which has (albeit just a few) cache lines.
 

Offline bson

  • Supporter
  • ****
  • Posts: 2265
  • Country: us
Re: Opinions on ST 32F407VGT6 versus Gigadevice GD32F407VGT6
« Reply #23 on: May 12, 2021, 08:26:57 pm »
ART is a low performance cache with a cool marketing name, nothing more.
It's just page mode access with the usual penalty on page changes.  Although unlike DRAM it's not a precharge time, but just plain flash access.
 

