Many "mid-range" chips use quite wide words for their internal flash, like 64 or 128 bits, meaning they fetch multiple instructions at once. Usually with these mid-range parts the flash runs at around 1/4 of the CPU frequency, so if you fetch 4 instructions at a time, there are no wait states... in theory, as long as there are no jumps in the code.
This may or may not be called some kind of "accelerator" or "prefetch" in the marketing material; it's really just a very trivial, tiny special case of a cache.
In practice with mid-range STM32s (F3, F4) with such a prefetch, I tend to see roughly a 30% penalty from running directly out of flash compared to running from the core-coupled instruction RAM region, which is usually not too bad.
Caches kind of suck in the microcontroller world, because you often want repeatability and predictability: worst-case performance, not average performance. Providing separate "scratchpad" RAM areas, i.e., separate RAM banks with their own interface directly on the side of the CPU core, does a much better job, so many MCUs provide those instead of, or in addition to, caches.
In any case, an instruction cache is not that bad, because you typically don't change the code on the fly, i.e., code is read-only, so there are no coherency issues and cached contents never go stale.
Data caching comes with a bunch of problems, and it's still rare in the microcontroller world. I would rather see a larger number of core-coupled memories, so you can put all the timing-critical data there and know it's always accessed in one cycle.