Electronics > FPGA

Very small linux capable core

asmi:

--- Quote from: brucehoult on August 05, 2021, 03:25:12 am ---SRAM/cache latency of 2 is very common in RISC-V pipelines, for example in all the Rocket-based cores, such as SiFive's original 3- and 5-series. In fact it is 3 cycles for sub-word loads. But they don't stall unless the value is actually used by the immediately following instruction.

--- End quote ---
In my case there are up to 2 stall cycles for memory access, because the request is sent to the data memory after the execute stage, so the value only becomes available via forwarding 2 cycles later - and possibly even later if the read request goes out on the AXI4 bus to a peripheral. Right now the CPU doesn't stall for AXI4 writes, because the AXI4 specification guarantees transaction ordering, so I assume writes will complete before any subsequent reads. This will probably need to be revised later (definitely so if I ever get to implementing paged memory), but for now it's good enough.
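
As a minimal sketch of that load-to-use behaviour (Python, purely illustrative - the function name and numbers are not taken from the actual core):

--- Code: ---
# Illustrative model only: stall cycles seen by the first consumer of a loaded
# value, where "latency" is the maximum number of stall cycles a load can cause
# (2 for the core described above).
def load_use_stalls(latency: int, independent_insns_between: int) -> int:
    # Each independent instruction placed between the load and its first use
    # hides one cycle of the load's latency.
    return max(0, latency - independent_insns_between)

print(load_use_stalls(2, 0))  # value used by the very next instruction -> 2 stalls
print(load_use_stalls(2, 1))  # one unrelated instruction in between    -> 1 stall
print(load_use_stalls(2, 2))  # two unrelated instructions in between   -> 0 stalls
--- End code ---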

asmi:
Hmm... I rewrote the blinker code in assembly and optimized the branches (so that the branch is not taken most of the time, followed by an unconditional jump), and CPI jumped up to ~2.6. Fetch latency is still the limiting factor, so it looks like fixing it will need to be a priority.

SiliconWizard:

--- Quote from: brucehoult on August 05, 2021, 03:25:12 am ---SRAM/cache latency of 2 is very common in RISC-V pipelines, for example in all the Rocket-based cores, such as SiFive's original 3- and 5-series. In fact it is 3 cycles for sub-word loads. But they don't stall unless the value is actually used by the immediately following instruction.

--- End quote ---

Are you sure about the last sentence? Even with a latency of 1, a stall is required in this case. So with a latency of 2, I think a stall is also required if the instruction after the immediately following instruction uses the value. The higher the latency, the more load-use hazard cases you get. Did I miss anything?

brucehoult:
Perhaps a terminology difference?

I call the fastest instructions, such as add, move, etc., latency 1. Perhaps you are calling them latency 0?

asmi:

--- Quote from: brucehoult on August 05, 2021, 11:18:50 pm ---Perhaps a terminology difference?
I call the fastest instructions, such as add, move, etc., latency 1. Perhaps you are calling them latency 0?

--- End quote ---
I think in this case the data memory latency is a measure of how many stall cycles can potentially occur due to a load-to-use data dependency. No stall cycles means the data memory latency is zero (the result of the previous operation is available to the following ones as soon as the execute stage of the previous instruction has completed). One stall means the following instruction may have to wait at most 1 cycle - this is the "classic" implementation of a 5-stage pipeline, where the memory operation begins as soon as execute is completed (the memory address is registered in the EX/MEM pipeline registers) and completes one cycle later, so that the result can be forwarded to subsequent instructions. In my case I have a data memory latency of 2 cycles, because the memory operation only completes two cycles after the request was made, so the earliest a value can become available to subsequent instructions is 2 cycles after the load instruction executes.

This latency can be masked by scheduling the load two cycles before its value is used, or by a CPU that can execute instructions out of order (perhaps speculatively, if a branch is involved); it looks like GCC does this kind of reordering to a certain degree with -O3 enabled. And of course, as soon as you add support for memory-mapped resources external to the core (be it external RAM or peripheral registers), that constant-latency assumption goes out the window, and you WILL get stalls due to cache misses, slow peripherals, or congestion at the memory controller. The only ways to somewhat hide those are multi-issue and out-of-order execution with register renaming, along with a non-blocking cache so that other instructions can keep executing while a cache miss is being handled.
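
To make the reordering point concrete, here is a small self-contained Python sketch (the four-instruction sequence and the timing model are made up for illustration; this is not the actual pipeline or GCC's scheduler):

--- Code: ---
# Illustrative only: count load-to-use stall cycles for a short in-order trace,
# using the 2-cycle data memory latency described above. Instructions are
# encoded as (text, destination, sources, is_load); the sequence is made up.
LOAD_LATENCY = 2  # extra cycles before a loaded value can be forwarded

def count_stalls(trace):
    ready = {}   # register -> earliest cycle in which its value can feed execute
    cycle = 0    # cycle in which the current instruction would execute
    stalls = 0
    for _, dest, sources, is_load in trace:
        earliest = max([cycle] + [ready.get(r, 0) for r in sources])
        stalls += earliest - cycle          # wait until all source values are ready
        cycle = earliest
        # An ALU result can be forwarded to the very next instruction; a loaded
        # value only becomes forwardable LOAD_LATENCY cycles later.
        ready[dest] = cycle + 1 + (LOAD_LATENCY if is_load else 0)
        cycle += 1                          # next instruction issues a cycle later
    return stalls

naive = [
    ("lw   a0, 0(s0)",  "a0", ["s0"], True),
    ("addi a0, a0, 1",  "a0", ["a0"], False),  # uses a0 right away
    ("addi s1, s1, 4",  "s1", ["s1"], False),
    ("addi s2, s2, 4",  "s2", ["s2"], False),
]
scheduled = [naive[0], naive[2], naive[3], naive[1]]  # independent work fills the load shadow

print(count_stalls(naive))      # 2 stall cycles
print(count_stalls(scheduled))  # 0 stall cycles
--- End code ---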
