Author Topic: Cache and memory system simulators  (Read 1242 times)

Offline westfw (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Cache and memory system simulators
« on: October 24, 2018, 12:54:13 am »
You can get moderately good cycle-accurate simulators for various CPUs.  I'm wondering if there are any simulators that extend this to the memory system, so that you could fiddle with code and/or memory setup (different cache layouts, with and without write-through, different memory access modes and widths, etc.) to see how that would affect performance.  Something more along the lines of an educational tool: it need not implement a "real" or "current" CPU architecture, nor handle all of the real-world details of memory.  (Though it would be nice if it included enough of a compiler to write typical algorithms in an HLL.)
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Cache and memory system simulators
« Reply #1 on: October 25, 2018, 10:09:31 pm »
Your request is at the RTL level, so I am thinking of a ModelSim HDL simulation of a CPU + cache + RAM + peripherals (e.g. UART), which is accurate in equivalent time and lets you count cycles  :-//
 

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3719
  • Country: us
Re: Cache and memory system simulators
« Reply #2 on: October 26, 2018, 04:15:29 pm »
RTL level simulation of a CPU complex enough to have a multi-tier cache hierarchy running an OS and programs is probably going to require some serious hardware.

If you just want to look at the memory hierarchy, cachegrind is an interesting approach.  It runs your program under Valgrind's binary instrumentation and feeds every instruction fetch and data access into a simulated cache hierarchy, counting the misses.  It doesn't simulate the CPU side and it will change the behavior of your program, especially multi-threaded programs (Valgrind serializes threads), but it is still apparently effective for cache optimization.

Modern x86 CPUs come with an incredible number of performance counters accessible through MSRs.  So you can track a lot of detail about cache misses as well as branch prediction, TLB, and so forth.  But you are stuck with the architecture of your CPU.  You can't really ask how changing the size and associativity of the cache would help your application.
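A trace-driven model is one way to ask that size/associativity question offline. What follows is only a minimal sketch, assuming an LRU set-associative cache and a made-up address trace; `simulate` and its parameters are illustrative names, not part of cachegrind or any other tool mentioned in this thread.

```python
from collections import OrderedDict

def simulate(trace, size_bytes, ways, line_bytes=64):
    """Return the miss count of an LRU set-associative cache over a trace."""
    num_sets = size_bytes // (ways * line_bytes)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        s = sets[line % num_sets]
        if line in s:
            s.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)  # evict the least recently used line
            s[line] = True
    return misses

# Hypothetical workload: two 1 KiB arrays placed 4 KiB apart,
# read alternately (a[i], b[i], ...) for 10 passes.
trace = []
for _ in range(10):
    for i in range(0, 1024, 4):
        trace.append(0x10000 + i)
        trace.append(0x11000 + i)

for size, ways in [(4096, 1), (4096, 2), (8192, 1)]:
    print(size, ways, simulate(trace, size, ways))
```

With this particular trace the 4 KiB direct-mapped cache misses on every access, because both arrays map to the same sets, while either doubling the associativity or doubling the size leaves only the compulsory misses.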
 

Offline Rasz

  • Super Contributor
  • ***
  • Posts: 2616
  • Country: 00
    • My random blog.
Re: Cache and memory system simulators
« Reply #3 on: October 27, 2018, 07:13:28 am »
Are there any cycle-accurate emulators for CPUs with cache at all? Without thinking too hard about it, all I can come up with are 8/16-bit CPU emulators from the era when RAM was fast enough (<30MHz).
Who logs in to gdm? Not I, said the duck.
My fireplace is on fire, but in all the wrong places.
 

Offline westfw (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Cache and memory system simulators
« Reply #4 on: October 27, 2018, 08:33:53 am »
Keil includes an ARM Cortex-M (M0/3/4, anyway) simulator that will do cycle counting.  I don't think that it does any RAM magic, though - that's not part of the ARM core.  I'm not sure if it actually handles the pipeline.

The Big Folk can simulate whole chips, of course, at great expense and much slower than real time.

But I didn't really want that level of accuracy.  Just something that would be able to demonstrate some of the effects of cache/memory behavior changes.  (I mean, if you're simulating a CPU that runs significantly faster than actual memory, simulating the memory behavior alongside "synchronized with CPU clock" shouldn't be TOO hard, right?  Even if such synchronization isn't realistic.)

Shucks, it'd be perfectly acceptable for my purposes to use one of those existing 8-bit simulators, and just pretend that the main memory is 20x slower than the CPU, with various speedup mechanisms in between.
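The "pretend memory is 20x slower" idea can be sketched as a cycle-cost hook on every read. This is a toy under stated assumptions, not taken from any existing simulator: the 20-cycle memory cost, the 1-cycle hit cost, the cache geometry and the names `DirectMappedCache` and `run` are all made up for illustration.

```python
MEM_CYCLES = 20   # assumed cost of one main-memory access, in CPU cycles
HIT_CYCLES = 1    # assumed cost of a cache hit

class DirectMappedCache:
    def __init__(self, lines=16, line_bytes=8):
        self.lines = lines
        self.line_bytes = line_bytes
        self.tags = [None] * lines

    def access(self, addr):
        """Return the cycle cost of reading addr, updating the cache state."""
        line = addr // self.line_bytes
        idx = line % self.lines
        if self.tags[idx] == line:
            return HIT_CYCLES
        # Miss: refill the line. Assumes the whole line arrives in one memory access.
        self.tags[idx] = line
        return HIT_CYCLES + MEM_CYCLES

def run(addresses, cache=None):
    """Total memory cycles for a sequence of byte reads."""
    if cache is None:
        return MEM_CYCLES * len(addresses)
    return sum(cache.access(a) for a in addresses)

# Hypothetical workload: walk a 64-byte table four times.
addresses = [a for _ in range(4) for a in range(64)]
uncached = run(addresses)
cached = run(addresses, DirectMappedCache())
print(uncached, cached)
```

With these made-up numbers the uncached run costs 5120 cycles and the cached run 416, since only the first touch of each of the 8 lines pays the memory penalty. In an existing 8-bit simulator the same `access` call would sit behind whatever function services the CPU's memory reads.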
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1638
  • Country: nl
Re: Cache and memory system simulators
« Reply #5 on: October 27, 2018, 09:43:13 am »
I think it's not easy to find this, because memory buses are usually up to the implementer of a device (e.g. in the ARM ecosystem), and they rarely share this information.

In addition, consider the complexity of a peripheral bus. It can have a dozen nodes connected to it, so there is no way you would want to do a direct fan-out of that bus in a chip. Some vendors may actually connect it like a network-on-chip structure. In low power designs, you may find that the "clock enable" bit is actually a power control bit, meaning that the peripheral bus may need to cross, bidirectionally, a dozen clock and power domains.

Even SRAM blocks may need to be shut down in sleep mode, which also requires bridges. These may (or may not) add a few cycles of delay.

Caches are a real PITA, because they are tuned to improve average performance. It's hard to be deterministic about caches, because their state is heavily temporally correlated by the program code. I suppose you could extend an existing (open source) sim to have handshaking with the memory subsystem, so you could introduce memory arbiters, flash wait states, etc.
But honestly at this point I think you're almost better off running it on real silicon.
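As for the handshaking idea above, a request/ready interface with per-region wait states can be sketched in a few lines. The memory map, the wait-state counts and the `Region`, `Bus` and `cycles_for` names below are hypothetical and do not describe any particular vendor's bus.

```python
class Region:
    """A memory region that needs `wait_states` extra cycles per access."""
    def __init__(self, base, size, wait_states):
        self.base, self.size, self.wait_states = base, size, wait_states

    def contains(self, addr):
        return self.base <= addr < self.base + self.size


class Bus:
    """Request/ready handshake: call request(), then tick() once per cycle until it returns True."""
    def __init__(self, regions):
        self.regions = regions
        self.remaining = 0

    def request(self, addr):
        # Sketch only: an unmapped address raises StopIteration here.
        region = next(r for r in self.regions if r.contains(addr))
        self.remaining = 1 + region.wait_states

    def tick(self):
        self.remaining -= 1
        return self.remaining == 0   # True means "ready" this cycle


def cycles_for(bus, addresses):
    """Cycles a core spends if it stalls on every access until the bus is ready."""
    cycles = 0
    for addr in addresses:
        bus.request(addr)
        while True:
            cycles += 1
            if bus.tick():
                break
    return cycles

# Assumed memory map: flash with 3 wait states, zero-wait SRAM.
bus = Bus([Region(0x0000_0000, 0x10000, 3), Region(0x2000_0000, 0x4000, 0)])
print(cycles_for(bus, [0x0, 0x4, 0x2000_0000, 0x8]))
```

An arbiter or a clock-domain bridge could be modelled the same way, by adding cycles to `remaining` when another master holds the bus or a crossing is needed.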
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1198
  • Country: fi
Re: Cache and memory system simulators
« Reply #6 on: October 27, 2018, 09:58:18 am »
Google shows a ton of results, e.g. Cachegrind, drcachesim, or SimpleScalar. The first two simulate only the caches, driven by a program's instruction and memory-access stream, while the last simulates the whole CPU.

If your audience is willing to read, the memory hierarchy chapter of Hennessy & Patterson is as thorough as you would expect.

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Cache and memory system simulators
« Reply #7 on: October 27, 2018, 11:30:22 am »
westfw wrote: "The Big Folk can simulate whole chips, of course, at great expense and much slower than real time."

Under the "equivalent time" hypothesis, and given the low complexity of a simple RISC CPU (e.g. R3K), you can even model the behavior in Matlab, describing the pipeline in terms of its functions: what happens when an item (instruction fetch or load/store data) is not found in the cache, and how many cycles must be spent fetching it while the pipeline is stalled (waiting in the Load/Store or Fetch stage).

You can then write a program in assembly, something that represents a piece of a real-world program, put it all together in a loop with a stimulus set, and record everything, cycle by cycle, to a file to analyze later or to plot for comparison (with cache, without cache).
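The record-every-cycle approach can be sketched without Matlab too. The toy below is an assumption-laden stand-in for that lab exercise, not a reconstruction of it: the 10-cycle miss penalty, the tiny direct-mapped cache, the two-opcode "program" and the `run` function are all invented for illustration.

```python
import csv
import io

MISS_PENALTY = 10  # assumed stall cycles for a load that misses (or when there is no cache)

def run(program, use_cache, out, lines=4, line_bytes=16):
    """Execute a toy program, writing one CSV row per cycle; return total cycles."""
    log = csv.writer(out)
    log.writerow(["cycle", "pc", "state"])
    tags = [None] * lines
    cycle = 0
    for pc, (op, addr) in enumerate(program):   # pc is just the index in the unrolled program
        stall = 0
        if op == "load":
            line = addr // line_bytes
            hit = use_cache and tags[line % lines] == line
            if not hit:
                stall = MISS_PENALTY
                if use_cache:
                    tags[line % lines] = line
        for _ in range(stall):                  # pipeline held while waiting for memory
            log.writerow([cycle, pc, "stall"])
            cycle += 1
        log.writerow([cycle, pc, "exec"])
        cycle += 1
    return cycle

# Stand-in for an assembly loop: load a word, then one ALU op, over 8 words, twice.
program = [step for _ in range(2) for a in range(0, 32, 4)
           for step in (("load", a), ("alu", None))]

for use_cache in (False, True):
    out = io.StringIO()   # swap for open("trace.csv", "w", newline="") to keep the file
    print(use_cache, run(program, use_cache, out))
```

The resulting CSV has one row per cycle, so the with-cache and without-cache runs can be plotted against each other directly.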


During my Erasmus, in the UK in 2004, this was the experience we had in a lab :D
« Last Edit: October 27, 2018, 11:33:34 am by legacy »
 

