Topic: What you would do with a 768MB cache


ali_asadzadeh (topic starter)
What you would do with a 768MB cache
« on: March 23, 2022, 10:56:19 am »
AMD has revealed its new EPYC chips with a massive 768MB L3 cache. What a world we are living in!
What's your favorite use case for them?
Personally, I think RTL design could be done much faster! I'm waiting to see what they bring to the 7-series desktop Ryzen processors; I hope to see this trend there too.
« Last Edit: March 23, 2022, 10:59:46 am by ali_asadzadeh »
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

eugene
Re: What you would do with a 768MB cache
« Reply #1 on: March 23, 2022, 01:56:44 pm »
Quote
What's your favorite use case for them?

Text editor, web browser, compiler, ...
90% of quoted statistics are fictional
 

Ed.Kloonk
Re: What you would do with a 768MB cache
« Reply #2 on: March 23, 2022, 02:02:01 pm »
768 Meg ought to be enough for anybody.  - Ed. Kloonk, 2022
iratus parum formica
 

DavidAlfa
Re: What you would do with a 768MB cache
« Reply #3 on: March 23, 2022, 02:10:16 pm »
It's aimed at heavily threaded systems, such as servers running thousands of processes.
I don't think it would make much difference for home computers.
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 

madires
Re: What you would do with a 768MB cache
« Reply #4 on: March 23, 2022, 03:21:44 pm »
Great for running multiple VMs at the same time.
 

T3sl4co1l
Re: What you would do with a 768MB cache
« Reply #5 on: March 23, 2022, 03:25:29 pm »
Quote
Great for running multiple VMs at the same time.

One less cache level to Spectre through? :-DD

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

madires
Re: What you would do with a 768MB cache
« Reply #6 on: March 23, 2022, 04:10:14 pm »
Yep! And you get that feature for free. >:D
 

SiliconWizard
Re: What you would do with a 768MB cache
« Reply #7 on: March 23, 2022, 07:01:52 pm »
That's impressive and must come with a massive power consumption. ;D

I'm curious about the performance (throughput, latency) of this 768MB L3 cache compared to main RAM.
 

mariush
Re: What you would do with a 768MB cache
« Reply #8 on: March 23, 2022, 07:47:14 pm »
Quote
That's impressive and must come with a massive power consumption. ;D

I'm curious about the performance (throughput, latency) of this 768MB L3 cache compared to main RAM.

Well, you have 8 dies, each with 32 MB of cache built in, and then each die gets a 64 MB cache die stacked on top of it... so 8 x 96 MB = 768 MB.
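As a quick check of that arithmetic, a tiny Python sketch. The figures (8 dies, 32 MB of on-die L3 plus one 64 MB stacked die each) are taken from the post above; the function name `total_l3_mb` is just illustrative.

```python
def total_l3_mb(dies: int, on_die_mb: int, stacked_mb: int) -> int:
    """Total L3: every die contributes its on-die L3 plus its stacked cache die."""
    return dies * (on_die_mb + stacked_mb)

if __name__ == "__main__":
    # Assumed figures from the post: 8 dies, 32 MB on-die + 64 MB stacked per die.
    print(total_l3_mb(8, 32, 64), "MB")  # 8 x 96 MB
```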

They're downclocking the chips a bit: the base frequency is good, but the boost frequency is lower... probably a mix of thermal problems and maybe signal issues at high frequencies. (They also disable overclocking on the 5800X3D, which is a single die with a 64 MB chip on top, so the same issue they have on a single die is also present on the EPYC CPUs.)

They say 280 W TDP for the high-end CPUs, so the CPU will probably reach 300-350 W at 100% load... maybe... though if my memory is correct, the previous EPYC processors stayed within their TDP values. (Yes, I know TDP does not equal power consumption.)
 

brucehoult
Re: What you would do with a 768MB cache
« Reply #9 on: March 23, 2022, 08:36:38 pm »
Quote
AMD has revealed its new EPYC chips with a massive 768MB L3 cache. What a world we are living in!
What's your favorite use case for them?

If that's the 64-core chip, then that's only 6 MB per hardware thread! That's not a lot.
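The per-thread figure is a simple division. A minimal sketch, assuming the 64-core part runs 2-way SMT (128 hardware threads). It also assumes, as I understand the Zen 3 layout, that each die's 96 MB of L3 is shared only by the 8 cores on that die, which gives the same 6 MB per thread when viewed per die. `l3_share_mb` is a hypothetical helper name.

```python
def l3_share_mb(l3_mb: float, cores: int, threads_per_core: int = 2) -> float:
    """Even split of an L3 pool across the hardware threads that share it."""
    return l3_mb / (cores * threads_per_core)

if __name__ == "__main__":
    # Whole chip, assuming 64 cores with 2-way SMT (128 threads).
    print(l3_share_mb(768, 64))   # 6.0
    # One die, assuming its 96 MB L3 is shared only by that die's 8 cores.
    print(l3_share_mb(96, 8))     # 6.0
```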

One of the benefits of a machine like this (which I first noticed when I went from a quad-core i7 to an 18-core i9) is that even when you're running mostly single-threaded stuff, it's a LOT FASTER in the real world, because so much more of what you're working on is in L3 cache instead of all the way out in RAM. Even just having file-cache data sit in L3 makes a difference.

Most benchmarks don't capture this because they (deliberately?) don't use all that much memory.
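To illustrate why a benchmark with a small working set can't show the benefit, here is a toy model: a fully associative cache with true LRU replacement, fed a repeated sequential sweep. Real L3 caches are set-associative and don't use exact LRU, so treat this as a sketch of the effect, not a model of any specific CPU; `LRUCache` and `hit_rate` are made-up names for the example.

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully associative cache with true LRU replacement; stores line numbers only."""

    def __init__(self, capacity_lines: int):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def access(self, line: int) -> None:
        if line in self.lines:
            self.lines.move_to_end(line)        # now the most recently used line
            self.hits += 1
        else:
            self.misses += 1
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used line

def hit_rate(capacity_lines: int, working_set_lines: int, passes: int = 10) -> float:
    """Sweep lines 0..working_set_lines-1 repeatedly and return the fraction of hits."""
    cache = LRUCache(capacity_lines)
    for _ in range(passes):
        for line in range(working_set_lines):
            cache.access(line)
    return cache.hits / (cache.hits + cache.misses)

if __name__ == "__main__":
    for ws in (250, 500, 1000, 1500, 3000):
        print(f"working set {ws:5d} lines, cache 1000 lines: hit rate {hit_rate(1000, ws):.0%}")
```

With a 1000-line cache, a 500-line working set hits 90% of the time (only the first pass misses), while a 1500-line working set misses on every access, because LRU evicts each line just before it is needed again.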

I don't know why this is in the Microcontrollers topic.
 

hans
Re: What you would do with a 768MB cache
« Reply #10 on: March 23, 2022, 08:42:55 pm »
Quote
That's impressive and must come with a massive power consumption. ;D

I'm curious about the performance (throughput, latency) of this 768MB L3 cache compared to main RAM.

See around 50:24 : https://youtu.be/B_jUXiOvMo8?t=3024

It looks like L3 latency got a bit slower in general, but for large data sets it becomes a lot quicker than main memory.

Now, L3 is only useful if the application needs it. AFAIK code compilation is a classic cache test. I believe Wendell talks about it during the stream in terms of a "Linux kernel compiles per hour" metric, but he was not able to give specific test data at that time.
For embedded design I'm not that worried, though. It usually doesn't take minutes to compile a binary for even a large MCU, and with incremental compilation you don't even have to rebuild everything. I'd rather have a CPU with a high single-thread boost, because the final linking stage is usually a single process with a large dependency tree.

How about RTL design? I don't know. For one, the problem may be the tool itself. The speed at which Vivado, for example, initializes a design compile is horrendous. Even synthesis, PnR, etc. of a blinky for an Artix-7 takes 3-5 minutes. It takes literally seconds with Yosys+nextpnr on an iCE40.  |O

Quote
Most benchmarks don't capture this because they (deliberately?) don't use all that much memory.

A good program is designed to make optimal spatial use of a cache's design. It's easy to bring a CPU to its knees by making a deliberately bad choice of data storage (and then highlight that an FPGA or ASIC can do a better job, but that's missing the point).
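A sketch of what a "bad choice of data storage" can look like: the same 256x256 array of 8-byte elements stored row by row, walked either along rows or down columns, and counted against a tiny 64-line LRU cache. The 64-byte line size, the 8-byte element size and the cache size are assumptions chosen to keep the numbers small; the function names are hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size
ELEM_BYTES = 8   # assumed element size, e.g. one double

def count_misses(addresses, capacity_lines: int) -> int:
    """Feed byte addresses through a fully associative LRU cache and count misses."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)        # refresh recency on a hit
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict least recently used line
    return misses

def row_major_addresses(rows: int, cols: int):
    """Element (r, c) lives at (r * cols + c) * ELEM_BYTES; walk along each row."""
    for r in range(rows):
        for c in range(cols):
            yield (r * cols + c) * ELEM_BYTES

def column_major_walk(rows: int, cols: int):
    """Same row-major layout, but walk down each column (stride = cols * ELEM_BYTES)."""
    for c in range(cols):
        for r in range(rows):
            yield (r * cols + c) * ELEM_BYTES

if __name__ == "__main__":
    print("row walk misses:   ", count_misses(row_major_addresses(256, 256), 64))
    print("column walk misses:", count_misses(column_major_walk(256, 256), 64))
```

The row walk misses once per 64-byte line (8192 misses for 65536 elements); the column walk strides 2048 bytes per step, so every access lands on a line that was already evicted (65536 misses), even though the data and the cache are identical.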

I think the problem is that only very few applications really stress the CPU cache structure. For example, Cinebench is a classic benchmark that runs almost entirely from cache. That means you can overclock main memory all you want and it probably won't improve Cinebench scores by much, even though high memory bandwidth/low latency may dramatically improve high-speed I/O, file/network duties, web/data server duties, etc.
« Last Edit: March 23, 2022, 09:53:55 pm by hans »
 

Simon
Re: What you would do with a 768MB cache
« Reply #11 on: March 23, 2022, 10:08:12 pm »
It sounds like my theory is true: the more DDR generations we introduce, the more bolloxed the specs become. So how fast is this cache? I remember that each time a new DDR generation came out, I tested it on the first machine I built with it. I would look at the actual bandwidth, or efficiency, and it gradually went down from 90% on plain old SDRAM to under 40% on DDR4. I guess someone finally figured out the bottleneck: DDR only works well when all of the 2, 4, 8, 16 or however many blocks of memory that get fetched by addressing just the first one actually contain data we want.

So yeah, stick the RAM in the CPU. My first PC had 256MB; I quickly upgraded to 512MB and felt like a rock star when I crammed 768MB onto a motherboard, and now it "just" comes with the CPU? Who needs RAM anymore? 8)
 

brucehoult
Re: What you would do with a 768MB cache
« Reply #12 on: March 23, 2022, 10:22:54 pm »
Quote
Most benchmarks don't capture this because they (deliberately?) don't use all that much memory.

A good program is designed to make optimal spatial use of a cache's design. It's easy to bring a CPU to its knees by making a deliberately bad choice of data storage...

99% of benchmarks execute a fixed task and see how long it takes. The most adjustment they usually do is to repeat a small task a few times so that it takes long enough to measure accurately.

The only benchmark I'm aware of that does otherwise is HINT and I haven't seen anyone use that for decades.

http://www.johngustafson.net/pubs/pub47/Hint.htm

Ok, and STREAM, but it's explicitly a benchmark for the memory hierarchy rather than an overall CPU benchmark.
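A minimal sketch of the alternative to a single fixed-size task: sweep the working-set size and report time per element for each size, so the curve shows where each cache level runs out. This is not HINT or STREAM, just the general idea, and `sweep` is a hypothetical name. In CPython the interpreter overhead per element is large, so the steps will be much flatter than in a C version of the same loop.

```python
import time
from array import array

def sweep(sizes_bytes, repeats: int = 3):
    """Sequentially read working sets of increasing size and return
    (size_bytes, best ns per element) for each one."""
    results = []
    for size in sizes_bytes:
        n = max(1, size // 8)           # number of 8-byte doubles in this working set
        data = array("d", [1.0]) * n    # contiguous buffer of n doubles
        sum(data)                       # warm-up pass, so the data is cached if it fits
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            sum(data)                   # touch every element once
            best = min(best, time.perf_counter() - t0)
        results.append((size, best * 1e9 / n))  # keep the fastest run to reduce noise
    return results

if __name__ == "__main__":
    sizes = [2**k for k in range(15, 27)]  # 32 KiB ... 64 MiB; extend past your L3 size
    for size, ns in sweep(sizes):
        print(f"{size // 1024:>8} KiB  {ns:6.2f} ns/element")
```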
« Last Edit: March 24, 2022, 01:30:15 am by brucehoult »
 

golden_labels
Re: What you would do with a 768MB cache
« Reply #13 on: March 24, 2022, 02:10:37 am »
This CPU is not meant to be used in your desktop. Not even in a typical server. It's going to be deployed in massive multiprocessor systems, and this may be the key to understanding the size of its L3 cache. While this is going to be an exaggeration, I believe it conveys the picture well: that 768MiB of cache shouldn't be seen as intermediate memory between the CPU and RAM sticks 50mm away. It is more akin to the local RAM of a computer that needs to communicate over a relatively slow network to access main storage.

Of course the earlier observation, that the amount of memory is also not that huge per hardware thread, is also important.
People imagine AI as T1000. What we got so far is glorified T9.
 

ali_asadzadeh (topic starter)
Re: What you would do with a 768MB cache
« Reply #14 on: March 24, 2022, 06:53:27 am »
Wait a minute, please guide me in the right direction. As I understand it, the processor (AMD EPYC™ 7773X) has a base clock of 2.2GHz and a max boost clock of 3.5GHz, but DDR5's base speed is 4800MT/s, which is equal to 2.4GHz! So the question here is: the new Intel Alder Lake can benefit from DDR5, so is even the lowest-speed DDR5 faster than this cache? Am I correct? Does this mean Intel is the winner, not to mention being a lot cheaper? I know DDR has latency, but the max speed for DDR5 is 3.6GHz, which is faster than even the boost clock of the EPYC CPU.
 

magic
Re: What you would do with a 768MB cache
« Reply #15 on: March 24, 2022, 07:36:15 am »
USB3 has 5GHz clock rate so it's faster than either :P
 

Simon
Re: What you would do with a 768MB cache
« Reply #16 on: March 24, 2022, 08:33:10 am »
Quote
Wait a minute, please guide me in the right direction. As I understand it, the processor (AMD EPYC™ 7773X) has a base clock of 2.2GHz and a max boost clock of 3.5GHz, but DDR5's base speed is 4800MT/s, which is equal to 2.4GHz! So the question here is: the new Intel Alder Lake can benefit from DDR5, so is even the lowest-speed DDR5 faster than this cache? Am I correct? Does this mean Intel is the winner, not to mention being a lot cheaper? I know DDR has latency, but the max speed for DDR5 is 3.6GHz, which is faster than even the boost clock of the EPYC CPU.

See my comment about bandwidth efficiency. When you access DDR, you are locking yourself into accessing multiple memory addresses based on the one address you actually want. What you have to hope is that the data was stored in RAM in the same way you want to read it out; otherwise that bandwidth is wasted fetching a couple of useful bytes while the other locations contain nothing related to what you are doing.
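To put a number on that wasted bandwidth: assume each DRAM access delivers one 64-byte burst (DDR4: burst length 8 on a 64-bit bus; DDR5: burst length 16 on a 32-bit subchannel). If the program only uses a few bytes of each burst, the useful bandwidth scales down with it. This sketch ignores row activation, refresh and command overhead, and the helper names are hypothetical.

```python
BURST_BYTES = 64  # assumed: one 64-byte burst per DRAM access

def useful_fraction(useful_bytes: int, burst_bytes: int = BURST_BYTES) -> float:
    """Fraction of a burst that carries data the program actually uses."""
    return min(useful_bytes, burst_bytes) / burst_bytes

def effective_bandwidth_gbs(peak_gbs: float, useful_bytes: int,
                            burst_bytes: int = BURST_BYTES) -> float:
    """Bandwidth left for useful data when each burst only carries `useful_bytes` of it."""
    return peak_gbs * useful_fraction(useful_bytes, burst_bytes)

if __name__ == "__main__":
    for useful in (64, 16, 8):
        print(f"{useful:2d} useful bytes/burst -> {useful_fraction(useful):.1%} of peak")
```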
 

hans
Re: What you would do with a 768MB cache
« Reply #17 on: March 24, 2022, 08:59:40 am »
Quote
Wait a minute, please guide me in the right direction. As I understand it, the processor (AMD EPYC™ 7773X) has a base clock of 2.2GHz and a max boost clock of 3.5GHz, but DDR5's base speed is 4800MT/s, which is equal to 2.4GHz! So the question here is: the new Intel Alder Lake can benefit from DDR5, so is even the lowest-speed DDR5 faster than this cache? Am I correct? Does this mean Intel is the winner, not to mention being a lot cheaper? I know DDR has latency, but the max speed for DDR5 is 3.6GHz, which is faster than even the boost clock of the EPYC CPU.

No, because DDR5 has a latency of 10ns+ and a limited bus width. Look at the memory latency vs depth chart I linked.
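A small sketch separating the numbers in the question: the memory's I/O clock, its peak transfer rate and its latency are different things. Assumptions: a 64-bit channel (one DDR5 DIMM is two 32-bit subchannels), an example CAS latency of 40 cycles for DDR5-4800, and the 3.5 GHz boost clock from the question. CAS latency is only one part of the total time a core waits for DRAM, so the real miss penalty is considerably higher. The function names are hypothetical.

```python
def peak_bandwidth_gbs(mt_per_s: float, bus_bits: int = 64) -> float:
    """Peak rate of one channel in GB/s: transfers per second times bytes per transfer."""
    return mt_per_s * 1e6 * (bus_bits // 8) / 1e9

def cas_latency_ns(cl_cycles: int, mt_per_s: float) -> float:
    """CAS latency in ns. DDR moves data on both clock edges, so the I/O clock
    is half the transfer rate (DDR5-4800 -> 2400 MHz)."""
    clock_mhz = mt_per_s / 2
    return cl_cycles * 1000 / clock_mhz

def ns_to_core_cycles(ns: float, core_ghz: float) -> float:
    """How many core clock cycles pass during a delay of `ns` nanoseconds."""
    return ns * core_ghz

if __name__ == "__main__":
    bw = peak_bandwidth_gbs(4800)    # 38.4 GB/s per 64-bit channel
    cas = cas_latency_ns(40, 4800)   # ~16.7 ns, assuming CL40
    print(f"{bw:.1f} GB/s, CAS {cas:.1f} ns = {ns_to_core_cycles(cas, 3.5):.0f} cycles at 3.5 GHz")
```

Even this partial latency costs dozens of core cycles per access, which is the gap a large L3 hides, whereas a higher transfer rate mainly helps streaming throughput.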

The Alder Lake reviews include some interesting comparison charts of the chips run with DDR4 and DDR5. Very few applications saw an appreciable performance uplift. IIRC H.264 did very well with it, but the more modern/relevant H.265 did not. :-//

Quote
Most benchmarks don't capture this because they (deliberately?) don't use all that much memory.

A good program is designed to make optimal spatial use of a cache's design. It's easy to bring a CPU to its knees by making a deliberately bad choice of data storage...

99% of benchmarks execute a fixed task and see how long it takes. The most adjustment they usually do is to repeat a small task a few times so that it takes long enough to measure accurately.

The only benchmark I'm aware of that does otherwise is HINT and I haven't seen anyone use that for decades.

http://www.johngustafson.net/pubs/pub47/Hint.htm

Ok, and STREAM, but it's explicitly a benchmark for the memory hierarchy rather than an overall CPU benchmark.

Yes, reporting scores based on time is the best method for a benchmark. Regardless, if the task fits into the cache of a modern CPU, that limits the span of components it benchmarks. Prime95, for example, although not a benchmark, has various FFT sizes (and blends thereof) to stress-test the CPU, cache, and memory subsystems.
If you want to know how well this CPU handles a gigantic database server, there really is zero use in looking at Cinebench scores. The only valid test is to run the actual application. However, for common day-to-day use, Cinebench scores (or similar benchmark suites like PassMark) give a rough indication of how fast a CPU is.

The point I was trying to make: most HPC applications are designed with cache coherency in mind. If you design a ray or path tracing renderer without that in mind, you'll likely screw over its performance big time. This is not to say there are no use cases (like big projects, niche programs, concurrent or multi-user environments) that will benefit massively from cache, but likewise, only very few programs saw a massive uplift going from DDR4 to DDR5 (referring to the Alder Lake comparison benchmarks).
« Last Edit: March 24, 2022, 09:04:47 am by hans »
 

MazeFrame
Re: What you would do with a 768MB cache
« Reply #18 on: March 25, 2022, 08:14:57 am »
This would be amazing for badly optimized Database interactions.
Or huge code compiles.
Never Forgive, Always Forget.
Perpetually Angry and Confused!
 

DiTBho
Re: What you would do with a 768MB cache
« Reply #19 on: March 25, 2022, 12:10:06 pm »
Quote
most HPC applications are designed with cache coherency in mind

Back in the 2000s, machines based on MIPS4BE { R10K, R12K } were multi-CPU systems built on superscalar CPUs but without cache coherency, worse still with out-of-order I/O capabilities, and it was a nightmare with Linux, to the point that we had to hack both GCC { 2.9*, 3.*, 4.1.* } and the kernel { 2.2.*, 2.4.*, <=2.6.24 }.

It never really worked: it was hard to maintain, we squeezed only poor performance out of the kernel, and thousands of hours were wasted for nothing.

For me, *all* the HPC-applications and *all* the HPC-hardware *must be* designed with cache coherency in mind.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Zipdox
Re: What you would do with a 768MB cache
« Reply #20 on: March 28, 2022, 11:32:03 am »
AV1 video encoding.
 

tszaboo
Re: What you would do with a 768MB cache
« Reply #21 on: March 28, 2022, 02:30:18 pm »
Run 384000 simultaneous NES emulators of Super Mario Bros., brute-forcing random inputs to find the fastest possible time to beat it.
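For what it's worth, the 384000 figure looks like 768,000,000 / 2,000, i.e. about one NES worth of 2 KiB work RAM per instance (that is my reading of the number, not something stated in the post). A real emulator would also need ROM, video memory and its own state, so this is very optimistic; the constant names are illustrative.

```python
CACHE_BYTES = 768 * 10**6   # decimal megabytes, which is what the 384000 figure implies
EMULATORS = 384_000
NES_WORK_RAM = 2 * 1024     # the NES CPU's 2 KiB of internal work RAM

per_instance = CACHE_BYTES // EMULATORS      # bytes of L3 per emulator
binary_count = 768 * 2**20 // NES_WORK_RAM   # instances if 768 MiB and exactly 2 KiB each

if __name__ == "__main__":
    print(per_instance)   # 2000
    print(binary_count)   # 393216
```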
 

ali_asadzadeh (topic starter)
Re: What you would do with a 768MB cache
« Reply #22 on: March 28, 2022, 07:48:19 pm »
Quote
Run 384000 simultaneous NES emulators of super mario brothers with brute force with random input to find the fastest time possible to beat it.
That's a great scenario >:D
 

AntiProtonBoy
Re: What you would do with a 768MB cache
« Reply #23 on: March 30, 2022, 02:21:25 am »
Quote
What's your favorite use case for them?
Massively parallel operations where you need to perform local data exchange between SIMD groups. Kernel operations particularly benefit from this. Stuff like this is already happening on GPUs, but it's nice to see the convergence of CPUs and GPUs.
 

SiliconWizard
Re: What you would do with a 768MB cache
« Reply #24 on: March 30, 2022, 05:22:30 pm »
Quote
USB3 has 5GHz clock rate so it's faster than either :P

The clock yes, but not the data rate. ;D
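To put numbers on that: USB 3.x Gen 1 signals at 5 Gbit/s and uses 8b/10b line coding, so only 8 of every 10 bits on the wire are payload. A sketch (the function name is hypothetical; protocol and packet overhead are ignored, so real throughput is lower still):

```python
def payload_gbps(line_rate_gbps: float, data_bits: int, coded_bits: int) -> float:
    """Payload rate after line-coding overhead only (ignores protocol/packet overhead)."""
    return line_rate_gbps * data_bits / coded_bits

if __name__ == "__main__":
    gen1 = payload_gbps(5, 8, 10)  # USB 3.x Gen 1: 5 Gbit/s line rate, 8b/10b coding
    print(f"{gen1} Gbit/s = {gen1 * 1000 / 8:.0f} MB/s")
```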
 

