Improvements to the Cache Hierarchy

The biggest under-the-hood change for the Ryzen 2000-series processors is in the cache latency. AMD is claiming that it was able to knock one cycle off the L1 and L2 cache latencies, shave several cycles from the L3, and improve DRAM performance. Because pure core IPC is intimately intertwined with the caches (the size, the latency, the bandwidth), these improvements lead AMD to claim that the new processors offer a +3% IPC gain over the previous generation.

The numbers AMD gives are:

  • 13% Better L1 Latency (1.10ns vs 0.95ns)
  • 34% Better L2 Latency (4.6ns vs 3.0ns)
  • 16% Better L3 Latency (11.0ns vs 9.2ns)
  • 11% Better Memory Latency (74ns vs 66ns at DDR4-3200)
  • Increased DRAM Frequency Support (DDR4-2666 vs DDR4-2933)

It is interesting that in the official slide deck AMD quotes latency as time, although in private conversations at our briefing it was discussed in terms of clock cycles. Latency measured as time can fold in other internal enhancements, such as a higher operating frequency; a pure engineer prefers to discuss clock cycles.

Naturally we went ahead to test the two aspects of this equation: are the cache metrics actually lower, and do we get an IPC uplift?

Cache Me Ousside, How Bow Dah?

For our testing, we use a memory latency checker that strides through the cache hierarchy of a single core. For this test we used the following processors:

  • Ryzen 7 2700X (Zen+)
  • Ryzen 5 2400G (Zen APU)
  • Ryzen 7 1800X (Zen)
  • Intel Core i7-8700K (Coffee Lake)
  • Intel Core i7-7700K (Kaby Lake)

The most obvious comparison is between the AMD processors. Here we have the Ryzen 7 1800X from the initial launch, the Ryzen 5 2400G APU that pairs Zen cores with Vega graphics, and the new Ryzen 7 2700X processor.

This graph is logarithmic in both axes.

This graph shows that in every phase of the cache design, the newest Ryzen 7 2700X requires fewer core clocks. The biggest difference is on the L2 cache latency, but L3 has a sizeable gain as well. The reason that the L2 gain is so large, especially between the 1800X and 2700X, is an interesting story.

When AMD first launched the Ryzen 7 1800X, the L2 latency was tested and listed at 17 clocks. This was a little high – it turns out that the engineers had originally intended for the L2 latency to be 12 clocks, but ran out of time to tune the firmware and layout before sending the design off to be manufactured, leaving 17 cycles as the best compromise the design could manage without causing issues. With Threadripper and the Ryzen APUs, AMD tweaked the design enough to hit an L2 latency of 12 cycles, which was not specifically promoted at the time despite the benefits it provides. Now with the Ryzen 2000-series, AMD has reduced it further to 11 cycles. We were told that this was due to both the new manufacturing process and additional tweaks made to ensure signal coherency. In our testing, we actually saw an average L2 latency of 10.4 cycles, down from 16.9 cycles on the Ryzen 7 1800X.

The L3 difference is a little unexpected: AMD stated 16% better latency, from 11.0 ns to 9.2 ns. We saw a change from 10.7 ns to 8.1 ns, a drop from 39 cycles to 30 cycles.

Of course, we could not go without comparing AMD to Intel. This is where it got very interesting. Now the cache configurations between the Ryzen 7 2700X and Core i7-8700K are different:

CPU Cache uArch Comparison

               AMD                        Intel
               Zen (Ryzen 1000) /         Kaby Lake (Core 7000) /
               Zen+ (Ryzen 2000)          Coffee Lake (Core 8000)
  L1-I Size    64 KB/core                 32 KB/core
  L1-I Assoc   4-way                      8-way
  L1-D Size    32 KB/core                 32 KB/core
  L1-D Assoc   8-way                      8-way
  L2 Size      512 KB/core                256 KB/core
  L2 Assoc     8-way                      4-way
  L3 Size      8 MB/CCX (2 MB/core)       2 MB/core
  L3 Assoc     16-way                     16-way
  L3 Type      Victim                     Write-back

AMD has a larger L2 cache; however, AMD's L3 is a non-inclusive victim cache, which, unlike Intel's L3, cannot be pre-fetched into.

This was an unexpected result, but we can see clearly that AMD has a latency advantage across the L2 and L3 caches. There is a sizeable difference in DRAM latency; however, the core performance metrics live in the lower levels of cache.

We can expand this out to include the three AMD chips, as well as Intel’s Coffee Lake and Kaby Lake cores.

This is a graph using cycles rather than timing latency: Intel has a small L1 advantage; however, the larger L2 caches in AMD's Zen designs mean that Intel has to hit its higher-latency L3 earlier. Intel makes quick work of DRAM cycle latency, however.

Comments

  • FaultierSid - Wednesday, April 25, 2018 - link

    The question is whether testing a CPU at 4K gaming makes much sense. At 4K the bottleneck is the GPU, not the CPU, especially since they tested with a GTX 1080 and not a 1080 Ti.
    It is not a coincidence that the CPUs all show roughly the same fps in the 4K tests. Civilization seems to be easier on the GPU and shows the 8700K in the lead; all the other games show almost the same fps for all four tested CPUs. That's because the fps is limited by the GPU in that case, not by the CPU.

    You might want to bring up the point that if you are gaming at 4K and the highest settings, it doesn't make sense for you to look at 1080p benchmarks. And right now this might make sense, but not in a couple of years when you upgrade your GPU to a faster model and the games are no longer GPU-bottlenecked. Then, where you now see 60 fps, you might see 100 fps with an 8700K and only 80 fps with the Ryzen 2600X.

    Basically, testing CPUs for gaming at a resolution that stresses the GPU so much that the performance of the CPU becomes almost irrelevant is not the right way to judge the gaming performance of a CPU.

    If your point is that by the time you purchase a new GPU you will also purchase a new CPU, then this might not affect you, and you might decide to pick the 2700X over an 8700K because of its advantages in other areas.
    But in general, we have to admit, the crown of "best gaming CPU" is (sadly) still in Intel's corner.
  • mapesdhs - Monday, May 14, 2018 - link

    If all you're doing is gaming at 4K then yes, in most titles the bottleneck will be the GPU, but this is not always the case. These days live streaming on Twitch is becoming popular, and for that it really does help to have more cores; the load is pushed back onto the CPU, even when the player sees smooth updates (the viewer-side experience can be bad instead). GN has done some good tests on this. Plus, some games are more reliant on CPU power for various reasons, especially the use of outdated threading mechanisms. And in time, newer games will take better advantage of more cores, especially due to compatibility with consoles.
  • jjj - Wednesday, April 25, 2018 - link

    So what was wrong, was it HPET crippling Intel or does Intel have some kind of issue with 4 channels memory?
  • Ryan Smith - Wednesday, April 25, 2018 - link

    The former.
  • risa2000 - Thursday, April 26, 2018 - link

    Can you explain a bit HPET crippling? I was looking around Google, but did not find anything really conclusive.
  • Uxot - Wednesday, April 25, 2018 - link

    So... I have 2666 MHz RAM... RAM support for the 2700X says 2933... what does that mean? Is 2933 the lowest RAM compatibility? FML if I can't go with the 2700X because of RAM.. -_-
  • Maxiking - Thursday, April 26, 2018 - link

    It refers to the highest OFFICIALLY supported frequency on your mobo. You should be able to run RAM at higher clocks than 2933, but there might be issues, because Ryzen memory support sucks. For higher-clocked RAM, I would check if it is on the QVL; that way you can be sure it was tested with your mobo and no issues will arise.

    2666 MHz RAM will run without any issue on your system.
  • johnsmith222 - Thursday, April 26, 2018 - link

    Make sure you have the newest BIOS update; AGESA 1.0.0.2a seems to improve memory compatibility too. My crappy Kingston 2400 CL17 now works fine at 3000 CL15 at 1.36 V. I'll try 3200 at 1.38 V later.
  • Uxot - Wednesday, April 25, 2018 - link

    Ok...my comment got deleted for NO REASON...
  • Gideon - Thursday, April 26, 2018 - link

    Good work tracking down the timing issues! I know that this review is still a WIP, but I just noticed that the "Power Analysis" block has "fsfasd" written right after it, which probably isn't needed :)
