Core-to-Core Latency

As the core count of modern CPUs is growing, we are reaching a time when the time to access each core from a different core is no longer a constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes can have different latencies to access the nearest core compared to the furthest core. This rings true especially in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on if it was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test, and we know there are competing tests out there, but we feel ours is the most accurate to how quick an access between two cores can happen.


Click to enlarge (lots of cores and threads = lots of core pairings)

Comparing core to core latencies from Zen 4 (7950X) and Zen 3 (5950X), both are using a two CCX 8-core chiplet design, which is a marked improvement over the four CCX 16-core design featured on the Zen 2 microarchitecture, the Ryzen 9 3950X. The inter-core latencies within the L3 cache range from between 15 ns and 19 ns. The inter-core latencies between different cores within different parts of the CCD show a larger latency penalty of up to 79.5 ns, which is something AMD should work on going forward, but it's an overall improvement in cross CCX latencies compared to Zen 3. Any gain is still a gain.

Even though AMD has opted for a newer and more 'efficient' IOD which is based on TSMC's 6 nm node. It is around the same size physically as the previous AMD IOD on Zen 3 manufactured on GlobalFoundries 12 nm node, but with a much larger transistor count. Within the IOD is the newly integrated RDNA 2 graphics, although this isn't typical iGPU in the sense that an APU is. A lot of the room on the IOD is made up of the DDR5 memory controller or IMC, as well as the chips PCIe 5.0 lanes, and of course, connects to the logic through its primary interconnect named Infinity Fabric. All of these variables play a part on power, latency, and operation.


AMD Ryzen 9 5950X Core-to-Core Latency results

It's actually astounding how similar the latency performance of the Ryzen 9 7950X (Zen 4) is when compared directly to the Ryzen 9 5950X (Zen 3), despite being on the new 5 nm TSMC manufacturing process. Even with a change of IOD, but with the same interconnect, the inter-core latencies within the Ryzen 9 7950X are great in terms of cores within the same core complex; latency does degrade when pairing up with a core in another chiplet, but this works and AMD's Ryzen 5000 series proved that the overall penalty performance is negatable.

Test Bed and Setup SPEC2017 Single-Threaded Results
POST A COMMENT

205 Comments

View All Comments

  • RomanPixel - Tuesday, September 27, 2022 - link

    Me too! Reply
  • kmalyugin - Monday, September 26, 2022 - link

    Wow, this article is almost unreadable. Was spellchecker turned off? Reply
  • jonkullberg - Monday, September 26, 2022 - link

    Gaming benchmarks with DDR5-6000 CL30 please! Reply
  • BushLin - Monday, September 26, 2022 - link

    Exactly Reply
  • xol - Monday, September 26, 2022 - link

    wtf am I reading (context a part with tdp up to 170W from 105W) :

    "This has been possible through superior power efficiency, as Zencally a Zen 3 refinement, but on the new TSMC 5 nm process node (from TSMC 7 nm). This efficiency has allowed AMD to increase the overall TDP to 170 W from the previous 105 W but without too much penalty."

    I can't even .. "too much penalty" ??

    .. Looks like Zen has reached the end of the road imo (it had a good run) - none of the improvements here are from AMD - new DDR5, new 5nm node. The rest is "increase clocks/tdp" just like when Intel was stuck on 14nm.

    I just don't know where they are going from here
    Reply
  • Threska - Monday, September 26, 2022 - link

    Well we have " While Ryzen 7000 can drive a 2 DPC/4 DIMM setup, you’re going to lose 31% of your memory bandwidth if you go that route. So for peak performance, it’ll be best to treat Ryzen 7000 as a 1 DPC platform." and " Unfortunately, the compatibility situation is essentially unchanged from the AM4 platform, which is to say that while the CPU supports ECC memory, it’s going to be up to motherboard manufacturers to properly validate it against their boards.". The memory situation seems like a sticking point for a good while till things mature. Reply
  • BushLin - Monday, September 26, 2022 - link

    Did you read the article? Put it in eco mode (105W for a 170W part) and it still stomps over everything in MT performance. Zen 4 is more about platform improvements, Zen 5 will be the microarchitecture overhaul. Reply
  • BushLin - Monday, September 26, 2022 - link

    Stomping everything at 65W even! Reply
  • xol - Tuesday, September 27, 2022 - link

    Impressed that it's nominally $100 cheaper than a 5950X .Got to admit that. Reply
  • xol - Tuesday, September 27, 2022 - link

    Eco mode does perform better eg cinebench- b maybe +23% compared to 5950X, but it's using DDR5 5200 vs DDR4-3200 (?), and the power advantage can be assumed to come from 5nm

    My original point still stands for me- 90% of benefits are from node and memory and allowing clocks as high as Tjunction allows - I don't think that is a great showing for AMD
    Reply

Log in

Don't have an account? Sign up now