CPU Tests: Microbenchmarks

Core-to-Core Latency

As the core counts of modern CPUs grow, the time to access one core from another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have a different latency to the nearest core than to the furthest core. This is especially true in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable latency when accessing another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours is the most accurate measure of how quickly an access between two cores can happen.

On all our Threadripper Pro CPUs, we saw:

  • a thread-to-thread latency of 7 ns,
  • a core-to-core latency within the same CCX of 17-18 ns,
  • a core-to-core latency between different CCXes that scales from 80 ns with no IO die hops to 113 ns with 3 IO die hops.

Here we can distinguish how long it takes for threads to ping back and forth between cores that are a different number of hops apart across the IO die.
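To illustrate how a latency matrix like this exposes the core layout, here is a minimal Python sketch that groups logical CPUs whose pairwise latency falls under a cutoff. It is not the in-house test itself. The function name `group_cores`, the cutoff values, and the small example matrix are all assumptions for illustration; the matrix reuses the latency tiers quoted above but is not measured data.

```python
def group_cores(latency_ns, threshold_ns):
    """Group CPU indices connected by a pairwise latency <= threshold_ns.

    latency_ns is a square matrix (list of lists); entry [i][j] is the
    latency in nanoseconds between logical CPUs i and j.
    """
    n = len(latency_ns)
    parent = list(range(n))  # union-find forest, one tree per group

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if latency_ns[i][j] <= threshold_ns:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Illustrative matrix only (not measured data): CPUs 0-1 and 2-3 are
# sibling threads sharing one CCX, CPUs 4-5 are siblings in another CCX.
example = [
    [0, 7, 17, 17, 80, 80],
    [7, 0, 17, 17, 80, 80],
    [17, 17, 0, 7, 80, 80],
    [17, 17, 7, 0, 80, 80],
    [80, 80, 80, 80, 0, 7],
    [80, 80, 80, 80, 7, 0],
]

# Cutoffs of 10 ns and 30 ns are assumed values sitting between the tiers.
sibling_threads = group_cores(example, threshold_ns=10)
ccx_groups = group_cores(example, threshold_ns=30)
print(sibling_threads)
print(ccx_groups)
```

With a low cutoff the groups are sibling threads on one core; raising it merges them into CCX-sized groups, which is the same structure the heatmap shows visually.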

A y-Cruncher Sprint

The y-cruncher website has a large amount of benchmark data showing how different CPUs perform when calculating specific numbers of digits of pi. Below these there are a few CPUs where it shows the time to compute moving from 25 million digits to 50 million, 100 million, 250 million, and all the way up to 10 billion, to showcase how the performance scales with digits (assuming everything is in memory). This range of results, from 25 million to 250 billion, is something I’ve dubbed a ‘sprint’.

I have written some code to perform a sprint on every CPU we test. It detects the DRAM, works out the biggest value that can be calculated with that amount of memory, and works up from 25 million digits. For the tests that go up to ~25 billion digits, it only adds an extra 15 minutes to the suite for an 8-core Ryzen CPU.
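The size selection described above could be sketched roughly as follows. This is not the actual sprint script. The function name `sprint_sizes` is hypothetical, the `bytes_per_digit` figure is a placeholder assumption rather than y-cruncher's real memory requirement, and the 25/50/100/250/500/1000 step pattern between the sizes named in the text is also assumed. DRAM detection is left out; the caller passes the memory size in bytes.

```python
def sprint_sizes(dram_bytes, bytes_per_digit=5, start=25_000_000):
    """Return the digit counts to run, smallest first, that fit in DRAM.

    bytes_per_digit is a placeholder assumption, not y-cruncher's
    documented memory requirement.
    """
    max_digits = dram_bytes // bytes_per_digit
    sizes = []
    digits = start
    step = 0
    while digits <= max_digits:
        sizes.append(digits)
        # Assumed ladder: x2, x2, x2.5 (25M -> 50M -> 100M -> 250M -> ...),
        # done in integer arithmetic to keep the sizes exact.
        if step % 3 == 2:
            digits = digits * 5 // 2
        else:
            digits = digits * 2
        step += 1
    return sizes


# Example: a machine with 16 GiB of DRAM under the assumed memory cost.
for size in sprint_sizes(16 * 1024**3):
    print(f"{size:,} digits")
```

Each returned size would then be handed to y-cruncher in turn and timed.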

With this test, we can see the effect of increasing memory requirements on the workload and the scaling factor for a workload such as this. We're plotting millions of digits calculated per second.
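For clarity, the plotted metric is simply the digit count divided by the run time, scaled to millions. A minimal sketch follows, with hypothetical function names and made-up timings used purely to show the arithmetic (they are not results from any CPU):

```python
def mdigits_per_second(digits, seconds):
    """Throughput in millions of digits computed per second."""
    return digits / seconds / 1e6


def peak_size(timings):
    """Given {digit_count: seconds}, return the digit count with the best rate."""
    return max(timings, key=lambda d: mdigits_per_second(d, timings[d]))


# Made-up timings for illustration only, not benchmark results.
timings = {25_000_000: 2.0, 50_000_000: 3.2, 100_000_000: 8.0}
for digits, seconds in timings.items():
    print(digits, round(mdigits_per_second(digits, seconds), 3))
print("peak at", peak_size(timings))
```

Plotting this rate against digit count is what shows where throughput peaks and where memory demands start to drag it down.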

The 64C/64T processor obtains the peak efficiency here, although as more digits are calculated, the memory requirements come into play.


98 Comments

  • Threska - Tuesday, July 20, 2021 - link

    VFIO would be more popular if video card makers weren't tight with GPU-passthrough.

    https://forum.level1techs.com/t/the-vfio-and-gpu-p...

    CPUs like Threadripper would be a great fit.
  • FLORIDAMAN85 - Wednesday, July 21, 2021 - link

    Alt title: AMD, faster than Intel in Crysis, again.
  • quadibloc - Thursday, July 22, 2021 - link

    It's too bad it took so long for this chip to become generally available. I hope this won't be repeated in the next generation of Threadrippers - and they should have become available sooner. Like within a month of Ryzen, so that people could buy them before they're already obsolete.
  • mode_13h - Thursday, July 22, 2021 - link

    > Like within a month of Ryzen, so that people could buy them before they're already obsolete.

    First, how is it obsolete? TR 3000 and TR 3000 Pro are still peerless, in many ways.

    Second, Intel has traditionally had like 6 months or a year of lag between their mainstream and HEDT. I know you didn't say anything about Intel, but I'm pointing this out because it establishes a precedent for what AMD is doing (not that I think AMD is worried about precedents).
  • alpha754293 - Thursday, July 29, 2021 - link

    "file:///J:/Shared%20drives/AnandTech/Articles/20210706%20TR%20Pro/AMD%20Opens%20Up%20Threadripper%20Pro:%20Three%20New%20WRX80%20Motherboards"

    You might want to fix this link in your review.
  • Ryan Smith - Monday, August 2, 2021 - link

    Thanks!
  • 0ldman79 - Thursday, July 29, 2021 - link

    I can understand Lenovo locking their OEM CPUs to their motherboards as a packaged deal.

    If I read this correctly, ALL Threadripper Pro CPUs will be locked to Lenovo boards forever if they're ever installed in a Lenovo P620 motherboard.

    That's a huge load of crap. No one is going to know this except for Anandtech readers and whatever poor schmuck that gets screwed a few years down the road.

    Hopefully someone will figure out how to defeat that OEM lock. That is just poor judgment on AMD's part.

    To clarify, I have zero problem with the CPU being locked to the Lenovo system *as it is sold*, but it is 100% unacceptable to lock a LATER installed CPU to the motherboard as well.
  • GregoriaEgan - Sunday, December 12, 2021 - link

    So I was thinking about getting the low end 3955WX Threadripper Pro, but as I understand it the 24 Core "old" Threadripper are better because of more chiplets (and cores). I just saw a mentioning of TH "Pro" functions, I'm unsure, are there more functions of the Pro-line, like remote administration or the like?
    I doubt that I would ever get more than 256 GB RAM and I'm not sure that I need the extra PCI-e lanes, but are there any other more features that just don't exist in the non-pro TH, that you only get with the Pro-line?