CPU Tests: Microbenchmarks

Core-to-Core Latency

As core counts in modern CPUs grow, the time to access one core from another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to the nearest core compared to the furthest core. This holds especially true in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and while we know there are competing tests out there, we feel ours most accurately reflects how quickly an access between two cores can happen.

On all our Threadripper Pro CPUs, we saw:

  • a thread-to-thread latency of 7 ns,
  • a core-to-core latency within the same CCX of 17-18 ns,
  • a core-to-core latency between different CCXes that scales from 80 ns with no IO die hops to 113 ns with 3 IO die hops.

Here we can distinguish how long it takes for threads to ping back and forth between cores that are a different number of hops apart across the IO die.
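To illustrate how a latency matrix like this reveals the core layout, here is a minimal Python sketch. It is not our in-house test: the thresholds, the function names, and the four-CPU example matrix are all assumptions, chosen only to separate the three tiers quoted above.

```python
# Hypothetical cut-offs in nanoseconds. They are assumptions picked to sit
# between the three tiers quoted in the text, not values from the real test.
SMT_SIBLING_MAX_NS = 10.0  # thread-to-thread on one core was about 7 ns
SAME_CCX_MAX_NS = 40.0     # same-CCX pairs were 17-18 ns, cross-CCX 80 ns and up


def classify_pair(latency_ns):
    """Map one measured core-to-core latency to a topology tier."""
    if latency_ns <= 0:
        raise ValueError("latency must be positive")
    if latency_ns < SMT_SIBLING_MAX_NS:
        return "same core (SMT sibling)"
    if latency_ns < SAME_CCX_MAX_NS:
        return "same CCX"
    return "different CCX (via IO die)"


def group_by_ccx(latency_ns):
    """Group logical CPUs whose latency to a group's first member is below
    the same-CCX cut-off. latency_ns is a square matrix (list of lists)."""
    groups = []
    for cpu in range(len(latency_ns)):
        for group in groups:
            if latency_ns[cpu][group[0]] < SAME_CCX_MAX_NS:
                group.append(cpu)
                break
        else:
            # No existing group was close enough, so this CPU starts a new one.
            groups.append([cpu])
    return groups


# Made-up example: CPUs 0 and 1 share a core, CPU 2 shares their CCX,
# and CPU 3 sits in another CCX with no IO die hops.
EXAMPLE_MATRIX = [
    [0.0, 7.0, 17.0, 80.0],
    [7.0, 0.0, 18.0, 80.0],
    [17.0, 18.0, 0.0, 80.0],
    [80.0, 80.0, 80.0, 0.0],
]

print(group_by_ccx(EXAMPLE_MATRIX))  # [[0, 1, 2], [3]]
```

Comparing each CPU only against the first member of a group keeps the sketch short; it relies on the gap between the same-CCX and cross-CCX tiers being as wide as it is in the numbers above.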

A y-Cruncher Sprint

The y-cruncher website has a large amount of benchmark data showing how different CPUs perform when calculating specific numbers of digits of pi. Below these there are a few CPUs where it shows the time to compute moving from 25 million digits to 50 million, 100 million, 250 million, and all the way up to 10 billion, to showcase how the performance scales with digits (assuming everything is in memory). This range of results, from 25 million to 250 billion, is something I’ve dubbed a ‘sprint’.

I have written some code to perform a sprint on every CPU we test. It detects the DRAM, works out the largest digit count that can be calculated with that amount of memory, and works up from 25 million digits. For the tests that go up to ~25 billion digits, it only adds an extra 15 minutes to the suite for an 8-core Ryzen CPU.

With this test, we can see the effect of increasing memory requirements on the workload and the scaling factor for a workload such as this. We're plotting millions of digits calculated per second.
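As a rough sketch of how such a sprint could be planned and scored, the Python below builds the ladder of digit counts and converts a run time into the plotted metric. It is not the script used for these results. The continuation of the ladder past 250 million (500 million, 1 billion, 2.5 billion and so on) is an assumption, and `bytes_per_digit` is a hypothetical parameter that would have to come from y-cruncher's own memory requirements.

```python
def sprint_sizes(max_digits):
    """Return the ladder 25M, 50M, 100M, 250M, ... up to max_digits.

    Assumes the ladder keeps repeating the 25/50/100 steps at ten times
    the previous scale.
    """
    sizes = []
    scale = 1_000_000
    while True:
        for step in (25, 50, 100):
            digits = step * scale
            if digits > max_digits:
                return sizes
            sizes.append(digits)
        scale *= 10


def max_digits_for_memory(dram_bytes, bytes_per_digit):
    """Largest digit count that fits in memory.

    bytes_per_digit is a hypothetical input, not a y-cruncher figure.
    """
    if bytes_per_digit <= 0:
        raise ValueError("bytes_per_digit must be positive")
    return int(dram_bytes / bytes_per_digit)


def mdigits_per_second(digits, seconds):
    """Millions of digits calculated per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return digits / 1e6 / seconds


# Placeholder inputs for illustration only: 64 GiB of DRAM and 8 bytes per digit.
limit = max_digits_for_memory(64 * 2**30, 8.0)
print(sprint_sizes(limit))
```

Each size in the returned list would then be run in turn, and `mdigits_per_second` applied to the measured time.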

The 64C/64T processor obtains the peak efficiency here, although as more digits are calculated, the memory requirements come into play.


98 Comments

  • Qasar - Tuesday, July 27, 2021

    sorry but that is not HEDT, workstation, sure. the last HEDT platform intel had was x299 and socket 2066
    socket 3647, is there server/workstation platform, but hey if you consider a US $3k cpu to be a HEDT processor, then that's your choice :-)
  • mode_13h - Monday, July 26, 2021

    > at least amd HAS a HEDT cpu, when was the last one from intel ?

    Intel is doing an Ice Lake workstation platform. Not sure if HEDT will follow.
  • mode_13h - Sunday, July 25, 2021

    > 7 days to August

    The rumor was that it would be *announced* at some point in August. It didn't say when, in August, but the rumored ship date wasn't until sometime in September. But it's just a rumor.
  • croc - Monday, July 26, 2021

    MY point is that the BIOS updates usually happen about a month before the product announcement. Not to mention some benchmarks and other 'leaked' information. Y'know,,, Hype generation, direct from horsey's mouth. August announcement? Don't think so. Chagall? Possible, but would break convention, not that AMD really has any when it comes to code names...
  • mode_13h - Monday, July 26, 2021

    > BIOS updates usually happen about a month before the product announcement.

    Before announcement or ship?

    > Hype generation

    Seems to me that it's not necessary, in this case. AMD will already have more demand than it can satisfy.
  • Qasar - Tuesday, July 27, 2021

    " Not to mention some benchmarks and other 'leaked' information "
    considering how few leaks and info have come out about amd's products as of late until quite close to release, im not surprised there is little info out there about zen 3 TR

    " Hype generation "
    which amd doesnt need all that much, their products are more interesting then intels right now, intel needs the hype, not amd ;-)
  • Mikewind Dale - Monday, July 19, 2021

    Given how much trouble Intel has had with their new process - even though Intel used to be the industry leader in fabrication - I suspect that if AMD had kept fabrication in-house, they'd be in serious trouble right.

    GlobalFoundries has also had trouble moving to a new, cutting-edge process. At the moment, they'd decided to stay one process behind TSMC, and cater to the portion of the market that doesn't need a cutting-edge process.
  • anakhizer - Monday, July 19, 2021

    The article is excellent! However, the ordering of data in the tables is absolutely terrible.

    Please figure out how to sort the tables in a more logical manner like performance. As the tables are they are pretty much unreadable if you want to get the performance numbers with a glance.
  • kensiko - Monday, July 19, 2021

    Performance wise, looking at all those graphs, the 5950x is such a great deal ! I really love my 5950x. I did love my TR1950x, it was not getting as hot at my 5950x. But no way I'm going back to Threadripper for just a home PC. Event at work I don't think we would get a Threadripper again, the Epyc gives what we want even if the frequency is a bit lower.
  • mode_13h - Tuesday, July 20, 2021

    Threadripper still makes a lot of sense for people who have scalable workloads (or run lots of VMs) and who don't need the full memory bandwidth or PCIe lanes of EPYC or TR Pro.

    I personally wouldn't buy one, but they're popular for deep learning workstations and Linus Torvalds famously has one.
