CPU Tests: Microbenchmarks

Core-to-Core Latency

As core counts in modern CPUs grow, we are reaching a point where the time to access one core from another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to the nearest core compared to the furthest core. This holds especially true in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours gives the most accurate picture of how quickly an access between two cores can happen.

Broadwell is a familiar design, with all four cores connected in a ring-bus topology.
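To make the idea of "groups of cores" concrete, here is a minimal sketch of how a core-to-core latency matrix could be turned into groups. It is not Andrei's test. The function name `group_cores`, the threshold, and the latency values are all assumptions made up for illustration, not measurements.

```python
# Illustrative only: the latency values below are made up, not measured.
def group_cores(latency_ns, threshold_ns):
    """Group cores so that any pair faster than threshold_ns shares a group."""
    n = len(latency_ns)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group's representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in range(n):
        for b in range(a + 1, n):
            if latency_ns[a][b] < threshold_ns:
                parent[find(a)] = find(b)

    groups = {}
    for core in range(n):
        groups.setdefault(find(core), []).append(core)
    return sorted(groups.values())


# Hypothetical 4-core part made of two 2-core clusters.
example = [
    [0, 25, 80, 80],
    [25, 0, 80, 80],
    [80, 80, 0, 25],
    [80, 80, 25, 0],
]
print(group_cores(example, threshold_ns=50))
```

With a matrix like this, a single ring where every pair has similar latency would come out as one group, while a multi-die part would split into one group per die.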

Cache-to-DRAM Latency

This is another in-house test built by Andrei, which showcases the access latency at all the points in the cache hierarchy for a single core. We start at 2 KiB, and probe the latency all the way through to 256 MB, which for most CPUs sits inside the DRAM (before you start saying 64-core TR has 256 MB of L3, each core can only access 16 MB of it, so at 20 MB you are in DRAM).

Part of this test helps us understand the range of latencies for accessing a given level of cache, but also the transition between the cache levels gives insight into how different parts of the cache microarchitecture work, such as TLBs. As CPU microarchitects look at interesting and novel ways to design caches upon caches inside caches, this basic test proves to be very valuable.
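A common way to measure this kind of latency is pointer chasing, where each access depends on the result of the previous one. The sketch below shows the general idea only and is not the in-house tool. The names `build_chain` and `chase` are hypothetical. In Python, interpreter overhead dominates the timings, so a real test would be written in native code with working sets sized in bytes.

```python
import random
import time


def build_chain(n_slots, seed=0):
    """Return a list where chain[i] is the next slot to visit.

    Sattolo's algorithm yields a single cycle covering every slot, so a walk
    cannot settle into a short loop that stays in a faster cache level.
    """
    rng = random.Random(seed)
    chain = list(range(n_slots))
    for i in range(n_slots - 1, 0, -1):
        j = rng.randrange(i)  # j < i, which is what makes it one cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps):
    """Walk the chain; each load depends on the result of the previous one."""
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        idx = chain[idx]
    elapsed = time.perf_counter_ns() - start
    return elapsed / steps, idx


for n_slots in (1 << 8, 1 << 12, 1 << 16):
    ns_per_step, _ = chase(build_chain(n_slots), steps=100_000)
    print(f"{n_slots:>6} slots: {ns_per_step:.1f} ns per dependent access")
```

The random single-cycle order is the important design choice: it defeats simple prefetching and guarantees the whole working set is touched before any slot repeats.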

Our data shows a 4-cycle L1, a 12-cycle L2, a 26-50 cycle L3, while the eDRAM has a wide range from 50-150 cycles. This is still quicker than main memory, which goes to 200+ cycles.

Frequency Ramping

Over the past few years, both AMD and Intel have introduced features that shorten the time a CPU takes to move from idle into a high-powered state. This means users get peak performance sooner, but the biggest knock-on effect is on battery life in mobile devices: a system that can turbo up and back down quickly spends as much time as possible in its lowest, most efficient power state.

Intel’s technology is called SpeedShift, although SpeedShift was not enabled until Skylake.

One issue with this technology is that the adjustments in frequency can be so fast that software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency in milliseconds (or seconds), then quick changes will be missed. Not only that, but as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency rate of the whole core.

We wrote an extensive review analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.

We got around the issue by making the frequency probing the workload causing the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how well a system can get to those boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.
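The idea of making the probe itself the workload can be sketched as follows. This is an assumption about the general approach, not the actual Frequency Ramp tool. The function names are hypothetical. It runs identical chunks of work back to back and infers relative speed from how long each chunk takes. In Python, interpreter and scheduler noise will blur the result, so treat it as an illustration only.

```python
import time


def spin(iterations):
    """Fixed amount of integer work; returns a value so it is not dead code."""
    acc = 0
    for i in range(iterations):
        acc += i & 3
    return acc


def ramp_trace(chunks=200, iterations=20_000):
    """Run identical work chunks back to back and time each one.

    Returns (timestamp in ns since start, chunk duration in ns) pairs.
    """
    trace = []
    origin = time.perf_counter_ns()
    for _ in range(chunks):
        t0 = time.perf_counter_ns()
        spin(iterations)
        t1 = time.perf_counter_ns()
        trace.append((t0 - origin, max(t1 - t0, 1)))  # guard against 0 ns
    return trace


def relative_speed(trace):
    """Scale each chunk against the fastest one seen (1.0 = peak observed)."""
    fastest = min(duration for _, duration in trace)
    return [(ts, fastest / duration) for ts, duration in trace]


for ts, speed in relative_speed(ramp_trace())[:5]:
    print(f"{ts / 1e6:8.3f} ms: {speed:.2f} of peak observed speed")
```

Because the work per chunk is constant, a shorter chunk implies a higher clock, and the timestamps show how long after the start of load each speed was reached.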

From an idle frequency of 800 MHz, it takes ~32 ms for Intel to boost to 2.0 GHz, then another ~32 ms to get to 3.7 GHz. We’re essentially looking at 4 frames at 60 Hz to hit those high frequencies.

A y-Cruncher Sprint

The y-cruncher website has a large amount of benchmark data showing how different CPUs perform when calculating specific values of pi. Below these there are a few CPUs where it shows the time to compute moving from 25 million digits to 50 million, 100 million, 250 million, and all the way up to 10 billion, to showcase how the performance scales with digits (assuming everything is in memory). This range of results, from 25 million to 250 billion, is something I’ve dubbed a ‘sprint’.

I have written some code to perform a sprint on every CPU we test. It detects the DRAM, works out the largest value that can be calculated with that amount of memory, and works up from 25 million digits. For the tests that go up to ~25 billion digits, it only adds an extra 15 minutes to the suite for an 8-core Ryzen CPU.
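The selection step might look something like the sketch below. This is not the actual sprint script. The function `sprint_sizes` is hypothetical, and `bytes_per_digit` is a placeholder parameter rather than y-cruncher's real memory requirement, which would need to be taken from y-cruncher itself.

```python
def sprint_sizes(memory_bytes, bytes_per_digit, start_millions=25):
    """List digit counts (in millions) that fit in memory.

    Follows the 25, 50, 100, 250, 500, 1000, ... ladder seen in the results.
    bytes_per_digit is an assumed input, not a value known from y-cruncher.
    """
    max_millions = memory_bytes / bytes_per_digit / 1_000_000
    sizes = []
    decade = 1
    while True:
        for mantissa in (25, 50, 100):
            size = mantissa * decade
            if size > max_millions:
                return sizes
            if size >= start_millions:
                sizes.append(size)
        decade *= 10


# Example with 16 GiB of DRAM and a placeholder of 5 bytes per digit.
print(sprint_sizes(16 * 2**30, bytes_per_digit=5))
```

Passing the memory size in as a parameter keeps the sketch portable; a real script would query the operating system for installed DRAM first.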

With this test, we can see the effect of increasing memory requirements on the workload and the scaling factor for a workload such as this.

  • MT   25m: 1.617s
  • MT   50m: 3.639s
  • MT  100m: 8.156s
  • MT  250m: 24.050s
  • MT  500m: 53.525s
  • MT 1000m: 118.651s
  • MT 2500m: 341.330s

The scaling here isn’t linear – moving from 25m to 2.5b, we should see a 100x time increase, but instead it is 211x.
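The scaling factor can be reproduced from the multi-threaded times listed above. The snippet below is a small worked example; the helper names are hypothetical, and the exponent is a crude two-point estimate that assumes time grows as a power of the digit count.

```python
import math

# Multi-threaded results from the list above: digits in millions -> seconds.
MT_TIMES = {
    25: 1.617,
    50: 3.639,
    100: 8.156,
    250: 24.050,
    500: 53.525,
    1000: 118.651,
    2500: 341.330,
}


def scaling_rows(times):
    """Return (digits, size ratio, time ratio) relative to the smallest run."""
    base_digits = min(times)
    base_time = times[base_digits]
    return [(d, d / base_digits, times[d] / base_time) for d in sorted(times)]


def scaling_exponent(times):
    """Exponent k in time ~ digits**k, from the first and last points only."""
    first, last = min(times), max(times)
    return math.log(times[last] / times[first]) / math.log(last / first)


for digits, size_ratio, time_ratio in scaling_rows(MT_TIMES):
    print(f"{digits:>5}m: {size_ratio:>5.0f}x digits -> {time_ratio:>6.1f}x time")
print(f"end-to-end exponent: {scaling_exponent(MT_TIMES):.2f}")
```

An exponent above 1 means the time grows faster than the digit count, which is what the 100x versus 211x comparison shows.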


120 Comments


  • Billy Tallis - Wednesday, November 4, 2020

    Ian already said he tests at JEDEC speeds, which includes the latency timings. Using modules that are capable of faster timings does not prevent running them at standard timings.
  • Quantumz0d - Tuesday, November 3, 2020

    Don't even bother Ian with these people.
  • Nictron - Wednesday, November 4, 2020

    I appreciate the review and context over a period of time. Having a baseline comparison is important and it is up to us the reader to determine the optimal environment we would like to invest in. As soon as we do the price starts to skyrocket and comparisons are difficult.

    Reviews like this also show that a well thought out ecosystem can deliver great value. Companies are here to make money and I appreciate reviewers that provide baseline compatible testing over time for us to make informed decisions.

    Thank you and kind regards,
  • GeoffreyA - Tuesday, November 3, 2020

    Thanks, Ian. I thoroughly enjoyed the article and the historical perspective especially. And the technical detail: no other site can come close.
  • eastcoast_pete - Tuesday, November 3, 2020

    Ian, could you comment on the current state of the art of EDRAM? How fast can it be, how low can the latency go? Depending on those parameters and difficulty of manufacturing, there might be a number of uses that make sense.
    One where it could is to possibly allow Xe graphics to use cheaper and lower power LPDDR-4 or -5 RAM without taking a large performance hit vs. GDDR6. 128 or 256 MB EDRAM cache might just do that, and still keep costs lower. Pure speculation, of course.
  • DARK_BG - Tuesday, November 3, 2020

    Hi , what I'm wondering is where the 30% gap between the 5770C and 4790K in Games came from , compared to your original review and all other reviews out there of 5770C. Since I'm with a Z97 platform and 4.0GHz Xeon , moving to 4770k or 4790K doesn't make any sense given their second hand prices but 5770C on this review makes alot of sense.

    So is it the OS,the drivers , some BIOS settings or on the older reviews the systems were just GPU limited failing to explore the CPU performance?
  • jpnex - Friday, January 8, 2021

    Lol, no, the I7 5775c is just stronger than an i7 4790k, this is a known fact. Other benchmarks show the same thing. Old benchmarks don't show It because back then people didn't know that deactivating the iGPU would give a performance boost.
  • DARK_BG - Wednesday, July 20, 2022

    I forgot back then to reply back , based on this review I've sourced 5775C (for a little less than 100$ this days going for 140-150$) coupled with Asus Z97 Pro and after some tweaking (CPU at 4.1GHz , eDRAM at 2000MHz and some other minor stuff that I already forgot) the difference compared to the Xeon 4.0GHz in games was mind blowing.Later I was able to source and 32GB Corsair Dominator DDR3 2400MHz CL10 just for fun to make it top spec config. :)

    It is a very capable machine but this days I'll swap it for Ryzen 5800X3D to get the final train on the fastest Windows 7 capable gaming system.Yeah i know it is OLD OS but everything I need runs flawessly for more than a decade with only reainstall 7 years ago due to an SSD failure. It is my only personal Intel System for the past 22 years since it was the for a first time the best price performance second hand platform for a moment , all the rest were AMD based and I keep them all in working condition.

    BTW I was able to run Windows XP 64bit on the Z97 platform , I just need to swap the GTX 1070 for GTX 980/980 Ti to be fully functional everything else runs like a charm under XP i was able to hack the driver to install as an GTX 960 so I have a 2D hardware acceleration under XP on GTX 1070 since nvidia havent changed anything in regard to 2D compared to the previous generation
  • dew111 - Tuesday, November 3, 2020

    Rocket lake should have been the comet lake processor with eDRAM. Instead they'll be lucky to beat comet lake at all.
  • erotomania - Tuesday, November 3, 2020

    Thanks, Ian. I enjoyed this article from a NUC8i7BEH that has 128MB of coffee-flavored eDRAM. Also, thanks Ganesh for the recent reminder that Bean > Frost.
