CPU Tests: Microbenchmarks

Core-to-Core Latency

As core counts in modern CPUs grow, the time to access one core from another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to the nearest core than to the furthest core. This is especially true in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable latency when accessing another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and the core-to-core latency differed depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours is the most accurate measure of how quickly an access between two cores can happen.
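To illustrate how a core-to-core latency matrix reveals groups of cores, here is a minimal sketch that clusters cores whose pairwise latency falls at or below a cutoff. It is not the in-house test itself: the function name `group_cores`, the 40 ns threshold, and the 4-core matrix are all hypothetical values chosen for illustration, not measurements.

```python
def group_cores(latency_ns, threshold_ns):
    """Group cores linked by pairwise latencies at or below threshold_ns."""
    n = len(latency_ns)
    parent = list(range(n))  # union-find forest, one tree per group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in range(n):
        for b in range(a + 1, n):
            if latency_ns[a][b] <= threshold_ns:
                parent[find(b)] = find(a)  # merge the two groups

    groups = {}
    for core in range(n):
        groups.setdefault(find(core), []).append(core)
    return sorted(groups.values())


# Hypothetical matrix in ns (NOT measured data): cores 0-1 sit on one die,
# cores 2-3 on another, so cross-die accesses are slower.
matrix = [
    [0, 25, 90, 90],
    [25, 0, 90, 90],
    [90, 90, 0, 25],
    [90, 90, 25, 0],
]
print(group_cores(matrix, threshold_ns=40))  # [[0, 1], [2, 3]]
```

Raising the threshold above the cross-die latency merges everything into one group, which is why the cutoff has to be chosen between the fast and slow clusters seen in the real data.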

All three CPUs exhibit the same behaviour - one core seems to be given high priority, while the rest are not.

Frequency Ramping

Over the past few years, both AMD and Intel have introduced features that shorten the time a CPU takes to move from idle into a high-power state. This means users get peak performance sooner, but the biggest knock-on effect is on battery life in mobile devices, especially if a system can turbo up and turbo down quickly, ensuring that it stays in the lowest and most efficient power state for as long as possible.

Intel’s technology is called SpeedShift, although SpeedShift was not enabled until Skylake.

One issue with this technology is that the adjustments in frequency can sometimes be so fast that software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency every few milliseconds (or seconds), then quick changes will be missed. Not only that, but as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency of the whole core.
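The sampling problem can be shown with a small simulation. The sketch below polls a made-up clock trace at two different intervals. The trace is an assumption for illustration only: it idles at 800 MHz and has a single 200 µs burst to 4000 MHz, and the names `freq_mhz` and `observed_peak` are hypothetical.

```python
def freq_mhz(t_us):
    """Hypothetical clock trace: 800 MHz idle, one 200 us burst to 4000 MHz."""
    return 4000 if 1300 <= t_us < 1500 else 800


def observed_peak(sample_interval_us, window_us=5000):
    """Highest frequency seen when polling the trace at a fixed interval."""
    return max(freq_mhz(t) for t in range(0, window_us, sample_interval_us))


print(observed_peak(1000))  # polls at 0, 1000, 2000, ... us -> 800
print(observed_peak(10))    # polls every 10 us            -> 4000
```

The millisecond poller never lands inside the burst, so it reports that the CPU never left idle, while the 10 µs poller sees the full peak.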

We wrote an extensive review analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.

We got around the issue by making the frequency probe itself the workload that triggers the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how quickly a system reaches its boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.
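The idea of a probe that is also the load can be sketched as follows. This is not the Frequency Ramp tool, whose implementation is not described here. It is a rough illustration that times fixed-size chunks of busy work and treats throughput as a relative proxy for clock speed. The names `probe` and `ramp_time_us`, the chunk size, and the 95% cutoff are all assumptions, and Python's interpreter overhead makes this far coarser than a native tool would be.

```python
import time


def probe(duration_s=0.05, chunk_iters=2000):
    """Run fixed-size chunks of busy work and timestamp each one.

    Returns (elapsed_us, iters_per_us) tuples. The rate is only a relative
    proxy for clock speed, not a frequency in MHz.
    """
    samples = []
    start = time.perf_counter_ns()
    deadline = start + int(duration_s * 1e9)
    t0 = start
    while t0 < deadline:
        x = 0
        for i in range(chunk_iters):  # the timed work is also the load
            x += i
        t1 = time.perf_counter_ns()
        dt_ns = max(t1 - t0, 1)  # guard against a zero-length interval
        samples.append(((t1 - start) / 1000.0, chunk_iters * 1000.0 / dt_ns))
        t0 = t1
    return samples


def ramp_time_us(samples, fraction=0.95):
    """First timestamp at which throughput reaches `fraction` of the peak."""
    if not samples:
        raise ValueError("no samples collected")
    peak = max(rate for _, rate in samples)
    for elapsed_us, rate in samples:
        if rate >= fraction * peak:
            return elapsed_us


if __name__ == "__main__":
    trace = probe()
    print(f"{len(trace)} samples, ~{ramp_time_us(trace):.0f} us to reach 95% of peak")
```

Because each chunk is short, the gap between samples stays small, and because the loop never sleeps, the CPU sees continuous load from the first timestamp onward.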

From an idle frequency of 800 MHz, it takes ~16 ms for Intel to boost to the top frequency for both the i9 and the i5. The i7 was most of the way there, but took an additional 10 ms or so.


279 Comments


  • GeoffreyA - Tuesday, March 30, 2021 - link

    It could be due to x264 limiting the number of threads because when vertical resolution divided by threads drops below a certain threshold---I think round about 30 or 40---quality begins to suffer.
  • GeoffreyA - Wednesday, March 31, 2021 - link

    I tested this now on FFmpeg but it should be the same on Handbrake because the x264/5 libraries are doing the actual encoding.

    I only have a 4C/4T CPU but used the "-threads" switch to request more. On x264, regardless of resolution, once more than 16 threads are asked for, it logs a warning that it's not recommended but goes ahead and uses the requested count, up to 128. I assume that running at default settings, like AT is probably doing with Handbrake, will let x264 cut off at 16 by itself. If someone could confirm this with a 32-thread CPU, that would be nice. As for x265, I gave it a try as well and the encoder refuses to go on if more than 16 threads are requested, saying the range must be between 0 and X265_MAX_FRAME_THREADS.

    In short, I reckon both these codecs are cutting off at 16 threads on default settings. If Ian or someone else could test how much extra is gained by manually putting in the count on a 32T CPU, that would be interesting.
  • scott_htpc - Tuesday, March 30, 2021 - link

    Splat. Backporting doesn't really work & dead-end platform.

    What I'd really like to read is a detailed narrative of Intel's blunders over the last 5-10 years. To me, it probably makes a case study in failed leadership & hubris, but I would really like to read an authoritative, detailed account. I'm curious why the risks of their decisions were not enough to dissuade them to take a better path forward.
  • Prosthetic Head - Tuesday, March 30, 2021 - link

    Yes, some sort of post mortem on Intel development over the last few years would be interesting. Once they abandoned the Pentium 4 madness, they did a good job with Core, Core2 and then the early stages of the 'i' series. Because AMD were by that point down their own dead end, they had essentially no competition for about a decade. The tempting easy explanation is that as a de facto monopoly for desktop and laptop CPUs, they only innovated enough to keep the upgrade cycle ticking over, then when AMD made a rapid comeback they got caught with their pants down and some genuine technical difficulties in fab tech.... But the reality could be a lot more complex and interesting than that.
  • Hifihedgehog - Tuesday, March 30, 2021 - link

    > But the reality could be a lot more complex and interesting than that.

    The reality is Conroe was a once-in-a-lifetime IPC improvement, literally 90% better (or nearly double the performance!) clock-for-clock than the ill-fated Pentium 4 (see here: https://www.reddit.com/r/intel/comments/m7ocxj/pen...). They are not going to get that again unless Gelsinger clones himself across Intel's entire leadership team. Now, they may get something Zen-like in the ~50% range, but nothing Conroe-like unless ALL the stars align after a decade of complacency.
  • Hifihedgehog - Tuesday, March 30, 2021 - link

    https://www.reddit.com/r/intel/comments/m7ocxj/pen...
  • 29a - Tuesday, March 30, 2021 - link

    Keep in mind that P4 was a piece of shit built for marketing high clock speeds and was easily beaten by Athlon 64 running 1Ghz slower so getting that much IPC wasn't as hard as usual.
  • GeoffreyA - Wednesday, March 31, 2021 - link

    "Keep in mind that P4 was a piece of"

    Not to defend the P4, but Northwood wasn't half bad in the Athlon XP's time, beating it quite a lot. It was Prescott that mucked it all up.
  • TheinsanegamerN - Wednesday, March 31, 2021 - link

    TBF, the only reason it wasnt half bad is AMD's willingness to just abandon XP. I mean, only 2.23 GHz? 3 GHz OCs were not hard to do with their mobile lineup, and those obliterated anything intel would have until conroe. IF they had released 2.4, 2.6, and 2.8 GHz athlon XPs intel would have been losing every benchmark against them.
  • GeoffreyA - Friday, April 2, 2021 - link

    Oh yes, the XP had the higher IPC and would have given Intel a sound drubbing if its clocks were only higher. Thankfully, the Athlon 64 came and turned the tables round. I remember in those days my heart was set on the 3200+ Barton but I ended up with a K8 budget system of sorts.
