Core-to-Core Latency: Issues with the Core i5

For Intel’s Comet Lake 10th Gen Core parts, the company is creating two different silicon dies for most of the processor lines: one with 10 cores and one with 6 cores. In order to create the 8 and 4 core parts, different cores will be disabled. This isn’t anything new, and has happened for the best part of a decade across both AMD and Intel in order to minimize the number of new silicon designs, and also to build a bit of redundancy into the silicon and enable most of the wafer to be sold even if defects are found.

For Comet Lake, Intel is splitting the silicon such that all 10-core Core i9 and 8-core Core i7 processors are built from the 10c die, as is perhaps expected, and the 6-core Core i5 and 4-core Core i3 processors are built from the 6c die. The only exceptions to this rule are the Core i5-10600K/KF processors, which will use the 10-core die with four cores disabled, giving six cores total. This leads to a potential issue.

Imagine a 10c die as two columns of five cores, capped on each end by the System Agent (DRAM, IO) and Graphics, creating a ring of 12 stops that data has to go through to reach other parts of the silicon. Let us start simple, and imagine disabling two cores to make an 8c processor. It is fairly straightforward to guess which arrangements would give the best and the worst core-to-core latency.

The other worst 8c case might be to keep Core 0 enabled, and then disable Core 1 and Core 2, leaving Core 3-9 enabled.

We can then disable four cores from the original 10 core setup. It can be any four cores, so imagine another worst case and a best case scenario.

On the left we have the absolute best case arrangement that minimizes all core-to-core latency. In the middle is the absolute worst case: any access to the first core in the top left has much higher latency, because every other core has further to travel to reach it. On the right is an unbalanced design, but perhaps one with a lower variance in latency.
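To make the best/worst case reasoning a little more concrete, here is a minimal Python sketch of a hop-count model for the 12-stop ring. Several things here are our assumptions rather than anything Intel has confirmed: the stop ordering (System Agent, cores 0-4, Graphics, cores 5-9), the idea that traffic can travel either way around the ring, and the idea that a disabled core still occupies its ring stop. The function names are hypothetical, and hop counts are only a rough proxy for real latency.

```python
from itertools import combinations

RING_STOPS = 12  # 10 cores + System Agent + Graphics, as described above


def ring_positions():
    """Assumed layout: SA at stop 0, cores 0-4 at stops 1-5,
    Graphics at stop 6, cores 5-9 at stops 7-11."""
    positions = {}
    for core in range(5):
        positions[core] = core + 1
    for core in range(5, 10):
        positions[core] = core + 2
    return positions


def hops(stop_a, stop_b, stops=RING_STOPS):
    """Shortest distance between two stops, assuming traffic can
    go either way around the ring."""
    distance = abs(stop_a - stop_b) % stops
    return min(distance, stops - distance)


def hop_stats(disabled):
    """Return (max, mean) hop count over all pairs of enabled cores.
    Assumes disabled cores still take up a stop on the ring."""
    positions = ring_positions()
    enabled = [core for core in positions if core not in disabled]
    distances = [hops(positions[a], positions[b])
                 for a, b in combinations(enabled, 2)]
    return max(distances), sum(distances) / len(distances)
```

Calling `hop_stats` with different sets of four disabled cores gives a quick way to compare candidate 6c arrangements on both the longest trip and the average trip, which is the trade-off between the middle and right-hand layouts described above.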

When Intel disables cores to create these 8c and 6c designs, the company has in the past promised that any disabling would leave the rest of the processor ‘with similar performance targets’, and that while different individual units might have different cores disabled, they should all fall within a reasonable spectrum.

So let us start with our Core i5-10600K core-to-core latency chart.

Cores next door seem well enough, then as we make longer trips around the ring, it takes about 1 nanosecond longer for each stop. Until those last two cores that is, where we get a sudden 4 nanosecond jump. It’s clear that the processor we have here as a whole is lopsided in its core-to-core latency and if any thread gets put onto those two cores at the end, there might be some questionable performance.
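As a rough illustration of how a jump like this can be picked out of a row of results, the sketch below flags any step between neighbouring entries that exceeds the usual per-stop increase by more than a chosen margin. The function name, the 1 ns expected step and the 2 ns margin are our own assumptions for illustration, not part of the test suite.

```python
def find_latency_jumps(latencies_ns, expected_step_ns=1.0, margin_ns=2.0):
    """Return the indices where latency rises by more than
    expected_step_ns + margin_ns over the previous entry.

    latencies_ns: one row of core-to-core latencies, sorted ascending.
    """
    jumps = []
    for i in range(1, len(latencies_ns)):
        step = latencies_ns[i] - latencies_ns[i - 1]
        if step - expected_step_ns > margin_ns:
            jumps.append(i)
    return jumps
```

Feeding in one sorted row of the chart returns the positions at which the latency breaks away from the roughly 1 ns per stop trend.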

Now it’s very easy to perhaps get a bit heated with this result. Unfortunately we don’t have an ‘ideal’ 6c design to compare it against, which makes performance comparisons a bit tricky. But it does mean that there is likely to be variation between different Core i5-10600K samples.

The effect still occurs on the 8-core Core i7-10700K, although it is less pronounced.

There’s still a sizeable jump between the 3 cores at the end compared to the other five cores. One of the unfortunate downsides with the test is that the enumeration of the cores won’t correspond to any physical location, so it might be difficult to narrow down the exact layout of the chip.

Moving up to the big 10-core processor yields an interesting result:

So while we should have a steadily increasing latency here, there’s still that 3-4 nanosecond jump with two of the cores. This points to a different but compounding issue.

Our best guess is that these two extra cores are not optimized for this sort of ring design in Comet Lake. For its Core lineup of processors, Intel has been using a ring bus as the principal interconnect between its cores for over a decade, and we typically see it on four and six core processors. Intel also used a ring bus in its enterprise processors for many years, with chips up to 24 cores, however those designs used dual-ring buses in order to keep core-to-core latency down. Intel has put up to 12 cores on a single ring, though broadly speaking the company seems to prefer keeping designs to 8 or fewer cores per ring.

If Intel could do it for those enterprise chips, then why not for the 10 core Comet Lake designs here? We suspect it is because the original ring design that went into consumer Skylake processors, while it was for four cores, doesn’t scale linearly as the core count increases. There is a noticeable increase in the latency as we move from four to six and six to eight core silicon designs, but a ten-core ring is just a step too far, and additional repeaters are required in the ring in order to support the larger size.

Another possible explanation is that these cores have an additional function on that section of the ring, such as sharing duties with IO parts of the core or the PCIe lanes, and as a result extra cycles are required for any additional cacheline transfers.

We are realistically reaching the limits of any ring-like interconnect for Intel’s Skylake consumer line processors here. If Intel were to create a 12-core version of Skylake consumer for a future processor, a single ring interconnect would not be able to handle it without an additional latency penalty, and that penalty might be larger if the ring isn't tuned for the size. There's also a bandwidth issue, as the same ring and memory have to support more cores. If Intel continues down this path, it will either have to use dual rings, use a different interconnect paradigm altogether (mesh, chiplet), or move to a new microarchitecture and interconnect design completely.

Frequency Ramps

We also performed our frequency ramps on all three processors. Nothing much to say here – all three CPUs went from 800 MHz idle to peak frequency in 16 milliseconds, or one frame at 60 Hz. We saw the peak turbo speeds on all the parts.
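For reference, the "one frame at 60 Hz" comparison is simple arithmetic, sketched below in Python. The helper names are ours and purely illustrative.

```python
def frame_time_ms(refresh_hz):
    """Duration of a single frame, in milliseconds, at a given refresh rate."""
    return 1000.0 / refresh_hz


def frames_elapsed(duration_ms, refresh_hz):
    """How many frames a given duration spans at that refresh rate."""
    return duration_ms / frame_time_ms(refresh_hz)
```

A frame at 60 Hz lasts about 16.7 ms, so a 16 ms ramp fits just inside a single frame.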

220 Comments

  • ByteMag - Wednesday, May 20, 2020 - link

    I'm wondering why the 3300X wasn't in the DigiCortex benchmark? This $120 dollar 4c/8t banger lays waste to the selected lineup. Or is it too much of a foreshadowing of how Zen 3 may perform? I guess benchmarks can sometimes be like a box of chocolates.
  • ozzuneoj86 - Wednesday, May 20, 2020 - link

    Just a request, but can you guys consider renaming the "IGP" quality level something different? The site has been doing it for a while and it kind of seems like they may not even know why at this point. Just change it to "Lowest" or something. Listing "IGP" as a test, when running a 2080 Ti on a CPU that doesn't have integrated graphics is extremely confusing to readers, to say the least.

    Also, I know the main reason for not changing testing methods is so that comparisons can be done (and charts can be made) without having to test all of the other hardware configs, but I have one small request for the next suite of tests (I'm sure they'll be revised soon). I'd request that testing levels for CPU benchmarks should be:

    Low Settings at 720P
    Max Settings at 1080P
    Max Settings at 1440P
    Max Settings at 4K

    (Maybe a High Settings at 1080P thrown in for games where the CPU load is greatly affected by graphics settings)

    Drop 8K testing unless we're dealing with flagship GPU releases. It just seems like 8K has very little bearing on what people are realistically going to need to know. A benchmark that shows a range from 6fps for the slowest to 9fps for the fastest is completely pointless, especially for CPU testing. In the future, replacing that with a more common or more requested resolution would surely be more useful to your readers.

    Often times the visual settings in games do have a significant impact on CPU load, so tying the graphical settings to the resolution for each benchmark really muddies the waters. Why not just assume worst case scenario performance (max settings) for each resolution and go from there? Obviously anti-aliasing would need to be selected based on the game and resolution, with the focus being on higher frame rates (maybe no or low AA) for faster paced games and higher fidelity for slower paced games.

    Just my 2 cents. I greatly appreciate the work you guys do and it's nice to see a tech site that is still doing written reviews rather than forcing people to spend half an hour watching a video. Yeah, I'm old school.
  • Spunjji - Tuesday, May 26, 2020 - link

    Agreed 99% with this (especially that last part, all hail the written review) - but I'd personally say it makes more sense for the CPU reviews to be limited to 720p Low, 1080P High and 1440P Max.

    My theory behind that:
    720p Low gives you that entirely academic CPU-limited comparison that some people still seem to love. I don't get it, but w/e.
    1080p High is the kind of setting people with high-refresh-rate monitors are likely to run - having things look good, but not burning frames for near-invisible changes. CPU limiting is likely to be in play at higher frame rates. We can see whether a given CPU will get you all the way to your refresh-rate limit..
    1440p Max *should* take you to GPU-limited territory. Any setting above this ought to be equally limited, so that should cover you for everything, and if a given CPU and/or game doesn't behave that way then it's a point of interest.
  • dickeywang - Wednesday, May 20, 2020 - link

    With more and more cores being added to the CPU, it would've been nice to see some benchmarks under Linux.
  • MDD1963 - Wednesday, May 20, 2020 - link

    Darn near a full 2% gain in FPS in some games! Quite ...uhhh..... impressive! :/
  • MDD1963 - Wednesday, May 20, 2020 - link

    Doing these CPU gaming comparisons at 720P is just as silly as when HardOCP used to include 640x480 CPU scaling...; 1080P is low enough, go medium details if needed.
  • Spunjji - Tuesday, May 26, 2020 - link

    Personally agreed here. It just gives more fodder to the "15% advantage in gaming" trolls.
  • croc - Wednesday, May 20, 2020 - link

    It would be 'nice' if the author could use results from the exact same stack of chips for each test. If the same results cannot be obtained from the same stack, then whittle the stack down to those chips for which the full set of tests can be obtained. I could understand the lack of results on newly added tests...

    For a peer review exercise it would be imperative, and here at Anandtech I am sure that there are many peers....
  • 69369369 - Thursday, May 21, 2020 - link

    Overheating and very high power bills happens with Intel.
  • Atom2 - Thursday, May 21, 2020 - link

    Dear Ian, You must be the only person on the planet that goes to such lengths not to use AVX, that you even compare Intel's AVX512 instructions to a GPU based OpenCL, just to have a reason not to use it. Consequently you only have AMD win the synthetic benchmarks, but all real world math is held by Intel. Additionally, all those synthetics, which are "not" compiled with Intel C++. Forget it... GCC is only used by Universities. The level of bias towards AMD is becoming surreal.
