Core-to-Core Latency: Issues with the Core i5

For Intel’s Comet Lake 10th Gen Core parts, the company is creating two different silicon dies for most of the processor lines: one with 10 cores and one with 6 cores. In order to create the 8 and 4 core parts, different cores will be disabled. This isn’t anything new, and has happened for the best part of a decade across both AMD and Intel in order to minimize the number of new silicon designs, and also to build a bit of redundancy into the silicon and enable most of the wafer to be sold even if defects are found.

For Comet Lake, Intel is splitting the silicon such that all 10-core Core i9 and 8-core Core i7 processors are built from the 10c die, as is perhaps expected, and the 6-core Core i5 and 4-core Core i3 processors are built from the 6c die. The only exceptions to this rule are the Core i5-10600K/KF processors, which will use the 10-core die with four cores disabled, giving six cores total. This leads to a potential issue.

So imagine a 10c die as two columns of five cores, capped on each end by the System Agent (DRAM, IO) and Graphics, creating a ring of 12 stops that data has to go through to reach other parts of the silicon. Let us start simple, and imagine disabling two cores to make an 8c processor. It is fairly straightforward to guess which arrangements would give the best and the worst core-to-core latency.

The other worst 8c case might be to keep Core 0 enabled, and then disable Core 1 and Core 2, leaving Core 3-9 enabled.

We can then disable four cores from the original 10 core setup. It can be any four cores, so imagine another worst case and a best case scenario.

On the left we have the absolute best case arrangement that minimizes all core-to-core latency. In the middle is the absolute worst case, where any communication with the first core in the top left has much higher latency because data has further to travel from every other core. On the right is an unbalanced design, but perhaps with a lower variance in latency.
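To make the best and worst cases more concrete, the ring can be modelled as 12 numbered stops, with every possible set of enabled cores scored by hop distance. This is only a rough sketch in Python, not Intel's method: the stop numbering, the function names, and the assumption that a disabled core still occupies a ring stop that data passes through are ours, and hop count is only a stand-in for real latency.

```python
from itertools import combinations

RING_STOPS = 12
# Assumed numbering (ours, for illustration): stop 0 is the System Agent,
# stops 1-5 are one column of cores, stop 6 is Graphics, and
# stops 7-11 are the other column of cores coming back around.
CORE_STOPS = [1, 2, 3, 4, 5, 7, 8, 9, 10, 11]


def hops(a, b):
    """Shortest number of ring stops between two positions on the ring."""
    d = abs(a - b)
    return min(d, RING_STOPS - d)


def layout_stats(enabled):
    """Return (mean, worst) hop distance over all pairs of enabled cores."""
    dists = [hops(a, b) for a, b in combinations(enabled, 2)]
    return sum(dists) / len(dists), max(dists)


def rank_layouts(n_enabled):
    """Score every way of keeping n_enabled cores, best mean distance first."""
    results = []
    for enabled in combinations(CORE_STOPS, n_enabled):
        mean, worst = layout_stats(enabled)
        results.append((mean, worst, enabled))
    results.sort()
    return results


if __name__ == "__main__":
    ranked = rank_layouts(6)
    print("layouts considered:", len(ranked))
    print("lowest mean hop distance:", ranked[0])
    print("highest mean hop distance:", ranked[-1])
```

Calling `rank_layouts(8)` gives the same comparison for the 8c case. The model ignores everything except distance around the ring, so it can only show how much the layouts differ from each other, not what the latency would be in nanoseconds.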

When Intel disables cores to create these 8c and 6c designs, the company has in the past promised that any disabling would leave the rest of the processor ‘with similar performance targets’, and that while different individual units might have different cores disabled, they should all fall within a reasonable spectrum.

So let us start with our Core i5-10600K core-to-core latency chart.

Cores next door seem well enough, then as we make longer trips around the ring, it takes about 1 nanosecond longer for each stop. Until those last two cores that is, where we get a sudden 4 nanosecond jump. It’s clear that the processor we have here as a whole is lopsided in its core-to-core latency and if any thread gets put onto those two cores at the end, there might be some questionable performance.
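One way to spot this kind of jump in a set of measurements is to compare each step in latency with the roughly 1 ns per stop seen elsewhere on the ring. The helper below is a hypothetical sketch rather than part of our test suite: the function name, the default step, and the slack threshold are assumptions chosen for illustration.

```python
def find_latency_jumps(latencies_ns, expected_step_ns=1.0, slack_ns=1.0):
    """Flag steps between consecutive latencies that exceed the expected step.

    latencies_ns: latencies from one core to each other core, in nanoseconds,
                  ordered from nearest to farthest.
    Returns a list of (index, step_ns) for every step larger than
    expected_step_ns + slack_ns. Both thresholds are assumed defaults.
    """
    jumps = []
    for i in range(1, len(latencies_ns)):
        step = latencies_ns[i] - latencies_ns[i - 1]
        if step > expected_step_ns + slack_ns:
            jumps.append((i, step))
    return jumps
```

Fed one row of a core-to-core latency chart sorted by distance, it returns the positions where the increase is well beyond one ring stop's worth, which is where the last two cores would show up on this sample.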

Now it’s very easy to perhaps get a bit heated with this result. Unfortunately we don’t have an ‘ideal’ 6c design to compare it against, which makes performance comparisons a bit tricky. But it does mean that there is likely to be variation between different Core i5-10600K samples.

The effect still occurs on the 8-core Core i7-10700K, although it is less pronounced.

There’s still a sizeable jump between the 3 cores at the end compared to the other five cores. One of the unfortunate downsides with the test is that the enumeration of the cores won’t correspond to any physical location, so it might be difficult to narrow down the exact layout of the chip.

Moving up to the big 10-core processor yields an interesting result:

So while we should have a steadily increasing latency here, there’s still that 3-4 nanosecond jump with two of the cores. This points to a different but compounding issue.

Our best guess is that these two extra cores are not optimized for this sort of ring design in Comet Lake. For its Core lineup of processors, Intel has been using a ring bus as the principal interconnect between its cores for over a decade, and we typically see it on four and six core processors. Intel also used a ring bus in its enterprise processors for many years, with chips up to 24 cores, however those designs used dual-ring buses in order to keep core-to-core latency down. Intel has put up to 12 cores on a single ring, though broadly speaking the company seems to prefer keeping designs to 8 or fewer cores per ring.

If Intel could do it for those enterprise chips, then why not for the 10 core Comet Lake designs here? We suspect it is because the original ring design that went into consumer Skylake processors, while it was for four cores, doesn’t scale linearly as the core count increases. There is a noticeable increase in the latency as we move from four to six and six to eight core silicon designs, but a ten-core ring is just a step too far, and additional repeaters are required in the ring in order to support the larger size.

There could also be an explanation relating to these cores having an additional function on that section of the ring, such as sharing duties with IO parts of the core, or PCIe lanes, and as a result extra cycles are required for any additional cacheline transfers.

We are realistically reaching the limits of any ring-like interconnect for Intel’s Skylake consumer line processors here. If Intel were to create a 12-core version of Skylake consumer for a future processor, a single ring interconnect won’t be able to handle it without an additional latency penalty, which might be more of a penalty if the ring isn't tuned for the size. There's also a bandwidth issue, as the same ring and memory has to support more cores. If Intel continues down this path, it will either have to use dual rings, use a different interconnect paradigm altogether (mesh, chiplet), or move to a new microarchitecture and interconnect design completely.

Frequency Ramps

We also performed our frequency ramps on all three processors. Nothing much to say here – all three CPUs went from 800 MHz idle to peak frequency in 16 milliseconds, or roughly one frame at 60 Hz (about 16.7 ms). We saw the peak turbo speeds on all the parts.

220 Comments

  • Khenglish - Wednesday, May 20, 2020 - link

    Ian, for the Crysis CPU render test you'd probably get higher FPS disabling the GPU in the device manager and set Crysis to use hardware rendering. Disabling the GPU driver enables software rendering by default on Windows 10. The Win10 rendering does stutter worse than the reported FPS though, so take from it what you want.
  • shaolin95 - Wednesday, May 20, 2020 - link

    "But will the end-user want that extra percent of performance, for the sake of spending more on cooling and more in power?"

    Such retarded comment. More power...do you actually know who little difference this makes in a year. Wow this place is going down hill fast.
    Oh and a cooler you know we don't have to change our cooler with every CPU purchase so don't make it seem like this HUGE issue...your AMD fanboy colors are showing VERY clearly.
  • schujj07 - Wednesday, May 20, 2020 - link

    If you think you can use the 212 EVO you have from a 6700k or 7700k to keep the 10900k cool you are absolutely nuts. "Speaking with a colleague, he had issues cooling his 10900K test chip with a Corsair H115i, indicating that users should look to spending $150+ on a cooling setup. That’s going to be a critical balancing element here when it comes to recommendations." This isn't any form of fanboyism. This is stating a fact that to squeeze out the last remaining bits of performance in Skylake & 14nm Intel had to sacrifice massive amounts of heat/power to do so.
  • Maxiking - Wednesday, May 20, 2020 - link

    If you have issues cooling 10900k with H115i, the problem is always between the monitor and chair.

    They were able to cool OC 10900k with 240m AIO just lol

    Incompetency of some reviewers is just astonishing
  • schujj07 - Wednesday, May 20, 2020 - link

    All depends on the instructions that you are running. From Tomshardware: "We tested with the beefier Noctua NH-D15 and could mostly satisfy cooling requirements in standard desktop PC applications, but you will lose out on performance in workloads that push the boundaries with AVX instructions. As such, you'll need a greater-than-280mm AIO cooler or a custom loop to unlock the best of the 10900K. You'll also need an enthusiast-class motherboard with beefy power circuitry, and also plan on some form of active cooling for the motherboard's power delivery subsystem." https://www.tomshardware.com/reviews/intel-core-i9...
    "While Intel designed its 250W limit to keep thermals 'manageable' with a wide variety of cooling solutions, most motherboard vendors feed the chip up to ~330W of power at stock settings, leading to hideous power consumption metrics during AVX stress tests. Feeding 330W to a stock processor on a mainstream motherboard is a bit nuts, but it enables higher all-core frequencies for longer durations, provided the motherboard and power supply can feed the chip enough current, and your cooler can extract enough heat.

    To find the power limit associated with our chip paired with the Gigabyte Aorus Z490 Master motherboard, we ran a few Prime95 tests with AVX enabled (small FFT). During those tests, we recorded up to 332W of power consumption when paired with either the Corsair H115i 280mm AIO watercooler or a Noctua NH-D15S air cooler. Yes, that's with the processor configured at stock settings. For perspective, our 18-core Core i9-10980XE drew 'only' 256W during an identical Prime95 test." https://www.tomshardware.com/reviews/intel-core-i9...

    Think it is still a pebkac error?
  • alufan - Thursday, May 21, 2020 - link

    try this he doesn't slate the intel or amd just a proper review with live power draw at the socket OMG lol you need your won power plant when you run these let alone over clock it

    https://www.kitguru.net/components/leo-waldock/int...
  • Spunjji - Tuesday, May 26, 2020 - link

    "They were able to cool OC 10900k with 240m AIO just lol"
    Who were? Everyone I've read indicates that with a 240mm AIO, CPU temps hit 90+

    Pathetic comment troll is pathetic.
  • Retycint - Wednesday, May 20, 2020 - link

    It is, in fact, a huge issue because most people won't have high end coolers necessary to keep the thermals under control. Personal attacks such as accusing people of being a "fanboy" just degrades your argument (if there was any in the first place) and make you look dumb
  • Spunjji - Tuesday, May 26, 2020 - link

    "Such retarded comment."
    The pure, dripping irony of using a slur to mock someone else's intelligence, but screwing up the grammar of the sentence in which you do it...

    Some people build from scratch. Some people have uses for their old system. larger PSUs and suitable cooling to get optimal performance from this CPU don't come cheap. Go home, troll.
  • watzupken - Wednesday, May 20, 2020 - link

    Not surprising, Intel managed to keep their advantage in games by pushing for higher frequency. However the end result is a power hungry chip that requires some high end AIO or custom water cooler to keep cool. I agree that Intel is digging themselves deeper and deeper into a hole that they will not be able to get out so easily. In fact I don't think they can get out of it until their 7nm is ready and mature enough to maintain a high frequency, or they come out with a brand new architecture that allows them to improve on Comet Lake's performance without the crazy clockspeed. Indeed, they will not be able to pull another generation with their Skylake + 14nm combination looking at the power consumption and heat generation issue. Intel should consider bundling that industrial chiller they used to cool their 20 core chip during the demo.
