Core-to-Core Latency: Issues with the Core i5

For Intel’s Comet Lake 10th Gen Core parts, the company is creating two different silicon dies for most of the processor lines: one with 10 cores and one with 6 cores. In order to create the 8 and 4 core parts, different cores will be disabled. This isn’t anything new, and has happened for the best part of a decade across both AMD and Intel in order to minimize the number of new silicon designs, and also to build a bit of redundancy into the silicon and enable most of the wafer to be sold even if defects are found.

For Comet Lake, Intel is splitting the silicon such that all 10-core Core i9 and 8-core Core i7 processors are built from the 10c die, as is perhaps expected, and the 6-core Core i5 and 4-core Core i3 processors are built from the 6c die. The only exception to this rule is the Core i5-10600K/KF, which will use the 10-core die with four cores disabled, giving six cores total. This leads to a potential issue.

Imagine a 10c die as two columns of five cores, capped on each end by the System Agent (DRAM, IO) and Graphics, creating a ring of 12 stops that data has to go through to reach other parts of the silicon. Let us start simple, and imagine disabling two cores to make an 8c processor. It is fairly straightforward to guess which arrangements give the best and worst core-to-core latency.

The other worst 8c case might be to keep Core 0 enabled, and then disable Core 1 and Core 2, leaving Cores 3-9 enabled.

We can then disable four cores from the original 10 core setup. It can be any four cores, so imagine another worst case and a best case scenario.

On the left we have the absolute best case arrangement that minimizes all core-to-core latency. In the middle is the absolute worst case, with any contact to the first core in the top left being a lot higher latency with more distance to travel from any core. On the right is an unbalanced design, but perhaps a lower variance in latency.
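The hop-count reasoning behind these best and worst cases can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Intel's actual layout: the stop numbering in CORE_TO_STOP is hypothetical, the ring is assumed to be bidirectional, and disabled cores are assumed to keep their ring stops so that data still has to pass through them. The helper names hops and hop_stats are made up for illustration.

```python
from itertools import combinations

RING_STOPS = 12  # 10 cores + System Agent + Graphics, as described in the text

# Hypothetical numbering (an assumption, not a documented layout):
# stop 0 is the System Agent, stops 1-5 are cores 0-4 down one column,
# stop 6 is Graphics, stops 7-11 are cores 5-9 back up the other column.
CORE_TO_STOP = {core: core + 1 if core < 5 else core + 2 for core in range(10)}


def hops(core_a, core_b):
    """Shortest number of ring stops between two cores, going either way."""
    d = abs(CORE_TO_STOP[core_a] - CORE_TO_STOP[core_b])
    return min(d, RING_STOPS - d)


def hop_stats(disabled):
    """Return (mean, max) hop count over all pairs of enabled cores.

    Assumes disabled cores still occupy their ring stops.
    """
    enabled = [c for c in range(10) if c not in disabled]
    distances = [hops(a, b) for a, b in combinations(enabled, 2)]
    return sum(distances) / len(distances), max(distances)


if __name__ == "__main__":
    # Two illustrative 6c configurations: six adjacent cores vs. evenly spread.
    for label, disabled in [("clustered", {0, 1, 8, 9}), ("spread", {1, 3, 6, 8})]:
        mean_hops, max_hops = hop_stats(disabled)
        print(f"{label}: mean hops {mean_hops:.2f}, max hops {max_hops}")
```

In this model, keeping six adjacent cores gives a lower average hop count than spreading the six cores evenly around the ring, which is the intuition behind a best case arrangement. Real latency also depends on factors the model ignores, such as how many cycles each stop costs.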

When Intel disables cores to create these 8c and 6c designs, the company has in the past promised that any disabling would leave the rest of the processor ‘with similar performance targets’, and that while different individual units might have different cores disabled, they should all fall within a reasonable spectrum.

So let us start with our Core i5-10600K core-to-core latency chart.

Cores next door seem well enough, then as we make longer trips around the ring, it takes about 1 nanosecond longer for each stop. Until those last two cores that is, where we get a sudden 4 nanosecond jump. It’s clear that the processor we have here as a whole is lopsided in its core-to-core latency and if any thread gets put onto those two cores at the end, there might be some questionable performance.

Now it’s very easy to perhaps get a bit heated with this result. Unfortunately we don’t have an ‘ideal’ 6c design to compare it against, which makes performance comparisons a bit tricky. But it does mean that there is likely to be variation between different Core i5-10600K samples.

The effect still occurs on the 8-core Core i7-10700K, although it is less pronounced.

There’s still a sizeable jump between the 3 cores at the end compared to the other five cores. One of the unfortunate downsides with the test is that the enumeration of the cores won’t correspond to any physical location, so it might be difficult to narrow down the exact layout of the chip.

Moving up to the big 10-core processor yields an interesting result:

So while we should have a steadily increasing latency here, there’s still that 3-4 nanosecond jump with two of the cores. This points to a different but compounding issue.

Our best guess is that these two extra cores are not optimized for this sort of ring design in Comet Lake. For its Core lineup of processors, Intel has been using a ring bus as the principal interconnect between its cores for over a decade, and we typically see it on four and six core processors. Intel also used a ring bus in its enterprise processors for many years, with chips up to 24 cores, however those designs used dual-ring buses in order to keep core-to-core latency down. Intel has put up to 12 cores on a single ring, though broadly speaking the company seems to prefer keeping designs to 8 or fewer cores per ring.

If Intel could do it for those enterprise chips, then why not for the 10 core Comet Lake designs here? We suspect it is because the original ring design that went into consumer Skylake processors, while it was for four cores, doesn’t scale linearly as the core count increases. There is a noticeable increase in the latency as we move from four to six and six to eight core silicon designs, but a ten-core ring is just a step too far, and additional repeaters are required in the ring in order to support the larger size.

Another possible explanation is that these cores have additional functions on that section of the ring, such as sharing duties with IO parts of the core, or PCIe lanes, and as a result extra cycles are required for any additional cacheline transfers.

We are realistically reaching the limits of any ring-like interconnect for Intel’s Skylake consumer line processors here. If Intel were to create a 12-core version of Skylake consumer for a future processor, a single ring interconnect won’t be able to handle it without an additional latency penalty, which might be more of a penalty if the ring isn't tuned for the size. There's also a bandwidth issue, as the same ring and memory has to support more cores. If Intel continues down this path, it will either have to use dual rings, use a different interconnect paradigm altogether (mesh, chiplet), or move to a new microarchitecture and interconnect design completely.

Frequency Ramps

We also performed our frequency ramps on all three processors. Nothing much to say here – all three CPUs went from 800 MHz idle to peak frequency in 16 milliseconds, or one frame at 60 Hz. We saw the peak turbo speeds on all the parts.


220 Comments


  • Boshum - Wednesday, May 20, 2020 - link

    I generally agree, but I'm not so certain AMD will be in 2nd place within 5 years (from a best CPU architecture point of view). They should be considering the difference in resources, but Intel is so spread out and AMD seems so focused.
  • poohbear - Wednesday, May 20, 2020 - link

    OK i'll bite. Why would anyone buy this generation of Intel processors when AMD's is just as powerful and yet more efficient being on 7nm? Especially with Ryzen 4000 coming out this fall.
  • dguy6789 - Wednesday, May 20, 2020 - link

    AMD is ahead in a few key areas- price vs performance, total number of cores/threads, power.

    Intel is still ahead in the per core/per thread area. An Intel 8 core 16 thread will beat an AMD 8 core 16 thread in absolutely everything because of just how high Intel chips can clock to. In short, Intel is a higher performing albeit more expensive option for low thread count workloads.
  • Boshum - Wednesday, May 20, 2020 - link

    I don't think the power and heat are too big a deal until you hit the 8 and 10-core K chips. The people that buy those are enthusiast gamers who want the highest possible FPS in games (whether they are able to perceive it or not, but I am sure they can in certain scenarios). A lot of those ultra-enthusiasts have a lot of fun with overclocking too, and Intel gets more out of that.
    Ryzen 4000 will undoubtedly be a better overall chip, but Rocket Lake should be coming to the LGA 1200 platform in the not too distant future. It may pass up Ryzen 4000 in gaming for those benchmark enthusiasts. It will be no match for Ryzen 4000 in heavy multi-core scenarios.
  • gagegfg - Wednesday, May 20, 2020 - link

    At the end of the day, AMD continues to have the performance crown at a price premium (3950X).
    Also, it seems to me a bad ANANTECH policy for many graphics that do not have an AMD equivalent CPU and only put the 3600.
  • mandoman - Wednesday, May 20, 2020 - link

    I can't imaging anyone being the slightest bit concerned about power on the HEDT! It's simply ludicrous to even bring it into the discussion. Frankly the whole emphasis in this review smacks loudly of "tree hugger" philosophy which has no place in the high end computing arena at all.
  • Beany2013 - Wednesday, May 20, 2020 - link

    Some of us actually care about good engineering rather than pushing an old, inefficient process node as hard as technically possible.

    Enjoy dropping an extra £100 just to cool your CPU.
  • Hxx - Wednesday, May 20, 2020 - link

    WHAAT? U think this is not good engineering? this is BALLS engineering, they basically achieved a miracle on the 14nm platform. You are basically standing in front of a miracle. Step back and think about it. A 5 yo technology that competes and beats in many tests the competitor's 7nm process. Yes overall AMD may be the better purchase but again that not what im saying.
    Just think about that. On top of that they added good overclocking, controlled temps, plenty features, etc . Cant say im impressed with the Z490 platform itself since its the same old z390/70/270/170 with better connectivity but the CPU themselves will make history I mean the 14nm process sure is effing OLD but man what these guys did with this, the refinement it went through to achieve this performance on this OLD tech is amazing in my opinion and for that I applaud them. I want them to hurry up and wrap up Rocket Lake but this is definitely for sure no doubt definitely great engineering.
  • alufan - Thursday, May 21, 2020 - link

    so what exactly do you think would happen if AMD did the same thing threw the power limits out the window and used a 14++++++ node with the extra thermal headroom available with the 3000 series chips, Intel has not released its new process node chips because they cant make them work AMD has and the limitations are simply due to the node size and physics, they have engineered a way round the issue Intel even now is talking about backporting designs it stinks, this is a "new" chip from Intel with more top end period AMD has released 3 nodes in 3 years and has a new version coming up in a few months with a rumored 20% uplift in IPC but lets wait and see, not to mention 5nm is designed and being sampled and 3nm is in design, that is Engineering
  • Hxx - Thursday, May 21, 2020 - link

    ROFL AMD? AMD struggles with getting a BIOS right let alone fine tuning a platform ? Nah they are too busy now supposedly giving us a beta bios for the 4xx series and that's a very scary thought given AMD's track record. In case you didn't know, AMD doesn't make their own chips. If tsmc moves to a different node then so will AMD, that's how it works. So yes I applaud TSMC for good engineering, AMD not so much.
