Navigating the X299 Minefield: Kaby Lake-X Support

When building a platform, limiting it to one particular type of product keeps things simple and easy to understand, at the expense of flexibility. Flexibility is something Intel and AMD have experimented with in the past, enabling users to stay on the same underlying platform and upgrade in future generations, but with X299 Intel is taking a large step in how broad that support has to be. This is both a good and a bad thing, depending on how different the support for each generation needs to be. In this context, Skylake-X and Kaby Lake-X are like chalk and cheese, which can present problems for users new to building systems, and it has already caused some minor headaches for system builders and motherboard manufacturers.

To recap, the three elements of the Basin Falls platform launch were the motherboards/X299 chipset, the SKL-X processors, and the KBL-X processors.

X299: What Is It?

The X299 chipset supports the new Skylake-X and Kaby Lake-X processors, and like its Z170 and Z270 counterparts on the mainstream consumer line, it is essentially a big PCIe switch. One of the issues with the older X99 chipset was its limited capability and inability to drive many PCIe devices – this changes with the big-switch mentality on X299. Behind the DMI 3.0 link going into the chipset (basically a PCIe 3.0 x4 connection), the chipset has access to up to 24 PCIe 3.0 lanes for network controllers, RAID controllers, USB 3.1 controllers, Thunderbolt controllers, SATA controllers, 10GbE controllers, audio cards, more PCIe slot support, special controllers, accelerators, and anything else that requires PCIe lanes in an x4, x2 or x1 link.

The total uplink bandwidth is limited by the DMI 3.0 link, but there will be very few situations where it is saturated. There are a few limits to what is supported (some ports are restricted in what they can handle), and only three PCIe 3.0 x4 drives can use the built-in PCIe RAID, but this should satisfy all but the most hardcore enthusiasts.
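To put rough numbers on that uplink, here is a minimal back-of-the-envelope sketch in Python; the per-lane figure is the usual PCIe 3.0 approximation, and the device mix is an illustrative assumption rather than any specific board layout.

```python
# Back-of-the-envelope sketch: how easily 24 chipset PCIe 3.0 lanes could,
# in theory, oversubscribe the DMI 3.0 (PCIe 3.0 x4-class) uplink.
# PCIe 3.0 offers roughly 985 MB/s of usable bandwidth per lane.

PCIE3_LANE_MBPS = 985            # approx. usable bandwidth per PCIe 3.0 lane
DMI3_LANES = 4                   # DMI 3.0 behaves like a PCIe 3.0 x4 link

# Hypothetical chipset-attached devices (lane counts are illustrative only)
devices = {
    "M.2 NVMe SSD": 4,
    "10GbE controller": 4,
    "USB 3.1 controller": 2,
    "SATA + extra PCIe slots": 8,
}

uplink = DMI3_LANES * PCIE3_LANE_MBPS
downstream = sum(devices.values()) * PCIE3_LANE_MBPS

print(f"DMI 3.0 uplink:       ~{uplink / 1000:.1f} GB/s")
print(f"Downstream potential: ~{downstream / 1000:.1f} GB/s")
print(f"Oversubscription:     {downstream / uplink:.1f}x if everything bursts at once")
```

In practice an SSD, a NIC and a USB controller rarely peak at the same moment, which is why the shared uplink is an acceptable trade-off for most builds.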

Skylake-X CPUs: Coming in Two Stages

The Skylake-X family of processors for Basin Falls comes in two stages, based on the way the processors are developed. Normally HEDT processors are cut-down versions of enterprise processors, usually through restricting certain functions, but the enterprise processors are typically derived from three different silicon layouts during manufacturing. Internally, Intel calls these three layouts LCC (low core count), HCC (high core count) and XCC (extreme core count), based on the maximum number of cores each supports. Nominally Intel does not disclose which silicon layout is used for which processors, though it is usually straightforward to work out as long as Intel has disclosed what the configurations of the LCC/HCC/XCC dies are. In this case, Intel has officially left everyone guessing, but the point here is that historically Intel only uses the LCC silicon from the enterprise line for its consumer desktop processors.

In previous generations, this meant either a 6, 8 or 10-core processor at the top of the stack for consumers, with lower core-count models provided by disabling cores. (This is a complex topic involving the quality of the manufacturing process and determining voltage/frequency in a process called binning; we've covered it before, but it is something all manufacturers have to do to get good yields, as making processors is not a perfect process.) Each year we expected one of three things: the top-end SKU gets more frequency, lower power, or more cores, and as such the march of progress has been predictable. If you had asked us two months ago, we would have fully expected Skylake-X to top out with LCC silicon at 10 or 12 cores, depending on how Intel was planning the manufacturing part.

So the first element of Intel’s launch is the LCC processors, running up to 10 cores. We previously published that the LCC silicon was 12 cores, but we can now show it is 10 – more on that later. The three Skylake-X CPUs launching today use LCC silicon with 6, 8 or 10 cores as the Core i7-7800X, Core i7-7820X and Core i9-7900X respectively. Intel is further separating these parts by adjusting the officially supported DRAM frequency, as well as the number of PCIe lanes.

The second element of the Skylake-X launch is the one that has somewhat surprised most of the industry: the launch will contain four processors based on the HCC silicon. Technically these processors will not be out until Q4 this year (one SKU coming out in August), and the fact that Intel did not have frequency numbers to share when announcing these parts shows that they are not finalized, calling into question when they were added to the roadmap (and whether they were a direct response to AMD announcing a 16-core part for this summer). We’ve written a detailed analysis of this in our launch coverage, but Intel is set to launch 12, 14, 16 and 18-core consumer-level processors later this year, with the top part carrying a tray price (the price when buying 1,000 CPUs at a time) of $1999, so we expect retail to be nearer $2099.

It should be noted that due to a number of factors, the Skylake-X cores and the communication pathways between them are built slightly differently from the mainstream consumer Skylake-S design, something discussed and analyzed in our Skylake-X review.

Kaby Lake-X: The Outliers

The final element of the Basin Falls launch is Kaby Lake-X. This is also an aspect of the platform that deviates from previous generations. Intel’s HEDT line has historically been one generation behind the mainstream consumer platform, due to enterprise life cycles as well as the added difficulty of producing these larger chips. As a result, the enterprise and HEDT parts have never had the peak processing efficiency (IPC, instructions per clock) of the latest designs and have sat in the wings, waiting. Bringing the Kaby Lake microarchitecture to HEDT in the form of a Core i7 and a Core i5 changes the scene, albeit slightly.

Rather than bringing a new big core featuring the latest microarchitecture, Intel is repurposing the Kaby Lake-S mainstream consumer silicon, binning it to slightly more stringent requirements for frequency and power, disabling the integrated graphics, and then putting it in a package for the high-end desktop platform. There are still some significant limitations, such as having only 16 PCIe 3.0 lanes and dual-channel memory, which might exclude it from the traditional designation of a true HEDT processor; however, Intel has stated that these parts fill a request from customers to have the latest microarchitecture on the HEDT platform. They also overclock quite well, which is worth noting.

The Kaby Lake-X parts consist of a Core i7 and a Core i5, both of which are quad-core parts, with the Core i7 supporting Hyper-Threading.

Problem Number 1: PCIe Layouts

Users can choose an X299 motherboard and a SKL-X processor, or an X299 motherboard with a KBL-X processor. Every X299 motherboard has to support both, and it is the differing level of support each processor needs that makes this a more difficult task than one might imagine. The obvious difference between the two is the number of PCIe lanes, and where they come from.

KBL-X processors have 16 PCIe 3.0 lanes from the processor, coming from a single PCIe root complex, and these can be bifurcated into x8/x8 or x8/x4/x4 depending on what the motherboard manufacturer wants to implement.

SKL-X processors have either 28 or 44 PCIe 3.0 lanes, depending on which model you buy, and these can come from up to three PCIe x16 root complexes (on the 28-lane parts, some complexes are limited to fewer lanes to fit the total). Each one can still be bifurcated into x8/x4/x4, but typically one would expect one PCIe root complex to fill the first x16 slot, with the next x16 slot coming from the second root complex, which will be filled either as x8 on the 28-lane processors or x16 on the 44-lane processors. CPU PCIe lanes can also be routed off to support other things, such as storage or Ethernet controllers.
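To illustrate the routing problem this creates, here is a simplified sketch of how a board's three full-length slots might be allocated per CPU; the layouts are typical examples of our own construction, not any specific vendor's design or firmware logic.

```python
# Simplified illustration of the slot-allocation problem: the same physical
# x16 slots have to be wired (or switched) for three different CPU lane counts.
# The layouts below are typical examples, not taken from a specific motherboard.

def slot_layout(cpu_pcie_lanes: int) -> dict:
    """Return an electrical width for each of three full-length slots."""
    if cpu_pcie_lanes == 44:       # e.g. Core i9-7900X: x16/x16/x8, lanes left over for M.2
        return {"slot1": "x16", "slot2": "x16", "slot3": "x8"}
    if cpu_pcie_lanes == 28:       # e.g. Core i7-7800X/7820X: x16/x8/x4
        return {"slot1": "x16", "slot2": "x8", "slot3": "x4"}
    if cpu_pcie_lanes == 16:       # Kaby Lake-X: x8/x8 (or x16/x0), third slot dead
        return {"slot1": "x8", "slot2": "x8", "slot3": "disabled"}
    raise ValueError("unknown CPU lane count")

for lanes in (44, 28, 16):
    print(f"{lanes}-lane CPU -> {slot_layout(lanes)}")
```

Every entry that changes between those rows is a point where the vendor either accepts dead functionality with the smaller CPUs or adds a PCIe switch, which is exactly the cost problem described next.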

The issue here is that motherboard manufacturers have to design for all three PCIe lane counts. It is very easy to design a motherboard around SKL-X and then find that half the features do not work when a user drops in a KBL-X processor. This applies to a lot of PCIe slots, and in order to manage it all, manufacturers have to equip their motherboards with PCIe switches so that everything is routed correctly for both CPUs; those switches add cost to the platform. If there were two separate platforms, this added per-board cost would not exist (but vendors would have to build two boards instead, each one easier to design).


Chipset diagram of MSI's X299 XPower Gaming AC, its high-end motherboard

Aside from the PCIe slots, storage is also going to become an issue. With the previous generation of X99, we started to see M.2 PCIe storage coming from the processor, guaranteeing no bottleneck in uplink bandwidth, especially when RAID was used. For the new X299 platform, because of KBL-X support, most M.2 slots will be derived from the chipset, adding a small amount of latency and, more importantly, being bound by the chipset-to-CPU uplink of PCIe 3.0 x4. Using two fast M.2 drives in RAID via the chipset will be limited by that connection. Motherboard vendors know this, but they want as many features as possible to work in all situations, so finding a motherboard with a CPU-derived M.2 slot is going to be a rare thing indeed.
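As a quick worked example of that ceiling (the per-drive throughput is an assumed figure for a fast PCIe 3.0 x4 NVMe SSD, not a measured result):

```python
# Illustrative arithmetic: why a chipset-attached M.2 RAID 0 array runs into
# the DMI 3.0 uplink. The per-drive figure is an assumption, not a measurement.

PCIE3_LANE_MBPS = 985
dmi_uplink = 4 * PCIE3_LANE_MBPS          # ~3.9 GB/s, shared by everything behind the chipset

drive_seq_read = 3200                     # MB/s, assumed sequential read of one fast NVMe drive
raid0_potential = 2 * drive_seq_read      # ~6.4 GB/s if both drives hung off the CPU directly

print(f"RAID 0 potential:  ~{raid0_potential / 1000:.1f} GB/s")
print(f"DMI 3.0 ceiling:   ~{dmi_uplink / 1000:.1f} GB/s")
print(f"Left on the table: ~{(raid0_potential - dmi_uplink) / 1000:.1f} GB/s, before any other chipset traffic")
```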

The solution to some of this is to have specific motherboards for each set of CPUs. Sure, both CPUs will still work in the motherboard, but when the installed CPU does not match the family the board was designed for, functionality is going to be severely limited. ASUS has already posted some details about its new Extreme motherboard for X299, with the disclaimer ‘not designed for Kaby Lake-X CPUs’ because the features on board are aimed at SKL-X customers only. We might see more of this filtering through.

There’s also the DRAM: KBL-X is a dual-channel design, while SKL-X is quad-channel. On an eight-slot X299 motherboard, only four slots are operational with KBL-X, wasting board space. The four primary slots for KBL-X also differ from those for SKL-X, so a user switching CPUs may need to move their DRAM around. If a user buys an X299 motherboard with only four slots, chances are only two will work with KBL-X.

One argument here is that a user can upgrade from KBL-X to SKL-X later, or to beefier KBL-X CPUs in a future generation.

Problem Number 2: Power

Skylake and Kaby Lake are different x86 microarchitectures – the KBL core design was meant to be an ‘optimization’ of Skylake, picking some low-hanging fruit and using an updated 14nm process to give better power consumption and a better voltage/frequency response from the silicon. There is not much drastic change in the cores themselves, but there is in how power is delivered.

Skylake-X uses an integrated voltage regulator, or IVR. If you recognize the term, that is because Intel launched its Broadwell-based CPUs with a FIVR, or fully integrated voltage regulator. Skylake-X does not go all-in like Broadwell did, but for some of the voltage inputs to the CPU, the processor takes in a single voltage and splits it internally, rather than relying on the external circuitry of the motherboard to do so. This affords some benefits, such as consistency in voltage delivery and, to a certain extent, some power efficiency gains, and it should simplify the motherboard design – unless you also have to design for non-IVR CPUs, like Kaby Lake-X.

Kaby Lake-X is more of a typical power delivery design, with all the required voltages being supplied by the motherboard. That means the motherboard has to support both types of voltage delivery, and also adjust itself at POST if a different CPU has been installed. Checking adds to the boot time, but it is necessary because some voltages have to change, and too high a voltage can kill a processor. We’ve already killed one.

Specifically, the VRIN voltage on Skylake-X needs to be 1.8 V going into the processor for the IVR to work, while the same rail on Kaby Lake-X needs to be 1.1 V for VCCIO. If the motherboard previously had a SKL-X processor installed and does not detect that a KBL-X processor has been swapped in, it will supply 1.8 V on that rail and the chip will soon die.
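In firmware terms, the safety logic boils down to something like the sketch below. This is our own simplification for illustration: the real rail names, detection mechanisms and VRM sequencing are more involved than a single lookup.

```python
# Simplified sketch of the detect-before-power-up problem. One shared input
# rail wants 1.8 V (VRIN, feeding the IVR) on Skylake-X but only 1.1 V
# (VCCIO) on Kaby Lake-X, so the target depends entirely on the installed CPU.

def shared_rail_target(cpu_family: str) -> float:
    if cpu_family == "skylake-x":
        return 1.8    # input to the on-die IVR, which splits it internally
    if cpu_family == "kaby lake-x":
        return 1.1    # supplied directly to the silicon; 1.8 V here kills the chip
    # Unknown CPU: the only safe behaviour is to hold the rails at defaults
    raise ValueError("unknown CPU - do not apply full voltages")

# The failure mode described above is equivalent to skipping detection and
# reusing the last-known setting (1.8 V from a Skylake-X session) on a KBL-X part.
for cpu in ("skylake-x", "kaby lake-x"):
    print(f"{cpu}: set shared rail to {shared_rail_target(cpu)} V")
```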

When we received samples for SKL-X and KBL-X, we were told by our motherboard partners that if we were switching between the two CPUs, we would have to fully clear the BIOS. This involves removing AC power while the system is switched off and holding the Clear CMOS button for 30-60 seconds to drain the capacitors and essentially reset the BIOS to defaults, so it could then detect which CPU was in place before applying full voltages.

We did this, and still ended up with a dead Kaby Lake Core i7-7740X. There is now a lump of sand in my CPU box. The interesting thing is that this CPU did not die instantly: we started SYSmark, which involves several reboots during the initial preparation phase. On about the fourth reboot, the system got stuck at BIOS code 0d, and nothing I did could get beyond it. I put in our Kaby Lake-X Core i5 and it ran fine; I put in a SKL-X chip and it ran fine; the Core i5 then ran benchmarks without issue. It would appear that our initial Kaby Lake-X Core i7 did not have much headroom, and we had to get a replacement for some of the benchmarks.

Incidentally, we also had an i9-7900X die on us. That seems to be unrelated to this issue.

So The Solution?

Motherboard manufacturers have told us that there may be chip-specific motherboards in the future. But as it stands, users looking at KBL-X would save a lot of money (and headache) by staying with Z270, as those motherboards are cheaper and more streamlined for a Kaby Lake design. Users looking at the top Skylake-X CPUs have nothing to worry about – unless they really want PCIe storage fed directly from the CPU. In that case they will have to find the one or two motherboards that support it, or invest in a PCIe-to-M.2 riser card and enable it that way – as long as it goes into a CPU-connected PCIe slot.

So why bother testing KBL-X at all if we sound so downbeat on the platform situation? KBL-X still warrants testing, as these are the highest-frequency processors Intel has released on its latest CPU microarchitecture. As mentioned above, overclocking on KBL-X seems very good, and some users will want the peak single-thread performance possible. The argument is more that some of these issues complicate the platform, reducing accessibility for new builders and creating extra work, time and headaches for motherboard manufacturers and system builders. The issues above are not a significant barrier for any user willing to put in some time to ensure that what they buy is suited to their workload.
