Announcement Two: High Core Count Skylake-X Processors

The twist in the story of this launch comes with the next batch of processors. Our pre-briefing contained something unexpected: Intel is bringing the high core count (HCC) silicon from the enterprise side down to consumers. I’ll cover the parts and then discuss why this is happening.

The HCC die for Skylake is set to be either 18 or 20 cores. I say ‘or’ because there’s a small issue with what we had originally thought. If you had asked me six months ago, I would have said that the upcoming HCC die, based on some information I had and a few sources, would be an 18-core design. As with HCC designs in previous years, while the LCC design uses a single ring bus around all the cores, the HCC design would offer a dual ring bus, potentially lopsided, but intended to keep the average L3 cache latency reasonable with so many cores without becoming one big racetrack (insert joke about Honda race engines). Despite this, Intel shared a die image of the upcoming HCC implementation, as in this slide:

It is clear that there are repeated segments: four rows of five, indicating a dual ring bus arrangement. A quick glance might suggest a 20-core design, but look at the top and bottom segments of the second column from the left: these are laid out slightly differently. Are they actual cores? Are they different because they support AVX-512 (a topic discussed later), or are they non-cores, providing die area for something else? So is this an 18-core silicon die or a 20-core silicon die? We’ve asked Intel for clarification, but we were told to await more information when the processor is launched. Answers on a tweet to @IanCutress, please.

So with the image of the silicon out of the way, here are the three parts that Intel is planning to launch. As before, all processors support hyperthreading.

Skylake-X Processors (High Core Count Chips)
                 Core i9-7940X     Core i9-7960X     Core i9-7980XE
Cores / Threads  14 / 28           16 / 32           18 / 36
Clocks           TBD               TBD               TBD
L3 Cache         TBD               TBD               TBD
PCIe Lanes       TBD (likely 44)   TBD (likely 44)   TBD (likely 44)
Memory Freq      TBD               TBD               TBD
TDP              TBD               TBD               TBD
Price (tray)     $1399             $1699             $1999

As before, let us start from the bottom of the HCC processors. The Core i9-7940X will be a harvested HCC die, featuring fourteen cores, running in the same LGA2066 socket, with a tray price of $1399. That mimics the $100-per-core strategy seen elsewhere in the stack, and likely translates to around $1449-$1479 at retail. No numbers have been provided for frequencies, turbo, power, DRAM, or PCIe lanes, although we would expect DDR4-2666 support and 44 PCIe lanes, given that it is a member of the Core i9 family.

Next up is the Core i9-7960X, which is perhaps the name we would have expected from the high-end LCC processor. As with the 14-core part, we have almost no information beyond the cores (sixteen for the 7960X), the socket (LGA2066), and the price: $1699 tray ($1779 retail?). Reiterating, we would expect this to support at least DDR4-2666 memory and 44 PCIe lanes, but we are unsure of the frequencies.

The Core i9-7980XE sits atop the stack as the halo part, looking down on all those beneath it. Like an unruly dictator, it gives nothing away: all we have is the core count (eighteen), the fact that it will sit in the LGA2066 socket, and the tray price at a rather cool $1999 (~$2099 retail). When this processor will hit the market, no one really knows at this point. I suspect even Intel doesn’t know.

Analysis: Why Offer HCC Processors Now?

The next statement shouldn’t be controversial, but some will see it this way: AMD and ThreadRipper.

ThreadRipper is AMD’s ‘super high-end desktop’ processor, going above the eight cores of the Ryzen 7 parts with a full sixteen cores of their high-end microarchitecture. Where Ryzen 7 competed against Broadwell-E, ThreadRipper has no direct competition, unless we look at the enterprise segment.

Just to be clear, Skylake-X as a whole is not a response to ThreadRipper. Skylake-X, as far as we understand, was expected to be LCC only: up to 12 cores and sitting happy. Compared to AMD’s Ryzen 7 processors, Intel’s Broadwell-E had an advantage in the number of cores, the size of the cache, and the instructions per clock, and it enjoyed high margins as a result. Intel had the best, and could charge more. (Whether you thought paying $1721 for a 10-core BDW-E made sense compared to a $499 8-core Ryzen with fewer PCIe lanes is something you voted on with your wallet.) Pretty much everyone in the industry, at least the ones I talk to, expected more of the same. Intel could launch the LCC version of Skylake-X, move up to 12 cores, keep similar pricing, and reap the rewards.

When AMD announced ThreadRipper at the AMD Financial Analyst Day in early May, I fully suspect that the Intel machine went into overdrive (if not before). If AMD had a 16-core part in the ecosystem, even at 5-15% lower IPC than Intel, it was likely that Intel’s 12-core part would no longer be the halo product. Other factors come into play, of course: we don’t know all the details of ThreadRipper, such as frequencies, and Intel has a much wider ecosystem of partners than AMD. But Intel sells A LOT of its top-end HEDT processor. I wouldn’t be surprised if the 10-core $1721 part was the bestselling Broadwell-E processor. So if AMD took that crown, Intel would lose a position it has held for a decade.

So imagine the Intel machine going into overdrive. What would be going through their heads? Competing on performance-per-dollar? Pushing frequencies? Back in the days of the frequency race, you could just slap a new TDP on a processor and bin it harder. In a core count race, you actually need physical cores to provide that performance, unless you have a 33%+ IPC advantage. I suspect the only way to provide a product in the same vein was to bring the HCC silicon to consumers.

Of course, I would suspect that inside Intel there was pushback. The HCC (and XCC) silicon is the bread and butter of the company’s server line. By offering it to consumers, there is a chance that customers Intel normally serves through its enterprise channel, small and medium businesses or those that buy single or double-digit numbers of systems, might decide to save a lot of money by going the consumer route. There would be no feasible way for Intel to sell HCC-based processors to end-users at enterprise pricing and expect everyone to be happy.

Knowing what we know from working with Intel for many years, I suspect that bringing the HCC silicon down was the most viable option. Intel could still sell a premium part, and sell lots of them, but some of the revenue would shift from enterprise to consumer. It would also knock back any threat from AMD if the ecosystem comes into play as well.

As it stands, Intel has two processors lined up to take on ThreadRipper: the sixteen-core Core i9-7960X at $1699, and the eighteen-core Core i9-7980XE at $1999. A ThreadRipper design is two eight-core Zeppelin dies in the same package. A single Zeppelin has a TDP of 95W at 3.6 GHz to 4.0 GHz, so two Zeppelin dies together could have a TDP of 190W at 3.6 GHz to 4.0 GHz, though we know that AMD’s top silicon is binned heavily, so it could easily come down to 140W at 3.2-3.6 GHz. This means that Intel is going to have to compete with those sorts of numbers in mind: if AMD brings ThreadRipper out to play at around 140W and 3.2 GHz, then the two Core i9s I listed have to be there as well. Typically Intel doesn’t clock all the HCC processors that high, unless they are the super-high-end workstation designs.

So despite an IPC advantage and an efficiency advantage in the Skylake design, Intel has to push every button it has here. Another unknown is AMD’s pricing. What would happen if ThreadRipper comes out at $999-$1099?

But I ask our readers this:

Do you think Intel would be launching consumer-grade HCC designs for HEDT if ThreadRipper didn’t exist?

For what it is worth, kudos all around: to AMD for shaking things up, and to Intel for upping the game. This is what we’ve missed in consumer processor technology for a number of years.

(To be fair, I predicted AMD’s 8-core part would be $699 or so. To see one launched at $329 was a nice surprise.)

I’ll add another thought that is worth considering. AMD’s ThreadRipper uses two Zeppelin dies, with each Zeppelin having two CCXes of four cores apiece. As observed in Ryzen, the cache-to-cache latency when a core needs data held in another part of the cache is not consistent. Intel’s HCC silicon, if it implements a dual-ring bus design, has similar issues due to the way the cores are grouped. For users that have heard of NUMA (non-uniform memory access), it is a tricky thing to code for, and even trickier to code well for, and most of the software that handles NUMA properly is enterprise grade. With both of these designs coming to consumers, and next-to-zero NUMA-aware code in consumer applications (including games), there might be a learning period in performance. Either that, or we will see software pinning itself to particular groups of cores in order to evade the issue entirely, along the lines of the sketch below.
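
As a rough illustration of that last point, here is a minimal sketch of core pinning on Linux, assuming a hypothetical worker that wants to stay on cores 0-7 because those are taken to share a CCX or ring segment. The core numbers and the helper name are illustrative assumptions, not anything Intel or AMD prescribes; real software would query the topology (via hwloc or /sys/devices/system/cpu) before deciding which cores count as ‘local’.

/* Minimal sketch: pin the calling thread to cores 0-7 on Linux.
   Treating cores 0-7 as one "local" group is an assumption for
   illustration; query the real topology before pinning. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int pin_to_local_core_group(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 0; cpu < 8; cpu++)   /* cores 0-7: assumed local group */
        CPU_SET(cpu, &set);

    /* Restrict the calling thread to that group so its cache traffic
       stays within one cluster instead of bouncing across the die.
       Returns 0 on success, an error number otherwise. */
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    int err = pin_to_local_core_group();
    if (err != 0) {
        fprintf(stderr, "pthread_setaffinity_np failed: %d\n", err);
        return 1;
    }
    printf("worker pinned to cores 0-7\n");
    /* ... run the latency-sensitive work here ... */
    return 0;
}

Windows offers the equivalent SetThreadAffinityMask, and in practice this is how consumer software is most likely to sidestep NUMA-style effects in the short term: pick a core group once, and keep the hot threads and their data on it.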

Comments

  • shady28 - Tuesday, May 30, 2017 - link

    Looks like a marketing stunt to me. I welcome the 6c/12t part, but most applications can't even effectively use 4c/8t processors. It is a complete waste for 99% of buyers and even the remaining 1% are likely to rarely see a benefit.
  • Maleorderbride - Tuesday, May 30, 2017 - link

    Your statement just betrays your ignorance and your lack of imagination. Computers are tools for quite a few people, so they will pay considerable sums for better tools which in turn earn them more money.

    Video editing and 3D work can and will use all cores. While I am not going to claim they are a large percentage of the market, they routinely purchase 8/10 core options. I have quite a few customers running X99 boards with a single E5-2696 V4 dropped in ($1400 on ebay) and it excels in some workflows.

    They do not "rarely" use these extra cores--they are using them every single day and it is the primary reason for purchase.
  • shady28 - Tuesday, May 30, 2017 - link


    Lol! The childish insults aside, you think those thoughts you regurgitated are new? Professional video editors make up a tiny fraction of a tiny fraction of the market, and if they are smart they aren't using CPUs for much. Most people who profess this 'need' to do 3D video editing are playing anyway, not working. Like I already said, a fraction of a 1% use case.

    Common sense says Intel did not release these for the 0.1% of users who might be able to take advantage of it. They released it to make suckers of the other 99.9%. Your comments indicate they are once again succeeding.
  • Maleorderbride - Wednesday, May 31, 2017 - link

    Your post made a claim about 100% of the market. Obviously you over-claimed. You can't edit posts here, so your "like I said," followed by a watered down version of your post is just a transparent attempt to save your ego. Your assumptions about whether people who claim to be video editors are really "working" is irrelevant.

    As for blaming video professionals for even using a CPU, you obviously are unaware that some codecs are entirely CPU bound when transcoding, and that these professionals (DITs especially) are under pressure to complete transcodes as quickly as possible on location. Every other person there is waiting for them.

    Are many things GPU accelerated? Yes, but being "smart" has nothing to do with it. Sometimes one can use those 2x 1080 Ti's, but sometimes you need 18+ cores, or both. But I guess you got me, I'm a "sucker" if I buy the best tool for a job that makes money.
  • shady28 - Friday, June 2, 2017 - link

    First sentence in your post is a lie, or else your reading comprehension is challenged. My first post is just a few lines up, and it said:
    "It is a complete waste for 99% of buyers and even the remaining 1% are likely to rarely see a benefit."
  • prisonerX - Wednesday, May 31, 2017 - link

    You use applications that are highly parallel everyday and you don't even know it. Maleorderbride is right: you're ignorant and unimaginative.
  • Meteor2 - Saturday, June 3, 2017 - link

    No shady28 is correct here. People who *truly* need HCC on desktop are a vanishingly small minority. This is about headlines and marketing.
  • Namisecond - Wednesday, May 31, 2017 - link

    Welcome to the 1%?
  • helvete - Friday, September 8, 2017 - link

    Have you ever tried to run more than one application at a time? /s
  • Bulat Ziganshin - Tuesday, May 30, 2017 - link

    i can give you details about avx-512 - they are pretty obvious from analysis of skylake execution ports. so

    1) avx-512 is mainly single-issue. all the avx commands that now are supported BOTH on port 0 & port 1, will become avx-512 commands supported on joined port 0+1

    2) a few commands that are supported only on port 5 (these are various bit exchanges) will also be single-issued in avx-512, which still means doubled performance - from single-issued avx-256 to single-issued avx-512

    3) a few commands that can be issued on any of 3 ports (0,1,5), including booleans and add/sub/cmp - the so-called PADD group - will be double-issued in avx-512, so they will get a 33% uplift

    overall, ports 0&1 will join when executing 512-bit commands, while port 5 is extended to 512-bit operands. joined port 0&1 can execute almost any avx-512 command except for the bit exchange ones, while port 5 can execute bit exchanges and the PADD group

    when going from sse to avx, intel sacrificed ease of programming for ease of hardware implementation, resulting in an almost full lack of commands that can exchange data between the upper & lower parts of a ymm register. avx-512 was done right, but this means that bit exchange commands require a full 512-bit mesh. so, intel moved all these commands to port 5, providing a full 512-bit implementation, while most remaining commands were moved into ports 0&1 where a 512-bit command can be implemented as a simple pair of 256-bit ones

    looking at power budgets, it's obvious that simple doubling of execution resources (i.e. support of 512-bit commands instead of 256-bit ones) is impossible. in the previous cpu generation, even avx commands increased energy usage by 40%, so it's easy to predict that extending each executed command to 512 bits would require another 80% increase

    of course, m/a analysis can't say anything about commands absent in the avx2 set, so my guess is that predicate register manipulations will also go to port 5, just to make the m/a a bit less asymmetric

    also it's easy to predict that in the next generations the first "improvement" will be to add FMAD capability to port 5, further doubling the marketing performance figures

    finally, their existing 22-core cpus already perform at more than a SP teraflop, but this time a teraflop will go into the HEDT class (while 10 broadwell cores at 3 GHz are only capable of about 0.9 tflops)
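
    A minimal sketch of what that width difference means in code, assuming GCC or Clang with -mfma and -mavx512f; the function names and loop bounds are illustrative only, and the port-joining behaviour described above is left entirely to the hardware:

    #include <immintrin.h>

    /* 256-bit AVX2+FMA version: 8 floats per fused multiply-add. */
    void fma_avx2(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i + 8 <= n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            __m256 vc = _mm256_loadu_ps(c + i);
            _mm256_storeu_ps(c + i, _mm256_fmadd_ps(va, vb, vc));
        }
    }

    /* 512-bit AVX-512F version: 16 floats per fused multiply-add,
       i.e. half as many instructions for the same array. */
    void fma_avx512(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i + 16 <= n; i += 16) {
            __m512 va = _mm512_loadu_ps(a + i);
            __m512 vb = _mm512_loadu_ps(b + i);
            __m512 vc = _mm512_loadu_ps(c + i);
            _mm512_storeu_ps(c + i, _mm512_fmadd_ps(va, vb, vc));
        }
    }

    the avx-512 loop covers the same array in half as many fused multiply-add instructions, which is where the headline flops doubling (and most of the extra power draw) comes from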
