Announcement Three: Skylake-X's New L3 Cache Architecture

(AKA I Like Big Cache and I Cannot Lie)

SKU madness aside, there's more to this launch than just the number of cores at what price. Deviating somewhat from their usual pattern, Intel has made some interesting changes to several elements of Skylake-X that are worth discussing. Next is how Intel is implementing the per-core cache.

In previous generations of HEDT processors (as well as the Xeon processors), Intel implemented a three-stage cache before hitting main memory. The L1 and L2 caches were private to each core and inclusive, while the L3 cache was a last-level cache covering all cores and was also inclusive. This, at a high level, means that any data in L2 is duplicated in L3, such that if a cache line is evicted from L2 it will still be present in the L3 if it is needed, rather than requiring a trip all the way out to DRAM. The sizes of the caches are important as well: with an inclusive L2 to L3, the L3 cache is usually several multiples of the L2 in order to store all the L2 data plus some more for an L3. Intel typically had 256 KB of L2 cache per core, and anywhere from 1.5 MB to 3.75 MB of L3 per core, which gave both caches plenty of room and performance. It is worth noting at this point that L2 cache is closer to the logic of the core, and space is at a premium.

With Skylake-X, this cache arrangement changes. When Skylake-S was originally launched, we noted that the L2 cache had a lower associativity as it allowed for more modularity, and this is that principle in action. Skylake-X processors will have their private L2 cache increased from 256 KB to 1 MB, a four-fold increase. This comes at the expense of the L3 cache, which is reduced from ~2.5MB/core to 1.375MB/core.
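As a rough illustration of the trade-off, the per-core figures quoted above can be tallied in a few lines of Python. This is a back-of-the-envelope sketch, not Intel's accounting: the helper name is made up for the example, the 2.5 MB figure is the approximate previous-generation value, and the non-inclusive total is an upper bound that assumes no line is held in both L2 and L3.

```python
# Per-core cache sizes in KB, taken from the figures quoted in the text.
OLD_L2_KB = 256
OLD_L3_KB = 2.5 * 1024      # ~2.5 MB/core (approximate)
NEW_L2_KB = 1024            # 1 MB
NEW_L3_KB = 1.375 * 1024    # 1.375 MB/core


def unique_capacity_kb(l2_kb, l3_kb, inclusive):
    """Upper bound on distinct data held in L2 + L3 for one core (hypothetical helper)."""
    if inclusive:
        # Every L2 line is duplicated in L3, so L2 adds no unique capacity.
        return l3_kb
    # Non-inclusive: assume, optimistically, that no line is duplicated.
    return l2_kb + l3_kb


old_unique = unique_capacity_kb(OLD_L2_KB, OLD_L3_KB, inclusive=True)
new_unique = unique_capacity_kb(NEW_L2_KB, NEW_L3_KB, inclusive=False)

print(f"L2 growth: {NEW_L2_KB / OLD_L2_KB:.0f}x")
print(f"Old unique L2+L3 per core: {old_unique:.0f} KB")
print(f"New unique L2+L3 per core: up to {new_unique:.0f} KB")
```

Under these assumptions the combined per-core capacity is roughly unchanged (about 2.4 MB versus 2.5 MB); what changes is how much of it sits in the faster, private L2.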

With such a large L2 cache, the L2 to L3 connection is no longer inclusive and is now ‘non-inclusive’. Intel is using this terminology rather than ‘exclusive’ or ‘fully-exclusive’, as the L3 will still have some features that aren’t present in a victim cache, such as prefetching. What this will mean, however, is more work for snooping and for keeping track of where cache lines are. Cores will snoop other cores’ L2 to find updated data, with the DRAM as a backup (which may be out of date). In previous generations the L3 cache was always a backup, but now this changes.
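To make the difference concrete, here is a toy lookup model in Python. It is a deliberately simplified sketch of the behaviour described above, not Intel's actual coherence protocol: caches are plain sets of line addresses, the function and variable names are invented for the example, and real hardware details such as directories or snoop filters are ignored.

```python
def find_line(addr, requester, l2_caches, l3, inclusive):
    """Return where a toy hierarchy would find cache line `addr`.

    l2_caches: list of sets, one private L2 per core
    l3:        set of lines in the shared L3
    """
    if addr in l2_caches[requester]:
        return "own L2"
    if addr in l3:
        return "L3"
    if not inclusive:
        # A non-inclusive L3 miss says nothing about other cores' L2s,
        # so they have to be checked (snooped) as well.
        for core, l2 in enumerate(l2_caches):
            if core != requester and addr in l2:
                return f"core {core} L2"
    # Inclusive case: an L3 miss guarantees no L2 holds the line.
    return "DRAM"


# Inclusive: whatever core 1 holds in L2 is also in L3.
print(find_line(0x40, 0, [set(), {0x40}], {0x40}, inclusive=True))
# Non-inclusive: the line lives only in core 1's L2.
print(find_line(0x40, 0, [set(), {0x40}], set(), inclusive=False))
```

In the inclusive case a single L3 check settles the question; in the non-inclusive case the extra loop over the other cores stands in for the additional snooping work.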

The good element of this design is that a larger L2 will increase the hit-rate and decrease the miss-rate. Depending on the level of associativity (which has not been disclosed yet, at least not in the basic slide decks), a general rule I have heard is that a doubling of cache size decreases the miss rate by a factor of sqrt(2), and is good for a 3-5% IPC uplift in a regular workflow. Thus here’s a conundrum for you: if the quadrupled L2 (two doublings) has a miss rate lower by a factor of 2, leading to an 8-13% IPC increase, it’s not the same performance as Skylake-S. It may be the same microarchitecture outside the caches, but we get a situation where performance will differ.
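The rule of thumb can be written down directly. The snippet below is only a sketch of that heuristic as quoted (each doubling of capacity divides the miss rate by sqrt(2)); the function name and the 10% starting miss rate are illustrative assumptions, not measured values.

```python
import math


def scaled_miss_rate(base_miss_rate, size_ratio):
    """Apply the sqrt(2)-per-doubling heuristic to a miss rate.

    size_ratio is new_size / old_size; the number of doublings is
    log2(size_ratio), so the overall divisor is sqrt(size_ratio).
    """
    doublings = math.log2(size_ratio)
    return base_miss_rate / (math.sqrt(2) ** doublings)


# 256 KB -> 1 MB is a 4x increase, i.e. two doublings.
base = 0.10  # hypothetical 10% L2 miss rate
print(f"{scaled_miss_rate(base, 4):.3f}")
```

Under the heuristic, the fourfold larger L2 would halve the miss rate; real workloads will deviate depending on working-set size and associativity.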

Fundamental Realisation: Skylake-S IPC and Skylake-X IPC will be different.

This is something that fundamentally requires in-depth testing. Combine this with the change in the L3 cache, and it is hard to predict the outcome without being a silicon design expert. I am not one of those, but it's something I want to look into as we approach the actual Skylake-X launch.

More things to note on the cache structure. There are many ‘ways’ to do it, one of which I imagined initially is a partitioned cache strategy. The cache layout could be the same as previous generations, but with partitions of the L3 designated as L2. This makes life difficult, because then you have a partition of the L2 at the same latency as the L3, and that brings a lot of headaches if the L2 latency has a wide variation. This method would be easy for silicon layout, but hard to implement. Looking at the HCC silicon representation in our slide-deck, it’s clear that there is no fundamental L3 covering all the cores – each core has its partition. That being the case, we now have an L2 at approximately the same size as the L3, at least per core. Given these two points, I fully suspect that Intel is running a physical L2 at 1MB, which will give the design the high hit-rate and consistent low-latency it needs. This will be one feather in the cap for Intel.

203 Comments


  • Strunf - Tuesday, May 30, 2017

    "even if manages to take tangible lead against a 16 core threadripper, it will not be worth the money." on this market niche money means nothing... AMD needs to have a 10%+ performance advantage to be considered cause Intel has a much better brand value, if anything the 16 core Threadripper is a desperate attempt by AMD to actually gain some traction on the HEDT.
    About the thermal limit, yes there's a wall but with the new Turbo the two best cores of a CPU can be clocked higher than the rest and hence give you better single thread performance when needed, this is the future no doubt about it.

    You guys need to realize it's not cause AMD releases a product that is better on all metrics that everyone will shift to AMD, brand value counts and in the case of CPU the motherboard matters too, sure AMD has some nice motherboards but overall the Intel motherboards seem to be better furnished albeit at a higher cost.
  • ddriver - Tuesday, May 30, 2017

    Sure, intel's high prices are justified by several things:

    corporate brand loyalty
    amd's limited production capacity
    fanboyism

    But all in all, money is EVERYTHING, the whole industry cares primarily about one thing, and that's profit. There is absolutely no good reason to pay 100% more for 10% more. I mean not unless someone else does the actual paying.

    Only an idiot would care about "brand value". Computers are supposed to do work, not make up for your poor self-esteem. Any intelligent person who needs performance would put his money where he'd get the most bang for the buck. Workstation grade workloads lend themselves particularly well to multithreading but also to clustering. So if you want more performance, the smart solution is to aim for the best price/performance product, and get a lot of it, rather than getting the single most expensive product.

    AMD is not desperately trying anything. Its desktop line pretty much annihilated intel's existing HEDT offerings at significantly lower price points. It is intel desperately trying to not lose the HEDT market to AMD's mainstream offerings. They'd rather throw in a couple of extra cores even if it makes zero sense, just to not disillusion their fanboys.

    I am not speaking of any brand loyalty point, I have like 70 active systems and they all run intel CPUs. I am however very happy and eager to diversify and replace most of those, which are aging 3770k chips with something that offers higher performance and better power/performance ratio.
  • Hxx - Tuesday, May 30, 2017

    well, first off general comments have no place in the tech industry due to the variety of use cases and products. Folks care about brand value on certain items , say motherboard brands but not so much maybe on CPUs.
    Second, AMD did not annihilate Intel by any stretch of the imagination. where do u guys get this info? Probably from wccftech.com . Anyway their ryzen release is solid but they need cpus with higher IPCs or higher than Intel which they currently don't have.
    Third, I'm not sure what you mean by intel and desperately but there is nothing desperate about this current announcement. CPUs don't take 2 months to develop. It's not like Intel said in response to Ryzen "oh yeah? lets build a better cpu". these cpus have been fully developed and waiting retail release, maybe Ryzen pushed them to prioritize this release but these were not built as a "response to Ryzen" by any means.
  • ddriver - Tuesday, May 30, 2017

    Ryzen offered intel's E series of performance at half the cost. That's twice the value. You don't need imagination, much less to stretch it, to realize that 100% better value is tantamount to annihilation. This is further amplified by the fact it was mainstream CPUs against premium HEDT.

    And YES, it is desperation, because this product was never intended for HEDT, this is not a case of intel holding a trump card just in case amd finally decides to stop sitting on its hands. The 18 core chip was intended for server parts, and its arrival is exactly on time to be directly caused by the Ryzen launch. Intel simply took a server part with some defective or disabled cores, in order to gain TDP headroom to boost the clocks of the remaining cores higher. It is not like intel sat down and said "let's design a whole new chip in response to ryzen" - that would take significantly more time, they simply took a server part, crippled it a bit, overclocked it a bit, just so they can have a HEDT product with 2 more cores, and in doing so, sacrificing the amount of money they will make on that chip just to save face, as it would have been significantly more expensive as a xeon branded product.

    Had amd not launched ryzen, intel's current gen HEDT would have capped out at 12 cores. The 18 core solution is a last resort, last moment solution, and not too economically viable either. So yeah, it is desperation.

    But then again, expecting someone who cannot properly format a paragraph to get common sense might be pushing it...
  • ddriver - Tuesday, May 30, 2017

    Keep in mind had not intel sacrificed xeons to make that 18 core chip, its HEDT line would have been stuck at 12 cores, meaning that threadripper would have made intel look like a second-class CPU maker in that segment.

    So yes, it is quite literally burning money to save face for intel.
  • Kjella - Tuesday, May 30, 2017

    Burning money? Ever since Bulldozer started lagging behind Intel has been printing money like crazy, this is just a return to normal profit margins because AMD is back on the field. Intel made $10 billion profit last year, I'm sure they'll survive this horrible "loss".
  • ddriver - Wednesday, May 31, 2017

    The "desperation" is not for their survival, they survived the netburst fiasco when their product was marginally inferior.

    The desperation is to not look like a second grade choice in the HEDT market, thus sacrificing a much more profitable die to save face.
  • rocky12345 - Thursday, June 1, 2017

    "The "desperation" is not for their survival, they survived the netburst fiasco when their product was marginally inferior."

    Back in Netburst days AMD was a lot better with what they had to offer. Heck, an AMD CPU running at 2000MHz was able to keep up with or surpass a Pentium 4 @ 3.2GHz. It only got worse when dual core Athlons came about and Intel had to make the Pentium 4 D's but still running much much faster clock rate just to stay in the game. Very few people seem to remember Intel had a lot of bad years as well. Pentium 4 series all sucked Donkey Nutz nuf said.

    As others have said, if AMD did not release Ryzen that competes nicely with Intel's HEDT platform at half the price, and then AMD says oh we have a 16/32 Threadripper as well, Intel would not be releasing the 18/36 CPU right now; they would have kept that CPU in the Xeon line where they make the big bucks. Hell, that 18 core is probably a cut down 20/40 Xeon retrofitted to be an X series chip. Anyways, all this means it is good for us the consumers: we get more choice and hopefully at a better price also.
  • ddriver - Friday, June 2, 2017

    Don't forget that with AMD you get marginally better value. So even if the 18 core intel HEDT chip is tangibly faster than the top tier threadripper, for 2000$ AMD could get you a 32 core Epyc that will beat the 18 core in performance, and pretty much every other chip intel have at any price point.

    The 18 core number is also interesting as AMD's design is practically incapable of efficiently producing such a SKU, so even if intel don't get the fastest single chip, they will still be technically getting the performance crown in HEDT, albeit with a server chip they shoehorned there, and with a unique core count that AMD cannot exactly match, even if they can significantly outmatch.
  • Azethoth - Tuesday, May 30, 2017

    Dude, you are missing an opportunity to really diss Intel here. Why just compare AMD to the last gen Intel chips from many years ago when you can go back decades!

    Compare to the pentium. Then you can claim that AMD annihilated Intel, scraped up the ashes then decimated those, then threw them in the microwave and nuked them before getting hookers to pee on the dust and leaving it out to blow around in the sun and wind!

    As it is your post is too weak to take seriously.
