Announcement Three: Skylake-X's New L3 Cache Architecture

(AKA I Like Big Cache and I Cannot Lie)

SKU madness aside, there's more to this launch than just how many cores at what price. Deviating somewhat from their usual pattern, Intel has made some interesting changes to several elements of Skylake-X that are worth discussing. Next up is how Intel is implementing the per-core cache.

In previous generations of HEDT processors (as well as the Xeon processors), Intel implemented a three-level cache hierarchy before hitting main memory. The L1 and L2 caches were private to each core, while the L3 was a last-level cache shared across all cores, and it was inclusive. This means, at a high level, that any data in the L2 is duplicated in the L3, such that if a cache line is evicted from the L2 it will still be present in the L3 if it is needed again, rather than requiring a trip all the way out to DRAM. The relative sizes of the caches are important as well: with an L3 inclusive of the L2, the L3 is usually several multiples of the L2 size, in order to store all of the L2 data plus some more of its own. Intel typically had 256 KB of L2 cache per core, and anywhere between 1.5 MB and 3.75 MB of L3 per core, which gave both caches plenty of room and performance. It is worth noting at this point that the L2 cache sits closer to the logic of the core, where space is at a premium.
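As a rough illustration of the inclusive property described above (a toy model, not Intel's actual design; all structure and variable names here are invented), an inclusive L3 must back-invalidate any private L2 copies when it evicts a line, so that the L3 always remains a superset of the L2s:

```python
# Toy model of an inclusive L3: the L3 tracks which cores hold each line,
# so evicting a line from the L3 forces a back-invalidation of that line
# in every core's private L2. Capacities and names are illustrative only.
from collections import OrderedDict

class InclusiveL3:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> set of core ids caching it (LRU order)
        self.l2 = {}                 # core id -> set of addresses in that core's L2

    def access(self, core, addr):
        self.l2.setdefault(core, set())
        if addr in self.lines:       # L3 hit: refresh LRU position
            self.lines.move_to_end(addr)
        else:                        # L3 miss: fill from DRAM, evicting if full
            if len(self.lines) >= self.capacity:
                victim, sharers = self.lines.popitem(last=False)
                for c in sharers:    # inclusivity: purge the victim from all L2s
                    self.l2[c].discard(victim)
            self.lines[addr] = set()
        self.lines[addr].add(core)
        self.l2[core].add(addr)

cache = InclusiveL3(capacity=2)
cache.access(0, 0x100)
cache.access(1, 0x200)
cache.access(0, 0x300)               # L3 full: evicts 0x100, also from core 0's L2
assert 0x100 not in cache.l2[0]
```

The key line is the back-invalidation loop: an inclusive design pays for its simple "L3 knows everything" lookup by sometimes throwing hot lines out of a core's L2.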

With Skylake-X, this cache arrangement changes. When Skylake-S was originally launched, we noted that the L2 cache had a lower associativity as it allowed for more modularity, and this is that principle in action. Skylake-X processors will have their private L2 cache increased from 256 KB to 1 MB, a four-fold increase. This comes at the expense of the L3 cache, which is reduced from ~2.5 MB/core to 1.375 MB/core.

With such a large L2 cache, the L3 is no longer inclusive of the L2, and is now 'non-inclusive'. Intel is using this terminology rather than 'exclusive' or 'fully-exclusive', as the L3 will still have some features that aren't present in a pure victim cache, such as prefetching. What this will mean, however, is more work for snooping and for keeping track of where cache lines are: cores will snoop other cores' L2 caches to find updated data, with DRAM as a backup (which may be out of date). In previous generations the L3 cache was always that backup, but now this changes.
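To make the snooping change concrete, here is a minimal sketch of the lookup order implied above (names and structure are assumptions for illustration, not Intel's actual coherence protocol): on a local L2 miss, the freshest copy of a line may have to be found in a peer core's L2 before falling back to the L3 or DRAM:

```python
# Illustrative lookup order under a non-inclusive L3: the L3 no longer
# guarantees it holds every line cached in an L2, so a requesting core
# must snoop its peers' L2s before trusting the L3 or DRAM copy.
def lookup(addr, core, l2s, l3, dram):
    if addr in l2s[core]:                    # local L2 hit
        return l2s[core][addr], "local L2"
    for peer, cache in enumerate(l2s):       # snoop peer L2s for a newer copy
        if peer != core and addr in cache:
            return cache[addr], f"snooped L2 of core {peer}"
    if addr in l3:                           # non-inclusive L3 may still have it
        return l3[addr], "L3"
    return dram[addr], "DRAM"                # last resort: main memory

l2s = [{0x10: "new"}, {}]                    # core 0 holds an updated line
l3 = {0x10: "stale"}                         # the L3 copy is out of date
dram = {0x10: "stale", 0x20: "cold"}
assert lookup(0x10, 1, l2s, l3, dram) == ("new", "snooped L2 of core 0")
assert lookup(0x20, 1, l2s, l3, dram) == ("cold", "DRAM")
```

The extra snoop traffic in the loop is exactly the added bookkeeping cost the article refers to: what an inclusive L3 answered in one place now requires asking every peer.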

The good element of this design is that a larger L2 will increase the hit rate and decrease the miss rate. Depending on the level of associativity (which has not been disclosed yet, at least not in the basic slide decks), a general rule of thumb I have heard is that doubling the cache size decreases the miss rate by a factor of sqrt(2), which is good for a 3-5% IPC uplift in a regular workload. So here's a conundrum: if the larger L2 roughly halves the miss rate, leading to a roughly 6-10% IPC increase, then this is not the same performance as Skylake-S. It may be the same microarchitecture outside the caches, but we get a situation where performance will differ.
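The rule of thumb above can be put into numbers. Going from 256 KB to 1 MB is two doublings, so the miss rate should fall by sqrt(2) twice, i.e. roughly in half; the baseline miss rate below is an assumed figure purely for illustration:

```python
# Back-of-envelope for the quoted rule of thumb: each doubling of cache
# size cuts the miss rate by ~sqrt(2). 256 KB -> 1 MB is two doublings,
# so the miss rate roughly halves. The 10% baseline is an assumption.
import math

def scaled_miss_rate(base_miss_rate, size_ratio):
    """Estimated miss rate after growing the cache by size_ratio."""
    doublings = math.log2(size_ratio)
    return base_miss_rate / (math.sqrt(2) ** doublings)

base = 0.10                                   # assume a 10% L2 miss rate at 256 KB
print(scaled_miss_rate(base, 4))              # 1 MB: two doublings -> ~0.05

# IPC uplift at the quoted 3-5% per doubling, compounded twice:
low, high = 1.03 ** 2 - 1, 1.05 ** 2 - 1
print(f"{low:.1%} to {high:.1%}")             # roughly 6% to 10%
```

This is only a scaling heuristic: real miss rates depend heavily on the workload's working-set size and the cache's associativity, which Intel had not disclosed at the time.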

Fundamental Realisation: Skylake-S IPC and Skylake-X IPC will be different.

This is something that fundamentally requires in-depth testing. Combine this with the change in the L3 cache, and it is hard to predict the outcome without being a silicon design expert. I am not one of those, but it's something I want to look into as we approach the actual Skylake-X launch.

More things to note on the cache structure. There are many ways to implement it, and one that I initially imagined was a partitioned cache strategy: the physical cache layout stays the same as in previous generations, but partitions of the L3 are simply designated as L2. This makes life difficult, because part of the 'L2' would then sit at L3-like latency, and a wide variation in L2 latency brings a lot of headaches. This method would be easy for silicon layout, but hard to implement well.

Looking at the HCC silicon representation in our slide deck, it's clear that there is no monolithic L3 covering all the cores: each core has its own slice. That being the case, we now have an L2 approximately the same size as the L3, at least per core. Given these two points, I fully suspect that Intel is running a true physical L2 at 1 MB, which will give the design the high hit rate and consistent low latency it needs. This will be one feather in the cap for Intel.

Comments

  • SaturnusDK - Thursday, June 1, 2017 - link

    The infinity fabric seems to be working fine with minimal scaling performance loss for the Ryzen chips already on the market so there's no reason to believe that extending the bus will incur a severe performance penalty.
  • rocky12345 - Thursday, June 1, 2017 - link

    I got to ask Anandtech site gives all of this love to Intel for releasing products we already expected except for the 18/36 CPU (Thanks AMD for getting fire under Intel's butt again). What I am saying is there are at least three headlines for the Intel crap but one little byline for AMD's threadripper crap. I like Anandtech and all but AMD's release is way more important to the industry than this Intel release because of it were not for AMD new CPU line Intel would have just once more released a ho hum product with little extra to offer and probably $500 or more than the prices they are now asking. Give credit where credit is needed. You say new stuff in the industry does not excite you much anymore. Well for me and hopefully anyone else with a brain are more excited for the New AMD tech than this rehashed Intel tech. Thanks
  • KalliMan - Friday, June 2, 2017 - link

There is a "small" mistake here. The price of the 1800X is now ~$429-449. You are comparing 2 CPUs that belong to completely different price ranges (the 1800X is $150-170 cheaper than the 7872X). And be sure that in multitasking it will be superior.
  • cekim - Friday, June 2, 2017 - link

    To all those prattling on about how such processors have no market or purpose, I direct your attention to ebay... clearly you are wrong. The question is not whether there is a market for consumer HCC chips, the question is what that market is willing to pay for them?
  • alpha754293 - Friday, June 2, 2017 - link

    re: the whole AMD vs. Intel thing all over again

    I'm not worried about AMD as a threat at all.

    Their latest processor, on some workloads, still barely beats an Intel Core i5(!) or can only beat some of the mid-range Core i7s at best.

    I've long been an "AMD guy" because they used to be a value proposition - where you can get decent performance at a much lower price compared to the Intel counterparts.

    But times have changed and that isn't really quite the case anymore. AMD CPUs really aren't that much cheaper compared to Intel's, but Intel's CPUs perform SIGNIFICANTLY better than AMD (mostly because AMD went the way of the UltraSPARC Niagara T1, by having only ONE FPU shared across multiple ALUs) - and of course, the problem with THAT design idea/approach is that fundamentally, CPUs are massively glorified calculators.

    And AMD choose to cripple their product's ability to do calculations.

    People have a tendency to want to focus on IPCs (as it is here). But really, you need both IPC AND FLOPs and a MUCH BETTER metric to compare against is FLOP/clock (because it tells you about the processor's computational efficiency), which almost NO one writes about anymore.

I'm already running 16 cores across three systems and I just made the requisition for a 64-core system.

    The "thing" that I have found/discovered with systems that have lots and lots of cores is that you REALLY WANT, should, and NEED to have ECC RAM because if you try to get it to do many things at once, in order to prevent issues with the programs interfering with each other, the ECC is a patch-style method that can help correct some of that.

    When I've launched 6 runs of a computationally intensive task at once, some of them fail because my current systems don't have ECC Registered RAM (and I am not sure if the CPU knows what to do with it (being that the memory controller is on-die) and to deal with and work with memory coherency.

    While it might be a welcome changed on the ultra high end, extreme enthusiast front, you can get a system that does a LOT more for a LOT less than what it would cost you to use these processors by using server grade hardware (albeit used - which, in my opinion, if it still works, why not? I don't see anything "wrong" with that.)

    A system using the new 16-core CPU is likely going to run you between $3000-5000. The system that I just bought has 64 cores (four times more) with 512 GB of RAM for the same price.
  • Meteor2 - Saturday, June 3, 2017 - link

    Literally TL;DR.
  • Lolimaster - Saturday, June 3, 2017 - link

    If you mean low threaded then you need to look at the Ryzen 5 1400-1500X which is 90% of the i7 7700 and its obviously "better" than the top of the line Ryzen at "some workloads, mean lower thread apps/games",

    $160-190, rip intel.
  • Gothmoth - Sunday, June 4, 2017 - link

    so much words for trolling.... you took the time to write so much but when it comes to what you supposedly bought you suddenly become unspecific.... no letters and words to write it out you can only say "The system that I just bought ".
  • twtech - Friday, June 2, 2017 - link

    So what are some common applications for this many cores? Rendering, compiling large C++ projects like Unreal 4 for example. It may not be huge, but there is a market for more cores, and Intel doesn't want AMD taking all of it.
  • slickr - Saturday, June 3, 2017 - link

    So not only are they introducing less cores overall than AMD's threadripper at 32/64, they also cost a ton more money, require a new socket, it features locked overclocking and they cost more than AMD's equivalents.

    Intel really do have nothing, they announced their 14/16/18 cores, but they have no info on them, meaning it was a last minute thing, where they would only be available late 2017, but they have nothing else to go against AMD, so they are playing a move to trick people into thinking they have products up and coming soon, when they don't.
