The Claims

As with any launch, Intel has plenty of numbers to explain how Skylake's performance and user experience improve on both previous designs and the competition.

As with Haswell and Broadwell, Skylake is a mobile-first design for Intel. As with any processor development cycle, the central idea is to target one power point as the most efficient and extend that efficiency window as far in either direction as possible. During IDF, Intel stated that stretching the efficiency window from 4.5W to 91W is a significant challenge (we agree), as is improving both performance and power consumption over Broadwell at every point along the way.

Starting at 4.5W: we spoke extensively with several parts of Intel at IDF as a result of our Broadwell-Y coverage. From Intel's perspective, Broadwell-Y designs were almost too wide-ranging, especially for what is its premium low-power, high-performance product, and vendors placing the chip into ill-defined chassis far from Intel's recommended designs raised concerns about final performance and user experience. As a result, Intel's guidelines to OEMs have been tightened this generation, so that designers building the cheaper plastic Core M implementations can tune their designs to get the best out of the silicon. Intel has been working with a number of these partners (on both entry-level Core M and premium models) to put that user experience model into practice.

Overall, however, Intel is claiming 40% better graphics performance for Core M with the new Generation 9 (Gen9) implementation, along with battery savings and compatibility with new features such as RealSense. Because Core M will find its way into products from tablets to 2-in-1s and clamshells, we have been told that the Skylake design should hit a home run against the best-selling tablets on the market, along with an appropriate Windows 10 experience. When we get units in to review, we will see how that claim holds up from our perspective.

For Skylake-Y through Skylake-U (and in part, Skylake-H), Intel is claiming a 60% gain in efficiency over Haswell-U. This means either 60% less active power during media consumption or 60% more CPU performance at the same power (measured by synthetics, specifically SPECint_base_rate2006). The power consumption gains come from updates relating to the Gen9 graphics, such as multi-plane overlay and fixed-function decoders, as well as additional power/frequency gating between the unslice and the slices, which we will cover later in the review. The GPU itself, due to the new functionality, is claimed to deliver 40% better graphics performance for Core M in 3DMark synthetic tests.
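To separate those two readings of the same headline figure, here is a minimal sketch using hypothetical placeholder numbers rather than Intel's measurements:

```python
# A minimal sketch (hypothetical placeholder numbers, not Intel's measurements)
# of the two ways the same "60% efficiency gain" claim can be read.

haswell_u_score = 100.0   # hypothetical SPECint_base_rate2006-style score
haswell_u_power = 15.0    # W, placeholder U-series active power for illustration

# Reading 1: same performance, 60% less active power (e.g. media consumption)
skylake_power = haswell_u_power * (1 - 0.60)      # 6.0 W at the same score

# Reading 2: same power, 60% more CPU performance
skylake_score = haswell_u_score * (1 + 0.60)      # 160 at the same 15 W

print(f"Baseline : {haswell_u_score:.0f} @ {haswell_u_power:.1f} W "
      f"({haswell_u_score / haswell_u_power:.1f} perf/W)")
print(f"Reading 1: {haswell_u_score:.0f} @ {skylake_power:.1f} W "
      f"({haswell_u_score / skylake_power:.1f} perf/W)")
print(f"Reading 2: {skylake_score:.0f} @ {haswell_u_power:.1f} W "
      f"({skylake_score / haswell_u_power:.1f} perf/W)")
```

The two readings describe different operating points (a low-power media-playback scenario versus a sustained CPU workload at a fixed power budget), so they are not interchangeable performance-per-watt statements.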

While not launching today, Intel's march on integrated graphics is also set to continue. With the previous eDRAM parts, Intel took the crown for absolute IGP performance from AMD, albeit in a completely different price band. With Skylake, the introduction of a 4+4e model means that Intel's modular graphics design now extends from GT1 to GT4, where GT4e has 72 execution units with 128MB of eDRAM in tow. This leads to the claim that GT4e is set to match or beat a significant proportion of the graphics market today.

Back in our Skylake-K review, we were perhaps unimpressed by the generational gain in clock-for-clock performance, although improved multi-threading and frequency ranges helped the out-of-the-box experience. The other side of performance is power draw, and because Skylake is another mobile-first processor, power becomes especially important in mobile devices. We will go through some of the developments aimed at improving power consumption in this article.

Comments

  • Xenonite - Thursday, September 3, 2015 - link

    Actually, it seems that power consumption is the only thing that matters to consumers, even on the desktop.
    All this talk about AMD's lack of competition being the reason why we aren't seeing meaningful generational performance improvements is just that: talk.

    The real thing that hampers performance progress is consumers' plain refusal to upgrade for performance reasons (even a doubling in performance is not economically viable to produce since no one, except for me it seems, will buy it).
    Consumers only buy the lowest power system that they can afford. It has nothing to do with AMD.
    Even if AMD released a CPU that is 4x faster than piledriver, it wouldn't change Intel's priority (nor would it help AMD's sales...).
  • IUU - Wednesday, September 2, 2015 - link

    Sorry for my tone, but "I'm failing to see" how transistor count doesn't mean more to consumers than to anyone else.
    So, after 10 years of blissful carelessness (because duuude it's user experience dat matters, ugh..),
    you will have everyone deceiving you about what they offer at the price point they offer. Very convenient, especially if they are not able to sustain an exponential increase in performance and pass to the next paradigm to achieve it.

    Because until very recently we had been seeing mostly healthy practices, despite the fact that you could always find people pointing to big or small sins.
    Big example: what's the need for an IGP on a processor that consumes 90 watts, especially a GPU that is tragically subpar? To hide the fact that they have nothing more to offer the consumer, CPU-wise, at 90 watts (in the current market situation), and to have an excuse for charging more for a
    theoretically higher-consuming and "higher-performing" CPU?
    Because what bugs me is: what if the 6700K lacked the IGP? Would it perform better without a useless IGP dragging it down? I really don't know, but I feel it wouldn't.
    Regarding mobile solutions and money- and energy-limited devices, the IGP could really prove to be useful to a lot of people, without overloading their device with a clunky, lowly discrete GPU.
  • xenol - Wednesday, September 2, 2015 - link

    If the 6700K lacked the iGPU with no other modifications, it would perform exactly the same.
  • MrSpadge - Wednesday, September 2, 2015 - link

    Yes, it would perform exactly the same (if the iGPU is not used; otherwise it needs memory bandwidth). But the chip would run hotter since it would be a lot smaller. Si is not the best thermal conductor, but the presence of the iGPU spreads the other heat producers out a bit.
  • xenol - Wednesday, September 2, 2015 - link

    I don't think that's how thermals in ICs work...
  • MrSpadge - Wednesday, September 2, 2015 - link

    Thermodynamics "work" and don't care if they're being applied to an IC or a metal brick. Silicon is a far better heat conductor than air, so even if the GPU is not used, it will transfer some of the heat from the CPU + Uncore to the heat spreader.

    My comment was a bit stupid, though, in that given how tightly packed the CPU cores and the uncore are, the GPU doesn't actually spread any of them further apart from each other. It could have been designed like that, but according to the picture on one of the first few pages it's not.
  • Xenonite - Thursday, September 3, 2015 - link

    No, it wouldn't. You could easily spread out the cores by padding them with much more cache and doubling their speculative and parallel execution capabilities. If you up the power available for such out of order execution, the additional die space could easily result in 50% more IPC throughput.
  • MrSpadge - Thursday, September 3, 2015 - link

    50% IPC increase? Go ahead and save AMD, then! They've been trying that for years with probably billions of R&D budget (accumulated over the years), yet their FX CPUs with huge L3 don't perform significantly better than the APUs with similar CPU cores and no L3 at all.
  • Xenonite - Thursday, September 3, 2015 - link

    Yes, but I specifically mentioned using that extra cache to feed the greater amount of speculative execution units made available by the removal of the iGPU.

    Sadly, AMD can't use this strategy because GlobalFoundries' and TSMC's manufacturing technology cannot fit the same number of transistors into a given area as Intel's can.
    Furthermore, their yields for large dies are also quite a bit lower and AMD really doesn't have the monetary reserves to produce such a high-risk chip.

    Also, the largest fraction of that R&D budget went into developing smaller, cheaper and lower-power processors to try to enter the mobile market, while almost all of the rest went into sacrificing single-threaded design techniques (such as improving and relying more on out-of-order execution, branch prediction and speculative execution) to design Bulldozer-like multi-core CPUs (which sacrifice a large portion of die area, that could have been used to make a small number of very fast cores, to implement a large number of slow cores).

    Lastly, I didn't just refer to the L3 cache when I suggested using some of the free space left behind by the removal of the iGPU to increase the amount of cache. The L1 and L2 caches could have been made much larger, with more associativity, to further reduce the number and duration of pipeline stalls caused by a data dependency not being in the cache.
    Also, while it is true that the L3 cache did not make much of a difference in the example you posted, it's also equally true that cache performance becomes increasingly important as a CPU's data processing throughput increases.
    Modern CPU caches just seem to have stagnated (aside from some bandwidth improvements every now and then), because our CPU cores haven't seen that much of a performance upgrade since the last time the caches were improved.
    Once a CPU gets the required power and transistor budgets for improved out-of-order performance, the cache will need to be large enough to hold all the different datasets that a single core is working on at the same time (which is not a form of multi-threading, in case you were wondering), while also being fast enough to service all of those units at once, without adversely affecting any one set of calculations.
  • techguymaxc - Wednesday, September 2, 2015 - link

    Your representation of Skylake's CPU/IPC performance is inaccurate and incomplete due to the use of the slowest DDR4 memory available. Given the nature of DDR4 (high bandwidth, high latency), it is an absolute necessity to pair the CPU with high-clockspeed memory to mitigate the latency impairment. Other sites have tested with faster memory and seen a much larger difference between Haswell and Skylake. See Hardocp's review (the gaming section specifically) as well as Techspot's review (page 13, memory speed comparison). Hardocp shows that Haswell with 1866 RAM is actually faster than Skylake with 2133 RAM in Unigine Heaven and Bioshock Infinite @ lowest quality settings (to create a CPU bottleneck). I find Techspot's article particularly interesting in that they actually tested both platforms with fast RAM. In synthetic testing (Sandra 2015), Haswell with 2400 DDR3 has more memory bandwidth than Skylake with 2666 DDR4; it is not until you pair Skylake with 3000 DDR4 that it achieves more memory bandwidth than Haswell with 2400 DDR3. You can see here directly the impact that latency has, even on bandwidth and not just overall performance. Furthermore, in their testing, Haswell with 2400 RAM vs. Skylake with 3000 RAM shows Haswell being faster in the Cinebench R15 multi-threaded test (895 vs. 892). Their 7-zip testing has Haswell leading both Skylake configurations in a memory-bound workload (32MB dictionary) in terms of instructions per second. Finally, in a custom Photoshop workload, Haswell's performance is once again sandwiched between the two Skylake configurations.

    Clearly both Haswell and Skylake benefit from faster memory. In fact, Skylake should ideally be paired with > 3000 DDR4 as there are still scenarios in which it is slower than Haswell with 2400 DDR3 due to latency differences.

    Enthusiasts are also far more likely to buy faster memory than the literal slowest memory available for the platform, given the minimal price difference. Right now on Newegg one can purchase a 16GB DDR3 2400 kit (2x8) for $90, a mere $10 more than an 1866 16GB kit. With DDR4 the situation is only slightly worse. The cheapest 16GB (2x8) 2133 DDR4 kit is $110, and 3000 goes for $135. It is also important to note that these kits have the same (primary) timings with a CAS latency of 15.

    So now we come to your reasoning for pairing Skylake with such slow RAM, and that of other reviewers, as you are not the only one to have done this. Intel only qualified Skylake with DDR4 up to 2133 MT/s. Why did they do this? To save time and money during the qualification stage leading up to Skylake's release. It is not because Skylake will not work with faster RAM; there isn't an unlocked Skylake chip in existence that is incapable of operating at a RAM speed of at least 3000, and some go significantly higher. Hardocp was able to test their Skylake sample (with no reports of crashing or errors) with the fastest DDR4 currently available today, 3600 MT/s. I have also heard anecdotally from enthusiasts with multiple samples that DDR4 3400-3600 seems to be the sweet spot for memory performance on Skylake.

    In conclusion, your testing method is improperly formed, when considered from the perspective of an enthusiast whose desire is to obtain the most performance from Skylake without over-spending. Now, if you believe your target audience is not in fact the PC enthusiast but instead a wider "mainstream" audience, I think the technical content of your articles easily belies this notion.
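As a back-of-the-envelope reference for the bandwidth and latency figures discussed in the comment above, the theoretical peak numbers can be estimated as below. This is a minimal sketch: dual-channel, 64-bit-per-channel operation and the DDR3-2400 CL11 timing are assumptions, and real round-trip latency includes memory-controller overhead that is not modelled here.

```python
# Minimal sketch: theoretical peak bandwidth and first-word CAS latency for the
# memory kits discussed above. Dual-channel operation and the DDR3-2400 CL11
# timing are assumptions; memory-controller overhead is not modelled.

def peak_bandwidth_gbs(transfer_rate_mts, channels=2, bytes_per_transfer=8):
    # MT/s * 8 bytes per 64-bit transfer * number of channels, in GB/s
    return transfer_rate_mts * bytes_per_transfer * channels / 1000.0

def cas_latency_ns(transfer_rate_mts, cas_cycles):
    # CAS cycles divided by the memory clock (half the transfer rate for DDR)
    return cas_cycles / (transfer_rate_mts / 2.0) * 1000.0

kits = [
    ("DDR3-2400 CL11", 2400, 11),
    ("DDR4-2133 CL15", 2133, 15),
    ("DDR4-2666 CL15", 2666, 15),
    ("DDR4-3000 CL15", 3000, 15),
]

for name, rate, cl in kits:
    print(f"{name}: {peak_bandwidth_gbs(rate):5.1f} GB/s peak, "
          f"{cas_latency_ns(rate, cl):5.2f} ns CAS")
```

On these assumptions, DDR4-2133 CL15 trails DDR3-2400 CL11 in both theoretical bandwidth (34.1 vs 38.4 GB/s) and first-word latency (roughly 14.1 vs 9.2 ns), while DDR4-3000 CL15 only closes the latency gap to about 10 ns, which is in line with the behaviour described above.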
