A small portion of this page was posted as part of our initial Skylake-K coverage.

The High Level Core

Taking a high level view, the application of the Skylake-S/H architecture features results in the following:

From most perspectives, the central image provided by Intel could have been taken from a Broadwell presentation if you scrubbed out the ‘GT4’ element of the design, but the move to Skylake suggests improvements in each of these areas either in terms of functionality, bandwidth, power consumption, or all of the above.

Out of the box, the Skylake IA core is designed to extract instruction level parallelism from code, and the design allows for more operations to be dispatched, queued and retired in a single clock. We cover this more in the next page. But to augment this, the data transfer ring between cores, cache, graphics and the system agent is also increased in read bandwidth to 128b, allowing more data to be supplied to the cores.  In terms of the cores themselves, the big upgrades surround in-flight management of data, especially with respect to cache misses. We have been told that a number of these upgrades, while they might not be readily usable by current software, should have effects in certain vertical markets (finance, big data, etc).

From a memory connectivity standpoint, most processors will support both DDR3L and DDR4 with a dual memory controller design (or a single memory controller that supports both). Skylake-Y (Core M) is the exception here, and will only work in DDR3L/LPDDR3 modes. As always with multi-DRAM support designs, only one standard can be in use at any time.

On the PCI-Express Graphics allocation side, the Skylake-H and Skylake-K processors will have sixteen PCIe 3.0 lanes to use for directly attached devices to the processor, similar to Intel's previous generation processors. These can be split into a single PCIe 3.0 x16, x8/x8 or x8/x4/x4 with basic motherboard design. (Note that this is different to early reports of Skylake having 20 PCIe 3.0 lanes for GPUs. It does not.)

With this SLI will work up to x8/x8 and CrossFire to x8/x8 or x8/x4/x4 in both desktop and mobile designs. For desktops, if a motherboard supports x8/x4/x4 and a PCIe card is placed into that bottom slot, SLI will not work because only one GPU will have eight lanes. NVIDIA requires a minimum of PCIe x8 in order to enable SLI. Crossfire has no such limitation, which makes the possible configurations interesting. Below we discuss that the chipset has 20 (!) PCIe 3.0 lanes to use in five sets of four lanes, and these could be used for graphics cards as well. That means a motherboard can support x8/x8 from the CPU and PCIe 3.0 x4 from the chipset and end up with either dual-SLI or tri-CFX enabled when all the slots are populated.

For Skylake-U/Y, these processors are not typically paired with discrete graphics and as far as we can tell, the PCIe lanes have been removed from these lines. As a result, any storage based on PCIe (such as M.2) for devices based on these processors will be using the chipset PCIe lanes. As mentioned later, the chipsets on U/Y also differ to their more desktop oriented counterparts.

DMI 3.0

The Skylake-S and H processors are connected to the chipset by the four-lane DMI 3.0 interface. The DMI 3.0 protocol is an upgrade over the previous generation which used DMI 2.0 – this upgrade boosts the speed from 5.0 GT/s (2GB/sec) to 8.0 GT/s (~3.93GB/sec), essentially upgrading DMI from PCIe 2 to PCIe 3, but requires the motherboard traces between the CPU and chipset to be shorter (7 inches rather than 8 inches) in order to maintain signal speed and integrity. This also allows one of the biggest upgrades to the system, chipset connectivity, as shown below in the HSIO section.

Skylake-Y/H, by virtue of having the chipset on the processor package, can keep the interface between the CPU and IO very simple and uses what they call an OPIO – on package input-output. We have seen this as far back as Haswell to deliver 4GB/s of bandwidth at 1pJ/bit, and has been listed as being very power efficient as well as highly scalable.

CPU Power Delivery – Moving the FIVR

Moving on to power arrangements, with Skylake the situation changes as compared to Haswell. Prior to Haswell, voltage regulation was performed by the motherboard and the right voltages were then put into the processor. This was deemed inefficient for power consumption, and for the Haswell/Broadwell processors Intel decided to create a fully integrated voltage regulator (FIVR) in order to reduce motherboard cost and reduce power consumption. This had an unintended side-effect – while it was more efficient (good for mobile platforms), it also acted as a source of heat generation inside the CPU with high frequencies. As a result, overclocking was limited by temperatures and the quality of the FIVR led to a large variation in results. For Broadwell-Y, this also resulted in an increase for the z-height of the processor due to having leveling transistors on the rear of the package. As a result, in order to decrease the z-height of any corresponding product, a hole in the motherboard had to be forged:

For Skylake, the voltage regulation is moved back into the hands of the motherboard manufacturers. This should allow for cooler processors depending on how the silicon works, but it will result in slightly more expensive motherboards.

A slight indication of this will be that some motherboards will go back to having a large amount of multiplexed phases on the motherboard, and it will allow some manufacturers to use this as a differentiating point, although the usefulness of such a design is sometimes questionable.

The Claims: Performance and Power The Skylake Package: 100-Series Chipsets and DRAM
Comments Locked

173 Comments

View All Comments

  • prisonerX - Thursday, September 3, 2015 - link

    You can't make a purse out of a sow's ear.
  • BMNify - Saturday, September 5, 2015 - link

    "prisonerX - Thursday, September 03, 2015
    You can't make a purse out of a sow's ear."
    hmm
    https://libraries.mit.edu/archives/exhibits/purse/...
    Report: "On the Making of Silk Purses from Sows' Ears," 1921

    https://libraries.mit.edu/archives/exhibits/purse/...
  • extide - Saturday, September 5, 2015 - link

    Yeah that's because they are starting with a MUCH MUCH less refined architecture. It's easy to improve a ton on something that isnt as good to begin with... Plus, there is only so much you can do, you quickly run into diminishing returns.

    This is seriously like one of the simplest concepts ever but people still don't seem to get it....
  • BMNify - Saturday, September 5, 2015 - link

    "extide: Plus, there is only so much you can do, you quickly run into diminishing returns."

    that's a subjective POV, if you where to remove the base sram and dram and replace them with 10s of femto seconds Correlated Electron RAM and/or Mram in both sram and new wideIO2 configurations for these Arm invested NVram and role them out in the usual arm/foundry collaborations then you begin to see the latest advertised Intel far slower new "3D XPoint" as a sub standard technology in comparison ....
    http://www.zdnet.com/article/arm-licenses-next-gen...
  • Galatian - Thursday, September 3, 2015 - link

    "For users who actively want an LGA1151 4+4e configuration, make sure your Intel representative knows it, because customer requests travel up the chain."

    Who do I need to talk to? Seriously I didn't get Broadwell because I knew Skylake was right around the corner. I mean why introduce a pocketable 5x5 platform, just to announce that you have no plans to actually release the perfect processor for that platform?
  • Valantar - Thursday, September 3, 2015 - link

    "For Skylake-U/Y, these processors are not typically paired with discrete graphics and as far as we can tell, the PCIe lanes have been removed from these lines. As a result, any storage based on PCIe (such as M.2) for devices based on these processors will be using the chipset PCIe lanes."

    According to Intel Ark, the 15W U-series CPUs (at least the i5s and i7s (including the Iris 6650U), which I looked at) have 12 PCIe 3.0 lanes, available in "1x4, 2x2, 1x2+2x1 and 4x1" configurations. Worth updating the article?
  • Valantar - Thursday, September 3, 2015 - link

    And reading on, I suddenly realize why you said what you did. 12 lanes does indeed line up with the ones from the PCH-LP. Does this point toward more of an SOC-like integration of features for U-/Y-series CPUs?
  • BMNify - Thursday, September 3, 2015 - link

    "A lot of these improvements are aimed directly at the pure-performance perspective (except L2 and FMUL to an extent), so you really have to be gunning it or have a specific workload to take advantage."

    i cant believe that to be true, as its a tock and yet no real world view can call this tock an improvement never mind "so you really have to be gunning it or have a specific workload to take advantage." as the real world x264/x265 show no benefit what so ever here....

    also Ian, was it an oversight on your part that in all the 9 pages analysis you did not point out the missing generic "AVX2 SIMD" in most of all these launched today.... please note that the official Intel slides pacifically remove any mention of any AVX SIMD in their latest charts etc.

    it seems a clear cut choice on intels part to try and stop news outlets from mentioning and pointing out the lack of 2015 class SIMD on many of these soc released today.....

    can you at least go through the included charts and point out all the cores/soc that Do Not include generic AVX2 SIMD to make it clear which cores/soc to actually buy (anything with AVX2+) and what new/old soc to discard (anything with only antiquated 2006 sse4 SIMD)
  • Xenonite - Thursday, September 3, 2015 - link

    Actually, consumers will actively avoid AVX2 instruction set capable processors, since they could use more power (especially on the desktop, where Intel's power limiter allows AVX2 to really boost application performance / power consumption)
  • BMNify - Monday, September 7, 2015 - link

    i dont see any logic to your "consumers will actively avoid AVX2 instruction set " comment as by definition "SIMD" ( Single Instruction, Multiple Data) describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

    so in fact AVX2 256 bit SIMD does exactly the opposite of wasting more power compared to the older/slower data paths today, also its clear in the mobile/ultra low power devices where "Qualcomm's new DSP technology boasts heavy vector engine—that it calls Hexagon Vector eXtensions or HVX—for compute-intensive workloads in computational photography, computer vision, virtual reality and photo-realistic graphics on mobile devices. Moreover, it expands single instruction multiple data (SIMD) from 64-bit to 1,024-bit in order to carry out image processing with a wider vector capability...." is in fact doubling even intel's as yet unreleased 512bit AVX3 , with their lowest power 1,024-bit SIMD to date, although its unknown weather its a refreshed NEON or another complimentary SIMD there... we shall see soon enough.

Log in

Don't have an account? Sign up now