Skylake's iGPU: Intel Gen9

Both the Skylake processors here use Intel’s HD 530 graphics solution. When I first heard the name, alarm bells went off in my head with questions: why is the name different, has the architecture changed, and what does this mean fundamentally?

With few details forthcoming, we did the obvious thing – check what information comes directly out of the processor. Querying HD 530 via Intel's OpenCL driver reports a 24 EU design running at 1150 MHz. This differs from what GPU-Z indicates, which points to a 48 EU design instead, although GPU-Z is often incorrect on newer graphics modules before launch day. We can confirm that this is a 24 EU design, and this most likely follows on from Intel’s 8th Generation graphics in the sense that we have a base GT2 design featuring three sub-slices of 8 EUs each.
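For readers who want to repeat the check, here is a minimal sketch of how such a query might look. It assumes the third-party pyopencl package and a working OpenCL driver are installed; the helper names `describe_device` and `query_opencl_gpus` are ours for illustration, and this is not the exact tool we used. On Intel's driver the reported compute unit count corresponds to EUs.

```python
def describe_device(name, compute_units, clock_mhz):
    """Format one device report line."""
    return f"{name}: {compute_units} compute units @ {clock_mhz} MHz"


def query_opencl_gpus():
    """Return one report line per OpenCL GPU, or [] if nothing can be queried."""
    try:
        import pyopencl as cl  # third-party package, assumed to be installed
    except ImportError:
        return []
    try:
        platforms = cl.get_platforms()
    except cl.Error:
        # No OpenCL driver registered on this machine.
        return []
    lines = []
    for platform in platforms:
        try:
            devices = platform.get_devices(device_type=cl.device_type.GPU)
        except cl.Error:
            continue  # this platform exposes no GPU devices
        for dev in devices:
            lines.append(
                describe_device(dev.name, dev.max_compute_units,
                                dev.max_clock_frequency)
            )
    return lines


for line in query_opencl_gpus():
    print(line)
```

On a machine without pyopencl or an OpenCL driver the function simply returns an empty list, so nothing is printed.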

As far as we can tell, Intel calls the HD 530 graphics part of its 9th Generation (i.e. Gen9). We have been told directly by Intel that they have changed their graphics naming scheme from a four-digit (e.g. HD 4600) to a three-digit (HD 530) arrangement in order "to minimize confusion" (direct quote). Personally we find that it adds more confusion, because the HD 4600 naming is not directly linked to the HD 530 naming. You could argue that 5 is more than 4, but we already have HD 5200, HD 5500, Iris 6100 and others. So which is better, HD 530 or HD 5200? At this point it will already create a miasma of uncertainty, probably exaggerated until we get a definite explanation of the stack nomenclature.

Naming aside, Generation 9 graphics comes with some interesting enhancements. The slice and un-slice now have individual power and clock domains, allowing for a more efficient use of resources depending on the load (e.g. some un-slice not needed for some compute tasks). This lets the iGPU better balance power usage between fixed-function operation and programmable shaders.

Generation 9 will support a feature called Multi Plane Overlay, which is similar to AMD’s video playback path adjustments in Carrizo. The principle here is that when a 3D engine has to perform certain operations on an image (blend, resize, scale), the data has to travel from the processor into DRAM then to the GPU to be worked on, then back out to DRAM before it hits the display controller, a small but potentially inefficient operation in mobile environments. What Multi Plane Overlay does is add fixed function hardware to the display controller to perform this without ever hitting the GPU, minimizing power consumption from the GPU and taking out a good portion of DRAM data transfers. This comes at a slight hit for die area overall due to the added fixed function units.
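To put a rough number on the DRAM traffic at stake, here is a toy model of the two paths described above. It is a sketch under loud assumptions, not a description of Intel's hardware: surfaces are 32-bit (4 bytes per pixel), every surface is read or written exactly once per frame, and caching and compression are ignored. The plane sizes are hypothetical.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit RGBA surfaces


def surface_bytes(width, height):
    """Size of one uncompressed surface in bytes."""
    return width * height * BYTES_PER_PIXEL


def gpu_composition_traffic(planes, output):
    """DRAM bytes moved per frame when the 3D engine composes the planes.

    The GPU reads every source plane, writes the composed frame back to
    DRAM, and the display controller then reads that frame for scan-out.
    """
    plane_reads = sum(surface_bytes(w, h) for w, h in planes)
    composed = surface_bytes(*output)
    return plane_reads + composed + composed


def overlay_traffic(planes):
    """DRAM bytes moved per frame when the display controller fetches
    the source planes directly and blends/scales them itself."""
    return sum(surface_bytes(w, h) for w, h in planes)


# Hypothetical workload: a 1080p desktop plane plus a 720p video plane.
planes = [(1920, 1080), (1280, 720)]
output = (1920, 1080)

saved = gpu_composition_traffic(planes, output) - overlay_traffic(planes)
print(f"DRAM traffic avoided per frame: {saved / 2**20:.1f} MiB")
```

In this simplified model the saving is always one write plus one read of the composed frame, which is why the benefit scales with display resolution and refresh rate.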

As shown above, this feature will be supported on Win 8.1 with Skylake’s integrated graphics. That being said, not all imaging can be moved in this way, but where possible the data will take the shorter path.

To go along with the reduced memory transfer, Gen9 has support for memory color stream compression. We have seen this technology come into play for other GPUs, where fixed function hardware and lossless algorithms mean that smaller quantities of image and texture data are transferred around the system, again saving power and reducing bandwidth constraints. The memory compression is also used with a scaler and format conversion pipe to reduce the encoding pressure on the execution units, reducing power further.
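Intel has not detailed the algorithm, so the following is only an illustration of the general idea of lossless compression on pixel data, using simple run-length encoding as a stand-in. The function names and the sample row are hypothetical; real GPU color compression schemes are considerably more sophisticated.

```python
def rle_encode(pixels):
    """Collapse consecutive identical pixels into (count, value) runs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1
        else:
            runs.append([1, p])
    return [(count, value) for count, value in runs]


def rle_decode(runs):
    """Expand (count, value) runs back into the original pixel list."""
    pixels = []
    for count, value in runs:
        pixels.extend([value] * count)
    return pixels


# Hypothetical row: a flat grey background with a short white highlight.
row = [0x202020] * 60 + [0xFFFFFF] * 4
encoded = rle_encode(row)

# Lossless: decoding gives back exactly what went in.
assert rle_decode(encoded) == row
print(len(row), "pixels ->", len(encoded), "runs")
```

Flat regions such as UI backgrounds compress very well under a scheme like this, while noisy content barely compresses at all, which is why the bandwidth saving depends heavily on what is on screen.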

Adding into the mix, we have learned that Gen9 includes a feature called the ‘Camera Pipe’ for quick standard adjustments to images via hardware acceleration. This adjusts the programmable shaders to work in tandem for specific DX11 extensions on common image manipulation processes beyond resize/scale. The Camera Pipe is teamed with SDKs to help developers connect into optimized imaging APIs.

Media Encoding & Decoding

In the world of encode/decode, we get the following:

Whereas Broadwell implemented HEVC decoding in a "hybrid" fashion using a combination of CPU resources, GPU shaders, and existing GPU video decode blocks, Skylake gets a full, low power fixed function HEVC decoder. For desktop users this shouldn't impact things by too much - maybe improve compatibility a tad - but for mobile platforms this should significantly cut down on the amount of power consumed by HEVC decoding and increase the size and bitrate that the CPU can decode. Going hand-in-hand with HEVC decoding, HEVC encoding is now also an option with Intel's QuickSync encoder, allowing for quicker HEVC transcoding, or more likely real-time HEVC uses such as video conferencing.

Intel is also hedging its bets on HEVC by implementing a degree of VP9 support on Skylake. VP9 is Google's HEVC alternative codec, with the company pushing it as a royalty-free option. Intel calls VP9 support on Skylake "partial" for both encoding and decoding, indicating that VP9 is likely being handled in a hybrid manner similar to how HEVC was handled on Broadwell.

Finally, JPEG encoding is new for Skylake and set to support images up to 16K*16K.

Video Support

The analog (VGA) video connector has now been completely removed from the CPU/chipset combination, meaning that any VGA/D-Sub video connection has to be provided via an active digital/analog converter chip. This has been a long time coming, and is part of a commitment made by Intel several years ago to remove VGA by 2015. Removing analog display functionality will mean added cost for legacy support in order to drive analog displays. Arguably this doesn’t mean much for Z170 as the high end platform is typically used with a discrete graphics card that has HDMI or DisplayPort, but we will see motherboards equipped with VGA in order to satisfy some regional markets with specific requirements.

HDMI 2.0 is not supported by default, and only the following resolutions are possible on the three digital display controllers:

A DP to HDMI 2.0 converter, specifically an LS-Pcon, is required to do the adjustments, be it on the motherboard itself or as an external adapter. We suspect that there will not be many takers buying a controller to do this, given the capabilities and added benefits offered by the Alpine Ridge controller.


477 Comments


  • SkOrPn - Tuesday, December 13, 2016 - link

    Well if you were paying attention to AMD news today, maybe you partially got your answer finally. Jim Keller yet again to the rescue. Ryzen up and take note... AMD is back...
  • CaedenV - Wednesday, August 5, 2015 - link

    Agreed, seems like the only way to get a real performance boost is to up the core count rather than waiting for dramatically more powerful single-core parts to hit the market.
  • kmmatney - Wednesday, August 5, 2015 - link

    If you have an overclocked SandyBridge, it seems like a lot of money to spend (new motherboard and memory) for a 30% gain in speed. I personally like to upgrade my GPU and CPU when I can get close to double the performance of the previous hardware. It's a nice improvement here, but nothing earth-shattering - especially considering you need a new motherboard and memory.
  • Midwayman - Wednesday, August 5, 2015 - link

    And right as dx12 is hitting as well. That sandy bridge may live a couple more generations if dx12 lives up to the hype.
  • freaqiedude - Wednesday, August 5, 2015 - link

    agreed I really don't see the point of spending money for a 30% speedbump in general, (as its not that much) when the benefit in games is barely a few percent, and my other workloads are fast enough as is.

    If Intel would release a mainstream hexa/octa core I would be all over that, as the things I do that are heavy are all SIMD and thus fully multithreaded, but I can't justify a new pc for 25% extra performance in some area's. with CPU performance becoming less and less relevant for games that atleast is no reason for me to upgrade...
  • Xenonite - Thursday, August 6, 2015 - link

    "If Intel would release a mainstream hexa/octa core I would be all over that, as the things I do that are heavy are all SIMD and thus fully multithreaded, but I can't justify a new pc for 25% extra performance in some area's."

    SIMD actually has absolutely nothing to do with multithreading. SIMD refers to instruction-level parallelism, and all that has to be done to make use of it, for a well-coded app, is to recompile with the appropriate compiler flag. If the apps you are interested in have indeed been SIMD optimised, then the new AVX and AVX2 instructions have the potential to DOUBLE your CPU performance. Even if your application has been carefully designed with multi-threading in mind (which very few developers can, let alone are willing to, do) the move from a quad core to a hexa core CPU will yield a best-case performance increase of less than 50%, which is less than half what AVX and AVX2 brings to the table (with AVX-512 having the potential to again provide double the performance of AVX/AVX2).

    Unfortunately it seems that almost all developers simply refuse to support the new AVX instructions, with most apps being compiled for >10 year old SSE or SSE2 processors.

    If someone actually tried, these new processors (actually Haswell and Broadwell too) could easily provide double the performance of Sandy Bridge on integer workloads. When compared to the 900-series Nehalem-based CPUs, the increase would be even greater and applicable to all workloads (integer and floating point).
  • boeush - Thursday, August 6, 2015 - link

    Right, and wrong. SIMD are vector based calculations. Most code and algorithms do not involve vector math (whether FP or integer). So compiling with or without appropriate switches will not make much of a difference for the vast majority of programs. That's not to say that certain specialized scenarios can't benefit - but even then you still run into a SIMD version of Amdahl's Law, with speedup being strictly limited to the fraction of the code (and overall CPU time spent) that is vectorizable in the first place. Ironically, some of the best vectorizable scenarios are also embarrassingly parallel and suitable to offloading on the GPU (e.g. via OpenCL, or via 3D graphics APIs and programmable shaders) - so with that option now widely available, technologically mature, and performant well beyond any CPU's capability, the practical utility of SSE/AVX is diminished even further. Then there is the fact that a compiler is not really intelligent enough to automatically rewrite your code for you to take good advantage of AVX; you'd actually have to code/build against hand-optimized AVX-centric libraries in the first place. And lastly, AVX 512 is available only on Xeons (Knights Landing Phi and Skylake) so no developer targeting the consumer base can take advantage of AVX 512.
  • Gonemad - Wednesday, August 5, 2015 - link

    I'm running an i7 920 and was asking myself the same thing, since I'm getting near 60-ish FPS on GTA 5 with everything on at 1080p (more like 1920 x 1200), running with a R9 280. It seems the CPU would be holding the GFX card back, but not on GTA 5.

    Warcraft - who could have guessed - is getting abysmal 30 FPS just standing still in the Garrison. However, system resources shows GFX card is being pushed, while the CPU barely needs to move.

    I was thinking perhaps the multicore incompatibility on Warcraft would be an issue, but then again the evidence I have shows otherwise. On the other hand, GTA 5, that was created in the multicore era, runs smoothly.

    Either I have an aberrant system, or some i7 920 era benchmarks could help me understand what exactly do I need to upgrade. Even specific Warcraft behaviour on benchmarks could help me, but I couldn't find any good decisive benchmarks on this Blizzard title... not recently.
  • Samus - Wednesday, August 5, 2015 - link

    The problem now with nehalem and the first gen i7 in general isn't the CPU, but the x58 chipset and its outdated PCI express bus and quickpath creating a bottleneck. The triple channel memory controller went mostly unsaturated because of the other chipset bottlenecks which is why it was dropped and (mostly) never reintroduced outside of enthusiast x99 quad channel interface.

    For certain applications the i7 920 is, amazingly, still competitive today, but gaming is not one of them. An SLI GTX 570 configuration saturates the bus, I found out first hand that is about the most you can get out of the platform.
  • D. Lister - Thursday, August 6, 2015 - link

    Well said. The i7 9xx series had a good run, but now, as an enthusiast/gamer in '15, you wouldn't want to go any lower than Sandy Bridge.
