Intel’s Generation 9 Graphics

On the graphics side of the equation, the information comes in two loaded barrels. For the media capabilities, including information regarding Multiplane Overlay, Intel’s Quick Sync and HEVC decode, head on over to Ganesh’s great one-page summary. Some of that information may be reproduced on this page to help explain some of the more esoteric aspects of the design.

First, let us look at how Intel’s integrated graphics has progressed over the generations. It is no secret that the drive toward integrated graphics has eaten almost all of the low-end graphics market, save one or two cards bought for extra monitor outputs. For Intel, this means devoting a progressively larger portion of the die to execution units (EUs), as well as upgrading how those units process data. From the graphics above, and as we’ve noted before, Gen9 takes Gen8’s concept but adds an element at the top: GT4, which incorporates a third slice of EUs. As mentioned on the previous page, the eDRAM will come in 64 MB arrangements for GT3e (Iris) and 128 MB for GT4e (Iris Pro).

Intel’s graphics topology consists of an ‘unslice’ (or slice common) that deals solely with command streaming, vertex fetching, tessellation, domain shading, geometry shading and thread dispatch. This is followed by one, two or three slices, where each slice holds three sub-slices of 8 EUs along with its own L1/L2/L3 caches, rasterizers, texture caches and media samplers. With one slice, a GT2 configuration of 24 EUs, we get the video quality engine (VQE), the scaler and format converter (SFC) and a multi-format codec engine (MFX). Moving to GT3 adds another MFX and VQE.
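The slice arithmetic is simple enough to write down. The sketch below uses only the figures given here (three sub-slices per slice, 8 EUs per sub-slice, GT2 being one slice); mapping GT3 to two slices and GT4 to three follows from the 48 and 72 EU counts in the table below. The names are illustrative, and the media engines (VQE, SFC, MFX) are not modeled.

```python
# Illustrative model of the Gen9 EU topology described above.
SUBSLICES_PER_SLICE = 3  # three sub-slices per slice
EUS_PER_SUBSLICE = 8     # 8 EUs per sub-slice

# GT tier -> number of slices (GT2 = 24 EUs, GT3 = 48 EUs, GT4 = 72 EUs)
SLICES_PER_GT = {"GT2": 1, "GT3": 2, "GT4": 3}


def eu_count(slices: int, disabled_eus: int = 0) -> int:
    """Enabled EUs for a die with the given number of slices.

    disabled_eus models fused-off units, e.g. the 12 EU parts, which are a
    single 24 EU slice with half of the EUs disabled.
    """
    total = slices * SUBSLICES_PER_SLICE * EUS_PER_SUBSLICE
    if not 0 <= disabled_eus <= total:
        raise ValueError("cannot disable more EUs than the die has")
    return total - disabled_eus


for gt, slices in SLICES_PER_GT.items():
    print(f"{gt}: {slices} slice(s) -> {eu_count(slices)} EUs")
print(f"12 EU part: {eu_count(1, disabled_eus=12)} EUs")
```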

Intel Gen9 HD Graphics
GPU Designation | EUs | Die (CPU cores + GT) | eDRAM | Segment (Y/U/H/S) | Example SKUs
--- | --- | --- | --- | --- | ---
Intel HD Graphics | 12 | 2+1 | - | Y | 4405Y
Intel HD Graphics 510 | 12 | 2+2 | - | U, S | G4400, 4405U
Intel HD Graphics 515 | 24 | 2+2 | - | Y | 6Y75, 6Y57, 6Y30
Intel HD Graphics 520 | 24 | 4+2, 2+2 | - | U | i7-6600U, i3-6100U
Intel HD Graphics 530 | 24 | 4+2, 2+2 | - | H, S | i7-6700K, i3-6100T, i7-6820HK
Intel HD Graphics P530 | 24 | 4+2 | - | H | E3-1535M v5
Intel Iris Graphics 540 | 48 | 2+3e | 64 MB | U | i7-6650U, i5-6260U
Intel Iris Graphics 550 | 48 | 2+3e | 64 MB | U | i7-6567U, i3-6167U
Intel Iris Pro Graphics 580 | 72 | 4+4e | 128 MB | H | -

In each of the SKU lists, Intel gives both the name of the graphics and its base/turbo frequencies. The table above ties each name to its execution unit arrangement. At this point we have no confirmation of any parts with the GT1.5 arrangement, nor are we going to see any 23/24 EU combinations for Pentium/Celeron models similar to what we saw in Haswell. The 12 EU arrangements are a single slice of 24 EUs with half of them disabled. We questioned Intel on this back with Haswell, asking whether the EU cuts are split 4/4/4 from the 8/8/8 sub-slice arrangement, and the answer we got back was interesting: there is no set pattern for 12 EU arrangements. It could be 4/4/4, or 3/4/5, or 3/3/6, or any other combination, as long as the performance is in line with what is expected; the EU count per sub-slice cannot be probed by the user. Although a 3/3/6 arrangement might suffer L2 cache pressure in some circumstances, Intel feels that at this level of performance it can guarantee that the processors it ships will be consistent. It’s an interesting application.
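To get a feel for how many "any other combination" cases there are, the sketch below enumerates the distinct ways 12 enabled EUs can be spread over three 8-EU sub-slices, treating 3/4/5 and 5/4/3 as the same pattern. Intel did not say whether a sub-slice may be left with zero EUs, so the minimum per sub-slice is an assumed parameter; the function name is illustrative.

```python
from itertools import product

EUS_PER_SUBSLICE = 8
SUBSLICES = 3


def harvest_patterns(enabled: int, min_per_subslice: int = 1):
    """Distinct per-sub-slice EU splits summing to `enabled`.

    Each split is sorted, so 3/4/5 and 5/4/3 count as one pattern.
    min_per_subslice is an assumption, not something Intel specified.
    """
    patterns = set()
    for split in product(range(min_per_subslice, EUS_PER_SUBSLICE + 1), repeat=SUBSLICES):
        if sum(split) == enabled:
            patterns.add(tuple(sorted(split)))
    return sorted(patterns)


patterns = harvest_patterns(12)
print(len(patterns), "patterns:", ", ".join("/".join(map(str, p)) for p in patterns))
```

With at least one EU per sub-slice this gives ten patterns, including the 4/4/4, 3/4/5 and 3/3/6 cases Intel mentioned; allowing empty sub-slices adds 0/4/8, 0/5/7 and 0/6/6.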

Part of Intel’s strategy with the graphics is to separate all of these out as much as possible to different frequency planes and power gating, allowing parts of the silicon to only be powered when needed, or to offer more efficiency for regular use cases such as watching video.

The geometry pipe in the unslice is improved to raise the triangle cull rate and to remove redundant vertices. This aids an improved tessellator that creates geometry in a format (triangle strips) that can be resubmitted through the triangle cull when needed.
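Triangle strips are compact because consecutive triangles share vertices: N strip vertices describe N - 2 triangles instead of the 3(N - 2) indices a plain triangle list needs. The sketch below is the generic strip-expansion rule, not Intel's hardware path, which isn't described beyond the above. Dropping zero-area (degenerate) triangles stands in for the simplest form of culling.

```python
def strip_to_triangles(strip):
    """Expand a triangle strip (sequence of vertex indices) into triangles.

    Each vertex after the first two forms a triangle with the previous two.
    Every odd triangle has its first two vertices swapped so that all
    triangles keep the same winding order. Degenerate triangles (a repeated
    vertex, hence zero area) are dropped, as a cull stage would.
    """
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        if i % 2 == 1:
            a, b = b, a  # restore consistent winding on odd triangles
        if a == b or b == c or a == c:
            continue  # degenerate: nothing to rasterize
        triangles.append((a, b, c))
    return triangles


strip = [0, 1, 2, 3, 4]
print(strip_to_triangles(strip))  # 5 strip indices -> 3 triangles (9 list indices)
```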

As we noted at the Skylake-K launch, Gen9 graphics also features lossless image compression, allowing for fewer data transfers across the graphics subsystem. This saves both power and bandwidth, and an upgrade to the display controller allows it to read images in the compressed format, so they do not need to be decompressed before being sent out to the display.
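Intel hasn't detailed its compression scheme, so the sketch below uses plain run-length encoding as a stand-in. It only illustrates the two properties that matter here: the round trip is exact (lossless), and flat regions of a framebuffer shrink dramatically, which is where the bandwidth saving comes from. The 6-byte run layout is an assumption for counting purposes. Real schemes are typically block-based and fall back to uncompressed data when a block does not compress, since RLE can expand noisy input.

```python
def rle_encode(pixels):
    """Losslessly run-length encode pixel values into (value, run_length) pairs."""
    runs = []
    for px in pixels:
        if runs and runs[-1][0] == px:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([px, 1])  # start a new run
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original pixel list."""
    pixels = []
    for value, length in runs:
        pixels.extend([value] * length)
    return pixels


# One 1920-pixel XRGB8888 scanline: a dark window next to a white background.
row = [0xFF202020] * 1000 + [0xFFFFFFFF] * 920
runs = rle_encode(row)
assert rle_decode(runs) == row  # lossless: the round trip is exact

raw_bytes = len(row) * 4      # 4 bytes per XRGB8888 pixel
packed_bytes = len(runs) * 6  # assumed layout: 4-byte value + 2-byte run length
print(raw_bytes, "bytes raw vs", packed_bytes, "bytes encoded")
```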

16-bit (half-precision) floating point support has been a key element of discrete graphics in 2014 and 2015, affording faster processing when less accuracy is required by dropping the extra significant digits. This is doubly beneficial, as it also results in less power consumption for the same amount of work. Intel is exposing this to a greater degree in Gen9 for its shaders.
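What "dropping extra significant digits" costs can be seen with Python's standard `struct` module, which packs IEEE 754 half precision with the `e` format character. The sketch rounds a couple of values through fp32 and fp16: fp16 keeps only 11 significant bits, so 0.1 drifts further from its true value and 2049 is no longer representable (it rounds to 2048), while each value takes half the storage. The helper names are illustrative.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (16-bit) and back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to IEEE 754 single precision (32-bit) and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


for value in (0.1, 2049.0):
    print(f"{value}: fp32={to_fp32(value)!r} fp16={to_fp16(value)!r}")

# Storage per value: half the bytes means half the register, cache and memory traffic.
print(len(struct.pack("<e", 0.0)), "bytes (fp16) vs", len(struct.pack("<f", 0.0)), "bytes (fp32)")
```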

Intel’s big push on power for low-power Skylake is part of the reason why we did not see much difference in our Skylake-K review. Being able to shut off parts of the processor that are not in use allows the available power budget to be applied to those that need it. This is why power gating and separate frequency domains on the silicon, although they take up die area to implement, either save power overall or allow work to complete more quickly in a race-to-sleep environment.
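The race-to-sleep argument is easiest to see as energy = power × time. Every number in the sketch below is hypothetical, chosen only to show the mechanism: while a block is awake it pays static (leakage) power on top of dynamic power, so stretching the work out at lower clocks pays that static cost for longer, whereas finishing fast and power gating the block cuts it to near zero. With different numbers the slower option can win, which is why having both gating and independent frequency domains is useful.

```python
def energy_mj(phases):
    """Energy in millijoules for a list of (power_watts, duration_ms) phases.

    W x ms = mJ, so each phase contributes power * duration directly.
    """
    return sum(power_w * duration_ms for power_w, duration_ms in phases)


WINDOW_MS = 20.0  # hypothetical time budget for one unit of work (e.g. a frame)
SLEEP_W = 0.1     # hypothetical power once idle blocks are power gated
STATIC_W = 1.5    # hypothetical static (leakage) power while awake

# Race to sleep: full clocks (hypothetical 2.5 W dynamic) finish in 10 ms,
# then the block is gated for the rest of the window.
race = energy_mj([(STATIC_W + 2.5, 10.0), (SLEEP_W, WINDOW_MS - 10.0)])

# Slow and steady: half the clocks take the whole window. Dynamic power drops
# (hypothetically to 0.9 W) but static power is paid for twice as long.
slow = energy_mj([(STATIC_W + 0.9, WINDOW_MS)])

print(f"race to sleep: {race:.1f} mJ, slow and steady: {slow:.1f} mJ")
```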

Multiplane Overlay (MPO)

One of the features I’m most interested in is Multiplane Overlay. In a typical environment, a user might be looking at content that is stretched, rotated or distorted in some way. To perform this transformation, the image data is loaded into memory, siphoned off to the graphics to perform the mathematics, moved back into memory and then fired off to the display controller. This round trip in and out of DRAM costs power and battery life, so mitigating it with fixed-function hardware is ultimately beneficial. This is what MPO does.

This slide shows how MPO works. The current implementation splits the screen into three ‘planes’ (app, background, cursor, or any combination thereof) which are fired off to the desktop window manager (DWM). In the standard method, these layers are worked on separately before being composited and sent off to the display (the top model). With MPO, each of the planes fills a buffer in fixed-function hardware on the display controller, without touching the GPU or requiring it to enter a high-power mode. MPO also allows data in NV12 format to be processed on the fly, rather than requiring it all to be converted to RGB before it is recombined into the final image (which, again, saves power and bandwidth).
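Part of the NV12 saving is simple arithmetic. NV12 stores a full-resolution 8-bit luma (Y) plane followed by a single plane of interleaved Cb/Cr samples at half resolution in both directions (4:2:0), so a frame averages 1.5 bytes per pixel against 4 bytes for 32-bit RGB. The sketch below assumes tightly packed planes with no row padding (real drivers usually pad the stride) and shows how a scanout engine could fetch one pixel's Y, Cb and Cr directly from the NV12 buffer. The function names are illustrative.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Bytes for a tightly packed NV12 frame (even dimensions assumed)."""
    assert width % 2 == 0 and height % 2 == 0
    luma = width * height                          # one Y byte per pixel
    chroma = (width // 2) * (height // 2) * 2      # interleaved Cb, Cr per 2x2 block
    return luma + chroma


def xrgb8888_frame_bytes(width: int, height: int) -> int:
    """Bytes for a 32-bit RGB frame."""
    return width * height * 4


def nv12_sample(buf, width: int, height: int, x: int, y: int):
    """Return (Y, Cb, Cr) for pixel (x, y) of a tightly packed NV12 buffer."""
    y_val = buf[y * width + x]
    # The chroma plane starts after the luma plane; each chroma row is
    # `width` bytes long (width/2 Cb/Cr pairs) and covers two luma rows.
    uv_base = width * height + (y // 2) * width + (x // 2) * 2
    return y_val, buf[uv_base], buf[uv_base + 1]


nv12 = nv12_frame_bytes(1920, 1080)
rgb = xrgb8888_frame_bytes(1920, 1080)
print(f"1080p: NV12 {nv12} bytes, XRGB8888 {rgb} bytes ({rgb / nv12:.2f}x)")
```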

There are some limitations to this method. Currently, only three planes are supported, and there is no z-hierarchy, meaning that if an app is obscured by another but still requires work, that work does not get discarded as it should. Non-OS software that runs in full-screen environments also requires the necessary hooks into the fixed-function hardware; otherwise it is treated as one plane and goes back through the GPU. Intel is planning to improve this feature over time, as we might expect.
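The sketch below is a hypothetical model of these two limitations, not the actual DWM or driver logic. Layers are opaque rectangles listed bottom to top. With more than three layers the frame falls back to GPU composition. The `z_aware` option shows what a z-hierarchy could add: discarding a layer fully covered by a single layer above it, which can bring the count back under the three-plane limit.

```python
from dataclasses import dataclass

MAX_PLANES = 3  # the current MPO implementation supports three planes


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies entirely inside this rectangle."""
        return (self.x <= other.x and self.y <= other.y
                and self.x + self.w >= other.x + other.w
                and self.y + self.h >= other.y + other.h)


@dataclass(frozen=True)
class Layer:
    name: str
    rect: Rect  # assumed opaque


def plan_composition(layers, z_aware=False):
    """Return ("mpo" or "gpu", layers kept), for layers ordered bottom to top."""
    kept = list(layers)
    if z_aware:
        # Hypothetical z-hierarchy: drop layers hidden behind one layer above.
        kept = [layer for i, layer in enumerate(layers)
                if not any(above.rect.contains(layer.rect) for above in layers[i + 1:])]
    mode = "mpo" if len(kept) <= MAX_PLANES else "gpu"
    return mode, kept


desktop = [
    Layer("background", Rect(0, 0, 2560, 1440)),
    Layer("app A", Rect(100, 100, 800, 600)),  # fully hidden behind app B
    Layer("app B", Rect(50, 50, 1000, 800)),
    Layer("cursor", Rect(500, 500, 32, 32)),
]
print(plan_composition(desktop)[0])                # four planes: back to the GPU
print(plan_composition(desktop, z_aware=True)[0])  # app A discarded: fits in MPO
```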

Intel’s data showed a 17% overall power saving while watching a 1080p24 video on a 1440p panel. Given the march to higher-resolution displays and the lack of high-resolution content, I can imagine this being an important player in future content consumption or multitasking.

There are a number of other media-specific functions also new to Skylake, especially revolving around RAW processing in the fixed-function units of the video quality engines to save power, other memory compression techniques, and the encode/decode of specific formats such as HEVC and VP9. For this information, I suggest heading over to read Ganesh’s piece on Intel's Skylake GPU - Analyzing the Media Capabilities.

173 Comments

  • tipoo - Tuesday, September 01, 2015

    The bit of eDRAM on even ultrabook parts may be one of the more exciting bits of Skylake. Should bring baseline performance up significantly, even with half the eDRAM of the Pro 5200.
  • tipoo - Tuesday, September 01, 2015

    That 72EU part also comes shockingly close to XBO GPU Gflop numbers, which, while not directly comparable, means integrated graphics will catch up to this gen's consoles very soon.
  • RussianSensation - Wednesday, September 02, 2015

    But it's irrelevant in the real world for 5 reasons:

    1) Intel's best CPUs don't focus on IGP (i.e., i5-6600K, 6700K, 5820K-5960X) which means someone who is interested in gaming is buying a dedicated i5/i7, especially K series and going for a discrete graphics card.

    2) Since we are discussing PC gaming, not console gaming, a budget gamer is going to be better off getting a lower end discrete GPU like the $90 GTX750Ti or even going on the used market and buying a $100 HD7970/GTX670, instead of trying to play games on Intel's 72 EU part.

    3) Looking at historical pricing of Intel's parts with eDRAM, they'll probably cost almost as much as the Xbox One/PS4.

    4) No one buys an Xbox One/PS4 because they want the best graphics. If you want that, you build a Core i7 + GTX980Ti SLI/Fury X CF system. People buy consoles for ease of use, to play online with their friends, and to have exclusives. In the areas the consoles excel, a 13-15" PC laptop with a 72 EU Intel part will fail miserably in comparison to the gaming experience one would get on a PS4/XB1 + large TV in the living room. Frankly, these 2 devices aren't competing with each other.

    5) Overall cost of the device - a $300 Intel CPU is worthless without a motherboard, ram, SSD/HDD, keyboard, etc. That means trying to compare how fast an Intel's CPU with 72 EUs and EDRAM is vs. an Xbox One and PS4 and ignoring the Total System Cost is misleading.

    I guarantee it that anyone interested in PC gaming could care less about Intel's IGP as any serious gamer will be getting a Skylake laptop with a Maxwell and next year a Pascal GPU.
  • HideOut - Wednesday, September 02, 2015

    Nowhere in his comment did he mention same performance and cost. He was merely making an observation.
  • Jumangi - Wednesday, September 02, 2015

    That Intel's integrated graphics can finally match a ten year old console? Big deal...
  • IanHagen - Wednesday, September 02, 2015

    No, that it matches a console released last year.
  • SunLord - Wednesday, September 02, 2015

    I doubt it. They might be able to match the spec numbers, but actual real world performance will likely still favor the console simply because of the specialized optimizations developers use on consoles vs the more generic options PC games are forced to use thanks to the near-infinite hardware combinations one can have.
  • tuxRoller - Sunday, September 06, 2015

    There's already a Vulkan driver ready for Intel (on Linux) made by LunarG. That will allow for the optimizations needed if the developers care to make them.
  • Jumangi - Wednesday, September 02, 2015

    Ahaha you think this thing will match an Xbox one? Wow the delusion is seriously big with some people. Also the cost of one of these high end Iris Pro CPUs alone will cost more than an entire console or a decent AMD laptop with still better graphics.
  • BillyONeal - Wednesday, September 02, 2015

    Considering both the PS4 and XBO also use integrated graphics solutions from a couple years ago it isn't far fetched.
