Intel’s Generation 9 Graphics

On the graphics side of the equation, the information comes in two loaded barrels. For the media capabilities, including information regarding Multi Plane Overlay, Intel’s Quick Sync and HEVC decode, head on over to Ganesh’s great one page summary. Some of that information might be reproduced on this page to help explain some of the more esoteric aspects of the design.

First let us look at how Intel’s integrated graphics has progressed over the generations. It has been no secret that the drive to integrated graphics has eaten almost all of the low end graphics market, save one or two cards for extra monitor outputs. For Intel, this means slowly devoting a larger portion of the die to execution units, as well as upgrading how those units process data. From the graphics above, and as we’ve noted before, Gen9 takes Gen8’s concept but adds an element at the top: GT4, which incorporates an additional slice of EUs. As mentioned on the previous page, the eDRAM will come in 64 MB arrangements for GT3e (Iris), and 128 MB for GT4e (Iris Pro).

Intel’s graphics topology consists of an ‘unslice’ (or slice common) that deals solely with command streaming, vertex fetching, tessellation, domain shading, geometry shaders, and a thread dispatcher. This is followed up by either one, two or three slices, where each slice holds three sub-slices of 8 EUs, separate L1/L2/L3 caches and rasterizers, texture caches and media samplers. With one slice, a GT2 configuration of 24 EUs, we get the video quality engine (VQE), scalar and format converter (SFC) and a multi format codec engine (MFX). Moving to GT3 adds in another MFX and VQE.
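To make the slice arithmetic concrete, here is a minimal sketch of how the EU totals fall out of the topology described above. The constant and function names are made up for illustration, and the GT-to-slice mapping simply follows the one, two or three slice description together with the 24/48/72 EU figures quoted for GT2, GT3 and GT4.

```python
# Illustrative only: names here are hypothetical, not an Intel API.
SUBSLICES_PER_SLICE = 3   # each slice holds three sub-slices
EUS_PER_SUBSLICE = 8      # each sub-slice holds 8 EUs

# Assumed mapping of GT level to slice count, per the article's description.
GT_SLICES = {"GT2": 1, "GT3": 2, "GT4": 3}


def eu_count(slices: int) -> int:
    """Return the full EU count for a design with the given number of slices."""
    if slices not in (1, 2, 3):
        raise ValueError("Gen9 is described with one, two or three slices")
    return slices * SUBSLICES_PER_SLICE * EUS_PER_SUBSLICE


if __name__ == "__main__":
    for name, slices in GT_SLICES.items():
        print(f"{name}: {slices} slice(s) -> {eu_count(slices)} EUs")
```

The 12 EU parts do not appear here because they are a single 24 EU slice with half of the EUs disabled, rather than a smaller slice.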

Intel Gen9 HD Graphics
GPU Designation | Execution Units | GT (cores+GT) | eDRAM? | Y/U/H/S | Example
Intel HD Graphics | 12 | 2+1 | - | Y | 4405Y
Intel HD Graphics 510 | 12 | 2+2 | - | U, S | G4400, 4405-U
Intel HD Graphics 515 | 24 | 2+2 | - | Y | 6Y75, 6Y57, 6Y30
Intel HD Graphics 520 | 24 | 4+2, 2+2 | - | U | i7-6600U, i3-6100U
Intel HD Graphics 530 | 24 | 4+2, 2+2 | - | H, S | i7-6700K, i3-6100T, i7-6820HK
Intel HD Graphics P530 | 24 | 4+2 | - | H | E3-1535M v5
Intel Iris Graphics 540 | 48 | 2+3e | 64MB | U | i7-6650U, i5-6260U
Intel Iris Graphics 550 | 48 | 2+3e | 64MB | U | i7-6567U, i3-6167U
Intel Iris Pro Graphics 580 | 72 | 4+4e | 128MB | H | -

In each of the SKU lists, Intel gives both the name of the graphics used and its base/turbo frequencies. To tie each name to its execution unit arrangement, we have the table above. At this point we have no confirmation of any parts having the GT1.5 arrangement, nor are we going to see any 23/24 EU combinations for Pentium/Celeron models similar to what we saw in Haswell. The 12 EU arrangements have a single slice of 24 EUs, but with half of them disabled. We questioned Intel on this back with Haswell, asking whether the EU cuts are split to 4/4/4 from the 8/8/8 sub-slice arrangement, and the answer we got back was interesting: there is no set pattern for 12 EU arrangements. It could be 4/4/4, or 3/4/5, or 3/3/6, or any other combination, as long as the performance is in line with what is expected, and the EU count per sub-slice cannot be probed by the user. Despite the fact that a 3/3/6 arrangement might have L2 cache pressure in some circumstances, Intel feels that at this level of performance they can guarantee that the processors they ship will be consistent. It’s an interesting approach.
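As a rough illustration of how many 12 EU patterns "any other combination" could cover, the sketch below enumerates the unordered ways to leave 12 EUs enabled across three sub-slices of up to 8 EUs each. The requirement that every sub-slice keeps at least one EU is our own assumption for the example; Intel did not state what constraints, if any, apply.

```python
from itertools import combinations_with_replacement


def eu_splits(total=12, subslices=3, max_per=8, min_per=1):
    """List unordered EU-per-sub-slice patterns that sum to `total`.

    min_per=1 is an assumption made for this sketch, not an Intel rule.
    """
    candidates = combinations_with_replacement(
        range(min_per, max_per + 1), subslices
    )
    # Each candidate is a non-decreasing tuple, so 3/4/5 and 5/4/3 count once.
    return [c for c in candidates if sum(c) == total]


if __name__ == "__main__":
    for split in eu_splits():
        print("/".join(str(n) for n in split))
```

Under these assumptions the three patterns named above (4/4/4, 3/4/5 and 3/3/6) are among ten possibilities; relaxing the minimum to zero would add more.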

Part of Intel’s strategy with the graphics is to separate these blocks as much as possible into different frequency planes and power gating domains, allowing parts of the silicon to be powered only when needed, or to offer more efficiency for regular use cases such as watching video.

The geometry pipe in the unslice has been reworked to improve the triangle cull rate and to remove redundant vertices, alongside an improved tessellator that creates geometry in a format (tri-strips) that can be resubmitted through the triangle cull when needed.

As we noted at Skylake-K launch, Gen 9 graphics also features lossless image compression allowing for fewer data transfers across the graphics subsystem. This saves both power and bandwidth, and an upgrade to the display controller allows the image format to be read in a compressed format so as not to uncompress it before it gets sent out to the display.

16-bit support has been a key element of discrete graphics in 2014 and 2015, affording faster processing when less accuracy is required by dropping extra significant digits. This is doubly beneficial, as it also results in less power consumption for the same amount of work. Intel is exposing this to a greater degree in Gen9 for its shaders.
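To show what "dropping extra significant digits" means in practice, here is a small sketch that round-trips a value through generic IEEE 754 half precision using Python's standard `struct` module. It says nothing about how Gen9 shaders expose 16-bit types; it only illustrates the storage and accuracy trade-off, and the helper name is made up for the example.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack `value` into the given struct format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


if __name__ == "__main__":
    value = 0.1
    half = round_trip(value, "<e")    # 16-bit half precision
    single = round_trip(value, "<f")  # 32-bit single precision

    print(f"fp16: {struct.calcsize('<e')} bytes, error {abs(half - value):.3e}")
    print(f"fp32: {struct.calcsize('<f')} bytes, error {abs(single - value):.3e}")
```

Half the storage per value means half the data to move for the same number of operands, at the cost of a larger rounding error.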

Intel’s big push on power for low-power Skylake is part of the reason why we did not see much difference in our Skylake-K review. Being able to shut off more parts of the processor that are not in use allows the available power budget to be applied to those that need it, hence why both power gating and frequency domains on the silicon, while they take up die area to implement, contribute to power saving overall or allow work to complete quicker in a race to sleep environment.

Multiplane Overlay (MPO)

One of the features I’m most interested in is Multiplane Overlay. In a typical environment, a user might be looking at content that is stretched, rotated or distorted in some way. In order to perform this transformation, the image information is loaded into memory, siphoned off to the graphics to perform the mathematics, moved back into memory and then fired off to the display controller. This round trip in and out of DRAM costs power and battery life, so an attempt to mitigate it with fixed function hardware is ultimately beneficial. This is what MPO does.

This slide gives a good overview of how MPO works. The current implementation splits the screen into three ‘planes’ (app, background, cursor, or any combination thereof) which are fired off to the desktop window manager (DWM). In the standard method, these layers are worked on separately before being composited and sent off to the display (the top model). With MPO, each of the planes fills a buffer in fixed function hardware on the display controller, without touching the GPU or requiring it to be put into a high power mode. MPO also allows data in NV12 format to be processed on the fly, rather than requiring it all to be converted to RGB before it is recombined into the final image (which, again, saves power and bandwidth).
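For a sense of why keeping video in NV12 matters, the back-of-the-envelope sketch below compares the size of one frame in NV12 (a 4:2:0 format at 12 bits per pixel) against a 32-bit RGB surface. The 32-bit RGB layout, the 1080p resolution and the 24 fps rate are assumptions chosen for the example, and the function name is hypothetical.

```python
NV12_BITS_PER_PIXEL = 12   # 8-bit luma plus 4:2:0 subsampled chroma
RGB32_BITS_PER_PIXEL = 32  # assumption: 8-bit RGB padded to 32 bits


def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Return the size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    width, height, fps = 1920, 1080, 24  # example stream, chosen for illustration
    nv12 = frame_bytes(width, height, NV12_BITS_PER_PIXEL)
    rgb = frame_bytes(width, height, RGB32_BITS_PER_PIXEL)

    print(f"NV12 frame:  {nv12:,} bytes ({nv12 * fps:,} bytes/s)")
    print(f"RGB32 frame: {rgb:,} bytes ({rgb * fps:,} bytes/s)")
    print(f"NV12 is {nv12 / rgb:.1%} of the RGB32 size")
```

This only counts a single pass over the frame; every extra read or write through DRAM multiplies the difference, which is where avoiding the conversion pays off.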

There are some limitations to this method. Currently, only three planes are supported, and there is no z-hierarchy, meaning that if an app is obscured by another but still requires work, it does not get discarded as it should. Full-screen work outside the OS also needs the necessary hooks into the fixed function (FF) hardware; otherwise it is all treated as one plane and goes back through the GPU. Intel is planning to improve this feature over time, as we might expect.

Intel’s data showed a 17% power saving overall while watching a 1080p24 video on a 1440p panel. Given the march to higher resolution displays and the lack of high resolution content, I can imagine this being an important feature for future content consumption or multitasking.

There are a number of other media specific functions also new to Skylake, especially revolving around RAW processing in the fixed function units of the video quality engines to save power, other memory compression techniques, and the encode/decode of specific formats such as HEVC and VP9. For this information, I suggest heading over to read Ganesh’s piece, Intel's Skylake GPU - Analyzing the Media Capabilities.

173 Comments


  • just4U - Wednesday, September 2, 2015 - link

    I have to agree with Jumangi,

    If your gaming plans revolve around a integrated GPU your still better served to go the AMD route.. While the CPU is not as fast it's no slouch either.. and gaming performance is going to be acceptable in comparison on most titles.
  • sundragon - Monday, September 7, 2015 - link

    Um, first hand experience: Macbook Pro 2015, (Iris 6200): Skyrim, ESO, Civilization 5, Homeworld, all run at 1440x - I love all these people talk about intel integrated graphics sucking, meanwhile I'm getting crushed in Civ5 and kicking ass in Homeworld and ESO.
    I'm not lugging an integrated laptop around to play games, I have a laptop and I like to have ONE LAPTOP, and guess what, everything I've thrown on here has played. My MBA 2012 HD4000 struggled with Skyrim and Civ 5 but I still played. Please stop talking theoretical and talk about your actual rig... /end rant
  • BurntMyBacon - Thursday, September 3, 2015 - link

    @retrospooty: Core2 era was more a return to parity. One of the most even matchups I can remember was the ironically similarly numbered Phenom II 955 and the Core 2 Quad 9550. Nahalem is what really did the damage. Here's hoping Zen can put AMD back in the ballpark.

    I do think AMD has a pretty significant GPU advantage in the area of gaming over Intel. However, as you've stated, the power/thermal constraints do not allow them to fully exploit this advantage. A CPU intense game, even if not CPU limited, will chew up much of the GPU's available thermal envelop, effectively eliminating any advantage AMD had. Granted, there are cases where the thermal solutions in play provide the necessary thermal headroom, but these are mostly found in laptops that are already using discrete chips.
  • MrBungle123 - Thursday, September 3, 2015 - link

    The Phenom II didn't come out until after Intel had retired the Core 2 line. Everyone wants AMD to be competitive but the fact is they are miles behind Intel.
  • MapRef41N93W - Friday, September 4, 2015 - link

    Guess you didn't read the review of Broadwell Iris Pro on this very site. AMD's GPU loses by as much as 20-30% in most games vs Broadwell Iris Pro. Skylake Iris Pro will be offering up to 50% more performance.
  • V900 - Wednesday, September 2, 2015 - link

    4: Not everybody who are interested in a gaming machine can afford a Core i7 and several 1000$ graphic cards in a SLI configuration. A lot of gamers have a budget between 500$-1000$, and if you can get/get close to XB1 performance with just an Intel IGP, it would be perfect for that kind of budget.

    Also: Why would you think a 13' laptop with Iris Pro and 72 execution units would "fail miserably" in comparison with an XB1/PS4?!?

    That's ridiculous. Any advantage the console would have is tiny.

    Just get two wireless controllers and hook up the laptop to your HDTV with a HDMI cable, and the experience would be close to identical....
  • MrSpadge - Wednesday, September 2, 2015 - link

    "Also: Why would you think a 13' laptop with Iris Pro and 72 execution units would "fail miserably" in comparison with an XB1/PS4?!?"

    Because he specifically mentioned this in conjunction with "user experience". The PC gives you freedom but certainly not the ease of use of a console. Which is mainly why these things exist at all.
  • Jumangi - Wednesday, September 2, 2015 - link

    Lolz if you think an Intel only machine with any sort of Integrated graphics(even the best Iris Pro) will give you anything close to an Xbox One game your seriously naive and ignorant. Stop looking at theoretical Gflops numbers to make comparisons.
  • IanHagen - Wednesday, September 2, 2015 - link

    Well, a few posts back up you're stating that AMD's A10 APU have "far better graphics" when it failed to beat last generation Iris 5200 GPU and now there you are, talking about naiveness and ignorance.
  • Jumangi - Wednesday, September 2, 2015 - link

    Compare actual gaming on the two mr naive one. also compare the huge cost differences of these chips. An Iris Pro laptop will be far far more expensive.
