The Kaby Lake-U/Y GPU - Media Capabilities

While from a feature standpoint Kaby Lake is not a massive shift from Skylake, when it comes to the GPU it nonetheless brings some improvements that are directly visible to the end user. As with the CPU cores, Intel’s 14nm+ process allows for higher GPU frequencies and better overall GPU performance, but arguably the more impressive change with Kaby Lake is the updated media capabilities. To be clear, Kaby Lake is still an Intel Gen9 GPU (the core GPU architecture has not changed), but Intel has revised the video processing blocks to add functionality and improve their performance.

The media capabilities of the Skylake GPU were analyzed in great detail in our 2015 IDF coverage, and the updates in Kaby Lake-U/Y are best understood with those features in mind. The major change in the Kaby Lake-U/Y media engine is full hardware acceleration for encoding and decoding 4K HEVC Main10 profile video. By contrast, Skylake supports HEVC Main10 decode only up to 4Kp30, and does so using a “hybrid” process that spreads the workload over the CPU, the GPU’s media processors, and the GPU’s shader cores. As a result, Kaby Lake can not only process more HEVC profiles in fixed-function hardware than before, but can do so at a fraction of the power and with much better throughput.

Along the same lines, Kaby Lake implements full fixed-function 8-bit encode and 8/10-bit decode support for Google’s VP9 codec. Skylake offered hybrid decode support for the codec, which is useful from a feature standpoint but more problematic in real-world use, since hybrid decoding is not as power-efficient as a codec implemented in fixed-function hardware. Google has proven eager to serve up VP9 to its YouTube users, so Kaby Lake users can now decode those streams much more efficiently. Meanwhile, on the encode side, VP9 encoding is brand-new to Kaby Lake, joining the aforementioned HEVC encode support.

Intel Video Codec Support

                     Kaby Lake      Skylake        Broadwell
H.264 Decode         Hardware       Hardware       Hardware
HEVC Main Decode     Hardware       Hardware       Hybrid
HEVC Main10 Decode   Hardware       Hybrid         No
VP9 8-Bit Decode     Hardware       Hybrid         Hybrid
VP9 10-Bit Decode    Hardware       No             No

H.264 Encode         FF & PG-Mode   FF & PG-Mode   PG-Mode
HEVC Main Encode     FF & PG-Mode   PG-Mode        No
HEVC Main10 Encode   FF & PG-Mode   No             No
VP9 8-Bit Encode     FF & PG-Mode   No             No
VP9 10-Bit Encode    No             No             No

FF = fixed function
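To make the decode half of the table concrete, the Python sketch below shows how playback software might choose a decode path for each GPU generation. The support data is transcribed from the table; the `pick_decode_path` helper and its "fixed function first, then hybrid, then software" policy are hypothetical illustrations, not an Intel or driver API.

```python
# Decode support transcribed from the table above.
# "hw" = fixed-function hardware, "hybrid" = split across CPU/EUs/media blocks,
# None = not supported on that generation.
DECODE_SUPPORT = {
    "H.264":       {"Kaby Lake": "hw", "Skylake": "hw",     "Broadwell": "hw"},
    "HEVC Main":   {"Kaby Lake": "hw", "Skylake": "hw",     "Broadwell": "hybrid"},
    "HEVC Main10": {"Kaby Lake": "hw", "Skylake": "hybrid", "Broadwell": None},
    "VP9 8-bit":   {"Kaby Lake": "hw", "Skylake": "hybrid", "Broadwell": "hybrid"},
    "VP9 10-bit":  {"Kaby Lake": "hw", "Skylake": None,     "Broadwell": None},
}


def pick_decode_path(codec: str, gpu: str) -> str:
    """Hypothetical policy: prefer fixed function, then hybrid, else CPU software."""
    support = DECODE_SUPPORT.get(codec, {}).get(gpu)
    if support == "hw":
        return "fixed-function"
    if support == "hybrid":
        return "hybrid"
    # Unknown codec/GPU or no GPU support: fall back to software decode.
    return "software"


for gpu in ("Kaby Lake", "Skylake", "Broadwell"):
    print(f"{gpu}: HEVC Main10 -> {pick_decode_path('HEVC Main10', gpu)}")
```

The loop prints one line per generation, showing HEVC Main10 moving from software (Broadwell) to hybrid (Skylake) to fixed-function decode (Kaby Lake).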

An overview of the GPU engine in Kaby Lake-U/Y is presented in the slide below.

The new circuitry for hardware-accelerating HEVC Main10 and VP9 is part of the MFX block, which can now handle 8-bit/10-bit HEVC and VP9 decode as well as 10-bit HEVC and 8-bit VP9 encode. The QuickSync block also gets a few updates to further improve quality, and AVC encode performance receives a boost as well.

The Video Quality Engine (VQE) also receives some tweaks for HDR and Wide Color Gamut (Rec.2020) support. Skylake's VQE brought in RAW image processing support with a 16-bit image pipeline for selected filters. While Intel has not discussed the exact updates that enable Rec.2020 support, we suspect that more components in the VQE can now handle higher bit-widths. Intel pointed out that the HDR capabilities rely on both the VQE and the GPU's EUs, so there is still scope for further hardware acceleration and lower power consumption in this particular use case.

Intel claims that Kaby Lake-U/Y can handle up to eight 4Kp30 AVC and HEVC decodes simultaneously. HEVC decode support is rated at 4Kp60 up to 120 Mbps (especially helpful for premium content playback and Ultra HD Blu-ray). With Kaby Lake-U/Y's process improvements, even the 4.5W TDP Y-series processors can handle real-time HEVC 4Kp30 encode.
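A quick back-of-the-envelope calculation puts these decode claims in perspective. This sketch assumes "4K" means 3840x2160 UHD and counts only luma pixels; real decode cost also depends on bitrate, codec tools, and chroma, so treat it as a rough scale comparison rather than a performance model.

```python
UHD_W, UHD_H = 3840, 2160  # assumption: "4K" = 3840x2160 UHD


def pixel_rate(width: int, height: int, fps: int, streams: int = 1) -> int:
    """Luma pixels per second summed over all streams (ignores chroma planes)."""
    return width * height * fps * streams


eight_4kp30 = pixel_rate(UHD_W, UHD_H, 30, streams=8)  # eight simultaneous decodes
one_4kp60 = pixel_rate(UHD_W, UHD_H, 60)               # single 4Kp60 stream

# Average compressed payload per frame for a 120 Mbps stream at 60 fps.
bits_per_frame = 120_000_000 / 60

print(f"8 x 4Kp30: {eight_4kp30 / 1e9:.2f} Gpixel/s")
print(f"1 x 4Kp60: {one_4kp60 / 1e9:.2f} Gpixel/s")
print(f"120 Mbps at 60 fps: {bits_per_frame / 8 / 1000:.0f} kB per frame on average")
```

Eight 4Kp30 streams work out to about 1.99 Gpixel/s, four times the roughly 0.50 Gpixel/s of a single 4Kp60 stream. A 120 Mbps stream at 60 fps averages about 250 kB per compressed frame, although intra-coded frames are typically much larger than that average.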

On the subject of premium content, Intel rather explicitly mentioned in their presentation that the improved decode capabilities were, in part, for “premium content playback.” When we pushed Intel a bit on the matter, and specifically on 4K Netflix support, they didn’t have much to say beyond the fact that playing 4K Netflix requires certification. Based on what was said and what was not said (and what we know about the certification process), our educated guess is that the updates in Kaby Lake-U/Y include some new DRM requirements for 4K content, and 4K Netflix should hopefully be good to go with the new platform. However, because of those DRM requirements, and because this is being pitched as a new feature for Kaby Lake, we suspect that when 4K Netflix streaming does come to the PC platform, Skylake owners are going to be out of luck.

Update: On a related note, one of the Intel press releases that went out today announces that Sony's 4K movie and television streaming service, ULTRA, will be coming to Kaby Lake PCs in 2017. To date the service has only been available on Sony's televisions, in part for security reasons, so this is one example of a premium content service that is coming to Kaby Lake thanks to its stronger DRM capabilities.

Keep in mind that all the encode/decode capabilities discussed above apply to 4:2:0 streams. This is perfectly acceptable for consumer applications, as even Blu-ray video streams (which have plenty of bandwidth at their disposal) are encoded in 4:2:0. However, if Intel wants the new media engine to be used in professional broadcast and datacenter applications, 4:2:2 and, to a much lesser extent, 4:4:4 support might become necessary. For the Kaby Lake-U/Y consumer platforms being introduced today, this is not an issue at all.
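To see why 4:2:0 is the economical choice and what 4:2:2 or 4:4:4 support would cost, the sketch below computes the uncompressed size of a 10-bit 4K frame for each chroma format. It assumes tightly packed samples; real hardware surfaces often store 10-bit samples in 16-bit containers (formats such as P010), which makes actual buffers larger.

```python
# Fraction of luma resolution kept in each chroma plane (Cb and Cr):
# 4:4:4 keeps full-resolution chroma, 4:2:2 halves it horizontally,
# 4:2:0 halves it both horizontally and vertically.
CHROMA_FRACTION = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}


def raw_frame_bytes(width: int, height: int, subsampling: str, bit_depth: int) -> float:
    """Uncompressed Y'CbCr frame size in bytes, assuming tightly packed samples."""
    luma = width * height
    chroma = 2 * luma * CHROMA_FRACTION[subsampling]  # Cb + Cr planes
    return (luma + chroma) * bit_depth / 8


for mode in ("4:2:0", "4:2:2", "4:4:4"):
    size = raw_frame_bytes(3840, 2160, mode, 10)
    print(f"{mode}: {size / 1e6:.1f} MB per 10-bit 4K frame")
```

Relative to luma alone, 4:2:0 carries 1.5x the samples, 4:2:2 carries 2x, and 4:4:4 carries 3x, so moving a 10-bit 4K stream from 4:2:0 to 4:4:4 doubles the raw data the media engine has to process per frame.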

Moving on: like the GPU core itself, Kaby Lake-U/Y's display pipeline is unchanged from Skylake, so the iGPU still supports up to three simultaneous displays.

One of the disappointing aspects of Skylake that has still not been addressed in Kaby Lake-U/Y is the absence of a native HDMI 2.0 port with HDCP 2.2 support. Instead, Intel has been advocating the addition of an LSPCon (Level Shifter - Protocol Converter) in the DP 1.2 path. This approach has been used on multiple motherboards and even in SFF PCs like the Intel Skull Canyon NUC (NUC6i7KYK) and the ASRock Beebox-S series. Hopefully, future iterations of Kaby Lake (such as the desktop and high-performance mobile parts coming in January) address this issue, simplifying designs and reducing BOM cost for system vendors.

In summary, Kaby Lake-U/Y resolves one of the major complaints we had about Skylake's media engine: the absence of hardware-accelerated 4Kp60 HEVC Main10 decode. There are a few other improvements under the hood that enable a more satisfying multimedia experience for consumers. The software and content-delivery ecosystems have plenty of catching up to do when it comes to taking full advantage of Kaby Lake-U/Y's media capabilities.

129 Comments

  • lilmoe - Saturday, September 3, 2016

    I couldn't give a rat's bottom how cheap Intel's chips become. If AMD gets similar performance at reasonable prices, then good-bye Intel for me.
  • BillBear - Tuesday, August 30, 2016

    Given the new 14nm+ process, is it safe to speculate that the original problem with 14nm that delayed the hell out of Broadwell and outright killed some of the desktop versions of Broadwell was a power leakage problem?
  • saratoga4 - Tuesday, August 30, 2016

    I don't think leakage was a huge problem. Yields appear to have been though. Intel has some slides explaining that they ramped slower than expected.
  • Jumangi - Tuesday, August 30, 2016

    So possibly no real IPC gains in the desktop version? No point in waiting if you're looking to upgrade then.
  • wumpus - Tuesday, August 30, 2016

    Not for this. I keep looking at that die photo and wondering how hard it would be to replace some of that silly graphics bit with 2-4 cores. Maybe they will do it once zen ships. Maybe it would generate too much heat and won't work (I suspect the big multicore jobs have more cache/core area than these chips have (GPU+cache)/core). Maybe Intel will someday include a pony.
  • Molor - Tuesday, August 30, 2016

    They kind of do that. They call it the extreme version and charge more for it. Graphics nodes tend to be more forgiving of flaws due to redundancy. AMD and NVidia usually disable a broken SM or two per chip for yields. It would be interesting to know if Intel does the same.
  • shabby - Tuesday, August 30, 2016

    Pretty much, the 12% benchmark increase came from a 13% clock bump, same wattage though apparently.
  • someonesomewherelse - Thursday, September 1, 2016

    Are you talking about single thread IPC or total chip? I think that single thread IPC improvements are going to become (or already have become) too expensive for most applications without programmer/compiler help. Things that vectorize well are probably the last area where large improvements are realistic. But this will either require great programmers that can actually utilize current (AVX2) and future SIMD instructions in their code + higher development costs, or dropping support for older CPUs (neither sounds good). Per chip IPC is probably easier, but you still need good programmers/compilers or the use of multiple expensive applications at the same time (why not play a game while encoding 3 videos... with enough cores/threads/cache/memory bw/io bw this would work).

    Clocks could still be increased if you are willing to accept high power consumption and expensive cooling.

    However, unless Zen is an extreme success, Intel has no reason to do this, since slow and expensive increases are more profitable.
  • akmittal - Tuesday, August 30, 2016

    Any chance to see these in this year's MacBook lineup?
  • lilmoe - Tuesday, August 30, 2016

    As much as I dislike Apple's ways, sometimes they do things for a reason.

    1) Apple seems very hesitant to bother with Skylake because of the various problems/bugs associated with the new architecture. It seems that Haswell/Broadwell is doing the job well enough for them and the "new features" aren't worth it in their general assessment. MacBooks are more media consumption/creation-centric and they're probably waiting on the new fixed-function features.

    2) Cost and profitability. If the above is true, it makes sense to stay with Haswell/Broadwell to maximize profit, just like how they're using 3-generation-old AMD graphics.

    3) Lower than expected demand? Not so sure, but possible.

    4) And I'm being hopeful here: *Zen* (and future HBM APUs). Keller has a history working "with" Apple, and they actually like his designs which play well with their OS(s). I'm being hopeful because Apple's marketing prowess and branding may be the beacon AMD (and the competitive market) needs to unleash the new platform and drive Intel to a corner forcing them to lower prices.
