The Kaby Lake-U/Y GPU - Media Capabilities

While Kaby Lake is not a massive shift from Skylake from a feature standpoint, when it comes to GPU matters it nonetheless brings some improvements that are directly visible to the end-user. As with the CPU cores, Intel’s 14nm+ process will allow for higher GPU frequencies and better overall GPU performance, but arguably the more impressive change with Kaby Lake is the updated media capabilities. To be clear, Kaby Lake is still an Intel Gen9 GPU – the core GPU architecture has not changed – but Intel has revised the video processing blocks to add further functionality and improve their performance.

The media capabilities of the Skylake GPU were analyzed in great detail in our 2015 IDF coverage, and the updates to Kaby Lake-U/Y should be read with those features in mind. The major feature change in the Kaby Lake-U/Y media engine is the availability of full hardware acceleration for encode and decode of 4K HEVC Main10 profile videos. This is in contrast to Skylake, which can support HEVC Main10 decode up to 4Kp30, but does so using a “hybrid” process that spreads out the workload over the CPU, the GPU’s media processors, and the GPU’s shader cores. As a result, not only can Kaby Lake process more HEVC profiles in fixed function hardware than before, but it can do so at a fraction of the power and with much better throughput.

Also along these lines, Kaby Lake has implemented full fixed function 8-bit encode and 8/10-bit decode support for Google’s VP9 codec. Skylake offered hybrid decode support for the codec, which is useful from a feature standpoint, but is more problematic in real-world use since hybrid decode is not as power-efficient as a codec implemented in fixed function hardware. Google has proven eager to serve up VP9 to its YouTube users, so Kaby Lake systems can now decode those streams much more efficiently. Meanwhile, on the encode side, brand-new to Kaby Lake is VP9 encoding support, to go with the aforementioned HEVC encode support.

Intel Video Codec Support
                     Kaby Lake      Skylake        Broadwell
H.264 Decode         Hardware       Hardware       Hardware
HEVC Main Decode     Hardware       Hardware       Hybrid
HEVC Main10 Decode   Hardware       Hybrid         No
VP9 8-Bit Decode     Hardware       Hybrid         Hybrid
VP9 10-Bit Decode    Hardware       No             No

H.264 Encode         FF & PG-Mode   FF & PG-Mode   PG-Mode
HEVC Main Encode     FF & PG-Mode   PG-Mode        No
HEVC Main10 Encode   FF & PG-Mode   No             No
VP9 8-Bit Encode     FF & PG-Mode   No             No
VP9 10-Bit Encode    No             No             No
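
For readers who want to query the decode half of the table programmatically, here is a minimal Python sketch. The support values are transcribed from the table above; the names `GENERATIONS`, `DECODE_SUPPORT`, `decode_support` and `is_fixed_function_decode` are illustrative assumptions, not part of any Intel API.

```python
# Decode rows transcribed from the codec support table above.
# All identifiers here are hypothetical names chosen for this sketch.
GENERATIONS = ("Kaby Lake", "Skylake", "Broadwell")

DECODE_SUPPORT = {
    "H.264":       ("Hardware", "Hardware", "Hardware"),
    "HEVC Main":   ("Hardware", "Hardware", "Hybrid"),
    "HEVC Main10": ("Hardware", "Hybrid", "No"),
    "VP9 8-Bit":   ("Hardware", "Hybrid", "Hybrid"),
    "VP9 10-Bit":  ("Hardware", "No", "No"),
}


def decode_support(codec: str, generation: str) -> str:
    """Return the table entry ("Hardware", "Hybrid" or "No") for a codec."""
    return DECODE_SUPPORT[codec][GENERATIONS.index(generation)]


def is_fixed_function_decode(codec: str, generation: str) -> bool:
    """True only when the table lists full hardware decode."""
    return decode_support(codec, generation) == "Hardware"


# Codecs whose decode moved fully into hardware between Skylake and Kaby Lake.
newly_fixed_function = [
    codec
    for codec in DECODE_SUPPORT
    if is_fixed_function_decode(codec, "Kaby Lake")
    and not is_fixed_function_decode(codec, "Skylake")
]
print(newly_fixed_function)
```

The list comprehension picks out the formats where Kaby Lake replaces a hybrid or missing decode path with a fixed function one, which is the change the text above highlights.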

An overview of the GPU engine in Kaby Lake-U/Y is presented in the slide below.

The new circuitry for hardware-accelerating HEVC Main10 and VP9 is part of the MFX block. The MFX block can now handle 8b/10b HEVC and VP9 decode and 10b HEVC / 8b VP9 encode. The QuickSync block also gets a few updates to improve quality further, and AVC encode performance also receives a boost.

The Video Quality Engine also receives some tweaks for HDR and Wide Color Gamut (Rec.2020) support. Skylake's VQE brought in RAW image processing support with a 16-bit image pipeline for selected filters. While Intel has not discussed the exact updates that enable Rec.2020 support, we suspect that more components in the VQE can now handle higher bit-widths. Intel pointed out that the HDR capabilities involve usage of both the VQE and the EUs in the GPU. So, there is still scope for further hardware acceleration and lower power consumption in this particular use-case.

Intel claims that Kaby Lake-U/Y can handle up to eight 4Kp30 AVC and HEVC decodes simultaneously. HEVC decode support is rated at 4Kp60 up to 120 Mbps (especially helpful for premium content playback and Ultra HD Blu-ray). With Kaby Lake-U/Y's process improvements, even the 4.5W TDP Y-series processors can handle real-time HEVC 4Kp30 encode.
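
To put those figures in perspective, the following back-of-the-envelope Python sketch converts them into pixel throughput and average compressed bits per pixel. It assumes "4K" means 3840x2160 (UHD) and that Mbps means 10^6 bits per second; neither is spelled out in Intel's claim, and the function name is ours.

```python
# Assumption: "4K" is UHD, 3840x2160. Intel's slide does not state this.
WIDTH, HEIGHT = 3840, 2160


def pixel_rate(width: int, height: int, fps: int, streams: int = 1) -> int:
    """Pixels that must be processed per second for the given streams."""
    return width * height * fps * streams


# Eight simultaneous 4Kp30 decodes.
eight_4kp30 = pixel_rate(WIDTH, HEIGHT, 30, streams=8)

# One 4Kp60 stream at the rated 120 Mbps ceiling (assuming 1 Mbps = 1e6 bit/s).
single_4kp60 = pixel_rate(WIDTH, HEIGHT, 60)
bits_per_pixel = 120e6 / single_4kp60

print(f"8 x 4Kp30: {eight_4kp30 / 1e9:.2f} Gpixel/s")
print(f"1 x 4Kp60: {single_4kp60 / 1e6:.0f} Mpixel/s")
print(f"120 Mbps at 4Kp60: {bits_per_pixel:.3f} compressed bits per pixel")
```

Under these assumptions, eight 4Kp30 streams amount to roughly four times the pixel rate of a single 4Kp60 stream, which gives a sense of how much headroom the fixed function decoder is claimed to have.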

On the subject of premium content, Intel rather explicitly mentioned in its presentation that the improved decode capabilities were, in part, for “premium content playback.” When we pushed Intel a bit on the matter – and specifically on 4K Netflix support – they didn’t have much to say beyond the fact that to play 4K Netflix, you need certification. Based on what was said and what was not said (and what we know about the certification process), our educated guess is that the updates in Kaby Lake-U/Y include support for some new DRM requirements for 4K content, and 4K Netflix should hopefully be good to go with the new platform. On that note, however, because of those DRM requirements, and because this is being pitched as a new feature for Kaby Lake, we suspect that when 4K Netflix streaming does come to the PC platform, Skylake owners are going to be out of luck.

Update: On a related note, one of the Intel press releases that has gone out today is that Sony's 4K movie and television streaming service, ULTRA, will be coming to Kaby Lake PCs in 2017. To date the service has only been available on Sony's televisions - in part for security reasons - so this is an example of one such premium content service that's coming to Kaby Lake thanks to its stronger DRM abilities.

It must be kept in mind that all the encode / decode aspects discussed above are for 4:2:0 streams. This is definitely acceptable for consumer applications, as even Blu-ray video streams (that have plenty of bandwidth at their disposal) are encoded in 4:2:0. However, if Intel wants to use the new media engine in professional broadcast and datacenter applications, 4:2:2, and, to a much lesser extent, even 4:4:4 support might become necessary. For the purpose of the Kaby Lake-U/Y consumer platforms being introduced today, this is not an issue at all.
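
The gap between these chroma formats can be made concrete with a short Python sketch that counts samples per 2x2 pixel block and derives the uncompressed data rate. The 4Kp60, 10-bit operating point and the 3840x2160 resolution are assumptions picked for illustration, and the names below are ours.

```python
# Samples in each 2x2 pixel block: 4 luma samples plus the chroma samples
# (Cb and Cr together) that the subsampling scheme keeps.
SAMPLES_PER_2X2 = {
    "4:2:0": 4 + 2,  # one Cb and one Cr per 2x2 block
    "4:2:2": 4 + 4,  # chroma halved horizontally only
    "4:4:4": 4 + 8,  # full-resolution chroma
}


def raw_bits_per_second(fmt: str, width: int, height: int,
                        fps: int, bit_depth: int) -> int:
    """Uncompressed video data rate for the given chroma format."""
    blocks = (width * height) // 4  # number of 2x2 blocks per frame
    return blocks * SAMPLES_PER_2X2[fmt] * bit_depth * fps


# Assumed operating point: UHD (3840x2160), 60 fps, 10 bits per sample.
for fmt in SAMPLES_PER_2X2:
    rate = raw_bits_per_second(fmt, 3840, 2160, 60, 10)
    print(f"{fmt}: {rate / 1e9:.2f} Gbit/s uncompressed")
```

A 4:4:4 stream carries twice the raw samples of a 4:2:0 stream at the same resolution and bit depth, which is one reason why supporting it in fixed function hardware is a larger undertaking.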

Moving on, like the GPU core itself, Kaby Lake-U/Y's display pipeline is the same as that of Skylake. This means the iGPU can support up to three simultaneous displays.

One of the disappointing aspects from Skylake that has still not been addressed in Kaby Lake-U/Y is the absence of a native HDMI 2.0 port with HDCP 2.2 support. Intel has been advocating the addition of an LSPCon (Level Shifter - Protocol Converter) in the DP 1.2 path. This approach has been used in multiple motherboards and even SFF PCs like the Intel Skull Canyon NUC (NUC6i7KYK) and the ASRock Beebox-S series. Hopefully, future iterations of Kaby Lake (such as the desktop and high-performance mobile parts coming in January) address this issue and reduce BOM cost for system vendors.

In summary, Kaby Lake-U/Y resolves one of the major complaints we had about Skylake's media engine: the absence of hardware-accelerated 4Kp60 HEVC Main10 decode. There are a few other improvements under the hood that enable a more satisfying multimedia experience for consumers. The software and content-delivery ecosystems have plenty of catching up to do when it comes to taking full advantage of Kaby Lake-U/Y's media capabilities.

129 Comments

  • rhysiam - Tuesday, August 30, 2016

    They speculate on page 4 whether some retooling is required for the new 14nm+ process, and therefore whether perhaps only one or two fabs are going to be up and running early. If Intel has limited output it makes sense to direct early production to the most valuable CPUs per mm2 of wafer... which is precisely these standard U and Y series processors (maybe some Xeon CPUs are higher earners, but the platform isn't ready yet). Mobile Iris Pro CPUs and most desktop processors require much more die area... meaning less output.

    All speculation at this point, but it is a possible answer to your question.
  • TEAMSWITCHER - Tuesday, August 30, 2016

    Ok, that makes sense. I always thought they were the same chips - with the Iris Pro features disabled. But if they are smaller dies then the bottom up approach could help to perfect the process before switching to the larger dies - potentially reducing the number of defective chips. Thanks.
  • A5 - Tuesday, August 30, 2016

    It's yield and profit concerns. Doing the big chips first means they have to throw more of them away, which cuts down their profits.
  • bryanlarsen - Tuesday, August 30, 2016

    Smaller chips yield dramatically better when defects are high. Imagine a wafer that holds 100 large chips and there are 100 defects on the wafer. Some of the chips will have more than one defect so there will be a few chips that are good, perhaps 15-25 or so. Now imagine that you are putting 200 smaller chips on the same wafer with 100 defects. You'll get at least 100 good chips, perhaps 110-120. So unless you can sell the large chip for 6-8x the cost of the small chip, it's more profitable to start with the small chips when defect rates are high.
  • retrospooty - Tuesday, August 30, 2016

    The answer to almost any question like that is - they think it will be more profitable for them. They aren't just thinking about the latest fastest thing, they are thinking about production, orders, volume and stock levels.
  • quadrivial - Tuesday, August 30, 2016

    The answer is most likely ARM.

    Intel has zero competition in the high-end CPU front. People who can't wait will pay just as much for last-gen chips because that's all that's on the market. People who can wait won't mind a few months (and don't really have an option). In contrast, Intel lives in fear of Qualcomm, Samsung, or AMD announcing an ARM chip competitive with x86. Taking a more aggressive stance and coming to market as soon as possible is what Intel shareholders will want to see.
  • CaedenV - Tuesday, August 30, 2016

    True story. I can cry all I want about wanting a faster desktop chip, but the simple fact of the matter is that I will be forced to wait for Intel to release one because I am not tempted to move to AMD any time soon.
    But at the same time there are hundreds of schools debating between ARM and Intel chromebooks and chromeboxes, and whoever offers the lowest price is going to win the day. Releasing the smaller cheaper chips ASAP will prevent losing those sales to ARM.
  • doggface - Wednesday, August 31, 2016

    Only problem with your theory is these chips are priced at well above the cost of a Chromebook processor. We are talking $200-400 for these chips. Arm processors can be less than $50. Not even the same league.

    Intel has ceded the low end of the market to Arm with the discontinuation of Atom.
  • fanofanand - Wednesday, August 31, 2016

    Intel charges more for the chip than most chromebooks cost.
  • Meteor2 - Wednesday, August 31, 2016

    None of this stuff (KBL) competes with ARM, it's aimed squarely at Apple. Broxton is the ARM competitor.
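
As a rough check on the yield argument in bryanlarsen's comment, the Python sketch below estimates the expected number of defect-free chips under a deliberately simple model: each defect lands independently on a uniformly random chip. That model is our assumption, not something Intel or the commenter stated, and real defects tend to cluster, so the exact figures need not match the comment's rough estimates.

```python
def expected_good_chips(chips_per_wafer: int, defects: int) -> float:
    """Expected count of defect-free chips, assuming every defect
    independently hits one uniformly chosen chip (simplified model)."""
    # Probability that a given chip is missed by all defects.
    p_chip_clean = (1 - 1 / chips_per_wafer) ** defects
    return chips_per_wafer * p_chip_clean


large = expected_good_chips(chips_per_wafer=100, defects=100)
small = expected_good_chips(chips_per_wafer=200, defects=100)

# Price ratio at which revenue from large chips would match small chips.
breakeven_price_ratio = small / large

print(f"good large chips: {large:.1f}")
print(f"good small chips: {small:.1f}")
print(f"break-even large/small price ratio: {breakeven_price_ratio:.2f}")
```

Under this model the small chips still come out well ahead in good dies per wafer, so the direction of the argument holds even though the specific numbers depend heavily on how defects are assumed to be distributed.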
