The Kaby Lake-U/Y GPU - Media Capabilities
Written by Ganesh

While from a feature standpoint Kaby Lake is not a massive shift from Skylake, when it comes to GPU matters it nonetheless brings some improvements that are directly visible to the end-user. As with the CPU cores, Intel’s 14nm+ process will allow for higher GPU frequencies and overall better GPU performance, but arguably the more impressive change with Kaby Lake is the updated media capabilities. To be clear, Kaby Lake is still an Intel Gen9 GPU (the core GPU architecture has not changed), but Intel has revised the video processing blocks to add further functionality and improve their performance for Kaby Lake.

The media capabilities of the Skylake GPU were analyzed in great detail in our 2015 IDF coverage. The updates to Kaby Lake-U/Y should be analyzed while keeping those features in mind. The major feature change in the Kaby Lake-U/Y media engine is the availability of full hardware acceleration for encode and decode of 4K HEVC Main10 profile videos. This is in contrast to Skylake, which can support HEVC Main10 decode up to 4Kp30, but does so using a “hybrid” process that spreads out the workload over the CPU, the GPU’s media processors, and the GPU’s shader cores. As a result, not only can Kaby Lake process more HEVC profiles in fixed function hardware than before, but it can do so at a fraction of the power and with much better throughput.

Also along these lines, Kaby Lake has implemented full fixed function 8-bit encode and 8/10-bit decode support for Google’s VP9 codec. Skylake offered hybrid decode support for the codec, which is useful from a feature standpoint, but is a bit more problematic in real-world use, since hybrid decoding is not as power-efficient as decoding a codec implemented in fixed function hardware. Google has proven eager to serve up VP9 to its YouTube users, so Kaby Lake systems can now decode those streams much more efficiently. Meanwhile, on the encode side, brand-new to Kaby Lake is VP9 encoding support, to go with the aforementioned HEVC encode support.

An overview of the GPU engine in Kaby Lake-U/Y is presented in the slide below.

The new circuitry for hardware-accelerating HEVC Main10 and VP9 is part of the MFX block. The MFX block can now handle 8b/10b HEVC and VP9 decode and 10b HEVC / 8b VP9 encode. The QuickSync block also gets a few updates to improve quality further, and AVC encode performance also receives a boost.
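
As a quick reference, the MFX capabilities listed above can be written down as a small lookup table. This is only a sketch that restates the paragraph: the names `KABY_LAKE_MFX` and `mfx_status` are hypothetical, and any bit depth the text does not mention is reported as "not listed" rather than guessed.

```python
# Fixed-function capabilities of the Kaby Lake-U/Y MFX block, restating the
# article text. Bit depths the article does not mention are left out on
# purpose; their absence here is not a claim that they are unsupported.
KABY_LAKE_MFX = {
    "decode": {"HEVC": {8, 10}, "VP9": {8, 10}},
    "encode": {"HEVC": {10}, "VP9": {8}},
}


def mfx_status(operation: str, codec: str, bit_depth: int) -> str:
    """Return a readable status for one (operation, codec, bit depth) query."""
    depths = KABY_LAKE_MFX.get(operation, {}).get(codec, set())
    if bit_depth in depths:
        return "fixed-function"
    return "not listed in the article"


if __name__ == "__main__":
    queries = [("decode", "VP9", 10), ("encode", "VP9", 10), ("encode", "HEVC", 10)]
    for query in queries:
        print(query, "->", mfx_status(*query))
```

The asymmetry is the point of the table: VP9 decode covers 10-bit streams, while VP9 encode is listed only for 8-bit.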

The Video Quality Engine also receives some tweaks for HDR and Wide Color Gamut (Rec.2020) support. Skylake's VQE brought in RAW image processing support with a 16-bit image pipeline for selected filters. While Intel has not discussed the exact updates that enable Rec.2020 support, we suspect that more components in the VQE can now handle higher bit-widths. Intel pointed out that the HDR capabilities involve usage of both the VQE and the EUs in the GPU, so there is still scope for further hardware acceleration and lower power consumption in this particular use-case.

Intel claims that Kaby Lake-U/Y can handle up to eight 4Kp30 AVC and HEVC decodes simultaneously. HEVC decode support is rated at 4Kp60 up to 120 Mbps (especially helpful for premium content playback and Ultra HD Blu-ray). With Kaby Lake-U/Y's process improvements, even the 4.5W TDP Y-series processors can handle real-time HEVC 4Kp30 encode.
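
To put those figures in perspective, here is a back-of-the-envelope sketch. It assumes "4K" means 3840x2160 (UHD), which the claims above do not state explicitly, and it uses raw pixel rate only as a rough proxy for decoder load. The helper names are hypothetical.

```python
# Assumption: "4K" is taken to mean UHD, 3840x2160.
UHD_WIDTH, UHD_HEIGHT = 3840, 2160


def pixel_rate(fps: int, streams: int = 1) -> int:
    """Pixels per second across `streams` simultaneous UHD streams."""
    return UHD_WIDTH * UHD_HEIGHT * fps * streams


def stream_size_gb(bitrate_mbps: float, seconds: float) -> float:
    """Size in gigabytes (10^9 bytes) of a stream at a constant bitrate."""
    return bitrate_mbps * 1e6 * seconds / 8 / 1e9


if __name__ == "__main__":
    eight_30 = pixel_rate(fps=30, streams=8)
    one_60 = pixel_rate(fps=60)
    print(f"eight 4Kp30 streams: {eight_30:,} pixels/s")
    print(f"one 4Kp60 stream:    {one_60:,} pixels/s")
    print(f"ratio:               {eight_30 // one_60}x")
    print(f"one hour at 120 Mbps: {stream_size_gb(120, 3600):.0f} GB")
```

Under these assumptions, eight 4Kp30 streams carry four times the pixel rate of a single 4Kp60 stream, and an hour of video at the 120 Mbps ceiling comes to about 54 GB.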

On the subject of premium content, Intel rather explicitly mentioned in its presentation that the improved decode capabilities were, in part, for “premium content playback.” When we pushed Intel a bit on the matter, and specifically on 4K Netflix support, they didn’t have much to say beyond the fact that playing 4K Netflix requires certification. Based on what was said and what was not said (and what we know about the certification process), our educated guess is that the updates in Kaby Lake-U/Y address some new DRM requirements for 4K content, and that 4K Netflix should hopefully be good to go with the new platform. On that note, because of those DRM requirements, and because this is being pitched as a new feature for Kaby Lake, we suspect that when 4K Netflix streaming does come to the PC platform, Skylake owners are going to be out of luck.

It must be kept in mind that all the encode / decode aspects discussed above are for 4:2:0 streams. This is definitely acceptable for consumer applications, as even Blu-ray video streams (that have plenty of bandwidth at their disposal) are encoded in 4:2:0. However, if Intel wants to use the new media engine in professional broadcast and datacenter applications, 4:2:2, and, to a much lesser extent, even 4:4:4 support might become necessary. For the purpose of the Kaby Lake-U/Y consumer platforms being introduced today, this is not an issue at all.
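
The cost of moving beyond 4:2:0 is easy to quantify for uncompressed frames. The sketch below counts samples per 2x2 block of pixels for each chroma format; it assumes a 3840x2160 frame and tightly packed samples, which real buffer layouts generally do not use, so treat the byte counts as lower bounds. The function name is hypothetical.

```python
# Samples carried per 2x2 block of pixels: 4 luma samples plus the chroma
# samples (Cb and Cr combined) that remain after subsampling.
SAMPLES_PER_2X2 = {
    "4:2:0": 4 + 2,
    "4:2:2": 4 + 4,
    "4:4:4": 4 + 8,
}


def raw_frame_bytes(width: int, height: int, chroma: str, bit_depth: int) -> int:
    """Size of one uncompressed frame, assuming tightly packed samples."""
    blocks = (width // 2) * (height // 2)
    bits = blocks * SAMPLES_PER_2X2[chroma] * bit_depth
    return bits // 8


if __name__ == "__main__":
    base = raw_frame_bytes(3840, 2160, "4:2:0", 10)
    for chroma in SAMPLES_PER_2X2:
        size = raw_frame_bytes(3840, 2160, chroma, 10)
        print(f"{chroma} 10-bit: {size:,} bytes ({size / base:.2f}x of 4:2:0)")
```

At the same resolution and bit depth, 4:2:2 carries one third more raw data than 4:2:0, and 4:4:4 carries twice as much.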

Moving on, like the GPU core itself, Kaby Lake-U/Y's display pipeline is the same as that of Skylake. This means the iGPU can support up to three simultaneous displays.

One of the disappointing aspects of Skylake that has still not been addressed in Kaby Lake-U/Y is the absence of a native HDMI 2.0 port with HDCP 2.2 support. Intel has been advocating the addition of an LSPCon (Level Shifter - Protocol Converter) in the DP 1.2 path. This approach has been used in multiple motherboards and even SFF PCs like the Intel Skull Canyon NUC (NUC6i7KYK) and the ASRock Beebox-S series. Hopefully, future iterations of Kaby Lake (such as the desktop and high-performance mobile parts coming in January) will address this issue and reduce BOM cost for system vendors.

In summary, Kaby Lake-U/Y resolves one of the major complaints we had about Skylake's media engine: the absence of hardware-accelerated 4Kp60 HEVC Main10 decode. There are a few other improvements under the hood that enable a more satisfying multimedia experience for consumers. The software and content-delivery ecosystems have plenty of catching up to do when it comes to taking full advantage of Kaby Lake-U/Y's media capabilities.

43 Comments

  • Lolimaster - Tuesday, January 3, 2017 - link

    Considering the minimum cores you get per module is 4, I see AMD selling months later a 3c/6t cpu for $99.

    They will make a tweak for the raven ridge APU since the core count for those is 4c max.
  • jjj - Friday, January 6, 2017 - link

    Every segment they don't cover (and they don't have Zen APUs yet) is business left on the table - the budget segment is big enough and in regions they care about.

    Maybe they should go to 49$ with quads and disable HT, some cache but it is likely that if they don't do that, most would make an effort to get the 99$ quad. Just hope they don't get too greedy and start way higher, Intel can make quads without a GPU too, won't take too long and AMD needs to exploit this window of opportunity and gain, not just revenue, but hearts and minds.
  • name99 - Tuesday, January 3, 2017 - link

    "We still have not received an official word if Intel is working closely with Apple to bring the feature to macOS, or even if it will be promoted if it ever makes the transition"

    Could some more-or-less unexpected interaction between Speed Shift 2 and the rest of MacOS be the reason for the apparently random dramatic swings in the battery lifetime of the new MacBook Pros? We hardly know enough to point fingers at either Apple or Intel, but I could certainly imagine that each side has a certain mental model of what the other side is/"should" be doing, and the mismatch between those models means that the CPU is randomly being told to run at maximum speed when the OS actually wants it to dramatically slow down.
    I agree that this sounds kinda dumb on the surface, but I could imagine that there are enough layers between UI/framework code, the power driver, the core OS, and EFI, that something gets confused along the way including, perhaps, exposing a bug (again either on the Apple side or the Intel side) that just didn't get triggered (or at least not very often) on either previous x86 CPUs or on Linux/Windows.
  • rodmunch69 - Tuesday, January 3, 2017 - link

    My 5 year old 3930k can still basically keep up with Intel's latest and greatest with stock voltage OC. Hum... I used to buy new stuff every year, or every two years at most, because there was normally a good gain to be had. It's legit been 5 years now and my PC with a little work, in multi core tests, is just as fast as anything out there. That's pitiful on Intel's behalf. They've gotten fat and lazy and the consumer is paying for it. Trump needs to tell AMD to put the A back into their chips and actually put out some products at the high-end that actually pushes Intel to be great again.
  • Laststop311 - Wednesday, January 4, 2017 - link

    Is it really worth saving 60 dollars to get an unlocked i3 vs the unlocked i5? I really can't see any situation where 60 dollars is the difference between being able to afford a new pc or not. With DX12 it HIGHLY benefits from having 4 cores (really 6 cores is optimum with 8 only slightly improving). Being stuck with 2 cores in this day is severely crippling your lifespan of the pc. You will waste GPU power and be constrained by the 2 cores all in the name to save 60 dollars. Nah it's not worth it.

    Kaby lake in general is not worth it. Everyone with quad core sandy bridge and above is going to see very minimal gains from a quad core cpu. You really need to go to 6 cores to get any real performance increase and you also need to be playing in dx 12 mode. Your best bet is to wait for the 2019 tock of 10nm coffee lake. Intel will be moving to pci-e 4.0 which doubles the bandwidth so an 8x pci-e 4.0 is the same as a 16x pci-e 3.0. Since gpu's only lose a few percentage points of performance on 8x pci-e 3.0, 8x pci-e 4.0 will give them all the breathing room they need. This leaves you 16x lanes of the 24 lanes to use for m2 storage devices or capture cards without having to use the higher latency PCH pci-e lanes. Or with multi GPU you still have 8x cpu pci-e lanes and you only need 2x pci-e 4.0 lanes to give you 4GB/s (32gbps) so you can fit dual gpu's and 4 pci-e storage devices all connected to the cpu directly and both gpu's will get 16GB/s (128gbps) bandwidth. This gives you massive future proofing. With intel optane maturing you can go single gpu at 16x pci-e 4.0 lanes 32GB/s bandwidth (256gbps) stick an optane drive on 4x lanes giving you a massive 8GB/s (64gbps) and 2 m2 nvme ssd's on 2x lanes each 4GB/s (32gbps) each, with all devices connected directly to CPU for the lowest latency leaving all the PCH lanes free for external ports like TB3 USB 3.1 gen 2 etc.

    By waiting till 2019 you get a real upgrade instead of a sidegrade. pci-e 4.0 will unlock the true potential of Intel optane as i expect by then the optane drives will be maxing out the 4x pci-e 3.0 lanes at 4GB/s and pci-e 4.0 will allow optane to really shine and most likely hit 7GB/s or more. With that kinda storage speed you can transfer an entire blu ray disc image in about 7 seconds.

    Now by all means if you are still on the Q series quad cores then kaby lake is a compelling upgrade and isn't a total waste of money to upgrade. But even in that circumstance I would say try to stick it out another year so you can have a 6 core coffee lake as 6 cores is incredibly useful in dx12.
  • Lolimaster - Wednesday, January 4, 2017 - link

    You mean upgrade to the 8c/16t Ryzen or wait 2018-2019 for the 7nm Zen+?
  • gopher1369 - Wednesday, January 4, 2017 - link

    The only thing that occurs to me is game emulators. Dolphin and PCSX2 require high clock speeds and high IPC, not more cores. It's quite niche, but if you're building an emulator box then the unlocked Anniversary Edition Haswell Pentium is currently the go-to processor, the new i3 should be even better.
  • Laststop311 - Wednesday, January 4, 2017 - link

    What applications use AVX instructions? I wonder how much it will hurt performance for some applications by decreasing AVX to 4.0ghz so you can hit 5.0ghz on everything else. The highest overclock i've seen talked about is 5.1ghz on the i7-7700k using the corsair 115i
  • johnp_ - Wednesday, January 4, 2017 - link

    (3) Embedded DisplayPort* (eDP) 1.4 and PSR2 under evaluation

    I seriously didn't expect that! This means that they actually changed the display pipeline slightly :)
    Now, hopefully laptop vendors will make use of PSR2 to further improve battery life.

    On a side-note: Does anyone know how to overclock the 7820HK when there's no mobile chipset that supports overclocking? Will laptop vendors have to include the Z270 desktop chipset on their platform?
  • keeepcool - Friday, January 6, 2017 - link

    You open intel XTU and press on the arrows till it BSOD's.
    Laptop chipsets are "different" in a lot of senses.
