Display Matters: New Display Controller, HDR, & HEVC

Outside of the core Pascal architecture, Pascal/GP104 also introduces a new display controller and a new video encode/decode block to the NVIDIA ecosystem. As a result, Pascal offers a number of significant display improvements, particularly for forthcoming high dynamic range (HDR) displays.

Starting with the display controller then, Pascal’s display controller has been updated to support the latest DisplayPort and HDMI standards, with a specific eye towards HDR. On the DisplayPort side, DisplayPort 1.3 and 1.4 support has been added. As our regular readers might recall, DisplayPort 1.3 adds support for DisplayPort’s new High Bit Rate 3 (HBR3) signaling mode, which increases the per-lane signaling rate from 5.4Gbps to 8.1Gbps, a 50% increase in bandwidth over DisplayPort 1.2’s HBR2. For a full 4 lane DP connection, this means the total connection bandwidth has been increased from 21.6Gbps to 32.4Gbps, for a final data rate of 25.9Gbps after taking encoding overhead into account.

DisplayPort 1.4 in turn builds off of that, adding some HDR-specific functionality. DP 1.4 adds support for HDR static metadata, specifically the CTA 861.3 standard already used in other products and standards such as HDMI 2.0a. HDR static metadata is specifically focused on recorded media, such as Ultra HD Blu-ray, which uses static metadata to pass along the necessary HDR information to displays. This also improves DP/HDMI interoperability, as it allows DP-to-HDMI adapters to pass along that metadata. Also new to 1.4 is support for VESA Display Stream Compression – something I’m not expecting to see on desktop displays right now – and, for the rare DisplayPort-transported audio setup, support for a much larger number of audio channels, now up to 32.

Compared to DisplayPort 1.2, the combination of new features and significantly greater bandwidth is geared towards driving better displays; mixing and matching higher resolutions, higher refresh rates, and HDR. 5K displays with a single cable are possible under DP 1.3/1.4, as are 4Kp120 displays (for high refresh gaming monitors), and of course, HDR displays that need to use 10-bit (or better) color to better support the larger dynamic range.

Display Bandwidth Requirements (RGB/4:4:4 Chroma)
Resolution & Bit Depth            Minimum DisplayPort Version
1920x1080@60Hz, 8bpc SDR          1.1
3840x2160@60Hz, 8bpc SDR          1.2
3840x2160@60Hz, 10bpc HDR         1.3
5120x2880@60Hz, 8bpc SDR          1.3
5120x2880@60Hz, 10bpc HDR         1.4 w/DSC
7680x4320@60Hz, 8bpc SDR          1.4 w/DSC
7680x4320@60Hz, 10bpc HDR         1.4 w/DSC
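
To put some quick arithmetic behind that table, the sketch below (a back-of-the-envelope calculation of my own, not anything sourced from NVIDIA or VESA) compares the payload capacity of a 4 lane DisplayPort link against the raw pixel data rate of a 4Kp60 10bpc mode. Real links also have to carry blanking intervals on top of the raw pixel data, so treat the mode figure as a lower bound.

```cpp
#include <cstdio>

int main() {
    // Payload capacity of a 4-lane DisplayPort link after 8b/10b encoding overhead.
    const double lanes = 4.0;
    const double hbr2_per_lane_gbps = 5.4;   // DisplayPort 1.2
    const double hbr3_per_lane_gbps = 8.1;   // DisplayPort 1.3/1.4
    const double encoding_efficiency = 0.8;  // 8b/10b line coding

    double hbr2_payload = lanes * hbr2_per_lane_gbps * encoding_efficiency; // ~17.28 Gbps
    double hbr3_payload = lanes * hbr3_per_lane_gbps * encoding_efficiency; // ~25.92 Gbps

    // Raw pixel data rate for 3840x2160 @ 60Hz, 10bpc RGB (30 bits per pixel).
    // Actual link usage is higher once blanking intervals are added.
    double uhd_hdr_gbps = 3840.0 * 2160.0 * 60.0 * 30.0 / 1e9; // ~14.93 Gbps

    std::printf("HBR2 x4 payload: %.2f Gbps\n", hbr2_payload);
    std::printf("HBR3 x4 payload: %.2f Gbps\n", hbr3_payload);
    std::printf("4Kp60 10bpc raw pixel rate: %.2f Gbps\n", uhd_hdr_gbps);
    return 0;
}
```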

I should note that officially, NVIDIA’s cards are only DisplayPort 1.3 and 1.4 “ready” as opposed to “certified.” While NVIDIA hasn’t discussed the distinction in any more depth, as best as I can tell no one is 1.3/1.4 certified yet. In cases such as these in the past, the typical holdup has been that the certification test program isn’t finished, particularly because there’s little-to-no hardware to test against. I suspect the case is much the same here, and certification will come once the VESA is ready to hand it out.

Moving on to HDMI, the changes here aren’t nearly as drastic as on the DisplayPort side. In fact, technically Pascal’s HDMI capabilities are the same as Maxwell’s. However since we’re already on the subject of HDR support, this is a perfect time to clarify just what’s been going on in the HDMI ecosystem.

Since Maxwell 2 launched with HDMI 2.0 support, the HDMI consortium has made two minor additions to the standard: 2.0a and 2.0b. As you might expect given the focus on HDR, the purpose of these additions is to bring HDR support to the HDMI ecosystem, using the same CTA 861.3 static metadata that DisplayPort 1.4 also adopted. There are no other changes to HDMI (e.g. bandwidth), so this is purely about supporting HDR.

GPUs being flexible devices whose features can often be extended through driver updates, NVIDIA was able to add HDMI 2.0a/b support to Maxwell 2 after the fact. This means that if you’re just catching up from the launch of the GTX 980, Maxwell 2 actually has more functionality than when it launched. And all of this functionality has been carried over to Pascal, where thanks to some other feature additions it’s going to be much more useful.

Overall when it comes to HDR on NVIDIA’s display controller, not unlike AMD’s Polaris architecture, this is a case of display technology catching up to rendering technology. NVIDIA’s GPUs have supported HDR rendering for some time now – Maxwell 2 can do full HDR and wide gamut processing – however until now the display and display connectivity standards have not caught up. Maxwell’s big limitations were spec support and bandwidth. Static HDR metadata, necessary to support HDR movies, was not supported over DisplayPort 1.2 on Maxwell 2. And without a newer DisplayPort standard, Maxwell lacked the bandwidth to support deep color (10bit+) HDR in conjunction with high resolutions.

Pascal in turn addresses these problems. With DisplayPort 1.3 and 1.4 it gains both the spec support and the bandwidth to do HDR on DisplayPort displays, and to do so with more flexibility than HDMI provides.

Of course, how you get HDR to the display controller is an important subject in and of itself. On the rendering side of matters, HDR support is still a new concept. So new, in fact, that even Windows 10 doesn’t fully support it. The current versions of Windows and Direct3D support the wide range of values required for HDR (i.e. values well beyond the traditional 0.0f to 1.0f range), but they lack a means to expose HDR display information to game engines. As a result the current state of HDR is a bit rocky.

For the moment, developers need to do an end-run around Windows to support HDR rendering today. This means using exclusive fullscreen mode to bypass the Windows desktop compositor, which is not HDR-ready, and combining that with the use of new NVAPI functions to query the video card about HDR monitor capabilities and tell it that you’re intentionally feeding it HDR-processed data.
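
As a rough illustration of that vendor-specific path, the sketch below uses NVIDIA’s public NVAPI to query the monitor’s HDR capabilities and then declare HDR10 output. The NvAPI_Disp_GetHdrCapabilities and NvAPI_Disp_HdrColorControl entry points are part of the public NVAPI SDK, but treat the exact struct fields and enum values shown here as illustrative assumptions to be checked against your SDK headers rather than a definitive recipe.

```cpp
// Sketch: enabling HDR10 output through NVAPI (struct/enum names per the public
// NVAPI SDK of this era; verify against your SDK version before relying on them).
#include <cstdio>
#include "nvapi.h"

int main() {
    if (NvAPI_Initialize() != NVAPI_OK)
        return 1;

    NvU32 displayId = 0;
    NvAPI_DISP_GetGDIPrimaryDisplayId(&displayId);

    // Ask the driver what the attached monitor claims to support.
    NV_HDR_CAPABILITIES caps = {};
    caps.version = NV_HDR_CAPABILITIES_VER;
    if (NvAPI_Disp_GetHdrCapabilities(displayId, &caps) == NVAPI_OK &&
        caps.isST2084EotfSupported) {
        // Tell the driver we are deliberately sending it HDR10 (SMPTE ST.2084) data.
        NV_HDR_COLOR_DATA hdr = {};
        hdr.version = NV_HDR_COLOR_DATA_VER;
        hdr.cmd = NV_HDR_CMD_SET;
        hdr.hdrMode = NV_HDR_MODE_UHDA;  // HDR10 output
        NvAPI_Disp_HdrColorControl(displayId, &hdr);
        std::printf("HDR10 output requested on display 0x%x\n", displayId);
    }

    NvAPI_Unload();
    return 0;
}
```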

All of this means that for now, developers looking to support HDR have to put in additional vendor-specific hooks, one set for NVIDIA and another for AMD. Microsoft will be fixing this in future versions of Windows – the Windows 10 Anniversary Update is expected to bring top-to-bottom HDR support – which along with providing a generic solution also means that HDR can be used outside of fullscreen exclusive mode. But even then, I expect the vendor APIs to stick around for a while, as these work to enable HDR on Windows 7 and 8 as well. Anniversary Update adoption will itself take some time, and developers will need to decide if they want to support HDR for customers on older OSes.
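
For comparison, once the OS handles HDR generically, the DXGI-level path looks something like the following minimal sketch: render into a 10bit or FP16 swap chain, declare an HDR10 (ST.2084/BT.2020) color space, and attach the same style of static metadata. This assumes an IDXGISwapChain4 has already been created elsewhere, and the metadata values are purely illustrative.

```cpp
// Sketch: declaring HDR10 output through DXGI once the OS exposes it generically.
// Assumes 'swapChain' was created with an HDR-capable format such as
// DXGI_FORMAT_R10G10B10A2_UNORM or DXGI_FORMAT_R16G16B16A16_FLOAT.
#include <dxgi1_5.h>

void EnableHdr10(IDXGISwapChain4* swapChain) {
    // BT.2020 primaries with the SMPTE ST.2084 (PQ) transfer function, i.e. HDR10.
    swapChain->SetColorSpace1(DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020);

    // Static mastering metadata (CTA 861.3 style). Chromaticity values are in
    // units of 0.00002; luminance values here are illustrative, so confirm the
    // unit conventions against the DXGI documentation for your SDK.
    DXGI_HDR_METADATA_HDR10 meta = {};
    meta.RedPrimary[0]   = 34000; meta.RedPrimary[1]   = 16000; // BT.2020 red
    meta.GreenPrimary[0] = 13250; meta.GreenPrimary[1] = 34500; // BT.2020 green
    meta.BluePrimary[0]  = 7500;  meta.BluePrimary[1]  = 3000;  // BT.2020 blue
    meta.WhitePoint[0]   = 15635; meta.WhitePoint[1]   = 16450; // D65
    meta.MaxMasteringLuminance = 1000;  // nominal 1000 nit mastering display
    meta.MinMasteringLuminance = 100;   // 0.01 nits (0.0001 nit units)
    meta.MaxContentLightLevel = 1000;
    meta.MaxFrameAverageLightLevel = 400;
    swapChain->SetHDRMetaData(DXGI_HDR_METADATA_TYPE_HDR10, sizeof(meta), &meta);
}
```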

Meanwhile, with all of this chat of 10bit color support and HDR, I reached out to NVIDIA to see if they’d be changing their policies for professional software support. In brief, NVIDIA has traditionally used 10bit+ color support under OpenGL as a feature differentiator between GeForce and Quadro. GeForce cards can use 10bit color with Direct3D, but only Quadro cards supported 10bit color under OpenGL, and professional applications like Photoshop are usually OpenGL based.

For Pascal, NVIDIA is opening things up a bit more, but they are still going to keep the most important aspect of that feature differentiation in place. 10bit color is being enabled for fullscreen exclusive OpenGL applications – so your typical OpenGL game would be able to tap into deeper colors and HDR – however 10bit OpenGL windowed support is still limited to the Quadro cards. So professional users will still need Quadro cards for 10bit support in their common applications.
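
For what it’s worth, actually asking for a 10bpc framebuffer from OpenGL is simple enough; the sketch below does so for a fullscreen window via GLFW’s framebuffer hints. Whether the driver actually grants a 10bit format rather than quietly falling back to 8bpc is precisely the policy question above, so the granted bit depth should always be verified.

```cpp
// Sketch: requesting a 10bpc fullscreen OpenGL framebuffer with GLFW.
// Whether a 10-bit format is actually granted depends on the driver policy
// described above; always verify the bit depth you were given.
#include <cstdio>
#include <GLFW/glfw3.h>

int main() {
    if (!glfwInit())
        return 1;

    glfwWindowHint(GLFW_RED_BITS, 10);
    glfwWindowHint(GLFW_GREEN_BITS, 10);
    glfwWindowHint(GLFW_BLUE_BITS, 10);
    glfwWindowHint(GLFW_ALPHA_BITS, 2);

    // Passing a monitor makes the window fullscreen on that monitor.
    GLFWmonitor* monitor = glfwGetPrimaryMonitor();
    const GLFWvidmode* mode = glfwGetVideoMode(monitor);
    GLFWwindow* window = glfwCreateWindow(mode->width, mode->height,
                                          "10bpc test", monitor, nullptr);
    if (!window) { glfwTerminate(); return 1; }

    glfwMakeContextCurrent(window);

    GLint redBits = 0;
    glGetIntegerv(GL_RED_BITS, &redBits);  // confirm what the driver actually gave us
    std::printf("Red bits granted: %d\n", redBits);

    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}
```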

Moving on, more straightforward – and much more Pascal-oriented – is video encoding and decoding. Relative to Maxwell 2 (specifically, GM204), both NVIDIA’s video encode and decode blocks have seen significant updates.

As a bit of background information here, on the decode side, GM204 only supported complete fixed function video decoding for standards up to H.264. For HEVC/H.265, limited hardware acceleration was provided via what NVIDIA called hybrid decoding, which used a combination of fixed function hardware, shaders, and the CPU to decode HEVC video with a lower CPU load than pure CPU decoding. However hybrid decoding was still not especially power efficient, and depending on factors such as bitrate, it tended to only be good for up to 4Kp30 video.

In between GM204 and Pascal though, NVIDIA introduced GM206. And in something of a tradition from NVIDIA, they used their low-end GPU to introduce a new video decoder (Feature Set F). GM206’s decoder was a major upgrade from GM204’s. Importantly, it added full HEVC fixed function decoding. And not just for standard 8bit Main Profile HEVC, but 10bit Main10 Profile support as well. Besides the immediate benefits of full fixed function HEVC decoding – GM206 and newer can decode 4Kp120 HEVC without breaking a sweat – greater bit depths are also important to HDR video. Though not technically required, the additional bit depth is critical to encoding HDR video without artifacting. So the inclusion of a similarly powerful video decoder on Pascal is an important part of making it able to display HDR video.

GM206’s video decode block also introduced hardware decode support for the VP9 video codec. A Google-backed royalty free codec, VP9 is sometimes used as an alternative to H.264; specifically, Google prefers to use it for YouTube when possible. Though not nearly as widespread as H.264, support for VP9 means that systems served VP9 video will be able to decode it efficiently in hardware rather than falling back to less efficient software decoding.

Pascal, in turn, inherits GM206’s video decode block and makes some upgrades of its own. Relative to GM206, Pascal's decode block (Feature Set H) adds support for the HEVC Main12 profile (12bit color) while also extending the maximum video resolution from 4K to 8K. This sets the stage for the current "endgame" scenario of full Rec. 2020 compatibility, which calls for still wider dynamic range courtesy of 12bit color, and of course 8K displays. Compared to GM204 and GM200 then, this gives Pascal a significant leg up in video decode capabilities and overall performance.

Video Decode Performance (1080p)
Video Card    H.264      HEVC
GTX 1080      197fps     211fps
GTX 980       139fps     186fps (Hybrid)

Meanwhile in terms of total throughput, running a quick 1080p benchmark check with DXVAChecker finds that the new video decoder is much faster than GM204’s. H.264 throughput is 40% higher, and HEVC throughput, while a bit less apples-to-apples due to hybrid decoding on GM204, is 13% higher (with roughly half the GPU power consumption at the same time). At 4K things are a bit more lopsided; GM204 can’t sustain H.264 4Kp60, and 4Kp60 HEVC is right out.
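
For anyone who wants to check their own hardware, a DXVAChecker-style capability probe is easy to do through the D3D11 video interfaces: ask the driver whether it exposes the HEVC Main and Main10 decoder profiles. The fragment below is a minimal sketch along those lines, assuming a D3D11 device has already been created.

```cpp
// Sketch: probing for fixed function HEVC decode support via ID3D11VideoDevice,
// similar in spirit to what DXVAChecker reports. Assumes 'device' is a valid
// ID3D11Device created elsewhere. Link with dxguid.lib for the profile GUIDs.
#include <cstdio>
#include <d3d11.h>

void CheckHevcDecode(ID3D11Device* device) {
    ID3D11VideoDevice* videoDevice = nullptr;
    if (FAILED(device->QueryInterface(__uuidof(ID3D11VideoDevice),
                                      reinterpret_cast<void**>(&videoDevice))))
        return;

    BOOL main8 = FALSE, main10 = FALSE;
    // 8bpc HEVC (Main Profile) with NV12 surfaces.
    videoDevice->CheckVideoDecoderFormat(&D3D11_DECODER_PROFILE_HEVC_VLD_MAIN,
                                         DXGI_FORMAT_NV12, &main8);
    // 10bpc HEVC (Main10 Profile) with P010 surfaces.
    videoDevice->CheckVideoDecoderFormat(&D3D11_DECODER_PROFILE_HEVC_VLD_MAIN10,
                                         DXGI_FORMAT_P010, &main10);

    std::printf("HEVC Main decode:   %s\n", main8 ? "yes" : "no");
    std::printf("HEVC Main10 decode: %s\n", main10 ? "yes" : "no");
    videoDevice->Release();
}
```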

All of that said, there is one new feature of the video decode block and associated display controller on Pascal that’s not present on GM206: Microsoft PlayReady 3.0 DRM support. The latest version of Microsoft’s DRM standard goes hand in hand with the other DRM requirements (e.g. HDCP 2.2) that content owners/distributors have imposed for 4K video, which is why Netflix 4K support has until now been limited to closed platforms such as smart TVs and set top boxes. Pascal is, in turn, the first GPU to support all of Netflix’s DRM requirements, and it will be able to receive 4K video once Netflix starts serving it up to PCs.

Flipping over to the video encode side of matters, unique to Pascal is a new video encode block (NVENC). NVENC on Maxwell 2 was one of the first hardware HEVC encoders we saw, supporting hardware HEVC encoding before NVIDIA actually supported full hardware HEVC decoding. That said, Maxwell 2’s HEVC encoder was a bit primitive, lacking support for bi-directional frames (B-frames) and only supporting Main Profile (8bit) encoding.

Pascal in turn boosts NVIDIA’s HEVC encoding capabilities significantly with the newest NVENC block. This latest encoder essentially grants NVENC full HEVC support, resolving Maxwell 2’s limitations while also significantly boosting encoding performance. Pascal can now encode Main10 Profile (10bit) video, and total encode throughput is rated by NVIDIA at two simultaneous 4Kp60 streams. The latter in particular, I suspect, is going to be especially important to NVIDIA once they start building Pascal Tesla cards for virtualization, such as a successor to the Tesla M60.
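
In practice, most users will touch the new encoder through a wrapper such as FFmpeg’s hevc_nvenc. The fragment below is a hedged sketch of opening a Main10 (10bit) NVENC session through libavcodec; it assumes an FFmpeg build with NVENC support, and the "main10" profile string and P010 input format follow FFmpeg’s conventions rather than anything NVIDIA documents directly.

```cpp
// Sketch: opening a 10bit (Main10) HEVC NVENC encoder via FFmpeg's libavcodec.
// Requires an FFmpeg build with NVENC enabled; error handling kept minimal.
#include <cstdio>
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/opt.h>
#include <libavutil/pixfmt.h>
}

int main() {
    avcodec_register_all();  // needed on FFmpeg 3.x-era builds

    const AVCodec* codec = avcodec_find_encoder_by_name("hevc_nvenc");
    if (!codec) { std::printf("hevc_nvenc not available\n"); return 1; }

    AVCodecContext* ctx = avcodec_alloc_context3(codec);
    ctx->width = 3840;
    ctx->height = 2160;
    ctx->time_base = AVRational{1, 60};
    ctx->framerate = AVRational{60, 1};
    ctx->pix_fmt = AV_PIX_FMT_P010LE;   // 10bit 4:2:0 input for Main10
    ctx->bit_rate = 20000000;           // 20 Mbps, illustrative

    av_opt_set(ctx->priv_data, "profile", "main10", 0);

    if (avcodec_open2(ctx, codec, nullptr) < 0) {
        std::printf("Failed to open Main10 NVENC session\n");
        return 1;
    }
    std::printf("Main10 NVENC encoder opened\n");
    avcodec_free_context(&ctx);
    return 0;
}
```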

As for Main10 Profile encoding support, this can benefit overall quality, but the most visible purpose is to move NVIDIA’s video encode capabilities in lockstep with their decode capabilities, extending HDR support to video encoding as well. By and large HDR encoding is one of those changes that will prove more important farther down the line – think Twitch with HDR – but in the short term NVIDIA has already put the new encoder to good use.

Recently added via an update to the SHIELD Android TV console, NVIDIA now supports a full HDR GameStream path between Pascal and the SATV. This leverages Pascal’s HEVC Main10 encode capabilities and the SATV’s HEVC Main10 decode capabilities, allowing Pascal to encode in HDR and the SATV to receive it. This in turn is the first game streaming setup to support HDR, giving NVIDIA a technological advantage over other game streaming platforms such as Steam’s In-Home Streaming. Though with that said, it should be noted that it’s still early in the lifecycle for HDR games, so games that can take advantage of a GameStream HDR setup are still few and far between until more developers add HDR support.
