Display Matters: New Display Controller, HDR, & HEVC

Outside of the core Pascal architecture, Pascal/GP104 also introduces a new display controller and a new video encode/decode block to the NVIDIA ecosystem. As a result, Pascal offers a number of significant display improvements, particularly for forthcoming high dynamic range (HDR) displays.

Starting with the display controller, Pascal has been updated to support the latest DisplayPort and HDMI standards, with a specific eye towards HDR. On the DisplayPort side, DisplayPort 1.3 and 1.4 support has been added. As our regular readers might recall, DisplayPort 1.3 adds support for DisplayPort’s new High Bit Rate 3 (HBR3) signaling mode, which increases the per-lane bandwidth rate from 5.4Gbps to 8.1Gbps, a 50% increase in bandwidth over DisplayPort 1.2’s HBR2. For a full 4 lane DP connection, this means the total connection bandwidth has been increased from 21.6Gbps to 32.4Gbps, for a final data rate of 25.92Gbps after taking encoding overhead into account.
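
As a quick sanity check on those numbers (HBR3, like HBR2, retains 8b/10b encoding, so 80% of the raw bit rate is available for data):

```cpp
#include <cstdio>

int main() {
    // DisplayPort 1.3/1.4: 4 lanes of HBR3 at 8.1Gbps per lane, with
    // 8b/10b encoding carrying 8 data bits per 10 line bits.
    const double lanes = 4.0, hbr3_gbps = 8.1, encoding = 8.0 / 10.0;
    const double raw  = lanes * hbr3_gbps;  // 32.4 Gbps raw
    const double data = raw * encoding;     // 25.92 Gbps of actual data
    std::printf("raw: %.1f Gbps, data rate: %.2f Gbps\n", raw, data);
}
```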

DisplayPort 1.4 in turn builds off of that, adding some HDR-specific functionality. DP 1.4 adds support for HDR static metadata, specifically the CTA 861.3 standard already used in other products and standards such as HDMI 2.0a. HDR static metadata is focused on recorded media, such as Ultra HD Blu-ray, which uses static metadata to pass along the necessary HDR information to displays. This also improves DP/HDMI interoperability, as it allows DP-to-HDMI adapters to pass along that metadata. Also new to 1.4 is support for VESA Display Stream Compression – something I’m not expecting to see on desktop displays right now – and, for the rare DisplayPort-transported audio setup, support for a much larger number of audio channels: up to 32.

Compared to DisplayPort 1.2, the combination of new features and significantly greater bandwidth is geared towards driving better displays: mixing and matching higher resolutions, higher refresh rates, and HDR. 5K displays with a single cable are possible under DP 1.3/1.4, as are 4Kp120 displays (for high refresh rate gaming monitors), and of course, HDR displays that need to use 10-bit (or better) color to better support the larger dynamic range.

Display Bandwidth Requirements (RGB/4:4:4 Chroma)
Resolution                      Minimum DisplayPort Version
1920x1080@60Hz, 8bpc SDR        1.1
3840x2160@60Hz, 8bpc SDR        1.2
3840x2160@60Hz, 10bpc HDR       1.3
5120x2880@60Hz, 8bpc SDR        1.3
5120x2880@60Hz, 10bpc HDR       1.4 w/DSC
7680x4320@60Hz, 8bpc SDR        1.4 w/DSC
7680x4320@60Hz, 10bpc HDR       1.4 w/DSC
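
The arithmetic behind the table is straightforward. A quick sketch that approximates it – note that this counts active pixels only, while real display modes also spend bandwidth on blanking intervals, which is what pushes 4Kp60 10bpc past DisplayPort 1.2:

```cpp
#include <cstdio>

// Approximate uncompressed bandwidth for an RGB/4:4:4 display mode,
// counting active pixels only (blanking overhead is ignored).
// For reference: DP 1.2 carries ~17.28Gbps of data, DP 1.3/1.4 ~25.92Gbps.
double mode_gbps(double w, double h, double hz, double bits_per_component) {
    return w * h * hz * bits_per_component * 3.0 / 1e9;
}

int main() {
    std::printf("4Kp60  8bpc: %.1f Gbps\n", mode_gbps(3840, 2160, 60, 8));   // ~11.9, fits DP 1.2
    std::printf("4Kp60 10bpc: %.1f Gbps\n", mode_gbps(3840, 2160, 60, 10));  // ~14.9; blanking pushes it past DP 1.2
    std::printf("5Kp60 10bpc: %.1f Gbps\n", mode_gbps(5120, 2880, 60, 10));  // ~26.5, beyond even DP 1.3/1.4, hence DSC
    std::printf("8Kp60 10bpc: %.1f Gbps\n", mode_gbps(7680, 4320, 60, 10));  // ~59.7, DSC required
}
```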

I should note that officially, NVIDIA’s cards are only DisplayPort 1.3 and 1.4 “ready” as opposed to “certified.” While NVIDIA hasn’t discussed the distinction in any more depth, as best as I can tell it appears that no one is 1.3/1.4 certified yet. In cases such as these in the past, the typical holdup has been that the test isn’t finished, particularly because there’s little-to-no hardware to test against. I suspect the case is much the same here, and certification will come once the VESA is ready to hand it out.

Moving on, things on the HDMI side aren’t nearly as drastic as on DisplayPort. In fact, technically Pascal’s HDMI capabilities are the same as Maxwell 2’s. However since we’re already on the subject of HDR support, this is a perfect time to clarify just what’s been going on in the HDMI ecosystem.

Since Maxwell 2 launched with HDMI 2.0 support, the HDMI consortium has made two minor additions to the standard: 2.0a and 2.0b. As you might expect given the focus on HDR, the combination of 2.0a and 2.0b introduces support for HDR into the HDMI ecosystem, using the same CTA 861.3 static metadata standard that DisplayPort 1.4 also adopted. There are no other changes to HDMI (e.g. bandwidth), so this is purely about supporting HDR.

GPUs being flexible processors with easily distributed driver updates, NVIDIA was able to add HDMI 2.0a/b support to Maxwell 2 after the fact. This means that if you’re just catching up from the launch of the GTX 980, Maxwell 2 actually has more functionality now than when it launched. And all of this functionality has been carried over to Pascal, where thanks to some other feature additions it’s going to be much more useful.

Overall, when it comes to HDR on NVIDIA’s display controller, not unlike AMD’s Polaris architecture, this is a case of display technology catching up to rendering technology. NVIDIA’s GPUs have supported HDR rendering for some time – Maxwell 2 can do full HDR and wide gamut processing – however until now the display and display connectivity standards had not caught up. Maxwell 2’s big limitations were spec support and bandwidth: static HDR metadata, necessary to support HDR movies, was not supported over DisplayPort 1.2, and lacking a newer DisplayPort standard, Maxwell 2 also lacked the bandwidth to drive deep color (10bit+) HDR in conjunction with high resolutions.

Pascal in turn addresses these problems. With DisplayPort 1.3 and 1.4 it gains both the spec support and the bandwidth to do HDR on DisplayPort displays, and to do so with more flexibility than HDMI provides.

Of course, how you get HDR to the display controller is an important subject in and of itself. On the rendering side of matters, HDR support is still a new concept – so new, in fact, that even Windows 10 doesn’t fully support it. The current versions of Windows and Direct3D support the wide range of values required for HDR (i.e. more than -1.0f to +1.0f), but they lack a means to expose HDR display information to game engines. As a result the current state of HDR is a bit rocky.

For the moment, developers need to do an end-run around Windows to support HDR rendering. This means using exclusive fullscreen mode to bypass the Windows desktop compositor, which is not HDR-ready, and combining that with new NVAPI functions to query the video card about the monitor’s HDR capabilities and to tell the driver that it’s intentionally being fed HDR-processed data.
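
To give an idea of what that end-run looks like, here is a rough sketch of the NVAPI side. To be clear, this is a paraphrase from NVIDIA’s public nvapi.h rather than production code – the struct field and enum names (isST2084EotfSupported, NV_HDR_MODE_UHDA, and so on) should be checked against the SDK you’re building with, and fetching the NvU32 display ID is omitted:

```cpp
#include "nvapi.h"  // NVIDIA NVAPI SDK

// Rough sketch: query a display's HDR capabilities, then tell the driver
// we're intentionally feeding it HDR-processed frames. Field and enum
// names approximate the public nvapi.h and may vary by SDK version.
bool EnableHdrOutput(NvU32 displayId) {
    if (NvAPI_Initialize() != NVAPI_OK)
        return false;

    NV_HDR_CAPABILITIES caps = {};
    caps.version = NV_HDR_CAPABILITIES_VER;
    if (NvAPI_Disp_GetHdrCapabilities(displayId, &caps) != NVAPI_OK ||
        !caps.isST2084EotfSupported)  // assumed field: SMPTE ST.2084 EOTF support
        return false;

    NV_HDR_COLOR_DATA hdr = {};
    hdr.version = NV_HDR_COLOR_DATA_VER;
    hdr.cmd     = NV_HDR_CMD_SET;    // switch the display into HDR mode
    hdr.hdrMode = NV_HDR_MODE_UHDA;  // assumed enum: HDR10-style output
    return NvAPI_Disp_HdrColorControl(displayId, &hdr) == NVAPI_OK;
}
```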

All of this means that for now, developers looking to support HDR have to put in additional vendor-specific hooks, one set for NVIDIA and another for AMD. Microsoft will be fixing this in future versions of Windows – the Windows 10 Anniversary Update is expected to bring top-to-bottom HDR support – which along with providing a generic solution also means that HDR can be used outside of fullscreen exclusive mode. But even then, I expect the vendor APIs to stick around for a while, as these work to enable HDR on Windows 7 and 8 as well. Windows 10 Anniversary Update adoption will itself take some time, and developers will need to decide if they want to support HDR for customers on older OSes.

Meanwhile, with all of this talk of 10bit color support and HDR, I reached out to NVIDIA to see if they’d be changing their policies for professional software support. In brief, NVIDIA has traditionally used 10bit+ color support under OpenGL as a feature differentiator between GeForce and Quadro: GeForce cards can use 10bit color with Direct3D, but only Quadro cards support 10bit color under OpenGL, and professional applications like Photoshop are usually OpenGL based.

For Pascal, NVIDIA is opening things up a bit more, but they are still going to keep the most important aspect of that feature differentiation in place. 10bit color is being enabled for fullscreen exclusive OpenGL applications – so your typical OpenGL game would be able to tap into deeper colors and HDR – however 10bit OpenGL windowed support is still limited to the Quadro cards. So professional users will still need Quadro cards for 10bit support in their common applications.
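
For reference, on the application side a 10bpc OpenGL framebuffer is requested through the pixel format. A minimal sketch using the standard WGL_ARB_pixel_format extension; it assumes a dummy GL context is already current (so the extension entry point can be fetched) and shows only the format selection:

```cpp
#include <windows.h>
#include <GL/gl.h>
#include "wglext.h"  // WGL_ARB_pixel_format tokens and typedefs

// Minimal sketch: ask for a 10/10/10/2 (10 bits per component) pixel
// format. Returns the format index, or 0 if no such format is offered.
int Choose10bpcPixelFormat(HDC hdc) {
    auto wglChoosePixelFormatARB = reinterpret_cast<PFNWGLCHOOSEPIXELFORMATARBPROC>(
        wglGetProcAddress("wglChoosePixelFormatARB"));
    if (!wglChoosePixelFormatARB)
        return 0;

    const int attribs[] = {
        WGL_DRAW_TO_WINDOW_ARB, GL_TRUE,
        WGL_SUPPORT_OPENGL_ARB, GL_TRUE,
        WGL_DOUBLE_BUFFER_ARB,  GL_TRUE,
        WGL_PIXEL_TYPE_ARB,     WGL_TYPE_RGBA_ARB,
        WGL_RED_BITS_ARB,   10,
        WGL_GREEN_BITS_ARB, 10,
        WGL_BLUE_BITS_ARB,  10,
        WGL_ALPHA_BITS_ARB, 2,
        0
    };
    int format = 0;
    UINT count = 0;
    if (!wglChoosePixelFormatARB(hdc, attribs, nullptr, 1, &format, &count) || count == 0)
        return 0;
    return format;
}
```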

Moving on, more straightforward – and much more Pascal-oriented – is video encoding and decoding. Relative to Maxwell 2 (specifically, GM204), both NVIDIA’s video encode and decode blocks have seen significant updates.

As a bit of background information here, on the decode side GM204 only supported complete fixed function video decode for standards up to H.264. For HEVC/H.265, limited hardware acceleration was provided via what NVIDIA called hybrid decode, which used a combination of fixed function hardware, shaders, and the CPU to decode HEVC video with a lower CPU load than pure CPU decoding. However hybrid decode was still not especially power efficient, and depending on factors such as bitrate, it tended to only be good for up to 4Kp30 video.

In between GM204 and Pascal, though, NVIDIA introduced GM206. And in something of a tradition for NVIDIA, they used their low-end GPU to introduce a new video decoder (Feature Set F). GM206’s decoder was a major upgrade over GM204’s. Importantly, it added full HEVC fixed function decoding – and not just for standard 8bit Main Profile HEVC, but for 10bit Main10 Profile as well. Besides the immediate benefits of full fixed function HEVC decoding – GM206 and newer can decode 4Kp120 HEVC without breaking a sweat – greater bit depths are also important to HDR video. Though not technically required, the additional bit depth is critical to encoding HDR video without artifacting. So the inclusion of a similarly powerful video decoder is an important part of making Pascal able to display HDR video.

GM206’s video decode block also introduced hardware decode support for the VP9 video codec. A Google-backed, royalty-free codec, VP9 is sometimes used as an alternative to H.264; Google in particular prefers to use it for YouTube when possible. Though not nearly as widespread as H.264, support for VP9 means that systems fed VP9 video will be able to decode it efficiently in hardware rather than inefficiently in software.

Pascal, in turn, inherits GM206’s video decode block and makes some upgrades of its own. Relative to GM206, Pascal’s decode block (Feature Set H) adds support for the HEVC Main12 profile (12bit color) while also extending the maximum video resolution from 4K to 8K. This sets the stage for the current “endgame” scenario of full Rec. 2020 compatibility, which calls for still wider HDR courtesy of 12bit color, and of course 8K displays. Compared to GM204 and GM200 then, this gives Pascal a significant leg up in video decode capabilities and overall performance.
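
Checking for this support programmatically is straightforward: a GPU’s fixed function decode profiles can be enumerated through Direct3D 11’s video interfaces, which is roughly what tools like DXVAChecker do under the hood. A minimal sketch:

```cpp
#include <initguid.h>  // so the decoder profile GUIDs are defined here
#include <d3d11.h>
#include <cstdio>
#pragma comment(lib, "d3d11.lib")

// Sketch: enumerate the GPU's video decode profiles and look for fixed
// function HEVC Main10 support (D3D11_DECODER_PROFILE_HEVC_VLD_MAIN10).
int main() {
    ID3D11Device* device = nullptr;
    if (FAILED(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                 nullptr, 0, D3D11_SDK_VERSION, &device,
                                 nullptr, nullptr)))
        return 1;

    ID3D11VideoDevice* video = nullptr;
    if (SUCCEEDED(device->QueryInterface(__uuidof(ID3D11VideoDevice),
                                         reinterpret_cast<void**>(&video)))) {
        const UINT count = video->GetVideoDecoderProfileCount();
        for (UINT i = 0; i < count; i++) {
            GUID profile;
            if (SUCCEEDED(video->GetVideoDecoderProfile(i, &profile)) &&
                profile == D3D11_DECODER_PROFILE_HEVC_VLD_MAIN10)
                std::printf("HEVC Main10 fixed function decode supported\n");
        }
        video->Release();
    }
    device->Release();
}
```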

Video Decode Performance (1080p)
Video Card   H.264    HEVC
GTX 1080     197fps   211fps
GTX 980      139fps   186fps (Hybrid)

Meanwhile, in terms of total throughput, a quick 1080p benchmark check with DXVAChecker finds that the new video decoder is much faster than GM204’s. H.264 throughput is 40% higher, and HEVC throughput, while a bit less apples-to-apples due to hybrid decode, is 13% higher (with roughly half the GPU power consumption at the same time). At 4K things are a bit more lopsided; GM204 can’t sustain H.264 4Kp60, and 4Kp60 HEVC is right out.

All of that said, there is one new feature of the video decode block and associated display controller on Pascal that’s not present on GM206: Microsoft PlayReady 3.0 DRM support. The latest version of Microsoft’s DRM standard goes hand in hand with the other DRM requirements (e.g. HDCP 2.2) that content owners/distributors have required for 4K video, which is why Netflix 4K support has until now been limited to closed devices such as TVs and set top boxes. Pascal is, in turn, the first GPU to support all of Netflix’s DRM requirements, and will be able to receive 4K video once Netflix starts serving it up to PCs.

Flipping over to the video encode side of matters, unique to Pascal is a new video encode block (NVENC). NVENC on Maxwell 2 was one of the first hardware HEVC encoders we saw, supporting hardware HEVC encoding before NVIDIA actually supported full hardware HEVC decoding. That said, Maxwell 2’s HEVC encoder was a bit primitive, lacking support for bi-directional frames (B-frames) and only supporting Main Profile (8bit) encoding.

Pascal in turn boosts NVIDIA’s HEVC encoding capabilities significantly with the newest NVENC block. This latest encoder essentially grants NVENC full HEVC support, resolving Maxwell 2’s limitations while also significantly boosting encoding performance. Pascal can now encode Main10 Profile (10bit) video, and total encode throughput is rated by NVIDIA at two simultaneous 4Kp60 streams. The latter in particular, I suspect, is going to be especially important to NVIDIA once they start building Pascal Tesla cards for virtualization, such as a successor to the Tesla M60.
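
For developers tapping the encoder directly, 10bit encoding is selected through the usual NVENC configuration structures. A minimal sketch of just the relevant fields, with names as they appear in NVIDIA’s public nvEncodeAPI.h – session creation, buffer management, and the rest of the configuration are omitted:

```cpp
#include "nvEncodeAPI.h"  // NVIDIA Video Codec SDK

// Sketch: the fields that select 10bit (Main10) HEVC on NVENC. Everything
// else - opening the session, picking resolution/preset, allocating
// input/output buffers - is omitted for brevity.
void ConfigureHevcMain10(NV_ENC_INITIALIZE_PARAMS& init, NV_ENC_CONFIG& cfg) {
    init.version      = NV_ENC_INITIALIZE_PARAMS_VER;
    init.encodeGUID   = NV_ENC_CODEC_HEVC_GUID;                // HEVC session
    cfg.version       = NV_ENC_CONFIG_VER;
    cfg.profileGUID   = NV_ENC_HEVC_PROFILE_MAIN10_GUID;       // Main10 profile
    cfg.encodeCodecConfig.hevcConfig.pixelBitDepthMinus8 = 2;  // 10bit (10 - 8)
    init.encodeConfig = &cfg;
}
```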

As for Main10 Profile encoding support, this can benefit overall quality, but the most visible purpose is to move NVIDIA’s video encode capabilities in lockstep with their decode capabilities, extending HDR support to video encoding as well. By and large HDR encoding is one of those changes that will prove more important farther down the line – think Twitch with HDR – but in the short term NVIDIA has already put the new encoder to good use.

Recently added via an update to the SHIELD Android TV console (SATV), NVIDIA now supports a full HDR GameStream path between Pascal and the SATV. This leverages Pascal’s HEVC Main10 encode capabilities and the SATV’s HEVC Main10 decode capabilities, allowing Pascal to encode in HDR and the SATV to receive it. This is in turn the first game streaming setup to support HDR, giving NVIDIA a technological advantage over other game streaming platforms such as Steam’s In-Home Streaming. With that said, it should be noted that it’s still early in the lifecycle for HDR games, so games that can take advantage of a GameStream HDR setup will be few and far between until more developers add HDR support.

Comments

  • patrickjp93 - Wednesday, July 20, 2016 - link

    That doesn't actually support your point...
  • Scali - Wednesday, July 20, 2016 - link

    Did I read a different article?
    Because the article that I read said that the 'holes' would be pretty similar on Maxwell v2 and Pascal, given that they have very similar architectures. However, Pascal is more efficient at filling the holes with its dynamic repartitioning.
  • mr.techguru - Wednesday, July 20, 2016 - link

    Just ordered the MSI GeForce GTX 1070 Gaming X, way better than the 1060/480. NVidia nailed it :)
  • tipoo - Wednesday, July 20, 2016 - link

    " NVIDIA tells us that it can be done in under 100us (0.1ms), or about 170,000 clock cycles."

    Is my understanding right that Polaris, and I think even earlier with late GCN parts, could seamlessly interleave per-clock? So 170,000 times faster than Pascal in clock cycles (less in total time, but still above 100,000 times faster)?
  • Scali - Wednesday, July 20, 2016 - link

    That seems highly unlikely. Switching to another task is going to take some time, because you also need to switch all the registers and buffers, caches need to be re-filled, etc.
    The only way to avoid most of that is to duplicate the whole register file, like HyperThreading does. That's doable on an x86 CPU, but a GPU has way more registers.
    Besides, as we can see, nVidia's approach is fast enough in practice. Why throw tons of silicon on making context switching faster than it needs to be? You want to avoid context switches as much as possible anyway.

    Sadly AMD doesn't seem to go into any detail, but I'm pretty sure it's going to be in the same ballpark.
    My guess is that what AMD calls an 'ACE' is actually very similar to the SMs and their command queues on the Pascal side.
  • Ryan Smith - Wednesday, July 20, 2016 - link

    Task switching is separate from interleaving. Interleaving takes place on all GPUs as a basic form of latency hiding (GPUs are very high latency).

    The big difference is that interleaving uses different threads from the same task; task switching by its very nature loads up another task entirely.
  • Scali - Thursday, July 21, 2016 - link

    After re-reading AMD's asynchronous shader PDF, it seems that AMD also speaks of 'interleaving' when they switch a graphics CU to a compute task after the graphics task has completed. So 'interleaving' at task level, rather than at instruction level.
    Which would be pretty much the same as NVidia's Dynamic Load Balancing in Pascal.
  • eddman - Thursday, July 21, 2016 - link

    The more I read about async computing in Polaris and Pascal, the more I realize that the implementations are not much different.

    As Ryan pointed out, it seems that the reason that Polaris, and GCN as a whole, benefit more from async is the architecture of the GPU itself, being wider and having more ALUs.

    Nonetheless, I'm sure we're still going to see comments like "Polaris does async in hardware. Pascal is hopeless with its software async hack".
  • Matt Doyle - Wednesday, July 20, 2016 - link

    Typo in the lead sentence of HPC vs. Consumer: Divergence paragraph: "Pascal in an architecture that..."

    "is" instead of "in"
  • Matt Doyle - Wednesday, July 20, 2016 - link

    Feeding Pascal page, "GDDR5X uses a 16n prefetch, which is twice the size of GDDR5’s 8n prefect."

    Prefect = prefetch
