Display Matters: New Display Controller, HDR, & HEVC

Outside of the core Pascal architecture, Pascal/GP104 also introduces a new display controller and a new video encode/decode block to the NVIDIA ecosystem. As a result, Pascal offers a number of significant display improvements, particularly for forthcoming high dynamic range (HDR) displays.

Starting with the display controller then, Pascal’s new controller has been updated to support the latest DisplayPort and HDMI standards, with a specific eye towards HDR. On the DisplayPort side, DisplayPort 1.3 and 1.4 support has been added. As our regular readers might recall, DisplayPort 1.3 adds support for DisplayPort’s new High Bit Rate 3 (HBR3) signaling mode, which increases the per-lane signaling rate from 5.4Gbps to 8.1Gbps, a 50% increase in bandwidth over DisplayPort 1.2’s HBR2. For a full 4 lane DP connection, this means the total connection bandwidth has been increased from 21.6Gbps to 32.4Gbps, for a final data rate of 25.9Gbps after taking encoding overhead into account.
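
To put those figures in context, the underlying math is straightforward: HBR2 and HBR3 both use 8b/10b encoding, so only 80% of the raw link rate is available for data. A quick illustrative calculation:

```cpp
#include <cstdio>

int main() {
    const double lanes = 4.0;
    const double encodingEfficiency = 0.8;   // 8b/10b: 8 data bits per 10 transmitted bits

    const double hbr2PerLane = 5.4;          // Gbps per lane (DisplayPort 1.2)
    const double hbr3PerLane = 8.1;          // Gbps per lane (DisplayPort 1.3/1.4)

    printf("HBR2: %.1f Gbps raw, %.2f Gbps usable\n",
           hbr2PerLane * lanes, hbr2PerLane * lanes * encodingEfficiency);
    printf("HBR3: %.1f Gbps raw, %.2f Gbps usable\n",
           hbr3PerLane * lanes, hbr3PerLane * lanes * encodingEfficiency);
    return 0;
}
// Output:
//   HBR2: 21.6 Gbps raw, 17.28 Gbps usable
//   HBR3: 32.4 Gbps raw, 25.92 Gbps usable
```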

DisplayPort 1.4 in turn builds on that, adding some HDR-specific functionality. DP 1.4 adds support for HDR static metadata, specifically the CTA 861.3 standard already used in other products and standards such as HDMI 2.0a. HDR static metadata is specifically focused on recorded media, such as Ultra HD Blu-ray, which uses static metadata to pass along the necessary HDR information to displays. This also improves DP/HDMI interoperability, as it allows DP-to-HDMI adapters to pass along that metadata. Also new to 1.4 is support for VESA Display Stream Compression – something I’m not expecting we’ll see on desktop displays right away – and, for the rare DisplayPort-transported audio setup, support for up to 32 audio channels.
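
For a sense of what that static metadata actually carries, CTA 861.3 packages the SMPTE ST 2086 mastering display color volume along with content light level values (MaxCLL/MaxFALL). The struct below is purely illustrative; it lists the information defined by the standard rather than any particular API’s or InfoFrame’s actual layout:

```cpp
// Illustrative only: the information carried by CTA 861.3 HDR static metadata
// (SMPTE ST 2086 mastering display color volume plus content light levels).
// Field layout here is for readability, not a specific API or wire format.
struct HdrStaticMetadata {
    // Mastering display primaries and white point, as CIE 1931 xy chromaticity
    // coordinates (typically transmitted in units of 0.00002).
    struct { double x, y; } redPrimary, greenPrimary, bluePrimary, whitePoint;

    double maxMasteringLuminance;   // cd/m^2, e.g. 1000.0 for a 1000-nit mastering display
    double minMasteringLuminance;   // cd/m^2, e.g. 0.0001

    double maxContentLightLevel;        // MaxCLL: brightest pixel in the content, cd/m^2
    double maxFrameAverageLightLevel;   // MaxFALL: brightest frame average, cd/m^2
};
```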

Compared to DisplayPort 1.2, the combination of new features and significantly greater bandwidth is geared towards driving better displays; mixing and matching higher resolutions, higher refresh rates, and HDR. 5K displays with a single cable are possible under DP 1.3/1.4, as are 4Kp120 displays (for high refresh gaming monitors), and of course, HDR displays that need to use 10-bit (or better) color to better support the larger dynamic range.

Display Bandwidth Requirements (RGB/4:4:4 Chroma)
Resolution                     Minimum DisplayPort Version
1920x1080@60Hz, 8bpc SDR       1.1
3840x2160@60Hz, 8bpc SDR       1.2
3840x2160@60Hz, 10bpc HDR      1.3
5120x2880@60Hz, 8bpc SDR       1.3
5120x2880@60Hz, 10bpc HDR      1.4 w/DSC
7680x4320@60Hz, 8bpc SDR       1.4 w/DSC
7680x4320@60Hz, 10bpc HDR      1.4 w/DSC
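
As a rough cross-check on the table, the payload bandwidth a given mode needs can be estimated from its pixel rate and bit depth and compared against each DisplayPort version’s usable data rate. The sketch below ignores blanking interval overhead (so it slightly understates real requirements), and keep in mind that the HDR rows also hinge on DP 1.3/1.4’s metadata support rather than bandwidth alone:

```cpp
#include <cstdio>

// Approximate payload bandwidth for an RGB/4:4:4 mode, ignoring blanking overhead.
double payloadGbps(int width, int height, int refreshHz, int bitsPerComponent) {
    const double bitsPerPixel = 3.0 * bitsPerComponent;   // R, G, B
    return width * height * double(refreshHz) * bitsPerPixel / 1e9;
}

int main() {
    const double dp12Usable = 17.28;   // 4x HBR2 after 8b/10b encoding, Gbps
    const double dp13Usable = 25.92;   // 4x HBR3 after 8b/10b encoding, Gbps

    struct Mode { const char* name; int w, h, hz, bpc; };
    const Mode modes[] = {
        { "3840x2160@60, 8bpc",  3840, 2160, 60,  8 },
        { "3840x2160@60, 10bpc", 3840, 2160, 60, 10 },
        { "5120x2880@60, 8bpc",  5120, 2880, 60,  8 },
        { "5120x2880@60, 10bpc", 5120, 2880, 60, 10 },
        { "7680x4320@60, 10bpc", 7680, 4320, 60, 10 },
    };

    for (const Mode& m : modes) {
        double gbps = payloadGbps(m.w, m.h, m.hz, m.bpc);
        printf("%-24s ~%5.1f Gbps   DP1.2: %s   DP1.3/1.4: %s\n", m.name, gbps,
               gbps <= dp12Usable ? "fits" : "over",
               gbps <= dp13Usable ? "fits" : "over (needs DSC)");
    }
    return 0;
}
```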

I should note that officially, NVIDIA’s cards are only DisplayPort 1.3 and 1.4 “ready” as opposed to “certified.” While NVIDIA hasn’t discussed the distinction in any more depth, as best as I can tell no one is 1.3/1.4 certified yet. In similar cases in the past, the typical holdup has been that the certification test isn’t finished, particularly because there’s little-to-no hardware to test against. I suspect the case is much the same here, and certification will come once the VESA is ready to hand it out.

Moving on to HDMI, the changes here aren’t nearly as drastic as they are for DisplayPort. In fact, technically Pascal’s HDMI capabilities are the same as Maxwell’s. However, since we’re already on the subject of HDR support, this is a perfect time to clarify just what’s been going on in the HDMI ecosystem.

Since Maxwell 2 launched with HDMI 2.0 support, the HDMI consortium has made two minor additions to the standard: 2.0a and 2.0b. As you might expect given the focus on HDR, together these updates introduce support for HDR in the HDMI ecosystem, using the same CTA 861.3 static metadata that DisplayPort 1.4 adopted. There are no other changes to HDMI (e.g. bandwidth), so this is purely about supporting HDR.

Because Maxwell 2 is a flexible GPU and software updates are easily distributed, NVIDIA was able to add HDMI 2.0a/b support to it after launch. This means that if you’re just now catching up from the launch of the GTX 980, Maxwell 2 actually has more functionality than when it launched. All of this functionality has been carried over to Pascal, where thanks to some other feature additions it’s going to be much more useful.

Overall, when it comes to HDR on NVIDIA’s display controller, not unlike AMD’s Polaris architecture, this is a case of display technology catching up to rendering technology. NVIDIA’s GPUs have supported HDR rendering for a while now – Maxwell 2 can do full HDR and wide gamut processing – however until now the display and display connectivity standards had not caught up. Maxwell 2’s big limitations were spec support and bandwidth: static HDR metadata, necessary to support HDR movies, was not supported over DisplayPort 1.2, and lacking a newer DisplayPort standard, Maxwell 2 lacked the bandwidth to support deep color (10bit+) HDR in conjunction with high resolutions.

Pascal in turn addresses these problems. With DisplayPort 1.3 and 1.4 it gains both the spec support and the bandwidth to do HDR on DisplayPort displays, and to do so with more flexibility than HDMI provides.

Of course, how you get HDR to the display controller is an important subject in and of itself. On the rendering side of matters, HDR support is still a new concept. So new, in fact, that even Windows 10 doesn’t fully support it. The current versions of Windows and Direct3D support the wide range of values required for HDR (i.e. values beyond -1.0f to +1.0f), but they lack a means to expose HDR display information to game engines. As a result the current state of HDR is a bit rocky.

For the moment, developers need to do an end-run around Windows to support HDR rendering. This means using exclusive fullscreen mode to bypass the Windows desktop compositor, which is not HDR-ready, and combining that with new NVAPI functions to query the video card about the attached monitor’s HDR capabilities and to tell the card that it’s intentionally being fed HDR-processed data.
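
As a rough illustration of what that NVAPI path looks like, here is a sketch based on the HDR entry points NVIDIA added to NVAPI around this time; error handling is trimmed, and the exact structure fields and enum names should be verified against the NVAPI SDK headers:

```cpp
#include <cstdio>
#include "nvapi.h"   // NVIDIA NVAPI SDK

// Sketch: query the attached display's HDR capabilities and enable HDR10 output.
// Assumes a valid NVAPI display ID has already been obtained for the target monitor.
void enableHdr(NvU32 displayId) {
    NvAPI_Initialize();

    // Ask the driver what the monitor's EDID advertises for HDR.
    NV_HDR_CAPABILITIES caps = {};
    caps.version = NV_HDR_CAPABILITIES_VER;
    if (NvAPI_Disp_GetHdrCapabilities(displayId, &caps) != NVAPI_OK ||
        !caps.isST2084EotfSupported) {
        printf("Display does not report ST.2084 (HDR10) support\n");
        return;
    }

    // Tell the driver we are intentionally sending HDR10-processed frames.
    NV_HDR_COLOR_DATA hdr = {};
    hdr.version = NV_HDR_COLOR_DATA_VER;
    hdr.cmd = NV_HDR_CMD_SET;
    hdr.hdrMode = NV_HDR_MODE_UHDA;   // HDR10/UHDA-style output
    // The mastering display / content light level fields of 'hdr' would also be
    // filled in here with the application's static metadata before the call.
    NvAPI_Disp_HdrColorControl(displayId, &hdr);
}
```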

All of this means that for now, developers looking to support HDR have to put in additional vendor-specific hooks, one set for NVIDIA and another for AMD. Microsoft will be fixing this in future versions of Windows – the Windows 10 Anniversary Update is expected to bring top-to-bottom HDR support – which, along with providing a generic solution, also means that HDR can be used outside of fullscreen exclusive mode. But even then, I expect the vendor APIs to stick around for a while, as these work to enable HDR on Windows 7 and 8 as well. Windows 10 Anniversary Update adoption will itself take some time, and developers will need to decide if they want to support HDR for customers on older OSes.

Meanwhile, with all of this talk of 10bit color support and HDR, I reached out to NVIDIA to see if they’d be changing their policies for professional software support. In brief, NVIDIA has traditionally used 10bit+ color support under OpenGL as a feature differentiator between GeForce and Quadro. GeForce cards can use 10bit color with Direct3D, but only Quadro cards support 10bit color under OpenGL, and professional applications like Photoshop are usually OpenGL based.

For Pascal, NVIDIA is opening things up a bit more, but they are still going to keep the most important aspect of that feature differentiation in place. 10bit color is being enabled for fullscreen exclusive OpenGL applications – so your typical OpenGL game would be able to tap into deeper colors and HDR – however 10bit OpenGL windowed support is still limited to the Quadro cards. So professional users will still need Quadro cards for 10bit support in their common applications.

Moving on, more straightforward – and much more Pascal-oriented – is video encoding and decoding. Relative to Maxwell 2 (specifically, GM204), both NVIDIA’s video encode and decode blocks have seen significant updates.

As a bit of background information here, on the decode side GM204 only supported complete fixed function video decoding for standards up to H.264. For HEVC/H.265, limited hardware acceleration was provided via what NVIDIA called hybrid decoding, which used a combination of fixed function hardware, shaders, and the CPU to decode HEVC video with a lower CPU load than pure CPU decoding. However hybrid decoding is not especially power efficient, and depending on factors such as bitrate, it was typically only good for up to 4Kp30 video.

In between GM204 and Pascal though, NVIDIA introduced GM206. And in something of a tradition from NVIDIA, they used their low-end GPU to introduce a new video decoder (Feature Set F). GM206’s decoder was a major upgrade from GM204’s. Importantly, it added full HEVC fixed function decoding. And not just for standard 8bit Main Profile HEVC, but 10bit Main10 Profile support as well. Besides the immediate benefits of full fixed function HEVC decoding – GM206 and newer can decode 4Kp120 HEVC without breaking a sweat – greater bit depths are also important to HDR video. Though not technically required, the additional bit depth is critical to encoding HDR video without artifacting. So the inclusion of a similarly powerful video decoder on Pascal is an important part of making it able to display HDR video.

GM206’s video decode block also introduced hardware decode support for the VP9 video codec. A Google-backed royalty free codec, VP9 is sometimes used as an alternative to H.264; Google in particular prefers to use it for YouTube when possible. Though not nearly as widespread as H.264, support for VP9 means that systems fed VP9 video will be able to decode it efficiently in hardware instead of inefficiently in software.

Pascal, in turn, inherits GM206’s video decode block and makes some upgrades of its own. Relative to GM206, Pascal's decode block (Feature Set H) adds support for the HEVC Main12 profile (12bit color) while also extending the maximum video resolution from 4K to 8K. This sets the stage for the current "endgame" scenario of full Rec. 2020 compatibility, which calls for still wider HDR courtesy of 12bit color, and of course 8K displays. Compared to GM204 and GM200 then, this gives Pascal a significant leg up in video decode capabilities and overall performance.
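
One easy way to see which of these profiles a GPU’s fixed-function decoder actually exposes under Windows is to enumerate the Direct3D 11 video decoder profiles, which is essentially what tools like DXVAChecker do. A minimal sketch:

```cpp
#include <cstdio>
#include <d3d11.h>
#pragma comment(lib, "d3d11.lib")

// Sketch: list the DXVA decode profiles exposed by the default adapter.
// On GM206/Pascal-class hardware, the HEVC Main and Main10 profile GUIDs should
// appear here, indicating full fixed-function decode rather than hybrid decode.
int main() {
    ID3D11Device* device = nullptr;
    if (FAILED(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                 nullptr, 0, D3D11_SDK_VERSION, &device, nullptr, nullptr)))
        return 1;

    ID3D11VideoDevice* videoDevice = nullptr;
    if (FAILED(device->QueryInterface(__uuidof(ID3D11VideoDevice), (void**)&videoDevice)))
        return 1;

    UINT count = videoDevice->GetVideoDecoderProfileCount();
    for (UINT i = 0; i < count; i++) {
        GUID profile;
        if (SUCCEEDED(videoDevice->GetVideoDecoderProfile(i, &profile))) {
            printf("Profile %u: {%08lX-...}\n", i, profile.Data1);
            // Compare against D3D11_DECODER_PROFILE_HEVC_VLD_MAIN and
            // D3D11_DECODER_PROFILE_HEVC_VLD_MAIN10 to check for HEVC support.
        }
    }

    videoDevice->Release();
    device->Release();
    return 0;
}
```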

Video Decode Performance (1080p)
Video Card    H.264     HEVC
GTX 1080      197fps    211fps
GTX 980       139fps    186fps (Hybrid)

Meanwhile, in terms of total throughput, a quick 1080p benchmark check with DXVAChecker finds that the new video decoder is much faster than GM204’s. H.264 throughput is 40% higher, and HEVC throughput, while a bit less apples-to-apples due to hybrid decoding, is 13% higher (with roughly half the GPU power consumption at the same time). At 4K things are a bit more lopsided; GM204 can’t sustain H.264 4Kp60, and 4Kp60 HEVC is right out.

All of that said, there is one new feature of the video decode block and associated display controller on Pascal that’s not present on GM206: Microsoft PlayReady 3.0 DRM support. The latest version of Microsoft’s DRM standard goes hand in hand with the other DRM requirements (e.g. HDCP 2.2) that content owners/distributors have imposed for 4K video, which is why Netflix 4K support has until now been restricted to devices such as TVs and set top boxes. Pascal is, in turn, the first GPU to support all of Netflix’s DRM requirements, and will be able to receive 4K video once Netflix starts serving it up to PCs.

Flipping over to the video encode side of matters, unique to Pascal is a new video encode block (NVENC). NVENC on Maxwell 2 was one of the first hardware HEVC encoders we saw, supporting hardware HEVC encoding before NVIDIA actually supported full hardware HEVC decoding. That said, Maxwell 2’s HEVC encoder was a bit primitive, lacking support for bi-directional frames (B-frames) and only supporting Main Profile (8bit) encoding.

Pascal in turn boosts NVIDIA’s HEVC encoding capabilities significantly with the newest NVENC block. This latest encoder essentially grants NVENC full HEVC support, resolving Maxwell 2’s limitations while also significantly boosting encoding performance. Pascal can now encode Main10 Profile (10bit) video, and total encode throughput is rated by NVIDIA at two simultaneous 4Kp60 streams. The latter in particular, I suspect, is going to be especially important to NVIDIA once they start building Pascal Tesla cards for virtualization, such as a successor to the Tesla M60.
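
For developers, the corresponding capability check lives in the NVENC API. The sketch below assumes the NVENC SDK headers (nvEncodeAPI.h), glosses over library loading/linking details, and simply queries whether the encoder supports 10bit HEVC encoding:

```cpp
#include <cstdio>
#include <d3d11.h>
#include "nvEncodeAPI.h"   // NVIDIA Video Codec SDK
#pragma comment(lib, "d3d11.lib")

// Sketch: ask NVENC whether the GPU's encoder can do 10bit (Main10-capable) HEVC.
int main() {
    ID3D11Device* device = nullptr;
    D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                      nullptr, 0, D3D11_SDK_VERSION, &device, nullptr, nullptr);

    NV_ENCODE_API_FUNCTION_LIST nvenc = { NV_ENCODE_API_FUNCTION_LIST_VER };
    if (NvEncodeAPICreateInstance(&nvenc) != NV_ENC_SUCCESS)
        return 1;

    NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS sessionParams = { NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER };
    sessionParams.device = device;
    sessionParams.deviceType = NV_ENC_DEVICE_TYPE_DIRECTX;
    sessionParams.apiVersion = NVENCAPI_VERSION;

    void* encoder = nullptr;
    if (nvenc.nvEncOpenEncodeSessionEx(&sessionParams, &encoder) != NV_ENC_SUCCESS)
        return 1;

    NV_ENC_CAPS_PARAM capsParam = { NV_ENC_CAPS_PARAM_VER };
    capsParam.capsToQuery = NV_ENC_CAPS_SUPPORT_10BIT_ENCODE;
    int supports10bit = 0;
    nvenc.nvEncGetEncodeCaps(encoder, NV_ENC_CODEC_HEVC_GUID, &capsParam, &supports10bit);

    printf("HEVC 10bit encode: %s\n", supports10bit ? "supported" : "not supported");

    nvenc.nvEncDestroyEncoder(encoder);
    device->Release();
    return 0;
}
```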

As for Main10 Profile encoding support, this can benefit overall quality, but the most visible purpose is to move NVIDIA’s video encode capabilities in lockstep with their decode capabilities, extending HDR support to video encoding as well. By and large HDR encoding is one of those changes that will prove more important farther down the line – think Twitch with HDR – but in the short term NVIDIA has already put the new encoder to good use.

Recently added via an update to the SHIELD Android TV console, NVIDIA now supports a full HDR GameStream path between Pascal and the SATV. This leverages Pascal’s HEVC Main10 encode capabilities and the SATV’s HEVC Main10 decode capabilities, allowing Pascal to encode in HDR and the SATV to receive it. This is, in turn, the first game streaming setup to support HDR, giving NVIDIA a technological advantage over other game streaming platforms such as Steam’s In-Home Streaming. That said, it’s still early in the lifecycle for HDR games, so the titles that can take advantage of a GameStream HDR setup will be few and far between until more developers add HDR support.

Comments

  • grrrgrrr - Wednesday, July 20, 2016 - link

    Solid review! Some nice architecture introductions.
  • euskalzabe - Wednesday, July 20, 2016 - link

    The HDR discussion of this review was super interesting, but as always, there's one key piece of information missing: WHEN are we going to see HDR monitors that take advantage of these new GPU abilities?

    I myself am stuck at 1080p IPS because more resolution doesn't entice me, and there's nothing better than IPS. I'm waiting for HDR to buy my next monitor, but being 5 years old my Dell ST2220T is getting long in the teeth...
  • ajlueke - Wednesday, July 20, 2016 - link

    Thanks for the review Ryan,

    I think the results are quite interesting, and the games chosen really help show the advantages and limitations of the different architectures. When you compare the GTX 1080 to its price predecessor, the 980 Ti, you are getting an almost universal ~25%-30% increase in performance.
    Against rival AMD's R9 Fury X, there is more of a mixed bag. As the resolutions increase, the bandwidth provided by the HBM memory on the Fury X really narrows the gap, sometimes trimming the margin to less than 10%, specifically in games optimized more for DX12 ("Hitman", "AotS"). But in other games, specifically "Rise of the Tomb Raider" which boasts extremely high res textures, the 4GB memory size on the Fury X starts to limit its performance in a big way. On average, there is again a ~25%-30% performance increase with much higher game to game variability.
    This data lets a little bit of air out of the argument I hear a lot that AMD makes more "future proof" cards. While many Nvidia 900 series users may have to upgrade as more and more games switch to DX12 based programming, AMD Fury users will be in the same boat as those same games come with higher and higher res textures, due to the smaller amount of memory on board.
    While Pascal still doesn't show the jump in DX12 versus DX11 that AMD's GPUs enjoy, it does at least show an increase or at least remain at parity.
    So what you have is a card that wins in every single game tested, at every resolution over the price predecessors from both companies, all while consuming less power. That is a win pretty much any way you slice it. But there are elements of Nvidia’s strategy and the card I personally find disappointing.
    I understand Nvidia wants to keep features specific to the higher margin professional cards, but avoiding HBM2 altogether in the consumer space seems to be a missed opportunity. I am a huge fan of the mini ITX gaming machines. And the Fury Nano, at the $450 price point is a great card. With an NVMe motherboard and NAS storage the need for drive bays in the case is eliminated, the Fury Nano at only 6” leads to some great forward thinking, and tiny designs. I was hoping to see an explosion of cases that cut out the need for supporting 10-11” cards and tons of drive bays if both Nvidia and AMD put out GPUs in the Nano space, but it seems not to be. HBM2 seems destined to remain on professional cards, as Nvidia won’t take the risk of adding it to a consumer Titan or GTX 1080 Ti card and potentially again cannibalize the higher margin professional card market. Now case makers don’t really have the same incentive to build smaller cases if the Fury Nano will still be the only card at that size. It’s just unfortunate that it had to happen because NVidia decided HBM2 was something they could slap on a pro card and sell for thousands extra.
    But also what is also disappointing about Pascal stems from the GTX 1080 vs GTX 1070 data Ryan has shown. The GTX 1070 drops off far more than one would expect based off CUDA core numbers as the resolution increases. The GDDR5 memory versus the GDDR5X is probably at fault here, leading me to believe that Pascal can gain even further if the memory bandwidth is increased more, again with HBM2. So not only does the card limit you to the current mini-ITX monstrosities (I’m looking at you bulldog) by avoiding HBM2, it also very likely is costing us performance.
    Now for the rank speculation. The data does present some interesting scenarios for the future. With the Fury X able to approach the GTX 1080 at high resolutions, most specifically in DX12 optimized games. It seems extremely likely that the Vega GPU will be able to surpass the GTX 1080, especially if the greatest limitation (4 Gb HBM) is removed with the supposed 8Gb of HBM2 and games move more and more the DX12. I imagine when it launches it will be the 4K card to get, as the Fury X already acquits itself very well there. For me personally, I will have to wait for the Vega Nano to realize my Mini-ITX dreams, unless of course, AMD doesn’t make another Nano edition card and the dream is dead. A possibility I dare not think about.
  • eddman - Wednesday, July 20, 2016 - link

    The gap getting narrower at higher resolutions probably has more to do with chips' designs rather than bandwidth. After all, Fury is the big GCN chip optimized for high resolutions. Even though GP104 does well, it's still the middle Pascal chip.

    P.S. Please separate the paragraphs. It's a pain, reading your comment.
  • Eidigean - Wednesday, July 20, 2016 - link

    The GTX 1070 is really just a way for Nvidia to sell GP104's that didn't pass all of their tests. Don't expect them to put expensive memory on a card where they're only looking to make their money back. Keeping the card cost down, hoping it sells, is more important to them.

    If there's a defect anywhere within one of the GPC's, the entire GPC is disabled and the chip is sold at a discount instead of being thrown out. I would not buy a 1070 which is really just a crippled 1080.

    I'll be buying a 1080 for my 2560x1600 desktop, and an EVGA 1060 for my Mini-ITX build; which has a limited power supply.
  • mikael.skytter - Wednesday, July 20, 2016 - link

    Thanks Ryan! Much appreciated.
  • chrisp_6@yahoo.com - Wednesday, July 20, 2016 - link

    Very good review. One minor comment to the article writers - do a final check on grammer - granted we are technical folks, but it was noticeable especially on the final words page.
  • madwolfa - Wednesday, July 20, 2016 - link

    It's "grammar", though. :)
  • Eden-K121D - Thursday, July 21, 2016 - link

    Oh the irony
  • chrisp_6@yahoo.com - Thursday, July 21, 2016 - link

    oh snap, that is some funny stuff right there
