Display Matters: HDMI 2.0, HEVC, & VR Direct

Stepping away from the graphics heart of Maxwell 2 for a bit, NVIDIA has been busy not just optimizing their architecture and adding graphical features, but also adding new display-oriented features to the Maxwell 2 architecture. The result is an upgrade of their video encode capabilities, their display I/O, and even their ability to drive virtual reality headsets such as the Oculus Rift.

We’ll start with display I/O. HDMI users will be happy to see that as of GM204, NVIDIA now supports HDMI 2.0, which will allow NVIDIA to drive future 4K@60Hz displays over HDMI without compromise. HDMI 2.0 for its part is the 4K-focused upgrade of the HDMI standard, bringing support for the much higher data rate (via a greatly increased 600MHz TMDS clockspeed) necessary to drive 4K displays at 60Hz, while also introducing new subsampling patterns such as YCbCr 4:2:0 and official support for wide aspect ratio (21:9) displays.

It should be noted that this is full HDMI 2.0 support, and as a result it notably differs from the earlier support that NVIDIA patched into Kepler and Maxwell 1 through drivers. Whereas NVIDIA’s earlier update was to allow these products to drive a 4K@60Hz display using 4:2:0 subsampling to stay within the bandwidth limitations of HDMI 1.4, Maxwell 2 implements the bandwidth improvements necessary to support 4K@60Hz with full resolution 4:4:4 and RGB color spaces.
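
To put rough numbers on that difference, here is a quick back-of-the-envelope check of the bandwidth involved. It assumes the standard CTA-861 4K@60Hz timing (a 594MHz pixel clock, blanking included), 8 bits per component, and TMDS 8b/10b coding over 3 data lanes, and it ignores minor protocol overhead:

```python
# Back-of-the-envelope HDMI bandwidth check for 4K@60Hz.
# Assumptions: the standard CTA-861 4K60 timing (4400 x 2250 total pixels,
# i.e. a 594 MHz pixel clock), 8 bits per component, TMDS 8b/10b coding,
# and 3 data lanes. Minor protocol overhead is ignored.

PIXEL_CLOCK_4K60 = 4400 * 2250 * 60      # ~594 MHz, blanking included

def wire_rate_gbps(bits_per_pixel):
    """Approximate on-the-wire TMDS rate in Gbps for the 4K60 timing above."""
    return PIXEL_CLOCK_4K60 * bits_per_pixel * 10 / 8 / 1e9   # 8b/10b overhead

hdmi14_limit = 340e6 * 3 * 10 / 1e9   # 340 MHz TMDS clock x 3 lanes = 10.2 Gbps
hdmi20_limit = 600e6 * 3 * 10 / 1e9   # 600 MHz TMDS clock x 3 lanes = 18.0 Gbps

print(f"4K60 RGB / 4:4:4 (24 bpp):     ~{wire_rate_gbps(24):.1f} Gbps")
print(f"4K60 YCbCr 4:2:0 (12 bpp avg): ~{wire_rate_gbps(12):.1f} Gbps")
print(f"HDMI 1.4 limit: {hdmi14_limit:.1f} Gbps, HDMI 2.0 limit: {hdmi20_limit:.1f} Gbps")
```

The full 4:4:4/RGB signal comes out to roughly 17.8Gbps on the wire, well beyond HDMI 1.4’s 10.2Gbps but just within HDMI 2.0’s 18Gbps, while 4:2:0 subsampling halves the payload and is what let the earlier driver workaround squeeze 4K@60Hz into HDMI 1.4.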

Given the timeline for HDMI 2.0 development, the fact that we’re seeing HDMI 2.0 support now is if anything a pleasant surprise, since it’s earlier than we expected. However it does leave HTPC users in a pickle: with the GM107-based GTX 750 series having launched without HDMI 2.0 only 7 months ago, we would not expect NVIDIA’s HTPC-centric video cards to be replaced any time soon. This means the only option for HTPC users who want HDMI 2.0 right away is to step up to a larger, more powerful Maxwell 2 based card, or otherwise stick with the low-powered GTX 750 series and go without.

Meanwhile alongside the upgrade to HDMI 2.0, NVIDIA has also made one other change to their display controllers that should be of interest to multi-monitor users. With Maxwell 2, a single display controller can now drive multiple identical MST substreams on its own, rather than requiring a different display controller for each stream. This feature will be especially useful for driving tiled monitors such as many of today’s 4K monitors, which are internally a pair of identical displays driven using MST. By being able to drive both tiles off of a single display controller, NVIDIA can make better use of their 4 display controllers, allowing them to drive up to 4 such displays off of a Maxwell 2 GPU as opposed to the 2 display limitation that is inherent to Kepler GPUs. For the consumer cards we’re seeing today, the most common display I/O configuration will include 3 DisplayPorts, allowing these specific cards to drive up to 3 such 4K monitors.
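
The bookkeeping here is simple enough to sketch out directly; the controller and tile counts below are just the figures discussed above:

```python
# Illustrative bookkeeping only: how many tiled (2-tile MST) 4K monitors fit
# on a GPU with 4 display controllers, depending on whether a controller can
# drive one MST substream or several identical substreams.

DISPLAY_CONTROLLERS = 4
TILES_PER_MONITOR = 2

per_tile_assignment    = DISPLAY_CONTROLLERS // TILES_PER_MONITOR  # Kepler-style: 2 monitors
per_display_assignment = DISPLAY_CONTROLLERS                       # Maxwell 2-style: 4 monitors

print(f"One controller per tile:    {per_tile_assignment} tiled 4K monitors")
print(f"One controller per display: {per_display_assignment} tiled 4K monitors (ports permitting)")
```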

HEVC & 4K Encoding

In Maxwell 1, NVIDIA introduced updated versions of both their video encode and decode engines. On the decode side the new VP6 decoder increased the performance of the decode block, allowing NVIDIA to decode H.264 at up to 4K@60Hz (Level 5.2), something the older VP5 decoder was not fast enough to do. Meanwhile the Maxwell 1 NVENC video encoder received a similar speed boost, roughly doubling its performance compared to Kepler.

Surprisingly, only 7 months after the first Maxwell 1 GPUs, NVIDIA has once again overhauled NVENC, and this time more significantly. The Maxwell 2 version of NVENC builds off of the Maxwell 1 design by adding full support for HEVC (H.265) encoding. As with HDMI 2.0, this makes GM204 the first PC GPU we’ve seen to integrate support for this more advanced codec.

At this point there’s really not much that can be done with Maxwell 2’s HEVC encoder – it’s not yet exposed to applications or used in NVIDIA’s current tools – but NVIDIA is laying the groundwork for the future, once HEVC support becomes more commonplace in other hardware and software. NVIDIA envisions the killer app for HEVC support to be game streaming, where HEVC’s higher compression efficiency translates directly into better image quality given the limited bandwidth available in most streaming scenarios. In the long run we would expect NVIDIA to utilize HEVC for GameStream in the home, while at the server level HEVC support in the next generation of GRID cards would be a major boon to NVIDIA’s GRID streaming efforts.

Meanwhile, where the enhanced version of NVENC is applicable today is ShadowPlay. While still recording in H.264, the higher performance of NVENC means that NVIDIA can now offer recording at higher resolutions and bitrates: with GM204, ShadowPlay can record at 1440p60 and 4Kp60 at bitrates of up to 130Mbps, as opposed to the 1080p60 @ 50Mbps limit of their Kepler cards.
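
To put those recording limits in more practical terms, here is a quick conversion into disk usage; the bitrates are the published caps discussed above, so real-world file sizes will typically be somewhat smaller:

```python
# Rough disk usage at ShadowPlay's recording limits. The bitrates are the
# published caps discussed above; real-world file sizes will vary with content.

def gb_per_hour(mbps):
    """Convert a video bitrate in Mbps into gigabytes written per hour."""
    return mbps * 1e6 * 3600 / 8 / 1e9

for label, mbps in [("Kepler    1080p60 @  50 Mbps", 50),
                    ("Maxwell 2 4Kp60   @ 130 Mbps", 130)]:
    print(f"{label}: ~{gb_per_hour(mbps):.1f} GB per hour")
```

At the new 130Mbps cap that works out to nearly 60GB per hour of footage, versus roughly 22GB per hour at Kepler’s limit.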

Finally, and somewhat paradoxically, Maxwell 2 inherits Kepler and Maxwell 1’s hybrid HEVC decode support. First introduced with Maxwell 1 and backported to Kepler, NVIDIA’s hybrid HEVC decode support enables HEVC decoding on these parts by using a combination of software (shader) and hardware decoding, leveraging the reusable portions of the H.264 decode block to offload to fixed function hardware what elements it can, and processing the rest in software.

A hybrid decode process is not going to be as power efficient as a full fixed function decoder, but handled in the GPU it will be much faster and more power efficient than handling the process in software. The fact that Maxwell 2 gets a hardware HEVC encoder but a hybrid HEVC decoder is in turn a result of the realities of hardware development for NVIDIA; you can’t hybridize encoding, and the hybrid decode process is good enough for now. So NVIDIA spent their efforts on getting hardware HEVC encoding going first, and at this point we’d expect to see full hardware HEVC decoding show up in a future generation of hardware (and we’d note that NVIDIA can swap VP blocks at will, so it doesn’t necessarily have to be Pascal).

VR Direct

Our final item on the list of NVIDIA’s new display features is a family of technologies NVIDIA is calling VR Direct.

VR Direct in a nutshell is a collection of technologies and software enhancements designed to improve the experience and performance of virtual reality headsets such as the Oculus Rift. From a practical perspective NVIDIA already has some experience in stereoscopic rendering through 3D Vision, and from a marketing perspective the high resource requirements of VR would be good for encouraging GeForce sales, so NVIDIA will be investing heavily in the development of VR technologies through VR Direct.

From a technical perspective, the biggest thing that Oculus and other VR headset makers need from GPU manufacturers and the rest of the PC ecosystem is a way to reduce the latency (input lag) between a user’s input and the moment a finished frame becomes visible on the headset. While some latency is inevitable – it takes time to gather data and render a frame – the greater the latency, the greater the disconnect between the user and the rendered world. In more extreme cases this can make the simulation unusable, or even trigger motion sickness in individuals whose minds can’t handle the disorientation caused by the latency. As a result, several of NVIDIA’s features are focused on reducing latency in some manner.
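
To give a sense of scale, the toy budget below tallies the major contributors to motion-to-photon latency for a hypothetical 60Hz headset. Every individual figure here is an illustrative assumption rather than an Oculus or NVIDIA number, but it shows the size of the total that the optimizations below are chipping away at:

```python
# Toy motion-to-photon latency budget for a hypothetical 60 Hz VR headset.
# Every figure here is an illustrative assumption, not a vendor number.

budget_ms = {
    "sensor sampling & fusion": 2.0,
    "game/CPU simulation":      8.0,
    "driver & OS queuing":     10.0,   # roughly what the low latency mode targets
    "GPU rendering":           16.7,   # one 60 Hz frame
    "scan-out to the panel":   16.7,   # worst case: a full refresh interval
}

for stage, ms in budget_ms.items():
    print(f"{stage:26s} {ms:5.1f} ms")
print(f"{'total (motion to photon)':26s} {sum(budget_ms.values()):5.1f} ms")
```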

First and foremost, for VR headsets NVIDIA has implemented a low latency mode that minimizes the amount of time a frame spends being prepared by the drivers and OS. In the average case this low latency mode eliminates 10ms of OS-induced latency from the rendering pipeline, making it the purest optimization of the bunch.

Meanwhile at the more extreme end of the feature spectrum, NVIDIA will be supporting a feature called asynchronous warp. This feature, known to Oculus developers as time warp, involves rendering a frame and then, at the last possible moment, sampling updated head tracking information from the user. Once that information is acquired, the nearly finished frame has a post-process warp applied to it to account for head movement since the frame was initially submitted, the ultimate goal being to approximate what the frame would have looked like had it been rendered instantaneously.
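
For the curious, the sketch below illustrates the general math behind a rotational reprojection warp. It is a simplified approximation of the technique, not NVIDIA’s or Oculus VR’s actual implementation, and the camera intrinsics and head poses are made-up values:

```python
# Minimal sketch of a rotational "timewarp" reprojection, assuming a pinhole
# projection and a pure head rotation (translation ignored). This is an
# approximation of the general technique, not NVIDIA's or Oculus VR's code;
# the intrinsics and poses below are made-up values.
import numpy as np

def rotation_y(angle_rad):
    """Rotation matrix for a yaw (left/right head turn) of angle_rad."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[  c, 0.0,   s],
                     [0.0, 1.0, 0.0],
                     [ -s, 0.0,   c]])

# Hypothetical intrinsics for a 1080x1200 per-eye image
# (focal length and principal point in pixels).
K = np.array([[600.0,   0.0, 540.0],
              [  0.0, 600.0, 600.0],
              [  0.0,   0.0,   1.0]])

R_render  = rotation_y(np.radians(0.0))   # world-to-camera pose at render time
R_scanout = rotation_y(np.radians(1.5))   # pose sampled just before scan-out

# Homography that re-aims the already-rendered image at the newer head pose.
H = K @ R_scanout @ R_render.T @ np.linalg.inv(K)

# Example: where the old image center lands after the warp.
p = H @ np.array([540.0, 600.0, 1.0])
print("warped image center:", p[:2] / p[2])
```

In this toy case a 1.5° head turn between render time and scan-out shifts the image by roughly 16 pixels, which is exactly the sort of correction the warp applies instead of waiting for a whole new frame.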

From a quality perspective asynchronous warp stands to be a bit of a kludge, but it is the single most potent latency improvement among the VR Direct feature set. By modifying the frame to account for the user’s head position as late as possible, it reduces perceived latency by as much as 25ms.

NVIDIA’s third latency optimization is less a VR optimization and more a practical effect of an existing technology, and that is Multi-Frame sampled Anti-Aliasing. As we'll discuss later in our look at this new AA mode, Multi-Frame sampled Anti-Aliasing is designed to offer 4x MSAA-like quality with 2x MSAA-like performance. Assuming a baseline of 4x MSAA, switching it out for Multi-Frame sampled Anti-Aliasing can shave an additional few milliseconds off of the frame rendering time.

Lastly, NVIDIA’s fourth and final latency optimization for VR Direct is VR SLI. The concept is simple enough: rather than using alternate frame rendering (AFR), where each GPU takes turns rendering complete frames (both eyes), split up the workload so that each GPU renders one eye of the same frame simultaneously. AFR, though highly compatible with traditional monoscopic rendering, introduces additional latency that would be undesirable for VR. By rendering the two eyes in parallel on separate GPUs, NVIDIA is able to apply the performance benefits of SLI to VR without creating additional latency. Given the very high performance and low latencies required for VR, it’s currently expected that most high-end games supporting VR headsets will need SLI to achieve the necessary performance, so being able to use SLI without a latency penalty will be an important part of making VR gaming commercially viable.
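
The latency argument can be illustrated with a toy model (the frame time here is an illustrative assumption, not an NVIDIA figure): with AFR each frame is still rendered start-to-finish by one GPU, so per-frame latency doesn’t improve even though throughput roughly doubles, whereas splitting the eyes across two GPUs finishes the same frame in roughly half the time:

```python
# Toy latency model comparing AFR SLI with per-eye split rendering ("VR SLI").
# The frame time is an illustrative assumption, not a measured figure.

single_gpu_frame_ms = 16.0   # assumed time for one GPU to render both eyes

# AFR: the two GPUs alternate whole frames. Throughput roughly doubles, but
# each individual frame is still rendered start-to-finish by a single GPU,
# so its latency does not shrink (and in practice AFR tends to add queuing
# on top to keep both GPUs fed).
afr_frame_latency_ms = single_gpu_frame_ms

# VR SLI: both GPUs work on the same frame, one eye each, in parallel, so
# the frame completes in roughly half the time (ignoring sync overhead).
vr_sli_frame_latency_ms = single_gpu_frame_ms / 2

print(f"AFR    : ~{afr_frame_latency_ms:.0f} ms to finish a frame")
print(f"VR SLI : ~{vr_sli_frame_latency_ms:.0f} ms to finish a frame")
```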

On a side note, for the sake of clarity we do want to point out that many of NVIDIA’s latency optimizations come from the best practices suggestions of Oculus VR. Asynchronous warp and OS level latency optimizations for example are features that Oculus VR is suggesting for hardware developers and/or pursuing themselves. So while these features are very useful to have on GeForce hardware, they are not necessarily all ideas that NVIDIA has come up with or technologies that are limited to NVIDIA hardware (or even the Maxwell 2 architecture).

Moving on, beyond NVIDIA’s latency reduction technologies, the VR Direct feature set will also include some improvements designed to improve the quality and usability of VR. NVIDIA’s Dynamic Super Resolution (DSR) technology will be available to VR, and given the physical limits on pixel density in today’s OLED panels it will be an important tool in reducing perceptible aliasing. NVIDIA will also be extending VR support to GeForce Experience at a future date, simplifying the configuration of VR-enabled games. For VR on GeForce Experience, NVIDIA wants to go beyond graphical settings and auto-configure inputs as well, automatically handling the remapping of inputs to head/body tracking for the user.

Ultimately, at this point VR Direct is more of a forward-looking technology than something applicable today – the first consumer Oculus Rift hasn’t even been announced, let alone shipped – but by focusing on VR early, NVIDIA is hoping to improve the speed and ease of VR development and have the underpinnings in place once consumer VR gear becomes readily available.
