The NVIDIA GeForce GTX 980 Review: Maxwell Mark 2

Author: Ryan Smith

by Ryan Smith on September 18, 2014 10:30 PM EST

Maxwell 2’s New Features: Direct3D 11.3 & VXGI

When NVIDIA introduced the Maxwell 1 architecture and the GM107 based GTX 750 series, one of the unexpected aspects was their decision to release these parts as members of the existing 700 series rather than as a newer series that would communicate a difference in features. However, as it turned out, there really wasn’t a feature difference between Maxwell 1 and Kepler; other than a newer NVENC block, Maxwell 1 was for all intents and purposes an optimized Kepler architecture. It offered the same features, built upon the efficiency improvements of the Maxwell architecture.

With that in mind, along with the hardware/architectural changes we’ve listed earlier, the other big factor that sets Maxwell 2 apart from Maxwell 1 is its feature set. In that respect Maxwell 2 is almost a half-generational update on its own, as it implements a number of new features that were not present in Maxwell 1. This means Maxwell 2 is bringing some new features that we need to cover, but it also means that the GM204 based GTX 900 series is feature differentiated from the GTX 600/700 series in a way that the earlier GTX 750 series was not.

Direct3D 11.3

First and foremost among Maxwell 2’s new features is the inclusion of full Direct3D 11.2/11.3 compatibility. Kepler and Maxwell 1 before it were officially feature level 11_0, but they contained an almost complete set of FL 11_1 features, allowing most of these features to be accessed through cap bits. With Maxwell 2 however, NVIDIA has finally implemented the remaining features required for FL11_1 compatibility and beyond, updating their architecture to support the 16x raster coverage sampling required for Target Independent Rasterization and UAVOnlyRenderingForcedSampleCount.

This expanded feature set also covers Direct3D 11.2, which, although it doesn’t have an official feature level of its own, does introduce some new (and otherwise optional) features that are accessed via cap bits. Key among these, Maxwell 2 will support the more advanced Tier 2 tiled resources, otherwise known as sparse textures or partially resident textures. Tier 2 was introduced into the specification to differentiate the more capable AMD implementation of this feature from NVIDIA’s hardware, and now as of Maxwell 2 NVIDIA can support the more advanced functionality required for Tier 2.

Finally, Maxwell 2 will also support the features being introduced in Direct3D 11.3 (and made available to D3D 12), which was announced alongside Maxwell 2 at NVIDIA’s editors’ day event. We have a separate article covering Direct3D 11.3, so we won’t completely retread that ground here, but we will cover the highlights.

The forthcoming Direct3D 11.3 features, which will form the basis (but not entirety) of what’s expected to be feature level 11_3, are Rasterizer Ordered Views, Typed UAV Load, Volume Tiled Resources, and Conservative Rasterization. Maxwell 2 will offer full support for these forthcoming features, and of these features the inclusion of volume tiled resources and conservative rasterization is seen as being especially important by NVIDIA, particularly since NVIDIA is building further technologies off of them.

Volume tiled resources are for all intents and purposes tiled resources extended into the third dimension. They are primarily meant to be used with 3D/volumetric pixels (voxels), with the idea being that with sparse allocation, volume tiles that do not contain any useful information need not be allocated, which avoids tying up memory in tiles that will never be used or accessed. This kind of sparse allocation is necessary to make certain kinds of voxel techniques viable.
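The sparse allocation idea can be sketched in a few lines of Python. This is a CPU-side illustration only, not Direct3D code: the `SparseVoxelVolume` class, its 8-voxel tile edge, and the dictionary standing in for a tile table are all assumptions made for the example, not details of NVIDIA's or Microsoft's implementation.

```python
class SparseVoxelVolume:
    """Toy model of a tiled 3D volume: a tile gets storage only when first written."""

    def __init__(self, tile_edge=8):
        self.tile_edge = tile_edge  # hypothetical tile size, in voxels per side
        self.tiles = {}             # (tx, ty, tz) -> flat list of voxel values

    def _locate(self, x, y, z):
        # Split a voxel coordinate into its tile key and the offset inside that tile.
        e = self.tile_edge
        tile_key = (x // e, y // e, z // e)
        offset = (x % e) + e * ((y % e) + e * (z % e))
        return tile_key, offset

    def write(self, x, y, z, value):
        key, offset = self._locate(x, y, z)
        if key not in self.tiles:
            # Allocate the tile lazily, zero-filled.
            self.tiles[key] = [0] * self.tile_edge ** 3
        self.tiles[key][offset] = value

    def read(self, x, y, z):
        key, offset = self._locate(x, y, z)
        tile = self.tiles.get(key)
        # Unallocated tiles read back as empty space.
        return tile[offset] if tile is not None else 0

    def allocated_voxels(self):
        return len(self.tiles) * self.tile_edge ** 3


volume = SparseVoxelVolume(tile_edge=8)
volume.write(3, 4, 5, 255)     # lands in tile (0, 0, 0)
volume.write(100, 20, 7, 128)  # lands in tile (12, 2, 0)
```

After these two writes only two tiles (1,024 voxels) are backed by storage, whereas a dense 128 x 128 x 128 volume would hold 2,097,152 voxels regardless of how many are empty.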

Meanwhile conservative rasterization is also new to Maxwell 2. Conservative rasterization is essentially a more accurate but more performance intensive solution to figuring out whether a polygon covers part of a pixel. Instead of doing a quick and simple test to see if the center of the pixel is bounded by the lines of the polygon, conservative rasterization checks whether the polygon covers any part of the pixel by testing it against the corners of the pixel. This means that conservative rasterization will catch cases where a polygon was too small to cover the center of a pixel, which results in a more accurate outcome, be it better identifying the pixels a polygon resides in, or finding polygons too small to cover the center of any pixel at all.
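The difference between the two coverage tests can be illustrated with a small Python sketch. It is not how the hardware rasterizer is built; it assumes unit-square pixels whose lower-left corner is at integer coordinates, and it uses the common software formulation of evaluating each triangle edge at the pixel corner that lies furthest toward the triangle's interior, combined with a bounding-box check. All function names are made up for the example.

```python
def edge_function(a, b, p):
    """>= 0 when p lies on the interior side of edge a->b of a counter-clockwise triangle."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _ccw(tri):
    # Reorder the vertices so the triangle is counter-clockwise.
    a, b, c = tri
    return (a, b, c) if edge_function(a, b, c) >= 0 else (a, c, b)


def covers_center(tri, px, py):
    """Standard point sampling: is the pixel center inside the triangle?"""
    a, b, c = _ccw(tri)
    center = (px + 0.5, py + 0.5)
    return all(edge_function(u, v, center) >= 0
               for u, v in ((a, b), (b, c), (c, a)))


def covers_conservative(tri, px, py):
    """Conservative test: does the triangle touch any part of the pixel square?"""
    a, b, c = _ccw(tri)
    xs = [a[0], b[0], c[0]]
    ys = [a[1], b[1], c[1]]
    # Reject pixels that lie outside the triangle's bounding box.
    if max(xs) < px or min(xs) > px + 1 or max(ys) < py or min(ys) > py + 1:
        return False
    for u, v in ((a, b), (b, c), (c, a)):
        # Choose the pixel corner that maximizes this edge function.
        corner_x = px + 1 if (v[1] - u[1]) < 0 else px
        corner_y = py + 1 if (v[0] - u[0]) > 0 else py
        if edge_function(u, v, (corner_x, corner_y)) < 0:
            return False  # the whole pixel is outside this edge
    return True


# A sliver that sits inside pixel (0, 0) but misses its center at (0.5, 0.5).
sliver = ((0.1, 0.1), (0.3, 0.1), (0.1, 0.3))
center_hit = covers_center(sliver, 0, 0)
conservative_hit = covers_conservative(sliver, 0, 0)
```

For this sliver the center test reports no coverage while the conservative test does, which is exactly the small-polygon case described above.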

Conservative rasterization is being added to Direct3D in order to allow new algorithms to be used which would fail under the imprecise nature of point sampling. As with volume tiled resources, voxels play a big part here, as conservative rasterization can be used to build voxels. However it also has use cases in more accurate tiling and even collision detection. This feature is technically possible in existing hardware, but the performance of such an implementation would be very low as it’s essentially a workaround for the lack of necessary support in the rasterizers. By implementing conservative rasterization directly in hardware, Maxwell 2 will be able to perform the task far more quickly, which is necessary to make the resulting algorithms built on top of this functionality fast enough to be usable.

VXGI

Outside of the features covered by Direct3D 11.3, NVIDIA will also be adding features specifically to drive a new technology they are calling Voxel accelerated Global Illumination (VXGI).

At the highest level, VXGI is a manner of implementing global illumination by utilizing voxels in the calculations. Global illumination is something of a holy grail for computer graphics, as it can produce highly realistic and accurate lighting dynamically in real time. However global illumination is also very expensive, with the path tracing involved taking up considerable time and resources. For this reason developers have played around with global illumination in the past – the original version of Epic’s Unreal Engine 4 Elemental demo implemented a voxel based global illumination method, for example – but it has always been too slow for practical use.

With VXGI NVIDIA is looking to solve the voxel global illumination problem through a combination of software and hardware. VXGI proper is the software component, and describes the algorithm being used. NVIDIA has been doing considerable research into voxel based global illumination over the years, and has finally reached a point where they have an algorithm ready to go in the form of VXGI.

VXGI will eventually be made available for Unreal Engine 4 and other major game engines starting in Q4 of this year. And while VXGI greatly benefits from the hardware features built into Maxwell 2, it is not strictly reliant on the hardware and can be implemented through more traditional means on existing hardware. VXGI is if nothing else scalable, with the algorithm being designed to scale up and down with hardware by adjusting the density of the voxel grid, which in turn influences the number of calculations required and the resulting accuracy. Maxwell 2 for its part will be capable of using denser grids due to its hardware acceleration capabilities, allowing for better performance and more accurate lighting.
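To get a feel for why grid density is the scaling knob, here is a rough back-of-the-envelope sketch in Python. It assumes a dense cubic grid and a hypothetical 8 bytes per voxel; neither figure comes from NVIDIA, and a real implementation storing voxels sparsely would use less memory.

```python
def voxel_grid_cost(resolution, bytes_per_voxel=8):
    """Return (voxel count, dense size in MiB) for a cubic grid.

    bytes_per_voxel is an assumed figure used purely for illustration.
    """
    voxels = resolution ** 3
    size_mib = voxels * bytes_per_voxel / (1024 ** 2)
    return voxels, size_mib


# Compare three hypothetical grid densities.
costs = {res: voxel_grid_cost(res) for res in (64, 128, 256)}
```

Each doubling of the grid resolution multiplies the voxel count by eight (262,144, then 2,097,152, then 16,777,216 voxels), which is why the amount of work and memory, and with them the attainable accuracy, track grid density so closely.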

It’s at this point we’ll take a break and apologize to NVIDIA’s engineers for blowing through VXGI so quickly. This is actually a really interesting technology, as global illumination offers the possibility of finally attaining realistic real-time lighting in any kind of rendered environment. However VXGI is also a complex technology that is a subject in and of itself, and we could spend all day just covering it (we’d need to cover rasterization and path tracing to fully explain it). Instead we’d suggest reading NVIDIA’s own article on the technology once that is posted, as NVIDIA is ready and willing to go into great detail on how the technology works.

Getting back to today’s launch then, the other aspect of VXGI is the hardware features that NVIDIA has implemented to accelerate the technology. Though a big part of VXGI remains brute forcing through the path and cone tracing, the other major aspect of VXGI is building the voxel grids used in these calculations. It’s here where NVIDIA has pulled together the D3D 11.3 feature set, along with additional hardware features, to greatly accelerate the process of creating the voxel grid in order to speed up the overall algorithm.

From the D3D 11.3 feature set, conservative rasterization and volume tiled resources will play a big part. Conservative rasterization allows the creation of more accurate voxels, owing to the more accurate determination of whether a given polygon covers a pixel/voxel. Meanwhile volume tiled resources will allow for the space efficient storage of voxels, allowing software to store only the voxels it needs and not the many empty voxels that would otherwise be present in a scene.

Joining these features as the final VXGI-centric feature for Maxwell 2 is a feature NVIDIA is calling Multi-Projection Acceleration. The idea behind MPA is that there are certain scenarios where the same geometry needs to be projected multiple times – voxels being a big case of this due to being six-sided – and that for performance reasons it is desirable to do all of these projections much more quickly than simply iterating through every necessary projection in shaders. In these scenarios being able to quickly project geometry to all the necessary surfaces is a significant performance advantage.

A big part of MPA is a sub-feature called viewport multicast. In viewport multicast Maxwell 2 can replay the necessary geometry to all of the viewports in a single pass. At the hardware level this involves giving the hardware the ability to automatically determine when it needs to engage in viewport multicast, based on its understanding of the workload it's receiving. This is once again a case where something is being done in a fixed-function like fashion for performance reasons, rather than being shuffled off to slower shader hardware.

Alongside voxelization, NVIDIA tells us that MPA should also be applicable to cube map generation and shadow map generation. Both make plenty of sense: in each scenario you are projecting the same geometry multiple times, whether it’s to the faces of a cube or to shadow maps of increasing resolution. As a result MPA should have some benefits even in renderers that aren’t using VXGI, though clearly the greatest benefits are still going to be when VXGI is in play.
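For the cube map case, the "iterate through every projection" baseline that MPA is meant to avoid looks roughly like the Python sketch below. It is a simplified CPU model, not shader code: the face table, the function names, and the (u, v) orientation on each face are assumptions for the example and do not follow Direct3D's actual cube map face conventions.

```python
# Each face is described by the axis it looks along and the sign of that axis.
FACES = {
    "+x": (0, 1.0), "-x": (0, -1.0),
    "+y": (1, 1.0), "-y": (1, -1.0),
    "+z": (2, 1.0), "-z": (2, -1.0),
}


def project_to_face(point, axis, sign):
    """Perspective-project a 3D point onto one cube face; None if it falls outside."""
    depth = sign * point[axis]
    if depth <= 0:
        return None  # the point is behind this face
    u, v = (point[i] / depth for i in range(3) if i != axis)
    if abs(u) > 1 or abs(v) > 1:
        return None  # outside this face's 90-degree frustum
    return (u, v)


def replay_per_face(points):
    """Naive approach: walk the whole point list once for every face (six passes)."""
    result = {}
    for name, (axis, sign) in FACES.items():
        projected = [project_to_face(p, axis, sign) for p in points]
        result[name] = [uv for uv in projected if uv is not None]
    return result


points = [(2.0, 1.0, -1.0), (0.5, -0.25, 4.0)]
per_face = replay_per_face(points)
```

Every point is visited six times even though each one ends up on a single face here, which is the kind of repeated per-projection work the article describes being handed to fixed-function hardware instead.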

NVIDIA believes that the overall performance improvement to voxelization from these technologies will be very significant. In their own testing of the technology in rendering a scene set in San Miguel de Allende, Mexico (a common test scene for global illumination), NVIDIA has found that Maxwell 2’s hardware acceleration features tripled their voxelization performance.

Overall NVIDIA is heavily betting on VXGI at this time both to further set apart Maxwell 2 based cards from the competition, and to further advance the state of PC graphics. In the gaming space in particular NVIDIA has a significant interest in making sure PC games aren’t just straight console ports that run at higher framerates and resolutions. This is the situation that has spurred on the development of GameWorks and technologies like VXGI, so that game developers can enhance the PC ports of their games with technologies that improve their overall rendering quality. Maxwell 2 in turn is the realization that while some of these features can be performed in software/shaders on today’s hardware, these features will be even more useful and impressive when backed with dedicated hardware to improve their performance.

Finally, we’ll close out our look at VXGI with a preview of NVIDIA’s GTX 900 series tech demo, which is a rendered recreation of a photo/scene involving Buzz Aldrin and the Apollo 11 moon landing. The Apollo 11 demo is designed to show off the full capabilities of VXGI, utilizing the lighting technique to correctly and dynamically emulate specular, diffuse, and other forms of lighting that occur in reality. At editors’ day NVIDIA originally attempted to pass off the rendering as the original photo, and while after a moment it’s clear that it’s a rendering – among other things it lacks the graininess of a 1969 film based camera – it comes very, very close. In showcasing the Apollo 11 tech demo, NVIDIA’s hope is that one day games will be able to achieve similarly accurate lighting effects through the use of VXGI.

Gallery: NVIDIA Apollo 11 VXGI Tech Demo

274 Comments

jmunjr - Friday, September 19, 2014 - link
Wish you had done a GTX 970 review as well like many other sites since way more of us care about that card than the 980 since it is cheaper.
Gonemad - Friday, September 19, 2014 - link
Apparently, if I want to run anything under the sun in 1080p cranked to full at 60fps, I will need to get me one GTX 980 and a suitable system to run with it, and forget mid-ranged priced cards.

That should put a huge hole in my wallet.

Oh yes, the others can run stuff at 1080p, but you have to keep tweaking drivers, turning AA on, turning AA off, what a chore. And as for the age-old joke: yes, it RUNS Crysis, at the resolution I'd like.

Didn't the card, by any chance, actually benefit from being fabricated at 28nm, by spreading its heat over a larger area? If the whole thing, hypothetically, were just shrunk to 14nm, wouldn't all of that 165W of power be dissipated over a smaller area (1/4 area?), and wouldn't this thing hit the throttle and stay there?

Or by being made smaller, it would actually dissipate even less heat and still get faster?
Yojimbo - Friday, September 19, 2014 - link
I think that it depends on the process. If Dennard scaling were to be in effect, then it should dissipate proportionally less heat. But to my understanding, Dennard scaling has broken down somewhat in recent years, and so I think heat density could be a concern. However, I don't know if it would be accurate to say that the chip benefited from the 28nm process, since I think it was originally designed with the 20nm process in mind, and the problem with putting the chip on that process had to do with the cost and yields. So, presumably, the heat dissipation issues were already worked out for that process..?
AnnonymousCoward - Friday, September 26, 2014 - link
The die size doesn't really matter for heat dissipation when the external heat sink is the same size; the thermal resistance from die to heat sink would be similar.
danjw - Friday, September 19, 2014 - link
I would love to see these built on Intel's 14nm process or even the 22nm. I think both Nvidia and AMD aren't comfortable letting Intel look at their technology, despite NDAs and firewalls that would be a part of any such agreement.

Anyway, thanks for the great review Ryan.
Yojimbo - Friday, September 19, 2014 - link
Well, if one goes by Jen-Hsun Huang's (Nvidia's CEO) comments of a year or two ago, Nvidia would have liked Intel to manufacture their SOCs for them, but it seems Intel was unwilling. I don't see why they would be willing to have them manufacture SOCs and not GPUs being that at that time they must have already had the plan to put their desktop GPU technology into their SOCs, unless the one year delay between the parts makes a difference.
r13j13r13 - Friday, September 19, 2014 - link
Not until AMD's 300 series comes out with native DirectX 12 support.
Arakageeta - Friday, September 19, 2014 - link
No interpretation of the compute graphs whatsoever? Could you at least report the output of CUDA's deviceQuery tool?
texasti89 - Friday, September 19, 2014 - link
I'm truly impressed with this new line of GPUs. To be able to achieve this leap in efficiency using the same transistor feature size is a great incremental achievement. Bravo TSMC & Nvidia. I feel comfortable thinking that we will soon get this amazing 980 performance level in gaming laptops once we scale technology to the 10nm process. Keep up the great work.
stateofstatic - Friday, September 19, 2014 - link
Spoiler alert: Intel is building a new fab in Hillsboro, OR specifically for this purpose...
