Maxwell 2’s New Features: Direct3D 11.3 & VXGI

When NVIDIA introduced the Maxwell 1 architecture and the GM107 based GTX 750 series, one of the unexpected aspects of the launch was NVIDIA's decision to release these parts as members of the existing 700 series, rather than as a new series that would signal a difference in features. As it turned out, however, there really wasn't a feature difference between Maxwell 1 and Kepler: other than a newer NVENC block, Maxwell 1 was for all intents and purposes an optimized Kepler, offering the same feature set built on top of the efficiency improvements of the Maxwell architecture.

With that in mind, along with the hardware and architectural changes we've listed earlier, the other big factor that sets Maxwell 2 apart from Maxwell 1 is its feature set. In that respect Maxwell 2 is almost a half-generational update on its own, implementing a number of new features that were not present in Maxwell 1. This means Maxwell 2 brings some new features that we need to cover, but it also means that the GM204 based GTX 900 series is feature differentiated from the GTX 600/700 series in a way that the earlier GTX 750 series was not.

Direct3D 11.3

First and foremost among Maxwell 2's new features is the inclusion of full Direct3D 11.2/11.3 compatibility. Maxwell 1, and Kepler before it, were officially feature level 11_0 parts, but they contained an almost complete set of FL 11_1 features, most of which could be accessed through cap bits. With Maxwell 2 however, NVIDIA has finally implemented the remaining features required for FL 11_1 compatibility and beyond, updating their architecture to support the 16x raster coverage sampling required for Target Independent Rasterization and UAVOnlyRenderingForcedSampleCount.
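
For those curious what this looks like from the developer's side, the sketch below shows how an application asks the Direct3D runtime for a feature level 11_1 device, falling back to 11_0 on older parts. This is our own illustrative example (with error handling trimmed), not anything NVIDIA provided:

```cpp
#include <d3d11_1.h>
#include <cstdio>

int main()
{
    // Request FL 11_1 first; the runtime returns the highest level
    // in the array that the hardware (e.g. Maxwell 2) supports.
    const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0
    };
    ID3D11Device* device = nullptr;
    D3D_FEATURE_LEVEL obtained = {};

    HRESULT hr = D3D11CreateDevice(
        nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
        requested, 2, D3D11_SDK_VERSION,
        &device, &obtained, nullptr);
    if (FAILED(hr)) return 1;   // note: pre-11.1 runtimes reject an
                                // array containing 11_1 outright

    printf("Got feature level: 0x%04x\n", obtained);
    device->Release();
    return 0;
}
```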

The new feature set also extends to Direct3D 11.2, which, although it doesn't have an official feature level of its own, does introduce some new (and otherwise optional) features that are accessed via cap bits. Key among these, Maxwell 2 supports the more advanced Tier 2 tiled resources, otherwise known as sparse textures or partially resident textures. Tier 2 was introduced into the specification to differentiate the more capable AMD implementation of this feature from NVIDIA's hardware, and as of Maxwell 2 NVIDIA can now support the more advanced functionality required for Tier 2.
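
In practice, checking for Tier 2 support is a single cap query. A minimal sketch, assuming a device created against the Direct3D 11.2 runtime (the tier enum and options struct are the real API; the surrounding code is ours):

```cpp
#include <d3d11_2.h>

bool SupportsTier2TiledResources(ID3D11Device* device)
{
    D3D11_FEATURE_DATA_D3D11_OPTIONS1 opts1 = {};
    if (FAILED(device->CheckFeatureSupport(
            D3D11_FEATURE_D3D11_OPTIONS1, &opts1, sizeof(opts1))))
        return false;

    // Tier 2 adds guarantees such as reads from unmapped tiles
    // returning zero, which Tier 1 hardware does not promise.
    return opts1.TiledResourcesTier >= D3D11_TILED_RESOURCES_TIER_2;
}
```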

Finally, Maxwell 2 will also support the features being introduced in Direct3D 11.3 (which will also be made available in D3D 12), announced alongside Maxwell at NVIDIA's editors' day event. We have a separate article covering Direct3D 11.3, so we won't completely retread that ground here, but we will cover the highlights.

The forthcoming Direct3D 11.3 features, which will form the basis (but not the entirety) of what's expected to be feature level 11_3, are Rasterizer Ordered Views, Typed UAV Load, Volume Tiled Resources, and Conservative Rasterization. Maxwell 2 will offer full support for all four, and NVIDIA sees the inclusion of volume tiled resources and conservative rasterization as especially important, particularly since the company is building further technologies on top of them.

Volume tiled resources are, for all intents and purposes, tiled resources extended into the third dimension. They are primarily meant to be used with 3D/volumetric pixels (voxels), the idea being that with sparse allocation, volume tiles that do not contain any useful information are never allocated in the first place, avoiding tying up memory in tiles that will never be used or accessed. This kind of sparse allocation is necessary to make certain kinds of voxel techniques viable.
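
To make the sparse allocation idea concrete, here is a minimal CPU-side analogy: a volume divided into fixed-size bricks, where backing memory is only committed for bricks that are actually written. This is our own illustrative model (assuming non-negative coordinates), not the D3D tiled resource API, where the equivalent bookkeeping is done with tile pools and tile mappings:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr int BRICK_DIM = 32;       // 32x32x32 voxels per brick (tile)
using Brick = std::vector<float>;   // backing storage for one brick

// Pack a 3D brick coordinate into one 64-bit hash key (21 bits per axis).
static uint64_t BrickKey(int bx, int by, int bz)
{
    return (uint64_t(bx) & 0x1FFFFF) |
           ((uint64_t(by) & 0x1FFFFF) << 21) |
           ((uint64_t(bz) & 0x1FFFFF) << 42);
}

struct SparseVolume
{
    std::unordered_map<uint64_t, Brick> bricks;  // only occupied bricks exist

    // Writing a voxel commits its brick on first touch.
    void Write(int x, int y, int z, float v)
    {
        Brick& b = bricks.try_emplace(
            BrickKey(x / BRICK_DIM, y / BRICK_DIM, z / BRICK_DIM),
            BRICK_DIM * BRICK_DIM * BRICK_DIM, 0.0f).first->second;
        b[(x % BRICK_DIM) * BRICK_DIM * BRICK_DIM +
          (y % BRICK_DIM) * BRICK_DIM + (z % BRICK_DIM)] = v;
    }

    // Reads from uncommitted bricks return zero, mirroring the Tier 2
    // guarantee for unmapped tiles.
    float Read(int x, int y, int z) const
    {
        auto it = bricks.find(BrickKey(x / BRICK_DIM, y / BRICK_DIM, z / BRICK_DIM));
        return (it == bricks.end()) ? 0.0f
            : it->second[(x % BRICK_DIM) * BRICK_DIM * BRICK_DIM +
                         (y % BRICK_DIM) * BRICK_DIM + (z % BRICK_DIM)];
    }
};
```

For a mostly empty scene this is the difference between committing memory for every brick in the volume and committing memory for only the handful of bricks the geometry actually touches.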

Meanwhile conservative rasterization is also new to Maxwell 2. Conservative rasterization is essentially a more accurate but more performance-intensive way of determining whether a polygon covers any part of a pixel. Instead of doing a quick and simple test to see if the center of the pixel is bounded by the edges of the polygon, conservative rasterization tests the polygon against the entire area of the pixel, catching any overlap between the two. This means conservative rasterization will catch cases where a polygon is too small to cover the center of a pixel, resulting in a more accurate outcome, be it better identifying the pixels a polygon resides in, or finding polygons too small to cover the center of any pixel at all.
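
The difference between the two tests is easiest to see in code. Below is a minimal software model of both coverage tests for a counter-clockwise triangle against a 1x1 pixel square; this is our own illustration of the principle, not how the hardware rasterizer is actually built:

```cpp
#include <cmath>

struct Vec2 { float x, y; };

// Edge function: >= 0 when p lies on the inner side of edge a->b
// for a counter-clockwise triangle.
static float Edge(Vec2 a, Vec2 b, Vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Standard rasterization: sample only the pixel center.
bool CoversCenter(Vec2 v0, Vec2 v1, Vec2 v2, float px, float py)
{
    Vec2 c{ px + 0.5f, py + 0.5f };
    return Edge(v0, v1, c) >= 0 && Edge(v1, v2, c) >= 0 && Edge(v2, v0, c) >= 0;
}

// Conservative rasterization: pass if ANY part of the pixel square could
// overlap the triangle. For each edge e(p) = A*x + B*y + C, the maximum of
// e over the square is e(center) + 0.5*(|A| + |B|), so the test is relaxed
// by that amount instead of sampling the center alone.
bool CoversConservative(Vec2 v0, Vec2 v1, Vec2 v2, float px, float py)
{
    Vec2 c{ px + 0.5f, py + 0.5f };
    const Vec2 tri[3] = { v0, v1, v2 };
    for (int i = 0; i < 3; i++) {
        Vec2 a = tri[i], b = tri[(i + 1) % 3];
        float A = -(b.y - a.y);   // x coefficient of the edge function
        float B =  (b.x - a.x);   // y coefficient of the edge function
        if (Edge(a, b, c) + 0.5f * (std::fabs(A) + std::fabs(B)) < 0)
            return false;         // square lies entirely outside this edge
    }
    return true;                  // possible overlap (may be a false positive)
}
```

A sliver triangle that slips between pixel centers returns false from CoversCenter for every pixel, but CoversConservative still flags the pixels it passes through, which is exactly the case voxelization cares about.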

Conservative rasterization is being added to Direct3D in order to allow new algorithms to be used that would otherwise fail under the imprecise nature of point sampling. Like volume tiled resources, voxels play a big part here, as conservative rasterization can be used to build voxel grids. However it also has use cases in more accurate tiling and even collision detection. This feature is technically possible to emulate on existing hardware, but the performance of such an implementation would be very low, as it's essentially a workaround for the lack of the necessary support in the rasterizers. By implementing conservative rasterization directly in hardware, Maxwell 2 will be able to perform the task far more quickly, which is necessary to make the algorithms built on top of this functionality fast enough to be usable.

VXGI

Outside of the features covered by Direct3D 11.3, NVIDIA is also adding features specifically to drive a new technology they are calling Voxel Global Illumination (VXGI).

At the highest level, VXGI is a way of implementing global illumination by utilizing voxels in the lighting calculations. Global illumination is something of a holy grail for computer graphics, as it can produce highly realistic and accurate lighting dynamically, in real time. However global illumination is also very expensive, with the path tracing involved taking up considerable time and resources. For this reason developers have played around with global illumination in the past (the original version of Epic's Unreal Engine 4 Elemental demo implemented a voxel based global illumination method, for example), but it has always been too slow for practical use.

With VXGI NVIDIA is looking to solve the voxel global illumination problem through a combination of software and hardware. VXGI proper is the software component, and describes the algorithm being used. NVIDIA has been doing considerable research into voxel based global illumination over the years, and has finally reached a point where they have an algorithm ready to go in the form of VXGI.

VXGI will be made available for Unreal Engine 4 and other major game engines starting in Q4 of this year. And while VXGI greatly benefits from the hardware features built into Maxwell 2, it is not strictly reliant on that hardware and can be implemented through more traditional means on existing GPUs. VXGI is, if nothing else, scalable: the algorithm is designed to scale up and down with hardware by adjusting the density of the voxel grid, which in turn influences the number of calculations required and the resulting accuracy. Maxwell 2 for its part will be capable of using denser grids thanks to its hardware acceleration capabilities, allowing for better performance and more accurate lighting.
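
A quick back-of-envelope calculation shows why the grid density knob matters so much: the number of voxels, and with it the storage and lighting work, grows with the cube of the grid resolution. The 4 bytes per voxel figure below is purely our own assumption for illustration, not VXGI's actual storage format:

```cpp
#include <cstdio>

int main()
{
    // Voxel count and dense storage cost as the grid resolution scales.
    for (int res = 64; res <= 512; res *= 2) {
        long long voxels = 1LL * res * res * res;
        printf("%3d^3 grid: %10lld voxels, ~%6.0f MB dense at 4 B/voxel\n",
               res, voxels, voxels * 4 / (1024.0 * 1024.0));
    }
    return 0;
}
```

Doubling the grid resolution raises the cost eightfold, which is why sparse storage and faster voxelization are what make denser, more accurate grids practical on Maxwell 2.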

It's at this point we'll take a break and apologize to NVIDIA's engineers for blowing through VXGI so quickly. This is actually a really interesting technology, as global illumination offers the possibility of finally attaining realistic real-time lighting in any kind of rendered environment. However VXGI is also a complex technology that is a subject in and of itself, and we could spend all day just covering it (we'd need to cover rasterization and path tracing to fully explain it). Instead we'd suggest reading NVIDIA's own article on the technology once it is posted, as NVIDIA is ready and willing to go into great detail on how the technology works.

Getting back to today's launch then, the other aspect of VXGI is the hardware features that NVIDIA has implemented to accelerate the technology. Though a big part of VXGI remains brute forcing through the path and cone tracing, the other major aspect of VXGI is building the voxel grid used in these calculations. It's here where NVIDIA has pulled together the D3D 11.3 feature set, along with additional hardware features, to greatly accelerate the process of creating the voxel grid and thereby speed up the overall algorithm.

From the D3D 11.3 feature set, conservative rasterization and volume tiled resources will play a big part. Conservative rasterization allows the creation of more accurate voxels, owing to the more accurate determination of whether a polygon covers a given pixel/voxel. Meanwhile volume tiled resources allow for the space-efficient storage of voxels, letting software store only the voxels it needs and not the many empty voxels that would otherwise be present in a scene.

Joining these features as the final VXGI-centric feature for Maxwell 2 is what NVIDIA is calling Multi-Projection Acceleration. The idea behind MPA is that there are certain scenarios where the same geometry needs to be projected multiple times (voxels being a big case of this, due to being six-sided), and that for performance reasons it is desirable to handle all of these projections much more quickly than simply iterating through every necessary projection in shaders. In these scenarios, being able to quickly project geometry to all the necessary surfaces is a significant performance advantage.

A big part of MPA is a sub-feature called viewport multicast. With viewport multicast, Maxwell 2 can replay the necessary geometry to all of the viewports in a single pass. At the hardware level this involves giving the hardware the ability to automatically determine when it needs to engage in viewport multicast, based on its understanding of the workload it's receiving. This is once again a case where something is being done in a fixed-function-like fashion for performance reasons, rather than being shuffled off to slower shader hardware.
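
Conceptually, viewport multicast collapses a resubmission loop into a single submission. The sketch below contrasts the two approaches for voxelization's six projections; the helper functions (FaceMatrix, SetViewProjection, SetAllViewProjections, DrawScene) are hypothetical stand-ins for constant buffer updates and draw calls, not a real API:

```cpp
#include <array>

struct Mat4  { float m[16]; };
struct Scene { /* vertex/index buffers, etc. */ };

// Hypothetical helpers, for illustration only.
Mat4 FaceMatrix(int face);
void SetViewProjection(const Mat4& vp);
void SetAllViewProjections(const std::array<Mat4, 6>& vps);
void DrawScene(const Scene& s);

// Without multicast: the same geometry is resubmitted once per projection.
void VoxelizeSixPasses(const Scene& scene)
{
    for (int face = 0; face < 6; face++) {
        SetViewProjection(FaceMatrix(face));  // new projection each pass
        DrawScene(scene);                     // full geometry resubmission
    }
}

// With viewport multicast: geometry is submitted once and the hardware
// replays it to every viewport in a single pass, rather than amplifying
// it in slower shader hardware.
void VoxelizeMulticast(const Scene& scene)
{
    std::array<Mat4, 6> faces;
    for (int f = 0; f < 6; f++) faces[f] = FaceMatrix(f);
    SetAllViewProjections(faces);
    DrawScene(scene);                         // one submission, six projections
}
```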

Alongside voxelization, NVIDIA tells us that MPA should also be applicable to cube map generation and shadow map generation, both of which make plenty of sense in this case: in both scenarios you are projecting the same geometry multiple times, whether it's to the faces of a cube or to shadow maps of differing resolutions. As a result MPA should have some benefits even in renderers that aren't using VXGI, though clearly the greatest benefits will come when VXGI is in play.

NVIDIA believes that the overall performance improvement to voxelization from these technologies will be very significant. In their own testing of the technology in rendering a scene set in San Miguel de Allende, Mexico (a common test scene for global illumination), NVIDIA has found that Maxwell 2’s hardware acceleration features tripled their voxelization performance.

Overall, NVIDIA is betting heavily on VXGI at this time, both to further set Maxwell 2 based cards apart from the competition and to advance the state of PC graphics. In the gaming space in particular, NVIDIA has a significant interest in making sure PC games aren't just straight console ports that run at higher framerates and resolutions. This is the situation that has spurred on the development of GameWorks and technologies like VXGI, so that game developers can enhance the PC ports of their games with technologies that improve their overall rendering quality. Maxwell 2 in turn is the realization that while some of these features can be performed in software/shaders on today's hardware, they will be even more useful and impressive when backed by dedicated hardware to improve their performance.

Finally, we'll close out our look at VXGI with a preview of NVIDIA's GTX 900 series tech demo, a rendered recreation of a photo of Buzz Aldrin from the Apollo 11 moon landing. The Apollo 11 demo is designed to show off the full capabilities of VXGI, utilizing the lighting technique to correctly and dynamically emulate specular, diffuse, and other forms of lighting that occur in reality. At editors' day NVIDIA originally attempted to pass off the rendering as the original photo, and while after a moment it's clear that it's a rendering (among other things it lacks the graininess of a 1969 film camera), it comes very, very close. In showcasing the Apollo 11 tech demo, NVIDIA's hope is that one day games will be able to achieve similarly accurate lighting effects through the use of VXGI.

Comments

  • Laststop311 - Saturday, September 20, 2014

    I'm going to wait for the custom GTX 980s. It was already throttling from hitting the 80C limit in most games. The blower design wouldn't have throttled if they had left the vapor chamber in, but they didn't. My case has plenty of airflow so I don't require a blower design. MSI's Twin Frozr V open air design will cool the GPU much better and stop it from throttling during gaming. People rushing to buy the reference design are missing out on hundreds of MHz due to thermal throttling.
  • chizow - Saturday, September 20, 2014

    Yep the open-faced custom coolers are definitely better at OC'ing, especially in single-GPU configs, but the problems I have with them are:

    1) they tend to have cheaper build quality than the ref, especially the NVTTM cooler which is just classy stuff. The custom coolers replace this with lots and lots of plastic, visible heatpipes, cheapo looking fans. If I wanted an Arctic Accelero on my GPUs I would just buy one.

    2) they usually take longer to come to market, frequently with a 3-6 week lead time. I know it's not a super long time in the grand scheme of things, but I'd rather upgrade sooner.

    3) The blowers tend to do better in SLI over longer periods of time, and also don't impact your CPU temps/OC as much. I have a ton of airflow too (HAF-X) but I still prefer most of the heat being expelled from the start, and not through my H100i rad.

    4) Frankly I'm not too worried about squeezing the last 100-150MHz out of these chips. There was a time I might have been, but I tend to stick to a safe OC about 100-150MHz below what most people are getting and then call it a day, without having to do a dozen 3DMark loops to verify stability.
  • Laststop311 - Sunday, September 21, 2014

    Did you see the benchmarks? Some games were running in the 900s, some in the 1000s, some in the 1100s, stuck at these frequencies because the card was riding the 80C limit. As the review mentioned, these aren't the same coolers as the Titan's, as they removed the vapor chamber and replaced it with regular heatpipes. Getting a custom cooled card isn't about squeezing the last 100-150MHz from an OC, it's about squeezing an extra 400-600MHz from an OC, as many reviewers have gotten the GTX 980 to OC to 1500MHz. We are talking a massive performance increase from getting the proper cooling, bigger than even the R9 290X going from reference to custom, and that was pretty big itself.
  • Laststop311 - Sunday, September 21, 2014

    Even to get the card to reliably run at stock settings during intense gaming you need a custom cooled card. The reference cooled card can't even reliably hit its stock clock under intense gaming because the blower cooler without the vapor chamber sucks.
  • chizow - Sunday, September 21, 2014

    No, you can adjust the Nvidia fan and GPU temp settings to get sustained Boosts. There is a trade-off in terms of fan noise and/or operating temps, but it is easy to get close to the results of the custom coolers at the expense of fan noise. I personally set my fan curve differently because I think Nvidia's 80C target temp profile is a little bit too passive in how quickly it ramps up fanspeeds. I don't expect to have any problems at all maintaining rated Boost speed, and if I want to overclock, I fully understand the sacrifice will be more fan noise over the custom coolers, but the rest of the negatives regarding custom coolers makes the reference cooler more appealing to me.
  • venk90 - Thursday, September 18, 2014

    The GTX 980 page on NVIDIA's website seems to indicate HDMI 1.4, as it says 3840x2160 at 30 Hz over HDMI (it is mentioned as a footnote). Are you sure about it being HDMI 2.0?
  • Ryan Smith - Thursday, September 18, 2014

    Yes. I've confirmed it in writing and in person.
  • vegitto4 - Thursday, September 18, 2014

    Hi Ryan, great review! Will there be the usual HTPC perspective? For example, did they fix the 23.976 refresh rate as Haswell did? I think it's important to know how these work as HTPC cards. Regards
  • Ryan Smith - Thursday, September 18, 2014

    For this article there will not. These cards aren't your traditional HTPC cards. However we can possibly look into it for next week's follow-up.
  • chizow - Friday, September 19, 2014

    I think the definition of HTPC is beginning to change though, and while these may not yet fit into traditional HTPC (Brix and NUC seem to be filling this niche more), they are definitely right in the SteamBox/BattleBox category.

    Honestly, SteamBox was the first thing that came to mind when I saw that 165W TDP on the GTX 980, we will be seeing a lot of GM204 variants in the upcoming years in SFF, LAN, SteamBox and gaming laptop form factors that is for sure.
