Maxwell 2’s New Features: Direct3D 11.3 & VXGI

When NVIDIA introduced the Maxwell 1 architecture and the GM107 based GTX 750 series, one of the unexpected aspects of the launch was their decision to release these parts as members of the existing 700 series rather than as a newer series that would communicate a difference in features. However, as it turned out, there really wasn’t a feature difference between it and Kepler; other than a newer NVENC block, Maxwell 1 was for all intents and purposes an optimized Kepler architecture. It offered the same features as Kepler, built upon the efficiency improvements of the Maxwell architecture.

With that in mind, along with the hardware/architectural changes we’ve listed earlier, the other big factor that sets Maxwell 2 apart from Maxwell 1 is its feature set. In that respect Maxwell 2 is almost a half-generational update on its own, as it implements a number of new features that were not present in Maxwell 1. This means Maxwell 2 is bringing some new features that we need to cover, but it also means that the GM204 based GTX 900 series is feature differentiated from the GTX 600/700 series in a way that the earlier GTX 750 series was not.

Direct3D 11.3

First and foremost among Maxwell 2’s new features is the inclusion of full Direct3D 11.2/11.3 compatibility. Kepler and Maxwell 1 before it were officially feature level 11_0, but they contained an almost complete set of FL 11_1 features, allowing most of these features to be accessed through cap bits. With Maxwell 2 however, NVIDIA has finally implemented the remaining features required for FL11_1 compatibility and beyond, updating their architecture to support the 16x raster coverage sampling required for Target Independent Rasterization and UAVOnlyRenderingForcedSampleCount.

This extended feature set also extends to Direct3D 11.2, which although it doesn’t have an official feature level of its own, does introduce some new (and otherwise optional) features that are accessed via cap bits. Key among these, Maxwell 2 will support the more advanced Tier 2 tiled resources, otherwise known as sparse textures or partially resident textures. Tier 2 was introduced into the specification to differentiate the more capable AMD implementation of this feature from NVIDIA’s hardware, and now as of Maxwell 2 NVIDIA can support the more advanced functionality required for Tier 2.

Finally, Maxwell 2 will also support the features being introduced in Direct3D 11.3 (and made available to D3D 12), which was announced alongside Maxwell 2 at NVIDIA’s editors’ day event. We have a separate article covering Direct3D 11.3, so we won’t completely retread that ground here, but we will cover the highlights.

The forthcoming Direct3D 11.3 features, which will form the basis (but not entirety) of what’s expected to be feature level 11_3, are Rasterizer Ordered Views, Typed UAV Load, Volume Tiled Resources, and Conservative Rasterization. Maxwell 2 will offer full support for these forthcoming features, and of these features the inclusion of volume tiled resources and conservative rasterization is seen as being especially important by NVIDIA, particularly since NVIDIA is building further technologies off of them.

Volume tiled resources is for all intents and purposes tiled resources extended into the 3rd dimension. Volume tiled resources are primarily meant to be used with 3D/volumetric pixels (voxels), with the idea being that with sparse allocation, volume tiles that do not contain any useful information never need to be allocated, which avoids tying up memory in tiles that will never be used or accessed. This kind of sparse allocation is necessary to make certain kinds of voxel techniques viable.
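To make the idea of sparse allocation concrete, here is a minimal software sketch of a voxel grid that only allocates the tiles that actually receive data. This is not how Direct3D or Maxwell 2 implements volume tiled resources; the `SparseVoxelGrid` class, the `TILE` edge length, and the one-byte-per-voxel layout are all assumptions made purely for illustration.

```python
from typing import Dict, Tuple

TILE = 8  # hypothetical tile edge length in voxels, chosen for illustration only


class SparseVoxelGrid:
    """Voxel grid that allocates storage only for tiles that hold data."""

    def __init__(self, size: int) -> None:
        self.size = size  # grid is size x size x size voxels
        # Maps tile coordinates to a flat buffer of TILE**3 one-byte voxels.
        self.tiles: Dict[Tuple[int, int, int], bytearray] = {}

    @staticmethod
    def _offset(x: int, y: int, z: int) -> int:
        # Position of the voxel inside its tile's flat buffer.
        return (x % TILE) + TILE * ((y % TILE) + TILE * (z % TILE))

    def set(self, x: int, y: int, z: int, value: int) -> None:
        key = (x // TILE, y // TILE, z // TILE)
        tile = self.tiles.get(key)
        if tile is None:
            # First write into this tile: allocate it now.
            tile = self.tiles[key] = bytearray(TILE ** 3)
        tile[self._offset(x, y, z)] = value

    def get(self, x: int, y: int, z: int) -> int:
        tile = self.tiles.get((x // TILE, y // TILE, z // TILE))
        # Unallocated tiles read back as empty space.
        return 0 if tile is None else tile[self._offset(x, y, z)]

    def allocated_bytes(self) -> int:
        return len(self.tiles) * TILE ** 3

    def dense_bytes(self) -> int:
        return self.size ** 3


grid = SparseVoxelGrid(256)
# Fill a one-voxel-thick "floor"; only the bottom layer of tiles gets allocated.
for x in range(256):
    for y in range(256):
        grid.set(x, y, 0, 255)

print(grid.allocated_bytes(), "bytes allocated vs", grid.dense_bytes(), "dense")
```

In this toy scene the floor touches 1,024 of the 32,768 possible tiles, so the sparse grid holds 512 KB where a fully allocated grid would need 16 MB. Real scenes will not be this favorable, but mostly empty volumes are the case the feature is aimed at.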

Meanwhile conservative rasterization is also new to Maxwell 2. Conservative rasterization is essentially a more accurate but performance intensive solution to figuring out whether a polygon covers part of a pixel. Instead of doing a quick and simple test to see if the center of the pixel is bounded by the lines of the polygon, conservative rasterization checks whether the polygon covers any part of the pixel by testing it against the corners of the pixel. This means that conservative rasterization will catch cases where a polygon is too small to cover the center of a pixel, which results in a more accurate outcome, be it better identifying pixels a polygon resides in, or finding polygons too small to cover the center of any pixel at all.
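The difference between the two tests can be sketched in a few lines of software. The following is an illustrative approximation only, not the hardware algorithm: `center_covered` samples the pixel center with edge functions, while `conservatively_covered` reports a pixel as covered whenever the triangle overlaps any part of the pixel square, using a separating-axis test. All function names here are our own.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def edge(a: Point, b: Point, p: Point) -> float:
    """Edge function: positive if p lies to the left of the line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def center_covered(tri: Triangle, px: int, py: int) -> bool:
    """Standard test: is the pixel center inside the triangle?"""
    c = (px + 0.5, py + 0.5)
    w = [edge(tri[i], tri[(i + 1) % 3], c) for i in range(3)]
    # Accept either winding order.
    return all(v >= 0 for v in w) or all(v <= 0 for v in w)


def conservatively_covered(tri: Triangle, px: int, py: int) -> bool:
    """Conservative test: does the triangle overlap any part of the pixel?"""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    # Separating axes of the pixel square: a simple bounding-box rejection.
    if max(xs) < px or min(xs) > px + 1 or max(ys) < py or min(ys) > py + 1:
        return False
    corners = [(px, py), (px + 1, py), (px + 1, py + 1), (px, py + 1)]
    # Separating axes of the triangle: one per edge.
    for i in range(3):
        a, b, third = tri[i], tri[(i + 1) % 3], tri[(i + 2) % 3]
        # Orient the edge so the triangle's interior is on the positive side.
        sign = 1.0 if edge(a, b, third) >= 0 else -1.0
        if all(sign * edge(a, b, q) < 0 for q in corners):
            return False  # every pixel corner is outside this edge
    return True


def covered_pixels(tri: Triangle, width: int, height: int,
                   test: Callable[[Triangle, int, int], bool]) -> List[Tuple[int, int]]:
    return [(x, y) for y in range(height) for x in range(width) if test(tri, x, y)]


# A triangle much smaller than a pixel, sitting in the corner of pixel (0, 0).
tiny = ((0.1, 0.1), (0.3, 0.1), (0.1, 0.3))
print(covered_pixels(tiny, 4, 4, center_covered))          # []
print(covered_pixels(tiny, 4, 4, conservatively_covered))  # [(0, 0)]
```

The tiny triangle vanishes entirely under center sampling but is still reported by the conservative test, which is exactly the property voxelization and collision detection depend on.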

Conservative rasterization is being added to Direct3D in order to allow new algorithms to be used which would fail under the imprecise nature of point sampling. As with volume tiled resources, voxels play a big part here, as conservative rasterization can be used to build voxels. However it also has use cases in more accurate tiling and even collision detection. This feature is technically possible in existing hardware, but the performance of such an implementation would be very low as it’s essentially a workaround for the lack of necessary support in the rasterizers. By implementing conservative rasterization directly in hardware, Maxwell 2 will be able to perform the task far more quickly, which is necessary to make the resulting algorithms built on top of this functionality fast enough to be usable.


VXGI

Outside of the features covered by Direct3D 11.3, NVIDIA will also be adding features specifically to drive a new technology they are calling Voxel accelerated Global Illumination (VXGI).

At the highest level, VXGI is a manner of implementing global illumination by utilizing voxels in the calculations. Global illumination is something of a holy grail for computer graphics, as it can produce highly realistic and accurate lighting dynamically in real time. However global illumination is also very expensive, with the path tracing involved taking up considerable time and resources. For this reason developers have played around with global illumination in the past – the original version of Epic’s Unreal Engine 4 Elemental demo implemented a voxel based global illumination method, for example – but it has always been too slow for practical use.

With VXGI NVIDIA is looking to solve the voxel global illumination problem through a combination of software and hardware. VXGI proper is the software component, and describes the algorithm being used. NVIDIA has been doing considerable research into voxel based global illumination over the years, and has finally reached a point where they have an algorithm ready to go in the form of VXGI.

VXGI will be made available for Unreal Engine 4 and other major game engines starting in Q4 of this year. And while VXGI greatly benefits from the hardware features built into Maxwell 2, it is not strictly reliant on the hardware and can be implemented through more traditional means on existing hardware. VXGI is if nothing else scalable, with the algorithm being designed to scale up and down with hardware by adjusting the density of the voxel grid, which in turn influences the number of calculations required and the resulting accuracy. Maxwell 2 for its part will be capable of using denser grids due to its hardware acceleration capabilities, allowing for better performance and more accurate lighting.
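As a rough illustration of why grid density is such an effective scaling knob, consider how the voxel count grows with resolution. The sketch below is simple arithmetic rather than anything taken from NVIDIA's implementation; in particular the `bytes_per_voxel` default is an assumed placeholder, not a published VXGI figure.

```python
from typing import Tuple


def voxel_grid_cost(resolution: int, bytes_per_voxel: int = 8) -> Tuple[int, int]:
    """Return (voxel count, bytes) for a fully populated cubic grid.

    bytes_per_voxel is an assumed placeholder value for illustration.
    """
    voxels = resolution ** 3
    return voxels, voxels * bytes_per_voxel


for res in (64, 128, 256):
    voxels, size = voxel_grid_cost(res)
    print(f"{res}^3 grid: {voxels:,} voxels, {size / 2**20:.0f} MiB if fully populated")
```

Each doubling of the grid resolution multiplies the voxel count by eight, so even a modest change in density has a large effect on both the work required and the memory consumed.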

It’s at this point we’ll take a break and apologize to NVIDIA’s engineers for blowing through VXGI so quickly. This is actually a really interesting technology, as global illumination offers the possibility of finally attaining realistic real-time lighting in any kind of rendered environment. However VXGI is also a complex technology that is a subject in and of itself, and we could spend all day just covering it (we’d need to cover rasterization and path tracing to fully explain it). Instead we’d suggest reading NVIDIA’s own article on the technology once that is posted, as NVIDIA is ready and willing to go into great detail on how the technology works.

Getting back to today’s launch then, the other aspect of VXGI is the hardware features that NVIDIA has implemented to accelerate the technology. Though a big part of VXGI remains brute forcing through the path and cone tracing, the other major aspect of VXGI is building the voxel grids used in these calculations. It’s here where NVIDIA has pulled together the D3D 11.3 feature set, along with additional hardware features, to greatly accelerate the process of creating the voxel grid in order to speed up the overall algorithm.

From the D3D 11.3 feature set, conservative rasterization and volumetric tiled resources will play a big part. Conservative rasterization allows the creation of more accurate voxels, owing to the more accurate determination of whether a given polygon covers a pixel/voxel. Meanwhile volumetric tiled resources will allow for the space efficient storage of voxels, allowing software to store only the voxels it needs and not the many empty voxels that would otherwise be present in a scene.

Joining these features as the final VXGI-centric feature for Maxwell 2 is a feature NVIDIA is calling Multi-Projection Acceleration. The idea behind MPA is that there are certain scenarios where the same geometry needs to be projected multiple times – voxels being a big case of this due to being 6 sided – and that for performance reasons it is desirable to do all of these projections much more quickly than simply iterating through every necessary projection in shaders. In these scenarios being able to quickly project geometry to all the necessary surfaces is a significant performance advantage.
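To see why a single projection is not enough for voxelization, it helps to look at how the same triangle appears when projected along each major axis. The sketch below shows one common software approach (computing the projected area per axis and picking the largest); it is our own illustration with hypothetical function names, and makes no claim about what the MPA hardware does internally.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def projected_areas(a: Vec3, b: Vec3, c: Vec3) -> Dict[str, float]:
    """Area of the triangle's shadow when projected along each major axis."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    # Each cross product component is twice the area of the projection
    # along the corresponding axis.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    return {"x": abs(nx) / 2, "y": abs(ny) / 2, "z": abs(nz) / 2}


def dominant_axis(a: Vec3, b: Vec3, c: Vec3) -> str:
    """Axis along which the triangle covers the most area."""
    areas = projected_areas(a, b, c)
    return max(areas, key=areas.get)


# A triangle lying flat in the z = 0 plane.
floor_tri = ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0))
print(projected_areas(*floor_tri))  # {'x': 0.0, 'y': 0.0, 'z': 2.0}
print(dominant_axis(*floor_tri))    # z
```

Viewed along x or y this triangle is edge-on and rasterizes to nothing, so a voxelizer has to consider more than one projection for arbitrary geometry. Doing that per triangle in shaders is the iteration cost that MPA is meant to reduce.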

A big part of MPA is a sub-feature called viewport multicast. In viewport multicast Maxwell 2 can replay the necessary geometry to all of the viewports in a single pass. At the hardware level this involves giving the hardware the ability to automatically determine when it needs to engage in viewport multicast, based on its understanding of the workload it's receiving. This is once again a case where something is being done in a fixed-function like fashion for performance reasons, rather than being shuffled off to slower shader hardware.

Alongside voxelization, NVIDIA tells us that MPA should also be applicable to cube map generation and shadow map generation. Both make plenty of sense: in both scenarios you are projecting the same geometry multiple times, whether it’s to faces of a cube or to shadow maps of increasing resolution. As a result MPA should have some benefits even in renderers that aren’t using VXGI, though clearly the greatest benefits are still going to be when VXGI is in play.

NVIDIA believes that the overall performance improvement to voxelization from these technologies will be very significant. In their own testing of the technology in rendering a scene set in San Miguel de Allende, Mexico (a common test scene for global illumination), NVIDIA has found that Maxwell 2’s hardware acceleration features tripled their voxelization performance.

Overall NVIDIA is heavily betting on VXGI at this time both to further set apart Maxwell 2 based cards from the competition, and to further advance the state of PC graphics. In the gaming space in particular NVIDIA has a significant interest in making sure PC games aren’t just straight console ports that run at higher framerates and resolutions. This is the situation that has spurred on the development of GameWorks and technologies like VXGI, so that game developers can enhance the PC ports of their games with technologies that improve their overall rendering quality. Maxwell 2 in turn is the realization that while some of these features can be performed in software/shaders on today’s hardware, these features will be even more useful and impressive when backed with dedicated hardware to improve their performance.

Finally, we’ll close out our look at VXGI with a preview of NVIDIA’s GTX 900 series tech demo, which is a rendered recreation of a photo/scene involving Buzz Aldrin and the Apollo 11 moon landing. The Apollo 11 demo is designed to show off the full capabilities of VXGI, utilizing the lighting technique to correctly and dynamically emulate specular, diffuse, and other forms of lighting that occur in reality. At editors’ day NVIDIA originally attempted to pass off the rendering as the original photo, and while after a moment it’s clear that it’s a rendering – among other things it lacks the graininess of a 1969 film based camera – it comes very, very close. In showcasing the Apollo 11 tech demo, NVIDIA’s hope is that one day games will be able to achieve similarly accurate lighting effects through the use of VXGI.




  • garadante - Thursday, September 25, 2014

    Yeah. To be honest nobody except ardent Nvidia fanboys would've believed Nvidia would release cards as performance and price competitive as they did, especially the 970. The 980 is honestly a little overpriced compared to a few generations ago as they'll slap a $200 premium on it for Big Maxwell but $330 MSRP for the 970 (if I remember correctly) wasn't bad at all, for generally what, 290/780/290X performance?
  • tuxRoller - Friday, September 26, 2014

    It's not too surprising as we saw what the 750ti was like.
    What is disappointing, though, is that I thought nvidia had made some fundamental breakthrough in their designs where, instead, it looks as though they "simply" enabled a better governor.
  • garadante - Friday, September 26, 2014

    It'll be interesting to see how the efficiency suffers once nvidia releases a proper compute die with area dedicated to double precision FP. I have to keep in mind that when factoring in the stripped down die compared to AMD's 290/290X cards, the results aren't as competition. Lowing as they first seem. But if AMD can't counter these cards with their own stripped down gaming only cards then nvidia took the win this generation.
  • tuxRoller - Friday, September 26, 2014

    That's an excellent point. I take it you already read the tomshardware review? Their compute performance/W is still good, but not so unbelievable as their gaming performance, but I'm not sure it's b/c this is a gaming only card. Regardless, though, amd needs to offer something better than what's currently available. Unfortunately, I don't think they will be able to do it. There was a lot of driver work that went into making these maxwell cards hum.
  • garadante - Friday, September 26, 2014

    One thing that really bothers me though is how Anandtech keeps testing the 290/290X with reference cards. Those cards run at 95 C due to the fan control profile in the BIOS and I remember seeing that when people ran those cards with decent nonreference cooling in the 70 C range that power consumption was 15-20+ watts lower. So an AMD die that sacrifices FP64 performance to focus on FP32 (gaming, some compute) performance as well as decreasing die size due to the lack of FP64 resources seems like it could be a lot more competitive with Maxwell than people are making it out to be. I have this feeling that the people saying how badly Maxwell trounces AMD's efficiency and that AMD can't possibly hope to catch up are too biased in their thinking.
  • tuxRoller - Saturday, September 27, 2014

    Do you have a link to those reviews that show non-reference fans make gpus more efficient? I don't know how that could be possible. Given the temps we're looking at the effects on the conductors should be very, very small.
    Regarding the reduction in fp performance and gaming efficiency, that's a good point. That may indeed be part of the reason why nvidia has the gaming/compute split (aside from the prices they can charge).
  • garadante - Sunday, September 28, 2014

    Here's an example of a card with liquid cooling. Factor in the overclock that the nonreference card has and that it draws something like 20 watts less in Furmark and the same in 3Dmark. I could be mistaken on the improved power usage but I do recall seeing shortly after the 290X launch that nonreference coolers helped immensely, and power usage dropped as well. Sadly I don't believe Anandtech ever reviewed a nonreference 290X... which is mind boggling to consider, considering how much nonreference cooling helped that card, even outside of any potential power usage decreases.
  • garadante - Sunday, September 28, 2014

    Whoops, forgot the link.
  • jman9295 - Friday, September 26, 2014

    I wonder why they still give these cards these boring numbered names like GTX 980. Except for the Titan, these names kinda suck. Why not at least name it the Maxwell 980 or for AMD's R9 290 series the Hawaii 290? That sounds a lot cooler than GTX or R9. Also, for the last several generations, AMD and Nvidia's numbering system seems to be similar up until AMD ended that with the R9/R7 200 series. Before that, they had the GTX 700 and HD 7000 series, the GTX 600 and HD 6000 series and so on. Then, as soon as AMD changed it up, Nvidia decides to skip the GTX 800's for retail desktop GPUs and jump right up to the 900 series. Maybe they will come up with a fancier name for their next gen cards besides the GTX 1000's.
  • AnnonymousCoward - Saturday, September 27, 2014

    Naw, names are much harder to keep track of than numbers that inherently describe relative performance.
