Maxwell 2’s New Features: Direct3D 11.3 & VXGI

When NVIDIA introduced the Maxwell 1 architecture and the GM107 based GTX 750 series, one of the unexpected aspects of their decision was to release these parts as members of the existing 700 series rather than a newer series to communicate a difference in features. However as it turned out there really wasn’t a feature difference between it and Kepler; other than a newer NVENC block, Maxwell 1 was for all intents and purposes an optimized Kepler architecture. It was the same features built upon the efficiency improvements of the Maxwell architecture.

With that in mind, along with the hardware/architectural changes we’ve listed earlier, the other big factor that sets Maxwell 2 apart from Maxwell 1 is its feature set. In that respect Maxwell 2 is almost a half-generational update on its own, as it implements a number of new features that were not present in Maxwell 1. This means Maxwell 2 is bringing some new features that we need to cover, but it also means that the GM204 based GTX 900 series is feature differentiated from the GTX 600/700 series in a way that the earlier GTX 750 series was not.

Direct3D 11.3

First and foremost among Maxwell 2’s new features is the inclusion of full Direct3D 11.2/11.3 compatibility. Kepler and Maxwell 1 before it were officially feature level 11_0, but they contained an almost complete set of FL 11_1 features, allowing most of these features to be accessed through cap bits. With Maxwell 2 however, NVIDIA has finally implemented the remaining features required for FL11_1 compatibility and beyond, updating their architecture to support the 16x raster coverage sampling required for Target Independent Rasterization and UAVOnlyRenderingForcedSampleCount.

This extended feature set also extends to Direct3D 11.2, which although it doesn’t have an official feature level of its own, does introduce some new (and otherwise optional) features that are accessed via cap bits. Key among these, Maxwell 2 will support the more advanced Tier 2 tiled resources, otherwise known as sparse textures or partially resident textures. Tier 2 was introduced into the specification to differentiate the more capable AMD implementation of this feature from NVIDIA’s hardware, and now as of Maxwell 2 NVIDIA can support the more advanced functionality required for Tier 2.

Finally, Maxwell will also support the features being introduced in Direct3D 11.3 (and made available to D3D 12), which was announced alongside Maxwell at NVIDIA’s editors’ day event. We have a separate article covering Direct3D 11.3, so we won’t completely retread that ground here, but we will cover the highlights.

The forthcoming Direct3D 11.3 features, which will form the basis (but not entirety) of what’s expected to be feature level 11_3, are Rasterizer Ordered Views, Typed UAV Load, Volume Tiled Resources, and Conservative Rasterization. Maxwell 2 will offer full support for these forthcoming features, and of these features the inclusion of volume tiled resources and conservative rasterization is seen as being especially important by NVIDIA, particularly since NVIDIA is building further technologies off of them.

Volume tiled resources is for all intents and purposes tiled resources extended into the 3rd dimension. Volume tiled resources are primarily meant to be used with 3D/volumetric pixels (voxels), with the idea being that with sparse allocation, volume tiles that do not contain any useful information can avoid being allocated, avoiding tying up memory in tiles that will never be used or accessed. This kind of sparse allocation is necessary to make certain kinds of voxel techniques viable.

Meanwhile conservative rasterization is also new to Maxwell 2. Conservative rasterization is essentially a more accurate but performance intensive solution to figuring out whether a polygon covers part of a pixel. Instead of doing a quick and simple test to see if the center of the pixel is bounded by the lines of the polygon, conservative rasterization checks whether the pixel covers the polygon by testing it against the corners of the pixel. This means that conservative rasterization will catch cases where a polygon was too small to cover the center of a pixel, which results in a more accurate outcome, be it better identifying pixels a polygon resides in, or finding polygons too small to cover the center of any pixel at all.

Conservative rasterization is being added to Direct3D in order to allow new algorithms to be used which would fail under the imprecise nature of point sampling. Like VTR, voxels play a big part here as conservative rasterization can be used to build a voxel. However it also has use cases in more accurate tiling and even collision detection. This feature is technically possible in existing hardware, but the performance of such an implementation would be very low as it’s essentially a workaround for the lack of necessary support in the rasterizers. By implementing conservative rasterization directly in hardware, Maxwell 2 will be able to perform the task far more quickly, which is necessary to make the resulting algorithms built on top of this functionality fast enough to be usable.

VXGI

Outside of the features covered by Direct3D 11.3, NVIDIA will also be adding features specifically to drive a new technology they are calling Voxel accelerated Global Illumination (VXGI).

At the highest level, VXGI is a manner of implementing global illumination by utilizing voxels in the calculations. Global illumination is something of a holy grail for computer graphics, as it can produce highly realistic and accurate lighting dynamically in real time. However global illumination is also very expensive, the path tracing involved taking up considerable time and resources. For this reason developers have played around with global illumination in the past – the original version of Epic’s Unreal 4 Engine Elemental demo implanted a voxel based global illumination method, for example – but it has always been too slow for practical use.

With VXGI NVIDIA is looking to solve the voxel global illumination problem through a combination of software and hardware. VXGI proper is the software component, and describes the algorithm being used. NVIDIA has been doing considerable research into voxel based global illumination over the years, and has finally reached a point where they have an algorithm ready to go in the form of VXGI.

VXGI will eventually be made available for Unreal Engine 4 and other major game engines starting in Q4 of this year. And while the VXGI greatly benefits from the hardware features built into Maxwell 2, it is not strictly reliant on the hardware and can be implemented through more traditional means on existing hardware. VXGI is if nothing else scalable, with the algorithm being designed to scale up and down with hardware by adjusting the density of the voxel grid, which in turn influences the number of calculations required and the resulting accuracy. Maxwell 2 for its part will be capable of using denser grids due to its hardware acceleration capabilities, allowing for better performance and more accurate lighting.

It’s at this point we’ll take a break and apologize to NVIDIA’s engineers for blowing through VXGI so quickly. This is actually a really interesting technology, as global illumination offers the possibility of finally attaining realistic real-time lighting in any kind of rendered environment. However VXGI is also a complex technology that is a subject in and of itself, and we could spend all day just covering it (we’d need to cover rasterization and path tracing to fully explain it). Instead we’d suggest reading NVIDIA’s own article on the technology once that is posted, as NVIDIA is ready and willing to go into great detail in how the technology works.

Getting back to today’s launch then, the other aspect of VXGI is the hardware features that NVIDIA has implemented to accelerate the technology. Though a big part of VXGI remains brute forcing through the path and cone tracing, the other major aspect of VXGI is building the voxel grids used in these calculations. It’s here where NVIDIA has pulled together the D3D 11.3 feature set, along with additional hardware features, to greatly accelerate the process of creating the voxel grid in order to speed up the overall algorithm.

From the D3D 11.3 feature set, conservative rasterization and volumetric tiled resources will play a big part. Conservative rasterization allows the creation of more accurate voxels, owing to the more accurate determination of whether a pixel/voxel covers a given polygon. Meanwhile volumetric tiled resources will allow for the space efficient storage of voxels, allowing software to store only the voxels it needs and not the many empty voxels that would otherwise be present in a scene.

Joining these features as the final VXGI-centric feature for Maxwell 2 is a feature NVIDIA is calling Multi-Projection Acceleration. The idea behind MPA is that there are certain scenarios where the same geometry needs to be projected multiple times – voxels being a big case of this due to being 6 sided – and that for performance reasons it is desirable to do all of these projections much more quickly than simply iterating though every necessary projection in shaders. In these scenarios being able to quickly project geometry to all the necessary surfaces is a significant performance advantage.

A big part of MPA is a sub-feature called viewport multicast. In viewport multicast Maxwell 2 can replay the necessary geometry to all of the viewports in a single pass. At the hardware level this involves giving the hardware the ability to automatically determine when it needs to engage in viewport multicast, based on its understanding of the workload it's receiving. This is once again a case where something is being done in a fixed-function like fashion for performance reasons, rather than being shuffled off to slower shader hardware.

Alongside voxelization, NVIDIA tells us that MPA should also be applicable to cube map generation and shadow map generation. Both of which make plenty of sense in this case: in both scenarios you are projecting the same geometry multiple times, whether it’s to faces of a cube or to shadow maps of increasing resolution. As a result MPA should have some benefits even in renderers that aren’t using VXGI, though clearly the greatest benefits are still going to be when VXGI is in play.

NVIDIA believes that the overall performance improvement to voxelization from these technologies will be very significant. In their own testing of the technology in rendering a scene set in San Miguel de Allende, Mexico (a common test scene for global illumination), NVIDIA has found that Maxwell 2’s hardware acceleration features tripled their voxelization performance.

Overall NVIDIA is heavily betting on VXGI at this time both to further set apart Maxwell 2 based cards from the competition, and to further advance the state of PC graphics. In the gaming space in particular NVIDIA has a significant interest in making sure PC games aren’t just straight console ports that run at higher framerates and resolutions. This is the situation that has spurred on the development of GameWorks and technologies like VXGI, so that game developers can enhance the PC ports of their games with technologies that improve their overall rendering quality. Maxwell 2 in turn is the realization that while some of these features can be performed in software/shaders on today’s hardware, these features will be even more useful and impressive when backed with dedicated hardware to improve their performance.

Finally, we’ll close out our look at VXGI with a preview of NVIDIA’s GTX 900 series tech demo, which is a rendered recreation of a photo/scene involving Buzz Aldrin and the Apollo 11 moon landing. The Apollo 11 demo is designed to show off the full capabilities of VXGI, utilizing the lighting technique to correctly and dynamically emulate specular, diffuse, and other forms of lighting that occur in reality. At editors’ day NVIDIA originally attempted to pass off the rendering as the original photo, and while after a moment it’s clear that it’s a rendering – among other things it lacks the graininess of a 1969 film based camera – it comes very, very close. In showcasing the Apollo 11 tech demo, NVIDIA’s hope is that one day games will be able to achieve similarly accurate lighting effects through the use of VXGI.

Maxwell 2 Architecture: Introducing GM204 Display Matters: HDMI 2.0, HEVC, & VR Direct
Comments Locked

274 Comments

View All Comments

  • mesahusa - Tuesday, September 23, 2014 - link

    Nvidia and AMD havent moved to 22 because they don't have the funding. Intel has tens of billions to blow away in R&D. Broadwells going to be released in 2015, and its 14nm.
  • Hrel - Monday, September 22, 2014 - link

    In light of Nvidia not even trying to reduce manufacturing nodes It would be really nice to see them go on the offensive in the price war. $300 for the GTX980, everything lower from there. Probably not now, but like, spring 2015, that'd be great! Make good and sure to wipe out all the hold outs (like myself) keeping their old cards because they still play everything they play on 1080p. Kinda, get all your customers caught up on hardware in the same upgrade time frame.

    Then when they finally do drop nodes they can focus on making every card they sell run games at 8K resolution.
  • Nfarce - Monday, September 22, 2014 - link

    Hate to break the news to you, but if you want to game at high level (above 1080p), you need to pay at high level. There is nothing new about that in the entire history of PC enthusiast building and gaming either for those of us who remember making the "huge leap" from a 15" 1024x768 resolution CRT monitor to a whopping 19" 1600x1200 CRT monitor. At least not in the 20 years since I've been involved with it anyway.

    Besides all that, that's why GPU makers offer cards for different budgets. If you can't afford their top tier products, you can't afford to game top tier. Period and end of discussion.
  • tuxRoller - Monday, September 22, 2014 - link

    It seems as though the big improvement nvidia has made is to enable cpu-level scheduling/dvfs granularity into their chip. However, once all cores are engaged it ends up using as much power as its predecessor (see tomshardware).
    What I really want to know is how much of this due to purely driver-level changes.
  • yhselp - Tuesday, September 23, 2014 - link

    Exceptional design. The sad thing is that NVIDIA will take forever to release a 30 SMM Maxwell GPU and once it finally does, it would cost a ton; even later on when they release a "budget" version for an unreasonable price of around $650 it would be too late - the great performance potential of today wouldn't be so great tomorrow. Striving for and building amazing GPUs is the right way forward, not empowering the people with them is a crime. Whatever happened to $500 flagship products?
  • Rhodie - Wednesday, September 24, 2014 - link

    Just got a GTX970, and only latest Nvidia drivers will install for 9xx series cards it seems. Unfortunately the latest drivers totally screw up some programs that use CUDA, seem to hide its
    presence from programs lile Xillisoft Video Convertor Ultimate:-/ No response of course from either Nvidia or Xillisoft regarding the problem. Wonder how many other programs the drivers break?
  • garadante - Thursday, September 25, 2014 - link

    Geeze. Anandtech, do an updated best value graphics card list because since the launch of the 970/980 retailers are giving some serious price cuts to 770/780/780 Ti's. Newegg has a 780 for less than $300 after rebate and just a hair over $300 before rebate. I'm seeing 780 Ti's for ~$430 and 770s for ~$240. I am amazed to see price cuts this deep since I haven't seen them the last several generations and considering how overpriced these cards were. But while supplies last and prices hold/drop, this completely flips price/performance on it's head. I feel bad recommending an AMD 290 Tri-X to a friend a couple months back now. xD
  • garadante - Thursday, September 25, 2014 - link

    Please do an updates best value graphics card* Where are my manners! D:
  • jman9295 - Friday, September 26, 2014 - link

    Newegg has an Asus DirectCU II GTX 780 selling in the $290 range after a mail in rebate, promo code and discount. It also comes with a pre-order copy of the new Borderlands game. That has to be the best value to performance GPU out right now. It is almost a full $100 less than the cheapest non-reference R9 290 on newegg and $40 less than the cheapest reference R9 290 which is crazy since this same Asus GTX 780 was selling for over $550 just last month with no free games (and still is on Amazon for some reason).
  • mixer4x - Thursday, September 25, 2014 - link

    I feel bad for having just bought the 290 tri-x just a month ago! =(
    I bought it because you never know when the new cards will be released and how much they will cost. Unfortunately, the new cards came out too soon!

Log in

Don't have an account? Sign up now