Simultaneous Multi-Projection: Reusing Geometry on the Cheap

In case you’ve missed the memo, 2016 is the year of virtual reality headsets in the PC gaming space, and both NVIDIA and AMD are pushing the concept hard. From a market perspective VR is seen as the “next great thing,” but more importantly from a technical perspective, VR demands much better GPU performance, and those performance requirements are only going to skyrocket as VR headsets get better. Today’s 2160x1200 VR headsets already require 233MPix/second rendered, and future headsets that operate at higher resolutions and refresh rates are likely to push that to 1GPix/second, if not higher. Consequently if VR takes off with the broader public, it’s going to be a gold rush for AMD and NVIDIA, but it also means that to get to 1GPix/second, they need to pull out all of the stops to deliver better performance.

This brings us to Pascal’s final marquee feature: Simultaneous Multi-Projection (SMP). Although its applications are more involved than just VR – and we’ll cover those more in a bit – VR is the most immediate and applicable use case for the technology.

So just what is SMP? To answer that question, we first need to take a short step back one generation to Maxwell 2. With Maxwell 2, NVIDIA introduced a feature called Multi-Projection Acceleration (MPA) as part of their larger emphasis on voxel acceleration. With MPA, Maxwell 2 could replay the scene geometry to up to 6 viewports in a single pass, essentially reusing the geometry. The benefit of this technology was that instead of having to setup the scene geometry 6 times, Maxwell 2 could save significant time and resources by only doing it once. This was one of the keys in making voxel acceleration practical, as the very nature of the 6 sided voxel meant that it would otherwise be redoing a lot of work.

Simultaneous Multi-Projection then can be thought of as Multi-Projection Acceleration grown up. The fundamental idea is still the same – replay geometry across multiple viewports for efficiency reasons – but rather than a cool hack, it’s now a fully-fledged out and far more flexible feature. Whereas MPA had a much more limited number of viewports and only supported fixed 90 degree angles – a result of the neat sign bit hack NVIDIA used to make it work – SMP supports a much larger number of viewports and arbitrary angles, making it useful for much more than just voxels and other cubic data structures.

SMP in turn is a function of the new PolyMorph Engine 4.0, one of the few graphical subsystems of Pascal to receive a feature update versus Maxwell 2. NVIDIA’s slide on the matter is especially helpful here, showing where SMP fits into the standard rendering workflow. After all of the geometry work is done – triangle setup and any tessellation or vertex shading – SMP can then step in and reproject the geometry as desired before being sent out to rasterization to pixels.

How NVIDIA is doing this so efficiently is their secret sauce for now, but I’m told that the resource cost of using SMP is miniscule. What I do know is that with Pascal and the PolyMorph Engine 4.0, the rasterizer is being called "quasi-programmable," so there is some new flexibility in there NVIDIA is exploiting for SMP.

Under the hood, SMP combines two slightly different but closely related features. The first of course is geometry reprojection; SMP can reproject geometry to up to 16 viewports. Each viewport can, in turn, be set to an arbitrary angle, varying in both tilt and rotation.

The second feature is that SMP can also reproject geometry around a second viewpoint. This is slightly different from basic geometry reprojection as we’re not just adjusting the angle of the view, but the view is being shifted entirely. In this case the view can be shifted along the X-axis, allowing for a second viewpoint to be cheaply created without actually setting up the geometry twice.

As for why you’d want to generate two viewpoints, the big use case is virtual reality. VR requires two viewpoints, one for each eye. Without SMP, this requires doing a full geometry pass twice, once for each eye. But with SMP, this is reduced to a single geometry pass.

Overall, SMP exists as an efficiency measure. There is technically nothing it can do that couldn’t be done without SMP – GPUs are flexible enough without it – however the scenarios SMP is envisioned for are all about executing them more efficiently by skipping a geometry and or/compute shader passes.

The actual efficiency gains, in turn, will depend on where the bottlenecks are and how much geometry setup is being avoided by reprojecting it. In the extreme case, 2 viewpoints combined with 16 viewports would allow geometry setup to happen a single time, versus 32 times in a naive setup. But that said, to go back to our VR example, geometry reprojection on its own doesn’t eliminate the need to generate pixels; a straightforward rendering pipeline still requires shading and rendering 233MPix every second. So SMP’s geometry reprojection abilities are most potent when it’s geometry that’s the bottleneck, which at least historically has not been the case for NVIDIA GPUs.

With all of that said, SMP is a fairly broad-reaching technology, and NVIDIA is in a sense chomping at the bit to find good ways to put it to use. The immediate geometry efficiency gains aside, the company has several different ideas on the table on how to use the technology. This include some novel uses that allow geometry reprojection to either replace compute shader tasks or otherwise alter the rendering pipeline, allowing for reduced pixel workloads, amplifying the total performance impact of SMP.

When it comes to VR, NVIDIA has two SMP-powered technologies that they are making available to developers. The first, dubbed Single Pass Stereo, is essentially the full implementation of the above VR scenario. Besides using SMP to reproject the scene geometry across multiple viewpoints and viewports, Single Pass Stereo also encompasses optimizations at the scene submission and driver/OS stage. In this case, developers using Single Pass Stereo need only submit the scene once, and the driver will take care of setting up the second instance for the second eye. Maxwell 2 already supported the application-side optimizations, as the CPU benefits of the scene submission optimization alone can be quite significant, but that architecture still required the GPU to setup the geometry twice. However with Pascal this has been bundled with SMP so that not only is a scene only submitted to the driver once, but the GPU also only has to setup the geometry once.

The other VR-centric technology being exposed to developers is what NVIDIA calls Lens Matched Shading, and this is one of those more novel uses where SMP’s geometry reprojection can be used to avoid pixel shading work farther down the line. Lens Matched Shading is based around the physical properties of the lenses in a VR headset, which because they warp the view coming out of them, requires the OLED screen in a VR headset to be fed an oppositely warped view. In practice, Lens Matched Shading is the successor to NVIDIA’s earlier Multi-Res Shading technology for Maxwell 2, which tried something similar within the greater limitations of the Maxwell 2 architecture.

Briefly, in a naïve rendering implementation, warping an image for a VR headset is done in a compute shader. Due to the optical properties of the lenses, the edges of the warped image contain less detail than the center of the lens. However in a straightforward flat projection, the entire frame must be rendered to be correctly warped. In practice this means that the edges are unnecessarily oversampled, wasting rendering resources on detail that will never be seen.

Lens Matched Shading in turn uses SMP to subdivide each eye into 4 viewports (or as NVJDIA calls them, quadrants), in an effort to mimic the shape of the lens. Done correctly, this reduces the number of pixels that need to be drawn because the combined viewports more closely match the desired warped image. In NVIDIA’s in-house developed Barbarian demo, they were able to reduce the number of pixels drawn per frame per eye from 2.1Mpix to 1.4Mpix, a 50% reduction in the number of pixels rendered. This is still more pixels than a perfect implementation – where only 1.1Mpix are required – but it none the less represents a significant decrease in the pixel rendering workload as an indirect result of SMP.

This is also why you’ll occasionally see NVIDIA touting the VR performance gains of various Pascal-powered video cards as being far greater than the raw increase in rendering hardware. In these cases NVIDIA is factoring in the expected performance gains from using SMP and Lens Matched Shading to reduce the rendering workload relative to an optimized implementation.

Moving on, the other major display optimization scenario NVIDIA is pushing with SMP is centered around traditional 2D displays. With curved displays or multi-monitor setups where the displays are angled to emulate a curved display, a flat projection is technically incorrect relative to the viewer. What the viewer should be seeing is essentially a wider field of view mapped to the display setup.

With most games this problem isn’t corrected for, as doing so would be too expensive. With a single viewport the only option is to render the scene at a very high resolution and then use a compute shader to warp it to the screen(s), invoking the overdraw problems mentioned above with VR. More practically, the scene could be rendered once for each monitor, avoiding the overdraw, but then you instead have the overhead of rendering a scene multiple times.

So for Pascal NVIDIA is introducing a 2D display feature they’re calling Perspective Surround. As you can most likely guess from the lead-up to this feature, Perspective Surround uses SMP’s geometry reprojection capabilities to efficiently create multiple viewports to get around the overdraw issues. In this case NVIDIA uses a projection for each monitor (e.g. 3 projections) in order to render a perspective-correct view on each monitor.

Like SMP’s VR features, Perspective Surround is a feature that requires developers to code specifically for it, so it can’t universally be enabled for all multi-monitor setups. Instead developers will need to go through NVIDIA’s respective SMP API in order to tell the GPU how to properly setup the scene.

Preemption Improved: Fine-Grained Preemption for Time-Critical Tasks Display Matters: New Display Controller, HDR, & HEVC
Comments Locked

200 Comments

View All Comments

  • Ninhalem - Wednesday, July 20, 2016 - link

    There's 32 freaking pages in this review. Maybe people have other jobs instead of writing all day long. Did you ever think of that?

    I'll take quality and a long publishing time over crap and rushing out the door.
  • Stuka87 - Wednesday, July 20, 2016 - link

    Thanks for the extremely in depth review Ryan!
  • cknobman - Wednesday, July 20, 2016 - link

    I cannot help feel just a bit underwhelmed.

    Of course these Nvidia cards kick some major butt in games that have always favored Nvidia but I noticed that in games not specifically coded to take advantage of Nvidia and furthermore games with DX12 that these cards performance advantage is minimal at best vs an old Fury X with half the video RAM.

    Then when you take into account Vulcan API and newer DX12 games (which can be found elsewhere) you see that the prices for these cards is a tad ridiculous and the performance advantage starts to melt away.

    I am waiting for AMD to release their next "big gun" before I make a purchase decision.
    I'm rocking a 4k monitor right now and 60fps at that resolution is my target.
  • nathanddrews - Wednesday, July 20, 2016 - link

    1080 is close to being that 4K60 card, but can't quite cut it. I'm waiting for "Big Vega" vs 1080Ti before dropping any money.
  • lefty2 - Wednesday, July 20, 2016 - link

    Great review - one of the few that highlights the fact the Pascal async compute is only half as good as AMD's version. Async compute is a key feature for increasing performance in DX12 and Vulkan and that's going to allow the RX 480 to perform well against the GTX 1060
  • Daniel Egger - Wednesday, July 20, 2016 - link

    "... why the memory controller organization of GP104 is 8x32b instead of 4x64b like GM204"

    Sounds like it's the other way around.
  • Ryan Smith - Wednesday, July 20, 2016 - link

    No, that's correct. 8 32bit wide controllers rather than 4 64bit wide controllers.

    http://images.anandtech.com/doci/10325/GeForce_GTX...

    http://images.anandtech.com/doci/8526/GeForce_GTX_...
  • DominionSeraph - Wednesday, July 20, 2016 - link

    >It has taken about 2 years longer than we’d normally see

    ... for a review of a flagship card to come out
  • sgeocla - Wednesday, July 20, 2016 - link

    The old Maxwell was so optimized it was always full and didn't even need Async Compute. The new Pascal is so much more optimized that it even has time to create the "holes" in execution (not counting the ones in your pocket) that were "missing" in the old architecture to be able to benefit for Async Compute. Expect Volta to create even more holes (with hardware support) for Async Compute to fill.
  • tipoo - Wednesday, July 20, 2016 - link

    That's demonstrably untrue.

    http://www.futuremark.com/pressreleases/a-closer-l...

    Plenty of holes that could have been filled in Maxwell.

Log in

Don't have an account? Sign up now