Simultaneous Multi-Projection: Reusing Geometry on the Cheap

In case you’ve missed the memo, 2016 is the year of virtual reality headsets in the PC gaming space, and both NVIDIA and AMD are pushing the concept hard. From a market perspective VR is seen as the “next great thing,” but more importantly from a technical perspective, VR demands much better GPU performance, and those performance requirements are only going to skyrocket as VR headsets get better. Today’s 2160x1200 VR headsets already require 233MPix/second rendered, and future headsets that operate at higher resolutions and refresh rates are likely to push that to 1GPix/second, if not higher. Consequently if VR takes off with the broader public, it’s going to be a gold rush for AMD and NVIDIA, but it also means that to get to 1GPix/second, they need to pull out all of the stops to deliver better performance.
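The 233MPix/second figure can be checked with quick arithmetic. The sketch below assumes a 90Hz refresh rate, which is not stated above, and `pixel_rate` is a hypothetical helper name used only for illustration.

```python
def pixel_rate(width, height, refresh_hz):
    """Pixels that must be rendered per second at a given resolution and refresh rate."""
    return width * height * refresh_hz


# 2160x1200 combined panel resolution, at an assumed 90Hz refresh rate
rate = pixel_rate(2160, 1200, 90)
print(f"{rate / 1e6:.0f} MPix/s")
```

Under the same model, reaching 1GPix/second takes roughly 4.3 times as many pixels per second, whether that comes from higher resolution, higher refresh rates, or both.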

This brings us to Pascal’s final marquee feature: Simultaneous Multi-Projection (SMP). Although its applications are more involved than just VR – and we’ll cover those more in a bit – VR is the most immediate and applicable use case for the technology.

So just what is SMP? To answer that question, we first need to take a short step back one generation to Maxwell 2. With Maxwell 2, NVIDIA introduced a feature called Multi-Projection Acceleration (MPA) as part of their larger emphasis on voxel acceleration. With MPA, Maxwell 2 could replay the scene geometry to up to 6 viewports in a single pass, essentially reusing the geometry. The benefit of this technology was that instead of having to set up the scene geometry 6 times, Maxwell 2 could save significant time and resources by only doing it once. This was one of the keys in making voxel acceleration practical, as the very nature of the 6-sided voxel meant that it would otherwise be redoing a lot of work.

Simultaneous Multi-Projection then can be thought of as Multi-Projection Acceleration grown up. The fundamental idea is still the same – replay geometry across multiple viewports for efficiency reasons – but rather than a cool hack, it’s now a fully fleshed-out and far more flexible feature. Whereas MPA had a much more limited number of viewports and only supported fixed 90 degree angles – a result of the neat sign bit hack NVIDIA used to make it work – SMP supports a much larger number of viewports and arbitrary angles, making it useful for much more than just voxels and other cubic data structures.

SMP in turn is a function of the new PolyMorph Engine 4.0, one of the few graphical subsystems of Pascal to receive a feature update versus Maxwell 2. NVIDIA’s slide on the matter is especially helpful here, showing where SMP fits into the standard rendering workflow. After all of the geometry work is done – triangle setup and any tessellation or vertex shading – SMP can then step in and reproject the geometry as desired before it is sent on to be rasterized into pixels.

How NVIDIA is doing this so efficiently is their secret sauce for now, but I’m told that the resource cost of using SMP is minuscule. What I do know is that with Pascal and the PolyMorph Engine 4.0, the rasterizer is being called "quasi-programmable," so there is some new flexibility in there NVIDIA is exploiting for SMP.

Under the hood, SMP combines two slightly different but closely related features. The first of course is geometry reprojection; SMP can reproject geometry to up to 16 viewports. Each viewport can, in turn, be set to an arbitrary angle, varying in both tilt and rotation.
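To make the idea of replaying geometry concrete, here is a minimal software sketch. It is not how the PolyMorph Engine works internally, which NVIDIA has not disclosed. The function names (`rotate`, `replay_geometry`), the yaw/pitch parameterization and the sign conventions are all assumptions made for illustration; only the 16 viewport limit comes from the text above.

```python
import math

MAX_VIEWPORTS = 16  # per-pass viewport limit quoted in the text for SMP


def rotate(vertex, yaw_deg, pitch_deg):
    """Rotate a view-space point by yaw (about Y), then pitch (about X)."""
    x, y, z = vertex
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    x, z = (x * math.cos(yaw) + z * math.sin(yaw),
            -x * math.sin(yaw) + z * math.cos(yaw))
    y, z = (y * math.cos(pitch) - z * math.sin(pitch),
            y * math.sin(pitch) + z * math.cos(pitch))
    return (x, y, z)


def replay_geometry(view_space_vertices, viewports):
    """Reuse vertices that were set up once, producing one copy per viewport.

    viewports is a list of (yaw_deg, pitch_deg) pairs, a hypothetical stand-in
    for each viewport's rotation and tilt.
    """
    if len(viewports) > MAX_VIEWPORTS:
        raise ValueError("at most 16 viewports per pass")
    return [[rotate(v, yaw, pitch) for v in view_space_vertices]
            for yaw, pitch in viewports]


vertices = [(0.0, 0.0, -1.0)]  # geometry that has already been set up once
per_viewport = replay_geometry(vertices, [(0, 0), (90, 0), (0, 45)])
```

The point of the sketch is the shape of the work: the expensive step (producing `vertices`) happens once, and each additional viewport only costs a cheap per-vertex transform.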

The second feature is that SMP can also reproject geometry around a second viewpoint. This is slightly different from basic geometry reprojection as we’re not just adjusting the angle of the view, but the view is being shifted entirely. In this case the view can be shifted along the X-axis, allowing for a second viewpoint to be cheaply created without actually setting up the geometry twice.

As for why you’d want to generate two viewpoints, the big use case is virtual reality. VR requires two viewpoints, one for each eye. Without SMP, this requires doing a full geometry pass twice, once for each eye. But with SMP, this is reduced to a single geometry pass.
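The second-viewpoint case can be sketched the same way. The code below assumes a simple pinhole camera looking down the negative Z axis and a hypothetical eye separation of 0.064 (in scene units); neither detail comes from NVIDIA. It only illustrates that once the geometry is in view space, the two eyes differ by nothing more than an X-axis shift.

```python
def project(vertex, focal=1.0):
    """Pinhole perspective projection of a view-space point (camera looks down -z)."""
    x, y, z = vertex
    return (focal * x / -z, focal * y / -z)


def stereo_project(view_space_vertices, eye_separation):
    """Project geometry that was transformed once into a left and a right eye view."""
    half = eye_separation / 2.0
    left, right = [], []
    for x, y, z in view_space_vertices:
        # The left eye sits at -half on X, so points appear shifted by +half to it.
        left.append(project((x + half, y, z)))
        right.append(project((x - half, y, z)))
    return left, right


verts = [(0.0, 0.0, -2.0), (0.5, 0.25, -4.0)]  # near point, far point
left, right = stereo_project(verts, eye_separation=0.064)
```

As expected for stereo rendering, the horizontal disparity between the two views is larger for the near point than for the far one, while the vertical coordinates are identical for both eyes.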

Overall, SMP exists as an efficiency measure. There is technically nothing it can do that couldn’t be done without SMP – GPUs are flexible enough without it – however, the scenarios SMP is envisioned for are all about executing work more efficiently by skipping geometry and/or compute shader passes.

The actual efficiency gains, in turn, will depend on where the bottlenecks are and how much geometry setup is being avoided by reprojecting it. In the extreme case, 2 viewpoints combined with 16 viewports would allow geometry setup to happen a single time, versus 32 times in a naive setup. But that said, to go back to our VR example, geometry reprojection on its own doesn’t eliminate the need to generate pixels; a straightforward rendering pipeline still requires shading and rendering 233MPix every second. So SMP’s geometry reprojection abilities are most potent when it’s geometry that’s the bottleneck, which at least historically has not been the case for NVIDIA GPUs.
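A toy cost model shows why the payoff depends so heavily on the bottleneck. It assumes frame time is simply geometry time per pass multiplied by the number of passes, plus a fixed pixel cost. Real GPUs overlap these stages, so this is only a rough illustration, and all of the millisecond figures are hypothetical.

```python
def frame_time_ms(geometry_ms, pixel_ms, geometry_passes):
    """Additive toy model: repeated geometry setup plus a fixed pixel workload."""
    return geometry_ms * geometry_passes + pixel_ms


def smp_speedup(geometry_ms, pixel_ms, naive_passes):
    """Speedup from collapsing naive_passes geometry passes into a single pass."""
    naive = frame_time_ms(geometry_ms, pixel_ms, naive_passes)
    reused = frame_time_ms(geometry_ms, pixel_ms, 1)
    return naive / reused


# Two-eye VR case with made-up timings
geometry_bound = smp_speedup(geometry_ms=4.0, pixel_ms=1.0, naive_passes=2)
pixel_bound = smp_speedup(geometry_ms=0.5, pixel_ms=10.0, naive_passes=2)

# Extreme case from the text: 2 viewpoints x 16 viewports, with no pixel cost at all
best_case = smp_speedup(geometry_ms=1.0, pixel_ms=0.0, naive_passes=2 * 16)
```

With these made-up numbers the geometry-bound frame speeds up by 1.8x, while the pixel-bound frame gains less than 5%. The 32x figure is only reachable when pixel work is negligible.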

With all of that said, SMP is a fairly broad-reaching technology, and NVIDIA is in a sense chomping at the bit to find good ways to put it to use. The immediate geometry efficiency gains aside, the company has several different ideas on the table on how to use the technology. These include some novel uses that allow geometry reprojection to either replace compute shader tasks or otherwise alter the rendering pipeline, allowing for reduced pixel workloads, amplifying the total performance impact of SMP.

When it comes to VR, NVIDIA has two SMP-powered technologies that they are making available to developers. The first, dubbed Single Pass Stereo, is essentially the full implementation of the above VR scenario. Besides using SMP to reproject the scene geometry across multiple viewpoints and viewports, Single Pass Stereo also encompasses optimizations at the scene submission and driver/OS stage. In this case, developers using Single Pass Stereo need only submit the scene once, and the driver will take care of setting up the second instance for the second eye. Maxwell 2 already supported the application-side optimizations, as the CPU benefits of the scene submission optimization alone can be quite significant, but that architecture still required the GPU to set up the geometry twice. However with Pascal this has been bundled with SMP so that not only is a scene only submitted to the driver once, but the GPU also only has to set up the geometry once.

The other VR-centric technology being exposed to developers is what NVIDIA calls Lens Matched Shading, and this is one of those more novel uses where SMP’s geometry reprojection can be used to avoid pixel shading work farther down the line. Lens Matched Shading is based around the physical properties of the lenses in a VR headset, which, because they warp the view coming out of them, require the OLED screen in a VR headset to be fed an oppositely warped view. In practice, Lens Matched Shading is the successor to NVIDIA’s earlier Multi-Res Shading technology for Maxwell 2, which tried something similar within the greater limitations of the Maxwell 2 architecture.

Briefly, in a naïve rendering implementation, warping an image for a VR headset is done in a compute shader. Due to the optical properties of the lenses, the edges of the warped image contain less detail than the center. However in a straightforward flat projection, the entire frame must be rendered at full detail in order to be correctly warped. In practice this means that the edges are unnecessarily oversampled, wasting rendering resources on detail that will never be seen.

Lens Matched Shading in turn uses SMP to subdivide each eye into 4 viewports (or as NVIDIA calls them, quadrants), in an effort to mimic the shape of the lens. Done correctly, this reduces the number of pixels that need to be drawn because the combined viewports more closely match the desired warped image. In NVIDIA’s in-house developed Barbarian demo, they were able to reduce the number of pixels drawn per frame per eye from 2.1Mpix to 1.4Mpix, a 33% reduction in the number of pixels rendered. This is still more pixels than a perfect implementation – where only 1.1Mpix are required – but it nonetheless represents a significant decrease in the pixel rendering workload as an indirect result of SMP.
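For reference, here is the arithmetic behind the Barbarian demo figures, using only the three pixel counts quoted above. Going from 2.1Mpix to 1.4Mpix removes roughly a third of the pixels, which is the same thing as fitting 1.5x as many frames into a fixed pixel budget. The variable names are illustrative.

```python
naive_mpix = 2.1   # flat projection, per frame per eye
lms_mpix = 1.4     # with Lens Matched Shading
ideal_mpix = 1.1   # perfect match to the warped image

reduction = 1 - lms_mpix / naive_mpix          # fraction of pixels no longer drawn
throughput_gain = naive_mpix / lms_mpix        # frames per fixed pixel budget
remaining_overhead = lms_mpix / ideal_mpix - 1 # excess over the ideal case

print(f"pixels saved: {reduction:.0%}")
print(f"throughput gain: {throughput_gain:.2f}x")
print(f"still above ideal by: {remaining_overhead:.0%}")
```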

This is also why you’ll occasionally see NVIDIA touting the VR performance gains of various Pascal-powered video cards as being far greater than the raw increase in rendering hardware. In these cases NVIDIA is factoring in the expected performance gains from using SMP and Lens Matched Shading to reduce the rendering workload relative to an unoptimized implementation.

Moving on, the other major display optimization scenario NVIDIA is pushing with SMP is centered around traditional 2D displays. With curved displays or multi-monitor setups where the displays are angled to emulate a curved display, a flat projection is technically incorrect relative to the viewer. What the viewer should be seeing is essentially a wider field of view mapped to the display setup.
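One way to see the problem with a single flat projection is to look at how viewing angles map onto a flat image plane. For a pinhole projection, a ray at angle θ from the view direction lands at a distance proportional to tan(θ) from the center. The sketch below uses that standard relationship; the specific angles are arbitrary examples, and `flat_screen_x` is a hypothetical helper name.

```python
import math


def flat_screen_x(angle_deg, focal=1.0):
    """Horizontal position on a flat image plane for a ray at angle_deg off-axis."""
    return focal * math.tan(math.radians(angle_deg))


# Width of image plane covered by the same 10 degrees of view, center versus edge
center_span = flat_screen_x(10) - flat_screen_x(0)
edge_span = flat_screen_x(60) - flat_screen_x(50)

print(f"center: {center_span:.3f}, edge: {edge_span:.3f}")
```

The same 10 degree slice of the scene takes up roughly three times as much of the image plane near the edge of a wide flat projection as it does in the center. That is why a wide field of view stretched across angled monitors looks distorted, and why the edges of a flat render end up with more pixels than they need.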

With most games this problem isn’t corrected for, as doing so would be too expensive. With a single viewport the only option is to render the scene at a very high resolution and then use a compute shader to warp it to the screen(s), invoking the overdraw problems mentioned above with VR. More practically, the scene could be rendered once for each monitor, avoiding the overdraw, but then you instead have the overhead of rendering a scene multiple times.

So for Pascal NVIDIA is introducing a 2D display feature they’re calling Perspective Surround. As you can most likely guess from the lead-up to this feature, Perspective Surround uses SMP’s geometry reprojection capabilities to efficiently create multiple viewports to get around the overdraw issues. In this case NVIDIA uses a projection for each monitor (e.g. 3 projections) in order to render a perspective-correct view on each monitor.

Like SMP’s VR features, Perspective Surround is a feature that requires developers to code specifically for it, so it can’t universally be enabled for all multi-monitor setups. Instead developers will need to go through NVIDIA’s SMP API in order to tell the GPU how to properly set up the scene.


200 Comments


  • Robalov - Tuesday, July 26, 2016 - link

    Feels like it took 2 years longer than normal for this review :D
  • extide - Wednesday, July 27, 2016 - link

    The venn diagram is wrong -- for GP104 it says 1:64 speed for FP16 -- it is actually 1:1 for FP16 (ie same speed as FP32) (NOTE: GP100 has 2:1 FP16 -- meaning FP16 is twice as fast as FP32)
  • extide - Wednesday, July 27, 2016 - link

    EDIT: I might be incorrect about this actually as I have seen information claiming both .. weird.
  • mxthunder - Friday, July 29, 2016 - link

    its really driving me nuts that a 780 was used instead of a 780ti.
  • yhselp - Monday, August 8, 2016 - link

    Have I understood correctly that Pascal offers a 20% increase in memory bandwidth from delta color compression over Maxwell? As in a total average of 45% over Kepler just from color compression?
  • flexy - Sunday, September 4, 2016 - link

    Sorry, late comment. I just read about GPU Boost 3.0 and this is AWESOME. What they did, is expose what previously was only doable with bios modding - eg assigning the CLK bins different voltages. The problem with overclocking Kepler/Maxwell was NOT so much that you got stuck with the "lowest" overclock as the article says, but that simply adding a FIXED amount of clocks across the entire range of clocks, as you would do with Afterburner etc. where you simply add, say +120 to the core. What happened here is that you may be "stable" at the max overclock (CLK bin), but since you added more CLKs to EVERY clock bin, the assigned voltages (in the BIOS) for each bin might not be sufficient. Say you have CLK bin 63 which is set to 1304Mhz in a stock bios. Now you use Afterburner and add 150 Mhz, now all of a sudden this bin amounts to 1454Mhz BUT STILL at the same voltage as before, which is too low for 1454Mhz. You had to manually edit the table in the BIOS to shift clocks around, especially since not all Maxwell cards allowed adding voltage via software.
  • Ether.86 - Tuesday, November 1, 2016 - link

    Astonishing review. That's the way Anandtech should be not like the mobile section which sucks...
  • Warsun - Tuesday, January 17, 2017 - link

    Yeah looking at the bottom here.The GTX 1070 is on the same level as a single 480 4GB card.So that graph is wrong.
    http://www.hwcompare.com/30889/geforce-gtx-1070-vs...
    Remember this is from GPU-Z based on hardware specs.No amount of configurations in the Drivers changes this.They either screwed up i am calling shenanigans.
  • marceloamaral - Thursday, April 13, 2017 - link

    Nice Ryan Smith! But, my question is, is it truly possible to share the GPU with different workloads in the P100? I've read in the NVIDIA manual that "The GPU has a time sliced scheduler to schedule work from work queues belonging to different CUDA contexts. Work launched to the compute engine from work queues belonging to different CUDA contexts cannot execute concurrently."
