Simultaneous Multi-Projection: Reusing Geometry on the Cheap

In case you’ve missed the memo, 2016 is the year of virtual reality headsets in the PC gaming space, and both NVIDIA and AMD are pushing the concept hard. From a market perspective VR is seen as the “next great thing,” but more importantly from a technical perspective, VR demands much better GPU performance, and those performance requirements are only going to skyrocket as VR headsets improve. Today’s 2160x1200 headsets, refreshing at 90Hz, already require 233MPix/second rendered, and future headsets operating at higher resolutions and refresh rates are likely to push that to 1GPix/second, if not higher. Consequently, if VR takes off with the broader public it’s going to be a gold rush for AMD and NVIDIA, but to get to 1GPix/second they will need to pull out all of the stops to deliver better performance.
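
As a quick sanity check on those figures, pixel throughput is simply resolution times refresh rate. A minimal sketch; the future headset’s resolution and refresh rate here are our own illustrative assumptions, not any announced spec:

```cpp
// Pixel-rate arithmetic behind the figures above: resolution x refresh rate.
#include <cstdio>

int main() {
    // Today: 2160x1200 across both eyes at 90 Hz (Rift/Vive class hardware)
    double today = 2160.0 * 1200.0 * 90.0;
    // Hypothetical future headset: 4000x2000 at 120 Hz (assumed, not a spec)
    double future = 4000.0 * 2000.0 * 120.0;
    std::printf("Today:  %.0f MPix/s\n", today / 1e6);   // ~233 MPix/s
    std::printf("Future: %.2f GPix/s\n", future / 1e9);  // ~0.96 GPix/s
    return 0;
}
```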

This brings us to Pascal’s final marquee feature: Simultaneous Multi-Projection (SMP). Although its applications are more involved than just VR – and we’ll cover those more in a bit – VR is the most immediate and applicable use case for the technology.

So just what is SMP? To answer that question, we first need to take a short step back one generation to Maxwell 2. With Maxwell 2, NVIDIA introduced a feature called Multi-Projection Acceleration (MPA) as part of their larger emphasis on voxel acceleration. With MPA, Maxwell 2 could replay scene geometry to up to 6 viewports in a single pass, essentially reusing the geometry. The benefit of this technology was that instead of having to set up the scene geometry 6 times, Maxwell 2 could save significant time and resources by doing it only once. This was one of the keys to making voxel acceleration practical, as the very nature of a 6-sided voxel meant that a lot of work would otherwise be redone.
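
To illustrate the kind of work being saved, here is a conceptual sketch of the replay idea. The function names and stand-in math are ours for illustration, not NVIDIA’s implementation: the point is simply that the expensive per-vertex work runs once, while only the cheap projection step is repeated per face.

```cpp
// Conceptual sketch of multi-projection replay for the 6 faces of a voxel.
#include <array>
#include <vector>

struct Vertex { float x, y, z; };
struct View   { float sx, sy, sz; };  // one of the 6 axis-aligned directions

// Stand-ins for the expensive and cheap halves of the vertex pipeline.
Vertex runVertexShader(const Vertex& v) { return {v.x, v.y, v.z}; }
Vertex projectTo(const Vertex& v, const View& f) {
    return {v.x * f.sx, v.y * f.sy, v.z * f.sz};  // stand-in projection
}

void renderVoxelFaces(const std::vector<Vertex>& mesh,
                      const std::array<View, 6>& faces) {
    for (const Vertex& v : mesh) {
        Vertex shaded = runVertexShader(v);        // expensive: done once
        for (const View& face : faces) {
            Vertex clip = projectTo(shaded, face); // cheap replay: 6 times
            (void)clip;  // ...rasterize into that face's viewport...
        }
    }
}

int main() {
    std::vector<Vertex> mesh = {{0, 0, 0}, {1, 0, 0}, {0, 1, 0}};
    std::array<View, 6> faces = {{
        {+1, 0, 0}, {-1, 0, 0}, {0, +1, 0}, {0, -1, 0}, {0, 0, +1}, {0, 0, -1}
    }};
    renderVoxelFaces(mesh, faces);
    return 0;
}
```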

Simultaneous Multi-Projection, then, can be thought of as Multi-Projection Acceleration grown up. The fundamental idea is still the same – replay geometry across multiple viewports for efficiency reasons – but rather than a cool hack, it’s now a fully fledged and far more flexible feature. Whereas MPA offered a much more limited number of viewports and only supported fixed 90 degree angles – a result of the neat sign bit hack NVIDIA used to make it work – SMP supports a much larger number of viewports at arbitrary angles, making it useful for much more than just voxels and other cubic data structures.

SMP in turn is a function of the new PolyMorph Engine 4.0, one of the few graphical subsystems of Pascal to receive a feature update versus Maxwell 2. NVIDIA’s slide on the matter is especially helpful here, showing where SMP fits into the standard rendering workflow. After all of the geometry work is done – triangle setup and any tessellation or vertex shading – SMP can then step in and reproject the geometry as desired before the results are sent on to be rasterized into pixels.

How NVIDIA is doing this so efficiently is their secret sauce for now, but I’m told that the resource cost of using SMP is minuscule. What I do know is that with Pascal and the PolyMorph Engine 4.0, the rasterizer is being called "quasi-programmable," so there is some new flexibility in there that NVIDIA is exploiting for SMP.

Under the hood, SMP combines two slightly different but closely related features. The first of course is geometry reprojection; SMP can reproject geometry to up to 16 viewports. Each viewport can, in turn, be set to an arbitrary angle, varying in both tilt and rotation.

The second feature is that SMP can also reproject geometry around a second viewpoint. This is slightly different from basic geometry reprojection as we’re not just adjusting the angle of the view, but the view is being shifted entirely. In this case the view can be shifted along the X-axis, allowing for a second viewpoint to be cheaply created without actually setting up the geometry twice.
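
Putting the two features together, a rough sketch of the state SMP lets a developer describe for a single geometry pass might look like the following. The structure and field names are our own illustration, not NVIDIA’s API:

```cpp
// Illustrative data layout for one SMP-style geometry pass: up to 16
// viewports at arbitrary angles, optionally replayed around a second
// viewpoint that is shifted along the X axis only.
#include <cstddef>

struct SmpViewport {
    float tiltDeg;      // tilt relative to the base projection plane
    float rotationDeg;  // rotation of the projection plane
};

struct SmpConfig {
    static constexpr std::size_t kMaxViewports = 16;
    SmpViewport viewports[kMaxViewports] = {};
    std::size_t viewportCount = 1;
    bool  secondViewpoint  = false;  // replay everything around a 2nd eye
    float viewpointOffsetX = 0.0f;   // shift along X only
};

int main() {
    SmpConfig cfg;
    cfg.viewportCount    = 4;        // e.g. the 4 quadrants used later by LMS
    cfg.viewports[0]     = {10.0f, 0.0f};
    cfg.secondViewpoint  = true;     // stereo: replay for the second eye
    cfg.viewpointOffsetX = 0.064f;   // illustrative eye separation in meters
    return 0;
}
```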

As for why you’d want to generate two viewpoints, the big use case is virtual reality. VR requires two viewpoints, one for each eye. Without SMP, this requires doing a full geometry pass twice, once for each eye. But with SMP, this is reduced to a single geometry pass.
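
To see why stereo is such a natural fit, note that the two eye views share all of their geometry and differ only by a small translation along X: half the interpupillary distance (IPD) in each direction. A minimal sketch, assuming a row-major 4x4 view matrix and a typical 64mm IPD:

```cpp
// Stereo eye views as two X-translations of a shared view matrix.
#include <array>

using Mat4 = std::array<float, 16>;  // row-major 4x4

Mat4 translateX(const Mat4& view, float dx) {
    Mat4 m = view;
    m[3] += dx;  // add dx to the X-translation term (row 0, column 3)
    return m;
}

int main() {
    const float ipd = 0.064f;  // 64 mm, a typical adult IPD
    Mat4 center   = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1};  // identity view
    Mat4 leftEye  = translateX(center, +ipd / 2);
    Mat4 rightEye = translateX(center, -ipd / 2);
    // Everything else about the scene geometry is identical for both eyes,
    // which is what lets SMP replay one geometry pass for the second view.
    (void)leftEye; (void)rightEye;
    return 0;
}
```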

Overall, SMP exists as an efficiency measure. There is technically nothing it can do that couldn’t be done without SMP – GPUs are flexible enough without it – however the scenarios SMP is envisioned for are all about executing them more efficiently by skipping geometry and/or compute shader passes.

The actual efficiency gains, in turn, will depend on where the bottlenecks are and how much geometry setup is being avoided by reprojecting it. In the extreme case, 2 viewpoints combined with 16 viewports would allow geometry setup to happen a single time, versus 32 times in a naive setup. But that said, to go back to our VR example, geometry reprojection on its own doesn’t eliminate the need to generate pixels; a straightforward rendering pipeline still requires shading and rendering 233MPix every second. So SMP’s geometry reprojection abilities are most potent when it’s geometry that’s the bottleneck, which at least historically has not been the case for NVIDIA GPUs.
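
The bounding case, as simple arithmetic:

```cpp
// The extreme case described above: 2 viewpoints x 16 viewports means a
// naive renderer sets up the same geometry 32 times, while SMP does it
// once. Note this says nothing about pixel work, which is unchanged.
#include <cstdio>

int main() {
    const int viewpoints = 2, viewports = 16;
    std::printf("naive geometry setups: %d\n", viewpoints * viewports);  // 32
    std::printf("with SMP:              %d\n", 1);
    return 0;
}
```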

With all of that said, SMP is a fairly broad-reaching technology, and NVIDIA is in a sense champing at the bit to find good ways to put it to use. The immediate geometry efficiency gains aside, the company has several different ideas on the table for how to use the technology. These include some novel uses that allow geometry reprojection to either replace compute shader tasks or otherwise alter the rendering pipeline, allowing for reduced pixel workloads and amplifying the total performance impact of SMP.

When it comes to VR, NVIDIA has two SMP-powered technologies that they are making available to developers. The first, dubbed Single Pass Stereo, is essentially the full implementation of the above VR scenario. Besides using SMP to reproject the scene geometry across multiple viewpoints and viewports, Single Pass Stereo also encompasses optimizations at the scene submission and driver/OS stage. In this case, developers using Single Pass Stereo need only submit the scene once, and the driver will take care of setting up the second instance for the second eye. Maxwell 2 already supported the application-side optimizations – the CPU benefits of the scene submission optimization alone can be quite significant – but that architecture still required the GPU to set up the geometry twice. With Pascal, however, this has been bundled with SMP so that not only is a scene submitted to the driver once, but the GPU also only has to set up the geometry once.
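
In practice this means the application-side flow collapses from two submissions down to one. A hypothetical sketch of the difference follows; these function names are invented for illustration, as the real feature is exposed through NVIDIA’s VRWorks SDK rather than this API:

```cpp
// Hypothetical submission flow, before and after Single Pass Stereo.
#include <cstdio>

struct Scene {};
struct EyeParams { float offsetX; };

// Stand-in for a full scene submission to the driver.
void submitScene(const Scene&, const EyeParams& eye) {
    std::printf("scene submitted, eye offset %+.3f\n", eye.offsetX);
}

// Without Single Pass Stereo: the application submits everything twice.
void renderNaive(const Scene& s) {
    submitScene(s, {-0.032f});  // left eye
    submitScene(s, {+0.032f});  // right eye
}

// With Single Pass Stereo: submit once; the driver and SMP handle the rest.
void renderSinglePass(const Scene& s) {
    submitScene(s, {0.0f});  // GPU replays the geometry for both eyes
}

int main() {
    Scene s;
    renderNaive(s);
    renderSinglePass(s);
    return 0;
}
```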

The other VR-centric technology being exposed to developers is what NVIDIA calls Lens Matched Shading, and this is one of those more novel uses where SMP’s geometry reprojection can be used to avoid pixel shading work farther down the line. Lens Matched Shading is based around the physical properties of the lenses in a VR headset, which, because they warp the image passing through them, require the OLED screen in a VR headset to be fed an oppositely warped view. In practice, Lens Matched Shading is the successor to NVIDIA’s earlier Multi-Res Shading technology for Maxwell 2, which tried something similar within the greater limitations of the Maxwell 2 architecture.

Briefly, in a naïve rendering implementation, warping an image for a VR headset is done in a compute shader. Due to the optical properties of the lenses, the edges of the warped image contain less detail than the center. However in a straightforward flat projection, the entire frame must be rendered at full resolution to be correctly warped. In practice this means that the edges are unnecessarily oversampled, wasting rendering resources on detail that will never be seen.
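
The wasted work can be quantified: the local sample density of the warp is the derivative of the distortion function, and values well below 1 toward the edges mean many rendered pixels collapse into a single displayed pixel. A sketch using a generic polynomial barrel distortion; the coefficient is illustrative, not a real headset profile:

```cpp
// Why a flat render oversamples the edges: with a barrel pre-warp
// r' = r * (1 + k1*r^2), the sample density d(r')/dr = 1 + 3*k1*r^2
// falls off sharply toward the periphery.
#include <cstdio>

int main() {
    const double k1 = -0.3;  // illustrative barrel distortion coefficient
    for (double r = 0.0; r <= 1.0; r += 0.25) {
        double density = 1.0 + 3.0 * k1 * r * r;  // d(r')/dr
        std::printf("r=%.2f  sample density=%.2f\n", r, density);
    }
    return 0;  // at r=1 density is ~0.10: heavy oversampling at the edge
}
```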

Lens Matched Shading in turn uses SMP to subdivide each eye into 4 viewports (or as NVIDIA calls them, quadrants), in an effort to mimic the shape of the lens. Done correctly, this reduces the number of pixels that need to be drawn because the combined viewports more closely match the desired warped image. In NVIDIA’s in-house developed Barbarian demo, they were able to reduce the number of pixels drawn per frame per eye from 2.1MPix to 1.4MPix, a one-third reduction in the number of pixels rendered. This is still more pixels than a perfect implementation – where only 1.1MPix are required – but it nonetheless represents a significant decrease in the pixel rendering workload as an indirect result of SMP.
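
For the curious, the clip-space trick NVIDIA has described for this involves adding a linear term to w in each quadrant (w' = w + A*x + B*y, with the signs of A and B chosen per quadrant), which squeezes the periphery after the perspective divide. A sketch with illustrative coefficients:

```cpp
// Per-quadrant w-warp of the kind Lens Matched Shading has been described
// as using. Coefficients here are illustrative, not a shipping profile.
#include <cstdio>

struct Clip { float x, y, z, w; };

Clip lensMatch(Clip v, float a, float b) {
    v.w += a * v.x + b * v.y;  // tilt this quadrant's projection plane
    return v;
}

int main() {
    Clip v{0.8f, 0.6f, 0.5f, 1.0f};       // a point up and to the right
    // Upper-right quadrant: positive A and B grow w toward the periphery.
    Clip warped = lensMatch(v, 0.3f, 0.3f);
    std::printf("before: (%.3f, %.3f)\n", v.x / v.w, v.y / v.w);
    std::printf("after:  (%.3f, %.3f)\n",
                warped.x / warped.w, warped.y / warped.w);
    return 0;  // the post-divide position moves toward the center
}
```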

This is also why you’ll occasionally see NVIDIA touting the VR performance gains of various Pascal-powered video cards as being far greater than the raw increase in rendering hardware. In these cases NVIDIA is factoring in the expected performance gains from using SMP and Lens Matched Shading to reduce the rendering workload, as compared to a GPU lacking these optimizations.

Moving on, the other major display optimization scenario NVIDIA is pushing with SMP is centered around traditional 2D displays. With curved displays or multi-monitor setups where the displays are angled to emulate a curved display, a flat projection is technically incorrect relative to the viewer. What the viewer should be seeing is essentially a wider field of view mapped to the display setup.

With most games this problem isn’t corrected for, as doing so would be too expensive. With a single viewport, the only option is to render the scene at a very high resolution and then use a compute shader to warp it for the screen(s), invoking the overdraw problems mentioned above with VR. More practically, the scene could be rendered once for each monitor, avoiding the overdraw, but then you instead incur the overhead of rendering the scene multiple times.

So for Pascal, NVIDIA is introducing a 2D display feature they’re calling Perspective Surround. As you can most likely guess from the lead-up to this feature, Perspective Surround uses SMP’s geometry reprojection capabilities to efficiently create multiple viewports and get around the overdraw issues. In this case NVIDIA uses a projection for each monitor (e.g. 3 projections for 3 monitors) in order to render a perspective-correct view on each one.
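
Conceptually, for the three projections to tile seamlessly, each side monitor’s view is rotated by roughly the horizontal FOV a single panel subtends from the viewer. A simplified sketch; panel size and viewing distance are illustrative, and we ignore the small geometric differences introduced by angling the side panels:

```cpp
// Per-monitor projections for a 3-display surround setup: one yaw per panel.
#include <cmath>
#include <cstdio>

int main() {
    // Illustrative setup: 60 cm wide panels viewed from 70 cm away.
    const double width = 0.60, distance = 0.70;
    const double pi = std::acos(-1.0);
    const double panelFov =
        2.0 * std::atan2(width / 2.0, distance) * 180.0 / pi;  // ~46.4 deg
    for (int i = -1; i <= 1; ++i)  // left, center, right monitor
        std::printf("monitor %+d: yaw %+6.1f deg, fov %.1f deg\n",
                    i, i * panelFov, panelFov);
    return 0;
}
```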

Like SMP’s VR features, Perspective Surround requires developers to code specifically for it, so it can’t be universally enabled for all multi-monitor setups. Instead, developers will need to go through NVIDIA’s SMP API in order to tell the GPU how to properly set up the scene.

Comments

  • jcardel - Wednesday, July 27, 2016 - link

    This is exactly the same situation as me. I got a 770 sitting in my rig, and am looking hard at the 1070, maybe soon. Although my 770 is still up to the task in most games, I really only play Blizzard games these days, and they are not hard on your hardware.

    My biggest issue is really that it is rather noisy, so I will be looking for a solution with the lowest dB.

    Great article, it was totally worth waiting for. I only read this sort of stuff here, so I have been waiting till now for any 1080 review.

    Thanks!
  • D. Lister - Thursday, July 21, 2016 - link

    Nice job, Ryan. Good comeback. Keep it up.
  • Saeid92 - Thursday, July 21, 2016 - link

    What is the 99th percentile framerate?
  • Ryan Smith - Thursday, July 21, 2016 - link

    If you sorted the framerate from highest to lowest, this would be the framerate of the slowest 1%. It's basically a more accurate/meaningful metric for minimum frame rates.
  • Eris_Floralia - Thursday, July 21, 2016 - link

    This is why I love Anandtech. Depth in reviews. Well, I even wanted to be one of your editors if you have a plan to create a Chinese-translated version of these reviews.
  • daku123 - Thursday, July 21, 2016 - link

    Typo on FP16 Throughput page. In second paragraph, it should be Tegra X1 (not Tesla X1?).
  • Ryan Smith - Thursday, July 21, 2016 - link

    Eyup. Thanks!
  • Badelhas - Thursday, July 21, 2016 - link

    Great detailed review, as always. But I have to ask once again:
    why didn't you do some kind of VR benchmarks? That's what drives my choices now, to be honest.

    Cheers
  • Ranger1065 - Thursday, July 21, 2016 - link

    After over 2 months of reading GTX1080 reviews I felt a distinct lack of excitement
    as I read Anandtech kicking off their review of the finfet generation. Could it
    prove to be anything but an anticlimax?

    Sadly and unsurprisingly...NOT.

    It was however amusing to see the faithful positively gushing praise for Anandtech
    now that the "greatly anticipated" review is finally out.

    Yes folks, 20 or so pages of (well written) information, mostly already covered by other tech sites,
    finally published, it's as if a magic wand has been waved, the information has been presented with
    that special Anandtech sauce, new insights have been illuminated and all is well in Anandtechland again.

    (AT LEAST UNTIL THE NEXT 2 MONTH DELAY.) LOL.

    I do like the way Anandtech presents the FPS charts.

    Back to sleep now Anandtech :)
  • mkaibear - Thursday, July 21, 2016 - link

    You've hit the nail on the head here Ranger.

    The info which is included within the article is indeed mostly already covered by other tech sites.

    Emphasis on the "mostly" and the plural "sites".

    Those of us who have jobs which keep us busy and have an interest in this sort of thing often don't have the time to trawl round many different sites to get reviews and pertinent technical data so we rely upon those sites which we trust to produce in-depth articles, even if they take a bit longer.

    As an IT Manager for (most recently) a manufacturing firm and then a school, I don't care about the bleeding edge and getting the new stuff as soon as it comes out; I care about getting the right stuff, and a two month delay to get a proper review is absolutely fine. If I need quick benchmarks I'll use someone like Hexus or HardOCP, but a deep dive into the architecture, so I can justify purchases to the Art and Media departments or the programmers, is essential. You don't get that anywhere else.
