NVIDIA Works: ANSEL & VRWorks Audio

Along with the various hardware aspects of Pascal, NVIDIA’s software teams have also been working on new projects to coincide with the Pascal launch. These are a new screenshot tool and a new audio simulation package based on path-traced audio.

We’ll start with NVIDIA’s new screenshot utility. Dubbed ANSEL, after famous American environmental photographer Ansel Adams, ANSEL is a very different take on screenshots. Rather than taking screenshots from the player’s perspective at the game rendering resolution, ANSEL allows for an entire scene to be captured at a far higher resolution than with standard screenshots. NVIDIA is pitching this as an art tool rather than a gaming tool, and I get the impression that this is one of those pie-in-the-sky kind of ideas that NVIDIA’s software group decided to run with in order to best show off Pascal’s various capabilities.

At its core, ANSEL is a means to decouple taking screenshots from the limitations of the player’s view. In an ANSEL-enabled application, ANSEL can freeze the state of the game, move the camera around, and then generate a large number of viewports from which to take screenshots. The end result is that ANSEL makes it possible to generate an ultra-high resolution 360 degree stereo 3D image of a game scene. The analogy NVIDIA is working towards is dropping a high quality 360 degree camera into a game, and letting users play with it as they see fit.

But even this isn’t really a great description of ANSEL, as there isn’t anything else like it to compare it to. Some games have offered 360 degree capture, but they haven’t done so at any kind of resolution approaching what ANSEL can do. And this still doesn’t touch features such as HDR (FP16) scene capture or the free camera.

Under the hood, ANSEL is at times a checklist for Pascal technologies (though it does work with Maxwell 2 as well). In order to capture scenes at a very high resolution, it forces a scene to its maximum LOD and breaks it down into a number of viewports, implemented efficiently using SMP. To demonstrate this technology NVIDIA put together a 4.5 gigapixel image rendered out of The Witcher 3, which was composed of 3600 such viewport tiles. Meanwhile stitching together the individual tiles is a CUDA based rendering process, which uses overlapping tiles to resolve any tone mapping conflicts. Finally, ANSEL captures images before they’re actually sent to a display, grabbing HDR images (in EXR format) in games that support HDR.
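To put the tile figures in perspective, here is a minimal Python sketch of the arithmetic, plus one way overlapping tiles could be cross-faded along a single axis. The function names, the linear feathering, and the overlap size are illustrative assumptions; NVIDIA has not described how its CUDA stitcher actually blends tiles, and this is not ANSEL code.

```python
TOTAL_PIXELS = 4.5e9  # size quoted for the Witcher 3 demo image
TILE_COUNT = 3600     # number of viewport tiles quoted


def pixels_per_tile(total_pixels, tile_count):
    """Average unique pixels per tile, ignoring any overlap between tiles."""
    return total_pixels / tile_count


def tile_spans(length, tiles, overlap):
    """Split [0, length) into spans padded by `overlap` pixels at interior edges."""
    base = length // tiles
    spans = []
    for i in range(tiles):
        start = max(0, i * base - overlap)
        end = length if i == tiles - 1 else (i + 1) * base + overlap
        spans.append((start, end))
    return spans


def blend_weight(x, span, overlap, length):
    """Hypothetical linear feather: weight ramps across the 2*overlap shared region."""
    start, end = span
    if not (start <= x < end):
        return 0.0
    weight = 1.0
    ramp = 2 * overlap
    if ramp > 0 and start > 0:      # interior left edge: fade in
        weight = min(weight, (x - start) / ramp)
    if ramp > 0 and end < length:   # interior right edge: fade out
        weight = min(weight, (end - x) / ramp)
    return weight


if __name__ == "__main__":
    print(pixels_per_tile(TOTAL_PIXELS, TILE_COUNT))
    print(tile_spans(100, 4, 5))
```

Under these assumptions each tile contributes about 1.25 megapixels of unique image area, and the weights of neighboring tiles sum to one everywhere in the shared region, which is what lets differences in per-tile tone mapping fade out rather than show up as seams.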

Meanwhile given its level of deep interaction with games, ANSEL does require individual game support to work. This is in the form of a library provided by NVIDIA, which helps ANSEL and NVIDIA’s driver make sense of a scene and pause the simulation when necessary. Unsurprisingly, NVIDIA is eager to get ANSEL into more games – it just launched on Mirror’s Edge: Catalyst – and as a result is touting to developers that ANSEL is easy to implement, having taken only 150 lines of code on The Witcher 3.

Ultimately NVIDIA seems to be throwing ANSEL at the wall here to see what sticks. But it should be neat to see what users end up doing with the technology.

VRWorks Audio

Not to be outdone by the ANSEL team, other parts of NVIDIA’s software group have been working on a slightly different kind of project for NVIDIA: audio. As a GPU company, NVIDIA has never been deeply involved with audio (not since getting out of the chipset business, at least), but with the current focus on VR, they are taking a crack at it in a new way.

VRWorks Audio is the latest library in NVIDIA’s larger VRWorks suite. As given away by the name, this library is focused on audio, specifically for VR. In a nutshell, VRWorks Audio is a full audio simulation library, using path tracing to power the simulation. The goal of VRWorks Audio is to provide a realistic sound simulation for VR, to further increase the apparent realism.

Under the hood, VRWorks Audio leverages NVIDIA’s existing OptiX path tracing technology, only rather than tracing light it’s used to trace sound waves. Along with simulating audio propagation itself – including occlusion and reverb – VRWorks Audio is also able to run the necessary Head Related Transfer Functions (HRTFs) to reduce the simulation down to binaural audio for headphones.
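As a rough illustration of why traced paths translate into reverb, the Python sketch below turns a list of path lengths into a sparse impulse response and applies it to a signal. Everything in it is an assumption for illustration: the function names, the inverse-distance falloff, and the fixed per-bounce `reflection_gain` are not taken from OptiX or the VRWorks Audio SDK, and the per-ear HRTF filtering that a binaural renderer would apply afterwards is left out.

```python
SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 degrees C


def impulse_response(paths, sample_rate=48000, reflection_gain=0.7):
    """Build an impulse response from (length_m, bounces) pairs.

    Each traced path becomes one tap: delayed by its travel time and
    attenuated by distance and by an assumed energy loss per bounce.
    """
    taps = {}
    for length_m, bounces in paths:
        delay = round(length_m / SPEED_OF_SOUND * sample_rate)
        gain = (reflection_gain ** bounces) / max(length_m, 1.0)
        taps[delay] = taps.get(delay, 0.0) + gain
    if not taps:
        return []
    response = [0.0] * (max(taps) + 1)
    for delay, gain in taps.items():
        response[delay] = gain
    return response


def convolve(signal, response):
    """Direct-form convolution; fine for a sketch, far too slow for real time."""
    if not signal or not response:
        return []
    out = [0.0] * (len(signal) + len(response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(response):
            if tap:
                out[i + j] += sample * tap
    return out


if __name__ == "__main__":
    # One direct path and one single-bounce reflection (made-up geometry).
    ir = impulse_response([(3.43, 0), (6.86, 1)], sample_rate=1000)
    print(len(ir), ir[10], ir[20])
```

Each extra reflection adds another delayed, quieter copy of the source, so the cost grows with the number of paths traced.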

All of this is, of course, executed on Pascal’s CUs in a manner similar to path tracing or PhysX, running alongside the main graphics rendering thread. The amount of processing power required for VRWorks Audio can vary considerably depending on the detail desired (particularly the number of reflections); for NVIDIA’s VR Funhouse demo, VRWorks Audio can occupy most of a GPU on its own.

Ultimately, unlike some of the other technologies presented by NVIDIA, VRWorks Audio is in a relatively early stage. As a result, while NVIDIA is shipping the SDK, no games have been announced as using it at this time, and if it gets any traction it will be some time before we see the first games using it. That said, NVIDIA is already reaching out to the all-important middleware vendors on the subject, and to that end their own VR Funhouse demo is using FMOD with a VRWorks Audio plugin to handle the sound, demonstrating that they already have VRWorks Audio working with the popular audio middleware.

200 Comments

  • patrickjp93 - Wednesday, July 20, 2016 - link

    That doesn't actually support your point...
  • Scali - Wednesday, July 20, 2016 - link

    Did I read a different article?
    Because the article that I read said that the 'holes' would be pretty similar on Maxwell v2 and Pascal, given that they have very similar architectures. However, Pascal is more efficient at filling the holes with its dynamic repartitioning.
  • mr.techguru - Wednesday, July 20, 2016 - link

    Just ordered the MSI GeForce GTX 1070 Gaming X, way better than the 1060 / 480. NVIDIA nailed it :)
  • tipoo - Wednesday, July 20, 2016 - link

    " NVIDIA tells us that it can be done in under 100us (0.1ms), or about 170,000 clock cycles."

    Is my understanding right that Polaris, and I think even earlier with late GCN parts, could seamlessly interleave per-clock? So 170,000 times faster than Pascal in clock cycles (less in total time, but still above 100,000 times faster)?
  • Scali - Wednesday, July 20, 2016 - link

    That seems highly unlikely. Switching to another task is going to take some time, because you also need to switch all the registers and buffers, caches need to be re-filled, etc.
    The only way to avoid most of that is to duplicate the whole register file, like HyperThreading does. That's doable on an x86 CPU, but a GPU has way more registers.
    Besides, as we can see, nVidia's approach is fast enough in practice. Why throw tons of silicon on making context switching faster than it needs to be? You want to avoid context switches as much as possible anyway.

    Sadly AMD doesn't seem to go into any detail, but I'm pretty sure it's going to be in the same ballpark.
    My guess is that what AMD calls an 'ACE' is actually very similar to the SMs and their command queues on the Pascal side.
  • Ryan Smith - Wednesday, July 20, 2016 - link

    Task switching is separate from interleaving. Interleaving takes place on all GPUs as a basic form of latency hiding (GPUs are very high latency).

    The big difference is that interleaving uses different threads from the same task; task switching by its very nature loads up another task entirely.
  • Scali - Thursday, July 21, 2016 - link

    After re-reading AMD's asynchronous shader PDF, it seems that AMD also speaks of 'interleaving' when they switch a graphics CU to a compute task after the graphics task has completed. So 'interleaving' at task level, rather than at instruction level.
    Which would be pretty much the same as NVidia's Dynamic Load Balancing in Pascal.
  • eddman - Thursday, July 21, 2016 - link

    The more I read about async computing in Polaris and Pascal, the more I realize that the implementations are not much different.

    As Ryan pointed out, it seems that the reason that Polaris, and GCN as a whole, benefit more from async is the architecture of the GPU itself, being wider and having more ALUs.

    Nonetheless, I'm sure we're still going to see comments like "Polaris does async in hardware. Pascal is hopeless with its software async hack".
  • Matt Doyle - Wednesday, July 20, 2016 - link

    Typo in the lead sentence of HPC vs. Consumer: Divergence paragraph: "Pascal in an architecture that..."

    "is" instead of "in"
  • Matt Doyle - Wednesday, July 20, 2016 - link

    Feeding Pascal page, "GDDR5X uses a 16n prefetch, which is twice the size of GDDR5’s 8n prefect."

    Prefect = prefetch