The NVIDIA Turing GPU Architecture Deep Dive: Prelude to GeForce RTX

Name: The NVIDIA Turing GPU Architecture Deep Dive: Prelude to GeForce RTX
Item: The NVIDIA Turing GPU Architecture Deep Dive: Prelude to GeForce RTX
Author: Nate Oh

by Nate Oh on September 14, 2018 12:30 PM EST

111 Comments | Add A Comment

111 Comments

Turing RT Cores: Hybrid Rendering and Real Time Raytracing

As it presents itself in Turing, real-time raytracing doesn’t completely replace traditional rasterization-based rendering, instead existing as part of Turing’s ‘hybrid rendering’ model. In other words, rasterization is used for most rendering, while ray-tracing techniques are used for select graphical effects. Meanwhile, the ‘real-time’ performance is generally achieved with a very small amount of rays (e.g. 1 or 2) per pixel, and a very large amount of denoising.

The specific implementation is ultimately in the hands of developers, and NVIDIA naturally has their raytracing development ecosystem, which we’ll go over in a later section. But because of the computational intensity, it simply isn’t possible to use real-time raytracing for the complete rendering workload. And higher resolutions, more complex scenes, and numerous graphical effects also compound the difficulty. So for performance reasons, developers will be utilizing raytracing in a deliberate and targeted manner for specific effects, such as global illumination, ambient occlusion, realistic shadows, reflections, and refractions. Likewise, raytracing may be limited to specific objects in a scene, and rasterization and z-buffering may replace primary ray casting while only secondary rays are raytraced. Thus, the goal of developers is to use raytracing for the most noticeable and realistic effects that rasterization cannot accomplish.

Essentially, this style of ‘hybrid rendering’ is a lot less raytracing than one might imagine from the marketing material. Perhaps a blunt way to generalize might be: real time raytracing in Turing typically means only certain objects are being rendered with certain raytraced graphical effects, using a minimal amount of rays per pixel and/or only raytracing secondary rays, and using a lot of denoising filtering; anything more would affect performance too much. Interestingly, explaining all the caveats this way both undersells and oversells the technology, because therein lies the paradox. Even in this very circumscribed way, GPU performance is significantly affected, but image quality is enhanced with a realism that cannot be provided by a higher resolution or better anti-aliasing. Except ‘real time’ interactivity in gaming essentially means a minimum of 30 to 45 fps, and lowering the render resolution to achieve those framerates hurts image quality. What complicates this is that real time raytracing is indeed considered the ‘holy grail’ of computer graphics, and so managing the feat at all is a big deal, but there are equally valid professional and consumer perspectives on how that translates into a compelling product.

On that note, then, NVIDIA accomplished what the industry was not expecting to be possible for at least a few more years, and certainly not at this scale and development ecosystem. Real time raytracing is the culmination of a decade or so of work, and the Turing RT Cores are the lynchpin. But in building up to it, NVIDIA summarizes the achievement as a result of:

Hybrid rendering pipeline
Efficient denoising algorithms
Efficient BVH algorithms

By themselves, these developments were unable to improve raytracing efficiency, but set the stage for RT Cores. By virtue of raytracing’s importance in the world of computer graphics, NVIDIA Research has been looking into various BVH implementations for quite some time, as well as exploring architectural concerns for raytracing acceleration, something easily noted from their patents and publications. Likewise with denoising, though the latest trend has veered towards using AI and by extension Tensor Cores. When BVH became a standard of sorts, NVIDIA was able to design a corresponding fixed function hardware accelerator.

Being so crucial to their achievement, NVIDIA is not disclosing many details about the RT Cores or their BVH implementation. Of the details given, much is somewhat generic. To reiterate, BVH is a rather general category, and all modern raytracing acceleration structures are typically BVH or kd-tree based.

Unlike Tensor Cores, which are better seen as an FMA array alongside the FP and INT cores, the RT Cores are more like a classic offloading IP block. Treated very similar to texture units by the sub-cores, instructions bound for RT Cores are routed out of sub-cores, which is later notified on completion. Upon receiving a ray probe from the SM, the RT Core proceeds to autonomously traverse the BVH and perform ray-intersection tests. This type of ‘traversal and intersection’ fixed function raytracing accelerator is a well-known concept and has had quite a few implementations over the years, as traversal and intersection testing are two of the most computationally intensive tasks involved. In comparison, traversing the BVH in shaders would require thousands of instruction slots per ray cast, all for testing against bounding box intersections in the BVH.

Returning to the RT Core, it will then return any hits and letting shaders do implement the result. The RT Core also handles some grouping and scheduling of memory operations for maximizing memory throughput across multiple rays. And given the workload, presumably some amount of memory and/or ray buffer within the SIP block as well. Like in many other workloads, memory bandwidth is a common bottleneck in raytracing, and has been the focus of several NVIDIA Research papers. And in general, raytracing workloads result in very irregular and random memory accesses, mainly due to incoherent rays, that prove especially problematic for how GPUs typically utilize their memory.

But otherwise, everything else is at a high level governed by the API (i.e. DXR) and the application; construction and update of the BVH is done on CUDA cores, governed by the particular IHV – in this case, NVIDIA – in their DXR implementation.

All-in-all, there’s clearly more involved, and we’ll be looking to run some microbenchmarks in the future. NVIDIA’s custom BVH algorithms are clearly in play, but right now we can’t say what the optimizations might be, such as compressions, wide BVH, node subdivision into treelets. The way the RT Cores are integrated into the SM and into the architecture is likely crucial to how it operates well. Internally, the RT Core might just be a basic traversal and intersection unit, but it might also have other bits inside; one of NVIDIA’s recent patents provide a representation, albeit dated, of what else might be present. I, for one, would not be surprised to see it closely tied with the MIO blocks, and perhaps did more with coherency gathering by manipulating memory traffic for higher efficiency. It would need to coordinate well with the other workloads in the SMs without strangling memory access with unmitigated incoherent rays.

Nevertheless, details like performance impact are as yet unspecified.

The Turing Architecture: Volta in Spirit Turing Tensor Cores: Leveraging Deep Learning Inference for Gaming

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

111 Comments

View All Comments

bernstein - Friday, September 14, 2018 - link
a lot of this will also depend on what kind of silicon ends up in the next playstation & xbox generation...
Spunjji - Monday, September 17, 2018 - link
Isn't that already pretty much pinned to AMD? AFAIK Navi is pretty much the consumer interpretation of AMD's PS5 design. Microsoft really aren't likely to jump ship because of their history with Nvidia.
Yojimbo - Saturday, September 15, 2018 - link
I think Turing's price/perf ratio will be better than Pascal's. It's the increase in price/performance that is not spectacular. But since AMD isn't releasing anything at all, that doesn't reflect negatively on Turing in any way.

I don't know why people are throwing around this "50% of transistors" idea. Where is this information coming from?

Of course Turing will be crushed by a next generation of 7 nm GPUs that is architected equally as well, as such GPUs will have both additional time for architectural improvements and the advantage of a full node shrink. That will be true for both hybrid and raster-only rendering. And it would have been true for raster rendering no matter if RT cores were included or not.

It sounds like NVIDIA is providing the DLSS service to developers for free. I'd expect DLSS usage to be widespread for any developers interested in making games geared towards the 4K market.

I am guessing that Microsoft, at least, will want a raytracing-capable GPU in its next console. I doubt they would spend the effort to make the DXR API extension and then leave the technology out of their console, especially considering the convergence of console and PC gaming they seem to be pushing for.
jwcalla - Friday, September 14, 2018 - link
This is probably my first disinterested nvidia launch. Tensor cores and ray tracing don't really get me excited. I can't imagine half a die used for that stuff. Do the graphics really look that much better? Does hyper-realism even matter?
Dizoja86 - Friday, September 14, 2018 - link
It doesn't even have to be hyper-realism. Just the basic limitations you can see with rasterized reflections in the Battlefield V tech demo paints a strong case for the use of ray-tracing. Being able to see reflections of objects that aren't directly on the screen in front of you seems like an important thing to move towards.
HollyDOL - Saturday, September 15, 2018 - link
classic rasterized shading and reflection is basically one big cheat on human eye. Imagine something along mp3 128kbit being 'cd quality'. Trying to get that cheat closer and closer to 'reality' is more and more a challenge and resource eater. Ray-Tracing _should_ be able to quite simplify the issue on development front in future. And that's not considering possible visuals quality raise.
Tamz_msc - Saturday, September 15, 2018 - link
Lol, players are complaining that in BF V it is hard to distinguish between friendlies and enemies. Adding RTX reflections to the mix would just make it worse.
jwcalla - Saturday, September 15, 2018 - link
Watching the Battlefield tech demo (and the others), I didn't think it added a lot of value. When you analyze it side-by-side with a magnifying glass, yes, you can see some differences. I just don't think they're that dramatic and in the heat of game play you're not even going to recognize it. The improvements to global illumination look good though.

I just feel like the industry has lost a lot of focus.
RSAUser - Saturday, September 15, 2018 - link
In a game like BF V, you're not just going to stand there looking at reflections, and it's going to hammer your frame rate/force you to go to 1080p or lower.

I'd rather turn it off and have a high fps on 4k, tyvm, same as near everyone turned off hairworks for witcher 3, though with that it was at least single player so you'd sacrifice performance for visuals.
Dizoja86 - Friday, September 14, 2018 - link
Sometimes I get frustrated with Anandtech, but being able to have these fantastic articles when new technology is released is why I keep coming back.

The NVIDIA Turing GPU Architecture Deep Dive: Prelude to GeForce RTX

Turing RT Cores: Hybrid Rendering and Real Time Raytracing

Post Your Comment

111 Comments

View All Comments

bernstein - Friday, September 14, 2018 - link

Spunjji - Monday, September 17, 2018 - link

Yojimbo - Saturday, September 15, 2018 - link

jwcalla - Friday, September 14, 2018 - link

Dizoja86 - Friday, September 14, 2018 - link

HollyDOL - Saturday, September 15, 2018 - link

Tamz_msc - Saturday, September 15, 2018 - link

jwcalla - Saturday, September 15, 2018 - link

RSAUser - Saturday, September 15, 2018 - link

Dizoja86 - Friday, September 14, 2018 - link

Log in

Don't have an account? Sign up now