The NVIDIA Turing GPU Architecture Deep Dive: Prelude to GeForce RTX
by Nate Oh on September 14, 2018 12:30 PM EST
Turing RT Cores: Hybrid Rendering and Real Time Raytracing
As it presents itself in Turing, real-time raytracing doesn’t completely replace traditional rasterization-based rendering, instead existing as part of Turing’s ‘hybrid rendering’ model. In other words, rasterization is used for most rendering, while raytracing techniques are used for select graphical effects. Meanwhile, the ‘real-time’ performance is generally achieved with a very small number of rays (e.g. 1 or 2) per pixel, and a very large amount of denoising.
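To illustrate why so few rays per pixel lean so heavily on denoising, here is a minimal sketch in Python. It assumes a flat region whose true visibility is 0.5, estimates each pixel with a single random hit/miss sample, and then applies a plain box filter. The function names are hypothetical, and the box filter is only a stand-in: production denoisers are edge-aware and far more sophisticated, and nothing here reflects NVIDIA's actual algorithms.

```python
import random

def trace_pixel(true_visibility, rays, rng):
    """Estimate a pixel's visibility by averaging `rays` random hit/miss samples."""
    hits = sum(rng.random() < true_visibility for _ in range(rays))
    return hits / rays

def box_denoise(pixels, radius=2):
    """Average each pixel with its neighbours (window is clipped at the edges)."""
    out = []
    for i in range(len(pixels)):
        window = pixels[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def mean_abs_error(pixels, truth):
    """Mean absolute difference between the estimates and the true value."""
    return sum(abs(p - truth) for p in pixels) / len(pixels)

# One ray per pixel over a row of 64 pixels (assumed scene: uniform 50% visibility).
rng = random.Random(0)
noisy = [trace_pixel(0.5, 1, rng) for _ in range(64)]
filtered = box_denoise(noisy)
print(mean_abs_error(noisy, 0.5), mean_abs_error(filtered, 0.5))
```

With one ray, every pixel is either fully lit or fully shadowed, so each estimate is off by exactly 0.5. Averaging neighbours trades that noise for blur, which is why real denoisers have to be careful around edges.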
The specific implementation is ultimately in the hands of developers, and NVIDIA naturally has their raytracing development ecosystem, which we’ll go over in a later section. But because of the computational intensity, it simply isn’t possible to use real-time raytracing for the complete rendering workload. And higher resolutions, more complex scenes, and numerous graphical effects also compound the difficulty. So for performance reasons, developers will be utilizing raytracing in a deliberate and targeted manner for specific effects, such as global illumination, ambient occlusion, realistic shadows, reflections, and refractions. Likewise, raytracing may be limited to specific objects in a scene, and rasterization and z-buffering may replace primary ray casting while only secondary rays are raytraced. Thus, the goal of developers is to use raytracing for the most noticeable and realistic effects that rasterization cannot accomplish.
Essentially, this style of ‘hybrid rendering’ is a lot less raytracing than one might imagine from the marketing material. Perhaps a blunt way to generalize might be: real time raytracing in Turing typically means only certain objects are being rendered with certain raytraced graphical effects, using a minimal number of rays per pixel and/or only raytracing secondary rays, and using a lot of denoising; anything more would affect performance too much. Interestingly, explaining all the caveats this way both undersells and oversells the technology, and therein lies the paradox. Even in this very circumscribed way, GPU performance is significantly affected, but image quality is enhanced with a realism that cannot be provided by a higher resolution or better anti-aliasing. At the same time, ‘real time’ interactivity in gaming essentially means a minimum of 30 to 45 fps, and lowering the render resolution to achieve those framerates hurts image quality. What complicates this is that real time raytracing is indeed considered the ‘holy grail’ of computer graphics, and so managing the feat at all is a big deal, but there are equally valid professional and consumer perspectives on how that translates into a compelling product.
On that note, then, NVIDIA accomplished what the industry was not expecting to be possible for at least a few more years, and certainly not at this scale or with this development ecosystem. Real time raytracing is the culmination of a decade or so of work, and the Turing RT Cores are the lynchpin. But in building up to it, NVIDIA summarizes the achievement as a result of:
- Hybrid rendering pipeline
- Efficient denoising algorithms
- Efficient BVH algorithms
By themselves, these developments were unable to improve raytracing efficiency, but set the stage for RT Cores. By virtue of raytracing’s importance in the world of computer graphics, NVIDIA Research has been looking into various BVH implementations for quite some time, as well as exploring architectural concerns for raytracing acceleration, something easily noted from their patents and publications. Likewise with denoising, though the latest trend has veered towards using AI and by extension Tensor Cores. When BVH became a standard of sorts, NVIDIA was able to design a corresponding fixed function hardware accelerator.
Because the RT Cores are so crucial to the achievement, NVIDIA is not disclosing many details about them or their BVH implementation. Of the details given, much is somewhat generic. To reiterate, BVH is a rather general category, and most modern raytracing acceleration structures are BVH or kd-tree based.
Unlike Tensor Cores, which are better seen as an FMA array alongside the FP and INT cores, the RT Cores are more like a classic offloading IP block. The sub-cores treat them much like texture units: instructions bound for the RT Cores are routed out of the sub-core, which is later notified on completion. Upon receiving a ray probe from the SM, the RT Core proceeds to autonomously traverse the BVH and perform ray-intersection tests. This type of ‘traversal and intersection’ fixed function raytracing accelerator is a well-known concept and has had quite a few implementations over the years, as traversal and intersection testing are two of the most computationally intensive tasks involved. In comparison, traversing the BVH in shaders would require thousands of instruction slots per ray cast, all for testing against bounding box intersections in the BVH.
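For a sense of the work being offloaded, the following is a minimal software sketch of BVH traversal using the textbook ‘slab’ ray/box test. The `Node`, `hits_box`, and `traverse` names are hypothetical, the final ray/triangle test at the leaves is omitted, and this is not a description of how the RT Core works internally; it only shows the kind of loop a shader would otherwise have to run for every ray.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Node:
    lo: Vec3                         # bounding box minimum corner
    hi: Vec3                         # bounding box maximum corner
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim: Optional[int] = None       # leaf payload: a primitive id

def hits_box(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: intersect the ray's parametric range with each axis-aligned slab."""
    t_near, t_far = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < a or o > b:
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def traverse(root: Node, origin: Vec3, direction: Vec3) -> Tuple[List[int], int]:
    """Return (candidate primitive ids, number of box tests performed)."""
    stack, candidates, box_tests = [root], [], 0
    while stack:
        node = stack.pop()
        box_tests += 1
        if not hits_box(origin, direction, node.lo, node.hi):
            continue                 # prune this whole subtree
        if node.prim is not None:
            candidates.append(node.prim)
        else:
            stack.extend(c for c in (node.left, node.right) if c is not None)
    return candidates, box_tests
```

Even in this toy form, every ray pays for a chain of dependent box tests and pointer chases before it reaches a primitive, which is the part that fixed function hardware is well suited to absorb.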
Returning to the RT Core, it then returns any hits and lets the shaders act on the result. The RT Core also handles some grouping and scheduling of memory operations for maximizing memory throughput across multiple rays. And given the workload, there is presumably some amount of memory and/or ray buffer within the SIP block as well. Like in many other workloads, memory bandwidth is a common bottleneck in raytracing, and has been the focus of several NVIDIA Research papers. And in general, raytracing workloads result in very irregular and random memory accesses, mainly due to incoherent rays, which prove especially problematic for how GPUs typically utilize their memory.
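As a rough illustration of the coherence problem, one generic software-side mitigation is to group rays that point in similar directions so that a batch tends to touch the same BVH nodes. The sketch below bins rays by the octant of their direction vector. The names are hypothetical, and this is an assumption-laden example of the general idea rather than anything NVIDIA has described about the RT Core's own scheduling.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Ray = Tuple[Vec3, Vec3]  # (origin, direction)

def direction_octant(direction: Vec3) -> int:
    """Pack the sign of each direction component into a 3-bit octant id (0..7)."""
    x, y, z = direction
    return int(x < 0) | (int(y < 0) << 1) | (int(z < 0) << 2)

def bin_rays(rays: List[Ray]) -> Dict[int, List[Ray]]:
    """Group rays by direction octant so each batch is roughly coherent."""
    bins = defaultdict(list)
    for ray in rays:
        bins[direction_octant(ray[1])].append(ray)
    return dict(bins)
```

Primary rays from a camera are naturally coherent; it is the secondary rays bouncing off surfaces in all directions that scatter across the octants and, by extension, across memory.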
But otherwise, everything else is at a high level governed by the API (i.e. DXR) and the application; construction and update of the BVH is done on CUDA cores, governed by the particular IHV – in this case, NVIDIA – in their DXR implementation.
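For readers unfamiliar with what BVH construction involves, here is a minimal sketch of a textbook top-down build that splits primitives at the median along the longest axis. NVIDIA has not disclosed its builder, so this is purely illustrative: the `BVHNode`, `union`, and `build` names are hypothetical, the code is serial Python rather than anything that would run on CUDA cores, and real builders use more elaborate heuristics.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (lo, hi) corners of an axis-aligned bounding box

@dataclass
class BVHNode:
    lo: Vec3
    hi: Vec3
    left: Optional["BVHNode"] = None
    right: Optional["BVHNode"] = None
    prim: Optional[int] = None       # set only on leaves

def union(boxes: List[Box]) -> Box:
    """Smallest box enclosing all the given boxes."""
    los, his = zip(*boxes)
    lo = tuple(min(axis) for axis in zip(*los))
    hi = tuple(max(axis) for axis in zip(*his))
    return lo, hi

def build(prims: List[Tuple[int, Box]]) -> BVHNode:
    """Recursively split (id, box) pairs at the median along the longest axis."""
    lo, hi = union([box for _, box in prims])
    if len(prims) == 1:
        return BVHNode(lo, hi, prim=prims[0][0])
    axis = max(range(3), key=lambda a: hi[a] - lo[a])
    # Sort by box centre (lo + hi is twice the centre, which orders identically).
    ordered = sorted(prims, key=lambda p: p[1][0][axis] + p[1][1][axis])
    mid = len(ordered) // 2
    return BVHNode(lo, hi, build(ordered[:mid]), build(ordered[mid:]))
```

Dynamic scenes are the reason the ‘update’ half matters: whenever geometry moves, the boxes must be refit or the tree rebuilt, and that cost lands on the general-purpose cores rather than the RT Cores.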
All-in-all, there’s clearly more involved, and we’ll be looking to run some microbenchmarks in the future. NVIDIA’s custom BVH algorithms are clearly in play, but right now we can’t say what the optimizations might be, such as compression, wide BVHs, or node subdivision into treelets. The way the RT Cores are integrated into the SM and into the architecture is likely crucial to how well they operate. Internally, the RT Core might just be a basic traversal and intersection unit, but it might also have other bits inside; one of NVIDIA’s recent patents provides a representation, albeit dated, of what else might be present. I, for one, would not be surprised to see it closely tied with the MIO blocks, and perhaps doing more with coherency gathering by manipulating memory traffic for higher efficiency. It would need to coordinate well with the other workloads in the SMs without strangling memory access with unmitigated incoherent rays.
Nevertheless, details like performance impact are as yet unspecified.
111 Comments
Spunjji - Monday, September 17, 2018 - link
There's no such thing as a bad product, just bad pricing. AMD aren't out of the game but they are playing in an entirely different league.
siberian3 - Friday, September 14, 2018 - link
Good architectural leap for nvidia but it is sad very few of gamers can afford the new cards.
And AMD is not doing anything for 2018 and probably navi will be mid range on 7nm
V900 - Friday, September 14, 2018 - link
Meh, it’s always been that way with the newest, fastest GPUs.
Wait 6 months to a year, and prices will be where people with more modest budgets can play along.
B3an - Friday, September 14, 2018 - link
You must literally live under a rock while also being absurdly naive.
It's never been this way in the 20 years that i've been following GPUs. These new RTX GPUs are ridiculously expensive, way more than ever, and the prices will not be changing much at all when there's literally zero competition. The GPU space right now is worse than it's ever been before in history.
Amandtec - Friday, September 14, 2018 - link
I read somewhere that 8800GTX + inflation = 2080ti price
Without factoring in inflation the prices seem unprecedented.
Yojimbo - Saturday, September 15, 2018 - link
And you must factor in inflation, otherwise you are just pushing numbers around.
Yojimbo - Saturday, September 15, 2018 - link
And comparing the 2080 Ti to previous flagship launch cards is not really proper. The 2080 Ti is a different tier of card. The die size is so much larger than any previous launch GPU. It's just a demonstration of the increase in the amount of resources people are willing to devote to their GPUs, not an indication of an inflation of GPU prices.
eddman - Saturday, September 15, 2018 - link
2006 $600 at 2018 dollar value = $750
Samus - Saturday, September 15, 2018 - link
What inflation, exactly are you talking about. The dollar hasn't had a substantial change in valuation for 20 years (compared to other first-world currency.)
The USD inflation rate has averaged around 2.7%/year since 2000. That means one dollar in 2000 is now worth slightly less than $1.50 today. That means the top-of-the-line GPU released in 2000, I'd take a guess it was the Geforce2 GTS and/or the 3Dfx Voodoo5 5500, both cost $300.
For those who want to throw in cards like the Geforce 2 Ultra and the Voodoo5 6000, the former a card for nVidia to 'probe' the market for how much they could milk it going forward (and creating the situation we have today) and the other a card that never actually "launched"...we can include them for fun. The Ultra launched at $500 (even though it was slower than the Geforce 3 that launched 3 months later) and the Voodoo5 6000 had an MSRP set by 3Dfx at $500.
These were the most expensive gaming-focused GPU's ever made up until that date. Even SLI setups didn't cost $500 (the most expensive Voodoo2 card in the 90's was from Creative Labs @$229/ea - you needed two cards of course - so $460.)
Ok, so you have the absolute cream-of-the-crop cards in 2000 at $500, one was a marketing stunt, and the other never launched because nobody would have bought it. Realistically the most expensive cards were $300. But we will go with $500.
The most expensive high-end gaming focused cards now are $1000+
That would assume an inflation rate of over 5% annually, or the value of the dollar DOUBLING over 2 decades. Which it didn't come close to doing.
Stop using inflation as an excuse. It's bullshit. These companies are fucking greedy. Especially nVidia. They are effectively charging FOUR TIMES more than they used to for the same market segment card. 20 years ago you would have bought a TNT2 Ultra for $230 bucks and had the ultimate card available. Most people purchased entirely capable mainstream cards for $100-$150 like the TNT2 Pro or the Geforce2 MX400 that ran the most demanding games of the day like Counter Strike and Half-Life at 1024x768 in maximum detail.
http://www.in2013dollars.com/2000-dollars-in-2018?...
Yojimbo - Saturday, September 15, 2018 - link
"What inflation, exactly are you talking about."
CPI. Consumer Price Index. Even though inflation has been low for quite a while, $649 in 2013 is $697 today. That's almost $50 more, and it's enough to make up the difference between the 2013 launch price of the GTX 780 and the 2018 launch price of the RTX 2080.
I'm not sure why you are talking about cards from 20+ years ago. It's not relevant to my reply. In any case, those cards were completely different. The die sizes were much smaller and the cards were much less capable. They did a lot less of the work, as much of it was done on the CPU. The CPU was much more important to the game performance than today, as was the RAM and other components that were worth spending money on to significantly improve the gaming performance/experience.
"Stop using inflation as an excuse."
I'm not using inflation as an excuse. I'm using inflation as a tool to accurately compare the prices of cards from different years. And doing so clearly shows that the claim that the OP made is wrong. My reply had nothing to do with whether cards were in general cheaper 20 years ago or not. It was in response to "These new RTX GPUs are ridiculously expensive, way more than ever". That's provably untrue. Why are you replying to me and arguing about some entirely different point I wasn't ever talking about?