The NVIDIA Turing GPU Architecture Deep Dive: Prelude to GeForce RTX

Author: Nate Oh

by Nate Oh on September 14, 2018 12:30 PM EST


Bounding Volume Hierarchy - How Computers Test the World

Perhaps the biggest aspect of NVIDIA’s gamble on ray tracing is that traditional GPUs just aren’t very good at the task. They’re fast at rasterization and they’re even fast at parallel computing, but ray tracing does not map very well to either of those computing paradigms. Instead, NVIDIA has to add hardware dedicated to ray tracing, which means devoting die space and power to hardware that cannot help with traditional rasterization.

A big part of that hardware, in turn, will go into solving the most basic problem of ray tracing: how do you figure out what a ray is intersecting with? The most common solution to this problem is to store triangles in a data structure that is well-suited for ray tracing: a Bounding Volume Hierarchy (BVH).

Conceptually, a BVH is relatively simple – at least for the purposes of this article. Rather than testing every polygon to see if a ray intersects it, the idea is to test a portion of a scene to see if the ray intersects it, and then keep drilling down. If there is an intersection with that portion of the scene, then subdivide it into smaller portions and test again. And again. And again. All the way until you reach the individual polygon, at which point the ray testing is resolved.

For the computer scientists in the crowd, this might sound a lot like an application of a binary search, and it is. Each test allows a significant number of options (in this case polygons) to be discarded as possible answers, which gets to the right polygon in just a fraction of the time. A BVH, in turn, is stored in what’s essentially a tree data structure, with each subdivision – called a bounding box – stored as a child of its parent bounding box.
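To make the drill-down concrete, here is a minimal software sketch of the idea in Python. It is not how Turing's RT cores work internally, and everything in it is an assumption made for illustration: the names (`Box`, `Node`, `build`, `traverse`), the median split along the longest axis, and the use of small axis-aligned boxes as stand-ins for polygons so that no triangle math is needed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Box:
    lo: Vec3  # minimum corner
    hi: Vec3  # maximum corner

def union(a: Box, b: Box) -> Box:
    """Smallest box that encloses both a and b."""
    return Box(tuple(min(x, y) for x, y in zip(a.lo, b.lo)),
               tuple(max(x, y) for x, y in zip(a.hi, b.hi)))

def ray_hits_box(origin: Vec3, direction: Vec3, box: Box) -> bool:
    """Slab test: clip the ray's parameter range against each axis in turn."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box.lo, box.hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must already be inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

@dataclass
class Node:
    box: Box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim: Optional[int] = None  # set only on leaves: index of the primitive

def build(prims: List[Box], ids: List[int]) -> Node:
    """Recursively split the primitives in half along the longest axis."""
    box = prims[ids[0]]
    for i in ids[1:]:
        box = union(box, prims[i])
    if len(ids) == 1:
        return Node(box, prim=ids[0])
    extent = [h - l for l, h in zip(box.lo, box.hi)]
    axis = extent.index(max(extent))
    ids = sorted(ids, key=lambda i: prims[i].lo[axis] + prims[i].hi[axis])
    mid = len(ids) // 2
    return Node(box, build(prims, ids[:mid]), build(prims, ids[mid:]))

def traverse(node: Node, origin: Vec3, direction: Vec3, tests: List[int]) -> List[int]:
    """Return the primitives the ray hits; tests[0] counts box tests performed."""
    tests[0] += 1
    if not ray_hits_box(origin, direction, node.box):
        return []  # a miss discards this entire subtree
    if node.prim is not None:
        return [node.prim]
    return (traverse(node.left, origin, direction, tests)
            + traverse(node.right, origin, direction, tests))

# Toy scene: 1024 unit cubes in a row along x, with gaps between them.
prims = [Box((2.0 * i, 0.0, 0.0), (2.0 * i + 1.0, 1.0, 1.0)) for i in range(1024)]
root = build(prims, list(range(len(prims))))
tests = [0]
hits = traverse(root, (700.5, 0.5, -5.0), (0.0, 0.0, 1.0), tests)
print(hits, tests[0])
```

In this toy scene the ray finds cube 350 after 21 box tests, where testing every cube would take 1,024. That is the friendliest possible case, since the ray only ever overlaps one child at each level; real scenes have overlapping boxes and rays that graze many of them.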

Now the catch with a BVH is that while it radically cuts down on the number of ray intersection tests needed compared to a naïve implementation, it’s still not super cheap. A number of tests are still required for each ray, with both successful and failed tests adding to the total. And all of this is for a single ray, when a significant number of rays are going to be needed for each pixel. This is why hardware acceleration of the process is so important (and not at all easy).
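A rough back-of-the-envelope calculation shows the scale involved. The figures below are illustrative assumptions rather than numbers from NVIDIA: a 1920x1080 frame, one ray per pixel, a scene of one million polygons, and a perfectly balanced binary BVH in which a ray only ever follows a single path to a leaf. The function names are likewise hypothetical.

```python
def brute_force_tests_per_ray(num_polygons: int) -> int:
    """Naive approach: test the ray against every polygon."""
    return num_polygons

def bvh_best_case_tests_per_ray(num_polygons: int) -> int:
    """Balanced binary BVH, single path: the root plus two children per level."""
    depth = (num_polygons - 1).bit_length()  # ceil(log2(n)) for n >= 1
    return 1 + 2 * depth

# Assumed workload, for illustration only.
pixels = 1920 * 1080
rays_per_pixel = 1
polygons = 1_000_000

rays = pixels * rays_per_pixel
print("brute force tests per frame:", rays * brute_force_tests_per_ray(polygons))
print("BVH best-case tests per frame:", rays * bvh_best_case_tests_per_ray(polygons))
```

Under these assumptions the per-ray cost falls from 1,000,000 tests to 41, yet a single frame still needs about 85 million box tests, and that is before adding more rays per pixel or the extra nodes a real ray visits.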

The other major computational cost here is that BVHs themselves aren’t free. One needs to be created for a scene from the polygons in it, so there is an additional step before ray casting can even begin. This is more of a developer concern – when can they modify and reuse a BVH versus building a new one – but it’s another step in the process. Furthermore, it’s an example of why developer training and efficient engine implementations are so crucial to the process, as a poor implementation can make ray tracing much too slow to be viable.
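One common way to "modify and reuse" a BVH is refitting: keep the tree's shape and only recompute the bounding boxes after objects move. The sketch below assumes that is the kind of reuse in question, which the article does not spell out. The names (`Node`, `build`, `refit`) and the simple split by x position are hypothetical, and axis-aligned boxes again stand in for polygons.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (minimum corner, maximum corner)

def union(a: Box, b: Box) -> Box:
    """Smallest box that encloses both a and b."""
    lo = tuple(min(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(max(x, y) for x, y in zip(a[1], b[1]))
    return (lo, hi)

@dataclass
class Node:
    box: Box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim: Optional[int] = None  # set only on leaves: index of the primitive

def build(prims: List[Box], ids: List[int]) -> Node:
    """Full build: sort by x position and split in half (assumed heuristic)."""
    if len(ids) == 1:
        return Node(prims[ids[0]], prim=ids[0])
    ids = sorted(ids, key=lambda i: prims[i][0][0])
    mid = len(ids) // 2
    left, right = build(prims, ids[:mid]), build(prims, ids[mid:])
    return Node(union(left.box, right.box), left, right)

def refit(node: Node, prims: List[Box]) -> Box:
    """Cheap update: same tree shape, boxes recomputed from the leaves up."""
    if node.prim is not None:
        node.box = prims[node.prim]
    else:
        node.box = union(refit(node.left, prims), refit(node.right, prims))
    return node.box

# Four unit cubes in a row; build once, then move one and refit.
prims = [((float(i), 0.0, 0.0), (i + 1.0, 1.0, 1.0)) for i in range(4)]
root = build(prims, list(range(4)))
prims[3] = ((10.0, 0.0, 0.0), (11.0, 1.0, 1.0))  # primitive 3 moves away
refit(root, prims)
print(root.box)
```

Refitting skips the sorting and splitting work of a full build, but the tree was organized around where things used to be. The more objects move, the more the boxes tend to stretch and overlap, and at some point rebuilding becomes the better choice, which is exactly the kind of judgment call left to developers.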


111 Comments


gglaw - Saturday, September 15, 2018 - link
Why bother to make up statements claiming the prices are completely as expected with inflation added without even having a slight clue what the inflation rate has been in recent history? Outside of the very young readers here, most of us were around for 700 series, 8800, etc. and know first hand what type of changes inflation has had in the last 10-20 years. Especially comparing to the 980 Ti, and 1080 Ti, inflation has barely moved since those releases.
Spunjji - Monday, September 17, 2018 - link
This. Most people here aren't stupid.
notashill - Saturday, September 15, 2018 - link
700 series wasn't even close. 780 was $650->adjusted ~$700, 780Ti was $700->adjusted ~$760. And the 780 MSRP dropped to $500 after 6 months when the Ti launched.
Santoval - Monday, September 17, 2018 - link
Yes, Navi will be midrange, at around a GTX 1080 performance level, or at best a bit faster. They initially planned a dual Navi package for the high end, linked by Infinity Fabric, but they canned (or postponed) it, due to the reluctance of game developers to support dual-die consumer graphics cards (according to AMD). They might release dual Navi professional graphics cards though.
Tensor and RT cores should not be expected either. These will have to wait for the post-Navi (and post-GCN) generation.
TropicMike - Friday, September 14, 2018 - link
Good article. Lots of complicated stuff to try to explain.

Just a quick typo on page 2: "It’s in pixel shaders that the various forms of lighting (shadows, reflection, reflection, etc) " I'm guessing you meant 'refraction' for one of those.
Smell This - Wednesday, July 3, 2019 - link
Super **Duper** Turbo Hyper Championship Edition
Yaldabaoth - Friday, September 14, 2018 - link
For the "eye diagram" on page 8, the text says, "In this case we’re looking at a fairly clean eye diagram, illustrating the very tight 70ns transitions between data transfers." However, the image is labeled as "70 ps".
Ryan Smith - Friday, September 14, 2018 - link
Nano. Pico. Really, it's a small difference... =P

Thanks!
Bulat Ziganshin - Friday, September 14, 2018 - link
It's not "Volta in spirit". It's Volta for the masses. The only differences:
- reduced FP64 cores
- reduced sharedmem/cache from 128 KB to 96 KB
- added RT cores

Now let's check what you want to change to produce "scientific" Turing GPU. Yes, exactly these things. So, despite the name, it's the same architecture, tuned for the gaming market.
Yojimbo - Saturday, September 15, 2018 - link
You don't really know that. This article, as explained in the beginning, focuses only on the RT core improvements. There are other Turing features that were left out. I think we have no idea if Volta has variable rate shading, mesh shading, or multi-view rendering. I'm guessing it does not.

Besides, what you said isn't true even limiting the discussion to what was covered in this article. The Turing Tensor cores allow for a greater range of precisions.
