The RTX Recap: A Brief Overview of the Turing RTX Platform

Overall, NVIDIA’s grand vision for real-time, hybridized raytracing graphics means that they needed to make significant architectural investments into future GPUs. The very nature of the operations required for ray tracing means that they don’t map to traditional SIMT execution especially well, and while this doesn’t preclude GPU raytracing via traditional GPU compute, it does make that approach relatively inefficient. As a result, many of the architectural changes in Turing have gone into solving the raytracing problem, some of them exclusively so.

To that end, on the ray tracing front Turing introduces two new kinds of hardware units that were not present on its Pascal predecessor: RT cores and Tensor cores. The former is pretty much exactly what the name says on the tin, with RT cores accelerating the process of tracing rays, and all the new algorithms involved in that. Meanwhile the tensor cores are technically not related to the raytracing process itself, however they play a key part in making raytracing rendering viable, along with powering some other features being rolled out with the GeForce RTX series.

Starting with the RT cores, these are perhaps NVIDIA’s biggest innovation – efficient raytracing is a legitimately hard problem – however for that reason they’re also the piece of the puzzle that NVIDIA likes talking about the least. The company isn’t being entirely mum, thankfully. But we really only have a high level overview of what they do, with the secret sauce being very much secret. How NVIDIA ever solved the coherence problems that dog normal raytracing methods, they aren’t saying.

At a high level then, the RT cores can essentially be considered a fixed-function block that is designed specifically to accelerate Bounding Volume Hierarchy (BVH) searches. BVH is a tree-like structure used to store polygon information for raytracing, and it’s used here because it’s an innately efficient means of testing ray intersection. Specifically, by continuously subdividing a scene through ever-smaller bounding boxes, it becomes possible to identify the polygon(s) a ray intersects with in only a fraction of the time it would take to otherwise test all polygons.
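
To make the pruning idea concrete, here is a minimal Python sketch of a BVH search. It illustrates the general technique only, not NVIDIA's undisclosed hardware algorithm; the `Node` layout, the function names, and the toy scene of 64 boxes are all assumptions made for this example. Leaves hold bare primitive ids, and the final ray-polygon test is left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical BVH node: an axis-aligned box with children or leaf primitives."""
    lo: Vec3
    hi: Vec3
    children: List["Node"] = field(default_factory=list)
    prims: List[int] = field(default_factory=list)  # primitive ids (leaves only)


def ray_hits_box(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: the ray must overlap the box on all three axes at once."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this axis: it must already lie inside the slab.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def candidates(root: Node, origin: Vec3, direction: Vec3):
    """Return (primitive ids in boxes the ray enters, number of box tests done)."""
    found, tests, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        tests += 1
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue  # a miss prunes the entire subtree below this box
        found.extend(node.prims)
        stack.extend(node.children)
    return found, tests


def merge(a: Node, b: Node) -> Node:
    """Parent node whose box encloses both children."""
    lo = tuple(min(x, y) for x, y in zip(a.lo, b.lo))
    hi = tuple(max(x, y) for x, y in zip(a.hi, b.hi))
    return Node(lo, hi, children=[a, b])


def build(leaves: List[Node]) -> Node:
    """Pair up neighbours level by level (leaf count assumed to be a power of two)."""
    level = leaves
    while len(level) > 1:
        level = [merge(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Toy scene: 64 unit boxes spaced along the x axis, one primitive id each.
leaves = [Node((2.0 * i, 0.0, 0.0), (2.0 * i + 1.0, 1.0, 1.0), prims=[i])
          for i in range(64)]
root = build(leaves)

# A ray travelling along +z that passes through box 37 only.
hits, box_tests = candidates(root, (74.5, 0.5, -1.0), (0.0, 0.0, 1.0))
print(hits, box_tests)
```

In this toy scene the ray reaches its single candidate primitive after 13 box tests rather than 64. For a reasonably balanced tree the gap widens as the scene grows, since the tree depth grows only logarithmically with the number of primitives.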

NVIDIA’s RT cores then implement a hyper-optimized version of this process. What precisely that entails is NVIDIA’s secret sauce – in particular how NVIDIA came to determine the best BVH variation for hardware acceleration – but in the end the RT cores are designed very specifically to accelerate this process. The end product is a collection of two distinct hardware blocks that constantly iterate through bounding box or polygon checks respectively to test intersection, to the tune of billions of rays per second and many times that number in individual tests. All told, NVIDIA claims that the fastest Turing parts, based on the TU102 GPU, can handle upwards of 10 billion ray intersections per second (10 GigaRays/second), ten times what Pascal can do if it follows the same process using its shaders.
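
For a sense of scale, the quoted peak rate can be turned into a per-pixel ray budget with some quick arithmetic. The resolutions and the 60 fps target below are assumptions chosen for illustration, and the 10 GigaRays/second figure is NVIDIA's peak claim, so the results are best read as upper bounds.

```python
GIGARAYS_PER_SECOND = 10e9  # NVIDIA's quoted peak for TU102-based parts


def rays_per_pixel(width, height, fps, rays_per_second=GIGARAYS_PER_SECOND):
    """Peak rays available per pixel per frame at a given resolution and frame rate."""
    return rays_per_second / (width * height * fps)


# Assumed display targets; none of these come from NVIDIA's claim itself.
for name, (w, h) in {"1080p": (1920, 1080),
                     "1440p": (2560, 1440),
                     "4K": (3840, 2160)}.items():
    print(f"{name}: {rays_per_pixel(w, h, 60):.1f} rays per pixel per frame")
```

At 4K and 60 fps this works out to roughly 20 rays per pixel per frame at best, and that budget has to cover every bounce and every effect that uses rays.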

NVIDIA has not disclosed the size of an individual RT core, but they’re thought to be rather large. Turing implements just one RT core per SM, which means that even the massive TU102 GPU only has 72 of the units in its full configuration, 68 of which are enabled on the RTX 2080 Ti. Furthermore because the RT cores are part of the SM, they’re tightly coupled to the SMs in terms of both performance and core counts. As NVIDIA scales down Turing for smaller GPUs by using a smaller number of SMs, the number of RT cores and resulting raytracing performance scale down with it as well. So NVIDIA always maintains the same ratio of SM resources (though chip designs can vary elsewhere).

Along with developing a means to more efficiently test ray intersections, the other part of the formula for raytracing success in NVIDIA’s book is to eliminate as much of that work as possible. NVIDIA’s RT cores are comparatively fast, but even so, ray intersection testing is still moderately expensive. As a result, NVIDIA has turned to their tensor cores to carry them the rest of the way, allowing a moderate number of rays to still be sufficient for high-quality images.

In a nutshell, raytracing normally requires casting many rays from each and every pixel in a screen. This is necessary because it takes a large number of rays per pixel to generate the “clean” look of a fully rendered image. Conversely if you test too few rays, you end up with a “noisy” image where there’s significant discontinuity between pixels because there haven’t been enough rays cast to resolve the finer details. But since NVIDIA can’t actually test that many rays in real time, they’re doing the next-best thing and faking it, using neural networks to clean up an image and make it look more detailed than it actually is (or at least, than it started out as).
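
The link between ray count and noise can be seen in a toy Monte Carlo model. This is a sketch under simplifying assumptions, not how Turing traces rays: every pixel looks at the same half-lit surface, each ray randomly returns light (1.0) or shadow (0.0), and the function names are invented for the example. The spread between pixels that should all be identical stands in for visible noise.

```python
import random
import statistics


def render_pixel(samples, rng):
    """Average `samples` random ray results, each either lit (1.0) or shadowed (0.0)."""
    return sum(1.0 if rng.random() < 0.5 else 0.0 for _ in range(samples)) / samples


def image_noise(samples, pixels=2000, seed=0):
    """Standard deviation across pixels that would all be 0.5 in a clean image."""
    rng = random.Random(seed)
    return statistics.pstdev([render_pixel(samples, rng) for _ in range(pixels)])


for n in (1, 4, 16, 64, 256):
    print(f"{n:>3} rays per pixel: noise ~ {image_noise(n):.3f}")
```

In this model the noise shrinks only with the square root of the ray count, so halving it costs roughly four times as many rays. That poor return is what makes cleaning up a low-sample image attractive compared with brute force.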

To do this, NVIDIA is tapping their tensor cores. These cores were first introduced in NVIDIA’s server-only Volta architecture, and can be thought of as a CUDA core on steroids. Fundamentally they’re just a much larger collection of ALUs inside a single core, with much of their flexibility stripped away. So instead of getting the highly flexible CUDA core, you end up with a massive matrix multiplication machine that is incredibly optimized for processing thousands of values at once (in what’s called a tensor operation). Turing’s tensor cores, in turn, double down on what Volta started by supporting newer, lower precision methods than the original that in certain cases can deliver even better performance while still offering sufficient accuracy.
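
As a rough picture of what a tensor operation looks like, the sketch below emulates a fused matrix multiply-accumulate, D = A x B + C, in plain Python. The 4x4 tile size and the pairing of half-precision (FP16) inputs with single-precision (FP32) accumulation follow NVIDIA's public description of the Volta tensor core rather than anything stated here, so treat both as assumptions; the function names are invented, and rounding after every step is only an approximation of the real hardware.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest half-precision (FP16) value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest single-precision (FP32) value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def tensor_fma(a, b, c):
    """D = A x B + C on square tiles: FP16 inputs, FP32 accumulation."""
    n = len(a)
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                # Multiply the FP16 inputs, keep the running sum in FP32.
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            d[i][j] = acc
    return d


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[float(4 * i + j) for j in range(4)] for i in range(4)]
zero = [[0.0] * 4 for _ in range(4)]

print(tensor_fma(identity, b, zero) == b)  # small integers survive FP16 exactly
print(to_fp16(0.1))                        # 0.1 does not: precision is traded away
```

The rigid shape of the operation is the point: every output element is the same multiply-and-add pattern, which is why the hardware can be packed so densely, and why neural network layers, which are mostly matrix products, fit it well.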

As for how this applies to ray tracing, the strength of tensor cores is that tensor operations map extremely well to neural network inferencing. This means that NVIDIA can use the cores to run neural networks which will perform additional rendering tasks. In this case, a neural network denoising filter is used to clean up the noisy raytraced image in a fraction of the time (and with a fraction of the resources) it would take to actually test the necessary number of rays.


No Denoising vs. Denoising in Raytracing

The denoising filter itself is essentially an image resizing filter on steroids, and can (usually) produce an image of similar quality to brute force ray tracing by algorithmically guessing what details should be present among the noise. However getting it to perform well means that it needs to be trained, and thus it’s not a generic solution. Rather, developers need to take part in the process, training a neural network based on high quality fully rendered images from their game.

Overall there are 8 tensor cores in every SM, so like the RT cores, they are tightly coupled with NVIDIA’s individual processor blocks. Furthermore this means tensor performance scales down with smaller GPUs (smaller SM counts) very well. So NVIDIA always has the same ratio of tensor cores to RT cores to handle what the RT cores coarsely spit out.
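
The fixed per-SM ratios make unit counts easy to work out for any Turing configuration. In the short sketch below, the per-SM figures and the 72-SM TU102 come from the text above; the 48-SM and 36-SM entries are hypothetical smaller configurations included only to show the scaling.

```python
RT_CORES_PER_SM = 1      # one RT core per SM
TENSOR_CORES_PER_SM = 8  # eight tensor cores per SM


def unit_counts(sm_count):
    """RT and tensor core totals for a GPU with `sm_count` SMs."""
    return {
        "sms": sm_count,
        "rt_cores": sm_count * RT_CORES_PER_SM,
        "tensor_cores": sm_count * TENSOR_CORES_PER_SM,
    }


# 72 SMs is the full TU102; 48 and 36 are illustrative smaller configurations.
for sms in (72, 48, 36):
    print(unit_counts(sms))
```

Whatever the SM count, the ratio of tensor cores to RT cores stays at 8:1.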

Deep Learning Super Sampling (DLSS)

Now with all of that said, unlike the RT cores, the tensor cores are not fixed function hardware in a traditional sense. They’re quite rigid in their abilities, but they are programmable none the less. And for their part, NVIDIA wants to see just how many different fields and tasks they can apply their extensive neural network and AI hardware to.

Games of course don’t fall under the umbrella of traditional neural network tasks, as these networks lean towards consuming and analyzing images rather than creating them. None the less, along with denoising the output of their RT cores, NVIDIA’s other big gaming use case for their tensor cores is what they’re calling Deep Learning Super Sampling (DLSS).

DLSS follows the same principle as denoising – how can post-processing be used to clean up an image – but rather than removing noise, it’s about restoring detail. Specifically, how to approximate the image quality benefits of anti-aliasing – itself a roundabout way of rendering at a higher resolution – without the high cost of actually doing the work. When all goes right, according to NVIDIA the result is an image comparable to an anti-aliased image without the high cost.

Under the hood, the way this works is up to the developers, in part because they’re deciding how much work they want to do with regular rendering versus DLSS upscaling. In the standard mode, DLSS renders at a lower input sample count – typically 2x lower, though this may depend on the game – and then infers a result, which at the target resolution is similar in quality to a Temporal Anti-Aliasing (TAA) result. A DLSS 2X mode also exists, where the input is rendered at the final target resolution and then combined with a larger DLSS network. TAA is arguably not a very high bar to set – it’s also a hack of sorts that seeks to avoid doing real overdrawing in favor of post-processing – however NVIDIA is setting out to resolve some of TAA’s traditional inadequacies with DLSS, particularly blurring.

Now it should be noted that DLSS has to be trained per-game; it isn’t a one-size-fits-all solution. This is done in order to apply a unique neural network that’s appropriate for the game at hand. In this case the neural networks are trained using 64x SSAA images, giving the networks a very high quality baseline to work against.
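
To show what a 64x supersampled reference involves, the sketch below shades each pixel 64 times and averages the results. The regular 8x8 sample grid and the hard-edged test scene are assumptions for illustration; the sample pattern NVIDIA uses for its training images isn't specified here.

```python
def supersample(shade, width, height, grid=8):
    """Shade grid*grid sub-samples per pixel and average them (grid=8 gives 64x)."""
    image = []
    for py in range(height):
        row = []
        for px in range(width):
            total = 0.0
            for sy in range(grid):
                for sx in range(grid):
                    # Sample at the centre of each sub-pixel cell.
                    x = px + (sx + 0.5) / grid
                    y = py + (sy + 0.5) / grid
                    total += shade(x, y)
            row.append(total / (grid * grid))
        image.append(row)
    return image


def edge(x, y):
    """Hypothetical scene: white where x > y, black elsewhere (a diagonal edge)."""
    return 1.0 if y < x else 0.0


aliased = supersample(edge, 4, 4, grid=1)  # one sample per pixel
smooth = supersample(edge, 4, 4, grid=8)   # 64 samples per pixel

print(aliased[1][1], smooth[1][1])
```

With one sample the pixels along the edge snap to fully black or fully white, whereas the 64-sample version gives them an intermediate grey that reflects how much of each pixel the edge covers. The cost is 64 times the shading work, which is why such images are practical as offline training targets rather than as a real-time rendering mode.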

None the less, of NVIDIA’s two major gaming use cases for the tensor cores, DLSS is by far the more easily implemented. Developers need only to do some basic work to add NVIDIA’s NGX API calls to a game – essentially adding DLSS as a post-processing stage – and NVIDIA will do the rest as far as neural network training is concerned. So DLSS support will be coming out of the gate very quickly, while raytracing (and especially meaningful raytracing) utilization will take much longer.

In sum, then, the upcoming game support aligns with the following table.

Planned NVIDIA Turing Feature Support for Games

| Game | Real Time Raytracing | Deep Learning Supersampling (DLSS) | Turing Advanced Shading |
|---|---|---|---|
| Ark: Survival Evolved | | Yes | |
| Assetto Corsa Competizione | Yes | | |
| Atomic Heart | Yes | Yes | |
| Battlefield V | Yes | | |
| Control | Yes | | |
| Dauntless | | Yes | |
| Darksiders III | | Yes | |
| Deliver Us The Moon: Fortuna | | Yes | |
| Enlisted | Yes | | |
| Fear The Wolves | | Yes | |
| Final Fantasy XV | | Yes | |
| Fractured Lands | | Yes | |
| Hellblade: Senua's Sacrifice | | Yes | |
| Hitman 2 | | Yes | |
| In Death | | | Yes |
| Islands of Nyne | | Yes | |
| Justice | Yes | Yes | |
| JX3 | Yes | Yes | |
| KINETIK | | Yes | |
| MechWarrior 5: Mercenaries | Yes | Yes | |
| Metro Exodus | Yes | | |
| Outpost Zero | | Yes | |
| Overkill's The Walking Dead | | Yes | |
| PlayerUnknown Battlegrounds | | Yes | |
| ProjectDH | Yes | | |
| Remnant: From the Ashes | | Yes | |
| SCUM | | Yes | |
| Serious Sam 4: Planet Badass | | Yes | |
| Shadow of the Tomb Raider | Yes | | |
| Stormdivers | | Yes | |
| The Forge Arena | | Yes | |
| We Happy Few | | Yes | |
| Wolfenstein II | | | Yes |

337 Comments

  • eddman - Thursday, September 20, 2018

    It still doesn't justify their prices. Great cards, finally ray-tracing for games, horribly cutthroat prices.
  • Yojimbo - Saturday, September 22, 2018

    So don't buy it, eddman. In the end the only real justification for prices is what people are willing to pay. If one isn't able to make a product cheaply enough for it to be sold for what people are willing to pay then the product is a bad product.

    I don't understand why you are so worried about the price. Or why you think they are "cut-throat". A cut-throat price is a very low price, not a high one.
  • eddman - Sunday, September 23, 2018

    There is a wealthy minority who'd pay that much, and? It's only "justified" if you are an nvidia shareholder.

    The cards are overpriced compared to last gen and that's an absolute fact. Your constant defending of nvidia's pricing is certainly not a normal consumer behavior.
  • mapesdhs - Wednesday, September 26, 2018

    Yojimbo is right that an item is only ever worth what someone is willing to pay, so in that sense NVIDIA can do what it likes, in the end it's up to the market, to consumers, whether the prices "make sense", ie. whether people actually buy them. In this regard the situation we have atm is largely that made by gamers themselves, because even when AMD released competitive products (whether by performance, value, or both), people didn't buy them. There are even people saying atm they hope AMD can release something to compete with Turing just so NVIDIA will drop its prices and thus they can buy a cheaper NVIDIA card; that's completely crazy, AMD would be mad to make something if that's how the market is going to respond.

    What's interesting this time though is that even those who in the past have been happy to buy the more expensive cards are saying they're having major hesitation about buying Turing, and the street cred which used to be perceived as coming with buying the latest & greatest has this time largely gone, people are more likely to react like someone is a gullible money pumped moron for buying these products ("More money than sense!", as my parents used to say). By contrast, when the 8800 GTX came out, that was a huge leap over the 7800 and people were very keen to get one, those who could afford it. Having one was cool. Ditto the later series right through to Maxwell (though a bit of a dip with the GTX 480 due to heat/power). The GTX 460 was a particularly good release (though the endless rebranding later was annoying). Even Pascal was a good bump over what had come before.

    Not this time though, it's a massive price increase for little gain, while the headline features provide sub-60Hz performance at a resolution far below what NVIDIA themselves have been pushing as desirable for the last 5 years (the focus has been on high frequency monitors, 4K and VR); now NVIDIA is trying to roll back the clock, which won't work, especially since those who've gotten used to high frequency monitors physically cannot go back (ref New Scientist, changes in the brain's vision system).

    Thus, eddman is right that the cards are overpriced in a general sense, as they don't remotely match what the market has come to expect from NVIDIA based on previous releases. However, if gamers don't vote with their wallets then nothing will change. Likewise, if AMD releases something just as good, or better value, but gamers don't buy them, then again nothing will change, we'll be stuck with this new expensive normal.

    I miss the Fermi days, buy two GTX 460s to have better performance than a GTX 580, didn't cost much, games ran great, and the lesser VRAM didn't bother me anyway as I wasn't using an uber monitor. Now we have cards that cost many hundreds that don't even support multi-GPU. It's as daft as Intel making the cost entry point to >= 40 PCIe lanes much higher than it was with X79 (today it's almost 1000 UKP); an old cheapo 4820K can literally do things a 7820X can't. :D

    Alas though, again it boils down to individual choice. Some want the fastest possible and if they can afford it then that's up to them, it's their free choice, we don't have the right to tell people they shouldn't buy these cards. It's their money after all (anything else is communism). It is though an unfortunate reality that if the cards do sell well then NVIDIA will know they can maintain this higher priced and more feature restricted strategy, while selling the premium parts to Enterprise. Btw, it amazes me how people keep comparing the 2080 to the 1080 Ti even though the former has less RAM; how is that an upgrade in the product stack? (people will respond with ray tracing! Ray tracing! A feature which can't be used yet and runs too slow to be useful anyway, and with an initial implementation that's a pretty crippled implementation of the idea as well). And why doesn't the 2080 Ti have more than 11GB? It really should, unless NVIDIA figures that if they can indeed push people back to 1080p then 11GB is enough anyway, which would be ironic.

    I'm just going to look for a used 1080 Ti, more than enough for my needs. For those with much older cards, a used 980 Ti or 1070, or various AMD cards, are good options.

    Ian.
  • Yojimbo - Wednesday, September 19, 2018

    Yes, exactly. A very appropriate quote.
  • Skiddywinks - Thursday, September 20, 2018

    No reason Ford couldn't have done both though. There is no technological reason nVidia could not have released a GTX 2080 Ti as well. But they know they couldn't charge as much, and the vast majority of people would not buy the RTX version. Instead, it makes their 1080 Ti stock look much more appealing to value oriented gamers, helping them shift that stock as well as charge a huge price for the new cards.

    It's really great business, but as a gamer and not a stockholder, I'm salty.
  • Spunjji - Friday, September 21, 2018

    Ford didn't invent the car, though. Ford invented a way to make them cheaper.

    Ford's strategy was not to make a new car that might do something different one day and then charge through the effing nose for it.
  • Gastec - Thursday, September 27, 2018

    That quote applies perfectly to our digital electronic World: we want to go faster from point A to point B. To do that, Henry Ford gave us a car (a faster "horse"). We want the same from GPUs and CPUs, to be faster. Prettier sure, pink even. But first just make it fast.
  • Writer's Block - Monday, October 1, 2018

    Except there is no evidence he said that - it is a great statement though, and conveys the intended message well
  • Hxx - Wednesday, September 19, 2018

    overall disappointing performance. RTX 2080 is a flat out bad buy at $800+ when 1080 ti custom boards are as low as $600. the RTX 2080 TI is a straight up ripoff when consumers can easily surpass its performance with 2 x 1080 TIs. I agree on the conclusion though that you are buying hardware that you won't take advantage of yet but still, if Nvidia wants to push this hardware to all gamers, they need to drop the pricing in line with their performance otherwise not many will buy into the hype.
