The NVIDIA Turing GPU Architecture Deep Dive: Prelude to GeForce RTX

Author: Nate Oh

by Nate Oh on September 14, 2018 12:30 PM EST


Turing Tensor Cores: Leveraging Deep Learning Inference for Gaming

Though RT cores are Turing's poster-child feature, the tensor cores were very much Volta's headline feature. In Turing they have been updated, reflecting their new positioning as a gaming/consumer feature via inferencing. The main change in the 2nd generation tensor cores is the addition of INT8 and INT4 precision modes for inferencing. These modes are enabled by new hardware data paths and perform dot products that accumulate into an INT32 result. Per SM, FP16 mode delivers 1024 operations per clock (8 tensor cores, each performing 64 FP16 fused multiply-adds per clock). INT8 mode operates at double that rate, or 2048 integer operations per clock, and INT4 mode at quadruple it, or 4096 integer operations per clock.
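To make the "accumulate into INT32" point and the rate arithmetic concrete, here is a minimal Python sketch. It emulates only the arithmetic, not the hardware: narrow signed INT8 or INT4 operands are multiplied pairwise and summed into a 32-bit accumulator. The helper names are hypothetical, wrapping on overflow is an assumption (the article does not say how overflow is handled), and the rate helper assumes Turing's 8 tensor cores per SM, each doing 64 FP16 FMAs per clock, with one FMA counted as two operations.

```python
# Emulates the arithmetic of a low-precision tensor-core dot product:
# narrow signed integer inputs, products summed into a 32-bit accumulator.
# Illustrative model only, not NVIDIA's implementation or API.

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def int_range(bits):
    """Signed two's-complement range (lo, hi) for a given bit width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def wrap_int32(x):
    """Wrap an arbitrary Python int into the signed 32-bit range (assumed overflow behavior)."""
    return (x + 2**31) % 2**32 - 2**31


def dot_accumulate(a, b, acc=0, bits=8):
    """Hypothetical helper: sum a[i] * b[i] into an INT32 accumulator.

    bits=8 models INT8 operands, bits=4 models INT4 operands.
    """
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    lo, hi = int_range(bits)
    for x, y in zip(a, b):
        if not (lo <= x <= hi and lo <= y <= hi):
            raise ValueError(f"operand out of INT{bits} range")
        # A single INT8 product can reach 16384 in magnitude, so a 16-bit
        # accumulator (max 32767) could overflow after just two products;
        # 32 bits leaves ample headroom for long dot products.
        acc = wrap_int32(acc + x * y)
    return acc


def ops_per_clock_per_sm(mode):
    """Peak tensor-core operations per clock per SM for each precision.

    Assumes 8 tensor cores per SM, each doing 64 FP16 FMAs per clock,
    and counts one FMA as 2 operations (multiply + add).
    """
    fp16_ops = 8 * 64 * 2  # 1024
    return fp16_ops * {"FP16": 1, "INT8": 2, "INT4": 4}[mode]


print(dot_accumulate([127, -128, 3], [127, -128, -2]))  # 32507
print(dot_accumulate([7, -8], [7, -8], bits=4))  # 113
print([ops_per_clock_per_sm(m) for m in ("FP16", "INT8", "INT4")])  # [1024, 2048, 4096]
```

The wide accumulator is the design point: the inputs shrink to 8 or 4 bits, but the running sum does not, so long dot products stay exact until they approach the 32-bit limit.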

Naturally, only some networks tolerate these lower precisions and any necessary quantization, meaning storing and computing on data in a compacted, lower-precision format. INT4 is firmly in research territory, whereas INT8's practical applicability is much more developed. Regardless, the 2nd generation tensor cores still support FP16, and they now also offer a pure FP16 mode that accumulates in FP16 rather than FP32. CUDA 10 is not yet out, but once it is, its enhanced WMMA operations should shed light on any other differences, such as additional supported matrix sizes for operands.
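Quantization and accumulator precision are both easy to see in a few lines. The sketch below assumes symmetric per-tensor INT8 quantization, one common scheme chosen purely for illustration (the article does not say which scheme NVIDIA's tools use): the largest magnitude maps to 127, and everything else is rounded and clamped. It then sums many small FP16 values with an FP16 accumulator versus an FP32 one, the trade-off behind the pure FP16 mode. Python's `struct` format codes `'e'` and `'f'` are used to round values to IEEE half and single precision; the helper names are hypothetical.

```python
import struct

# Illustrative only: symmetric per-tensor INT8 quantization is assumed here.


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def quantize_int8(values):
    """Map floats to int8 codes in [-127, 127] plus a dequantization scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]


def accumulate(values, rounder):
    """Sum sequentially, rounding the running total to the accumulator format."""
    acc = 0.0
    for v in values:
        acc = rounder(acc + v)
    return acc


weights = [0.5, -1.27, 0.02, 1.0]
codes, scale = quantize_int8(weights)
print(codes)  # [50, -127, 2, 100]
print(dequantize(codes, scale))  # each value within about scale / 2 of the original

# 10,000 FP16 inputs of ~0.001. An FP16 accumulator stops growing once half
# the spacing between FP16 values near the running sum exceeds the addend;
# an FP32 accumulator stays close to the exact total (~10).
small = [to_fp16(0.001)] * 10_000
print("FP16 accumulator:", accumulate(small, to_fp16))
print("FP32 accumulator:", accumulate(small, to_fp32))
```

The FP16 accumulator falls far short of the true total, which is why FP32 accumulation remains the default and a pure FP16 mode suits only workloads that can tolerate the loss.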

Inasmuch as deep learning is involved, NVIDIA is pushing what was a purely compute/professional feature into consumer territory, and we will go over the full picture in a later section. For Turing, the tensor cores can accelerate the features under the NGX umbrella, which includes DLSS. They can also accelerate certain AI-based denoisers that clean up and correct real-time raytraced rendering, though at the moment most developers seem to be opting for denoisers that do not use the tensor cores.

111 Comments

gglaw - Saturday, September 15, 2018 - link
Why bother making statements claiming the prices are completely in line with inflation without even having a slight clue what the inflation rate has been in recent history? Outside of the very young readers here, most of us were around for the 700 series, the 8800, etc., and know firsthand what kind of changes inflation has brought in the last 10-20 years. Especially compared with the 980 Ti and 1080 Ti, inflation has barely moved since those releases.
Spunjji - Monday, September 17, 2018 - link
This. Most people here aren't stupid.
notashill - Saturday, September 15, 2018 - link
700 series wasn't even close. The 780 was $650 (~$700 adjusted) and the 780 Ti was $700 (~$760 adjusted). And the 780 MSRP dropped to $500 after 6 months when the Ti launched.
Santoval - Monday, September 17, 2018 - link
Yes, Navi will be midrange, at around a GTX 1080 performance level, or at best a bit faster. They initially planned a dual Navi package for the high end, linked by Infinity Fabric, but they canned (or postponed) it, due to the reluctance of game developers to support dual-die consumer graphics cards (according to AMD). They might release dual Navi professional graphics cards though.
Tensor and RT cores should not be expected either. These will have to wait for the post-Navi (and post-GCN) generation.
TropicMike - Friday, September 14, 2018 - link
Good article. Lots of complicated stuff to try to explain.

Just a quick typo on page 2: "It’s in pixel shaders that the various forms of lighting (shadows, reflection, reflection, etc) " I'm guessing you meant 'refraction' for one of those.
Smell This - Wednesday, July 3, 2019 - link
Super **Duper** Turbo Hyper Championship Edition
Yaldabaoth - Friday, September 14, 2018 - link
For the "eye diagram" on page 8, the text says, "In this case we’re looking at a fairly clean eye diagram, illustrating the very tight 70ns transitions between data transfers." However, the image is labeled as "70 ps".
Ryan Smith - Friday, September 14, 2018 - link
Nano. Pico. Really, it's a small difference... =P

Thanks!
Bulat Ziganshin - Friday, September 14, 2018 - link
It's not "Volta in spirit". It's Volta for the masses. The only differences are:
- reduced FP64 cores
- reduced sharedmem/cache from 128 KB to 96 KB
- added RT cores

Now let's check what you would change to produce a "scientific" Turing GPU. Yes, exactly these things. So, despite the name, it's the same architecture, tuned for the gaming market.
Yojimbo - Saturday, September 15, 2018 - link
You don't really know that. This article, as explained in the beginning, focuses only on the RT core improvements. There are other Turing features that were left out. I think we have no idea if Volta has variable rate shading, mesh shading, or multi-view rendering. I'm guessing it does not.

Besides, what you said isn't true even limiting the discussion to what was covered in this article. The Turing Tensor cores allow for a greater range of precisions.
