Turing Tensor Cores: Leveraging Deep Learning Inference for Gaming

Though RT Cores are Turing’s poster child feature, the tensor cores were very much Volta’s. In Turing they have been updated to reflect their positioning as a gaming/consumer feature via inferencing. The main changes for the 2nd generation tensor cores are INT8 and INT4 precision modes for inferencing, which are enabled by new hardware data paths and perform dot products that accumulate into an INT32 result. INT8 mode operates at double the FP16 rate, or 2048 integer operations per clock per SM. INT4 mode operates at quadruple the FP16 rate, or 4096 integer ops per clock per SM.
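To make the integer path concrete, here is a minimal Python sketch of the arithmetic described above: INT8 operands are multiplied and summed into an INT32-range accumulator. It models only the numerics, not the hardware data path. The function names are hypothetical, and the FP16 baseline of 1024 operations per clock is an assumption implied by the 2048 and 4096 figures rather than a number stated in the text.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def int8_dot_int32(a, b):
    """Dot product of two INT8 vectors, accumulated in an INT32-range integer."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    for x, y in zip(a, b):
        # Operands must fit in 8 bits; a single product can need up to 16 bits.
        if not (INT8_MIN <= x <= INT8_MAX and INT8_MIN <= y <= INT8_MAX):
            raise ValueError("operand outside INT8 range")
        acc += x * y
        # The running sum grows past 16 bits quickly, hence the wide accumulator.
        if not (INT32_MIN <= acc <= INT32_MAX):
            raise OverflowError("accumulator left the INT32 range")
    return acc

def ops_per_clock(fp16_rate, speedup):
    """Scale an assumed FP16 rate by the mode's stated multiplier."""
    return fp16_rate * speedup

# Assumption: baseline implied by "double the FP16 rate, or 2048".
FP16_OPS_PER_CLOCK = 1024

int8_rate = ops_per_clock(FP16_OPS_PER_CLOCK, 2)  # INT8: double rate
int4_rate = ops_per_clock(FP16_OPS_PER_CLOCK, 4)  # INT4: quadruple rate
```

Four products of 127 x 127 already sum to 64516, which overflows both INT8 and INT16. That is the practical reason for accumulating narrow operands into a much wider register.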

Naturally, only some networks tolerate these lower precisions and the quantization they require, meaning the storage and calculation of data in a compacted format. INT4 is firmly in the research area, whereas INT8’s practical applicability is much more developed. Regardless, the 2nd generation tensor cores retain FP16 mode, and now also support a pure FP16 mode without an FP32 accumulator. While CUDA 10 is not yet out, its enhanced WMMA operations should shed light on any other differences, such as additional accepted matrix sizes for operands.
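As a rough illustration of what quantization involves, the sketch below maps floating point values to INT8 using one shared scale factor and then maps them back. This is one common scheme (symmetric, per-tensor), chosen here purely for illustration. The text does not say which scheme any particular network or NVIDIA library uses, and the function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [x * scale for x in q]

weights = [0.6, -1.0, 0.2, 0.0]  # made-up sample values
q, scale = quantize_int8(weights)  # q == [76, -127, 25, 0]
restored = dequantize(q, scale)

# Rounding error per element is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The round trip is lossy: each value can be off by up to half of `scale`. Whether a network tolerates that error is what decides if INT8 (or the far coarser INT4) is usable for it.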

Inasmuch as deep learning is involved, NVIDIA is pushing what was a purely compute/professional feature into consumer territory, and we will go over the full picture in a later section. For Turing, the tensor cores can accelerate the features under the NGX umbrella, which includes DLSS. They can also accelerate certain AI-based denoisers that clean up and correct real time raytraced rendering, though most developers seem to be opting for non-tensor core accelerated denoisers at the moment.


111 Comments


  • gglaw - Saturday, September 15, 2018 - link

    Why bother to make up statements claiming the prices are completely as expected with inflation added without even having a slight clue what the inflation rate has been in recent history? Outside of the very young readers here, most of us were around for 700 series, 8800, etc. and know first hand what type of changes inflation has had in the last 10-20 years. Especially comparing to the 980 Ti, and 1080 Ti, inflation has barely moved since those releases.
  • Spunjji - Monday, September 17, 2018 - link

    This. Most people here aren't stupid.
  • notashill - Saturday, September 15, 2018 - link

    700 series wasn't even close. 780 was $650->adjusted ~$700, 780Ti was $700->adjusted ~$760. And the 780 MSRP dropped to $500 after 6 months when the Ti launched.
  • Santoval - Monday, September 17, 2018 - link

    Yes, Navi will be midrange, at around a GTX 1080 performance level, or at best a bit faster. They initially planned a dual Navi package for the high end, linked by Infinity Fabric, but they canned (or postponed) it, due to the reluctance of game developers to support dual-die consumer graphics cards (according to AMD). They might release dual Navi professional graphics cards though.
    Tensor and RT cores should not be expected either. These will have to wait for the post-Navi (and post-GCN) generation.
  • TropicMike - Friday, September 14, 2018 - link

    Good article. Lots of complicated stuff to try to explain.

    Just a quick typo on page 2: "It’s in pixel shaders that the various forms of lighting (shadows, reflection, reflection, etc) " I'm guessing you meant 'refraction' for one of those.
  • Smell This - Wednesday, July 3, 2019 - link

    Super **Duper** Turbo Hyper Championship Edition
  • Yaldabaoth - Friday, September 14, 2018 - link

    For the "eye diagram" on page 8, the text says, "In this case we’re looking at a fairly clean eye diagram, illustrating the very tight 70ns transitions between data transfers." However, the image is labeled as "70 ps".
  • Ryan Smith - Friday, September 14, 2018 - link

    Nano. Pico. Really, it's a small difference... =P

    Thanks!
  • Bulat Ziganshin - Friday, September 14, 2018 - link

    It's not "Volta in spirit". It's Volta for the masses. The only differences
    - reduced FP64 cores
    - reduced sharedmem/cache from 128 KB to 96 KB
    - added RT cores

    Now let's check what you want to change to produce "scientific" Turing GPU. Yes, exactly these things. So, despite the name, it's the same architecture, tuned for the gaming market
  • Yojimbo - Saturday, September 15, 2018 - link

    You don't really know that. This article, as explained in the beginning, focuses only on the RT core improvements. There are other Turing features that were left out. I think we have no idea if Volta has variable rate shading, mesh shading, or multi-view rendering. I'm guessing it does not.

    Besides, what you said isn't true even limiting the discussion to what was covered in this article. The Turing Tensor cores allow for a greater range of precisions.
