AnandTech Year in Review 2018: GPUs
by Ryan Smith on December 26, 2018 11:00 AM EST
NVIDIA Turing Turns to Ray Tracing
The tentpole event of the GPU industry is of course the launch of a new architecture and its chips, and 2018 didn’t disappoint. Over the summer NVIDIA launched their Turing GPU architecture, and with it their new GeForce RTX 20 series of video cards.
Turing itself is an interesting beast, as NVIDIA used the new architecture to overhaul parts of their designs and in other places to introduce entirely new features. The core GPU architecture is essentially a Volta derivative with additional features; this is a notable distinction because, while Volta has been available in servers as Tesla V100 since the middle of 2017, it never came to the consumer market. From a consumer standpoint then, Turing is the biggest update to NVIDIA’s core GPU architecture since the launch of Maxwell (1) over four and a half years ago.
Though covering the full depths of what Turing’s core architecture entails is best left for Nate Oh’s fantastic Turing Deep Dive, in short the new architecture further optimizes NVIDIA’s performance and workflow by reorganizing the layout of an individual SM, and for the first time (for a consumer part) breaking out the Integer units into their own execution block. The net result is that a single Turing SM is now composed of 4 processing blocks, each containing 16 FP cores and 16 INT cores. The benefit of this change is that it allows integer instructions to be more readily executed alongside floating point instructions, whereas previously the two occupied the same slot. Meanwhile NVIDIA also updated the cache system, introducing an L0 instruction cache to better feed all of their cores.
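To put rough numbers on the SM layout described above, here is a minimal Python sketch. The per-block core counts come from the text; the two cycle-count functions are a deliberately crude toy model (one instruction per cycle, no latency, no scheduling limits), and the 100/40 instruction mix is a hypothetical chosen purely for illustration, not a measured figure.

```python
# Per-SM layout as described in the article.
BLOCKS_PER_SM = 4
FP_CORES_PER_BLOCK = 16
INT_CORES_PER_BLOCK = 16

fp_cores_per_sm = BLOCKS_PER_SM * FP_CORES_PER_BLOCK    # 4 * 16 = 64
int_cores_per_sm = BLOCKS_PER_SM * INT_CORES_PER_BLOCK  # 4 * 16 = 64


def cycles_shared_slot(fp_ops: int, int_ops: int) -> int:
    """Toy model: FP and INT instructions compete for the same issue slot."""
    return fp_ops + int_ops


def cycles_separate_blocks(fp_ops: int, int_ops: int) -> int:
    """Toy model: FP and INT instructions run side by side on their own
    units, so the longer of the two streams sets the total."""
    return max(fp_ops, int_ops)


# Hypothetical instruction mix, not taken from any benchmark.
fp_ops, int_ops = 100, 40
shared = cycles_shared_slot(fp_ops, int_ops)
split = cycles_separate_blocks(fp_ops, int_ops)

print(f"FP cores per SM:  {fp_cores_per_sm}")
print(f"INT cores per SM: {int_cores_per_sm}")
print(f"Shared slot: {shared} cycles, separate blocks: {split} cycles")
```

In this idealized model the benefit grows with the share of integer work in the instruction stream; real hardware will see less than the ideal because of dependencies and scheduling constraints.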
With all of that said, as it turns out the marquee feature improvement for Turing isn’t even part of the GPU’s core compute architecture, rather it’s new hardware entirely: everything NVIDIA needs to accelerate ray-tracing on a GPU. Long considered the holy grail of graphics due to its accuracy and quality – and long out of reach of GPUs due to its absurd performance requirements – GPUs are finally getting to the point where they’re fast enough to mix in ray tracing with traditional rasterization for improving graphics in a measured manner.
Ray Tracing Diagram (Henrik / CC BY-SA 4.0)
Turing in turn introduces two new hardware units (relative to consumer Pascal) to achieve this. The first is what NVIDIA calls an RT core, which is their hardware block for actually computing the all-important ray intersections. The second is the tensor core, which is actually another carryover from Volta. The tensor cores excel at neural network execution, and while they have many purposes – as demonstrated with NVIDIA’s Tesla accelerators – for ray tracing their purpose is to help smooth out the rough output of the ray tracing process itself. By applying a neural network model to the initial, grainy output of the ray tracing unit, NVIDIA is able to save a lot of expensive computational work by firing off far fewer rays than would otherwise be necessary for a clean ray-traced image.
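For a sense of what a single ray intersection test involves, the sketch below implements the textbook Möller–Trumbore ray-triangle test in plain Python. This is an illustration of the general kind of computation only: the article does not describe how the RT core is implemented internally, and the function and helper names here are my own.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_intersect(origin: Vec3, direction: Vec3,
                           v0: Vec3, v1: Vec3, v2: Vec3,
                           eps: float = 1e-9) -> Optional[float]:
    """Möller–Trumbore test. Returns the distance t along the ray to the
    hit point, or None if the ray misses the triangle."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # ignore hits behind the ray origin


# A triangle lying in the plane z = 5, and a ray fired straight down +z.
tri = ((0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0))
hit = ray_triangle_intersect((0.25, 0.25, 0.0), (0.0, 0.0, 1.0), *tri)
miss = ray_triangle_intersect((2.0, 2.0, 0.0), (0.0, 0.0, 1.0), *tri)
print(hit, miss)
```

A renderer runs tests like this for enormous numbers of ray and triangle pairs per frame, which is why both a dedicated hardware block and a way to get by with fewer rays are attractive.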
Truthfully, the results of this whole ray tracing endeavor are a bit mixed right now since we’re still in the very early days of the technology. Microsoft only announced the relevant DXR standard earlier this year, and the first games with ray tracing features are just now shipping, which means developers haven’t had much time to integrate and optimize the technology. The resulting image quality improvement isn’t a night-and-day difference, which makes it a bit harder for NVIDIA to quickly sell consumers on the idea. But we’re expecting the level of integration and the resulting performance to improve over time, as game developers get better acquainted with the technology and what to use it for.
In the meantime, while the GeForce RTX 20 series is a major step up from its predecessor in terms of features, the resulting performance gains at every price segment are much smaller than what we usually see from a new GPU architecture launch. A $500 RTX 2070 is only around 10% faster than what was a $500 GTX 1080, unlike the 50%+ gains of years gone by. There are a few reasons why the latest cards haven’t significantly moved the needle on price-to-performance ratios, but the biggest factors are that the transistors allocated to Turing’s RT features can’t be used for traditional rasterization – meaning they add nothing to the performance of existing games – and that NVIDIA is carefully controlling GPU prices to deal with the inventory issues mentioned earlier. As long as NVIDIA is sitting on leftover Pascal GPUs to sell, they aren’t going to be in a hurry to sell Turing GPUs at low prices.
As for individual cards, at the moment we’ve seen the launch of three consumer cards – RTX 2070, RTX 2080, and RTX 2080 Ti – along with the more professionally-oriented Titan RTX. With even the cheapest Turing card going for $500 and mobile variants nowhere to be found, I don’t expect that NVIDIA is done rolling out their RTX 20 series quite yet.
The Incredible Shrinking Polaris
While AMD is between GPU architectures for 2018 – Vega was launched last year and Navi will launch in 2019 – AMD hasn’t spent the year entirely idle. The company’s other child, the workhorse architecture that is Polaris, received a somewhat oddly timed die shrink.
This fall AMD started shipping Polaris 30, a version of Polaris 10 that is built on long-time partner GlobalFoundries’ 12nm process. In practice Polaris 30’s die size isn’t any smaller than Polaris 10’s – officially, AMD lists it at the same 232mm² as Polaris 10 – however AMD has tapped the 12nm process’s general performance improvements to give the Polaris 10 design a late-life performance boost.
Paired with the launch of Polaris 30 is the Radeon RX 590, which is the first (and thus far, only) video card to use the new GPU. By going all-out on performance (and throwing power efficiency to the wind), AMD has been able to muster enough performance to consistently and convincingly pull ahead of the GeForce GTX 1060 6GB, the RX 480/580’s Green competitor for the last two years. To be sure, the resulting performance increase isn’t very big, averaging 12% over the RX 580. But this is enough to keep it ahead of the GeForce GTX 1060 by around 9%. And given the relatively high volume of cards sold in this mainstream market segment, it’s an important win for AMD and should be a good morale boost for the GPU group after the Radeon RX Vega family didn’t quite land where AMD wanted it to.
The catch for now with Polaris 30/RX 590 is pricing, especially in light of the numerous RX 580 sales already going on. AMD launched the card at $279, and that’s where it stays to this day. And while faster than the GTX 1060, it’s also priced so far ahead of the RX 580 (regularly found at $199) that the RX 580 is serving as a spoiler to the RX 590. Which if nothing else helps move RX 580 cards, but doesn’t do the RX 590 any favors.
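The spoiler effect is easy to quantify with the figures already quoted. The short Python sketch below normalizes RX 580 performance to 1.0 and applies the 12% average gain and the $279 and $199 prices from the text; the variable names are my own, and street prices will of course move around.

```python
# Figures quoted in the article.
rx580_price = 199.0          # typical sale price, USD
rx590_price = 279.0          # launch price, USD
rx590_gain_over_580 = 0.12   # average performance gain

# Normalize RX 580 performance to 1.0.
rx580_perf = 1.0
rx590_perf = rx580_perf * (1.0 + rx590_gain_over_580)

# Performance per dollar for each card.
rx580_perf_per_dollar = rx580_perf / rx580_price
rx590_perf_per_dollar = rx590_perf / rx590_price

price_premium = rx590_price / rx580_price - 1.0
value_ratio = rx590_perf_per_dollar / rx580_perf_per_dollar

print(f"RX 590 price premium over RX 580: {price_premium:.0%}")
print(f"RX 590 perf per dollar relative to RX 580: {value_ratio:.0%}")
```

At these prices the RX 590 costs about 40% more for 12% more performance, which works out to roughly 80% of the RX 580's performance per dollar.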
Intel Goes Xe
Last but not least we have Intel. The blue team is currently in the middle of an extensive process to ramp up and become the third major GPU vendor in the industry, a process that started with the hiring of Raja Koduri from AMD back in late 2017. At the time the company also announced that they would be developing discrete GPUs, and those plans are starting to fall into place.
For Intel, their 2018 was all about laying out their future GPU plans and illustrating to consumers and partners alike how they’re going to get from today’s integrated graphics to a top-to-bottom range of integrated and discrete GPUs. Intel doesn’t have any hardware to show in 2018 – they didn’t even launch a new iGPU this year – and rather the focus for the company is on 2020, when their new GPU family will launch.
Announced at their Architecture Day event earlier this month, Intel’s discrete GPUs will be sold under the Xe brand. Nothing has been published about the architecture itself at this point, but Intel intends for Xe to be the foundation for several generations of graphics going forward. Xe will also be a true top-to-bottom stack, with the company intending to use it for everything from iGPUs up to datacenter accelerators as a replacement for Xeon Phi (itself an offshoot of the Larrabee GPU project).
Ultimately we’re talking about a GPU architecture that’s still more than a year off, so there’s still a lot of time for plans to change and for Intel to plot out how they want to handle their Xe disclosures. But it’s clear that the company is no longer content to sit on the sidelines and let the high-margin GPU accelerator market grow all around them. So it should be interesting to see how Intel fares in jumping into a market that hasn’t seen a viable third competitor in over 15 years.
Looking Forward to 2019
Finally, let’s break out the crystal ball for a quick look at some of the things we should see in 2019.
All but given at this point is more new GPUs from NVIDIA. As the current GeForce RTX 20 series product stack stops at $500, they’re going to need to introduce new products to finish refreshing the product lineup. The GeForce GTX 1060 in particular is due for a successor, owing both to its importance in NVIDIA’s product stack as their high-volume mainstream video card, and because of the challenge posed by AMD’s Radeon RX 590. We may see this as soon as CES 2019 – where NVIDIA is once again giving a presentation – but if not there, then I’d expect to see it not too long thereafter.
Meanwhile for AMD, 2019 is going to be the year of Navi. AMD has been playing their cards very close to their chest on this one, and besides the fact that it will be built on a 7nm process and will utilize a next-generation memory technology (presumably GDDR6), little else has been said. By the time AMD does launch Navi, Vega will be coming up on 2 years old and Polaris 3, so it’s possible that we’ll see AMD do a top-to-bottom refresh here in order to bring everything in sync. However it’s also equally possible that they’ll replace either the top (Vega) or bottom (Polaris) end of the market first, as this is more in-line with how AMD has operated over the past half-decade.
The biggest wildcard for the moment then is what, if anything, NVIDIA does this year to take advantage of 7nm production. A replacement for GV100 at the very high end is a likely candidate – server customers can afford the price tag that comes with low margin production – however consumer parts are a bit more nebulous. NVIDIA surprised a lot of people by launching the 12nm Turing parts right when 7nm was entering mass production, getting these parts to market sooner but missing out on the density and power efficiency improvements of 7nm in the process. A 7nm mid-generation refresh is not out of the picture, however NVIDIA hasn’t done a refresh like that in almost a decade. But then again, the current fab situation is unparalleled; as Moore’s Law continues to slow down, the standard 2-year GPU design cycle and the fab upgrade cycle are getting increasingly out of sync. So there are good arguments to be made on both sides, and it should prove interesting to see which route NVIDIA ultimately takes for 2019.