The Vega Architecture: AMD’s Brightest Day

From an architectural standpoint, AMD's engineers consider Vega to be their most sweeping design change in five years. And looking over everything that has been added, it's easy to see why: in terms of core graphics/compute features, Vega introduces more than any iteration of GCN before it.

Speaking of GCN, before getting too deep here, it's interesting to note that, at least publicly, AMD is shying away from the Graphics Core Next name. GCN doesn't appear anywhere in AMD's whitepaper, while in programmers' documents such as the shader ISA, the name is still present. But at least for the purposes of public discussion, rather than using the term GCN 5, AMD is consistently calling it the Vega architecture. Make no mistake, though: this is still very much GCN, so AMD's basic GPU execution model remains in place.

So what does Vega bring to the table? Back in January we got what has turned out to be a fairly extensive high-level overview of Vega’s main architectural improvements. In a nutshell, Vega is:

  • Higher clocks
  • Double rate FP16 math (Rapid Packed Math)
  • HBM2
  • New memory page management for the high-bandwidth cache controller
  • Tiled rasterization (Draw Stream Binning Rasterizer)
  • Increased ROP efficiency via L2 cache
  • Improved geometry engine
  • Primitive shading for even faster triangle culling
  • Direct3D feature level 12_1 graphics features
  • Improved display controllers

The interesting thing is that even with this significant number of changes, the Vega ISA is not a complete departure from the GCN4 ISA. AMD has added a number of new instructions – mostly for FP16 operations – along with some additional instructions that they expect to improve performance for video processing and some 8-bit integer operations, but nothing that radically separates Vega from earlier ISAs. So in terms of compute, Vega still moves data through the GPU in much the same way as Polaris and Fiji.

Consequently, the burning question I think many will ask is whether the effective compute IPC is significantly higher than Fiji's, and the answer is no. AMD has actually taken significant pains to keep the throughput latency of a CU at 4 cycles (4 stages deep), but strictly speaking, existing code isn't going to run any faster on Vega than on earlier architectures. In order to wring the most out of Vega's new CUs, you need to take advantage of the new compute features. This doesn't mean that compilers can't take advantage of them on their own, but especially where datatypes are concerned, it's important that code be designed around lower precision datatypes to begin with.
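
To make the datatype point concrete, the sketch below illustrates the packed FP16 model that Rapid Packed Math is built around: two 16-bit values share a single 32-bit register lane and are operated on together. Vega exposes this through AMD's own shader ISA and compute APIs rather than CUDA; the example uses CUDA's half2 intrinsics purely because they follow the same two-values-per-lane idea, and the kernel name and parameters are hypothetical illustrations, not anything from AMD's toolchain.

```cuda
// Minimal sketch of packed FP16 math (assumes a GPU with native FP16
// arithmetic, e.g. compute capability 5.3+ when built with nvcc).
#include <cuda_fp16.h>

// Hypothetical kernel: y = a*x + y, with two FP16 elements packed per 32-bit lane.
__global__ void saxpy_packed_fp16(int n, __half2 a, const __half2* x, __half2* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // __hfma2 performs a fused multiply-add on both FP16 halves of the
        // register in one instruction, which is where the "double rate"
        // throughput relative to FP32 comes from.
        y[i] = __hfma2(a, x[i], y[i]);
    }
}
```

The important part is the data layout: values have to be stored and processed as FP16 pairs from the start, which is why code written purely against FP32 types sees no benefit without being reworked.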


215 Comments


  • npz - Monday, August 14, 2017

    My point was that since most modern games have received enhancements for the PS4 Pro -- and more will follow going forward, given it's the engine the devs use -- and since the vast majority are cross-platform, major PC games will already have a built-in FP16 optimization path ready to be taken advantage of.

    Also, don't forget Scorpio's arrival, which will likely feature the same, so there would be even more incentive for using this on PC.
  • Yojimbo - Tuesday, August 15, 2017

    From what I have heard, Scorpio will not contain double rate fp16.

    And I am not sure about your claim that most modern game engines have been enhanced to take advantage of double rate FP16. I highly doubt that's true. Maybe a few games have cobbled in code to take advantage of low-hanging FP16 fruit.

    As far as AMD's "advantage", don't forget that NVIDIA had double rate FP16 before AMD. They left it out of Pascal to help differentiate their various data center cards (namely the P100 from the P40) in machine learning tasks. But now that the Volta GV100 has tensor cores it's not necessary to restrict double rate FP16 to only the GV100. For all we know double rate FP16 will be in their entire Volta lineup.
  • Yojimbo - Tuesday, August 15, 2017

    edit: I meant to say "They left it out of mainstream Pascal..." (as in GP102, GP104, GP106, GP107, GP108)
  • Santoval - Tuesday, August 15, 2017

    I am almost 100% certain that consumer Volta GPUs will have double rate FP16 disabled, and completely certain that they will have tensor cores disabled. Otherwise they would kiss the super high margins of their professional GPU cards goodbye, and Nvidia is never going to do that. Tensor cores were largely added so that Nvidia can compete with Google's TPU in the AI / deep learning space. Google still does not sell that chip, but that might change. Unlike Google's TPU, which can be used only for AI inference, Volta's tensor cores will do both inference and training, and that is very important for this market.
  • Yojimbo - Wednesday, August 16, 2017

    Well, my point was that since they have tensor cores they can afford to have double rate FP16, so of course I agree that there will not be tensor cores enabled on consumer Volta cards. If the tensor cores give significantly superior performance to simple double rate FP16 (and NVIDIA's benchmarks show that they do), then why would NVIDIA need to wall off simple double rate FP16 to protect their V100 card? As much as NVIDIA wants to protect their margins, they also need to stave off competition. The tensor cores allow them to do both at once. They push forward the capabilities of the ultra high end (V100) while allowing double rate FP16 to trickle down to cheaper cards to stave off competition. I am not saying that I think they definitely will do it, but I see that the opportunity is there. Frankly, I think the reason they wouldn't do it is if they don't think the cost, in power budget or dollars, of implementing it is worth the gain in gaming performance. Also, perhaps they want to create three tiers: the V100 with tensor cores, the Volta Titan X and/or Tesla V40 with double rate FP16, and everything else.

    As far as Google's TPUs go, their TPU 2 can do training and inferencing. Their first TPU did only inferencing, on 8-bit quantized (integer) networks. The TPU 2 does training and inferencing on FP16-based networks. The advantage NVIDIA's GPUs have is that they are general purpose parallel processors, and are not specific to running computations for convolutional neural networks.
  • Santoval - Tuesday, August 15, 2017

    Nope, it was explicitly stated by MS that Scorpio's GPU will ship with Rapid Packed Math disabled. Why? I have no idea.
  • Nintendo Maniac 64 - Tuesday, August 15, 2017

    Codemasters apparently doesn't realize that the Tegra X1 used in the Nintendo Switch also supports fp16, so it's not something unique to the PS4 Pro...
  • OrphanageExplosion - Tuesday, August 15, 2017

    There was also FP16 support in the PlayStation 3's RSX GPU. Generally speaking, the PS3 still lagged behind Xbox 360 in platform comparisons.

    The 30% perf improvement for Mass Effect is referring to the checkerboard resolve shader, not the entire rendering pipeline.

    For a more measured view of what FP16 brings to the table, check out this post: http://www.neogaf.com/forum/showpost.php?p=2223481...
  • Wise lnvestor - Tuesday, August 15, 2017

    Did you even read the GamingBolt article? And look at the picture? When a dev talks about how much they saved in milliseconds, IT IS THE ENTIRE rendering pipeline.
  • romrunning - Monday, August 14, 2017

    6th para - "seceded" should be "ceded" - AMD basically yielded the high-market to Nvidia, not "withdraw" to Nvidia. :) Reply
