The Vega Architecture: AMD’s Brightest Day

From an architectural standpoint, AMD’s engineers consider the Vega architecture to be their most sweeping architectural change in five years. And looking over everything that has been added to the architecture, it’s easy to see why. In terms of core graphics/compute features, Vega introduces more than any other iteration of GCN before it.

Speaking of GCN, before getting too deep here, it’s interesting to note that at least publicly, AMD is shying away from the Graphics Core Next name. GCN doesn’t appear anywhere in AMD’s whitepaper, while in programmers’ documents such as the shader ISA, the name is still present. But at least for the purposes of public discussion, rather than using the term GCN 5, AMD is consistently calling it the Vega architecture. Though make no mistake, this is still very much GCN, so AMD’s basic GPU execution model remains.

So what does Vega bring to the table? Back in January we got what has turned out to be a fairly extensive high-level overview of Vega’s main architectural improvements. In a nutshell, Vega is:

  • Higher clocks
  • Double rate FP16 math (Rapid Packed Math)
  • HBM2
  • New memory page management for the high-bandwidth cache controller
  • Tiled rasterization (Draw Stream Binning Rasterizer)
  • Increased ROP efficiency via L2 cache
  • Improved geometry engine
  • Primitive shading for even faster triangle culling
  • Direct3D feature level 12_1 graphics features
  • Improved display controllers

The interesting thing is that even with this significant number of changes, the Vega ISA is not a complete departure from the GCN4 ISA. AMD has added a number of new instructions – mostly for FP16 operations – along with some additional instructions that they expect to improve performance for video processing and some 8-bit integer operations, but nothing that radically upends Vega from earlier ISAs. So in terms of compute, Vega is still very comparable to Polaris and Fiji in terms of how data moves through the GPU.

Consequently, the burning question I think many will ask is if the effective compute IPC is significantly higher than Fiji, and the answer is no. AMD has actually taken significant pains to keep the throughput latency of a CU at 4 cycles (4 stages deep), however strictly speaking, existing code isn’t going to run any faster on Vega than earlier architectures. In order to wring the most out of Vega’s new CUs, you need to take advantage of the new compute features. Note that this doesn’t mean that compilers can’t take advantage of them on their own, but especially with the datatype matters, it’s important that code be designed for lower precision datatypes to begin with.

Vega 10: Fiji of the Stars Rapid Packed Math: Fast FP16 Comes to Consumer Cards
POST A COMMENT

215 Comments

View All Comments

  • ddriver - Tuesday, August 15, 2017 - link

    More like "since nvidia castrates FP64 performance like razy" and "AT sure doesn't want to make nvidia look bad"... Reply
  • Manch - Tuesday, August 15, 2017 - link

    and here we go with the shill comments.... Reply
  • ZeDestructor - Tuesday, August 15, 2017 - link

    AMD would look just as bad given they cut down FP64 just as much on modern cards. Reply
  • zoxo - Monday, August 14, 2017 - link

    fp32 works well for MD tasks, there is not much need for double precision atm. Reply
  • mapesdhs - Monday, August 14, 2017 - link

    If you need FP64, just stuff in some cheap, used GTX 580s. Or hunt for an original Titan or two. Actually, a Quadro 6000 is also a pretty decent buy for FP64. Reply
  • abrowne1993 - Monday, August 14, 2017 - link

    I imagine a lot of people are very happy to have something when the embargo lifts even if it's not complete yet. Reply
  • vanilla_gorilla - Monday, August 14, 2017 - link

    Thank you for not making us wait for the whole thing, that's awesome. Reply
  • Xajel - Monday, August 14, 2017 - link

    Take your time with it...

    On a Side note, I would love to see a revisit on the the current status of GPU's and iGPU's (&CPU's & APU) on the Media Centric usage scenario (like the old HTPC GPU roundup), but with increased modern media usage scenarios like for example streaming, 4K, x265, HEVC, Plex, transcoding, encoding, etc...
    Reply
  • npz - Monday, August 14, 2017 - link

    I don't know if you'll include this in the update, but regarding fp16/int16:

    "So for AMD there’s a real risk of developers not bothering with FP16 support when 70% of all GPUs sold similarly don’t support it. It will be an uphill battle, but one that can significantly improve AMD’s performance if they can win it, and even more so if NVIDIA chooses not to budge on their position."

    AMD does have an advantage actually, as fp16 operations are already used in PS4 Pro:

    http://www.gamepur.com/news/27399-f1-2017-ps4-pro-...
    > The packed fp16 ALU operations in PlayStation 4 Pro helped the development at Codemasters to improve track shaders. "Packed fp16 ALU operations, which are unique to the PS4 Pro in the console space, offer us a very powerful tool for optimizing shaders, allowing us to enhance CU occupancy and reduce overall instruction count" explained Codemasters.

    http://gamingbolt.com/mass-effect-andromeda-ps4-pr...
    > Mass Effect Andromeda PS4 Pro – Dev Explains Benefits of Checkerboard Rendering, 30% Improvement Due To FP16
    Reply
  • Yojimbo - Monday, August 14, 2017 - link

    "AMD does have an advantage actually, as fp16 operations are already used in PS4 Pro:"

    What do you think the market share of the PS4 Pro, in relation to the PS4 and the Xbox One, will turn out to be? If it were on the original PS4 that would be a different story.
    Reply

Log in

Don't have an account? Sign up now