The Vega Architecture: AMD’s Brightest Day

From an architectural standpoint, AMD’s engineers consider Vega their most sweeping change in five years, and looking over everything that has been added, it’s easy to see why. In terms of core graphics and compute features, Vega introduces more than any previous iteration of GCN.

Speaking of GCN, before getting too deep here, it’s interesting to note that, at least publicly, AMD is shying away from the Graphics Core Next name. GCN doesn’t appear anywhere in AMD’s whitepaper, though the name is still present in programmer-facing documents such as the shader ISA. For the purposes of public discussion, then, AMD is consistently calling it the Vega architecture rather than GCN 5. Make no mistake, though: this is still very much GCN, and AMD’s basic GPU execution model remains.

So what does Vega bring to the table? Back in January we got what has turned out to be a fairly extensive high-level overview of Vega’s main architectural improvements. In a nutshell, Vega is:

  • Higher clocks
  • Double rate FP16 math (Rapid Packed Math)
  • HBM2
  • New memory page management for the high-bandwidth cache controller
  • Tiled rasterization (Draw Stream Binning Rasterizer)
  • Increased ROP efficiency via L2 cache
  • Improved geometry engine
  • Primitive shading for even faster triangle culling
  • Direct3D feature level 12_1 graphics features
  • Improved display controllers

The interesting thing is that even with this significant number of changes, the Vega ISA is not a complete departure from the GCN4 ISA. AMD has added a number of new instructions, mostly for FP16 operations, along with some additional instructions that they expect to improve performance for video processing and some 8-bit integer operations, but nothing that radically breaks from earlier ISAs. So in terms of compute, Vega still moves data through the GPU in much the same way as Polaris and Fiji.
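To make the packed-FP16 idea concrete, here is a minimal Python sketch, assuming the common packed layout in which two IEEE 754 half-precision values share one 32-bit register (low lane in bits 0 to 15). The helpers `pack_half2`, `unpack_half2`, and `packed_add` are hypothetical names for illustration, not AMD instructions or any vendor API; the sketch only shows why one operation over a packed register can perform two FP16 additions, which is where the doubled FP16 rate comes from.

```python
import struct

# Hypothetical helpers illustrating packed FP16 ("two halves in one 32-bit word").
# They emulate the data layout in software; they are not AMD instructions.

def pack_half2(lo: float, hi: float) -> int:
    """Round two floats to IEEE 754 half precision and pack them into one 32-bit word."""
    lo_bits, hi_bits = struct.unpack("<HH", struct.pack("<ee", lo, hi))
    return (hi_bits << 16) | lo_bits


def unpack_half2(word: int) -> tuple[float, float]:
    """Split a 32-bit word back into its low and high half-precision lanes."""
    lo, hi = struct.unpack("<ee", struct.pack("<I", word))
    return lo, hi


def packed_add(a: int, b: int) -> int:
    """Emulate one packed FP16 add: both 16-bit lanes are added by a single call.

    Each lane's sum is rounded back to half precision when repacked. Sums that
    overflow the FP16 range raise OverflowError here, whereas default IEEE 754
    arithmetic would produce infinity.
    """
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


x = pack_half2(1.5, -2.0)
y = pack_half2(0.25, 3.0)
print(unpack_half2(packed_add(x, y)))  # (1.75, 1.0)
```

The same rounding-to-half step is why code has to be written with FP16's limited range and precision in mind: a value like 0.1 does not survive the round trip through `pack_half2` exactly.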

Consequently, the burning question I think many will ask is whether the effective compute IPC is significantly higher than Fiji’s, and the answer is no. AMD has actually taken significant pains to keep the throughput latency of a CU at 4 cycles (4 stages deep), but strictly speaking, existing code isn’t going to run any faster on Vega than on earlier architectures, clock for clock. To wring the most out of Vega’s new CUs, you need to take advantage of the new compute features. This doesn’t mean that compilers can’t take advantage of them on their own, but especially where datatypes are concerned, it’s important that code be designed for lower-precision datatypes from the start.
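Because per-clock throughput per CU is unchanged, Vega’s compute gains have to come from higher clocks and from packed FP16. The back-of-the-envelope arithmetic can be sketched in Python as below, assuming the standard GCN layout of four SIMD16 units per CU executing 64-wide wavefronts (which is generally where GCN’s 4-cycle cadence comes from) and one fused multiply-add (2 FLOPs) per lane per clock. The `peak_tflops` helper is a hypothetical name, and the 64-CU count and clock speeds in the example are illustrative assumptions rather than figures from this article.

```python
# Sketch of GCN-style peak-throughput arithmetic. The CU layout below (four
# SIMD16 units running 64-wide wavefronts, one FMA per lane per clock) is the
# standard GCN model; the chip size and clocks used with it are assumptions.

WAVEFRONT_SIZE = 64   # work-items per wavefront
SIMD_WIDTH = 16       # lanes per SIMD unit
SIMDS_PER_CU = 4

# A 64-wide wavefront passes through a 16-wide SIMD in 4 cycles: the 4-cycle cadence.
CYCLES_PER_VECTOR_OP = WAVEFRONT_SIZE // SIMD_WIDTH


def peak_tflops(compute_units: int, clock_ghz: float, packed_fp16: bool = False) -> float:
    """Theoretical peak throughput in TFLOPS for a GCN-style GPU."""
    lanes_per_cu = SIMDS_PER_CU * SIMD_WIDTH  # 64 ALU lanes per CU
    flops_per_lane = 2                        # one fused multiply-add per clock
    if packed_fp16:
        flops_per_lane *= 2                   # two FP16 values per 32-bit lane
    # CUs x lanes x FLOPs x GHz gives GFLOPS; divide by 1000 for TFLOPS
    return compute_units * lanes_per_cu * flops_per_lane * clock_ghz / 1000.0


# Same hypothetical 64-CU chip; only the clock (and datatype) changes.
print(peak_tflops(64, 1.05))                    # about 8.6
print(peak_tflops(64, 1.5))                     # about 12.3
print(peak_tflops(64, 1.5, packed_fp16=True))   # about 24.6
```

Between the first two calls nothing changes except the clock, which mirrors the point above: existing FP32 code gets faster only as clocks rise. The doubled figure in the third call is available only to code written for FP16 in the first place.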

215 Comments

  • Scabies - Monday, August 14, 2017

    SR-IOV?
  • bcronce - Monday, August 14, 2017

    Exactly. I REALLY want to run my games in a VM guest.
  • sutamatamasu - Monday, August 14, 2017

    The RTG architecture slide says Vega has several MB of SRAM. Can you tell me what this SRAM is used for?
  • DanNeely - Monday, August 14, 2017

    Various caches and internal buffers; on-die memory is normally SRAM because it's several times faster than DRAM. (DRAM is several times denser since it only uses 1 transistor and 1 capacitor per bit vs. 6 transistors for a typical SRAM cell, which is why it's used for main memory, where total capacity is more important and where the data bus is the main latency source anyway.) I'd be curious what the breakdown is, since only 4MB of it is in the L2 cache.
  • sutamatamasu - Monday, August 14, 2017

    Same here. As far as we know, GCN 5 doesn't change the L2 cache size, but I'm curious why AMD lists the SRAM and L2 cache sizes separately.
  • extide - Monday, August 14, 2017

    A lot of it is going to be in the low-level L1 caches and other storage local to the shaders; there are a lot of shaders, so it adds up fast. GCN 5 does have double the L2 cache, at least according to this article: 4MB vs 2MB. AMD says there is a total of over 45MB of SRAM on there, which is pretty impressive for a GPU!
  • ratbuddy - Monday, August 14, 2017

    I'm disappointed that Vega Frontier results were not included in the benches :-/
  • Ryan Smith - Monday, August 14, 2017

    AMD did not sample that card, and there's not much of a reason for us to include it now when the RX Vega is faster.
  • Nfarce - Monday, August 14, 2017

    Another Fury X fail. You'd have to be a hardcore AMD fan to buy this over a GTX 1080, and that's not even taking into consideration the horrid power use compared to the 1080. Isn't that what AMD fans tell us is so important when comparing Ryzen to i7 CPUs in performance per watt? Amazingly, they are silent here.
  • IchiOni - Monday, August 14, 2017

    I do not care about power consumption. Only poor people care about power consumption. I will be purchasing an air cooled Vega 64.
