The Vega Architecture: AMD’s Brightest Day

AMD’s engineers consider Vega to be their most sweeping architectural change in five years. And looking over everything that has been added, it’s easy to see why. In terms of core graphics/compute features, Vega introduces more than any other iteration of GCN before it.

Speaking of GCN, before getting too deep here, it’s interesting to note that AMD is publicly shying away from the Graphics Core Next name. GCN doesn’t appear anywhere in AMD’s whitepaper, though the name is still present in programmers’ documents such as the shader ISA. For the purposes of public discussion, rather than using the term GCN 5, AMD is consistently calling it the Vega architecture. Make no mistake, though: this is still very much GCN, so AMD’s basic GPU execution model remains.

So what does Vega bring to the table? Back in January we got what has turned out to be a fairly extensive high-level overview of Vega’s main architectural improvements. In a nutshell, Vega brings:

  • Higher clocks
  • Double rate FP16 math (Rapid Packed Math)
  • HBM2
  • New memory page management for the high-bandwidth cache controller
  • Tiled rasterization (Draw Stream Binning Rasterizer)
  • Increased ROP efficiency via L2 cache
  • Improved geometry engine
  • Primitive shading for even faster triangle culling
  • Direct3D feature level 12_1 graphics features
  • Improved display controllers
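To make the Rapid Packed Math entry in the list above a little more concrete, here is a minimal Python sketch of the general idea behind packed FP16: two half-precision values share one 32-bit register, and a single operation works on both halves at once, which is where the "double rate" comes from. The function names (`pack_half2`, `unpack_half2`, `packed_add`) and the choice of which value sits in the low 16 bits are illustrative assumptions, not AMD's instructions or register layout; the code only emulates the concept on a CPU.

```python
import struct
from typing import Tuple


def pack_half2(lo: float, hi: float) -> int:
    """Pack two FP16 values into one 32-bit word.

    Assumed layout (for illustration only): `lo` in bits 0-15, `hi` in bits 16-31.
    struct's "e" format is IEEE 754 binary16; out-of-range inputs raise OverflowError.
    """
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_half2(word: int) -> Tuple[float, float]:
    """Split a 32-bit word back into its two FP16 values (lo, hi)."""
    return struct.unpack("<2e", struct.pack("<I", word))


def packed_add(a: int, b: int) -> int:
    """Emulate one packed add: both 16-bit halves are added in a single call.

    On real hardware this would be one instruction per 32-bit lane; here each
    half is added in Python and rounded back to FP16 when it is repacked.
    """
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


if __name__ == "__main__":
    x = pack_half2(1.0, 2.0)    # 1.0 is 0x3C00 and 2.0 is 0x4000 in FP16
    y = pack_half2(0.5, 0.25)
    print(hex(x))                          # 0x40003c00
    print(unpack_half2(packed_add(x, y)))  # (1.5, 2.25)
```

The point of the sketch is the data layout: a register that previously held one FP32 value now holds two FP16 values, so the same number of lanes can retire twice as many FP16 operations per clock, provided the code actually uses FP16 data.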

The interesting thing is that even with this significant number of changes, the Vega ISA is not a complete departure from the GCN4 ISA. AMD has added a number of new instructions, mostly for FP16 operations, along with some additional instructions that they expect to improve performance for video processing and some 8-bit integer operations, but nothing that radically sets Vega apart from earlier ISAs. So as far as compute is concerned, Vega is still very comparable to Polaris and Fiji in how data moves through the GPU.

Consequently, the burning question I think many will ask is whether the effective compute IPC is significantly higher than Fiji’s, and the answer is no. AMD has actually taken significant pains to keep the throughput latency of a CU at 4 cycles (4 stages deep). Strictly speaking, however, existing code isn’t going to run any faster clock-for-clock on Vega than on earlier architectures. In order to wring the most out of Vega’s new CUs, you need to take advantage of the new compute features. Note that this doesn’t mean that compilers can’t take advantage of them on their own, but especially where datatypes are concerned, it’s important that code be designed for lower precision datatypes to begin with.
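As a rough illustration of why code needs to be designed for lower precision from the start, rather than simply recompiled, the following Python sketch emulates FP16 rounding on a CPU and shows a naive running sum stalling once it outgrows FP16's 11 bits of significand. The helper names (`to_half`, `sum_fp16`) are hypothetical and this is not how any particular compiler or GPU handles the problem; it only demonstrates the numerical hazard.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_fp16(values) -> float:
    """Accumulate in FP16: every intermediate result is rounded to half precision."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return acc


if __name__ == "__main__":
    ones = [1.0] * 4096

    # FP16 cannot represent 2049, so the running sum gets stuck at 2048.
    print(to_half(2049.0))  # 2048.0
    print(sum_fp16(ones))   # 2048.0

    # Keeping the accumulator in higher precision avoids the stall.
    print(sum(ones))        # 4096.0
```

An algorithm that keeps its values in a small range, or that stores data in FP16 but accumulates in FP32, sidesteps this. That kind of decision has to be made by whoever writes the code, which is why dropping in FP16 after the fact is rarely free.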


215 Comments


  • Kratos86 - Monday, August 14, 2017

    That is at 200 Watts, not $35. Anandtech, reporting on the world of tomorrow, without an edit button.
  • mapesdhs - Monday, August 14, 2017

    If/when AT finally does revamp the forums to enable editing, it's going to be a bigger forum headline splash than Threadripper. :D I'd post with typos just so I could delight at being able to edit it ten seconds later. 8)
  • AndrewJacksonZA - Tuesday, August 15, 2017

    @Kratos86: You made me chuckle. :-)
  • Lolimaster - Monday, August 14, 2017

    Proof of stake is almost here, Ethereum is basically done unless you get the gpu's for free, else you don't have much more than 4 months for ROI.
  • Notmyusualid - Monday, August 14, 2017

    Thats right. Ethereum was the only crypto-coin out there.

    Fool.
  • Ryan Smith - Monday, August 14, 2017

    As a heads up, this article is very much not done, as Nate and I had to rush to cover everything in 3 days. The performance data is up there, along with bits and pieces on the architecture.

    I have probably another 5000 words on the architecture left to draft and revise, and I hope to get that added in the next couple of days.

    In the meantime I apologize for the state of things, and we're continuing to work on the article to wrap things up.
  • FireSnake - Monday, August 14, 2017

    Take your time ... we will wait :)
  • rtho782 - Monday, August 14, 2017

    Ha, understandable, and I'd much rather have this than nothing :) Your unfinished reviews are generally more indepth than most places complete reviews.

    Hows the GTX960 review coming tho? :P
  • ddriver - Monday, August 14, 2017

    No longer doing folding at double precision?
  • Ryan Smith - Monday, August 14, 2017

    Since no one has shipped a consumer GPU with FP64 performance better than 1/16 in a few years now, there's not much of a need for a FP64 benchmark.
