The Vega Architecture: AMD’s Brightest Day

AMD’s engineers consider Vega to be their most sweeping architectural change in five years. And looking over everything that has been added, it’s easy to see why. In terms of core graphics/compute features, Vega introduces more than any other iteration of GCN before it.

Speaking of GCN, before getting too deep here, it’s interesting to note that at least publicly, AMD is shying away from the Graphics Core Next name. GCN doesn’t appear anywhere in AMD’s whitepaper, while in programmers’ documents such as the shader ISA, the name is still present. But at least for the purposes of public discussion, rather than using the term GCN 5, AMD is consistently calling it the Vega architecture. Though make no mistake, this is still very much GCN, so AMD’s basic GPU execution model remains.

So what does Vega bring to the table? Back in January we got what has turned out to be a fairly extensive high-level overview of Vega’s main architectural improvements. In a nutshell, Vega is:

  • Higher clocks
  • Double rate FP16 math (Rapid Packed Math)
  • HBM2
  • New memory page management for the high-bandwidth cache controller
  • Tiled rasterization (Draw Stream Binning Rasterizer)
  • Increased ROP efficiency via L2 cache
  • Improved geometry engine
  • Primitive shading for even faster triangle culling
  • Direct3D feature level 12_1 graphics features
  • Improved display controllers

The interesting thing is that even with this significant number of changes, the Vega ISA is not a complete departure from the GCN4 ISA. AMD has added a number of new instructions – mostly for FP16 operations – along with some additional instructions that they expect to improve performance for video processing and some 8-bit integer operations, but nothing that radically separates Vega from earlier ISAs. So on the compute side, Vega is still very comparable to Polaris and Fiji in how data moves through the GPU.

Consequently, the burning question I think many will ask is whether the effective compute IPC is significantly higher than Fiji’s, and the answer is no. AMD has actually taken significant pains to keep the throughput latency of a CU at 4 cycles (4 stages deep). However, strictly speaking, existing code isn’t going to run any faster clock for clock on Vega than on earlier architectures. In order to wring the most out of Vega’s new CUs, you need to take advantage of the new compute features. Note that this doesn’t mean that compilers can’t take advantage of them on their own, but especially where datatypes are concerned, it’s important that code be designed for lower precision datatypes to begin with.
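
To make the datatype point a little more concrete, here is a minimal sketch in Python with NumPy. It is purely an illustration on the CPU and says nothing about Vega’s actual instruction encoding; the function names (pack_half2, unpack_half2, packed_add) are hypothetical and chosen only for this example. The idea it models is that two FP16 values fit in the space of one 32-bit register, so a single packed operation can work on both at once, at the cost of reduced precision.

```python
import numpy as np


def pack_half2(a, b):
    """Store two FP16 values side by side in one 32-bit word."""
    halves = np.array([a, b], dtype=np.float16)
    return int(halves.view(np.uint32)[0])


def unpack_half2(word):
    """Recover the two FP16 lanes from a 32-bit word."""
    return np.array([word], dtype=np.uint32).view(np.float16)


def packed_add(x, y):
    """Add both FP16 lanes of two packed words, modelling one packed operation."""
    lanes = unpack_half2(x) + unpack_half2(y)
    return int(lanes.view(np.uint32)[0])


# Two 32-bit words carry four FP16 operands between them.
x = pack_half2(1.5, 2.25)
y = pack_half2(0.5, 0.75)
print(unpack_half2(packed_add(x, y)))

# FP16 has only 11 significant bits, so some FP32 values do not survive conversion.
print(float(np.float16(2049.0)))
```

The last line is the practical catch: above 2048, not every integer is representable in FP16, which is why an algorithm generally has to be written with the reduced range and precision in mind rather than converted after the fact.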


  • Ryan Smith - Tuesday, August 15, 2017

    3 CUs per array is a maximum, not a fixed amount. Each Hawaii shader engine had a 4/4/3 configuration, for example.

    So in the case of Vega 10, it should be a 3/3/3/3/2/2 configuration.
  • watzupken - Tuesday, August 15, 2017

    I think the performance is in line with recent rumors and my expectations. The fact that AMD beat around the bush in releasing Vega was a telltale sign. Unlike Ryzen, where they marketed how well it runs in the likes of Cinebench and beat the gong and such, AMD revealed no benchmarks for Vega throughout the year, just as they did when they first released Polaris.
    The hardware no doubt is forward looking, but where it needs to matter most, I feel AMD may have fallen short. It seems like the way around is probably to design a new GPU from scratch.
  • Yojimbo - Wednesday, August 16, 2017

    "It seems like the way around is probably to design a new GPU from scratch."

    Well, perhaps, but I do think with more money they could be doing better with what they've got. They made the decision to focus on reviving their CPU business with their resources, however.

    They probably have been laying the groundwork for an entirely new architecture for some time, though. My belief is that APUs were of primary concern when originally designing GCN. They were hoping to enable heterogeneous computing, but it didn't work out. If that strategy did tie them down somewhat, their next gen architecture should free them from those tethers.
  • Glock24 - Tuesday, August 15, 2017

    Nice review, I'll say the outcome was expected given the Vega FE reviews.

    Other reviews state that the Vega 64 has a switch that sets the power limits, and you have "power saving", "normal" and "turbo" modes. From what I've read the difference between the lowest and highest power limit is as high as 100W for about 8% more performance.

    It seems AMD did not reach the expected performance levels so they just boosted the clocks and voltage. Vega is like Skylake-X in that sense :P

    As others have mentioned, it would be great to have a comparison of Vega using Ryzen CPUs vs. Intel's CPUs.
  • Vertexgaming - Wednesday, August 16, 2017

    It sucks so much that price drops on GPUs aren't a thing anymore because of miners. I have been upgrading my GPU every year and getting an awesome deal on the newest generation GPU, but now the situation has changed so much, that I will have to skip a generation to justify a $600-$800 (higher than MSRP) price tag for a new graphics card. :-(
  • prateekprakash - Wednesday, August 16, 2017

    In my opinion, it would have been great if Vega 64 had a 16GB VRAM version at $100 more... That would be $599 apiece for the air cooled version... That would future proof it to run future 4K games (CF would benefit too)...

    It's too bad we still don't have 16gb consumer gaming cards, the Vega pro being not strictly for gamers...
  • Dosi - Wednesday, August 16, 2017

    So the system consumes 91W more with Vega 64; I can't imagine with the LC V64... could it be 140W more? Actually, what you saved on the GPU (V64 instead of a 1080) you have already spent on the electricity bill...
  • versesuvius - Wednesday, August 16, 2017

    NVIDIA obviously knows how to break GPU tasks down into chunks, process those chunks, and send them out the door better than AMD does. And more ROPs can certainly help AMD cards a lot.
  • peevee - Thursday, August 17, 2017

    "as electrons can only move so far on a single (ever shortening) clock cycle"

    Seriously? Electrons? You think that how far electrons move matters? Sheesh.
  • FourEyedGeek - Tuesday, August 22, 2017

    You being serious or sarcastic? If serious then you are ignorant.
