A Quick Refresher, Cont

Having established what’s bad about VLIW as a compute architecture, let’s discuss what makes a good compute architecture. The most fundamental aspect of compute is that developers want stable and predictable performance, something that VLIW didn’t lend itself to because it was dependency limited. Architectures that can’t work around dependencies will see their performance vary due to those dependencies. Consequently, if you want an architecture with stable performance that’s going to be good for compute workloads then you want an architecture that isn’t impacted by dependencies.

Ultimately dependencies and ILP go hand-in-hand. If you can extract ILP from a workload, then your architecture is by definition bursty. An architecture that can’t extract ILP may not be able to achieve the same level of peak performance, but it will not burst and hence it will be more consistent. This is the guiding principle behind NVIDIA’s Fermi architecture; GF100/GF110 have no ability to extract ILP, and developers love it for that reason.

So with those design goals in mind, let’s talk GCN.

VLIW is a traditional and well proven design for parallel processing. But it is not the only traditional and well proven design for parallel processing. For GCN AMD will be replacing VLIW with what’s fundamentally a Single Instruction Multiple Data (SIMD) vector architecture (note: technically VLIW is a subset of SIMD, but for the purposes of this refresher we’re considering them to be different).


A Single GCN SIMD

At the most fundamental level AMD is still using simple ALUs, just like Cayman before it. In GCN these ALUs are organized into SIMD units, the smallest unit of work for GCN. Each SIMD is composed of 16 of these ALUs, along with a 64KB register file for the SIMD to keep data in.
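As a rough illustration of what 64KB buys, the Python sketch below divides the register file across one 64-wide wavefront. It assumes 32-bit registers and that a single wavefront may use the whole file; both are simplifying assumptions for the arithmetic, not a description of how AMD actually allocates registers.

```python
# Rough sizing of one GCN SIMD's register file.
# Assumptions (not from the article): 32-bit registers, and one 64-wide
# wavefront is allowed to use the whole file.

REGISTER_FILE_BYTES = 64 * 1024  # 64KB per SIMD
ALUS_PER_SIMD = 16               # 16 ALUs per SIMD
WAVEFRONT_SIZE = 64              # work-items per wavefront
BYTES_PER_REGISTER = 4           # assumed 32-bit registers


def registers_per_work_item(file_bytes: int = REGISTER_FILE_BYTES,
                            wavefront: int = WAVEFRONT_SIZE,
                            reg_bytes: int = BYTES_PER_REGISTER) -> int:
    """32-bit registers each work-item gets if one wavefront owns the file."""
    return file_bytes // (wavefront * reg_bytes)


def bytes_per_alu(file_bytes: int = REGISTER_FILE_BYTES,
                  alus: int = ALUS_PER_SIMD) -> int:
    """Register storage backing each physical ALU."""
    return file_bytes // alus


if __name__ == "__main__":
    print(registers_per_work_item())  # 256
    print(bytes_per_alu())            # 4096
```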

Above the individual SIMD we have a Compute Unit (CU), the smallest fully independent functional unit. A CU is composed of 4 SIMD units, a hardware scheduler, a branch unit, L1 cache, a local data share, 4 texture units (each with 4 texture fetch load/store units), and a special scalar unit. The scalar unit is responsible for all of the arithmetic operations the simple ALUs can’t do or won’t do efficiently, such as conditional statements (if/then) and transcendental operations.
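The countable parts of a CU listed above can be captured in a small data structure. This is a hypothetical Python model: the class and function names are illustrative, only the counts come from the text, and the 32-CU figure in the usage lines is just an example input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeUnit:
    """Countable resources of one GCN Compute Unit, per the description above.

    The scheduler, branch unit, L1 cache and local data share are not modeled
    here because the text gives no counts or sizes for them.
    """
    simds: int = 4
    alus_per_simd: int = 16
    texture_units: int = 4
    load_store_per_texture_unit: int = 4
    scalar_units: int = 1

    @property
    def vector_alus(self) -> int:
        return self.simds * self.alus_per_simd

    @property
    def texture_load_store_units(self) -> int:
        return self.texture_units * self.load_store_per_texture_unit


def gpu_vector_alus(cu_count: int, cu: ComputeUnit = ComputeUnit()) -> int:
    """Total simple ALUs for a hypothetical GPU built from `cu_count` CUs."""
    return cu_count * cu.vector_alus


if __name__ == "__main__":
    cu = ComputeUnit()
    print(cu.vector_alus)               # 64
    print(cu.texture_load_store_units)  # 16
    print(gpu_vector_alus(32))          # 2048 (32 CUs is an illustrative count)
```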

Because the smallest unit of work is the SIMD and a CU has 4 SIMDs, a CU works on 4 different wavefronts at once. As wavefronts are still 64 operations wide and each SIMD has 16 ALUs, each cycle a SIMD will complete ¼ of the operations on its respective wavefront, and after 4 cycles the current instruction for the active wavefront is completed.
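The arithmetic in the previous paragraph can be traced cycle by cycle with a simplified Python model. It assumes all four SIMDs advance in lockstep and ignores instruction latency and how the scheduler picks wavefronts; the function names are hypothetical.

```python
from typing import List, Tuple

WAVEFRONT_SIZE = 64  # operations per wavefront
LANES_PER_SIMD = 16  # ALUs per SIMD
SIMDS_PER_CU = 4


def cycles_per_instruction(wavefront_size: int = WAVEFRONT_SIZE,
                           lanes: int = LANES_PER_SIMD) -> int:
    """Cycles one SIMD needs to apply a single instruction to a whole wavefront."""
    return -(-wavefront_size // lanes)  # ceiling division


def simd_trace(wavefront_size: int = WAVEFRONT_SIZE,
               lanes: int = LANES_PER_SIMD) -> List[Tuple[int, int, int]]:
    """(cycle, first work-item, last work-item) handled by one SIMD each cycle."""
    return [
        (cycle, cycle * lanes, min((cycle + 1) * lanes, wavefront_size) - 1)
        for cycle in range(cycles_per_instruction(wavefront_size, lanes))
    ]


def cu_operations_per_cycle(simds: int = SIMDS_PER_CU,
                            lanes: int = LANES_PER_SIMD) -> int:
    """Operations a CU completes per cycle with a different wavefront on each SIMD."""
    return simds * lanes


if __name__ == "__main__":
    for cycle, first, last in simd_trace():
        print(f"cycle {cycle}: work-items {first}-{last}")
    print(cycles_per_instruction())   # 4
    print(cu_operations_per_cycle())  # 64
```

Since each of the four SIMDs holds its own wavefront, the CU as a whole still finishes 64 operations per cycle even though any single wavefront takes 4 cycles per instruction.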

Cayman, by comparison, would attempt to execute multiple instructions from the same wavefront in parallel, rather than executing a single instruction from multiple wavefronts. This is where Cayman got bursty: if the instructions were in any way dependent, Cayman would have to let some of its ALUs go idle. GCN, on the other hand, does not face this issue. Because each SIMD handles single instructions from different wavefronts, the SIMDs are in no way attempting to take advantage of ILP, and their performance will be very consistent.


Wavefront Execution Example: SIMD vs. VLIW. Not To Scale - Wavefront Size 16
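To make the Cayman versus GCN contrast concrete, the Python sketch below packs a per-work-item instruction stream into 4-wide VLIW bundles with a simple greedy scheduler and compares slot utilization with a GCN-style SIMD. The scheduler is a generic list scheduler, not AMD's shader compiler, and the SIMD model assumes a dependent instruction can issue as soon as the previous one for that wavefront finishes.

```python
from typing import Dict, List, Set


def vliw_bundles(deps: Dict[int, List[int]], width: int = 4) -> List[List[int]]:
    """Greedily pack instructions into VLIW bundles of at most `width` slots.

    `deps` maps each instruction id to the ids it reads results from. An
    instruction can only go into a bundle once all of its dependencies sit in
    earlier bundles, so dependent instructions never share a bundle.
    """
    done: Set[int] = set()
    remaining = sorted(deps)
    bundles: List[List[int]] = []
    while remaining:
        ready = [i for i in remaining if all(d in done for d in deps[i])][:width]
        if not ready:
            raise ValueError("dependency cycle or unknown instruction id")
        bundles.append(ready)
        done.update(ready)
        remaining = [i for i in remaining if i not in ready]
    return bundles


def vliw_utilization(deps: Dict[int, List[int]], width: int = 4) -> float:
    """Fraction of VLIW slots holding useful work."""
    return len(deps) / (width * len(vliw_bundles(deps, width)))


def simd_cycles(num_instructions: int, wavefront_size: int = 64, lanes: int = 16) -> int:
    """Cycles for one wavefront: each instruction occupies all lanes for
    wavefront_size / lanes cycles, dependent or not (simplified model)."""
    return num_instructions * (wavefront_size // lanes)


def simd_utilization(num_instructions: int, wavefront_size: int = 64, lanes: int = 16) -> float:
    """Fraction of lane-cycles doing useful work; requires num_instructions > 0."""
    cycles = simd_cycles(num_instructions, wavefront_size, lanes)
    return num_instructions * wavefront_size / (cycles * lanes)


if __name__ == "__main__":
    independent = {0: [], 1: [], 2: [], 3: []}
    chain = {0: [], 1: [0], 2: [1], 3: [2]}  # each op needs the previous result
    print(vliw_utilization(independent))  # 1.0
    print(vliw_utilization(chain))        # 0.25
    print(simd_utilization(4))            # 1.0
```

With a fully dependent chain, three of every four VLIW slots sit idle, while the SIMD model keeps every lane busy regardless of the dependency pattern; that is the consistency argument made above.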

There are other aspects of GCN that influence its performance (the scalar unit plays a huge part), but in comparison to Cayman, this is the single biggest difference. By not taking advantage of ILP, but instead taking advantage of Thread Level Parallelism (TLP) in the form of executing more wavefronts at once, GCN will be able to deliver high compute performance and to do so consistently.

Bringing this all together, to make a complete GPU a number of these GCN CUs will be combined with the rest of the parts we’re accustomed to seeing on a GPU. A frontend is responsible for feeding the GPU, as it contains both the command processors (Asynchronous Compute Engines, or ACEs) responsible for feeding the CUs and the geometry engines responsible for geometry setup. Meanwhile, coming after the CUs will be the ROPs that handle the actual render operations, the L2 cache, the memory controllers, and the various fixed function controllers such as the display controllers, PCIe bus controllers, Universal Video Decoder, and Video Codec Engine.

At the end of the day, if AMD has done its homework, GCN should significantly improve compute performance relative to VLIW4 while gaming performance should be just as good. Gaming shader operations will execute across the CUs in a much different manner than they did across VLIW, but they should do so at a similar speed. Games that use compute shaders should also benefit directly from the compute improvements. It’s by building out a GPU in this manner that AMD can make an architecture that’s significantly better at compute without sacrificing gaming performance, and this is why the resulting GCN architecture is balanced for both compute and graphics.

292 Comments

  • chiddy - Thursday, December 22, 2011

    Ryan,

    Thanks for the great review. My only gripe - and I've been noticing this for a while - is the complete non-mention of drivers or driver releases for Linux/Unix and/or their problems.

    For example, Catalyst drivers have exhibited graphical corruption when using the latest version (Version 3) of the Gnome Desktop Environment ever since its release before April. This is a major bug which required most users of AMD/ATI GPUs to either switch desktop environments, switch to Nvidia or Intel GPUs, or use the open source drivers, which lack many features. A partial fix appeared in Catalyst 11.9 making Gnome3 usable, but there is still screen corruption on occasion. (Details in the "non-official" AMD-run bugzilla: http://ati.cchtml.com/show_bug.cgi?id=99 ).

    AMD have numerous other issues with Linux Catalyst drivers, including a buggy OpenGL implementation, etc.

    Essentially, as a hardware review, a quick once-over with non-Microsoft OSs would help a lot, especially for products which are marketed as supporting such platforms.

    Regards,
  • kyuu - Thursday, December 22, 2011

    Why in the heck would they mention Linux drivers and their issues in an article covering the (paper) release and preliminary benchmarking of AMD's new graphics cards? It has nada to do with the subject at hand.

    Besides, hardly anyone cares, and those that do care already know.
  • chiddy - Thursday, December 22, 2011

    And I guess that AMD GPUs are sold as "Windows Only"?

    Thanks for your informative insight.
  • MrSpadge - Thursday, December 22, 2011

    There are no games for *nix and everything always depends on your distribution. The problems are so diverse and numerous that it would take an entire article to briefly touch this field.
    I'm exaggerating, but I really wouldn't be interested in endless *nix troubleshooting. Hell, I can't even get nVidia 2D acceleration in CentOS..
  • chiddy - Thursday, December 22, 2011

    You have a valid point on that front and I agree, nor would I expect such an article any time soon.

    However, on the other hand, one would at the very least expect a GPU using manufacturer released drivers to load a usable desktop. This is an issue that was distro agnostic and instantly noticeable, and only affected AMD hardware, as do most *nix GPU driver issues!

    If all that was done during a new GPU review was to fire it up in any *nix distribution of choice for just a few minutes (even Ubuntu, as I think it's the most popular at the moment) to ensure that the basics work, it would still be a great help.

    I will have to accept though that there is precious little interest!
  • Ryan Smith - Thursday, December 22, 2011

    Hi Chiddy;

    It's a fair request, so I'll give you a fair answer.

    The fact of the matter is that Linux drivers are not a top priority for either NVIDIA or AMD. Neither party makes Linux drivers available for our launch reviews, so I wouldn't be able to test new cards at launch. Not to speak for either company, but how many users are shelling out $550 to run Linux? Cards like the 7970 have a very specifically defined role: Windows gaming video card, and their actions reflect this.

    At best we'd be able to look at these issues at some point after the launch when AMD or NVIDIA have added support for the new product to their respective Linux drivers. But that far after the product's launch and for such a small category of users (there just aren't many desktop Linux users these days), I'm not sure it would be worth the effort on our part.
  • chiddy - Friday, December 23, 2011

    Hi Ryan,

    Thanks very much for taking the time to respond. I fully appreciate your position, particularly as the posts above very much corroborate the lack of interest!

    Thanks again for the response, I very much appreciate the hard work yourself and the rest of the AT team are doing, and its quality speaks for itself in the steady increase in readers over the years.

    If you do, however, ever find the time to do a brief piece on *nix GPU support after the launch of the next generation nVidia and AMD GPUs, that would be wonderful - and even though one would definitely not buy a top-level GPU for *nix, it would very much help those of us who are dual booting (in my case Windows for gaming / Scientific Linux for work), and somewhat remove the guessing game at purchase time. If not, though, I fully understand :-).

    Regards,
    Ali
  • CeriseCogburn - Thursday, March 8, 2012

    Nvidia consistently wins over and over again in this area, so it's "of no interest", like PhysX...
  • AmdInside - Thursday, December 22, 2011

    I won't be getting much sleep tonight since that article took a long time to read (can't imagine how long it must have taken to write up). Great article as usual. While it has some very nice features, all in all, it doesn't make me regret my purchase of a Geforce GTX 580 a couple of months ago. Especially since I mainly picked it up for Battlefield 3.
  • ET - Thursday, December 22, 2011

    The Cayman GPUs got quite a performance boost from drivers over time, gaining on NVIDIA's GPUs since their launch. The difference in architecture between the 79x0 and 69x0 is greater than between the 69x0 and 58x0, so I'm sure there's quite a bit of room for performance improvement in games.

    Have to say though that I really hope AMD stops increasing the card size each generation.
