AMD's Graphics Core Next Preview: AMD's New GPU, Architected For Compute

Name: AMD's Graphics Core Next Preview: AMD's New GPU, Architected For Compute
Item: AMD's Graphics Core Next Preview: AMD's New GPU, Architected For Compute
Author: Ryan Smith

by Ryan Smith on December 21, 2011 9:38 PM EST

Posted in
GPUs
AMD
Radeon
GCN

83 Comments | Add A Comment

83 Comments

And Many Compute Units Make A GPU

While the compute unit is the fundamental unit of computation, it is not a GPU on its own. As with SIMDs in Cayman it’s a configurable building block for making a larger GPU, with a GPU implementing a suitable number of CUs in multiples of 4. Like past GPUs this will be the primary way to scale the GPU to the desired die size, but of course this isn’t the only element of the design that scales.

With a suitable number of CUs in hand, it’s time to attach the rest of units that make up a GPU. As this is a high-level overview on the part of AMD they haven’t gone into great deal on what each unit does and how it does it, but as the first GCN product gets closer to launching the picture will take on a more complete form.

Starting with memory and cache, GCN will once more pair its L2 cache with its memory controllers. The architecture supports 64KB or 128KB of L2 cache per memory controller, and given that AMD’s memory controllers are typically 64bits each, this means a Cayman-like design would likely have 512KB of L2 cache. The L2 cache is write-back, and will be fully coherent so that all CUs will see the same data, saving expensive trips to VRAM for synchronization. CPU/GPU synchronization will also be handled at the L2 cache level, where it will be important to maintain coherency between the two in order to efficiently split up a task between the CPU and GPU. For APUs there is a dedicated high-speed bus between the two, while discrete GPUs will rely on PCIe’s coherency protocols to keep the CPU and dGPU in sync.

Meanwhile on the compute side, AMD’s new Asynchronous Compute Engines serve as the command processors for compute operations on GCN. The principal purpose of ACEs will be to accept work and to dispatch it off to the CUs for processing. As GCN is designed to concurrently work on several tasks, there can be multiple ACEs on a GPU, with the ACEs deciding on resource allocation, context switching, and task priority. AMD has not established an immediate relationship between ACEs and the number of tasks that can be worked on concurrently, so we’re not sure whether there’s a fixed 1:X relationship or whether it’s simply more efficient for the purposes of working on many tasks in parallel to have more ACEs.

One effect of having the ACEs is that GCN has a limited ability to execute tasks out of order. As we mentioned previously GCN is an in-order architecture, and the instruction stream on a wavefront cannot be reodered. However the ACEs can prioritize and reprioritize tasks, allowing tasks to be completed in a different order than they’re received. This allows GCN to free up the resources those tasks were using as early as possible rather than having the task consuming resources for an extended period of time in a nearly-finished state. This is not significantly different from how modern in-order CPUs (Atom, ARM A8, etc) handle multi-tasking.

On the other side of the coin we have the graphics hardware. As with Cayman a graphics command processor sits at the top of the stack and is responsible for farming out work to the various components of the graphics subsystem. Below that Cayman’s dual graphics engines have been replaced with multiple primitive pipelines, which will serve the same general purpose of geometry and fixed-function processing. Primative pipelines will be responsible for tessellation, geometry, and high-order surface processing among other things. Whereas Cayman was limited to 2 such units, GCN will be fully scalable, so AMD will be able to handle incredibly large amounts of geometry if necessary.

After a trip through the CUs, graphics work then hits the pixel pipelines, which are home to the ROPs. As it’s customary to have a number of ROPs, there will be a scalable number of pixel pipelines in GCN; we expect this will be closely coupled with the number of memory controllers to maintain the tight ROP/L2/Memory integration that’s so critical for high ROP performance.

Unfortunately, those of you expecting any additional graphics information will have to sit tight for the time being. As was the case with NVIDIA’s early reveal of Fermi in 2009, AFDS is a development show, and GCN’s early reveal is about the compute capabilities rather than the graphics capabilities. AMD needs to prime developers for GCN now, so that when GCN appears in an APU developers are ready for it. We’ll find out more about the capabilities of the ROPs, the primitive pipelines, the texture mapping units, the display controllers and other dedicated hardware blocks farther down the line.

In the meantime AMD did throw out one graphics tidbit: partially resident textures (PRT). PRTs allow for only part of a texture to actually be loaded in memory, allowing developers to use large textures without taking the performance hit of loading the entire texture into memory if parts of it are going unused. John Carmack already does something very similar in software with his MegaTexture technology, which is used in the id Tech 4 and id Tech 5 engines. This is essentially a hardware implementation of that technology.

Many SIMDs Make One Compute Unit Not Just A New Architecture, But New Features Too

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

83 Comments

View All Comments

Targon - Saturday, June 18, 2011 - link
With Windows 7 having a 80 percent(or higher at this point) install base being 64 bit, it will take until late 2013 before we see the majority of the old 32 bit install base being phased out in the home computer market(as people replace their computers at the four-five year mark). Until then, application developers have to expect that they MUST support both 32 and 64 bit platforms. Lowest common denominator for your user base is what developers generally have to compile for.
DanNeely - Saturday, June 18, 2011 - link
I assume you're using the steam hardware survey since they're showing 4:1. Unfortunately steam's not a good source for broad market stats since it excludes the low end boxes bought by non-gamers and corporate boxes. Surveys that capture these numbers only show a 2:1ish ratio for win7 64:32.

Beyond that, it's the people with the low end 32bit boxes that will keep their old clunkers the longest. You're also underestimating how long support for legacy OSes will continue despite their very small market shares. Firefox 4 still runs on win2k, despite it's market share having been negligible for several years and being officially out of support for almost a year.

Excepting apps that actually can benefit from going 64bit I expect most to stay 32bit for at least the next 5 years.
swaaye - Saturday, June 18, 2011 - link
Indeed. In the non-gamer realm, I know of people happy with 2003 Pentium 4s and Athlon XPs yet. I have no doubt that there are many people with even older hardware. This stuff tends to stick around until the PCs die and the owner is told it's not worth the money to upgrade. Fear of change and the simple lack of a true need to upgrade is the reason.
swaaye - Saturday, June 18, 2011 - link
Oops. I meant that the owner is told it's not worth the money to fix the dead old hardware. But they do also tend to ask about upgrading their ancient box too.
Randomblame - Saturday, June 18, 2011 - link
I was at office max the other day and a guy was screaming at a sales rep because they didn't carry any serial mice that supported his rig. I don't mean ps2 either. He was carrying around a busted up brown serial mouse. He said his rig came with windows 95 but last year he upgraded it to windows 98. Seriously. This is the world we live in.
EJ257 - Saturday, June 18, 2011 - link
I still have my Compaq (that came with Win95 which I upgraded to win98) running on a Pentium 133 with 32MB of EDO RAM and a 2.1GB HDD. Its sitting ilde in my basement collecting dust at the moment. :D
Operandi - Sunday, June 19, 2011 - link
But Steam is good representation of those who could benefit from and will ultimately will be using these future technologies, professionals and enthusiasts. Such is always the way of high-end computing.
softdrinkviking - Monday, June 20, 2011 - link
exactly. people still running XP are probably not the target market for developers because if they are so slow on the uptake of new technology, it would follow that they are also relatively uninterested in other new programs.
Targon - Sunday, June 19, 2011 - link
Nope, I am going on what my customers have and are upgrading to. If you BUY a machine with Windows 7 on it, 9 out of 10 have Windows 7 64 bit on them. Those that have 32 bit are either the very low-end machines with only 1GB of RAM(yes, they still sell those), or they are the result of doing an upgrade from Windows Vista 32 bit.

That is the thing about 64 bit, people don't "go to 64 bit" at this point, they get a new computer that comes with 64 bit Windows on it. The number of people who do an upgrade on an older machine has dropped, since those who would have done the upgrade did that back in 2009 and early 2010 when Windows 7 first came out.

Now, the real benefit to 64 bit isn't as much about the software as it is about how much RAM the machine comes with. If you get a machine with 4GB of RAM, you want 64 bit, just so you don't lose memory due to the 4GB limit on 32 bit Windows, and hardware mapping below the 4GB mark.

A part of this is also about the area you live in, and how much money there is going around. I live in an area where it is the norm to pay over $8 per person for lunch at a deli, and as a result, the value of the dollar isn't as high. Spending $20/day just on lunch and minor expenses is the norm, so with that in mind, replacing a computer every 4-5 years, even for the non-technical is NORMAL. The last time I encountered Windows 95 or 98 was around 6 years ago.
UrQuan3 - Thursday, June 23, 2011 - link
There is a little more benefit. A few of us were doing an internal benchmark of our software using VStudio 2010 and all the random hardware we have around. 32bit, 32bit + SSE2, and 64bit + SSE2. We found across the board, 64bit is about 5-10% than 32bit + SSE2 and 5-20% faster than basic x86.

However, a 64bit OS gave no benefit (or penalty) for a 32bit program. The same 32bit software ran the same speed on XP32, XP64, Vista32, Vista64, and 7-64.

AMD's Graphics Core Next Preview: AMD's New GPU, Architected For Compute

Post Your Comment

83 Comments

View All Comments

Targon - Saturday, June 18, 2011 - link

DanNeely - Saturday, June 18, 2011 - link

swaaye - Saturday, June 18, 2011 - link

swaaye - Saturday, June 18, 2011 - link

Randomblame - Saturday, June 18, 2011 - link

EJ257 - Saturday, June 18, 2011 - link

Operandi - Sunday, June 19, 2011 - link

softdrinkviking - Monday, June 20, 2011 - link

Targon - Sunday, June 19, 2011 - link

UrQuan3 - Thursday, June 23, 2011 - link

Log in

Don't have an account? Sign up now