AMD's Graphics Core Next Preview: AMD's New GPU, Architected For Compute

Author: Ryan Smith

by Ryan Smith on December 21, 2011 9:38 PM EST

And Many Compute Units Make A GPU

While the compute unit is the fundamental unit of computation, it is not a GPU on its own. As with the SIMDs in Cayman, it’s a configurable building block for making a larger GPU, with a GPU implementing a suitable number of CUs in multiples of 4. As with past GPUs, this will be the primary way to scale the GPU to the desired die size, but of course it isn’t the only element of the design that scales.

With a suitable number of CUs in hand, it’s time to attach the rest of the units that make up a GPU. As this is a high-level overview on the part of AMD, they haven’t gone into great detail on what each unit does and how it does it, but as the first GCN product gets closer to launching the picture will take on a more complete form.

Starting with memory and cache, GCN will once more pair its L2 cache with its memory controllers. The architecture supports 64KB or 128KB of L2 cache per memory controller, and given that AMD’s memory controllers are typically 64 bits wide each, a Cayman-like design with a 256-bit bus (four controllers at 128KB apiece) would likely have 512KB of L2 cache. The L2 cache is write-back, and will be fully coherent so that all CUs will see the same data, saving expensive trips to VRAM for synchronization. CPU/GPU synchronization will also be handled at the L2 cache level, where it will be important to maintain coherency between the two in order to efficiently split up a task between the CPU and GPU. For APUs there is a dedicated high-speed bus between the two, while discrete GPUs will rely on PCIe’s coherency protocols to keep the CPU and dGPU in sync.
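The L2 sizing above is simple arithmetic, and a short sketch makes the scaling explicit. The function name and parameters here are ours, not AMD’s, and the 256-bit bus is simply Cayman’s bus width used as a stand-in for a “Cayman-like” part.

```python
def l2_cache_kb(bus_width_bits, controller_width_bits=64, l2_per_controller_kb=128):
    """Estimate total L2 for a GPU whose L2 is paired with its memory controllers.

    Assumes one L2 slice per memory controller, as the architecture overview
    describes; the default of 128KB per controller is the larger of the two
    supported sizes (64KB or 128KB).
    """
    if bus_width_bits % controller_width_bits != 0:
        raise ValueError("bus width must be a whole number of controllers")
    controllers = bus_width_bits // controller_width_bits
    return controllers * l2_per_controller_kb


# Cayman-like: 256-bit bus -> four 64-bit controllers
print(l2_cache_kb(256))                            # 128KB slices
print(l2_cache_kb(256, l2_per_controller_kb=64))   # 64KB slices
```

With 128KB slices this lands on the 512KB figure; the 64KB option would halve it.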

Meanwhile on the compute side, AMD’s new Asynchronous Compute Engines serve as the command processors for compute operations on GCN. The principal purpose of ACEs will be to accept work and to dispatch it off to the CUs for processing. As GCN is designed to concurrently work on several tasks, there can be multiple ACEs on a GPU, with the ACEs deciding on resource allocation, context switching, and task priority. AMD has not described a direct relationship between the number of ACEs and the number of tasks that can be worked on concurrently, so we’re not sure whether there’s a fixed 1:X relationship or whether having more ACEs simply makes it more efficient to work on many tasks in parallel.

One effect of having the ACEs is that GCN has a limited ability to execute tasks out of order. As we mentioned previously GCN is an in-order architecture, and the instruction stream on a wavefront cannot be reordered. However the ACEs can prioritize and reprioritize tasks, allowing tasks to be completed in a different order than they’re received. This allows GCN to free up the resources those tasks were using as early as possible rather than having a task consume resources for an extended period of time in a nearly-finished state. This is not significantly different from how modern in-order CPUs (Atom, ARM Cortex-A8, etc.) handle multi-tasking.
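To make the distinction concrete, here is a toy model of task-level reordering on top of strictly in-order execution. It is only a sketch: AMD has not described how the ACEs actually schedule work, so the class, its methods, and the priority scheme (lower number runs first) are all hypothetical, and each task simply runs to completion with no preemption.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Task:
    priority: int                      # lower value = dispatched sooner (our convention)
    seq: int                           # arrival order, used only to break ties
    name: str = field(compare=False)
    instructions: list = field(compare=False, default_factory=list)


class ToyDispatcher:
    """Hypothetical stand-in for an ACE: reorders tasks, never instructions."""

    def __init__(self):
        self._queue = []
        self._seq = count()

    def submit(self, name, instructions, priority):
        heapq.heappush(self._queue, Task(priority, next(self._seq), name, list(instructions)))

    def reprioritize(self, name, new_priority):
        for task in self._queue:
            if task.name == name:
                task.priority = new_priority
        heapq.heapify(self._queue)     # restore heap order after the change

    def run(self):
        log = []
        while self._queue:
            task = heapq.heappop(self._queue)
            for instr in task.instructions:   # instruction stream stays in order
                log.append((task.name, instr))
        return log


d = ToyDispatcher()
d.submit("physics", ["load", "mul", "store"], priority=2)
d.submit("post_fx", ["load", "add", "store"], priority=2)
d.submit("audio", ["load", "store"], priority=1)
d.reprioritize("post_fx", 0)           # bumped after submission

log = d.run()
completion_order = list(dict.fromkeys(name for name, _ in log))
print(completion_order)
```

The tasks arrive as physics, post_fx, audio but finish in priority order, while the instructions inside each task still execute exactly as submitted.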

On the other side of the coin we have the graphics hardware. As with Cayman, a graphics command processor sits at the top of the stack and is responsible for farming out work to the various components of the graphics subsystem. Below that, Cayman’s dual graphics engines have been replaced with multiple primitive pipelines, which will serve the same general purpose of geometry and fixed-function processing. Primitive pipelines will be responsible for tessellation, geometry, and high-order surface processing among other things. Whereas Cayman was limited to 2 such units, GCN will be fully scalable, so AMD will be able to handle incredibly large amounts of geometry if necessary.

After a trip through the CUs, graphics work then hits the pixel pipelines, which are home to the ROPs. As it’s customary to have a number of ROPs, there will be a scalable number of pixel pipelines in GCN; we expect this will be closely coupled with the number of memory controllers to maintain the tight ROP/L2/Memory integration that’s so critical for high ROP performance.

Unfortunately, those of you expecting any additional graphics information will have to sit tight for the time being. As was the case with NVIDIA’s early reveal of Fermi in 2009, AFDS is a developer show, and GCN’s early reveal is about the compute capabilities rather than the graphics capabilities. AMD needs to prime developers for GCN now, so that when GCN appears in an APU, developers are ready for it. We’ll find out more about the capabilities of the ROPs, the primitive pipelines, the texture mapping units, the display controllers and other dedicated hardware blocks further down the line.

In the meantime AMD did throw out one graphics tidbit: partially resident textures (PRT). PRTs allow for only part of a texture to actually be loaded in memory, allowing developers to use large textures without taking the performance hit of loading the entire texture into memory if parts of it are going unused. John Carmack already does something very similar in software with his MegaTexture technology, which is used in the id Tech 4 and id Tech 5 engines. This is essentially a hardware implementation of that technology.
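The idea behind partial residency can be sketched in a few lines of software. This is an illustration of the concept only, not AMD’s hardware design or id’s MegaTexture code: the class name, the 64-texel tile size, and the fallback behavior on a miss are all assumptions made for the example.

```python
import math

TILE = 64  # hypothetical tile edge in texels; AMD has not given a tile size


class PartiallyResidentTexture:
    """Toy texture that stores only the tiles that have been loaded."""

    def __init__(self, width, height, tile=TILE):
        self.width, self.height, self.tile = width, height, tile
        self.tiles = {}                # (tile_x, tile_y) -> rows of texels

    def load_tile(self, tx, ty, texels):
        self.tiles[(tx, ty)] = texels

    def evict_tile(self, tx, ty):
        self.tiles.pop((tx, ty), None)

    def sample(self, x, y, fallback=None):
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("texel coordinate out of range")
        tile = self.tiles.get((x // self.tile, y // self.tile))
        if tile is None:
            return fallback            # tile not resident: caller decides what to do
        return tile[y % self.tile][x % self.tile]

    def resident_fraction(self):
        total = math.ceil(self.width / self.tile) * math.ceil(self.height / self.tile)
        return len(self.tiles) / total


tex = PartiallyResidentTexture(16384, 16384)
red = (255, 0, 0)
tex.load_tile(3, 5, [[red] * TILE for _ in range(TILE)])

print(tex.sample(3 * TILE + 10, 5 * TILE + 20))    # inside the resident tile
print(tex.sample(0, 0, fallback="miss"))           # tile (0, 0) was never loaded
print(tex.resident_fraction())
```

Only one of the 65,536 tiles in this 16384x16384 texture occupies memory, yet lookups inside that tile behave as if the whole texture were present. A real implementation would also need a policy for what to draw on a miss, such as falling back to a lower-resolution mip level.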

83 Comments

ClagMaster - Tuesday, June 21, 2011 - link
What is being described is tantamount to the vector processing that was featured on Cray supercomputers available from the '70s through the '90s. On the machines I once programmed (using the CFT77 compiler), a vector was 64 64-bit words that were processed through a pipe.
789427 - Thursday, June 23, 2011 - link
Is it just me, or will we be seeing AMD refresh cycles quadruple for their processors because of on-die graphics?

I sense a prefix/suffix CPU/GPU diversification happening soon - and a bit of confusion with maybe some sideport memory enabled chips coming our way.

2/4/8 cores with
6550, 6750, 6850 level graphics and
512Mb/1Gb sideport
all for $100-$200 and crossfire capable?
Drool now?
cb
Kakkoii - Sunday, August 21, 2011 - link
This pleases me, because this will likely mean that AMD no longer has such a performance per dollar and watt difference from Nvidia. Thus further degrading most arguments AMD fanboys have against Nvidia. I see this being a benefit for Nvidia in the long term. After AMD claiming what Nvidia was doing wasn't right, they basically give up and are doing it themselves now too.
Cyber.Angel - Saturday, October 15, 2011 - link
exactly what I was thinking
AMD/ATI is catching up - in the HPC sector
otherwise they are still a better buy in the consumer market
and in 2012 also in HPC
Nvidia uses too much power

too bad if even Trinity is not using this new GPU design...
Wreckage - Wednesday, December 21, 2011 - link
I'm guessing we won't see product until sometime next year.
tzhu07 - Wednesday, December 21, 2011 - link
Looking forward to buying a 7970 (or possibly a 7950) to go along with my Sandy Bridge build. I'm currently running on Intel HD3000 and it's killing me. But just a few more days now. Hopefully I can hit the refresh button on my browser fast enough to catch one before they sell out.
OwnedKThxBye - Thursday, December 22, 2011 - link
Typo on the last page. At no point has AMD specified when a GPU will appear using GCN will appear, so it’s very much a guessing game.
R3MF - Thursday, December 22, 2011 - link
"We expect AMD to take a page from NVIDIA here and configure lower-end consumer parts to use the slower rates since FP64 is not currently important for consumer uses."

Will AMD be likewise crippling the FP64 support native to the chip, in products that have the resident features, if they are sold in a consumer SKU rather than a more expensive professional SKU?

I refer to nvidia's practice of crippling access to FP64 functionality in Geforce 580 cards that is otherwise available in Tesla 580 products.
zarck - Thursday, December 22, 2011 - link
Regarding GPGPU grids, is a test with the Radeon 7970 and Folding@Home possible?

https://fah-web.stanford.edu/projects/FAHClient/wi...
morricone - Thursday, December 22, 2011 - link
I'm a developer myself and you have to look really hard to find an article as good as this. Keep this stuff up!
