We’ve just returned from sunny Bellevue, Washington, where AMD held their first Fusion Developer Summit (AFDS). As with other technical conferences of this nature, such as NVIDIA’s GTC and Intel’s IDF, AFDS is a chance for AMD to reach out to developers to prepare them for future products and to receive feedback in turn. While AMD can make powerful hardware, it’s ultimately the software that runs on it that drives sales, so it’s important for them to reach out to developers to ensure that such software is being made.

AFDS 2011 served as a focal point for several different things going on at AMD. At its broadest, it was a launch event for Llano, AMD’s first mainstream Fusion APU, which launched at the start of the week. AMD has invested the future of the company in APUs, and not just for graphical purposes but for compute purposes too. So Llano is a big deal for the company, even though it’s only a taste of what’s to come.

The second purpose of course was to provide sessions for developers to learn more about how to utilize AMD’s GPUs for compute and graphics tasks. Microsoft, Acceleware, Adobe, academic researchers, and others were on hand to provide talks on how they’re using GPUs in current and future projects.

The final purpose – and what is going to be most interesting to most outside observers – was to prepare developers for what’s coming down the pipe. AMD has big plans for the future and it’s important to get developers involved as soon as is reasonably possible so that they’re ready to use AMD’s future technologies when they launch. Over the next few days we’ll talk about a couple of different things AMD is working on, and today we’ll start with the first and most exciting project: AMD Graphics Core Next.

Graphics Core Next (GCN) is the architectural basis for AMD’s future GPUs, both for discrete products and for GPUs integrated with CPUs as part of AMD’s APU products. AMD will be instituting a major overhaul of its traditional GPU architecture for future generations of products in order to meet the direction of the market and where the company wants to take its GPUs in the future.

While graphics performance and features have been and will continue to be important aspects of a GPU’s design, AMD and the rest of the market have been moving towards further exploiting the compute capabilities of GPUs, which in the right circumstances can be utilized as massively parallel processors able to complete a number of tasks in a fraction of the time of a highly generalized CPU. Since the introduction of shader-capable GPUs in 2002, GPUs have slowly evolved to become more generalized so that their resources can be used for more than just graphics. AMD’s most recent shift was to their VLIW4 architecture with Cayman late last year; now they’re looking to make their biggest leap yet with GCN.
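To put the compute angle in concrete terms, consider the kind of embarrassingly parallel workload GPU computing targets. The sketch below is purely illustrative and not taken from AMD’s materials (the function name and the CPU/OpenCL pairing are our own assumptions): every iteration of the loop is independent, so the same operation can be spread across thousands of GPU threads instead of running serially on a general-purpose core.

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch only: y = a*x + y (SAXPY), a classic data-parallel kernel.
// On a CPU this runs as a serial (or modestly vectorized) loop; on a GPU, each
// iteration can become an independent work-item, so a chip with hundreds or
// thousands of stream processors can chew through the whole array in a few passes.
void saxpy_cpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] = a * x[i] + y[i];   // no iteration depends on any other
    }
}

// The same operation expressed as an OpenCL C kernel (OpenCL being the compute
// API AMD's GPU stack was built around at the time) looks roughly like this:
//
//   __kernel void saxpy(float a, __global const float* x, __global float* y) {
//       size_t i = get_global_id(0);   // one work-item per array element
//       y[i] = a * x[i] + y[i];
//   }
```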

GCN at its core is the basis of a GPU that performs well at both graphical and computing tasks. AMD has stretched their traditional VLIW architecture as far as they reasonably can for computing purposes, and as more developers get on board with GPU computing, a clean break is needed in order to build a better-performing GPU to meet their needs. This is in essence AMD’s Fermi: a new architecture and a radical overhaul to make a GPU that is as monstrous at computing as it is at graphics. And this is the story of the architecture that AMD will be building to make it happen.

Finally, it should be noted that the theme of AFDS 2011 was heterogeneous computing, as it has become AMD’s focus to get developers writing heterogeneous applications that effectively utilize both AMD’s CPUs and AMD’s GPUs. Ostensibly AFDS is a conference about GPU computing, but AMD’s true strength is not their CPU side or their GPU side; it’s the combination of the two. Bulldozer will be the first half of AMD’s future APUs, while GCN will be the other half.

Prelude: The History of VLIW & Graphics

83 Comments

  • StormyParis - Friday, June 17, 2011 - link

    Thank you for a very enlightening write up. Comments and questions:

    1- please add a comma in there somewhere. I had to read the sentence 4 times to understand it (page 1): "VLIW designs will never achieve perfect efficiency in this regard, but the farther off real world utilization is the weaker the benefits of VLIW."

    2- When, if ever, will we vile users see any benefits? I get the feeling that most apps are still not optimized well, if at all, for multicore/threading. Come to think of it, most don't even use most of the x86 extensions more recent than SSE2. Now we're talking of yet another x86 extension, that is not only AMD-specific, but very task-specific. Apart from a handful of labs doing GPU computing, and the usual Photoshop filters... I'm doubtful?
  • MonkeyPaw - Friday, June 17, 2011 - link

    I'm not an expert in this sort of design, but is AMD setting up this architecture to replace the x86 ALU? Bulldozer is already running 2 ALUs for every 1 FPU, which is promoting ALU-heavy software design. It may take a few revisions to meld them (or phase one out), but it certainly seems like that's a heterogeneous CPU in the end.
  • marc1000 - Friday, June 17, 2011 - link

    there is a slide (in the Llano article, I believe) where AMD points to this. Yes, they want to completely merge them, and the ALU would be one of those merging points.
  • A5 - Friday, June 17, 2011 - link

    I think it'll be quite a while before the monolithic cores dissolve into heterogeneous architectures, mostly depending on how fine-grained the power gating can get. When it gets to the point where the CPU can selectively turn off components inside a given SIMD unit, I think we'll see someone go "Wait a minute, then why do we even have this big core anymore?" and it'll go away. 2018ish, maybe?
  • jamescox - Monday, June 20, 2011 - link

    ALU is generally used to refer to a very simple unit that performs arithmetic, logic, and possibly bit-shift operations on integers, not floating point values. The units labeled ALU in the GPU diagrams in the article may support some integer operations, but they mainly process 32-bit floating point values, and (IMO) should not be labeled as "ALUs". FPU would probably be more accurate, but I do not know what operations these units support or whether they include a native integer ALU or just convert integers to FP.

    I don't know what you would mean by ALU-heavy software design. Bulldozer has two integer execution cores per module. Each core is composed of 2 ALUs and 2 AGUs, not shared. It also has 2 128-bit floating point (FMA) units per module shared between the two threads. This isn't really much different than an Intel hyper-threaded core. Intel has, I believe, 3 ALUs, 3 AGUs, and 2 FPUs per core, which are shared between 2 threads. AMD's version of multi-threading just doesn't share as much hardware between threads, which may be better than Intel's HT (2-2-1 AMD vs 1.5-1.5-1 Intel ALU-AGU-FPU per thread). Intel's version would allow a single thread to use all of the execution resources at once, if there is no competing thread. Sharing the FPU makes a lot of sense, since most code that runs on CPUs only uses the FPU intermittently. If the code uses FP more than intermittently, then it would be a candidate for vectorization, and execution on the GPU instead.

    While AMD's next-generation graphics hardware may be able to execute more general code compiled from a wider range of languages, it is not an x86 processor, and it cannot replace the CPU. If you look at the diagram, it has a single scalar unit to handle non-vector code in each compute unit. It also has 64 units in the 4 vector arrays of each CU. If you actually tried to compile and run the kind of branch-heavy integer code that CPUs have to deal with on a CU, then it would probably run entirely, and very, very slowly, on that single scalar unit.
  • MrSpadge - Wednesday, June 22, 2011 - link

    I think you've got the right idea with this being melded into a Bulldozer-like design. However, it wouldn't replace the x86 ALUs, which are highly optimized for high clock speed and low-latency execution, as well as excellent handling of branches etc.
    No, it would rather replace or supplement a fat FPU shared between many "cores" (which, by then, would basically mean ALUs + scheduling). Most tasks which require massive FP number crunching can be executed well in parallel and are therefore suitable for execution on a GPU core. The question is just how to bond them together so that the software guys can actually use them.

    MrS
  • Deleted - Thursday, December 22, 2011 - link

    Basically, what we have here is a math coprocessor. Back in the day, Intel's x86 processors were very good (relatively speaking) at integer math, but choked on floating point math. So Intel created the 8087 to handle the floating point calculations while the CPU handled the integer calculations (obviously this wasn't exclusive to Intel, but I'm generalizing). Eventually, the floating point unit was merged onto the CPU, and programs began using them interchangeably.

    What we have today is very similar. CPUs, even with their advanced FPUs, are nowhere near as powerful as the massively parallel monstrosities we use for graphics. Eventually, these parallel processors will be merged onto the CPU and used as readily for general floating point processing tasks as FPUs are currently.

    And this is the point of Fusion: to fully replace the aging floating point unit with an IGP.
  • A5 - Friday, June 17, 2011 - link

    The benefits to home or enthusiast users of heterogeneous CPUs are still several years off. We need market penetration of hardware along with fundamental changes in software development models and smarter compilers.
  • nedwards - Tuesday, January 28, 2014 - link

    Smarter programmers would help! Let me rephrase that. Programmers thinking in a parallel mindset would help!
  • Beenthere - Friday, June 17, 2011 - link

    If AMD delivers in a timely manner, they will have a bright future. This looks like a huge technological transition, and I understand the need to get developers onboard now, but it also tips AMD's hand to Intel, who will steal any ideas that they can.

    Unfortunately we are still waiting for most applications to be written for 64-bit use, so I'm not holding out much hope for an expeditious migration on such a complex technological transition. It does appear, though, that AMD has been working on this for some time and may be able to do a better job of executing with Trinity and future products. Time will tell, but I hope AMD delivers on time; they will definitely get my dime - all of them.
