Cayman: The New Dawn of AMD GPU Computing

We’ve already covered how the shift from VLIW5 to VLIW4 is beneficial for AMD’s computing efforts: narrower SPUs are easier to fully utilize, FP64 performance improves to 1/4th FP32 performance, and the space savings give AMD room to lay down additional SIMDs to improve performance. But if Cayman is meant to be a serious effort by AMD to relaunch themselves in to the GPU computing market and to grab a piece of NVIDIA’s pie, it takes more than just new shaders to accomplish the task. Accordingly, AMD has been hard at work to round out the capabilities of their latest GPU to make it a threat for NVIDIA’s Fermi architecture.

AMD’s headline compute feature is called asynchronous dispatch, a long word that actually does a pretty good job of describing what it does. To touch back on Fermi for a moment, with Fermi NVIDIA introduced support for parallel kernels, giving Fermi the ability to execute multiple kernels at once. AMD in turn is following NVIDIA’s approach of executing multiple kernels at once, but is going to take it one step further.

The limit of NVIDIA’s design is that while Fermi can execute multiple kernels at once, each one must come from the same CPU thread. Independent threads/applications for example cannot issue their own kernels and have them execute in parallel, rather the GPU must context switch between them. With asynchronous dispatch AMD is going to allow independent threads/applications to issue kernels that execute in parallel. On paper at least, this would give AMD’s hardware a significant advantage in this scenario (context switching is expensive), one that would likely eclipse any overall performance advantages NVIDIA had.

Fundamentally asynchronous dispatch is achieved by having the GPU hide some information about its real state from applications and kernels, in essence leading to virtualization of GPU resources. As far as each kernel is concerned it’s running in its own GPU, with its own command queue and own virtual address space. This places more work on the GPU and drivers to manage this shared execution, but the payoff is that it’s better than context switching.

For the time being the catch for asynchronous dispatch is that it requires API support. As DirectCompute is a fixed standard this just isn’t happening – at least not with DirectCompute 11. Asynchronous dispatch will be exposed under OpenCL in the form of an extension.

Meanwhile the rest of AMD’s improvements are focusing on memory and cache performance. While the fundamental architecture is not changing, there are several minor changes here to improve compute performance. The Local Data Store attached to each SIMD is now able to bypass the cache hierarchy and Global Data Store by having memory fetches read directly in to the LDS. Meanwhile Cayman is getting a 2nd DMA engine, improving memory reads & writes by allowing Cayman to execute two at once in each direction.

Finally, read ops from shaders are being sped up a bit. Compared to Cypress, Cayman can coalesce them in to fewer operations.

As today’s launch is primarily about the Radeon HD 6900 series AMD isn’t going too much in depth on the compute side of things, so everything here is a fairly high level overview of the architecture. Once AMD has Firestream cards ready to go with Cayman in them, there will likely be more to talk about.

VLIW4: Finding the Balance Between TLP, ILP, and Everything Else Advancing Primitives: Dual Graphics Engines & New ROPs
Comments Locked

168 Comments

View All Comments

  • mac2j - Wednesday, December 15, 2010 - link

    Um - if you have the money for a 580 ... pick up another $80-100 and get 2 x 6950 - you'll get nearly the best possible performance on the market at a similar cost.

    Also I agree that Nvidia will push the 580 price down as much as possible... the problem is that if you believe all of the admittedly "unofficial" breakdowns ... it costs Nvidia 1.5-2x as much to make a 580 as it costs AMD to make a 6970.

    So its hard to be sure how far Nvidia can push down the price on the 580 before it ceases to become profitable - my guess is they'll focus on making a 565 type card which has almost 570 performance but for a manufacturing cost closer to what a 460 runs them.
  • fausto412 - Wednesday, December 15, 2010 - link

    yeah. AMD let us down on this here product. We see what gtx580 is and what 6970 is...i would say if you planning to spend 500...the gtx580 is worth it.
  • truepurple - Wednesday, December 15, 2010 - link

    "support for color correction in linear space"

    What does that mean?
  • Ryan Smith - Wednesday, December 15, 2010 - link

    There are two common ways to represent color, linear and gamma.

    Linear: Used for rendering an image. More generally linear has a simple, fixed relationship between X and Y, such that if you drew the relationship it would be a straight line. A linear system is easy to work with because of the simple relationship.

    Gamma: Used for final display purposes. It's a non-linear colorspace that was originally used because CRTs are inherently non-linear devices. If you drew out the relationship, it would be a curved line. The 5000 series is unable to apply color correction in linear space and has to apply it in gamma space, which for the purposes of color correction is not as accurate.
  • IceDread - Wednesday, December 15, 2010 - link

    Yet again we do not get to see hd 5970 in crossfire despite it being a single card! Is this an nvidia site?

    Anyway, for those of you who do want to see those results, here is a link to a professional Swedish site!

    http://www.sweclockers.com/recension/13175-amd-rad...

    Maybe there is some google translation available or so if you want to understand more than the charts shows.
  • medi01 - Wednesday, December 15, 2010 - link

    Wow, 5970 in crossfire consumes less than 580 in SLI.
    http://www.sweclockers.com/recension/13175-amd-rad...
  • ggathagan - Wednesday, December 15, 2010 - link

    Absolutely!!!
    There's no way on God's green earth that Anandtech doesn't currently have a pair of 5970's on hand, so that MUST be the reason.
    I'll go talk to Anand and Ryan right now!!!!
    Oh, wait, they're on a conference call with Huang Jen-Hsun.....

    I'd like to note that I do not believe Anadtech ever did a test of two 5970's, so it's somewhat difficult to supply non-existent into any review.
    Ryan did a single card test in November 2009.That is the only review I've found of any 5970's on the site.
  • vectorm12 - Wednesday, December 15, 2010 - link

    I was not aware of the fact that the 32nm process had been canned completely and was still expecting the 6970 to blow the 580 out of the water.

    Although we can't possibly know and are unlikely to ever find out what cayman at 32nm would have performed like I suspect AMD had to give up a good chunk of performance to fit it on the 389mm^2 40nm die.

    This really makes my choice easy as I'll pickup another cheap 5870 and run my system in CF.
    I think I'll be able to live with the performance until the refreshed cayman/next gen GPUs are ready for prime time.

    Ryan: I'd really like to see what ighashgpu can do with the new 6970 cards though. Although you produce a few GPGPU charts I feel like none of them really represent the real "number-crunching" performance of the 6970/6950.

    Ivan has already posted his analysis in his blog and it seems like the change from LWIV5 to LWIV4 made a negligible impact at the most. However I'd really love to see ighashgpu included in future GPU tests to test new GPUs and architectures.

    Thanks for the site and keep up the work guys!
  • slagar - Wednesday, December 15, 2010 - link

    Gaming seems to be in the process of bursting its own bubble. Graphics of games isn't keeping up with the hardware (unless you cound gaming on 6 monitors) because most developers are still targeting consoles with much older technology.
    Consoles won't upgrade for a few more years, and even then, I'm wondering how far we are from "the final console generation". Visual improvements in graphics are becoming quite incremental, so it's harder to "wow" consumers into buying your product, and the costs for developers is increasing, so it's becoming harder for developers to meet these standards. Tools will always improve and make things easier and more streamlined over time I suppose, but still... it's going to be an interesting decade ahead of us :)
  • darckhart - Wednesday, December 15, 2010 - link

    that's not entirely true. the hardware now allows not only insanely high resolutions, but it also lets those of us with more stringent IQ requirements (large custom texture mods, SSAA modes, etc) to run at acceptable framerates at high res in intense action spots.

Log in

Don't have an account? Sign up now