Cayman: The New Dawn of AMD GPU Computing

We’ve already covered how the shift from VLIW5 to VLIW4 benefits AMD’s computing efforts: narrower SPUs are easier to fully utilize, FP64 performance improves to 1/4 of FP32 performance, and the space savings give AMD room to lay down additional SIMDs to improve performance. But if Cayman is meant to be a serious effort by AMD to relaunch itself into the GPU computing market and to grab a piece of NVIDIA’s pie, it takes more than just new shaders to accomplish the task. Accordingly, AMD has been hard at work rounding out the capabilities of its latest GPU to make it a threat to NVIDIA’s Fermi architecture.
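To put the 1/4 FP64 rate in concrete terms, here is a rough back-of-the-envelope sketch. The SIMD count, SPUs per SIMD, and clock speed are assumptions chosen to match a fully enabled Cayman part such as the Radeon HD 6970; none of them are stated on this page, and the result is a theoretical peak rather than a measured figure.

```python
def peak_gflops(stream_processors, clock_mhz, flops_per_sp_per_clock=2):
    """Theoretical peak FP32 throughput in GFLOPS.

    Assumes every stream processor retires one multiply-add (2 FLOPs) per clock.
    """
    return stream_processors * flops_per_sp_per_clock * clock_mhz / 1000.0


# Assumed configuration (not taken from this page): 24 SIMDs, 16 SPUs per SIMD,
# 4 stream processors per VLIW4 SPU, 880 MHz core clock.
SIMDS, SPUS_PER_SIMD, VLIW_WIDTH = 24, 16, 4
CLOCK_MHZ = 880

stream_processors = SIMDS * SPUS_PER_SIMD * VLIW_WIDTH
fp32_gflops = peak_gflops(stream_processors, CLOCK_MHZ)
fp64_gflops = fp32_gflops / 4  # FP64 runs at 1/4 the FP32 rate on VLIW4

print(f"{stream_processors} SPs, FP32 {fp32_gflops:.2f} GFLOPS, FP64 {fp64_gflops:.2f} GFLOPS")
```

Real workloads will land below these numbers, since they assume every VLIW4 slot is filled on every clock.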

AMD’s headline compute feature is called asynchronous dispatch, a long name that actually does a pretty good job of describing what the feature does. To touch back on Fermi for a moment, with Fermi NVIDIA introduced support for parallel kernels, giving Fermi the ability to execute multiple kernels at once. AMD in turn is following NVIDIA’s approach of executing multiple kernels at once, but is going to take it one step further.

The limit of NVIDIA’s design is that while Fermi can execute multiple kernels at once, each one must come from the same CPU thread. Independent threads or applications, for example, cannot issue their own kernels and have them execute in parallel; instead the GPU must context switch between them. With asynchronous dispatch AMD is going to allow independent threads and applications to issue kernels that execute in parallel. On paper at least, this would give AMD’s hardware a significant advantage in this scenario (context switching is expensive), one that would likely eclipse any overall performance advantage NVIDIA has.
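A toy timing model can illustrate why this matters. This is only a sketch and not a description of how either GPU schedules work: kernel durations and the switch cost are made-up numbers in arbitrary time units, and the parallel case optimistically assumes each application's kernels leave enough idle hardware for the other's to run alongside them.

```python
from itertools import zip_longest


def time_with_context_switches(apps, switch_cost):
    """Apps take turns, one kernel each; changing the owning app costs a context switch."""
    total = 0.0
    previous_owner = None
    for round_of_kernels in zip_longest(*apps):
        for owner, duration in enumerate(round_of_kernels):
            if duration is None:  # this app has no kernels left
                continue
            if previous_owner is not None and owner != previous_owner:
                total += switch_cost
            total += duration
            previous_owner = owner
    return total


def time_with_parallel_dispatch(apps):
    """Idealized case: each app's kernels run back to back, and the apps overlap fully."""
    return max(sum(kernels) for kernels in apps)


# Hypothetical workload: two independent applications and their kernel durations.
apps = [[2.0, 2.0, 2.0], [3.0, 3.0]]
switch_cost = 0.5

switched = time_with_context_switches(apps, switch_cost)
parallel = time_with_parallel_dispatch(apps)
print(f"context switching: {switched}, parallel dispatch: {parallel}")
```

In this model the serialized case pays for every kernel in sequence plus a switch each time ownership changes, while the parallel case is bounded by the slower application alone. How close real hardware gets to that bound depends on how much of the GPU each kernel actually occupies.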

Fundamentally, asynchronous dispatch is achieved by having the GPU hide some information about its real state from applications and kernels, in essence virtualizing the GPU’s resources. As far as each kernel is concerned it’s running on its own GPU, with its own command queue and its own virtual address space. This places more work on the GPU and drivers to manage the shared execution, but the payoff is that it avoids the cost of context switching.

For the time being, the catch for asynchronous dispatch is that it requires API support. As DirectCompute is a fixed standard, this just isn’t happening, at least not with DirectCompute 11. Asynchronous dispatch will instead be exposed under OpenCL in the form of an extension.

Meanwhile, the rest of AMD’s improvements focus on memory and cache performance. While the fundamental architecture is not changing, there are several minor changes here to improve compute performance. The Local Data Store attached to each SIMD is now able to bypass the cache hierarchy and Global Data Store by having memory fetches read directly into the LDS. Cayman is also getting a second DMA engine, improving memory reads and writes by allowing Cayman to execute two at once in each direction.

Finally, read ops from shaders are being sped up a bit. Compared to Cypress, Cayman can coalesce them into fewer operations.
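As a general illustration of what coalescing buys, the sketch below counts how many aligned memory segments a group of reads touches. The 64-byte segment size and the 16 work-items are hypothetical values picked for the example, not Cayman’s actual fetch granularity, and real hardware coalescing rules are more involved than this.

```python
def count_fetches(addresses, segment_bytes):
    """Number of distinct aligned segments touched by the given byte addresses."""
    return len({addr // segment_bytes for addr in addresses})


SEGMENT_BYTES = 64  # hypothetical fetch granularity
WORK_ITEMS = 16     # hypothetical number of work-items reading at once

# Each work-item reads one 4-byte value.
contiguous = [4 * i for i in range(WORK_ITEMS)]   # neighbours read adjacent words
strided = [256 * i for i in range(WORK_ITEMS)]    # every read lands in a different segment

uncoalesced_ops = WORK_ITEMS  # one memory operation per read, no merging
contiguous_ops = count_fetches(contiguous, SEGMENT_BYTES)
strided_ops = count_fetches(strided, SEGMENT_BYTES)

print(f"uncoalesced: {uncoalesced_ops}, contiguous: {contiguous_ops}, strided: {strided_ops}")
```

Adjacent reads collapse into a single fetch in this model, while widely strided reads gain nothing, which is why access patterns matter as much as the hardware’s ability to merge them.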

As today’s launch is primarily about the Radeon HD 6900 series, AMD isn’t going too much in depth on the compute side of things, so everything here is a fairly high-level overview of the architecture. Once AMD has FireStream cards ready to go with Cayman in them, there will likely be more to talk about.


168 Comments


  • Roland00Address - Wednesday, December 15, 2010 - link

    1) The architecture article is something that can be written before hand, or written during benching (if the bench is on a loop). It has very little "cramming" to get out right after a NDA ends. Anand knows this info for a couple of weeks but can't discuss it due to NDAs. Furthermore the reason anandtech is one of the best review sites on the net is the fact they do go into the architecture details. The architecture as well as the performance benchmarks is the reason I come to anandtech instead of other review sites as my first choice.

    2) The spelling and grammar errors is a common thing at anandtech, this is nothing new. That said I can't complain for my spelling and grammar is far worse than Ryan's.

    If you don't like the style of the review go somewhere else.
  • Ryan Smith - Wednesday, December 15, 2010 - link

    1) That's only half true. AMD told us the basics about the 6900 series back in October, but I never had full access to the product information (and more importantly the developers) until 1 week ago. So this entire article was brought up from scratch in 1 week.

    It's rare for us to get too much access much earlier than that; the closest thing was the Fermi launch where NVIDIA was willing to talk about the architecture months in advance. Otherwise that's usually a closely held secret in order to keep the competition from having concrete details too soon.
  • Dracusis - Wednesday, December 15, 2010 - link

    Neither the AMD 6xxx series or Nvidia's 5xx series have been added. Would like to see how my 4870x2 stack up against this latest generation and weather or not it's worth upgrading.
  • Makaveli - Wednesday, December 15, 2010 - link

    The Canadian pricing on these cards are hilarious.

    Ncix is taking preorder for the 6970 at $474.

    While they sell the 570 for $379.

    Can someone explain to me why I would pay $100 more for the radeon when the 570 gives equal performance?

    Are these retailers that retarded?
  • stangflyer - Thursday, December 16, 2010 - link

    They will price the 6950/6970 high for a few days to get the boys that bleed red and have to have the new cards right away to pay top dollar for the card.

    After a week they will probably be about the same price.
  • Ryan Smith - Thursday, December 16, 2010 - link

    Bench will be up to date by the start of next week.
  • Paladin1211 - Thursday, December 16, 2010 - link

    Whats wrong with you rarson? Do you even know whats the difference between "Graphics card review", "Performance review", "Performance Preview"? I dont know how good your grammar and spelling are, but they dont matter as long as you cant understand the basic meaning of the words.

    Most of the sites will tell you about WHAT, but here at AnandTech, you'll truly find out WHY and HOW. Well, of course, you can always go elsewhere try to read some numbers instead of words.

    Keep up the good works, Ryan.
  • Belard - Thursday, December 16, 2010 - link

    The 3870 and 3850 were the TOP end for ATI, as was the 4800 and the 5800. Their relationship of model numbers do not have anything to do with the status of Nvidia.

    When the 3870 was brand new, what was the HIGHEST end card ATI had back then? Oh yeah, the 3870!

    4800 is over the 3870, easily.
    4600 replaced the 3800

    The 5800s replaces the 4800s... easily.
    the 5700s kind of replaced the 4800s.

    The 6800s replaces the 5700 & 5800s, the 6900s replace the 5800s, but not so much on performance.

    I paid $90 for my 4670 and a much better value than the $220 3870 since both cards perform almost the same.
  • AmdInside - Thursday, December 16, 2010 - link

    I can't think of a single website that has better hardware reviews, at least for computer technology than Anandtech. Ryan, keep up the great work.
  • George.Zhang - Thursday, December 16, 2010 - link

    BTW, HD6950 looks great and affordable for me.
