Cayman: The New Dawn of AMD GPU Computing

We’ve already covered how the shift from VLIW5 to VLIW4 is beneficial for AMD’s computing efforts: narrower SPUs are easier to fully utilize, FP64 performance improves to 1/4 of FP32 performance, and the space savings give AMD room to lay down additional SIMDs to improve performance. But if Cayman is meant to be a serious effort by AMD to relaunch themselves into the GPU computing market and grab a piece of NVIDIA’s pie, it takes more than just new shaders to accomplish the task. Accordingly, AMD has been hard at work rounding out the capabilities of their latest GPU to make it a threat to NVIDIA’s Fermi architecture.
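To make the utilization argument concrete, here is a minimal sketch. It assumes the compiler greedily packs a fixed number of independent operations into VLIW bundles and ignores dependency chains, register pressure, and special-function units; the function name `vliw_utilization` is hypothetical and does not model AMD’s actual shader compiler.

```python
import math

def vliw_utilization(independent_ops: int, width: int) -> float:
    """Fraction of VLIW slots filled when a shader exposes `independent_ops`
    independent operations and each bundle has `width` slots.
    Ops are packed greedily; leftover slots in the last bundle sit idle."""
    bundles = math.ceil(independent_ops / width)
    return independent_ops / (bundles * width)

# Compare a 5-wide (VLIW5) and a 4-wide (VLIW4) bundle for 1..8 independent ops.
for ops in range(1, 9):
    v5 = vliw_utilization(ops, 5)
    v4 = vliw_utilization(ops, 4)
    print(f"{ops} ops: VLIW5 {v5:.1%}, VLIW4 {v4:.1%}")
```

With 4 independent ops a VLIW4 bundle is full while a VLIW5 bundle leaves one slot idle; with exactly 5 the comparison flips. Under these assumptions, the narrower design is a bet that shaders more often expose four or fewer independent operations than five.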

AMD’s headline compute feature is called asynchronous dispatch, a mouthful of a name that actually does a pretty good job of describing what it does. To touch back on Fermi for a moment: with Fermi, NVIDIA introduced support for parallel kernels, giving Fermi the ability to execute multiple kernels at once. AMD is following NVIDIA’s lead here, but is going to take it one step further.

The limit of NVIDIA’s design is that while Fermi can execute multiple kernels at once, each one must come from the same CPU thread. Independent threads/applications, for example, cannot issue their own kernels and have them execute in parallel; instead, the GPU must context switch between them. With asynchronous dispatch, AMD is going to allow independent threads/applications to issue kernels that execute in parallel. On paper at least, this would give AMD’s hardware a significant advantage in this scenario (context switching is expensive), one that would likely eclipse any overall performance advantage NVIDIA may have.
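The cost argument can be illustrated with a toy model. Everything in it is an assumption for illustration: the `Kernel` fields, SIMD counts, work units, and the flat context-switch penalty are made up, and the concurrent case is an idealized lower bound rather than a model of AMD’s or NVIDIA’s actual schedulers.

```python
from dataclasses import dataclass

@dataclass
class Kernel:
    app: str         # CPU thread/application that issued the kernel
    busy_simds: int  # SIMDs this kernel can keep busy on its own (hypothetical)
    work: float      # total work in SIMD-time units (hypothetical)

def time_sliced(kernels, switch_cost):
    """One application owns the GPU at a time; each change of owner
    pays a flat context-switch penalty (hypothetical cost model)."""
    total, owner = 0.0, None
    for k in kernels:
        if owner is not None and k.app != owner:
            total += switch_cost
        owner = k.app
        total += k.work / k.busy_simds
    return total

def concurrent_lower_bound(kernels, total_simds):
    """Kernels from different applications share the SIMDs with no switches.
    Idealized bound: limited by total work spread over all SIMDs and by the
    longest single kernel running at its own maximum width."""
    total_work = sum(k.work for k in kernels)
    longest = max(k.work / k.busy_simds for k in kernels)
    return max(total_work / total_simds, longest)

# Three small kernels from three different applications, each able to fill
# only a third of a hypothetical 24-SIMD GPU.
jobs = [Kernel("app_a", 8, 80.0), Kernel("app_b", 8, 80.0), Kernel("app_c", 8, 80.0)]
print("time-sliced:", time_sliced(jobs, switch_cost=5.0))  # 10 + 5 + 10 + 5 + 10
print("concurrent :", concurrent_lower_bound(jobs, total_simds=24))
```

In this made-up case the GPU is underused whichever application owns it, so letting all three kernels share the SIMDs removes both the idle SIMDs and the switch penalties.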

Fundamentally, asynchronous dispatch is achieved by having the GPU hide some information about its real state from applications and kernels, in essence virtualizing GPU resources. As far as each kernel is concerned, it’s running on its own GPU, with its own command queue and its own virtual address space. This places more work on the GPU and drivers to manage this shared execution, but the payoff is that it’s cheaper than context switching.
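The virtualization idea can be sketched in a few lines. This is a hypothetical software model, not AMD’s hardware or driver design: each `VirtualQueue` owns a private command list and page table, and all queues draw frames from one shared pool of physical memory, so the same virtual address in two applications resolves to different physical locations. The 4 KiB page size and all class and method names are assumptions.

```python
PAGE_SIZE = 4096  # hypothetical page size for this sketch

class PhysicalMemory:
    """Shared pool of physical page frames handed out to any queue."""
    def __init__(self, num_pages):
        self.free = list(range(num_pages))
        self.data = {}  # physical address -> stored value

    def alloc_page(self):
        if not self.free:
            raise MemoryError("out of physical pages")
        return self.free.pop(0)

class VirtualQueue:
    """One application's view of the GPU: a private command queue and a
    private page table, so its addresses never alias another queue's."""
    def __init__(self, name, phys):
        self.name = name
        self.phys = phys
        self.commands = []    # this queue's own list of submitted kernels
        self.page_table = {}  # virtual page number -> physical frame

    def enqueue(self, kernel_name):
        self.commands.append(kernel_name)

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            # Map a fresh physical frame the first time this page is touched.
            self.page_table[vpn] = self.phys.alloc_page()
        return self.page_table[vpn] * PAGE_SIZE + offset

    def write(self, vaddr, value):
        self.phys.data[self.translate(vaddr)] = value

    def read(self, vaddr):
        return self.phys.data.get(self.translate(vaddr))

phys = PhysicalMemory(num_pages=16)
a = VirtualQueue("app_a", phys)
b = VirtualQueue("app_b", phys)
a.enqueue("kernel_a")
b.enqueue("kernel_b")
a.write(0, "A's data")
b.write(0, "B's data")  # same virtual address, different physical page
print(a.read(0), b.read(0))  # prints: A's data B's data
```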

For the time being the catch for asynchronous dispatch is that it requires API support. As DirectCompute is a fixed standard this just isn’t happening – at least not with DirectCompute 11. Asynchronous dispatch will be exposed under OpenCL in the form of an extension.

Meanwhile, the rest of AMD’s improvements focus on memory and cache performance. While the fundamental architecture is not changing, there are several minor changes here to improve compute performance. The Local Data Store (LDS) attached to each SIMD can now bypass the cache hierarchy and Global Data Store, with memory fetches reading directly into the LDS. Cayman is also getting a second DMA engine, improving memory reads & writes by allowing Cayman to execute two at once in each direction.

Finally, read ops from shaders are being sped up a bit: compared to Cypress, Cayman can coalesce them into fewer operations.
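Read coalescing means merging many per-thread reads that land in the same block of memory into a single memory transaction. A minimal sketch, assuming a hypothetical 64-byte aligned segment size; Cayman’s actual coalescing rules are not described here, and `coalesce` is an illustrative name.

```python
def coalesce(addresses, segment_bytes=64):
    """Group per-thread byte addresses into the distinct aligned segments
    they fall in; each segment becomes one memory transaction.
    Returns the segment start addresses in first-touch order."""
    seen = set()
    segments = []
    for addr in addresses:
        seg = addr // segment_bytes
        if seg not in seen:
            seen.add(seg)
            segments.append(seg * segment_bytes)
    return segments

# 16 threads each reading a consecutive 4-byte float: 64 bytes in one segment.
contiguous = list(range(0, 16 * 4, 4))
# 16 threads reading with a 256-byte stride: every read lands in its own segment.
strided = [i * 256 for i in range(16)]

print(len(contiguous), "reads ->", len(coalesce(contiguous)), "transaction(s)")
print(len(strided), "reads ->", len(coalesce(strided)), "transaction(s)")
```

Under these assumptions the contiguous pattern collapses 16 reads into 1 transaction, while the strided pattern still needs 16, which is why access patterns matter as much as the hardware’s ability to coalesce.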

As today’s launch is primarily about the Radeon HD 6900 series, AMD isn’t going into much depth on the compute side of things, so everything here is a fairly high-level overview of the architecture. Once AMD has FireStream cards ready to go with Cayman in them, there will likely be more to talk about.


168 Comments


  • henrikfm - Wednesday, December 15, 2010

    The right numbers for these cards considering the performance:

    6970 -> 5875
    6950 -> 5855
  • flyck - Wednesday, December 15, 2010

    Anand also tested with 'outdated' drivers. It is of course AMD's fault for not supplying the best drivers available at launch, though. But Anand used 10.10; reviews that use 10.11, like HardOCP, see the 6950 performing equal to or better than the GTX 570!! And the 6970 trades blows with the GTX 580 but is a little slower overall (though faster than the GTX 570).

    And now we have to wait for the 10.12 drivers, which were meant for the 69xx series.
  • flyck - Wednesday, December 15, 2010

    My bad, Anand tested with 10.11 :shame:
    The 10.12 drivers don't seem to improve performance.

    That said, Anand, would it be possible to change your graphs?
    Could they start with low quality and end with high quality? And could the high quality chart show single cards only? Right now it just isn't readable with SLI and CrossFire numbers mixed in.

    According to your results the 6970 is > 570 and the 6950 ~ 570, but only with everything turned on, and one cannot deduce that from the current presentation.
  • Will Robinson - Wednesday, December 15, 2010

    $740 for HD6970 CrossfireX dominates GTX580 SLI costing over $1000.
    That's some serious ownage right there.
    Good pricing on these new cards and solid numbers for power/heat and noise.
    Seems like a good new series of cards from AMD.
  • prdola0 - Wednesday, December 15, 2010

    No, you're wrong. Re-read the graphs. GTX580 SLI wins most of the time.
  • softdrinkviking - Wednesday, December 15, 2010

    By a small average amount, and for ~$250 extra.
    Once you get to that level, you're not really hurting for performance anyway, so for people who really just want to play games and aren't interested in having the "fastest card" just to have it, the 6970 is the best value.
  • Nfarce - Wednesday, December 15, 2010

    True. However, AMD has just about always prioritized value over an all-out horsepower war with Nvidia. Some people are willing to spend for bragging rights.

    But I'm a little suspicious of AT's figures for these cards. Two other tech sites (Tom's Hardware and Guru3D) show the GTX 570 and 580 solidly beating the 6950 and 6970 respectively in the same games with similar PC builds.
  • IceDread - Friday, December 17, 2010

    You are wrong. HD 5970 in CrossFire beats GTX 580 SLI. But AnandTech did not test that.
  • ypsylon - Wednesday, December 15, 2010

    A lot of people were anxious to see what AMD would bring to the market with the 6950/6970. And once again, not much. Some minor advantages (like 5 FPS in a handful of games) are nothing worth writing or screaming about. For now the GTX580 is more expensive, but with AMD unveiling new cards nVidia will get really serious about the price. That $500 price point won't last long. I'm expecting at least $50 off that in the next 4-6 weeks.

    The GTX580 is the best option today for someone interested in a new VGA card; if you already own a 5850/5870/5970 (CF or not), don't even bother with the 69[whatever].
  • duploxxx - Wednesday, December 15, 2010

    At that price point a 580 is the best buy? Get lost. The 580 is way overpriced for the small performance increase it has over the 570/6970, not to mention the additional power consumption. I don't see any reason at all to buy that card.

    Indeed, there's no need to upgrade from a 58xx series card, but neither is there any reason to move to an NV-based card.
