Cayman: The New Dawn of AMD GPU Computing

We’ve already covered how the shift from VLIW5 to VLIW4 is beneficial for AMD’s computing efforts: narrower SPUs are easier to fully utilize, FP64 performance improves to 1/4 of FP32 performance, and the space savings give AMD room to lay down additional SIMDs to improve performance. But if Cayman is meant to be a serious effort by AMD to relaunch itself into the GPU computing market and to grab a piece of NVIDIA’s pie, it takes more than just new shaders to accomplish the task. Accordingly, AMD has been hard at work rounding out the capabilities of its latest GPU to make it a threat to NVIDIA’s Fermi architecture.

AMD’s headline compute feature is called asynchronous dispatch, a long name that actually does a pretty good job of describing what the feature does. To touch back on Fermi for a moment: with Fermi, NVIDIA introduced support for parallel kernels, giving the GPU the ability to execute multiple kernels at once. AMD in turn is following NVIDIA’s approach of executing multiple kernels at once, but is going to take it one step further.

The limit of NVIDIA’s design is that while Fermi can execute multiple kernels at once, each one must come from the same CPU thread. Independent threads or applications, for example, cannot issue their own kernels and have them execute in parallel; instead the GPU must context switch between them. With asynchronous dispatch AMD is going to allow independent threads and applications to issue kernels that execute in parallel. On paper at least, this would give AMD’s hardware a significant advantage in this scenario (context switching is expensive), one that would likely eclipse any overall performance advantages NVIDIA had.
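To make the context-switching argument concrete, here is a toy timing model in Python. It is only a sketch under stated assumptions: the function names, the `(run_time, gpu_fraction)` representation of a kernel, and the switch cost are all made up for illustration, and nothing here reflects measured behavior of Cayman or Fermi.

```python
# Toy model only: names and numbers are hypothetical, not measurements.
# Each kernel is a (run_time, gpu_fraction) pair, where gpu_fraction is the
# share of the GPU the kernel can keep busy while it runs on its own.

def serialized_time(kernels, switch_cost):
    """Kernels from independent applications run back to back,
    with one context switch between consecutive kernels."""
    if not kernels:
        return 0.0
    run = sum(t for t, _ in kernels)
    return run + switch_cost * (len(kernels) - 1)

def concurrent_time(kernels):
    """Idealized lower bound when kernels share the GPU side by side:
    limited by the longest kernel or by the total amount of GPU work,
    whichever is larger. Assumes perfect sharing with no overhead."""
    if not kernels:
        return 0.0
    longest = max(t for t, _ in kernels)
    total_work = sum(t * f for t, f in kernels)
    return max(longest, total_work)

if __name__ == "__main__":
    # Two small kernels that each fill only half of the GPU.
    kernels = [(4.0, 0.5), (4.0, 0.5)]
    print("serialized:", serialized_time(kernels, switch_cost=1.0))
    print("concurrent:", concurrent_time(kernels))
```

In this model the benefit shrinks as individual kernels get large enough to fill the GPU by themselves; at that point only the context-switch cost is saved.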

Fundamentally, asynchronous dispatch is achieved by having the GPU hide some information about its real state from applications and kernels, in essence leading to virtualization of GPU resources. As far as each kernel is concerned it’s running on its own GPU, with its own command queue and its own virtual address space. This places more work on the GPU and drivers to manage this shared execution, but the payoff is that it avoids the cost of context switching.

For the time being the catch for asynchronous dispatch is that it requires API support. As DirectCompute is a fixed standard this just isn’t happening – at least not with DirectCompute 11. Asynchronous dispatch will be exposed under OpenCL in the form of an extension.

Meanwhile the rest of AMD’s improvements are focused on memory and cache performance. While the fundamental architecture is not changing, there are several minor changes here to improve compute performance. The Local Data Store attached to each SIMD is now able to bypass the cache hierarchy and Global Data Store by having memory fetches read directly into the LDS. Cayman is also getting a second DMA engine, improving memory reads and writes by allowing Cayman to execute two at once in each direction.

Finally, read ops from shaders are being sped up a bit. Compared to Cypress, Cayman can coalesce them into fewer operations.
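AMD has not detailed how Cayman’s coalescing works, so the following Python sketch only illustrates the general idea of read coalescing: many small reads that fall in the same aligned block of memory can be served by a single wider fetch. The function name and the 64-byte fetch width are assumptions chosen for the example, not Cayman’s actual parameters.

```python
# Illustration of read coalescing in general, not AMD's hardware logic.
# fetch_bytes=64 is an assumed block size chosen for the example.

def coalesce_reads(addresses, fetch_bytes=64):
    """Return the sorted base addresses of the aligned blocks that cover
    all requested byte addresses. Each block stands for one memory
    operation in this toy model."""
    blocks = {addr // fetch_bytes * fetch_bytes for addr in addresses}
    return sorted(blocks)

if __name__ == "__main__":
    # 16 work-items each read a 4-byte value from consecutive addresses.
    contiguous = [0x1000 + 4 * i for i in range(16)]
    # 16 work-items read values that are 64 bytes apart.
    strided = [0x1000 + 64 * i for i in range(16)]
    print(len(contiguous), "reads ->", len(coalesce_reads(contiguous)), "fetches")
    print(len(strided), "reads ->", len(coalesce_reads(strided)), "fetches")
```

The contiguous pattern collapses into far fewer operations than the strided one, which is why access patterns matter so much for compute kernels.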

As today’s launch is primarily about the Radeon HD 6900 series, AMD isn’t going too far in depth on the compute side of things, so everything here is a fairly high-level overview of the architecture. Once AMD has FireStream cards ready to go with Cayman in them, there will likely be more to talk about.

168 Comments

  • anactoraaron - Wednesday, December 15, 2010

    I would like to thank Ryan for the article that makes me forget the "OC card in the review" debacle. Fantastic in depth review with no real slant to team green or red. Critics go elsewhere please.
  • Hrel - Wednesday, December 15, 2010

    When are you guys gonna put all these cards in bench? Some of them have been out for a relatively long time now and they're still not in bench. Please put them in there.
  • ajlueke - Wednesday, December 15, 2010

    I agree with most of the conclusions I have read here. If you already own a 5800 series card, there isn't really enough here to warrant an upgrade. Some improved features and slightly improved FPS in games doesn't quite give the same upgrade incentive as the 5870 did compared a 4870.
    There are some cool things with the 6900 and 6800 series. Looking at the performance in games, the 6970 and even the 6870 seemed to get much closer to 2X performance when placed in crossfire as compared to 5800 series cards. That is a pretty interesting development. All in all, a good upgrade if you didn't buy a card last generation. If you did, it seems the wait is on for the 28 nm version of the GPU.
  • Belard - Wednesday, December 15, 2010

    NO!

    The 800 cards were the HIGH end models since the 3000 series and worked well through to the 5000 series with the 5970 being the "odd one" since the "X2" made more sense like the 4850X2.

    It also allows for a "x900" series if needed.

    AMD needs to NOT COPY Nvidia's naming games... did they hire someone from Nvidia? Even the GeForce 580/570 still belong to the 400 series since its the same tech. SHould have been named 490 and the 475... But hey, in 12 months, Nvidia will be up to the 700 series. Hey, Google Chrome is version 8.0 and its been on the market for about 2 years! WTF?!

    What was their excuse again? Oh, to not create confusion with the 5700 series? So they frack up the whole model names for a mid-range card? The 6800's should have been 6700s, simple as that. Yes, there will be some people who will accidentally downgrade.

    What the new 6000 series has going for AMD is that they are somewhat cheaper and easily cost less to make than the 5000s and what Nvidia makes.

    In the end, the 6000 series is the first dumb-thing AMD has done since the 2000 series, but nowhere near as bad.
  • MS - Wednesday, December 15, 2010

    In terms of effienct usage of space though AMD is doing quite well; ... should be efficient

    Nice article so far,

    Regards,
    Michael
  • nitrousoxide - Wednesday, December 15, 2010

    The power connector on the left (8-pin of 6970 and 6-pin of 6950) has a corner (bottom left corner) cut down, that's because the cooler doesn't fit with the PCB design, if you install it with force the power connector would get stuck. So the delay of 6900 Series could be due to this issue, AMD needs one month to 'manually polish' all power connectors of the stock-cards in order to go with the cooler. Well, just a joke, but this surely reflects how poorly AMD organizes the whole design and manufacture process :)
  • nitrousoxide - Wednesday, December 15, 2010

    you can find this out here :)
    hiphotos. baidu. com/coreavc/pic/item/70f48d81ffe07cf26d811957. jpg
  • nitrousoxide - Wednesday, December 15, 2010

    AMD promises that every one will get a unique 6970 or 6950, different from any other card on the planet :)
  • GummiRaccoon - Wednesday, December 15, 2010

    The performance of these cards is much better with 10.12, why didn't you test it with that?
  • Ryan Smith - Wednesday, December 15, 2010

    10.12 does not support the 6900 series.

    8.79.6.2RC2, dated December 7th, were the absolute latest drivers for the 6900 series at the time of publication.
