Cayman: The New Dawn of AMD GPU Computing

We’ve already covered how the shift from VLIW5 to VLIW4 is beneficial for AMD’s computing efforts: narrower SPUs are easier to fully utilize, FP64 performance improves to 1/4th FP32 performance, and the space savings give AMD room to lay down additional SIMDs to improve performance. But if Cayman is meant to be a serious effort by AMD to relaunch themselves in to the GPU computing market and to grab a piece of NVIDIA’s pie, it takes more than just new shaders to accomplish the task. Accordingly, AMD has been hard at work to round out the capabilities of their latest GPU to make it a threat for NVIDIA’s Fermi architecture.

AMD’s headline compute feature is called asynchronous dispatch, a long word that actually does a pretty good job of describing what it does. To touch back on Fermi for a moment, with Fermi NVIDIA introduced support for parallel kernels, giving Fermi the ability to execute multiple kernels at once. AMD in turn is following NVIDIA’s approach of executing multiple kernels at once, but is going to take it one step further.

The limit of NVIDIA’s design is that while Fermi can execute multiple kernels at once, each one must come from the same CPU thread. Independent threads/applications for example cannot issue their own kernels and have them execute in parallel, rather the GPU must context switch between them. With asynchronous dispatch AMD is going to allow independent threads/applications to issue kernels that execute in parallel. On paper at least, this would give AMD’s hardware a significant advantage in this scenario (context switching is expensive), one that would likely eclipse any overall performance advantages NVIDIA had.

Fundamentally asynchronous dispatch is achieved by having the GPU hide some information about its real state from applications and kernels, in essence leading to virtualization of GPU resources. As far as each kernel is concerned it’s running in its own GPU, with its own command queue and own virtual address space. This places more work on the GPU and drivers to manage this shared execution, but the payoff is that it’s better than context switching.

For the time being the catch for asynchronous dispatch is that it requires API support. As DirectCompute is a fixed standard this just isn’t happening – at least not with DirectCompute 11. Asynchronous dispatch will be exposed under OpenCL in the form of an extension.

Meanwhile the rest of AMD’s improvements are focusing on memory and cache performance. While the fundamental architecture is not changing, there are several minor changes here to improve compute performance. The Local Data Store attached to each SIMD is now able to bypass the cache hierarchy and Global Data Store by having memory fetches read directly in to the LDS. Meanwhile Cayman is getting a 2nd DMA engine, improving memory reads & writes by allowing Cayman to execute two at once in each direction.

Finally, read ops from shaders are being sped up a bit. Compared to Cypress, Cayman can coalesce them in to fewer operations.

As today’s launch is primarily about the Radeon HD 6900 series AMD isn’t going too much in depth on the compute side of things, so everything here is a fairly high level overview of the architecture. Once AMD has Firestream cards ready to go with Cayman in them, there will likely be more to talk about.

VLIW4: Finding the Balance Between TLP, ILP, and Everything Else Advancing Primitives: Dual Graphics Engines & New ROPs
Comments Locked

168 Comments

View All Comments

  • MeanBruce - Wednesday, December 15, 2010 - link

    TechPowerUp.com shows the 6850 as 95percent or almost double the performance of the 4850 and 100percent more efficient than the 4850@1920x1200. I also am upgrading an old 4850, as far as the 6950 check their charts when they come up later today.
  • mapesdhs - Monday, December 20, 2010 - link


    Today I will have completed by benchmark pages comparing 4890, 8800GT and
    GTX 460 1GB (800 and 850 core speeds), in both single and CF/SLI, for a range
    of tests. You should be able to extrapolate between known 4850/4890 differences,
    the data I've accumulated, and known GTX 460 vs. 68xx/69xx differences (baring
    in mind I'm testing with 460s with much higher core clocks than the 675 reference
    speed used in this article). Email me at mapesdhs@yahoo.com and I'll send you
    the URL once the data is up. I'm testing with 3DMark06, Unigine (Heaven, Tropics
    and Sanctuary), X3TC, Stalker COP, Cinebench, Viewperf and PT Boats. Later
    I'll also test with Vantage, 3DMark11 and AvP.

    Ian.
  • ZoSo - Wednesday, December 15, 2010 - link

    Helluva 'Bang for the Buck' that's for sure! Currently I'm running a 5850, but I have been toying with the idea of SLI or CF. For a $300 difference, CF is the way to go at this point.
    I'm in no rush, I'm going to wait at least a month or two before I pull any triggers ;)
  • RaistlinZ - Wednesday, December 15, 2010 - link

    I'm a bit underwhelmed from a performance standpoint. I see nothing that will make me want to upgrade from my trusty 5870.

    I would like to see a 2x6950 vs 2x570 comparison though.
  • fausto412 - Wednesday, December 15, 2010 - link

    exactly my feelings.

    it's like thinking Miss Universe is about to screw you and then you find out it's her mom....who's probably still hot...but def not miss universe
  • Paladin1211 - Wednesday, December 15, 2010 - link

    CF scaling is truly amazing now, I'm glad that nVidia has something to catch up in terms of driver. Meanwhile, the ATI wrong refresh rate is not fixed, it stucks at 60hz where the monitor can do 75hz. "Refresh force", "refresh lock", "ATI refresh fix", disable /enable EDID, manually set monitor attributes in CCC, EDID hack... nothing works. Even the "HUGE" 10.12 driver can't get my friend's old Samsung SyncMaster 920NW to work at its native 1440x900@75hz, both in XP 32bit and win 7 64bit. My next monitor will be an 120hz for sure, and I don't want to risk and ruin my investment, AMD.
  • mapesdhs - Monday, December 20, 2010 - link


    I'm not sure if this will help fix the refresh issue (I do the following to fix max res
    limits), but try downloading the drivers for the monitor but modify the data file
    before installing them. Check to ensure it has the correct genuine max res and/or
    max refresh.

    I've been using various models of CRT which have the same Sony tube that can
    do 2048 x 1536, but every single vendor that sells models based on this tube has
    drivers that limited the max res to 1800x1440 by default, so I edit the file to enable
    2048 x 1536 and then it works fine, eg. HP P1130.

    Bit daft that drivers for a monitor do not by default allow one to exploit the monitor
    to its maximum potential.

    Anyway, good luck!!

    Ian.
  • techworm - Wednesday, December 15, 2010 - link

    future DX11 games will stress GPU and video RAM incrementally and it is then that 6970 will shine so it's obvious that 6970 is a better and more future proof purchase than GTX570 that will be frame buffer limited in near future games
  • Nickel020 - Wednesday, December 15, 2010 - link

    In the table about whether PowerTune affects an application or not there's a yes for 3DMark, and in the text you mention two applications saw throttling (with 3DMark it would be three). Is this an error?

    Also, you should maybe include that you're measuring the whole system power in the PowerTune tables, it might be confusing for people who don't read your reviews very often to see that the power draw you measured is way higher than the PowerTune level.

    Reading the rest now :)
  • stangflyer - Wednesday, December 15, 2010 - link

    Sold my 5970 waiting for 6990. With my 5970 playing games at 5040x1050 I would always have a 4th extended monitor hooked up to a tritton uve-150 usb to vga adapter. This would let me game while having the fourth monitor display my teamspeak, afterburner, and various other things.
    Question is this!! Can i use the new 6950/6970 and use triple monitor and also use a 4th screen extended at the same time? I have 3 matching dell native display port monitors and a fourth with vga/dvi. Can I use the 2 dp's and the 2 dvi's on the 6970 at the same time? I have been looking for this answer for hours and can't find it! Thanks for the help.

Log in

Don't have an account? Sign up now