Cayman: The New Dawn of AMD GPU Computing

We’ve already covered how the shift from VLIW5 to VLIW4 is beneficial for AMD’s computing efforts: narrower SPUs are easier to fully utilize, FP64 performance improves to 1/4 of FP32 performance, and the space savings give AMD room to lay down additional SIMDs to improve performance. But if Cayman is meant to be a serious effort by AMD to relaunch themselves into the GPU computing market and to grab a piece of NVIDIA’s pie, it takes more than just new shaders to accomplish the task. Accordingly, AMD has been hard at work rounding out the capabilities of their latest GPU to make it a threat to NVIDIA’s Fermi architecture.
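To put the 1/4 FP64 rate in concrete terms, here is a rough back-of-the-envelope sketch. The stream processor count and clock speed below are assumed values chosen to resemble a high-end Cayman part; they are not taken from this page, and the helper `peak_gflops` is purely illustrative.

```python
# Back-of-the-envelope peak throughput for a VLIW4-style design.
# The inputs are illustrative assumptions, not figures from this article.

def peak_gflops(stream_processors: int, clock_mhz: float, fp64_ratio: float = 0.25):
    """Return (fp32, fp64) peak GFLOPS, counting a multiply-add as 2 FLOPs."""
    fp32 = stream_processors * clock_mhz * 2 / 1000.0
    # FP64 runs at a fixed fraction of the FP32 rate (1/4 per the text above).
    return fp32, fp32 * fp64_ratio

fp32, fp64 = peak_gflops(stream_processors=1536, clock_mhz=880)
print(f"FP32: {fp32:.0f} GFLOPS, FP64: {fp64:.0f} GFLOPS")
```

These are theoretical peaks only; real workloads will land lower depending on how well the compiler fills each VLIW4 instruction.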

AMD’s headline compute feature is called asynchronous dispatch, a long name that actually does a pretty good job of describing what it does. To touch back on Fermi for a moment: with Fermi, NVIDIA introduced support for parallel kernels, giving the GPU the ability to execute multiple kernels at once. AMD is following NVIDIA’s approach of executing multiple kernels at once, but is taking it one step further.

The limit of NVIDIA’s design is that while Fermi can execute multiple kernels at once, each one must come from the same CPU thread. Independent threads or applications, for example, cannot issue their own kernels and have them execute in parallel; instead the GPU must context switch between them. With asynchronous dispatch AMD is going to allow independent threads and applications to issue kernels that execute in parallel. On paper at least, this would give AMD’s hardware a significant advantage in this scenario (context switching is expensive), one that would likely eclipse any overall performance advantages NVIDIA had.

Fundamentally, asynchronous dispatch is achieved by having the GPU hide some information about its real state from applications and kernels, in essence leading to virtualization of GPU resources. As far as each kernel is concerned it’s running on its own GPU, with its own command queue and its own virtual address space. This places more work on the GPU and drivers to manage the shared execution, but the payoff is that kernels from different applications can share the GPU without paying the cost of context switching between them.
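The trade-off between context switching and running independent kernels side by side can be illustrated with a toy timing model. This is only a sketch under stated assumptions: the kernel durations, occupancy fractions, slice length, and switch cost are made-up numbers, and the functions `time_sliced` and `concurrent` are hypothetical helpers, not a description of how AMD’s or NVIDIA’s hardware actually schedules work.

```python
# Toy timing model: time-sliced context switching vs. concurrent dispatch.
# Every number here is a made-up illustrative cost, not a measured value.

def time_sliced(durations, switch_cost, slice_len):
    """Round-robin kernels from different applications on the whole GPU,
    paying switch_cost every time the GPU changes owner."""
    remaining = list(durations)
    total = 0.0
    last = None
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r <= 0:
                continue
            if last is not None and last != i:
                total += switch_cost  # context switch between applications
            run = min(slice_len, r)
            total += run
            remaining[i] -= run
            last = i
    return total

def concurrent(kernels):
    """Idealised lower bound when kernels share the GPU at once.
    Each kernel is (duration_alone, fraction_of_gpu_it_can_occupy)."""
    longest = max(d for d, _ in kernels)
    total_work = sum(d * occ for d, occ in kernels)
    return max(longest, total_work)

kernels = [(4.0, 0.5), (6.0, 0.5)]  # two apps, each filling half the GPU
sliced = time_sliced([d for d, _ in kernels], switch_cost=0.5, slice_len=2.0)
shared = concurrent(kernels)
print(sliced, shared)
```

In this model the time-sliced run pays for every switch and leaves half the GPU idle during each slice, while the concurrent bound is limited only by the longest kernel. Real hardware will sit somewhere between the two.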

For the time being the catch for asynchronous dispatch is that it requires API support. As DirectCompute is a fixed standard this just isn’t happening – at least not with DirectCompute 11. Asynchronous dispatch will be exposed under OpenCL in the form of an extension.

Meanwhile the rest of AMD’s improvements focus on memory and cache performance. While the fundamental architecture is not changing, there are several minor changes here to improve compute performance. The Local Data Store attached to each SIMD is now able to bypass the cache hierarchy and Global Data Store by having memory fetches read directly into the LDS. Cayman is also getting a second DMA engine, improving memory reads and writes by allowing Cayman to execute two at once in each direction.

Finally, read ops from shaders are being sped up a bit. Compared to Cypress, Cayman can coalesce them into fewer operations.
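As a rough illustration of what coalescing buys, the sketch below counts how many fetches are needed to serve a group of reads when neighbouring addresses can be merged into one wider access. The segment sizes and the helper `fetches_needed` are hypothetical; AMD has not detailed Cayman’s actual coalescing rules here.

```python
# Toy model of read coalescing: count the fetches needed to serve a set of
# byte addresses. The segment sizes are hypothetical, not Cayman's real ones.

def fetches_needed(addresses, segment_bytes):
    """Number of distinct aligned segments touched by the given addresses."""
    return len({addr // segment_bytes for addr in addresses})

# 16 work-items, each reading one consecutive 4-byte float.
addresses = [4 * i for i in range(16)]

uncoalesced = fetches_needed(addresses, segment_bytes=4)   # one fetch per read
coalesced = fetches_needed(addresses, segment_bytes=64)    # reads merged
print(uncoalesced, coalesced)
```

The same 16 reads collapse from 16 fetches to 1 when they fall inside a single 64-byte segment; scattered addresses would see much less benefit.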

As today’s launch is primarily about the Radeon HD 6900 series, AMD isn’t going into much depth on the compute side of things, so everything here is a fairly high-level overview of the architecture. Once AMD has FireStream cards ready to go with Cayman in them, there will likely be more to talk about.


168 Comments


  • Remon - Wednesday, December 15, 2010 - link

    Seriously, are you using 10.10? It's not like the 10.11 have been out for a while. Oh, wait...

    They've been out for almost a month now. I'm not expecting you to use the 10.12, as these were released just 2 days ago, but you can't have an excuse about not using a month old drivers. Testing overclocked Nvidia cards against newly released cards, and now using older drivers. This site get's more biased with each release.
  • cyrusfox - Wednesday, December 15, 2010 - link

    I could be wrong, but 10.11 didn't work with the 6800 series, so I would imagine 10.11 wasn't meant for the 6900 either. If that is the case, it makes total sense why they used 10.10(cause it was the most updated driver available when they reviewed.)

    I am still using 10.10e, thinking about updating to 10.12, but why bother, things are working great at the moment. I'll probably wait for 11. or 11.2.
  • Remon - Wednesday, December 15, 2010 - link

    Nevermind, that's what you get when you read reviews early in the morning. The 10.10e was for the older AMD cards. Still, I can't understand the difference between this review and HardOCP's.
  • flyck - Wednesday, December 15, 2010 - link

    it doesn't. Anand has the same result for 25.. resolutions with max details AA and FSAA.

    Presentation on anand however is more focussed on 16x..10.. resolutions. (last graph) if you look in the first graph you'll notice the 6970/6950 performs like HardOcp. e.g. the higher the quality the smaller the gap becomes between 6950 and 570 and 6970 and 580. the lower the more 580 is running away and 6970/6950 are trailing the 570.
  • Gonemad - Wednesday, December 15, 2010 - link

    Oookay, new card from the red competitor. Welcome aboard.

    But, all of this time, I had to ask: why is Crysis is so punitive on the graphics cards? I mean, it was released eons ago, and still can't be run with everything cranked up in a single card, if you want 60fps...

    Is it sloppy coding? Does the game *really* looks better with all the eye candy? Or they built a "FPS bug" on purpose, some method of coding that was sure to torture any hardware that would be built in the next 18 months after release?

    I will get slammed for this, but for instance, the water effects on Half Life 2 look great even on lower spec cards, once you turn all the eye-candy on, and the FPS doesn't drop that much. The same for some subtle HDR effects.

    I guess I should see this game by myself and shut up about things I don't know. Yes, I enjoy some smooth gaming, but I wouldn't like to wait 2 years after release to run a game smoothly with everything cranked up.

    Another one is Dirt 2, I played it with all the eye candy to the top, my 5870 dropped to 50-ish FPS (as per benchmarks),it could be noticed eventually. I turned one or two things off, checked if they were not missing after another run, and the in game FPS meter jumped to 70. Yay.
  • BrightCandle - Wednesday, December 15, 2010 - link

    Crysis really does have some fabulous graphics. The amount of foliage in the forests is very high. Crysis kills cards because it really does push current hardware.

    I've got Dirt 2 and its not close in the level of detail. Its a decent looking game at times but its not a scratch on Crysis for the amount of stuff on screen. Half life 2 is also not bad looking but it still doesn't have the same amount of detail. The water might look good but its not as good as a PC game can look.

    You should buy Crysis, its £9.99 on steam. Its not a good game IMO but it sure is pretty.
  • fausto412 - Wednesday, December 15, 2010 - link

    yes...it's not much of a fun game but damn it is pretty
  • AnnihilatorX - Wednesday, December 15, 2010 - link

    Well original Crysis did push things too far and optimization could be used. Crysis Warhead is much better optimized while giving pretty identical visuals.
  • fausto412 - Wednesday, December 15, 2010 - link

    "I guess I should see this game by myself and shut up about things I don't know. Yes, I enjoy some smooth gaming, but I wouldn't like to wait 2 years after release to run a game smoothly with everything cranked up."

    that's probably a good idea. Crysis was made with future hardware in mind. It's like a freaking tech demo. Ahead of it's time and beaaaaaautiful. check it out on max settings,...then come back tell us what you think.
  • TimoKyyro - Wednesday, December 15, 2010 - link

    Thank you for the SmallLuxGPU test. That really made me decide to get this card. I make 3D animations with Blender in Ubuntu so the only thing holding me back is the driver support. Do these cards work in Ubuntu? Is it possible for you to test if the Linux drivers work at the time?
