AMD's Radeon HD 6970 & Radeon HD 6950: Paving The Future For AMD

by Ryan Smith on December 15, 2010 12:01 AM EST

Advancing Primitives: Dual Graphics Engines & New ROPs

AMD has clearly taken NVIDIA’s comments on geometry performance to heart. Along with issuing their manifesto with the 6800 series, they’ve also been working on their own improvements for their geometry performance. As a result AMD’s fixed function Graphics Engine block is seeing some major improvements for Cayman.

Prior to Cypress, AMD had 1 graphics engine, which contained 1 each of the fundamental blocks: the rasterizers/hierarchical-Z units, the geometry/vertex assemblers, and the tessellator. With Cypress AMD added a 2nd rasterizer and 2nd hierarchical-Z unit, allowing them to set up 32 pixels per clock as opposed to 16 pixels per clock. However while AMD doubled part of the graphics engine, they did not double the entirety of it, meaning their primitive throughput rate was still 1 primitive/clock, a typical throughput rate even at the time.

Cypress's Graphics Engine

In 2010 with the launch of Fermi, NVIDIA raised the bar on primitive performance. With rasterization moved to NVIDIA’s GPCs, NVIDIA could theoretically push out as many primitives/clock as they had GPCs. In the case of GF100/GF110 this meant 4 primitives/clock, a simply massive improvement in geometry performance for a single generation.

With Cayman AMD is catching up with NVIDIA by increasing their own primitive throughput rate, though not by as much as NVIDIA did with Fermi. For Cayman the rest of the graphics engine is being fully duplicated: Cayman will have 2 separate graphics engines, each containing one of each fundamental block, and each capable of pushing out 1 primitive/clock. Between the two of them AMD’s maximum primitive throughput rate will now be 2 primitives/clock; half as much as NVIDIA but twice that of Cypress.

Cayman's Dual Graphics Engines
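
To put the per-clock figures side by side, here is a minimal Python sketch of the arithmetic. The primitives/clock values are the ones quoted above; the function names and the 800MHz clock in the usage note are our own illustrative assumptions, not AMD or NVIDIA specifications.

```python
# Peak primitive rates per clock, as quoted in the text above.
PRIMS_PER_CLOCK = {
    "Cypress": 1,      # one graphics engine
    "Cayman": 2,       # two graphics engines, 1 primitive/clock each
    "GF100/GF110": 4,  # 1 primitive/clock per GPC, 4 GPCs
}


def peak_primitive_rate(prims_per_clock: int, core_clock_mhz: float) -> float:
    """Theoretical peak in millions of primitives per second.

    core_clock_mhz is supplied by the caller; no real clock speeds are
    assumed here.
    """
    return prims_per_clock * core_clock_mhz


def relative_rate(gpu: str, baseline: str = "Cypress") -> float:
    """Clock-for-clock primitive throughput of `gpu` relative to `baseline`."""
    return PRIMS_PER_CLOCK[gpu] / PRIMS_PER_CLOCK[baseline]
```

At a hypothetical 800MHz, `peak_primitive_rate(2, 800.0)` works out to 1600 million primitives per second. These are theoretical ceilings only; as the tessellation results below show, real workloads rarely scale this cleanly.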

As was the case for NVIDIA, splitting up rasterization and tessellation is not a straightforward and easy task. For AMD this meant teaching the graphics engine how to do tile-based load balancing so that the workload being spread among the graphics engines is kept as balanced as possible. Furthermore AMD believes they have an edge on NVIDIA when it comes to design - AMD can scale the number of graphics engines at will, whereas NVIDIA has to work within the logical confines of their GPC/SM/SP ratios. This tidbit would seem to be particularly important for future products, when AMD looks to scale beyond 2 graphics engines.
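
AMD has not detailed how its tile-based load balancing actually works, so the following Python sketch is purely illustrative. It assumes a simple checkerboard assignment of screen tiles to engines; the tile size, the function names and the assignment rule are all our own assumptions, not AMD's design.

```python
from collections import Counter

TILE_SIZE = 32   # hypothetical tile edge in pixels
NUM_ENGINES = 2  # Cayman has two graphics engines


def engine_for_tile(tile_x: int, tile_y: int, num_engines: int = NUM_ENGINES) -> int:
    """Checkerboard-style rule: adjacent tiles go to different engines."""
    return (tile_x + tile_y) % num_engines


def tiles_covered(bbox, tile_size: int = TILE_SIZE):
    """Yield (tile_x, tile_y) for every tile a pixel bounding box overlaps.

    bbox is (x_min, y_min, x_max, y_max), inclusive, in pixels.
    """
    x_min, y_min, x_max, y_max = bbox
    for ty in range(y_min // tile_size, y_max // tile_size + 1):
        for tx in range(x_min // tile_size, x_max // tile_size + 1):
            yield tx, ty


def work_per_engine(bboxes, num_engines: int = NUM_ENGINES) -> Counter:
    """Count the tile-sized work items routed to each engine."""
    load = Counter({engine: 0 for engine in range(num_engines)})
    for bbox in bboxes:
        for tx, ty in tiles_covered(bbox):
            load[engine_for_tile(tx, ty, num_engines)] += 1
    return load
```

A triangle whose bounding box spans a 2x2 block of tiles lands two tiles on each engine, while a triangle that fits inside a single tile keeps only one engine busy. The latter case hints at why balancing small, tessellated geometry is the hard part.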

At the end of the day all of this tinkering with the graphics engines is necessary in order for AMD to further improve their tessellation performance. AMD’s 7th generation tessellator improved their performance at lower tessellation factors where the tessellator was the bottleneck, but at higher tessellation factors the graphics engine itself is the bottleneck as the graphics engine gets swamped with more incoming primitives than it can set up in a single clock. By having two graphics engines and a 2-primitive/clock rasterization rate, AMD is shifting the burden back away from the graphics engine.

Just having two 7th generation-like tessellators goes a long way towards improving AMD’s tessellation performance. However all of that geometry can still lead to a bottleneck at times, which means it needs to be stored somewhere until it can be processed. As AMD has not changed any cache sizes for Cayman, there’s the same amount of cache for potentially twice as much geometry, so in order to keep things flowing that geometry has to go somewhere. That somewhere is the GPU’s RAM, or as AMD likes to put it, their “off-chip buffer.” Compared to cache access RAM is slow and hence this isn’t necessarily a desirable action, but it’s much, much better than stalling the pipeline entirely while the rasterizers clear out the backlog.
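
The spill-rather-than-stall idea can be modeled as a queue with a small fast region backed by a large slow one. This Python sketch is a toy model under our own assumptions (the class name, the capacity, and the FIFO ordering are all hypothetical); it says nothing about how Cayman's hardware actually manages its off-chip buffer.

```python
from collections import deque


class GeometryBuffer:
    """Toy FIFO: a small on-chip region that spills to an off-chip region."""

    def __init__(self, on_chip_capacity: int):
        self.on_chip_capacity = on_chip_capacity
        self.on_chip = deque()
        self.off_chip = deque()
        self.spills = 0  # how many primitives had to go to slow memory

    def push(self, primitive) -> None:
        # To preserve FIFO order, once anything has spilled, newer
        # primitives must queue up behind it in the off-chip region.
        if len(self.on_chip) < self.on_chip_capacity and not self.off_chip:
            self.on_chip.append(primitive)
        else:
            self.off_chip.append(primitive)
            self.spills += 1

    def pop(self):
        # The consumer always reads from the fast region; the freed slot
        # is refilled from the slow region if anything is waiting there.
        primitive = self.on_chip.popleft()
        if self.off_chip:
            self.on_chip.append(self.off_chip.popleft())
        return primitive

    def __len__(self) -> int:
        return len(self.on_chip) + len(self.off_chip)
```

The producer never blocks in this model: `push` always succeeds, and the cost of a full on-chip region shows up as a growing `spills` count instead of a stalled pipeline.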

Red = 6970. Yellow = 5870

Overall, clock for clock tessellation performance is anywhere between 1.5x and 3x that of Cypress. In situations where AMD’s already improved tessellation performance at lower tessellation factors plays a part, AMD approaches 3x performance; while at around a factor of 5 the performance drops to near 1.5x. Elsewhere performance is around 2x that of Cypress, representing the doubling of graphics engines.

Tessellation also plays a factor in AMD’s other major gaming-related improvement: ROP performance. As tessellation produces many mini triangles, these triangles begin to choke the ROPs when performing MSAA. Although tessellation isn’t the only reason, it certainly plays a factor in AMD’s reasoning for improving their ROPs to improve MSAA performance.

The 32 ROPs (the same as Cypress) have been tweaked to speed up processing of certain types of values. In the case of both signed and unsigned normalized INT16s, these operations are now 2x faster. Meanwhile FP32 operations are now 2x to 4x faster depending on the scenario. Finally, similar to shader read ops for compute purposes, ROP write ops for graphics purposes can be coalesced, improving performance by requiring fewer operations.
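
As a rough illustration of why coalescing means fewer operations, the Python sketch below groups writes that fall within the same aligned block into a single operation. AMD has not described the mechanism in this detail; the burst size and the function name are hypothetical, chosen only to show the idea.

```python
def coalesce_writes(addresses, burst_size: int = 4):
    """Group writes that fall in the same aligned burst into one operation.

    Returns a list of (burst_base_address, [addresses]) pairs in the order
    each burst was first touched. burst_size is a hypothetical value.
    """
    bursts = {}
    for addr in addresses:
        base = addr - (addr % burst_size)
        bursts.setdefault(base, []).append(addr)
    return list(bursts.items())
```

With a burst size of 4, the seven writes `[0, 1, 2, 3, 8, 9, 4]` collapse into three operations, one each for the blocks starting at 0, 8 and 4.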

Ryan Smith - Wednesday, December 15, 2010 - link
AMD rarely has Linux drivers ready for the press ahead of a launch. This is one such occasion.
MeanBruce - Wednesday, December 15, 2010 - link
Great job on the review Ryan, hope you will cover the upcoming Nvidia 560 and 550 when they arrive. Peace Brother!
gescom - Wednesday, December 15, 2010 - link
Please Anand make an update with a new 10.12 driver. Great review btw.
knowom - Wednesday, December 15, 2010 - link
Until you keep into consideration

1) Driver support
2) Cuda
3) PhysX

I also prefer the lower idle noise, but higher load noise than the reverse for Ati because when your gaming usually you have your sound turned up a lot it's when you aren't gaming is when noise is more of the issue for seeking a quieter system.

It's a better trade off in my view, but they are both pretty even in terms of noise for idle and load regardless and a far cry from quite compared to other solutions from both vendors if that's what your worried about not to mention non reference cooler designs effect that situation by leaps and bounds..
Acanthus - Wednesday, December 15, 2010 - link
AMD has been updating drivers more aggressively than Nvidia lately. (the last year)
Anecdotally, my GTX285 has had a lot more game issues than my 4890. Specifically in NWN2 and Civ5.

Cuda is irrelevant unless you are doing heavy 1. photoshop, 2. video encoding.

PhysX is still a crappy gimmick at this point and needs to offer real visual improvements without a 40%+ performance hit.
smookyolo - Wednesday, December 15, 2010 - link
PhysX may be a gimmick in games, but it's one of the better ones.

Also, guess what... it's being used all over the 3D animation industry.

And guess where the real money comes from? The industry.
fausto412 - Wednesday, December 15, 2010 - link
physx is a gimmick that has been around for some time and will never take hold. when physx came around it set a new standard but since then developers have adopted havok more commonly since it doesn't require extra hardware.

it's all marketing and not a worthy decision point when buying a new card
jackstar7 - Wednesday, December 15, 2010 - link
Alternately, my triple-monitor setup makes AMD the obvious choice.
beepboy - Wednesday, December 15, 2010 - link
Agreed on triple-monitor setup. You can make the argument that 2x 460s are cheaper and nets better performance but at the end of the day 2x 460s will be louder, use more power, more heat, etc over a single 69xx. I just want my triple monitor setup, damn it.
codedivine - Wednesday, December 15, 2010 - link
Any info on cache sizes and register files?