Advancing Primitives: Dual Graphics Engines & New ROPs

AMD has clearly taken NVIDIA’s comments on geometry performance to heart. Along with issuing their manifesto with the 6800 series, they’ve also been working on their own improvements for their geometry performance. As a result AMD’s fixed function Graphics Engine block is seeing some major improvements for Cayman.

Prior to Cypress, AMD had 1 graphics engine, which contained 1 each of the fundamental blocks: the rasterizers/hierarchical-Z units, the geometry/vertex assemblers, and the tessellator. With Cypress AMD added a 2nd rasterizer and 2nd hierarchical-Z unit, allowing them to set up 32 pixels per clock as opposed to 16 pixels per clock. However while AMD doubled part of the graphics engine, they did not double the entirety of it, meaning their primitive throughput rate was still 1 primitive/clock, a typical throughput rate even at the time.


Cypress's Graphics Engine

In 2010, with the launch of Fermi, NVIDIA raised the bar on primitive performance. With rasterization moved into NVIDIA’s GPCs, NVIDIA could theoretically push out as many primitives/clock as it had GPCs; in the case of GF100/GF110 that meant 4 primitives/clock, a massive improvement in geometry performance for a single generation.

With Cayman AMD is catching up with NVIDIA by increasing their own primitive throughput rate, though not by as much as NVIDIA did with Fermi. For Cayman the rest of the graphics engine is being fully duplicated: Cayman will have 2 separate graphics engines, each containing one of each fundamental block, and each capable of pushing out 1 primitive/clock. Between the two of them AMD’s maximum primitive throughput rate will now be 2 primitives/clock; half as much as NVIDIA but twice that of Cypress.


Cayman's Dual Graphics Engines
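To put the per-clock figures above into absolute terms, here is a rough back-of-the-envelope sketch. The primitives/clock values are the ones quoted in the text; the core clocks (850MHz, 880MHz and 772MHz) are assumed reference clocks for the 5870, 6970 and GTX 580 and are not taken from this page, so treat the results as theoretical peaks only.

```python
# Peak primitive rate = primitives per clock * core clock.
# Per-clock figures come from the text; the clocks are assumed reference values.
GPUS = {
    "Cypress (HD 5870)": {"prims_per_clock": 1, "clock_mhz": 850},
    "Cayman (HD 6970)": {"prims_per_clock": 2, "clock_mhz": 880},
    "GF110 (GTX 580)": {"prims_per_clock": 4, "clock_mhz": 772},
}


def peak_prims_per_second(prims_per_clock: int, clock_mhz: float) -> float:
    """Theoretical peak, in primitives per second."""
    return prims_per_clock * clock_mhz * 1e6


for name, spec in GPUS.items():
    rate = peak_prims_per_second(**spec)
    print(f"{name}: {rate / 1e9:.2f} billion primitives/s")
```

These are ceilings rather than measurements; real throughput depends on how well the workload keeps both engines fed.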

As was the case for NVIDIA, splitting up rasterization and tessellation is not a straightforward and easy task. For AMD this meant teaching the graphics engine how to do tile-based load balancing so that the workload spread among the graphics engines is kept as balanced as possible. Furthermore AMD believes they have an edge on NVIDIA when it comes to design: AMD can scale the number of graphics engines at will, whereas NVIDIA has to work within the logical confines of their GPC/SM/SP ratios. This tidbit would seem to be particularly important for future products, when AMD looks to scale beyond 2 graphics engines.
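AMD has not described how its tile-based load balancing actually works, so the following is only a toy illustration of the general idea: carve the screen into tiles and interleave them between engines so that geometry clustered in one area still lands on both. The tile size, the checkerboard mapping and every function name here are hypothetical.

```python
from collections import Counter

TILE_SIZE = 32  # hypothetical tile edge in pixels; the article gives no figure


def engine_for_tile(tile_x: int, tile_y: int, num_engines: int = 2) -> int:
    """Checkerboard-style interleave: neighbouring tiles go to different engines."""
    return (tile_x + tile_y) % num_engines


def tiles_covered(x0, y0, x1, y1):
    """Yield every tile touched by a primitive's pixel bounding box (inclusive)."""
    for ty in range(y0 // TILE_SIZE, y1 // TILE_SIZE + 1):
        for tx in range(x0 // TILE_SIZE, x1 // TILE_SIZE + 1):
            yield tx, ty


def work_per_engine(bounding_boxes, num_engines=2):
    """Count how many tiles of work each engine receives."""
    load = Counter({e: 0 for e in range(num_engines)})
    for box in bounding_boxes:
        for tx, ty in tiles_covered(*box):
            load[engine_for_tile(tx, ty, num_engines)] += 1
    return dict(load)


# One primitive covering a 256x256 pixel corner of the screen.
print(work_per_engine([(0, 0, 255, 255)]))
```

With two engines the 64 tiles in that example split evenly; the same mapping takes a `num_engines` argument, which loosely mirrors the point about scaling beyond 2 graphics engines.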

At the end of the day all of this tinkering with the graphics engines is necessary in order for AMD to further improve their tessellation performance. AMD’s 7th generation tessellator improved their performance at lower tessellation factors where the tessellator was the bottleneck, but at higher tessellation factors the graphics engine itself is the bottleneck as the graphics engine gets swamped with more incoming primitives than it can set up in a single clock. By having two graphics engines and a 2-primitive/clock rasterization rate, AMD is shifting the burden back away from the graphics engine.

Just having two 7th generation-like tessellators goes a long way towards improving AMD’s tessellation performance. However all of that geometry can still lead to a bottleneck at times, which means it needs to be stored somewhere until it can be processed. As AMD has not changed any cache sizes for Cayman, there’s the same amount of cache for potentially twice as much geometry, so in order to keep things flowing that geometry has to go somewhere. That somewhere is the GPU’s RAM, or as AMD likes to put it, their “off-chip buffer.” Compared to cache access RAM is slow and hence this isn’t necessarily a desirable action, but it’s much, much better than stalling the pipeline entirely while the rasterizers clear out the backlog.
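The spill-instead-of-stall behavior can be pictured with a small queue model. This is a simplified sketch and not a description of Cayman's hardware: the class name, the capacity and the refill policy are all assumptions made for illustration, and the on-chip capacity is assumed to be at least 1.

```python
from collections import deque


class GeometryBuffer:
    """Toy model: a small on-chip FIFO that spills to a larger off-chip FIFO."""

    def __init__(self, on_chip_capacity: int):
        self.on_chip_capacity = on_chip_capacity
        self.on_chip = deque()
        self.off_chip = deque()  # stands in for GPU RAM; unbounded here
        self.spills = 0

    def push(self, primitive) -> None:
        # Once anything has spilled, keep spilling so ordering is preserved.
        if self.off_chip or len(self.on_chip) >= self.on_chip_capacity:
            self.off_chip.append(primitive)
            self.spills += 1
        else:
            self.on_chip.append(primitive)

    def pop(self):
        """Hand the oldest primitive to the rasterizer, then refill from RAM."""
        if not self.on_chip:
            return None
        primitive = self.on_chip.popleft()
        if self.off_chip:
            self.on_chip.append(self.off_chip.popleft())
        return primitive


buf = GeometryBuffer(on_chip_capacity=4)
for i in range(6):
    buf.push(i)  # the producer never blocks, even past capacity
drained = [buf.pop() for _ in range(6)]
print(drained, buf.spills)
```

The producer never waits on a full cache; the cost shows up as the spill count, which in real hardware would translate into slower RAM round trips.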


Red = 6970. Yellow = 5870

Overall, clock for clock tessellation performance is anywhere between 1.5x and 3x that of Cypress. In situations where AMD’s already improved tessellation performance at lower tessellation factors plays a part, AMD approaches 3x performance, while at around a factor of 5 the performance drops to near 1.5x. Elsewhere performance is around 2x that of Cypress, reflecting the doubling of graphics engines.

Tessellation is also a factor in AMD’s other major gaming-related improvement: ROP performance. As tessellation produces many tiny triangles, these triangles begin to choke the ROPs when performing MSAA. Tessellation isn’t the only reason, but it certainly figures into AMD’s decision to rework their ROPs for better MSAA performance.

The 32 ROPs (the same as Cypress) have been tweaked to speed up processing of certain types of values. In the case of both signed and unsigned normalized INT16s, these operations are now 2x faster. Meanwhile FP32 operations are now 2x to 4x faster depending on the scenario. Finally, similar to shader read ops for compute purposes, ROP write ops for graphics purposes can be coalesced, improving performance by requiring fewer operations.
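As a rough illustration of why coalescing helps, the sketch below groups individual pixel writes into aligned bursts and counts how many memory operations remain. The 64-byte burst size and the function name are assumptions chosen for the example; the article does not say how Cayman's ROPs group their writes.

```python
def coalesce_writes(addresses, burst_bytes=64):
    """Group byte addresses into aligned bursts; one burst = one memory operation."""
    bursts = {}
    for addr in addresses:
        base = addr - (addr % burst_bytes)  # start of the aligned burst
        bursts.setdefault(base, []).append(addr)
    return bursts


# Sixteen 4-byte pixel writes to consecutive addresses.
writes = [0x1000 + 4 * i for i in range(16)]
bursts = coalesce_writes(writes)
print(f"{len(writes)} writes -> {len(bursts)} memory operation(s)")
```

Consecutive writes collapse into a single operation here, while writes scattered across memory would coalesce poorly, so the benefit depends heavily on access pattern.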

168 Comments

  • anactoraaron - Wednesday, December 15, 2010 - link

    I would like to thank Ryan for the article that makes me forget the "OC card in the review" debacle. Fantastic in depth review with no real slant to team green or red. Critics go elsewhere please.
  • Hrel - Wednesday, December 15, 2010 - link

    When are you guys gonna put all these cards in bench? Some of them have been out for a relatively long time now and they're still not in bench. Please put them in there.
  • ajlueke - Wednesday, December 15, 2010 - link

    I agree with most of the conclusions I have read here. If you already own a 5800 series card, there isn't really enough here to warrant an upgrade. Some improved features and slightly improved FPS in games doesn't quite give the same upgrade incentive as the 5870 did compared a 4870.
    There are some cool things with the 6900 and 6800 series. Looking at the performance in games, the 6970 and even the 6870 seemed to get much closer to 2X performance when placed in crossfire as compared to 5800 series cards. That is a pretty interesting development. All in all, a good upgrade if you didn't buy a card last generation. If you did, it seems the wait is on for the 28 nm version of the GPU.
  • Belard - Wednesday, December 15, 2010 - link

    NO!

    The 800 cards were the HIGH end models since the 3000 series and worked well through to the 5000 series with the 5970 being the "odd one" since the "X2" made more sense like the 4850X2.

    It also allows for a "x900" series if needed.

    AMD needs to NOT COPY Nvidia's naming games... did they hire someone from Nvidia? Even the GeForce 580/570 still belong to the 400 series since its the same tech. SHould have been named 490 and the 475... But hey, in 12 months, Nvidia will be up to the 700 series. Hey, Google Chrome is version 8.0 and its been on the market for about 2 years! WTF?!

    What was their excuse again? Oh, to not create confusion with the 5700 series? So they frack up the whole model names for a mid-range card? The 6800's should have been 6700s, simple as that. Yes, there will be some people who will accidentally downgrade.

    What the new 6000 series has going for AMD is that they are somewhat cheaper and easily cost less to make than the 5000s and what Nvidia makes.

    In the end, the 6000 series is the first dumb-thing AMD has done since the 2000 series, but nowhere near as bad.
  • MS - Wednesday, December 15, 2010 - link

    In terms of effienct usage of space though AMD is doing quite well; ... should be efficient

    Nice article so far,

    Regards,
    Michael
  • nitrousoxide - Wednesday, December 15, 2010 - link

    The power connector on the left (8-pin of 6970 and 6-pin of 6950) has a corner (bottom left corner) cut down, that's because the cooler doesn't fit with the PCB design, if you install it with force the power connector would get stuck. So the delay of 6900 Series could be due to this issue, AMD needs one month to 'manually polish' all power connectors of the stock-cards in order to go with the cooler. Well, just a joke, but this surely reflects how poorly AMD organizes the whole design and manufacture process :)
  • nitrousoxide - Wednesday, December 15, 2010 - link

    you can find this out here :)
    hiphotos. baidu. com/coreavc/pic/item/70f48d81ffe07cf26d811957. jpg
  • nitrousoxide - Wednesday, December 15, 2010 - link

    AMD promises that every one will get a unique 6970 or 6950, different from any other card on the planet :)
  • GummiRaccoon - Wednesday, December 15, 2010 - link

    The performance of these cards is much better with 10.12, why didn't you test it with that?
  • Ryan Smith - Wednesday, December 15, 2010 - link

    10.12 does not support the 6900 series.

    8.79.6.2RC2, dated December 7th, were the absolute latest drivers for the 6900 series at the time of publication.
