Another New Anti-Aliasing Mode: Enhanced Quality AA

With the 6800 series AMD introduced Morphological Anti-Aliasing (MLAA), a low-complexity post-processing anti-aliasing filter. As a post-processing filter it worked with a wide variety of games and APIs, and in most cases the performance overhead was not very severe. However it’s not the only new anti-aliasing mode that AMD has been working on.

New with the 6900 series is a mode AMD is calling Enhanced Quality Anti-Aliasing. If you recall NVIDIA’s Coverage Sample Anti-Aliasing (CSAA) introduced with the GeForce 8800GTX, then all of this should sound quite familiar – in fact it’s basically the same thing.

Under traditional MSAA, for a pixel covered by 2 or more triangles/fragments, 2, 4, or 8 subpixel samples are taken to determine what the final pixel should be. In the process the color of the triangle and the Z/depth of the triangle are both sampled and stored, and at the end of the process the results are blended together to determine the final pixel value. This process works well for resolving aliasing along polygon edges at a fraction of the cost of true super sampling, but it’s still expensive. Collecting and storing the Z and color values requires extra memory to store the values and extra memory bandwidth to work with the values. Ultimately while we need enough samples to determine colors of the involved triangles, we do not always need a great deal of them. With a few color/Z samples we have all of the color data we need in most cases, however the “hard” part of anti-aliasing becomes what the proper blending of color values should be.


1 Pixel Covred by 2 Triangles/Fragments

Thus we have EQAA, a compromise on the idea. Color/Z samples are expensive, but just checking if a triangle covers part of a subpixel is very cheap. If we have enough color/Z samples to get the necessary color information, then just doing additional simple subpixel coverage checks would allow us better determine what percentage of a pixel is covered by a given polygon, which we can then use to blend colors in a more accurate fashion. For example with 4x MSAA we can only determine if a pixel is 0/25/50/75/100 percent covered by a triangle, but with 4x EQAA where we take 4 color samples and then 4 additional coverage-only samples, we can determine blending values down to 0/12/25/37/50/62/75/87/100 percent coverage, the same amount of accuracy as using 8x MSAA. Thus in the right situation we can have quality similar to 8x MSAA for only a little over 4x MSAA’s cost.


MSAA & EQAA Sample Patterns

In reality of course this doesn’t always work out as well. The best case scenario is that the additional coverage samples are almost as good as having additional color/Z samples, while the worst case scenario is that additional coverage samples are practically worthless. This depends on a game-by-game, if not pixel-by-pixel basis. In practice additional coverage samples are a way to slightly improve MSAA quality for a very, very low cost.

While NVIDIA has had the ability to take separate coverage samples since G80, AMD has not had this ability until now. With the 6900 hardware their ROPs finally gain this ability.

Beyond that, AMD and NVIDIA’s implementations are nearly identical except for the naming convention. Both can take a number of coverage samples independent of the color/Z samples based on the setting used; the only notable difference we’re aware of is that like AMD’s other AA modes, their EQAA mode can be programmed to use a custom sample pattern.

As is the case with NVIDIA’s CSAA, AMD’s EQAA mode is available to DirectX applications or can be forced through the drivers. DirectX applications can set it through the Multisample Quality attribute, which is usually abstracted to list the vendor’s name for the mode in a game’s UI. Otherwise it can be forced via the Catalyst Control Center, either by forcing an AA mode, or as is the case with NVIDIA, enhancing the AA mode by letting the game set the AA mode while the driver overrides the game and specifies different Multisample Quality attribute. Thus the “enhance application settings” AA mode is new to AMD with the 6900 series.

To be honest we’re a bit ruffled by the naming choice. True, NVIDIA did go and have to pick daft names for their CSAA modes (when is 8x not 8 sample MSAA?), but ultimately CSAA and EQAA are virtually identical. NVIDIA has a 4 year lead on AMD here, and we’d just as well use NVIDIA’s naming conventions for consistency. Instead we have the following.

Coverage Sampling Modes: CSAA vs EQAA
NVIDIA Mode
(Color + Coverage)
AMD
2x 2+0 2x
N/A 2+2 2xEQ
4x 4+0 4x
8x 4+4 4xEQ
16x 4+12 N/A
8xQ 8+0 8x
16xQ 8+8 8xEQ
32x 8+24 N/A

AMD ends up having 1 mode NVIDIA doesn’t, 2xEQ, which is 2x MSAA + 2x cover samples; meanwhile NVIDIA has 16x (4x MSAA + 12 cover samples) and 32x (8x MSAA + 24 cover samples). Finally, as we’ll see, just as is the case for NVIDIA additional coverage samples are equally cheap for AMD.

Tweaking PowerTune Meet the 6970 & 6950
Comments Locked

168 Comments

View All Comments

  • AnnonymousCoward - Wednesday, December 15, 2010 - link

    First of all, 30fps is choppy as hell in a non-RTS game. ~40fps is a bare minimum, and >60fps all the time is hugely preferred since then you can also use vsync to eliminate tearing.

    Now back to my point. Your counter was "you know that non-AA will be higher than AA, so why measure it?" Is that a point? Different cards will scale differently, and seeing 2560+AA doesn't tell us the performance landscape at real-world usage which is 2560 no-AA.
  • Dug - Wednesday, December 15, 2010 - link

    Is it me, or are the graphs confusing.
    Some leave out cards on certain resolutions, but add some in others.

    It would be nice to have a dynamic graph link so we can make our own comparisons.
    Or a drop down to limit just ati, single card, etc.

    Either that or make a graph that has the cards tested at all the resolutions so there is the same number of cards in each graph.
  • benjwp - Wednesday, December 15, 2010 - link

    Hi,

    You keep using Wolfenstein as an OpenGL benchmark. But it is not. The single player portion uses Direct3D9. You can check this by checking which DLLs it loads or which functions it imports or many other ways (for example most of the idTech4 renderer debug commands no longer work).

    The multiplayer component does use OpenGL though.

    Your best bet for an OpenGL gaming benchmark is probably Enemy Territory Quake Wars.
  • Ryan Smith - Wednesday, December 15, 2010 - link

    We use WolfMP, not WolfSP (you can't record or playback timedemos in SP).
  • 7Enigma - Wednesday, December 15, 2010 - link

    Hi Ryan,

    What benchmark do you use for the noise testing? Is it Crysis or Furmark? Along the same line of questioning I do not think you can use Furmark in the way you have the graph setup because it looks like you have left Powertune on (which will throttle the power consumption) while using numbers from NVIDIA's cards where you have faked the drivers into not throttling. I understand one is a program cheat and another a TDP limitation, but it seems a bit wrong to not compare them in the unmodified position (or VERBALLY mention this had no bearing on the test and they should not be compared).

    Overall nice review, but the new cards are pretty underwhelming IMO.
  • Ryan Smith - Thursday, December 16, 2010 - link

    Hi 7Enigma;

    For noise testing it's FurMark. As is the case with the rest of our power/temp/noise benchmarks, we want to establish the worst case scenario for these products and compare them along those lines. So the noise results you see are derived from the same tests we do for temperatures and power draw.

    And yes, we did leave PowerTune at its default settings. How we test power/temp/noise is one of the things PowerTune made us reevaluate. Our decision is that we'll continue to use whatever method generates the worst case scenario for that card at default settings. For NVIDIA's GTX 500 series, this means disabling OCP because NVIDIA only clamps FurMark/OCCT, and to a level below most games at that. Other games like Program X that we used in the initial GTX 580 article clearly establish that power/temp/noise can and do get much worse than what Crysis or clamped FurMark will show you.

    As for the AMD cards the situation is much more straightforward: PowerTune clamps everything blindly. We still use FurMark because it generates the highest load we can find (even with it being reduced by over 200MHz), however because PowerTune clamps everything, our FurMark results are the worst case scenario for that card. Absolutely nothing will generate a significantly higher load - PowerTune won't allow it. So we consider it accurate for the purposes of establishing the worst case scenario for noise.

    In the long run this means that results will come down as newer cards implement this kind of technology, but then that's the advantage of such technology: there's no way to make the card louder without playing wit the card's settings. For the next iteration of the benchmark suite we will likely implement a game-based noise test, even though technologies like PowerTune are reducing the dynamic range.

    In conclusion: we use FurMark, we will disable any TDP limiting technology that discriminates based on the program type or is based on a known program list, and we will allow any TDP limiting technology that blindly establishes a firm TDP cap for all programs and games.

    -Thanks
    Ryan Smith
  • 7Enigma - Friday, December 17, 2010 - link

    Thanks for the response Ryan! I expected it to be lost in the slew of other posts. I highly recommend (as you mentioned in your second to last paragraph) that a game-based benchmark is used along with the Furmark for power/noise. Until both adopt the same TDP limitation it's going to put the NVIDIA cards in a bad light when comparisons are made. This could be seen as a legitimate beef for the fanboys/trolls, and we all know the less ammunition they have the better. :)

    Also to prevent future confusion it would be nice to have what program you are using for the power draw/noise/heat IN the graph title itself. Just something as simple as "GPU Temperature (Furmark-Load)" would make it instantly understandable.

    Thanks again for the very detailed review (on 1 week nonetheless!)
  • Hrel - Wednesday, December 15, 2010 - link

    I really hope these architexture changes lead to better minimum FPS results. AMD is ALWAYS behind Nvidia on minimum FPS and in many ways that's the most important measurment since min FPS determines if the game is playable or not. I dont' care if it maxes out 122 FPS if when the shit hits the fan I get 15 FPS, I won't be able to accurately hit anything.
  • Soldier1969 - Wednesday, December 15, 2010 - link

    I'm dissapointed in the 6970, its not what I was expecting over my 5870. I will wait to see what the 6990 brings to the table next month. I'm looking for a 30-40% boost from my 5870 at 2560 x 1600 res I game at.
  • stangflyer - Wednesday, December 15, 2010 - link

    Now that we see the power requirements for the 6970 and that it needs more power than the 5870 how would they make a 6990 without really cutting off the performance like the 5970?

    I had a 5970 for a year b4 selling it 3 weeks ago in preparation of getting 570 in sli or 6990.
    It would obviously have to be 2x8 pin power! Or they would have to really use that powertune feature.

    I liked my 5970 as I didn't have the stuttering issues (or i don't notice them) And actually have no issues with eyefinity as i have matching dell monitors with native dp inputs.

    If I was only on one screen I would not even be thinking upgrade but the vram runs out when using aa or keeping settings high as I play at 5040x1050. That is the only reason I am a little shy of getting the 570 in sli.

    Don't see how they can make a 6990 without really killing the performance of it.

    I used my 5970 at 5870 and beyond speeds on games all the time though.

Log in

Don't have an account? Sign up now