The Return of Supersample AA

Over the years, the methods used to implement anti-aliasing on video cards have bounced back and forth. The earliest generation of cards, such as the 3Dfx Voodoo 4/5 and ATI’s and NVIDIA’s DirectX 7 parts, implemented supersampling, which involved rendering a scene at a higher resolution and scaling it down for display. Supersampling did a great job of removing aliasing while also slightly improving overall image quality, because the whole scene was sampled at a higher resolution.
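To make the idea concrete, here is a minimal Python sketch of supersampling. The scene is sampled on a grid `factor` times finer in each direction, and every `factor × factor` block is averaged (a simple box filter) down to one display pixel. The `render`, `edge_scene`, and `ssaa_downsample` names and the diagonal-edge test scene are hypothetical illustrations, not any vendor's implementation; real hardware may use other sample positions and filters.

```python
def render(width, height, scene):
    """Take one sample at the centre of each pixel; scene maps (u, v) in [0, 1) to a brightness."""
    return [[scene((x + 0.5) / width, (y + 0.5) / height) for x in range(width)]
            for y in range(height)]


def edge_scene(u, v):
    """Hypothetical test scene: a hard diagonal edge, white where v < u and black elsewhere."""
    return 1.0 if v < u else 0.0


def ssaa_downsample(hi_res, factor):
    """Box-filter a supersampled image: average each factor x factor block into one pixel."""
    height, width = len(hi_res), len(hi_res[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be multiples of factor")
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [hi_res[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A 4x4 display image rendered directly (one sample per pixel) versus with 4 x 4 supersampling.
aliased = render(4, 4, edge_scene)                         # only 0.0 or 1.0: a jagged staircase
smoothed = ssaa_downsample(render(16, 16, edge_scene), 4)  # edge pixels get intermediate values
print(smoothed[0])  # [0.375, 1.0, 1.0, 1.0]
```

The pixels the edge passes through come out as partial greys instead of hard steps, which is the smoothing supersampling buys at the cost of shading 16 samples per display pixel here.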

But supersampling was expensive, particularly on those early cards. So the next generation implemented multisampling, which, instead of rendering a scene at a higher resolution, rendered it at the desired resolution while taking several geometry samples per pixel, so that polygon edges could be found and smoothed without shading every sample. The overall quality wasn’t quite as good as supersampling, but it was much faster, and that gap increased as MSAA implementations became more refined.
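The difference in cost and coverage can be sketched per pixel. In this minimal Python model (a simplification, not how any particular GPU is wired), both modes test the same four sample positions for geometry coverage, but MSAA runs the shader once at the pixel centre and copies that colour to every covered sample, while SSAA shades each covered sample separately. The 2x2 sample layout, the `covers` edge, and the striped `shade` function are hypothetical.

```python
# Hypothetical 4-sample layout (a plain 2x2 grid, chosen for simplicity), in pixel units.
OFFSETS = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
BACKGROUND = 0.0


def covers(x, y):
    """Hypothetical primitive: everything to the left of the vertical edge x = 1.6."""
    return x < 1.6


def shade(x, y):
    """Hypothetical pixel shader with detail finer than a pixel: alternating half-pixel stripes."""
    return 1.0 if x % 1.0 < 0.5 else 0.5


def msaa_pixel(px, py):
    """MSAA: per-sample coverage, but one shader invocation at the pixel centre."""
    covered = [covers(px + ox, py + oy) for ox, oy in OFFSETS]
    if not any(covered):
        return BACKGROUND, 0
    colour = shade(px + 0.5, py + 0.5)
    samples = [colour if c else BACKGROUND for c in covered]
    return sum(samples) / len(samples), 1  # (resolved colour, shader invocations)


def ssaa_pixel(px, py):
    """SSAA: per-sample coverage and a shader invocation for every covered sample."""
    samples, invocations = [], 0
    for ox, oy in OFFSETS:
        x, y = px + ox, py + oy
        if covers(x, y):
            samples.append(shade(x, y))
            invocations += 1
        else:
            samples.append(BACKGROUND)
    return sum(samples) / len(samples), invocations


for px in (0, 1):  # pixel 0 is fully inside the primitive, pixel 1 straddles its edge
    print(px, msaa_pixel(px, 0), ssaa_pixel(px, 0))
# 0 (0.5, 1) (0.75, 4)
# 1 (0.25, 1) (0.5, 2)
```

Both modes give the edge pixel a fractional value, but only SSAA recovers the true 0.75 average of the striped interior pixel; MSAA's single shading sample reports 0.5. That is the shader aliasing MSAA cannot remove, bought back by SSAA at up to four times the shader invocations.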

Lately we have seen a slow bounce back in the other direction, as MSAA’s imperfections became more noticeable and in need of correction. Supersampling saw a limited reintroduction, with AMD and NVIDIA using it on certain parts of a frame as part of their Adaptive Anti-Aliasing (AAA) and Supersample Transparency Anti-Aliasing (SSTr) schemes, respectively. In these schemes SSAA was used to smooth out semi-transparent textures, where the textures themselves were the source of the aliasing and MSAA could not help because that aliasing did not lie on a polygon edge. This still didn’t completely close the gap between MSAA and SSAA, but it solved the transparent texture problem. With these technologies, the difference between MSAA and SSAA was reduced to two things: MSAA could not anti-alias shader output, and it lacked SSAA’s advantage of sampling textures at a higher resolution.
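The transparent-texture problem can be sketched the same way. An alpha-tested texture (foliage, fences) sits on a polygon that covers the whole pixel, so MSAA's coverage samples all pass and the alpha test runs only once: the texel either survives or it doesn't. Supersampling the transparency, which is what AAA and SSTr apply to these surfaces, runs the alpha test per sample. The `leaf_alpha` texture, its edge at 0.6, the threshold, and the 2x2 sample layout below are hypothetical.

```python
# Hypothetical 4-sample layout (a plain 2x2 grid), in pixel units.
OFFSETS = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ALPHA_REF = 0.5  # alpha-test threshold: texels below this are discarded


def leaf_alpha(x, y):
    """Hypothetical leaf texture: opaque to the left of x = 0.6 (pixel units), clear beyond it."""
    return 1.0 if x < 0.6 else 0.0


def msaa_alpha_tested(px, py):
    """The quad covers every sample, so under MSAA there is a single per-pixel alpha test."""
    kept = leaf_alpha(px + 0.5, py + 0.5) >= ALPHA_REF
    return 1.0 if kept else 0.0  # fraction of the pixel showing the leaf


def supersampled_alpha_tested(px, py):
    """Transparency supersampling: run the alpha test at every sample position."""
    kept = [leaf_alpha(px + ox, py + oy) >= ALPHA_REF for ox, oy in OFFSETS]
    return sum(kept) / len(kept)


print(msaa_alpha_tested(0, 0), supersampled_alpha_tested(0, 0))  # 1.0 0.5
```

Under MSAA the pixel straddling the leaf's edge is all-or-nothing, which is what produces jagged foliage; testing per sample lets it come out half covered.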

With the 5800 series, things have finally come full circle for AMD. Based upon their SSAA implementation for Adaptive Anti-Aliasing, they have re-implemented SSAA as a full screen anti-aliasing mode. Now gamers can once again access the higher quality anti-aliasing offered by a pure SSAA mode, instead of being limited to the best of what MSAA + AAA could do.

Ultimately the inclusion of this feature on the 5870 comes down to two matters: the card has lots and lots of processing power to throw around, and shader aliasing was the last obstacle that MSAA + AAA could not solve. With the reintroduction of SSAA, AMD is not dropping or downplaying their existing MSAA modes; rather it’s offered as another option, particularly one geared towards use on older games.

“Older games” is an important keyword here, as there is a catch to AMD’s SSAA implementation: it only works under OpenGL and DirectX 9. As we found out in our testing, after much head-scratching, it does not work in DX10 or DX11 games; attempting to use it there results in the game switching to MSAA.

When we asked AMD about this, they cited the fact that DX10 and later give developers much greater control over anti-aliasing patterns, and that using SSAA alongside these controls may create incompatibility problems. Furthermore, the games that can best run with SSAA enabled from a performance standpoint are older titles, making SSAA a more reasonable choice for older games than for newer ones. We’re told that AMD will “continue to investigate” implementing a proper version of SSAA for DX10+, but it’s not something we’re expecting any time soon.

Unfortunately, in our testing of AMD’s SSAA mode, there are clearly a few kinks to work out. Our first AA image quality test was going to be the railroad bridge at the beginning of Half-Life 2: Episode 2, a scene full of aliased metal bars, cars, and trees. However, as the screenshots below show, while AMD’s SSAA mode eliminated the aliasing, it also gave the entire image a smooth makeover: too smooth. SSAA isn’t supposed to blur things; it’s only supposed to make things smoother by removing all aliasing in geometry, shaders, and textures alike.


[Screenshot comparison: 8x MSAA vs. 8x SSAA, 5870, Half-Life 2: Episode 2]

As it turns out, this is a freshly discovered bug in AMD’s SSAA implementation that affects newer Source-engine games. Presumably we’d see something similar in the rest of The Orange Box, and possibly in other HL2 games. This is an unfortunate engine to have a bug in, since Source-engine games tend to be heavily CPU limited anyhow, which leaves the GPU with spare capacity and makes them perfect candidates for SSAA. AMD is hoping to have a fix out for this bug soon.
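Why a CPU-limited engine suits SSAA can be shown with a toy bottleneck model. It assumes CPU and GPU work overlap perfectly, so frame time is set by whichever side is slower, and that SSAA's GPU cost scales linearly with the sample count. Both are simplifications, and the millisecond figures below are hypothetical, not measurements.

```python
def frame_rate(cpu_ms, gpu_ms_no_aa, ssaa_samples=1):
    """Toy model: fps when CPU and GPU overlap perfectly and SSAA cost scales with samples."""
    gpu_ms = gpu_ms_no_aa * ssaa_samples  # assumption: N samples cost N times the shading work
    return 1000.0 / max(cpu_ms, gpu_ms)


# Hypothetical CPU-bound game: 10 ms of CPU work, 2 ms of GPU work per frame without AA.
for samples in (1, 4, 8):
    print(samples, frame_rate(10.0, 2.0, samples))
# 1 100.0
# 4 100.0
# 8 62.5

# Hypothetical GPU-bound game for contrast: 4x SSAA cuts the frame rate from 100 to 25 fps.
print(frame_rate(5.0, 10.0), frame_rate(5.0, 10.0, 4))  # 100.0 25.0
```

In the CPU-bound case the extra samples are absorbed by idle GPU time until the GPU becomes the slower side, whereas a GPU-bound game pays for every sample immediately.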

“But wait!” you say. “Doesn’t NVIDIA have SSAA modes too? How would those do?” And indeed you would be right. While NVIDIA dropped official support for SSAA a number of years ago, it has remained an unofficial feature that can be enabled in Direct3D games using tools such as nHancer to set the AA mode.

Unfortunately NVIDIA’s SSAA mode isn’t even in the running here, and we’ll show you why.


[DX9 FSAA Viewer: 5870 4x SSAA sample pattern]


[DX9 FSAA Viewer: GTX 280 4x MSAA sample pattern]


[DX9 FSAA Viewer: GTX 280 4x SSAA sample pattern]

At the top we have the view from DX9 FSAA Viewer of ATI’s 4x SSAA mode. Notice that it’s a rotated grid with 4 geometry samples (red) and 4 texture samples. Below that we have NVIDIA’s 4x MSAA mode, a rotated grid with 4 geometry samples and a single texture sample. Finally we have NVIDIA’s 4x SSAA mode, an ordered grid with 4 geometry samples and 4 texture samples. For reasons that we won’t delve into, rotated grids are a better layout from a quality standpoint than ordered grids. This is why early implementations of AA using ordered grids were dropped in favor of rotated grids, and why no one uses ordered grids for MSAA these days.
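One commonly cited reason rotated grids win can be checked in a few lines of Python. A near-vertical edge sweeps across a pixel horizontally, so the number of distinct shades it can produce depends on how many distinct horizontal sample positions the pattern has (and likewise vertical positions for near-horizontal edges). A 4x ordered grid has only two distinct positions per axis; a 4x rotated grid has four. The rotated offsets below are a typical 4x pattern assumed for illustration, not values read from the FSAA Viewer captures, and the sweep uses an axis-aligned edge as a stand-in for a near-vertical or near-horizontal one.

```python
# Sample positions within a pixel, in pixel units (0..1).
ORDERED_4X = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
# Assumed rotated-grid pattern: four samples, each with its own x and its own y.
ROTATED_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def coverage_levels(pattern, axis=0, steps=1000):
    """Sweep an axis-aligned edge across the pixel (axis 0 = x, standing in for a near-vertical
    edge; axis 1 = y, for a near-horizontal one) and collect every coverage fraction it yields."""
    levels = set()
    for i in range(steps + 1):
        edge = i / steps
        covered = sum(1 for sample in pattern if sample[axis] < edge)
        levels.add(covered / len(pattern))
    return sorted(levels)


print(coverage_levels(ORDERED_4X))  # [0.0, 0.5, 1.0]
print(coverage_levels(ROTATED_4X))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With the same four samples, the ordered grid can only render three shades along such an edge, while the rotated grid renders five, so its edge gradients are visibly finer.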

Furthermore, when actually using NVIDIA's SSAA mode, we ran into some definite quality issues with HL2: Ep2. We're not sure if these are related to the use of an ordered grid or not, but it's a possibility we can't ignore.


[Screenshot comparison: NVIDIA 4x MSAA vs. 4x SSAA, HL2: Ep2]

If you compare the two shots, with MSAA 4x the scene is almost perfectly anti-aliased, except for some trouble along the bottom/side edge of the railcar. If we switch to SSAA 4x that aliasing is solved, but we have a new problem: all of a sudden a number of fine tree branches have gone missing. While MSAA properly anti-aliased them, SSAA anti-aliased them right out of existence.

For this reason we will not be taking a look at NVIDIA’s SSAA modes. Besides the fact that they’re unofficial in the first place, the use of an ordered grid and the problems in HL2 cement the fact that they’re not suitable for general use.


327 Comments


  • Zool - Sunday, September 27, 2009

    The speed of the on-chip cache just shows that the external memory bandwidth in current GPUs is only there to get data to the GPU or receive the final data from it. The raw processing happens on chip in those 10-times-faster SRAM caches, or else the raw teraflops would vanish.
  • JarredWalton - Sunday, September 27, 2009

    If SD had any reading comprehension or understanding of tech, he would realize that what I am saying is:

    1) Memory bandwidth didn't double - it went up by just 23%
    2) Look at the results and performance increased by far more than 23%
    3) Ergo, the 4890 is not bandwidth limited in most cases, and there was no need to double the bandwidth.

    Would more bandwidth help performance? Almost certainly, as the 5870 is such a high performance part that unlike the 4890 it could use more. Similarly, the 4870X2 has 50% more bandwidth than the 5870, but it's never 50% faster in our tests, so again it's obviously not bandwidth limited.

    Was it that hard to understand? Nope, unless you are trying to pretend I put an ATI bias on everything I say. You're trying to start arguments again where there were none.
  • SiliconDoc - Sunday, September 27, 2009

    The 4800 data rate RAM is faster vs. the former 3600 - hence bus width is running FASTER - so your simple conclusions are wrong.
    When we overclock the 5870's RAM, we get a framerate increase - it increases the bandwidth, and up go the numbers.
    ---
    Not like there isn't an argument, because you don't understand tech.
  • JarredWalton - Sunday, September 27, 2009

    The bus is indeed faster -- 4800 effective vs. 3900 on the 4890 or 3600 on the 4870. What's "wrong about my simple conclusions"? You're not wrong, but you're not 100% right if you suggest bandwidth is the only bottleneck.

    Naturally, as most games are at least partially bandwidth limited, if you overclock 10% you increase performance. The question is, does it increase linearly by 10%? Rarely, just as if you overclock the core 10% you usually don't get 10% boost. If you do get a 1-for-1 increase with overclocking, it indicates you are solely bottlenecked by that aspect of performance.

    So my conclusions still stand: the 5870 is more bandwidth limited than 4890, but it is not completely bandwidth limited. Improving the caches will also help the GPU deal with less bandwidth, just as it does on CPUs. As fast as Bloomfield may be with triple-channel DDR3-1066 (25.6GB/s), the CPU can process far more data than RAM could hope to provide. Would a wider/faster bus help the 5870? Yup. Would it be a win-win scenario in terms of cost vs. performance? Apparently ATI didn't think so, and given how quickly sales numbers taper off above $300 for GPUs, I'm inclined to agree.

    I'd also wager we're a lot more CPU limited on 5870 than many other GPUs, particularly with CrossFire setups. I wouldn't even look at 5870 CrossFire unless you're running a high-end/overclocked Core i7 or Phenom II (i.e. over ~3.4GHz).

    And FWIW: Does any of this mean NVIDIA can't go a different route? Nope. GT300 can use 512-bit interfaces with GDDR5, and they can be faster than 5870. They'll probably cost more if that's the case, but then it's still up to the consumers to decide how much they're willing to spend.
  • silverblue - Saturday, September 26, 2009

    I suppose if we end up seeing a 512-bit card then it'll make for a very interesting comparison with the 5870. With equal clocks during testing, we'd have a far better idea, though I'd expect to see far more RAM on a 512-bit card which may serve to skew the figures and muddy the waters, so to speak.
  • Voo - Friday, September 25, 2009

    Hey Jarred, I know that's neither the right place nor the right person to ask, but do we get some kind of "Ignore this person" button with the site revamp Anand talked about some months ago?

    I think I'd prefer this feature over almost everything - even an edit button ;)
  • JarredWalton - Friday, September 25, 2009

    I'll ask and find out. I know that the comments are supposed to receive a nice overhaul, but more than that...? Of course, if you ignore his posts on this (and the responses), you'd only have about five comments! ;-)
  • Voo - Saturday, September 26, 2009

    Great!

    Yep it'd be rather short, but I'd rather have 10 interesting comments than 1000 COMMENTS WRITTEN IN CAPS!!11 with dubious content ;)
  • SiliconDoc - Wednesday, September 30, 2009

    I put it in caps so you could easily avoid them; I was thinking of you and your "problems".
    I guess since you "knew this wasn't the right time or place" but went ahead anyway, you've got "lots of problems".
    Let me know when you have posted an "interesting comment" with no "dubious nature" to it.
    I suspect I'll be waiting years.
  • MODEL3 - Friday, September 25, 2009

    Hi Ryan,

    Nice new info in your review.

    The day you posted your review, I wrote in the forums that, in my view, there are reasons other than bandwidth limitations and driver maturity why the 850MHz 5870 hasn't doubled its performance relative to an 850MHz 4890.

    Usually when a GPU has 2X the specs of another GPU, the performance gain is 2X (of course I am not talking about games with engines that are CPU limited, or engines that seem to scale badly or are poorly coded, for example).
    There are many examples in the past where we had a 2X performance gain with 2X the specs (not in all games, but in many).

    From the tests that I saw in your review and from my understanding of the AMD slides, I think there are 2 more reasons why the 5870 performs like that.

    The day of your review I wrote in the forums the additional reasons I think the 5870 performs like that, but nobody replied to me.

    I wrote that the 5870 probably has:

    1. Geometry/vertex performance issues (in the sense that it cannot generate 2X the geometry of the 4890) (my main assumption)

    or/and

    2. Geometry/vertex shading performance issues (in the sense that the geometry shader [GS] cannot shade vertices at 2X the speed of the 4890) (another possible assumption)

    I guess there are synthetic benchmarks that have tests like that (pure geometry speed and pure geometry/vertex shader speed, in addition to the classic pixel shader speed tests), so someone can see if my assumption is true.

    If you have the time and you think that this is possible and you feel like it is worth your time, can you check my hypothesis please?

    Thanks very much,

    MODel3
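The bandwidth arithmetic traded in the comments above can be reproduced from bus width and effective data rate. This sketch assumes the 256-bit memory bus used by the HD 4870, HD 4890, and HD 5870 (a figure not stated in the thread), treats the 4870 X2 as two 4870s' worth of bandwidth, and models triple-channel DDR3 as a 192-bit bus. The `overclock_speedup` helper is a hypothetical Amdahl's-law-style model of the "does a 10% overclock give 10% more performance?" question, not a measured result.

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_mts):
    """Peak bandwidth in GB/s: bytes moved per transfer times millions of transfers per second."""
    return bus_width_bits / 8 * data_rate_mts / 1000


def overclock_speedup(bandwidth_bound_fraction, memory_clock_gain):
    """Amdahl-style model: only the bandwidth-bound share of frame time shrinks with memory clock."""
    f = bandwidth_bound_fraction
    return 1 / ((1 - f) + f / memory_clock_gain)


hd5870 = peak_bandwidth_gbs(256, 4800)          # 153.6 GB/s
hd4890 = peak_bandwidth_gbs(256, 3900)          # 124.8 GB/s
hd4870 = peak_bandwidth_gbs(256, 3600)          # 115.2 GB/s
hd4870x2 = 2 * hd4870                           # 230.4 GB/s across both GPUs
bloomfield = peak_bandwidth_gbs(3 * 64, 1066)   # ~25.6 GB/s, triple-channel DDR3-1066

print(round(hd5870 / hd4890 - 1, 3))            # 0.231 -> the "just 23%" increase
print(round(hd4870x2 / hd5870, 3))              # 1.5   -> 50% more than the 5870
print(round(overclock_speedup(1.0, 1.10), 3))   # 1.1   -> purely bandwidth bound: linear gain
print(round(overclock_speedup(0.5, 1.10), 3))   # 1.048 -> half bandwidth bound: well under 10%
```

Only a workload that is entirely bandwidth bound turns a 10% memory overclock into a 10% speedup; any share of frame time limited elsewhere dilutes the gain, which is the distinction the thread is arguing over.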
