NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10

Author: Anand Lal Shimpi & Derek Wilson

by Anand Lal Shimpi & Derek Wilson on November 8, 2006 6:01 PM EST

Posted in
GPUs

What is CSAA?

Taking another step forward in antialiasing quality and performance, NVIDIA is introducing Coverage Sample Antialiasing (CSAA) with G80. CSAA is an evolutionary step in AA technology, designed to improve how accurately the hardware can determine how much of a pixel's area is covered by any given surface. CSAA can be thought of as an extension of MSAA. NVIDIA is calling all of its AA modes CSAA, even though the common modes (2x, 4x, and now 8x, which NVIDIA calls 8xQ) are performed exactly the way MSAA would be.

To enable modes that more accurately represent each polygon's coverage of a pixel, NVIDIA has introduced an "Enhance the application" option in their driver. This option will allow you to enable a desired MSAA mode in a game (either 4x or 8x) and then "enhance" it by enabling 8x, 16x, or 16xQ CSAA. This will make the 4xAA requested in the game look like 8xAA or 16xAA. Enhancing 8x to 16xQ gives the effect of 16xMSAA without the huge performance impact that would be associated with such a setting.

To understand how it comes together, let's take a quick look at fragments and the evolution of AA.

We usually refer to fragments as pixels for simplicity's sake (and because Microsoft decided to use the term pixel shader rather than fragment shader in DirectX), but it helps to understand the difference between a pixel and a fragment when talking about AA methods. A pixel is simply a colored dot on the screen (or stored in a frame buffer). The different pieces of data that go into determining the color of a particular pixel are called fragments. For example, if two triangles cover the area of a single pixel, both will be processed as fragments. Texture lookups will be done for each at the pixel center, a color and depth will be determined, and any of this data can be manipulated by a fragment (pixel) shader. Without AA (and ignoring blending, transparency, etc.), only the fragment that is nearest the viewer and covers the pixel center will determine the color of the pixel. Antialiasing techniques are used to make the final pixel color reflect an accurate blend of the colors that cover a pixel.

A sub-pixel can be thought of as a zoomed in look at the area a pixel covers, so for example instead of a single pixel it can be viewed as a 10x10 grid of sub-pixels. Current popular FSAA (full screen AA) methods use the calculated colors of multiple sub-pixels that fall within the area of a pixel rather than just the pixel center to determine the final color. Super Sample AA takes each of these sub-pixels through the entire pipeline to determine texture and pixel shader output at each location. This is very accurate, but wastes lots of processing power without providing a proportional benefit. This is because sub-pixels that fall on the same surface don't usually end up with very different colors. MSAA only looks at one textured/shaded sample point per fragment. The colors of the sub-pixels on a polygon are the same as the color at the center of the pixel, but each sub-pixel gets its own depth value. When two polygons cover the same pixel, we can end up with different colored sub-pixels. Blending these colors proportionally results in properly antialiased polygon edges.
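The MSAA behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any GPU implements it: the function name, the four-sample layout, and the simple averaging resolve are all assumptions made for the example. Each fragment carries one color (shaded once), a per-sample coverage flag, and a per-sample depth.

```python
def shade_pixel_msaa(fragments, num_samples, background=(0.0, 0.0, 0.0)):
    """Resolve one pixel from its fragments, MSAA style (illustrative only).

    Each fragment is (color, coverage, depths):
      color    - one RGB tuple, shaded once (at the pixel center)
      coverage - list of bools, one per sub-pixel sample
      depths   - list of depth values, one per sub-pixel sample
    """
    colors = [background] * num_samples
    nearest = [float("inf")] * num_samples
    for color, coverage, depths in fragments:
        for s in range(num_samples):
            # Per-sample depth test: the nearest covering fragment wins the sample.
            if coverage[s] and depths[s] < nearest[s]:
                nearest[s] = depths[s]
                colors[s] = color
    # Blend the sub-pixel colors proportionally (plain average, an assumption).
    return tuple(sum(c[ch] for c in colors) / num_samples for ch in range(3))


# A red triangle in front covers three of four samples; a blue one behind covers all four.
red = ((1.0, 0.0, 0.0), [True, True, True, False], [0.5] * 4)
blue = ((0.0, 0.0, 1.0), [True, True, True, True], [0.9] * 4)
pixel = shade_pixel_msaa([blue, red], num_samples=4)
print(pixel)  # (0.75, 0.0, 0.25)
```

The red triangle wins three samples and the blue one keeps the fourth, so the edge pixel ends up three quarters red and one quarter blue.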

CSAA extends MSAA by decoupling color and depth values from the positions of the sample points within a pixel. Color values are determined at the pixel center, and color and depth data are stored in a buffer. The extension of this in CSAA comes in that we can look at more sample points in the pixel than we store color/Z data for. Under NVIDIA's 16x CSAA, four color values are stored, but the fragment coverage information for each of 16 sample points is retained. These coverage sample points are able to reference the appropriate color/Z data stored for the polygon that covers them.
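To see why extra coverage points help even when no extra colors are stored, here is a small sketch comparing how well 4 and 16 sample points estimate the area a polygon covers. The regular grid pattern, the function names, and the edge position are assumptions chosen for the example; real hardware sample patterns are not described here.

```python
def grid_samples(n):
    """Centers of an n x n regular grid inside the unit pixel (assumed pattern)."""
    return [((i + 0.5) / n, (j + 0.5) / n) for j in range(n) for i in range(n)]


def estimated_coverage(inside, n):
    """Fraction of the n*n sample points that fall inside the polygon."""
    points = grid_samples(n)
    return sum(1 for (x, y) in points if inside(x, y)) / len(points)


def left_of_edge(x, y):
    """A polygon whose edge cuts the pixel vertically; it truly covers 30% of it."""
    return x < 0.3


four_point = estimated_coverage(left_of_edge, 2)     # 4 sample points
sixteen_point = estimated_coverage(left_of_edge, 4)  # 16 sample points
print(four_point, sixteen_point)  # 0.5 0.25
```

With 4 points the polygon appears to cover half the pixel. With 16 points the estimate is 25%, much closer to the true 30%, and that coverage fraction is what weights the stored color when the pixel is blended.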

While NVIDIA couldn't go into much detail on the technology behind CSAA, we can extrapolate what's going on behind the scenes to make this happen. For each triangle that covers a pixel, each CSAA sample point gets a boolean value that indicates whether or not it is covered by the triangle. Color/Z data for the fragment are stored in a buffer for that pixel. For this whole thing to work, each CSAA sample point must also know which color in the buffer it refers to. If we assume position is predefined, the most storage that would be needed for each CSAA point is 4 bits (one boolean coverage value plus 3 bits to index 8 color/Z values), which comes to 8 bytes per pixel for 16 sample points. The color and Z data will be significantly larger than 8 bytes per pixel, especially for floating point color data, so the memory footprint shouldn't be much larger than MSAA with the same number of stored color/Z samples.
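The storage estimate can be worked through numerically. The sketch below follows the extrapolation in the text (one coverage bit plus an index per sample point); the 4 bytes of color and 4 bytes of Z per stored sample are assumptions picked for illustration, and the figures ignore any compression the hardware may apply.

```python
def coverage_bytes(num_coverage_samples, max_color_slots):
    """Bytes per pixel for coverage data: 1 coverage bit + index bits per sample point."""
    index_bits = max(1, (max_color_slots - 1).bit_length())
    return num_coverage_samples * (1 + index_bits) // 8


def color_z_bytes(num_stored_samples, color_bytes=4, z_bytes=4):
    """Bytes per pixel for stored color/Z samples (per-sample sizes are assumed)."""
    return num_stored_samples * (color_bytes + z_bytes)


overhead = coverage_bytes(16, 8)          # 16 points * 4 bits = 8 bytes
msaa_4x = color_z_bytes(4)                # 32 bytes
csaa_16x = color_z_bytes(4) + overhead    # 40 bytes
msaa_16x = color_z_bytes(16)              # 128 bytes
print(overhead, msaa_4x, csaa_16x, msaa_16x)  # 8 32 40 128
```

Under these assumptions, 16x CSAA with four stored colors costs 8 bytes more per pixel than 4x MSAA, against 96 bytes more for a hypothetical 16x MSAA. Wider color formats only make the coverage overhead a smaller share of the total.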

As fragments are sent out of the pixel shader, sub-pixel data is updated based on depth tests, and coverage samples and color/Z data will be updated as necessary. When the scene is ready to be drawn, the coverage sample points and color/Z data will be used to determine the color of a pixel based on each fragment that influenced it.
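The update-and-resolve flow can be sketched as follows. NVIDIA has not disclosed the real mechanism, so everything here is an assumption: the names, the single Z value kept per stored fragment, and the policy of dropping a fragment when no color/Z slot is free are hypothetical choices that only mirror the extrapolation above.

```python
NUM_COVERAGE = 16  # coverage sample points per pixel (16x CSAA)
NUM_SLOTS = 4      # color/Z entries stored per pixel


def new_pixel():
    """Empty pixel: no stored fragments, no coverage point references anything."""
    return {"slots": [None] * NUM_SLOTS, "coverage": [None] * NUM_COVERAGE}


def add_fragment(pixel, color, z, mask):
    """Store a fragment and point the coverage samples it wins at its slot."""
    slots, coverage = pixel["slots"], pixel["coverage"]
    # A sample is won if the fragment covers it and is nearer than what it references.
    won = {i for i in range(NUM_COVERAGE)
           if mask[i] and (coverage[i] is None or z < slots[coverage[i]][1])}
    if not won:
        return False
    # Slots still referenced by samples the fragment did not win must be kept.
    in_use = {coverage[i] for i in range(NUM_COVERAGE)
              if i not in won and coverage[i] is not None}
    free = [s for s in range(NUM_SLOTS) if s not in in_use]
    if not free:
        return False  # assumed policy: out of slots, drop the fragment
    slots[free[0]] = (color, z)
    for i in won:
        coverage[i] = free[0]
    return True


def resolve(pixel, background=(0.0, 0.0, 0.0)):
    """Blend stored colors, weighted by how many coverage points reference each."""
    total = [0.0, 0.0, 0.0]
    for slot in pixel["coverage"]:
        color = background if slot is None else pixel["slots"][slot][0]
        for ch in range(3):
            total[ch] += color[ch]
    return tuple(t / NUM_COVERAGE for t in total)


pixel = new_pixel()
add_fragment(pixel, (1.0, 0.0, 0.0), 0.5, [i < 10 for i in range(16)])  # red, in front
add_fragment(pixel, (0.0, 0.0, 1.0), 0.8, [True] * 16)                  # blue, behind
print(resolve(pixel))  # (0.625, 0.0, 0.375)
```

Only two color/Z entries are stored, yet the final color reflects a 10/16 to 6/16 split, which is finer than four samples alone could express.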

So what are the downsides? We have less depth information inside the pixel, but in most cases this isn't as important as color information. We do need to know depth at different sub-pixel positions in order to handle intersecting polygons, but doing this with a different level of detail than color information shouldn't have a big impact on quality.

The other drawback is that algorithms that require stencil/Z data at sub-pixel locations will not work correctly with CSAA in modes where there are more coverage samples than colors stored. In these cases, like with the stencil shadows used in FEAR, only the coverage samples located where color values are taken are used. This effectively reverts these algorithms to MSAA quality levels. CSAA will still be applied to polygon edges, and stencil algorithms will still work with the decreased level of antialiasing applied.

At a basic level, CSAA can provide more accurate coverage information for a pixel without the storage requirements of MSAA. This not only gives gamers an option to enable higher quality AA, but the option to enable higher quality AA without a large performance impact. While the explanation of how it does this may be overly complex, here's a simple table to help convey what's going on:

111 Comments

Sharky974 - Thursday, November 9, 2006 - link
The new features of DX10 stuff was captivating at first, but quickly grew tiresome and needlessly complex. The IQ comparisons the same thing, some simplicity is needed here. Tell us in a nutshell what looks better and why. The mouse over pictures are well nigh useless as well, and all look like crap. Whatever needs to be changed to get the IQ point across, needs to be changed already, I'm guessing 200 zoom is a problem for starters.

Then whose bright idea was it to only test one resolution, through the whole article?

Then whose bright idea was it to dedicate just as many graphs as performance, one per game, to not only power draw, but the even more useless performance per watt? Meaning 66% of your data graphs, in an article about a paradigm-changing, long-anticipated, brand new GPU, are related to the power usage of the card. Are you electric workers monthly.com now?

I am very surprised more of the comments weren't negative, this review was a total failure.

And yeah, what's with all the non-standard resolution testing? All the big sites like H, Anand, and FS go round and round talking about the incredible depths they go to get the bottom of real world performance as it relates to the real world, average user, and then you guys use stupid resolutions like 1280X960 (FS uses that particular one), that nobody on earth uses, regularly! It's really, really stupid. Hell for that matter, nobody uses 1600X1200 or any non-LCD native res anymore either, yet those are all staples of any review, and so these "real world" articles aren't very real world at all. But that's somewhat of a tangent issue, and I actually don't mind a lot of different resolutions tested, just as long as the big common ones are hit (which is not always the case)
DerekWilson - Friday, November 10, 2006 - link
I'm always working on bringing down the complexity of my explanations. It's one of my weak points as a writer. It's difficult for me to take something and present it at a high level that doesn't reflect exactly what the thing is. Analogies are great -- I like them -- but I have a hard time using them because I can't ever think of analogies that are accurate enough.

Any suggestions you have for helping me explain things completely, accurately, effectively, and (especially) in the most straightforward manner possible are very welcome.

As for the IQ comparisons -- these were much more simplified than I had intended (because Anand told me we couldn't do rollovers with 40 images on one page -- it would load too slow). This is our version of putting things in a nutshell. I could get to the point faster though --

IQ:

gamma correct aa is great for edges, but it causes problems with thin lines and transparency/adaptive AA making textures look mushy. transparency/adaptive aa are great but have a large performance hit -- except in 8800 which keeps these features playable and offers higher IQ. CSAA is great at bringing higher AA levels to edges, but the loss of Z data at the sub-pixel level makes it less effective at solving the thin line problem than equivalent MSAA modes. The rollovers illustrate all this.

That's as simple as I can make it -- I hope it helps.

We did not only test at one resolution -- In every game we tested at 1600x1200, 1920x1440, and 2560x1600. In oblivion we tested at 1280x1024 as well.

All our resolution data was in the last graph on each page -- resolution scaling. There are two graphs per page on performance. As you can see, at resolutions below 2560x1600, the 8800 GTX is almost overkill.

1600x1200 is a standard LCD panel resolution and has been for quite some time. It's actually quite affordable now as well. 1280x1024 (while popular) is often too low to matter in a high end performance analysis piece (and where it did matter we tested it). 1920x1440 is a 4:3 resolution that will give 1920x1200 panel owners a very good idea of performance (difference is usually under 5% in many games). 2560x1600 is a standard resolution for 30" LCD panels.

I can understand being upset if you missed the performance data at other resolutions, but it seems like the rest of your complaints are that we put too much data in the article. I doubt this will change in the future, but is there anything else we could have done to make this article better? We are very willing to listen to feedback, especially on articles as big as this.

Thanks,
Derek Wilson
flexy - Friday, November 10, 2006 - link
>>>
complaints are that we put too much data in the article. I doubt this will change in the future,
>>>

i doubt you can make it RIGHT for everyone...however i share the opinion w/ MOST that it is an excellent review. TOO much data is seldom bad, NOT on a site where you can expect geeks and nerds digging every bit of information :)

I remember times when reviews were FAR less detailed...and what can be better than going in-depth into AA/AF modi, showing their difference in detail ? I think this was right on and i value such in-depth coverage !

The DX10 coverage MAYBE was "too much info" for some...but then legitimate IMHO. We're talking about totally new h/w architecture, totally new and revamped DX API and the first hardware supporting it..so it was definitely a good place to cover this.

Also...you always have the option to skip parts of a review...and the MORE detailed it is...the more it is a helpful resource (also later) to come back and read up. You don't need to comprehend any bit of information at once, but it's good to know it's there.

my $0.2
jiulemoigt - Thursday, November 9, 2006 - link
The first really big issue is that a poly can have more than one color on it, due to textures, subsurface scattering, displacements, bump maps, normal maps, occlusion passes, specular highlight, transparency, and a few others I can not think of off the top of my head, you could probably find out just by asking in any cg forum like cgtalk or any dev who has worked with a professional 3d package. That being said it may have confused people to try and explain how it really works.
The other issue is to deal with gamma correct AA, maybe my monitor is showing a way different image but I'm not really sure how you can even compare
http://images.anandtech.com/reviews/video/NVIDIA/G...
http://images.anandtech.com/reviews/video/NVIDIA/G...
as the light is highlighting the building from two different direction in the images, the nvidia image is coming from the left and behind the buildings and the ati image is coming from the right and about midway down the image in front of the little building,
though a question that should be asked what time of day is it supposed to be the nvidia looks like dusk, and the ati looks blown out even for high noon, though the one above seems to be the same time of day and the nvidia is blown and the ati is shadowing correctly... really odd for the images, which suggests that some other filter is causing the issue on both cards like hdr, or something else.
DerekWilson - Thursday, November 9, 2006 - link
Yes a poly can have more than one color on it, and I agree our explanation could have been better ... but it is a difficult topic to talk about.

The whole basis of multisample AA relies on the assumption that the color of a poly *within one pixel* will not vary significantly. Of course, this is not always true. This is, in fact, the reason supersample AA does make a difference -- it takes into account the actual color of the pixel at the position of the sub-pixel. This is also why it's so much more expensive.

I didn't mean to imply that an entire poly must have only one color. But it's hard to talk about MSAA without pointing out the fact that the algorithm assumes one color per pixel per poly (calculated at the pixel center in most cases).

We did enable HDR, but we tried our hardest to take the screenshots at exactly the same amount of time after loading the scene (Valve's HDR uses dynamic exposure which does change saturation over time and with light level coming into the camera).

While this would impact general image comparison, it doesn't impact the effect of gamma correct AA on thin lines (which is what we were trying to show).

Thanks for the feedback -- if there's anything you can add to help us be more specific in our description, we would certainly appreciate it. We would like to avoid simply leaving details out -- we'd like to learn how to better impart knowledge.
Nimbo - Thursday, November 9, 2006 - link
This must be the first GPU article that does not derive in a flame war between ATI and Nvidia fanboys...
flexy - Thursday, November 9, 2006 - link
i actually don't care. I look at performance and comparisons, and then choose what card to get :) Although w/ ATI for years already.

If one card, however, has some substantial advantage over another, i'll gladly point that out and also gladly debate with others why i'd prefer card X over Y.

That's the difference between a fanboy and an enthusiast, i think. As long as i can back up statements w/ facts instead of just defending a "brand".

the other "problem" is really that same gen cards USUALLY are pretty much on par performance wise...so debating/defending brand X over Y does make as much sense as defending ferrari over lamborghini :)

But then..if we wouldn't do that and even discuss about the "littlest" details and have lengthy conversations on forums eg. WHICH AA methods is better and why...and why 5 FPS there are better...and/or why this AF method is better than the other...it would be pretty boring.

I mean we're hardware-enthusiasts, and gfx-cards are (IMHO) the most interesting component in a PC :)
DigitalFreak - Thursday, November 9, 2006 - link
I thought we were done with the days of >$499 single GPU cards after the 7900GTX launch. Guess not.
VooDooAddict - Thursday, November 9, 2006 - link
Great article.

Now I just need to figure out if a 8800GTX will fit in a mATX UltraFly Case.
Araemo - Thursday, November 9, 2006 - link
Everyone is repeating microsoft's claim that dx10 will be Vista-only.

the inq (I know, I know....) reported here (http://www.theinquirer.net/default.aspx?article=35...) that there will be a directx '9.0L' for XP that supports the new rendering features of DirectX10, but without the new virtualization/driver model improvements.
