NVIDIA’s GF100: Architected for Gaming

Author: Ryan Smith

by Ryan Smith on January 17, 2010 2:00 AM EST

Posted in
GPUs

Better Image Quality: CSAA & TMAA

NVIDIA’s next big trick for image quality is that they’ve revised Coverage Sample Anti-Aliasing. CSAA, which was originally introduced with the G80, is a lightweight method of better determining how much of a polygon actually covers a pixel. By merely testing polygon coverage and storing the results, the ROP can get more information without the expense of fetching and storing additional color and Z data as done with a regular sample under MSAA. The quality improvement isn’t as pronounced as just using more multisamples, but coverage samples are much, much cheaper.

32x CSAA sampling pattern
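To get a feel for why coverage samples are so much cheaper than full multisamples, here is a rough back-of-the-envelope sketch. The per-sample sizes are assumptions made purely for illustration (4 bytes of color, 4 bytes of Z, and a hypothetical 4 bits per coverage sample); NVIDIA has not published the actual storage layout, so treat the numbers as relative, not absolute.

```python
def framebuffer_bytes_per_pixel(full_samples: int,
                                coverage_samples: int,
                                color_bytes: int = 4,
                                depth_bytes: int = 4,
                                coverage_bits: int = 4) -> int:
    """Estimate per-pixel storage for a mix of full and coverage-only samples.

    All default sizes are illustrative assumptions, not hardware figures:
    each full (MSAA) sample stores color and Z, while each coverage (CSAA)
    sample stores only a few bits of coverage information.
    """
    if full_samples < 0 or coverage_samples < 0:
        raise ValueError("sample counts must be non-negative")
    full_cost = full_samples * (color_bytes + depth_bytes)
    # Round the coverage bits up to whole bytes.
    coverage_cost = (coverage_samples * coverage_bits + 7) // 8
    return full_cost + coverage_cost


# 8 full samples only, 8 full + 24 coverage samples, and a hypothetical
# mode with 32 full samples for comparison.
msaa_8x = framebuffer_bytes_per_pixel(8, 0)
csaa_32x = framebuffer_bytes_per_pixel(8, 24)
msaa_32x_hypothetical = framebuffer_bytes_per_pixel(32, 0)
```

Under these assumed sizes, adding 24 coverage samples on top of 8 multisamples grows the per-pixel footprint only modestly, whereas 32 full samples would quadruple it.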

For the G80 and GT200, CSAA could only test polygon edges. That’s great for resolving aliasing at polygon edges, but it doesn’t solve other kinds of aliasing. In particular, GF100 will be waging a war on billboards – flat geometry that uses textures with transparency to simulate what would otherwise require complex geometry. Fences, leaves, and patches of grass in fields are three very common uses of billboards, as they are “minor” visual effects that would be very expensive to do with real geometry, and would benefit little from the quality improvement.

Since billboards are faking geometry, regular MSAA techniques do not remove the aliasing within the billboard. To resolve that, DX10 introduced alpha to coverage functionality, which allows MSAA to anti-alias the fake geometry by using the alpha mask as a coverage mask for the MSAA process. The end result of this process is that the GPU creates varying levels of transparency around the fake geometry, so that it blends better with its surroundings.

It’s a great technique, but it wasn’t done all that well by the G80 and GT200. In order to determine the level of transparency to use on an alpha to coverage sampled pixel, the anti-aliasing hardware on those GPUs used MSAA samples to test the coverage. With up to 8 samples (8xQ MSAA mode), the hardware could only compute 9 levels of transparency (anywhere from 0 to 8 samples covered), which isn’t nearly enough to establish a smooth gradient. So while alpha to coverage testing allowed for some anti-aliasing of billboards, the results weren’t great. The only way to achieve really good results was to use super-sampling on billboards through Transparency Super-Sample Anti-Aliasing, which was ridiculously expensive given that when billboards are used, they usually cover most of the screen.
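The "9 levels from 8 samples" arithmetic can be sketched in a few lines. This is a simplified model, assuming the hardware simply rounds the alpha value to the nearest count of covered samples; real GPUs may dither or use other mask-selection schemes, and the function names here are hypothetical.

```python
def coverage_levels(num_samples: int) -> int:
    """Distinct transparency levels available from num_samples samples.

    Anywhere from 0 to num_samples samples can be covered, so there are
    num_samples + 1 possible levels.
    """
    if num_samples < 1:
        raise ValueError("need at least one sample")
    return num_samples + 1


def quantize_alpha(alpha: float, num_samples: int) -> float:
    """Snap alpha (0.0 to 1.0) to the nearest representable coverage level.

    Simplified model: round to the nearest whole number of covered samples.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    covered = int(alpha * num_samples + 0.5)  # round half up
    return covered / num_samples
```

With 8 samples, an alpha of 0.3 lands on 2 of 8 samples covered (0.25), and neighboring levels sit a full 12.5% apart, which is where the visible banding comes from.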

For GF100, NVIDIA has made two tweaks to CSAA. First, additional CSAA modes have been unlocked – GF100 can do up to 24 coverage samples per pixel as opposed to 16. The second change is that the CSAA hardware can now participate in alpha to coverage testing, a natural extension of CSAA’s coverage testing capabilities. With this ability CSAA can test the coverage of the fake geometry in a billboard along with MSAA samples, allowing the anti-aliasing hardware to fetch up to 32 samples per pixel. This gives the hardware the ability to compute 33 levels of transparency, which while not perfect allows for much smoother gradients.

The example NVIDIA has given us for this is a pair of screenshots taken from a field in Age of Conan, a DX10 game. The first screenshot is from a GT200 based video card running the game with NVIDIA’s 16xQ anti-aliasing mode, which is composed of 8 MSAA samples and 8 CSAA samples. Since the GT200 can’t do alpha to coverage testing using the CSAA samples, the resulting grass blades are only blended with 9 levels of transparency based on the 8 MSAA samples, giving them a dithered look.

Age of Conan grass, GT200 16xQ AA

The second screenshot is from GF100 running in NVIDIA’s new 32x anti-aliasing mode, which is composed of 8 MSAA samples and 24 CSAA samples. Here the CSAA and MSAA samples can be used in alpha to coverage, giving the hardware 32 samples from which to compute 33 levels of transparency. The result is that the blades of grass are still somewhat banded, but overall much smoother than what the GT200 produced. Bear in mind that since 8x MSAA is faster on the GF100 than it was on GT200, and CSAA has very little overhead in comparison (NVIDIA estimates 32x has 93% of the performance of 8xQ), the entire process should be faster on GF100 even if it were running at the same speeds as GT200. Image quality improved, and at the same time the performance improved too.

Age of Conan grass, GF100 32x AA
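The difference between the two screenshots comes down to how many samples each mode can feed into alpha to coverage. The sketch below models the two modes described above; the `AAMode` class and its field names are made up for illustration and do not correspond to any NVIDIA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AAMode:
    """Illustrative description of an anti-aliasing mode (hypothetical type)."""
    name: str
    msaa_samples: int
    csaa_samples: int
    csaa_in_alpha_to_coverage: bool  # can coverage samples join the alpha test?

    def alpha_samples(self) -> int:
        """Samples usable for alpha to coverage in this mode."""
        extra = self.csaa_samples if self.csaa_in_alpha_to_coverage else 0
        return self.msaa_samples + extra

    def transparency_levels(self) -> int:
        """0..N samples covered gives N + 1 levels."""
        return self.alpha_samples() + 1

    def gradient_step(self) -> float:
        """Smallest change in transparency the mode can express."""
        return 1.0 / self.alpha_samples()


# Sample counts taken from the article; the GT200 cannot use its CSAA
# samples for alpha to coverage, while the GF100 can.
GT200_16XQ = AAMode("16xQ (GT200)", msaa_samples=8, csaa_samples=8,
                    csaa_in_alpha_to_coverage=False)
GF100_32X = AAMode("32x (GF100)", msaa_samples=8, csaa_samples=24,
                   csaa_in_alpha_to_coverage=True)
```

Under this model the GT200 mode steps through transparency in 12.5% increments, while the GF100 mode steps in 3.125% increments, which matches the smoother but still slightly banded grass in the second screenshot.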

The ability to use CSAA on billboards left us with a question, however: isn’t this what Transparency Anti-Aliasing was for? The answer, as it turns out, is both yes and no.

Transparency Anti-Aliasing was introduced on the G70 (GeForce 7800GTX) and was intended to help remove aliasing on billboards, exactly what NVIDIA is doing today with MSAA. The difference is that while DX10 has alpha to coverage, DX9 does not – and DX9 was all there was when G70 was released. Transparency Multi-Sample Anti-Aliasing (TMAA) as implemented today is effectively a shader replacement routine to make up for what DX9 lacks. With it, DX9 games can have alpha to coverage testing done on their billboards in spite of DX9 not having this feature, allowing for image quality improvements on games still using DX9. Under DX10 TMAA is superseded by alpha to coverage in the API, but TMAA is still alive and well due to the large number of older games using DX9 and the large number of games yet to come that will still use DX9.

Because TMAA is functionally just enabling alpha to coverage on DX9 games, all of the changes we just mentioned to the CSAA hardware filter down to TMAA. This is excellent news, as TMAA has delivered lackluster results in the past – it was better than nothing, but only Transparency Super-Sample Anti-Aliasing (TSAA) really fixed billboard aliasing, and only at a high cost. Ultimately this means that a number of cases in the past where only TSAA was suitable are suddenly opened up to using the much faster TMAA, in essence making good billboard anti-aliasing finally affordable on newer DX9 games on NVIDIA hardware.

As a consequence of this change, TMAA’s tendency to have fake geometry on billboards pop in and out of existence is also solved. Here we have a set of screenshots from Left 4 Dead 2 showcasing this in action. The GF100 with TMAA generates softer edges on the vertical bars in this picture, which is what stops the popping seen on the GT200.

Left 4 Dead 2: TMAA on GT200

Left 4 Dead 2: TMAA on GF100

115 Comments

SothemX - Tuesday, March 9, 2010 - link
WELL, let's just make it simple. I am an avid gamer...I WANT and NEED power and performance. I care only about how well my games play, how good they look, and the impression they leave with me when I am done.
I own a PS3 and am thrilled they went with Nvidia- (smart move)
I own a PC that utilizes the 9800GT OC card....getting ready to upgrade to the new GF100 when it releases, last thing that is on my mind is how the market share is, cost is not an issue.

Hard-Core gaming requires Nvidia. Entry-level baby boomers use ATI.

Nvidia is just playing with their food....it's a vulgar display of power- better architecture, better programming, better gaming.
StevoLincolnite - Monday, January 18, 2010 - link
[quote]So why does NVIDIA want so much geometry performance? Because with tessellation, it allows them to take the same assets from the same games as AMD and generate something that will look better. With more geometry power, NVIDIA can use tessellation and displacement mapping to generate more complex characters, objects, and scenery than AMD can at the same level of performance.[/quote]

Might I add to that, nVidia's design is essentially "Modular": they can increase and decrease their geometry performance essentially by taking units out. This however will force programmers to program for the lowest common denominator, whilst AMD's iteration of the technology is the same across the board, so essentially you can have identical geometry regardless of the chip.
Yojimbo - Monday, January 18, 2010 - link
just say the minimum, not the lowest common denominator. it may look fancy but it doesn't seem to fit.
chizow - Monday, January 18, 2010 - link
The real distinction here is that Nvidia's revamp of fixed-function geometry units to a programmable, scalable, and parallel Polymorph engine means their implementation won't be limited to acceleration of tessellation in games. Their improvements will benefit every game ever made that benefits from increased geometry performance. I know people around here hate to claim "winners" and "losers" when AMD isn't winning, but I think it's pretty obvious Nvidia's design and implementation is the better one.

Fully programmable vs. fixed-function, as long as the fully programmable option is at least as fast is always going to be the better solution. Just look at the evolution of the GPU from mostly fixed-function hardware to what it is today with GF100...a fully programmable, highly parallel, compute powerhouse.
mcnabney - Monday, January 18, 2010 - link
If Fermi was a winner Nvidia would have had samples out to be benchmarked by Anand and others a long time ago.

Fermi is designed for GPGPU with gaming secondary. Goody for them. They can probably do a lot of great things and make good money in that sector. But I don't know about gaming. Based upon the info that has gotten out and the fact that reality hasn't appeared yet I am guessing that Fermi will only be slightly faster than 5870 and Nvidia doesn't want to show their hand and let AMD respond. Remember, AMD is finishing up the next generation right now - so Fermi will likely compete against Northern Isles on AMDs 32nm process in the Fall.
dragonsqrrl - Monday, February 15, 2010 - link
Firstly, did you not read this article? The gf100 delay was due in large part to the new architecture they developed, an architectural shift ATI will eventually have to make if they wish to remain competitive. In other words, similarly to the g80 enabling GPU computing features/unified shaders for the first time on the PC, Nvidia invested huge resources in r&d and as a result had a next generation, revolutionary GPU before ATI.

Secondly, Nvidia never meant to place gaming second to GPU computing, as much as you ATI fanboys would like to troll about this subject. What they're trying to do is bring GPU computing up to the level GPU gaming is already at (in terms of accessibility, reliability, and performance). The research they're doing in this field could revolutionize research into many fields outside of gaming, including medicine, astronomy, and 'yes' film production (something I happen to deal with a LOT) while revolutionizing gaming performance and feature sets as well

Thirdly, I would be AMAZED if AMD can come out with their new architecture (their first since the hd2900) by the 3rd quarter of this year, and on the 32nm process. I just can't see them pushing GPU technology forward in the same way Nvidia has given their new business model (smaller GPUs, less focus on GPU computing), while meeting that tight deadline.
chewietobbacca - Monday, January 18, 2010 - link
"Winning" the generation? What really matters?

The bottom line, that's what. I'm sure Nvidia liked winning the generation - I'm sure they would have loved it even more if they didn't lose market share and potential profits from the fight...
realneil - Monday, January 25, 2010 - link
winning the generation is a non-prize if the mainstream buyer can only wish they had one. Make this kind of performance affordable and then you'll impress me.
chizow - Monday, January 18, 2010 - link
Yes and the bottom line showed Nvidia turning a profit despite not having the fastest part on the market.

Again, my point about G80'ing the market was more a reference to them revolutionizing GPU design again rather than simply doubling transistors and functional units or increasing clockspeeds based on past designs.

The other poster brought up performance at any given point in time, I was simply pointing out the fact that being first or second to market doesn't really matter as long as you win the generation, which Nvidia has done for the last few generations since G80 and will again once GF100 launches.
sc3252 - Monday, January 18, 2010 - link
Yikes, if it is more than the original GTX 280 I would expect some loud cards. When I saw those benchmarks of Far Cry 2 I was disappointed that I didn't wait, but now that it is using more than a GTX 280 I think I may have made the right choice. While right now I want as much performance as possible, eventually my 5850 will go into a secondary pc (why I picked 5850) with a lesser power supply. I don't want to have to buy a bigger power supply just because a friend might come over and play once a week.
