Lots More Compute, a Leetle More Texturing

NVIDIA's GT200 GPU delivers a significant increase in computational power thanks to its 240 streaming processors, up from 128 in the previous G80 design. That growth comes with a tremendous increase in transistor count over the previous generation architecture (1.4 billion, up from 686 million in G80).

The increase in compute power of GT200 is not mirrored, however, by a matching increase in texture processing power. On the previous page we outlined how the Texture/Processing Clusters went from two Shader Multiprocessors to three, and how there are now a total of ten TPCs in the chip, up from eight in the GeForce 8800 GTX.

In the original G80 core, used in the GeForce 8800 GTX, each of NVIDIA's texture blocks had 4 texture address units and 8 texture filtering units.

With the move to G92, used in the GeForce 8800 GT, 8800 GTS 512 and 9800 GTX, NVIDIA doubled the number of texture address units and achieved a 1:1 ratio of address/filtering units.

With GT200 in the GeForce GTX 280/260, NVIDIA kept the address-to-filtering ratio at 1:1 but increased the ratio of SPs to texture processors:

In G92 you'd have 8 address and 8 filtering units per TPC of 16 streaming processors; in GT200 you have the same 8 address and 8 filtering units, but for a larger TPC with 24 SPs.

Here's how the specs stand up across the generations:

NVIDIA Architecture Comparison       G80    G92    GT200
Streaming Processors per TPC         16     16     24
Texture Address Units per TPC        4      8      8
Texture Filtering Units per TPC      8      8      8
Total SPs                            128    128    240
Total Texture Address Units          32     64     80
Total Texture Filtering Units        64     64     80

For an 87.5% increase in compute, there's a mere 25% increase in texture processing power. This ratio echoes what NVIDIA has been preaching for years: that games are running more complex shaders and are not as bound by texture processing as they were in years prior. If this weren't true, we'd see something closer to a 25% increase in performance of GT200 over G80 at the same clock, rather than something much greater.
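The percentages above can be checked with a quick calculation. The following is a minimal Python sketch, not anything from NVIDIA: the per-TPC figures come from the comparison table, the TPC counts for G80 and GT200 come from the text, and the G92 TPC count of 8 is an assumption derived from the table (128 total SPs at 16 per TPC). The names `ARCHS`, `totals` and `pct_increase` are made up for illustration.

```python
# Per-TPC unit counts from the comparison table. TPC counts: 8 for G80 and
# 10 for GT200 per the text; 8 for G92 is derived from 128 SPs / 16 per TPC.
ARCHS = {
    "G80":   {"tpcs": 8,  "sps": 16, "addr": 4, "filt": 8},
    "G92":   {"tpcs": 8,  "sps": 16, "addr": 8, "filt": 8},
    "GT200": {"tpcs": 10, "sps": 24, "addr": 8, "filt": 8},
}


def totals(name):
    """Chip-wide unit counts: per-TPC count multiplied by number of TPCs."""
    arch = ARCHS[name]
    return {unit: arch["tpcs"] * arch[unit] for unit in ("sps", "addr", "filt")}


def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


g80 = totals("G80")
gt200 = totals("GT200")

compute_gain = pct_increase(g80["sps"], gt200["sps"])    # SPs: 128 -> 240
filter_gain = pct_increase(g80["filt"], gt200["filt"])   # filtering: 64 -> 80

# SPs available per texture filtering unit within one TPC.
sp_per_filter = {name: a["sps"] / a["filt"] for name, a in ARCHS.items()}

print(f"compute: +{compute_gain}%  texture filtering: +{filter_gain}%")
print(sp_per_filter)
```

Note that the 25% figure holds for filtering units against G80 and for both address and filtering units against G92; against G80 the address unit count rises by more, since G80 had only 4 address units per TPC.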

It also means that GT200's performance advantage over G80 or G92 based parts (e.g. GeForce 9800 GTX) will depend largely on how computationally bound the games we're testing are.

This tilt toward compute over texture power has been evident in NVIDIA architectures for years now, dating back to the ill-fated GeForce FX. NVIDIA sacrificed memory bandwidth on the GeForce FX, equipping it with a narrow 128-bit memory bus (compared to ATI's 256-bit interface on the Radeon 9700 Pro) and instead focused on building a much more powerful compute engine. Unfortunately, the bet was the wrong one to make at the time and the GeForce FX was hardly competitive (for more reasons than just a lack of memory bandwidth), but today we're dealing with a very different world. Complex shader programs run on each pixel on the screen and there's a definite need for more compute power in today's GPUs.

An Increase in Rasterization Throughput

In addition to the 25% increase in texture processing capabilities of the GT200, NVIDIA added two more ROP partitions to the GPU. The GeForce 8800 GTX had six ROP partitions, each capable of outputting a maximum of 4 pixels per clock; the GT200 has eight.

With eight ROP partitions the GT200 can now output a maximum of 32 pixels per clock, up from 24 pixels per clock in the GeForce 8800 GTX and 9800 GTX.

The pixel blend rate on G80/G92 was half-speed, meaning that while you could output 24 pixels per clock, you could only blend 12 pixels per clock. Thanks to the 65nm shrink and redesign, GT200 can now output and blend pixels at full speed - that's 32 pixels per clock for each.
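The per-clock figures above follow from simple arithmetic, sketched here in Python. This is only an illustration using the numbers quoted in the text: `rop_rates` and its `blend_divisor` parameter are hypothetical names, with a divisor of 2 standing in for half-speed blending and 1 for full-speed blending.

```python
PIXELS_PER_PARTITION = 4  # max pixels per clock per ROP partition (from the text)


def rop_rates(partitions, blend_divisor):
    """Return (output, blend) rates in pixels per clock.

    blend_divisor is a hypothetical knob for illustration:
    2 models half-speed blending, 1 models full-speed blending.
    """
    output = partitions * PIXELS_PER_PARTITION
    blend = output // blend_divisor
    return output, blend


# GeForce 8800 GTX: six partitions, half-speed blend.
g80_output, g80_blend = rop_rates(partitions=6, blend_divisor=2)
# GT200: eight partitions, full-speed blend.
gt200_output, gt200_blend = rop_rates(partitions=8, blend_divisor=1)

output_gain = gt200_output / g80_output  # ratio of peak output rates
blend_gain = gt200_blend / g80_blend     # ratio of peak blend rates

print(f"output: {g80_output} -> {gt200_output} pixels/clock ({output_gain:.2f}x)")
print(f"blend:  {g80_blend} -> {gt200_blend} pixels/clock ({blend_gain:.2f}x)")
```

Peak output rises by a third while peak blend rate rises by considerably more, so blend-heavy workloads could in principle gain more than the partition count alone suggests. These are per-clock peaks only; actual throughput also depends on clock speed and memory bandwidth, which this sketch ignores.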

The end result is a non-linear performance improvement on GT200 in everything from anti-aliasing and fire effects to shadows. It's an evolutionary change, but that really does sum up many of the enhancements of GT200 over G80/G92.


108 Comments


  • strikeback03 - Tuesday, June 17, 2008

    So are you blaming nvidia for games that require powerful hardware, or just for enabling developers to write those games by making powerful hardware?
  • InquiryZ - Monday, June 16, 2008

    Was AC tested with or without the patch? (the patch removes a lot of performance on the ATi cards..)
  • DerekWilson - Monday, June 16, 2008

    the patch only affects performance with aa enabled.

    since the game only allows aa at up to 1680x1050, we tested without aa.

    we also tested with the patch installed.
  • PrinceGaz - Monday, June 16, 2008

    nVidia say they're not saying exactly what GT200 can and cannot do to prevent AMD bribing game developers to use DX10.1 features GT200 does not support, but you mention that

    "It's useful to point out that, in spite of the fact that NVIDIA doesn't support DX10.1 and DX10 offers no caps bits, NVIDIA does enable developers to query their driver on support for a feature. This is how they can support multisample readback and any other DX10.1 feature that they chose to expose in this manner."

    Now whilst it is driver dependent and additional features could be enabled (or disabled) in later drivers, it seems to me that all AMD or anyone else would have to do is go through the whole list of DX10.1 features and query the driver about each one. Voila- an accurate list of what is and isn't supported, at least with that driver.
  • DerekWilson - Monday, June 16, 2008

    the problem is that they don't expose all the features they are capable of supporting. they won't mind if AMD gets some devs on board with something that they don't currently support but that they can enable support for if they need to.

    what they don't want is for AMD to find out what they are incapable of supporting in any reasonable way. they don't want AMD to know what they won't be able to expose via the driver to developers.

    knowing what they already expose to devs is one thing, but knowing what the hardware can actually do is not something nvidia is interested in sharing.
  • emboss - Monday, June 16, 2008

    Well, yes and no. The G80 is capable of more than what is implemented in the driver, and also some of the implemented driver features are actually not natively implemented in the hardware. I assume the GT200 is the same. They only implement the bits that are actually being used, and emulate the operations that are not natively supported. If a game comes along that needs a particular feature, and the game is high-profile enough for NV to care, NV will implement it in the driver (either in hardware if it is capable of it, or emulated if it's not).

    What they don't want to say is what the hardware is actually capable of. Of course, ATI can still get a reasonably good idea by looking at the pattern of performance anomalies and deducing which operations are emulated, so it's still just stupid paranoia that hurts developers.
  • B3an - Monday, June 16, 2008

    @ Derek - I'd really appreciate this if you could reply...

    Games are tested at 2560x1600 in these benchmarks with the 9800GX2, and some games are even playable.
    Now when i do this with my GX2 at this res, a lot of the time even the menu screen is a slide show (often under 10FPS). Especially if any AA is enabled. Some games that do this are Crysis, GRID, UT3, Mass Effect, ET:QW... with older games it does not happen, only newer stuff with higher res textures.

    This never happened on my 8800GTX to the same extent. So i put it down to the GX2 not having enough memory bandwidth and enough usable VRAM for such high resolution.

    So could you explain how the GX2 is getting 64FPS @ 2560x1600 with 4x AA with ET:Quake Wars? As well as other games at that res + AA.
  • DerekWilson - Monday, June 16, 2008

    i really haven't noticed the same issue with menu screens ... except in black and white 2 ... that one sucked and i remember complaining about it.

    to be fair i haven't tested this with mass effect, grid, or ut3.

    as for menu screens, they tend to be less memory intensive than the game itself. i'm really not sure why it happens when it does, but it does suck.

    i'll ask around and see if i can get an explanation of this problem and if i can i'll write about why and when it will happen.

    thanks,
    Derek
  • larson0699 - Monday, June 16, 2008

    "Massiveness" and "aggressiveness"?

    I know the article is aimed to hit as hard as the product it's introducing us to, but put a little English into your English.

    "Mass" and "aggression".

    FWIW, the GTX's numbers are unreal. I can appreciate the power-saving capabilities during lesser load, but I agree, GT200 should've been 55nm. (6pin+8pin? There's a motherboard under that SLI setup??)
  • jobrien2001 - Monday, June 16, 2008

    Seems Nvidia finally dropped the ball.

    -Power consumption and the price tag are really bad.
    -Performance isnt as expected.
    -Huge Die

    Im gonna wait for a die shrink or buy an ATI. The 4870 with ddr5 seems promising from the early benchmarks... and for $350? who in their right mind wouldnt buy one.
