Lots More Compute, a Leetle More Texturing

NVIDIA's GT200 GPU has a significant increase in computational power thanks to its 240 streaming processors, up from 128 in the previous G80 design. As a result, NVIDIA's GT200 GPU showcases a tremendous increase in transistor count over its previous generation architecture (1.4 billion up from 686 million in G80).

The increase in compute power of GT200 is not mirrored however in the increase in texture processing power. On the previous page we outlined how the Texture/Processing Clusters went from two Shader Multiprocessors to three, and how there are now a total of ten TPCs in the chip up from 8 in the GeForce 8800 GTX.

In the original G80 core, used in the GeForce 8800 GTX NVIDIA's texture block looked like this:

In each block you had 4 texture address units and 8 texture filtering units.

With the move to G92, used in the GeForce 8800 GT, 8800 GTS 512 and 9800 GTX, NVIDIA doubled the number of texture address units and achieved a 1:1 ratio of address/filtering units:

With GT200 in the GeForce GTX 280/260, NVIDIA kept the address-to-filtering ratio at 1:1 but increased the ratio of SPs to texture processors:

In the previous designs you'd have 8 address and 8 filtering units per TPC (or 16 streaming processors), in the GT200 you have the same 8 address and 8 filtering units but for a larger TPC with 24 SPs.

Here's how the specs stand up across the generations:

 NVIDIA Architecture Comparison G80 G92 GT200
Streaming Processors per TPC 16 16 24
Texture Address Units per TPC 4 8 8
Texture Filtering Units per TPC 8 8 8
Total SPs 128 128 240
Total Texture Address Units 32 64 80
Total Texture Filtering Units 64 64 80

 

For a 87.5% increase in compute, there's a mere 25% increase in texture processing power. This ratio echoes what NVIDIA has been preaching for years: that games are running more complex shaders and are not as bound by texture processing as they were in years prior. If this wasn't true then we'd see a closer to 25% increase in performance of GT200 over G80 at the same clock rather than something much greater.

It also means that GT200's performance advantage over G80 or G92 based architectures (e.g. GeForce 9800 GTX) will be determined much by how computationally bound the games we're testing are.

The ratio of increase compute/texture power in the GT200 has been evident in NVIDIA architectures for years now, dating back to the ill-fated GeForce FX. NVIDIA sacrificed memory bandwidth on the GeForce FX, equipping it with a narrow 128-bit memory bus (compared to ATI's 256-bit interface on the Radeon 9700 Pro) and instead focused on building a much more powerful compute engine. Unfortunately, the bet was the wrong one to make at the time and the GeForce FX was hardly competitive (for more reasons than just a lack of memory bandwidth), but today we're dealing in a very different world. Complex shader programs run on each pixel on the screen and there's a definite need for more compute power in today's GPUs.

An Increase in Rasterization Throughput

In addition to the 25% increase in texture processing capabilities of the GT200, NVIDIA added two more ROP partitions to the GPU. While the GeForce 8800 GTX had six ROP partitions, each capable of outputting a maximum of 4 pixels per clock, the GT200 adds two more partitions.

With eight ROP partitions the GT200 can now output a maximum of 32 pixels per clock, up from 24 pixels per clock in the GeForce 8800 GTX and 9800 GTX.

The pixel blend rate on G80/G92 was half-speed, meaning that while you could output 24 pixels per clock, you could only blend 12 pixels per clock. Thanks to the 65nm shrink and redesign, GT200 can now output and blend pixels at full speed - that's 32 pixels per clock for each.

The end result is a non-linear performance improvement in everything from anti-aliasing and fire effects to shadows on GT200. It's an evolutionary change, but that really does sum up many of the enhancements of GT200 over G80/G92.

Building NVIDIA's GT200 Derek Gets Technical: 15th Century Loom Technology Makes a Comeback
Comments Locked

108 Comments

View All Comments

  • elchanan - Monday, June 30, 2008 - link

    VERY eye-opening discussion on TMT. Thank you for it.
    I've been trying to understand how GPUs can be competitive for scientific applications which require lots of inter-process communication, and "local" memory, and this appears to be an elegant solution for both.

    I can identify the weak points of it being hard to program for, as well as requiring many parallel threads to make it practical.

    But are there other weak points?
    Is there some memory-usage profile, or inter-process data bandwidth, where the trick doesn't work?
    Perhaps some other algorithm characteristic which GPUs can't address well?



  • Think - Friday, June 20, 2008 - link

    This card is a junk bond when taking into consideration cost/perfomance/power consumption.

    Reminds me of a 1976 Cadillac with a 7.7litre v8 with only 210 horsepower/3600 rpm.

    It's a PIG.
  • Margalus - Tuesday, June 24, 2008 - link

    this shows how many people don't run a dual monitor setup. I would snatch up one of these 260/280's over the gx2's anyday, gladly!!

    The performance may not be quite as good as an sli setup, but it will be much better than a single card which is what a lot of us are stuck with since you CANNOT run a dual monitor setup with sli!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  • iamgud - Wednesday, June 18, 2008 - link

    "I can has vertex data"


    LOL

    These look fine, but need to be moved to 55nm. By the time I save up for one they will .
  • calyth - Tuesday, June 17, 2008 - link

    Well what the heck are they doing with 1.4B transistors, which is becoming the largest die that TMSC has been producing so far?
    The larger the core, the more likely that an blemish would take out the core. As far as I know, didn't Phenom (4 cores on die) suffered low-yield problems?
  • gochichi - Tuesday, June 17, 2008 - link

    You know, when you consider the price and you look at the benchmarks, you start looking for features and NVIDIA just doesn't have the features going on at all.

    COD4 -- Ran perfect at 1920x1200 with last gen stuff (the HD3870 and 8800GT(S))so now the benchmarks have to be for outrageous resolutions that a handful of monitors can handle (and those customers already bought SLI or XFIRE, or GTX2 etc.)

    Crysis is a pig of a game, but it's not that great (it is a good technical preview though, I admit), and I don't think even these new cards really satisfy this system hog... so maybe this is a win, but I doubt too many people care... if you had an 8800GT or whatever, you're already played this game "well enough" on medium settings and are plenty tired of it. Though we'll surely fire it up in the future once our video cards "happen to be able to run it on high" very few people are going to go out of their way $500+ for this silly title.

    In any case, then you look at ATI, and they have the HDMI audio, the DX 10.1 support and all they have to do at this point is A) Get a good price out the door, B) Make a good profit (make them cheap, which these NVIDIA are expensive to make, no doubt) and C) handily beat the 8800GTS and many of us are going to be sold.

    These cards are what I would call a next gen preview. Some overheated prototypes of things to come. I doubt AMD will be as fast, and in fact I hope they aren't just as long as they keep the power consumption in check, the price, and the value (HDMI, DX10.1, etc).

    Today's release reminded me that NVIDIA is the underdog, they are the company that released the FX series (desperate technology, like these are). ATI has been around well before 3DFX made 3d-accelerators. They were down for a bit, and we all said it was over for ATI but this desperate release from NVIDIA makes me think that ATI is going to be quite tought to beat.

  • Brazofuerte - Tuesday, June 17, 2008 - link

    Can I go somewhere to find the exact settings used for these benchmarks? I appreciate the tech side of the write up but when it comes to determining whether I want one of these for my gaming machine (I ordered mine at midnight), I find HardOCP's numbers much more useful.
  • woofermazing - Tuesday, June 17, 2008 - link

    AMD/ATI isn't going to abandon the high end like your article implies. Their plan is to make a really good mid range chip, and ductape to cores together ala the X2's. Nvidia goes from the high-end down, ATI from the mid-end up. From the look of it, ATI might have the right idea, atleast this time around. I seriously doubt we'll see a two core version of this monster anytime soon.
  • DerekWilson - Tuesday, June 17, 2008 - link

    they are abandoning the high end single GPU ...

    we did state that they are planning on competing in the high end space with multiGPU cards, but that there are drawbacks to that.

    we'll certainly have another article coming out sometime soon that looks a little more closely at AMD's strategy.
  • KeypoX - Tuesday, June 17, 2008 - link

    i dont like it, not impressed either :(. Hopefully my 8800gt last for a while, far past this crap atleast

Log in

Don't have an account? Sign up now