The Pipeline Overview

First, let us take a second to run through NVIDIA's architecture in general. DirectX or OpenGL commands and HLSL and GLSL shaders are translated and compiled for the architectures. Commands and data are sent to the hardware where we go from numbers, instructions and artwork to a rendered frame.

The first major stop along the way is the vertex engine where geometry is processed. Vertices can be manipulated using math and texture data, and the output of the vertex pipelines is passed on down the line to the fragment (or pixel) engine. Here, every pixel on the screen is processed based on input from the vertex engine. After the pixels have been processed for all the geometry, the final scene must be assembled based on color and z data generated for each pixel. Anti-aliasing and blending are done into the framebuffer for final render output in what NVIDIA calls the render output pipeline (ROP). Now that we have a general overview, let's take a look at the G70 itself.
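As a rough illustration of the stage order just described, here is a toy sketch in Python. Every function name is hypothetical, and the model is simplified on purpose: each vertex becomes a single fragment, where real hardware rasterizes triangles across many parallel pipelines. It only shows data moving from the vertex engine through the fragment engine to a ROP-style depth test.

```python
# Toy model of the stage order: vertex engine -> fragment engine -> ROP.
# All names are hypothetical; this is not how the hardware is implemented.

def vertex_stage(vertices, offset):
    """Manipulate each vertex with simple math (here, a translation)."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in vertices]

def fragment_stage(vertices, color):
    """Turn each processed vertex into one fragment: (px, py, depth, color)."""
    return [(int(x), int(y), z, color) for x, y, z in vertices]

def rop_stage(fragments, width, height):
    """Assemble the final frame from color and z data using a depth test."""
    depth = [[float("inf")] * width for _ in range(height)]
    frame = [[None] * width for _ in range(height)]
    for x, y, z, color in fragments:
        if 0 <= x < width and 0 <= y < height and z < depth[y][x]:
            depth[y][x] = z
            frame[y][x] = color
    return frame

# Two fragments land on the same pixel; the nearer one (smaller z) should win.
near = fragment_stage(vertex_stage([(0.0, 0.0, 1.0)], (1.0, 1.0, 0.0)), "red")
far = fragment_stage(vertex_stage([(1.0, 1.0, 5.0)], (0.0, 0.0, 0.0)), "blue")
frame = rop_stage(far + near, width=2, height=2)
print(frame[1][1])
```

The depth comparison in `rop_stage` stands in for the z operations the ROPs perform; blending and anti-aliasing are left out to keep the sketch short.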



The G70 GPU is quite a large IC. At 302 million transistors, we would certainly hope that NVIDIA packed enough power into the chip to match its size. The 110nm TSMC process will certainly help with die size, but that is still quite a few transistors. The actual die area is only slightly greater than that of NV4x. In fact, NVIDIA is able to fit the same number of ICs on a single wafer.



A glance at a block diagram of the hardware gives us a first look at the methods by which NVIDIA increased performance this time around.



The first thing to notice is that we now have 8 (up from 6) vertex pipelines. We still aren't vertex processing limited (except in the workstation market), but this 33% upgrade in vertex power will help to keep the extra pixel pipelines fed as well as handle any added vertex load developers try to throw at games in the near future. There are plenty of beautiful things that can be done with vertex shaders that we aren't seeing come about in games yet like parallax and relief mapping as well as extended use of geometry instancing and vertex texturing.

Moving on to pixel pipelines, we see a 50% increase in the number of pipelines packed under the hood. Each of the 24 pixel pipes is also more powerful than those of NV4x. We will cover just why that is a little later on. For now though, it is interesting to note that we do not see an increase in the 16 ROPs. These pipelines take the output of the fragment crossbar (which aggregates all of the pixel shader output) and finalize the rendering process. It is here that MSAA is performed, as well as the color and z/stencil operations. Not matching the number of ROPs to the number of pixel pipelines indicates that NVIDIA feels its fill rate and ability to handle current and near-future resolutions is not an issue that needs to be addressed in this incarnation of the GeForce. As NVIDIA's UltraShadow II technology is driven by the hardware's ability to handle twice as many z operations per clock when a z-only pass is performed, this also means that we won't see improved performance in this area.

If NVIDIA is correct in their guess (and we see no reason they should be wrong), we will see increasing amounts of processing being done per pixel in future titles. This means that each pixel will spend more time in the pixel pipeline. In order to keep the ROPs busy in light of a decreased output flow from a single pixel pipe, the ratio of pixel pipes to ROPs can be increased. This is in accord with the situation we've already described.
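The arithmetic behind these ratios can be sketched in a few lines of Python. The unit counts come from the figures quoted above (the NV4x pixel pipe count of 16 follows from the stated 50% increase to 24). The throughput function is a deliberately simple assumption, one finished pixel per pipe every `clocks_per_pixel` clocks, not a description of how the hardware actually schedules work.

```python
# Unit counts quoted in the article; names and structure are illustrative.
NV4X = {"vertex": 6, "pixel": 16, "rop": 16}
G70 = {"vertex": 8, "pixel": 24, "rop": 16}

def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100

def pixel_to_rop_ratio(chip):
    """Pixel pipelines available per ROP."""
    return chip["pixel"] / chip["rop"]

def shaded_pixels_per_clock(pipes, clocks_per_pixel):
    """Assumed model: each pipe finishes one pixel every clocks_per_pixel clocks."""
    return pipes / clocks_per_pixel

vertex_gain = percent_increase(NV4X["vertex"], G70["vertex"])
pixel_gain = percent_increase(NV4X["pixel"], G70["pixel"])
print(f"vertex pipelines: +{vertex_gain:.0f}%")
print(f"pixel pipelines:  +{pixel_gain:.0f}%")
print(f"pixel:ROP ratio:  {pixel_to_rop_ratio(NV4X)} -> {pixel_to_rop_ratio(G70)}")

# Under the assumed model, 24 pipes that average 1.5 clocks per pixel
# still deliver 16 pixels per clock, one for each of the 16 ROPs.
print(shaded_pixels_per_clock(G70["pixel"], 1.5))
```

In this simplified view, the longer each pixel spends being shaded, the more pixel pipes are needed to hand the same number of ROPs one pixel each per clock.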

The number of ROPs will need to go up as common resolutions increase, though this can also be mitigated by increases in frequency. We will also need more ROPs once there are enough pixel pipelines to saturate the fragment crossbar in spite of the increased time a pixel spends being shaded.
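To put rough numbers on how resolution and frequency trade off, the sketch below compares the pixel output a resolution demands with what a set of ROPs could write. Both the one-pixel-per-ROP-per-clock model and the clock value are assumptions chosen for illustration; the clock in particular is hypothetical and is not a G70 specification.

```python
# Illustrative only: HYPOTHETICAL_CLOCK_HZ is a made-up value, not a G70 spec.
HYPOTHETICAL_CLOCK_HZ = 400_000_000
ROPS = 16  # ROP count quoted in the article

def pixels_needed_per_second(width, height, fps, overdraw=1.0):
    """Pixels that must be written per second at a given resolution and frame rate."""
    return width * height * fps * overdraw

def rop_fill_rate(rops, clock_hz):
    """Assumed model: each ROP writes one pixel per clock."""
    return rops * clock_hz

low = pixels_needed_per_second(1600, 1200, fps=60)
high = pixels_needed_per_second(2048, 1536, fps=60)
scale = high / low
print(f"2048x1536 needs {scale:.4f}x the pixel output of 1600x1200")

# The same extra demand could be met by more ROPs or by a higher clock.
base = rop_fill_rate(ROPS, HYPOTHETICAL_CLOCK_HZ)
more_rops = rop_fill_rate(ROPS * 2, HYPOTHETICAL_CLOCK_HZ)
faster_clock = rop_fill_rate(ROPS, HYPOTHETICAL_CLOCK_HZ * 2)
print(more_rops == faster_clock, more_rops / base)
```

Real workloads add overdraw, anti-aliasing samples and memory bandwidth limits on top of this, so the figures are only a way to see the scaling, not a performance estimate.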


127 Comments


  • BenSkywalker - Wednesday, June 22, 2005

    Derek-

    I wanted to offer my utmost thanks for the inclusion of 2048x1536 numbers. As one of the fairly sizeable group of owners of a 2070/2141 these numbers are enormously appreciated. As everyone can see 1600x1200x4x16 really doesn't give you an idea of what high resolution performance will be like. As far as the benches getting a bit messed up- it happens. You moved quickly to rectify the situation and all is well now. Thanks again for taking the time to show us how these parts perform at real high end settings.
  • blckgrffn - Wednesday, June 22, 2005

    You're forgiven, by me anyway :) It is also the great editorial staff that makes Anandtech my homepage on every browser on all of my boxes!

    Nat
  • yacoub - Wednesday, June 22, 2005

    #72 - Totally agree. Some Rome: Total War benches are much needed, but primarily to see how the game's battle performance with large numbers of troops varies between AMD and Intel, more so than between NVIDIA and ATI, considering the game is currently highly CPU-limited, as I understand it.
  • DerekWilson - Wednesday, June 22, 2005

    Hi everyone,

    Thank you for your comments and feedback.

    I would like to personally apologize for the issues that we had with our benchmarks today. It wasn't just one link in the chain that caused the problems we had; many factors led to the results we had here today.

    For those who would like an explanation of what happened to cause certain benchmark numbers not to reflect reality, we offer you the following. Some of our SLI testing was done forcing multi-GPU rendering on for tests where there was no profile. In these cases, the default multi-GPU mode caused a performance hit rather than the increase we are used to seeing. The issue was especially bad in Guild Wars, and the SLI numbers have been removed from the offending graphs. Also, on one or two titles our ATI display settings were improperly configured. Our Windows monitor properties, ATI "Display" tab properties, and refresh rate override settings were mismatched. As a result, rather than push the display at the pixel clock we expected, ATI defaulted to a "safe" mode where the game is run at the resolution requested, but only part of the display is output to the screen. This resulted in abnormally high numbers in some cases at resolutions above 1600x1200.

    For those of you who don't care about why the numbers ran the way they did, please understand we are NOT trying to hide behind our explanation as an excuse.

    We agree completely that the more important issue is not why bad numbers popped up, but that bad numbers made it into a live article. For this I can only offer my sincerest of apologies. We consider it our utmost responsibility to produce quality work on which people may rely with confidence.

    I am proud that our readership demands a quality above and beyond the norm, and I hope that that never changes. Everything in our power will be done to assure that events like this will not happen again.

    Again, I do apologize for the erroneous benchmark results that went live this morning. And thank you for requiring that we maintain the utmost integrity.

    Thanks,
    Derek Wilson
    Senior CPU & Graphics Editor
    AnandTech.com
  • Dmitheon - Wednesday, June 22, 2005

    I have to say, while I am extremely pleased with nVidia doing a real launch, the product leaves me scratching my head. They priced themselves into an extremely small market, and effectively made their 6800 series the second tier performance cards without really dropping the price on them. I'm not going to get one, but I do wonder how this will affect the company's bottom line.
  • OrSin - Wednesday, June 22, 2005

    I'm not trying to be a butthole, but can we get a benchmark that's an RTS game? I see 10+ game benchmarks and most are FPS; the few that are not might as well be. Those RPGs seem to use a similar type of engine.
  • stmok - Wednesday, June 22, 2005

    To CtK's question: Nope, SLI doesn't work with dual-display. (Last I checked, Nvidia got 2D working, but NO 3D)... Rumours say it's a driver issue, and Nvidia is working on it.

    I don't know any more than that. I think I'd rather wait until Nvidia are actually demonstrating SLI with dual or more displays, before I lay down any money.
  • yacoub - Wednesday, June 22, 2005

    #60 - it's already to the point where it's turning people off to PC gaming, thus damaging the company's own market of buyers. It's just going to move more people to consoles, because even though PC games are often better games and much more customizable and editable, that only means so much and the trade-off versus price to play starts to become too imbalanced to ignore.
  • jojo4u - Wednesday, June 22, 2005

    What about the AF setting? Do I understand correctly that it was set to 8x when AA was set to 4x?
  • Rand - Wednesday, June 22, 2005

    I have to say I'm rather disappointed in the quality of the article. A number of apparently nonsensical benchmark results, with little to no analysis of most of the results.

    A complete lack of any low level theoretical performance results, and no attempts to measure any improvements in efficiency or what may have caused such improvements.

    Temporal AA is only tested on one game, with image quality examined in only one scene. Given how dramatically different games and genres utilize alpha textures, you're providing us with an awfully limited perspective of its impact.
