The Pipeline Overview

First, let us take a moment to run through NVIDIA's architecture in general. DirectX and OpenGL commands, along with HLSL and GLSL shaders, are translated and compiled for the target architecture. Commands and data are then sent to the hardware, where numbers, instructions and artwork are turned into a rendered frame.

The first major stop along the way is the vertex engine, where geometry is processed. Vertices can be manipulated using math and texture data, and the output of the vertex pipelines is passed down the line to the fragment (or pixel) engine. Here, each pixel covered by the geometry is shaded based on the output of the vertex engine. After the pixels have been processed for all of the geometry, the final scene must be assembled from the color and z data generated for each pixel. Anti-aliasing and blending into the framebuffer for final render output happen in what NVIDIA calls the render output pipeline (ROP). Now that we have a general overview, let's take a look at the G70 itself.
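To make the flow concrete, here is a deliberately tiny Python sketch of the three stages described above. It is a toy model built on our own assumptions, not how NVIDIA's hardware or drivers actually work: the hypothetical "vertex shader" only offsets positions, "rasterization" emits one fragment per vertex, and the ROP stage performs just a depth test and color write (no blending or anti-aliasing). All function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    z: float
    color: tuple


def vertex_stage(vertices, offset):
    # Hypothetical "vertex shader": translate each (x, y, z) position by an offset.
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for (x, y, z) in vertices]


def fragment_stage(points, color):
    # Toy "rasterizer + pixel shader": one flat-colored fragment per transformed point.
    return [Fragment(int(x), int(y), z, color) for (x, y, z) in points]


def rop_stage(fragments, width, height):
    # Toy ROP: depth-test each fragment and keep the nearest color per pixel.
    depth = [[float("inf")] * width for _ in range(height)]
    frame = [[(0, 0, 0)] * width for _ in range(height)]
    for f in fragments:
        if 0 <= f.x < width and 0 <= f.y < height and f.z < depth[f.y][f.x]:
            depth[f.y][f.x] = f.z
            frame[f.y][f.x] = f.color
    return frame


# Two "draw calls": a red strip at depth 0.5 and a nearer blue point at x = 1.
red = fragment_stage(vertex_stage([(0, 0, 0.5), (1, 0, 0.5)], (0, 0, 0)), (255, 0, 0))
blue = fragment_stage(vertex_stage([(0, 0, 0.0)], (1, 0, 0.2)), (0, 0, 255))
frame = rop_stage(red + blue, width=2, height=1)
print(frame)
```

Because the z test decides which fragment survives, submitting the blue draw before the red one produces the same final frame.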



The G70 GPU is quite a large IC: it weighs in at 302 million transistors, so we would certainly hope NVIDIA packed enough power into the chip to match its size. TSMC's 110nm process certainly helps with die size, but that is still quite a few transistors. Even so, the die area is only slightly greater than that of NV4x. In fact, NVIDIA is able to fit the same number of chips on a single wafer.



A glance at a block diagram of the hardware gives us a first look at the methods by which NVIDIA increased performance this time around.



The first thing to notice is that we now have 8 vertex pipelines (up from 6). Games still aren't vertex-processing limited (except in the workstation market), but this 33% increase in vertex power will help keep the extra pixel pipelines fed, as well as handle any added vertex load developers try to throw at games in the near future. There are plenty of beautiful effects that vertex shaders make possible but that we have yet to see in games, such as parallax and relief mapping, along with more extensive use of geometry instancing and vertex texturing.

Moving on to the pixel pipelines, we see a 50% increase in the number of pipelines packed under the hood, from 16 to 24. Each of the 24 pixel pipes is also more powerful than those of NV4x; we will cover just why a little later on. For now, though, it is interesting to note that the number of ROPs stays at 16. These units take the output of the fragment crossbar (which aggregates all of the pixel shader output) and finalize the rendering process. This is where MSAA is performed, along with the color and z/stencil operations. Not matching the number of ROPs to the number of pixel pipelines indicates that NVIDIA does not see fill rate at current and near-future resolutions as an issue that needs to be addressed in this incarnation of the GeForce. Because NVIDIA's UltraShadow II technology is driven by the hardware's ability to handle twice as many z operations per clock during a z-only pass, and those operations happen in the unchanged ROPs, we won't see improved per-clock performance in this area either.
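The unit counts above can be tabulated with a little arithmetic (the 16 pre-G70 pixel pipelines follow from the stated 50% increase to 24). The baseline of one z operation per ROP per clock is our simplifying assumption; only the doubling during a z-only pass comes from the text. The point of the sketch is that z-only throughput depends on the ROP count alone, so keeping 16 ROPs leaves it unchanged per clock.

```python
# Unit counts as stated in the text (NV4x -> G70).
UNIT_COUNTS = {
    "vertex pipelines": (6, 8),
    "pixel pipelines": (16, 24),
    "ROPs": (16, 16),
}

# Assumption: in a normal pass each ROP performs one z operation per clock;
# UltraShadow II doubles that when only z is being written.
Z_OPS_PER_ROP_NORMAL = 1
Z_ONLY_MULTIPLIER = 2


def percent_increase(old: int, new: int) -> float:
    """Percentage change from the NV4x count to the G70 count."""
    return (new - old) / old * 100


def z_only_ops_per_clock(rops: int) -> int:
    """Peak z operations per clock in a z-only pass; depends only on the ROP count."""
    return rops * Z_OPS_PER_ROP_NORMAL * Z_ONLY_MULTIPLIER


for name, (old, new) in UNIT_COUNTS.items():
    print(f"{name}: {old} -> {new} ({percent_increase(old, new):+.0f}%)")

old_rops, new_rops = UNIT_COUNTS["ROPs"]
print("z-only ops/clock:", z_only_ops_per_clock(old_rops), "->", z_only_ops_per_clock(new_rops))
```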

If NVIDIA's guess is correct (and we see no reason it should be wrong), future titles will do increasing amounts of processing per pixel. This means that each pixel will spend more time in the pixel pipeline. To keep the ROPs busy despite the reduced output rate of each individual pixel pipe, the ratio of pixel pipes to ROPs can be increased. That is exactly the 24:16 arrangement we have just described.
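A back-of-the-envelope model shows why raising the pipe-to-ROP ratio makes sense. This sketch assumes every pixel costs the same fixed number of shader clocks, each pipe works on one pixel at a time, and each ROP accepts one pixel per clock; it ignores texture latency, the crossbar itself and memory bandwidth. The helper names are hypothetical.

```python
def shaded_pixels_per_clock(pixel_pipes: int, clocks_per_pixel: float) -> float:
    """Pixels leaving the shader array per clock, if each pipe retires one
    pixel every `clocks_per_pixel` clocks."""
    return pixel_pipes / clocks_per_pixel


def rop_utilization(pixel_pipes: int, rops: int, clocks_per_pixel: float) -> float:
    """Fraction of ROP capacity kept busy, assuming one pixel per ROP per clock."""
    return min(1.0, shaded_pixels_per_clock(pixel_pipes, clocks_per_pixel) / rops)


# Compare a 24:16 layout with a hypothetical 16:16 layout as shaders get longer.
for cpp in (1, 1.5, 2, 4, 8):
    print(f"{cpp} clk/pixel: 24:16 -> {rop_utilization(24, 16, cpp):.0%}, "
          f"16:16 -> {rop_utilization(16, 16, cpp):.0%}")
```

Under these assumptions, the 24-pipe, 16-ROP configuration keeps the ROPs fully fed until shading takes more than 1.5 clocks per pixel, while a 16:16 layout drops below full utilization as soon as a pixel needs more than one clock.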

ROP throughput will need to rise as common resolutions increase, although higher clock speeds can offset some of that demand. We will also need more ROPs once there are enough pixel pipelines to saturate the fragment crossbar, even with each pixel spending more time being shaded.
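The resolution and clock-speed trade-off can be sketched the same way. The core clock, frame rate and overdraw factor below are placeholders chosen for illustration, not G70 specifications, and the model counts only one color write per pixel (no MSAA samples, blending cost or memory bandwidth limits).

```python
def required_fill_rate(width: int, height: int, fps: float, overdraw: float = 1.0) -> float:
    """Pixels per second the ROPs must write to sustain `fps` at this resolution."""
    return width * height * fps * overdraw


def rop_fill_rate(rops: int, core_clock_hz: float) -> float:
    """Peak ROP output, assuming one pixel per ROP per clock."""
    return rops * core_clock_hz


def headroom(width, height, fps, rops, core_clock_hz, overdraw=1.0):
    """How many times over the ROPs could cover the required fill rate."""
    return rop_fill_rate(rops, core_clock_hz) / required_fill_rate(width, height, fps, overdraw)


CLOCK_HZ = 400e6  # placeholder core clock, not a G70 spec
OVERDRAW = 3.0    # placeholder average overdraw
for w, h in ((1280, 1024), (1600, 1200), (2048, 1536)):
    print(f"{w}x{h}: {headroom(w, h, 60, 16, CLOCK_HZ, OVERDRAW):.1f}x headroom")
```

Headroom shrinks as resolution rises and grows linearly with clock speed, which is the sense in which higher frequencies can stand in for additional ROPs, up to a point.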

127 Comments

  • WaltC - Thursday, June 23, 2005

    I found this remark really strange and amusing:

    "It's taken three generations of revisions, augmentation, and massaging to get where we are, but the G70 is a testament to the potential the original NV30 design possessed. Using the knowledge gained from their experiences with NV3x and NV4x, the G70 is a very refined implementation of a well designed part."

    Oh, please...nV30 was so poor that it couldn't even run at its factory speeds without problems of all kinds--which is why nVidia officially cancelled nV30 production after shipping a mere few thousand units. JHH, nVidia's CEO went on record saying, "nV30 was a failure" [quote, unquote] at the time. nV30 was not the foundation for nV40, let alone the G70.

    Indeed, if anything could be said to be foundational for both nV40 and G70, it would be ATi's R3x0 design of 2002. G70, imo, has far more in common with R300 than it does nV30. nV30, if you recall, was primarily a DX8 part with some hastily bolted on DX9-ish add-ons done in response to R300 (fully a DX9 part) which had been shipping for nine months prior to nV30 getting out of the door.

    In fact, ATi owes its meteoric rise to #1 in the 3d markets over the last three years precisely to the R3x0 products which served as the basis for its later R4x0 architectures. Good riddance to nV3x, I say.

    I'm always surprised at the short and selective memories displayed so often by tech writers--really makes me wonder, sometimes, whether they are writing tech copy for their readers or PR copy at the behest of specific companies, if you know what I mean.
  • JarredWalton - Thursday, June 23, 2005

    98 - As far as I know, the power was measured at the wall. We use a device called "Kill A Watt", and despite the rather lame name, it gives accurate results. It's almost impossible to measure the power draw of any single component without some very expensive equipment - you know, the stuff that AMD and Intel use for CPUs. So under load, the CPU and GPU (and RAM and chipset, probably) are using far more power than at idle.
  • PrinceGaz - Thursday, June 23, 2005

    I agree, starting at 1600x1200 for a card like this was a good idea. If your monitor can only do 1280x1024, you should consider getting a better one before buying a card like the 7800gtx. As a 2070/2141 owner myself, I know that a good monitor capable of high resolutions is a great investment that lasts a helluva lot longer than graphics cards, which are usually worthless after four or five years (along with most other components).

    I'm surprised that no one has moaned about the current lack of an AGP version, to go with their Athlon XP 1700+ or whatever ;)
  • Johnmcl7 - Thursday, June 23, 2005

    I think it was spot on to have 1600x1200 as the minimum resolution, given the power of these cards I think 1024x768, no AA/AF results for 3Dmark2003/2005 which have been thrown around are a complete waste of time.

    John
  • Frallan - Thursday, June 23, 2005

    Good review... And re: the NDA deadlines and the sleepless nights - don't sweat it if a few mistakes are published. The readers here have their heads screwed on the right way and will find the issues soon enough. And for everyone that does not do 12*16 or 15*20 the answer is simple - U Don't Need The Power!! Save your hard earnt money and get a 6800gt instead.
  • Calin - Thursday, June 23, 2005

    Maybe if you could save the game, change the settings and reload it you could obtain images from exactly the same positions. In one of the fence images, the distance to the fence is quite a bit different in different screenshots.
  • Calin - Thursday, June 23, 2005

    You had an 7800 SLI? I hate you all
    :p
  • xtknight - Thursday, June 23, 2005

    Edit: last post correction: actually 21-page report!
  • xtknight - Thursday, June 23, 2005

    Jeez...a couple spelling errors here and there...who cares? I'd like to see you type up a 12-page report and get it out the door in a couple days with no grammatical or spelling errors, especially when your main editor is gone. Remember that English study that showed the human brain interpreted words based on patterns and not spelling?

    I did read the whole review, word-for-word, with little to no trouble. There was not a SINGLE thing I had trouble comprehending. It's a better review than most sites have done which test lower resolutions. I love the non-CPU-limited benchmarks here.

    One thing that made me chuckle was "There is clearly a problem with the SLI support in Wolfenstein 3D". That MS-DOS game is in dire need of SLI. (It's abbreviated Wolfenstein: ET. Wolf3D is an oooold Nazi game.)
  • SDA - Thursday, June 23, 2005

    Derek or Jarred or Wesley or someone:

    Did you measure system power consumption as how much power the computer drew from the wall, or how much power the innards drew from the PSU?


    #95, it's a good thing you know enough about running a major hardware site to help them out with your advice! :-)