The Pixel Pipe Performance Picture

The ultimate goal of graphics hardware is to determine the color of every visible pixel. From this unassuming end extends a vast array of operations that need to be performed to get the job done. As the demand for ever-increasing graphics quality asserts itself on the industry, more and more work needs to be done in graphics hardware rather than in software on the CPU. All the work that ends up being done on a per-pixel basis makes up what is known as the pixel pipeline. To draw on an oft-used analogy in computer engineering, this is basically the assembly line of a pixel.

One of the more fortunate aspects of computer graphics is that the color of one pixel can be determined completely independently of any other pixel (though NVIDIA chooses to work internally on four-pixel units called "quads"), so the work parallelizes extremely well. With enough processing power, we could actually process every single pixel on the screen at the same time. Going to such extremes is not an option today (I wonder where we'll be in another decade or two), but current graphics cards are able to process multiple pixels at a time. Just how many pixels can be rendered in parallel is described by the "width" of the architecture.

Behind NV3x is a 4x2 pixel pipe (though there was some confusion over this, which we will get to later). This means that NV3x based cards could draw 4 pixels with 2 textures per pixel at a time. Texturing a pixel involves mapping a position on a surface to a value (usually a color) in a texture map; in a two texture per pixel architecture, this lookup can be performed with two different textures at the same time. In contrast, ATI's R300 architecture is 8x1, meaning that 8 pixels with 1 texture per pixel can be drawn at a time. Unfortunately for NVIDIA, in single texture environments NV3x could still draw a maximum of only four pixels per clock.




The layout of the architecture (inasmuch as it appears to software) is 4 pixel shader units, each with two texture units. Maximum texture fill rate is twice the maximum number of pixels per second the card can draw, which means that a lot of power goes to waste when only one texture is being used per surface.
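The fill rate relationship described above can be sketched with a little arithmetic. This is only an illustrative model: the function name and the 400 MHz clock are assumptions chosen to make the numbers easy to read, not figures from the article, and both architectures are given the same clock purely to isolate the effect of pipe width.

```python
def fill_rates(pixel_pipes, textures_per_pipe, clock_mhz):
    """Return (pixel_fill, texel_fill) in millions per second.

    Simplified model: every pipe outputs one pixel per clock and every
    texture unit performs one texture lookup per clock.
    """
    pixel_fill = pixel_pipes * clock_mhz
    texel_fill = pixel_pipes * textures_per_pipe * clock_mhz
    return pixel_fill, texel_fill


CLOCK_MHZ = 400  # hypothetical clock, used for both parts for comparison only

nv3x_pixels, nv3x_texels = fill_rates(4, 2, CLOCK_MHZ)  # 4x2 layout
r300_pixels, r300_texels = fill_rates(8, 1, CLOCK_MHZ)  # 8x1 layout

print("4x2:", nv3x_pixels, "Mpixels/s,", nv3x_texels, "Mtexels/s")
print("8x1:", r300_pixels, "Mpixels/s,", r300_texels, "Mtexels/s")
```

Under these assumptions both layouts reach the same peak texture rate, but the 4x2 layout delivers only half the pixel rate whenever a surface uses a single texture, because its second texture unit per pipe sits idle.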

The decision NVIDIA made for NV3x makes sense considering that many effects in fixed function and early programmable hardware (the DirectX 7 and 8 timeframe) were better suited to creating and applying multiple textures to a surface (this is called multitexturing for obvious reasons). Light maps, environment maps and cube maps (reflections), and bump maps, applied in addition to the traditional color map, are all examples of ways developers can exploit multitexturing to add realism to their environments.

Most multitexturing effects can be done using vertex and pixel shader programs. Shader programs offer a higher degree of control to developers and artists and can eliminate the need for multitexturing at the same time. While this is fortunate for developers, artists, end users, and ATI, the NV3x architecture is not suited to the current climate, and thus its real world performance falls much further short of its theoretical maximum than NVIDIA would like.

When moving to NV40 from the NV3x architecture, more of a focus was placed on single texturing while enhancing the internal performance of the vertex and pixel shaders. This was done by essentially quadrupling the number of pixel shader pipelines while only doubling the capacity of the GPU to handle textures (making it a 16x1 architecture). On NV40, maximum pixel and texture fill rates are the same, leading to a more balanced use of hardware in real world conditions. When handling multitexturing, NV40 can also run in an 8x2 mode where half of the pipeline is dedicated to each texture. In this multitexture mode, NV40's texture fill rate is the same as in its single texture mode, while its pixel fill rate is halved.
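To make the "quadruple the pixels, double the textures" comparison concrete, here is a minimal per-clock sketch of the modes named above. The `PipeMode` class and its field names are hypothetical, introduced only for this illustration; the pixel and texture counts come straight from the 4x2, 16x1 and 8x2 descriptions in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipeMode:
    """A pixels-by-textures configuration, measured per clock."""
    pixels_per_clock: int
    textures_per_pixel: int

    @property
    def texels_per_clock(self):
        # Peak texture lookups per clock across the whole chip.
        return self.pixels_per_clock * self.textures_per_pixel


NV3X = PipeMode(4, 2)          # 4x2
NV40_SINGLE = PipeMode(16, 1)  # 16x1, single texture mode
NV40_MULTI = PipeMode(8, 2)    # 8x2, multitexture mode

for name, mode in [("NV3x 4x2", NV3X),
                   ("NV40 16x1", NV40_SINGLE),
                   ("NV40 8x2", NV40_MULTI)]:
    print(name, mode.pixels_per_clock, "pixels/clock,",
          mode.texels_per_clock, "texels/clock")
```

In this model NV40 always has 16 texture lookups per clock available; switching to 8x2 simply trades half of the pixel throughput for a second texture on each remaining pixel.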



Aside from color and texturing, 3D graphics cards also need to deal with the third dimension: depth "into" the screen. This depth, or z, value keeps track of how near or far a pixel on a surface is from the viewer. If at any point in the pipeline something is determined to be "behind" something else, it can be thrown out or turned off (this is known as occlusion culling). One of the best ways to enhance performance in 3D graphics is to do less work, and the key is knowing what not to do. Calculating and tracking z values is a key part of eliminating work.

NVIDIA's architectures can handle stenciling in the same bit of hardware that handles z operations. Stenciling is difficult to explain, but it may be easier to grasp by looking at a simplified explanation of a common application: shadowing. Shadows can be implemented by "rendering" z values as viewed from a light source. Anything that gets turned off (because it is behind something) from the perspective of the light source is shadowed, and can remain off when rendering the scene from the perspective of the viewer (who will see a shadow wherever the light's pixels were turned off). Doing "good" shadowing is much more complicated than this, but that's the general idea.
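The basic idea of using z values to avoid work can be sketched as a toy depth buffer. This is a software illustration only, not how any particular GPU implements it; the function name, the fragment tuple layout, and the convention that a smaller z means closer to the viewer are all assumptions made for the example.

```python
def depth_test(fragments, width, height):
    """Resolve fragments against a depth buffer.

    Each fragment is (x, y, z, color); smaller z is assumed to be nearer.
    Returns the color buffer and the number of fragments rejected because
    something nearer had already been drawn at that pixel.
    """
    far = float("inf")
    depth = [[far] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    rejected = 0
    for x, y, z, c in fragments:
        if z < depth[y][x]:
            # Nearer than anything drawn so far: keep it.
            depth[y][x] = z
            color[y][x] = c
        else:
            # Hidden behind an earlier fragment: no further work needed.
            rejected += 1
    return color, rejected


frags = [
    (0, 0, 5.0, "red"),
    (0, 0, 2.0, "blue"),   # nearer than red, replaces it
    (0, 0, 9.0, "green"),  # farther than blue, thrown away
]
image, skipped = depth_test(frags, width=2, height=1)
print(image[0][0], skipped)
```

Note that the red fragment was still fully processed before being overwritten. Drawing nearer surfaces first, or laying down depth alone in an initial pass, lets more fragments be rejected before any expensive coloring is done.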

In both NV3x and NV40 architectures, z and color can be calculated per pixel at the same time. In addition, rather than coloring a pixel, a z or stencil operation can be performed in the color unit. This allows NV3x to perform 8 z or stencil ops per clock and NV40 to perform 32 z or stencil ops per clock. NVIDIA has started to call this "8x0" and "32x0", respectively, as no new pixels are drawn. This mode is very useful if a z only pass is performed first, or if stencil shadows are used (as is the case with Doom 3).
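The "8x0" and "32x0" figures follow from a simple count, sketched below. The function name is hypothetical, and the model assumes each pixel pipe contributes exactly two z or stencil operations per clock when color output is skipped: one from its z unit and one from its color unit, as described above.

```python
def z_stencil_ops_per_clock(pixel_pipes):
    """Peak z/stencil operations per clock when no color is written.

    Assumes each pipe's color unit is reused for a second z or stencil
    operation alongside its dedicated z unit.
    """
    ops_per_pipe = 2  # one z unit plus one repurposed color unit
    return pixel_pipes * ops_per_pipe


print("NV3x:", z_stencil_ops_per_clock(4))   # 4 pixel pipes
print("NV40:", z_stencil_ops_per_clock(16))  # 16 pixel pipes
```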

Of course, there is more to graphics performance than how many pixel pipes are under the hood. There were other reasons NV3x performance wasn't what it could have been, not the least of which was the internal layout of the vertex and pixel shaders.

  • PrinceGaz - Tuesday, April 20, 2004 - link

    Maybe they've been saving this article up until after the NV40 was launched? I must admit I subconciously substituted postmortem myself when reading it.

    Well put together article Derek, it clearly explains some of the problems that led to the entire GeForce FX (NV3x) range of cards always being in second-place behind equivalent R3xx cards (the NV34 core FX5200 had no R3xx based competitor).

    I'm surprised no mention was made of the FP16/FP32 performance issue with the NV3x core and the consequences that is having on image quality, and also how its inferior ordered-grid anti-aliasing couldn't compare with the rotated-grid used in the R3xx.
  • Phiro - Tuesday, April 20, 2004 - link

    wahahahaa a "moratorium" hahaha

    You guys need to quit looking up big words that sound like the word you really mean to use, and actually grow some vocabulary.

    How about "postmortem"? Do you even know what the word "moratorium" means?
  • Cybercat - Tuesday, April 20, 2004 - link

    Nice work, you made it user friendly, so even I can understand it! lol, when it comes to explaining graphics architectures to me, that's no small feat.
  • newuser12 - Monday, April 19, 2004 - link

    this article is depressing, being as I have an mx-440....... :(
  • ZobarStyl - Monday, April 19, 2004 - link

    By this hardware I meant the x800 sorry.
  • ZobarStyl - Monday, April 19, 2004 - link

    Thank you, Derek, for politely avoiding going into the current speculation war about the new hardware...frankly no one but ATI knows how this hardware is going to hold up against the 6800, and it's sad that people on both sides have already pronounced winners and losers. A good article; it sheds light on these dark times for nVidia. I'm no fanboy...frankly I'm most swayed by their quality driver support rather than the sheer speed factor.
  • Modal - Monday, April 19, 2004 - link

    Thanks for this elucidating article; I find articles of this type (that is "this is what your hardware is doing and why") very interesting.
  • Regs - Monday, April 19, 2004 - link

    lol. I just find it funny you wait until now to write this article. But then again, you would likely have a better understanding when we could compare it to the NV40.
