The R420 Vertex Pipeline

The point of the vertex pipeline in any GPU is to take geometry data, manipulate it if needed (with either fixed-function processing or a vertex shader program), and project all of the 3D data in a scene to two dimensions for display. It is also possible to eliminate unnecessary data from the rendering pipeline to cut out useless work (via view volume clipping and backface culling). After the vertex engine is done processing the geometry, all the 2D projected data is sent to the pixel engine for further processing (such as texturing and fragment shading).
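The two core jobs described above can be illustrated with a minimal Python sketch: perspective projection of a 3D point onto a 2D plane, and backface culling via the winding order of the projected triangle. All names and values here are illustrative toys, not ATI's actual hardware interfaces.

```python
def project(v, focal=1.0):
    """Perspective-project a 3D point (x, y, z) onto the z = focal plane."""
    x, y, z = v
    return (focal * x / z, focal * y / z)

def is_backfacing(p0, p1, p2):
    """A triangle whose projected vertices wind clockwise faces away
    from the camera and can be culled before any pixel work is done."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    # Signed area of the 2D triangle: negative or zero => clockwise winding.
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    return area <= 0

# A triangle facing the camera (counter-clockwise after projection):
tri = [(-1.0, -1.0, 2.0), (1.0, -1.0, 2.0), (0.0, 1.0, 2.0)]
projected = [project(v) for v in tri]
print(is_backfacing(*projected))  # False: this triangle is kept
```

Culling here happens per triangle, before rasterization, which is exactly why it saves work: the pixel engine never sees the discarded geometry.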

The vertex engine of R420 includes six vertex pipelines in total (R3xx has four). This gives R420 a 50% increase in peak per-clock vertex shader throughput.

Looking inside an individual vertex pipeline, not much has changed from R3xx. The vertex pipeline is laid out exactly the same, including a 128-bit vector math unit and a 32-bit scalar math unit. The major upgrade over R3xx is that R420 can now compute a SINCOS instruction in a single clock cycle. Previously, if a developer requested the sine or cosine of a number in a vertex shader program, R3xx would compute a Taylor series approximation of the answer (which takes multiple cycles to complete). The adoption of a single-cycle SINCOS instruction by ATI is a smart move, as trigonometric computations are useful for implementing functionality and effects attractive to developers. As an example, developers could manipulate the vertices of a surface with SINCOS to add ripples and waves (such as those seen in bodies of water). Sine and cosine computations are also useful in more basic geometric manipulation. Overall, single-cycle SINCOS computation is a welcome addition to R420.
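The contrast between the two approaches can be sketched in Python: a Taylor series sine needs one multiply-add chain per term (roughly what R3xx iterates through), while the ripple effect simply calls sine directly (standing in for R420's single-cycle SINCOS). The function names and the water-ripple parameters are illustrative assumptions, not anything from ATI's documentation.

```python
import math

def taylor_sin(x, terms=5):
    """Approximate sin(x) via its Taylor series:
    sin(x) = x - x^3/3! + x^5/5! - ...
    Each extra term costs more multiply-adds, which is why an iterative
    approximation like this takes multiple cycles to converge."""
    result = 0.0
    for n in range(terms):
        result += (-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
    return result

def ripple(vertices, amplitude=0.1, frequency=4.0, time=0.0):
    """Vertex-shader-style water effect: offset each vertex's height by a
    sine of its distance from the origin (math.sin stands in for a
    hardware SINCOS here)."""
    out = []
    for x, y, z in vertices:
        d = math.hypot(x, z)
        out.append((x, y + amplitude * math.sin(frequency * d + time), z))
    return out

print(abs(taylor_sin(1.0) - math.sin(1.0)) < 1e-4)  # True
```

Note that the Taylor approximation is only accurate near zero; a real implementation would range-reduce the input first, adding yet more work per sine.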

So how does ATI's new vertex pipeline layout compare to NV40? On a major hardware "black box" level, ATI lacks the vertex texture unit featured in NV40 that is required for Shader Model 3.0's vertex texturing support. Vertex texturing allows developers to easily implement any effect that benefits from letting texture data manipulate geometry (such as displacement mapping). The other major difference between R420 and NV40 is feature set support. As has been widely discussed, NV40 supports Shader Model 3.0 and all the bells and whistles that come along with it. R420's feature set can be described as an extended version of Shader Model 2.0, offering a few features above and beyond the R3xx line (including support for longer shader programs and more registers).
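What vertex texturing enables can be sketched as follows: a height texture is fetched per vertex and used to push that vertex along its normal, which is the essence of displacement mapping. The heightmap, mesh data, and function names below are illustrative toys, not any real API.

```python
# A tiny 3x3 height texture with a peak in the middle:
heightmap = [
    [0.0, 0.2, 0.0],
    [0.2, 1.0, 0.2],
    [0.0, 0.2, 0.0],
]

def sample_height(u, v):
    """Nearest-neighbor fetch from the height texture (u, v in [0, 1])."""
    rows, cols = len(heightmap), len(heightmap[0])
    i = min(int(v * rows), rows - 1)
    j = min(int(u * cols), cols - 1)
    return heightmap[i][j]

def displace(position, normal, uv, scale=0.5):
    """Move a vertex along its normal by the sampled height -- the kind
    of per-vertex texture fetch that requires vertex texturing support."""
    h = sample_height(*uv) * scale
    return tuple(p + n * h for p, n in zip(position, normal))

# Center vertex of a flat grid, normal pointing up:
print(displace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5)))
# -> (0.0, 0.5, 0.0): the heightmap's peak raised it by 1.0 * 0.5
```

Without a vertex texture unit, the fetch in `displace` has no hardware path on R420; developers must bake the displacement into vertex data ahead of time instead.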

What all this boils down to is that we are only seeing a slight massaging of the hardware from R300 to R420. We would probably see many more changes if we were able to peer deeper under the hood. From a functionality standpoint, it is sometimes hard to see where performance comes from, but (as we will see even more in the pixel pipeline) as graphics hardware evolves into multiple tiny CPUs laid out in parallel, performance will be affected by factors traditionally only spoken of in CPU analysis and reviews. The total number of internal pipeline stages (rather than our high-level, functionality-driven pipeline), cache latencies, the size of the internal register file, the number of instructions in flight, the number of cycles an instruction takes to complete, and branch prediction will all come heavily into play in the future. In fact, this review marks the true beginning of where we will see these factors (rather than general functionality and "computing power") determine the performance of a generation of graphics products. But, more on this later.

After leaving the vertex engine portion of R420, data moves into the setup engine. This section of the hardware takes the 2D projected data from the vertex engine, generates triangles and point sprites (particles), and partitions the output for use in the pixel engine. The triangle output is divided up into tiles, each of which is sent to a block of four pixel pipelines (called a quad pipeline by ATI). These tiles are simply square blocks of projected pixel data, and have nothing to do with "tile based rendering" (front-to-back rendering of small portions of the screen at a time) as seen in PowerVR's Kyro series of GPUs.
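The tiling step above can be sketched as a simple partitioning problem: cut the screen into small square tiles and hand each tile to one of the quad pipelines. The tile size and the round-robin assignment policy below are assumptions for illustration, not ATI's documented scheduling scheme.

```python
TILE_SIZE = 16   # assumed tile dimension in pixels
NUM_QUADS = 4    # e.g. 16 pixel pipelines organized as 4 quad pipelines

def assign_tiles(width, height):
    """Map each screen tile to a quad pipeline, round-robin by tile index."""
    tiles_x = (width + TILE_SIZE - 1) // TILE_SIZE  # ceiling division
    tiles_y = (height + TILE_SIZE - 1) // TILE_SIZE
    assignment = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            tile_index = ty * tiles_x + tx
            assignment[(tx, ty)] = tile_index % NUM_QUADS
    return assignment

# A 64x32 region yields 4x2 = 8 tiles, spread evenly over the 4 quads:
work = assign_tiles(64, 32)
print(len(work))  # 8
```

The point of the partitioning is load balancing: each quad pipeline works on its own independent square of screen space, so the four blocks can run in parallel without coordinating per pixel.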

Now we're ready to see what happens on the per-pixel level.

95 Comments

  • adntaylor - Tuesday, May 4, 2004 - link

I wish they'd also tested with an nForce3 motherboard. nVidia have managed some very interesting performance enhancements on the AGP to HT tunnel that only work with nVidia graphics cards. That might have pushed the 6800 in front - who knows!
  • UlricT - Tuesday, May 4, 2004 - link

    Hey... Though the review rocks, you guys desperately need an editor for spelling and grammar!
  • Jeff7181 - Tuesday, May 4, 2004 - link

    This pretty much settles it. With the excellent comparison between architectures, and the benchmark scores to prove the advantages and disadvantages of each architecture... my next card will be made by ATI.
    NV40 sure has a lot of potential, one might say it's ahead of its time, supporting SM 3.0 and being so programmable. However, with a product cycle of 6 months to a year, being ahead of its time is more of a disadvantage in this case. People don't care what it COULD do... people care what it DOES do... and the R420 seems to do it better. I just hope my venture into the world of ATI doesn't turn into driver hell.
  • NullSubroutine - Tuesday, May 4, 2004 - link

    I'm a fanboy for neither company, and objectively I can say the cards are equal. In some games the ATI cards are faster, in other games the Nvidia cards are faster. So which one is better depends on the game you play and the price of the card you are looking for. (Hmm, maybe motherboard companies could make 2 AGP slots...)

    About the argument of the PS 2.0/3.0...

    2.0 cards will be able to play games with 3.0; they may not have full functionality or they may run it slower. This remains to be seen until games begin to use 3.0. However...

    The one bad thing for Nvidia in my eyes is the pixel shader quality that can be seen in Farcry; whether this is a game or driver glitch is still unknown.

    I forgot to add that I like that the ATI cards use less power; I don't want to have to pay for another PSU on top of the already high prices of video cards. I would also like to see a review again a month from now when newer drivers come out, to see how much things have changed.
  • l3ored - Tuesday, May 4, 2004 - link

    pschhh, did you see the unreal 3 demo? in the video i saw, it looked like it ran at about 5fps. imagine running halo on a gfx 5200. however you could run it if you were to turn off halo's PS 2 effects. i think that's how it's going to be with unreal 3
  • Slaanesh - Tuesday, May 4, 2004 - link

    Since PS 3.0 is not supported by the X800 hardware, does this mean that those extremely impressive graphical features shown in the Unreal 3 tech demo (NV40 launch) and the soon-to-be-released good-looking PS 3.0 Far Cry update are both NOT playable on the X800?? This would be a huge disadvantage for ATi, since a lot of the upcoming top games will support PS3.0!
  • l3ored - Tuesday, May 4, 2004 - link

    i agree phiro, personally i think im gonna get the one that hits $200 first (may be a while)
  • Phiro - Tuesday, May 4, 2004 - link

    Hearing about the 6850 and the other Emergency-Extreme-Whatever 6800 variants that are floating about irritates me greatly. Nvidia, you are losing your way!

    Instead of spending all that time, effort and $$ just to try to take the "speed champ" title, make your stuff that much cheaper instead! If your 6800 Ultra was $425 instead of $500, that would give you a hell of a lot more market share and $$ than a stupid Emergency Edition of your top end cards... We laugh at Intel for doing it, and now you're doing it too, come fricking on...
  • gordon151 - Tuesday, May 4, 2004 - link

    #14, I think it has more to do with the fact those OpenGL benchmarks are based on a single engine that was never fast on ATI hardware to begin with.
  • araczynski - Tuesday, May 4, 2004 - link

    12: personally i think the TNT line was better than the Voodoo line. I think they bought them out only to get rid of the competition, which was rather pointless because i think they would have died out sooner or later anyway, since nvidia was just better. I would guess that perhaps they bought them out because that gave them patent rights and they wouldn't have to worry about being sued for possibly copying some of the technology :)
