The R420 Vertex Pipeline

The point of the vertex pipeline in any GPU is to take geometry data, manipulate it if needed (with either fixed function processes or a vertex shader program), and project all of the 3D data in a scene to two dimensions for display. The vertex engine can also eliminate unnecessary data from the rendering pipeline to cut out useless work (via view volume clipping and backface culling). After the vertex engine is done processing the geometry, all the 2D projected data is sent to the pixel engine for further processing (like texturing and fragment shading).
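To make the projection and culling steps concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how R420 implements these stages: the function names, the camera convention (looking down the negative z axis), the counter-clockwise front-face rule, and the near-plane-only point rejection are all choices made for this example. Real view volume clipping trims triangles against every plane of the view volume.

```python
import math

def project(point, fov_deg=90.0, aspect=1.0, near=0.1):
    """Perspective-project a camera-space point (x, y, z) to 2D.

    Assumed convention: the camera looks down -z. Returns None when the
    point is not beyond the near plane (a crude stand-in for clipping).
    """
    x, y, z = point
    if z > -near:
        return None
    # Focal scale derived from the vertical field of view.
    f = 1.0 / math.tan(math.radians(fov_deg) / 2.0)
    depth = -z
    return (f * x / (aspect * depth), f * y / depth)

def is_backfacing(a, b, c):
    """True if the projected 2D triangle (a, b, c) is not wound counter-clockwise.

    Assumed convention: counter-clockwise triangles face the viewer.
    """
    # Twice the signed area of the triangle; the sign encodes the winding.
    area2 = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    return area2 <= 0.0
```

A triangle whose three vertices project successfully and which passes the `is_backfacing` check would move on to setup; everything else is dropped before any per-pixel work is spent on it.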

The vertex engine of R420 includes six vertex pipelines in total (R3xx has four). This gives R420 a 50% increase in peak vertex shader throughput per clock cycle.

Looking inside an individual vertex pipeline, not much has changed from R3xx. The pipeline is laid out exactly the same, including a 128-bit vector math unit and a 32-bit scalar math unit. The major upgrade in R420 is that it can now compute a SINCOS instruction in one clock cycle. Previously, if a developer requested the sine or cosine of a number in a vertex shader program, R3xx would actually compute a Taylor series approximation of the answer (which takes longer to complete). The adoption of a single-cycle SINCOS instruction is a very smart move by ATI, as trigonometric computations are useful in implementing functionality and effects attractive to developers. As an example, developers could manipulate the vertices of a surface with SINCOS in order to add ripples and waves (such as those seen in bodies of water). Sine and cosine computations are also useful in more basic geometric manipulation. Overall, single-cycle SINCOS computation is a welcome addition to R420.
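As a rough illustration of both ideas, the Python sketch below approximates sine with a truncated Taylor series and then uses it to ripple a set of vertices. It is only a sketch: the number of terms, the range reduction step, and the names `taylor_sin` and `ripple` are assumptions for this example, and nothing here reflects the actual expansion R3xx used. It does show why a series costs several multiply-add steps where a dedicated instruction would cost one.

```python
import math

def taylor_sin(x, terms=8):
    """Approximate sin(x) with the first `terms` terms of its Taylor series.

    The argument is first reduced to [-pi, pi] so the series converges quickly.
    """
    x = math.remainder(x, 2.0 * math.pi)
    term = x       # current series term, starting with x^1 / 1!
    total = x
    for n in range(1, terms):
        # Each term is the previous one times -x^2 / ((2n)(2n + 1)).
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
    return total

def ripple(vertices, t, amplitude=0.1, wavelength=2.0, speed=1.0):
    """Displace the y coordinate of each (x, y, z) vertex by a sine wave
    travelling along the x axis at time t."""
    k = 2.0 * math.pi / wavelength  # spatial frequency of the wave
    return [
        (x, y + amplitude * taylor_sin(k * x - speed * t), z)
        for x, y, z in vertices
    ]
```

Calling `ripple` once per frame with an increasing `t` makes the wave travel across a flat grid of vertices, which is the kind of water-surface effect described above.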

So how does ATI's new vertex pipeline layout compare to NV40? On a major hardware "black box" level, ATI lacks the vertex texture unit featured in NV40 that's required for Shader Model 3.0's vertex texturing support. Vertex texturing allows developers to easily implement any effect that would benefit from allowing texture data to manipulate geometry (such as displacement mapping). The other major difference between R420 and NV40 is feature set support. As has been widely discussed, NV40 supports Shader Model 3.0 and all the bells and whistles that come along with it. R420's feature set can be described as an extended version of Shader Model 2.0, offering a few features above and beyond the R3xx line (including support for longer shader programs, and more registers).
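To show what "texture data manipulating geometry" means in practice, here is a small Python sketch of displacement mapping. It is a simplified stand-in under stated assumptions: the height map is a plain list of rows, sampling is nearest-neighbor, texture coordinates are clamped to [0, 1], and the names `sample_nearest` and `displace` are invented for this example. It is not NV40's vertex texture unit or any particular API.

```python
def sample_nearest(height_map, u, v):
    """Fetch the texel nearest to (u, v), with both coordinates clamped to [0, 1]."""
    rows = len(height_map)
    cols = len(height_map[0])
    u = min(max(u, 0.0), 1.0)
    v = min(max(v, 0.0), 1.0)
    col = min(int(u * cols), cols - 1)
    row = min(int(v * rows), rows - 1)
    return height_map[row][col]

def displace(position, normal, uv, height_map, scale=1.0):
    """Push a vertex along its normal by the height sampled at its texture coordinate."""
    height = sample_nearest(height_map, uv[0], uv[1])
    return tuple(p + n * height * scale for p, n in zip(position, normal))
```

The key point is that the texture fetch happens per vertex, before projection, so the texture changes the shape of the surface rather than only its color.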

What all this boils down to is that we are only seeing something that looks like a slight massaging of the hardware from R300 to R420. We would probably see many more changes if we were able to peer deeper under the hood. From a functionality standpoint, it is sometimes hard to see where performance comes from, but (as we will see even more from the pixel pipeline) as graphics hardware evolves into multiple tiny CPUs all laid out in parallel, performance will be affected by factors traditionally only spoken of in CPU analysis and reviews. The total number of internal pipeline stages (rather than our high level, functionality driven pipeline), cache latencies, the size of the internal register file, the number of instructions in flight, the number of cycles an instruction takes to complete, and branch prediction will all come heavily into play in the future. In fact, this review marks the true beginning of where we will be seeing these factors (rather than general functionality and "computing power") determine the performance of a generation of graphics products. But, more on this later.

After leaving the vertex engine portion of R420, data moves into the setup engine. This section of the hardware takes the 2D projected data from the vertex engine, generates triangles and point sprites (particles), and partitions the output for use in the pixel engine. The triangle output is divided up into tiles, each of which is sent to a block of four pixel pipelines (called a quad pipeline by ATI). These tiles are simply square blocks of projected pixel data, and have nothing to do with "tile based rendering" (front to back rendering of small portions of the screen at a time) as was seen in PowerVR's Kyro series of GPUs.
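The partitioning step can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the 16-pixel tile size, the count of four quad pipelines, the interleaving rule in `assign_quad`, and the use of a triangle's bounding box instead of exact coverage. ATI's actual tile size and distribution policy are not described in this article.

```python
def tiles_for_bbox(min_x, min_y, max_x, max_y, tile_size=16):
    """Yield (tile_x, tile_y) for every square tile that overlaps an
    inclusive pixel bounding box."""
    for ty in range(min_y // tile_size, max_y // tile_size + 1):
        for tx in range(min_x // tile_size, max_x // tile_size + 1):
            yield tx, ty

def assign_quad(tile_x, tile_y, num_quads=4):
    """Hypothetical policy: interleave neighboring tiles across quad pipelines."""
    return (tile_x + 2 * tile_y) % num_quads

def distribute(bbox, tile_size=16, num_quads=4):
    """Group the tiles covered by a bounding box by the quad pipeline they go to."""
    work = {quad: [] for quad in range(num_quads)}
    for tx, ty in tiles_for_bbox(*bbox, tile_size=tile_size):
        work[assign_quad(tx, ty, num_quads)].append((tx, ty))
    return work
```

With these assumptions, a triangle covering a 32x32 pixel region touches four tiles, and each quad pipeline receives one of them, so all four blocks of pixel pipelines stay busy.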

Now we're ready to see what happens on the per-pixel level.


95 Comments


  • rms - Tuesday, May 4, 2004 - link

    "the near-to-be-released goodlooking PS 3.0 Far Cry update "

    When is that patch scheduled for? I recall seeing some rumour it was due in September...

    rms
  • Fr0zeN - Tuesday, May 4, 2004 - link

    Yeah I agree, the GT looks like it's gonna give the x800P a run for its money. On a side note, the differences between P and XT versions seem to be greater than r9800's, hmm.

    In the end it's the most overclockable $200 card that'll end up in my comp. There's no way I'm paying $500 for something that I can compensate for by turning the rez down to 10x7... Raw benchmarks mean nothing if it doesn't oc well!
  • Doop - Tuesday, May 4, 2004 - link

    The cards seem very close, I tend to favor nVidia now since they have superior multi monitor and professional 3D drivers and I regret buying my Fire GL X1.

    It's strange ATi didn't announce a 16 pipeline card orginally, it will be interesting to see in a month or two who actually ends up delivering cards.

    I mean if they're being made in significant quantities they'll be at your local store with a reduced 'street' price but if it's just a paper launch they'll just be at Alienware, Dell (with a new PC only) or $500 if you can find one.
  • jensend - Tuesday, May 4, 2004 - link

    #17, the Serious Engine has nothing to do with the Q3 engine; Nvidia's superior OpenGL performance is not dependent on any handful of engines' particular quirks.

    Zobar is right; contra Jibbo, the increased flexibility of PS3 means that for many 2.0 shader programs a PS3 version can achieve equivalent results with a lesser performance hit.

    As far as power goes, I'm surprised NV made such a big deal out of PSU requirements, as its new cards (except the 6800U Extremely Short Production Run Edition/6850U/Whatever they end up calling that part) compare favorably wattage-wise to the 5950U and don't pull all that much more power than the 9800XT. Both companies have made a big performance per watt leap, and it'll be interesting to see how the mid-range and value cards compare in this respect.
  • blitz - Tuesday, May 4, 2004 - link

    "Of course, we will have to wait and see what happens in that area, but depending on what the test results for our 6850 Ultra end up looking like, we may end up recommending that NVIDIA push their prices down slightly (or shift around a few specs) in order to keep the market balanced."

    It sounds as if you would be giving nvidia advice on their pricing strategy, somehow I don't think they would listen nor be influenced by your opinion. It could be better phrased that you would advise consumers to wait for prices to drop or look elsewhere for better price\performance ratio.
  • Cygni - Tuesday, May 4, 2004 - link

    Hmmmm, interesting. I really dont see where anyone can draw the conclusion that the x800 Pro is CLEARLY the winner. The 6800 GT and x800 Pro traded game wins back and forth. There doesnt seem to be any clear cut winner to me. Wolf, JediA, X2, F1C, and AQ3 all went clearly to the GT... this isnt open and shut. Alot of the other tests were split depending on resolution/AA. On the other hand, I dont think you can say that the GT is clearly better than the x800 Pro either.

    Personally, I will buy whichever one hits a reasonable price point first. $150-200. Both seem to be pretty equal, and to me, price matters far more.
  • kherman - Tuesday, May 4, 2004 - link

    BRING ON DOOM 3!!!!!!

    We all know inside that this is what ID was waiting for!
  • Diesel - Tuesday, May 4, 2004 - link

    ------------------
    I think it is strange that the tested X800XT is clocked at 520 Mhz, while the 6800U, that is manufactured by the same taiwanese company and also has 16 pipelines, is set at 400 Mhz.
    ------------------

    This could be because NV40 has 222M transistors vs. R420 at 160M transistors. I think the amount of power required and heat generated is proportional to transistor count and clock speed.
  • edub82 - Tuesday, May 4, 2004 - link

    I know this is an ATI article but that 6800 GT is looking very attractive. It beats the x800Pro on a fairly regular basis is a single slot and molex connector card and is starting at 400 and hopefully will go down a few dollars ;) in 6 months when i want to upgrade.
  • Slaanesh - Tuesday, May 4, 2004 - link

    "Clearly a developer can have much nicer quality and exotic effects if he/she exploits these, but how many gamers will have a PS3.0 card that will run these extremely complex shaders at high resolutions and AA/AF without crawling to single-digit fps? It's my guess that it will be *at least* a year until games show serious quality differentiation between PS2.0 and PS3.0. But I have been wrong in the past..."
    --------

    I dunnow.. When Morrowind got released, only the few GF3 cards on the market were able to show the cool pixel shader water effects and they did it well; at that time I was really pissed I went for the cheaper Geforce2 Ultra although it had some better benchmarks at a much lower price. I don't think I want to make that mistake again and pay the same amount of money for a card that doesnt support the latest technology..
