The R420 Vertex Pipeline

The point of the vertex pipeline in any GPU is to take geometry data, manipulate it if needed (with either fixed function processes, or a vertex shader program), and project all of the 3D data in a scene to 2 dimensions for display. It is also possible to eliminate unnecessary data from the rendering pipeline to cut out useless work (via view volume clipping and backface culling). After the vertex engine is done processing the geometry, all the 2D projected data is sent to the pixel engine for further processing (like texturing and fragment shading).
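To make the projection and culling steps concrete, here is a minimal Python sketch of what a vertex stage conceptually does. It is not how R420 implements it in hardware. The function names, the camera looking down the negative z axis, the 90 degree field of view, and the counter-clockwise front-face convention are all assumptions chosen for illustration, and view volume clipping is left out for brevity.

```python
import math

def project_vertex(x, y, z, fov_deg=90.0, aspect=1.0):
    """Perspective-project a camera-space point to 2D (camera looks down -z)."""
    # Focal scale derived from the (assumed) vertical field of view.
    f = 1.0 / math.tan(math.radians(fov_deg) / 2.0)
    # Perspective divide: points farther away land closer to the screen centre.
    return (f / aspect) * x / -z, f * y / -z

def is_backfacing(p0, p1, p2):
    """True if a projected triangle winds clockwise (assumed to face away)."""
    # Twice the signed area of the 2D triangle; its sign gives the winding.
    area2 = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])
    return area2 <= 0.0
```

A triangle that fails the winding test can be dropped before any pixel work is done on it, which is the kind of wasted effort backface culling is meant to avoid.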

The vertex engine of R420 includes six vertex pipelines in total (R3xx has four). This gives R420 a 50% increase in peak vertex shader throughput per clock cycle.

Looking inside an individual vertex pipeline, not much has changed from R3xx. The vertex pipeline is laid out exactly the same, including a 128-bit vector math unit and a 32-bit scalar math unit. The major upgrade from R3xx is that R420 can now compute a SINCOS instruction in one clock cycle. Previously, if a developer requested the sine or cosine of a number in a vertex shader program, R3xx would actually compute a Taylor series approximation of the answer (which takes longer to complete). The adoption of a single-cycle SINCOS instruction by ATI is a very smart move, as trigonometric computations are useful in implementing functionality and effects attractive to developers. As an example, developers could manipulate the vertices of a surface with SINCOS in order to add ripples and waves (such as those seen in bodies of water). Sine and cosine computations are also useful in more basic geometric manipulation. Overall, single-cycle SINCOS computation is a welcome addition to R420.
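As a rough illustration of why a series approximation costs more than a single instruction, the Python sketch below evaluates sine as a short Taylor series, then uses a sine to ripple a set of vertices. The number of terms, the range reduction step, and the `ripple` parameters are assumptions for the example; they do not describe the actual instruction sequence R3xx ran.

```python
import math

def taylor_sin(x, terms=5):
    """Approximate sin(x) with the first `terms` terms of its Taylor series."""
    # Range-reduce to [-pi, pi] so a short series stays reasonably accurate.
    x = math.remainder(x, 2.0 * math.pi)
    total, term = 0.0, x
    for n in range(terms):
        total += term
        # Each new term is the previous one times -x^2 / ((2n+2)(2n+3)).
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

def ripple(vertices, time, amplitude=0.1, frequency=4.0):
    """Displace each (x, y, z) vertex vertically by a travelling sine wave."""
    return [(x, y + amplitude * math.sin(frequency * x + time), z)
            for x, y, z in vertices]
```

Every extra term in the loop is another multiply and add per vertex, which is the work a dedicated single-cycle instruction removes from the shader.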

So how does ATI's new vertex pipeline layout compare to NV40? On a major hardware "black box" level, R420 lacks the vertex texture unit featured in NV40 that's required for Shader Model 3.0's vertex texturing support. Vertex texturing allows developers to easily implement any effect which would benefit from allowing texture data to manipulate geometry (such as displacement mapping). The other major difference between R420 and NV40 is feature set support. As has been widely discussed, NV40 supports Shader Model 3.0 and all the bells and whistles that come along with it. R420's feature set support can be described as an extended version of Shader Model 2.0, offering a few more features above and beyond the R3xx line (including support for longer shader programs, and more registers).
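For readers unfamiliar with displacement mapping, the idea behind vertex texturing can be sketched in a few lines of Python: a vertex fetches a height from a texture and moves along its normal by that amount. The nearest-texel lookup, the list-of-lists texture, and the function names are assumptions made for this example, not the NV40 or Shader Model 3.0 interface.

```python
def sample_height(texture, u, v):
    """Nearest-texel lookup in a 2D list of heights, with u and v in [0, 1]."""
    rows, cols = len(texture), len(texture[0])
    # Clamp so that u == 1.0 or v == 1.0 still hits the last texel.
    row = min(int(v * rows), rows - 1)
    col = min(int(u * cols), cols - 1)
    return texture[row][col]

def displace(position, normal, uv, texture, scale=1.0):
    """Move a vertex along its normal by the height fetched from the texture."""
    h = sample_height(texture, uv[0], uv[1]) * scale
    return tuple(p + n * h for p, n in zip(position, normal))
```

Without a texture fetch in the vertex stage, the height data would have to reach the vertex program some other way, for example as per-vertex attributes prepared on the CPU.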

What all this boils down to is that we are only seeing something that looks like a slight massaging of the hardware from R300 to R420. We would probably see many more changes if we were able to peer deeper under the hood. From a functionality standpoint, it is sometimes hard to see where performance comes from, but (as we will see even more from the pixel pipeline) as graphics hardware evolves into multiple tiny CPUs all laid out in parallel, performance will be affected by factors traditionally only spoken of in CPU analysis and reviews. The total number of internal pipeline stages (rather than our high-level, functionality-driven pipeline), cache latencies, the size of the internal register file, number of instructions in flight, number of cycles an instruction takes to complete, and branch prediction will all come heavily into play in the future. In fact, this review marks the true beginning of where we will be seeing these factors (rather than general functionality and "computing power") determine the performance of a generation of graphics products. But, more on this later.

After leaving the vertex engine portion of R420, data moves into the setup engine. This section of the hardware takes the 2D projected data from the vertex engine, generates triangles and point sprites (particles), and partitions the output for use in the pixel engine. The triangle output is divided up into tiles, each of which is sent to a block of four pixel pipelines (called a quad pipeline by ATI). These tiles are simply square blocks of projected pixel data, and have nothing to do with "tile based rendering" (front to back rendering of small portions of the screen at a time) as was seen in PowerVR's Kyro series of GPUs.
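The partitioning step can be pictured with the small Python sketch below, which lists the square tiles a triangle's screen-space bounding box touches and hands each tile to a quad pipeline. The 16-pixel tile size, the count of four quad pipelines, and the interleaved assignment rule are all assumptions for illustration; the tile dimensions and scheduling policy ATI actually uses are not described here.

```python
def tiles_for_bounds(min_x, min_y, max_x, max_y, tile_size=16):
    """List the (tile_x, tile_y) tiles touched by an inclusive pixel bounding box."""
    return [(tx, ty)
            for ty in range(min_y // tile_size, max_y // tile_size + 1)
            for tx in range(min_x // tile_size, max_x // tile_size + 1)]

def quad_pipeline_for_tile(tile_x, tile_y, num_quads=4):
    """Pick a quad pipeline for a tile by simple interleaving (assumed policy)."""
    return (tile_x + tile_y) % num_quads
```

Interleaving neighbouring tiles across pipelines is one simple way to keep all of them busy when a large triangle covers many tiles.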

Now we're ready to see what happens on the per-pixel level.


95 Comments


  • ZobarStyl - Tuesday, May 4, 2004 - link

    Jibbo, I thought that the dynamic branching capability that is part of PS3.0 could make rendering a scene faster because it skips rendering unnecessary pixels and thus could offer an increase in performance, albeit a small one. In an interview one of the developers of Far Cry said that there weren't many more things that PS3.0 could do that 2.0 can't, but that 3.0 can do things in a single pass that a 2.0 shader would have to do in multiple passes. The way he described it, the real pretty effects can come in later but a streamlined (read: slightly faster) shader could very well improve NV40 scores as is. This seems kind of analogous to the whole 64-bit processor ordeal going on; Intel says you don't need it, but then most articles show higher scores from A64 chips when they are in a 64-bit OS, so basically if you streamline it you can run a little bit faster than in less efficient 32-bit.

    In the end, it'll still be bitter fanboys fighting it out and buying whatever product their respective corporation feeds them, despite features or speeds or price or whatever. Personally, like I said before, I'll wait and see who really ends up earning my dollar.

    Anyway, thanks for keeping me on my toes though, jib...I can't get lazy now... =)
  • Barkuti - Tuesday, May 4, 2004 - link

    From my point of view, the 6800U is superior high end hardware. Folks, you don't need to be that intelligent to understand that if ATI needs 520 Mhz to "beat" nVidia's 400 MHz chip, as it will need to overclock proportionally to keep the same level of performance that means it will need a good bunch of extra MHz to stay at least on par on the overclocking front.

    I think the final revision of the 6800U will manage 500 MHz overclocks or around (probably more if they deliberately set the initial clock low waiting for ATI), so ATI's hardware may need around 650 Mhz, which I doubt it'll make. As for the power requirements, sure ATI is the winner, but the nVidia's card can be fed with more standard PSU's than they claim; I just think they played on the safe side.
    Oh, sure, power may be a limiting factor when oc'ing the 6800U, but the reality is that people who buy this kind of hardware already have top end computer components (including the PSU), so no worries here either.

    And finally speaking, I think PS 3.0 will make some additional difference. With the possibility to somewhat enhance shader performance and the superior displacement mapping effect, it may give it the edge in at least a handful of games. We'll see.

    "Just my 2 cents"
    Cheers
  • Staples - Tuesday, May 4, 2004 - link

    Everyone be sure to check out Tom's review. Looks like the X800 did better here than it did against the 6800. I have seen other reviews and the X800 doesn't really seem as fast in comparison as it does here.

    Anyway, it is a lot faster than I thought. The 6800 was impressive but it seems that the reason it does really well in some games and not so great in others is because some games have NVIDIA specific code that the 6800 takes advantage of very well.
  • UlricT - Tuesday, May 4, 2004 - link

    wtf? the GT is outperforming the Ultra in F1 Challenge?
  • jibbo - Tuesday, May 4, 2004 - link

    Agree with you all the way on the fanboys, ZobarStyl.

    Just wanted to point out that PS3.0 is not "faster" - it's simply an API. It allows longer and more complex shaders so, if anything, it's likely to be "slower." I'm guessing that designers who use PS3.0 heavily will see serious fill-rate problems on the 6800. These shaders will have potentially 65k+ instructions with dynamic branching, a minimum of 4 render targets, 32-bit FP minimum color format, etc - I seriously doubt any hardcore 3.0 shader programs will run faster than existing 2.0 shaders.

    Clearly a developer can have much nicer quality and exotic effects if he/she exploits these, but how many gamers will have a PS3.0 card that will run these extremely complex shaders at high resolutions and AA/AF without crawling to single-digit fps? It's my guess that it will be *at least* a year until games show serious quality differentiation between PS2.0 and PS3.0. But I have been wrong in the past...
  • T8000 - Tuesday, May 4, 2004 - link

    I think it is strange that the tested X800XT is clocked at 520 Mhz, while the 6800U, that is manufactured by the same taiwanese company and also has 16 pipelines, is set at 400 Mhz.

    This suggests a lot of headroom on the 6800U or a large overclock on the X800XT.

    Also note that the 6800U scored much better on tomshardware.com (HALO 65FPS@1600x1200), but that can also be caused by their use of the 3.2 Ghz P4 instead of a 2.2 Ghz A64.
  • ZobarStyl - Tuesday, May 4, 2004 - link

    I love seeing these fanboys announce each product as the best thing ever (same thing happened with the Prescott, Intel fanboys called it the end of AMD and the AMD guys laughed and called it a flamethrower) without actually reading the benches. NV won some, ATi won some. Most of the time it was tiny margins either way. Fanboys aside, this is gonna be a driver war nothing more. The biggest margin was on Far Cry, and I'm personally waiting on the faster PS3.0 to see what that bench really is. This is a great card but price drops and drivers updates will eventually show us the real victor.
  • jibbo - Tuesday, May 4, 2004 - link

    If I had to guess, DX10 and Longhorn will coincide with the release of new hardware from everyone.
  • Akaz1976 - Tuesday, May 4, 2004 - link

    Just thought of something. If i am reading AT review right, ATi now has milked the original Radeon9700 architecture for nearly 2 years (sure says a lot of good things about the ArtX design team).

    Anyone know when the true next gen chip can be expected?

    Akaz
  • Ilmater - Tuesday, May 4, 2004 - link

    ---------------------------------------
    Hearing about the 6850 and the other Emergency-Extreme-Whatever 6800 variants that are floating about irritates me greatly. Nvidia, you are losing your way!

    Instead of spending all that time, effort and $$ just to try to take the "speed champ" title, make your shit that much cheaper instead! If your 6800 Ultra was $425 instead of $500, that would give you a hell of alot more market share and $$ than a stupid Emergency Edition of your top end cards... We laugh at Intel for doing it, and now you're doing it too, come fricking on...
    --------------------------------------------
    This is ridiculous!! What do you think the XT Platinum Edition from ATI is? The only difference is that nVidia released first, so it's more obvious when they do it than when ATI does. I'm not really a fanboy of either, but you shouldn't dog nVidia for something that everyone does.

    Plus, if nVidia dropped their prices, ATI would do the same thing. Then nVidia would be right back where it was before, but they wouldn't be making any money on the cards.
