Depth and Stencil with Hyper Z HD

In keeping with its "High Definition Gaming" theme, ATI is calling the R420's method of handling depth and stencil processing Hyper Z HD. Depth and stencil processing is handled at multiple points throughout the pipeline, but grouping all this hardware into one block makes sense because each step along the way touches the z-buffer (an on-die cache of z and stencil data). We have previously covered other incarnations of Hyper Z, which did basically the same job. Here we can see where the Hyper Z HD functionality interfaces with the rendering pipeline:

The R420 architecture implements both hierarchical z and early z occlusion culling in the rendering pipeline.

With early z, as data emerges from the geometry processing portion of the GPU, it is possible to skip further rendering of large portions of the scene that are occluded (or covered) by other geometry. In this way, pixels that won't be seen don't need to run through the pixel shader pipelines and waste precious resources.
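To make the savings concrete, here is a minimal Python sketch of an early z test. It is an illustration only and not ATI's hardware logic: the `render` function, the `(x, y, z)` fragment tuples, and the convention that a smaller z is nearer to the viewer are all assumptions made for the example.

```python
def render(fragments, width, height, early_z):
    """Count pixel shader runs for a stream of (x, y, z) fragments.

    Assumption: smaller z is nearer; the depth buffer starts at 1.0 (far).
    """
    depth = [[1.0] * width for _ in range(height)]
    shaded = 0
    for x, y, z in fragments:
        if early_z and z >= depth[y][x]:
            continue  # occluded: rejected before the pixel shader runs
        shaded += 1  # stand-in for running the pixel shader
        if z < depth[y][x]:
            depth[y][x] = z  # depth test after shading keeps the nearest z
    return shaded, depth
```

Drawing a near quad and then a far quad over the same pixels shades each pixel once with the early test and twice without it, while the final depth buffer is identical. If the far quad is drawn first, the early test cannot reject anything, which is why draw order matters for this kind of culling.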

Hierarchical z means that large blocks of pixels are checked together and thrown out if the entire tile is occluded. In R420, these tiles are the very same ones output by the geometry and setup engine. If only part of a tile is occluded, smaller subsections are checked and thrown out if possible. This processing doesn't eliminate all the occluded pixels, so pixels coming out of the pixel pipelines also need to be tested for visibility before they are drawn to the framebuffer. The real difference between R3xx and R420 is in the number of pixels that can be gracefully handled.
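The tile-then-subsection idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of R420's actual tile sizes or storage: the function names, the 2-pixel minimum block, and the smaller-z-is-nearer convention are hypothetical, and real hardware would keep a stored farthest-depth value per tile rather than rescanning the pixels.

```python
def farthest(depth, x0, y0, size):
    """Farthest (largest) depth stored in a square block of the z-buffer."""
    return max(depth[y][x]
               for y in range(y0, y0 + size)
               for x in range(x0, x0 + size))

def hier_z_survivors(depth, x0, y0, size, z_near, min_size=2):
    """Return the pixels of a block that survive coarse rejection.

    z_near is the nearest depth of the incoming geometry over the block.
    Survivors still need the ordinary per-pixel depth test later on.
    """
    if z_near >= farthest(depth, x0, y0, size):
        return []  # the whole block is hidden: throw it out at once
    if size <= min_size:
        return [(x, y)
                for y in range(y0, y0 + size)
                for x in range(x0, x0 + size)]
    half = size // 2
    survivors = []
    for dy in (0, half):  # partly occluded: check the four sub-blocks
        for dx in (0, half):
            survivors += hier_z_survivors(depth, x0 + dx, y0 + dy,
                                          half, z_near, min_size)
    return survivors
```

With a 4x4 tile whose left half is already covered at depth 0.2 and whose right half is empty, incoming geometry at depth 0.5 loses its two left sub-blocks in the coarse check and keeps the eight pixels on the right for per-pixel testing.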

As rasterization draws nearer, the ATI and NVIDIA architectures begin to differentiate themselves more. Both companies claim up to 32 z or stencil operations per clock, but the conditions under which this is true are different. NV40 is able to push two z/stencil operations per pixel pipeline during a z- or stencil-only pass, or in other cases when no color data is being dealt with (the color unit in NV40 can work with z/stencil data when no color computation is needed). By contrast, R420 pushes 32 z/stencil operations per clock cycle when antialiasing is enabled (one z/stencil operation can be completed per clock at the end of each pixel pipeline, and one z/stencil operation can be completed inside the multisample AA unit).

The different approaches these architectures take mean that each will excel in different ways when dealing with z or stencil data. Under R420, z/stencil speed will be maximized when antialiasing is enabled, and non-antialiased rendering will see only 16 z/stencil operations per clock. NV40 will achieve maximum z/stencil performance when a z/stencil-only pass is performed, regardless of the state of antialiasing.
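The peak figures above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical, and the pipeline count of 16 is an assumption inferred from the numbers in the text (a 32-operation peak at two operations per pipeline).

```python
PIPELINES = 16  # assumption: inferred from 32 peak ops / 2 ops per pipeline

def r420_z_ops_per_clock(antialiasing):
    """One op at the end of each pipeline, plus one in the multisample
    AA unit when antialiasing is enabled."""
    per_pipeline = 1 + (1 if antialiasing else 0)
    return PIPELINES * per_pipeline

def nv40_z_ops_per_clock(z_stencil_only_pass):
    """Two ops per pipeline when no color data is handled (the color
    unit helps with z/stencil), otherwise one."""
    per_pipeline = 2 if z_stencil_only_pass else 1
    return PIPELINES * per_pipeline
```

Both architectures reach 32 operations per clock only in their favored case and fall back to 16 otherwise; these are theoretical peaks, not measured throughput.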

The average case for NV40 will be closer to 16 z/stencil operations per clock, and users who don't run antialiasing on R420 won't see more than 16 z/stencil operations per clock. If everyone begins to enable antialiasing, R420 will begin to shine in real-world situations, and if developers embrace z- or stencil-only passes (such as in Doom III), NV40 will do very well. Which approach is better will be decided by the direction users and developers take in the future. Will enabling antialiasing win out over running at ultra-high resolutions? Will developers mimic John Carmack and the intensive shadowing capabilities of Doom III? Both scenarios could play out simultaneously, but only time will tell.


95 Comments


  • adntaylor - Tuesday, May 4, 2004

    I wish they'd also tested with an nForce3 motherboard. nVidia have managed some very interesting performance enhancements on the AGP to HT tunnel that only works with the nVidia graphics cards. That might have pushed the 6800 in front - who knows!
  • UlricT - Tuesday, May 4, 2004

    Hey... Though the review rocks, you guys desperately need an editor for spelling and grammar!
  • Jeff7181 - Tuesday, May 4, 2004

    This pretty much settles it. With the excellent comparison between architectures, and the benchmark scores to prove the advantages and disadvantages of the architecture... my next card will be made by ATI.
    NV40 sure has a lot of potential, one might say it's ahead of its time, supporting SM 3.0 and being so programmable. However, with a product cycle of 6 months to a year, being ahead of its time is more of a disadvantage in this case. People don't care what it COULD do... people care what it DOES do... and the R420 seems to do it better. I just hope my venture into the world of ATI doesn't turn into driver hell.
  • NullSubroutine - Tuesday, May 4, 2004

    I'm a fanboy for neither company and objectively I can say the cards are equal. In some games the ATI cards are faster, in other games the Nvidia cards are faster. So which one is better all depends on the games you play and the price of the card you are looking for. (Hmm, maybe motherboard companies could make 2 AGP slots...)

    About the argument over PS 2.0/3.0...

    2.0 cards will be able to play games with 3.0, though they may not have full functionality or they may run it slower. This remains to be seen until games begin to use 3.0. However...

    The one thing bad for Nvidia in my eyes is the pixel shader quality that can be seen in Farcry; whether this is a game or driver glitch is still unknown.

    I forgot to add I like that the ATI cards use less power, I don't want to have to pay for another PSU on top of already high prices of video cards. I would also like to see a review again a month from now when newer drivers come out to see how much things have changed.
  • l3ored - Tuesday, May 4, 2004

    pschhh, did you see the unreal 3 demo? in the video i saw, it looked like it ran at about 5fps. imagine running halo on a gfx 5200. however you could run it if you were to turn off halo's PS 2 effects. i think that's how it's going to be with unreal 3
  • Slaanesh - Tuesday, May 4, 2004

    Since PS 3.0 is not supported by the X800 hardware, does this mean that those extremely impressive graphical features shown in the Unreal 3 tech demo (NV40 launch) and the soon-to-be-released good-looking PS 3.0 Far Cry update are both NOT playable on the X800?? This would be a huge disadvantage for ATi since a lot of the upcoming top games will support PS 3.0!
  • l3ored - Tuesday, May 4, 2004

    i agree phiro, personally i think im gonna get the one that hits $200 first (may be a while)
  • Phiro - Tuesday, May 4, 2004

    Hearing about the 6850 and the other Emergency-Extreme-Whatever 6800 variants that are floating about irritates me greatly. Nvidia, you are losing your way!

    Instead of spending all that time, effort and $$ just to try to take the "speed champ" title, make your shit that much cheaper instead! If your 6800 Ultra was $425 instead of $500, that would give you a hell of a lot more market share and $$ than a stupid Emergency Edition of your top end cards... We laugh at Intel for doing it, and now you're doing it too, come fricking on...
  • gordon151 - Tuesday, May 4, 2004

    #14, I think it has more to do with the fact those OpenGL benchmarks are based on a single engine that was never fast on ATI hardware to begin with.
  • araczynski - Tuesday, May 4, 2004

    12: personally i think the TNT line was better than the Voodoo line. I think they bought them out only to get rid of the competition, which was rather stupid because i think they would have died out sooner or later anyway because nvidia was just better. I would guess that perhaps they bought them out cuz that gave them patent rights and they wouldn't have to worry about being sued for probably copying some of the technology :)
