Depth and Stencil with Hyper Z HD

In accordance with their "High Definition Gaming" theme, ATI is calling the R420's method of handling depth and stencil processing Hyper Z HD. Depth and stencil processing is handled at multiple points throughout the pipeline, but grouping all this hardware into one block can make sense as each step along the way will touch the z-buffer (an on die cache of z and stencil data). We have previously covered other incarnations of Hyper Z which have done basically the same job. Here we can see where the Hyper Z HD functionality interfaces with the rendering pipeline:

The R420 architecture implements a hierarchical and early z type of occlusion culling in the rendering pipeline.

With early z, as data emerges from the geometry processing portion of the GPU, it is possible to skip further rendering large portions of the scene that are occluded (or covered) by other geometry. In this way, pixels that won't be seen don't need to run through the pixel shader pipelines and waste precious resources.

Hierarchical z indicates that large blocks of pixels are checked and thrown out if the entire tile is occluded. In R420, these tiles are the very same ones output by the geometry and setup engine. If only part of a tile is occluded, smaller subsections are checked and thrown out if possible. This processing doesn't eliminate all the occluded pixels, so pixels coming out of the pixel pipelines also need to be tested for visibility before they are drawn to the framebuffer. The real difference between R3xx and R420 is in the number of pixels that can be gracefully handled.

As rasterization draws nearer, the ATI and NVIDIA architectures begin to differentiate themselves more. Both claim that they are able to calculate up to 32 z or stencil operations per clock, but the conditions under which this is true are different. NV40 is able to push two z/stencil operations per pixel pipeline during a z or stencil only pass or in other cases when no color data is being dealt with (the color unit in NV40 can work with z/stencil data when no color computation is needed). By contrast, R420 pushes 32 z/stencil operations per clock cycle when antialiasing is enabled (one z/stencil operation can be completed per clock at the end of each pixel pipeline, and one z/stencil operation can be completed inside the multisample AA unit).

The different approaches these architectures take mean that each will excel in different ways when dealing with z or stencil data. Under R420, z/stencil speed will be maximized when antialiasing is enabled and will only see 16 z/stencil operations per clock under non-antialiased rendering. NV40 will achieve maximum z/stencil performance when a z/stencil only pass is performed regardless of the state of antialiasing.

The average case for NV40 will be closer to 16 z/stencil operations per clock, and if users don't run antialiasing on R420 they won't see more than 16 z/stencil operations per clock. Really, if everyone begins to enable antialiasing, R420 will begin to shine in real world situations, and if developers embrace z or stencil only passes (such as in Doom III), NV40 will do very well. The bottom line on which approach is better will be defined by the direction the users and developers take in the future. Will enabling antialiasing win out over running at ultra-high resolutions? Will developers mimic John Carmack and the intensive shadowing capabilities of Doom III? Both scenarios could play out simultaneously, but, really, only time will tell.

The Pixel Shader Engine A New Compression Scheme: 3Dc
Comments Locked

95 Comments

View All Comments

  • rms - Tuesday, May 4, 2004 - link

    "the near-to-be-released goodlooking PS 3.0 Far Cry update "

    When is that patch scheduled for? I recall seeing some rumour it was due in September...

    rms
  • Fr0zeN - Tuesday, May 4, 2004 - link

    Yeah I agree, the GT looks like it's gonna give the x800P a run for its money. On a side note, the differences between P and XT versions seem to be greater than r9800's, hmm.

    In the end it's the most overclockable $200 card that'll end up in my comp. There's no way I'm paying $500 for something that I can compensate for by turning the rez down to 10x7... Raw benchmarks mean nothing if it doesn't oc well!
  • Doop - Tuesday, May 4, 2004 - link

    The cards seem very close, I tend to favor nVidia now since they have superior multi monitor and professional 3D drivers and I regret buying my Fire GL X1.

    It's strange ATi didn't announce a 16 pipeline card orginally, it will be interesting to see in a month or two who actually ends up delivering cards.

    I mean if they're being made in significant quantities they'll be at your local store with a reduced 'street' price but if it's just a paper launch they'll just be at Alienware, Dell (with a new PC only) or $500 if you can find one.
  • jensend - Tuesday, May 4, 2004 - link

    #17, the Serious Engine has nothing to do with the Q3 engine; Nvidia's superior OpenGL performance is not dependent on any handful of engines' particular quirks.

    Zobar is right; contra Jibbo, the increased flexibility of PS3 means that for many 2.0 shader programs a PS3 version can achieve equivalent results with a lesser performance hit.

    As far as power goes, I'm surprised NV made such a big deal out of PSU requirements, as its new cards (except the 6800U Extremely Short Production Run Edition/6850U/Whatever they end up calling that part) compare favorably wattage-wise to the 5950U and don't pull all that much more power than the 9800XT. Both companies have made a big performance per watt leap, and it'll be interesting to see how the mid-range and value cards compare in this respect.
  • blitz - Tuesday, May 4, 2004 - link

    "Of course, we will have to wait and see what happens in that area, but depending on what the test results for our 6850 Ultra end up looking like, we may end up recommending that NVIDIA push their prices down slightly (or shift around a few specs) in order to keep the market balanced."

    It sounds as if you would be giving nvidia advice on their pricing strategy, somehow I don't think they would listen nor be influenced by your opinion. It could be better phrased that you would advise consumers to wait for prices to drop or look elsewhere for better price\performance ratio.
  • Cygni - Tuesday, May 4, 2004 - link

    Hmmmm, interesting. I really dont see where anyone can draw the conclusion that the x800 Pro is CLEARLY the winner. The 6800 GT and x800 Pro traded game wins back and forth. There doesnt seem to be any clear cut winner to me. Wolf, JediA, X2, F1C, and AQ3 all went clearly to the GT... this isnt open and shut. Alot of the other tests were split depending on resolution/AA. On the other hand, I dont think you can say that the GT is clearly better than the x800 Pro either.

    Personally, I will buy whichever one hits a reasonable price point first. $150-200. Both seem to be pretty equal, and to me, price matters far more.
  • kherman - Tuesday, May 4, 2004 - link

    BRING ON DOOM 3!!!!!!

    We all know inside that this is what ID was waiting for!
  • Diesel - Tuesday, May 4, 2004 - link

    ------------------
    I think it is strange that the tested X800XT is clocked at 520 Mhz, while the 6800U, that is manufactured by the same taiwanese company and also has 16 pipelines, is set at 400 Mhz.
    ------------------

    This could be because NV40 has 222M transistors vs. R420 at 160M transistors. I think the amount of power required and heat generated is proportional to transistor count and clock speed.
  • edub82 - Tuesday, May 4, 2004 - link

    I know this is an ATI article but that 6800 GT is looking very attractive. It beats the x800Pro on a fairly regular basis is a single slot and molex connector card and is starting at 400 and hopefully will go down a few dollars ;) in 6 months when i want to upgrade.
  • Slaanesh - Tuesday, May 4, 2004 - link

    "Clearly a developer can have much nicer quality and exotic effects if he/she exploits these, but how many gamers will have a PS3.0 card that will run these extremely complex shaders at high resolutions and AA/AF without crawling to single-digit fps? It's my guess that it will be *at least* a year until games show serious quality differentiation between PS2.0 and PS3.0. But I have been wrong in the past..."
    --------

    I dunnow.. When Morrowind got released, only he few GF3 cards on the market were able to show the cool pixel shader water effects and they did it well; at that time I was really pissed I went for the cheaper Geforce2 Ultra although it had some better benchmarks at a much lower price. I don't think I want make that mistake again and pay the same amount of money for a card that doesnt support the latest technology..

Log in

Don't have an account? Sign up now