Beyond the Shader: Coloring Pixels

We can't ignore the last few steps in the rendering pipeline, as AMD has also updated their render back ends (analogous to NVIDIA's ROPs) which are responsible for determining the visibility of each fragment and the final color of each pixel on the screen. Beyond this, the render back ends handle compression and decompression, render to texture functionality, MRTs, framebuffer formats, and usually AA.

Once again, one of the important things to note is that R600 only has four render back ends. This means we will only see 16 pixels complete per clock at maximum, just like the R580. However, AMD has included double the Z/stencil hardware so that we can get up to 32 total Z/stencil ops out of the render back ends to improve stencil shadow operations among other things. Pure fill rate hasn't really mattered in a while, while Z/stencil capability remains important. But will only four render back ends be enough?

Efficiency has been improved on the render back ends, but with the potential of completing 64 threads per clock from the shader hardware, they will need to really work to keep up. R600 has the ability to display floating point formats from 11:11:10 up to 128-bit fp. DX10 requires eight MRTs now, and we've got them. We also get more efficient render to texture features which should help enable more complex effects to process faster.

Z/Stencil Hardware

As far as Z/stencil hardware is concerned, compression has gotten a boost up to 16:1 rather than 8:1 on the X1k series. Depth tests can be limited to a specific range programmatically which can speed up stencil shadows. Our Z-buffer is now 32-bit floating point rather than 24-bit. Hierarchical Z has been enhanced to handle some situations where it was unable to assist in rendering, and AMD has added a hierarchical stencil buffer as well.

AMD is introducing something called Re-Z which is designed to also help with the problem Early-Z has in not being able to handle shaders that update Z data. R600 is able to check Z values before a shader runs as well as after the Z value has been changed in the shader. This allows AMD to throw out pixels that are updated to be out of view without sending them to the render back ends for evaluation.

If we compare this setup with G80, we're not as worried as we are about texture capability. G80 can complete 24 pixels per clock (4 pixels per ROP with six ROPs). Like R600, G80 is capable of 2x Z-only performance with 48 Z/stencil operations per clock with AA enabled. When AA is disabled, the hardware is capable of 192 Z-only samples per clock. The ratio of running threads to ROPs is actually worse on G80 than on R600. At the same time, G80 does offer a higher overall fill rate based on potential pixels per clock and clock speed.

Memory and Data Movement CFAA and No Fixed Resolve Hardware
Comments Locked

86 Comments

View All Comments

  • mostlyprudent - Monday, May 14, 2007 - link

    Frankly, neither the NVIDIA nor the AMD part at this price point is all that impressive an upgrade from the prior generations. We keep hearing that we will have to wait for DX10 titles to know the real performance of these cards, but I suspect that by the time DX10 titles are on the shelves we will have at least product line refreshes by both companies. Does anyone else feel like the graphics card industry is jerking our chains?
  • johnsonx - Monday, May 14, 2007 - link

    It seems pretty obvious that AMD needs a Radeon HD2900Pro to fill in the gap between the 2900XT and 2600XT. Use R600 silicon, give it 256Mb RAM with a 256-bit memory bus. Lower the clocks 15% so that power consumption will be lower, and so that chips that don't bin at full XT speeds can be used. Price at $250-$300. It would own the upper-midrange segment over the 8600GTS, and eat into the 8800GTS 320's lunch as well.
  • GlassHouse69 - Monday, May 14, 2007 - link

    If I know this, and YOU know this.... wouldnt anandtech? I see money under the table or utter stupidity at work at anand. I mean, I know that the .01+ version does a lot better in benches as well as the higher res with aa/af on sometimes get BETTER framerates than lower res, no aa/af settings. This is a driver thing. If I know this, you know this, anand must. I would rather admit to being corrupt rather than that stupid.

  • GlassHouse69 - Monday, May 14, 2007 - link

    wrong section. dt is doing that today it seems to a few people
  • xfiver - Monday, May 14, 2007 - link

    Hi, thank you for a really in depth review. While reading other 'earlier' reviews I remember a site using Catalyst 8.38 and reported performance improvements upto 14% from 8.37. Look forward to Anandtech's view on this.
  • xfiver - Monday, May 14, 2007 - link

    My apologies it was VR zone and 8.36 to 8.37 (not 8.38)
  • GlassHouse69 - Monday, May 14, 2007 - link

    If I know this, and YOU know this.... wouldnt anandtech? I see money under the table or utter stupidity at work at anand. I mean, I know that the .01+ version does a lot better in benches as well as the higher res with aa/af on sometimes get BETTER framerates than lower res, no aa/af settings. This is a driver thing. If I know this, you know this, anand must. I would rather admit to being corrupt rather than that stupid.
  • Gary Key - Tuesday, May 15, 2007 - link

    quote:

    If I know this, and YOU know this.... wouldnt anandtech? I see money under the table or utter stupidity at work at anand. I mean, I know that the .01+ version does a lot better in benches as well as the higher res with aa/af on sometimes get BETTER framerates than lower res, no aa/af settings. This is a driver thing. If I know this, you know this, anand must. I would rather admit to being corrupt rather than that stupid.


    I have worked extensively with four 8.37 releases and now the 8.38 release for the upcoming P35 release article. The 8.37.4.2 alpha driver had the top performance in SM3.0 heavy apps but was not very stable with numerous games, especially under Vista. The released 8.37.4.3 driver on AMD's website is the most stable driver to date and has decent performance but nothing near the alpha 8.37 or beta 8.38. The 8.38s offer great benchmark performance in the 3DMarks, several games, and a couple of DX10 benchmarks from AMD.

    However, the 8.38s more or less broke CrossFire, OpenGL, and video acceleration in Vista depending upon the app and IQ is not always perfect. While there is a great deal of promise in their performance and we see the potential, they are still Beta drivers that have a long ways to go in certain areas before their final release date of 5/23 (internal target).

    That said, would you rather see impressive results in 3DMarks or have someone tell you the truth about the development progress or lack of it with the drivers. As much as I would like to see this card's performance improve immediately, it is what it is at this time with the released drivers. AMD/ATI will improve the performance of the card with better drivers but until they are released our only choice is to go with what they sent. We said the same thing about NVIDIA's early driver issues with the G80 so there are not any fanboys or people taking money under the table around here. You can put all the lipstick on a pig you want, but in the end, you still have a pig. ;-)
  • Anand Lal Shimpi - Monday, May 14, 2007 - link

    There's nothing sinister going on, ATI gave us 8.37 to test with and told us to use it. We got 8.38 today and are currently testing it for a follow-up.

    Take care,
    Anand
  • GlassHouse69 - Monday, May 14, 2007 - link

    wow dood. you replied!

    Yes, I have been wondering about the ethics of your group here for about a year now. I felt this sorta slick leaning towards and masking thing goign on. Nice to see there is not.

    Thanks for the 1000's of articles and tests!
    -Mr. Glass

Log in

Don't have an account? Sign up now