Beyond the Shader: Coloring Pixels

We can't ignore the last few steps in the rendering pipeline, as AMD has also updated their render back ends (analogous to NVIDIA's ROPs) which are responsible for determining the visibility of each fragment and the final color of each pixel on the screen. Beyond this, the render back ends handle compression and decompression, render to texture functionality, MRTs, framebuffer formats, and usually AA.

Once again, one of the important things to note is that R600 only has four render back ends. This means we will see at most 16 pixels complete per clock, just like the R580. However, AMD has doubled the Z/stencil hardware, so we can get up to 32 total Z/stencil ops per clock out of the render back ends, which improves stencil shadow operations among other things. Pure fill rate hasn't really mattered for some time, but Z/stencil capability remains important. But will only four render back ends be enough?

Efficiency has been improved on the render back ends, but with the potential of completing 64 threads per clock from the shader hardware, they will need to really work to keep up. R600 has the ability to display floating point formats from 11:11:10 up to 128-bit fp. DX10 requires eight MRTs now, and we've got them. We also get more efficient render to texture features which should help enable more complex effects to process faster.

Z/Stencil Hardware

As far as Z/stencil hardware is concerned, compression now reaches up to 16:1, up from 8:1 on the X1k series. Depth tests can be programmatically limited to a specific range, which can speed up stencil shadows. The Z-buffer is now 32-bit floating point rather than 24-bit. Hierarchical Z has been enhanced to handle some situations where it was previously unable to assist in rendering, and AMD has added a hierarchical stencil buffer as well.

AMD is introducing something called Re-Z, which is also designed to help with a limitation of Early-Z: its inability to handle shaders that update Z data. R600 is able to check Z values before a shader runs as well as after the Z value has been changed in the shader. This allows AMD to throw out pixels that are updated to be out of view without sending them to the render back ends for evaluation.
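As a rough mental model of the two checks described above, here is a small Python sketch. It is not AMD's implementation: the names `Fragment`, `passes_depth`, and `process_fragment` are hypothetical, the depth test is assumed to be a simple "less than" comparison, and the toy assumes the pre-shader check is valid for the shader in question.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Fragment:
    x: int
    y: int
    z: float  # interpolated depth; smaller means closer (assumed convention)


def passes_depth(z: float, stored_z: float) -> bool:
    """Assumed 'less than' depth test against the value already in the buffer."""
    return z < stored_z


def process_fragment(
    frag: Fragment,
    depth_buffer: Dict[Tuple[int, int], float],
    shader: Callable[[Fragment], float],
) -> Optional[float]:
    """Return the final depth if the fragment should go on to the render
    back ends, or None if it was thrown out before reaching them."""
    stored = depth_buffer[(frag.x, frag.y)]

    # Check before the shader runs: skip shading work for hidden fragments.
    if not passes_depth(frag.z, stored):
        return None

    # The shader may change the fragment's depth.
    new_z = shader(frag)

    # Check again after the shader has updated Z (the "Re-Z" idea): a fragment
    # pushed out of view is dropped here instead of being sent onward.
    if not passes_depth(new_z, stored):
        return None

    return new_z
```

For example, with a stored depth of 0.5, a fragment that starts at 0.4 but is moved to 0.7 by its shader passes the first check and is rejected by the second, so it never reaches the back-end stage of this toy pipeline.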

If we compare this setup with G80, we're not as worried as we are about texture capability. G80 can complete 24 pixels per clock (4 pixels per ROP with six ROPs). Like R600, G80 is capable of 2x Z-only performance with 48 Z/stencil operations per clock with AA enabled. When AA is disabled, the hardware is capable of 192 Z-only samples per clock. The ratio of running threads to ROPs is actually worse on G80 than on R600. At the same time, G80 does offer a higher overall fill rate based on potential pixels per clock and clock speed.
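To make the per-clock comparison concrete, the figures quoted above can be put into a few lines of Python. The unit counts and per-clock numbers come from the text; the core clocks (742 MHz for the Radeon HD 2900 XT and 575 MHz for the GeForce 8800 GTX) are assumptions that are not stated on this page, and the class and method names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackEndConfig:
    name: str
    units: int             # render back ends (R600) or ROPs (G80)
    pixels_per_clock: int  # total color pixels completed per clock
    z_per_clock: int       # total Z/stencil ops per clock (AA-enabled figure for G80)

    def pixels_per_unit(self) -> float:
        """Color pixels each back end / ROP completes per clock."""
        return self.pixels_per_clock / self.units

    def peak_fill_rate_mpix(self, core_mhz: float) -> float:
        """Theoretical peak fill rate in megapixels per second."""
        return self.pixels_per_clock * core_mhz


# Per-clock figures as quoted in the article.
R600 = BackEndConfig("R600", units=4, pixels_per_clock=16, z_per_clock=32)
G80 = BackEndConfig("G80", units=6, pixels_per_clock=24, z_per_clock=48)

# Assumed core clocks (not taken from this page).
R600_MHZ = 742.0
G80_MHZ = 575.0

if __name__ == "__main__":
    for gpu, mhz in ((R600, R600_MHZ), (G80, G80_MHZ)):
        print(
            f"{gpu.name}: {gpu.pixels_per_unit():.0f} pixels/clock per unit, "
            f"{gpu.z_per_clock} Z/stencil ops/clock, "
            f"{gpu.peak_fill_rate_mpix(mhz):.0f} Mpixels/s peak"
        )
```

Under those assumed clocks, G80's wider back end more than offsets its lower core clock, which is consistent with the fill rate advantage noted above. These are theoretical peaks only and say nothing about delivered performance.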

86 Comments

  • yyrkoon - Tuesday, May 15, 2007

    See, the problem here is: guys like you are so bent on saving that little bit of money, by buying a lesser brand name, that you do not even take the time to research your hardware. Use Newegg, and read the user reviews, and if that is not enough for you, go to the countless other resources all over the internet.
  • yyrkoon - Tuesday, May 15, 2007

    Blame the crappy OEM you bought the card from, not Nvidia. Get an EVGA card, and embrace a completely different outlook on video card life.

    MSI may make some decent motherboards, but their other components have serious issues.
  • LoneWolf15 - Thursday, May 17, 2007

    Um, since 95% of nvidia-GPU cards on the market are the reference design, I'd say your argument here is shaky at best. EVGA and MSI both use the reference design, and it's even possible that cards with the same GPU came off the same production line at the same plant.
  • DerekWilson - Thursday, May 17, 2007

    it is true that the majority of parts are based on reference designs, but that doesn't mean they all come from the same place. I'm sure some of them do, but to say that all of these guys just buy completed boards and put their name on them all the time is selling them a little short.

    at the same time, the whole argument of which manufacturer builds the better board on a board component level isn't something we can really answer.

    what we would suggest is that it's better to buy from OEMs who have good customer service and long, extensive warranties. this way, even if things do go wrong, there is some recourse for customers who get bad boards or have bad experiences with drivers and software.
  • cmdrdredd - Monday, May 14, 2007

    you're wrong. 99% of people buying these high end cards are gaming. Those gamers demand and deserve the best possible performance. If a card uses MORE power and costs MORE (x2900xt vs 8800gts) and performs generally the same or slower, what is the point? Fact is...ATI's high end is in fact slower than mid range offerings from Nvidia and consumes a lot more power. Regardless of what you think, people are buying these based on performance benchmarks in 99% of all cases.
  • AnnonymousCoward - Tuesday, May 15, 2007

    No, you're wrong. Did you overlook the emphasis he put on "NOT ALWAYS"?

    You said 99% use for gaming--so there's 1%. Out of the gamers, many really want LCD scaling to work, so that games aren't stretched horribly on widescreen monitors. Some gamers would also like TVout to work.

    So he was right: faster is NOT ALWAYS better.
  • erwos - Monday, May 14, 2007

    It'd be nice to get the scoop on the video decode acceleration present on these boards, and how it stacks up to the (excellent) PureVideo HD found in the 8600 series.
  • imaheadcase - Tuesday, May 15, 2007

    I agree! They need to do a whole article on video acceleration on a range of cards and show the pros and cons of each card in respective areas. A lot of people like myself like to watch videos and game on cards, but like to have the option open to use the advanced video features.

  • Turnip - Monday, May 14, 2007

    "We certainly hope we won't see a repeat of the R600 launch when Barcelona and Agena take on Core 2 Duo/Quad in a few months...."


    Why, that's exactly what I had been thinking :)

    Phew! I made it through the whole thing though, I even read all of those awfully big words and everything! :)

    Thanks guys, another top review :)
  • Kougar - Monday, May 14, 2007

    First, great article! I will be going back to reread the very in-depth analysis of the hardware and features, something that keeps me an avid Anandtech reader. :)

    Since it was mentioned that overclocking will be included in a future article, I would like to suggest that if possible watercooling be factored into it. So far one review site has already done a watercooled test with a low-end watercooling setup, and without mods achieved 930MHz on the Core, which indirectly means 930MHz shaders if I understand the hardware.

    I'm sure I am not the only reader extremely interested to see if all R600 needs is a ~900-950MHz overclock to offer some solid GTX level performance... or if it would even help at all. Again thanks for the consideration, and the great article! Now off to find some Folding@Home numbers...
