Texturing, Caches, and Memory

Texturing

R600 features less texture hardware than we would expect to see, though AMD stands by the argument that compute power will come out on top when it matters. At the same time, we can't compute anything if we don't have any data to work with. So let's take a look at what AMD has done with their texture units.

There are four texture units in R600, one for each SIMD unit. These units don't share resources with the hardware in the SIMD units and are independently scheduled by AMD's dispatch processor. The dispatch processor is able to determine what data will be needed for threads about to execute and can handle setting up the texture units without waiting for the SIMD unit to request data and come up empty.

Texture units on the R600 are able to make both filtered and unfiltered texture requests no matter what shader is running. Unfiltered textures are useful with non-image-based texture data like vertex textures, normal maps, and generic blocks of data. Filtered requests will generally be for image data to be used in determining the color of a pixel. R600 can address one unfiltered texture per clock per texture unit and one filtered texture per clock per texture unit. The filtered path can be used to request unfiltered textures if necessary, providing an extra four unfiltered textures in place of one filtered texture.

The unfiltered texture requests come back through four fp32 texture samplers (one per component), while the filtered requests return 16 data points that are run through the texture filtering hardware, resulting in four filtered texture samples. The hardware can at best produce 32 single-component fp16 unfiltered results per texture unit per clock. More practically, each texture unit can produce four bilinear-filtered, four-component fp16 samples per clock alongside four unfiltered results. For textures with fp32 components, two clocks are required to complete a bilinear filter operation, as only half the data is loaded at a time to conserve bandwidth.
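
To make the "16 data points in, four filtered samples out" figure concrete, here is a minimal software sketch of bilinear filtering. It is not AMD's hardware implementation; the function names, the list-of-rows texture layout, the clamp-to-edge addressing, and the convention that integer coordinates sit on texel centers are all assumptions made for illustration. Each filtered sample blends four neighboring texels, so four samples consume 16 texel fetches.

```python
import math


def bilinear_sample(texture, u, v):
    """Bilinearly filter `texture` at texel-space coordinates (u, v).

    `texture` is a list of rows; each texel is a tuple of components
    (for example RGBA). Integer coordinates land exactly on a texel,
    and coordinates are clamped to the texture edge (both assumptions).
    """
    height, width = len(texture), len(texture[0])
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)

    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0

    # Four texels are fetched for every filtered sample.
    t00, t10 = texture[y0][x0], texture[y0][x1]
    t01, t11 = texture[y1][x0], texture[y1][x1]

    def lerp(a, b, t):
        # Component-wise linear interpolation between two texels.
        return tuple(ca + (cb - ca) * t for ca, cb in zip(a, b))

    top = lerp(t00, t10, fx)
    bottom = lerp(t01, t11, fx)
    return lerp(top, bottom, fy)


def filtered_quad(texture, coords):
    """Filter four sample positions: 4 samples x 4 texels = 16 fetches."""
    return [bilinear_sample(texture, u, v) for u, v in coords]
```

Sampling a 2x2 texture at its center with `bilinear_sample(tex, 0.5, 0.5)` returns the even blend of all four texels. An unfiltered request, by contrast, would simply return `texture[y][x]` with no blending.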

This is definitely a step up for R600, as R5xx hardware doesn't have texture filtering hardware for floating point textures. All told, with each of its four texture units working, R600 can consume up to 32 unfiltered textures or 16 unfiltered textures plus 16 filtered textures (as long as they're fp16 or fewer bits and we're only using bilinear filtering).
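
The totals above follow from simple per-clock arithmetic, sketched below. The per-unit figures (four unfiltered plus four filtered results per clock, with fp32 bilinear filtering taking two clocks) come from the text; the function and constant names are hypothetical, and modeling the filtered path as yielding four extra unfiltered results per unit is an inference from the 32-texture total rather than something AMD has documented here.

```python
TEXTURE_UNITS = 4               # one per SIMD unit
UNFILTERED_PER_UNIT = 4         # unfiltered results per unit per clock
FILTERED_PER_UNIT = 4           # bilinear fp16 samples per unit per clock
TEXELS_PER_BILINEAR_SAMPLE = 4  # 4 samples x 4 texels = 16 data points


def r600_texture_rates(fp32_filtering=False, filtered_path_fetches_unfiltered=False):
    """Return chip-wide texture results per clock under the stated assumptions."""
    unfiltered = TEXTURE_UNITS * UNFILTERED_PER_UNIT
    if filtered_path_fetches_unfiltered:
        # Assumed model: the filtered path is given over to unfiltered fetches.
        extra = TEXTURE_UNITS * FILTERED_PER_UNIT
        return {"unfiltered": unfiltered + extra, "filtered": 0}
    # fp32 bilinear filtering takes two clocks, so the average rate halves.
    clocks_per_sample = 2 if fp32_filtering else 1
    filtered = TEXTURE_UNITS * FILTERED_PER_UNIT // clocks_per_sample
    return {"unfiltered": unfiltered, "filtered": filtered}


if __name__ == "__main__":
    print(r600_texture_rates())
    print(r600_texture_rates(filtered_path_fetches_unfiltered=True))
    print(r600_texture_rates(fp32_filtering=True))
```

Under this model the default case gives 16 unfiltered plus 16 filtered results per clock, the all-unfiltered case gives 32, and fp32 filtering drops the filtered rate to an average of 8 per clock.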

G80 is built with four texture address units and eight texture filters per block of 16 SPs. In total, this means NVIDIA's hardware can produce 32 filtered texture samples per clock (again these are fp16 and bilinear filtered). Of course, NVIDIA is operating on twice as many threads per clock, so it is conceivable that they would benefit more from having the extra filtered data.
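
The 32-sample figure for G80 can be reproduced with the short calculation below. Two things here are assumptions not stated in the text: the 128 SP total (the GeForce 8800 GTX configuration) and the idea that the bilinear rate is bounded by the texture address units rather than the filters. All names are hypothetical.

```python
R600_FILTERED_PER_CLOCK = 16  # fp16 bilinear figure quoted for R600 in the text


def g80_texture_rates(total_sps=128, sps_per_block=16,
                      address_units_per_block=4, filter_units_per_block=8):
    """Estimate G80 texture resources; total_sps=128 is an assumption."""
    blocks = total_sps // sps_per_block
    address_units = blocks * address_units_per_block
    filter_units = blocks * filter_units_per_block
    # Assumed model: one bilinear sample needs one address, so the
    # scarcer of the two resources limits the per-clock rate.
    filtered_per_clock = min(address_units, filter_units)
    return {
        "blocks": blocks,
        "address_units": address_units,
        "filter_units": filter_units,
        "filtered_per_clock": filtered_per_clock,
    }


if __name__ == "__main__":
    g80 = g80_texture_rates()
    ratio = g80["filtered_per_clock"] / R600_FILTERED_PER_CLOCK
    print(g80, "ratio vs R600:", ratio)
```

With these inputs the sketch yields eight blocks, 32 address units, 64 filter units, and 32 filtered samples per clock, which is twice the R600 figure. Note that this compares per-clock rates only and ignores the two chips' different clock speeds.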

We will have to wait and see if AMD's approach of providing unfiltered and filtered texture access in parallel pays off. For the general case in pixel shaders, we would want to see more filtered textures per clock, but with vertex and geometry shaders coming into the mix this could be a good way to save hardware space while offering more texturing power. On a final texturing note, AMD implemented "percentage closer" filter hardware for depth/stencil textures. This will allow developers to implement fast soft shadows. The details of the implementation weren't indicated, though.
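
Since AMD did not describe its implementation, the following is only a generic software sketch of percentage closer filtering, the technique the hardware is named after. The key idea is that each shadow-map texel is depth-tested first and the pass/fail results are averaged, rather than averaging the depths themselves. The function name, the box-shaped kernel, the `bias` parameter, and the clamp-to-edge addressing are all illustrative assumptions.

```python
def pcf_shadow(depth_map, x, y, receiver_depth, radius=1, bias=0.0):
    """Return the lit fraction (0.0 to 1.0) around shadow-map texel (x, y).

    `depth_map` is a list of rows holding the nearest occluder depth seen
    from the light. Each tap is compared against `receiver_depth` and the
    binary results are averaged, which softens the shadow edge.
    """
    height, width = len(depth_map), len(depth_map[0])
    lit = 0
    taps = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Clamp tap coordinates to the edge of the shadow map.
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            # The receiver is lit by this tap if nothing sits in front of it.
            if receiver_depth - bias <= depth_map[sy][sx]:
                lit += 1
            taps += 1
    return lit / taps
```

A pixel whose 3x3 neighborhood is one-third occluded would come back two-thirds lit, giving a graded penumbra instead of a hard edge. Doing the compare-then-average step in fixed-function hardware would spare the shader those per-tap comparisons.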

86 Comments

  • wjmbsd - Monday, July 2, 2007

    What is the latest on the so-called Dragonhead 2 project (aka, HD 2900 XTX)? I heard it was just for OEMs at first...anyone know if the project is still going and how the part is benchmarking with newest drivers?
  • teainthesahara - Monday, May 21, 2007

    After this failure of the R600 and likely overrated (and probably late) Barcelona/Agena processors I think that Intel will finally bury AMD. Paul Otellini is rubbing his hands with glee at the moment and rightfully so. AMD now stands for mediocrity. Oh dear what a fall from grace.... To be honest Nvidia don't have any real competition on the DX10 front at any price points. I cannot see AMD processors besting Intel's Core 2 Quad lineup in the future especially when 45nm and 32nm become the norm and they don't have a chance in hell of beating Nvidia. Intel and Nvidia are turning the screws on Hector Ruiz. Shame AMD brought down such a great company like ATI.
  • DerekWilson - Thursday, May 24, 2007

    To be fair, we really don't have any clue how these cards compete on the DX10 front as there are no final, real DX10 games on the market to test.

    We will try really hard to get a good idea of what DX10 will look like on the HD 2000 series and the GeForce 8 Series using game demos, pre-release code, and SDK samples. It won't be a real reflection of what users will experience, but we will certainly hope to get a glimpse at performance.

    It is fair to say that NVIDIA bests AMD in current game performance. But really there are so many possibilities with DX10 that we can't call it yet.
  • spinportal - Friday, May 18, 2007

    From the last posting of results for the GTS 320MB round-up
    Prey @ AnandTech - 8800GTS320 (http://www.anandtech.com/video/showdoc.aspx?i=2953...)
    we see that the 2900XT review chart pushes the nVidia cards down about 15% across the board.
    Prey @ AnandTech - ATI2900XT (http://www.anandtech.com/video/showdoc.aspx?i=2988...)
    The only difference in systems is software drivers as the cpu / mobo / mem are the same.

    Does this mean ATI should be getting a BIGGER THRASHING BEAT-DOWN than the reviewer is stating?
    400$ ATI 2900XT performing as good as a 300$ nVidia 8800 GTS 320MB?

    It's 100$ short and 6 months late along with 100W of extra fuel.

    This is not your uncle's 9700 Pro...
  • DerekWilson - Sunday, May 20, 2007

    We switched Prey demos -- I updated our benchmark.

    Both numbers are accurate for the tests I ran at the time.

    Our current timedemo is more stressful and thus we see lower scores with this test.
  • Yawgm0th - Wednesday, May 16, 2007

    The prices listed in this article are way off.

    Currently, 8800GTS 640MB retails for $350-380, $400+ for OC or special versions. 2900XT retails for $430+. In the article, both are listed as $400, and as such the card is given a decent review in the conclusion.

    Realistically, this card provides slightly inferior performance to the 8800GTS 640MB at a considerably higher price point -- $80-$100 more than the 8800GTS. I mean, it's not like the 8800Ultra, but for the most part this card has little use outside of AMD and/or ATI fanboys. I'd love for this card to do better as AMD needs to be competing with Nvidia and Intel right now, but I just can't see how this is even worth looking at, given current prices.
  • DerekWilson - Thursday, May 17, 2007

    really, this article focuses on architecture more than product, and we went with MSRP prices...

    we will absolutely look closer at price and price/performance when we review retail products.
  • quanta - Tuesday, May 15, 2007

    As I recall, the Radeon HD 2900 only has DVI ports, but nowhere does the DVI documentation specify that it can carry audio signals. Unless the card comes with an adapter that accepts audio input, it seems the audio portion of R600 is rendered useless.
  • DerekWilson - Wednesday, May 16, 2007

    the card does come with an adapter of sorts, but the audio input is from the dvi port.

    you can't use a standard DVI to HDMI converter for this task.

    when using AMD's HDMI converter the data sent out over the DVI port does not follow the DVI specification.

    the bottom line is that the DVI port is just a physical connector carrying data. i could take a DVI port and solder it to a stereo and use it to carry 5.1 audio if I wanted to ... wouldn't be very useful, but I could do it :-)

    While connected to a DVI device, the card operates the port according to the DVI specification. When connected to an HDMI device through the special converter (which is not technically "dvi to hdmi" -- it's amd proprietary to hdmi), the card sends out data that follows the HDMI spec.

    you can look at it another way -- when the HDMI converter is connected, just think of the dvi port as an internal connector between an I/O port and the TMDS + audio device.
  • ShaunO - Tuesday, May 15, 2007

    I was at an AMD movie night last night where they discussed the technical details of the HD 2900 XT and also showed the Ruby Whiteout DX10 Demo rendered using the card. It looked amazing and I had high hopes until I checked out the benchmark scores. They're going to need more than free food and popcorn to convince me to buy an obsolete card.

    However there is room for improvement of course. Driver updates, DX10 and whatnot. The main thing for me personally will be driver updates, I will be interested to see how well the card improves over time while I save my pennies for my next new machine.

    Everyone keeps saying "DX10 performance will be better, yadda yadda" but I also want to be able to play the games I have now and older games without having to rely on DX10 games to give me better performance. Nothing like totally underperforming in DX9 games and then only being equal or slightly better in DX10 games compared to the competition. I would rather have a decent performer all-round. Even saying that we don't even know for sure if DX10 games are even going to bring any performance increases of the competition, it's all speculation right now and that's all we can do, speculate.

    Shaun.
