Compute & Tessellation

Moving on from gaming performance, we have our customary look at compute performance, bundled with theoretical tessellation performance.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.

Civilization V’s compute benchmark cares little for memory bandwidth or the architectural differences between Barts and Juniper; SPs and clockspeed are what matter here. As a result the 6790 only narrowly avoids a tie with the 5770, of all things, and its performance relative to NVIDIA’s cards isn’t any better.

Our second GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing them to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.

SmallLuxGPU ends up being one of the best showings for the 6790: while it’s obviously compute bound, it clearly benefits from the architectural differences between Barts and Juniper. The 6790’s performance relative to the 6850 almost exactly matches the theoretical performance difference, and in spite of the 5770 having a slight theoretical advantage of its own, the 6790 easily beats the 5770 by 16%. This opens up a small window for the 6790 as a lower-priced GPGPU product, but it’s a very small window, as the program would need to excel both on AMD cards in general and on Barts over Juniper. Otherwise we end up with cases like SLG, where the 6790 does well versus the 5770 but very poorly compared to NVIDIA’s cards.
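As a rough illustration of the "theoretical performance difference" mentioned above, shader throughput can be approximated as stream processors multiplied by core clock. The sketch below assumes AMD's reference specifications (800 SPs at 840MHz for the 6790, 960 SPs at 775MHz for the 6850, 800 SPs at 850MHz for the 5770); those figures are not restated on this page, and the function names are purely illustrative.

```python
# Rough theoretical shader throughput: stream processors x core clock (MHz).
# The specs below are assumed reference-card figures, not taken from this page.
CARDS = {
    "6790": (800, 840),
    "6850": (960, 775),
    "5770": (800, 850),
}

def shader_throughput(name):
    """Return SPs x MHz for a card, a crude proxy for compute throughput."""
    sps, clock_mhz = CARDS[name]
    return sps * clock_mhz

def relative(a, b):
    """Theoretical throughput of card a as a fraction of card b."""
    return shader_throughput(a) / shader_throughput(b)

print(f"6790 vs 6850: {relative('6790', '6850'):.1%}")
print(f"5770 vs 6790: {relative('5770', '6790'):.1%}")
```

Under these assumptions the 6790 sits at roughly 90% of the 6850 on paper, and the 5770 holds an advantage of about 1% over the 6790, which is the "slight theoretical advantage" that makes the 16% SmallLuxGPU lead notable.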

At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. Barts’ tessellation improvements should give it an edge over the 5770, but it still has to contend with the 6800 series.

At this point in time none of our games closely match our tessellation results, which shouldn’t be a surprise given the low usage of tessellation. Although Barts isn’t a tessellation monster, it could do quite well in the future if tessellation takes off in a manner similar to how these benchmarks use it, but that’s a very big if.


  • geniekid - Tuesday, April 05, 2011 - link

    Tom's used a higher end card paired with the 6790 to test the Crossfire performance of this thing. The results suggest there might be some value to this card if used in Xfire configuration compared to single cards around the same price. It would be nice to explore that possibility!
  • BPB - Tuesday, April 05, 2011 - link

    I have been happily running with two HD4850's for a few years now, and want to upgrade. It seems to me that if I stick to a 24" 1920x1200 monitor practically any card will do. Still, I don't like the idea of getting a 6800 series since it's practically 2+ year old technology. Wondering if I should go 6900 series, or wait till 7000 series. Come on AMD, man up, put some real upgrades out there that make it easy for me to decide.
  • richardginn - Tuesday, April 05, 2011 - link

    It is nice to see a review of the AMD 6790 video card, but how about a review of the AMD 6450, AMD 6550, and AMD 6670 OEM video cards???
  • jabber - Tuesday, April 05, 2011 - link

    Aren't they just the 5XXX cards re-branded with a 6? If so a waste of time.

    Plus as they are OEM we won't be buying them.
  • richardginn - Wednesday, April 06, 2011 - link

    Actually no. These OEM cards are very different.

    The 6450 video card which I have only seen sold as an option at is supposed to be like twice as fast as the 5450 based on the specs listed from the AMD website.

    If you are talking just a rebranded video card you have to be talking about the OEM 6770 and 6750 video cards which have no performance boost in the FPS area.

    Will these cards go off OEM status when Bulldozer CPUs are released or just move on to something like a 7450 video card????
  • Ryan Smith - Tuesday, April 05, 2011 - link

    Funny you should mention that...
  • BoFox - Tuesday, April 05, 2011 - link

    Is the memory bus really 128-bit instead of 256 bits wide? I'm wondering why Anandtech put a lot of effort into checking up on GTX 550 Ti's 192-bit bandwidth with an odd number of chips, but not on either the 5830 or 6790, which claim a 256-bit bus while the ROPs are cut in half.

    We all know that the number of ROPs is tied with the memory bus for a given architecture design. This is why NV's GTX 550 Ti seems much more valid, as it is linear with 24 ROPs.

    If we look at the 5830 here:
    3D Mark Vantage color fill test is strongly correlated with the memory bandwidth. If the 5830 were 256-bit, it would have had identical bandwidth with 5850. However, the performance shows that it is not the case. It is barely half of 5850's performance, and also much slower than HD 4870 which has only 16 ROPs at a lower clock than that of 5830.

    Next, if we look at BeHardware's ultimate scrutiny (just as respectable as Anandtech's examination of GTX 550 Ti):
    We see that the 5830 has far lower FP16 and FP32 GPixel/s writes than not only the 5770 that has a slightly higher fill rate, but also the 4890 to a far greater degree. The test is directly linear to the available bandwidth as the 4890 is so much faster than the 5770, let alone 5830 in that respect.

    One more thing is that as with Barts architecture, we should all know that it is based on VLIW4 architecture, not the traditional VLIW5 one. It seems that AMD wanted to save the "thunder" for Cayman's launch by reserving the announcement for what desperately needed as much thunder as possible. Just look at how close 6870's performance is to 5870 while comparing 6870's 1120sp and 4.2Gbps bandwidth to 5870's 1600sp and 4.8Gbps bandwidth.

    Hope you guys enjoyed a little bit of exposure! Is AMD deliberately giving us wrong information? That's not my problem, but if I were the one reviewing the product, I would definitely point these things out in my article.
  • Ryan Smith - Tuesday, April 05, 2011 - link

    From a graphics point of view it's not possible to separate the performance of the ROPs from memory bandwidth. Color fill, etc. are equally impacted by both. To analyze bandwidth you'd have to work from a compute point of view. However, with that said, I don't have any reason to believe AMD doesn't have a 256-bit bus; achieving identical performance with half the L2 cache will be harder though.

    And Barts is VLIW5, not VLIW4. Only Cayman is VLIW4.
  • BoFox - Tuesday, April 05, 2011 - link

    Barts is also Northern Islands--the keynote of the architecture being VLIW4.

    See, the 6790 wouldn't come within 2-3% of 5830 according to all of the benchmarks at the review here.

    If it were VLIW5 like the 5830, the 6790 would've been MUCH slower (or the 5830 much faster).

    This is because HD 5830 has 1120sp, which is 40% more than 6790's 800sp. It also has 56 TMU's, which is 40% more than 6790's 40 TMU's.

    All of the other specs are shockingly similar, with only 5% difference in core and memory clock speeds. Both the 5830 and 6790 have the same "alleged 256-bit bus".

    In spite of the whopping 40% shader and TMU difference, the 6790 comes SO close to the 5830--close enough that it's only possible if the 800sp were VLIW4 (multiply it by 5/4 and you get performance like 1000sp). It would only make sense there, as 1000sp is about 10% less than 1120sp, but with 5% higher clock, it comes to within 2-3% of 5830's performance.

    If not for VLIW4, what would it be? Tell me.
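The arithmetic in the comment above can be checked with a short Python sketch. It uses only the figures the comment quotes (1120 vs 800 SPs, a 5% clock advantage for the 6790, and a hypothesised 5/4 scaling factor); the variable names are illustrative, and the sketch only reproduces the calculation rather than settling whether Barts is VLIW4 or VLIW5.

```python
# Figures quoted in the comment above.
HD5830_SPS = 1120
HD6790_SPS = 800
CLOCK_ADVANTAGE_6790 = 1.05   # "5% higher clock" per the comment
VLIW_SCALE = 5 / 4            # the comment's hypothesised scaling factor

# Raw SP advantage of the 5830 over the 6790.
raw_sp_ratio = HD5830_SPS / HD6790_SPS

# "Equivalent" SP count for the 6790 under the 5/4 hypothesis.
equivalent_sps = HD6790_SPS * VLIW_SCALE

# 6790 relative to 5830 after applying both the scaling and the clock edge.
scaled_ratio = equivalent_sps * CLOCK_ADVANTAGE_6790 / HD5830_SPS

print(f"5830 raw SP advantage: {raw_sp_ratio - 1:.0%}")
print(f"6790 scaled SP count: {equivalent_sps:.0f}")
print(f"6790 vs 5830 after scaling and clock: {scaled_ratio:.1%}")
```

With these inputs the scaled 6790 lands at roughly 94% of the 5830, a gap of about 6%, which is somewhat wider than the 2-3% the comment cites.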
  • BoFox - Tuesday, April 05, 2011 - link

    Typo: I forgot to add "if it weren't for VLIW4" before the second sentence above. Sorry if it's confusing.

    Shouldn't we all already know that Barts XT was of the Northern Islands VLIW4 architecture from how close it was to the 5870 (1120sp vs 1600sp which is 43% higher)? Even after adjusting for the clock differences, the shader/TMU operations per second is still 35% higher for the 5870, yet the 5870 turned out to be only 9% faster overall. It would've made perfect sense if Barts XT had 1400 VLIW5 shaders (using the 5:4 ratio over 1120sp).

    Using the math: 1600sp x 850MHz is ... 8% faster than 1400sp x 900MHz. The memory bandwidth does not affect the performance too much since the proportion of bandwidth to GPU muscle is not changed by much. So, it's really close to the overall 9% actual performance difference between 5870 and 6870. Is that coincidental? The reason the difference is actually greater than the calculation is because 1120sp VLIW4 does not exactly translate to 1400sp VLIW5. VLIW4 is not perfectly efficient as to scale 100% at a 5/4 ratio, but it's pretty close.

    What else would it be, sincerely?
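The 5870 vs 6870 figures in the comment above can be reproduced the same way. This sketch takes the SP counts and clocks exactly as the comment states them (1600sp at 850MHz, 1120sp at 900MHz) and applies the comment's assumed 5/4 factor; the helper name is illustrative and nothing here is a measured result.

```python
# SP counts and clocks (MHz) as quoted in the comment above.
HD5870 = (1600, 850)
HD6870 = (1120, 900)
VLIW_SCALE = 5 / 4  # the comment's hypothesised scaling factor

def throughput(sps, mhz):
    """Crude shader throughput proxy: SPs x clock."""
    return sps * mhz

# 5870 advantage in raw SP count.
raw_sp_gap = HD5870[0] / HD6870[0] - 1

# 5870 advantage once clock speeds are included.
clock_adjusted_gap = throughput(*HD5870) / throughput(*HD6870) - 1

# 5870 advantage if the 6870's SPs are scaled by 5/4 (1120 -> 1400).
scaled_sps = HD6870[0] * VLIW_SCALE
scaled_gap = throughput(*HD5870) / throughput(scaled_sps, HD6870[1]) - 1

print(f"Raw SP gap: {raw_sp_gap:.0%}")
print(f"Clock-adjusted gap: {clock_adjusted_gap:.0%}")
print(f"Gap after 5/4 scaling: {scaled_gap:.1%}")
```

These work out to roughly 43%, 35%, and 8% respectively, matching the numbers the comment gives.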
