Compute and Tessellation

Moving on from our look at gaming performance, we have our customary look at compute performance, bundled with a look at theoretical tessellation performance. Unlike our gaming benchmarks where NVIDIA’s architectural enhancements could have an impact, everything here should be dictated by the core clock and SMs, with shader and polymorph engine counts defining most of these tests.
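Since performance here should scale with shader count and clock, a quick back-of-the-envelope calculation using the two cards' published specs (512 CUDA cores at a 1544MHz shader clock for the GTX 580, versus 480 cores at 1401MHz for the GTX 480) predicts roughly what we see below:

```python
def relative_throughput(cores_a, clock_a, cores_b, clock_b):
    """Ratio of theoretical shader throughput (cores x shader clock, MHz)."""
    return (cores_a * clock_a) / (cores_b * clock_b)

# GTX 580 (512 cores @ 1544MHz) vs. GTX 480 (480 cores @ 1401MHz)
ratio = relative_throughput(512, 1544, 480, 1401)
print(f"GTX 580 vs. GTX 480: {100 * (ratio - 1):.1f}% more shader throughput")
```

On paper that works out to about 17.5%; in practice most of our compute tests land around 15%, with the remainder lost to other bottlenecks.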

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.
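Civ V's actual DirectCompute kernel isn't public, but the work it does is block texture decompression. As a rough illustration of the per-block math involved, here is a minimal CPU-side Python sketch that decodes a single BC1 (DXT1) block, the classic 8-byte compressed texture format:

```python
import struct

def rgb565_to_rgb888(c):
    """Expand a packed 16-bit RGB565 color to an 8-bit-per-channel tuple."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_bc1_block(block):
    """Decode one 8-byte BC1 block into a 4x4 grid of RGB tuples."""
    c0, c1, bits = struct.unpack("<HHI", block)  # two endpoints + 2-bit indices
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:  # four-color mode: two interpolated intermediate colors
        palette = [p0, p1,
                   tuple((2 * a + b) // 3 for a, b in zip(p0, p1)),
                   tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))]
    else:        # three-color mode plus black
        palette = [p0, p1,
                   tuple((a + b) // 2 for a, b in zip(p0, p1)),
                   (0, 0, 0)]
    return [[palette[(bits >> (2 * (4 * y + x))) & 0x3] for x in range(4)]
            for y in range(4)]
```

A GPU version runs thousands of these blocks in parallel, one per thread, which is why the benchmark scales so cleanly with shader throughput.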

We previously discovered that NVIDIA did rather well in this test, so it shouldn’t come as a surprise that the GTX 580 does even better. Even without the benefits of architectural improvements, the GTX 580 still ends up pulling ahead of the GTX 480 by 15%. The GTX 580 also does well against the 5970 here, which does see a boost from CrossFire but ultimately falls short, showcasing why multi-GPU cards can be inconsistent at times.

Our second compute benchmark is Cyberlink’s MediaEspresso 6, the latest version of their GPU-accelerated video encoding suite. MediaEspresso 6 doesn’t currently utilize a common API, and instead has codepaths for both AMD’s APP (née Stream) and NVIDIA’s CUDA APIs, which gives us a chance to test each API with a common program bridging them. As we’ll see this doesn’t necessarily mean that MediaEspresso behaves similarly on both AMD and NVIDIA GPUs, but for MediaEspresso users it is what it is.

We include MediaEspresso 6 largely to showcase that not everything that's GPU accelerated is GPU-bound. Once we move away from sub-$150 GPUs, APIs and architecture become much more important than raw speed, and as a result the GTX 580 is unable to differentiate itself from the GTX 480.


Our third GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing them to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.

SmallLuxGPU is rather straightforward in its requirements: compute and lots of it. The GTX 580 attains most of its theoretical performance improvement here, coming in at a bit over 15% over the GTX 480. It does get bested by a couple of AMD’s GPUs however, a showcase of where AMD’s theoretical performance advantage in compute isn’t so theoretical.
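This is not SmallLuxGPU's actual OpenCL kernel, but the innermost operation of any ray tracer, the ray-primitive intersection test, gives a feel for the kind of pure arithmetic being offloaded. A minimal Python sketch of a ray-sphere test:

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the nearest positive hit distance along a ray, or None.

    Solves |origin + t*direction - center|^2 = radius^2 for t,
    assuming direction is unit length (so the quadratic's a == 1).
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None                      # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / 2.0     # nearer of the two roots
    return t if t > 0.0 else None
```

A path tracer evaluates millions of these tests per frame with no divergent memory access to speak of, which is exactly the workload where raw ALU throughput, AMD's traditional strength, pays off.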

Our final compute benchmark is a Folding @ Home benchmark. Given NVIDIA’s focus on compute for Fermi and in particular GF110 and GF100, cards such as the GTX 580 can be particularly interesting for distributed computing enthusiasts, who are usually looking for the fastest card in the coolest package. This benchmark is from the original GTX 480 launch, so this is likely the last time we’ll use it.

If I said the GTX 580 was 15% faster, would anyone be shocked? So long as we're not CPU bound, the GTX 580 is 15% faster through all of our compute benchmarks. This, coupled with the GTX 580's cooler and quieter design, should make the card a very big deal for distributed computing enthusiasts.

At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. Here we're interested in things from a theoretical architectural perspective, using the Unigine Heaven benchmark and Microsoft's DirectX 11 Detail Tessellation sample program to measure the tessellation performance of a few of our cards.

NVIDIA likes to heavily promote their tessellation performance advantage over AMD’s Cypress and Barts architectures, as it’s by far the single biggest difference between them and AMD. Not surprisingly the GTX 400/500 series does well here, and between those cards the GTX 580 enjoys a 15% advantage in the DX11 tessellation sample, while Heaven is a bit higher at 18% since Heaven is a full engine that can take advantage of the architectural improvements in GF110.
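To see why tessellation throughput becomes a bottleneck so quickly, note that triangle counts grow quadratically with the tessellation factor. A minimal sketch (the n² figure is the standard result for a uniform integer factor on a triangle patch; real counts vary with the partitioning mode and per-edge factors):

```python
def triangles_for_factor(n):
    """Triangles emitted for a triangle patch at uniform integer factor n."""
    return n * n

# Going from factor 8 to the DX11 maximum of 64 multiplies the
# triangle load by 64x, which is why setup/raster throughput matters.
for factor in (1, 8, 64):
    print(f"factor {factor:2d} -> {triangles_for_factor(factor)} triangles")
```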

Seeing as how NVIDIA and AMD are still fighting about the importance of tessellation, both among developers and with the public, these numbers shouldn't be used as long range guidance. NVIDIA clearly has an advantage – getting developers to use additional tessellation in a meaningful manner is another matter entirely.

159 Comments

  • cjb110 - Tuesday, November 09, 2010 - link

    "While the difference is' earthshattering, it's big enough..." – n't got dropped, though not yet at my workplace :)
  • Invader Mig - Tuesday, November 09, 2010 - link

    I don't know the stance on posting links to other reviews since I'm a new poster, so I won't. I would like to note that in another review they claim to have found a workaround for the power throttling that allowed them to use FurMark to get accurate temperature and power readings. This review has the 580 at 28W above the 480 at max load. I don't mean to step on anyone's toes, but I have seen so many different numbers because of this garbage NVIDIA has pulled, and the only one who claims to have FurMark working gets higher numbers. I would really like to see something definitive.
  • 7Enigma - Tuesday, November 09, 2010 - link

    Here's my conundrum. What is the point of something like Furmark that has no purpose except to overstress a product? In this case the 580 (with modified X program) doesn't explode and remains within some set thermal envelope that is safe to the card. I like using Crysis as it's a real-world application that stresses the GPU heavily.

    Until we have another game/program that is used routinely (be it game or coding) that surpasses the heat generation and power draw of Crysis I just don't see the need to try to max out the cards with a benchmark. OC your card to the ends of the earth and run something real, that is understandable. But just using a program that has no real use to artificially create a power draw just doesn't have any benefit IMO.
  • Gonemad - Tuesday, November 09, 2010 - link

    I beg to differ. (be careful, high doses of flaming.)

    Let me put it like this. The Abrams M1 Tank is tested on a 60º ramp (yes, that is sixty degrees), where it must park. Just park there, hold the brakes, and then let go. It proves the brakes on a 120-ton 1200hp vehicle will work. It is also tested on emergency brakes, where this sucker can pull a full stop from 50mph on 3 rubber-burning meters. (The treads have rubber pads, for the ill informed).
    Will a tank ever need to hold on a 60º ramp? Probably not. Would it ever need to come to a screeching halt in 3 meters? In Iraq, they probably did, in order to avoid IEDs. But you know, if there were no prior testing, nobody would know.

    I think there should be programs specifically designed to stress the GPU in unintended ways, and it must protect itself from destruction, regardless of what code is being thrown at it. NVIDIA should be grateful somebody pointed that out to them. AMD was thankful when they found out the 5800 series GPUs (and others, but this was worse) had lousy performance on 2D acceleration, or none at all, and rushed to fix its drivers. Instead, NVIDIA tries to cheat Furmark by recognizing its code and throttling. Pathetic.

    Perhaps someday, a scientific application may come up with repeatable math operations that behave exactly like Furmark. So, out of the blue, you've got $500 worth of equipment that gets burned out, and nobody can tell why??? Would you like that happening to you? Wouldn't you like to be informed that this or that code, at least, could destroy your equipment?

    What if Furmark wasn't designed to stress GPUs, but it was an actual game, (with furry creatures, lol)?

    Ever heard of Final Fantasy XIII killing off PS3s for good, due to overload, thermal runaway, followed by meltdown? Rumors are out there; whether you believe them is entirely up to you.

    Ever heard of the Nissan GTR (Skyline) being released with a GPS-based top-speed limiter that unlocks itself when the car enters the premises of Nissan-approved racetracks? Inherent safety, or meddling? Can't you drive on the Autobahn at 300 km/h?

    Remember back in the day of early benchmark tools, (3DMark 2001 if I am not mistaken), where the Geforce drivers detected the 3DMark executable and cheated the hell out of the results, and some reviewers got NVIDIA red-handed when they renamed and changed the checksum of the benchmark??? Rumors, rumors...

    The point is, if there is a flaw, a risk of an unintended instruction killing the hardware, the buyer should be rightfully informed of such conditions, especially if the company has no intention at all to fix it. Since Anand warned us, they will probably release the GTX 585 with full hardware thermal safeties. Or new drivers. Or not.

    Just like the #PROCHOT signal that was inserted in the Pentium (which version?), which some reviewers tested against an AMD chip. I never forgot that AMD processor billowing blue smoke the moment the heatsink was torn off. Good PR, bad PR. The video didn't look fake to me back then, just unfair.

    In the end, it becomes a matter of PR. If suddenly everyone who played Crysis on this card torched it, we would have something really interesting.
  • Sihastru - Tuesday, November 09, 2010 - link

    AMD has had a similar system in place since the HD4xx0 generation. Remember when Furmark used to blow up 48x0 cards? Of course not. But look it up...

    What nVidia did here is what AMD has had in all their mid/high end cards since HD4xx0. At least nVidia will only throttle when it detects Furmark/OCCT. AMD cards will throttle in any situation if the power limiter requires it.
  • JimmiG - Tuesday, November 09, 2010 - link

    It's a very unfortunate situation that both companies are to blame for. That's what happens when you push the limits of power consumption and heat output too far while at the same time trying to keep manufacturing costs down.

    The point of a stress test is to push the system to the very limit (but *not* beyond it, like AMD and Nvidia would have you believe). You can then be 100% assured that it will run all current and future games and HPC applications, no matter what unusual workloads they dump on your GPU or CPU, without crashes or reduced performance.
  • cactusdog - Tuesday, November 09, 2010 - link

    So if you want to use multiple monitors do you still need 2 cards to run it or have they enabled a third monitor on the 580?
  • Sihastru - Tuesday, November 09, 2010 - link

    Yes.
  • Haydyn323 - Tuesday, November 09, 2010 - link

    The 580, as with the previous generation, still only supports 2 monitors max per card.
  • Pantsu - Tuesday, November 09, 2010 - link

    A good article, and a good conclusion overall. Much better than the fiasco that was the 6800 article.

    I do lament the benchmarking method AT uses, though. Benchmarks like the Crysis Warhead one tend to be a bit too "optimized"; they do not reflect real-world performance very well, and can even skew the results between cards.
