Compute and Tessellation

Moving on from gaming performance, we have our customary look at compute performance, bundled with theoretical tessellation performance. Unlike our gaming benchmarks, where NVIDIA’s architectural enhancements could have an impact, everything here should be dictated by the core clock and SM count, with shader and polymorph engine counts defining most of these tests.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.

We previously discovered that NVIDIA did rather well in this test, so it shouldn’t come as a surprise that the GTX 580 does even better. Even without the benefits of architectural improvements, the GTX 580 still ends up pulling ahead of the GTX 480 by 15%. The GTX 580 also does well against the 5970 here, which does see a boost from CrossFire but ultimately falls short, showcasing why multi-GPU cards can be inconsistent at times.

Our second compute benchmark is Cyberlink’s MediaEspresso 6, the latest version of their GPU-accelerated video encoding suite. MediaEspresso 6 doesn’t currently utilize a common API, and instead has codepaths for both AMD’s APP (née Stream) and NVIDIA’s CUDA APIs, which gives us a chance to test each API with a common program bridging them. As we’ll see this doesn’t necessarily mean that MediaEspresso behaves similarly on both AMD and NVIDIA GPUs, but for MediaEspresso users it is what it is.

We throw MediaEspresso 6 in largely to showcase that not everything that’s GPU accelerated is GPU-bound, and ME6 illustrates this nicely. Once we move away from sub-$150 GPUs, APIs and architecture become much more important than raw speed. As a result, the GTX 580 is unable to differentiate itself from the GTX 480.

Our third GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing them to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.
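
Though SmallLuxGPU’s own engine is a full path tracer, the basic OpenCL offload pattern it relies on is simple enough to sketch. The example below is purely illustrative and assumes the pyopencl package; the host code just hands a batch of rays to a trivial ray-sphere kernel and lets the OpenCL device (the GPU, when one is available) do all of the intersection math.

```python
# Minimal, hypothetical sketch of offloading ray intersection work via OpenCL.
# This is NOT SmallLuxGPU's code; it only illustrates the host/device split.
import numpy as np
import pyopencl as cl

KERNEL = """
__kernel void trace(__global const float4 *ray_dirs,
                    __global float *hit_t,
                    const float cx, const float cy, const float cz,
                    const float radius)
{
    int i = get_global_id(0);
    float3 d = normalize(ray_dirs[i].xyz);
    // Ray origin fixed at (0,0,0); oc = origin - sphere center.
    float3 oc = (float3)(0.0f, 0.0f, 0.0f) - (float3)(cx, cy, cz);
    float b = dot(oc, d);
    float c = dot(oc, oc) - radius * radius;
    float disc = b * b - c;
    hit_t[i] = (disc < 0.0f) ? -1.0f : (-b - sqrt(disc));
}
"""

ctx = cl.create_some_context()          # picks a GPU if one is available
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, KERNEL).build()

n = 1 << 16                             # one ray per work-item
ray_dirs = np.random.randn(n, 4).astype(np.float32)
hit_t = np.empty(n, dtype=np.float32)

mf = cl.mem_flags
d_rays = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=ray_dirs)
d_hits = cl.Buffer(ctx, mf.WRITE_ONLY, hit_t.nbytes)

# The host only launches the kernel and reads back results; all of the
# intersection math runs on the OpenCL device.
prg.trace(queue, (n,), None, d_rays, d_hits,
          np.float32(0.0), np.float32(0.0), np.float32(-5.0), np.float32(1.0))
cl.enqueue_copy(queue, hit_t, d_hits)
print("rays that hit the sphere:", int((hit_t > 0).sum()))
```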

SmallLuxGPU is rather straightforward in its requirements: compute, and lots of it. The GTX 580 attains most of its theoretical performance improvement here, coming in a bit over 15% ahead of the GTX 480. It does get bested by a couple of AMD’s GPUs however, a showcase of where AMD’s theoretical compute performance advantage isn’t so theoretical.

Our final compute benchmark is a Folding @ Home benchmark. Given NVIDIA’s focus on compute for Fermi and in particular GF110 and GF100, cards such as the GTX 580 can be particularly interesting for distributed computing enthusiasts, who are usually looking for the fastest card in the coolest package. This benchmark is from the original GTX 480 launch, so this is likely the last time we’ll use it.

If I said the GTX 580 was 15% faster, would anyone be shocked? It seems that so long as we’re not CPU bound, the GTX 580 is 15% faster across all of our compute benchmarks. Coupled with the GTX 580’s cooler, quieter design, this should make the card a very big deal for distributed computing enthusiasts.

At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. Here we’re interested in things from a theoretical architectural perspective, using the Unigine Heaven benchmark and Microsoft’s DirectX 11 Detail Tessellation sample program to measure the tessellation performance of a few of our cards.

NVIDIA likes to heavily promote their tessellation performance advantage over AMD’s Cypress and Barts architectures, as it’s by far the single biggest difference between them and AMD. Not surprisingly the GTX 400/500 series does well here, and the GTX 580 enjoys a 15% advantage over the GTX 480 in the DX11 tessellation sample, while Heaven is a bit higher at 18%, since Heaven is a full engine that can take advantage of GF110’s architectural improvements.

Seeing as how NVIDIA and AMD are still fighting about the importance of tessellation in front of both developers and the public, these numbers shouldn’t be used as long-range guidance. NVIDIA clearly has an advantage; getting developers to use additional tessellation in a meaningful manner is another matter entirely.

Comments

  • wtfbbqlol - Thursday, November 11, 2010 - link

    Most likely an anomaly. Just compare the GTX480 to the GTX470 minimum framerate. There's no way the GTX480 is twice as fast as the GTX470.
  • Oxford Guy - Friday, November 12, 2010 - link

    It does not look like an anomaly since at least one of the few minimum frame rate tests posted by Anandtech also showed the 480 beating the 580.

    We need to see Unigine Heaven minimum frame rates, at the bare minimum, from Anandtech, too.
  • Oxford Guy - Saturday, November 13, 2010 - link

    To put it more clearly... Anandtech only posted minimum frame rates for one test: Crysis.

    In those, we see the 480 SLI beating the 580 SLI at 1920x1200. Why is that?

    It seems to fit with the pattern of the 480 being stronger in minimum frame rates in some situations -- especially Unigine -- provided that the resolution is below 2K.

    I do hope someone will clear up this issue.
  • wtfbbqlol - Wednesday, November 10, 2010 - link

    It's really disturbing how the throttling happens without any real indication. I was really excited reading about all the improvements nVidia made to the GTX580, and then I read about this annoying "feature".

    When any piece of hardware in my PC throttles, I want to know about it. Otherwise it just adds another variable when troubleshooting performance problems.

    Is it a valid test to rename, say, crysis.exe to furmark.exe and see if throttling kicks in mid-game?
  • wtfbbqlol - Wednesday, November 10, 2010 - link

    Well it looks like there is *some* official information about the current implementation of the throttling.

    http://nvidia.custhelp.com/cgi-bin/nvidia.cfg/php/...

    Copy and paste of the message:
    "NVIDIA has implemented a new power monitoring feature on GeForce GTX 580 graphics cards. Similar to our thermal protection mechanisms that protect the GPU and system from overheating, the new power monitoring feature helps protect the graphics card and system from issues caused by excessive power draw.

    The feature works as follows:
    • Dedicated hardware circuitry on the GTX 580 graphics card performs real-time monitoring of current and voltage on each 12V rail (6-pin, 8-pin, and PCI-Express).
    • The graphics driver monitors the power levels and will dynamically adjust performance in certain stress applications such as Furmark 1.8 and OCCT if power levels exceed the card’s spec.
    • Power monitoring adjusts performance only if power specs are exceeded AND if the application is one of the stress apps we have defined in our driver to monitor such as Furmark 1.8 and OCCT.
    - Real world games will not throttle due to power monitoring.
    - When power monitoring adjusts performance, clocks inside the chip are reduced by 50%.

    Note that future drivers may update the power monitoring implementation, including the list of applications affected."
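
Taking NVIDIA's description at face value, the decision logic boils down to something like the hypothetical sketch below. None of the names here are a real NVIDIA API; read_rail_power and foreground_app are invented stand-ins for the board's monitoring circuitry and the driver's application detection.

```python
# Hypothetical illustration of the throttling rule described in the statement above.
import random

STRESS_APPS = {"furmark.exe", "occt.exe"}   # driver-defined list, per the statement
BOARD_POWER_SPEC_W = 244                    # GTX 580's rated board power

def read_rail_power(rail):                  # stand-in: watts drawn on one 12V rail
    return random.uniform(50, 100)

def foreground_app():                       # stand-in: what the driver sees running
    return "furmark.exe"

def power_monitor_tick():
    total_w = sum(read_rail_power(r) for r in ("6-pin", "8-pin", "pcie-slot"))
    over_spec = total_w > BOARD_POWER_SPEC_W
    monitored = foreground_app() in STRESS_APPS
    # Both conditions must hold: ordinary games never match the app list,
    # so power monitoring never touches their clocks.
    return 0.5 if (over_spec and monitored) else 1.0

print("clock scale this tick:", power_monitor_tick())
```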
  • Sihastru - Wednesday, November 10, 2010 - link

    I never heard anyone from the AMD camp complaining about that "feature" with their cards and all current AMD cards have it. And what would be the purpose of renaming your Crysis exe? Do you have problems with the "Crysis" name? You think the game should be called "Furmark"?

    So this is a non issue.
  • flyck - Wednesday, November 10, 2010 - link

    the use of renaming is that nvidia uses name tags to identify whether it should throttle or not.... suppose person x creates a program and you use an older driver that does not include this name tag, you can break things.....
  • Gonemad - Wednesday, November 10, 2010 - link

    Big fat YES. Please do rename the executable from crysis.exe to furmark.exe, and tell us.

    Get furmark and go all the way around, rename it to Crysis.exe, but be sure to have a fire extinguisher on the premises. Caveat Emptor.

    Perhaps just renaming is not enough, some checksumming is involved. It is pretty easy to change the checksum without altering the running code, though. When compiling source code, you can insert comments in the code. When compiling, the comments are not dropped, they are compiled together with the running code. Change the comment, change the checksum. But furmark alone can do that.

    Open furmark in a hex editor and change some bytes, but try to do that in a long sequence of zeros at the end of the file. Usually compilers finish executables in round kilobytes, filling with zeros. It shouldn't harm the running code, but it changes the checksum without changing byte size (see the sketch below).

    If it works, rename it Program X.

    Ooops.
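
For what it's worth, the padding experiment Gonemad describes is easy to sketch. This is a hypothetical example (the file names are illustrative, and it patches a copy rather than the original executable) showing that flipping a single byte inside the trailing zero padding changes the file's hash without changing its size.

```python
# Hypothetical sketch of the experiment described above: flip one byte inside the
# trailing zero padding of a copy of an executable so its checksum changes while
# its size and running code stay the same. File names here are just illustrative.
import hashlib
import shutil

def sha256_of(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

src, patched = "FurMark.exe", "FurMark_patched.exe"
shutil.copyfile(src, patched)

with open(patched, "r+b") as f:
    data = f.read()
    pad_start = len(data.rstrip(b"\x00"))   # start of the zero padding at the end
    if pad_start < len(data):               # only touch the file if padding exists
        f.seek(pad_start)
        f.write(b"\x01")

# Same byte count, different hash; whether the driver cares is the open question.
print(sha256_of(src))
print(sha256_of(patched))
```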
  • iwodo - Wednesday, November 10, 2010 - link

    The good thing about GPUs is that they scale VERY well (if not linearly) with transistors. 1 node die shrink, double the transistor count, double the performance.

    Combined with the fact that there's no bottleneck with memory, since GDDR5 still has lots of headroom, we are very much limited by the process and not the design.
  • techcurious - Wednesday, November 10, 2010 - link

    I didn't read through ALL the comments, so maybe this was already suggested. But can't the idle sound level be reduced simply by lowering the fan speed and compromising idle temperatures a bit? I bet you could sink below 40dB if you are willing to put up with 45C instead of 37C. 45C is still an acceptable idle temp.
