Compute

Jumping into compute, we should see a mix of results here, with some tests favoring the GK110-based GTX 780’s more compute-capable design, while other tests will punish it for not being a fast FP64 card like GTX Titan.

As always, we'll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the only games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Civilization V’s DirectCompute performance is looking increasingly maxed out at the high end. At 402fps the GTX 780 may as well be tied with GTX Titan. On the other hand it’s a reminder that while we don’t always see NVIDIA do well in our more pure compute tests, it can deliver where it matters for games with DirectCompute.

Our next benchmark is LuxMark 2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as ray tracing maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

NVIDIA has never done well at LuxMark, and GTX 780 won’t change that. It’s considerably faster than GTX 680 and that’s about it. Kepler parts, including GK110, continue to have trouble with our OpenCL benchmarks, as evidenced by the fact that GTX 780 doesn’t beat GTX 580 by nearly as much as the generational improvements would suggest. GK110 is a strong compute GPU, but not in ways that LuxMark is going to benefit from.

Our 3rd benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

GTX 780 still struggles somewhat at compute with CLBenchmark, but less so than with LuxMark. 7970GE is the clear winner here in both tests, while GTX 780 stays remarkably close to GTX Titan in performance. The fluid simulation in particular makes GTX 780 look good on a generational basis, more than doubling GTX 580’s performance.

Moving on, our 4th compute benchmark is FAHBench, the official Folding@Home benchmark. Folding@Home is the popular Stanford-backed research and distributed computing initiative that distributes work to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, as Folding@Home has moved exclusively to OpenCL this year with FAHCore 17.

The Folding@Home group recently pushed out a major core update (FAHBench 1.2.0), which we’ve rerun on a number of cards and which is reflected in our results. Unfortunately this version also broke single precision implicit on AMD GPUs with AMD’s latest drivers, so we only have NVIDIA GPUs for that section.

In any case, despite the fact that this is an OpenCL benchmark, this is one of the cases where NVIDIA GPUs do well enough for themselves in single precision mode, with GTX 780 surpassing 7970GE and falling behind only GTX Titan and the 7990. GTX 780 doesn’t necessarily benefit from GK110’s extra compute functionality, but it does see a performance improvement over GTX 680 that’s close to the theoretical difference in shader performance. Meanwhile in double precision mode, the lack of an uncapped double precision mode for GTX 780 means that it sits near the bottom of the charts, well behind Titan and its 1/3 FP64 rate. Compute customers looking for a bargain NVIDIA card (relatively speaking) will need to stick with Titan.
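To put the double precision gap in rough numbers, here is a minimal Python sketch of how a capped FP64 rate scales theoretical throughput. Titan's 1/3 rate comes from the text above; the 1/24 rate used for the capped card is an assumption not stated in this article, and the FP32 throughput is normalized to 1 for both cards, which ignores their real differences in clock speed and shader count.

```python
from fractions import Fraction

# FP64 throughput expressed as a fraction of FP32 throughput.
TITAN_FP64_RATE = Fraction(1, 3)    # from the article text
CAPPED_FP64_RATE = Fraction(1, 24)  # assumption: not stated in the article


def fp64_throughput(fp32_throughput, fp64_rate):
    """Theoretical FP64 throughput for a given FP32 throughput and FP64 rate."""
    return fp32_throughput * fp64_rate


# Hypothetical: normalize FP32 throughput to 1 so only the rate differs.
fp32 = Fraction(1)
uncapped = fp64_throughput(fp32, TITAN_FP64_RATE)
capped = fp64_throughput(fp32, CAPPED_FP64_RATE)
advantage = uncapped / capped

print(f"Uncapped FP64 advantage at equal FP32 throughput: {advantage}x")
```

Under these assumptions the uncapped card holds an eightfold theoretical advantage in FP64, before any real-world differences between the two cards are taken into account.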

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

With SystemCompute, GTX 780 shows very clear gains over both the GTX 680 and GTX 580, while trailing the GTX Titan as expected. However, both it and Titan trail the 7970GE.


155 Comments


  • milkman001 - Friday, May 24, 2013 - link

    FYI,

    On the "Total War: Shogun 2" page, you have the 2560x1440 graph posted twice.
  • JDG1980 - Saturday, May 25, 2013 - link

    I don't think that the release of this card itself is problematic for Titan owners - everyone knows that GPU vendors start at the top and work their way down with new silicon, so this shouldn't have come as much of a surprise.

    What I do find problematic is their refusal to push out BIOS-based fan controller improvements to Titan owners. *That* comes off as a slap in the face. Someone spends $1000 on a new video card, they deserve top-notch service and updates.
  • inighthawki - Saturday, May 25, 2013 - link

    The typical swapchain format is something like R8G8B8A8, and the alpha channel is typically ignored (a value of 0xFF is usually written), since it is of no use to the OS: it will not alpha blend with the rest of the GUI. You can create a 24-bit format, but it's very likely that, for performance reasons, the driver will allocate it as if it were a 32-bit format and not write to the upper 8 bits. The hardware is often only capable of writing to 32-bit aligned places, so it's more beneficial for the hardware to just waste 8 bits of data and not have to do any fancy shifting to read or write from each pixel. I've actually seen cases where some drivers will allocate 8-bit formats as 32-bit formats, wasting 4x the space the user thought they were allocating.
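The padding described in this comment can be illustrated with a small Python sketch. The helper names are hypothetical and the byte order (R in the first byte) is an assumption for illustration only; actual drivers and formats may lay pixels out differently.

```python
def pack_rgba8(r, g, b, a=0xFF):
    """Pack four 8-bit channels into one 4-byte pixel; alpha defaults to 0xFF."""
    return bytes([r, g, b, a])


def unpack_rgb(pixel):
    """Read back the colour channels, ignoring the unused alpha byte."""
    r, g, b, _alpha = pixel
    return r, g, b


def buffer_bytes(width, height, bytes_per_pixel):
    """Size of an unpadded, tightly packed buffer in bytes."""
    return width * height * bytes_per_pixel


pixel = pack_rgba8(0x12, 0x34, 0x56)

# A 24-bit request that is allocated as 32 bits per pixel.
requested = buffer_bytes(2560, 1440, 3)
allocated = buffer_bytes(2560, 1440, 4)
wasted = allocated - requested

# An 8-bit format allocated as 32 bits per pixel uses 4x the space.
overhead_8bit = buffer_bytes(2560, 1440, 4) // buffer_bytes(2560, 1440, 1)

print(len(pixel), unpack_rgb(pixel), wasted, overhead_8bit)
```

Each pixel occupies exactly four bytes, so a pixel's address is a simple multiple of its index and no shifting across word boundaries is needed, at the cost of one unused byte per pixel.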
  • jeremyshaw - Saturday, May 25, 2013 - link

    As a current GTX580 owner running at 2560x1440, I don't have any want of upgrade, especially in compute performance. I think I'll hold out for at least one more generation, before deciding.
  • ahamling27 - Saturday, May 25, 2013 - link

    As a GTX 560 Ti owner, I am chomping at the bit to pick up an upgrade. The Titan was out of the question, but the 780 looks a lot better at 65% of the cost for 90% of the performance. The only thing holding me back is that I'm still on z67 with a 2600k overclocked to 4.5 ghz. I don't see a need to rebuild my entire system as it's almost on par with the z77/3770. The problem is that I'm still on PCIe 2.0 and I'm worried that it would bottleneck a 780.

    Considering a 780 is aimed at us with 5xx or lower cards, it doesn't make sense if we have to abandon our platform just to upgrade our graphics card. So could you maybe test a 780 on PCIe 2.0 vs 3.0 and let us know if it's going to bottleneck on 2.0?
  • Ogdin - Sunday, May 26, 2013 - link

    There will be no bottleneck.
  • mapesdhs - Sunday, May 26, 2013 - link


    Ogdin is right, it shouldn't be a bottleneck. And with a decent air cooler, you ought to be
    able to get your 2600K to 5.0, so you have some headroom there as well.

    Lastly, you say you have a GTX 560 Ti. Are you aware that adding a 2nd card will give
    performance akin to a GTX 670? And two 560 Tis oc'd is better than a stock 680 (VRAM
    capacity notwithstanding, i.e. I'm assuming you have a 1GB card). Here's my 560Ti SLI
    at stock:

    http://www.3dmark.com/3dm11/6035982

    and oc'd:

    http://www.3dmark.com/3dm11/6037434

    So, if you don't want the expense of an all new card for a while at the cost level of a 780,
    but do want some extra performance in the meantime, then just get a 2nd 560Ti (good
    prices on eBay these days), it will run very nicely indeed. My two Tis were only 177 UKP
    total - less than half the cost of a 680, though atm I just run them at stock speed, don't
    need the extra from an oc. The only caveat is VRAM, but that shouldn't be too much of
    an issue unless you're running at 2560x1440, etc.

    Ian.
  • ahamling27 - Wednesday, May 29, 2013 - link

    Thanks for the reply! I thought about SLI but ultimately the 1 GB of vram is really going to hurt going forward. I'm not going to grab a 780 right away, because I want to see what custom models come out in the next few weeks. Although, EVGA's ACX cooler looks nice, I just want to see some performance numbers before I bite the bullet.

    Thanks again!
  • inighthawki - Tuesday, May 28, 2013 - link

    Your comment is inaccurate. Just because a game requires "only 512MB" of video ram doesn't mean that's all it'll use. Video memory can be streamed in on the fly as needed off the hard drive, and as a result you can easily use a lot if you wanted as a performance optimization. I would not be the least bit surprised to see games on next gen consoles using WAY more video memory than regular memory. Running a game that "requires" 512MB of VRAM on a GPU with 4GB of VRAM gives it 3.5GB more storage to cache higher resolution textures.
  • AmericanZ28 - Tuesday, May 28, 2013 - link

    NVIDIA=FAIL....AGAIN! 780 Performs on par with a 7970GE, yet the GE costs $100 LESS than the 680, and $250 LESS than the 780.
