Compute

Jumping into compute, we aren’t expecting too much here. Outside of DirectCompute, GK104 is generally a poor compute GPU, and the loss of an SMX relative to the GTX 660 Ti doesn’t do the GTX 760 any favors. By all appearances the GTX 760 is even more of a pure gaming card than the GTX 660 Ti was.
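To put the lost SMX in numbers, the sketch below computes theoretical single precision throughput as CUDA cores × 2 FLOPs per clock (one fused multiply-add) × clock speed. The SMX counts and rated boost clocks are published reference specifications that we assume here, not figures measured in this review. Real clocks vary with GPU Boost, so treat the output as a rough ceiling rather than a result.

```python
# Sketch: theoretical FP32 peak from SMX count and clock.
# ASSUMPTION: the reference specs below are published values, not measurements from this review.
CORES_PER_SMX = 192           # CUDA cores per Kepler SMX
FLOPS_PER_CORE_PER_CLOCK = 2  # a fused multiply-add counts as two FLOPs

# name -> (enabled SMX units, rated boost clock in MHz)
CARDS = {
    "GTX 760": (6, 1033),
    "GTX 660 Ti": (7, 980),
    "GTX 670": (7, 980),
}


def peak_fp32_gflops(smx: int, boost_mhz: int) -> float:
    """Theoretical single precision peak in GFLOPS: cores x 2 FLOPs x clock."""
    cores = smx * CORES_PER_SMX
    return cores * FLOPS_PER_CORE_PER_CLOCK * boost_mhz / 1000.0


def main() -> None:
    ref_smx, ref_mhz = CARDS["GTX 660 Ti"]
    ref_gflops = peak_fp32_gflops(ref_smx, ref_mhz)
    for name, (smx, mhz) in CARDS.items():
        gflops = peak_fp32_gflops(smx, mhz)
        # Compare both raw clock and total shader throughput against the GTX 660 Ti.
        print(f"{name}: {gflops:.0f} GFLOPS "
              f"({gflops / ref_gflops:.0%} of GTX 660 Ti, clock ratio {mhz / ref_mhz:.2f})")


if __name__ == "__main__":
    main()
```

With these assumed figures the GTX 760's higher boost clock (about 5% above the GTX 660 Ti's) doesn't make up for the missing SMX: its theoretical peak works out to roughly 90% of the GTX 660 Ti's.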

As always we’ll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the only games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Civilization V once again confirms that NVIDIA’s DirectCompute performance is generally up to snuff. The fact that the GTX 760 is ahead of the GTX 660 Ti by any margin took us by surprise at first, but we’re likely looking at a scenario where the GTX 760’s wider memory bus and/or larger L2 cache offset some of its general compute deficit.
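The bandwidth side of that explanation is simple arithmetic: peak bandwidth is the bus width in bytes times the effective data rate. The sketch below assumes both cards run 6.008 GT/s effective GDDR5 and, as a further assumption about GK104's layout, that each 64-bit memory controller carries 128 KB of L2 cache. These are reference figures, not numbers measured in this review.

```python
# Sketch: peak memory bandwidth and L2 size from memory bus width.
# ASSUMPTIONS: 6.008 GT/s effective GDDR5 on both cards, and 128 KB of L2
# per 64-bit memory controller on GK104 (reference figures, not measurements).
DATA_RATE_GTS = 6.008
CONTROLLER_WIDTH_BITS = 64
L2_KB_PER_CONTROLLER = 128

BUS_WIDTH_BITS = {"GTX 760": 256, "GTX 660 Ti": 192}


def bandwidth_gb_s(bus_width_bits: int, data_rate_gts: float = DATA_RATE_GTS) -> float:
    """Peak bandwidth in GB/s: bytes per transfer x billions of transfers per second."""
    return bus_width_bits / 8 * data_rate_gts


def l2_cache_kb(bus_width_bits: int) -> int:
    """L2 size if every enabled 64-bit controller brings its own L2 slice."""
    return bus_width_bits // CONTROLLER_WIDTH_BITS * L2_KB_PER_CONTROLLER


if __name__ == "__main__":
    for name, width in BUS_WIDTH_BITS.items():
        print(f"{name}: {bandwidth_gb_s(width):.1f} GB/s, {l2_cache_kb(width)} KB L2")
```

Under these assumptions the GTX 760 has a third more bandwidth (about 192 GB/s versus 144 GB/s) and a third more L2, which is consistent with, though not proof of, the explanation above.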

Our next benchmark is LuxMark 2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL-accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years because it maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.
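Why ray tracing suits GPUs is easiest to see in code: each pixel's primary ray is traced without reading any other pixel's result, so the work splits naturally into one GPU work-item per pixel. The sketch below is a deliberately minimal illustration (a single sphere, primary rays only, written in Python for readability). It is not SmallLuxGPU's algorithm or OpenCL code, and the scene parameters are hypothetical.

```python
import math


def ray_hits_sphere(origin, direction, center, radius):
    """Return True if the ray origin + t*direction (t >= 0) intersects the sphere."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    # Solve |o + t*d|^2 = r^2 as a quadratic a*t^2 + b*t + c = 0.
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return False  # the ray's line misses the sphere entirely
    t_far = (-b + math.sqrt(disc)) / (2.0 * a)
    return t_far >= 0  # at least one intersection lies in front of the camera


def render_hit_mask(width, height, center=(0.0, 0.0, -3.0), radius=1.0):
    """Trace one primary ray per pixel (hypothetical scene: one sphere).
    No pixel reads another pixel's result, so each iteration is independent."""
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Map the pixel center to [-1, 1] on an image plane at z = -1.
            u = (x + 0.5) / width * 2 - 1
            v = 1 - (y + 0.5) / height * 2
            row.append(ray_hits_sphere((0.0, 0.0, 0.0), (u, v, -1.0), center, radius))
        mask.append(row)
    return mask


if __name__ == "__main__":
    for row in render_hit_mask(16, 8):
        print("".join("#" if hit else "." for hit in row))
```

On a GPU the two nested loops in `render_hit_mask` would go away: an OpenCL kernel would be launched over a width × height range, with each work-item evaluating the intersection test for its own pixel.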

LuxMark is entirely about compute performance, and as a result this is an exceptionally poor showing for the GTX 760, with the GTX 660 Ti having no trouble besting it.

Our 3rd benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

Breaking down our CLBenchmark results, the computer vision test has frequently favored raw clockspeed over total shader throughput, which gives the GTX 760 an interesting advantage here. It easily leaves the GTX 660 Ti in the dust and even edges out the GTX 670. Of course this is still less than two-thirds the performance of even the slowest AMD GCN card, reflecting AMD’s superior compute performance.

The fluid simulation is especially brutal in that regard. With the test shifting back to an almost complete reliance on shader throughput, the GTX 760 slightly trails the GTX 660 Ti, never mind the nearly three-fold gap between it and the 7950B.

Moving on, our 4th compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that distributes work to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, as Folding @ Home has moved exclusively to OpenCL this year with FAHCore 17.

Unlike in some of our other compute benchmarks, the GTX 760 doesn’t fare too poorly here in single precision. However, it’s still notably behind the 7950B, and in double precision it’s no contest.
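The double precision gap follows from how each architecture rates FP64 relative to FP32. The sketch below assumes GK104 runs FP64 at 1/24 of its FP32 rate and Tahiti (the 7950B) at 1/4, together with reference shader counts and boost clocks. It computes theoretical peaks only, which are not FAHBench scores.

```python
# Sketch: theoretical FP32 and FP64 peaks for two cards.
# ASSUMPTIONS: reference shader counts, boost clocks (MHz), and FP64:FP32 rate ratios.
CARDS = {
    "GTX 760": (1152, 1033, 1 / 24),      # GK104: FP64 at 1/24 of FP32
    "HD 7950 Boost": (1792, 925, 1 / 4),  # Tahiti: FP64 at 1/4 of FP32
}


def peaks_gflops(shaders: int, boost_mhz: int, fp64_ratio: float) -> tuple[float, float]:
    """Return (FP32, FP64) theoretical peaks in GFLOPS, counting an FMA as 2 FLOPs."""
    fp32 = shaders * 2 * boost_mhz / 1000.0
    return fp32, fp32 * fp64_ratio


if __name__ == "__main__":
    for name, spec in CARDS.items():
        sp, dp = peaks_gflops(*spec)
        print(f"{name}: FP32 {sp:.0f} GFLOPS, FP64 {dp:.0f} GFLOPS")
```

With these assumed figures the 7950B's FP64 peak is more than 8× the GTX 760's, while its FP32 advantage is only about 1.4×, which is consistent with single precision being a clear loss and double precision being no contest.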

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

This is another compute throughput bound benchmark, and the GTX 760 is essentially tied with the GTX 660 Ti. The benchmark is also somewhat memory bandwidth sensitive, which is why the GTX 760 doesn’t outright lose to the GTX 660 Ti here.
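A simple roofline model shows how a benchmark that is mostly throughput bound but somewhat bandwidth sensitive can end up as a tie. Attainable performance is the lower of the compute peak and arithmetic intensity (FLOPs per byte of memory traffic) × bandwidth. The sketch uses assumed reference peaks for the GTX 760 and GTX 660 Ti; the intensities are hypothetical and are not measured from SystemCompute.

```python
# Sketch: roofline model, attainable = min(compute peak, intensity x bandwidth).
# ASSUMPTIONS: reference FP32 peaks (GFLOPS) and bandwidths (GB/s); intensities are hypothetical.
CARDS = {
    "GTX 760": (2380.0, 192.3),
    "GTX 660 Ti": (2634.2, 144.2),
}


def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Performance is capped by whichever runs out first: ALUs or memory bandwidth."""
    return min(peak_gflops, flops_per_byte * bandwidth_gb_s)


if __name__ == "__main__":
    for intensity in (4.0, 20.0):  # hypothetical bandwidth-heavy vs compute-heavy phases
        results = {name: attainable_gflops(peak, bw, intensity)
                   for name, (peak, bw) in CARDS.items()}
        winner = max(results, key=results.get)
        summary = ", ".join(f"{name} {gflops:.0f}" for name, gflops in results.items())
        print(f"{intensity:.0f} FLOP/byte: {summary} -> {winner}")
```

In this model, bandwidth-heavy phases favor the GTX 760 and compute-heavy phases favor the GTX 660 Ti, so a benchmark that mixes the two can plausibly land near a tie.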


110 Comments


  • kishorshack - Tuesday, June 25, 2013

    Looks like GPU gains over a two-year cycle are bigger than CPU gains
    Spending on GPUs is more worthwhile than spending on CPUs
    Especially if you're starting from Sandy Bridge on the CPU side
  • DanNeely - Tuesday, June 25, 2013

    3D rendering is a trivially parallelizable workload. As a result it can roughly double in performance with each full node process shrink just by keeping the core design the same but putting twice as many cores on the die. Real world behavior differs mostly in that some of the additional die space is used to enable things that weren't practical before instead of just making all the existing features twice as fast.
  • wumpus - Tuesday, June 25, 2013

    That is only strictly true if you are willing to use twice as much electricity and generate/remove twice as much heat (it could approach costing twice as much as well, but not nearly as often). A good chunk of each update needs to go to making the GPU have a higher TFLOP/W or the thing will melt.
  • ewood - Tuesday, June 25, 2013

    Luckily many of those issues are mitigated by the transition to a smaller process node, as DanNeely said. Your statement is more applicable to dual-die cards, not new processors having twice the functional units.
  • maltanar - Tuesday, June 25, 2013

    That is unfortunately no longer true: smaller processes no longer benefit from so-called 'Dennard scaling' without a lot of trickery from semiconductor engineers.
  • DanNeely - Wednesday, June 26, 2013

    They may have to work harder at it; but as long as they're able to continue doing what you refer to as trickery, the result for us end users is the same.
  • tential - Wednesday, June 26, 2013

    CPU gains have been made, just not in performance. We don't need more CPU performance for a LOT of applications. Like I always say, if you had double the CPU performance, you still wouldn't gain much FPS in most games.

    Intel would be cannibalizing its higher-end processors if it kept making CPU performance gains. Instead, it focuses on power consumption, to fit better CPUs into smaller things such as notebooks, tablets, etc. Look at the MacBook Air review and then tell me we haven't made CPU gains.
  • UltraTech79 - Tuesday, July 2, 2013

    More worthwhile than what? What are you even talking about? Today's i5 chips aren't the bottleneck for any of the GPUs here in any game. So what you're saying is irrelevant.
  • ericore - Tuesday, July 2, 2013

    Ain't that the truth, the biggest change was from the 500 series to the 600 series.
    The 600 series makes most Radeons look like dinosaurs or AMD processors.
    Intel is dicking around giving us less than a 10% speed improvement in each generation.
    Can't wait for AMD to release their Steamroller 8-core; except where latency is crucial, it will match Haswell and cost a fraction as much. Haswell will still technically be faster, but only in benchmarks; in practice they will be identical. The change from Piledriver to Steamroller is like going from a Pentium 4 to a Core 2 Duo. It's not a new architecture, but it has so many improvements that it ought to be called one.
  • MarcVenice - Tuesday, June 25, 2013

    I checked all the games: the 7950 Boost wins the first 4-5 games and the GTX 760 wins the others. I didn't add up the numbers, but are you guys sure the HD 7950 Boost is 8% slower overall?

    And what's AnandTech's stance on frametimes/FCAT? Are those only used when problems arise, or for new games? I realize they take a lot of time, but I think they can be quite valuable in determining which card is the fastest.
