Compute

With GTX 980, NVIDIA surprised us with a stunning turnaround in GPU compute performance, allowing it to reach the top of many compute benchmarks where NVIDIA's previous cards could not. GTX 970 should benefit from the same architectural and driver improvements, though since compute performance closely tracks shader performance, this is also a case where the performance difference between the GTX 970 and GTX 980 stands to be among its widest.
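Since compute throughput is largely a function of shader count and clock speed, a back-of-the-envelope calculation shows why the gap should be wide. The sketch below is only an illustration: it assumes NVIDIA's reference specifications (2048 CUDA cores at a 1126 MHz base clock for GTX 980, 1664 CUDA cores at 1050 MHz for GTX 970), which are not listed on this page, and counts each fused multiply-add as two FLOPs. Boost clocks and factory overclocks will shift the absolute numbers.

```python
# Theoretical peak FP32 throughput: each CUDA core can retire one fused
# multiply-add (counted as 2 FLOPs) per clock.
def peak_fp32_gflops(cuda_cores: int, clock_mhz: float) -> float:
    return 2 * cuda_cores * clock_mhz / 1000.0

# Assumed reference specs (CUDA cores, base clock in MHz); not from the article.
GTX_980 = (2048, 1126)
GTX_970 = (1664, 1050)

gtx980 = peak_fp32_gflops(*GTX_980)
gtx970 = peak_fp32_gflops(*GTX_970)

print(f"GTX 980: {gtx980:.1f} GFLOPS")
print(f"GTX 970: {gtx970:.1f} GFLOPS")
print(f"GTX 970 / GTX 980: {gtx970 / gtx980:.1%}")
```

Under these assumptions GTX 970 offers roughly 76% of GTX 980's theoretical FP32 throughput, so a purely shader-bound test could leave GTX 970 roughly 24% behind.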

As always we'll start with LuxMark 2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL-accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years because it maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

Compute: LuxMark 2.0

Since GTX 980 takes the top spot here, GTX 970 has enough headroom to hold a small lead over R9 290XU. So even with its lower performance, GTX 970 can still outperform AMD's flagship in this case.

For our second set of compute benchmarks we have CompuBench 1.5, the successor to CLBenchmark. We're not due for a benchmark suite refresh until the end of the year; however, as CLBenchmark does not know what to make of GTX 980 and is rather old overall, we've upgraded to CompuBench 1.5 for this review.

Compute: CompuBench 1.5 - Face Detection

Compute: CompuBench 1.5 - Optical Flow

Compute: CompuBench 1.5 - Particle Simulation 64K

In the cases where the GTX 980 does well, so does the GTX 970. In the cases where the GTX 980 wasn't fast enough to top the charts, the GTX 970 is similarly close behind. Compared to AMD's lineup we see the whole gamut, from a tie between the GTX 970 and R9 290XU to victories for either card.

Our 3rd compute benchmark is Sony Vegas Pro 12, an OpenGL and OpenCL video editing and authoring package. Vegas can use GPUs in a few different ways, primarily to accelerate the video effects and compositing process itself, and to accelerate the video encoding step. With video encoding increasingly offloaded to dedicated DSPs these days, we're focusing on the editing and compositing process, rendering to a low CPU overhead format (XDCAM EX). This specific test comes from Sony and measures how long it takes to render a video.

Compute: Sony Vegas Pro 12 Video Render

As expected, GTX 970 sheds a bit of performance here. AMD still holds the lead overall, and against GTX 970 that lead is a little larger.

Moving on, our 4th compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that has work distributed to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, utilizing the OpenCL path for FAHCore 17.

Compute: Folding @ Home: Explicit, Single Precision

Compute: Folding @ Home: Implicit, Single Precision

Compute: Folding @ Home: Explicit, Double Precision

With the GTX 980 holding such a commanding lead here, the GTX 970's lower performance is still more than enough to easily beat any other card in single precision Folding @ Home workloads. Only in double precision, with NVIDIA's anemic 1:32 FP64 ratio, does GTX 970 falter.
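The 1:32 ratio is easy to put into numbers: peak FP64 throughput is simply the FP32 peak divided by 32. A minimal sketch, assuming the reference GTX 970 specification of 1664 CUDA cores at a 1050 MHz base clock (not taken from this page) and counting each fused multiply-add as two FLOPs:

```python
# Peak FP32 throughput: one fused multiply-add (2 FLOPs) per CUDA core per clock.
def peak_fp32_gflops(cuda_cores: int, clock_mhz: float) -> float:
    return 2 * cuda_cores * clock_mhz / 1000.0

# FP64 throughput as a fixed fraction of FP32; the 1:32 default is the
# ratio cited in the article.
def peak_fp64_gflops(fp32_gflops: float, fp64_divisor: int = 32) -> float:
    return fp32_gflops / fp64_divisor

# Assumed GTX 970 reference spec: 1664 CUDA cores at a 1050 MHz base clock.
fp32 = peak_fp32_gflops(1664, 1050)
fp64 = peak_fp64_gflops(fp32)
print(f"FP32: {fp32:.1f} GFLOPS, FP64: {fp64:.1f} GFLOPS")
```

Under those assumptions, roughly 3.5 TFLOPS of single precision shrinks to about 109 GFLOPS of double precision, which is why the double precision Folding @ Home results look so different from the single precision ones.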

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

Compute: SystemCompute v0.5.7.2 C++ AMP Benchmark

Recently this has been a stronger benchmark for AMD cards than NVIDIA cards, and consequently GTX 970 doesn't enjoy quite the lead it sees elsewhere. Though it is not too far behind R9 280X and even R9 290, like GTX 980 it can't crunch numbers quite fast enough to keep up with R9 290XU.


155 Comments


  • hammer256 - Saturday, September 27, 2014

    It would not surprise me if GM204 is crippled in FP64 in a similar way to GK104, with a physically limited number of FP64 cores.
    Regarding GK110, how the dies are selected between FP64-crippled and professional cards is not known. You can imagine a case where dies with defects in the FP64 cores can still be used in gamer cards, giving a bit more yield. But that's pure speculation, of course.
    Either way, Nvidia does this because it makes them more money, and they can get away with it. If you remember your micro-economics class, when an industry is in a state of monopoly or oligopoly, segmentation is the way to go for profit maximization. Unless AMD is willing to not segment their products, there is no pressure for Nvidia to change what they are doing.
    So we can argue that consumers are the losers in this state of things, and in monopoly and oligopoly that is generally the case. But in this specific case with FP64, I have to ask: are there many (or any) consumer-relevant applications that could really benefit from FP64? I'm curious to know. I would say that for these companies to care, the application needs to have general relevance on the same order of magnitude as graphics.
    Those of us who use GPUs in scientific computation such as simulations are the real losers in this trend. But then again, we were fortunate to have had this kind of cheap, off-the-shelf hardware that was so powerful for what we do. Looks like that ride is coming to an end, at least for the foreseeable future. Personally, my simulation doesn't really benefit from double precision, so I'm pretty lucky. Even then I found that stepping from the GTX580 to a GTX680 core didn't improve performance at all. The silver lining there was that GTX690 had much better performance than the GTX590 for me, and I was able to get 4 GTX690s for some excellent performance. A GTX990 would be tempting, or maybe just wait for the 20nm iteration...
  • anubis44 - Wednesday, October 22, 2014

    Of course GM204 is crippled in FP64. That's where nVidia is finding the improved power budget and reduction in wattage requirement. Frankly, I think it's pretty cheesy, and I've stopped listening to people creaming their jeans about how fabulous nVidia's low power is compared with AMD's. Of course it's going to lower its power requirements if you cripple the hell out of it. Duh. The question is whether you will line up to get shafted with all the other drones, or if you'll protest this stupidity by buying AMD instead, and give nVidia the finger for this, as they rightly deserve. If we don't, AMD will have to take its FP64 circuitry out of their cards to compete.
  • D. Lister - Sunday, September 28, 2014

    What I said earlier had nothing to do with efficiency. If you were a prosumer and were in the market for double precision hardware... why would you want a $3000 pro GPU when you can get nearly the same performance from a <$1000 consumer variant? Not everyone cares for ECC VRAM. HPC guys et al would be all over it, resulting in an unfairly inflated retail value for the rest of us. When that happens, Nvidia is the one that gets the bad rep, just like AMD did during the bit mining fad. Why do you believe it is so important anyway?
  • Subyman - Friday, September 26, 2014

    Looking at the PCB, the FTW version does not have more VRMs than the SC or normal EVGA model. I only see four chokes, which is what the other cards have. MSI has 6 VRMs. I'm wondering if EVGA is also using the same low-end analog VRMs that the SC and regular EVGA cards use. All other 970s use higher-end VRMs.
  • wetwareinterface - Saturday, September 27, 2014

    The FTW is not the top-end designation. It isn't even better than the SC cards in most cases; it's lower clocked than the SC and just has extra RAM.

    For EVGA, the custom-clocked cards are, in order:

    SC
    FTW
    SSC
    SC Signature
    Classified

    Again, the FTW can have lower clocks than the SC or the same clocks, but usually has more RAM.
  • Subyman - Saturday, September 27, 2014

    I never said it was. The article mentioned it had 1 more power phase than the others, but from the pictures it obviously doesn't.
  • Subyman - Friday, September 26, 2014

    Also, we really need a roundup of all the brands on here. Seeing the FTW version vs reference doesn't paint a usable picture for those looking to make a purchase.
  • Mr Perfect - Friday, September 26, 2014

    Is anyone going to pair this with the 980's blower? That would be quite impressive.

    Oh, and get the 970's IO up to par. Again, the 980's configuration would be better. Dual DVI indeed...
  • Margalus - Friday, September 26, 2014

    PNY has a 970 with the full complement of outputs: 3 DP, 1 HDMI 2.0 and 1 DVI. It really pisses me off that most of the top-tier makers like EVGA and ASUS decided to switch that to 1 DP, 1 HDMI and 2 DVI...
  • pixelstuff - Friday, September 26, 2014

    Same here. Annoyed.
