Compute

Jumping into compute, we aren’t expecting too much here. Outside of DirectCompute, GK104 is generally a poor compute GPU, and other than the clockspeed boost, GTX 770 doesn’t have much going for it.

As always, we'll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the few games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Civilization V at least shows that NVIDIA’s DirectCompute performance is up to snuff in this case. That said, as with GTX 780, we’re reaching the limits of what this benchmark can do, simply because modern cards have become so fast.

Our next benchmark is LuxMark 2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as it maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

Moving on to a more general compute task, we get a reminder of how poor GK104 is here. GTX 770 can beat the slower GK104 products, and that’s it. Even GTX 570 is faster, never mind the massive lead that 7970GE holds.

Our third benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

CLBenchmark paints GTX 770 in a better light than LuxMark, but not by a great deal. The gains over the GTX 680 are minuscule since these benchmarks aren’t memory bandwidth limited, and the gap between it and the 7970GE is nothing short of enormous.

Moving on, our fourth compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that distributes work to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation; explicit mode includes them, which adds quite a bit of work and overhead. This is another OpenCL test, as Folding @ Home has moved exclusively to OpenCL this year with FAHCore 17.

Recent core improvements in Folding @ Home continue to pay off for NVIDIA. In single precision the GTX 770 is just fast enough to hang with the 7970 vanilla, though the 7970GE is still over 10% faster. Double precision on the other hand is entirely in AMD’s favor thanks to GK104’s very poor FP64 performance.
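As a rough illustration of why the double precision results are so lopsided, theoretical peak throughput can be estimated from shader count, clockspeed, and the FP64 rate. This is only a back-of-the-envelope sketch: the figures used (1536 CUDA cores at a 1046MHz base clock with a 1/24 FP64 rate for GTX 770; 2048 stream processors at 1000MHz with a 1/4 FP64 rate for 7970GE) are the cards' published specifications rather than numbers from this page, the helper name is made up for the example, and theoretical peaks say nothing about how FAHBench actually behaves.

```python
def peak_gflops(shaders, clock_mhz, fp64_ratio=1.0):
    """Theoretical peak GFLOPS, assuming one fused multiply-add
    (2 floating point operations) per shader per clock.

    fp64_ratio is the FP64 rate relative to FP32 (1.0 means FP32 itself).
    """
    return shaders * 2 * clock_mhz / 1000.0 * fp64_ratio


# Published specifications (assumptions for this sketch, base clocks only)
gtx770_fp32 = peak_gflops(1536, 1046)
gtx770_fp64 = peak_gflops(1536, 1046, fp64_ratio=1 / 24)
hd7970ge_fp32 = peak_gflops(2048, 1000)
hd7970ge_fp64 = peak_gflops(2048, 1000, fp64_ratio=1 / 4)

print(f"GTX 770: {gtx770_fp32:.0f} GFLOPS FP32, {gtx770_fp64:.0f} GFLOPS FP64")
print(f"7970GE:  {hd7970ge_fp32:.0f} GFLOPS FP32, {hd7970ge_fp64:.0f} GFLOPS FP64")
print(f"7970GE FP64 advantage on paper: {hd7970ge_fp64 / gtx770_fp64:.1f}x")
```

On paper the single precision gap is modest while the double precision gap is several times larger, which is the same shape as the FAHBench results described above.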

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

Unlike our other compute benchmarks, SystemCompute is at least a little bit memory bandwidth sensitive, so GTX 770 pulls ahead of GTX 680 by 11%. Otherwise, like every other compute benchmark, AMD’s cards fare far better here.
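To put that 11% in context, here is a small sketch of the memory bandwidth arithmetic behind it. It assumes the published memory specifications for the two cards (a 256-bit bus on both, with 6Gbps GDDR5 on GTX 680 and 7Gbps on GTX 770), which are not stated on this page, and the function name is purely illustrative.

```python
def bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times
    the per-pin data rate in Gbps."""
    return bus_width_bits / 8 * data_rate_gbps


# Published specifications (assumptions for this sketch)
gtx680_bw = bandwidth_gbs(256, 6.0)
gtx770_bw = bandwidth_gbs(256, 7.0)

# Relative bandwidth increase of GTX 770 over GTX 680, in percent
gain_pct = (gtx770_bw / gtx680_bw - 1) * 100

print(f"GTX 680: {gtx680_bw:.0f} GB/s")
print(f"GTX 770: {gtx770_bw:.0f} GB/s")
print(f"Bandwidth increase: {gain_pct:.1f}%")
```

An 11% score gain against a bandwidth increase of roughly one sixth would be consistent with a benchmark that is only partly bandwidth bound, though the clockspeed boost contributes as well, so the two effects can't be cleanly separated from this arithmetic alone.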


117 Comments


  • chizow - Thursday, May 30, 2013 - link

    They are both overpriced relative to their historical cost/pricing, as a result you see Nvidia has posted record margins last quarter, and will probably do similarly well again.
  • Razorbak86 - Thursday, May 30, 2013 - link

    Cool! I'm both a customer and a shareholder, but my shares are worth a hell of a lot more than my SLi cards. :)
  • antef - Thursday, May 30, 2013 - link

    I'm not happy that NVIDIA threw power efficiency to the wind this generation. What is with these GPU manufacturers that they can't seem to CONSISTENTLY focus on power efficiency? It's always...."Oh don't worry, next gen will be better we promise," then it finally does get better, then next gen sucks, then again it's "don't worry, next gen we'll get power consumption down, we mean it this time." How about CONTINUING to focus on it? Imagine any other product segment where a 35%! power increase would be considered acceptable, there is none. That makes a 10 or whatever FPS jump not impressive in the slightest. I have a 660 Ti which I feel has an amazing speed to power efficiency ratio, looks like this generation definitely needs to be sat out.
  • jwcalla - Thursday, May 30, 2013 - link

    It's going to be hard to get a performance increase without sacrificing some power while using the same architecture. You pretty much need a new architecture to get both.
  • jasonelmore - Thursday, May 30, 2013 - link

    or a die shrink
  • Blibbax - Thursday, May 30, 2013 - link

    As these cards have configurable TDP, you get to choose your own priorities.
  • coldpower27 - Thursday, May 30, 2013 - link

    There isn't much you can really do when you're working with the same process node and same architecture, the best you can hope for is a slight bump in efficiency at the same performance level but if you increase performance past the sweet spot, you sacrifice efficiency.

    In past generation you had half node shrinks. GTX 280 -> GTX 285 65nm to 55nm and hence reduced power consumption.

    Now we don't, we have jumped straight from 55nm -> 40nm -> 28nm, with the next 20nm node still a ways out. There just isn't very much you can do right now for performance.
  • JDG1980 - Thursday, May 30, 2013 - link

    Yes, this is really TSMC's fault. They've been sitting on their ass for too long.
  • tynopik - Thursday, May 30, 2013 - link

    maybe a shade of NVIDIA green for the 770 in the charts instead of AMD red?
  • joel4565 - Thursday, May 30, 2013 - link

    Looks like an interesting part. If for no other reason than to put pressure on AMD's 7950 Ghz card. I imagine that card will be dropping to 400ish very soon.

    I am not sure what card to pick up this summer. I want to buy my first 2560x1440 monitor (leaning towards Dell 2713hm) this summer, but that means I need a new video card too as my AMD 6950 is not going to have the muscle for 1440p. It looks like both the Nvidia 770 and AMD 7950 Ghz are borderline for 1440p depending on the game, but there is a big price jump to go to the Nvidia 780.

    I am also not a huge fan of crossfire/sli although I do have a compatible motherboard. Also to preempt the 2560/1440 vs 2560/1600 debate, yes i would of course prefer more pixels, but most of the 2560x1600 monitors I have seen are wide gamut which I don't need and cost 300-400 more. 160 vertical pixels are not worth 300-400 bucks and dealing with the Wide gamut issues for programs that aren't compatible.
