Compute Performance

As always, our final set of real-world benchmarks looks at compute performance. As we have seen with the GTX 680 and GTX 670, Kepler appears to be significantly less balanced between rendering and compute performance than GF110 or GF114 were, and as a result compute performance suffers. Further compounding this is the fact that GK106 only has 5 SMXes versus the 8 SMXes of GK104, which will likely further depress compute performance.
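To put the SMX gap in rough numbers, here is a back-of-the-envelope sketch. It assumes 192 CUDA cores per Kepler SMX and counts one fused multiply-add as two FLOPs per core per clock. The function name and the 1.0 GHz clock are illustrative placeholders rather than the cards' actual specifications, and shipping cards may differ in clocks and in how many SMXes are enabled.

```python
# Rough peak-throughput estimate for a Kepler GPU (illustrative only).
CORES_PER_SMX = 192            # assumed Kepler SMX width, not stated in the article
FLOPS_PER_CORE_PER_CLOCK = 2   # one fused multiply-add counted as two FLOPs


def peak_gflops(smx_count, clock_ghz):
    """Theoretical single-precision GFLOPS for a given SMX count and clock."""
    cores = smx_count * CORES_PER_SMX
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_ghz


# Hypothetical equal clock of 1.0 GHz, so only the SMX count differs.
gk106 = peak_gflops(5, 1.0)
gk104 = peak_gflops(8, 1.0)
ratio = gk106 / gk104

print(f"GK106: {gk106:.0f} GFLOPS, GK104: {gk104:.0f} GFLOPS")
print(f"GK106 relative to a full GK104 at equal clocks: {ratio:.1%}")
```

At equal clocks the estimate scales linearly with SMX count, so the result is a ceiling on shader throughput and says nothing about memory bandwidth or cache behavior.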

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.

It’s interesting then that despite the obvious difference between the GTX 660 and GTX 660 Ti in theoretical compute performance, the GTX 660 actually beats the GTX 660 Ti here. Despite being a compute benchmark, Civilization V’s texture decompression benchmark is more sensitive to memory bandwidth and cache performance than it is to shader performance, giving us the results we see above. Given the GTX 660 Ti’s poor showing in this benchmark, this is a good thing for NVIDIA, since it means they don’t fall any farther behind. Still, the GTX 660 is effectively tied with the 7850 and well behind the 7870.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

SmallLuxGPU sees us shift towards an emphasis on pure compute performance, which of course is going to be the GTX 660’s weak point here. More than 2 years after the launch of the GTX 460, SLG performance has gone exactly nowhere, with the GTX 460 and GTX 660 turning in the exact same scores. Thank goodness the 8800GT is terrible at this benchmark, otherwise the GTX 660 would be in particularly bad shape.

It goes without saying that with the GTX 660’s poor compute performance here, the 7800 series is well in the lead. The 7870 more than trebles the GTX 660’s performance, an indisputable victory if there ever was one.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL AES routine that encrypts/decrypts an 8K x 8K pixel square image file. The results of this benchmark are the average time to encrypt the image over a number of iterations of the AES cipher.

Our AES benchmark was one of the few compute benchmarks where the GTX 660 Ti had any kind of lead, but the significant loss of compute resources has erased that for the GTX 660. At 395ms it’s a hair slower than the 7850, never mind the 7870.
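The "average time over a number of iterations" measurement used for the AES benchmark can be sketched as a small timing harness. This is a minimal illustration, not the benchmark's actual code: `average_ms` and `stand_in_workload` are hypothetical names, and the workload is a plain byte XOR on the CPU standing in for the OpenCL AES kernel.

```python
import time


def average_ms(workload, iterations=5):
    """Mean wall-clock time of workload() in milliseconds over several runs."""
    total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        total += time.perf_counter() - start
    return total / iterations * 1000.0


def stand_in_workload():
    # Placeholder only: XOR a 64 KiB buffer with a fixed byte.
    # This is NOT AES and does not run on the GPU.
    data = bytes(64 * 1024)
    return bytes(b ^ 0x5A for b in data)


result_ms = average_ms(stand_in_workload, iterations=3)
print(f"average time per iteration: {result_ms:.3f} ms")
```

A real GPU measurement would also need to wait for the device to finish its queued work before stopping the timer, otherwise only the launch overhead is captured.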

Our next benchmark is a fluid simulation.

The fluid simulation is another benchmark that depends on a stronger mix of memory bandwidth and cache performance rather than being purely dependent on compute resources. As a result the GTX 660 still trails the GTX 660 Ti, but not by a great amount. Even so, the GTX 660 is no match for the 7800 series.

Finally, we’re adding one last benchmark to our compute run. NVIDIA and the Folding@Home group have sent over a benchmarkable version of the client with preliminary optimizations for Kepler. Folding@Home and similar initiatives are still one of the most popular consumer compute workloads, so it’s something NVIDIA wants their GPUs to do well at.

As we’ve seen previously with GK104, this is one of the few compute benchmarks that shows any kind of significant performance advantage for Little Kepler compared to Little Fermi. GTX 660 drops by 12% compared to GTX 660 Ti, but this is still good enough for a 60% performance advantage over GTX 460.
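The two percentages above can be chained to see what they imply for the GTX 660 Ti relative to the GTX 460. This is a quick arithmetic sketch, assuming "drops by 12%" means the GTX 660 scores 0.88 times the GTX 660 Ti and that all three figures come from the same Folding@Home run. The variable names are illustrative.

```python
# Normalize the GTX 460 score to 1.0 and chain the stated percentages.
gtx460 = 1.0
gtx660 = gtx460 * 1.60          # 60% advantage over the GTX 460
gtx660ti = gtx660 / (1 - 0.12)  # GTX 660 assumed to be 12% below the GTX 660 Ti

print(f"GTX 660 relative to GTX 460:    {gtx660:.2f}x")
print(f"GTX 660 Ti relative to GTX 460: {gtx660ti:.2f}x")
```

Under those assumptions the GTX 660 Ti works out to roughly 1.8 times the GTX 460 in this benchmark.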

147 Comments

  • Amgal - Friday, September 14, 2012 - link

    A little off topic, but does anandtech have an article explaining TU's, SMXes, ROPs, shader clock, etc- basically explaining the new age graphics card architectures? I really enjoy their informative articles, and am having some trouble finding one on that area that isn't littered with incomprehensible computer science macroes. Thanks.
  • pattycake0147 - Friday, September 14, 2012 - link

    If the majority of cards available for sale have custom coolers, why are noise measurements taken for only the reference card? Especially when you've stated that you have custom cards in the lab.
  • Jad77 - Friday, September 14, 2012 - link

    but shouldn't AMD be releasing their next generation sometime soon?
  • Patflute - Friday, September 14, 2012 - link

    Months from now.
  • rarson - Friday, September 14, 2012 - link

    Can we please stop pretending that Nvidia's supply issues are anybody's fault but their own? Is it just a coincidence that Fermi and Kepler both were huge, horrible misfires or is it possible that Nvidia has struggled to design things that actually yield decently? Can we stop ignoring the fact that AMD has had an entire lineup of 28nm parts since March (you know, like 2 months before Kepler ever appeared in reasonable quantities)? Yeah, 28nm IS constrained, but other companies are still putting out parts. Nvidia can't put out parts because they have to throw them away. They're eating the wafers (they must be eating a lot of them if it took them this long to bring out a $300 part).

    I hope Nvidia can pull it together because at this rate, AMD's going to start launching a generation ahead of them (they already have all of the console business).
  • CeriseCogburn - Thursday, November 29, 2012 - link

    nVidia dropped it production purchased spots, so you amd fanboys could blow giant dollars on nearly unavailable amd crap overpriced crashing non pci-e3 gen compliant video card trash
    you did so
    Well not you, but you know what I mean
    Then nVidia released and 2 days before amd "magically" had supply in the channels.
    If you're too stupid to know that - well - sorry since it's obvious
    Then amd crashed it's prices 4 times, and amd fanboys were left raped
    Then amd fired 10% more and now 15% more
    I hope the amd golden parachutes for the criminal executives pleased you
    What's your guess on the amd buyout rumors ?
    My guess is that 3G of ram you fools tried to lie about having an advantage with the totaled and incapable gpu choking on dirt below it at frame rates no Skyrim player could possibly stand, won't be recieving "driver updates" for that "glorious future" when "new games" that "can make use of it" "become available" !
    right fan boy ?
    RIGHT
    LOL
    Have a nice cry, err I meant day.
  • Lepton87 - Friday, September 14, 2012 - link

    This card is obviously slower than 7870.

    http://tpucdn.com/reviews/MSI/GTX_660_Twin_Frozr_I...

    Just look at performance summaries from other sites. But the most glaring flaw of this review is NOT comparing it to OC'ed AMD cards. After OC even 7850 is going to obliterate this overpriced card with almost no clock headroom.
  • Lepton87 - Friday, September 14, 2012 - link

    Unfortunately Anandtech is playing favourites. It's the only site that I know that has somewhat decent reputation that just couldn't admit that 7970GE is simply a faster card than GTX680 and now this....
  • CeriseCogburn - Thursday, November 29, 2012 - link

    Oh come on quarky, Crysis Warhead and Metro first on every review doesn't do it for you ?
    The alphabet here goes A for amd first, then C, the jumps to M, for amd , again and again.
    Why so sour, because amd is almost toast ?
  • CeriseCogburn - Thursday, November 29, 2012 - link

    100%, vs 103%, at a single resolution, the 1920x1200, when 1920x1080 shows another story, and the 7850 is down low at 85%.

    LOL - yeah amd fanboy, you sure are telling this amd fanboy site..

    Can we count how CRAPPY amd drivers are ? Can we count no adaptive v-sync on amd crap cards, can we count no 4 monitors out of the box on amd cards, can we count no auto overclocking, can we count amd slashing it's staff and driver writers aka catalusy maker issues ?
    Can we count any of that, or should we just count 3% ? LOL
    Oh wait fair and above it all amd fanboy, I know the answer...
    We will just count 3 more frames per 100 frame rate, at a single resolution, at your single link, and ignore everything else.
    LOL
    Thank you for your support.
