Compute Performance

As always, our final set of real-world benchmarks is a look at compute performance. As we have seen with the GTX 680 and GTX 670, Kepler appears to be significantly less balanced between rendering and compute performance than GF110 or GF114 were, and as a result compute performance suffers. Compounding this, GK106 has only 5 SMXes versus the 8 SMXes of GK104, which will likely depress compute performance further.
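
As a rough back-of-the-envelope illustration of that gap, the sketch below compares shader resources using the SMX counts alone. The figure of 192 CUDA cores per SMX is Kepler's published configuration rather than something stated on this page, and the helper name is ours. Clock speeds, memory bandwidth, and partially disabled parts are all ignored, so this is only an indicator of relative resources, not a performance prediction.

```python
# Rough comparison of shader resources from SMX counts alone.
# Assumption: 192 CUDA cores per SMX (Kepler's published layout); clocks are ignored.
CORES_PER_SMX = 192

def cuda_cores(smx_count: int) -> int:
    """Total CUDA cores for a Kepler GPU with the given number of SMXes."""
    return smx_count * CORES_PER_SMX

gk106_cores = cuda_cores(5)   # full GK106
gk104_cores = cuda_cores(8)   # full GK104
ratio = gk106_cores / gk104_cores

print(f"GK106: {gk106_cores} cores, GK104: {gk104_cores} cores")
print(f"GK106 has {ratio:.1%} of GK104's shader resources at equal clocks")
```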

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.

It’s interesting, then, that despite the obvious difference between the GTX 660 and GTX 660 Ti in theoretical compute performance, the GTX 660 actually beats the GTX 660 Ti here. Despite being a compute benchmark, Civilization V’s texture decompression benchmark is more sensitive to memory bandwidth and cache performance than it is to shader performance, giving us the results we see above. Given the GTX 660 Ti’s poor showing in this benchmark, this is a good thing for NVIDIA, since it means they don’t fall any farther behind. Still, the GTX 660 is effectively tied with the 7850 and well behind the 7870.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

SmallLuxGPU sees us shift towards an emphasis on pure compute performance, which of course is going to be the GTX 660’s weak point here. More than two years after the launch of the GTX 460, SLG performance has gone exactly nowhere, with the GTX 460 and GTX 660 turning in exactly the same scores. Thank goodness the 8800GT is terrible at this benchmark, otherwise the GTX 660 would be in particularly bad shape.

It goes without saying that with the GTX 660’s poor compute performance here, the 7800 series is well in the lead. The 7870 more than trebles the GTX 660’s performance, an indisputable victory if there ever was one.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL routine that AES encrypts and decrypts an 8K x 8K pixel square image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher.
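
To make the metric concrete, here is a minimal sketch of how an "average time over a number of iterations" figure can be computed. It is not the AESEncryptDecrypt code: the function names are hypothetical, the workload is a trivial single-byte XOR standing in for the encrypt pass, and it runs on the CPU rather than through OpenCL. The iteration count is likewise an arbitrary choice, not the benchmark's.

```python
import time

def average_time_ms(workload, iterations: int = 5) -> float:
    """Run workload() `iterations` times and return the mean wall-clock time in ms."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        total += time.perf_counter() - start
    return total / iterations * 1000.0

def stand_in_workload(size: int = 1 << 20) -> bytes:
    """Placeholder for the encrypt pass: XOR a zeroed buffer with one byte (NOT AES)."""
    data = bytes(size)
    return bytes(b ^ 0x5A for b in data)

if __name__ == "__main__":
    print(f"average: {average_time_ms(stand_in_workload):.1f} ms")
```

Averaging over several passes smooths out run-to-run variation, which is why a single lower-is-better figure in milliseconds is a reasonable way to report this kind of test.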

Our AES benchmark was one of the few compute benchmarks where the GTX 660 Ti had any kind of lead, but the significant loss of compute resources has erased that for the GTX 660. At 395ms it’s a hair slower than the 7850, never mind the 7870.

The fluid simulation is another benchmark that depends on a stronger mix of memory bandwidth and cache performance rather than purely on compute resources. As a result the GTX 660 still trails the GTX 660 Ti, but not by a great amount. Even so, the GTX 660 is no match for the 7800 series.

Finally, we’re adding one last benchmark to our compute run. NVIDIA and the Folding@Home group have sent over a benchmarkable version of the client with preliminary optimizations for Kepler. Folding@Home and similar initiatives are still one of the most popular consumer compute workloads, so it’s something NVIDIA wants their GPUs to do well at.

As we’ve seen previously with GK104, this is one of the few compute benchmarks that shows any kind of significant performance advantage for Little Kepler compared to Little Fermi. GTX 660 drops by 12% compared to GTX 660 Ti, but this is still good enough for a 60% performance advantage over GTX 460.
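
Those two percentages can be chained to see where the GTX 660 Ti sits relative to the GTX 460. The sketch below assumes "drops by 12%" means the GTX 660 scores 88% of the GTX 660 Ti's result, and it normalizes the GTX 460 to 1.0. Both are our reading of the text, not figures taken from the benchmark charts.

```python
# Relative Folding@Home performance, normalized to GTX 460 = 1.0.
# Assumptions: "60% advantage" -> 1.60x GTX 460; "drops by 12%" -> 0.88x GTX 660 Ti.
gtx460 = 1.0
gtx660 = gtx460 * 1.60
gtx660ti = gtx660 / (1 - 0.12)

print(f"GTX 660:    {gtx660:.2f}x GTX 460")
print(f"GTX 660 Ti: {gtx660ti:.2f}x GTX 460")
```

Under those assumptions the GTX 660 Ti works out to roughly 1.8 times the GTX 460 in this test.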

147 Comments

  • TemjinGold - Thursday, September 13, 2012 - link

    "For today’s launch we were able to get a reference clocked card, but in order to do so we had to agree not to show the card or name the partner who supplied the card."

    "Breaking open a GTX 660 (specifically, our EVGA 660 SC using the NV reference PCB),"

    So... didn't you just break your promise as soon as you made it AND show a pic of the card right underneath?
  • Sufo - Thursday, September 13, 2012 - link

    Haha, shhhh!
  • Homeles - Thursday, September 13, 2012 - link

    Reading comprehension is such an endangered resource...

    If it's the super clocked edition, it's obviously not a reference clocked card.
  • jonup - Thursday, September 13, 2012 - link

    Exactly my thoughts.
  • Ryan Smith - Thursday, September 13, 2012 - link

    Homeles is correct. That's one of the cards from the launch roundup we're publishing later today. The reference-clocked GTX 660 we tested is not in any way pictured (I'm not quite that daft).
  • knutjb - Saturday, September 15, 2012 - link

    No matter what you try to say it still reads poorly. It should be blatantly obvious about which card was which up front, which the article wasn't. I should have to dig when scanning through.

    Also, your picking it as the better choice over a card that has been out how long, over slight differences... If nvivda really wanted to me to say wow I'll buy it now, the card would have been no more than 199 at launch. 10 bucks under is the best they can do for being late to the party? And you bought the strategy. I have been equally disappointed with AMD when they have done the same thing.
  • MrSpadge - Sunday, September 16, 2012 - link

    When reading AnandTech articles it's almost always safe to assume "he actually means what he's saying". Helps a lot with understanding.
  • thomp237 - Sunday, September 23, 2012 - link

    So where is this roundup? We are now 10 days on from your comment and still no signs of a roundup.
  • CeriseCogburn - Friday, October 12, 2012 - link

    I have been wondering where all the eyefinity amd fragglers have gone to, and now I know what has occurred.

    Eyefinity is Dead.

    These Kepler GPU's from nVidia all can do 4 monitors out of the box. Sure you might find a cheap version with 3 ports, whatever - that's the minority.

    So all the amd fanboys have shut their fat traps about eyefinity, since nVidia surpassed them with A+ 4 easy monitors out of the box on all the Kelpers.

    Thank you nVidia dearly for shutting the idiot pieholes of the amd fanboys.

    It took me this long to comment on this matter because nVidia fanboys don't all go yelling in unison sheep fashion about stuff like the little angry losing amd fans do.

    I have also noticed all the reviewers who are so used to being amd fan rave boys themselves almost never bring up multimonitor and abhor pointing out nVidia does 4 while amd only does 3 except in very expensive special cases.

    Yeah that's notable too. As soon as amd got utterly and totally crushed, it was no longer a central topic and central theme for all the review sites like this place.

    That 2 week Island vacation every year amd puts hundreds of these reporters on must be absolutely wonderful.
    I do hope they are treated very well and have a great time.
  • EchoOne - Wednesday, November 21, 2012 - link

    LOL dude,the 660ti vs the 7950 in eyefinity would get destroyed.I know this because my friend has a comp build with a phenom 965be 4.2ghz and 660ti with 16gb of ram (i built this for him) and i have a fx 6100 4.7ghz,16gb ram and a 7950 i run a triple monitor setup

    https://www.youtube.com/watch?v=ZRXGveviruw&fe...

    And his 660ti DIED trying to play the games at that res and at the same settings as i do.He had to take down his graphics settings from say gta4 from max settings down to about medium and high (i run very high)

    So yeah sure it can run a couple monitors out of the box but same with eyefinity.And trust me their nvidia surround is not as polished as eyefinity..But they get props for trying.
