Compute Performance

Shifting gears, as always, our final set of real-world benchmarks is a look at compute performance. As we have seen with GTX 680 and GTX 670, GK104 appears to be significantly less balanced between rendering and compute performance than GF110 or GF114 were, and as a result compute performance suffers. Cache and register file pressure in particular seem to give GK104 grief, which means that GK104 can still do well in certain scenarios, but falls well short in others. For GTX 660 Ti in particular, this is going to be a battle between the importance of shader performance – something it has just as much of as the GTX 670 – and cache/memory pressure from losing that ROP cluster and cache.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.

In Civilization V’s texture decompression test, memory bandwidth and cache are clearly more important than raw compute performance. Although this isn’t a worst case scenario outcome for the GTX 660 Ti, it drops substantially from the GTX 670. As a result its compute performance is barely better than the GTX 560 Ti, which wasn’t a strong performer at compute in the first place.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

Ray tracing likes memory bandwidth and cache, which means another tough run for the GTX 660 Ti. In fact it’s now slower than the GTX 560 Ti. Compared to the 7950 this isn’t even a contest. GK104 is generally bad at compute, and GTX 660 Ti is turning out to be especially bad.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL routine that AES encrypts/decrypts an 8K x 8K pixel square image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher.
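As a rough illustration of how an "average time over a number of iterations" figure is produced, here is a minimal Python sketch. It is not the AESEncryptDecrypt benchmark: `average_seconds` and `toy_cipher` are hypothetical names, the workload is a byte-wise XOR standing in for AES, and the buffer is far smaller than an 8K x 8K image.

```python
import time


def average_seconds(fn, iterations=10):
    """Return the mean wall-clock time of fn() over `iterations` runs."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / iterations


# Placeholder workload (assumption): a byte-wise XOR, NOT real AES.
XOR_TABLE = bytes(b ^ 0x5A for b in range(256))


def toy_cipher(data):
    """Apply the XOR table to every byte; applying it twice restores the input."""
    return data.translate(XOR_TABLE)


# Small stand-in buffer (assumption); the real test uses an 8K x 8K image.
image = bytes(256 * 256 * 4)
mean = average_seconds(lambda: toy_cipher(image), iterations=5)
print(f"average: {mean * 1e3:.3f} ms per pass")
```

Averaging over several passes smooths out one-off costs such as the first-run setup, which is presumably why the benchmark reports a mean rather than a single run.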

The GTX 660 Ti does finally turn things around on our AES benchmark, thanks to the fact that this benchmark generally favors NVIDIA. At the same time the gap between the GTX 670 and GTX 660 Ti is virtually non-existent.

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
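The idea behind the shared-memory optimization can be sketched on the CPU. The Python below is only an analogy, not the DirectX SDK sample: it counts neighbors within a radius instead of running a full fluid solver, and `neighbor_counts_tiled`, `tile`, and the sample points are hypothetical. Every particle is still compared against every other particle, but the comparisons are grouped by tile, so each tile is loaded once and reused by all particles, much as a GPU thread group would stage a tile in shared memory.

```python
def neighbor_counts_tiled(points, radius, tile=64):
    """Count, for each 2D point, how many other points lie within `radius`.

    Brute-force O(n^2) pairwise test, processed one tile at a time to
    mimic staging a block of particles in fast shared memory.
    """
    n = len(points)
    r2 = radius * radius
    counts = [0] * n
    for start in range(0, n, tile):
        # "Shared memory" stand-in: one tile is copied once, then reused
        # by every particle in the inner loops.
        tile_pts = points[start:start + tile]
        for i, (xi, yi) in enumerate(points):
            found = 0
            for j, (xj, yj) in enumerate(tile_pts, start):
                if i == j:
                    continue  # a particle is not its own neighbor
                dx = xi - xj
                dy = yi - yj
                if dx * dx + dy * dy < r2:
                    found += 1
            counts[i] += found
    return counts


# Hypothetical sample data: three clustered particles and one far away.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(neighbor_counts_tiled(sample, radius=1.5, tile=3))
```

The total work is unchanged at n^2 comparisons; what tiling changes is how often the particle data has to be fetched from slow memory, which is why the method leans on fast on-chip storage.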

The compute shader fluid simulation provides the GTX 660 Ti another bit of reprieve, although like other GK104 cards it’s still relatively weak. Here it’s virtually tied with the GTX 670, so it’s clear that it isn’t being impacted by cache or memory bandwidth losses, but it needs about 10% more performance to catch the 7950.

Finally, we’re adding one last benchmark to our compute run. NVIDIA and the Folding@Home group have sent over a benchmarkable version of the client with preliminary optimizations for GK104. Folding@Home and similar initiatives are still one of the most popular consumer compute workloads, so it’s something NVIDIA wants their GPUs to do well at.

Interestingly, Folding@Home proves to be rather insensitive to the differences between the GTX 670 and GTX 660 Ti, which is not what we would have expected. The GTX 660 Ti isn’t doing all that much better than the GTX 570, once more reflecting that GK104 is generally struggling with compute performance, but it’s not a bad result.


313 Comments


  • CeriseCogburn - Thursday, August 23, 2012 - link

    http://www.bit-tech.net/hardware/2012/08/16/nvidia...

    ROFLMAO - the ONLY REASON you say you wanted the 7950 and it LOSES.

    There's the level of "your cred", you freaking loser.
  • Galidou - Saturday, August 18, 2012 - link

    This is only the point of the iceberg when we speak about credibility. Anandtech was nice enough to have a stock clocked part, we can't say that for most of the reviews on the internet.

    I even got on a website ''not gonna say it, could be too much shame for them'' that was comparing a non reference 660ti overclocked with... suspense... a 7850. And then some times in the review offered an ''alternative analysis'' against a 6870, who's dirty now?

    I won't name any but of all the review sites I usually read, they were all testing overclocked cards (plus the included Nvidia boost) against stock clocked AMD cards, ALL of them... Only one included minimum frame rates to all of the games tested which was interesting to see the limiting bandwidth acting at certain points. One can only wonder if the games released won't have any problem with that.

    I first came here on anand and almost pulled the trigger buying one RIGHT after finishing reading. Then I visited my other sites and it got all messed up. Anand didn't have minimum frames everywhere, others had different results, the games I play switch from one brand to another for the ''best bang for my bucks''.

    With all that mixed up mess, one can only wonder where the ''real'' truth is. I'll probably just end up buying a 7950 overclocking it 40-50% higher and not wonder about future games. At least I waited long enough to see the 660ti. Anyway the other reviewers had quite good result with the 7950 and it was STOCK omg 40-50% overclock can't give a bad performance...
  • CeriseCogburn - Sunday, August 19, 2012 - link

    *OC 660ti's on newegg and only 3 Stock.
    The author pointed out there is no default version, and Partners have a somewhat free reign on released clocks.
    Now be a good person and go look for yourself, you'll have a hard time finding a stock card vs an OC oob card.
    I'd also like to see that 40-50% 7950 OC....(methinks you really spewed overboard there)
    Reviews are noting a 17%-22% max performance gain on maximum 7950 OC, and that does not mean it's stable, except on a sole rider, non internet server, spanking clean, just defragged, built for benching, top of the line components, reviewer super massive rig.
    So, can we get that 50% OC bench set from you ?
    NO, of course we can't.
  • Galidou - Sunday, August 19, 2012 - link

    My friend bought the Twin frozr 3 while it was on special on newegg(300$ a week ago). overclocked 1150/1700 stable that's a 44% overclock and he could go higher, with the stock cooler. We reported gains of around 30 to 36% performance gain in games.

    On newegg, there's plenty of people reporting 1150 to 1200 core overclock, because it is in fact a 7970 board at a very cheap price. If you really can't accept one good thing about AMD that's where I differ from you.

    The thing is, Nvidia won this round for the average user, most of us don't overclock and are not fiddling with voltages and such. Including a nice boost is good for those average users, the fact is and whatever you might say, overclockers know it. AMD is very overclocker friendly this gen, end of the line, cry about it some more, it doesn't change the fact that they already know it, sorry. If you tried to misinform the people, you're too late, it's already circulating on the internet my friend.

    Now you shall say and I've heard it: ''People have been able to get their gtx680 overclocked to 1300 core in some cases so they are..........''. I know the drill, 680 has for the most part, a boost clock of around 1100 - 1150 boost clock. Lemme translate that, 200mhz overclock on a 1100 boost clock, 18% overclock on the cherry picked 680, because I'm comparing it with a 7950 which didn't pass the 7970 requirement.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    Oh look at that, I didn't use a single fact again.
    you're pathetic.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    the 660Ti's are hitting 1300+ cores.

    you're losing at stock out of the box in your highest triple monitor rez dummy

    http://www.bit-tech.net/hardware/2012/08/16/nvidia...

    Keep attacking like the fool you are.

    Now you may apologize profusely and thankme for saving you from your brainwashed amd embolism you claim to have acquired at overclock net
  • thebluephoenix - Monday, August 20, 2012 - link

    Cerise, as a punishment i would make you read few nvidia related articles at site called Semi Accurate to see why is so wrong to be biased idiotic crazy fanboy.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    Charlie is a liar, I am not. Deal with it.
  • Galidou - Thursday, August 23, 2012 - link

    Everyone is a liar but you Cerise, all hail to you ohh great hardware god, I'm still waiting for news of you on Overclock.net you almighty owner of all the knowledge.

    Come and teach the nitrogen overclockers of the world about your so great knowledge about video card.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    Yes, time for you to bow down, then thank me for having to correct you three times already, on the FACTS.
