Compute Performance

Shifting gears, as always our final set of real-world benchmarks is a look at compute performance. As we have seen with the GTX 680 and GTX 670, GK104 appears to be significantly less balanced between rendering and compute performance than GF110 or GF114 were, and as a result compute performance suffers. Cache and register file pressure in particular seem to give GK104 grief, which means that GK104 can still do well in certain scenarios but falls well short in others. For the GTX 660 Ti in particular, this is going to be a battle between the importance of shader performance, something it has just as much of as the GTX 670, and cache/memory pressure from losing that ROP cluster and its cache.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.

In Civilization V, memory bandwidth and cache are clearly more important than raw compute performance. Although this isn’t a worst-case outcome for the GTX 660 Ti, it still drops substantially from the GTX 670. As a result its compute performance is barely better than that of the GTX 560 Ti, which wasn’t a strong compute performer in the first place.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

Ray tracing likes memory bandwidth and cache, which means another tough run for the GTX 660 Ti. In fact it’s now slower than the GTX 560 Ti. Compared to the 7950 this isn’t even a contest. GK104 is generally bad at compute, and GTX 660 Ti is turning out to be especially bad.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL routine that AES encrypts and decrypts an 8K x 8K pixel square image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher.
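As a rough illustration of how an "average time over a number of iterations" figure is produced, here is a minimal Python sketch. It is not the AESEncryptDecrypt code: the names `average_seconds` and `xor_pass` are hypothetical, and the workload is a trivial XOR pass standing in for the real OpenCL AES kernel.

```python
import time


def average_seconds(workload, iterations=5):
    """Return the mean wall-clock seconds per call of `workload`."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        total += time.perf_counter() - start
    return total / iterations


def xor_pass(data=bytes(range(256)) * 64, key=0x5A):
    # Hypothetical stand-in workload, NOT AES: XOR every byte with a key.
    return bytes(b ^ key for b in data)


if __name__ == "__main__":
    print(f"{average_seconds(xor_pass):.6f} s per pass")
```

Averaging over several runs smooths out one-off costs such as the first launch, which is presumably why a benchmark like this reports a mean rather than a single run.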

The GTX 660 Ti does finally turn things around on our AES benchmark, thanks to the fact that this benchmark generally favors NVIDIA. At the same time, the gap between the GTX 670 and GTX 660 Ti is virtually non-existent.

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k-particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
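To make the O(n^2) nearest neighbor idea concrete, here is a minimal CPU-side sketch in Python. It is not the DirectX SDK sample's shader: the function name `neighbor_counts`, the 2D points, and the `tile` size are assumptions for illustration. The tiled loop only mimics the access pattern in which a block of particles is loaded once and then reused by every particle that needs it.

```python
def neighbor_counts(points, radius, tile=64):
    """Count, for each particle, the other particles within `radius`.

    Every particle is compared against every other one, so the work is O(n^2).
    """
    r2 = radius * radius
    counts = [0] * len(points)
    for start in range(0, len(points), tile):
        # On a GPU, a thread group would load this slice into shared memory
        # once, then every thread in the group would read it from there.
        cached = points[start:start + tile]
        for i, (xi, yi) in enumerate(points):
            for j, (xj, yj) in enumerate(cached, start):
                if i != j and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                    counts[i] += 1
    return counts


if __name__ == "__main__":
    print(neighbor_counts([(0, 0), (1, 0), (5, 5)], radius=1.5, tile=2))
```

The tile size changes only how the data is staged, not the result, which is the point of the optimization: the same comparisons are made, but repeated reads hit fast on-chip storage instead of going back to main memory.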

The compute shader fluid simulation provides the GTX 660 Ti another bit of reprieve, although like other GK104 cards it’s still relatively weak. Here it’s virtually tied with the GTX 670, so it’s clear that it isn’t being impacted by cache or memory bandwidth losses, but it needs about 10% more performance to catch the 7950.

Finally, we’re adding one last benchmark to our compute run. NVIDIA and the Folding@Home group have sent over a benchmarkable version of the client with preliminary optimizations for GK104. Folding@Home and similar initiatives are still one of the most popular consumer compute workloads, so it’s something NVIDIA wants their GPUs to do well at.

Interestingly, Folding@Home proves to be rather insensitive to the differences between the GTX 670 and GTX 660 Ti, which is not what we would have expected. The GTX 660 Ti isn’t doing all that much better than the GTX 570, once more reflecting that GK104 is generally struggling with compute performance, but it’s not a bad result.


313 Comments


  • TheJian - Monday, August 20, 2012 - link

    660 can go to 1100/1200 as easily as the 7950 gets to 1150 (so another 10% faster)..Check the asus card I linked to before. You'll have a hard time catching the 660 no matter what, it costs you also as noted by anandtech, my comments on watts/cost/heat etc.

    Memory bandwidth isn't the issue here, and all of it overclocks fairly close. We don't run in 2560x1600. It's not the weakness. That is a misnomer perpetuated by Ryan beating it like a dead horse when only 2% of users use any res above 1920x1200. I just debunked that idea further by showing even monitors at newegg including 27 inchers don't use that res. IE, no, bandwidth isn't the problem. Bad review on ryan's part, and no conclusion is the problem. The CORE clock/boost is the thing when it's not a bandwidth issue, and it's already been shown to not be true.. LOL, yep, nvidia conspiracy, the minimums were used here too...ROFL. Good luck digging for things wrong with 660TI. Minimums are shown at hardocp, guru3d, anandtech and more. Strange thing you even brought this up with no proof.

    The NV cards have only been upped 100mhz, which is about ~10%, not 20 like you say. 915/1114 isn't 20%. You CAN get there, but not in out of box exp. I'd guess nearly all of the memory will hit 6.6ghz. Common for 7970OC / gtx680 to hit 7+ghz.
  • Galidou - Monday, August 20, 2012 - link

    I said 20% because most of their cards are way above reference clocks, I was just representing the reality, not the reference thingys. When you can buy factory overclocked cards at the same price, let's say 10$ premium, mentioning the reference clocks is almost... useless. Plus over the internet, 80% of the reviews had factory overclocked cards so the performance we see everywhere and is in everyone's head, is close to 20% overclock has been done.

    So in fact there's maybe 10-15% of the juice left for fellow overclockers. I'm estimating, it could be more in the case of better chips. While the 7950 as we know it, has been reviewed everywhere on its reference clocks/fan and if you take an aftermarket cooler and get, let's be honest and say 40%, it's far ahead in terms of comparison from the reference reviews we have.

    And again and for the last time, it all depends on the games.
  • Galidou - Monday, August 20, 2012 - link

    When I look at things again and again. the memory bandwidth doesn't seem to be much of a problem. The only games where I can guess it could harm it is any new games that will come out with directx11 heavy graphics. Something that taxes the cards on every aspects, else than that, for now, the card doesn't seem to have any weaknesses at all.

    I never thought that for the moment it was a real weakness for it, the future will tell us but even there, 90% of the gamers play at 1080p or less and 80% of that 90% pay less than 150$ for their video cards. For those paying more, it all depends on chosen side, games they play, overclocking or not and money they want to spend.

    Take overclocking out of the picture and Nvidia wins almost everything by a good margin. Anyone playing at 1080p won't be disappointed by any 200$+ card if they are not so inclined to play everything on ultra with 8x MSAA.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    You're going to have a CRAP experience and stuttering junk on your eyefinity in between crashes.
    Come back and apologize to me, and then thejian can hang his head and tell you he tried to warn you.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    Here's the WARNING for you again, with the 660Ti STOMPING your dreamy 7950 into the turf in Skyrim at 2560 x 1080

    http://www.bit-tech.net/hardware/2012/08/16/nvidia...
  • RussianSensation - Thursday, August 16, 2012 - link

    Hey Ryan,

    In Shogun 2 and Batman AC at 1080P almost none of the new cards are being stressed. I think you should increase the quality to Ultra for the new 2-3GB generation of cards even if the < 1.5GB VRAM cards suffer and bump AA to 8X in Batman. Otherwise all the cards have no problem passing these benchmarks. Same with SKYRIM, maybe think about adding heavy mods OR testing that game with SSAA or 8xAA at least. Even the 6970 is getting > 83 fps. Maybe you can start thinking of replacing some of these games. They aren't getting very demanding anymore for the new generation of cards.
  • Ryan Smith - Saturday, August 18, 2012 - link

    Russian, it's unlikely that we'll ever bump AA up to 8x. I hate jaggies, but the only thing 8x AA does is to superficially slow things down; the quality improvement isn't even negligible. If 4x MSAA doesn't get rid of jaggies in a game, then the problem isn't MSAA.

    Consequently this is why we use SSAA on Portal 2. High-end cards are fast enough to use SSAA at a reasonable speed. Ultimately many of these games will get replaced in the next benchmark refresh, but if we need to throw up extra roadblocks in the future it will be in the form of TrSSAA/AAA or SSAA, just like we did with Portal 2.
  • Biorganic - Saturday, August 18, 2012 - link

    I was speaking a bit on both. The article insinuates that the 660ti is on the same performance level as the 7950. The obvious caveat to your results is that it is ridiculously easy to overclock the 7950 by 35-45%, and GCN performance scales pretty well with clock increases. It should be noted in the article that the perf of 7950 OC'd is beyond what the 660ti can attain. Unless you guys can OC a 660ti sample by 30% or more.
  • CeriseCogburn - Sunday, August 19, 2012 - link

    Is this the exact same way we recommended the GTX460 reviews ? With some supermassive OC in the reviews, so we could really see what the great GTX 460 could do ?
    NO>>>>>>>
    The EXACT OPPOSITE occurred here, by all of your type people.
    Did we demand the 560Ti be OC'ed to show how it surpasses the amd series ? NOPE.
    Did we go on and on about how massive the GTX580 gains were with OC even though it was already far, far ahead of all the amd cards with its very low core clocks ? NOPE - here we heard power whines.
    Did we just complain that the GTX680 is not even in the review while the 7970 is ?
    Nope.
    How about the GTX 470 or 480 ? Very low cores, where were all of you then demanding they be OC'ed because they gained massively.. ?
    Huh, where were you ?
  • Galidou - Sunday, August 19, 2012 - link

    Performance scales pretty well on both designs but AMD just is a little better at overclocking because it seems like the base clock is terribly underclocked. It just feels like that but that must be for power constraints and noise on reference designs.
