Compute Performance

Shifting gears, as always our final set of real-world benchmarks is a look at compute performance. As we have seen with GTX 680 and GTX 670, GK104 appears to be significantly less balanced between rendering and compute performance than GF110 or GF114 were, and as a result compute performance suffers.  Cache and register file pressure in particular seem to give GK104 grief, which means that GK104 can still do well in certain scenarios, but falls well short in others. For GTX 660 Ti in particular, this is going to be a battle between the importance of shader performance – something it has just as much of as the GTX 670 – and cache/memory pressure from losing that ROP cluster and cache.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.
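Civ V's decompression shader is the game's own and its exact format isn't public. As a rough illustration of the kind of per-block work such a kernel performs, here is a minimal Python decoder for a standard DXT1/BC1 block; the layout follows the published BC1 format, not anything specific to Civ V.

```python
import struct

def rgb565_to_rgb888(c):
    """Expand a 16-bit RGB565 color to 8-bit-per-channel RGB."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Replicate the high bits into the low bits to fill the 8-bit range.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_dxt1_block(block):
    """Decode one 8-byte DXT1/BC1 block into a 4x4 grid of RGB tuples."""
    c0, c1, bits = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:  # 4-color mode: two interpolated colors
        palette = [p0, p1,
                   tuple((2 * a + b) // 3 for a, b in zip(p0, p1)),
                   tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))]
    else:        # 3-color mode: midpoint plus black
        palette = [p0, p1,
                   tuple((a + b) // 2 for a, b in zip(p0, p1)),
                   (0, 0, 0)]
    # Two bits per texel, 16 texels, least significant bits first.
    return [[palette[(bits >> (2 * (y * 4 + x))) & 0x3] for x in range(4)]
            for y in range(4)]
```

On the GPU, thousands of these independent 4x4 blocks are decoded in parallel, which is one reason a test like this leans on memory bandwidth and cache as much as raw ALU throughput.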

Memory bandwidth and cache are clearly more important to Civilization V's texture decompression than raw compute throughput. Although this isn't a worst case scenario for the GTX 660 Ti, it drops substantially from the GTX 670. As a result its compute performance is barely better than that of the GTX 560 Ti, which wasn't a strong performer at compute in the first place.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

Ray tracing likes memory bandwidth and cache, which means another tough run for the GTX 660 Ti. In fact it’s now slower than the GTX 560 Ti. Compared to the 7950 this isn’t even a contest. GK104 is generally bad at compute, and GTX 660 Ti is turning out to be especially bad.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL AES encryption routine that encrypts/decrypts an 8K x 8K pixel square image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher.
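We can't reproduce the OpenCL kernel here, but the benchmark's scoring method is simple to sketch. In the Python below the AES kernel is stood in for by a toy repeating-key XOR (purely a hypothetical placeholder, not AES), and the harness returns the average wall-clock time over several iterations, mirroring how the benchmark reports its result.

```python
import time

def xor_cipher(data, key):
    """Placeholder transform standing in for the real AES kernel.
    Applying it twice with the same key restores the original data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def average_time(fn, iterations=5):
    """Time fn() over several iterations and return the mean in seconds,
    mirroring how the benchmark reports average encrypt time."""
    total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / iterations

image = bytes(range(256)) * 64          # small stand-in for the 8K x 8K image
key = b"\x13\x37\xc0\xde"
mean = average_time(lambda: xor_cipher(image, key))
```

Averaging over iterations smooths out one-off costs such as driver warm-up, which matters when the per-pass time is short.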

The GTX 660 Ti does finally turn things around on our AES benchmark, thanks to the fact that this test generally favors NVIDIA hardware. At the same time the gap between the GTX 670 and GTX 660 Ti is virtually non-existent.

Our fourth benchmark once again looks at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16K-particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n²) nearest neighbor method that is optimized by using shared memory to cache data.
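As a structural sketch of that algorithm, the Python below runs the same O(n²) all-pairs neighbor test, processed tile by tile: on the GPU, each thread group would load one tile of particle positions into shared memory and reuse it across the whole group, which is the caching optimization the SDK sample refers to. The tile size and names here are our own illustration, not the SDK's code.

```python
import math

TILE = 4  # stand-in for the thread-group size that fits in shared memory

def neighbors_within(positions, radius):
    """All-pairs O(n^2) neighbor search, processed tile by tile.
    On the GPU each tile would be loaded into shared memory once
    and reused by every thread in the group."""
    n = len(positions)
    result = [[] for _ in range(n)]
    for tile_start in range(0, n, TILE):
        tile = positions[tile_start:tile_start + TILE]  # "shared memory" load
        for i, (xi, yi) in enumerate(positions):
            for j, (xj, yj) in enumerate(tile, start=tile_start):
                if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                    result[i].append(j)
    return result
```

The tiling doesn't change the O(n²) operation count; it changes where the data comes from, trading repeated global memory reads for fast on-chip reuse.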

The compute shader fluid simulation provides the GTX 660 Ti another bit of reprieve, although like other GK104 cards it’s still relatively weak. Here it’s virtually tied with the GTX 670 so it’s clear that it isn’t being impacted by cache or memory bandwidth losses, but it needs about 10% more to catch the 7950.

Finally, we’re adding one last benchmark to our compute run. NVIDIA and the Folding@Home group have sent over a benchmarkable version of the client with preliminary optimizations for GK104. Folding@Home and similar initiatives are still one of the most popular consumer compute workloads, so it’s something NVIDIA wants their GPUs to do well at.

Interestingly, Folding@Home proves to be rather insensitive to the differences between the GTX 670 and GTX 660 Ti, which is not what we would have expected. The GTX 660 Ti isn’t doing all that much better than the GTX 570, once more reflecting that GK104 generally struggles with compute performance, but it’s not a bad result.


  • CeriseCogburn - Sunday, August 19, 2012 - link

    If they can't supply it, it cannot lower competitor prices and can't be bought, so they make little or no money, and everyone else buys the available competitor's product.
    Why doesn't AMD release a card that drives down the 680's price by $170 per card and makes nVidia give away 3 free games with it too?
    That would make too much sense for amd and we consumers and some competition that crushes evil corporate profiteering nVidia, so AMD should do it.
    (roll eyes)
    To answer your question: nVidia is being nice by not draining all the red blood from amd's jugular, since amd is bleeding out so badly already that if nVidia took them out, a million raging 3D fanboys would scream for billions in payola in a giant lawsuit, protesting in front of the UN, the IMF, the International Court, and the 25k-person traveling corps of unelected EURO power bureaucrats.
    So instead of all that terribleness and making amd fans cry, nVidia is nice about it.

  • Galidou - Tuesday, August 21, 2012 - link

    This card at $249 would be very bad for AMD but not very good for Nvidia either. Considering how close it already is to its bigger brother, it would probably cut a good percentage of gtx 670 sales.

    So yeah, $249 might seem like a good price to us, but they don't want to harm themselves either.
  • Belard - Thursday, August 16, 2012 - link

    What does TI mean?

    Where is the GTX 660? So it's really a 670 with a hand chopped off?
  • ericloewe - Thursday, August 16, 2012 - link

    TI means something along the lines of "We'll release a crap version later on that only OEMs will buy, called the GTX 660."
  • Patflute - Thursday, August 16, 2012 - link

    lolwut
  • Omega215D - Thursday, August 16, 2012 - link

    I think it still means "Titanium" version of a chip which was supposedly better than the non-Ti.
  • MrSpadge - Sunday, August 19, 2012 - link

    It means "We can't figure out how to distinguish our products using 3 decimal numbers and up to 3 letters in front of it (or the lack thereof), so we'll add some more letters".
  • R3MF - Thursday, August 16, 2012 - link

    In the Anand review of the 450, where Nvidia first showed the lopsided memory bus arrangement, it was noted that CUDA apps would not recognise the full memory complement.

    Has this now been fixed?
  • Ryan Smith - Thursday, August 16, 2012 - link

    Yes. That was fixed almost immediately.
  • R3MF - Thursday, August 16, 2012 - link

    thanks Ryan
