Compute

Shifting gears, as always our final set of benchmarks is a look at compute performance. As we saw with the GTX 680, GK104 appears to be significantly less balanced between rendering and compute performance than GF110 or GF114 were, and as a result compute performance suffers. Cache and register file pressure in particular seem to give GK104 grief, which means that GK104 can still do well in certain scenarios, but falls well short in others.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.
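Civ V's exact compression scheme is its own, but the flavor of the per-block work involved can be seen in a standard BC1/DXT1 block decode, sketched here on the CPU in Python. This is an illustrative stand-in, not code from the game; the function names are our own.

```python
import struct

def rgb565_to_rgb(c):
    # Expand a packed 16-bit 5:6:5 color to 8-bit-per-channel RGB.
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    return ((r * 255) // 31, (g * 255) // 63, (b * 255) // 31)

def decode_bc1_block(block):
    """Decode one 8-byte BC1/DXT1 block into a 4x4 grid of RGB tuples."""
    c0, c1, indices = struct.unpack('<HHI', block)
    p0, p1 = rgb565_to_rgb(c0), rgb565_to_rgb(c1)
    if c0 > c1:  # 4-color mode: two interpolated intermediate colors
        palette = [p0, p1,
                   tuple((2 * a + b) // 3 for a, b in zip(p0, p1)),
                   tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))]
    else:        # 3-color + black mode
        palette = [p0, p1,
                   tuple((a + b) // 2 for a, b in zip(p0, p1)),
                   (0, 0, 0)]
    # Each texel is a 2-bit palette index, packed LSB-first.
    return [[palette[(indices >> (2 * (4 * y + x))) & 3] for x in range(4)]
            for y in range(4)]

# A solid block: both endpoint colors pure red, all indices 0.
block = struct.pack('<HHI', 0xF800, 0xF800, 0)
texels = decode_bc1_block(block)
```

A compute shader doing this kind of decompression maps one thread (or thread group) per block, which is why the workload parallelizes so well on a GPU.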

It’s quite shocking to see the GTX 670 do so well here. For sure it’s struggling relative to the Radeon HD 7900 series and the GTX 500 series, but compared to the GTX 680 it’s only trailing by 4%. This is a test where the GTX 670’s reduced shader performance should cause the gap between the two cards to open up, but clearly this is not the case. Perhaps we’ve been underestimating the memory bandwidth needs of this test? If so, AMD’s significant memory bandwidth advantage certainly helps to cement the 7970’s lead.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.
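For a sense of the math a ray tracer grinds through millions of times per frame, here is the textbook ray-sphere intersection test in Python. This is a generic quadratic-discriminant formulation for illustration, not code from SmallLuxGPU or LuxRender.

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the distance along a normalized ray to its first hit with a
    sphere, or None on a miss -- the core intersection test a ray tracer
    evaluates constantly."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * c      # the quadratic's 'a' term is 1 for a unit direction
    if disc < 0.0:
        return None             # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0.0 else None

# Ray down the +z axis toward a unit sphere centered 5 units away.
t = ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0)
```

Each sample ray runs many such tests plus shading math, which is why shader throughput and register pressure dominate this benchmark.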

SmallLuxGPU, on the other hand, finally shows us the larger gap we’ve been expecting between the GTX 670 and GTX 680. The GTX 680’s greater number of SMXes and higher clockspeed leave the GTX 670 trailing by 10%, performing worse than the GTX 570 or even the GTX 470. More so than any other test, this is the one that drives home the point that GK104 isn’t a strong compute GPU, while AMD offers nothing short of incredible compute performance.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL sample that AES encrypts and decrypts an 8K x 8K pixel square image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher.
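The benchmark's reported number is simply a mean over repeated runs, which is easy to sketch. The harness below is a hypothetical stand-in: `xor_cipher` is a trivial placeholder workload rather than real AES, and the buffer is far smaller than an 8K x 8K image.

```python
import time

def average_kernel_time(kernel, data, iterations=10):
    """Run `kernel` over `data` several times and report the mean wall-clock
    time per iteration, mirroring how the AES benchmark averages its runs."""
    start = time.perf_counter()
    for _ in range(iterations):
        kernel(data)
    return (time.perf_counter() - start) / iterations

# Placeholder workload: XOR every byte with a fixed key byte.
# (A real AES round is far heavier; this only exercises the harness.)
def xor_cipher(buf, key=0x5A):
    return bytes(b ^ key for b in buf)

image = bytes(256 * 1024)           # small dummy buffer standing in for the image
mean_seconds = average_kernel_time(xor_cipher, image, iterations=5)
```

Averaging over iterations smooths out clock boost and scheduling noise, which matters when comparing cards whose boost clocks vary run to run.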

Once again the GTX 670 has a weak showing here, although not as bad as with SmallLuxGPU. Still, it’s enough to fall behind the GTX 570, but at least it’s enough to beat the 7950. Clockspeeds help, as showcased by the EVGA GTX 670SC, but nothing really makes up for the missing SMX.

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k-particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
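The shared-memory optimization amounts to walking the particle list tile by tile, so each tile is loaded once and then reused by every thread in the group. Below is a CPU-side Python sketch of that access pattern; the tile size and function name are our own, not the SDK sample's.

```python
def nearest_neighbors_tiled(positions, radius, tile=256):
    """O(n^2) neighbor search processed tile-by-tile. On the GPU each tile
    would be staged into shared memory once and reused by the whole thread
    group; here the tile loop just mirrors that access pattern."""
    r2 = radius * radius
    n = len(positions)
    neighbors = [[] for _ in range(n)]
    for start in range(0, n, tile):
        cached = positions[start:start + tile]  # the "shared memory" tile
        for i, (xi, yi) in enumerate(positions):
            for j, (xj, yj) in enumerate(cached, start):
                if i != j and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                    neighbors[i].append(j)
    return neighbors

pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)]
nbrs = nearest_neighbors_tiled(pts, radius=1.0, tile=2)
# Points 0 and 1 are within radius 1 of each other; point 2 is isolated.
```

The payoff on a GPU is bandwidth: each particle position is read from memory once per tile pass instead of once per thread, which is exactly the kind of cache-pressure-sensitive pattern the text suggests GK104 handles unevenly.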

For reasons we’ve yet to determine, this benchmark strongly dislikes the GTX 670 in particular. There doesn’t seem to be a performance regression in NVIDIA’s drivers, and the gap is far too large to be explained by TDP; it simply struggles on the GTX 670. As a result the GTX 670 only hits 42% of the GTX 680’s performance, which is well below what the GTX 670 should theoretically be getting. Barring some kind of esoteric reaction between this program and the unbalanced GPC, a driver issue is still the most likely culprit, but it looks to only affect the GTX 670.

Finally, we’re adding one last benchmark to our compute run. NVIDIA and the Folding@Home group have sent over a benchmarkable version of the client with preliminary optimizations for GK104. Folding@Home and similar initiatives are still one of the most popular consumer compute workloads, so it’s something NVIDIA wants their GPUs to do well at.

Whenever NVIDIA sends over a benchmark you can expect they have good reason to, and this is certainly the case for Folding@Home. GK104 is still a slouch given its resources compared to GF110, but at least it can keep up with the GTX 580. At 970 nanoseconds per day of simulated time the GTX 670 can tie the GTX 580, while the GTX 680 can pull ahead by 6%. Interestingly, this benchmark appears to be far more constrained by clockspeed than by the number of shaders, as the EVGA GTX 670SC outperforms the GTX 680 thanks to its 1188MHz boost clock, which it manages to stick to the entire time.

Comments

  • Morg. - Thursday, May 10, 2012 - link

    No.
    I am saying that tahiti XT paired with 384 bits RAM AND clocked at the same speed as a gtx 680 paired with 256 bits RAM, has clearly more raw power.

    The thing is, two years from now, nVidia will be boosting other new games for the NEW nVidia hardware and you will not benefit from it on the old H/W.

    However, raw power will remain, 3GB of RAM will still be 3GB of RAM and you will thank god for the added graphics you get out of that last 1 GB that cost you nothing more.

    The two games that have for years been GPU benchmarks and haven't been sponsored by either nVidia or AMD are Crysis warhead and metro 2033.

    If you wanna trash those results because BF3 is everything to you, you should totally do it though.
  • scook9 - Thursday, May 10, 2012 - link

    Crysis: Warhead is a "The way it is meant to be played" title.....

    You see that every time you start it up as well as on the box.
    http://image.com.com/gamespot/images/bigboxshots/3...
  • eddman - Thursday, May 10, 2012 - link

    Two years from now 7970 won't be powerful enough anyway.

    As scook9 mentioned, Warhead is a TWIMTBP title and yet it runs better on the 7970.
    It'd be better if you removed that tin foil hat. TWIMTBP and Gaming Evolved are programs to help developers code their games better.
    There are countless TWIMTBP games that run better on radeons.

    Crysis and warhead use an old engine that isn't going to be used anymore. Nowadays they are just obsolete benchmarks.

    Metro 2033 is a very nice game and I really liked it, but it's not that popular and has a proprietary engine. Most gamers don't care about such an engine.

    Frostbite, OTOH, matters because it belongs to a major publisher/developer which means we'll see many games based on it in the future.
  • SlyNine - Thursday, May 10, 2012 - link

    I'm pretty sure a 4870 (basically a 6770) is powerful enough today, so why wouldn't a 7970 be powerful enough by then.

    Just because an engine isn't going to be used anymore doesn't mean it isn't useful to gauge certain aspects of a videocard. Many engines that will be used are not even developed yet, and some may push a card more like the CryEngine did.

    CryEngine 2 is going to be used for MechWarrior Online baby. (I'm glad it used a good engine, and it looks like they are using it to good effect).
  • eddman - Thursday, May 10, 2012 - link

    Because 3GB memory is for high-resolutions and high AA settings, and 2 years from now 7970 won't have enough power to run those games at those settings at good frame rates.

    That doesn't make sense. Card A might run max payne 1 twice as fast as card B, but what'd be the point.

    No, mechwarrior online uses cryengine 3, not 2. Cryengine 2, that was used in crysis and warhead, is dead.
  • SlyNine - Saturday, May 12, 2012 - link

    I meant CryEngine 3. not sure why I said 2.

    There is no proof that 3 gigs won't be enough for high res by then. Yea maybe not (or maybe) with AA.

    Besides, you didn't say anything about running maxed out everything; you made a blanket statement that the 7970 won't be powerful enough, period.

    That means that card A does something that card B cannot, and depending on what that is, it can have an effect on engines that focus on certain things.
  • eddman - Saturday, May 12, 2012 - link

    I meant 7970 won't have enough shader power 2 years from now, so 3GB won't help then either.

    Yes, everything maxed out with high AA. After all that's what large memories are for.

    Obsolete engine is obsolete. Deal with it. Cryengine 2 won't be used in any other AAA game. It's gone.
  • SlyNine - Saturday, May 12, 2012 - link

    A realtime engine will always tell you something about the card. Obsolete or not.

    If 3GB gives it some sort of advantage then it was worth it. In many games it's already showing an advantage at ultra high res.

    Only you are saying the only use of large video cache is AA at ultra settings. But this is simply a questionable premise.

    I really don't care if Cryengine 2 is used for a AAA game, or ever again. I still play Crysis. Furthermore I don't give a damn about AAA games, most of them are dumbed down for mass appeal.
  • CeriseCogburn - Monday, June 11, 2012 - link

    At 7000X what rez is 3GB showing an advantage?

    ROFL - desperation
  • theprodigalrebel - Thursday, May 10, 2012 - link

    BF3 has sold 1.9 million copies worldwide.
    Metro 2033 has sold 0.16 million copies worldwide.
    Crysis is an old game that I don't see (m)any people playing.

    BF3 is also scheduled for three DLC releases (two this year, third next year).

    I see a perfectly good reason why BF3 performance matters. You are speculating that the 7900-series will have great Unreal 4 performance. That's just silly since nobody knows anything about Unreal 4 performance yet.

    The only thing I could find was Hexus.net reporting that nVidia chose the Kepler to demonstrate the Unreal 4 engine at the GDC.
