Compute Performance

Shifting gears, as always our final set of real-world benchmarks is a look at compute performance. As we have seen with the GTX 680 and GTX 670, GK104 appears to be significantly less balanced between rendering and compute performance than GF110 or GF114 were, and as a result compute performance suffers. Cache and register file pressure in particular seem to give GK104 grief, which means that GK104 can still do well in certain scenarios but falls well short in others. For the GTX 660 Ti in particular, this is going to be a battle between the importance of shader performance, something it has just as much of as the GTX 670, and cache/memory pressure from losing that ROP cluster and cache.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.

In Civilization V’s texture decompression test, memory bandwidth and cache are clearly more important than raw compute performance. Although this isn’t a worst-case outcome for the GTX 660 Ti, its performance drops substantially from the GTX 670. As a result its compute performance is barely better than that of the GTX 560 Ti, which wasn’t a strong performer at compute in the first place.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

Ray tracing likes memory bandwidth and cache, which means another tough run for the GTX 660 Ti. In fact it’s now slower than the GTX 560 Ti. Compared to the 7950 this isn’t even a contest. GK104 is generally bad at compute, and GTX 660 Ti is turning out to be especially bad.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL routine that AES encrypts and decrypts an 8K x 8K pixel square image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher.
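To make the "average time over a number of iterations" metric concrete, here is a minimal Python sketch of that kind of timing harness. It is not the AESEncryptDecrypt code: `average_seconds` and `toy_workload` are hypothetical names, and the workload is a plain XOR pass standing in for the real OpenCL AES kernel.

```python
import time


def average_seconds(work, iterations):
    """Run work() `iterations` times and return the mean wall-clock time per run."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        work()
        total += time.perf_counter() - start
    return total / iterations


def toy_workload(size=256):
    """Placeholder workload (NOT AES): XOR every byte of a size x size buffer."""
    data = bytearray(size * size)  # stand-in for the square image
    key = 0x5A                     # arbitrary single-byte key, for illustration only
    return bytes(b ^ key for b in data)


if __name__ == "__main__":
    mean = average_seconds(toy_workload, iterations=5)
    print(f"average time per iteration: {mean * 1000:.3f} ms")
```

Averaging over several iterations smooths out one-off costs such as kernel compilation or a cold cache, which is presumably why the benchmark reports a mean rather than a single run.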

The GTX 660 Ti does finally turn things around on our AES benchmark, thanks to the fact that this benchmark generally favors NVIDIA. At the same time the gap between the GTX 670 and GTX 660 Ti is virtually non-existent.

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
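As a rough illustration of what an O(n^2) neighbor method with shared-memory caching does, the Python sketch below computes a per-particle density twice: once with a plain all-pairs loop, and once tile by tile, where each tile stands in for the block of particles a compute shader thread group would stage in shared memory. This is a sketch under assumptions, not the DirectX SDK sample: the function names, the 2D positions, the weighting term, and the tile size are all chosen here for illustration.

```python
def density_bruteforce(points, h):
    """O(n^2): every particle visits every other particle."""
    h2 = h * h
    out = []
    for (xi, yi) in points:
        total = 0.0
        for (xj, yj) in points:
            r2 = (xi - xj) ** 2 + (yi - yj) ** 2
            if r2 < h2:                    # only neighbors within radius h contribute
                total += (h2 - r2) ** 3    # assumed smoothing weight, for illustration
        out.append(total)
    return out


def density_tiled(points, h, tile_size=4):
    """Same O(n^2) work, processed one tile of particles at a time."""
    h2 = h * h
    out = [0.0] * len(points)
    for start in range(0, len(points), tile_size):
        # Stand-in for loading a block of particles into shared memory once,
        # so that every "thread" (outer i loop) reuses it instead of re-reading
        # the same positions from main memory.
        tile = points[start:start + tile_size]
        for i, (xi, yi) in enumerate(points):
            for (xj, yj) in tile:
                r2 = (xi - xj) ** 2 + (yi - yj) ** 2
                if r2 < h2:
                    out[i] += (h2 - r2) ** 3
    return out


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (0.0, 0.4), (3.1, 3.0)]
    print(density_bruteforce(pts, 1.0))
    print(density_tiled(pts, 1.0, tile_size=2))
```

Tiling does not reduce the arithmetic, which stays O(n^2); it only changes where the data is read from. That would be consistent with this test tracking shader throughput more than memory bandwidth or cache size.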

The compute shader fluid simulation provides the GTX 660 Ti another bit of reprieve, although like other GK104 cards it’s still relatively weak. Here it’s virtually tied with the GTX 670, so it’s clear that it isn’t being impacted by cache or memory bandwidth losses, but it needs about 10% more performance to catch the 7950.

Finally, we’re adding one last benchmark to our compute run. NVIDIA and the Folding@Home group have sent over a benchmarkable version of the client with preliminary optimizations for GK104. Folding@Home and similar initiatives are still one of the most popular consumer compute workloads, so it’s something NVIDIA wants their GPUs to do well at.

Interestingly, Folding@Home proves to be rather insensitive to the differences between the GTX 670 and GTX 660 Ti, which is not what we would have expected. The GTX 660 Ti isn’t doing all that much better than the GTX 570, once more reflecting that GK104 is generally struggling with compute performance, but it’s not a bad result.


313 Comments

  • blanarahul - Thursday, August 16, 2012

    First! Oh yeah!
  • blanarahul - Thursday, August 16, 2012

    GTX 660 Ti: Designed for overclockers. Overclock memory and that's it.
  • CeriseCogburn - Thursday, August 23, 2012

    The cores are hitting over 1300 consistently. Oh well, buh bye amd.
  • Galidou - Monday, August 27, 2012

    Well, it depends on the sample. I tested the 660 Ti I bought for my wife in my PC, and above a 1290 core clock (with boost), after 10-15 minutes in a game that doesn't even tax the GPU past 70%, the video card crashes and Windows tells me ''the adapter has stopped responding''.

    Crysis 2 stutters on some levels but it's mainly stable 95% of the time, whereas my overclocked 7950 is not doing this.

    It would artifact in MSI Kombustor with a slight increase in voltage and a core clock above 1260. Good thing it's for my wife and not me; she won't overclock, as it's way more than enough for her mere 1080p resolution. The memory overclocks to 6.6 GHz easily.
  • GmTrix - Friday, August 17, 2012

    Dear God, Have AnandTech readers really sunk to this level of childishness?
  • Chaitanya - Friday, August 17, 2012

    shocking.
  • CeriseCogburn - Sunday, August 19, 2012

    TXAA - AWESOME - THE JAGGIES ARE GONE.
    Thank you nVidia for having real technology development, unlike amd loser
    Thank you nVidia for being able to mix ram chip sizes or to distribute ram chips across your memory controllers with proprietary technology that you keep secret despite amd fanboys desiring to know how you do it so they can help amd implement for free.
    Thanks also for doing it so well, even with reviewers putting it down and claiming it can result in 48 bandwidth instead of 144 bandwidth, all the games and tests they have ever thrown at it in a desperate amd fanboy desire to find a chink in its armor have yielded ABSOLUTELY NOTHING, as in, YOU'VE DONE IT PERFECTLY AGAIN nVidia.
    I just love the massive bias at this site.
    It must be their darn memory failing.
    Every time they make a crazy speculative attack here on nVidia where all their rabid research to find some fault provides a big fat goose egg, they try to do it again anyway, and they talk like they'll eventually find something even though they never do. By the time they give up, they're off on some other notional and failed to prove it put down against nVidia.
    192 bit bus / 2GB ram / unequal distribution / PERFECT PERFORMANCE IMPLEMENTATION
    Get used to it.
  • TheJian - Sunday, August 19, 2012

    ROFL... I should have just read more posts...Might have saved me a crapload of typing Cerise...LOL. Nah, it needs to be said more by more than ONE person :) Call a spade a spade people.

    I tried to leave out the word BIAS and RYAN/Anandtech in the same sentence :)

    But hold on a minute, while I fire up my compute crap (or 2008 game rendered moot by it's own 2011+2012 hires patch equivalent) so I can run up my electric bill so I can prove the AMD card wins in something I never intend to use a gaming card for or run at a res that these things aren't being used for by 98% of the people. Folding? You must be kidding. Bitcoin hunting?...LOL that party was over ages ago - you won't pay for your card getting bitcoins today - it was over before anandtech did their article on bitcoins - but I bet they helped sell some AMD cards. Quadro+fireGL cards are for this crap (computational NON game stuff I mean). Recommending cards based on computational crap is pointless when they're for gaming.

    I'm an amd fanboy but ONLY at heart. My wallet wins all arguments regardless of my love for AMD (or my NV stock...LOL). I'm trying to hold out for AMD's next cpu's but I'm heavily leaning Ivy K for Black Friday, fanboy AMD love or not. They ruined their company by paying 3x the price for ATI, which in turn crapped on their stock and degraded their company to near junk bond status in said stock (damn them, I used to be able to ride the rollercoaster and make money on AMD!). I'm still hoping for a trick up their sleeve nobody knows about. But I think they're just holding back cpu's to clear shelves, nothing special in the new ones coming. Basically a sandy to ivy upgrade but on AMD's side for bullsnozer. The problem is it's still going to be behind ivy by 25-50% (in some cases far worse). Unless it's an EXCEPTIONAL price I can't help but pick IVY as I do a lot of rar/par stuff and of course gaming. I'd get hurt way too much by following my heart this round (I had to take xeon e3110 s775 last time for the same reason).

    My planned Black Friday upgrade looks like, X motherboard (too early for a pick or homework not knowing AMD yet), Ivy 3770K (likely) and a 660TI with the highest default clock I can get at a black friday price :) (meaning $299 or under for zotac AMP speeds or better). I already have 16GB ddr3 waiting here...LOL. I ordered it ages ago, figuring it's going to go through the roof at some point (win8? crappy as it is IMHO). I'm only down $10 so far after purchasing mem I think in Jan or so...LOL. In the end I think I'll be up $30-80 at some point (I only paid $75 for 16GB). Got my dad taken care of too, we're both just waiting on black friday and all this 28nm vid card crap to sort out. End of Nov should have some better tsmc cards available (or another fabs chips?). I'm guessing a ton at high clocks by then for under $299.

    Anyway, THANKS for the good laugh :) I needed that after reading my 4th asinine review. Guru3d looking up for the 5th though...LOL. He doesn't seem to care who wins, & caters more to the wallet it seems (great OC stuff there too). He usually doesn't have a ton of cards or chips in each review though, so you have to read more than one product review there to get the picture, but they're good reviews. Hilbert Hagedoorn (sp?) does pretty dang good. By the end of it, I'll have hit everyone I think (worth mentioning, techreport, hardocp, ixbtlabs, hexus etc - sorry if I left a good one out guys). I seem to read 10+ these days before parting with cash. :( I like hardocp for a difference in ideas of benchmarking. He benches and states the HIGHEST PLAYABLE SETTINGS per card. It's a good change IMHO, though I still require all the other reviews for more games etc. I'm just sure to hit him for vidcard reviews just for the settings I can expect to get away with in a few games. I wish guru3d had thrown in an OC'd 660TI into the 7950 boost review since they're so easily had clocked high at $299/309. But one more read gets that picture, or can be drawn by all the asinine reviews and his 7950 boost review...LOL. I have to get through the rest of guru3d, then off to hardocp for the different angle :) Ahh, weekend geek reading galore with two new gpu cards out this week ;)
  • Jorgan22 - Sunday, October 07, 2012

    Review was a good read, glad to see the 660 TI is doing well.

    I have no idea what's up with the comments though, especially you TheJian, you wrote a novel, ending half the paragraphs with "... LOL".

    If you're going to waste so much time doing that, post it in the forums, not in a comment thread where it's not going to get read, buddy; it just hurts you.
  • RussianSensation - Sunday, August 19, 2012

    1) TXAA is a blurry mess. See videos or screenshots. It's an option but let's not try claiming it's some new revolutionary anti-aliasing features.

    Instead HD7950 can actually handle MSAA and mods in Skyrim and Batman AC and not choke.
    http://www.computerbase.de/artikel/grafikkarten/20...

    2) That review left 2 critical aspects out:

    (I) Factory preoverclocked, binned after-market 7950s run cooler, quieter and at way lower voltage than that reference artificially overvolted 7950B card tested in the review (see MSI TwinFrozr 3, Gigabyte Windforce 3x for $320-330 on Newegg).

    (II) Those same after-market 7950s hit 1100-1200mhz on 1.175V or less in our forum. At those speeds, the HD7950 > GTX680/HD7970 Ghz Edition. How is that for value at $320-330?

    The review didn't take into account that you can get way better 7950 cards and they overclock 30-50%, and yet the same review took after-market 660Tis and used their coolers for noise testing and overclocking sections against a reference based 7950.

    Let's see how the 660Ti does against the $320 MSI TwinFrozr 7950 @ 1150mhz with MSAA on in Metro 2033, Crysis 1/Warhead, Anno 2070, Skyrim with ENB Mods w/8xMSAA, Batman AC w/8xMSAA, Dirt Showdown, Sleeping Dogs, Sniper Elite V2, Serious Sam 3, Bulletstorm, Alan Wake, Crysis 2 with MSAA. It's going to get crushed, that's what will happen.