Titan’s Compute Performance, Cont

With Rahul having covered the basics of Titan's strong compute performance, let's shift gears a bit and take a look at real-world usage.

On top of Rahul's work with Titan, as part of our 2013 GPU benchmark suite we put together a larger number of compute benchmarks to try to cover real-world usage, including the old standards of gaming usage (Civilization V) and ray tracing (LuxMark), along with several new tests. Unfortunately that got cut short when we discovered that OpenCL support is currently broken in the press drivers, which prevents us from running several of our tests. We still have our CUDA and DirectCompute benchmarks to look at, but a full look at Titan's compute performance on our 2013 GPU benchmark suite will have to wait for another day.

For their part, NVIDIA of course already has OpenCL working on GK110 with Tesla. The issue is that somewhere between that and bringing up GK110 for Titan by integrating it into NVIDIA's mainline GeForce drivers – specifically the new R314 branch – OpenCL support was broken. We expect this will be fixed in short order, but it's not something NVIDIA checked for ahead of the press launch of Titan, and it's not something they could fix in time for today's article.

Unfortunately this means that comparisons with Tahiti will be few and far between for now. Most significant cross-platform compute programs are OpenCL based rather than DirectCompute, so short of games and a couple other cases such as Ian’s C++ AMP benchmark, we don’t have too many cross-platform benchmarks to look at. With that out of the way, let’s dive into our condensed collection of compute benchmarks.

We'll once more start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game's leader scenes. While DirectCompute is used in many games, this is one of the only games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Note that for 2013 we have changed the benchmark a bit, moving from using a single leader to using all of the leaders. As a result the reported numbers are higher, but they’re also not going to be comparable with this benchmark’s use from our 2012 datasets.
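
To put this sub-benchmark in more concrete terms, the sketch below shows roughly what block-compressed texture decompression looks like when expressed as a compute kernel. To be clear, this is our own minimal CUDA illustration of decoding the standard BC1 (DXT1) format, used here as a stand-in; Civ V's actual texture format and DirectCompute shader are Firaxis's own, and every name in this sketch is our invention.

```cuda
// Hypothetical sketch: BC1 (DXT1) texture decompression as a compute kernel.
// Each thread expands one 64-bit block into a 4x4 tile of RGBA texels.
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Expand a 16-bit RGB565 color to 8-bit-per-channel RGBA.
__device__ uchar4 expand565(uint16_t c) {
    return make_uchar4((c >> 11) * 255 / 31,
                       ((c >> 5) & 63) * 255 / 63,
                       (c & 31) * 255 / 31, 255);
}

__global__ void decodeBC1(const uint64_t* blocks, uchar4* out,
                          int blocksWide, int numBlocks) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= numBlocks) return;

    uint64_t raw = blocks[b];
    uint16_t c0 = raw & 0xFFFF;
    uint16_t c1 = (raw >> 16) & 0xFFFF;
    uint32_t idx = (uint32_t)(raw >> 32);    // 16 two-bit palette indices

    uchar4 pal[4];
    pal[0] = expand565(c0);
    pal[1] = expand565(c1);
    if (c0 > c1) {    // 4-color mode: two interpolated colors
        pal[2] = make_uchar4((2 * pal[0].x + pal[1].x) / 3,
                             (2 * pal[0].y + pal[1].y) / 3,
                             (2 * pal[0].z + pal[1].z) / 3, 255);
        pal[3] = make_uchar4((pal[0].x + 2 * pal[1].x) / 3,
                             (pal[0].y + 2 * pal[1].y) / 3,
                             (pal[0].z + 2 * pal[1].z) / 3, 255);
    } else {          // 3-color mode plus transparent black
        pal[2] = make_uchar4((pal[0].x + pal[1].x) / 2,
                             (pal[0].y + pal[1].y) / 2,
                             (pal[0].z + pal[1].z) / 2, 255);
        pal[3] = make_uchar4(0, 0, 0, 0);
    }

    // Scatter the 4x4 tile into the destination image.
    int bx = (b % blocksWide) * 4, by = (b / blocksWide) * 4;
    for (int t = 0; t < 16; ++t)
        out[(by + t / 4) * blocksWide * 4 + (bx + t % 4)] = pal[(idx >> (2 * t)) & 3];
}

int main() {
    const int bw = 64, nb = bw * bw;              // a 256x256 test texture
    uint64_t* blocks; uchar4* pixels;
    cudaMallocManaged(&blocks, nb * sizeof(uint64_t));
    cudaMallocManaged(&pixels, nb * 16 * sizeof(uchar4));
    for (int i = 0; i < nb; ++i) blocks[i] = 0xAAAAAAAA0000F800ULL; // 2/3 red
    decodeBC1<<<(nb + 255) / 256, 256>>>(blocks, pixels, bw, nb);
    cudaDeviceSynchronize();
    printf("texel0 = (%d,%d,%d)\n", pixels[0].x, pixels[0].y, pixels[0].z);
    cudaFree(blocks); cudaFree(pixels);
    return 0;
}
```

Every 64-bit block expands independently into a 4x4 tile of texels, which is why this sort of work maps so naturally onto thousands of GPU threads.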

With Civilization V having launched in 2010, graphics cards have become significantly more powerful since then, far outpacing growth in the CPUs that feed them. As a result we’ve rather quickly drifted from being GPU bottlenecked to being CPU bottlenecked, as we see both in our Civ V game benchmarks and our DirectCompute benchmarks. For high-end GPUs the performance difference is rather minor; the gap between GTX 680 and Titan for example is 45fps, or just less than 10%. Still, it’s at least enough to get Titan past the 7970GE in this case.

Our second test is one of our new tests, utilizing Elcomsoft's Advanced Office Password Recovery utility to take a look at GPU password generation. AOPR has separate CUDA and OpenCL kernels for NVIDIA and AMD cards respectively, which means it doesn't follow the same code path on every GPU, but it does use the optimal path for each GPU it supports. Unfortunately we're having trouble getting it to recognize AMD 7900 series cards in this build, so for the time being we only have results for NVIDIA's CUDA cards.

Password generation and other forms of brute-force crypto are an area where the GTX 680 is particularly weak, thanks to the various compute aspects that were stripped out in the name of efficiency. As a result it ends up below even the GTX 580 in these benchmarks, never mind AMD's GCN cards. But with Titan/GK110 offering NVIDIA's full compute performance, it rips through this task. In fact it more than doubles the performance of both the GTX 680 and the GTX 580, indicating that the huge gains we're seeing come not just from the additional functional units, but from architectural optimizations and new instructions that improve overall efficiency and reduce the number of cycles needed to complete work on a password.
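
To give an idea of the shape of this workload, below is a minimal CUDA sketch of the brute-force pattern tools like AOPR use: each thread derives one candidate password from its global index, hashes it, and compares it against the target. We use a simple FNV-1a hash purely as a stand-in for the real (far more expensive) office-document key derivation, and every name here is hypothetical rather than Elcomsoft's actual code.

```cuda
// Hypothetical sketch of GPU brute-force password search. FNV-1a stands in
// for the real key-derivation function; names are ours, not Elcomsoft's.
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

__constant__ char d_charset[27] = "abcdefghijklmnopqrstuvwxyz";

__host__ __device__ uint32_t fnv1a(const char* s, int len) {
    uint32_t h = 2166136261u;
    for (int i = 0; i < len; ++i) { h ^= (uint8_t)s[i]; h *= 16777619u; }
    return h;
}

__global__ void crack(uint64_t offset, uint32_t target, int len,
                      unsigned long long* found) {
    uint64_t idx = offset + blockIdx.x * (uint64_t)blockDim.x + threadIdx.x;
    uint64_t n = idx;
    char pw[8];
    for (int i = 0; i < len; ++i) {      // decode index as base-26 digits
        pw[i] = d_charset[n % 26];
        n /= 26;
    }
    // A 32-bit hash can produce false positives over a large keyspace;
    // a real tool re-verifies any hit on the CPU.
    if (fnv1a(pw, len) == target) *found = idx;
}

int main() {
    char secret[] = "titan";                  // pretend this is the password
    uint32_t target = fnv1a(secret, 5);
    unsigned long long* found;
    cudaMallocManaged(&found, sizeof(*found));
    *found = ~0ull;
    uint64_t space = 26ULL * 26 * 26 * 26 * 26;   // all 5-letter candidates
    int blocks = (int)((space + 255) / 256);
    crack<<<blocks, 256>>>(0, target, 5, found);
    cudaDeviceSynchronize();
    printf("hit at candidate #%llu\n", *found);
    cudaFree(found);
    return 0;
}
```

The work is embarrassingly parallel and dominated by integer operations, which is exactly why GK110's restored compute resources pay off so handsomely here.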

Altogether at 33K passwords/second Titan is not just faster than GTX 680, but it’s faster than GTX 690 and GTX 680 SLI, making this a test where one big GPU (and its full compute performance) is better than two smaller GPUs. It will be interesting to see where the 7970 GHz Edition and other Tahiti cards place in this test once we can get them up and running.

Our final test in our abbreviated compute benchmark suite is our very own Dr. Ian Cutress’s SystemCompute benchmark, which is a collection of several different fundamental compute algorithms. Rahul went into greater detail on this back in his look at Titan’s compute performance, but I wanted to go over it again quickly with the full lineup of cards we’ve tested.
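
Rahul's article breaks down the individual sub-tests; to give a flavor of the kind of fundamental primitive a suite like this exercises, here is a minimal shared-memory parallel sum reduction in CUDA. It is purely illustrative, built on our own names and assumptions, and is not SystemCompute's actual code.

```cuda
// Illustrative sketch of a classic compute primitive: parallel sum reduction.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void reduceSum(const float* in, float* out, int n) {
    extern __shared__ float sdata[];
    unsigned tid = threadIdx.x;
    unsigned i = blockIdx.x * blockDim.x * 2 + tid;

    // Each thread loads and pre-adds two elements so no lanes idle on entry.
    float v = (i < n) ? in[i] : 0.0f;
    if (i + blockDim.x < n) v += in[i + blockDim.x];
    sdata[tid] = v;
    __syncthreads();

    // Tree reduction within the block's shared memory.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(out, sdata[0]);   // combine per-block partials
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    *out = 0.0f;

    const int threads = 256;
    int blocks = (n + threads * 2 - 1) / (threads * 2);
    reduceSum<<<blocks, threads, threads * sizeof(float)>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("sum = %.0f (expected %d)\n", *out, n);
    cudaFree(in); cudaFree(out);
    return 0;
}
```

How quickly a GPU chews through primitives like this depends on raw ALU throughput, shared memory performance, and how well the driver's compute stack schedules the work, which is one reason DirectCompute results can diverge from theoretical FLOPS.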

Surprisingly, for all of its performance gains relative to GTX 680, Titan still falls notably behind the 7970GE here. Given Titan’s theoretical performance and the fundamental nature of this test we would have expected it to do better. But without additional cross-platform tests it’s hard to say whether this is something where AMD’s GCN architecture continues to shine over Kepler, or if perhaps it’s a weakness in NVIDIA’s current DirectCompute implementation for GK110. Time will tell on this one, but in the meantime this is the first solid sign that Tahiti may be more of a match for GK110 than it’s typically given credit for.

Comments

  • etriky - Sunday, February 24, 2013 - link

    OK, after a little digging I guess I shouldn't be too upset about not having Blender benches in this review. Tesla K20 and GeForce GTX TITAN support was only added to Blender on 2/21 and requires a custom build (it's not in the main release). See http://www.miikahweb.com/en/blender/svn-logs/commi... for more info.
  • Ryan Smith - Monday, February 25, 2013 - link

    As noted elsewhere, OpenCL was broken in the Titan launch drivers, greatly limiting what we could run. We have more planned, including SLG's LuxMark, and we will publish an update once the driver situation is resolved.
  • kukreknecmi - Friday, February 22, 2013 - link

    If you look at Azui's PDF, using a different type of kernel, the results for the 7970 are:

    SGEMM: 2646 GFLOPS
    DGEMM: 848 GFLOPS

    Why did you take the lowest numbers for the 7970?
  • codedivine - Friday, February 22, 2013 - link

    This was answered above. See one of my earlier comments.
  • gwolfman - Friday, February 22, 2013 - link

    ASUS: http://www.newegg.com/Product/Product.aspx?Item=N8...
    OR
    Titan gfx card category (only one shows up for now): http://www.newegg.com/Product/ProductList.aspx?Sub...

    Anand and staff, post this in your news feed please! ;)
  • extide - Friday, February 22, 2013 - link

    PLEASE start including Folding@home benchmarks!!!
  • TheJian - Sunday, February 24, 2013 - link

    Why? It can't make me any money and isn't a professional app. It tells us nothing. I'd rather see Photoshop, Premiere, some finite element analysis app, 3D Studio Max, some audio or content creation app, or anything that can be used to actually MAKE money. They should be testing some apps that are actually used by those this is aimed at (gamers who also make money on their PC but don't want to spend $2500-3500 on a full-fledged pro card).

    What does any card prove by winning Folding@home (same with the Bitcoin crap; botnets get all that now anyway)? If I cure cancer, is someone going to pay me for running up my electric bill? NOPE. Only a fool would spend a grand to donate electricity (CPU/GPU cycles) to someone else's next billion-dollar profit machine (insert pill name here). I don't care if I get cancer, I won't be donating any of my CPU time to crap like this. Benchmarking this proves nothing on a home card. It's like testing to see how fast I can spin my car tires while the wheels are off the ground. There is no point in winning that contest vs. some other car.

    "If we better understand protein misfolding we can design drugs and therapies to combat these illnesses."
    Straight from their site... Great, I'll help make them a billion-dollar drug and get nothing for my trouble or my bill. FAH has to be the biggest sucker pitch I've ever seen. Drug companies already rip me off every time I buy a bottle of their pills. They get huge tax breaks on my dime too; no need to help them, or for me to find out how fast I can help them... LOL. No point in telling me about synthetics either. They prove nothing other than that your stuff is operating correctly and the drivers are set up right. Their perf has no effect on REAL use of products as they are NOT a product, thus not REAL world. Every time I see the words synthetic and benchmark in the same sentence it makes me want to vomit. If they are limited on time (reviewers usually are) I want to see something benchmarked that I can actually USE for real.

    I feel the same way about max fps. Who cares? You can include them, but leaving out MIN is just dumb. I need to know when a game hits 30fps or less, as that means I don't have a good enough card to get the job done and either need to spend more or turn things down if using X or Y card.
  • Ryan Smith - Monday, February 25, 2013 - link

    As noted elsewhere, FAHBench is in our plans. However we cannot do anything further until NVIDIA fixes OpenCL support.
  • vanwazltoff - Friday, February 22, 2013 - link

    The 690, 680, and 7970 have had almost a year to brew and improve with driver updates. I suspect that after a few driver releases and an overclock, Titan will creep up on the 690, and it will probably see a price reduction after a few months. Don't clock out yet; just think what this could mean for the 700 and 800 series cards. It's obvious NVIDIA can deliver.
  • TheJian - Sunday, February 24, 2013 - link

    It already runs 1150+ everywhere. Most people hit around 1175 max stable OC on Titan. Of course this may improve with aftermarket cooling solutions, but it looks like they hit 1175 or so around the world. And that does hit 690 perf, and in some cases it wins. In compute it's already a winner.

    If there is no die shrink on the next gens from either company, I don't expect much. You can only do so much with 250-300W before needing a shrink to really see improvements. I really wish they'd just wait until 20nm or something to give us a real gain. Otherwise we'll end up with an Ivy Bridge/Haswell deal, where you don't get much (5-15%). Intel won't wow again until 14nm. Graphics won't wow again until the next shrink either (a full shrink, not the half-steps they're talking about now).
