Titan’s Compute Performance, Cont.

With Rahul having covered the basics of Titan’s strong compute performance, let’s shift gears a bit and take a look at real world usage.

On top of Rahul’s work with Titan, as part of our 2013 GPU benchmark suite we put together a larger number of compute benchmarks to try to cover real world usage, including the old standards of gaming usage (Civilization V) and ray tracing (LuxMark), along with several new tests. Unfortunately that got cut short when we discovered that OpenCL support is currently broken in the press drivers, which prevents us from using several of our tests. We still have our CUDA and DirectCompute benchmarks to look at, but a full look at Titan’s compute performance on our 2013 GPU benchmark suite will have to wait for another day.

For their part, NVIDIA of course already has OpenCL working on GK110 with Tesla. The issue is that somewhere between that and bringing up GK110 for Titan by integrating it into NVIDIA’s mainline GeForce drivers (specifically the new R314 branch), OpenCL support was broken. We expect this will be fixed in short order, but it’s not something NVIDIA checked for ahead of Titan’s press launch, and it’s not something they could fix in time for today’s article.

Unfortunately this means that comparisons with Tahiti will be few and far between for now. Most significant cross-platform compute programs are OpenCL based rather than DirectCompute based, so short of games and a couple of other cases such as Ian’s C++ AMP benchmark, we don’t have many cross-platform benchmarks to look at. With that out of the way, let’s dive into our condensed collection of compute benchmarks.

We’ll once more start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the only games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Note that for 2013 we have changed the benchmark a bit, moving from using a single leader to using all of the leaders. As a result the reported numbers are higher, but they’re also not going to be comparable with this benchmark’s use from our 2012 datasets.
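To illustrate why texture decompression maps so well to GPU compute, here is a small CPU sketch that decodes one block of BC1 (DXT1), a common block-compressed texture format. Civ V’s actual codec is not described here, so treat BC1 purely as an assumed, familiar stand-in. The point is the shape of the work: each 4x4 block is decoded independently of every other block, so a GPU can assign blocks to thousands of threads at once.

```python
import struct


def rgb565_to_rgb888(c: int) -> tuple:
    """Expand a packed RGB565 color to 8 bits per channel."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Replicate the high bits into the low bits so 31 -> 255 and 63 -> 255
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_bc1_block(block: bytes) -> list:
    """Decode one 8-byte BC1 block into a 4x4 grid (rows of columns) of RGB tuples."""
    c0_raw, c1_raw, indices = struct.unpack("<HHI", block)
    c0 = rgb565_to_rgb888(c0_raw)
    c1 = rgb565_to_rgb888(c1_raw)
    if c0_raw > c1_raw:
        # Four-color mode: two extra colors at 1/3 and 2/3 between the endpoints
        c2 = tuple((2 * a + b) // 3 for a, b in zip(c0, c1))
        c3 = tuple((a + 2 * b) // 3 for a, b in zip(c0, c1))
    else:
        # Three-color mode: midpoint plus black
        c2 = tuple((a + b) // 2 for a, b in zip(c0, c1))
        c3 = (0, 0, 0)
    palette = (c0, c1, c2, c3)
    # Each texel has a 2-bit palette index; texel 0 (top-left) uses the lowest bits
    return [[palette[(indices >> (2 * (4 * y + x))) & 0x3] for x in range(4)]
            for y in range(4)]


red_block = struct.pack("<HHI", 0xF800, 0x001F, 0)  # every texel uses endpoint 0
print(decode_bc1_block(red_block)[0][0])  # (255, 0, 0)
```

Integer rounding of the interpolated colors differs slightly between real decoders; this sketch simply truncates.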

Since Civilization V launched in 2010, graphics cards have become significantly more powerful, far outpacing the growth of the CPUs that feed them. As a result we’ve rather quickly drifted from being GPU bottlenecked to being CPU bottlenecked, as we see in both our Civ V game benchmarks and our DirectCompute benchmarks. For high-end GPUs the performance difference is rather minor; the gap between GTX 680 and Titan, for example, is 45fps, or just under 10%. Still, it’s at least enough to get Titan past the 7970GE in this case.
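The two figures quoted above also pin down roughly where these cards sit. A quick sketch of the arithmetic: because the absolute gap equals the baseline frame rate times the relative gain, a 45fps gap that amounts to just under 10% implies the GTX 680 is running at somewhat more than 450fps, the kind of frame rate at which the CPU, not the GPU, sets the pace. The function names and the example frame-rate pair below are illustrative assumptions, not measured results.

```python
def relative_gain(baseline_fps: float, new_fps: float) -> float:
    """Fractional speedup of new_fps over baseline_fps (0.10 means 10% faster)."""
    return (new_fps - baseline_fps) / baseline_fps


def implied_baseline_fps(gap_fps: float, gain: float) -> float:
    """Baseline frame rate implied by an absolute gap and a relative gain.

    gap = baseline * gain, so baseline = gap / gain.
    """
    return gap_fps / gain


# A 45fps gap at exactly 10% implies about a 450fps baseline, so "just under
# 10%" means the GTX 680 result is somewhat above 450fps.
print(implied_baseline_fps(45, 0.10))
# Hypothetical pair of results, NOT measured data: 500fps -> 545fps
print(round(relative_gain(500, 545), 3))  # 0.09
```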

Our second test is one of our new tests, utilizing Elcomsoft’s Advanced Office Password Recovery (AOPR) utility to take a look at GPU password generation. AOPR has separate CUDA and OpenCL kernels for NVIDIA and AMD cards respectively, which means it doesn’t follow the same code path on all GPUs, but it does use an optimal path for each GPU it supports. Unfortunately we’re having trouble getting it to recognize AMD 7900 series cards in this build, so we only have results for CUDA cards for the time being.

Password generation and other forms of brute-force cryptography are an area where the GTX 680 is particularly weak, thanks to the various compute features that were stripped out in the name of efficiency. As a result it ends up below even the GTX 580 in these benchmarks, never mind AMD’s GCN cards. But with Titan/GK110 offering NVIDIA’s full compute performance, it rips through this task. In fact it more than doubles the performance of both the GTX 680 and the GTX 580, indicating that the huge gains we’re seeing come not just from the additional functional units, but also from architectural optimizations and new instructions that improve overall efficiency and reduce the number of cycles needed to complete work on a password.
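To show why per-password cycle counts matter so much here, below is a toy brute-force search. It is a minimal sketch under stated assumptions: the iterated SHA-1 `derive_key` function is a hypothetical stand-in for a key-stretching scheme and is not AOPR’s or Office’s exact algorithm. Every candidate password has to pay for every hashing round, so throughput is governed by how many cycles one round costs, and because candidates are independent of each other, a GPU can test thousands of them in parallel.

```python
import hashlib
import itertools
from typing import Optional


def derive_key(password: str, salt: bytes, iterations: int) -> bytes:
    """Toy iterated SHA-1 key stretching (hypothetical stand-in, not Office's real scheme)."""
    digest = hashlib.sha1(salt + password.encode("utf-16-le")).digest()
    for i in range(iterations):
        # Every candidate pays for all of these rounds, so per-round cost dominates
        digest = hashlib.sha1(i.to_bytes(4, "little") + digest).digest()
    return digest


def brute_force(target: bytes, salt: bytes, iterations: int,
                alphabet: str, max_len: int) -> Optional[str]:
    """Try every candidate of length 1..max_len; each test is independent of the others."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if derive_key(candidate, salt, iterations) == target:
                return candidate
    return None


salt = b"demo-salt"
target = derive_key("cab", salt, 1000)
print(brute_force(target, salt, 1000, "abc", 3))  # cab
```

A GPU implementation would split the candidate space across threads instead of looping over it serially; the sequential loop here only keeps the sketch short.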

Altogether at 33K passwords/second Titan is not just faster than GTX 680, but it’s faster than GTX 690 and GTX 680 SLI, making this a test where one big GPU (and its full compute performance) is better than two smaller GPUs. It will be interesting to see where the 7970 GHz Edition and other Tahiti cards place in this test once we can get them up and running.
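To put 33K passwords/second into perspective, the sketch below estimates the worst-case time to exhaust a simple keyspace at that rate. The alphabet sizes and maximum lengths are illustrative assumptions, and a constant rate regardless of password length is an idealization; AOPR’s real attack modes (dictionaries, masks and so on) are not modeled.

```python
# Titan's AOPR throughput from this page, in passwords per second.
TITAN_RATE = 33_000


def keyspace(alphabet_size: int, max_len: int) -> int:
    """Number of candidates of length 1..max_len over an alphabet of alphabet_size symbols."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


def exhaust_hours(alphabet_size: int, max_len: int, rate: float = TITAN_RATE) -> float:
    """Worst-case hours to try every candidate at a constant rate (an idealization)."""
    return keyspace(alphabet_size, max_len) / rate / 3600


print(keyspace(26, 6))                 # 321272406
print(round(exhaust_hours(26, 6), 2))  # 2.7
```

All lowercase passwords of up to six characters take about 2.7 hours at Titan’s rate, while widening the alphabet to 62 upper/lowercase letters and digits pushes the same search to roughly 20 days. Since time scales inversely with rate, a card running at half Titan’s speed would take twice as long.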

Our final test in our abbreviated compute benchmark suite is our very own Dr. Ian Cutress’s SystemCompute benchmark, which is a collection of several different fundamental compute algorithms. Rahul went into greater detail on this back in his look at Titan’s compute performance, but I wanted to go over it again quickly with the full lineup of cards we’ve tested.

Surprisingly, for all of its performance gains relative to GTX 680, Titan still falls notably behind the 7970GE here. Given Titan’s theoretical performance and the fundamental nature of this test we would have expected it to do better. But without additional cross-platform tests it’s hard to say whether this is something where AMD’s GCN architecture continues to shine over Kepler, or if perhaps it’s a weakness in NVIDIA’s current DirectCompute implementation for GK110. Time will tell on this one, but in the meantime this is the first solid sign that Tahiti may be more of a match for GK110 than it’s typically given credit for.


337 Comments


  • CeriseCogburn - Tuesday, February 26, 2013

    I really don't understand that mentality you have. I'm surrounded by thousands of dollars of computer parts and I certainly don't consider myself some sort of hardware enthusiast or addicted overclocker, or insane gamer.

    Yet this card is easily a consideration, since several other systems have far more than a thousand dollars in them on just the basics. It's very easy to spend a couple thousand even being careful.

    I don't get what the big deal is. The current crop of top end cards before this are starkly inadequate at common monitor resolutions.
    One must nearly ALWAYS turn down features in the popular benched games to be able to play.

    People just don't seem to understand that I guess. I have untold thousands of dollars in many computers and the only thing that will make them really gaming capable at cheap monitor resolutions is a card like this.

    Cripes, my smartphone cost a lot more than the former top two cards just below Titan.

    This is the one area that comes to mind (the only one that exists as far as I can tell) where the user is left with "my modern computer can't do it" - and that means, take any current taxing game (lots of those - let's say 50% of those reviewed, as a rough rule of thumb) and you're stuck unable to crank it up.

    Now 120Hz monitors are becoming common, so this issue is magnified.
    As you may have noticed, another poster exclaimed:
    " Finally ! 1920x1080 a card that can do it ! "

    That's the closest to the truth, and I agree with it entirely, at least for this moment. As I stated here before, the 7970 didn't do it when it was released, doesn't now and won't ever (neither does the 680).

    I'm trying to deny it, but really it is already clear that the Titan doesn't cut it for everything at the above rez either, not really, and not at higher refresh rates.

    More is still needed, and this is the spot that is lacking for gamers, the video card.

    This card is the card to have, and it's not about bragging, it's about firing up your games and not being confronted with the depressing "turn off the eyecandy" and check the performance again... see if that is playable...

    I mean ****, that apparently does not bother any of you, and I do not know why.
    Everything else in your system is capable...
    This is an IMPORTANT PART that actually completes the package, where the end user isn't compromising.
  • HighTech4US - Thursday, February 21, 2013

    If it does, could we see a new story on performance using NVENC across the entire Kepler line, along with any freeware/payware software that utilizes it? I have an older Intel Q8300 that is used as my HTPC/living room gaming system, and encoding videos takes a long time using just the CPU cores.

    If getting a Kepler GPU and using NVENC can speed up encoding significantly, I would like to know, as that would be the lowest-cost upgrade and would double as a gaming card upgrade.

    Thanks
  • Ryan Smith - Thursday, February 21, 2013

    Yes, NVEnc is present.
  • lkuzmanov - Thursday, February 21, 2013

    excellent! now make it 30-40% cheaper and I'm on board.
  • Zink - Thursday, February 21, 2013

    Rahul Garg picked the lowest HD 7970 scores in both cases from the Matsumoto et al. paper. The other higher GFLOPS scores represent performance using alternate kernels performing the same calculation on the same hardware as far as I can tell. Rahul needs to justify choosing only the lowest HD 7970 numbers in his report or I can only assume he is tilting the numbers in favor of Titan.
  • JarredWalton - Thursday, February 21, 2013

    Picking the highest scoring results that are using optimized cores and running on different hardware in the first place (i.e., not the standard test bed) would be tilting the results very far in AMD's favor. A default run is basically what Titan gets to do, so the same for the 7970 would make sense.
  • codedivine - Thursday, February 21, 2013

    The different algorithms are actually not performing the exact same calculation. There are differences in matrix layouts and memory allocations. We chose the ones that are closest to the layouts and allocations we were testing on the Titan.

    In the future, we intend to test with AMD's official OpenCL BLAS. While Matsumoto's numbers are good for illustrative purposes, we would prefer running our own benchmarks on our own testbeds, and on real-world code, which will typically use AMD's BLAS for AMD cards. AMD's OpenCL BLAS performance is actually a little bit lower than Matsumoto's numbers, so I don't think we tilted the numbers against AMD. If anything, we gave AMD a bit of benefit-of-the-doubt here.

    In the same vein, faster results than Nvidia's CUBLAS have been demonstrated on Nvidia hardware. However, we chose to test only using CUBLAS as all production code will typically use CUBLAS due to its reliability and support from Nvidia.

    AMD's OpenCL BLAS is a bit complicated to set up correctly, and in my research I have had stability problems with it on Windows. Thus, we avoided it in this particular review, but we will likely look at it in the future.
  • Zink - Thursday, February 21, 2013

    Thanks, shouldn't have doubted you :)
  • Nfarce - Thursday, February 21, 2013

    ...about my 680 purchase last April (nearly a year ago already, wow). Was so worried I made the wrong decision replacing two 570s knowing the Kepler was less than a year away. The news on this card has firmed up my decision to lock in with a second 680 now for moving up to a 2560x1440 monitor.

    Very *very* disappointing, Nvidia.
  • CeriseCogburn - Thursday, February 21, 2013

    The new top card has been near the same as two of the former cards FOREVER.

    You people are nothing short of stupid nut jobs.

    There are not enough tampons at Johnson and Johnson warehouses for this thread.

    THE VERY SAME RATIO has occurred every time for all the prior launches.
