Titan’s Compute Performance (aka Ph.D Lust)

Because GK110 is such a unique GPU from NVIDIA when it comes to compute, we’re going to shake things up a bit and take a look at compute performance first before jumping into our look at gaming performance.

On a personal note, one of the great things about working at AnandTech is all the people you get to work with. Anand himself is nothing short of fantastic, but what other review site also has a Brian Klug or a Jarred Walton? We have experts in a number of fields, and as a computer technology site that includes of course includes experts in computer science.

What I’m trying to say is that for the last week I’ve been having to fend off our CS guys, who upon hearing I had a GK110 card wanted one of their own. If you’ve ever wanted proof of just how big a deal GK110 is – and by extension Titan – you really don’t have to look too much farther than that.

Titan, its compute performance, and the possibilities it unlocks is a very big deal for researchers and other professionals that need every last drop of compute performance that they can get, for as cheap as they can get it. This is why on the compute front Titan stands alone; in NVIDIA’s consumer product lineup there’s nothing like it, and even AMD’s Tahiti based cards (7970, etc), while potent, are very different from GK110/Kepler in a number of ways. Titan essentially writes its own ticket here.

In any case, as this is the first GK110 product that we have had access to, we couldn’t help but run it through a battery of tests. The Tesla K20 series may have been out for a couple of months now, but at $3500 for the base K20 card, Titan is the first GK110 card many compute junkies are going to have real access to.

To that end I'd like to introduce our newest writer, Rahul Garg, who will be leading our look at Titan/GK110’s compute performance. Rahul is a Ph.D student specializing in the field of parallel computing and GPGPU technology, making him a prime candidate for taking a critical but nuanced look at what GK110 can do. You will be seeing more of Rahul in the future, but first and foremost he has a 7.1B transistor GPU to analyze. So let’s dive right in.

By: Rahul Garg

For compute performance, we first looked at two common benchmarks: GEMM (measures performance of dense matrix multiplication) and FFT (Fast Fourier Transform). These numerical operations are important in a variety of scientific fields. GEMM is highly parallel and typically compute heavy, and one of the first tests of performance and efficiency on any parallel architecture geared towards HPC workloads. FFT is typically memory bandwidth bound but, depending upon the architecture, can be influenced by inter-core communication bandwidth. Vendors and third-parties typically supply optimized libraries for these operations. For example, Intel supplies MKL for Intel processors (including Xeon Phi) and AMD supplies ACML and OpenCL-based libraries for their CPUs and GPUs respectively.  Thus, these benchmarks measure the performance of the combination of both the hardware and software stack.

For GEMM, we tested the performance of NVIDIA's CUBLAS library supplied with CUDA SDK 5.0, on SGEMM (single-precision/fp32 GEMM) and DGEMM (double precision/fp64 GEMM) on square matrices of size 5k by 5k. For SGEMM on Titan, the data reported here was collected with boost disabled. We also conducted the experiments with boost enabled on Titan, but found that the performance was effectively equal to the non-boost case. We assume that it is because our test ran for a very short period of time and perhaps did not trigger boost. Therefore, for the sake of simpler analysis, we report the data with boost disabled on the Titan. If time permits, we may return to the boost issue in a future article for this benchmark.

Apart from the results collected by us for GTX Titan, GTX 680 and GTX 580, we refer to experiments conducted by Matsumoto, Nakasato and Sedukin reported in a technical report filed at the University of Aizu about GEMM on Radeon 7970.  Their exact parameters and testbed are different than ours, and we include their results for illustrative purposes, as a ballpark estimate only. The results are below.

DGEMM

Titan rules the roost amongst the three listed cards in both SGEMM and DGEMM by a wide margin. We have not included Intel's Xeon Phi in this test, but the TItan's achieved performance is higher than the theoretical peak FLOPS of the current crop of Xeon Phi. Sharp-eyed readers will have observed that the Titan achieves about 1.3 teraflops on DGEMM, while the listed fp64 theoretical peak is also 1.3 TFlops; we were not expecting 100% of peak on the Titan in DGEMM. NVIDIA clarified that the fp64 rating for the Titan is a conservative estimate. At 837MHz, the calculated fp64 peak of Titan is 1.5 TFlops. However, under heavy load in fp64 mode, the card may underclock below the listed 837MHz to remain within the power and thermal specifications. Thus, fp64 ALU peak can vary between 1.3 TFlops and 1.5 TFlops and our DGEMM results are within expectations.

Next, we consider the percentage of fp32 peak achieved by the respective SGEMM implementations. These are plotted below.

Percentage of peak achieved on SGEMM

Titan achieves about 71% of its peak while GTX 680 only achieves about 40% of the peak. It is clear that while both GTX 680 and Titan are said to be Kepler architecture chips, Titan is not just a bigger GTX 680. Architectural tweaks have been made that enable it to reach much higher efficiency than the GTX 680 on at least some compute workloads. GCN based Radeon 7970 obtains about 63% of peak on SGEMM using Matsumoto et al. algorithm, and Fermi based GTX 580 also obtains about 63% of peak using CUBLAS.

For FFT, we tested the performance of 1D complex-to-complex inplace transforms of size 225 using the CUFFT library. Results are given below.

FFT single precision

FFT double precision

Titan outperforms the GTX 680 in FFT by about 50% in single-precision. We suspect this is primarily due to increased memory bandwidth on Titan compared to GTX 680 but we have not verified this hypothesis.  GTX 580 has a slight lead over the GTX 680. Again, if time permits, we may return to the benchmark for a deeper analysis. Titan achieves about 3.4x the performance of GTX 680 but this is not surprising given the poor fp64 execution resources on the GTX 680.

We then looked at an in-house benchmark called SystemCompute, developed by our own Ian Cutress. The benchmark tests the performance on a variety of sample kernels that are representative of some scientific computing applications. Ian described the CPU version of these benchmarks in a previous article. Ian wrote the GPU version of the benchmarks in C++ AMP, which is a relatively new GPGPU API introduced by Microsoft in VS2012.

Microsoft's implementation of AMP compiles down to DirectCompute shaders. These are all single-precision benchmarks and should run on any DX11 capable GPU. The benchmarks include 2D and 3D finite difference solvers, 3d particle movement, n-body benchmark and a simple matrix multiplication algorithm. Boost is enabled on both the Titan and GTX 680 for this benchmark. We give the score reported by the benchmark for both cards, and report the speedup of the Titan over 680. Speedup greater than 1 implies Titan is faster, while less than 1 implies a slowdown.

SystemCompute scores (higher is better)
Benchmark GTX 580 GTX 680 GTX Titan Speedup of Titan
over GTX 680
2D FD 9053 8445 12461 1.47
3D FD 3133 3827 5263 1.37
3DPmo 41722 26955 40397 1.49
MatMul 172 197 229 1.16
nbody 918 1517 2418 1.59

The benchmarks show between 16% and 60% improvement, with the most improvement coming from the relatively FLOP-heavy n-body benchmark. Interestingly, GTX 580 wins over the Titan in 3DPMo and wins over the 680 in 3DPmo and 2D.

Overall, GTX Titan is an impressive accelerator from compute perspective and posts large gains over its predecessors.

The Final Word On Overclocking Titan’s Compute Performance, Cont
Comments Locked

337 Comments

View All Comments

  • cliffnotes - Thursday, February 21, 2013 - link

    Price is a disgrace. Can we really be surprised though ? We saw the 680 release and knew then they were selling their mid ranged card as a flagship with a flagship price.

    We knew then the real flagship was going to come at some point. I admit I assumed they would replace the 680 with it and charge maybe 600 or 700. Can't believe they're trying to pawn it off for 1000. Looks like nvidia has decided to try and reshape what the past flagship performance level is worth. 8800gtx,280,285,480,580 all 500-600, we all know gtx680 is not a proper flagship and was their mid-range. Here is the real one and..... 1000

    Outrageous.
  • ogreslayer - Thursday, February 21, 2013 - link

    Problem here is this gen none of the reviewers chewed out AMD for the 7970. This led Nvidia to think it was totally fine to release GK104 for $500 which was still cheaper then a 7970 but not where that die was originally slotted and to do this utter insanity with a $1000 solution that is more expensive then solutions that are faster then it.

    7950 3-way Crossfire, GTX690, GTX660Ti 3 Way SLI, GTX670SLI and GTX680SLI are all better options for anyone who isn't spending $3000 on cards as even dual card you are better off with the GTX690s in SLI. Poor form Nvidia, poor form. But poor form to every reviewer who gives this an award of any kind. It's time to start taking pricing and availability into the equation.

    I think I'd have much less of an issue if partners had access to GK110 dies binned for slightly lower clocks and limited to 3GB at 750-800. I'd wager you'd hit close to the same performance window at a more reasonable price that people wouldn't have scoffed at. GTX670SLI is about $720...
  • HisDivineOrder - Thursday, February 21, 2013 - link

    Pretty much agree. GPU reviewers of late have been so forgiving toward nVidia and AMD for all kinds of crap. They don't seem to have the cahoneys to put their foot down and say, "This far, no farther!"

    They just keep bowing their head and saying, "Can I have s'more, please?" Pricing is way out of hand, but the reviewers here and elsewhere just seem to be living in a fantasy world where these prices make even an iota of sense.

    That said, the Titan is a halo card and I don't think 99% of people out there are even supposed to be considering it.

    This is for that guy you read about on the forum thread who says he's having problems with quad-sli working properly. This is for him to help him spend $1k more on GPU's than he already would have.

    So then we can have a thread with him complaining about how he's not getting optimal performance from his $3k in GPU's. And how, "C'mon, nVidia! I spent $3k in your GPU's! Make me a custom driver!"

    Which, if I'd spent 3k in GPU's, I'd probably want my very own custom driver, too.
  • ronin22 - Thursday, February 21, 2013 - link

    For 3k, you can pay a good developer (all cost included) for about 5 days, to build your custom driver.

    Good luck with that :D
  • CeriseCogburn - Tuesday, February 26, 2013 - link

    I can verify that programmer pricing personally.

    Here is why we have crap amd crashing and driver problems only worsening still.
    33% CF failure, right frikkin now.
    Driver teams decimated by losing financial reality.

    "Investing" as our many local amd fanboy retard destroyers like to proclaim, in an amd card, is one sorry bet on the future.
    It's not an investment.

    If it weren't for the constant crybaby whining about price in a laser focused insane fps only dream world of dollar pinching beyond the greatest female coupon clipper in the world's OBSESSION level stupidity, I could stomach an amd fanboy buying Radeons at full price and not WHINING in an actual show of support for the failing company they CLAIM must be present for "competition" to continue.

    Instead our little hoi polloi amd ragers rape away at amd's failed bottom line, and just shortly before screamed nVidia would be crushed out of existence by amd's easy to do reduction in prices.... it went on and on and on for YEARS as they were presented the REAL FACTS and ignored them entirely.
    Yes, they are INSANE.
    Perhaps now they have learned to keep their stupid pieholes shut in this area, as their meme has been SILENCED for it's utter incorrectness.
    Thank God for small favors YEARS LATE.

    Keep crying crybabies, it's all you do now, as you completely ignore amd's utter FAILURE in the driver department and are STUPID ENOUGH to unconsciously accept "the policy" about dual card usage here, WHEN THE REALITY IS NVIDIA'S CARDS ALWAYS WORK AND AMD'S FAIL 33% OF THE TIME.

    So recommending CROSSFIRE cannot occur, so here is thrown the near perfect SLI out with the biased waters.

    ANOTHER gigantic, insane, lie filled BIAS.

    Congratulations amd fanboys, no one could possibly be more ignorant nor dirtier. That's what lying and failure is all about, it's all about amd and their little CLONES.
  • CeriseCogburn - Saturday, February 23, 2013 - link

    Prices have been going up around the world for a few years now.

    Of course mommies basement has apparently not been affected by the news.
  • trajan2448 - Thursday, February 21, 2013 - link

    Awesome card! best single GPU on the planet at the moment. Almost 50% better in frame latencies than 7970. Crossfire,don't make me laugh. here's an analysis. Many of the frames "rendered" by the 7970 and especially Crossfire aren't visible.
    http://www.pcper.com/reviews/G...
  • CeriseCogburn - Thursday, February 21, 2013 - link

    So amd has been lying, and the fraps boys have been jiving for years now....
    It's coming out - the BIG LIE of the AMD top end cards... LOL
    Fraudster amd and their idiot fanboys are just about finished.

    http://www.pcper.com/reviews/Graphics-Cards/NVIDIA...

    LOL- shame on all the dummy reviewers
  • Alucard291 - Sunday, February 24, 2013 - link

    What you typed here sounds like sarcasm.

    And you're actually serious aren't you?

    That's really cute. But can you please take your comments to 4chan/engadget where they belong.
  • CeriseCogburn - Sunday, February 24, 2013 - link

    Ok troll, you go to wherever the clueless reign. You will fit right in.

    Those aren't suppositions I made, they are facts.

Log in

Don't have an account? Sign up now