Titan’s Compute Performance (aka Ph.D Lust)

Because GK110 is such a unique GPU from NVIDIA when it comes to compute, we’re going to shake things up a bit and take a look at compute performance first before jumping into our look at gaming performance.

On a personal note, one of the great things about working at AnandTech is all the people you get to work with. Anand himself is nothing short of fantastic, but what other review site also has a Brian Klug or a Jarred Walton? We have experts in a number of fields, and as a computer technology site that of course includes experts in computer science.

What I’m trying to say is that for the last week I’ve been having to fend off our CS guys, who upon hearing I had a GK110 card wanted one of their own. If you’ve ever wanted proof of just how big a deal GK110 is – and by extension Titan – you really don’t have to look too much farther than that.

Titan, its compute performance, and the possibilities it unlocks are a very big deal for researchers and other professionals who need every last drop of compute performance they can get, for as cheap as they can get it. This is why on the compute front Titan stands alone; in NVIDIA’s consumer product lineup there’s nothing like it, and even AMD’s Tahiti based cards (7970, etc.), while potent, are very different from GK110/Kepler in a number of ways. Titan essentially writes its own ticket here.

In any case, as this is the first GK110 product that we have had access to, we couldn’t help but run it through a battery of tests. The Tesla K20 series may have been out for a couple of months now, but at $3500 for the base K20 card, Titan is the first GK110 card many compute junkies are going to have real access to.

To that end I'd like to introduce our newest writer, Rahul Garg, who will be leading our look at Titan/GK110’s compute performance. Rahul is a Ph.D. student specializing in the field of parallel computing and GPGPU technology, making him a prime candidate for taking a critical but nuanced look at what GK110 can do. You will be seeing more of Rahul in the future, but first and foremost he has a 7.1B transistor GPU to analyze. So let’s dive right in.

By: Rahul Garg

For compute performance, we first looked at two common benchmarks: GEMM (which measures the performance of dense matrix multiplication) and FFT (Fast Fourier Transform). These numerical operations are important in a variety of scientific fields. GEMM is highly parallel and typically compute heavy, and it is one of the first tests of performance and efficiency on any parallel architecture geared towards HPC workloads. FFT is typically memory bandwidth bound but, depending upon the architecture, can be influenced by inter-core communication bandwidth. Vendors and third parties typically supply optimized libraries for these operations. For example, Intel supplies MKL for Intel processors (including Xeon Phi), while AMD supplies ACML and OpenCL-based libraries for their CPUs and GPUs respectively. Thus, these benchmarks measure the performance of the combination of both the hardware and software stack.

For GEMM, we tested the performance of NVIDIA's CUBLAS library supplied with CUDA SDK 5.0 on SGEMM (single-precision/fp32 GEMM) and DGEMM (double-precision/fp64 GEMM), using square matrices of size 5k by 5k. For SGEMM on Titan, the data reported here was collected with boost disabled. We also ran the experiments with boost enabled on Titan, but found the performance to be effectively equal to the non-boost case. We assume this is because our test ran for a very short period of time and perhaps did not trigger boost. Therefore, for the sake of simpler analysis, we report the data with boost disabled on Titan. If time permits, we may return to the boost issue for this benchmark in a future article.
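
If you want to reproduce a similar measurement, the sketch below shows roughly how such a timed CUBLAS SGEMM run can be put together. This is an illustrative sketch rather than our exact harness; the matrix size, input data, and error checking are simplified, and the DGEMM case is analogous with cublasDgemm and double-precision buffers.

    // Minimal sketch of a timed CUBLAS SGEMM run on N x N matrices (illustrative only).
    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    int main() {
        const int N = 5120;                      // roughly "5k by 5k" square matrices
        const float alpha = 1.0f, beta = 0.0f;
        std::vector<float> host(N * N, 1.0f);    // trivial input data

        float *A, *B, *C;
        cudaMalloc(&A, sizeof(float) * N * N);
        cudaMalloc(&B, sizeof(float) * N * N);
        cudaMalloc(&C, sizeof(float) * N * N);
        cudaMemcpy(A, host.data(), sizeof(float) * N * N, cudaMemcpyHostToDevice);
        cudaMemcpy(B, host.data(), sizeof(float) * N * N, cudaMemcpyHostToDevice);

        cublasHandle_t handle;
        cublasCreate(&handle);

        // Warm-up call so that library setup costs are not timed.
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N, &alpha, A, N, B, N, &beta, C, N);
        cudaDeviceSynchronize();

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N, &alpha, A, N, B, N, &beta, C, N);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        // A GEMM on N x N matrices performs 2*N^3 floating point operations.
        double gflops = 2.0 * N * N * N / (ms * 1e6);
        printf("SGEMM: %.1f GFLOPS\n", gflops);

        cublasDestroy(handle);
        cudaFree(A); cudaFree(B); cudaFree(C);
        return 0;
    }

Note that a single 5k SGEMM at multi-teraflop throughput completes in well under a tenth of a second, which is consistent with our suspicion that such a short run may simply never trigger boost.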

Apart from the results we collected for the GTX Titan, GTX 680 and GTX 580, we refer to experiments on GEMM on the Radeon 7970 conducted by Matsumoto, Nakasato and Sedukin, reported in a technical report from the University of Aizu. Their exact parameters and testbed are different from ours, and we include their results for illustrative purposes, as a ballpark estimate only. The results are below.

DGEMM

Titan rules the roost amongst the three listed cards in both SGEMM and DGEMM by a wide margin. We have not included Intel's Xeon Phi in this test, but Titan's achieved performance is higher than the theoretical peak FLOPS of the current crop of Xeon Phi. Sharp-eyed readers will have observed that Titan achieves about 1.3 teraflops on DGEMM, while the listed fp64 theoretical peak is also 1.3 TFlops; we were not expecting 100% of peak on Titan in DGEMM. NVIDIA clarified that the fp64 rating for Titan is a conservative estimate. At 837MHz, the calculated fp64 peak of Titan is 1.5 TFlops. However, under heavy load in fp64 mode, the card may underclock below the listed 837MHz to remain within its power and thermal specifications. Thus, the fp64 ALU peak can vary between 1.3 TFlops and 1.5 TFlops, and our DGEMM results are within expectations.
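
As a back-of-the-envelope check on those two bounds (assuming the standard GK110 configuration of 64 fp64 ALUs per SMX and the 14 SMXes enabled on Titan; this is our arithmetic, not an official NVIDIA figure):

    14 SMX x 64 fp64 ALUs x 2 FLOPs per clock x 837 MHz ≈ 1.50 TFlops

Under the same assumptions, the listed 1.3 TFlops rating would correspond to a sustained clock of roughly 725MHz.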

Next, we consider the percentage of fp32 peak achieved by the respective SGEMM implementations. These are plotted below.

Percentage of peak achieved on SGEMM

Titan achieves about 71% of its peak, while the GTX 680 only achieves about 40% of its peak. It is clear that while both the GTX 680 and Titan are said to be Kepler architecture chips, Titan is not just a bigger GTX 680. Architectural tweaks have been made that enable it to reach much higher efficiency than the GTX 680 on at least some compute workloads. The GCN based Radeon 7970 obtains about 63% of peak on SGEMM using Matsumoto et al.'s algorithm, and the Fermi based GTX 580 also obtains about 63% of peak using CUBLAS.
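
To put those percentages into absolute numbers (a rough calculation from the published specifications, taking Titan at its 837MHz base clock and the GTX 680 at its 1006MHz base clock): Titan's fp32 peak works out to 2688 ALUs x 2 FLOPs per clock x 837 MHz ≈ 4.5 TFlops, so 71% of peak is roughly 3.2 TFlops of achieved SGEMM throughput, while 40% of the GTX 680's roughly 3.1 TFlops peak is about 1.2 TFlops.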

For FFT, we tested the performance of 1D complex-to-complex in-place transforms of size 2^25 using the CUFFT library. Results are given below.

FFT single precision

FFT double precision

Titan outperforms the GTX 680 in FFT by about 50% in single-precision. We suspect this is primarily due to Titan's increased memory bandwidth compared to the GTX 680, but we have not verified this hypothesis. The GTX 580 has a slight lead over the GTX 680 here. In double-precision, Titan achieves about 3.4x the performance of the GTX 680, which is not surprising given the GTX 680's poor fp64 execution resources. Again, if time permits, we may return to this benchmark for a deeper analysis.
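
For readers curious about the call sequence involved, the sketch below shows roughly how a single-precision in-place C2C transform can be set up and timed with CUFFT. This is a simplified illustration, not our exact harness; the double-precision case uses cufftDoubleComplex, CUFFT_Z2Z and cufftExecZ2Z instead.

    // Sketch: timed 1D single-precision complex-to-complex in-place FFT with CUFFT.
    #include <cufft.h>
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const int n = 1 << 25;                           // 2^25-point transform
        cufftComplex* data;
        cudaMalloc(&data, sizeof(cufftComplex) * n);
        cudaMemset(data, 0, sizeof(cufftComplex) * n);   // placeholder input

        cufftHandle plan;
        cufftPlan1d(&plan, n, CUFFT_C2C, 1);             // one 1D transform, no batching

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        cufftExecC2C(plan, data, data, CUFFT_FORWARD);   // same pointer in and out = in-place
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("2^25-point C2C FFT: %.2f ms\n", ms);

        cufftDestroy(plan);
        cudaFree(data);
        return 0;
    }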

We then looked at an in-house benchmark called SystemCompute, developed by our own Ian Cutress. The benchmark tests performance on a variety of sample kernels that are representative of some scientific computing applications. Ian described the CPU version of these benchmarks in a previous article. Ian wrote the GPU version of the benchmarks in C++ AMP, a relatively new GPGPU API introduced by Microsoft with Visual Studio 2012.

Microsoft's implementation of AMP compiles down to DirectCompute shaders. These are all single-precision benchmarks and should run on any DX11 capable GPU. The benchmarks include 2D and 3D finite difference solvers, 3D particle movement, an n-body benchmark, and a simple matrix multiplication algorithm. Boost was enabled on both Titan and the GTX 680 for this benchmark. We give the score reported by the benchmark for each card, and report the speedup of Titan over the GTX 680. A speedup greater than 1 means Titan is faster, while less than 1 would indicate a slowdown.

SystemCompute scores (higher is better)
Benchmark   GTX 580   GTX 680   GTX Titan   Speedup of Titan over GTX 680
2D FD          9053      8445       12461   1.47
3D FD          3133      3827        5263   1.37
3DPmo         41722     26955       40397   1.49
MatMul          172       197         229   1.16
nbody           918      1517        2418   1.59

The benchmarks show between 16% and 60% improvement for Titan over the GTX 680, with the largest gain coming from the relatively FLOP-heavy n-body benchmark. Interestingly, the GTX 580 beats Titan in 3DPmo, and beats the GTX 680 in both 3DPmo and 2D FD.
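
For readers unfamiliar with C++ AMP, here is a minimal kernel in the same spirit as the benchmark's simple matrix multiplication test (a generic illustration of the API, not Ian's actual code):

    // Minimal C++ AMP example: naive single-precision matrix multiply, C = A * B.
    #include <amp.h>
    #include <vector>
    using namespace concurrency;

    void matmul(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, int N) {
        array_view<const float, 2> a(N, N, A);
        array_view<const float, 2> b(N, N, B);
        array_view<float, 2> c(N, N, C);
        c.discard_data();    // C's initial contents need not be copied to the GPU

        // The lambda is compiled down to a DirectCompute shader and runs once per element of C.
        parallel_for_each(c.extent, [=](index<2> idx) restrict(amp) {
            float sum = 0.0f;
            for (int k = 0; k < N; k++)
                sum += a(idx[0], k) * b(k, idx[1]);
            c[idx] = sum;
        });
        c.synchronize();     // copy the result back to host memory
    }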

Overall, GTX Titan is an impressive accelerator from a compute perspective and posts large gains over its predecessors.

337 Comments

  • JeBarr - Thursday, February 21, 2013 - link

    I would guess because as time goes by the reviewers here (and elsewhere) think they need to bench at settings used by the "majority". Even when that majority doesn't frequent, or even know the existence of, Anandtech.com. Go figure.

    I don't like it any more than you do...but for different reasons.

    I for one was happy to have a review site still benching at 16:10...which is what the long-time hardware enthusiasts/gamers prefer, that is, when they can't find a good CRT monitor ;)

    Just think of this review as the new bench standard going forward. A new starting point, if you will.
  • Ryan Smith - Monday, February 25, 2013 - link

    Bench 2013 will be going live soon. The backend is done (it's what I used to store and generate the charts here), but the frontend is part of a larger project...

    As for why the settings change, when we refresh our suite we sometimes change our settings to match what the latest generation of cards can do. When Titan sets the high bar for example, running 2560 at Ultra with 4xMSAA is actually practical.
  • TheJian - Thursday, February 21, 2013 - link

    NO Borderlands 2 (~6 million copies sold rated 89! not counting the addons rated high also)
    No Diablo3 (I hate the DRM but 10million+ sold of course rated high, but not by users)
    No Guild Wars 2 (MMO with 3 million copies sold, rated 90!). Even WOW Mists of Pandaria has 3 million or so now, and 11 million playing the game's total content. I don't play WOW but it's still got a TON of users.
    No Assassin's Creed 3 (brings 680/7970 to low 30's 2560x1600)
    Crysis 3 - Warhead needs to die, and this needs to replace it (at the very LEAST). As shown below NOBODY is playing Warhead. Wasted page space, and time spent benching it.

    Instead we get Crysis warhead...ROFL Well what can we expect Ryan still loves AMD.
    http://www.gametracker.com/search/warhead/
    Notice all the empty servers? Go ahead list them by players only 3 had over 10!..Most are ZERO players...LOL...Why even waste your time benchmarking this ignored game? Just to show NV weakness?
    Dirt Showdown - Raise your hand if you play this...Nope, you're all playing Dirt3 (wisely, or F1 etc anything that rates better than showdown)
    User ratings on metacritic of 70/4.7 (out of TEN not 5), and best summarized by gamespy (rated it a 40/100 on the frontpage of the metacritic site): http://www.metacritic.com/game/pc/dirt-showdown
    "DiRT: Showdown delivers bargain-basement entertainment value for the high, high price of $50. With its neutered physics, limited driving venues, clunky multiplayer, and diminished off-road racing options, discerning arcade racing fans should just write this one off as an unanticipated pothole in Codemaster's trailblazing DiRT series. "
    If you're going to use a racing game, at least make it a good one, not just the one AMD wins in. Why not F1 2012 (scored 80 at metacritic/6.8 from users)? AMD wins in Warhead, which is also why Crysis Warhead is chosen even though nobody plays it (it's from 2008!). Again check the server list, who are you testing this for? What does it represent today? What other game is based on its engine? It's representing nothing, correct? Nobody plays Showdown either.

    How about adding some games people actually PLAY. I thought the whole point of benchmarking is to show us how games WE PLAY will run, is that not true at anandtech?

    Also no discussion of the frame delay ala Techreport:
    http://techreport.com/review/24381/nvidia-geforce-...
    No discussion of the frame latency issues that AMD is working on game by game. Their current beta I think just fixed the skyrim/borderland/guild wars2 issues which were awful.
    http://techreport.com/review/24218/a-driver-update...
    This has been an ongoing problem Anantech (ryan?) seems to just ignore. AMD is just getting to fixing this stuff in Jan...LOL. You can read more about it in the rematch of the 660TI/7950 here:
    http://techreport.com/review/23981/radeon-hd-7950-...
    Of course you can start at the beginning but this is where they recommend the 660TI and why (dec 2012 article).
    "The FPS average suggests near-parity performance between the 7950 and the GTX 660 Ti, with a tiny edge to the GeForce. The 99th percentile frame time, though, captures the impact of the Radeon's frame latency issues and suggests the GTX 660 Ti is easily the superior performer."
    More:
    "Instead, we have a crystal clear recommendation of the GeForce GTX 660 Ti over the Radeon HD 7950 for this winter's crop of blockbuster games. Perhaps AMD will smooth out some of the rough patches in later driver releases, but the games we've tested are already on the market—and Nvidia undeniably delivers the better experience in them, overall. "
    Even Tomshardware reports on delays now (albeit the wrong metric...LOL). Read the comments at techreport for why they're using the wrong one.

    No wonder they left out the xmas blockbusters and Diablo 3 (which will still sell probably 15 million over its life even though I would never buy it). I can name other games that are hot and new also:
    Dishonored, Deadspace 3, max payne 3, all highly rated. Max 3 barely hits 50's on top cards at 2560x1600 (7970ghz, 680 even lower), excellent test game and those are NOT the minimums (which can bring you to 20's/teens on lower cards). Witcher 2 (witcher 3 is coming), with uber sampling ENABLED is a taxer also.

    Dragon Age 2 at 2560x1600 will bring 7970/680 to teens/20's at minimums also, barely hits 40's avg (why use ONLY AVG at techspot I don't know, but better than maxes).
    http://www.techspot.com/review/603-best-graphics-c...

    START reporting MIN FPS for every game benched! There should be more discussion of the fact that in a lot of these games you hit teens for even $500 cards at 2560x1600 maxed out. Max fps means NOTHING. IF you hit 10-20fps a lot in a game your max means nothing. You won't want to play at that res, so what have you shown me? NOTHING. You should ALWAYS report MIN FPS as that dictates our gameplay experience and if it isn't always above 30 life sucks usually. Farcry 3 hits below 30 on both 680/7970 at 2560x1600.
    http://www.hardocp.com/article/2013/02/21/nvidia_g...
    And they don't have them on ULTRA, only titan is and none on 4xmsaa. At least they're giving max details/res you can expect to play and what it's min will be (better, you at least have USEFUL info after reading their benchmarks).

    From your article:
    "This is enough to get Titan to 74fps at 2560 with 4xMSAA, which is just fast enough to make BF3 playable at those settings with a single GPU."
    Why didn't you just report the minimums so we can see when ALL cards hit 30fps or less in all resolutions tested? If the game doesn't give a way to do this use fraps while running it (again, for ALL games). So it takes 74fps to get playable in BF3? It's easier to just give the minimums so people can see, otherwise are we supposed to attempt to extrapolate every one of your games without MINS listed? You did it for us in this sentence, but for ONE card and even then it's just a comment, not a number we can work with. It's YOU extrapolating your own guess that it would be playable given 74fps. What kind of benchmarking is this? I won't even get into your other comments throughout the articles on titan, It's more important to me to key on what you totally ignore that is VERY important to anyone picking ANY gpu. SMOOTHNESS of gameplay (latency testing) and MIN FPS so we know where we have no prayer of playing or what to expect playable on a given gpu. This is why Hardocp actually points to you guys as why your benchmarks suck. It's linked in most of their articles...LOL. FIX IT.
    http://www.hardocp.com/article/2008/02/11/benchmar...
    They have that in nearly every gpu article including the titan article. It's a valid point. But if you're not going to use IN GAME play, at least give min fps for canned etc. That link is in the test setup page of nearly every article on hardocp, you'd think you'd fix this so they'd stop. Your benchmarks represent something that doesn't reflect gameplay in most cases. The maxfps doesn't dictate fun factor. MIN does.

    One comment on Titan, I'd think about it at $800-850. Compute isn't important today at home for me, and won't be until more games use it like civ5 (they're just scratching surface here). At that point this card could become a monster compared to 690 without heat, noise etc. One day it may be worth $1000 to me, but for now it's not worth more than $800 (to me, no SFF needed, no compute needed). I don't like any dual chips or running multiple cards (see microstutter, latency delays etc), so once cheaper this would be tops on my list, but I don't usually spend over $360 on a card anyway...LOL. Most of the first run will go to boutique shops (20K first run I think). Maybe they'll drop it after that.

    LOL at anyone thinking the price sucks. Clearly you are NOT the target market. If your product sells out at a given price, you priced it right. That's good business, and actually you probably should have asked more if it's gone in hours. You can still run an SLI of Titans in SFF, what other card can do that? You always pay a premium for the TOP card. Intel's extreme chips are $1000 too...No surprise. Same thing on the pro side is $2500 and not much different. It's 20% slower than 690, but 690 can't go into SFF for the most part and certainly not as quiet or controllable. Also blows away 690 in compute if someone is after that. Though they need APPS that test this, not some home made anandtech benchmark. How about testing something I can actually USE and is relevant (no I don't count folding@home or bitcoin mining either, they don't make me money-a few coins?...LOL).
  • JeBarr - Thursday, February 21, 2013 - link

    I'm pretty sure Ryan has mentioned the benches you want are forthcoming. Maybe they haven't figured it all out yet...i dunno....but like you, I've been waiting what seems like a year or more for Anandtech to catch up with reality in GPU benching.
  • CeriseCogburn - Tuesday, February 26, 2013 - link

    Yes, well I've found Frame Rate Target to be an absolute GEM in this area:

    " START reporting MIN FPS for every game benched! There should be more discussion of the fact that in a lot of these games you hit teens for even $500 cards at 2560x1600 maxed out. Max fps means NOTHING. IF you hit 10-20fps a lot in a game your max means nothing. "

    If you crank to max settings then have frame drop issues, FRAME RATE TARGET by nVidia of course, is excellent for minimizing and eliminating that issue.
    It really is a great and usable feature, and of course is for the most part now already completely ignored.

    It was ported back to at least the top 500 series cards (I don't remember exactly which ones right now), but that feature should have an entire article dedicated to it at every review site. It is AWESOME, and directly impacts minimum frame rates, lofting nVidia to absolutely playable vs AMD.

    I really think the bias won't ever be overcome. We used to hear nothing but eyefinity, yet now with nVidia cards capable of 4 monitors out of the box, it has suddenly become very unpopular for reviewers to mention eyefinity, surround, and surround plus ONE MORE in the nVidia case, without the need for any special adapters in many of nVidia's partners' card releases.

    So, it's really a sick situation.
  • Urbanos - Friday, February 22, 2013 - link

    He went through all the trouble of benchmarking in order to show that entry points for budget-conscious users can get through Titan, but it doesn't actually prove that Titan is even worth the money without comparing it to at least 1 of its bigger competitors in the GPGPU market. Can you please consider adding that, or having a new review based on compute only?
  • codedivine - Friday, February 22, 2013 - link

    I am certainly interested in looking at the Xeon Phi if I can find the time and if we can arrange the resources to do so.

    My performance expectation (based on Intel white papers) is about 1700-1800 GFlops for SGEMM and 800-900 GFlops for DGEMM on the Xeon Phi 5110P. However, there are also a few benchmarks where I am expecting them to win, thanks to the large cache on the Phi. Stay tuned.
  • Ryan Smith - Monday, February 25, 2013 - link

    This is really a consumer/prosumer level review, so the cards we're going to judge it against need to be comparable in price and intended audience. Not only can we not get some of those parts, but all of them cost many times more than Titan.

    If we were ever able to review K20, then they would be exactly the kinds of parts we'd try to include though.
  • kivig - Friday, February 22, 2013 - link

    There is a whole community of 3D people interested.
    Or when will it get added to the bench table?
  • etriky - Saturday, February 23, 2013 - link

    +1
    Since this card at this price point is pointless for gaming, I figured the article would be heavy on compute applications in order to give us a reason for its existence.

    But then, nothing. No SmallLuxGpu or Cycles. Not even any commercial packages like Octane, or any of the Adobe products. I know LuxGPU and Blender used to be in the test suite. What happened?
