Original Link: http://www.anandtech.com/show/6774/nvidias-geforce-gtx-titan-part-2-titans-performance-unveiled
NVIDIA’s GeForce GTX Titan Review, Part 2: Titan's Performance Unveiled
by Ryan Smith & Rahul Garg on February 21, 2013 9:00 AM EST
Earlier this week NVIDIA announced their new top-end single-GPU consumer card, the GeForce GTX Titan. Built on NVIDIA’s GK110 and named after the same supercomputer that GK110 first powered, the GTX Titan is in many ways the apex of the Kepler family of GPUs first introduced nearly one year ago. With anywhere between 25% and 50% more resources than NVIDIA’s GeForce GTX 680, Titan is intended to be the ultimate single-GPU card for this generation.
Meanwhile with the launch of Titan NVIDIA has repositioned their traditional video card lineup to change who the ultimate video card will be chasing. With a price of $999 Titan is decidedly out of the price/performance race; Titan will be a luxury product, geared towards a mix of low-end compute customers and ultra-enthusiasts who can justify buying a luxury product to get their hands on a GK110 video card. So in many ways this is a different kind of launch than any other high performance consumer card that has come before it.
So where does that leave us? On Tuesday we could talk about Titan’s specifications, construction, architecture, and features. But the all-important performance data would be withheld another two days until today. So with Thursday finally upon us, let’s finish our look at Titan with our collected performance data and our analysis.
Titan: A Performance Summary
| | GTX Titan | GTX 690 | GTX 680 | GTX 580 |
|---|---|---|---|---|
| Stream Processors | 2688 | 2 x 1536 | 1536 | 512 |
| Texture Units | 224 | 2 x 128 | 128 | 64 |
| ROPs | 48 | 2 x 32 | 32 | 48 |
| Memory Clock | 6.008GHz GDDR5 | 6.008GHz GDDR5 | 6.008GHz GDDR5 | 4.008GHz GDDR5 |
| Memory Bus Width | 384-bit | 2 x 256-bit | 256-bit | 384-bit |
| VRAM | 6GB | 2 x 2GB | 2GB | 1.5GB |
| FP64 | 1/3 FP32 | 1/24 FP32 | 1/24 FP32 | 1/8 FP32 |
| Transistor Count | 7.1B | 2 x 3.5B | 3.5B | 3B |
| Manufacturing Process | TSMC 28nm | TSMC 28nm | TSMC 28nm | TSMC 40nm |
On paper, compared to GTX 680, Titan offers anywhere between a 25% and 50% increase in resources. At the low end, Titan comes with 25% more ROP throughput, the net result of Titan’s 50% increase in ROP count partially offset by lower clockspeeds relative to GTX 680. Shading and texturing performance meanwhile benefits even more from the expansion of the number of SMXes from 8 to 14. And finally, Titan has a full 50% more memory bandwidth than GTX 680.
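To show where those percentages come from, here’s a quick sketch using the specs above plus the base clocks from Part 1 of this review (Titan 837MHz, GTX 680 1006MHz); real-world boost clocks will shift these figures slightly:

```python
# Rough per-resource throughput comparison, Titan vs. GTX 680.
# Base clocks are from the spec sheets; boost behavior will vary.
titan = {"rops": 48, "shaders": 2688, "mem_bus_bits": 384, "clock_mhz": 837}
gtx680 = {"rops": 32, "shaders": 1536, "mem_bus_bits": 256, "clock_mhz": 1006}

def ratio(resource):
    """Clock-adjusted throughput ratio of Titan over GTX 680."""
    t = titan[resource] * titan["clock_mhz"]
    g = gtx680[resource] * gtx680["clock_mhz"]
    return t / g

print(f"ROP throughput:    +{(ratio('rops') - 1) * 100:.0f}%")     # ~+25%
print(f"Shader throughput: +{(ratio('shaders') - 1) * 100:.0f}%")  # ~+46%
# Memory clock is identical (6.008GHz GDDR5), so bandwidth scales with bus width.
bw = titan["mem_bus_bits"] / gtx680["mem_bus_bits"]
print(f"Memory bandwidth:  +{(bw - 1) * 100:.0f}%")                # +50%
```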
Setting aside the unique scenario of compute for a moment, this means that Titan will be between 25% and 50% faster than GTX 680 in GPU limited situations, depending on the game/application and its mix of resource usage. For an industry and userbase still trying to come to terms with the loss of nearly annual half-node jumps, this kind of performance jump on the same node is quite remarkable. At the same time it also sets expectations for how future products may unfold; one way to compensate for the loss of the rapid cadence in manufacturing nodes is to spread out the gains from a new node over multiple years, and this is essentially what we’ve seen with the Kepler family by launching GK104, and a year later GK110.
In any case, while Titan can improve gaming performance by up to 50%, NVIDIA has decided to release Titan as a luxury product with a price roughly 120% higher than the GTX 680. This means that Titan will not be positioned to push the price of NVIDIA’s current cards down, and in fact it’s priced right off the currently hyper-competitive price-performance curve that the GTX 680/670 and Radeon HD 7970GE/7970 currently occupy.
|February 2013 GPU Pricing Comparison|

| AMD | Price | NVIDIA |
|---|---|---|
| | $1000 | GeForce GTX Titan/GTX 690 |
| Radeon HD 7990 (Unofficial) | $900 | |
| Radeon HD 7970 GHz Edition | $450 | GeForce GTX 680 |
| Radeon HD 7970 | $390 | |
| | $350 | GeForce GTX 670 |
| Radeon HD 7950 | $300 | |
This setup isn’t unprecedented – the GTX 690 more or less created this precedent last May – but it means Titan is a very straightforward case of paying 120% more for 50% more performance; the last 10% always costs more. What this means is that the vast majority of gamers will simply be shut out from Titan at this price, but for those who can afford Titan’s $999 price tag NVIDIA believes they have put together a powerful card and a convincing case to pay for luxury.
So what can potential Titan buyers look forward to on the performance front? As always we’ll do a complete breakdown of performance in the following pages, but we wanted to open up this article with a quick summary of performance. So with that said, let’s take a look at some numbers.
|GeForce GTX Titan Performance Summary (2560x1440)|

| | vs. GTX 680 | vs. GTX 690 | vs. R7970GE | vs. R7990 |
|---|---|---|---|---|
| Total War: Shogun 2 | 50% | -15% | 62% | 1% |
| Far Cry 3 | 35% | -23% | 37% | -15% |
Looking first at NVIDIA’s product line, Titan is anywhere between 33% and 54% faster than the GTX 680. In fact, with the exception of Hitman: Absolution, a somewhat CPU-bound benchmark, Titan’s performance relative to the GTX 680 is very consistent, falling within a narrow 45%-55% range. Titan and GTX 680 are of course based on the same fundamental Kepler architecture, so there haven’t been any fundamental architectural changes between the two; Titan is exactly what you’d expect out of a bigger Kepler GPU. This is all the more interesting because Titan’s real-world performance advantage of 45%-55% is so close to its peak theoretical performance advantage of 50%, indicating that Titan loses little (if anything) in efficiency when scaled up, and that the games we’re testing today favor memory bandwidth and shader/texturing performance over ROP throughput.
Moving on, while Titan offers a very consistent performance advantage over the architecturally similar GTX 680, it’s quite a different story when compared to AMD’s fastest single-GPU product, the Radeon HD 7970 GHz Edition. As we’ve seen time and time again this generation, the difference in performance between AMD and NVIDIA GPUs not only varies with the test and settings, but dramatically so. As a result Titan is anywhere between being merely equal to the 7970GE to being nearly a generation ahead of it.
At the low end of the scale we have DiRT: Showdown, where Titan’s lead is less than 3%. At the other end is Total War: Shogun 2, where Titan is a good 62% faster than the 7970GE. The average gain over the 7970GE is almost right in the middle at 34%, reflecting a mix of games where the two are close, games where they are far apart, and everything in between. With recent driver advancements having helped the 7970GE pull ahead of the GTX 680, NVIDIA had to work harder to take back their lead and to do so in a concrete manner.
Titan’s final competition comes from the dual-GPU cards of this generation: the GK104 based GTX 690, and the officially unofficial Tahiti based HD 7990 cards, which vary in specs but generally offer just shy of the performance of a pair of 7970s. As we’ve seen in past generations, when it comes to raw performance one big GPU is no match for two smaller GPUs, and the same is true with Titan. For frames per second and nothing else, Titan cannot compete with those cards. But as we’ll see there are still some very good reasons for Titan’s existence, and areas Titan excels at that even two lesser GPUs cannot match.
None of this of course accounts for compute. Simply put, Titan stands alone in the compute world. As the first consumer GK110 GPU based video card there’s nothing quite like it. We’ll see why that is in our look at compute performance, but as far as the competitive landscape is concerned there’s not a lot to discuss here.
The Final Word On Overclocking
Before we jump into our performance breakdown, I wanted to take a few minutes to write a bit of a feature follow-up to our overclocking coverage from Tuesday. Since we couldn’t reveal performance numbers at the time – and quite honestly we hadn’t even finished evaluating Titan – we couldn’t give you the complete story on Titan. So some clarification is in order.
On Tuesday we discussed how Titan reintroduces overvolting for NVIDIA products, but now with additional details from NVIDIA along with our own performance data we have the complete picture, and overclockers will want to pay close attention. NVIDIA may be reintroducing overvolting, but it may not be quite what many of us were first thinking.
First and foremost, Titan still has a hard TDP limit, just like GTX 680 cards. Titan cannot and will not cross this limit, as it’s built into the firmware of the card and essentially enforced by NVIDIA through their agreements with their partners. This TDP limit is 106% of Titan’s base TDP of 250W, or 265W. No matter what you throw at Titan or how you cool it, it will not let itself pull more than 265W sustained.
Compared to the GTX 680 this is both good news and bad news. The good news is that with NVIDIA having done away with the pesky distinction between target power and TDP, the entire process is much simpler; the power target tells you exactly what the card will pull up to on a percentage basis, with no separate figures to track or reconcile. Furthermore, with the ability to focus on just TDP, NVIDIA didn’t set their power limits on Titan nearly as conservatively as they did on GTX 680.
The bad news is that while GTX 680 shipped with a max power target of 132%, Titan is again only 106%. Once you do hit that TDP limit you only have 6% (15W) more to go, and that’s it. Titan essentially has more headroom out of the box, but it will have less headroom for making adjustments. So hardcore overclockers dreaming of slamming 400W through Titan will come away disappointed, though it goes without saying that Titan’s power delivery system was never designed for that in the first place. All indications are that NVIDIA built Titan’s power delivery system for around 265W, and that’s exactly what buyers will get.
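The power math is simple enough to sketch out. The GTX 680 figures below assume its 170W default power target (as distinct from its 195W TDP), per our original GTX 680 review:

```python
# Maximum sustained board power implied by each card's power limits.
titan_tdp_w = 250
titan_max_target = 1.06        # Titan's 106% maximum power target
print(f"Titan ceiling:   {titan_tdp_w * titan_max_target:.0f}W")       # 265W

gtx680_target_w = 170          # GTX 680's default power target (assumed)
gtx680_max_target = 1.32       # GTX 680's 132% maximum power target
print(f"GTX 680 ceiling: {gtx680_target_w * gtx680_max_target:.0f}W")  # ~224W
```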
Second, let’s talk about overvolting. What we didn’t realize on Tuesday but do now is that overvolting as implemented on Titan is not overvolting in the traditional sense, and practically speaking I doubt many hardcore overclockers will even recognize it as overvolting. It is not a direct voltage control system as it was on past generation cards, or on the NVIDIA-nixed cards like the MSI Lightning or EVGA Classified.
Overvolting is instead a set of two additional turbo clock bins, above and beyond Titan’s default top bin. On our sample the top bin is 1.1625v, which corresponds to a 992MHz core clock. Overvolting Titan to 1.2v means unlocking two more bins: 1006MHz @ 1.175v, and 1019MHz @ 1.2v. Put another way, overvolting on Titan unlocks only another 27MHz.
These two bins are in the strictest sense overvolting – NVIDIA doesn’t believe voltages over 1.1625v on Titan will meet their longevity standards, so using them is still very much going to reduce the lifespan of a Titan card – but it’s probably not the kind of direct control overvolting hardcore overclockers were expecting. The end result is that with Titan there’s simply no option to slap on another 0.05v – 0.1v in order to squeak out another 100MHz or so. You can trade longevity for the potential to get another 27MHz, but that’s it.
Ultimately, this means that overvolting as implemented on Titan cannot be used to improve the clockspeeds attainable through the use of the offset clock functionality NVIDIA provides. In the case of our sample it peters out after +115MHz offset without overvolting, and it peters out after +115MHz offset with overvolting. The only difference is that we gain access to a further 27MHz when we have the thermal and power headroom available to hit the necessary bins.
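Putting the bins into one place (the voltage/clock pairs are from our specific sample card; other Titans may bin slightly differently):

```python
# Titan's top boost bins on our sample: (core clock in MHz, core voltage).
bins = [
    (992,  1.1625),  # stock top bin
    (1006, 1.175),   # first overvolted bin
    (1019, 1.200),   # second overvolted bin
]
stock_top = bins[0][0]
overvolted_top = bins[-1][0]
print(f"Total overvolting gain: {overvolted_top - stock_top}MHz")  # 27MHz
```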
|GeForce GTX Titan Clockspeed Bins|
Finally, as with the GTX 680 and GTX 690, NVIDIA will be keeping tight control over what Asus, EVGA, and their other partners release. Those partners will have the option to release Titan cards with factory overclocks and Titan cards with different coolers (i.e. water blocks), but they won’t be able to expose direct voltage control or ship parts with higher voltages. Nor for that matter will they be able to create Titan cards with significantly different designs (i.e. more VRM phases); every Titan card will be a variant on the reference design.
This is essentially no different than how the GTX 690 was handled, but I think it’s something that’s important to note before anyone with dreams of big overclocks throws down $999 on a Titan card. To be clear, GPU Boost 2.0 is a significant improvement in the entire power/thermal management process compared to GPU Boost 1.0, and this kind of control means that no one needs to be concerned with blowing up their video card (accidentally or otherwise), but it’s a system that comes with gains and losses. So overclockers will want to pay close attention to what they’re getting into with GPU Boost 2.0 and Titan, and what they can and cannot do with the card.
Titan’s Compute Performance (aka Ph.D Lust)
Because GK110 is such a unique GPU from NVIDIA when it comes to compute, we’re going to shake things up a bit and take a look at compute performance first before jumping into our look at gaming performance.
On a personal note, one of the great things about working at AnandTech is all the people you get to work with. Anand himself is nothing short of fantastic, but what other review site also has a Brian Klug or a Jarred Walton? We have experts in a number of fields, and as a computer technology site that of course includes experts in computer science.
What I’m trying to say is that for the last week I’ve been having to fend off our CS guys, who upon hearing I had a GK110 card wanted one of their own. If you’ve ever wanted proof of just how big a deal GK110 is – and by extension Titan – you really don’t have to look too much farther than that.
Titan, its compute performance, and the possibilities it unlocks are a very big deal for researchers and other professionals who need every last drop of compute performance they can get, for as cheap as they can get it. This is why on the compute front Titan stands alone; in NVIDIA’s consumer product lineup there’s nothing like it, and even AMD’s Tahiti based cards (7970, etc), while potent, are very different from GK110/Kepler in a number of ways. Titan essentially writes its own ticket here.
In any case, as this is the first GK110 product that we have had access to, we couldn’t help but run it through a battery of tests. The Tesla K20 series may have been out for a couple of months now, but at $3500 for the base K20 card, Titan is the first GK110 card many compute junkies are going to have real access to.
To that end I'd like to introduce our newest writer, Rahul Garg, who will be leading our look at Titan/GK110’s compute performance. Rahul is a Ph.D student specializing in the field of parallel computing and GPGPU technology, making him a prime candidate for taking a critical but nuanced look at what GK110 can do. You will be seeing more of Rahul in the future, but first and foremost he has a 7.1B transistor GPU to analyze. So let’s dive right in.
By: Rahul Garg
For compute performance, we first looked at two common benchmarks: GEMM (measures performance of dense matrix multiplication) and FFT (Fast Fourier Transform). These numerical operations are important in a variety of scientific fields. GEMM is highly parallel and typically compute heavy, and one of the first tests of performance and efficiency on any parallel architecture geared towards HPC workloads. FFT is typically memory bandwidth bound but, depending upon the architecture, can be influenced by inter-core communication bandwidth. Vendors and third-parties typically supply optimized libraries for these operations. For example, Intel supplies MKL for Intel processors (including Xeon Phi) and AMD supplies ACML and OpenCL-based libraries for their CPUs and GPUs respectively. Thus, these benchmarks measure the performance of the combination of both the hardware and software stack.
For GEMM, we tested the performance of NVIDIA's CUBLAS library supplied with CUDA SDK 5.0, on SGEMM (single-precision/fp32 GEMM) and DGEMM (double precision/fp64 GEMM) on square matrices of size 5k by 5k. For SGEMM on Titan, the data reported here was collected with boost disabled. We also conducted the experiments with boost enabled on Titan, but found that the performance was effectively equal to the non-boost case. We assume that it is because our test ran for a very short period of time and perhaps did not trigger boost. Therefore, for the sake of simpler analysis, we report the data with boost disabled on the Titan. If time permits, we may return to the boost issue in a future article for this benchmark.
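For readers who want to sanity-check GEMM rates on their own hardware, the arithmetic is straightforward: an n x n GEMM performs 2n³ floating point operations. A minimal CPU-side sketch using NumPy (which dispatches to whatever BLAS it is linked against; this is not the CUBLAS test we ran):

```python
import time
import numpy as np

def gemm_gflops(n=2048, dtype=np.float32):
    """Time an n x n matrix multiply and convert to GFLOPS (2*n^3 FLOPs).
    Our CUBLAS test above used 5k x 5k matrices."""
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    t0 = time.perf_counter()
    a @ b
    dt = time.perf_counter() - t0
    return 2.0 * n**3 / dt / 1e9

print(f"SGEMM: {gemm_gflops(dtype=np.float32):.1f} GFLOPS")
print(f"DGEMM: {gemm_gflops(dtype=np.float64):.1f} GFLOPS")
```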
Apart from the results collected by us for the GTX Titan, GTX 680 and GTX 580, we refer to experiments on GEMM on the Radeon 7970 conducted by Matsumoto, Nakasato and Sedukhin, reported in a technical report from the University of Aizu. Their exact parameters and testbed are different from ours, and we include their results for illustrative purposes, as a ballpark estimate only. The results are below.
Titan rules the roost amongst the three listed cards in both SGEMM and DGEMM by a wide margin. We have not included Intel's Xeon Phi in this test, but Titan's achieved performance is higher than the theoretical peak FLOPS of the current crop of Xeon Phi cards. Sharp-eyed readers will have observed that Titan achieves about 1.3 TFlops on DGEMM, while the listed fp64 theoretical peak is also 1.3 TFlops; we were not expecting 100% of peak from Titan in DGEMM. NVIDIA clarified that the fp64 rating for Titan is a conservative estimate. At 837MHz, the calculated fp64 peak of Titan is 1.5 TFlops. However, under heavy load in fp64 mode, the card may underclock below the listed 837MHz to remain within its power and thermal specifications. Thus the fp64 ALU peak can vary between 1.3 TFlops and 1.5 TFlops, and our DGEMM results are within expectations.
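The arithmetic behind that 1.3-1.5 TFlops range is easy to check. A minimal sketch (the ~725MHz figure is our own back-calculation from the 1.3 TFlops rating, not an NVIDIA-published clock):

```python
# Titan's fp64 peak depends on the clockspeed the card actually sustains.
fp32_cores = 2688
fp64_units = fp32_cores // 3   # GK110 on Titan runs fp64 at 1/3 the fp32 rate
flops_per_clock = 2            # one fused multiply-add = 2 FLOPs

def fp64_peak_tflops(clock_mhz):
    return fp64_units * flops_per_clock * clock_mhz * 1e6 / 1e12

print(f"{fp64_peak_tflops(837):.2f} TFlops at the 837MHz base clock")  # ~1.50
print(f"{fp64_peak_tflops(725):.2f} TFlops at a throttled ~725MHz")    # ~1.30
```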
Next, we consider the percentage of fp32 peak achieved by the respective SGEMM implementations. These are plotted below.
Titan achieves about 71% of its peak while GTX 680 only achieves about 40% of its peak. It is clear that while both GTX 680 and Titan are said to be Kepler architecture chips, Titan is not just a bigger GTX 680. Architectural tweaks have been made that enable it to reach much higher efficiency than the GTX 680 on at least some compute workloads. The GCN based Radeon 7970 obtains about 63% of peak on SGEMM using Matsumoto et al.'s algorithm, and the Fermi based GTX 580 also obtains about 63% of peak using CUBLAS.
For FFT, we tested the performance of 1D complex-to-complex in-place transforms of size 2^25 using the CUFFT library. Results are given below.
Titan outperforms the GTX 680 in single-precision FFT by about 50%. We suspect this is primarily due to Titan's increased memory bandwidth compared to GTX 680, but we have not verified this hypothesis. GTX 580 has a slight lead over the GTX 680. Again, if time permits, we may return to this benchmark for a deeper analysis. In double precision, Titan achieves about 3.4x the performance of GTX 680, which is not surprising given the GTX 680's poor fp64 execution resources.
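FFT throughput is conventionally scored as 5·n·log2(n) FLOPs for an n-point transform. A rough CPU-side sketch of such a measurement (NumPy rather than CUFFT; note that NumPy's FFT computes in double precision regardless of input type):

```python
import time
import numpy as np

def fft_gflops(log2n):
    """Time a 1D complex-to-complex FFT of 2^log2n points."""
    n = 1 << log2n
    x = (np.random.rand(n) + 1j * np.random.rand(n)).astype(np.complex64)
    t0 = time.perf_counter()
    np.fft.fft(x)
    dt = time.perf_counter() - t0
    return 5.0 * n * log2n / dt / 1e9  # standard FFT FLOP-count convention

# Our CUFFT test used 2^25 points; a smaller size keeps this sketch quick.
print(f"2^22-point FFT: {fft_gflops(22):.1f} GFLOPS")
```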
We then looked at an in-house benchmark called SystemCompute, developed by our own Ian Cutress. The benchmark tests the performance on a variety of sample kernels that are representative of some scientific computing applications. Ian described the CPU version of these benchmarks in a previous article. Ian wrote the GPU version of the benchmarks in C++ AMP, which is a relatively new GPGPU API introduced by Microsoft in VS2012.
Microsoft's implementation of AMP compiles down to DirectCompute shaders. These are all single-precision benchmarks and should run on any DX11 capable GPU. The benchmarks include 2D and 3D finite difference solvers, 3d particle movement, n-body benchmark and a simple matrix multiplication algorithm. Boost is enabled on both the Titan and GTX 680 for this benchmark. We give the score reported by the benchmark for both cards, and report the speedup of the Titan over 680. Speedup greater than 1 implies Titan is faster, while less than 1 implies a slowdown.
| Benchmark | GTX 580 | GTX 680 | GTX Titan | Speedup of Titan over GTX 680 |
The benchmarks show between 16% and 60% improvement, with the largest gain coming from the relatively FLOP-heavy n-body benchmark. Interestingly, the GTX 580 wins over Titan in 3DPMo, and over the GTX 680 in both 3DPMo and 2D.
Overall, GTX Titan is an impressive accelerator from a compute perspective, posting large gains over its predecessors.
Titan’s Compute Performance, Cont
With Rahul having covered the basics of Titan’s strong compute performance, let’s shift gears a bit and take a look at real world usage.
On top of Rahul’s work with Titan, as part of our 2013 GPU benchmark suite we put together a larger number of compute benchmarks to try to cover real world usage, including the old standards of gaming usage (Civilization V) and ray tracing (LuxMark), along with several new tests. Unfortunately that got cut short when we discovered that OpenCL support is currently broken in the press drivers, which prevents us from using several of our tests. We still have our CUDA and DirectCompute benchmarks to look at, but a full look at Titan’s compute performance on our 2013 GPU benchmark suite will have to wait for another day.
For their part, NVIDIA of course already has OpenCL working on GK110 with Tesla. The issue is that somewhere between that and bringing up GK110 for Titan by integrating it into NVIDIA’s mainline GeForce drivers – specifically the new R314 branch – OpenCL support was broken. We expect this will be fixed in short order, but it’s not something NVIDIA checked for ahead of the press launch of Titan, and it’s not something they could fix in time for today’s article.
Unfortunately this means that comparisons with Tahiti will be few and far between for now. Most significant cross-platform compute programs are OpenCL based rather than DirectCompute, so short of games and a couple other cases such as Ian’s C++ AMP benchmark, we don’t have too many cross-platform benchmarks to look at. With that out of the way, let’s dive into our condensed collection of compute benchmarks.
We’ll once more start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the only games with a benchmark that can isolate the use of DirectCompute and its resulting performance.
Note that for 2013 we have changed the benchmark a bit, moving from using a single leader to using all of the leaders. As a result the reported numbers are higher, but they’re also not going to be comparable with this benchmark’s use from our 2012 datasets.
With Civilization V having launched in 2010, graphics cards have become significantly more powerful since then, far outpacing growth in the CPUs that feed them. As a result we’ve rather quickly drifted from being GPU bottlenecked to being CPU bottlenecked, as we see both in our Civ V game benchmarks and our DirectCompute benchmarks. For high-end GPUs the performance difference is rather minor; the gap between GTX 680 and Titan for example is 45fps, or just less than 10%. Still, it’s at least enough to get Titan past the 7970GE in this case.
Our second test is one of our new tests, utilizing Elcomsoft’s Advanced Office Password Recovery utility to take a look at GPU password generation. AOPR has separate CUDA and OpenCL kernels for NVIDIA and AMD cards respectively, which means it doesn’t follow the same code path on all GPUs but it is using an optimal path for each GPU it can handle. Unfortunately we’re having trouble getting it to recognize AMD 7900 series cards in this build, so we only have CUDA cards for the time being.
Password generation and other forms of brute force crypto are an area where the GTX 680 is particularly weak, thanks to the various compute aspects that were stripped out in the name of efficiency. As a result it ends up below even the GTX 580 in these benchmarks, never mind AMD’s GCN cards. But with Titan/GK110 offering NVIDIA’s full compute performance, it rips through this task. In fact it more than doubles the performance of both the GTX 680 and the GTX 580, indicating that the huge gains we’re seeing come not just from the additional function units, but from architectural optimizations and new instructions that improve overall efficiency and reduce the number of cycles needed to complete work on a password.
Altogether at 33K passwords/second Titan is not just faster than GTX 680, but it’s faster than GTX 690 and GTX 680 SLI, making this a test where one big GPU (and its full compute performance) is better than two smaller GPUs. It will be interesting to see where the 7970 GHz Edition and other Tahiti cards place in this test once we can get them up and running.
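To put 33K passwords/second in perspective, here is a hypothetical pure brute-force calculation. It is illustrative only: AOPR's real attack modes use masks, dictionaries, and rules that search far smaller spaces, and the alphabet size is our assumption.

```python
# Back-of-envelope context for the 33K passwords/second figure above.
rate = 33_000                 # passwords per second, from our Titan run
alphabet = 62                 # assumed charset: a-z, A-Z, 0-9

def worst_case_days(length):
    """Days to exhaust the full keyspace of a given password length."""
    return alphabet ** length / rate / 86400

print(f"6 chars: {worst_case_days(6):.0f} days to exhaust")          # ~20 days
print(f"8 chars: {worst_case_days(8) / 365:.0f} years to exhaust")   # ~210 years
```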
Our final test in our abbreviated compute benchmark suite is our very own Dr. Ian Cutress’s SystemCompute benchmark, which is a collection of several different fundamental compute algorithms. Rahul went into greater detail on this back in his look at Titan’s compute performance, but I wanted to go over it again quickly with the full lineup of cards we’ve tested.
Surprisingly, for all of its performance gains relative to GTX 680, Titan still falls notably behind the 7970GE here. Given Titan’s theoretical performance and the fundamental nature of this test we would have expected it to do better. But without additional cross-platform tests it’s hard to say whether this is something where AMD’s GCN architecture continues to shine over Kepler, or if perhaps it’s a weakness in NVIDIA’s current DirectCompute implementation for GK110. Time will tell on this one, but in the meantime this is the first solid sign that Tahiti may be more of a match for GK110 than it’s typically given credit for.
Meet The 2013 GPU Benchmark Suite & The Test
Having taken a look at the compute side of Titan, let’s finally dive into what most of you have probably been waiting for: our gaming benchmarks.
As this is the first major launch of 2013 it’s also the first time we’ll be using our new 2013 GPU benchmark suite. This benchmark suite should be considered a work in progress at the moment, as it’s essentially incomplete. With several high-profile games due in the next 4 weeks (and no other product launches expected), we expect we’ll be expanding our suite to integrate those latest games. In the meantime we have composed a slightly smaller suite of 8 games that will serve as our base.
|AnandTech GPU Bench 2013 Game List|

| Game | Genre |
|---|---|
| Total War: Shogun 2 | Strategy |
| Civilization V | Strategy |
| Battlefield 3 | FPS |
| Crysis: Warhead | FPS |
| DiRT: Showdown | Racing |
| Hitman: Absolution | Action |
| Sleeping Dogs | Action/Open World |
| Far Cry 3 | FPS |
Returning to the suite will be Total War: Shogun 2, Civilization V, Battlefield 3, and of course Crysis: Warhead. With no performance-demanding AAA strategy games released in the last year, we’re effectively in a holding pattern for new strategy benchmarks, hence we’re bringing Shogun and Civilization forward. Even 2 years after its release, Shogun 2 can still put an incredible load on a system on its highest settings, and Civilization V is still one of the more advanced games in our suite due to its use of driver command lists for rendering. With Company of Heroes 2 due here in the near future we may finally get a new strategy game worth benchmarking, while Total War will be returning with Rome 2 towards the end of this year.
Meanwhile Battlefield 3 is still among the most popular multiplayer FPSes, and though newer video cards have lightened its system-killer status, it still takes a lot of horsepower to play. Furthermore the engine behind it, Frostbite 2, is used in a few other action games, and will be used for Battlefield 4 at the end of this year. Finally we have the venerable Crysis: Warhead, our legacy entry. As the only DX10 title in the current lineup it’s good for tracking performance against our oldest video cards, plus it’s still such a demanding game that only the latest video cards can play it at high framerates and resolutions with MSAA.
As for the new games in our suite, we have added DiRT: Showdown, Hitman: Absolution, Sleeping Dogs, and Far Cry 3. DiRT: Showdown is the annual refresh of the DiRT racing franchise from Codemasters, based upon their continually evolving EGO engine. Meanwhile Hitman: Absolution is last year’s highly regarded third person action game, and notably in this day and age it features a built-in benchmark, albeit a somewhat CPU-intensive one. Sleeping Dogs is a rare treat: a benchmarkable open world game, giving us a chance to test a genre where built-in benchmarks are practically unheard of. And finally we have Far Cry 3, the latest rendition of the Far Cry franchise. A popular game in its own right, its jungle environment can be particularly punishing.
These games will be joined throughout the year by additional games as we find games that meet our needs and standards, and for which we can create meaningful benchmarks and validate their performance. As with 2012 we’re looking at having roughly 10 game benchmarks at any given time.
Meanwhile from a settings and resolution standpoint we have finally (and I might add, begrudgingly) moved from 16:10 resolutions to 16:9 resolutions in most cases to better match the popularity of 1080p monitors and the recent wave of 1440p IPS monitors. Our primary resolutions are now 2560x1440, 1920x1080, and 1600x900, with an emphasis on 1920x1080 at lower settings ahead of dropping to lower resolutions, given the increasing marginalization of monitors with sub-1080p resolutions. The one exception to these resolutions is our triple-monitor resolution, which stays at 5760x1200. This is purely for technical reasons, as NVIDIA’s drivers do not consistently offer us 5760x1080 on the 1920x1200 panels we use for testing.
As for the testbed itself, we’ve changed very little. Our testbed remains our trusty 4.3GHz SNB-E, backed with 16GB of RAM and running off of a 256GB Samsung 470 SSD. The one change we have made here is that having validated our platform as being able to handle PCIe 3.0 just fine, we are forcibly enabling PCIe 3.0 on NVIDIA cards where it’s typically disabled. NVIDIA disables PCIe 3.0 by default on SNB-E systems due to inconsistencies in the platform, but as our goal is to remove every non-GPU bottleneck, we have little reason to leave PCIe 3.0 disabled. Especially since most buyers will be on Ivy Bridge platforms where PCIe 3.0 is fully supported.
Finally, we’ve also used this opportunity to refresh a couple of our cards in our test suite. AMD’s original press sample for the 7970 GHz Edition was a reference 7970 with the 7970GE BIOS, a configuration that was more-or-less suitable for the 7970GE, but not one AMD’s partners followed. Since all of AMD’s partners are using open air cooling, we’ve replaced our AMD sample with HIS’s 7970 IceQ X2 GHz Edition, a fairly typical representation of the type of dual-fan coolers that are common on 7970GE cards. Our 7970GE temp/noise results should now be much closer to what retail cards will do, though performance is unchanged.
Unfortunately we’ve had to deviate from that almost immediately for CrossFire testing. Our second HIS card was defective, so due to time constraints we’re using our original AMD 7970GE as our second card for CF testing. This has no impact on performance, but it means that we cannot fairly measure temp or noise. We will update Bench with those results once we get a replacement card and run the necessary tests.
Finally, we also have a Powercolor Devil13 7990 as our 7990 sample. The Devil13 was a limited run part and has been replaced by the plain 7990, the difference between them being a 25MHz advantage for the Devil13. As such we’ve downclocked our Devil13 to match the basic 7990’s specs. The performance and power results should perfectly match a proper retail 7990.
|CPU:||Intel Core i7-3960X @ 4.3GHz|
|Motherboard:||EVGA X79 SLI|
|Power Supply:||Antec True Power Quattro 1200|
|Hard Disk:||Samsung 470 (256GB)|
|Memory:||G.Skill Ripjaws DDR3-1867 4 x 4GB (8-10-9-26)|
|Case:||Thermaltake Spedo Advance|
|Video Cards:||AMD Radeon HD 7970|
|Video Drivers:||NVIDIA ForceWare 314.07|
|NVIDIA ForceWare 314.09 (Titan)|
|AMD Catalyst 13.2 Beta 6|
|OS:||Windows 8 Pro|
Racing to the front of our 2013 list is our racing benchmark, DiRT: Showdown. DiRT: Showdown is based on the latest iteration of Codemasters’ EGO engine, which has continually evolved over the years to add more advanced rendering features. It was one of the first games to implement tessellation, and also one of the first to implement a DirectCompute based forward-rendering compatible lighting system. And as Codemasters is by far the most prolific PC racing developer, it’s also a good proxy for some of the other racing games on the market, such as F1 and GRID.
DiRT: Showdown is something of a divisive game for benchmarking. The game’s advanced lighting system, while not developed by AMD, does implement a lot of the key concepts AMD popularized with their Leo forward lighting tech demo. As a result, performance with that lighting system turned on has been known to greatly favor AMD cards. With that said, since we’re looking at high-end cards there’s really little reason not to test with it turned on, as even a relatively slow card can keep up. It’s also why we test DiRT with advanced lighting both on and off, starting at 1920x1080 Ultra.
The end result is perhaps unsurprising: NVIDIA starts with a large deficit, with the GTX 680 well behind AMD’s Radeon cards. Titan closes the gap, surpassing the 7970GE at every resolution except 5760, but just barely. DiRT is the only game in our suite that behaves like this, so I don’t put a ton of stock into these results on a global level, but I thought it would make for an interesting look nonetheless.
This also settles some speculation of whether DiRT and its compute-heavy lighting system would benefit from the compute performance improvements Titan brings to the table. The answer to that is yes, but only by roughly as much as the increase in theoretical compute performance over GTX 680. We’re not seeing any kind of performance increase that could be attributed to improved compute efficiency here, which is why Titan can only just beat the 7970GE at 2560 here. However the jury is still out on whether this means that DiRT’s lighting algorithm doesn’t map well to Kepler, period, or if it’s an implementation issue. We also saw some unexpectedly weak DirectCompute performance out of Titan with our SystemCompute benchmark, so this may be further evidence that DirectCompute isn’t currently taking full advantage of everything Titan offers.
In any case, at 2560 Titan is roughly 47% faster than the GTX 680 and all of 3% faster than the 7970GE. It’s enough to get Titan above the 60fps mark here, but at 5760 no single GPU, not even GK110, can get you 60fps. On the other hand, the equivalent AMD dual-GPU products, the 7970GECF and the 7990, have no such trouble. Dual-GPU cards will consistently win, but generally not by margins like this.
Total War: Shogun 2
Our next benchmark is Shogun 2, which is a returning favorite to our benchmark suite. Total War: Shogun 2 is the latest installment of the long-running Total War series of turn based strategy games, and alongside Civilization V is notable for just how many units it can put on a screen at once. Even 2 years after its release it’s still a very punishing game at its highest settings due to the amount of shading and memory those units require.
Shogun has us seeing one of Titan’s best games almost right away. The 50% performance gain over the GTX 680 at 2560 almost exactly mirrors Titan’s 50% increase in memory bandwidth, which given the game’s nature as a memory eater shouldn’t come as too great a surprise. In this case it’s enough to push Titan to nearly 60fps at 2560 even with everything cranked up, or a bit past 30fps at 5760.
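Titan’s 50% bandwidth advantage follows directly from the spec table: both cards run 6.008GHz GDDR5, so the wider bus accounts for the entire difference. A quick sanity check, using the figures from the spec table above:

```python
# Peak memory bandwidth = (bus width in bits / 8 bits per byte) * effective data rate.
# Both cards use 6.008GHz effective GDDR5; only the bus width differs.
def bandwidth_gbps(bus_width_bits, data_rate_ghz):
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_ghz

gtx680 = bandwidth_gbps(256, 6.008)  # ~192.3 GB/s
titan = bandwidth_gbps(384, 6.008)   # ~288.4 GB/s

print(f"GTX 680: {gtx680:.1f} GB/s, Titan: {titan:.1f} GB/s")
print(f"Titan advantage: {titan / gtx680 - 1:.0%}")  # 50%
```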
This happened to be a game the GTX 680 was already doing modestly well at relative to AMD’s cards, so the Titan/7970GE matchup is even more lopsided, with Titan surpassing the 7970GE by over 60% at 2560. It’s not entirely clear what Shogun is doing that favors Kepler over GCN so much, but the result is the single biggest performance gap we’ll see all day.
Moving on to the dual GPU cards however, we are reminded that for raw framerates a dual-GPU card can almost never be beat. Titan is effectively tied with the 7990, but the GTX 690 and the multi-card configurations surpass it by at least 17%. As we’ve stated before, Titan’s forte will not be raw framerates, but rather slightly lower framerates without the pitfalls of SLI/CF. The only thing that’s really being determined here is how big this gap will be.
The third game in our revised lineup is Hitman: Absolution. The latest game in Square Enix’s stealth-action series, Hitman: Absolution is a DirectX 11 based title that, though a bit heavy on the CPU, can give most GPUs a run for their money. Furthermore it has a built-in benchmark, which gives it a level of standardization that fewer and fewer games possess.
Based on our results I suspect Hitman is CPU limited beyond 85fps or so, which is depressing our results on these extremely powerful cards. Titan is by far the fastest of the single-GPU cards, but at 2560 it only beats the GTX 680 by 34%, and the 7970GE by 18%. If we jump up to 5760 we can see that Titan pulls ahead by more, now 48% and 33% respectively, and this is probably the most pure GPU result we’re going to get out of Hitman.
Note that the dual-GPU cards still do better than Titan here, but they are running right into the wall presented by the CPU bottleneck. Their 17% leads are nothing to scoff at, but it may not be all they’re capable of.
Meanwhile thanks to its built-in benchmark, Hitman is one of the most consistent games in our lineup, making it a good candidate for including the minimum framerate, which we have below.
The minimum framerates on Hitman show Titan in an even better light. Though it still loses to the dual-GPU configurations, it’s now 40% ahead of the GTX 680 and 25% ahead of the 7970GE. And amusingly enough, at 2560 Titan is just fast enough to hit a 60fps minimum.
Another Square Enix game, Sleeping Dogs is one of the few open world games to be released with any kind of benchmark, giving us a unique opportunity to benchmark an open world game. Like most console ports, Sleeping Dogs’ base assets are not extremely demanding, but it makes up for it with its interesting anti-aliasing implementation, a mix of FXAA and SSAA that at its highest settings does an impeccable job of removing jaggies. However by effectively rendering the game world multiple times over, it can also require a very powerful video card to drive these high AA modes.
Sleeping Dogs is another game that AMD cards have done rather well at, leaving the GTX 680 quite a way behind. The sheer increase in functional units for Titan means it has no problem vaulting back to the top of the list of single GPU cards, but it also means it’s crossing a sizable gap.
In the end, at 2560 at the High (second-highest) AA settings, Titan is just shy of 50% faster than the GTX 680, but a weaker 17% ahead of the 7970GE. As we drop in resolution/AA, so does Titan’s lead, as the game shifts to being CPU limited.
Notably, no single card is really good enough here for 2560 with Extreme AA, with even Titan only hitting 35fps. This is one of the only games where even with a single monitor there’s real potential for a second Titan card in SLI.
Meanwhile the gap between Titan and our dual-GPU cards is roughly as expected. The GTX 690 takes a smaller lead at 18%, while the 7990 is some 42% ahead.
Due to its built-in benchmark, Sleeping Dogs is also another title that is a good candidate for repeatable and consistent minimum framerate testing.
While on average Titan is faster than the 7970GE, the minimum framerates put Titan in a rough spot. At 2560 with high AA Titan is effectively tied with the 7970GE, and with extreme AA it actually falls behind. It’s not readily apparent why this is, whether it’s some kind of general SSAA bottleneck or if there’s something else going on. But it’s a reminder that at its very worst, Titan can only match the 7970GE.
Up next is our legacy title for 2013, Crysis: Warhead. A stand-alone expansion to 2007’s Crysis, Warhead is now over four years old and can still beat most systems down. Crysis was intended to be forward-looking as far as performance and visual quality go, and it has clearly achieved that: we’ve only finally reached the point where single-GPU cards can hit 60fps at 1920 with 4xAA.
At 2560 we still have a bit of a distance to go before any single-GPU card can crack 60fps. In lieu of that, Titan is the winner, as expected. Leading the GTX 680 by 54%, this is Titan’s single biggest win over its predecessor, actually exceeding the theoretical performance advantage implied by the increase in functional units alone. For some reason the GTX 680 never gained much performance here over the GTX 580, and while it’s hard to argue that Titan has fully reversed that, it has at least corrected enough of the problem to push its lead past 50%.
In the meantime, with GTX 680’s languid performance, this has been a game the latest Radeon cards have regularly cleared. For whatever reason they’re a good match for Crysis, meaning even with all its brawn, Titan can only clear the 7970GE by 21%.
On the other hand, our multi-GPU cards are a mixed bag. Once more Titan loses to both, but the GTX 690 only leads by 15% thanks to GK104’s aforementioned weak Crysis performance. Meanwhile the 7990 takes a larger lead at 33%.
I’d also note that we’ve thrown in a “bonus round” here just to see when Crysis will be playable at 1080p with its highest settings and with 4x SSAA for that picture-perfect experience. As it stands AMD multi-GPU cards can already cross 60fps, but for everything else we’re probably a generation off yet before Crysis is completely and utterly conquered.
Moving on, we once again have minimum framerates for Crysis.
When it comes to Titan, the relative improvement in minimum framerates over GTX 680 is nothing short of obscene. Whatever it was that was holding back GTX 680 is clearly having a hard time slowing down Titan, leading to Titan offering 71% better minimum framerates. There’s clearly much more going on here than just an increase in function units.
Meanwhile, though Titan’s gains here over the 7970GE aren’t quite as high as they were over the GTX 680, Titan’s lead over the 7970GE still grows a bit, to 26%. As for our multi-GPU cards, this appears to be a case where SLI is struggling; the GTX 690 is barely faster than Titan here. At 31% faster than Titan, however, the 7990 doesn’t seem to be faltering much.
Far Cry 3
The final new game added to the latest rendition of our benchmark suite is Far Cry 3, Ubisoft’s recently released island-jungle action game. A lot like our other jungle game Crysis, Far Cry 3 can be quite tough on GPUs, especially with MSAA and improved alpha-to-coverage checking thrown into the mix. On the other hand it’s still a bit of a pig on the CPU side, and seemingly inexplicably we’ve found that it doesn’t play well with HyperThreading on our testbed, making this the only game we’ve ever had to disable HT for to maximize our framerates.
For the 7970GE and GTX 680, FC3 at 2560 was already a very close match. Or put another way, with the 7970GE and GTX 680 tied up with each other, Titan is free to clear both of them by approximately 35% at 2560. This is enough to launch Titan past the 60fps mark, a first for any single-GPU card.
As for our other resolutions, it’s interesting to note that the gains at both 5760 and 1920 with MSAA are actually greater than at 2560. As we mentioned before, Far Cry 3 is somewhat demanding on the CPU side of things, so Titan may not be fully stretching its legs at 2560. In that case Titan’s true performance gains would be closer to 45-50%.
Moving on to our multi-GPU cards, this is something of a mixed bag. Titan isn’t close to winning, but GTX 690 wins by under 30%, and 7990 by just 17%. This is despite the fact that SLI/CF scaling is as strong as it is. At the same time Far Cry 3 is a good contemporary reminder of just what Titan can excel at: had Titan been out in 2012, it would have been doing roughly this well while NVIDIA would have still been hammering out their SLI profiles for this game. Multi-GPU cards are powerful, but they are forever reliant on waiting for profiles to unlock their capabilities.
Our final action game of our benchmark suite is Battlefield 3, DICE’s 2011 multiplayer military shooter. Its ability to pose a significant challenge to GPUs has been dulled some by time and drivers, but it’s still a challenge if you want to hit the highest settings at the highest resolutions at the highest anti-aliasing levels. Furthermore while we can crack 60fps in single player mode, our rule of thumb here is that multiplayer framerates will dip to half our single player framerates, so hitting high framerates here may not be high enough.
AMD and NVIDIA have gone back and forth in this game over the past year, and as of late NVIDIA has held a very slight edge with the GTX 680. That means Titan has ample opportunity to push well past the 7970GE, besting AMD’s single-GPU contender by 52% at 2560. Even the GTX 680 is left well behind, with Titan clearing it by 48%.
This is enough to get Titan to 74fps at 2560 with 4xMSAA, which is just fast enough to make BF3 playable at those settings with a single GPU. Otherwise by the time we drop to 1920, even the 120Hz gamers should be relatively satisfied.
Moving on, as always multi-GPU cards end up being faster, but not necessarily immensely so. 22% for the GTX 690 and just 12% for the 7990 are smaller leads than we’ve seen elsewhere.
Our final game, Civilization V, gives us an interesting look at things that other RTSes cannot match, with a much weaker focus on shading in the game world and a much greater focus on creating the geometry needed to bring such a world to life. In doing so it uses a slew of DirectX 11 technologies, including tessellation for said geometry, driver command lists for reducing CPU overhead, and compute shaders for on-the-fly texture decompression.
Maxing out at 2560, even with everything turned up none of our high-end cards have a problem here. Somewhat surprisingly we’re not completely CPU limited here, but even at 2560 everything north of the GTX 580 gets 60fps.
Nevertheless Titan completely clobbers the competition on our final game, delivering 60% better performance than both the GTX 680 and 7970GE. Even the 7990 can at best tie Titan here, giving us a case for when one GK110 is as good as two Tahiti GPUs. It’s not clear what exactly Civ V favors about Titan, but it’s clearly something that makes Titan different from GTX 680 and other GK104 cards.
As always we’ll also take a quick look at synthetic performance to get a better look at Titan’s underpinnings. These tests are mostly for comparing cards from within a manufacturer, as opposed to directly comparing AMD and NVIDIA cards. We’ll start with 3DMark Vantage’s Pixel Fill test.
Pixel fill is a mix of a ROP test and a test to see if you have enough bandwidth to feed those ROPs. At the same time the smallest increase in theoretical performance for Titan over GTX 680 was in ROP performance, where a 50% increase in ROPs was met with a minor clockspeed reduction for a final increase in ROP performance of 25%.
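As a quick sanity check, the 25% figure can be reproduced from the unit counts and the cards’ reference base clocks (1006MHz for the GTX 680, 837MHz for Titan):

```python
# Theoretical ROP throughput scales with ROP count * clockspeed.
# Clocks here are the reference base clocks for each card.
def rop_throughput(rops, clock_mhz):
    """Peak pixel fillrate in Mpixels/s."""
    return rops * clock_mhz

gtx680 = rop_throughput(32, 1006)
titan = rop_throughput(48, 837)

print(f"Theoretical ROP gain: {titan / gtx680 - 1:.0%}")  # ~25%
```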
The end result is that with a gain of 28%, Titan’s lead over the GTX 680 is just a hair more than its increase in theoretical ROP performance. Consequently, at first glance it looks like Titan has enough memory and cache bandwidth to feed its 48 ROPs, which is good news given that GDDR5 has very nearly run out of clockspeed headroom.
Moving on, we have our 3DMark Vantage texture fillrate test, which does for texels and texture mapping units what the previous test does for ROPs.
Oddly enough, despite the fact that Titan’s theoretical texture performance improvement over the GTX 680 is only on the order of 46%, here Titan measures as having 62% more texturing performance. This may be how Titan interplays with its improved bandwidth, or it may be a case where some of the ancillary changes NVIDIA made to the texture paths for compute are somehow also beneficial to proper texturing performance.
Finally we’ll take a quick look at tessellation performance with TessMark.
Unsurprisingly, Titan is well ahead of anything else NVIDIA produces. At 49% faster it’s just a bit over the 46% theoretical performance improvement we would expect from the increased number of Polymorph Engines the extra 6 SMXes bring. Interestingly, as fast as GTX 580’s tessellation performance was, these results would indicate that Titan offers more than a generational jump in tessellation performance, nearly tripling GTX 580’s tessellation performance. Though at this time it’s not at all clear just what such tessellation performance is good for, as we seem to be reaching increasingly ridiculous levels.
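The 46% theoretical figure falls out of the SMX counts: each Kepler SMX carries one PolyMorph Engine, so Titan’s 14 SMXes against the GTX 680’s 8 give a 75% unit increase, pared back by the lower base clock (837MHz vs 1006MHz):

```python
# Each Kepler SMX contains one PolyMorph (geometry/tessellation) engine,
# so theoretical tessellation throughput scales with SMX count * clock.
gtx680_smx, gtx680_clock = 8, 1006  # reference base clock, MHz
titan_smx, titan_clock = 14, 837

gain = (titan_smx * titan_clock) / (gtx680_smx * gtx680_clock) - 1
print(f"Theoretical tessellation gain: {gain:.0%}")  # ~46%
```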
Power, Temperature, & Noise
Last but certainly not least, we have our obligatory look at power, temperature, and noise. Next to price and performance, these are some of the most important aspects of a GPU, due in large part to the impact of noise. All things considered, a loud card is undesirable unless there’s a sufficiently good reason to put up with the noise.
It’s for that reason that GPU manufacturers also seek to keep power usage down, and under normal circumstances there’s a pretty clear relationship between power consumption, heat generated, and the amount of noise the fans will generate to remove that heat. At the same time however this is an area that NVIDIA is focusing on for Titan, as a premium product means they can use premium materials, going above and beyond what more traditional plastic cards can do for noise dampening.
|GeForce GTX Titan Voltages|
|Titan Max Boost||Titan Base||Titan Idle|
Stopping quickly to take a look at voltages, Titan’s peak stock voltage is at 1.162v, which correlates to its highest speed bin of 992MHz. As the clockspeeds go farther down these voltages drop, to a load low of 0.95v at 744MHz. This ends up being a bit less than the GTX 680 and most other desktop Kepler cards, which go up just a bit higher to 1.175v. Since NVIDIA is classifying 1.175v as an “overvoltage” on Titan, it looks like GK110 isn’t going to be quite as tolerant of voltages as GK104 was.
|GeForce GTX Titan Average Clockspeeds|
|Max Boost Clock||992MHz|
|Far Cry 3||979MHz|
One thing we quickly notice about Titan is that, thanks to GPU Boost 2.0’s shift from a primarily power based boost system to a temperature based one, Titan hits its maximum speed bin far more often and sustains it longer. This is especially true since there’s no longer a concept of a power target with Titan; any power limits are based entirely on TDP. Half of our games have an average clockspeed of 992MHz, or in other words never triggered a power or thermal condition that would require Titan to scale back its clockspeed. For the rest of our tests the worst average clockspeed was all of 2 bins (26MHz) lower at 966MHz, the result of hitting a mix of thermal and power limits.
On a side note, it’s worth pointing out that these are well in excess of NVIDIA’s official boost clock for Titan. With Titan boost bins being based almost entirely on temperature, the average boost speed for Titan is going to be more dependent on environment (intake) temperatures than GTX 680 was, so our numbers are almost certainly a bit higher than what one would see in a hotter environment.
Starting as always with a look at power, there’s nothing particularly out of the ordinary here. AMD and NVIDIA have become very good at managing idle power through power gating and other techniques, and as a result idle power has come down by leaps and bounds over the years. At this point we still typically see some correlation between die size and idle power, but that’s a few watts at best. So at 111W at the wall, Titan is up there with the best cards.
Moving on to our first load power measurement, as we’ve dropped Metro 2033 from our benchmark suite we’ve replaced it with Battlefield 3 as our game of choice for measuring peak gaming power consumption. BF3 is a difficult game to run, but overall it presents a rather typical power profile which of all the games in our benchmark suite makes it one of the best representatives.
In any case, as we can see Titan’s power consumption comes in below all of our multi-GPU configurations, but higher than any other single-GPU card. Titan’s 250W TDP is 55W higher than GTX 680’s 195W TDP, and with a 73W difference at the wall this isn’t too far off. A bit more surprising is that it’s drawing nearly 50W more than our 7970GE at the wall, given the fact that we know the 7970GE usually gets close to its TDP of 250W. At the same time since this is a live game benchmark, there are more factors than just the GPU in play. Generally speaking, the higher a card’s performance here, the harder the rest of the system will have to work to keep said card fed, which further increases power consumption at the wall.
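To illustrate why the at-the-wall delta exceeds the TDP delta, here is a rough accounting sketch. The ~88% PSU efficiency is our assumption for illustration, not a measured figure; the leftover watts would represent the increased CPU/platform load described above.

```python
# Wall power = DC load / PSU efficiency, so a 55W difference in GPU power
# draw shows up as more than 55W at the wall; any remainder beyond that
# is attributable to increased CPU/platform load feeding the faster card.
PSU_EFFICIENCY = 0.88  # assumed value for a quality PSU at this load

tdp_delta_dc = 250 - 195      # Titan TDP minus GTX 680 TDP, watts
wall_delta_measured = 73      # watts, measured at the wall

gpu_share_at_wall = tdp_delta_dc / PSU_EFFICIENCY        # ~62.5W
platform_share = wall_delta_measured - gpu_share_at_wall  # ~10.5W
print(f"GPU: ~{gpu_share_at_wall:.1f}W, rest of system: ~{platform_share:.1f}W")
```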
Moving to Furmark our results keep the same order, but the gap between the GTX 680 and Titan widens, while the gap between Titan and the 7970GE narrows. Titan and the 7970GE shouldn’t be too far apart from each other in most situations due to their similar TDPs (even if NVIDIA and AMD TDPs aren’t calculated in quite the same way), so in a pure GPU power consumption scenario this is what we would expect to see.
Titan for its part is the traditional big NVIDIA GPU, and while NVIDIA does what they can to keep it in check, at the end of the day it’s still going to be among the more power hungry cards in our collection. Power consumption itself isn’t generally a problem with these high end cards so long as a system has the means to cool it and doesn’t generate much noise in doing so.
Moving on to temperatures, for a single card idle temperatures should be under 40C for anything with at least a decent cooler. Titan for its part is among the coolest at 30C; its large heatsink combined with its relatively low idle power consumption makes it easy to cool here.
Because Titan’s boost mechanisms are now temperature based, Titan’s temperatures are going to naturally gravitate towards its default temperature target of 80C as the card raises and lowers clockspeeds to maximize performance while keeping temperatures at or under that level. As a result just about any heavy load is going to see Titan within a couple of degrees of 80C, which makes for some very predictable results.
Looking at our other cards, while the various NVIDIA cards are still close in performance the 7970GE ends up being quite a bit cooler due to its open air cooler. This is typical of what we see with good open air coolers, though with NVIDIA’s temperature based boost system I’m left wondering if perhaps those days are numbered. So long as 80C is a safe temperature, there’s little reason not to gravitate towards it with a system like NVIDIA’s, regardless of the cooler used.
With Furmark we see everything pull closer together as Titan holds fast at 80C while most of the other cards, especially the Radeons, rise in temperature. At this point Titan is clearly cooler than a GTX 680 SLI, 2C warmer than a single GTX 680, and still a good 10C warmer than our 7970GE.
Just as with the GTX 690, one of the things NVIDIA focused on was construction choices and materials to reduce noise generated. So long as you can keep noise down, then for the most part power consumption and temperatures don’t matter.
Simply looking at idle shows that NVIDIA is capable of delivering on their claims. 37.8dB is the quietest actively cooled high-end card we’ve measured yet, besting even the luxury GTX 690, and the also well-constructed GTX 680. Though really with the loudest setup being all of 40.5dB, none of these setups is anywhere near loud at idle.
It’s with load noise that we finally see the full payoff of Titan’s build quality. At 51dB it’s only marginally quieter than the GTX 680, but as we recall from our earlier power data, Titan is drawing nearly 70W more than GTX 680 at the wall. In other words, despite the fact that Titan is drawing significantly more power than GTX 680, it’s still as quiet as or quieter than the aforementioned card. This coupled with Titan’s already high performance is Titan’s true power in NVIDIA’s eyes; it’s not just fast, but despite its speed and despite its TDP it’s as quiet as any other blower based card out there, allowing them to get away with things such as Tiki and tri-SLI systems with reasonable noise levels.
Much like what we saw with temperatures under Furmark, noise under Furmark has our single-GPU cards bunching up. Titan goes up just enough to tie GTX 680 in our pathological scenario, meanwhile our multi-GPU cards start shooting up well past Titan, while the 7970GE jumps up to just shy of Titan. This is a worst case scenario, but it’s a good example of how GPU Boost 2.0’s temperature functionality means that Titan quite literally keeps its cool and thereby keeps its noise in check.
Of course we would be remiss not to point out that in all these scenarios the open air cooled 7970GE is still quieter, in our gaming scenario actually by quite a bit. Not that Titan is loud, but it doesn’t compare to the 7970GE. Ultimately we arrive at the age old debate between blowers and open air coolers; open air coolers are generally quieter, but blowers allow for more flexibility in product design, and are more lenient with cases with poor airflow.
Ultimately Titan is a blower so that NVIDIA can do concept PCs like Tiki, which is something an open air cooler would never be suitable for. For DIY builders the benefits may not be as pronounced, but this is also why NVIDIA is focusing so heavily on boutique systems where the space difference really matters. Whereas realistically speaking, AMD’s best blower-capable card is the vanilla 7970, a less power hungry but also much less powerful card.
Bringing things to a close, most of what we’ve seen with Titan has been a long time coming. Since the introduction of GK110 back at GTC 2012, we’ve had a solid idea of how NVIDIA’s grandest GPU would be configured, and it was mostly a question of when it would make its way to consumer hands, and at what clockspeeds and prices.
The end result is that with the largest Kepler GPU now in our hands, the performance situation closely resembles the Fermi and GT200 generations. Which is to say that so long as you have a solid foundation to work from, he who builds the biggest GPU builds the most powerful GPU. And at 551mm2, once more NVIDIA is alone in building massive GPUs.
No one should be surprised then when we proclaim that GeForce GTX Titan has unquestionably reclaimed the single-GPU performance crown for NVIDIA. It’s simply in a league of its own right now, reaching levels of performance no other single-GPU card can touch. At best, at its very best, AMD’s Radeon HD 7970GE can just match Titan, which is quite an accomplishment for AMD, but then at Titan’s best it’s nearly a generation ahead of the 7970GE. Like its predecessors, Titan delivers the kind of awe-inspiring performance we have come to expect from NVIDIA’s most powerful video cards.
With that in mind, as our benchmark data has shown, Titan’s performance isn’t quite enough to unseat this generation’s multi-GPU cards like the GTX 690 or Radeon HD 7990. But with that said this isn’t a new situation for us, and we find our editorial stance has not changed: we still suggest single-GPU cards over multi-GPU cards when performance allows for it. Multi-GPU technology itself is a great way to improve performance beyond what a single GPU can do, but as it’s always beholden to the need for profiles and the inherent drawbacks of AFR rendering, we don’t believe it’s desirable in situations such as Titan versus the GTX 690. The GTX 690 may be faster, but Titan is going to deliver a more consistent experience, just not quite at the same framerates as the GTX 690.
Meanwhile in the world of GPGPU computing Titan stands alone. Unfortunately we’re not able to run a complete cross-platform comparison due to Titan’s outstanding OpenCL issue, but from what we have been able to run Titan is not only flat-out powerful, but NVIDIA has seemingly delivered on their compute efficiency goals, giving us a Kepler family part capable of getting far closer to its theoretical efficiency than GTX 680, and closer than any other GPU before it. We’ll of course be taking a further look at Titan in comparison to other GPUs once the OpenCL situation is resolved in order to come to a better understanding of its relative strengths and weaknesses, but for the first wave of Titan buyers I’m not sure that’s going to matter. If you’re doing GPU computing, are invested in CUDA, and need a fast compute card, then Titan is the compute card CUDA developers and researchers have been dreaming of.
Back in the land of consumer gaming though, we have to contend with the fact that unlike any big-GPU card before it, Titan is purposely removed from the price/performance curve. NVIDIA has long wanted to ape Intel’s ability to have an extreme/luxury product at the very top end of the consumer product stack, and with Titan they’re going ahead with that.
The end result is that Titan is targeted at a different demographic than GTX 580 or other such cards, a demographic that has the means and the desire to purchase such a product. Being used to seeing the best video cards go for less we won’t call this a great development for the competitive landscape, but ultimately this is far from the first luxury level computer part, so there’s not much else to say other than that this is a product for a limited audience. But what that limited audience is getting is nothing short of an amazing card.
Like the GTX 690, NVIDIA has once again set the gold standard for GPU construction, this time for a single-GPU card. GTX 680 was a well-built card, but next to Titan it suddenly looks outdated. For example, despite Titan’s significantly higher TDP it’s no louder than the GTX 680, and the GTX 680 was already a quiet card. Next to price/performance the most important metric is noise, and by focusing on build quality NVIDIA has unquestionably set the new standard for high-end, high-TDP video cards.
On a final note, normally I’m not one for video card gimmicks, but after having seen both of NVIDIA’s Titan concept systems I have to say NVIDIA has taken an interesting route in justifying the luxury status of Titan. With the Radeon HD 7970 GHz Edition only available with open air or exotic cooling, Titan has been put into a position where it’s the ultimate blower card by a wide margin. The end result is that in scenarios where blowers are preferred and/or required, such as SFF PCs or tri-SLI, Titan is even more of an improvement over the competition than it is for traditional desktop computers. Or as Anand has so eloquently put it with his look at Falcon Northwest’s Tiki, when it comes to Titan “The days of a high end gaming rig being obnoxiously loud are thankfully over.”
Wrapping things up, on Monday we’ll be taking a look at the final piece of the puzzle: Origin’s tri-SLI full tower Genesis PC. The Genesis has been an interesting beast for its use of water cooling with Titan, and with the Titan launch behind us we can now focus on what it takes to feed 3 Titan video cards and why it’s an impeccable machine for multi-monitor/surround gaming. So until then, stay tuned.