Titan’s Compute Performance, Cont.

With Rahul having covered the basis of Titan’s strong compute performance, let’s shift gears a bit and take a look at real world usage.

On top of Rahul’s work with Titan, as part of our 2013 GPU benchmark suite we put together a larger set of compute benchmarks to try to cover real-world usage, including the old standards of gaming usage (Civilization V) and ray tracing (LuxMark), along with several new tests. Unfortunately that effort was cut short when we discovered that OpenCL support is currently broken in the press drivers, which prevents us from using several of our tests. We still have our CUDA and DirectCompute benchmarks to look at, but a full look at Titan’s compute performance on our 2013 GPU benchmark suite will have to wait for another day.

For their part, NVIDIA of course already has OpenCL working on GK110 with Tesla. The issue is that somewhere between that and bringing up GK110 for Titan by integrating it into NVIDIA’s mainline GeForce drivers (specifically the new R314 branch), OpenCL support was broken. Since the underlying support already exists on Tesla, we expect this will be fixed in short order, but it’s not something NVIDIA checked for ahead of the press launch of Titan, and it’s not something they could fix in time for today’s article.

Unfortunately this means that comparisons with Tahiti will be few and far between for now. Most significant cross-platform compute programs are based on OpenCL rather than DirectCompute, so short of games and a couple of other cases such as Ian’s C++ AMP benchmark, we don’t have many cross-platform benchmarks to look at. With that out of the way, let’s dive into our condensed collection of compute benchmarks.

We’ll once more start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the few games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Note that for 2013 we have changed the benchmark a bit, moving from using a single leader to using all of the leaders. As a result the reported numbers are higher, but they are also not comparable with this benchmark’s results in our 2012 datasets.

Civilization V launched in 2010, and graphics cards have become significantly more powerful since then, far outpacing growth in the CPUs that feed them. As a result we’ve rather quickly drifted from being GPU bottlenecked to being CPU bottlenecked, as we see both in our Civ V game benchmarks and our DirectCompute benchmarks. For high-end GPUs the performance difference is rather minor; the gap between GTX 680 and Titan for example is 45fps, or just less than 10%. Still, it’s at least enough to get Titan past the 7970GE in this case.
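
To put the 45fps figure in context, the short Python sketch below works backwards from the two numbers quoted above (a 45fps gap that is just under 10%) to the scores they imply. The function name is hypothetical and the result is only a bound inferred from the text, not a measured score.

```python
def implied_baseline_fps(gap_fps, percent_ceiling):
    """Lowest baseline score for which gap_fps is still under percent_ceiling percent."""
    # gap / baseline < percent / 100  implies  baseline > gap * 100 / percent
    return gap_fps * 100 / percent_ceiling


# Figures quoted in the text: a 45fps gap that is "just less than 10%".
gtx680_lower_bound = implied_baseline_fps(45, 10)
titan_lower_bound = gtx680_lower_bound + 45

print(f"GTX 680 scores just above {gtx680_lower_bound:.0f}fps")
print(f"Titan scores just above {titan_lower_bound:.0f}fps")
```

At frame rates in the hundreds, a gap this small is what we would expect from a test that is largely CPU bound.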

Our second test is one of our new tests, utilizing Elcomsoft’s Advanced Office Password Recovery utility to take a look at GPU password generation. AOPR has separate CUDA and OpenCL kernels for NVIDIA and AMD cards respectively, which means it doesn’t follow the same code path on all GPUs, but it does use an optimal path for each GPU it can handle. Unfortunately we’re having trouble getting it to recognize AMD 7900 series cards in this build, so we only have CUDA cards for the time being.

Password generation and other forms of brute force crypto are an area where the GTX 680 is particularly weak, thanks to the various compute aspects that have been stripped out in the name of efficiency. As a result it ends up below even the GTX 580 in these benchmarks, never mind AMD’s GCN cards. But with Titan/GK110 offering NVIDIA’s full compute performance, it rips through this task. In fact it more than doubles performance from both the GTX 680 and the GTX 580, indicating that the huge performance gains we’re seeing are coming from not just the additional functional units, but from architectural optimizations and new instructions that improve overall efficiency and reduce the number of cycles needed to complete work on a password.
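
As a rough check on that reasoning for the GTX 680, the sketch below estimates how much of the speedup raw unit count and clock speed could explain on their own. The core counts and base clocks are assumptions taken from the public spec sheets rather than from this article (GTX 680: 1536 CUDA cores at 1006MHz; Titan: 2688 CUDA cores at 837MHz), boost clocks are ignored, and the function name is hypothetical.

```python
def unit_scaling(cores_new, mhz_new, cores_old, mhz_old):
    """Speedup expected from core count and clock speed alone."""
    return (cores_new * mhz_new) / (cores_old * mhz_old)


# Assumed public specs (base clocks), not figures from this article.
scaling = unit_scaling(2688, 837, 1536, 1006)

# The text says Titan "more than doubles" GTX 680, so 2.0 is a lower bound.
observed_minimum = 2.0
unexplained = observed_minimum / scaling

print(f"Expected from units and clocks alone: {scaling:.2f}x")
print(f"Left for architecture and instructions: at least {unexplained:.2f}x")
```

Under those assumptions the extra cores account for roughly a 1.46x gain, which leaves at least a further 1.37x to be explained by per-unit efficiency.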

Altogether at 33K passwords/second Titan is not just faster than GTX 680, but it’s faster than GTX 690 and GTX 680 SLI, making this a test where one big GPU (and its full compute performance) is better than two smaller GPUs. It will be interesting to see where the 7970 GHz Edition and other Tahiti cards place in this test once we can get them up and running.
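
For a sense of what 33K passwords/second means in practice, here is a minimal sketch of the brute force arithmetic. The alphabet size and password length are purely illustrative assumptions, the function names are hypothetical, and nothing here models how AOPR itself generates or tests candidates.

```python
TITAN_RATE = 33_000  # passwords per second, the Titan figure from the text


def keyspace(alphabet_size, length):
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


def hours_to_exhaust(alphabet_size, length, rate=TITAN_RATE):
    """Worst-case hours needed to try every password of that length."""
    return keyspace(alphabet_size, length) / rate / 3600


# Illustrative case: 6 lowercase letters (26-character alphabet).
candidates = keyspace(26, 6)
hours = hours_to_exhaust(26, 6)

print(f"{candidates:,} candidates, about {hours:.1f} hours at 33K/sec")
```

Because the keyspace grows exponentially with length, each extra character multiplies the search time by the alphabet size, so doubling throughput buys well under one additional character.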

Our final test in our abbreviated compute benchmark suite is our very own Dr. Ian Cutress’s SystemCompute benchmark, which is a collection of several different fundamental compute algorithms. Rahul went into greater detail on this back in his look at Titan’s compute performance, but I wanted to go over it again quickly with the full lineup of cards we’ve tested.

Surprisingly, for all of its performance gains relative to GTX 680, Titan still falls notably behind the 7970GE here. Given Titan’s theoretical performance and the fundamental nature of this test we would have expected it to do better. But without additional cross-platform tests it’s hard to say whether this is something where AMD’s GCN architecture continues to shine over Kepler, or if perhaps it’s a weakness in NVIDIA’s current DirectCompute implementation for GK110. Time will tell on this one, but in the meantime this is the first solid sign that Tahiti may be more of a match for GK110 than it’s typically given credit for.
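
To show what "theoretical performance" means here, the sketch below computes peak single-precision throughput in the usual way, as cores times clock times two operations per clock for a fused multiply-add. The specs are assumptions from the public spec sheets rather than from this article (Titan: 2688 cores at an 837MHz base clock; 7970GE: 2048 stream processors at a 1000MHz base clock), and boost clocks would shift both figures somewhat.

```python
def peak_gflops(cores, clock_mhz, flops_per_clock=2):
    """Peak FP32 GFLOPS, counting one fused multiply-add (2 FLOPs) per core per clock."""
    return cores * clock_mhz * flops_per_clock / 1000


# Assumed public specs (base clocks), not figures from this article.
titan = peak_gflops(2688, 837)
hd7970ge = peak_gflops(2048, 1000)

print(f"Titan:  {titan:.0f} GFLOPS")
print(f"7970GE: {hd7970ge:.0f} GFLOPS")
print(f"Titan's paper advantage: {titan / hd7970ge:.2f}x")
```

On paper Titan holds an advantage of roughly 10% at base clocks, which is why a loss in a fundamental compute test stands out.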


336 Comments


  • PEJUman - Thursday, February 21, 2013

    Made me wonder:
    7970 - 4.3B trans. - $500 - OK compute, 100% gaming perf.
    680 - 3.5B trans. - $500 - sucky compute, 100% gaming perf.
    Titan - 7.1B trans - $1000 - OK compute, ~140% gaming perf.

    1. Does compute capability really take that many more transistors to build? As in, 2x trans. only yields ~140% gaming perf.
    I think this was a conscious decision by nVidia to focus on compute and the required profit margin to sustain R&D.

    2. Despite the die size shrink, I'm guessing it would be harder to have functional silicon as the process shrinks, i.e. finding 100mm^2 of functional silicon @ 40nm is easier than @ 28nm, from the standpoint that more transistors are packed into the same area. Which I think is why they designed in 15 SMXs.
    Thus it'd be more expensive for nVidia to build same area at 28 vs. 40 nm... at least until the process matures, but at 7B I doubt it will ever be attainable.

    3. The AMD statement on no updates to 7970 essentially sealed the $1000 price for titan. I would bet if AMD announced 8970, Titan would be priced at $700 today, with 3GB memory.
  • JarredWalton - Thursday, February 21, 2013

    Luxury GPU is no more silly than Extreme CPUs that cost $1000 each. And yet, Intel continues to sell those, and what's more the performance offered by Titan is a far better deal than the performance offered by a $1000 CPU vs. a $500 CPU. Then there's the Tesla argument: it's a $3500 card for the K20 and this is less than a third that price, with the only drawbacks being no ECC and no scalability beyond three cards. For the Quadro crowd, this might be a bargain at $1000 (though I suspect Titan won't get the enhanced Quadro drivers, so it's mostly a compute Tesla alternative).
  • chizow - Friday, February 22, 2013

    The problem with this analogy, which I'm sure was floated around Nvidia's Marketing board room in formulating the plan for Titan, is that Intel offers viable alternative SKUs based on the same ASIC. Sure there are the few who will buy the Intel EE CPU (3970K) for $1K, but the overwhelming majority in that high-end market would rather opt for the $500 option (3930K) or $300 option (3820).

    Extend this to the GPU market and you see Nvidia clearly withheld GK100/GK110 as the flagship part for over a year, and instead of offering a viable SKU for traditional high-end market segments based on this ASIC, they created a NEW ultra-premium market. That's the ONLY reason Titan looks better compared to GK104 than Intel's $1K and $500 options, because Nvidia's offerings are truly different classes while Intel's differences are minor binning and multiplier locked parts with a bigger black box.
  • mlambert890 - Saturday, February 23, 2013

    The analogy is fine, you're just choosing to not see it.

    Everything you said about Intel EE vs standard directly applies here.

    You are assuming that the Intel EE parts are nothing more than a marketing ploy, which is wrong, while at the same time assuming that the Titan is orders of magnitude beyond the 680 which is also wrong.

    You're seeing it from the point of view of someone who buys the cheapest Intel CPU, overclocks it to the point of melting, and then feels they have a solution "just as good if not better" than the Intel EE.

    Because the Titan has unlocked stream procs that the 680 lacks, and there is no way to "overclock" your way around missing SPs, you feel that NVidia has committed some great sin.

    The reality is that the EE procs give out of box performance that is superior to out of box performance of the lesser SKUs by a small, but appreciable, margin. In addition, they are unlocked, and come from a better bin, which means they will overclock *even better* than the lesser SKUs. Budget buyers never want to admit this, but it is reality in most cases. Yes you can get a "lucky part" from the lesser SKU that achieves a 100% overclock, but this is an anomaly. Most who criticize the EE SKUs have never even come close to owning one.

    Similarly, the Titan offers a small, but appreciable, margin of performance over the 680. It allows you to wait longer before going SLI. The only difference is you don't get the "roll of the dice" shot at a 680 that *might* be able to appear to match a Titan, since the SPs aren't there.

    The analogy is fine, it's just that biased perspective prevents some from seeing it.
  • chizow - Saturday, February 23, 2013

    Well you obviously have trouble comprehending analogies if you think 3.6B difference in transistors and ~40% difference in performance is analogous to 3MB L3 cache, an unlocked multiplier and 5% difference in performance.

    But I guess that's the only way you could draw such an asinine parallel as this:

    "Similarly, the Titan offers a small, but appreciable, margin of performance over the 680."

    It's the only way your ridiculous analogy to Intel's EE could possibly hold true, when in reality, it couldn't be further from the truth. Titan holds a huge advantage over GTX 680, but that's expected, it's a completely different class of GPU whereas the 3930K and 3960X are cut from the exact same wafer.
  • CeriseCogburn - Sunday, February 24, 2013

    There was no manufacturing capacity you IDIOT LIAR.
    The 680 came out 6 months late, and amd BARELY had 79xx's on the shelves till a day before that.

    Articles were everywhere pointing out nVidia did not have reserve die space as the crunch was extreme, and the ONLY factory was in the process of doing a multi-billion dollar build out to try to keep up with bare minimum demand.

    Now we've got a giant GPU core with perhaps 100 attempted dies per wafer, with a not high yield, YET YOU'RE A LIAR NONETHELESS.
  • chizow - Sunday, February 24, 2013

    It has nothing to do with manufacturing capacity, it had everything to do with 7970's lackluster performance and high price tag.

    GTX 680 was only late (by 3, not 6 months) because Nvidia was too busy re-formulating their high-end strategy after seeing 7970 outperform GTX 580 by only 15-20% but asking 10% higher price. Horrible price:performance metric for a new generation GPU on a new process node.

    This gave Nvidia the opportunity to:

    1) Position mid-range ASIC GK104 as flagship GTX 680 and still beat the 7970.
    2) Push back and most importantly, re-spin GK100 and refine it to be GK110.
    3) Screw their long-time customers and AMD/AMD fans in the process.
    4) Profit.

    So instead of launching and mass-producing their flagship ASIC first (GK100) as they've done in every single previous generation and product launch, they shifted their production allocation at TSMC to their mid-range ASIC, GK104 instead.

    Once GK110 was ready, they've had no problem churning them out, even the mfg date of these TITAN prove this point as week 31 chips are somewhere in the July-August time frame. They were able to deliver some 19,000 K20X units to ORNL for the real TITAN in October 2012. Coupled with the fact they're using ASICs with the same number of functional units for GTX Titanic, it goes to show yields are pretty good.

    But the real conclusion to be drawn from this is that other SKUs based on GK110 are coming. There's no way GK110 wafer yields are anywhere close to 100% for 15 SMX ASICs. I fully expect a reduced SMX unit, maybe 13 with 2304SP as originally rumored, to show its face as the GTX 780 with a bunch of GK114 refreshes behind it to fill out the line-up.

    The sooner people stop overpaying for TITAN, the sooner we'll see the GTX 700 series, imo, but with no new AMD GPUs on the horizon we may be waiting awhile.
  • CeriseCogburn - Sunday, February 24, 2013

    Chizow I didn't read your stupid long post except for your stupid 1st line.

    you're a brainwashed lying sack of idiocy, so maybe i'll waste my time reading your idiotic lies, and maybe not, since your first line is the big fat frikkin LIE you HAVE TO BELIEVE that you made up in your frikkin head, in order to take your absolutely FALSE STANCE for the past frikkin nearly year now.
  • chizow - Monday, February 25, 2013

    You should read it, you might learn something.

    Until then stfd, stfu, and gfy.
  • CeriseCogburn - Sunday, February 24, 2013

    Dear Jeff, a GPU that costs $400 dollars is a luxury GPU.

    I'm not certain you disagree with that, I'd just like to point out the brainless idiots pretending $1000 for a GPU is luxury and $250 is not are clueless.