Compute

Jumping into pure compute performance, we’re going to have several new factors influencing the 290X as compared to the 280X. On the front end 290X/Hawaii has those 8 ACEs versus 280X/Tahiti’s 2 ACEs, potentially allowing 290X to queue up a lot more work and to keep itself better fed as a result; though in practice we don’t expect most workloads to be able to put the additional ACEs to good use at the moment. Meanwhile on the back end 290X has that 11% memory bandwidth boost and the 33% increase in L2 cache, which in compute workloads can be largely dedicated to said computational work. On the other hand 290X takes a hit to its double precision floating point (FP64) rate versus 280X, so in double precision scenarios it’s certainly going to enter with a larger handicap.
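The headline differences can be summarized with a quick back-of-the-envelope comparison. The figures below are from AMD's published specifications for the two cards; treat this as a sketch of the deltas discussed above rather than any kind of performance model:

```python
# Back-of-the-envelope spec comparison: 290X (Hawaii) vs. 280X (Tahiti).
# Figures are AMD's published specs; real-world scaling depends on workload.
specs = {
    "280X": {"aces": 2, "l2_kb": 768,  "mem_bw_gbps": 288},
    "290X": {"aces": 8, "l2_kb": 1024, "mem_bw_gbps": 320},
}

for key in ("aces", "l2_kb", "mem_bw_gbps"):
    old, new = specs["280X"][key], specs["290X"][key]
    print(f"{key}: {old} -> {new} ({(new / old - 1) * 100:+.0f}%)")
```

Running through the arithmetic reproduces the 33% L2 cache and 11% memory bandwidth increases cited above, alongside the 4x increase in ACEs.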

As always we'll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the only games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Unfortunately Civ V can’t tell us much of value, as we’re running into CPU bottlenecks, not to mention increasingly absurd frame rates. In the 3 years since the game was released, high-end CPUs have become around 20% faster per core, whereas GPUs are easily 150% faster (if not more). As a result the GPU portion of the texture decompression process has started to outpace the CPU portion, though this remains an enlightening benchmark for anything less than a high-end video card.

For what it’s worth, the 290X can edge out the GTX 780 here, only to fall to GTX Titan. But in these CPU-limited scenarios the behavior at the very top can be increasingly inconsistent.

Our next benchmark is LuxMark 2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as ray tracing maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

LuxMark by comparison is very simple and very scalable. With 290X packing a significant increase in computational resources, it picks up where 280X left off and tops the chart for AMD once more. Titan is barely half as fast here, and GTX 780 falls back even further. That said, the fact that scaling from 280X to 290X is only 16% – a bit less than half of the increase in CUs – is surprising at first glance. Even given the relatively simplistic nature of the benchmark, it has shown signs in the past of craving memory bandwidth, and this seems to be one of those times: feeding those CUs with new rays takes everything the 320GB/sec memory bus of the 290X can deliver, putting a cap on performance gains versus the 280X.
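The bandwidth-bound interpretation can be sanity-checked with simple arithmetic. The CU counts (44 for Hawaii, 32 for Tahiti) and bandwidth figures are from AMD's specs; the 16% figure is our measured LuxMark result:

```python
# Why 16% scaling is closer to the bandwidth delta than the compute delta.
# CU counts and bandwidth figures are from AMD's specs; 16% is the measured
# LuxMark scaling. This is a sanity check, not a performance model.
cu_280x, cu_290x = 32, 44
bw_280x, bw_290x = 288, 320  # GB/sec

compute_gain = cu_290x / cu_280x - 1    # +37.5% more CUs
bandwidth_gain = bw_290x / bw_280x - 1  # +11.1% more bandwidth
observed_gain = 0.16                    # measured LuxMark scaling

print(f"CU gain:        {compute_gain:.1%}")
print(f"Bandwidth gain: {bandwidth_gain:.1%}")
print(f"Observed gain:  {observed_gain:.1%}")
```

The observed gain lands between the two deltas but far closer to the bandwidth increase than the CU increase, consistent with the benchmark being memory bandwidth bound on 290X.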

Our 3rd compute benchmark is Sony Vegas Pro 12, an OpenGL and OpenCL video editing and authoring package. Vegas can use GPUs in a few different ways, the primary uses being to accelerate the video effects and compositing process itself, and in the video encoding step. With video encoding being increasingly offloaded to dedicated DSPs these days we’re focusing on the editing and compositing process, rendering to a low CPU overhead format (XDCAM EX). This specific test comes from Sony, and measures how long it takes to render a video.

Vegas is another title where GPU performance gains are outpacing CPU performance gains; as a result, earlier GPU offloading work has reached its limits, and the program has once again become CPU limited. It’s a shame GPUs have historically underdelivered on video encoding (as opposed to video rendering), as wringing significantly more performance out of Vegas will require eliminating this next great CPU bottleneck.

Our 4th benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

Curiously, the 290X’s performance advantage over 280X is unusually dependent on the specific sub-test. The fluid simulation scales decently enough with the additional CUs, but the computer vision benchmark is stuck in the mud as compared to the 280X. The fluid simulation is certainly closer than the vision benchmark to being the type of embarrassingly parallel workload GPUs excel at, though that doesn’t fully explain the lack of scaling in computer vision. If nothing else it’s a good reminder of why professional compute workloads are typically profiled and optimized against specific target hardware, as doing so reduces these kinds of outcomes in complex, interconnected workloads.

Moving on, our 5th compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that has work distributed to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, as Folding @ Home has moved exclusively to OpenCL this year with FAHCore 17.

With FAHBench we’re not fully convinced that it knows how to best handle 290X/Hawaii as opposed to 280X/Tahiti. The scaling in single precision explicit is fairly good, but the performance regression in the water-free (and generally more GPU-limited) implicit simulation is unexpected. Consequently while the results are accurate for FAHCore 17, it’s hopefully something AMD and/or the FAH project can work out now that 290X has been released.

Meanwhile double precision performance also regresses, though here we have a good idea why. With the FP64 rate on 290X being 1/8 the FP32 rate, as opposed to 1/4 on 280X, this is a benchmark 290X can’t win. Though given the theoretical performance differences we should be expecting between the two video cards – 290X should have about 70% of the FP64 performance of 280X – the fact that 290X comes in at 82% bodes well for AMD’s newest GPU. However there’s no getting around the fact that the 290X loses to GTX 780 here even though the GTX 780 is even more harshly capped, which given AMD’s traditional strength in OpenCL compute performance is going to be a let-down.
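The "about 70%" figure falls out of the published shader counts and FP64 rates. A quick sketch, assuming both cards at their peak 1GHz clocks and counting a fused multiply-add as 2 FLOPs:

```python
# Theoretical FP64 throughput from published specs: 2816 shaders on 290X,
# 2048 on 280X, both peaking around 1GHz. FP32 rate is shaders x 2 FLOPs
# (FMA) x clock; FP64 is capped at 1/8 of that on 290X, 1/4 on 280X.
def fp64_gflops(shaders, clock_ghz, fp64_fraction):
    fp32 = shaders * 2 * clock_ghz  # peak FP32 GFLOPS
    return fp32 * fp64_fraction

r290x = fp64_gflops(2816, 1.0, 1 / 8)  # 704 GFLOPS
r280x = fp64_gflops(2048, 1.0, 1 / 4)  # 1024 GFLOPS

print(f"290X: {r290x:.0f} GFLOPS, 280X: {r280x:.0f} GFLOPS")
print(f"Ratio: {r290x / r280x:.3f}")  # ~0.69, hence the 'about 70%' figure
```

That 290X comes in at 82% of 280X in practice, rather than the theoretical ~69%, is what makes the FAHBench double precision result better than it looks on paper.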

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

SystemCompute and the underlying C++ AMP environment scales relatively well with the additional CUs offered by 290X. Not only does the 290X easily surpass the GTX Titan and GTX 780 here, but it does so while also beating the 280X by 18%. Or to use AMD’s older GPUs as a point of comparison, we’re up to a 3.4x improvement over 5870, well above the improvement in CU density alone and another reminder of how AMD has really turned things around on the GPU compute side with GCN.


  • Sandcat - Thursday, October 24, 2013 - link

    Perhaps they knew it was unsustainable from the beginning, but short term gains are generally what motivate managers when they develop pricing strategies, because bonus. Make hay whilst the sun shines, or when AMD is 8 months late.
  • chizow - Saturday, October 26, 2013 - link

    Possibly, but now they have to deal with the damaged goodwill of some of their most enthusiastic, spendy customers. I can't count how many times I've seen it, someone saying they swore off company X or company Y because they felt they got burned/screwed/fleeced by a single transaction. That is what Nvidia will be dealing with going forward with Titan early adopters.
  • Sancus - Thursday, October 24, 2013 - link

    AMD really needs to do better than a response 8 months later to crash anyone's parade. And honestly, I would love to see them put up a fight with Maxwell at a reasonable time period so they have incentive to keep prices lower. Otherwise, expect Nvidia to "overprice" things next generation as well.

    When they have no competition for 8 months it's not unsustainable to price as high as the market will bear, and there's no real evidence that Titan was economically overpriced because it's not like there was a supply glut of Titans sitting around anywhere, in fact they were often out of stock. So really, Nvidia is just pricing according to the market -- no competition from AMD for 8 months, fastest card with limited supply, why WOULD they price it at anything below $1000?
  • chizow - Saturday, October 26, 2013 - link

    My reply would be that they've never had to price it at $1000 before, and we have certainly seen this level of advancement from one generation to the next in the past (7900GTX to 8800GTX, 8800GTX to GTX 280, 280 GTX to 480 GTX, etc), so it's not completely ground-breaking performance increases even though Kepler overall outperformed historical improvements by ~20%, imo.

    Also, the concern with Titan isn't just the fact that it was priced at ungodly premiums this time around, it's the fact that it held its crown for such a relatively short period of time. Sure, Nvidia had no competition at the $500+ range for 8 months, but that was also the full extent of Titan's reign at the top. In the past, a flagship in that $500 or $600+ range would generally reign for the entire generation, especially one that launched halfway through that generation's life cycle. Now Nvidia has already announced a reply with the 780 Ti, which will mean not one but TWO cards will surpass Titan at a fraction of its price before the generation goes EOL.

    Nvidia was clearly blind-sided by Hawaii and ultimately it will cost them customer loyalty, imo.
  • ZeDestructor - Thursday, October 24, 2013 - link

    $1000 cards are fine, since the Titan is a cheap compute unit compared to the Quadro K6000 and the 690 is a dual-GPU card (Dual-GPU has always been in the $800+ range).

    What we should see is the 780 (Ti?) go down in price and match the R9-290x, much to the rejoicing of all!

    Nvidia got away with $650-750 on the 780 because they could, and THAT is why competition is important, and why I pay attention to AMD even if I have no reason to buy from them over Nvidia (driver support on Linux is a joke). Now they have to match. Much of the same happens in the CPU segment.
  • chizow - Saturday, October 26, 2013 - link

    For those that actually bought the Titan as a cheap compute card, sure Titan may have been a good buy, but I doubt most Titan buyers were buying it for compute. It was marketed as a gaming card with supercomputer guts and at the time, there was still much uncertainty whether or not Nvidia would release a GTX gaming card based on GK110.

    I think Nvidia preyed on these fears and took the opportunity to launch a $1K part, but I knew it was an unsustainable business model for them because it was predicated on the fact Nvidia would be an entire ASIC ahead of AMD and able to match AMD's fastest ASIC (Tahiti) with their 2nd fastest (GK104). Clearly Hawaii has turned that idea on its head and Nvidia's premium product stack is crashing down in flames.

    Now, we will see at least 4 cards (290/290X, 780/780Ti) that all come close to or exceed Titan performance at a fraction of the price, only 8 months after its launch. Short reign indeed.
  • TheJian - Friday, October 25, 2013 - link

    The market dictates pricing. As they said, they sell every Titan immediately, so they could probably charge more. But that's because it has more value than you seem to understand. It is a PRO CARD at its core. Are you unaware of what a TESLA is for $2500? It's the same freaking card with 1 more SMX and driver support. $1000 is GENEROUS whether you like it or not. Gamers with PRO intentions laughed when they saw the $1000 price and have been buying them like mad ever since. No parade has been crashed. They will continue to do this pricing model for the foreseeable future as they have proven there is a market for high-end gamers with a PRO APP desire on top. The first run was 100,000 and sold in days. By contrast Asus Rog Ares 2 had a 1000 unit first run and didn't sell out like that. At $1500 it really was a ripoff with no PRO side.

    I think they'll merely need another SMX turned on and 50-100mhz for the next $1000 version which likely comes before xmas :) The PRO perf is what is valued here over a regular card. Your short-lived statement makes no sense. It's been 8 months, a rather long life in gpus when you haven't beaten the 8 month old card in much (I debunked 4k crap already, and pointed to a dozen other games where titan wins at every res). You won't fire up Blender, Premiere, PS CS etc and smoke a titan with 290x either...LOL. You'll find out what the other $450 is for at that point.
  • chizow - Saturday, October 26, 2013 - link

    Yes and as soon as they released the 780, the market corrected itself and Titans were no longer sold out anywhere, clearly a shift indicating the price of the 780 was really what the market was willing to bear.

    Also, there are more differences with their Tesla counterparts than just 1 SMX, Titan lacks ECC support which makes it an unlikely candidate for serious compute projects. Titan is good for hobby compute, anything serious business or research related is going to spend the extra for Tesla and ECC.

    And no, 8 months is not a long time at the top, look at the reigns of previous high-end parts and you will see it is generally longer than this. Even the 580 that preceded it held sway for 14 months before Tahiti took over its spot. Time at the top is just one part though, the amount which Titan devalued is the bigger concern. When 780 launched 3 months after Titan, you could maybe sell Titan for $800. Now that Hawaii has launched, you could maybe sell it for $700? It's only going to keep going down, what do you think it will sell for once 780Ti beats it outright for $650 or less?
  • Sandcat - Thursday, October 24, 2013 - link

    I noticed your comments on the Tahiti pricing fiasco 2 years ago and generally skip through the comment section to find yours because they're top notch. Exactly what I was thinking with the $550 price point, finally a top-tier card at the right price for 28nm. Long live sanity.
  • chizow - Saturday, October 26, 2013 - link

    Thanks! Glad you appreciated the comments, I figured this business model and pricing for Nvidia would be unsustainable, but I thought it wouldn't fall apart until we saw 20nm Maxwell/Pirate Islands parts in 2014. Hawaii definitely accelerated the downfall of Titan and Nvidia's $1K eagle's nest.
