Compute

Jumping into pure compute performance, we’re going to have several new factors influencing the 290X as compared to the 280X. On the front end 290X/Hawaii has those 8 ACEs versus 280X/Tahiti’s 2 ACEs, potentially allowing 290X to queue up a lot more work and to keep itself better fed as a result; though in practice we don’t expect most workloads to be able to put the additional ACEs to good use at the moment. Meanwhile on the back end 290X has that 11% memory bandwidth boost and the 33% increase in L2 cache, which in compute workloads can be largely dedicated to said computational work. On the other hand 290X takes a hit to its double precision floating point (FP64) rate versus 280X, so in double precision scenarios it’s certainly going to enter with a larger handicap.
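Those bandwidth figures are easy to verify from the two cards' published memory specifications. A quick back-of-the-envelope sketch (bus widths and data rates below are the reference specs, not anything measured here):

```python
# Back-of-the-envelope comparison of the 290X and 280X memory subsystems,
# using the cards' reference specifications.

def bandwidth_gbps(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) x data rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8

bw_290x = bandwidth_gbps(512, 5.0)    # 290X: 512-bit bus, 5 Gbps GDDR5
bw_280x = bandwidth_gbps(384, 6.0)    # 280X: 384-bit bus, 6 Gbps GDDR5

print(bw_290x)                         # 320.0 GB/s
print(bw_280x)                         # 288.0 GB/s
print(f"{bw_290x / bw_280x - 1:.0%}")  # 11% advantage for the 290X
```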

As always we'll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the only games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Unfortunately Civ V can’t tell us much of value, due to the fact that we’re running into CPU bottlenecks, not to mention increasingly absurd frame rates. In the 3 years since this game was released high-end CPUs are around 20% faster per core, whereas GPUs are easily 150% faster (if not more). As such the GPU portion of texture decoding has apparently started outpacing the CPU portion, though this is still an enlightening benchmark for anything less than a high-end video card.
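Annualizing those 3-year figures makes the divergence clearer; a rough sketch using the article's own 20%/150% numbers:

```python
# Why the CPU side of Civ V's texture decode became the bottleneck:
# annualized, per-core CPU throughput grew ~6%/year over those 3 years
# while GPU throughput grew ~36%/year (compound annual growth rates).

cpu_growth, gpu_growth, years = 1.20, 2.50, 3

cpu_annual = cpu_growth ** (1 / years) - 1
gpu_annual = gpu_growth ** (1 / years) - 1

print(f"{cpu_annual:.1%}")  # 6.3% per year
print(f"{gpu_annual:.1%}")  # 35.7% per year
```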

For what it is worth, the 290X can edge out the GTX 780 here, only to fall to GTX Titan. But in these CPU limited scenarios the behavior at the very top can be increasingly inconsistent.

Our next benchmark is LuxMark2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as ray tracing maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

LuxMark by comparison is very simple and very scalable. With its significant increase in computational resources, the 290X picks up where the 280X left off and tops the chart for AMD once more. Titan is barely half as fast here, and GTX 780 falls back even further. The fact that scaling from the 280X to the 290X is only 16% – a bit less than half of the increase in CUs – is surprising at first glance, however. Even with the relatively simplistic nature of the benchmark, it has shown signs in the past of craving memory bandwidth, and this certainly seems to be one of those times. Feeding those CUs with new rays takes everything the 320GB/sec memory bus of the 290X can deliver, putting a cap on performance gains versus the 280X.
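To put that 16% in perspective, a quick calculation against the CU counts (44 CUs for the 290X versus 32 for the 280X, per AMD's specifications) shows how much of the theoretical scaling actually materialized:

```python
# Comparing LuxMark's observed gain against the theoretical gain
# from the additional compute units alone.

cu_290x, cu_280x = 44, 32
cu_gain = cu_290x / cu_280x - 1         # +37.5% more CUs on the 290X
observed_gain = 0.16                    # measured LuxMark improvement

print(f"{cu_gain:.1%}")                 # 37.5%
print(f"{observed_gain / cu_gain:.0%}") # 43% of the theoretical scaling realized
```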

Our 3rd compute benchmark is Sony Vegas Pro 12, an OpenGL and OpenCL video editing and authoring package. Vegas can use GPUs in a few different ways, the primary uses being to accelerate the video effects and compositing process itself, and in the video encoding step. With video encoding being increasingly offloaded to dedicated DSPs these days we’re focusing on the editing and compositing process, rendering to a low CPU overhead format (XDCAM EX). This specific test comes from Sony, and measures how long it takes to render a video.

Vegas is another title where GPU performance gains are outpacing CPU performance gains, and as such earlier GPU offloading work has reached its limits and led to the program once again being CPU limited. It’s a shame GPUs have historically underdelivered on video encoding (as opposed to video rendering), as wringing significantly more out of Vegas will require getting rid of the next great CPU bottleneck.

Our 4th benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while the latter, fluid simulation, is common in professional graphics work and games alike.

Curiously, the 290X’s performance advantage over the 280X is unusually dependent on the specific sub-test. The fluid simulation scales decently enough with the additional CUs, but the computer vision benchmark is stuck in the mud as compared to the 280X. The fluid simulation is certainly closer than the vision benchmark to the type of embarrassingly parallel workload GPUs excel at, though that doesn’t fully explain the lack of scaling in computer vision. If nothing else it’s a good reminder of why professional compute workloads are typically profiled and optimized against specific target hardware, as doing so reduces these kinds of outcomes in complex, interconnected workloads.

Moving on, our 5th compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that has work distributed to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, as Folding @ Home has moved exclusively to OpenCL this year with FAHCore 17.

With FAHBench we’re not fully convinced that it knows how to best handle 290X/Hawaii as opposed to 280X/Tahiti. The scaling in single precision explicit is fairly good, but the performance regression in the water-free (and generally more GPU-limited) implicit simulation is unexpected. Consequently while the results are accurate for FAHCore 17, it’s hopefully something AMD and/or the FAH project can work out now that 290X has been released.

Meanwhile double precision performance also regresses, though here we have a good idea why. With DP performance on the 290X being 1/8 FP32, as opposed to 1/4 on the 280X, this is a benchmark the 290X can’t win. Though given the theoretical performance difference we should expect between the two video cards – the 290X should have about 70% of the FP64 performance of the 280X – the fact that the 290X delivers 82% bodes well for AMD’s newest GPU. However there’s no getting around the fact that the 290X loses to the GTX 780 here even though the GTX 780’s FP64 rate is capped even more harshly, which given AMD’s traditional strength in OpenCL compute performance is going to be a letdown.
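That ~70% figure falls out of the cards' shader counts and FP64 rate fractions. A rough sketch, assuming reference boost clocks of 1GHz for both cards:

```python
# Sanity-checking the claim that the 290X should have ~70% of the
# 280X's FP64 throughput, from shader counts and FP64 rate fractions.

def fp64_tflops(shaders, clock_ghz, fp64_fraction):
    # Peak FP32 rate = shaders x clock x 2 ops/cycle (FMA); FP64 is a fraction of that
    return shaders * clock_ghz * 2 / 1000 * fp64_fraction

dp_290x = fp64_tflops(2816, 1.0, 1/8)   # Hawaii: 1/8 FP32 rate
dp_280x = fp64_tflops(2048, 1.0, 1/4)   # Tahiti: 1/4 FP32 rate

print(round(dp_290x, 3))                # 0.704 TFLOPS
print(round(dp_280x, 3))                # 1.024 TFLOPS
print(round(dp_290x / dp_280x, 2))      # 0.69, in line with the article's ~70%
```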

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

SystemCompute and the underlying C++ AMP environment scales relatively well with the additional CUs offered by 290X. Not only does the 290X easily surpass the GTX Titan and GTX 780 here, but it does so while also beating the 280X by 18%. Or to use AMD’s older GPUs as a point of comparison, we’re up to a 3.4x improvement over 5870, well above the improvement in CU density alone and another reminder of how AMD has really turned things around on the GPU compute side with GCN.
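That 3.4x gain over the 5870 outstrips the raw growth in ALU throughput between the two cards, which is the point about GCN: a rough sketch, assuming reference clocks (850MHz for the 5870, 1GHz for the 290X):

```python
# The 3.4x SystemCompute gain over the 5870 exceeds raw peak-FLOPS growth,
# suggesting GCN's architectural efficiency over VLIW5 accounts for the rest.

flops_5870 = 1600 * 0.85 * 2 / 1000   # ~2.72 TFLOPS FP32 (VLIW5)
flops_290x = 2816 * 1.00 * 2 / 1000   # ~5.63 TFLOPS FP32 (GCN)

raw_gain = flops_290x / flops_5870
print(round(raw_gain, 2))             # 2.07x in peak FLOPS
print(round(3.4 / raw_gain, 2))       # 1.64x beyond that, from architecture/software
```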


396 Comments


  • TheJian - Friday, October 25, 2013 - link

    LOL. Tell that to both their bottom lines. I see AMD making nothing while NV profits. People who bought titan got a $2500 Tesla for $1000. You don't buy a titan just to game (pretty dumb if you did) as it's for pro apps too (the compute part of the deal). It's a steal for gamers who make money on the card too. Saving $1500 is a great deal. So since you're hating on NV pricing, how do you feel about the 7990 at $1000. Nice to leave that out of your comment fanboy ;) Will AMD now reap what they sow and have to deal with all the angry people who bought those? ROFL. Is the 1K business model unsustainable for AMD too? Even the 6990 came in at $700 a ways back. Dual or single chip the $1000 price is alive and well from either side for those who want it.

    I'd bet money a titan ultra will be $1000 again shortly if they even bother as it's not a pure gamer card but much more already. If you fire up pro apps with Cuda you'll smoke that 290x daily (which covers just about all pro apps). Let me know when AMD makes money in a quarter that NVDA loses money. Then you can say NV pricing is biting them in the A$$. Until then, your comment is ridiculous. Don't forget even as ryan points out in this article (and he don't love NV...LOL), AMD still has driver problems (and has for ages) but he believes in AMD much like fools do in Obama still...LOL. For me, even as an 5850 owner, they have to PROVE themselves before I ponder another card from them at 20nm. The 290x is hot, noisy and uses far more watts and currently isn't coming with 3 AAA games either. NV isn't shaking in their boots. I'll be shocked if 780TI isn't $600 or above as it should match Titan which 290x doesn't do even with the heat, noise and watts.

    And you're correct no OC room. Nobody has hit above 1125.

    If NV was greedy, wouldn't they be making MORE money than in 2007? They haven't cracked 850mil in 5 years. Meanwhile, AMD's pricing which you seem to love, has caused their entire business to basically fail (no land, no fabs, gave up cpu race, 8 months to catch up with a hot noisy chip etc). They have lost over $6B in the last 10yrs. AMD has idiots managing their company and they are destroying what used to be a GREAT company with GREAT products. They should have priced this card $100 higher and all other cards rebadged should be $50 higher. They might make some actual money then every quarter right? Single digit margins on console chips (probably until 20nm shrink) won't get you rich either. Who made that deal? FIRE THAT GUY. That margin is why NV said it wasn't worth it.
  • chizow - Saturday, October 26, 2013 - link

    AMD's non-profitability goes far beyond their GPU business, it's more due to their CPU business. People who got Titan didn't get a Tesla for $1000, they got a Tesla without ECC. Compute apps without ECC mean second-guessing every result because you're unsure whether a value was stored/retrieved from memory correctly. Regarding 7990 pricing, you can surely look it up before pulling the fanboy card, just as you can look up my comments on 7970 launch pricing. And yes, AMD will absolutely have to deal with that backlash given that card dropped even more precipitously than Titan, going from $1K to $600 in only 4-5 months.

    I don't think Nvidia will make the same mistake with a Titan Ultra at $1K. I also don't think Nvidia fans who only bought Titan for gaming will fall for the same mistake 2x. If Maxwell comes out and Nvidia holds out on the big ASIC, I doubt anyone disinterested in compute will fall for the same trick if Nvidia launches a Titan 2 at $1K using a compute gimmick to justify the price. They will just point to Titan and say "wait 3 months and they'll release something that's 95% of its performance at 65% of its price". As they say, Fool me once, shame on you, fool me twice, shame on me.

    And no, greed and profit don't go hand in hand. In 2007-2008, Nvidia posted record profits and revenue for multiple consecutive quarters as you stated on the back of a cheap $230-$270 8800GT. With Titan, they reversed course by setting record margins, but on reduced revenue and profits. They basically covet Intel's huge profit margins, but they clearly lack the revenue to grow their bottomline. Selling $1K GPUs certainly isn't going to get them there any faster.
  • FragKrag - Thursday, October 24, 2013 - link

    great performance, but I'll wait until I see some better thermals/noise from aftermarket coolers :p
  • Shark321 - Thursday, October 24, 2013 - link

    As with the Titan in the beginning, no alternate coolers will be available for the time being (according to computerbase). This means even if the price is great, you will be stuck with a very noisy and hot card. 780Ti will outperform the 290x in 3 weeks. It remains to be seen how it will be priced (I guess $599).
  • The Von Matrices - Thursday, October 24, 2013 - link

    This is the next GTX 480 or HD 2900 XT. It provides great performance for the price, that is if you can put up with the heat and noise.
  • KaosFaction - Thursday, October 24, 2013 - link

    Work in Progress!!! Whhhhaaaattttt I want answers now!!
  • masterpine - Thursday, October 24, 2013 - link

    Good to see something from AMD challenging the GK110's, I still find it fairly remarkable that in the fast moving world of GPU's it's taken 12 months for AMD to build something to compete. Hopefully this puts a swift end to the above $600 prices in the single GPU high end.

    More than a little concerned at the 95C target temp of these things. 80C is toasty enough already for the GTX780, actually had to point a small fan at the DVI cables coming out the back of my 780 SLI surround setup because the heat coming out the back of them was causing dramas. Not sure i could cope with the noise of a 290X either.

    Anyhow, this is great for consumers. Hope to see some aftermarket coolers reign these things in a bit. If the end result is both AMD and Nvidia playing hard-ball at the $500 mark in a few weeks time, we all win.
  • valkyrie743 - Thursday, October 24, 2013 - link

    HOLY TEMPS BATMAN. its the new gtx 480 in the temp's department
  • kallogan - Thursday, October 24, 2013 - link

    No overclocking headroom with stock cooler. That's for sure.
  • FuriousPop - Thursday, October 24, 2013 - link

    can we please see 2x in CF mode with eyefinity!? or am i asking for too much?

    also, Nvidia will always be better for those of you in the 30% department of having only max 1080p. for the rest of us in 1440p and 1600p and beyond (eyefinity) then AMD will be as stated by previous comments in this thread "King of the hill"....

    but none the less, some more testing in the CF+3x monitor department would be great to see how far this puppy really goes...

    i mean seriously whats the point of putting a 80 year old man behind the wheel of the worlds fastest car?!? please push the specs on gaming benchmarks pls (eg; higher res)
