Compute

Jumping into pure compute performance, we’re going to have several new factors influencing the 290X as compared to the 280X. On the front end 290X/Hawaii has those 8 ACEs versus 280X/Tahiti’s 2 ACEs, potentially allowing 290X to queue up a lot more work and to keep itself better fed as a result; though in practice we don’t expect most workloads to be able to put the additional ACEs to good use at the moment. Meanwhile on the back end 290X has that 11% memory bandwidth boost and the 33% increase in L2 cache, which in compute workloads can be largely dedicated to said computational work. On the other hand 290X takes a hit to its double precision floating point (FP64) rate versus 280X, so in double precision scenarios it’s certainly going to enter with a larger handicap.

As always we'll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the only games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Unfortunately Civ V can’t tell us much of value, due to the fact that we’re running into CPU bottlenecks, not to mention increasingly absurd frame rates. In the 3 years since this game was released high-end CPUs are around 20% faster per core, whereas GPUs are easily 150% faster (if not more). As such the GPU portion of texture decoding has apparently started outpacing the CPU portion, though this is still an enlightening benchmark for anything less than a high-end video card.

For what it is worth, the 290X can edge out the GTX 780 here, only to fall to GTX Titan. But in these CPU limited scenarios the behavior at the very top can be increasingly inconsistent.

Our next benchmark is LuxMark 2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as ray tracing maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

LuxMark by comparison is very simple and very scalable. 290X packs with it a significant increase in computational resources, so 290X picks up from where 280X left off and tops the chart for AMD once more. Titan is barely half as fast here, and GTX 780 falls back even further. Though the fact that scaling from the 280X to 290X is only 16% – a bit less than half of the increase in CUs – is surprising at first glance. Even with the relatively simplistic nature of the benchmark, it has shown signs in the past of craving memory bandwidth and certainly this seems to be one of those times. Feeding those CUs with new rays takes everything the 320GB/sec memory bus of the 290X can deliver, putting a cap on performance gains versus the 280X.
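To put that 16% in context, here is a quick back-of-the-envelope sketch of the scaling math. The CU counts (32 for 280X, 44 for 290X) and the 280X's 288GB/sec memory bandwidth are taken from the cards' public specifications rather than from this page, so treat them as assumptions; the 16% result and the 320GB/sec figure come from the text above.

```python
# Rough scaling check for LuxMark, 280X -> 290X.
# ASSUMPTIONS (public spec-sheet values, not stated on this page):
CUS_280X = 32            # compute units, Tahiti
CUS_290X = 44            # compute units, Hawaii
BW_280X_GBPS = 288.0     # memory bandwidth, GB/sec

# From the article text:
BW_290X_GBPS = 320.0     # memory bandwidth, GB/sec
OBSERVED_GAIN = 0.16     # measured LuxMark improvement


def relative_gain(new: float, old: float) -> float:
    """Fractional increase of `new` over `old` (0.10 means +10%)."""
    return new / old - 1.0


cu_gain = relative_gain(CUS_290X, CUS_280X)
bw_gain = relative_gain(BW_290X_GBPS, BW_280X_GBPS)

# How much of the extra CU throughput actually showed up in the score.
scaling_efficiency = OBSERVED_GAIN / cu_gain

print(f"CU increase:        {cu_gain:.1%}")
print(f"Bandwidth increase: {bw_gain:.1%}")
print(f"Observed gain:      {OBSERVED_GAIN:.1%}")
print(f"Fraction of CU gain realized: {scaling_efficiency:.2f}")
```

Under those assumptions the observed gain sits between the bandwidth increase and the CU increase, which is at least consistent with a workload that is partly limited by memory bandwidth rather than purely by shader throughput.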

Our 3rd compute benchmark is Sony Vegas Pro 12, an OpenGL and OpenCL video editing and authoring package. Vegas can use GPUs in a few different ways, the primary uses being to accelerate the video effects and compositing process itself, and in the video encoding step. With video encoding being increasingly offloaded to dedicated DSPs these days we’re focusing on the editing and compositing process, rendering to a low CPU overhead format (XDCAM EX). This specific test comes from Sony, and measures how long it takes to render a video.

Vegas is another title where GPU performance gains are outpacing CPU performance gains, and as such earlier GPU offloading work has reached its limits and led to the program once again being CPU limited. It’s a shame GPUs have historically underdelivered on video encoding (as opposed to video rendering), as wringing significantly more out of Vegas will require getting rid of the next great CPU bottleneck.

Our 4th benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

Curiously, the 290X’s performance advantage over 280X is unusually dependent on the specific sub-test. The fluid simulation scales decently enough with the additional CUs, but the computer vision benchmark is stuck in the mud as compared to the 280X. The fluid simulation is certainly closer than the vision benchmark towards being the type of stupidly parallel workload GPUs excel at, though that doesn’t fully explain the lack of scaling in computer vision. If nothing else it’s a good reminder of why professional compute workloads are typically profiled and optimized against specific target hardware, as it reduces these kinds of outcomes in complex, interconnected workloads.

Moving on, our 5th compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that has work distributed to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, as Folding @ Home has moved exclusively to OpenCL this year with FAHCore 17.

With FAHBench we’re not fully convinced that it knows how to best handle 290X/Hawaii as opposed to 280X/Tahiti. The scaling in single precision explicit is fairly good, but the performance regression in the water-free (and generally more GPU-limited) implicit simulation is unexpected. Consequently while the results are accurate for FAHCore 17, it’s hopefully something AMD and/or the FAH project can work out now that 290X has been released.

Meanwhile double precision performance also regresses, though here we have a good idea why. With DP performance on 290X being 1/8 FP32 as opposed to 1/4 on 280X, this is a benchmark 290X can’t win. Though given the theoretical performance differences we should be expecting between the two video cards – 290X should have about 70% of the FP64 performance of 280X – the fact that 290X is at 82% bodes well for AMD’s newest GPU. However there’s no getting around the fact that the 290X loses to GTX 780 here even though the GTX 780 is even more harshly capped, which given AMD’s traditional strength in OpenCL compute performance is going to be a let-down.
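For those wondering where the roughly 70% theoretical figure comes from, the arithmetic can be sketched as follows. The stream processor counts (2816 and 2048) and the 1GHz peak clocks are assumptions drawn from the cards' public specifications, not from this page; only the 1/8 and 1/4 FP64 rates are taken from the text above.

```python
# Theoretical peak FP64 throughput, 290X vs. 280X.
# ASSUMPTIONS (public spec-sheet values, not stated on this page):
#   290X: 2816 stream processors, up to 1.0 GHz
#   280X: 2048 stream processors, 1.0 GHz boost
# FP64 rates (1/8 and 1/4 of FP32) are from the article text.


def peak_fp64_gflops(stream_processors: int, clock_ghz: float,
                     fp64_rate: float) -> float:
    """Peak GFLOPS: each stream processor retires one FMA (2 FLOPs)
    per clock in FP32, scaled by the card's FP64 rate."""
    return stream_processors * 2 * clock_ghz * fp64_rate


fp64_290x = peak_fp64_gflops(2816, 1.0, 1 / 8)
fp64_280x = peak_fp64_gflops(2048, 1.0, 1 / 4)
theoretical_ratio = fp64_290x / fp64_280x

print(f"290X peak FP64: {fp64_290x:.0f} GFLOPS")
print(f"280X peak FP64: {fp64_280x:.0f} GFLOPS")
print(f"290X / 280X:    {theoretical_ratio:.1%}")
```

These are paper numbers only. Real clocks vary with thermals and power limits, which is one reason a measured result can land above or below the theoretical ratio.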

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

SystemCompute and the underlying C++ AMP environment scale relatively well with the additional CUs offered by 290X. Not only does the 290X easily surpass the GTX Titan and GTX 780 here, but it does so while also beating the 280X by 18%. Or to use AMD’s older GPUs as a point of comparison, we’re up to a 3.4x improvement over 5870, well above the improvement in CU density alone and another reminder of how AMD has really turned things around on the GPU compute side with GCN.


396 Comments


  • Antiflash - Thursday, October 24, 2013

    I've usually preferred Nvidia cards, but they had it coming when they decided to price GK110 into the stratosphere just "because they can" and had no competition. That's a poor way to treat your customers, and it takes advantage of fanboys. Full implementations of Tesla and Fermi were always priced around $500. Pricing Kepler GK110 at $650+ was stupid. It's silicon after all, you should get more performance for the same price each year. Not more performance at a premium price as Nvidia tried to do this generation. AMD is not doing anything extraordinary here; they are just not following Nvidia's price gouging practices, and $550 is in line with historical market prices for a flagship GPU. We would not have been having this discussion if Nvidia had done the same with GK110.
  • blitzninja - Saturday, October 26, 2013

    OMG, why won't you people get it? The Titan is a COMPUTE-GAMING HYBRID card, it's for professionals who run PRO apps (ie. Adobe Media product line, 3D Modeling, CAD, etc) but are also gamers and don't want to have SLI setups for gaming + compute or they can't afford to do so.

    A Quadro card is $2500, this card has 1 less SMX unit and no PRO customer driver support but is $1000 and does both Gaming AND Compute, as far as low-level professionals are concerned this thing is the very definition of steal. Heck, you SLI two of these things and you're still up $500 from a K6000.

    What usually happens is the company they work at will have Quadro workstations and at home the employee has a Titan. Sure it's not as good but it gets the job done until you get back to work.

    Please check your shit. Everyone saying R9 290X--and yes I agree for gaming it's got some real good price/performance--destroys the Titan is ignorant and needs to do some good long research into:
    A. How well the Titan sold
    B. The size of the compute market and MISSING PRICE POINTS in said market.
    C. The amount of people doing compute who are also avid gamers.
  • chimaxi83 - Thursday, October 24, 2013

    Impressive. This card beats Nvidia on EVERY level! Price, performance, features, power..... every level. Nvidia paid the price for gouging its customers, they are going to lose a ton of marketshare. I doubt they have anything to match this for at least a year.
  • Berzerker7 - Thursday, October 24, 2013

    Sounds like a bot. The card is worse than a Titan on every point except high resolution (read: 4K), including power, temperature and noise.
  • testbug00 - Thursday, October 24, 2013

    Er, the Titan beats it on being higher priced, looking nicer, having a better cooler and using less power.

    Even in 1080p a 290X approximately ties the Titan (slightly ahead, by 4%, according to TechPowerUp).

    Well, a $550 card that can tie a $1000 card in a resolution a card that fast really shouldn't be bought for (seriously, if you are playing in 1200p or less there is no reason to buy any GPU over $400 unless you plan to upgrade screens soon)
  • Sancus - Thursday, October 24, 2013

    The Titan was a $1000 card when it was released.... 8 months ago. So for 8 months nvidia has had the fastest card and been able to sell it at a ridiculous price premium(even at $1000, supply of Titans was quite limited, so it's not like they would have somehow benefited from setting the price lower... in fact Titan would probably have made more money for Nvidia at an even HIGHER price).

    The fact that ATI is just barely matching Nvidia at regular resolutions and slightly beating them at 4k, 8 months later, is a baseline EXPECTATION. It's hardly an achievement. If they had released anything less than the 290X they would have completely embarrassed themselves.

    And I should point out that they're heavily marketing 4k resolution for this card and yet frame pacing in Crossfire even with their 'fixes' is still pretty terrible, and if you are seriously planning to game at 4k you need Crossfire to be actually usable, which it has never really been.
  • anubis44 - Thursday, October 24, 2013

    The margin of victory for the R9 290X over the Titan at 4K resolutions is not 'slight', it's substantial. HardOCP says it's 10-15% faster on average. That's a $550 card that's 10-15% faster than a $1000 card.

    What was that about AMD being embarrassed?
  • Sancus - Thursday, October 24, 2013

    By the time more than 1% of the people buying this card even have 4k monitors 20nm cards will have been on sale for months. Not only that but you would basically go deaf next to a Crossfire 290x setup which is what you need for 4k. And anyway, the 290x is faster only because it's been monstrously over clocked beyond the ability of its heatsink to cool it properly. 780/Titan are still far more viable 2/3/4 GPU cards because of their superior noise and power consumption.

    All 780s overclock to considerably faster than this card at ALL resolutions so the gtx 780ti is probably just an OCed 780, and it will outperform the 290x while still being 10db quieter.
  • DMCalloway - Thursday, October 24, 2013

    You mention monstrously OC'ing the 290x yet have no problem OC'ing the 780 in order to create a 780ti. Everyone knows that aftermarket coolers will keep the noise and temps. in check when released. Let's deal with the here and now, not speculate on future cards. Face it; AMD at least matches or beats a card costing $100 more which will cause Nvidia to launch the 780ti at less than current 780 prices.
  • Sancus - Thursday, October 24, 2013

    You don't understand how pricing works. AMD is 8 months late to the game. They've released a card that is basically the GTX Titan, except it uses more than 50W more power and has a bargain basement heatsink. That's why it's $100 cheaper. Because AMD is the one who are far behind and the only way for them to compete is on price. They demonstrably can't compete purely based on performance, if the 290X was WAY better than the GTX Titan, AMD would have priced it higher because guess what, AMD needs to make a profit too -- and they consistently have lost money for years now.

    The company that completely owned the market to the point they could charge $1000 for a video card are the winners here, not the one that arrived out of breath at the finish line 8 months later.

    I would love for AMD to be competitive *at a competitive time* so that we didn't have to pay $650 for a GTX 780, but the fact of the matter is that they're simply not.
