Compute Performance

As always, we'll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the few games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Compute: Civilization V
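
For readers curious what an isolated decompression sub-benchmark looks like in principle, here is a minimal CPU-side sketch in Python. It is not Civ V's code and does not use DirectCompute: `zlib` stands in for the game's texture decompressor, and the buffer sizes, the `run_decompress_benchmark` name, and the pass count are all assumptions chosen for illustration. The only idea carried over from the description above is the structure: decompress the same fixed set of assets over and over, and time nothing else.

```python
import time
import zlib


def run_decompress_benchmark(blobs, iterations=50):
    """Decompress every blob `iterations` times and report throughput.

    Hypothetical helper: zlib is a stand-in for a real texture codec.
    """
    total_bytes = 0
    start = time.perf_counter()
    for _ in range(iterations):
        for blob in blobs:
            # Only the decompression itself sits inside the timed region.
            total_bytes += len(zlib.decompress(blob))
    elapsed = time.perf_counter() - start
    return {
        "iterations": iterations,
        "bytes_out": total_bytes,
        "seconds": elapsed,
        "mb_per_second": total_bytes / elapsed / 1e6 if elapsed > 0 else float("inf"),
    }


# Assumed stand-in for one scene's textures: four 256 KiB buffers.
textures = [bytes([i]) * (256 * 1024) for i in range(4)]
compressed = [zlib.compress(t) for t in textures]

result = run_decompress_benchmark(compressed, iterations=10)
print(f"{result['mb_per_second']:.1f} MB/s over {result['iterations']} passes")
```

Compression happens once, outside the timed loop, so the reported figure reflects decompression alone; the actual number will vary from system to system.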

Our next benchmark is LuxMark 2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL-accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years, as it maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

Compute: LuxMark 2.0

Our third benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

Compute: CLBenchmark 1.1 Computer Vision

Compute: CLBenchmark 1.1 Fluid Simulation

Moving on, our fourth compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that distributes work to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation; including them adds quite a bit of work and overhead. This is another OpenCL test, as Folding @ Home is moving exclusively to OpenCL this year with FAHCore 17.

Compute: Folding @ Home: Explicit, Single Precision

Compute: Folding @ Home: Implicit, Single Precision

Our fifth compute benchmark is Sony Vegas Pro 12, an OpenGL and OpenCL video editing and authoring package. Vegas can use GPUs in a few different ways, the primary uses being to accelerate the video effects and compositing process itself, and to accelerate the video encoding step. With video encoding being increasingly offloaded to dedicated DSPs these days, we’re focusing on the editing and compositing process, rendering to a low-CPU-overhead format (XDCAM EX). This specific test comes from Sony, and measures how long it takes to render a video.

Compute: Sony Vegas Pro 12 Video Render

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

Compute: SystemCompute v0.5.7.2 C++ AMP Benchmark

107 Comments

  • GivMe1 - Friday, March 22, 2013

    128bit interface is going to hurt high res textures...
  • CeriseCogburn - Sunday, March 24, 2013

    Oh no it won't ! this is amd man! nothing hurts when it's amd ! amd yes it can !
  • Quizzical - Friday, March 22, 2013

    Your chart shows Radeon HD 6870 FP64 performance as N/A. I think it's 1/20 of FP32 performance, but I'm not sure of that. It definitely can do FP64, as otherwise, it wouldn't be able to claim OpenGL 4 compliance.
  • MrSpadge - Friday, March 22, 2013

    No, it doesn't have any HARDWARE FP64 capabilities. It's always possible to emulate this at slow performance via software, though.
  • Quizzical - Friday, March 22, 2013

    It's basically the same as what the 7770, 7790, and 7850 do, but they're not listed as N/A. The relevant question isn't whether you can do it more slowly, but how much more slowly.
  • MrSpadge - Tuesday, March 26, 2013

    No, it's not the same, the GCN cards have hardware FP64 capabilities.
  • Ryan Smith - Friday, March 22, 2013

    Let's be clear here. 85W is not the TDP. The TDP is higher (likely on the order of 110W or so). However AMD chooses not to publish the TDP for these lower end cards, and instead the TBP.
  • alwayssts - Friday, March 22, 2013

    Yeah, I figure ~85 TBP/105w TDP because that would be smack between 7770/7850 as well as having 20% headroom (which also allows another product to have their TBP between there and 7850's max TDP with it's max tdp above it within 150w....ie ~120-125/150w). IIRC, 80w is the powertune max (TDP) of 7770, 130w for 7850. 85w is the stock operation (TBP) of 7790.

    I really, really dislike how convoluted this power game has become...can you tell?!

    First it was max power. Then it was nvidia stating typical power (so products were within pci-e spec) with AMD still quoting max, which made them look bad. Then we get this 'awesome' product segmentation with 7000 having TBP and max powertune TDPs to separate them, while nvidia quotes TBP and hides the fact the TDP limits for their products exist unless you deduce them from the percentage you can up the boost power.

    AAAAaaaarrrrrghhhhh. I miss when the product you had could do what you wanted it to, ie before software voltage control and multiple states, as for products like this it gives the user less control and the companies a ton to create segmentation. Low-end stock products may have been less-than-stellar back in the day, but with determination you could get something out of it without some marketing stating it should fit x niche so give it y max tdp so it doesn't interfere with the market of z product.
  • CeriseCogburn - Friday, March 22, 2013

    Maybe so you couldn't blow the crap out of it then return it for another one, then another one, as "you saved money" and caused everyone else to pay 25% more since you overclock freaks would blow them up, then LIE and get the freebie replacement, over and over again.

    Maybe they got sick of dealing with scam artist liars... maybe they aren't evil but the end user IS.
  • Spunjji - Friday, March 22, 2013

    Why would the design power be higher than the total board power? :/ You're correct that the figure they're quoting isn't TDP but then you just went and made up a number.

    Here's some actual power consumption measurements of a 7770:
    http://www.techpowerup.com/reviews/HIS/HD_7770_iCo...

    So using Anand's figures to extrapolate you can expect this thing to be ~90W max, usually lower than that at peak, right about where AMD put it.
