Compute Performance

As always we'll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game's leader scenes. While DirectCompute is used in many games, this is one of the few games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Compute: Civilization V

Our next benchmark is LuxMark2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as ray tracing maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

Compute: LuxMark 2.0
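The reason ray tracing maps so well to GPUs is that every ray is independent: the same small intersection routine runs once per pixel with no shared state, which is exactly the kind of work a GPU's thousands of threads are built for. A minimal sketch in plain C++ (our own illustration, not LuxMark's actual code):

```cpp
#include <cmath>
#include <optional>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Returns the distance along the ray to the nearest hit, or nothing on a miss.
// Ray origin o, unit direction d, sphere centre c, radius r. Each pixel's ray
// runs this test independently, so a GPU can evaluate millions in parallel.
std::optional<float> hit_sphere(Vec3 o, Vec3 d, Vec3 c, float r) {
    Vec3 oc{o.x - c.x, o.y - c.y, o.z - c.z};
    float b = dot(oc, d);                      // half the linear coefficient
    float disc = b*b - (dot(oc, oc) - r*r);    // quadratic discriminant
    if (disc < 0.0f) return std::nullopt;      // ray misses the sphere
    float t = -b - std::sqrt(disc);            // nearer of the two roots
    if (t < 0.0f) return std::nullopt;         // sphere is behind the ray
    return t;
}
```

Fired from the origin along +z at a unit sphere centred at (0, 0, 5), this returns a hit distance of 4; a GPU ray tracer runs essentially this test per pixel across the whole frame at once.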

Our 3rd benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we're focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks in which systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

Compute: CLBenchmark 1.1 Computer Vision

Compute: CLBenchmark 1.1 Fluid Simulation

Moving on, our 4th compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that distributes work to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water molecules are explicitly included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, as Folding @ Home is moving exclusively to OpenCL this year with FAHCore 17.

Compute: Folding @ Home: Explicit, Single Precision

Compute: Folding @ Home: Implicit, Single Precision
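To put a rough number on why explicit solvent is so much heavier, note that nonbonded interactions scale roughly with the square of the atom count. The sketch below is our own back-of-the-envelope illustration; the atom counts are hypothetical, not FAHBench's, and real MD codes use cutoffs and neighbor lists to avoid paying the full quadratic cost:

```cpp
#include <cstdint>

// Upper bound on pairwise nonbonded interactions for a system of n atoms:
// every atom against every other atom, each pair counted once.
std::uint64_t pair_count(std::uint64_t n) {
    return n * (n - 1) / 2;
}
```

For a hypothetical ~20,000-atom implicit-solvent system versus the same protein solvated in water to ~90,000 atoms, this gives roughly 2.0×10^8 versus 4.0×10^9 pairs, about a 20x difference in raw pair count, which is why the explicit tests run so much slower.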

Our 5th compute benchmark is Sony Vegas Pro 12, an OpenGL and OpenCL video editing and authoring package. Vegas can use GPUs in a few different ways, the primary uses being to accelerate the video effects and compositing process itself, and to accelerate the video encoding step. With video encoding being increasingly offloaded to dedicated DSPs these days, we're focusing on the editing and compositing process, rendering to a low CPU overhead format (XDCAM EX). This specific test comes from Sony, and measures how long it takes to render a video.

Compute: Sony Vegas Pro 12 Video Render

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

Compute: SystemCompute v0.5.7.2 C++ AMP Benchmark
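For readers unfamiliar with C++ AMP, the core of the model is a parallel_for_each over an array_view, with the GPU-side lambda marked restrict(amp). The fragment below is our own minimal sketch of that pattern, not SystemCompute's code; it uses Microsoft's documented API names and requires Visual C++'s <amp.h> to build:

```cpp
// Sketch only: C++ AMP is a Visual C++ extension and will not build with
// other toolchains. On Windows it dispatches through DirectCompute, which
// is why a C++ AMP benchmark doubles as a DirectCompute test.
#include <amp.h>
#include <vector>
using namespace concurrency;

void double_all(std::vector<float>& data) {
    array_view<float, 1> av(static_cast<int>(data.size()), data);
    // The lambda is compiled for the accelerator; restrict(amp) limits it
    // to the subset of C++ the GPU supports.
    parallel_for_each(av.extent, [=](index<1> i) restrict(amp) {
        av[i] *= 2.0f;
    });
    av.synchronize(); // copy results back to host memory
}
```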

107 Comments

  • Spunjji - Friday, March 22, 2013 - link

    ...forgive my stupidity. Actual figures of the 7790 here:
    http://www.techpowerup.com/reviews/Sapphire/HD_779...

    Depends on whether we focus on Peak / Max figures to decide whether you or I am closer to the truth. :)
  • Ryan Smith - Friday, March 22, 2013 - link

Typical Board Power, not Total. TBP is an average rather than a peak like TDP, which is why it's a lower number than TDP.
  • dbcoopernz - Friday, March 22, 2013 - link

    Any details on UVD module? Any changes?

    The Asus Direct Cu-II might make an interesting high power but quiet HTPC card. Any chance of a review?
  • Ryan Smith - Friday, March 22, 2013 - link

There are no changes that we have been made aware of.
  • haplo602 - Friday, March 22, 2013 - link

somebody please make this a single slot card and I am sold ... otherwise I'll wait for the 8k radeons ...
  • Shut up and drink - Friday, March 22, 2013 - link

    Has it occurred to anyone else that this is in all probability an OEM release of the "semi-custom" silicon that will find its way into Sony's Playstation 4 in the fall?

    Word has it that Sony has some form of GPU switching tech integrated into the PS4.

    - apologies for the link to something other than Anand but I don't think they ran anything on the story http://www.tomshardware.com/news/sony-ps4-patent-p...

    Initially I presumed this to be some "Optimus"-esque dynamic context switching power saving routine. However, the patent explicitly states, "This architecture lets a user run one or more GPUs in parallel, but only for the purpose of increasing performance, not to reduce power consumption."
Which struck me as some kind of expansion on the nebulous "hybrid crossfire" tech that AMD has been playing w/ since they birthed the 3000 series 780G igpu.

    Based off of AMD's previous endeavors in this area on the PC side I would be skeptical of the benefits/merit of pairing the comparatively anemic iGPU's of Kabini w/a presumably Bonaire derived GPU.
    As an aside; since SLI/CFX work by issuing frames to the next GPU available, if one GPU is substantially faster than the other(s), frames get finished out-of-order and the IGP/slower-GPU's tardy frames simply get dropped which may make the final rendered video stuttery/choppy.

    Pairing an IGP with a disproportionately powerful discrete GPU simply does not work for realtime rendering.

It is certainly possible that with the static nature of the console and perhaps especially the unified nature of the GDDR5 memory pool/bank that performance gains could be had.

    However, my digression on the merits of the tech thus far is
    128 + 128 = 256 + 896 = Anand's own deduction of 1152sp's)
  • Shut up and drink - Friday, March 22, 2013 - link

    I pushed submit by mistake...damn...

    oh well...my last point of arithmetic was simply that 1 fully enabled 4 core Kabini's I'm suspecting would have a 128 shader count igpu. Factor in the much ballyhooed 8-core Cpu in the PS4 we would have two Kabini's (128+128=256) + a Bonaire derived 896sp GPU all on some kind of custom MCM style packaging "semi-custom APU" (rumor had it that the majority of Sony's R&D contributions were in the stacking/packaging dept.)

    Anyone concur?
  • Shut up and drink - Friday, March 22, 2013 - link

    ...which jives w/Anand's own piece that ran on the console's unveiling, "Sony claims the GPU features 18 compute units, which if this is GCN based we'd be looking at 1152 SPs and 72 texture units"

    http://www.anandtech.com/show/6770/sony-announces-...
  • A5 - Friday, March 22, 2013 - link

    Yeah, once this came in at 14 CUs with minor architecture changes, it seemed like a likely scenario to me.

    Obviously it isn't going to give you PS4 performance on ports with only 1GB of memory, though.
  • crimson117 - Friday, March 22, 2013 - link

Good thought, but I sure hope Sony doesn't hamstring its PS4 with a 128-bit memory bus!
