Compute

Shifting gears, we have our look at compute performance.

As we outlined earlier, GTX Titan X is not the same kind of compute powerhouse that the original GTX Titan was. Make no mistake, at single precision (FP32) compute tasks it is still a very potent card, and for consumer-level workloads that is generally all that will matter. But for pro-level double precision (FP64) workloads, the new Titan lacks the high FP64 performance of the old one.

Starting us off for our look at compute is LuxMark 3.0, the latest version of the official benchmark of LuxRender 2.0. LuxRender's GPU-accelerated rendering mode is an OpenCL-based ray tracer that forms a part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years, as it maps well to GPU pipelines and allows artists to render scenes much more quickly than with CPUs alone.
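To give a concrete sense of why ray tracing maps so naturally onto thousands of GPU threads, consider that every primary ray can be computed without reference to any other ray. The toy C++ sketch below (our own illustration, not LuxRender code) traces one ray per pixel against a single sphere; on a GPU, the body of the pixel loop would simply be launched as one work-item per pixel.

```cpp
// Minimal ray/sphere intersection sketch (illustrative only, not LuxRender code).
// Each pixel's primary ray is computed independently of every other pixel,
// which is why an OpenCL ray tracer can map one work-item per pixel.
#include <cmath>
#include <cstdio>
#include <vector>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns true if a ray from 'origin' along unit vector 'dir' hits a unit sphere at the origin.
static bool hitsUnitSphere(const Vec3& origin, const Vec3& dir) {
    float b = 2.0f * dot(origin, dir);
    float c = dot(origin, origin) - 1.0f;
    return b * b - 4.0f * c >= 0.0f;   // discriminant of the ray/sphere quadratic
}

int main() {
    const int width = 64, height = 64;
    std::vector<char> image(width * height);

    // On the CPU this is a serial double loop; on a GPU the same body would be
    // dispatched as a width x height grid, one thread (work-item) per pixel.
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            Vec3 origin = { 0.0f, 0.0f, -3.0f };
            Vec3 dir = { (x - width / 2) / float(width),
                         (y - height / 2) / float(height), 1.0f };
            float len = std::sqrt(dot(dir, dir));
            dir = { dir.x / len, dir.y / len, dir.z / len };
            image[y * width + x] = hitsUnitSphere(origin, dir) ? '#' : '.';
        }
    }

    // Print a tiny ASCII rendering of the sphere.
    for (int y = 0; y < height; y += 4) {
        for (int x = 0; x < width; x += 2) putchar(image[y * width + x]);
        putchar('\n');
    }
    return 0;
}
```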

Compute: LuxMark 3.0 - Hotel

While in LuxMark 2.0 AMD and NVIDIA were fairly close post-Maxwell, the recently released LuxMark 3.0 finds NVIDIA trailing AMD once more. While GTX Titan X sees a better-than-average 41% performance increase over the GTX 980 (owing to its ability to stay at its max boost clock in this benchmark), it's not enough to dethrone the Radeon R9 290X. Even though GTX Titan X packs a lot of performance on paper, and can more than deliver it in graphics workloads, compute workload performance is clearly still highly variable.

For our second set of compute benchmarks we have CompuBench 1.5, the successor to CLBenchmark. CompuBench offers a wide array of different practical compute workloads, and we’ve decided to focus on face detection, optical flow modeling, and particle simulations.

Compute: CompuBench 1.5 - Face Detection

Compute: CompuBench 1.5 - Optical Flow

Compute: CompuBench 1.5 - Particle Simulation 64K

Although GTX Titan X struggled at LuxMark, the same cannot be said for CompuBench. Though the lead varies with the specific sub-benchmark, in every case the latest Titan comes out on top. Face detection in particular shows some massive gains, with GTX Titan X more than doubling the performance of the GK110-based GTX 780 Ti.

Our third compute benchmark is Sony Vegas Pro 13, an OpenGL and OpenCL video editing and authoring package. Vegas can use GPUs in a few different ways, the primary uses being to accelerate the video effects and compositing process itself, and to accelerate the video encoding step. With video encoding being increasingly offloaded to dedicated DSPs these days, we're focusing on the editing and compositing process, rendering to a low-CPU-overhead format (XDCAM EX). This specific test comes from Sony, and measures how long it takes to render a video.

Compute: Sony Vegas Pro 13 Video Render

Vegas is traditionally a benchmark that favors AMD, and while GTX Titan X closes the gap somewhat, it's still not enough to surpass the R9 290X.

Moving on, our fourth compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that distributes work to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, utilizing the OpenCL path for FAHCore 17.

Compute: Folding @ Home: Explicit, Single Precision

Compute: Folding @ Home: Implicit, Single Precision

Folding @ Home’s single precision tests reiterate just how powerful GTX Titan X can be at FP32 workloads, even if it’s ostensibly a graphics GPU. With a 50-75% lead over the GTX 780 Ti, the GTX Titan X showcases some of the remarkable efficiency improvements that the Maxwell GPU architecture can offer in compute scenarios, and in the process shoots well past the AMD Radeon cards.

Compute: Folding @ Home: Explicit, Double Precision

On the other hand, with a native FP64 rate of just 1/32 of its FP32 rate, the GTX Titan X flounders at double precision. There is no better example of just how much the GTX Titan X and the original GTX Titan differ in their FP64 capabilities than this graph; the GTX Titan X can't beat the GTX 580, never mind the chart-topping original GTX Titan. Users looking for an entry-level FP64 card would be well advised to stick with the GTX Titan Black for now; the new Titan is not the prosumer compute card that the old Titan was.
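To put rough numbers on that gap, the back-of-the-envelope sketch below estimates theoretical peak throughput as cores × 2 FLOPs per clock × clock, using approximate base/shader clocks and the native FP64 rates. These are our own ballpark figures rather than official specifications, and note that the original Titan's full-rate FP64 mode actually ran at somewhat reduced clocks.

```cpp
// Back-of-the-envelope peak FLOPS estimates (approximate clocks, not official specs).
#include <cstdio>

struct Card {
    const char* name;
    int cores;        // CUDA cores
    double clockGHz;  // approximate base/shader clock
    double fp64Rate;  // native FP64 rate relative to FP32
};

int main() {
    const Card cards[] = {
        { "GTX Titan X (GM200)", 3072, 1.000, 1.0 / 32.0 },
        { "GTX Titan (GK110)",   2688, 0.837, 1.0 / 3.0  },  // full FP64 rate ran at reduced clocks
        { "GTX 580 (GF110)",      512, 1.544, 1.0 / 8.0  },
    };

    for (const Card& c : cards) {
        double fp32 = c.cores * 2.0 * c.clockGHz;  // GFLOPS: 2 FLOPs per core per clock (FMA)
        double fp64 = fp32 * c.fp64Rate;
        std::printf("%-22s FP32 ~%5.0f GFLOPS   FP64 ~%4.0f GFLOPS\n",
                    c.name, fp32, fp64);
    }
    return 0;
}
```

Under those assumptions the new Titan lands at roughly 190 GFLOPS of FP64, right around the GTX 580's ~200 GFLOPS and only a small fraction of the original Titan's ~1.5 TFLOPS, which is exactly the ordering the chart shows.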

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.
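For readers who haven't used C++ AMP, the extension boils down to a restrict(amp) lambda dispatched over a compute domain with parallel_for_each, while array_view handles host-to-GPU data movement behind the scenes. The minimal sketch below (our own example, not SystemCompute code, and requiring Visual C++) simply adds two vectors on the GPU.

```cpp
// Minimal C++ AMP sketch: element-wise vector add on the GPU (requires Visual C++).
#include <amp.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

    // array_view wraps host memory; the runtime copies it to the accelerator as needed.
    concurrency::array_view<const float, 1> av(n, a);
    concurrency::array_view<const float, 1> bv(n, b);
    concurrency::array_view<float, 1> cv(n, c);
    cv.discard_data();  // no need to copy c's initial contents to the GPU

    // The restrict(amp) lambda is compiled for the GPU and run once per index.
    concurrency::parallel_for_each(cv.extent,
        [=](concurrency::index<1> idx) restrict(amp) {
            cv[idx] = av[idx] + bv[idx];
        });

    cv.synchronize();  // copy results back to the host vector
    std::printf("c[0] = %f\n", c[0]);  // expect 3.0
    return 0;
}
```

Because DirectCompute is the backend on Windows, the same binary runs on any DirectX 11-class GPU, which is part of what makes C++ AMP a convenient cross-vendor compute test.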

Compute: SystemCompute v0.5.7.2 C++ AMP Benchmark

With the GTX 980 already performing well here, the GTX Titan X takes it home, improving on the GTX 980 by 31%. Whereas the GTX 980 could only pull even with the Radeon R9 290X, the GTX Titan X takes a clear lead.

Overall then, the new GTX Titan X can still be a force to be reckoned with in compute scenarios, but only when the workloads are FP32. Users accustomed to the original GTX Titan's FP64 performance, on the other hand, will find that this is a very different card, one that doesn't live up to the same standards.

Comments
  • looncraz - Tuesday, March 17, 2015

    If the most recent slides (allegedly leaked from AMD) hold true, the 390x will be at least as fast as the Titan X, though with only 8GB of RAM (but HBM!).

    A straight 4096SP GCN 1.2/3 GPU would be a close match-up already, but any other improvements made along the way will potentially give the 390X a fairly healthy launch-day lead.

    I think nVidia wanted to keep AMD in the dark as much as possible so that they could not position themselves to take more advantage of this, but AMD decided to hold out until May/June (even though they apparently already have some inventory on hand) rather than give nVidia a chance to revise the Titan X before launch.

    nVidia blinked, it seems, after it became apparent AMD was just going to wait out the clock with their current inventory.
  • zepi - Wednesday, March 18, 2015

    Unless AMD has achieved a considerable increase in perf/W, they are going to have a really hard time tuning those 4k shaders to a reasonable frequency without ending up with a 450W card.

    Not that a 500W card is necessarily a deal breaker for everyone, but in practice cooling a 450W card without causing ear-shattering levels of noise is very difficult compared to cooling a 250W card.

    Let us wait and hope, since AMD really would need to get a break and make some money on this one...
  • looncraz - Wednesday, March 18, 2015

    Very true. We know that with HBM there should already be a fairly beefy power savings (~20-30W vs 290X it seems).

    That doesn't buy them room for 1,280 more SPs, of course, but it should get them a healthy 256 of them. Then, GCN 1.3 vs 1.1 should have power advantages as well. GCN 1.2 vs 1.0 (R9 285 vs R9 280) with 1792 SPs showed a 60W improvement; if we assume GCN 1.1 to GCN 1.3 shows a similar trend, the 390X should be pulling only about 15W more than the 290X with the rumored specs, without any other improvements.

    Of course, the same math says the 290X should be drawing 350W, but that's because it assumes all the power is in the SPs... But I do think it reveals that AMD could possibly do it without drawing much, if any, more power without making any unprecedented improvements.
  • Braincruser - Wednesday, March 18, 2015

    Yeah, but the question is, how well will the memory survive on top of a 300W GPU?
    Because the first part of a graphics card to die from high temperatures is the VRAM.
  • looncraz - Thursday, March 19, 2015

    It will be to the side, on a 2.5D interposer, I believe.

    GPU thermal energy will move through the path of least resistance (technically, to the area with the greatest delta-T, but regulated by the material's thermal conductivity coefficient), which should be into the heatsink or water block. I'm not sure, but I'd think the chips could operate in the same temperature range as the GPU; if not, it may be necessary to keep them thermally isolated. That shouldn't be too difficult, maybe as simple as not using thermal pads at all for the memory and allowing it to passively dissipate heat (or through interposer-mounted heatsinks).

    It will be interesting to see what they have done to solve the potential issues, that's for sure.
  • Xenonite - Thursday, March 19, 2015

    Yes, I agree that AMD would be able to absolutely destroy NVIDIA on the performance front if they designed a 500W GPU and left the PCB and waterblock design to their AIB partners.

    I would also absolutely love to see what kind of performance a 500W or even a 1kW graphics card would be able to muster; however, since a relatively constant 60fps presented with less than about 100ms of total system latency has been deemed sufficient for a "smooth and responsive" gaming experience, I simply can't imagine such a card ever seeing the light of day.
    And while I can understand everyone likes to pretend that they are saving the planet with their <150W GPUs, the argument that such a TDP would be very difficult to cool does not really hold much water IMHO.

    If, for instance, the card was designed from the ground up to dissipate its heat load over multiple 200W~300W GPUs, connected via a very-high-speed, N-directional data interconnect bus, the card could easily and (most importantly) quietly be cooled with chilled water cooling dissipating into a few "quad-fan" radiators. Practically, 4 GM200-size GPUs could be placed back-to-back on the PCB, with each one rendering a quarter of the current frame via shared, high-speed frame buffers (thereby eliminating SLI-induced microstutter and "frame-pacing" lag). Cooling would then be as simple as installing 4 standard GPU water-cooling loops, with each loop's radiator only having to dissipate the TDP of a single GPU module.
  • naxeem - Tuesday, March 24, 2015

    They did solve that problem with a water-cooling solution. 390X WCE is probably what we'll get.
  • ShieTar - Wednesday, March 18, 2015

    Who says they don't allow it? EVGA have already announced two special models, a superclocked one and one with a water-cooling block:

    http://eu.evga.com/articles/00918/EVGA-GeForce-GTX...
  • Wreckage - Tuesday, March 17, 2015

    If by fast you mean June or July. I'm more interested in a 980ti so I don't need a new power supply.
  • ArmedandDangerous - Saturday, March 21, 2015

    There won't ever be a 980 Ti if you understand Nvidia's naming schemes. Ti's are for unlocked parts, and there's nothing further to unlock on the 980's GM204.
