Compute: What You Leave Behind?

As always, our final set of benchmarks is a look at compute performance. As we mentioned in our discussion on the Kepler architecture, GK104’s improvements seem to be compute-neutral at best, and harmful to compute performance at worst. NVIDIA has made it clear that they are focusing first and foremost on gaming performance with GTX 680, and in the process are deemphasizing compute performance. Why? Let’s take a look.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.

Compute: Civilization V

Remember when NVIDIA used to sweep AMD in Civ V Compute? Times have certainly changed. AMD’s shift to GCN has rocketed them to the top of our Civ V Compute benchmark. Meanwhile, what’s probably the most realistic DirectCompute benchmark we have has the GTX 680 losing to the GTX 580, never mind the 7970. It’s not by much, mind you, but in this case the GTX 680, for all of its functional units and its core clock advantage, doesn’t have the compute performance to stand toe-to-toe with the GTX 580.

At first glance our initial assumptions would appear to be right: Kepler’s scheduler changes have weakened its compute performance relative to Fermi.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

SmallLuxGPU 2.0d4

Civ V was bad; SmallLuxGPU is worse. At this point the GTX 680 can’t even compete with the GTX 570, let alone anything Radeon. In fact the GTX 680 has more in common with the GTX 560 Ti than it does anything else.

On that note, since we weren’t going to significantly change our benchmark suite for the GTX 680 launch, NVIDIA had a solid hunch that we were going to use SmallLuxGPU in our tests, and spoke specifically of it. Apparently NVIDIA has put absolutely no time into optimizing their now all-important Kepler compiler for SmallLuxGPU, choosing to focus on games instead. While that doesn’t make it clear how much of GTX 680’s performance is due to the compiler versus a general loss in compute performance, it does offer at least a slim hope that NVIDIA can improve their compute performance.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL AES routine that encrypts and decrypts an 8K x 8K pixel square image file. The results of this benchmark are the average time to encrypt the image over a number of iterations of the AES cipher.

AESEncryptDecrypt

Starting with our AES encryption benchmark, NVIDIA begins a recovery. GTX 680 is still technically slower than GTX 580, but only marginally so. If nothing else it maintains NVIDIA’s general lead in this benchmark, and is the first sign that GTX 680’s compute performance isn’t all bad.

For our fourth compute benchmark we wanted to reach out and grab something for CUDA, given the popularity of NVIDIA’s proprietary API. Unfortunately we were largely met with failure, for much the same reasons as when the Radeon HD 7970 launched. Just as many OpenCL programs were hand optimized and didn’t know what to do with the Southern Islands architecture, many CUDA applications didn’t know what to do with GK104 and its Compute Capability 3.0 feature set.

To be clear, NVIDIA’s “core” CUDA functionality remains intact; PhysX, video transcoding, etc. all work. But 3rd party applications are a much bigger issue. Among the CUDA programs that failed were NVIDIA’s own Design Garage (a GTX 480 showcase package), AccelerEyes’ GBENCH MATLAB benchmark, and the latest Folding@Home client. Since our goal here is to stick to consumer/prosumer applications in reflection of the fact that the GTX 680 is a consumer card, we did somewhat limit ourselves by ruling out a number of professional CUDA applications, but there’s no telling whether compatibility there would fare any better.

We ultimately started looking at Distributed Computing applications and settled on PrimeGrid, whose CUDA accelerated GENEFER client worked with GTX 680. Interestingly enough it primarily uses double precision math – whether this is a good thing or not though is up to the reader given the GTX 680’s anemic double precision performance.

PrimeGrid GENEFER 1.06: 1325824^32768+1

Because it’s based around double precision math the GTX 680 does rather poorly here, but the surprising bit is that it did so to a larger degree than we’d expect. The GTX 680’s FP64 performance is 1/24th its FP32 performance, compared to 1/8th on GTX 580 and 1/12th on GTX 560 Ti. Still, our expectation would be that performance would at least hold constant relative to the GTX 560 Ti, given that the GTX 680 has more than double the compute performance to offset the larger FP64 gap.

Instead we found that the GTX 680 takes 35% longer, when on paper it should be 20% faster than the GTX 560 Ti (largely due to the difference in the core clock). This makes for yet another test where the GTX 680 can’t keep up with the GTX 500 series, be it due to the change in the scheduler, or perhaps the greater pressure on the still-64KB L1 cache. Regardless of the reason, it is becoming increasingly evident that NVIDIA has sacrificed compute performance to reach their efficiency targets for GK104, which is an interesting shift from a company that was so gung-ho about compute performance, and a slightly concerning sign that NVIDIA may have lost faith in the GPU Computing market for consumer applications.
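To put rough numbers on the on-paper comparison, here is a minimal Python sketch of the arithmetic. The core counts and clocks are assumptions taken from NVIDIA's published specifications rather than from this article (only the FP64 ratios and the 35% figure come from the text above). It uses base clocks and ignores GPU Boost, and the class and function names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    cuda_cores: int          # assumed from public spec sheets
    shader_clock_mhz: float  # clock the CUDA cores run at (assumed, base clock)
    fp64_divisor: int        # FP64 rate is 1/divisor of the FP32 rate

    @property
    def fp32_gflops(self) -> float:
        # One fused multiply-add per core per clock counts as 2 FLOPs.
        return self.cuda_cores * 2 * self.shader_clock_mhz / 1000.0

    @property
    def fp64_gflops(self) -> float:
        return self.fp32_gflops / self.fp64_divisor


# Fermi parts run their CUDA cores at twice the core clock; GK104 does not.
GTX_680 = Gpu("GTX 680", 1536, 1006.0, 24)
GTX_580 = Gpu("GTX 580", 512, 1544.0, 8)
GTX_560_TI = Gpu("GTX 560 Ti", 384, 1644.0, 12)


def paper_speedup(new: Gpu, old: Gpu) -> float:
    """Theoretical FP64 throughput of `new` relative to `old`."""
    return new.fp64_gflops / old.fp64_gflops


def shortfall(measured_time_ratio: float, speedup: float) -> float:
    """How many times slower the card ran than its paper rate predicts.

    measured_time_ratio is new_time / old_time (1.35 means 35% longer).
    """
    return measured_time_ratio * speedup


if __name__ == "__main__":
    for gpu in (GTX_680, GTX_580, GTX_560_TI):
        print(f"{gpu.name}: FP32 {gpu.fp32_gflops:.1f} GFLOPS, "
              f"FP64 {gpu.fp64_gflops:.1f} GFLOPS")
    speedup = paper_speedup(GTX_680, GTX_560_TI)
    print(f"On-paper FP64 speedup over GTX 560 Ti: {speedup:.2f}x")
    print(f"Gap versus paper at 35% longer: {shortfall(1.35, speedup):.2f}x")
```

Under those assumptions the GTX 680's theoretical FP64 rate works out to roughly 1.22x that of the GTX 560 Ti, which is the same ratio as their core clocks (1006MHz versus 822MHz). Taking 35% longer instead leaves it about 1.65x off its on-paper pace.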

Finally, our last benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
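For readers unfamiliar with the technique, the general shape of an O(n^2) neighbor pass with shared-memory caching can be sketched in a few lines of Python. This is not the SDK sample's code (that is an HLSL compute shader); it is a simplified CPU stand-in that only counts neighbors within a radius, and the function names, the 2D points, and the tile size are all assumptions made for illustration. On a GPU, each thread group would load one tile of particle positions into shared memory, synchronize, and then have every thread loop over that tile before moving to the next one.

```python
import random


def count_neighbors_naive(points, radius):
    """Brute force: every particle checks every other particle, O(n^2)."""
    r2 = radius * radius
    counts = []
    for xi, yi in points:
        c = 0
        for xj, yj in points:
            dx, dy = xi - xj, yi - yj
            if dx * dx + dy * dy < r2:
                c += 1  # note: a particle counts itself (distance 0)
        counts.append(c)
    return counts


def count_neighbors_tiled(points, radius, tile_size=64):
    """Same O(n^2) work, reorganized so a small tile is reused by everyone.

    `tile` stands in for the block of positions a GPU thread group would
    stage in shared memory; the arithmetic is unchanged, only the order of
    memory accesses differs.
    """
    r2 = radius * radius
    counts = [0] * len(points)
    for start in range(0, len(points), tile_size):
        tile = points[start:start + tile_size]  # "load tile into shared memory"
        for i, (xi, yi) in enumerate(points):   # one GPU thread per particle
            c = 0
            for xj, yj in tile:
                dx, dy = xi - xj, yi - yj
                if dx * dx + dy * dy < r2:
                    c += 1
            counts[i] += c
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(500)]
    same = count_neighbors_naive(pts, 0.05) == count_neighbors_tiled(pts, 0.05)
    print("tiled result matches naive result:", same)
```

The tiled version does exactly as many distance checks as the naive one. The benefit on a GPU comes from each position being fetched from slow memory once per thread group rather than once per thread, which is why this method leans on the shared memory and cache behavior of the architecture.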

DirectX11 Compute Shader Fluid Simulation - Nearest Neighbor

Redemption at last? In our final compute benchmark the GTX 680 finally shows that it can still succeed in some compute scenarios, taking a rather impressive lead over both the 7970 and the GTX 580. At this point it’s not particularly clear why the GTX 680 does so well here and only here, but the fact that this is a compute shader program as opposed to an OpenCL program may have something to do with it. NVIDIA needs solid compute shader performance for the games that use it; OpenCL and even CUDA performance however can take a backseat.

405 Comments

  • jospoortvliet - Thursday, March 22, 2012 - link

    Seeing on other sites, the AMD does overclock better than the NVIDIA card - and the difference in power usage in everyday scenarios is that NVIDIA uses a few more watts in idle and a few less under load.

    I'd agree with my dutch hardware.info site which concludes that the two cards are incredibly close and that price should determine what you'd buy.

    A quick look shows that at least in NL, the AMD is about 50 bucks cheaper so unless NVIDIA lowers their price, the 7970 continues to be the better buy.

    Obviously, AMD has higher costs with the bigger die so NVIDIA should have higher margins. If only they weren't so late to market...

    Let's see what the 7990 and NVIDIA's answer to that will do; and what the 8000 and 700 series will do and when they will be released. NVIDIA will have to make sure they don't lag behind AMD anymore, this is hurting them...
  • theartdude - Thursday, March 22, 2012 - link

    Late to market? with Battlefield DLC, Diablo III, MechWarrior Online (and many more titles approaching), this is the PERFECT TIME for an upgrade, btw, my computer is begging for an upgrade right now, just in time for summer-time LAN parties.
  • CeriseCogburn - Tuesday, March 27, 2012 - link

    GTX680 overclocks to 1,280 out of the box for an average easy attempt...
    http://www.newegg.com/Product/Product.aspx?Item=N8...

    See the feedback bro.
    7970 makes it to 1200 if it's very lucky.
    Sorry, another lie is 7970 oc's better.
  • CeriseCogburn - Tuesday, March 27, 2012 - link

    So you're telling me the LIGHTNING amd card is cheaper ? LOL
    Further, if you don't get that exact model you won't get the overclocks, and they got a pathetic 100 on the nvidia, which noobs surpass regularly, then they used 3dmark 11 which has amd tessellation driver cheating active.... (apparently they are clueless there as well).
    Furthermore, they declared the Nvidia card 10% faster overall- well worth the 50 bucks difference for your generic AMD card no Overclocked LIghtning further overclocked with the special vrm's onboard and much more expensive... then not game tested but benched in amd cheater ware 3dmark 11 tess cheat.
  • Reaper_17 - Thursday, March 22, 2012 - link

    i agree,
  • blanarahul - Tuesday, March 27, 2012 - link

    Mr. AMD Fan Boy then you should compare how AMD was doing it since the HD 5000 Series.

    6970= 880 MHz
    GTX 580=772 MHz
    Is it a fair comparison?

    GTX 480=702 MHz
    HD 5870=850 MHz
    Is it a fair comparison?

    According to your argument the NVIDIA cards were at a disadvantage since the AMD cards were always clocked higher. But still the NVIDIA cards were better.

    And now that NVIDIA has taken the lead in clock speeds you are crying like a baby that NVIDIA built a souped up overclocked GK104.

    First check the facts. Plus the HD 8000 series aren't gonna come so early.
  • CeriseCogburn - Friday, April 06, 2012 - link

    LOL
    +1
    Tell 'em bro !
    (fanboys and fairness don't mix)
  • Sabresiberian - Thursday, March 22, 2012 - link

    Yah, I agree here. Clearly, once again, your favorite game and the screen size (resolution) you run at are going to be important factors in making a wise choice.

    ;)
  • Concillian - Thursday, March 22, 2012 - link

    "... but he's correct. The 680 does dominate in nearly every situation and category."

    Except in some of the most consistently and historically demanding games (Crysis Warhead and Metro 2033) it doesn't fare so well compared to the AMD designs. What does this mean if the PC gaming market ever breaks out of its console port funk?

    I suppose it's unlikely, but it indicates it handles easy loads well (loads that can often be handled by a lesser card,) but when it comes to the most demanding resolutions and games, it loses a lot of steam compared to the AMD offering, to the point where it goes from a >15% lead in games that don't need it (Portal 2, for example) to a 10-20% loss in Crysis Warhead at 2560x.

    That it struggles in what are traditionally the most demanding games is worrisome, but, I suppose as long as developers continue pumping out the relatively easy to render console ports, it shouldn't pose any major issues.
  • Eugene86 - Thursday, March 22, 2012 - link

    Yes, because people are really buying both the 7970 and GTX680 to play Crysis Warhead at 2560x.... :eyeroll:

    Nobody cares about old, unoptimized games like that. How about you take a look at the benchmarks that actually, realistically, matter. Look at the benches for Battlefield 3, which is a game that people are actually playing right now. The GTX680 kills the 7970 with about 35% higher frame rates, according to the benchmarks posted in this review.

    THAT is what actually matters and that is why the GTX680 is a better card than the 7970.
