Compute and Synthetics

Moving on from our look at gaming performance, we have our customary look at compute performance. Kepler’s compute performance has been hit and miss as we’ve seen on GK104 cards, so it will be interesting to see how GK107 fares.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.

Because this is a compute benchmark, the massive increase in ROPs going from the GT 440 to the GT 640 doesn’t help the GT 640, which means it is relying on the smaller increase in shader performance. The end result is that the GT 640 neither greatly improves on the GT 440 nor is competitive with the 7750. Compared to the GT 440, compute shader performance only improved by 28%, and the 7750 is some 50% faster here. I suspect memory bandwidth is still a factor here, so we’ll have to see what GDDR5 cards are like.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

NVIDIA’s poor OpenCL performance under Kepler doesn’t do them any favors here. Even the GT 240 – a DX10.1 card that doesn’t have the compute enhancements of Fermi – manages to beat the GT 640 here. And the GT 440 is only a few percent behind the GT 640.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL AES routine that encrypts and decrypts an 8K x 8K pixel square image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher.
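As a rough illustration of how an "average time over a number of iterations" figure is produced, here is a minimal timing sketch. It is not the AESEncryptDecrypt code: `average_seconds` and `xor_pass` are hypothetical names, the workload is a plain XOR pass standing in for AES, and the buffer is a small placeholder rather than the 8K x 8K image.

```python
import time

def average_seconds(work, iterations=5):
    """Return the mean wall-clock time of `work()` over `iterations` runs."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    start = time.perf_counter()
    for _ in range(iterations):
        work()
    return (time.perf_counter() - start) / iterations

def xor_pass(buffer, key=0x5A):
    """Stand-in workload (NOT AES): XOR every byte with a fixed key."""
    return bytes(b ^ key for b in buffer)

if __name__ == "__main__":
    image = bytes(256 * 256)  # small placeholder, not the 8K x 8K image
    print(f"{average_seconds(lambda: xor_pass(image)):.6f} s per pass")
```

Averaging over several passes smooths out run-to-run noise, which is presumably why the benchmark reports a mean rather than a single timing.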

The GT 640 is at the very bottom of the chart. NVIDIA’s downplaying of OpenCL performance is a deliberate decision, but it’s also a decision with consequences.

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
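To make the O(n^2) idea concrete, here is a small CPU-side sketch in Python. It is not the DirectX SDK sample: `count_neighbors_tiled`, its parameters, and the 2D positions are all assumptions for illustration. The tile slice only mimics the role shared memory plays on a GPU, where a block of particles is loaded once and then reused by every thread in the group.

```python
def count_neighbors_tiled(positions, radius, tile_size=64):
    """Brute-force O(n^2) neighbor count, walking the particle list in tiles.

    Every particle is compared against every other particle. The inner data
    is consumed one tile at a time, standing in for a block of particles
    staged in GPU shared memory.
    """
    n = len(positions)
    r2 = radius * radius
    counts = [0] * n
    for tile_start in range(0, n, tile_size):
        tile = positions[tile_start:tile_start + tile_size]  # "shared memory" stand-in
        for i, (xi, yi) in enumerate(positions):
            for j, (xj, yj) in enumerate(tile, start=tile_start):
                if i == j:
                    continue  # a particle is not its own neighbor
                dx, dy = xi - xj, yi - yj
                if dx * dx + dy * dy < r2:
                    counts[i] += 1
    return counts

if __name__ == "__main__":
    particles = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)]
    print(count_neighbors_tiled(particles, radius=1.0, tile_size=2))
```

Because each tile is read many times after a single load, a scheme like this leans on fast on-chip storage far more than on external memory bandwidth.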

All indications are that our fluid simulation benchmark is light on memory bandwidth usage and heavy on cache usage, which makes this a particularly exciting benchmark. Our results back this theory, as for the first and only time the GT 640 shoots past the GTS 450 and comes close to tying the GTX 550 Ti. The 7750 still handily wins here, but based on the specs of GK107 I believe this is the benchmark most representative of what GK107 is capable of when it’s not facing such a massive memory bandwidth bottleneck. It will be interesting to see what GDDR5 GK107 cards do here, if only to further validate our assumptions about this benchmark’s memory bandwidth needs.

Our final benchmark is a look at CUDA performance, based on a special benchmarkable version of the CUDA Folding@Home client that NVIDIA and the Folding@Home group have sent over. Folding@Home and similar initiatives are still one of the most popular consumer compute workloads, so it’s something NVIDIA wants their GPUs to do well at.

Folding@Home has historically pushed both shader performance and memory bandwidth, so it’s not particularly surprising that the GT 640 splits the difference. It’s faster than the GT 440 by 32%, but the GTS 450 still has a 25% lead in spite of the fact that the GT 640 has the greater theoretical compute performance. This is another test that will be interesting to revisit once GDDR5 cards hit the market.
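The two percentages above can be chained to place all three cards on one scale. The sketch below assumes both figures are plain throughput ratios (an assumption, since the underlying scores are not reproduced here); `chain_speedups` is a hypothetical helper name.

```python
def chain_speedups(*percent_gains):
    """Multiply successive 'X% faster' steps into one overall ratio."""
    ratio = 1.0
    for gain in percent_gains:
        ratio *= 1.0 + gain / 100.0
    return ratio

if __name__ == "__main__":
    # GT 440 -> GT 640 is +32%, GT 640 -> GTS 450 is +25% (figures from the text).
    gts450_vs_gt440 = chain_speedups(32, 25)
    print(f"GTS 450 relative to GT 440: about {gts450_vs_gt440:.2f}x")
```

Under that assumption the GTS 450 would sit at roughly 1.65 times the GT 440, with the GT 640 a little under halfway between them.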

Synthetics

Jumping over to synthetic benchmarks quickly, it doesn’t look like we’ll be able to tease much more out of GK107 at this time. The GT 640 looks relatively good under 3DMark in both Pixel Fill and Texel Fill, but as we’ve seen, real-world performance doesn’t match that. However, given that the GT 640 does this well with DDR3, it’s another sign that a GDDR5 card may be able to significantly improve on the DDR3 GT 640.

Tessellation performance is also really poor here, though there’s no evidence that this is a memory bandwidth issue. The culprit appears to be the scalability of NVIDIA’s tessellation design: it scales down just as well as it scales up, leaving cards with low numbers of SMXes with relatively low tessellation performance. NVIDIA’s improvements to their Polymorph Engines do shine through here, as evidenced by the GT 640’s performance improvement relative to the GT 440, but they’re not a complete substitute for just having more Polymorph Engines.


60 Comments


  • Joe H - Wednesday, June 20, 2012 - link

    This is the type of review that other hardware sites can't even imagine, let alone write. Thanks for putting this and the other HTPC articles together. It's great to see a hardware review site taking HTPC enthusiasts and their needs seriously. Excellent review.
  • n0b0dykn0ws - Wednesday, June 20, 2012 - link

    Is there a chance of a follow up once a few driver updates have been released?

    I would love to see if the card gets even better after a few releases.

    I have a Radeon 6570 right now, and I've found it to be palatable for HTPC purposes.

    n0b0dykn0ws
  • Taft12 - Wednesday, June 20, 2012 - link

    They haven't done it before, I don't know why they'd start now.
  • Ryan Smith - Thursday, June 21, 2012 - link

    What specifically are you looking for? Gaming performance or HTPC functionality? Gaming performance isn't likely to improve; even with the newer architecture it's not Kepler that's the limiting factor. HTPC functionality on the other hand can easily be improved with drivers.
  • n0b0dykn0ws - Thursday, June 21, 2012 - link

    HTPC only. For gaming I would get a 670.

    Sometimes drivers break HTPC performance/quality though. At least in the AMD world.

    n0b0dykn0ws
  • Kevin G - Wednesday, June 20, 2012 - link

    If they're going to release a DDR3 version, why not just offer a version with no onboard memory and two DIMM slots so that users can add their own? You can get a DDR3-2133 kit which would boost bandwidth limited scenarios by roughly 15%. While I don't see the need, such a card could be upgraded all the way to 16 GB of memory.
  • MrSpadge - Thursday, June 21, 2012 - link

    Sockets
    - are unconventional (I don't think nVidia likes this word)
    - introduce a little cost (GPU manufacturer doesn't like it)
    - make the board larger (GPU manufacturer doesn't like it)
    - make the bus timing worse, so it's harder to clock them as high as directly soldered chips (wouldn't matter with DDR3, though)
    - introduce another point of failure (GPU manufacturer doesn't like higher RMA rates)
    - add cost to the overall product, as the end user wouldn't get as sweet a deal on RAM as the GPU manufacturer (this would eat into the GPU manufacturer's profit margin)
  • Stuka87 - Wednesday, June 20, 2012 - link

    Sounds like unless temps are really important to you, the 7750-800 is by far the better choice. It outperforms the GT640 (and by a wide margin in some cases) in what looks like every single test.

    And they are priced the same, which makes the GT640 kind of worthless for its intended price point.
  • cjs150 - Wednesday, June 20, 2012 - link

    Great review.

    It is too noisy, and the HDMI socket is an epic design fail. As a card for an HTPC what were Zotac thinking of? This is so badly wrong.

    Now onto frame rates. Nvidia, AMD and Intel really are total and utter idiots or they have decided that we the customers are total and utter idiots. There is simply no excuse for all IGPs and video cards not to be able to lock on to the correct frame rate with absolute precision. It is not as though the frame rate specs for film have changed recently. I cannot decide whether it is sloppiness, arrogance or they simply do not give a rats a##e for the customer experience.
  • Stuka87 - Wednesday, June 20, 2012 - link

    God forbid there be a technical reason for it....
