Compute and Synthetics

Moving on from gaming performance, we have our customary look at compute performance. Kepler's compute performance has been hit and miss, as we've seen with GK104 cards, so it will be interesting to see how GK107 fares.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game's leader scenes. Note that this is a DX11 DirectCompute benchmark.
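
The exact decompression kernel Civ V uses isn't public, so as a hedged illustration of the kind of per-block work a compute shader can spread across thousands of threads, here is a minimal Python decoder for a single BC1/DXT1 block. BC1 is chosen purely as a familiar example of block texture compression; it is not necessarily the format Civ V decompresses.

```python
import struct

def decode_bc1_block(block: bytes):
    """Decode one 8-byte BC1/DXT1 block into 16 RGB texels."""
    # Two RGB565 endpoint colors plus sixteen 2-bit palette indices.
    c0, c1, bits = struct.unpack('<HHI', block)

    def rgb565(c):
        # Expand 5:6:5 bit fields to 8-bit channels.
        return ((c >> 11 & 0x1F) * 255 // 31,
                (c >> 5 & 0x3F) * 255 // 63,
                (c & 0x1F) * 255 // 31)

    p0, p1 = rgb565(c0), rgb565(c1)
    if c0 > c1:
        # Four-color mode: two interpolated intermediate colors.
        palette = [p0, p1,
                   tuple((2 * a + b) // 3 for a, b in zip(p0, p1)),
                   tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))]
    else:
        # Three-color mode plus black (transparent in DXT1A).
        palette = [p0, p1,
                   tuple((a + b) // 2 for a, b in zip(p0, p1)),
                   (0, 0, 0)]

    # 16 texels in row-major order, 2 bits each, LSB first.
    return [palette[(bits >> (2 * i)) & 0x3] for i in range(16)]
```

Each block decodes independently, which is exactly why this style of workload maps well onto DirectCompute: one thread (or thread group) per block, with no synchronization between blocks.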

Because this is a compute benchmark, the massive increase in ROPs in moving from the GT 440 to the GT 640 doesn't help, which means the GT 640 is relying on the smaller increase in shader performance. The end result is that the GT 640 neither greatly improves on the GT 440 nor is it competitive with the 7750. Compared to the GT 440, compute shader performance improved by only 28%, and the 7750 is some 50% faster here. I suspect memory bandwidth is still a factor, so we'll have to see what GDDR5 cards are like.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

NVIDIA’s poor OpenCL performance under Kepler doesn’t do them any favors here. Even the GT 240 – a DX10.1 card that doesn’t have the compute enhancements of Fermi – manages to beat the GT 640, and the GT 440 is only a few percent behind.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL AES encryption routine that AES encrypts/decrypts an 8K x 8K pixel square image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher.
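
For a sense of scale, here is a minimal CPU-side sketch in Python of what the benchmark measures, using pycryptodome. The buffer size assumes 4 bytes per pixel, and AES-CTR stands in for whatever mode the OpenCL kernel actually implements; both are assumptions for illustration, not details taken from the benchmark.

```python
import os
import time
from Crypto.Cipher import AES  # pycryptodome

def average_encrypt_time(iterations: int = 5) -> float:
    """Time AES encryption of an 8K x 8K image-sized buffer, averaged."""
    key = os.urandom(16)                  # AES-128 key
    nonce = os.urandom(8)
    data = os.urandom(8192 * 8192 * 4)    # ~256 MB: 8K x 8K at 4 B/pixel (assumed)
    elapsed = []
    for _ in range(iterations):
        # CTR mode is stateful, so build a fresh cipher each iteration.
        cipher = AES.new(key, AES.MODE_CTR, nonce=nonce)
        start = time.perf_counter()
        cipher.encrypt(data)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)

if __name__ == '__main__':
    print(f"average encrypt time: {average_encrypt_time():.3f} s")
```

The GPU version wins by carving the image into independent blocks (modes like CTR and ECB parallelize trivially) and encrypting them across thousands of work items at once.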

The GT 640 is at the very bottom of the chart. NVIDIA’s downplaying of OpenCL performance is a deliberate decision, but it’s also a decision with consequences.

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
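
To make the O(n^2) shape concrete, here is a minimal NumPy sketch of the pairwise pass. The function name and the poly6-style kernel are illustrative rather than lifted from the SDK sample; the key point is that every particle interacts with every other, which is the memory traffic the compute shader's shared-memory caching is there to absorb.

```python
import numpy as np

def density_all_pairs(pos: np.ndarray, h: float) -> np.ndarray:
    """Naive O(n^2) SPH-style density: every particle vs. every other.

    pos: (n, 2) array of particle positions; h: interaction radius.
    The GPU sample runs this same quadratic pass in a compute shader,
    staging tiles of positions in shared memory so each position is
    fetched from DRAM once per tile instead of once per pair.
    """
    diff = pos[:, None, :] - pos[None, :, :]    # (n, n, 2) pairwise offsets
    r2 = np.einsum('ijk,ijk->ij', diff, diff)   # (n, n) squared distances
    w = np.maximum(h * h - r2, 0.0) ** 3        # poly6-style falloff, unnormalized
    return w.sum(axis=1)                        # per-particle density

# Small demo -- at the sample's 16k particles the (n, n) temporaries here
# would need gigabytes, which is precisely why the GPU version never
# materializes them and leans on on-chip memory instead.
rho = density_all_pairs(np.random.rand(1024, 2), h=0.05)
```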

All indications are that our fluid simulation benchmark is light on memory bandwidth usage and heavy on cache usage, which makes this a particularly exciting benchmark. Our results back this theory, as for the first and only time the GT 640 shoots past the GTS 450 and comes close to tying the GTX 550 Ti. The 7750 still handily wins here, but based on the specs of GK107 I believe this is the benchmark most representative of what GK107 is capable of when it’s not facing such a massive memory bandwidth bottleneck. It will be interesting to see what GDDR5 GK107 cards do here, if only to further validate our assumptions about this benchmark’s memory bandwidth needs.

Our final benchmark is a look at CUDA performance, based on a special benchmarkable version of the CUDA Folding@Home client that NVIDIA and the Folding@Home group have sent over. Folding@Home and similar initiatives are still one of the most popular consumer compute workloads, so it’s something NVIDIA wants their GPUs to do well at.

Folding@Home has historically pushed both shader performance and memory bandwidth, so it’s not particularly surprising that the GT 640 splits the difference. It’s faster than the GT 440 by 32%, but the GTS 450 still has a 25% lead even though the GT 640 has the greater theoretical compute performance. This is another test that will be interesting to revisit once GDDR5 cards hit the market.

Synthetics

Jumping over to synthetic benchmarks quickly, it doesn’t look like we’ll be able to tease much more out of GK107 at this time. The GT 640 looks relatively good under 3DMark in both Pixel Fill and Texel Fill, but as we’ve seen, real-world performance doesn’t match that. Given how well the GT 640 does with DDR3, however, it’s another sign that a GDDR5 card may be able to significantly improve on the DDR3 GT 640.

Tessellation performance is also quite poor here; however, there’s no evidence that this is a memory bandwidth issue. The culprit appears to be the scalability of NVIDIA’s tessellation design – it scales down just as well as it scales up, leaving cards with low numbers of SMXes with relatively low tessellation performance. NVIDIA’s improvements to their Polymorph Engines do shine through here, as evidenced by the GT 640’s performance improvement relative to the GT 440, but they’re not a complete substitute for simply having more Polymorph Engines.

Comments

  • cjs150 - Thursday, June 21, 2012 - link

    "God forbid there be a technical reason for it.... "

    Intel and Nvidia have had several generations of chips to fix any technical issue and didn't (HD4000 is good enough, though). AMD have been pretty close to the correct frame rate for a while.

    But it is not enough to have the capability to run at the correct frame rate if you make it too difficult to change the frame rate to the correct setting. That is not a hardware issue, just bad software design.
  • UltraTech79 - Wednesday, June 20, 2012 - link

    Anyone else really disappointed in 4K still being standardized around 24 fps? I thought 60 would be the minimum standard by now, with 120 in higher-end displays. 24 is crap. Anyone that has seen a movie recorded at 48+ FPS knows what I'm talking about.

    This is like putting shitty unleaded gas into a super high-tech racecar.
  • cjs150 - Thursday, June 21, 2012 - link

    You do know that Blu-ray is displayed at 23.976 FPS? That looks very good to me.

    Please do not confuse screen refresh rates with frame rates. Screen refresh runs on most large TVs at between 60 and 120 Hz; anything below 60 tends to look crap. (If you want real crap, try running American TV on a European PAL system - I mean crap in a technical sense, not creatively!)

    I must admit that having a frame rate of 23.976 rather than some round number such as 24 (or higher) FPS is rather daft, and some new films are coming out with much higher frame rates. I have a horrible recollection that the reason for such an odd FPS is historic - something to do with the length of 35mm film that would be needed per second. The problem is I cannot remember whether that was simply because 35mm film was expensive and this was the minimum to provide smooth movement, or whether it goes right back to the days when film had a tendency to catch light, and it was the maximum speed you could put a film through a projector without friction causing the film to catch light. No doubt there is an expert on this site who could explain precisely why we ended up with such a silly number as the standard.
  • UltraTech79 - Friday, June 22, 2012 - link

    You are confusing things here. I clearly said 120 (fps) would need higher-end displays (120Hz). I was rounding 23.976 FPS up to 24, give me a break.

    That it looks good /to you/ is wholly irrelevant. Do you realize how many people said "it looks very good to me" about SD when resisting the HD movement? Or how many will say it again about 1080p, thinking 4K is too much? It's a ridiculous mindset.

    My point was that we are upping the resolution but leaving another very important aspect in the dust, one that we need to improve. Even audio is moving faster than frame rates in movies, and now that most places are switching to digital, the cost to go to the next step has dropped dramatically.
  • nathanddrews - Friday, June 22, 2012 - link

    It was NVIDIA's choice to only implement 4K @ 24Hz (23.xxx) due to limitations of HDMI. If NVIDIA had optimized around DisplayPort, you could then have 4K @ 60Hz.

    For computer use, anything under 60Hz is unacceptable. For movies, 24Hz has been the standard for nearly a century - all film is 24fps and most movies are still shot on film. In the next decade, there will be more and more films that use 48, 60, even 120fps. Cameron was cock-blocked by the studio when he wanted to film Avatar at 60fps, but he may get his wish for the sequels. Jackson is currently filming The Hobbit at 48fps. Eventually all will be right with the world.
  • karasaj - Wednesday, June 20, 2012 - link

    If we wanted to use this to compare a 640M or 640M LE to the GT640, is this doable? If it's built on the same chip (both have 384 CUDA cores), can we just reduce the numbers by a rough percentage of the core clock speed to get rough numbers for what the respective cards would put out? I.e., the 640M LE has a clock of 500MHz and the 640M is ~625MHz, so could we expect ~55% of this for the 640M LE and 67% for the 640M? Assuming DDR3 on both, so as not to have that kind of difference.
  • Ryan Smith - Wednesday, June 20, 2012 - link

    It would be fairly easy to test a desktop card at a mobile card's clocks (assuming memory type and functional unit count were equal), but you can't extrapolate performance like that because there's more to performance than clockspeeds. In practice performance shouldn't drop by that much, since we're already memory bandwidth bottlenecked with DDR3.
  • jstabb - Wednesday, June 20, 2012 - link

    Can you verify whether creating a custom resolution breaks 3D (frame-packed) Blu-ray playback?

    With my GT430, once a custom resolution had been created for 23/24Hz, that custom resolution overrode the 3D frame-packed resolution created when 3D Vision was enabled. The driver appeared to have simple fall-through logic: if a custom resolution is defined for the selected resolution/refresh rate it is always used; failing that, it will use a 3D resolution if one is defined; failing that, it will use the default 2D resolution.

    This issue made the custom resolution feature useless to me with the GT430 and pushed me to an AMD solution for their better OOTB refresh rate matching. I'd like to consider this card if the issue has been resolved.

    Thanks for the great review!
  • MrSpadge - Wednesday, June 20, 2012 - link

    It consumes just about as much power as the HD7750-800, yet performs miserably in comparison. This is an amazing win for AMD, especially comparing the GTX680 and HD7970!
  • UltraTech79 - Wednesday, June 20, 2012 - link

    This performs about as well as an 8800GTS at twice the price, or offers half the performance of a 460GTX at the same price.

    These should have been priced at $59.99.
