Compute

Shifting gears, let’s take a look at compute performance on Pascal.

Overall, we’re not expecting a significant difference in compute performance compared to Maxwell 2 for standard compute benchmarks. The fundamental architecture hasn’t changed – the CUDA cores, register files, and caches still behave as before - so there’s little reason for compute performance to shift. GP104 for all intents and purposes should perform like a higher clocked and slightly wider Maxwell 2, similar to what we’ve seen in most games.

However in the long run there is potential for Pascal to show some improvements. The architecture’s improved scheduling features are geared in part towards HPC users, and instruction level preemption means that compute kernels can now be a lot more aggressive on consumer systems since they can be paused so easily. That said, to really leverage any of these improvements, applications utilizing GPU compute need to have work that benefits from better scheduling and be written with Pascal in mind, and for consumer workloads the latter is likely a long way off.

Starting us off for our look at compute is LuxMark3.1, the latest version of the official benchmark of LuxRender. LuxRender’s GPU-accelerated rendering mode is an OpenCL based ray tracer that forms a part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as ray tracing maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

Compute: LuxMark 3.1 - Hotel

As with games, when it comes to LuxMark, the GTX 1080 is uncontested; this is the first high performance FinFET GPU in action. That said, I’m surprised by how close some of these results cluster. Though GTX 1080 is not a full generational replacement for GTX 980 Ti, normally it outperforms the Big Maxwell card by more than this. Instead we’re looking at a lead of just 10%, notably less than a simple extrapolation of CUDA core counts and frequencies would tell us to expect (GTX 1080 has almost 50% more FLOPs).

That said, GTX 1070 still places very close to GTX 980 Ti – albeit below it – so what we’re seeing isn’t just Pascal being a laggard. Especially since as a consequence of this, GTX 1080 only beats GTX 1070 by 12%. In any case, this may be a case of early drivers, particularly as OpenCL has not been an NVIDIA priority for the last couple of years. Alternatively, as strange as it may be, I’m not ready to rule out LuxMark being CPU limited. It’s something that we’ll have to keep an eye on.

For our second set of compute benchmarks we have CompuBench 1.5, the successor to CLBenchmark. CompuBench offers a wide array of different practical compute workloads, and we’ve decided to focus on face detection, optical flow modeling, and particle simulations.

Compute: CompuBench 1.5 - Face Detection

Compute: CompuBench 1.5 - Optical Flow

Compute: CompuBench 1.5 - Particle Simulation 64K

Depending on which sub-test we’re looking at, CompuBench is all over the place. In Face Detection the GTX 1080 takes a commanding lead, with GTX 1070 easily slotting into second place. On the other hand we have Optical Flow, which NVIDIA cards have traditionally struggled with, where even GTX 1080 can’t unseat Radeon Fury X. Finally in the middle we have the 64K Particle Simulation, which has GTX 1080 in the lead again, but not unlike LuxMark, it also has some interesting clustering going on.

Ultimately each test stresses our GPU collection in different ways, which as we can see greatly influences how the results pan out. Face Detection has always played well to NVIDIA’s strengths, and on a generational basis we get solid scaling from Maxwell 2 to Pascal. Even Optical Flow, which seems to favor raw FLOPs more than anything else, still shows very good gains with Pascal.

Particle Simulation is the outlier in this regard; Pascal’s generational gains are not insignificant, but they’re less than what we’d expect. Furthermore GTX 1080 and GTX 1070 are very closely clustered together despite their much larger difference in FLOPs. This may mean we’re looking at a CPU or driver bottleneck, or possibly some sort of internal path bottleneck. GTX 1080 has more FLOPs and a similar advantage in memory bandwidth, but once you get on chip things get much closer. If nothing else this goes to show that compute benchmarks are much more architecture sensitive than games, which is why we can’t make very broad generalizations for all compute workloads.

Moving on, our 3rd compute benchmark is the next generation release of FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that has work distributed to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, utilizing the OpenCL path for FAHCore 21.

Compute: Folding @ Home Single Precision

Compute: Folding @ Home Double Precision

In single precision performance, to the surprise of no one the GTX 1080 is solidly in the lead, followed up by the GTX 1070. On a generational basis performance gains are decent, but at 44% for GTX 1080 they aren’t quite as great as we’ve seen from the card elsewhere. Meanwhile the two Pascal cards are again closer than we’d expect, with GTX 1080 leading by only 10%.

As for double precision performance, we can see that even with the higher overall compute throughput of GP104, it still can’t make up for the fact that FP64 performance on the GPU is capped at 1/32 by virtue of so few FP64 CUDA cores, which puts even NVIDIA’s latest and greatest at a disadvantage here. But if nothing else, generational scaling versus Maxwell 2 looks very good, with performance gains closely tracking the theoretical increase in FLOPs.

Hitman Synthetics
Comments Locked

200 Comments

View All Comments

  • grrrgrrr - Wednesday, July 20, 2016 - link

    Solid review! Some nice architecture introductions.
  • euskalzabe - Wednesday, July 20, 2016 - link

    The HDR discussion of this review was super interesting, but as always, there's one key piece of information missing: WHEN are we going to see HDR monitors that take advantage of these new GPU abilities?

    I myself am stuck at 1080p IPS because more resolution doesn't entice me, and there's nothing better than IPS. I'm waiting for HDR to buy my next monitor, but being 5 years old my Dell ST2220T is getting long in the teeth...
  • ajlueke - Wednesday, July 20, 2016 - link

    Thanks for the review Ryan,

    I think the results are quite interesting, and the games chosen really help show the advantages and limitations of the different architectures. When you compare the GTX 1080 to its price predecessor, the 980 Ti, you are getting an almost universal ~25%-30% increase in performance.
    Against rival AMDs R9 Fury X, there is more of a mixed bag. As the resolutions increase the bandwidth provided by the HBM memory on the Fury X really narrows the gap, sometimes trimming the margin to less that 10%,s specifically in games optimized more for DX12 "Hitman, "AotS". But it other games, specifically "Rise of the Tomb Raider" which boasts extremely high res textures, the 4Gb memory size on the Fury X starts to limit its performance in a big way. On average, there is again a ~25%-30% performance increase with much higher game to game variability.
    This data lets a little bit of air out of the argument I hear a lot that AMD makes more "future proof" cards. While many Nvidia 900 series users may have to upgrade as more and more games switch to DX12 based programming. AMD Fury users will be in the same boat as those same games come with higher and higher res textures, due to the smaller amount of memory on board.
    While Pascal still doesn't show the jump in DX12 versus DX11 that AMD's GPUs enjoy, it does at least show an increase or at least remain at parity.
    So what you have is a card that wins in every single game tested, at every resolution over the price predecessors from both companies, all while consuming less power. That is a win pretty much any way you slice it. But there are elements of Nvidia’s strategy and the card I personally find disappointing.
    I understand Nvidia wants to keep features specific to the higher margin professional cards, but avoiding HBM2 altogether in the consumer space seems to be a missed opportunity. I am a huge fan of the mini ITX gaming machines. And the Fury Nano, at the $450 price point is a great card. With an NVMe motherboard and NAS storage the need for drive bays in the case is eliminated, the Fury Nano at only 6” leads to some great forward thinking, and tiny designs. I was hoping to see an explosion of cases that cut out the need for supporting 10-11” cards and tons of drive bays if both Nvidia and AMD put out GPUs in the Nano space, but it seems not to be. HBM2 seems destined to remain on professional cards, as Nvidia won’t take the risk of adding it to a consumer Titan or GTX 1080 Ti card and potentially again cannibalize the higher margin professional card market. Now case makers don’t really have the same incentive to build smaller cases if the Fury Nano will still be the only card at that size. It’s just unfortunate that it had to happen because NVidia decided HBM2 was something they could slap on a pro card and sell for thousands extra.
    But also what is also disappointing about Pascal stems from the GTX 1080 vs GTX 1070 data Ryan has shown. The GTX 1070 drops off far more than one would expect based off CUDA core numbers as the resolution increases. The GDDR5 memory versus the GDDR5X is probably at fault here, leading me to believe that Pascal can gain even further if the memory bandwidth is increased more, again with HBM2. So not only does the card limit you to the current mini-ITX monstrosities (I’m looking at you bulldog) by avoiding HBM2, it also very likely is costing us performance.
    Now for the rank speculation. The data does present some interesting scenarios for the future. With the Fury X able to approach the GTX 1080 at high resolutions, most specifically in DX12 optimized games. It seems extremely likely that the Vega GPU will be able to surpass the GTX 1080, especially if the greatest limitation (4 Gb HBM) is removed with the supposed 8Gb of HBM2 and games move more and more the DX12. I imagine when it launches it will be the 4K card to get, as the Fury X already acquits itself very well there. For me personally, I will have to wait for the Vega Nano to realize my Mini-ITX dreams, unless of course, AMD doesn’t make another Nano edition card and the dream is dead. A possibility I dare not think about.
  • eddman - Wednesday, July 20, 2016 - link

    The gap getting narrower at higher resolutions probably has more to do with chips' designs rather than bandwidth. After all, Fury is the big GCN chip optimized for high resolutions. Even though GP104 does well, it's still the middle Pascal chip.

    P.S. Please separate the paragraphs. It's a pain, reading your comment.
  • Eidigean - Wednesday, July 20, 2016 - link

    The GTX 1070 is really just a way for Nvidia to sell GP104's that didn't pass all of their tests. Don't expect them to put expensive memory on a card where they're only looking to make their money back. Keeping the card cost down, hoping it sells, is more important to them.

    If there's a defect anywhere within one of the GPC's, the entire GPC is disabled and the chip is sold at a discount instead of being thrown out. I would not buy a 1070 which is really just a crippled 1080.

    I'll be buying a 1080 for my 2560x1600 desktop, and an EVGA 1060 for my Mini-ITX build; which has a limited power supply.
  • mikael.skytter - Wednesday, July 20, 2016 - link

    Thanks Ryan! Much appreciated.
  • chrisp_6@yahoo.com - Wednesday, July 20, 2016 - link

    Very good review. One minor comment to the article writers - do a final check on grammer - granted we are technical folks, but it was noticeable especially on the final words page.
  • madwolfa - Wednesday, July 20, 2016 - link

    It's "grammar", though. :)
  • Eden-K121D - Thursday, July 21, 2016 - link

    Oh the irony
  • chrisp_6@yahoo.com - Thursday, July 21, 2016 - link

    oh snap, that is some funny stuff right there

Log in

Don't have an account? Sign up now