Synthetics

Though we’ve covered bits and pieces of synthetic performance when discussing aspects of the Pascal architecture, before we move on to power testing I want to take a deeper look at it. Based on what we know about the Pascal architecture we should have a good idea of what to expect, but these tests nonetheless serve as a canary for any architectural changes we may have missed.

Synthetic: TessMark, Image Set 4, 64x Tessellation

Starting off with tessellation performance, we find that the GTX 1080 further builds on NVIDIA’s already impressive tessellation performance. Unrivaled at this point, GTX 1080 delivers a 63% increase in tessellation performance here, and maintains a 24% lead over GTX 1070. Suffice it to say, the Pascal cards will have no trouble keeping up with geometry needs in games for a long time to come.

Breaking down performance by tessellation level to look at the GTX 980 and GTX 1080 more closely on a logarithmic scale, what we find is that there’s a rather consistent advantage for the GTX 1080 at all tessellation levels. Even 8x tessellation is still 56% faster. This indicates that NVIDIA hasn’t made any fundamental changes to their geometry hardware (PolyMorph Engines) between Maxwell 2 and Pascal. Everything has simply been scaled up in clockspeed and scaled out in the total number of engines. Though I will note that the performance gains are less than the theoretical maximum, so we're not seeing perfect scaling by any means.
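To put a rough number on that, here’s a back-of-the-envelope sketch of the theoretical scaling. The clocks below are the official boost clocks, an assumption on my part since retail cards typically boost higher, so treat the output as approximate:

```python
# Theoretical geometry scaling: PolyMorph Engine count (one per SM) times
# clockspeed. Clocks are official boost clocks, not observed clocks.
cards = {
    "GTX 980":  {"polymorph_engines": 16, "boost_mhz": 1216},
    "GTX 1080": {"polymorph_engines": 20, "boost_mhz": 1733},
}

def geometry_rate(card):
    # Relative geometry throughput: engines * clock
    return card["polymorph_engines"] * card["boost_mhz"]

gain = geometry_rate(cards["GTX 1080"]) / geometry_rate(cards["GTX 980"]) - 1
print(f"Theoretical gain: {gain:.0%}")  # ~78%, versus the 56-63% measured above
```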

Up next, we have SteamVR’s Performance Test. While this test is based on the latest version of Valve’s Source engine, the test itself is purely synthetic, designed to gauge a system's suitability for VR, making it our sole VR-focused test at this time. It should be noted that the results of this test are not linear, and furthermore the score is capped at 11. Of particular note, cards that fail to reach GTX 970/R9 290 performance levels fall off a cliff rather quickly, so the results need to be interpreted a bit differently than our other benchmarks.

SteamVR Performance Test

While the minimum recommended GTX 970 and Radeon R9 290 cards score in the mid-to-high 6 range, NVIDIA’s new Pascal cards max out the test with a score of 11. For the purposes of this test, that means both cards exceed Valve’s recommended specifications, making them capable of running Valve’s VR software at maximum quality with no performance issues.

Finally, to look at texel and pixel fillrates, for 2016 we have switched from the rather old 3DMark Vantage to the Beyond3D Test Suite. This suite offers a slew of additional tests – many of which we use behind the scenes or used in our earlier architectural analysis – but for now we’ll stick to simple pixel and texel fillrates.

Beyond3D Suite - Pixel Fillrate

Starting with pixel fillrate, the GTX 1080 is well in the lead. While GP104, with its 64 ROPs, has fewer ROPs than the GM200-based GTX 980 Ti, it more than makes up for the difference with significantly higher clockspeeds. Similarly, when it comes to feeding those ROPs, GP104’s narrower memory bus is more than offset by the use of 10Gbps GDDR5X. But even then the two should be closer than this on paper, so the GTX 1080 is exceeding expectations.

As we discovered in 2014 with Maxwell 2, NVIDIA’s Delta Color Compression technology has a huge impact on pixel fillrate testing. So most likely what we’re seeing here is Pascal’s 4th generation DCC in action, helping GTX 1080 further compress its buffers and squeeze more performance out of the ROPs.

Though with that in mind, it’s interesting to note that even with an additional generation of DCC, this really only helps NVIDIA keep pace. The actual performance gains here versus GTX 980 are 56%, not too far removed from the gains we see in games and well below the theoretical difference in FLOPs. So despite the increase in pixel throughput due to architectural efficiency, it’s really only enough to help keep up with the other areas of the more powerful Pascal GPU.
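For reference, here is a similar sketch of where that 56% lands relative to theory, again assuming official boost clocks:

```python
# Pixel fillrate (ROPs * clock) and shader throughput (FMA = 2 ops/core/clock),
# using official boost clocks rather than observed clocks.
gtx_980  = {"rops": 64, "cuda_cores": 2048, "boost_mhz": 1216}
gtx_1080 = {"rops": 64, "cuda_cores": 2560, "boost_mhz": 1733}

def gpixels(c):
    return c["rops"] * c["boost_mhz"] / 1000           # Gpixels/sec

def tflops(c):
    return c["cuda_cores"] * 2 * c["boost_mhz"] / 1e6  # TFLOPs

rop_gain  = gpixels(gtx_1080) / gpixels(gtx_980) - 1   # ~43%: clockspeed only
flop_gain = tflops(gtx_1080) / tflops(gtx_980) - 1     # ~78%
print(f"ROP gain: {rop_gain:.0%}, FLOPs gain: {flop_gain:.0%}, measured: 56%")
```

That the measured 56% lands above the raw ROP scaling but below the FLOPs scaling is consistent with DCC doing extra work on the ROP side.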

As for GTX 1070, things are a bit different. The card has all of GTX 1080's ROPs and 80% of its memory bandwidth; what it doesn't have is GP104’s 4th GPC. As each GPC houses the Raster Engine responsible for rasterization, GTX 1070 can only set up 48 pixels per clock to begin with, despite the fact that its ROPs can accept 64. As a result it takes a significant hit here, delivering 77% of GTX 1080’s pixel throughput. With all of that said, the fact that in-game performance is closer than this is a reminder that while pixel throughput is an important part of game performance, it’s often not the bottleneck.
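The front-end limit can be expressed as a simple min() of raster output and ROP capacity; the 16 pixels/clock/GPC figure below reflects the Raster Engine throughput as we understand it:

```python
# Pixels per clock is bounded by both the Raster Engines (16 px/clock per
# GPC on Pascal, to our understanding) and the ROP count.
def pixels_per_clock(gpcs, rops, raster_px_per_gpc=16):
    return min(gpcs * raster_px_per_gpc, rops)

px_1080 = pixels_per_clock(gpcs=4, rops=64)  # 64: ROP-limited
px_1070 = pixels_per_clock(gpcs=3, rops=64)  # 48: raster-limited
print(f"GTX 1070 vs GTX 1080: {px_1070 / px_1080:.0%}")  # 75%, vs 77% measured
```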

Beyond3D Suite - INT8 Texel Fillrate

As for INT8 texel fillrates, the results are much more straightforward. GTX 1080’s improvement over GTX 980 in texel throughput almost perfectly matches the theoretical improvement we’d expect based on the specifications (if not slightly exceeding it), delivering an 85% boost. As a result it’s now the top card in our charts for texel throughput, dethroning the still-potent Fury X. Meanwhile GTX 1070 backs off a bit from these gains, as we’d expect given that it has only three-quarters as many texture units.
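And the same exercise for texel throughput (texture units times clock, official boost clocks assumed again); the measured 85% slightly beating the ~78% on paper likely comes down to retail cards boosting above their rated clocks:

```python
# INT8 texel fillrate scales with texture unit count times clockspeed.
cards = {
    "GTX 980":  {"tmus": 128, "boost_mhz": 1216},
    "GTX 1070": {"tmus": 120, "boost_mhz": 1683},
    "GTX 1080": {"tmus": 160, "boost_mhz": 1733},
}

def gtexels(c):
    return c["tmus"] * c["boost_mhz"] / 1000  # GTexels/sec

base = gtexels(cards["GTX 980"])
for name, c in cards.items():
    print(f"{name}: {gtexels(c):.0f} GTex/s ({gtexels(c) / base - 1:+.0%} vs GTX 980)")
```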

Comments

  • Ryan Smith - Friday, July 22, 2016 - link

    2) I suspect the v-sync comparison is a 3-deep buffer at a very high framerate.
  • lagittaja - Sunday, July 24, 2016 - link

    1) It is a big part of it. Remember how bad 20nm was?
    The leakage was really high, so Nvidia/AMD decided to skip it. FinFETs helped reduce the leakage for the "14/16"nm node.

    That's apples to oranges. CPUs are already 3-4GHz out of the box.

    RX480 isn't showing it because the 14nm LPP node is a lemon for GPUs.
    You know what the optimal frequency for Polaris 10 is? 1GHz. After that the required voltage shoots up.
    You know, LPP, where the LP stands for Low Power. Great for SoCs, but GPUs? Not so much.
    "But the SoCs clock higher than 2GHz blabla". Yeah, well a) that's the CPU and b) it's freaking tiny.

    How are we getting 2GHz+ frequencies with Pascal, which so closely resembles Maxwell?
    Because of the smaller manufacturing node. How's that possible? It's because of FinFETs, which reduced the leakage that plagued the 20nm node.
    Why couldn't we have higher clockspeeds without FinFETs at 28nm? Because power (see the sketch after this comment).
    28nm GPUs capped out around the 1.2-1.4GHz mark.
    20nm was a no-go, the leakage current was too high.
    16nm gives you FinFETs, which reduce the leakage current dramatically.
    What does that enable you to do? Increase the clockspeed.
    Here's a good article
    http://www.anandtech.com/show/8223/an-introduction...
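To illustrate the commenter's point about voltage and power: dynamic power scales roughly as P = C·V²·f, so once the required voltage starts climbing, power climbs with its square. Here's a toy sketch of that shape; the voltage curve is made up for illustration, not measured Polaris data:

```python
# Toy model: dynamic power P = C * V^2 * f. Voltage is flat up to a node's
# sweet spot, then rises steeply; the numbers are illustrative, not measured.
def voltage_needed(f_ghz, sweet_spot=1.0, v_base=0.9, slope=0.5):
    return v_base if f_ghz <= sweet_spot else v_base + slope * (f_ghz - sweet_spot)

def relative_power(f_ghz):
    v = voltage_needed(f_ghz)
    return v ** 2 * f_ghz

for f in (0.9, 1.0, 1.1, 1.2, 1.3):
    print(f"{f:.1f} GHz -> {relative_power(f) / relative_power(1.0):.2f}x power")
# e.g. +20% clock past the sweet spot costs ~48% more power in this toy model
```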
  • lagittaja - Sunday, July 24, 2016 - link

    As an addition to the RX 480 / Polaris 10 clockspeed discussion:
    GCN2-GCN4 VDD vs Fmax at avg ASIC
    http://i.imgur.com/Hdgkv0F.png
  • timchen - Thursday, July 21, 2016 - link

    Another question is about Boost 3.0: given that we see 150-200MHz GPU offsets very commonly across boards, wouldn't it be beneficial to undervolt (i.e. disallow the highest voltage bins corresponding to this extra 150-200MHz) and offset at the same time, to maintain performance at lower power consumption? Why did Nvidia not do this in the first place? (This is coming from reading Tom's saying that the 1060 can be a 60W card with 80% of its performance...)
  • AnnonymousCoward - Thursday, July 21, 2016 - link

    NVIDIA, get with the program and support VESA Adaptive-Sync already!!! When your $700 card can't support the VESA standard that's in my monitor, and as a result I have to live with more lag and lower framerate, something is seriously wrong. And why wouldn't you want to make your product more flexible?? I'm looking squarely at you, Tom Petersen. Don't get hung up on your G-sync patent and support VESA!
  • AnnonymousCoward - Thursday, July 21, 2016 - link

    If the stock cards reach the 83C throttle point, I don't see what benefit an OC gives (won't you just reach that point sooner?). It seems like raising the TDP or undervolting would boost continuous performance. Your thoughts?
  • modeless - Friday, July 22, 2016 - link

    Thanks for the in depth FP16 section! I've been looking forward to the full review. I have to say this is puzzling. Why put it on there at all? Emulation would be faster. But anyway, NVIDIA announced a new Titan X just now! Does this one have FP16 for $1200? Instant buy for me if so.
  • Ryan Smith - Friday, July 22, 2016 - link

    Emulation would be faster, but it would not be the same as running it on a real FP16x2 unit. It's the same purpose as FP64 units: for binary compatibility so that developers can write and debug Tesla applications on their GeForce GPU.
  • hoohoo - Friday, July 22, 2016 - link

    Excellent article, Ryan, thank you!

    Especially the info on preemption and async/scheduling.

    I expected that preemption might be expensive in some circumstances, but I didn't quite expect it to push the L2 cache! Still, this is a marked improvement for nVidia.
  • hoohoo - Friday, July 22, 2016 - link

    It seems like the preemption is implemented in the driver though? Are there actual h/w instructions to, as it were, "swap stack pointer", "push LDT", "swap instruction pointer"?
