GPU 2016 Benchmark Suite & The Test

As this is the first high-end card release for 2016, we have gone ahead and updated our video card benchmarking suite. Unfortunately Broadwell-E launched just a bit too late for this review, so we’ll have to hold off on updating the underlying platform to Intel’s latest and greatest for a little while longer yet.

For the 2016 suite we have retained Grand Theft Auto V, Battlefield 4, and of course, Crysis 3. Joining them are six new titles: Rise of the Tomb Raider, DiRT Rally, Ashes of the Singularity, The Witcher 3, The Division, and the 2016 rendition of Hitman.

AnandTech GPU Bench 2016 Game List

Game                        Genre              API(s)
Rise of the Tomb Raider     Action             DX11
DiRT Rally                  Racing             DX11
Ashes of the Singularity    RTS                DX12
Battlefield 4               FPS                DX11
Crysis 3                    FPS                DX11
The Witcher 3               RPG                DX11
The Division                FPS                DX11
Grand Theft Auto V          Action/Open World  DX11
Hitman (2016)               Action/Stealth     DX11 + DX12

As was the case in 2015, each game is run under the best API available for a given card. Rise of the Tomb Raider and Hitman both support DirectX 11 and DirectX 12. In Tomb Raider's case the DX12 path was a performance regression until just last week (a new patch changed things too late for this article), while the best API for Hitman depends on whether we're looking at an AMD or an NVIDIA card. For now, Tomb Raider is benchmarked under DX11, and Hitman under both DX11 and DX12. Ashes of the Singularity, meanwhile, was essentially tailor-made for DirectX 12: it is the first game designed for DX12 from the start rather than porting over a DX11 engine, so it is run under DX12 at all times.
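
Expressed as code, the per-card API selection reduces to a simple lookup. The following is a minimal sketch of that logic; the function and table names are hypothetical, not our actual test harness:

    # Sketch of the suite's per-card API selection; illustrative only.
    API_RULES = {
        "Rise of the Tomb Raider": lambda vendor: "DX11",  # DX12 path was a regression at test time
        "Hitman (2016)": lambda vendor: "DX12" if vendor == "AMD" else "DX11",  # best API varies by vendor; we publish both for now
        "Ashes of the Singularity": lambda vendor: "DX12",  # built for DX12 from the start
    }

    def pick_api(game, vendor):
        """Return the API a given card runs a given game under."""
        rule = API_RULES.get(game)
        return rule(vendor) if rule else "DX11"  # the rest of the suite is DX11-only

    print(pick_api("Hitman (2016)", "AMD"))   # DX12
    print(pick_api("Crysis 3", "NVIDIA"))     # DX11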

From a design standpoint, our benchmark settings remain unchanged. For lower-end cards we'll look at 1080p at various quality settings when practical, and for high-end cards we'll be looking at 1080p and above at the highest quality settings.

The Test

As for our hardware testbed, it remains unchanged from 2015: an overclocked Core i7-4960X housed in an NZXT Phantom 630 Windowed Edition case.

CPU:            Intel Core i7-4960X @ 4.2GHz
Motherboard:    ASRock Fatal1ty X79 Professional
Power Supply:   Corsair AX1200i
Hard Disk:      Samsung SSD 840 EVO (750GB)
Memory:         G.Skill RipjawsZ DDR3-1866 4 x 8GB (9-10-9-26)
Case:           NZXT Phantom 630 Windowed Edition
Monitor:        Asus PQ321
Video Cards:    NVIDIA GeForce GTX 1080 Founders Edition
                NVIDIA GeForce GTX 1070 Founders Edition
                NVIDIA GeForce GTX 980 Ti
                NVIDIA GeForce GTX 980
                NVIDIA GeForce GTX 970
                NVIDIA GeForce GTX 780
                NVIDIA GeForce GTX 680
                AMD Radeon RX 480
                AMD Radeon R9 Fury X
                AMD Radeon R9 Nano
                AMD Radeon R9 390X
                AMD Radeon R9 390
                AMD Radeon HD 7970
Video Drivers:  NVIDIA Release 368.39
                AMD Radeon Software Crimson 16.7.1 (RX 480)
                AMD Radeon Software Crimson 16.6.2 (All Others)
OS:             Windows 10 Pro
Comments
  • Ryan Smith - Friday, July 22, 2016

    2) I suspect the v-sync comparison is a 3-deep buffer at a very high framerate.
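
    A quick sketch of the arithmetic behind that point (illustrative Python; assumes each queued frame waits roughly one frame interval before display):

        # Added latency of an N-frame-deep buffer queue at a given framerate.
        def queue_latency_ms(depth, fps):
            return depth / fps * 1000.0  # each queued frame waits ~1/fps before display

        print(queue_latency_ms(3, 60))    # 50.0 ms at 60 fps - very noticeable
        print(queue_latency_ms(3, 1000))  # 3.0 ms at 1000 fps - negligible
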
  • lagittaja - Sunday, July 24, 2016

    1) It is a big part of it. Remember how bad 20nm was?
    The leakage was really high, so NVIDIA and AMD decided to skip it. FinFETs helped reduce the leakage for the "14/16"nm node.

    That's apples to oranges. CPUs are already at 3-4GHz out of the box.

    The RX 480 isn't showing it because the 14nm LPP node is a lemon for GPUs.
    You know what the optimal frequency for Polaris 10 is? 1GHz. After that the required voltage shoots up.
    You know, LPP, where the LP stands for Low Power. Great for SoCs, but GPUs? Not so much.
    "But the SoCs clock higher than 2GHz blabla". Yeah, well a) that's the CPU and b) it's freaking tiny.

    How are we getting 2GHz+ frequencies with Pascal, which so closely resembles Maxwell?
    Because of the smaller manufacturing node. How's that possible? Because of FinFETs, which reduce the leakage that made 20nm a non-starter.
    Why couldn't we have higher clockspeeds without FinFETs at 28nm? Because power.
    28nm GPUs capped out around the 1.2-1.4GHz mark.
    20nm was a no-go: leakage current was too high.
    16nm gives you FinFETs, which reduce the leakage current dramatically.
    What does that enable you to do? Increase the clockspeed.
    Here's a good article
    http://www.anandtech.com/show/8223/an-introduction...
  • lagittaja - Sunday, July 24, 2016

    As an addition to the RX 480 / Polaris 10 clockspeed discussion, here's GCN2-GCN4 VDD vs Fmax at average ASIC quality:
    http://i.imgur.com/Hdgkv0F.png
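
    Putting rough numbers on the kind of curve that plot shows (invented constants, not measured Polaris data): dynamic power scales roughly as C*V^2*f, so once the required voltage starts climbing past a knee, power grows far faster than clockspeed.

        # Why clocks cap out once voltage "shoots up": P ~ C * V^2 * f.
        # All constants below are invented for illustration.
        def required_voltage(f_ghz, knee_ghz=1.0):
            base = 0.9  # volts needed up to the knee
            return base if f_ghz <= knee_ghz else base + 0.5 * (f_ghz - knee_ghz)

        def dynamic_power(f_ghz, c=100.0):
            v = required_voltage(f_ghz)
            return c * v * v * f_ghz  # arbitrary units

        for f in (0.8, 1.0, 1.2, 1.4):
            print(f"{f:.1f} GHz -> {dynamic_power(f):6.1f}")
        # 40% more clock past the knee (1.0 -> 1.4 GHz) costs ~110% more power.
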
  • timchen - Thursday, July 21, 2016

    Another question, about GPU Boost 3.0: given that 150-200MHz GPU offsets are very common across boards, wouldn't it be beneficial to undervolt (i.e. disallow the highest voltage bins corresponding to that extra 150-200MHz) and apply an offset at the same time, maintaining performance at lower power consumption? Why did NVIDIA not do this in the first place? (This is coming from reading Tom's, which says the 1060 can be a 60W card with 80% of its performance...)
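
    A back-of-the-envelope check on that Tom's figure (assuming power scales roughly with V^2*f and voltage roughly tracks frequency in this range, so P ~ f^3):

        # Illustrative estimate, not measured data.
        tdp_watts = 120.0      # GTX 1060 board power
        perf_fraction = 0.80   # target: 80% of stock performance

        print(f"~{tdp_watts * perf_fraction ** 3:.0f} W")  # ~61 W, close to the quoted 60 W
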
  • AnnonymousCoward - Thursday, July 21, 2016

    NVIDIA, get with the program and support VESA Adaptive-Sync already!!! When your $700 card can't support the VESA standard that's in my monitor, and as a result I have to live with more lag and lower framerate, something is seriously wrong. And why wouldn't you want to make your product more flexible?? I'm looking squarely at you, Tom Petersen. Don't get hung up on your G-sync patent and support VESA!
  • AnnonymousCoward - Thursday, July 21, 2016

    If the stock cards reach the 83C throttle point, I don't see what benefit an OC gives (won't you just reach that point sooner?). It seems like raising the TDP or undervolting would do more for continuous performance. Your thoughts?
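
    A toy model of the trade-off being asked about (invented constants): at the thermal limit, sustained clockspeed is whatever fits the power budget, so a core offset alone doesn't help, while a lower voltage or a higher budget does.

        # Sustained clock under a fixed power budget, P ~ K * V^2 * f.
        # Constants are invented for illustration.
        K = 96.0          # lumped activity/capacitance constant
        BUDGET_W = 180.0  # what the cooler can shed once pegged at 83C

        def sustained_clock_ghz(voltage):
            return BUDGET_W / (K * voltage ** 2)

        print(round(sustained_clock_ghz(1.05), 2))  # ~1.7 GHz at stock voltage
        print(round(sustained_clock_ghz(0.95), 2))  # ~2.08 GHz undervolted: same heat, more clock
        # Real silicon needs more voltage at higher clocks, so the gain is
        # smaller in practice, but the direction of the effect holds.
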
  • modeless - Friday, July 22, 2016

    Thanks for the in-depth FP16 section! I've been looking forward to the full review. I have to say this is puzzling. Why put it on there at all? Emulation would be faster. But anyway, NVIDIA announced a new Titan X just now! Does this one have FP16 for $1200? Instant buy for me if so.
  • Ryan Smith - Friday, July 22, 2016

    Emulation would be faster, but it would not be the same as running it on a real FP16x2 unit. It's the same purpose as FP64 units: for binary compatibility so that developers can write and debug Tesla applications on their GeForce GPU.
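
    A small illustration of the difference (using numpy's float16 as a stand-in for hardware FP16 semantics; emulating in FP32 changes where rounding and overflow happen):

        import numpy as np

        a = np.float16(60000.0)
        b = np.float16(8.0)

        native = a * b                            # stays FP16: overflows to inf (FP16 max is ~65504)
        emulated = np.float32(a) * np.float32(b)  # widened to FP32: 480000.0, no overflow

        print(native)    # inf (numpy may also print an overflow warning)
        print(emulated)  # 480000.0
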
  • hoohoo - Friday, July 22, 2016

    Excellent article, Ryan, thank you!

    Especially the info on preemption and async/scheduling.

    I expected that preemption might be expensive in some circumstances, but I didn't quite expect it to push the L2 cache! Still, this is a marked improvement for NVIDIA.
  • hoohoo - Friday, July 22, 2016

    It seems like the preemption is implemented in the driver, though? Are there actual h/w instructions to, as it were, "swap stack pointer", "push LDT", "swap instruction pointer"?
