Hitman

The final game in our 2016 benchmark suite is the 2016 edition of Hitman, the latest title in the stealth-action franchise. The game offers two rendering paths, DirectX 11 and DirectX 12, with the latter having been added after the fact. As with past Hitman games, the latest entry offers a good mix of scenery and high model counts to stress modern video cards.

Hitman - 3840x2160 - Ultra Quality

Hitman - 2560x1440 - Ultra Quality

Hitman - 1920x1080 - Ultra Quality

Because Hitman supports both DX11 and DX12, for the moment we’ve gone ahead and benchmarked it with both. In practice the performance impact of DX12 is very mixed: NVIDIA cards prior to Pascal lose performance, while Pascal cards can either gain or lose it. AMD cards, on the other hand, tend to gain performance. The image quality is the same with both renderers, so it’s simply a matter of picking the render path that produces the best performance for a given card.

In any case, the GTX 1080 continues to top the charts here. 60fps still isn’t attainable at 4K, but it can deliver a reasonably playable 49fps. Alternatively, at 1440p it does better than 85fps. Meanwhile the GTX 1070 isn’t a great option at 4K, but at 1440p it can easily stay north of 60fps, delivering 69.4fps.

Thanks in part to the DX12 code path, this is another game where the GTX 1070 performs as expected relative to the GTX 1080, but still can’t hold on to second place. Rather, the Radeon R9 Fury X takes second place at all but 1080p.

Looking at our generational comparisons one last time, this final game has the Pascal cards performing better than expected. At 1440p and above, the GTX 1080 delivers 86% better performance than the GTX 980 under DirectX 11, and the GTX 1070 bests the GTX 970 by an average of 63% under the same circumstances. As best as I can tell, there is just something about the Pascal cards that is slightly more in tune with this game than the Maxwell 2 cards were, leading to the performance we’re seeing here. Otherwise, the gap between the GTX 1080 and GTX 1070 is pretty typical at about 25% at the higher resolutions.
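
As a quick worked check, these generational figures are simple relative framerate ratios; taking the 1440p results above, the GTX 1070’s 69.4fps combined with the roughly 25% gap puts the GTX 1080 right around the 85+fps mark quoted earlier:

$$\text{uplift} = \left(\frac{\text{FPS}_{\text{new}}}{\text{FPS}_{\text{old}}} - 1\right)\times 100\%, \qquad 69.4\,\text{fps} \times 1.25 \approx 86.8\,\text{fps}$$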

Finally, checking in on the GTX 680 one last time, the GTX 1080 offers a commanding performance improvement. The GTX 1080 is 4.1x faster than the GTX 680 under DirectX 11, reinforcing just how much progress NVIDIA has made in 4 years and a single full manufacturing node upgrade.

Comments

  • Ryan Smith - Friday, July 22, 2016 - link

    2) I suspect the v-sync comparison is a 3-deep buffer at a very high framerate.
  • lagittaja - Sunday, July 24, 2016 - link

    1) It is a big part of it. Remember how bad 20nm was?
    The leakage was really high, so Nvidia/AMD decided to skip it. FinFETs helped reduce the leakage for the "14/16"nm node.

    That's apples to oranges. CPUs are already 3-4GHz out of the box.

    The RX 480 isn't showing it because the 14nm LPP node is a lemon for GPUs.
    You know what the optimal frequency for Polaris 10 is? 1GHz. After that the required voltage shoots up.
    You know, LPP, where the LP stands for Low Power. Great for SoCs, but GPUs? Not so much.
    "But the SoCs clock higher than 2GHz blabla". Yeah, well a) that's the CPU and b) it's freaking tiny.

    How are we getting 2GHz+ frequencies with Pascal, which so closely resembles Maxwell?
    Because of the smaller manufacturing node. How's that possible? It's because of FinFETs, which rein in the leakage that plagued the 20nm node.
    Why couldn't we have higher clockspeeds without FinFETs at 28nm? Because power.
    28nm GPUs capped out around the 1.2-1.4GHz mark.
    20nm was a no-go, the leakage current was too high.
    16nm gives you FinFETs, which reduce the leakage current dramatically.
    What does that enable you to do? Increase the clockspeed.
    Here's a good article:
    http://www.anandtech.com/show/8223/an-introduction...
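
A rough rule of thumb puts numbers to the voltage argument in the comment above: dynamic switching power scales with frequency and with the square of voltage, plus a static term from leakage, so once reaching a higher clock bin requires the voltage to shoot up, power grows far faster than the clockspeed does:

$$P \approx \underbrace{\alpha C V^{2} f}_{\text{dynamic}} + \underbrace{V\, I_{\text{leak}}}_{\text{static (leakage)}}$$

By this approximation, a ~20% voltage bump alone costs roughly 44% more dynamic power (1.2² ≈ 1.44) before the higher frequency is even factored in, which is why planar 20nm's leakage and a steep voltage/frequency curve past ~1GHz are such a problem for large GPUs.
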
  • lagittaja - Sunday, July 24, 2016 - link

    As an addition to the RX 480 / Polaris 10 clockspeed discussion, here's GCN2-GCN4 VDD vs. Fmax at average ASIC quality:
    http://i.imgur.com/Hdgkv0F.png
  • timchen - Thursday, July 21, 2016 - link

    Another question is about GPU Boost 3.0: given that we see 150-200MHz GPU offsets very commonly across boards, wouldn't it be beneficial to undervolt (i.e. disallow the highest voltage bins corresponding to this extra 150-200MHz) and offset at the same time, to maintain performance at lower power consumption? Why did Nvidia not do this in the first place? (This is coming from reading Tom's saying that the 1060 can be a 60W card with 80% of its performance...)
  • AnnonymousCoward - Thursday, July 21, 2016 - link

    NVIDIA, get with the program and support VESA Adaptive-Sync already!!! When your $700 card can't support the VESA standard that's in my monitor, and as a result I have to live with more lag and lower framerate, something is seriously wrong. And why wouldn't you want to make your product more flexible?? I'm looking squarely at you, Tom Petersen. Don't get hung up on your G-sync patent and support VESA!
  • AnnonymousCoward - Thursday, July 21, 2016 - link

    If the stock cards reach the 83C throttle point, I don't see what benefit an OC gives (won't you just reach that point sooner?). It seems like raising the TDP or undervolting would boost continuous performance. Your thoughts?
  • modeless - Friday, July 22, 2016 - link

    Thanks for the in-depth FP16 section! I've been looking forward to the full review. I have to say this is puzzling. Why put it on there at all? Emulation would be faster. But anyway, NVIDIA announced a new Titan X just now! Does this one have FP16 for $1200? Instant buy for me if so.
  • Ryan Smith - Friday, July 22, 2016 - link

    Emulation would be faster, but it would not be the same as running it on a real FP16x2 unit. It serves the same purpose as the FP64 units: binary compatibility, so that developers can write and debug Tesla applications on their GeForce GPU.
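
To make the distinction in the reply above concrete, here's a minimal CUDA sketch (illustrative, not code from the article or the comments) of what exercising the native FP16x2 path looks like: the half2 type packs two FP16 values into 32 bits, and __hadd2 performs both adds with one FP16x2 operation on hardware that has the real units (compile for sm_60 or later, e.g. nvcc -arch=sm_61). On GP104 this runs at a deliberately low rate, but it lets developers validate exactly this kind of code before deploying it on GP100.

```cuda
// Minimal FP16x2 (half2) sketch -- illustrative only.
#include <cuda_fp16.h>
#include <cstdio>

__global__ void demo_fp16x2(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    __half2 a = __floats2half2_rn(1.0f, 2.0f);   // pack two FP16 values into 32 bits
    __half2 b = __floats2half2_rn(0.5f, 0.25f);
    __half2 c = __hadd2(a, b);                   // two FP16 adds via one FP16x2 operation

    out[2 * i]     = __low2float(c);             // 1.5
    out[2 * i + 1] = __high2float(c);            // 2.25
}

int main()
{
    const int n = 128;
    float* out = nullptr;
    cudaMallocManaged(&out, 2 * n * sizeof(float));

    demo_fp16x2<<<1, n>>>(out, n);
    cudaDeviceSynchronize();

    printf("lane0 = %f, lane1 = %f\n", out[0], out[1]);  // expect 1.5, 2.25
    cudaFree(out);
    return 0;
}
```
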
  • hoohoo - Friday, July 22, 2016 - link

    Excellent article, Ryan, thank you!

    Especially the info on preemption and async/scheduling.

    I expected the preemption might be expensive in some circumstances, but I didn't quite expect it to push the L2 cache! Still, this is a marked improvement for nVidia.
  • hoohoo - Friday, July 22, 2016 - link

    It seems like the preemption is implemented in the driver, though? Are there actual h/w instructions to, as it were, "swap stack pointer", "push LDT", "swap instruction pointer"?
