Rise of the Tomb Raider

Starting things off in our benchmark suite is the built-in benchmark for Rise of the Tomb Raider, the latest iteration in the long-running action-adventure gaming series. One of the unique aspects of this benchmark is that it’s actually the average of 4 sub-benchmarks that fly through different environments, which keeps the benchmark from being too weighted towards a GPU’s performance characteristics under any one scene.

Rise of the Tomb Raider - 3840x2160 - Very High Quality (DX11)

Rise of the Tomb Raider - 2560x1440 - Very High Quality (DX11)

Rise of the Tomb Raider - 1920x1080 - Very High Quality (DX11)

To kick things off then, while I picked the benchmark order before collecting the performance results, it’s neat that Rise of the Tomb Raider ends up being a fairly consistent representation of how the various video cards compare to each other. The end result, as you might expect, puts the GTX 1080 and GTX 1070 solidly in the lead. And truthfully there’s no reason for it to be anything but this; NVIDIA does not face any competition from AMD at the high-end at this point, so the two GP104 cards are going to be unrivaled. It’s not a question of who wins, but by how much.

Overall we find the GTX 1080 ahead of its predecessor, the GTX 980, by anywhere between 60% and 78%, with the lead increasing with the resolution. The GTX 1070’s lead isn’t quite as significant though, ranging from 53% to 60#. This is consistent with the fact that the GTX 1070 is specified to trail the GTX 1080 by more than we saw with the 980/970 in 2014, which means that in general the GTX 1070 won’t see quite as much uplift.

What we do get however is confirmation that the GTX 1070FE is a GTX 980 Ti and more. The performance of what was NVIDIA’s $650 flagship can now be had in a card that costs $450, and with any luck will get cheaper still as supplies improve. For 1440p gamers this should hit a good spot in terms of performance.

Otherwise when it comes to 4K gaming, NVIDIA has made a lot of progress thanks to GTX 1080, but even their latest and greatest card isn’t quite going to crack 60fps here. We haven’t yet escaped having to made quality tradeoffs for 4K at this time, and it’s likely that future games will drive that point home even more.

Finally, 1080p is admittedly here largely for the sake of including much older cards like the GTX 680, to show what kind of progress NVIDIA has made since their first 28nm high-end card. The result? A 4.25x performance increase over the GTX 680.

GPU 2016 Benchmark Suite & The Test DiRT Rally
Comments Locked

200 Comments

View All Comments

  • patrickjp93 - Wednesday, July 20, 2016 - link

    That doesn't actually support your point...
  • Scali - Wednesday, July 20, 2016 - link

    Did I read a different article?
    Because the article that I read said that the 'holes' would be pretty similar on Maxwell v2 and Pascal, given that they have very similar architectures. However, Pascal is more efficient at filling the holes with its dynamic repartitioning.
  • mr.techguru - Wednesday, July 20, 2016 - link

    Just Ordered the MSI GeForce GTX 1070 Gaming X , way better than 1060 / 480. NVidia Nail it :)
  • tipoo - Wednesday, July 20, 2016 - link

    " NVIDIA tells us that it can be done in under 100us (0.1ms), or about 170,000 clock cycles."

    Is my understanding right that Polaris, and I think even earlier with late GCN parts, could seamlessly interleave per-clock? So 170,000 times faster than Pascal in clock cycles (less in total time, but still above 100,000 times faster)?
  • Scali - Wednesday, July 20, 2016 - link

    That seems highly unlikely. Switching to another task is going to take some time, because you also need to switch all the registers, buffers, caches need to be re-filled etc.
    The only way to avoid most of that is to duplicate the whole register file, like HyperThreading does. That's doable on an x86 CPU, but a GPU has way more registers.
    Besides, as we can see, nVidia's approach is fast enough in practice. Why throw tons of silicon on making context switching faster than it needs to be? You want to avoid context switches as much as possible anyway.

    Sadly AMD doesn't seem to go into any detail, but I'm pretty sure it's going to be in the same ballpark.
    My guess is that what AMD calls an 'ACE' is actually very similar to the SMs and their command queues on the Pascal side.
  • Ryan Smith - Wednesday, July 20, 2016 - link

    Task switching is separate from interleaving. Interleaving takes place on all GPUs as a basic form of latency hiding (GPUs are very high latency).

    The big difference is that interleaving uses different threads from the same task; task switching by its very nature loads up another task entirely.
  • Scali - Thursday, July 21, 2016 - link

    After re-reading AMD's asynchronous shader PDF, it seems that AMD also speaks of 'interleaving' when they switch a graphics CU to a compute task after the graphics task has completed. So 'interleaving' at task level, rather than at instruction level.
    Which would be pretty much the same as NVidia's Dynamic Load Balancing in Pascal.
  • eddman - Thursday, July 21, 2016 - link

    The more I read about async computing in Polaris and Pascal, the more I realize that the implementations are not much different.

    As Ryan pointed out, it seems that the reason that Polaris, and GCN as a whole, benefit more from async is the architecture of the GPU itself, being wider and having more ALUs.

    Nonetheless, I'm sure we're still going to see comments like "Polaris does async in hardware. Pascal is hopeless with its software async hack".
  • Matt Doyle - Wednesday, July 20, 2016 - link

    Typo in the lead sentence of HPC vs. Consumer: Divergence paragraph: "Pascal in an architecture that..."

    "is" instead of "in"
  • Matt Doyle - Wednesday, July 20, 2016 - link

    Feeding Pascal page, "GDDR5X uses a 16n prefetch, which is twice the size of GDDR5’s 8n prefect."

    Prefect = prefetch

Log in

Don't have an account? Sign up now