Final Words

Bringing this belated review to a close, I want to pick up where I started: FinFET. In ages long gone, we used to get near-yearly updates to manufacturing nodes, and while these half-node shrinks weren’t as potent as a full node shrink over a longer period of time, they kept the GPU industry moving at a quick pace. Not to get too distracted by history, but I won’t lie: as a long-time editor and gamer, I do still miss those days. At the same time it underscores why I’m so excited about the first full node shrink in 4 years. It has taken a long time to get here, but now that we’re finally here we get to reap the benefits.

GP104 and the Pascal architecture are certainly defined by the transition to 16nm FinFET. Smaller and much better transistors have allowed NVIDIA to make a generational leap in performance in less than two years. You can now buy a video card built with a 314mm² die packing 7.2B transistors, with all of those transistors adding up to fantastic performance. It’s fundamental progress in the truest sense of the word, and after 4 years it’s refreshing.

But even though FinFET is a big part of what makes Pascal so powerful, it’s still just a part. NVIDIA’s engineering team pulled off a small miracle with Maxwell, and while Pascal doesn’t rock the boat too hard, there are still some very important changes that set it apart from Maxwell 2. These will reverberate across NVIDIA’s GPU lineup for years to come.

While not unexpected, the use of GDDR5X is an interesting choice for NVIDIA, and one that should keep NVIDIA’s consumer GPUs relatively well fed for a couple of years or so. The new memory technology is not a radical change – it’s an extension of GDDR5, after all – but it allows NVIDIA to continue to improve on memory bandwidth without having to resort to more complex and expensive technologies like HBM2. Combined with the latest generation of delta color compression, NVIDIA’s effective memory bandwidth for graphics has actually increased by a good deal. And though it’s only being used on GTX 1080 at this time, there’s an obvious path towards using it in future cards (maybe a Pascal refresh?) if NVIDIA wants to go that route.
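As a rough illustration of where the raw numbers come from, here is the bandwidth arithmetic using GTX 1080’s published memory specs; the 20% compression savings figure is purely a hypothetical placeholder, not a measured result:

```python
# Rough GDDR5X bandwidth math for GTX 1080 (published specs, illustrative only).
data_rate_gbps = 10    # 10Gbps per pin, GDDR5X on GTX 1080
bus_width_bits = 256   # 256-bit memory bus

raw_bandwidth_gbs = data_rate_gbps * bus_width_bits / 8  # Gb/s across the bus -> GB/s
print(raw_bandwidth_gbs)  # 320.0 GB/s raw

# Delta color compression stretches this further. If it eliminated ~20% of
# memory traffic (a hypothetical figure), effective bandwidth would be:
effective_gbs = raw_bandwidth_gbs / (1 - 0.20)
print(effective_gbs)  # 400.0 GB/s effective
```

The same math shows why GDDR5X matters: at GDDR5’s 7Gbps on the same bus, the raw figure would be only 224GB/s.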

On the implementation side of matters, I give a lot of credit to FinFET, but NVIDIA clearly also put a great deal of work into running up the clocks for GP104. GPUs have historically favored growing wider instead of growing faster, so this is an unexpected change. It’s not one without its drawbacks – overclocking isn’t looking very good right now – but on the other hand it allows NVIDIA to make a generational jump without making their GPU too much wider, which neatly bypasses potential scaling issues for this generation.

As for the Pascal architecture, I don’t think we’re in a position to fully comprehend and appreciate the work scheduling changes that NVIDIA has made just yet, as it will take developers some time to put these features to good use. From a computer science standpoint, the instruction-level preemption addition is a huge advancement for a GPU, but right now the consumer applications are admittedly limited. Though as GPUs and CPUs get closer and closer, that won’t always be the case. Otherwise the most consumer-applicable change is to dynamic load balancing, which gives Pascal the flexibility it needs to properly benefit from workload concurrency via asynchronous compute. Don’t expect AMD-like gains here, but hopefully developers will be able to squeeze a bit more still out of Pascal.

I’m also interested in seeing what developers eventually do with Simultaneous Multi-Projection. NVIDIA going after the VR market with it first is the sensible move, and anything that improves VR performance is a welcome change given the high system requirements for the technology. But there’s a lot of flexibility here that developers have only begun to experiment with.

Finally, in the grab bag, it’s still a bit too early for HDR monitors and displays that can take advantage of Pascal’s DisplayPort 1.4 controller, but the groundwork has been laid. The entire point of HDR technology is to make a night and day difference, and I’m excited to see what kind of an impact this can make on PC gaming. In the meantime, we can still enjoy things such as Fast Sync, and finally for NVIDIA’s high-end cards, a modern video codec block that can support just about every codec under the sun.

Performance & Recommendations: By The Numbers

With all of that said, let’s get down to the business of numbers. By the numbers, GeForce GTX 1080 is the fastest card on the market, and we wouldn’t expect anything less from NVIDIA. I’m still on the fence about whether GTX 1080 is truly fast enough for 4K, as our benchmarks still show cases where even NVIDIA’s latest and greatest can’t get much above 30fps with all the quality features turned up, but certainly GTX 1080 has the best chance. Otherwise at 1440p the card should make Asus PG279Q G-Sync monitor owners very happy.

Relative to GTX 980 then, we’re looking at an average performance gain of 66% at 1440p, and 71% at 4K. This is a very significant step up for GTX 980 owners, but it’s also not quite the same step up we saw from GTX 680 to GTX 980 (75%). GTX 980 owners who are looking for a little more bang for their buck could easily be excused for waiting another generation for a true doubling, especially with GTX 1080’s higher prices. GTX 980 Ti/Titan X owners can also hold back, as this card isn’t GM200’s replacement. Otherwise for GTX 700 or 600 series owners, GTX 1080 is a rather massive step up.

GTX 1070 follows in the same mold. NVIDIA is targeting the card at the 1440p market, and there it does a very good job, delivering 60fps performance in most games. By the numbers it’s a good step up from GTX 970, but with a 57% gain at 1440p, it’s not a night and day difference. Current GTX 770/670 owners, on the other hand, should be very satisfied.

It’s interesting to note though that the performance gap between NVIDIA’s 80 and 70 cards has increased this generation. At 1440p GTX 970 delivers 87% of GTX 980’s performance, but GTX 1070 only delivers 81% of GTX 1080’s performance at the same settings. The net result of this is that GTX 1070 isn’t quite as much of a spoiler as GTX 970 was, or to flip that around, GTX 1080 is more valuable than GTX 980 was.
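For clarity, those percentages are just the ratio of average frame rates between the two cards at the same settings. The frame rates below are hypothetical placeholders chosen to reproduce the ratios, not this review’s benchmark data:

```python
# Relative performance arithmetic behind the 80 vs. 70 comparison.
# Frame rates here are hypothetical placeholders, not measured results.
def relative_perf(card_fps: float, flagship_fps: float) -> float:
    """Return the slower card's performance as a fraction of the flagship's."""
    return card_fps / flagship_fps

# e.g. if GTX 980 averaged 60 fps and GTX 970 averaged 52.2 fps at 1440p:
print(round(relative_perf(52.2, 60.0), 2))   # 0.87 -> 87%

# e.g. if GTX 1080 averaged 100 fps and GTX 1070 averaged 81 fps:
print(round(relative_perf(81.0, 100.0), 2))  # 0.81 -> 81%
```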

Meanwhile from a technical perspective, NVIDIA has once again nailed the technical trifecta of performance, noise, and power efficiency. GP104 in this respect is clearly descended from GM204, and it makes GTX 1080 and 1070 very potent cards. Top-tier performance with lower power consumption is always great news for desktop gamers – especially in the middle of summer – but I’m especially interested in seeing what this means for the eventual laptop SKUs. The slight uptick in rated TDPs does bear keeping an eye on though; after the GTX 700 series, NVIDIA came back to their senses on power consumption, so hopefully this isn’t the resumption of TDP creep as a means to keep performance growing.

The one real drawback right now is pricing and availability. Even 2 months after the launch of the GTX 1080, supplies are still very tight. GTX 1070 is much better, thankfully, but those cards still go rather quickly. The end result is that NVIDIA’s MSRPs have proven unrealistic; if you want a GTX 1080 today, be prepared to spend $699, while GTX 1070 will set you back $429 or more. Clearly these cards are worth the price to some, as NVIDIA and their partners keep selling them, but it puts a damper on things. For now all that NVIDIA can do is keep shipping chips, and hopefully once supply reaches equilibrium with demand, we get the $599/$379 prices NVIDIA originally touted.

Otherwise I’m of two minds on the Founders Edition cards. NVIDIA has once again built a fantastic set of install-it-and-forget-it cards, and while not radically different from the GTX 900 series reference designs, these are still their best designs to date. That this comes with an explicit price premium makes it all a bit harder to cheer for though, as it pushes the benefits of the reference design out of the hands of some buyers. If and when overall card pricing finally comes down, it will be interesting to see what card sales are like for the Founders Editions, and if it makes sense for NVIDIA to continue doing this. I suspect it will – and that this is going to be a new normal – but it’s going to depend on consumer response and just what kind of cool things NVIDIA’s board partners do with their own designs.

Overall then, I think it’s safe to say that NVIDIA has started off the FinFET generation with a bang. GTX 1080 and GTX 1070 are another fantastic set of cards from NVIDIA, and they will keep the GPU performance crown solidly in NVIDIA’s hands. At the same time, competitor AMD won’t have a response for the high-end market for at least the next few months, so this will be an uncontested reign for NVIDIA. Given current card prices amid the shortage, I can only hope they prove to be benevolent rulers.

Last, but not least, we’re not done yet. NVIDIA is moving at a quick pace, and this is just the start of the Pascal generation. GeForce GTX 1060 launched this week and we’ll be taking a look at it on Friday. Pascal has set up NVIDIA very well, and it will be interesting to see how that extends to the mainstream/enthusiast market.

Comments

  • Ryan Smith - Friday, July 22, 2016 - link

    2) I suspect the v-sync comparison is a 3 deep buffer at a very high framerate.
  • lagittaja - Sunday, July 24, 2016 - link

    1) It is a big part of it. Remember how bad 20nm was?
    The leakage was really high, so Nvidia/AMD decided to skip it. FinFETs helped reduce the leakage for the "14/16"nm node.

    That's apples to oranges. CPUs are already at 3-4GHz out of the box.

    RX480 isn't showing it because the 14nm LPP node is a lemon for GPUs.
    You know what's the optimal frequency for Polaris 10? 1GHz. After that the required voltage shoots up.
    You know, LPP, where the LP stands for Low Power. Great for SoCs, but GPUs? Not so much.
    "But the SoCs clock higher than 2GHz blabla". Yeah, well a) that's the CPU and b) it's freaking tiny.

    How are we getting 2GHz+ frequencies with Pascal, which so closely resembles Maxwell?
    Because of the smaller manufacturing node. How's that possible? It's because of FinFETs, which reduced the leakage that plagued the 20nm node.
    Why couldn't we have higher clockspeeds without FinFETs at 28nm? Because power.
    28nm GPUs capped out around the 1.2-1.4GHz mark.
    20nm was a no go, too high leakage current.
    16nm gives you FinFETs, which reduce the leakage current dramatically.
    What does that enable you to do? Increase the clockspeed.
    Here's a good article
    http://www.anandtech.com/show/8223/an-introduction...
  • lagittaja - Sunday, July 24, 2016 - link

    As an addition to the RX 480 / Polaris 10 clockspeed
    GCN2-GCN4 VDD vs Fmax at avg ASIC
    http://i.imgur.com/Hdgkv0F.png
  • timchen - Thursday, July 21, 2016 - link

    Another question is about Boost 3.0: given that GPU offsets of 150-200MHz are very common across boards, wouldn't it be beneficial to undervolt (i.e. disallow the highest voltage bins corresponding to this extra 150-200MHz) and offset at the same time, to maintain performance at lower power consumption? Why did Nvidia not do this in the first place? (This is coming from reading Tom's saying that 1060 can be a 60W card having 80% of its performance...)
  • AnnonymousCoward - Thursday, July 21, 2016 - link

    NVIDIA, get with the program and support VESA Adaptive-Sync already!!! When your $700 card can't support the VESA standard that's in my monitor, and as a result I have to live with more lag and lower framerate, something is seriously wrong. And why wouldn't you want to make your product more flexible?? I'm looking squarely at you, Tom Petersen. Don't get hung up on your G-sync patent and support VESA!
  • AnnonymousCoward - Thursday, July 21, 2016 - link

    If the stock cards reach the 83C throttle point, I don't see what benefit an OC gives (won't you just reach that sooner?). It seems like raising the TDP or undervolting would boost continuous performance. Your thoughts?
  • modeless - Friday, July 22, 2016 - link

    Thanks for the in depth FP16 section! I've been looking forward to the full review. I have to say this is puzzling. Why put it on there at all? Emulation would be faster. But anyway, NVIDIA announced a new Titan X just now! Does this one have FP16 for $1200? Instant buy for me if so.
  • Ryan Smith - Friday, July 22, 2016 - link

    Emulation would be faster, but it would not be the same as running it on a real FP16x2 unit. It's the same purpose as FP64 units: for binary compatibility so that developers can write and debug Tesla applications on their GeForce GPU.
  • hoohoo - Friday, July 22, 2016 - link

    Excellent article, Ryan, thank you!

    Especially the info on preemption and async/scheduling.

    I expected the preemption might be expensive in some circumstances, but I didn't quite expect it to push through the L2 cache! Still, this is a marked improvement for nVidia.
  • hoohoo - Friday, July 22, 2016 - link

    It seems like the preemption is implemented in the driver though? Are there actual h/w instructions to as it were "swap stack pointer", "push LDT", "swap instruction pointer"?
