Final Words

We’re now four GPUs into the NVIDIA Turing architecture product stack, and while NVIDIA’s latest processor has pitched us a bit of a curve ball in terms of feature support, by and large NVIDIA is holding to a pretty consistent pattern with regard to product performance, positioning, and pricing. Which is to say that the company has a very specific product stack in mind for this generation, and thus far they’ve been delivering on it with the kind of clockwork efficiency that NVIDIA has come to be known for.

With the launch of the GeForce GTX 1660 Ti and the TU116 GPU underpinning it, we’re finally seeing NVIDIA shift gears a bit in how they’re building their cards. Whereas the four RTX 20 series cards are all loosely collected under the umbrella of “premium features for a premium price”, the GTX 1660 Ti goes in the other direction, dropping NVIDIA’s shiny RTX suite of effects for a product that is leaner and cheaper to produce. As a result, the new card offers a bigger improvement on a price/performance basis (in current games) than any of the other Turing cards, and with a sub-$300 price tag, is likely to be more warmly received than the other cards.

Looking at the numbers, the GeForce GTX 1660 Ti delivers around 37% more performance than the GTX 1060 6GB at 1440p, and a very similar 36% gain at 1080p. So consistent with the other Turing cards, this is not quite a major generational leap in performance; and to be fair to NVIDIA they aren’t really claiming otherwise. Instead, NVIDIA is mostly looking to sell this card to current GTX 960 and R9 380 users; people who skipped the Pascal generation and are still on 28nm parts. In which case, the GTX 1660 Ti offers well over 2x the performance of these cards, with performance frequently ending up neck-and-neck with what was the GTX 1070.

Meanwhile, taking a look at power efficiency, it’s interesting to note that for the GTX 1660 Ti NVIDIA has been able to hold the line on power consumption: performance has gone up versus the GTX 1060 6GB, but card power consumption hasn’t. Thanks to this, the GTX 1660 Ti is not just 36% faster, it’s 36% more efficient as well. The other Turing cards have seen their own efficiency gains as well, but with their TDPs all drifting up, this is the largest (and purest) efficiency gain we’ve seen to date, and probably the best metric thus far for evaluating Turing’s power efficiency against Pascal’s.
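To make the efficiency arithmetic explicit, here is a minimal sketch that treats efficiency as performance per watt. The 36% figure and the flat power consumption come from the text above; the second case, with a 10% higher power draw, is a purely hypothetical illustration of what a drifting TDP does to the same speedup, not a measurement of any card.

```python
def efficiency_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative change in performance-per-watt versus the older card."""
    return perf_ratio / power_ratio - 1.0


# GTX 1660 Ti vs GTX 1060 6GB: ~36% faster at unchanged card power (from the text).
flat_power = efficiency_gain(perf_ratio=1.36, power_ratio=1.00)

# Hypothetical card: same 36% speedup, but power draw up 10% (assumed, not measured).
drifting_power = efficiency_gain(perf_ratio=1.36, power_ratio=1.10)

print(f"{flat_power:+.0%}")      # +36%
print(f"{drifting_power:+.0%}")  # +24%
```

With power held constant the efficiency gain equals the performance gain, which is why this comparison is the cleanest of the Turing launches so far.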

The end result of these improvements in performance and power efficiency is that NVIDIA has once again put together a very solid Turing-based video card. And while its performance gains don’t make the likes of the GTX 1060 6GB and Radeon RX 590 obsolete overnight, it’s a clear case of out with the old and in with the new for the mainstream video card market. The GTX 1060 is well on its way out, and meanwhile AMD is going to have to significantly reposition the $279 RX 590. The GTX 1660 Ti cleanly beats it in performance and power efficiency, delivering 25% better performance for a bit over half the power consumption.

If anything, having cleared its immediate competitors with superior technology, the only real challenge NVIDIA will face is convincing consumers to pay $279 for an xx60-class card, albeit one that performs like a $379 card from two years ago. In this respect the GTX 1660 Ti is a much better value proposition than the RTX 2060 above it, but it’s also more expensive than the GTX 1060 6GB it replaces, so it runs the risk of drifting out of the mainstream market entirely. Thankfully pricing here is a lot more grounded than the RTX 20 series cards, but the mainstream market is admittedly more price sensitive to begin with.

This also means that AMD remains a wildcard factor; they have the option of playing the value spoiler with cheap RX 590 cards, and I’m curious to see how serious they really are about bringing the RX Vega 56 in to compete with NVIDIA’s newest card. Our testing shows that RX Vega 56 is still around 5% faster on average, so AMD could still play a new version of the RX 590 gambit (fight on performance and price, damn the power consumption).

Perhaps the most surprising part about any of this is that despite the fact that the GTX 1660 Ti very notably omits NVIDIA’s RTX functionality, I’m not convinced RTX alone is going to sway any buyers one way or another. Since the RTX 2060 is both a faster and more expensive card, I quickly tabulated the performance and price increases for all of the Turing cards launched thus far.

GeForce: Turing versus Pascal

                               List Price   Relative      Relative   Relative
                               (Turing)     Performance   Price      Perf-Per-Dollar
RTX 2080 Ti vs GTX 1080 Ti     $999         +32%          +42%       -7%
RTX 2080 vs GTX 1080           $699         +35%          +40%       -4%
RTX 2070 vs GTX 1070           $499         +35%          +32%       +2%
RTX 2060 vs GTX 1060 6GB       $349         +59%          +40%       +14%
GTX 1660 Ti vs GTX 1060 6GB    $279         +36%          +12%       +21%
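
The perf-per-dollar column can be reproduced from the performance and price columns. The sketch below assumes the figure is the ratio of the two relative changes, (1 + performance) / (1 + price) - 1. The article does not state that formula, but it matches every rounded value in the table. The dictionary and function names are illustrative only.

```python
# (relative performance gain, relative price increase), taken from the table above.
turing_vs_pascal = {
    "RTX 2080 Ti vs GTX 1080 Ti": (0.32, 0.42),
    "RTX 2080 vs GTX 1080": (0.35, 0.40),
    "RTX 2070 vs GTX 1070": (0.35, 0.32),
    "RTX 2060 vs GTX 1060 6GB": (0.59, 0.40),
    "GTX 1660 Ti vs GTX 1060 6GB": (0.36, 0.12),
}


def perf_per_dollar_change(perf_gain: float, price_gain: float) -> float:
    """Relative change in performance per dollar (assumed formula, see lead-in)."""
    return (1.0 + perf_gain) / (1.0 + price_gain) - 1.0


for matchup, (perf, price) in turing_vs_pascal.items():
    print(f"{matchup}: {perf_per_dollar_change(perf, price):+.0%}")
# RTX 2080 Ti vs GTX 1080 Ti: -7%
# RTX 2080 vs GTX 1080: -4%
# RTX 2070 vs GTX 1070: +2%
# RTX 2060 vs GTX 1060 6GB: +14%
# GTX 1660 Ti vs GTX 1060 6GB: +21%
```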

The long and short of matters is that with the cheapest RTX card costing an additional $70, there’s a much stronger rationale to act based on pricing than feature sets. In fact considering just how amazingly consistent the performance gains are on a generation-by-generation basis, there’s ample evidence that NVIDIA has always planned it this way. Earlier I mentioned that NVIDIA acts with clockwork efficiency, and with nearly every Turing card improving over its predecessor by roughly 35% (save the RTX 2060 with no direct predecessor), it’s amazing just how consistent NVIDIA’s product positioning is here. If the next GTX 16 series card isn’t also 35% faster than its predecessor, then I’m going to be amazed.

In any case, this makes a potentially complex situation for card buyers pretty simple: buy the card you can afford – or at least, the card with the performance you’re after – and don’t worry about whether it’s RTX or GTX. And while it’s unfortunate that NVIDIA didn’t include their RTX functionality top-to-bottom in the Turing family, there’s also a good argument to be had that the high-performance cost means that it wouldn’t make sense on a mainstream card anyhow. At least, not for this generation.

Last, but not least, we have the matter of EVGA’s GeForce GTX 1660 Ti XC Black GAMING. As this is a launch without reference cards, we’re going to see NVIDIA’s board partners hit the ground running with their custom cards. And in true EVGA tradition, their XC Black GAMING is a solid example of what to expect for a $279 baseline GTX 1660 Ti card.

Since this isn’t a factory overclocked card, I’m a bit surprised that EVGA bothered to ship it with an increased 130W TDP. But I’m also glad they did, as the fact that it only improves performance by around 1% versus the same card at 120W is a very clear indicator that the GTX 1660 Ti is not meaningfully TDP limited. Overclocking will be another matter of course, but at stock this means that NVIDIA hasn’t had to significantly clamp down on power consumption to hit their power targets.

As for EVGA’s card design, I have to admit a triple-slot cooler is an odd choice for a 130W card – a standard double-wide card would have been more than sufficient for that kind of TDP – but in a market that’s going to be full of single and dual fan cards it definitely stands out from the crowd; and quite literally so, in the case of NVIDIA’s own promotional photos. Meanwhile I’m not sure there’s much to be said about EVGA’s software that we haven’t said a dozen times before: EVGA Precision remains some of the best overclocking software on the market. And with such a beefy cooler on this card, it’s certainly begging to be overclocked.


157 Comments

  • Retycint - Tuesday, February 26, 2019

    AMD selling overpriced cards does not subtract from the point that Nvidia is also attempting to raise the price as well. Both companies have put out underwhelming products this gen
  • Rocket321 - Friday, February 22, 2019

    "finally puts a Turing card in competition with their Pascal cards" should say Polaris.
  • Ryan Smith - Friday, February 22, 2019

    Boy I can't wait for Navi, since it sounds nothing like Turing...

    Thanks!
  • Kogan - Friday, February 22, 2019

    Aww, I was hoping this release would lower the price on those used 1070's. Oh well. I'll still probably go for a used 1070 over this one. Nearly identical in every way and can be found for as low as $200.
  • Hamm Burger - Friday, February 22, 2019

    Reading "Turing Sheds" in the headline makes me wonder what he could have done with a couple of these at Bletchley Park (which, for anybody passing, is well worth the steep entry fee — see bletchleypark.org.uk).

    Sorry for the interruption. I'll return you to the normal service.
  • Colin1497 - Friday, February 22, 2019

    "Now the bigger question in my mind: why is it so important to NVIDIA to be able to dual-issue FP32 and FP16 operations, such that they’re willing to dedicate die space to fixed FP16 cores? Are they expecting these operations to be frequently used together within a thread? Or is it just a matter of execution ports and routing?"

    It seems pretty likely that they added the FP16 cores because it simplified design, drivers, etc. It was easier to just drop in a few (as you mentioned) tiny FP16 cores than it was to change behavior of the architecture.
  • CiccioB - Friday, February 22, 2019

    FP16 is a way to simplify shading computation over the commonly used FP32.
    It allows for higher bandwidth (x2) and higher speed (x2, so half the energy for the same work) with the same HW space occupation. It was a feature used in HPC where bandwidth, power consumption and of course computation time are quite critical. It then ended up in game-class architectures just because they have found a way to exploit it there too.
    Some games have started using FP16 for their shading. On the AMD side, only Vega class cards support packed FP16 math.

    The use of an INT ALU that executes integer instructions together with the FP ones is instead an exclusive feature that can really improve shading performance much more than any other complex feature like the highly threaded (constantly interrupted) mechanism that is needed on architectures that cannot keep the ALUs fed.
    In fact we see that with fewer CUDA cores Turing can do the same work as Pascal even using less energy. And no magic ACE is present.
  • Yojimbo - Friday, February 22, 2019

    They didn't just drop in a few. It seems they have enough for 2x FP32 performance. Why are they dual issue? My guess is it is because that is what's necessary for Tensor Core operation. I think NVIDIA is being a bit secretive about the Tensor Cores. It's clear they took the RT Core circuitry out of the Turing minor die. As far as the Tensor Cores, I'm not so sure. Think about it this way: suppose Tensor Cores really are specialized separate cores. Then they also happen to have the capability of non tensor FP16 operation in dual issue with FP32 CUDA cores? Because if they don't then whatever functionality NVIDIA has planned for the FP16 cores on Turing minor would be incompatible with Turing major and Volta. I don't see how that can be the case, however, because, according to this review, Turing major is listed as the same CUDA compute generation as Turing minor. Now if the Tensor Cores can double as general purpose FP16 CUDA cores, then what's to say that FP16 and FP32 CUDA cores can't double as Tensor Cores? That is, if the Tensor Core can be made with two data flow paths, one following general purpose FP16 operations and one following Tensor Core instruction operations, then commutatively a general purpose CUDA core can be made with two data flow paths, one following general purpose operations and one following Tensor Core instruction operations.

    When Turing came out with Tensor Core operations but with FP64 cores cut from the die and no increase in FP32 CUDA cores per SM over Volta I was surprised. But with this new information from the Turing Minor launch it makes more sense to me. I don't know if they have the dedicated FP16 cores on Volta. If they do then the FP64 cores don't need to play the following role, but if they are able to use the FP64 cores as FP16 cores then hypothetically they have enough cores to account for the 64 FMA operations per clock per SM of the 8 Tensor Cores per SM. But on Turing major they just didn't have the cores to account for the Tensor Core performance. These FP16 cores on Turing minor seem to be exactly what would be necessary to make up for the shortfall. So, my guess is that Turing major also has these same cores. The difference is either entirely one of firmware/drivers that allows the Tensor Core data path to be operated on Turing major but not Turing minor or Turing major has some extra circuitry that allows the CUDA cores to be lashed together with an alternate data flow path that doesn't exist in Turing minor.
  • GreenReaper - Friday, February 22, 2019

    Agreed. It seems likely that most of the hardware is present, just not active.

    Frankly, it's not clear why these couldn't be binned versions of the higher-level chips that haven't met the QA requirements, which would be one reason it took this long to release - you need enough stock to be able to distribute it. If it's planned out in advance, they just need X good CUDA cores and Y ROPs that run at Z MHz, combined with at least [n] MB of cache. Fuse off the bad or unwanted portions to save on power and you're good.

    Of course it *could* be like Intel, which truly make smaller derivatives. If so that suggests they'll be selling a lot of these cards. Even then, though, Yojimbo's supposition about the core design being essentially the same is likely to be true.
  • Yojimbo - Saturday, February 23, 2019

    Yeah the die size and transistor count is still large for the number of CUDA cores, being that this review claims the 1660Ti has all SMs on the TU116 enabled. I said it was clear they took RT circuitry out. But I was wrong, that's not clear. It seems the die area per CUDA core and transistors per CUDA core of the TU116 are extremely close to the TU106, which is fully-enabled in RTX 2070. If this is the result of the INT32 and FP16 cores of the TU116 then where exactly do any cost savings of removing the Tensor Cores and RT Cores come from? Definitely the cost of completely re-architecting another GPU would outweigh the slight reduction in die size they seem to have achieved.

    On the other hand, I'd imagine TU116 will be such a high volume part that unless yields are really lousy, binning alone won't provide enough chips (and where are the fully enabled versions of the 284 mm^2 RTX dies going, anywhere? No such product has thus far been announced.) Perhaps such a small number of RT cores was judged to be insufficient for RTX gaming. Even if not impossible to create some useful effects including that many RT cores, if developers were incentivized to target such few RT cores with their RTX efforts because the volume of such RT-enabled cards was significant then they may reduce the scope and scale of RTX enhancements they undertake, putting a drag on the adoption of the technology. So NVIDIA opted to disable the RT cores, and perhaps the Tensor Cores, present on the dies even when they are actually fully functioning. Perhaps it was simply cheaper to eat the wasted die space per chip than to design an entirely new GPU with the RT cores and Tensor Cores removed.