Practical Performance Possibilities

Last but not least, we would like to explore the potential performance repercussions of the GTX 970’s unusual configuration.

Starting with the ROPs, while NVIDIA’s original incorrect specification is unfortunate, from a practical perspective it’s really just annoying. As originally (and correctly) pointed out by The Tech Report and Hardware.fr, when it comes to fillrates the GTX 970 is already bottlenecked elsewhere. With a peak pixel rate of 4 pixels per clock per SMM, the GTX 970’s 13 SMMs inherently limit the card to 52px/clock, versus the 56px/clock rate for the card’s 56 ROPs. This is distinct from the GTX 980, where every stage of the GPU can pump out 64px/clock, and the ROPs can consume it just as well. On the GTX 970 the extra ROPs still play a role in MSAA and other ROP activities that don’t require consuming additional SMM output. What’s more, a fully disabled ROP/MC partition would have left the card with 48 ROPs against the 52px/clock its 13 SMMs can produce, shifting the bottleneck to the ROPs. So the 56 ROPs are still useful to have, but for basic pixel operations the GTX 970 has been bound by its SMM count from the start.
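The fillrate arithmetic above can be sketched in a few lines. This is only an illustration of the reasoning in the text: the function name is made up for the example, the one-pixel-per-ROP-per-clock rate is inferred from the "56px/clock for 56 ROPs" figure, and the GTX 980's SMM count of 16 is derived from its 64px/clock rate at 4 pixels per SMM.

```python
def peak_pixels_per_clock(smm_count, rop_count, px_per_smm=4):
    """Return (pixels per clock, limiting stage) for a simple two-stage model.

    Assumes each SMM emits px_per_smm pixels per clock and each ROP
    consumes one pixel per clock, as the figures in the text imply.
    """
    smm_rate = smm_count * px_per_smm
    rop_rate = rop_count
    if smm_rate < rop_rate:
        return smm_rate, "SMM"
    if rop_rate < smm_rate:
        return rop_rate, "ROP"
    return smm_rate, "balanced"


gtx_970 = peak_pixels_per_clock(smm_count=13, rop_count=56)
gtx_980 = peak_pixels_per_clock(smm_count=16, rop_count=64)
# Hypothetical GTX 970 with a whole ROP/MC partition disabled.
gtx_970_48_rop = peak_pixels_per_clock(smm_count=13, rop_count=48)

print(gtx_970)         # (52, 'SMM')
print(gtx_980)         # (64, 'balanced')
print(gtx_970_48_rop)  # (48, 'ROP')
```

Under this model the shipping GTX 970 is SMM-bound, the GTX 980 is balanced, and only the hypothetical 48 ROP part would be ROP-bound.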

As for the memory segmentation, there are 3 basic scenarios to consider, only one of which has the potential to impact the GTX 970 in particular. In all cases with less than 3.5GB of memory allocated the GTX 970 behaves just as if it had a single segment, with no corner cases to be concerned about. Meanwhile in cases with more than 4GB of memory allocation the GTX 970 will still spill over to PCIe, just as the GTX 980 does, typically crushing performance in both cases. This leaves the last case as the only real concern, which is memory allocations between 3.5GB and 4GB.

GeForce GTX 970 Theoretical Memory Bandwidth

Segment                  Memory Bandwidth
Fast Segment (3.5GB)     192GB/sec
Slow Segment (512MB)     28GB/sec
PCIe System Memory       16GB/sec
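As a rough sketch of the three scenarios, the snippet below maps an allocation size to the slowest memory tier it would have to touch, using the bandwidth figures from the table. The function and constant names are hypothetical, and treating an allocation of exactly 3.5GB or exactly 4GB as still fitting is an assumption; the text only speaks of "less than", "between", and "more than".

```python
FAST_SEGMENT_MB = 3584  # 3.5GB
SLOW_SEGMENT_MB = 512

# Theoretical bandwidth per tier in GB/sec, taken from the table.
BANDWIDTH_GB_S = {"fast": 192, "slow": 28, "pcie": 16}


def slowest_tier(allocated_mb):
    """Return the slowest tier an allocation of this size must reach into."""
    if allocated_mb <= FAST_SEGMENT_MB:
        return "fast"  # behaves like a single-segment card
    if allocated_mb <= FAST_SEGMENT_MB + SLOW_SEGMENT_MB:
        return "slow"  # the only case specific to the GTX 970
    return "pcie"      # spills to system memory, as on the GTX 980


for mb in (3000, 3800, 4500):
    tier = slowest_tier(mb)
    print(f"{mb}MB -> {tier} ({BANDWIDTH_GB_S[tier]}GB/sec)")
```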

In the case of memory allocations between 3.5GB and 4GB, what happens is unfortunately less-than-deterministic. The use of heuristics to determine which resources to allocate to which memory segment, though the correct solution in this case, means that the real world performance impact is going to vary on a game-by-game basis. If NVIDIA’s heuristics and driver team do their job correctly, then the performance impact versus a theoretical single-segment 4GB card should only be a few percent. Even in cases where the entire 4GB space is filled with in-use resources, placing resources that don’t need to be accessed frequently in the 512MB segment can sufficiently hide its lack of bandwidth. This is after all just a variation on basic caching principles.
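To make the caching analogy concrete, here is a minimal sketch of one possible placement heuristic: sort resources from most to least frequently read and fill the fast segment first. This is not NVIDIA's actual heuristic, which the article does not describe; the `Resource` type, the `reads_per_frame` metric, and the example resources are all invented for illustration.

```python
from typing import NamedTuple


class Resource(NamedTuple):
    name: str
    size_mb: int
    reads_per_frame: int  # hypothetical "hotness" metric


def place(resources, fast_mb=3584, slow_mb=512):
    """Greedily assign the hottest resources to the fast segment first."""
    free = {"fast": fast_mb, "slow": slow_mb}
    placement = {}
    for res in sorted(resources, key=lambda r: r.reads_per_frame, reverse=True):
        for segment in ("fast", "slow"):
            if res.size_mb <= free[segment]:
                free[segment] -= res.size_mb
                placement[res.name] = segment
                break
        else:
            # Neither VRAM segment has room: spill to system memory over PCIe.
            placement[res.name] = "system"
    return placement


# A made-up 4GB working set that fills both segments completely.
resources = [
    Resource("render_targets", 512, 60),
    Resource("near_textures", 2560, 20),
    Resource("geometry", 512, 10),
    Resource("distant_textures", 512, 1),
]
placement = place(resources)
print(placement)
```

With this working set the rarely read `distant_textures` resource is the one that lands in the slow segment, which is the outcome the text describes as hiding the segment's lower bandwidth.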

The worst case scenario on the other hand would be to have the NVIDIA heuristics fail, or alternatively ending up with a workload where no great solution exists, and over 3.5GB of resources must be repeatedly and heavily accessed. In this case there is certainly the potential for performance to crumple, especially if accessing resources in the slow segment is a blocking action. And in this case the GTX 970 would still perform better than a true 3.5GB card since the slow segment is still much faster than system memory, but it’s nonetheless significantly slower than the 3.5GB segment as well.
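A back-of-the-envelope estimate shows why this worst case hurts. The sketch below assumes the theoretical bandwidths from the table, a 4GB working set that is read once in full, and fully serialized (blocking) accesses with no overlap between tiers. All of these are simplifying assumptions, and the helper name is hypothetical.

```python
# Theoretical bandwidth per tier in GB/sec, taken from the table.
BANDWIDTH_GB_S = {"fast": 192, "slow": 28, "pcie": 16}


def read_time_ms(mb_by_tier):
    """Time to read every byte once, one tier after another (blocking)."""
    return sum(
        (mb / 1024) / BANDWIDTH_GB_S[tier] * 1000
        for tier, mb in mb_by_tier.items()
    )


# 4GB of hot data on a GTX 970: 3.5GB fast segment plus 512MB slow segment.
gtx_970_ms = read_time_ms({"fast": 3584, "slow": 512})
# The same data on a hypothetical true 3.5GB card: 512MB spills over PCIe.
true_3_5gb_ms = read_time_ms({"fast": 3584, "pcie": 512})
# For reference, the 3.5GB fast portion on its own.
fast_only_ms = read_time_ms({"fast": 3584})

print(f"GTX 970:        {gtx_970_ms:.1f} ms")
print(f"True 3.5GB:     {true_3_5gb_ms:.1f} ms")
print(f"Fast part only: {fast_only_ms:.1f} ms")
```

Under these assumptions the last 512MB takes roughly as long to read as the first 3.5GB, yet the total is still shorter than on a card that has to fetch that 512MB over PCIe.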

But perhaps the most frustrating scenario isn’t having more than 3.5GB of necessary resources, but having more than 3.5GB of unnecessary resources due to caching by the application. One VRAM utilization strategy for games is to allocate as much VRAM as they can get their hands on and then hold onto it for internal resource caching, increased view distances, or other less immediate needs. The Frostbite engine behind the Battlefield series (and an increasing number of other EA games) is one such example, as it will opportunistically allocate additional VRAM for the purpose of increasing draw distances. For something like a game this actually makes a lot of sense at the application level – games are generally monolithic applications that are the sole program being interacted with at the time – but it makes VRAM allocation tracking all the trickier as it obfuscates what a game truly needs versus what it merely wants to hold onto for itself. In this case tracking resources by usage is still one option, though like the overall theme of real world performance implications, it’s going to be strongly dependent on the individual application.

In any case, the one bit of good news here is that for gaming running out of VRAM is generally rather obvious. Running out of VRAM, be it under normal circumstances or going over the GTX 970’s 3.5GB segment, results in some very obvious stuttering and very poor minimum framerates. So if it does happen then it will be easy to spot. Running out of (fast) VRAM isn’t something that can easily be hidden if the VRAM is truly needed.

To that end, in the short amount of time we’ve had to work on this article, we have also been trying to cook up potential corner cases for the GTX 970 and have so far come up empty, though we’re by no means done. Coming up with real (non-synthetic) gaming workloads that can utilize between 3.5GB and 4GB of VRAM while not running into a rendering performance wall is already a challenge, and all the more so when trying to find such workloads that actually demonstrate performance problems. This at first glance does seem to validate NVIDIA’s overall claims that performance is not significantly impacted by the memory segmentation, but we’re going to continue looking to see if that holds up. In the meantime NVIDIA seems very eager to find such corner cases as well, and if there are any they’d like to be able to identify what’s going on and tweak their heuristics to resolve them.

Ultimately we find ourselves coming full circle back to something NVIDIA initially said about the matter, which is that the performance impact of the GTX 970’s configuration is already baked into the results we have. After all, the configuration is not a bug or other form of unexpected behavior, and NVIDIA has been fully abstracting and handling the memory segments since the GTX 970’s initial launch. So while today’s revelation gives us a better understanding of how GTX 970 operates and what the benefits and drawbacks are, that information alone doesn’t change how the card behaves.

Closing Thoughts

Bringing things to a close, I must admit I was a bit taken aback when NVIDIA first told us that they needed to correct the specifications for the GTX 970. We’ve had NVIDIA decline to disclose sensitive information before only to reveal it later, but they’ve never had to do something quite like this before. In retrospect these new specifications make more sense given the performance and device specs we’re seeing, but it certainly is going to leave egg on NVIDIA’s face as this never should have happened in the first place.

As for the GTX 970’s underlying memory configuration and memory allocation techniques, this is going to be a more difficult matter to bring closure to. Without question the GTX 970’s unusual memory configuration introduces a layer of complexity that isn’t there with the GTX 980, and as a result it’s extremely difficult to quantify better and worse in this case. It’s worse than the GTX 980 – and it is a lower tier card after all – but how much worse is no longer an easy answer to provide.

At its heart the GTX 970’s configuration is a compromise between GPU yields, card prices, and memory capacity. The easiest argument to make in that regard is that it should have shipped with a full 64 ROP configuration and skipped all of these complexities entirely. But on the whole and looking at the options for configurations without this additional complexity, a 3GB/48 ROP GTX 970 would have been underspecced, and with so much of the GTX 970’s success story being NVIDIA’s ability to launch the card at $329 I’m not sure if the other option is much better. At least on paper this looks like the best compromise NVIDIA could make.

In the end while I am disappointed that these details haven’t come out until now, I am satisfied that we now finally have enough information in hand to truly understand what’s going on with the GTX 970 and what its strengths and weaknesses are as a result of memory segmentation. Meanwhile for real world performance, right now this is an ongoing test with the GTX 970. As the highest-profile card to use memory segmentation it’s the first time NVIDIA has been under the microscope like this, but it’s far from the first time they’ve used this technology. But so far with this new information we have been unable to break the GTX 970, which means NVIDIA is likely on the right track and the GTX 970 should still be considered as great a card now as it was at launch. In which case what has ultimately changed today is not the GTX 970, but rather our perception of it.

398 Comments

  • HisDivineOrder - Tuesday, January 27, 2015 - link

    I think the theory laid out here for why nVidia would be a fool to lie assumes the lie was out the gate intended to be a lie OR that they could have just been the victim of a terrible mixup. I think the answer is somewhere in between.

    I think the far more likely scenario is they did not set out to lie to the press, but when the mixup happened and they discovered it (almost right away), they realized that they could wait a few months and let the thing play out through the holiday season. They would make a ton of sales, they could focus the press entirely on the performance given rather than the specs and when the truth was discovered they could shrug it off as unimportant because really performance was all that mattered. Not specs.

    The fact that they knew for months would mean little because ultimately the performance and benchmarks would still be (mostly) applicable and people who bought in got exactly what they were promised even if they didn't know to ask the precise question that would have illustrated greater weaknesses than they expected in the long run.

    So the deception carries on for months and then when pressed about it, delaying talking about it for a month (Dec-Jan, big sales month), they admit it after all the sales and virtually all the return periods are up. Then they shrug and say, "But the performance is the same anyway, so hey."

    That's the way they went. Imagine if they had not. Imagine instead if they had announced it as soon as they realized it after the initial reviews went out. Suddenly, the big story is not the amazing performance of the card, the value of the card compared to AMD's pricing at the time, or the percentage of performance you get compared to the nVidia high end. The story is how the press were misled and had to change the specs. The story becomes what it is now, except without all the sales in front of it.

    Suddenly, the 970 has a stink of failure on it and people avoid it even though the performance is just as good as it seems. "nVidia tried to pull a fast one," people would say (like they are now). Except BEFORE all those sales happened. Now, the card won't sell and all because of a mixup in the marketing department. Now nVidia's got the stink of fail on them from being brave and admitting what they'd done by mistake, leading to story after story of how nVidia mistakenly mislabeled the card's technical specs.

    Tanking sales through the holiday season by a decent margin and costing nVidia tons of money.

    That's the lie, people. The lie is not the mixup as though they don't happen. They absolutely happen. The lie is nVidia not knowing almost immediately they'd mixed things up. You know they did. And unlike the writer of this article, I see a clear and easy motive for why they'd continue the lie. They wanted to stall and shrug and gesture and act like they were figuring out what happened right up until the cards they'd sold between November and December were all universally securely at home in buyer's possession.

    Once the holiday return periods were up and once the cards were mostly bought as much as they were going to be in the mad rush, that's when they fess up.

    It's the old adage: It's easier to be forgiven than ask permission.

    There's your motive for deceit. I'm not saying it's right. I'm just saying that's the motive and that's why they did it and that's the timeline for how they did it. The sad part is the article here is not wrong that if nVidia had made no mistake in the first place, the story would have been squarely on how great a value the 970 was.

    But after the mistake, nVidia had the choice of fessing up and losing a ton of sales to bad press surrounding a non-issue or stall for a few months until purchases were settled and unreturnable (mostly), then fess up instead and grin and say, "Whoops."
  • SunnyNW - Tuesday, January 27, 2015 - link

    Except the people in the forums are the ones that brought this up, not nvidia on their own... Just so many cards had been sold and a larger percentage of people started noticing issues. But of course they knew, I agree, not initially but pretty soon after (within days for sure). Just everything played out (time-wise) as best as it could for nvidia, considering the circumstances.
    The issue here is the performance of the card, contrary to what most keep saying: the performance of the memory. The card simply does not act the same way as a "traditional" 4GB card would. Yes the extra .5GB is better than system memory but that does not change the latter fact.
  • Expressionistix - Tuesday, January 27, 2015 - link

    Most of the people buying these things just use them to play video games on the computer - does anyone really care?
  • R. Hunt - Wednesday, January 28, 2015 - link

    Gamers pay good money for these things, so I don't see why not.
  • nos024 - Tuesday, January 27, 2015 - link

    wow...as if knowing this info changes all the benchmarks. i am more disappointed with the 128bit memory bus on 960gtx.
  • nos024 - Tuesday, January 27, 2015 - link

    Oh and i bought a brand spanking new 970gtx today despite reading this article. Msi version.
  • Dr.Neale - Wednesday, January 28, 2015 - link

    Under the circumstances, I strongly believe that NVidia should be forced to accept the return of any 970 the customer no longer wants to own, on the grounds that it does NOT MEET THE PUBLISHED SPECIFICATIONS and is therefore DEFECTIVE in that it was NOT AS DESCRIBED.

    For example, AMAZON has exactly this policy, giving the customer (at least) 90 days to return any such product sold through Amazon Marketplace, for a full refund of all costs.

    Now that NVidia has admitted that the original published specs are NOT MET by EVERY SINGLE 970 card, they would have no way to deny any customer claim.

    I believe that Consumer Protection Laws would also dictate that a full refund must be issued within a reasonable time after the defect is "found".

    So, to those who are unhappy with their 970 purchase, use this as a means to get a full refund, and buy something else instead.

    To those who aren't willing to give up their wonderful 970, simply accept the fact that this memory defect is the main reason the 970 is so much cheaper than the defect-free 980, and move on.

    I further believe it would be in NVidia's long-term interests to facilitate the return of any unwanted cards, and to offer some freebie to compensate those willing to keep their 970 cards, despite the defect.

    Anything less is unacceptable.
  • GGlover - Wednesday, January 28, 2015 - link

    Early adopter here. I paid for 2 970's over 1 980 because I was led to believe that the specs were extremely close and that the 2 970's were slightly cheaper than a single 980. I had believed that they would perform better than a single 980 (extra ram etc.). I would have probably gotten a 980 had I known that there was in fact a much larger difference in specs. Real world performance or not. The numbers weren't really in at that time. So I was misled by a bait and switch.
  • Oxford Guy - Thursday, January 29, 2015 - link

    SLI is definitely the biggest problem Nvidia is facing.

    This article's author said he couldn't think of a reason why Nvidia would benefit from misleading consumers, but SLI purchasing decisions are heavily influenced by the VRAM amount on a card. Having the 980 be ostensibly the same in terms of VRAM was a very significant factor as well as the claimed amount for the 970 by itself.
  • jbluzb - Wednesday, January 28, 2015 - link

    I do not like their unlawful business practice of false advertising. They waited until after the Christmas season was over before acknowledging that there was indeed a problem in the reported specs.

    That is what really turned me off from the company. This will be the last NVIDIA card that I will ever buy because I do not want to support a company that does such things to its customers.

    Also, it made me wary of review websites. It is a big eye opener for me ---- they are just different websites handled by a marketing team. They cannot talk negatively about a company because they are a major sponsor. There is no such thing as truth in journalism. :(
