Practical Performance Possibilities

Last but not least, we would like to explore the potential performance repercussions of the GTX 970’s unusual configuration.

Starting with the ROPs: while NVIDIA’s original incorrect specification is unfortunate, from a practical perspective it’s really just annoying. As originally (and correctly) pointed out by The Tech Report and Hardware.fr, when it comes to fillrates the GTX 970 is already bottlenecked elsewhere. With a peak pixel rate of 4 pixels per clock per SMM, the GTX 970’s 13 SMMs inherently limit the card to 52px/clock, versus the 56px/clock rate of its 56 ROPs. This is distinct from the GTX 980, where every stage of the GPU can pump out 64px/clock and the ROPs can consume it just as quickly. In the case of the GTX 970 those extra ROPs still play a role in tasks such as MSAA and other ROP activities that don’t require consuming additional SMM output. Furthermore, a fully disabled ROP/MC partition would shift the bottleneck to the ROPs, as 48 ROPs would fall short of what 13 SMMs can produce. So the 56 ROPs are still useful to have, but for basic pixel operations the GTX 970 has been bound by its SMM count from the start.
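
The arithmetic behind this bottleneck is simple enough to sketch. The following is an illustrative calculation using only the figures quoted above (4 pixels per clock per SMM, 1 pixel per clock per ROP); it is not based on any NVIDIA tool or API.

```python
def pixel_bottleneck(smm_count, rop_count, px_per_clock_per_smm=4):
    """Return (peak pixels/clock, limiting stage) for a Maxwell-style GPU."""
    smm_rate = smm_count * px_per_clock_per_smm  # what the SMMs can produce
    rop_rate = rop_count                         # 1 pixel/clock per ROP
    if smm_rate < rop_rate:
        return smm_rate, "SMM"
    return rop_rate, "ROP"

print(pixel_bottleneck(13, 56))  # GTX 970: (52, 'SMM') - SMM-bound
print(pixel_bottleneck(16, 64))  # GTX 980: (64, 'ROP') - both stages match
print(pixel_bottleneck(13, 48))  # hypothetical 48 ROP GTX 970: ROP-bound
```

As the hypothetical 48 ROP case shows, fully disabling a ROP/MC partition would have moved the bottleneck to the ROPs, which is why the extra ROPs are worth keeping.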

As for the memory segmentation, there are three basic scenarios to consider, only one of which has the potential to impact the GTX 970 in particular. In all cases with less than 3.5GB of memory allocated, the GTX 970 behaves just as if it had a single segment, with no corner cases to be concerned about. Meanwhile in cases with more than 4GB of memory allocated, the GTX 970 will spill over to PCIe just as the GTX 980 does, typically crushing performance on both cards. This leaves the last case as the only real concern: memory allocations between 3.5GB and 4GB.

GeForce GTX 970 Theoretical Memory Bandwidth
Segment                 Bandwidth
Fast Segment (3.5GB)    192GB/sec
Slow Segment (512MB)    28GB/sec
PCIe System Memory      16GB/sec
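
A rough way to see why the slow segment matters is to model the blended bandwidth when some fraction of memory traffic lands in it. This is a pessimistic back-of-the-envelope model (it assumes the two segments are not read in parallel), using only the theoretical peak figures from the table above; the traffic splits are invented for illustration.

```python
FAST_GBPS = 192.0  # 3.5GB segment, theoretical peak
SLOW_GBPS = 28.0   # 512MB segment, theoretical peak

def blended_bandwidth(slow_fraction):
    """Effective GB/sec if slow_fraction of traffic hits the slow segment
    and the segments are accessed serially (worst case)."""
    time_per_gb = (1 - slow_fraction) / FAST_GBPS + slow_fraction / SLOW_GBPS
    return 1.0 / time_per_gb

for frac in (0.0, 0.05, 0.125, 0.25):
    print(f"{frac:6.1%} slow traffic -> {blended_bandwidth(frac):6.1f} GB/sec")
```

Even a small share of traffic hitting the slow segment drags the blended figure down quickly, which is why the driver's goal is to keep frequently accessed resources out of it entirely.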

In the case of memory allocations between 3.5GB and 4GB, what happens is unfortunately less than deterministic. The use of heuristics to determine which resources to allocate to which memory segment, though the correct solution in this case, means that the real-world performance impact is going to vary on a game-by-game basis. If NVIDIA’s heuristics and driver team do their job correctly, then the performance impact versus a theoretical single-segment 4GB card should only be a few percent. Even in cases where the entire 4GB space is filled with in-use resources, steering the resources that don’t need to be accessed frequently into the 512MB segment can sufficiently hide that segment’s lack of bandwidth. This is, after all, just a permutation on basic caching principles.
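
To make the caching analogy concrete, here is a deliberately simplified sketch of how such a heuristic could work. This is our own illustration, not NVIDIA's actual driver logic, and the resource names and numbers are invented.

```python
FAST_CAPACITY_MB = 3584  # 3.5GB fast segment
SLOW_CAPACITY_MB = 512   # 512MB slow segment

def place_resources(resources):
    """resources: list of (name, size_mb, accesses_per_frame) tuples.
    Greedily fill the fast segment with the most frequently accessed
    resources and let the rest spill into the slow segment."""
    fast, slow = [], []
    fast_free = FAST_CAPACITY_MB
    for name, size, _ in sorted(resources, key=lambda r: -r[2]):
        if size <= fast_free:
            fast.append(name)
            fast_free -= size
        else:
            slow.append(name)  # cold (or unlucky) resources land here
    return fast, slow

workload = [  # hypothetical ~3.9GB working set
    ("g-buffer", 512, 1000),
    ("shadow maps", 768, 400),
    ("hot textures", 2048, 300),
    ("streaming cache", 640, 2),  # rarely touched: fine in the slow segment
]
fast, slow = place_resources(workload)
print("fast segment:", fast)
print("slow segment:", slow)
```

As long as the rarely accessed resources are the ones that spill over, the slow segment costs little; the heuristic only fails when something hot ends up there.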

The worst case scenario, on the other hand, would be for NVIDIA’s heuristics to fail, or for a workload to arise where no great solution exists and over 3.5GB of resources must be repeatedly and heavily accessed. In that case there is certainly the potential for performance to crumble, especially if accessing resources in the slow segment is a blocking action. Even then the GTX 970 would still perform better than a true 3.5GB card, since the slow segment is still much faster than system memory, but it is nonetheless significantly slower than the 3.5GB segment as well.

But perhaps the most frustrating scenario isn’t having more than 3.5GB of necessary resources, but having more than 3.5GB of unnecessary resources due to caching by the application. One VRAM utilization strategy for games is to allocate as much VRAM as they can get their hands on and then hold onto it for internal resource caching, increased view distances, or other less immediate needs. The Frostbite engine behind the Battlefield series (and an increasing number of other EA games) is one such example, as it will opportunistically allocate additional VRAM for the purpose of increasing draw distances. For something like a game this actually makes a lot of sense at the application level – games are generally monolithic applications that are the sole program being interacted with at the time – but it makes VRAM allocation tracking all the trickier as it obfuscates what a game truly needs versus what it merely wants to hold onto for itself. In this case tracking resources by usage is still one option, though like the overall theme of real world performance implications, it’s going to be strongly dependent on the individual application.
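
The pattern described above can be boiled down to a few lines. The sketch below is our own generic illustration of opportunistic allocation, not Frostbite's actual code; every name and number in it is invented.

```python
def plan_vram(reported_vram_mb, required_mb, optional_chunks_mb):
    """Take the memory the engine truly needs, then grab optional cache
    chunks (longer draw distances, resource caches) while they still fit."""
    allocated = required_mb
    kept = []
    for chunk in optional_chunks_mb:
        if allocated + chunk <= reported_vram_mb:
            allocated += chunk
            kept.append(chunk)
    return allocated, kept

# On a 4GB card the engine happily climbs toward full VRAM usage even
# though only `required_mb` is needed every frame, so a VRAM counter
# alone can't distinguish need from opportunistic want.
print(plan_vram(4096, 2500, [512, 512, 512, 512]))
```

Note that the same engine on the same settings would report lower VRAM usage on a smaller card simply because fewer optional chunks fit, which is exactly what makes allocation numbers so hard to interpret.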

In any case, the one bit of good news here is that for gaming, running out of VRAM is generally rather obvious. Running out of VRAM, be it under normal circumstances or when spilling past the GTX 970’s 3.5GB segment, results in some very obvious stuttering and very poor minimum framerates, so if it does happen it will be easy to spot. Running out of (fast) VRAM isn’t something that can easily be hidden if the VRAM is truly needed.
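
This is also why average framerates are a poor way to catch the problem: a handful of long frames barely moves the average but shows up immediately in minimum or percentile figures. A minimal sketch of that measurement, with invented frametime data:

```python
def frame_stats(frametimes_ms):
    """Return (average FPS, FPS implied by the slowest 1% of frames)."""
    n = len(frametimes_ms)
    avg_fps = 1000.0 * n / sum(frametimes_ms)
    worst = sorted(frametimes_ms)[int(n * 0.99):]  # slowest 1% of frames
    p99_fps = 1000.0 * len(worst) / sum(worst)
    return avg_fps, p99_fps

smooth = [16.7] * 100             # a steady ~60fps run
stutters = [16.7] * 99 + [120.0]  # same run with one 120ms hitch
print(frame_stats(smooth))
print(frame_stats(stutters))      # average barely moves, p99 collapses
```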

To that end, in the short amount of time we’ve had to work on this article we have also been cooking up potential corner cases for the GTX 970, and so far we have come up empty, though we’re by no means done. Coming up with real (non-synthetic) gaming workloads that can utilize between 3.5GB and 4GB of VRAM while not running into a rendering performance wall is already a challenge, and all the more so when trying to find workloads that actually demonstrate performance problems. At first glance this does seem to validate NVIDIA’s overall claim that performance is not significantly impacted by the memory segmentation, but we’re going to continue looking to see if that holds up. In the meantime NVIDIA seems very eager to find such corner cases as well; if there are any, they’d like to be able to identify what’s going on and tweak their heuristics to resolve them.

Ultimately we find ourselves coming full circle back to something NVIDIA initially said about the matter: the performance impact of the GTX 970’s configuration is already baked into the results we have. After all, the configuration is not a bug or other form of unexpected behavior, and NVIDIA has been fully abstracting and handling the memory segments since the GTX 970’s initial launch. So while today’s revelation gives us a better understanding of how the GTX 970 operates and what its benefits and drawbacks are, that information alone doesn’t change how the card behaves.

Closing Thoughts

Bringing things to a close, I must admit I was a bit taken aback when NVIDIA first told us that they needed to correct the specifications for the GTX 970. We’ve had NVIDIA decline to disclose sensitive information before, only to reveal it later, but they’ve never had to do something quite like this. In retrospect these new specifications make more sense given the performance and device specs we’re seeing, but this is certainly going to leave egg on NVIDIA’s face, as it never should have happened in the first place.

As for the GTX 970’s underlying memory configuration and memory allocation techniques, this is going to be a more difficult matter to bring closure to. Without question the GTX 970’s unusual memory configuration introduces a layer of complexity that isn’t there with the GTX 980, and as a result it’s extremely difficult to quantify just how much better or worse the card fares for it. It’s worse than the GTX 980 – and it is a lower tier card after all – but how much worse is no longer an easy answer to provide.

At its heart the GTX 970’s configuration is a compromise between GPU yields, card prices, and memory capacity. The easiest argument to make in that regard is that it should have shipped with a full 64 ROP configuration and skipped all of these complexities entirely. But on the whole, looking at the options for configurations without this additional complexity, a 3GB/48 ROP GTX 970 would have been underspecced, and with so much of the GTX 970’s success story being NVIDIA’s ability to launch the card at $329, I’m not sure the full 64 ROP option would have been much better. At least on paper this looks like the best compromise NVIDIA could make.

In the end, while I am disappointed that these details haven’t come out until now, I am satisfied that we now finally have enough information in hand to truly understand what’s going on with the GTX 970 and what its strengths and weaknesses are as a result of memory segmentation. Meanwhile, for real world performance, right now this is an ongoing test for the GTX 970. As the highest-profile card to use memory segmentation, it has put NVIDIA under the microscope like never before, but it’s far from the first time they’ve used this technology. So far, armed with this new information, we have been unable to break the GTX 970, which means NVIDIA is likely on the right track and the GTX 970 should still be considered as great a card now as it was at launch. In which case what has ultimately changed today is not the GTX 970, but rather our perception of it.

398 Comments

  • Will Robinson - Wednesday, January 28, 2015 - link

    You're going to love this then...
    http://gamenab.net/2015/01/26/truth-about-the-g-sy...
  • Oxford Guy - Thursday, January 29, 2015 - link

    Fascinating link, for sure.
  • mudz78 - Wednesday, January 28, 2015 - link

    "we have also been working on cooking up potential corner cases for the GTX 970 and have so far come up empty"

    Riiight.

    "As part of our discussion with NVIDIA, they laid out the fact that the original published specifications for the GTX 970 were wrong, and as a result the “unusual” behavior that users had been seeing from the GTX 970 was in fact expected behavior for a card configured as the GTX 970 was."

    Nvidia has already admitted they had complaints about performance.

    If you want to come up with scenarios where the 970 shits its pants you should really try harder:

    http://www.overclock.net/t/1535502/gtx-970s-can-on...

    http://forums.guru3d.com/showthread.php?t=396064

    https://www.reddit.com/r/hardware/comments/2s333r/...

    http://www.reddit.com/r/pcgaming/comments/2s2968/g...

    All of those threads had been around for weeks before Nvidia's announcement.

    Who cares what Nvidia's take on the situation is? It was an accident? Oh, no worries, mate!

    They are a business that lied, and there are consequences for that. Nobody cares that they didn't mean it.

    Refunds will start rolling out in coming weeks.
  • Yojimbo - Wednesday, January 28, 2015 - link

    Hey, can you link to the actual relevant part of those threads where someone is posting his methodology and results for creating a performance problem? The overclocker link seems to be a link to a 106-page thread whose first message is just a link to the other 3 threads you posted. The first message in the guru3d thread claims that the card can't use more than 3.5GB at all, which we now know to be completely false. It's like you're throwing us a cookbook and flour and saying "Here, there's a pie in here somewhere." If it's somewhere in there, and you have seen it before, could you please find and point to the methodology and claimed results so that people can try to repeat it rather than you just saying "you really should try harder"?
  • mudz78 - Wednesday, January 28, 2015 - link

    I think a more fitting analogy would be, somebody is complaining they can't spell and I am handing them a dictionary. I'm telling you the information is in there, so have a read and find it.

    Maybe if you bothered to read beyond the first post in each thread you would have some answers?

    " The first message in the guru3d thread claims that the card can't use more than 3.5GB at all,"

    No it doesn't.

    "I think (maybe) is here a little problem with GTX 970. If I run some games, for example Far Cry 4, GTX 970 allocate only around 3500MB video memory, but in same game and same scene GTX 980 allocate full 4000MB video memory.
    But if I change resolution to higher - 3840x2160, then all memory is allocated.
    Same problem exist in many other games like Crysis 3, Watch Dogs etc..

    Where is problem?? I really dont know..."
    http://forums.guru3d.com/showthread.php?t=396064

    "I didn't believe this at first, but I just decided to try and test it myself with texture modded Skyrim and my SLI 970s. I tried to push the 3.5 GBs barrier by downsampling it from 5120x2880 with the four following experimental conditions:

    1. No MSAA applied on top
    2. 2xMSAA applied on top
    3. 4xMSAA applied on top
    4. 8xMSAA applied on top

    Since MSAA is known to be VRAM heavy, it made sense. I also kept a close eye on GPU usage and FPS with the Rivatuner overlay as well as VRAM usage. All of this was done running around Whiterun to minimize GPU usage. My results were as follows.

    1. Skyrim peaked at about 3600 MBs in usage with occasional brief hitching while loading new textures in and out of VRAM. GPU usage remained well below 99% on each card.

    2. Skyrim once again peaked at about 3600 MBs with the mentioned hitching, this time somewhat more frequently. Once again, GPU usage remained well below 99%.

    3. Skyrim yet again peaked at about 3600 MBs and hitched much more prominently and frequently at the same time as VRAM usage dropped down 100-200 MBs. GPU usage was below 99% again with FPS still at 60 aside from those hitches.

    4. Now Skyrim was using the full 4 GB framebuffer with massive stuttering and hitching from a lack of VRAM. This time, I had to stare at the ground to keep GPU usage below 99% and retain 60 FPS. I ran around Whiterun just staring at the ground and it remained at 60 FPS except with those massive hitches where GPU usage and framerate temporarily plummeted. This last run merely indicated that Skyrim can indeed use more VRAM than it was with the previous 3 settings and so the issue seems to be with the 970s themselves rather than just the game in this example. The performance degradation aside from VRAM was severe, but that could just be 8xMSAA at 5K taking its calculative toll.

    So it seems to me that my 970s refuse to utilize above ~3600 MBs of VRAM unless they absolutely need it, but I've no idea why. Nvidia didn't gimp the memory bus in any overly obvious way from the full GM204 chip therefore the 970s should have no issue using the same VRAM amount as the 980s. I don't like what I see, it's like the situation with the GTX 660 that had 2 GBs but could only effectively use up 1.5 without reducing its bandwidth to a third, so it tried to avoid exceeding 1.5. The difference is that was predictable due to the GK106's 192-bit memory bus, there's nothing about the 970's explicit specifications that indicates the same situation should apply.

    A similar shortcoming was noticed sometime back regarding the 970's ROPs and how the cutting-down of 3 of GM204's 16 SMM units affected the effective pixel fillrate of the 970s despite retaining the full 64 ROPs. It's possible that Maxwell is more tightly-connected to shader clusters and severing them affects a lot about how the chip behaves, but that doesn't really make sense. If this is an issue, it's almost certainly software-related. I'm not happy regardless of the reason and I'll try more games later. Anecdotally, I have noticed recent demanding games peaking at about 3500-3600 MBs and can't actually recall anything going beyond that. I didn't pay attention to it or change any conditions to test it."
    http://www.overclock.net/t/1535502/gtx-970s-can-on...

    "I can reproduce this issue in Hitman: Absolution.
    Once more than 3.5GB get allocated, there is a huge frametime spike.
    The same scene can be tested to get reproducible results.
    In 4k, memory usage stays below 3.5GB and there is no extreme spike. But in 5k (4x DSR with 1440p), at the same scene, there is a huge fps drop once the game wants to allocate 2-300MB at once and burst the 3.5GB.
    It happens in the tutorial mission when encountering the tennis field.

    With older driver (344.11 instead of 347.09), memory usage is lower, but you can enable MSAA to get high VRAM usage and thus be able to reproduce by 100%.

    Could a GTX 980 owner test this?"
    http://www.overclock.net/t/1535502/gtx-970s-can-on...

    "Without AA or just FXAA, I have around 3.5GB used in AC: U and mostly no stuttering. With 2xMSAA it rises to ~3.6-3.7GB and performance is still ok. But when I enable 4xMSAA and it needs ~3.8GB, I often have severe stuttering.
    When I set resolution to 720p and enable 8xMSAA, VRAM usage is well below 3GB and there is no stuttering at all."
    http://forums.guru3d.com/showpost.php?p=4991141&am...

    "In Far Cry 4 @ 1440p
    No AA: 3320MB Max Vram, locked at 60 fps
    2x MSAA: 3405MB Max Vram, locked at 60fps
    4x MSAA: 3500MB Max Vram, 45-60fps
    8x MSAA, starts around 3700-3800MB @ 4-5fps, stabilizes at 3500MB @ 30-40fps."
    http://forums.guru3d.com/showpost.php?p=4991210&am...

    There's plenty more evidence supporting the acknowledged (by Nvidia) fact that the GTX970 has performance issues with VRAM allocation above 3.5GB.

    And all those people posting "my games run fine at 1080p", you are clearly missing the point.
  • aoshiryaev - Wednesday, January 28, 2015 - link

    Why not just disable the slow 512mb of memory?
  • SkyBill40 - Wednesday, January 28, 2015 - link

    Why not just have the full 4GB at the rated speed as advertised?
  • Oxford Guy - Thursday, January 29, 2015 - link

    Ding ding ding.
  • MrWhtie - Wednesday, January 28, 2015 - link

    I can run 4 games at 100+ fps on 1080p simultaneously (MSI GTX 970). Power like this used to always cost $500+. I have no complaints; I didn't have $500 to spend on a GTX 980.

    I feel Nvidia is doing us a favor by significantly undercutting AMD.
  • mudz78 - Wednesday, January 28, 2015 - link

    Yeah, a huge favour. By lying about their product specs, undercutting the competition, and cementing market share, they set themselves up to hike prices in the future.
