Practical Performance Possibilities

Last but not least, we would like to explore the potential performance repercussions of the GTX 970’s unusual configuration.

Starting with the ROPs, while NVIDIA’s original incorrect specification is unfortunate, from a practical perspective it’s really just an annoyance. As originally (and correctly) pointed out by The Tech Report and Hardware.fr, when it comes to fillrates the GTX 970 is already bottlenecked elsewhere. With a peak pixel rate of 4 pixels per clock per SMM, the GTX 970’s 13 SMMs inherently limit the card to 52px/clock, versus the 56px/clock rate of its 56 ROPs. This is distinct from the GTX 980, where every stage of the GPU can pump out 64px/clock and the ROPs can consume it just as quickly. The GTX 970’s extra ROPs still play a role in tasks such as MSAA that don’t require consuming additional SMM output, and a fully disabled ROP/MC partition would have shifted the bottleneck to the ROPs (48 ROPs vs. 13 SMMs), so the 56 ROPs are still useful to have. But for basic pixel operations the GTX 970 has been bound by its SMM count from the start.
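To put quick numbers to this, the bottleneck arithmetic can be sketched in a few lines of Python, using the 4px/clock/SMM figure from above (the 48-ROP case below is the hypothetical fully disabled partition, not a shipping card):

```python
# Peak pixel throughput is the minimum of what the SMMs can produce
# and what the ROPs can retire, per clock.
PIXELS_PER_CLOCK_PER_SMM = 4  # Maxwell SMM peak pixel output rate

def pixel_bottleneck(smm_count: int, rop_count: int) -> str:
    smm_rate = smm_count * PIXELS_PER_CLOCK_PER_SMM  # px/clock the SMMs produce
    rop_rate = rop_count                             # px/clock the ROPs retire
    if smm_rate < rop_rate:
        side = "SMM-bound"
    elif smm_rate > rop_rate:
        side = "ROP-bound"
    else:
        side = "balanced"
    return f"{min(smm_rate, rop_rate)} px/clock ({side}: SMMs {smm_rate}, ROPs {rop_rate})"

print("GTX 970:        ", pixel_bottleneck(13, 56))  # 52 px/clock, SMM-bound
print("GTX 980:        ", pixel_bottleneck(16, 64))  # 64 px/clock, balanced
print("48-ROP GTX 970: ", pixel_bottleneck(13, 48))  # 48 px/clock, ROP-bound
```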

As for the memory segmentation, there are 3 basic scenarios to consider, only one of which has the potential to impact the GTX 970 in particular. In all cases with less than 3.5GB of memory allocated the GTX 970 behaves just as if it had a single segment, with no corner cases to be concerned about. Meanwhile in cases with more than 4GB of memory allocation the GTX 970 will still spill over to PCIe, just as the GTX 980 does, typically crushing performance in both cases. This leaves the last case as the only real concern, which is memory allocations between 3.5GB and 4GB.

GeForce GTX 970 Theoretical Memory Bandwidth
Segment                 Bandwidth
Fast Segment (3.5GB)    192GB/sec
Slow Segment (512MB)    28GB/sec
PCIe System Memory      16GB/sec
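Those figures also hint at how much the slow segment can drag down aggregate bandwidth when it comes into play. As a rough, hypothetical model (it assumes accesses to the two segments serialize rather than overlap, a worst-case simplification rather than a statement about how the memory controllers actually behave), blended bandwidth falls off quickly as more traffic lands in the slow segment:

```python
# A crude, worst-case model of blended GTX 970 memory bandwidth,
# assuming fast- and slow-segment accesses serialize with each other.
FAST_GBPS = 192.0  # 3.5GB segment
SLOW_GBPS = 28.0   # 512MB segment

def blended_bandwidth(slow_fraction: float) -> float:
    """Traffic-weighted harmonic mean of the two segment bandwidths."""
    fast_fraction = 1.0 - slow_fraction
    return 1.0 / (fast_fraction / FAST_GBPS + slow_fraction / SLOW_GBPS)

for pct in (0.00, 0.05, 0.125, 0.25):
    print(f"{pct:6.1%} of traffic to slow segment -> {blended_bandwidth(pct):5.1f} GB/sec")
```

Under this pessimistic model, even a one-in-eight slow-segment hit rate (the 512MB segment’s share of the full 4GB) would cut effective bandwidth nearly in half, which is exactly why resource placement matters so much.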

In the case of memory allocations between 3.5GB and 4GB, what happens is unfortunately less than deterministic. The use of heuristics to determine which resources to allocate to which memory segment, though the correct solution in this case, means that the real world performance impact is going to vary on a game-by-game basis. If NVIDIA’s heuristics and driver team do their job correctly, then the performance impact versus a theoretical single-segment 4GB card should only be a few percent. Even in cases where the entire 4GB space is filled with in-use resources, steering the resources that don’t need to be accessed frequently into the 512MB segment can sufficiently hide its lack of bandwidth. This is, after all, just a variation on basic caching principles.
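NVIDIA hasn’t published how its placement heuristics actually work, but the underlying caching idea is easy to illustrate. The following is a hypothetical sketch, not NVIDIA’s implementation, and the resource names, sizes, and frequencies are invented for the example: rank resources by expected access frequency, fill the fast segment first, and let the coldest resources spill into the slow segment or system memory.

```python
# A hypothetical sketch of segment-aware placement, NOT NVIDIA's actual
# heuristic: hot resources fill the fast segment, cold ones spill over.
from dataclasses import dataclass

FAST_CAPACITY_MB = 3584  # 3.5GB segment
SLOW_CAPACITY_MB = 512   # 512MB segment

@dataclass
class Resource:
    name: str
    size_mb: int
    accesses_per_frame: float  # estimated access frequency

def place(resources: list[Resource]) -> dict[str, str]:
    placement = {}
    fast_used = slow_used = 0
    # Consider the hottest resources first so they land in the fast segment.
    for r in sorted(resources, key=lambda r: r.accesses_per_frame, reverse=True):
        if fast_used + r.size_mb <= FAST_CAPACITY_MB:
            placement[r.name] = "fast"
            fast_used += r.size_mb
        elif slow_used + r.size_mb <= SLOW_CAPACITY_MB:
            placement[r.name] = "slow"
            slow_used += r.size_mb
        else:
            placement[r.name] = "system"  # spill over PCIe as a last resort
    return placement

# Invented example resources for illustration only.
demo = [
    Resource("render targets", 512, 60.0),
    Resource("frequently sampled textures", 2800, 30.0),
    Resource("distant LOD textures", 400, 0.5),
]
print(place(demo))
# {'render targets': 'fast', 'frequently sampled textures': 'fast',
#  'distant LOD textures': 'slow'}
```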

The worst case scenario, on the other hand, would be for NVIDIA’s heuristics to fail, or for a workload to come along where no good solution exists and over 3.5GB of resources must be repeatedly and heavily accessed. In that case there is certainly the potential for performance to crumple, especially if accessing resources in the slow segment is a blocking action. Even then the GTX 970 would still perform better than a true 3.5GB card, since the slow segment is much faster than system memory, but it is nonetheless significantly slower than the fast 3.5GB segment.

But perhaps the most frustrating scenario isn’t having more than 3.5GB of necessary resources, but having more than 3.5GB of unnecessary resources due to caching by the application. One VRAM utilization strategy for games is to allocate as much VRAM as they can get their hands on and then hold onto it for internal resource caching, increased view distances, or other less immediate needs. The Frostbite engine behind the Battlefield series (and an increasing number of other EA games) is one such example, as it will opportunistically allocate additional VRAM for the purpose of increasing draw distances. For a game this actually makes a lot of sense at the application level, since games are generally monolithic applications that are the sole program being interacted with at the time, but it makes VRAM allocation tracking all the trickier, as it obfuscates what a game truly needs versus what it merely wants to hold onto for itself. In this case tracking resources by usage is still an option, though in keeping with the overall theme here, the outcome is going to be strongly dependent on the individual application.
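From the application side, such opportunistic caching might look something like the following hypothetical sketch (the budget figures and the split between required and nice-to-have assets are invented for illustration, not taken from any real engine):

```python
# Hypothetical sketch of opportunistic VRAM caching in the style the
# article describes (figures invented): anything beyond the assets a
# frame actually needs is treated as droppable, nice-to-have cache.
REQUIRED_MB = 2600  # example figure for the assets the game truly needs

def plan_allocation(vram_budget_mb: int) -> dict[str, int]:
    # Spare VRAM becomes opportunistic cache: prefetched textures,
    # extended draw distance data, and so on.
    opportunistic = max(0, vram_budget_mb - REQUIRED_MB)
    return {"required_mb": REQUIRED_MB, "opportunistic_cache_mb": opportunistic}

# On a 4GB card the engine happily "uses" the whole card...
print(plan_allocation(4096))  # {'required_mb': 2600, 'opportunistic_cache_mb': 1496}
# ...so raw allocation numbers say little about what the game needs.
print(plan_allocation(3584))  # {'required_mb': 2600, 'opportunistic_cache_mb': 984}
```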

In any case, the one bit of good news here is that for gaming, running out of VRAM is generally rather obvious. Whether under normal circumstances or when spilling past the GTX 970’s 3.5GB segment, exhausting VRAM results in very noticeable stuttering and very poor minimum framerates, so if it does happen it will be easy to spot. Running out of (fast) VRAM isn’t something that can easily be hidden if the VRAM is truly needed.
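That symptom is also measurable rather than merely subjective. As a minimal sketch, assuming we have a per-frame time log from a benchmark run, comparing outlier frame times against the typical frame time will flag this kind of spill-induced stutter:

```python
# A minimal sketch: flag VRAM-spill stutter in a frame-time log by
# comparing the worst frames against the typical frame.
def stutter_report(frame_times_ms: list[float]) -> str:
    times = sorted(frame_times_ms)
    median = times[len(times) // 2]
    p99 = times[int(len(times) * 0.99)]  # crude 99th percentile
    verdict = "stutter suspected" if p99 > 3.0 * median else "smooth"
    return f"median {median:.1f}ms, 99th percentile {p99:.1f}ms ({verdict})"

# Mostly 16.7ms (60fps) frames with occasional 100ms+ spikes: the
# classic signature of spilling out of fast VRAM.
log = [16.7] * 96 + [105.0, 120.0, 98.0, 110.0]
print(stutter_report(log))  # median 16.7ms, 99th percentile 120.0ms (stutter suspected)
```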

To that end, in the short amount of time we’ve had to work on this article we have also been cooking up potential corner cases for the GTX 970, and so far we have come up empty, though we’re by no means done. Coming up with real (non-synthetic) gaming workloads that can utilize between 3.5GB and 4GB of VRAM while not running into a rendering performance wall is already a challenge, and all the more so when trying to find workloads that actually demonstrate performance problems. At first glance this does seem to validate NVIDIA’s overall claim that performance is not significantly impacted by the memory segmentation, but we’re going to continue looking to see if that holds up. In the meantime NVIDIA seems very eager to find such corner cases as well; if there are any, they’d like to be able to identify what’s going on and tweak their heuristics to resolve them.

Ultimately we find ourselves coming full circle back to something NVIDIA initially said about the matter, which is that the performance impact of the GTX 970’s configuration is already baked into the results we have. After all, the configuration is not a bug or other form of unexpected behavior, and NVIDIA has been fully abstracting and handling the memory segments since the GTX 970’s initial launch. So while today’s revelation gives us a better understanding of how the GTX 970 operates and what the benefits and drawbacks are, that information alone doesn’t change how the card behaves.

Closing Thoughts

Bringing things to a close, I must admit I was a bit taken aback when NVIDIA first told us that they needed to correct the specifications for the GTX 970. We’ve had NVIDIA decline to disclose sensitive information in the past only to reveal it later, but they’ve never had to issue a correction quite like this. In retrospect the new specifications make more sense given the performance and device specs we’re seeing, but it certainly is going to leave egg on NVIDIA’s face, as this never should have happened in the first place.

As for the GTX 970’s underlying memory configuration and memory allocation techniques, this is going to be a more difficult matter to bring closure to. Without question the GTX 970’s unusual memory configuration introduces a layer of complexity that isn’t there with the GTX 980, and as a result it’s extremely difficult to quantify better and worse in this case. It’s worse than the GTX 980 – and it is a lower tier card after all – but how much worse is no longer an easy answer to provide.

At its heart the GTX 970’s configuration is a compromise between GPU yields, card prices, and memory capacity. The easiest argument to make is that it should have shipped with a full 64 ROP configuration and skipped all of these complexities entirely. But looking at the configurations that would have avoided this additional complexity, a 3GB/48 ROP GTX 970 would have been underspecced, and with so much of the GTX 970’s success story resting on NVIDIA’s ability to launch the card at $329, I’m not sure a costlier full configuration would have been much better. At least on paper this looks like the best compromise NVIDIA could make.

In the end, while I am disappointed that these details haven’t come out until now, I am satisfied that we finally have enough information in hand to truly understand what’s going on with the GTX 970 and what its strengths and weaknesses are as a result of memory segmentation. As for real world performance, this remains an ongoing test. As the highest-profile card to use memory segmentation the GTX 970 puts NVIDIA under the microscope like never before, but it’s far from the first time they’ve used this technology. So far, even with this new information, we have been unable to break the GTX 970, which suggests NVIDIA is likely on the right track and that the GTX 970 should still be considered as great a card now as it was at launch. In which case what has ultimately changed today is not the GTX 970, but rather our perception of it.

Comments

  • Jon Tseng - Monday, January 26, 2015 - link

    If it's such a gimped card I'll buy yours off you for $200. After all if it's really going to lose performance and value so quickly it can't be worth much more than that.

    You can sue NVidia for the extra $150. I can finally get FSX running at 4K*. Everyone's happy! :-) :-)

    * Bonus point if you can spot the deliberate "Kessel Run in 12 Parsecs" logic here.
  • JarredWalton - Monday, January 26, 2015 - link

    Not likely. Most games target specific amounts of VRAM, with 1GB, 2GB, 3GB, 4GB, 6GB, and 8GB all being likely candidates, and usually the targets have some leeway. The reason is that you target memory use based on textures and shadow maps, but you know that you also have to have frame buffers and other elements in VRAM (which aren't usually directly under the control of the game). So a game that targets 4GB VRAM will usually target more like 3-3.2GB, leaving the rest for the GPU to use on frame buffers, Z-buffers, etc.

    To that end, I've seen games where a GTX 970 runs great at Ultra QHD, but Ultra 4K really kills performance -- because where Ultra QHD might be just under the 3.5GB VRAM of the 970, Ultra 4K ends up going over the 4GB barrier. (The same performance drop occurs with the GTX 980 as well in my experience.) And do you know what most gamers will do if they hit the point where performance takes a big dive? They'll drop one or two settings to "fix" things.

    And that's where NVIDIA's GeForce Experience can help the majority: they go in, select their resolution, and let the game do the tricky part of selecting ideal settings. Maybe it won't be perfect, but for most gamers it's sufficient.

    TL;DR: Much ado about nothing.
  • Samus - Monday, January 26, 2015 - link

    And for those games, you'll need a higher end card. The realistic difference between 3.5GB and 4GB VRAM for texture cache means very little, even at 4K, where even 4GB is the ceiling NOW. Let's face it, with consoles having 8GB and high end cards having 6GB, 4GB cards just won't cut it in a few years let alone 3.5GB cards.
  • Mvoigt - Monday, January 26, 2015 - link

    You fail to understand that the consoles have a total of 8GB of RAM... not all dedicated to graphics... the OS uses some, the game uses some, and the graphics use a portion of that. Then I could say consoles fail, since my graphics card has 4GB of RAM and my machine has 32GB, so I have a combined 36GB available vs. 8GB on the consoles...
  • Kevin G - Monday, January 26, 2015 - link

    The thing is that the GPU and CPU have independent memory pools. If a game only uses 1 GB of that 32 GB main memory, you have 31 GB going to waste. Attempting to utilize that extra memory for a texture cache tends to make games crawl due to the latency bottleneck.

    On a console, what isn't used up by the host OS (512 MB last I checked), and core game logic can all go toward game assets. That can easily mean more than 6 GB of textures and models. With PC games typically running at higher resolution and using even higher resolution assets, it could easily equate to a demand for 6 GB and 8 GB graphics cards this year or next.
  • hermeslyre@hotmail.com - Monday, January 26, 2015 - link

    Last I checked both consoles reserve around 3.5GB for the OS, with the PS4 having 512MB of that reserved pool as flexible. Which leaves not a megabyte more than 5GB available for developers to do their thing. On the PS4 at least.
  • McC54u - Monday, January 26, 2015 - link

    You guys act like consoles are running on Titan Z's or something. They are running on a Radeon 7850 at best. They will never have a really graphics-intense 4K game on a console; they can't even do 1080p on most of their launch titles, even with all that RAM. Unless they get on board with some serious streaming tech for new titles, we have seen almost the peak of what these consoles can do.
  • Galidou - Monday, January 26, 2015 - link

    What you don't realize is that the 8GB is usable for textures. Games tend to look very good on consoles even if they use an underpowered GPU. Take for example the modified GeForce 7800 in the PS3: how far did it go? I think it did a lot better than the 7800 on the PC side.

    Fallout 3 was designed as a console game ported to the PC. Computer graphics cards had more VRAM than the consoles, and people developed texture packs for the game, making it a lot more beautiful but using much more VRAM, which was impossible for the PS3 and Xbox 360.

    The same will happen with console games of this gen: at some point they won't have the vertex/shader-heavy capability of PC cards, but they'll have the memory for really beautiful textures at 1080p. Take those console ports on PC with very beautiful textures and play them at 1440p or 4K... there you go, VRAM utilisation way beyond what has been seen in the past.
  • anandreader106 - Monday, January 26, 2015 - link

    Did you read the post by Mvoigt? You DO NOT have 8GB for textures! The OS eats up over a GB of RAM, and the CPU will use RAM while processing whatever is going on in the game world. Maybe you can get to 5GB of RAM for graphics... maybe. But that means the world you're traversing is pretty static and boring.
  • Galidou - Monday, January 26, 2015 - link

    Yep, 4-5GB of RAM for graphics on console is possible, but then did you read what I said?

    I was focusing on the fact that a game that can use 4-5GB of textures at 1080p, once ported to PC where users will play at 1440p and 4K with some graphical options not available on console, can easily make use of 6GB of VRAM.
