GeForce GTX 970: Correcting The Specs & Exploring Memory Allocation

Author: Ryan Smith

by Ryan Smith on January 26, 2015 1:00 PM EST

Posted in
GPUs
GeForce
NVIDIA
Maxwell


Practical Performance Possibilities

Last but not least, we would like to explore the potential performance repercussions of the GTX 970’s unusual configuration.

Starting with the ROPs, while NVIDIA’s original incorrect specification is unfortunate, from a practical perspective it’s really just annoying. As originally (and correctly) pointed out by The Tech Report and Hardware.fr, when it comes to fillrates the GTX 970 is already bottlenecked elsewhere. With a peak pixel rate of 4 pixels per clock per SMM, the GTX 970’s 13 SMMs inherently limit the card to 52px/clock, versus the 56px/clock rate for the card’s 56 ROPs. This is distinct from the GTX 980, where every stage of the GPU can pump out 64px/clock, and the ROPs can consume it just as well. On the GTX 970 those extra ROPs still play a role in MSAA and other ROP activities that don’t require consuming additional SMM output. Moreover, fully disabling a ROP/MC partition would leave only 48 ROPs against the 52px/clock that the 13 SMMs can produce, shifting the bottleneck to the ROPs. So the 56 ROPs are still useful to have, but for basic pixel operations the GTX 970 has been bound by its SMM count from the start.
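The fillrate arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, it assumes each ROP retires one pixel per clock (which matches the 56 ROPs/56px figure quoted above), and the GTX 980's 16 SMMs are inferred from its 64px/clock rate rather than stated in this section.

```python
from typing import Tuple

# Peak pixel output per SMM per clock, as quoted in the article.
PIXELS_PER_CLOCK_PER_SMM = 4


def pixel_bottleneck(smm_count: int, rop_count: int) -> Tuple[int, str]:
    """Return (peak px/clock, limiting stage) for a given SMM/ROP mix.

    Assumption: each ROP can consume one pixel per clock.
    """
    smm_rate = smm_count * PIXELS_PER_CLOCK_PER_SMM
    rop_rate = rop_count
    if smm_rate < rop_rate:
        return smm_rate, "SMM"
    if rop_rate < smm_rate:
        return rop_rate, "ROP"
    return smm_rate, "balanced"


# Configurations discussed in the text; the 48-ROP part is hypothetical.
for name, smms, rops in [
    ("GTX 980", 16, 64),
    ("GTX 970", 13, 56),
    ("GTX 970 with a full ROP/MC partition disabled", 13, 48),
]:
    rate, stage = pixel_bottleneck(smms, rops)
    print(f"{name}: {rate}px/clock, limited by {stage}")
```

Under these assumptions the shipping GTX 970 comes out SMM-bound at 52px/clock, while the hypothetical 48-ROP variant would be ROP-bound at 48px/clock.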

As for the memory segmentation, there are three basic scenarios to consider, only one of which has the potential to impact the GTX 970 in particular. In all cases with less than 3.5GB of memory allocated, the GTX 970 behaves just as if it had a single segment, with no corner cases to be concerned about. Meanwhile, in cases with more than 4GB of memory allocated, the GTX 970 will still spill over to PCIe, just as the GTX 980 does, typically crushing performance in both cases. This leaves the last case as the only real concern: memory allocations between 3.5GB and 4GB.
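To make the three scenarios concrete, here is a minimal Python sketch that sorts an allocation size into one of them. The function and constant names are hypothetical, and treating exactly 3.5GB and exactly 4GB as still fitting is an assumption; the text only describes the ranges loosely.

```python
GIB = 1024 ** 3
FAST_SEGMENT_BYTES = int(3.5 * GIB)  # 3.5GB fast segment
TOTAL_VRAM_BYTES = 4 * GIB           # fast segment plus the 512MB slow segment


def classify_allocation(allocated_bytes: int) -> str:
    """Map a total VRAM allocation to one of the three scenarios in the text."""
    if allocated_bytes <= FAST_SEGMENT_BYTES:
        # Behaves like a single-segment card.
        return "fast segment only"
    if allocated_bytes <= TOTAL_VRAM_BYTES:
        # The only case specific to the GTX 970.
        return "fast and slow segments"
    # Same behavior as the GTX 980 once VRAM is exhausted.
    return "spills to system memory over PCIe"


for gib in (3.0, 3.8, 4.5):
    print(f"{gib}GB allocated: {classify_allocation(int(gib * GIB))}")
```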

GeForce GTX 970 Theoretical Memory Bandwidth
Segment	Theoretical Bandwidth
Fast Segment (3.5GB)	192GB/sec
Slow Segment (512MB)	28GB/sec
PCIe System Memory	16GB/sec

In the case of memory allocations between 3.5GB and 4GB, what happens is unfortunately less-than-deterministic. The use of heuristics to determine which resources to allocate to which memory segment, though the correct solution in this case, means that the real world performance impact is going to vary on a game-by-game basis. If NVIDIA’s heuristics and driver team do their job correctly, then the performance impact versus a theoretical single-segment 4GB card should only be a few percent. Even in cases where the entire 4GB space is filled with in-use resources, picking resources that don’t need to be accessed frequently can sufficiently hide the lack of bandwidth from the 512MB segment. This is after all just a permutation on basic caching principles.
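As an illustration of the caching idea, the sketch below places the most frequently accessed resources in the fast segment first and lets colder ones fall into the slow segment. It is emphatically not NVIDIA's heuristic, which is not described here; the `Resource` type, the per-frame access counts, and the greedy hottest-first rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Resource:
    name: str
    size_mb: int
    accesses_per_frame: float  # hypothetical usage metric


def place_resources(
    resources: List[Resource],
    fast_capacity_mb: int = 3584,  # 3.5GB fast segment
    slow_capacity_mb: int = 512,   # 512MB slow segment
) -> Dict[str, List[str]]:
    """Greedy placement: hottest resources go to the fast segment first."""
    placement: Dict[str, List[str]] = {"fast": [], "slow": [], "system": []}
    fast_free, slow_free = fast_capacity_mb, slow_capacity_mb
    ordered = sorted(resources, key=lambda r: r.accesses_per_frame, reverse=True)
    for res in ordered:
        if res.size_mb <= fast_free:
            placement["fast"].append(res.name)
            fast_free -= res.size_mb
        elif res.size_mb <= slow_free:
            placement["slow"].append(res.name)
            slow_free -= res.size_mb
        else:
            # Nothing left on the card: falls back to system memory over PCIe.
            placement["system"].append(res.name)
    return placement


example = [
    Resource("streaming_cache", 400, 1.0),
    Resource("render_targets", 512, 100.0),
    Resource("textures", 3000, 50.0),
]
print(place_resources(example))
```

With this made-up workload the render targets and textures land in the fast segment and the rarely touched streaming cache ends up in the 512MB segment, which is the outcome the paragraph above describes as the good case.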

The worst case scenario, on the other hand, would be for NVIDIA’s heuristics to fail, or alternatively to end up with a workload where no great solution exists and over 3.5GB of resources must be repeatedly and heavily accessed. In this case there is certainly the potential for performance to crumple, especially if accessing resources in the slow segment is a blocking action. Even then the GTX 970 would still perform better than a true 3.5GB card, since the slow segment is still much faster than system memory, but the slow segment is nonetheless significantly slower than the 3.5GB segment.
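A rough way to see why the GTX 970 should still beat a true 3.5GB card in this worst case is to plug the theoretical bandwidth figures from the table into a toy model. The model below is an assumption, not a measurement: it supposes every byte is read once per pass, that accesses to the overflow region are fully serialized with accesses to the fast segment (the blocking case), and that all links run at their theoretical peak. The function name is hypothetical.

```python
# Theoretical figures from the bandwidth table, in GB/sec.
FAST_GBPS = 192.0
SLOW_GBPS = 28.0
PCIE_GBPS = 16.0


def effective_bandwidth(fast_gb: float, overflow_gb: float, overflow_gbps: float) -> float:
    """Average GB/sec when fast and overflow reads cannot overlap (assumed)."""
    total_gb = fast_gb + overflow_gb
    seconds = fast_gb / FAST_GBPS + overflow_gb / overflow_gbps
    return total_gb / seconds


# 4GB of heavily used resources: 3.5GB in the fast segment, 0.5GB elsewhere.
gtx_970 = effective_bandwidth(3.5, 0.5, SLOW_GBPS)       # overflow in slow segment
true_3_5gb = effective_bandwidth(3.5, 0.5, PCIE_GBPS)    # overflow in system memory

print(f"GTX 970 (slow segment): {gtx_970:.1f} GB/sec")
print(f"True 3.5GB card (PCIe): {true_3_5gb:.1f} GB/sec")
```

Even in this pessimistic model the ordering matches the argument in the text: the segmented card lands well below the fast segment's peak, but above a card that has to send the same overflow across PCIe.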

But perhaps the most frustrating scenario isn’t having more than 3.5GB of necessary resources, but having more than 3.5GB of unnecessary resources due to caching by the application. One VRAM utilization strategy for games is to allocate as much VRAM as they can get their hands on and then hold onto it for internal resource caching, increased view distances, or other less immediate needs. The Frostbite engine behind the Battlefield series (and an increasing number of other EA games) is one such example, as it will opportunistically allocate additional VRAM for the purpose of increasing draw distances. For something like a game this actually makes a lot of sense at the application level – games are generally monolithic applications that are the sole program being interacted with at the time – but it makes VRAM allocation tracking all the trickier as it obfuscates what a game truly needs versus what it merely wants to hold onto for itself. In this case tracking resources by usage is still one option, though like the overall theme of real world performance implications, it’s going to be strongly dependent on the individual application.

In any case, the one bit of good news here is that for gaming, running out of VRAM is generally rather obvious. Running out of VRAM, be it under normal circumstances or by going over the GTX 970’s 3.5GB segment, results in some very obvious stuttering and very poor minimum framerates. So if it does happen, it will be easy to spot. Running out of (fast) VRAM isn’t something that can easily be hidden if the VRAM is truly needed.
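For readers who log frame times, the symptoms described above (stutter and poor minimum framerates) can be pulled out of a capture with a short script. This is a sketch only: the function name is hypothetical, and flagging any frame longer than twice the median as a stutter spike is an arbitrary threshold chosen for illustration.

```python
from statistics import median
from typing import Dict, List


def frame_stats(frame_times_ms: List[float], spike_factor: float = 2.0) -> Dict[str, float]:
    """Summarize a frame-time capture (milliseconds per frame)."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    typical = median(frame_times_ms)
    worst = max(frame_times_ms)
    # Count frames that took much longer than a typical frame (assumed threshold).
    spikes = sum(1 for t in frame_times_ms if t > spike_factor * typical)
    return {
        "average_fps": 1000.0 * len(frame_times_ms) / sum(frame_times_ms),
        "minimum_fps": 1000.0 / worst,
        "spike_count": float(spikes),
    }


# Made-up capture: nine smooth 16ms frames and one 100ms hitch.
capture = [16.0] * 9 + [100.0]
print(frame_stats(capture))
```

A capture with a healthy average but a very low minimum and a nonzero spike count is the pattern the paragraph describes as easy to spot.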

To that end, in the short amount of time we’ve had to work on this article, we have also been trying to cook up potential corner cases for the GTX 970 and have so far come up empty, though we’re by no means done. Coming up with real (non-synthetic) gaming workloads that can utilize between 3.5GB and 4GB of VRAM while not running into a rendering performance wall is already a challenge, and all the more so when trying to find such workloads that actually demonstrate performance problems. This at first glance does seem to validate NVIDIA’s overall claims that performance is not significantly impacted by the memory segmentation, but we’re going to continue looking to see if that holds up. In the meantime NVIDIA seems very eager to find such corner cases as well, and if there are any they’d like to be able to identify what’s going on and tweak their heuristics to resolve them.

Ultimately we find ourselves going a full circle back to something NVIDIA initially said about the matter, which is that the performance impact of the GTX 970’s configuration is already baked into the results we have. After all, the configuration is not a bug or other form of unexpected behavior, and NVIDIA has been fully abstracting and handling the memory segments since the GTX 970’s initial launch. So while today’s revelation gives us a better understanding of how GTX 970 operates and what the benefits and drawbacks are, that information alone doesn’t change how the card behaves.

Closing Thoughts

Bringing things to a close, I must admit I was a bit taken aback when NVIDIA first told us that they needed to correct the specifications for the GTX 970. We’ve had NVIDIA decline to disclose sensitive information before only to reveal it later, but they’ve never had to do something quite like this before. In retrospect these new specifications make more sense given the performance and device specs we’re seeing, but it certainly is going to leave egg on NVIDIA’s face as this never should have happened in the first place.

As for the GTX 970’s underlying memory configuration and memory allocation techniques, this is going to be a more difficult matter to bring closure to. Without question the GTX 970’s unusual memory configuration introduces a layer of complexity that isn’t there with the GTX 980, and as a result it’s extremely difficult to quantify better and worse in this case. It’s worse than the GTX 980 – and it is a lower tier card after all – but how much worse is no longer an easy answer to provide.

At its heart the GTX 970’s configuration is a compromise between GPU yields, card prices, and memory capacity. The easiest argument to make in that regard is that it should have shipped with a full 64 ROP configuration and skipped all of these complexities entirely. But on the whole, looking at the options for configurations without this additional complexity, a 3GB/48 ROP GTX 970 would have been underspecced, and with so much of the GTX 970’s success story being NVIDIA’s ability to launch the card at $329, I’m not sure the other option is much better. At least on paper this looks like the best compromise NVIDIA could make.

In the end while I am disappointed that these details haven’t come out until now, I am satisfied that we now finally have enough information in hand to truly understand what’s going on with the GTX 970 and what its strengths and weaknesses are as a result of memory segmentation. Meanwhile for real world performance, right now this is an ongoing test with the GTX 970. As the highest-profile card to use memory segmentation it’s the first time NVIDIA has been under the microscope like this, but it’s far from the first time they’ve used this technology. But so far with this new information we have been unable to break the GTX 970, which means NVIDIA is likely on the right track and the GTX 970 should still be considered as great a card now as it was at launch. In which case what has ultimately changed today is not the GTX 970, but rather our perception of it.


398 Comments


Kutark - Tuesday, January 27, 2015 - link
b/c consumers are dumb and if the 970 had 1mb less ram than 4gb it would of decreased sales. There is a reason they moved away from stuff like having 1.2gb or 1.5gb, etc etc. People like big solid numbers.
HisDivineOrder - Tuesday, January 27, 2015 - link
You give lawyers so much credit. Often, lawyers like being one to start such things and don't care much if they manage to finish it.
maximumGPU - Wednesday, January 28, 2015 - link
I'd say both Jarred. Sure, i look at performance first, but performance metrics tell me how good the card is NOW. The next thing i do is look at the specs and try and estimate how future proof my purchase will be.
A 3GB 970 would show great metrics at 1080p, but i wouldn't buy it because i know ram is ever more important thanks to the consoles catching up.
Since i game at 1440p, 4GB was my minimum ram threshold. i thought i got that with the 970, but instead got 3.5 + 0.5GB of slow ram. That makes my card less future proof than i thought and could've well affected my buying decision, regardless of its current performance metrics.
Ranger101 - Tuesday, January 27, 2015 - link
No surprises as to Nvidia's behaviour, as a company they are of course a rapacious juggernaut, but Tut Tut Anandtech, what would the great founder have to say?

Having read many recent articles in the GPU section, I am mostly impressed by the high quality of writing, however those who read between the lines of Mr Smith's Gpu reviews, realise that appearances of impartiality in his writing are misleading and that he is in fact a staunch and unrelenting supporter of camp green. ( Everyone is of course biased, it's just less appropriate to let it shine through in technical website reviews.)

It should therefore come as no surprise that in his initial review these issues "escaped" his attention, despite the fact that "a limited number of flags were raised" and that in the follow up article, he unashamedly wields the Bastard sword of Nvidia. LOL.

You must remember these things happen for a reason Ryan and I once again encourage you to temper your bias in forthcoming utterances...AMD still make good cards and a little competition is good....right? :)
just4U - Tuesday, January 27, 2015 - link
I think the fact that Ryan gets accused of being in favor of AMD and Nvidia means that's he's doing a pretty good job of not really being in either camp. If anything I'd simply suggest his expectations on performance are limited and when the cards actually do better.. he tends to point that out. Not really a bad thing considering how underwhelming hardware leaps are these days in most segments. Smaller jumps not the leaps and bounds we were all once used to.
OrphanageExplosion - Tuesday, January 27, 2015 - link
Oh do behave. When Anandtech had the AMD News Center sponsorship all we ever heard from the commentariat was that the site, and Ryan specifically, were AMD biased. I think we all know where the bias is on Anandtech - and it's in the comments, not the editorial.
HisDivineOrder - Tuesday, January 27, 2015 - link
You're talking about the same guy that just took it on AMD's word that Mantle was going to be "virtually identical" to the same low level access API as the Xbox One and that subsequently Mantle was AMD bringing the Xbox One's low level access language to PC gaming.

Seriously. If the guy is biased toward anything, he's biased toward believing more of AMD's statements than he really ought to, but I've had a hard time really blaming him since AMD had JUST paid for him (and his buddy journalists) to go to Hawaii on a beach trip and vacation under the excuse that it was to present the GPU part called "Hawaii." I mean, if I was taken to Hawaii, I'd probably be willing to believe anything they told me, too.

Still, don't mistake the man for an nVidia fanboy. He's clearly not. Lots of other people questioned that AMD party line far more than Anandtech did back in the day and it took a long time before they acknowledged that AMD had hoodwinked them and they never REALLY admitted it wholeheartedly.

Because AMD suggesting that Mantle was anything but a completely proprietary and locked-in API was a lie and no hardware company has yet to sign up in spite of the fact Intel tried very hard to research the subject and was rebuffed by AMD for months.

Intel likes to do anything they can do for free and they read all the press (like Ryan's) that suggested Mantle was going to be free and freely available, but as it turned out, that was more hyperbole on the part of AMD.

Yet I saw nothing of that on Anandtech. No, I don't think there's much evidence of his being "a staunch and unrelenting supporter of camp green" unless you're recalling the heady days of AMD's time as a "green" company.
Gothmoth - Tuesday, January 27, 2015 - link
as if you read or even UNDERSTAND what ROP´s mean before you buy a card.....

you and all the others are just trolls who have too much time on their hands....
Kutark - Tuesday, January 27, 2015 - link
I honestly don't understand why people are so up in arms over this. At the end of the day the performance figures still stand. The situations in which this news could actually arise and cause any problems are so limited its not even funny. At the resolutions and settings most games operate at don't use anywhere close to 4gb of vram.

Honestly if i didn't have SLI'd 760's i'd go out and buy a 970 tonight, regardless of any of this information.

That being said, this is another article that proves why anandtech is easily the best tech website out there. Thorough and honest, unbiased, just, amazing, love it. Sorry for all the commas.
Kutark - Tuesday, January 27, 2015 - link
Meant to say gamers, not games. Regardless.
