Practical Performance Possibilities

Last but not least, we would like to explore the potential performance repercussions of the GTX 970’s unusual configuration.

Starting with the ROPs: while NVIDIA’s original incorrect specification is unfortunate, from a practical perspective it’s really just annoying. As originally (and correctly) pointed out by The Tech Report and Hardware.fr, when it comes to fillrates the GTX 970 is already bottlenecked elsewhere. With a peak pixel rate of 4 pixels per clock per SMM, the GTX 970’s 13 SMMs inherently limit the card to 52px/clock, versus the 56px/clock rate of its 56 ROPs. This is distinct from the GTX 980, where every stage of the GPU can pump out 64px/clock and the ROPs can consume it just as quickly. On the GTX 970 those extra ROPs still play a role in tasks such as MSAA and other ROP activities that don’t require consuming additional SMM output – and fully disabling a ROP/MC partition would have shifted the bottleneck to the ROPs, as 48 ROPs fall short of what 13 SMMs can produce – so the 56 ROPs are still useful to have. But for basic pixel operations the GTX 970 has been bound by its SMM count from the start.
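To put numbers to this, here is a quick back-of-the-envelope sketch of the bottleneck arithmetic described above. The per-SMM pixel rate and the card configurations come from the figures in the text; the helper function itself is our own illustration:

```python
# Peak pixel throughput is capped by the narrower of two stages:
# what the SMMs can produce and what the ROPs can consume.
PX_PER_CLOCK_PER_SMM = 4  # peak pixel rate per SMM (from the text)

def pixel_bottleneck(smms: int, rops: int) -> int:
    """Return peak pixels/clock given SMM and ROP counts."""
    return min(smms * PX_PER_CLOCK_PER_SMM, rops)

print(pixel_bottleneck(13, 56))  # GTX 970: 52 px/clock, SMM-bound
print(pixel_bottleneck(16, 64))  # GTX 980: 64 px/clock, balanced
print(pixel_bottleneck(13, 48))  # hypothetical 970 with a fully disabled
                                 # ROP/MC partition: 48 px/clock, ROP-bound
```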

As for the memory segmentation, there are three basic scenarios to consider, only one of which has the potential to impact the GTX 970 in particular. With less than 3.5GB of memory allocated, the GTX 970 behaves just as if it had a single segment, with no corner cases to be concerned about. Meanwhile with more than 4GB of memory allocated, the GTX 970 will spill over to PCIe just as the GTX 980 does, typically crushing performance on either card. This leaves the only real concern as the final case: memory allocations between 3.5GB and 4GB.

GeForce GTX 970 Theoretical Memory Bandwidth
Segment                  Memory Bandwidth
Fast Segment (3.5GB)     196GB/sec
Slow Segment (512MB)     28GB/sec
PCIe System Memory       16GB/sec
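These figures fall out directly from bus width and data rate. As a sanity check, a minimal sketch, assuming the card’s published 7Gbps GDDR5 data rate and 32-bit memory controllers (the PCIe figure is the bus’s own ~16GB/sec, not derived from GDDR5):

```python
GDDR5_GBPS_PER_PIN = 7   # 7Gbps GDDR5 (published spec)
CONTROLLER_BITS = 32     # each memory controller is 32 bits wide

def segment_bandwidth_gbs(controllers: int) -> float:
    """Peak GB/sec for a segment backed by N 32-bit controllers."""
    return controllers * CONTROLLER_BITS * GDDR5_GBPS_PER_PIN / 8

print(segment_bandwidth_gbs(7))  # fast 3.5GB segment: 196.0 GB/sec
print(segment_bandwidth_gbs(1))  # slow 512MB segment:  28.0 GB/sec
```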

In the case of memory allocations between 3.5GB and 4GB, what happens is unfortunately less than deterministic. The use of heuristics to determine which resources to allocate to which memory segment, though the correct solution in this case, means that the real world performance impact is going to vary on a game-by-game basis. If NVIDIA’s heuristics and driver team do their job correctly, then the performance impact versus a theoretical single-segment 4GB card should only be a few percent. Even when the entire 4GB space is filled with in-use resources, steering infrequently accessed resources into the 512MB segment can sufficiently hide that segment’s lack of bandwidth. This is, after all, just a permutation on basic caching principles.
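To illustrate the principle only – NVIDIA has not published its heuristics, so every name and number below is our own invention – a caching-style placement heuristic might look something like this greedy sketch:

```python
from dataclasses import dataclass

FAST_MB, SLOW_MB = 3584, 512  # the 3.5GB and 512MB segments

@dataclass
class Resource:
    name: str
    size_mb: int
    heat: float  # hypothetical access-frequency estimate

def place(resources: list[Resource]):
    """Greedy sketch: hottest resources fill the fast segment first; colder
    ones overflow into the slow segment. Anything that fits neither would
    spill to system memory. Not NVIDIA's actual algorithm."""
    fast, slow = [], []
    fast_free, slow_free = FAST_MB, SLOW_MB
    for r in sorted(resources, key=lambda r: r.heat, reverse=True):
        if r.size_mb <= fast_free:
            fast.append(r)
            fast_free -= r.size_mb
        elif r.size_mb <= slow_free:
            slow.append(r)
            slow_free -= r.size_mb
    return fast, slow
```

So long as the coldest ~512MB of resources genuinely stays cold, the slow segment’s bandwidth deficit rarely sits on the critical path.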

The worst case scenario, on the other hand, would be for NVIDIA’s heuristics to fail, or for a workload to come along where no great solution exists and over 3.5GB of resources must be repeatedly and heavily accessed. In that case there is certainly the potential for performance to crumble, especially if accessing resources in the slow segment is a blocking action. Even then the GTX 970 would still perform better than a true 3.5GB card, since the slow segment is still much faster than system memory, but it is nonetheless significantly slower than the 3.5GB segment as well.

But perhaps the most frustrating scenario isn’t having more than 3.5GB of necessary resources, but having more than 3.5GB of unnecessary resources due to caching by the application. One VRAM utilization strategy for games is to allocate as much VRAM as they can get their hands on and then hold onto it for internal resource caching, increased view distances, or other less immediate needs. The Frostbite engine behind the Battlefield series (and an increasing number of other EA games) is one such example, as it will opportunistically allocate additional VRAM for the purpose of increasing draw distances. At the application level this actually makes a lot of sense – games are generally monolithic applications that are the sole program being interacted with at the time – but it makes VRAM allocation tracking all the trickier, as it obfuscates what a game truly needs versus what it merely wants to hold onto for itself. Tracking resources by usage is still an option here, though, as with the overall theme of real world performance implications, the results are going to be strongly dependent on the individual application.
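As a toy model of that want-versus-need distinction (entirely hypothetical; this is not Frostbite’s code or any real engine’s), consider an application-level cache that grabs spare VRAM and gives it back on demand:

```python
class OpportunisticCache:
    """Toy model: a game grabs idle VRAM for nice-to-have data (longer draw
    distances, prefetched assets) and evicts it when real needs grow."""

    def __init__(self, total_mb: int):
        self.total_mb = total_mb
        self.needed_mb = 0  # what the game truly requires
        self.cached_mb = 0  # what it merely wants to hold onto

    def fill_spare(self) -> None:
        """Opportunistically claim all currently idle VRAM."""
        self.cached_mb = self.total_mb - self.needed_mb

    def allocate(self, size_mb: int) -> None:
        """A genuine allocation; evict cached data first if space is short."""
        free = self.total_mb - self.needed_mb - self.cached_mb
        if size_mb > free:
            self.cached_mb = max(0, self.cached_mb - (size_mb - free))
        self.needed_mb += size_mb
```

An external VRAM meter only sees needed_mb + cached_mb – i.e. a “full” card – which is exactly why raw allocation numbers obfuscate what a game truly needs.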

In any case, the one bit of good news here is that for gaming, running out of VRAM is generally rather obvious. Exhausting VRAM, be it under normal circumstances or by going over the GTX 970’s 3.5GB segment, results in some very obvious stuttering and very poor minimum framerates, so if it does happen it will be easy to spot. Running out of (fast) VRAM isn’t something that can easily be hidden if the VRAM is truly needed.
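For what it’s worth, this is also straightforward to check for: a frame-time log makes the stutter pattern obvious. A minimal sketch (the 50ms spike threshold is our own arbitrary choice):

```python
def stutter_report(frame_times_ms: list[float]) -> dict:
    """Summarize a frame-time log; VRAM thrashing shows up as a collapsed
    minimum framerate and a burst of long frames."""
    avg_fps = 1000 * len(frame_times_ms) / sum(frame_times_ms)
    min_fps = 1000 / max(frame_times_ms)
    spikes = sum(1 for t in frame_times_ms if t > 50)  # >50ms ~ under 20fps
    return {"avg_fps": round(avg_fps), "min_fps": round(min_fps), "spikes": spikes}

# A smooth 60fps run vs. one thrashing past its fast VRAM: the average
# framerate barely moves, while the minimum framerate collapses.
print(stutter_report([16.7] * 100))
print(stutter_report([16.7] * 90 + [120.0, 16.7] * 5))
```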

To that end, in the short amount of time we’ve had to work on this article we have also been cooking up potential corner cases for the GTX 970, and so far we have come up empty, though we’re by no means done. Coming up with real (non-synthetic) gaming workloads that can utilize between 3.5GB and 4GB of VRAM while not running into a rendering performance wall is already a challenge, and all the more so when trying to find workloads that actually demonstrate performance problems. At first glance this does seem to validate NVIDIA’s overall claim that performance is not significantly impacted by the memory segmentation, but we’re going to continue looking to see whether that holds up. In the meantime NVIDIA seems very eager to find such corner cases as well; if there are any, they’d like to be able to identify what’s going on and tweak their heuristics to resolve them.

Ultimately we find ourselves coming full circle back to something NVIDIA initially said about the matter: the performance impact of the GTX 970’s configuration is already baked into the results we have. After all, the configuration is not a bug or other form of unexpected behavior, and NVIDIA has been fully abstracting and handling the memory segments since the GTX 970’s initial launch. So while today’s revelation gives us a better understanding of how the GTX 970 operates and what the benefits and drawbacks are, that information alone doesn’t change how the card behaves.

Closing Thoughts

Bringing things to a close, I must admit I was a bit taken aback when NVIDIA first told us that they needed to correct the specifications for the GTX 970. We’ve had NVIDIA decline to disclose sensitive information only to reveal it later, but they’ve never had to issue a correction quite like this. In retrospect the new specifications make more sense given the performance and device specs we’re seeing, but this certainly leaves egg on NVIDIA’s face, as it never should have happened in the first place.

As for the GTX 970’s underlying memory configuration and memory allocation techniques, this is going to be a more difficult matter to bring closure to. Without question the GTX 970’s unusual memory configuration introduces a layer of complexity that isn’t there with the GTX 980, and as a result it’s extremely difficult to quantify better and worse in this case. It’s worse than the GTX 980 – and it is a lower tier card after all – but how much worse is no longer an easy answer to provide.

At its heart the GTX 970’s configuration is a compromise between GPU yields, card prices, and memory capacity. The easiest argument to make is that it should have shipped with a full 64 ROP configuration and skipped these complexities entirely. But looking at the options that avoid this additional complexity, a 3GB/48 ROP GTX 970 would have been underspecced, and with so much of the GTX 970’s success story resting on NVIDIA’s ability to launch the card at $329, I’m not sure the alternative is much better. At least on paper, this looks like the best compromise NVIDIA could make.

In the end, while I am disappointed that these details didn’t come out until now, I am satisfied that we finally have enough information in hand to truly understand what’s going on with the GTX 970 and what its strengths and weaknesses are as a result of memory segmentation. As for real world performance, testing of the GTX 970 is ongoing. As the highest-profile card to use memory segmentation it has put NVIDIA under the microscope like never before, but this is far from the first time they’ve used the technology. So far, armed with this new information, we have been unable to break the GTX 970, which means NVIDIA is likely on the right track and the GTX 970 should still be considered as great a card now as it was at launch. In which case what has ultimately changed today is not the GTX 970, but rather our perception of it.

Comments

  • Mondozai - Monday, January 26, 2015

    When a company intentionally lies to its consumers, that isn't a storm in a teacup. Ryan may believe them, but I don't. I agree with him that it's incredibly stupid to do this kind of stuff, but the notion that they didn't know, even after all the manuals were passed around the company? Knowing the number of ROPs is basic stuff for technical marketing.

    And okay if this got missed for a single round. But in successive rounds, over a period of almost half a year? C'mon. Nvidia knows that it wouldn't sell as well if they marketed it as "3.5GB VRAM" and they tried to cover this shit up.

    I'm guessing Jonah Alben didn't have anything to do with this, and I'm guessing he's pissed as fuck. The big question is if Jen-Hsun knew or not. Their marketing team are not exactly people I'd trust (watch Tom Peterson in any stream and you'll know what I mean).

    Throwing the marketing guys under the bus is poetic justice. But also an easy move. Again, did the CEO know?
  • mapesdhs - Monday, January 26, 2015

    "intentionally lies".. yeah right! So you're saying this is not acceptable, and yet it's ok for AMD
    (and indeed NVIDIA) to market dual-GPU cards by advertising the sum of the VRAM on both
    GPUs, even though an application can only see & access the individual amount? Look at
    *any* seller site spec list for an AMD 295x2, they all say 8GB (ditto the specs page on
    AMD's site), while Anandtech's own review shows quite clearly that it's just 2x4GB, so the
    real amount accessible by an application is 4GB, not 8GB. Surely this is far more of a
    deception than the mistake NVIDIA states they have made with the 970 specs.

    So I call out hypocrasy; your comment is just NVIDIA-bashing when there have been far
    more blatant deceptions in the past, from both sides. NVIDIA does the double-up VRAM
    nonsense aswell, eg. the sale ads for the Titan Z all state 12GB, as do the specs on the
    NVIDIA web site, but again it's just 6GB per GPU, so 6GB max visible to an application.
    Look back in time, you'll see the same mush published for cards like the GTX 295 and
    equivalent ATIs from back then.

    So quit moaning about what is merely a mistake which doesn't change the conclusions
    based on the initial 970 review performance results, and instead highlight the more blatant
    marketing fibs, especially on dual-GPU cards. Or of course feel free to cite in *any* dual-
    GPU review where you complained about the VRAM diddle.

    Sorry if I sound peeved, but your comment started by claiming something is true when
    it's just your opinion, based on what you'd like to believe is true.

    Ian.
  • alacard - Monday, January 26, 2015

    "So you're saying this is not acceptable, and yet it's ok for AMD
    (and indeed NVIDIA) to market dual-GPU cards by advertising the sum of the VRAM on both
    GPUs, even though an application can only see & access the individual amount?"

    That's what's known as a straw-man, he never mentioned anything about dual GPUs. His point about ROPs is perfectly valid--and no Ian it's not ok to lie about that, nor about the amount of cache.

    "Sorry if I sound peeved, but your comment started by claiming something is true when
    it's just your opinion, based on what you'd like to believe is true."

    Why would you give Nvidia the benefit of the doubt here? If you really and truly believed no one brought this up before release or noticed it afterwards than you're a bigger fool than i could have ever guessed you are.

    Sorry if I sound peeved, but your comment started is claiming something is true when
    it's just your opinion, based on what you'd like to believe is true.
  • dragonsqrrl - Monday, January 26, 2015

    "Why would you give Nvidia the benefit of the doubt here?"

    Why would Nvidia want to deceive the whole PC gaming world over something so minor? As Ryan stated in the article that would be genuinely stupid. Can you think of a reason why Nvidia would intentionally seed a slightly inaccurate spec sheet to the press? What would they gain from that? I don't think there's any reason to believe the initial spec sheet was anything other than a mistake by Nvidia, and neither does any credible tech journalist I know of.

    That being said I also highly doubt they weren't aware of the mistake until now. While I think their response to this incident has been good so far, I really think they should've come out with this information sooner (like last week when this started to really heat up). But I think that time was probably spent confirming what had happened and how to present it to the press.
  • alacard - Monday, January 26, 2015

    " Can you think of a reason why Nvidia would intentionally seed a slightly inaccurate spec sheet to the press?"

    Is this a real question or some sort of a joke? You're asking why a company would knowingly inflate a spec sheet for a product they want to sell, and doing so with a straight face? Is that PT Barnum's johnson I see swinging from your asshole?
  • Galidou - Tuesday, January 27, 2015

    People buy performance; they don't look at memory bandwidth, ROPs and such, they just install the card in their computer. If you paid less than for some video cards it outperforms and you don't care about stats, you're on the right track.

    Companies lie to us in advertising about all sorts of things, on TV and elsewhere. I've seen many LCD monitors advertise X nits and not fully deliver that amount, and no one ever sues them. If the monitor still averages the same or better image quality than the best monitors in its price class, who cares about the advertisement?

    I'm not saying that lying to improve sales numbers is right, but SO MANY companies do that. If it turns out to be a really bad product for the price you paid, then sue them. But don't whine when there's a SLIGHT difference and the card still outperforms everything in its price class, uses less power, has good drivers, and so on.

    The only reason Nvidia would have to do this intentionally would be to prop up a mediocre video card, a kind of semi failure, which the GTX 970 SURELY isn't. Why would a company need to boost sales when they know the card is going to be sold out for the next month because of its price/performance ratio?
  • FlushedBubblyJock - Friday, January 30, 2015

    Oh, so that's why AMD lied about the number of transistors in the Bulldozer core, claiming it was 2 billion, then months later correcting their lie to journalists and revising it downward quite a large leap to 1.2 billion, a full 40% drop.
    Yes, lying about a cruddy product that never met expectations by pumping up the core transistor count to give the impression of latent power not yet utilized – power supposedly unlockable by, say, optimizations letting the Windows OS use all the "8"/(4) cores better with improved threading...

    Hahahhaaa no it's not a joke...

    http://www.anandtech.com/show/5176/amd-revises-bul...
  • dragonsqrrl - Tuesday, January 27, 2015

    Wow, disproportionately aggressive response to appropriate and logical questions. I can't tell if you're trying to intentionally mislead others or if you really have no clue what you're talking about. Yes, I'm asking why Nvidia would conspire to intentionally lie about something so minor in the initial spec sheet that would almost certainly be discovered soon after launch? I even tried to help you out a little: What would they gain from that?

    It just takes a simple risk assessment and a little bit of logic to pretty much rule this out as an intentional deception.
  • Galidou - Tuesday, January 27, 2015

    Nvidia's way of thinking, as imagined by the mad community: ''With the performance-to-cost ratio of that card when it launches, it will be sold out for weeks to come even if we give the true spec sheets! Let's speak to the marketing department and modify that so it can be SOLD OUT TIMES 2!! YEAH, now you're talking, let's make the community so mad they have to wait for it! YEAH, we want the community to HATE US!''
  • alacard - Tuesday, January 27, 2015

    Galidou, dragonsqrrl: Can you explain how a 970 with one of the DRAM banks partitioned for low-priority data is supposed to operate at 256 bits? Given that the last 512MB chunk is only accessed as a last resort, and only after all the other RAM is occupied, the memory subsystem could only be operating at 224 bits in the majority of cases.

    I could be wrong but i just don't see it. Given that, we're not merely talking about diminished ROP and cache count, but also a shallower memory interface which NVIDIA marketed specifically as being exactly the same as the 980. Here is a direct quote from their reviewer's guide:

    "Equipped with 13 SMX units and 1664 CUDA Cores the GeForce GTX 970 also has the rending horsepower to tackle next generation gaming. And with its 256-bit memory interface, 4GB frame buffer, and 7Gbps memory the GTX 970 ships with the SAME MEMORY SUBSYSTEM as our flagship GEFORCE GTX 980"

    If it really is only operating at 224 bits, THIS IS A BIG DEAL. Even if it were an honest mistake, it's still a big deal. Giving them the benefit of the doubt and assuming their initial materials were wrong, the idea they didn't notice it after release... come on.

    BTW that PT Barnum comment was just a joke that popped into my head at the last second and I couldn't resist adding it.
