Over the weekend we reported on a statement released by NVIDIA regarding recent concern over unusual VRAM allocation and VRAM bandwidth performance from the GeForce GTX 970. In brief, various GTX 970 owners had observed that the GTX 970 was prone to topping out its reported VRAM allocation at 3.5GB rather than 4GB, and that meanwhile the GTX 980 was reaching 4GB allocated in similar circumstances. All of this was at odds with what we thought we knew about the cards and the underlying GM204 GPU, as the two cards were believed to have identical memory subsystems.

In their statement, NVIDIA confirmed that this unusual behavior existed and that it was due to the GTX 970 possessing “fewer crossbar resources to the memory system” than the GTX 980. Furthermore, as a consequence of this, the VRAM on the GTX 970 is segmented in a way it is not on the GTX 980, with 3.5GB of the 4GB serving as a high-priority segment and the final 512MB serving as a low-priority segment. The rest of NVIDIA’s statement focused on performance, and on why they believed this design would not have a significant performance impact versus a GTX 980-style non-segmented memory structure.
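
As an aside, the behavior users were reporting is straightforward to probe. The community tests that kicked this off boil down to allocating VRAM in fixed-size chunks and timing a simple kernel against each chunk; on a segmented card, chunks that land in the slower segment show a sharp drop in bandwidth. Below is a minimal sketch of that approach in CUDA. To be clear, this is our own illustration rather than any particular forum tool; the 128MB chunk size is arbitrary, the driver makes no guarantee that allocation order maps linearly onto physical VRAM, and on a display GPU the OS reserves some memory, so the tail of the card may not be reachable.

```cuda
// Minimal sketch of a segmentation probe (our illustration, not NVIDIA's
// tool): allocate VRAM chunk by chunk until cudaMalloc fails, then time a
// simple read+write kernel against each chunk. On a segmented card the
// last chunks should report noticeably lower bandwidth.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void touch(float* p, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) p[i] += 1.0f;                        // one read and one write per element
}

int main() {
    const size_t chunkBytes = 128ull << 20;         // 128MB per probe (arbitrary)
    const size_t n = chunkBytes / sizeof(float);

    std::vector<float*> chunks;
    float* p = nullptr;
    while (cudaMalloc((void**)&p, chunkBytes) == cudaSuccess)
        chunks.push_back(p);                        // grab VRAM until allocation fails
    cudaGetLastError();                             // clear the final out-of-memory error

    if (!chunks.empty()) {                          // warm up so the first timing isn't skewed
        touch<<<1, 1>>>(chunks[0], 1);
        cudaDeviceSynchronize();
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (size_t c = 0; c < chunks.size(); ++c) {
        cudaEventRecord(start);
        touch<<<(unsigned)((n + 255) / 256), 256>>>(chunks[c], n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        // 2x: every element is read once and written once
        double gbps = (2.0 * chunkBytes / 1e9) / (ms / 1e3);
        printf("chunk %2zu (%5zu MB allocated): %6.1f GB/s\n",
               c, (c + 1) * 128, gbps);
    }

    for (float* q : chunks) cudaFree(q);
    return 0;
}
```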

However, because NVIDIA’s statement was intentionally high-level – it was meant to be accessible to the wider press and their readers, and not just the technical press such as ourselves – it offered few technical details on what was going on under the hood. That made it difficult for us to validate NVIDIA’s claims, and on a more human level we’re always dissatisfied with half of an answer.

As a result we’ve been digging into this issue over the weekend, working to collect the final pieces of the puzzle. To that end, NVIDIA reached out to us Sunday evening to explain the issue at the level of detail we needed, making available a “triage team” of sorts composed of technical marketing, product management, and engineering/architectural personnel to answer questions and better explain the issue. While there’s still some benchmarking left to do on our end – we’ve only had a relatively brief Sunday evening to work with NVIDIA’s full explanation – at this point we feel we have a solid grasp of what’s going on with the GTX 970, how its memory segmentation and allocation work, and how it can impact performance.

Now make no mistake, NVIDIA is in full damage control mode right now due to the negative press this issue has garnered and the pain that’s going to follow. When NVIDIA is getting Senior VPs like Jonah Alben on the phone with us on a weekend night to talk architecture and answer questions, this isn’t normal operating procedure for the company. But at the same time it’s a positive sign for how seriously NVIDIA is taking our concerns, and an NVIDIA under pressure is an NVIDIA that is more likely to answer our deepest technical questions, giving us more insight than ever before into GM204.

GeForce GTX 970: Correcting the Specifications

To kick things off, we need to start at the very beginning and talk about the specifications of the GTX 970 itself. Everything that follows is derived from how the chip is configured, so we cannot go any further until we understand the true configuration of the GTX 970.

NVIDIA GPU Specification Comparison

|                        | GTX 980    | GTX 970 (Original) | GTX 970 (Corrected) |
|------------------------|------------|--------------------|---------------------|
| CUDA Cores             | 2048       | 1664               | 1664                |
| Texture Units          | 128        | 104                | 104                 |
| ROPs                   | 64         | 64                 | 56                  |
| Core Clock             | 1126MHz    | 1050MHz            | 1050MHz             |
| Boost Clock            | 1216MHz    | 1178MHz            | 1178MHz             |
| Memory Clock           | 7GHz GDDR5 | 7GHz GDDR5         | 7GHz GDDR5          |
| Memory Bus Width       | 256-bit    | 256-bit            | 256-bit             |
| VRAM                   | 4GB        | 4GB                | 4GB                 |
| L2 Cache               | 2MB        | 2MB                | 1.75MB              |
| FP64                   | 1/32 FP32  | 1/32 FP32          | 1/32 FP32           |
| TDP                    | 165W       | 145W               | 145W                |
| GPU                    | GM204      | GM204              | GM204               |
| Transistor Count       | 5.2B       | 5.2B               | 5.2B                |
| Manufacturing Process  | TSMC 28nm  | TSMC 28nm          | TSMC 28nm           |
| Launch Date            | 09/18/14   | 09/18/14           | 09/18/14            |
| Launch Price           | $549       | $329               | $329                |

When the GTX 980 and GTX 970 were released, NVIDIA provided the above original specifications for the two cards. The launch GTX 900 cards would be a standard full/die-harvested pair, with the GTX 980 using a fully enabled GM204 GPU, while the GTX 970 would use a die-harvested GPU where one or more SMMs had failed. As a result, the big differences between the GTX 980 and GTX 970 would be a minor clockspeed difference, the disabling of 3 (of 16) SMMs, and a resulting reduction in power consumption. Most important for the conversation at hand, we were told that both possessed identical memory subsystems: 4GB of 7GHz GDDR5 on a 256-bit bus, split amongst 4 ROP/memory controller partitions. All 4 partitions would be fully active on the GTX 970, with 2MB of L2 cache and 64 ROPs available.

This, as it turns out, was incorrect.

As part of our discussion with NVIDIA, they laid out the fact that the original published specifications for the GTX 970 were wrong, and that as a result the “unusual” behavior users had been seeing from the GTX 970 was in fact expected behavior for a card configured as it is. To get straight to the point: NVIDIA’s original publication of the ROP/memory controller configuration was wrong. The GTX 970 does have a 256-bit memory bus, but 1 of the 4 ROP/memory controller partitions is partially disabled, not fully enabled like we were originally told. As a result the GTX 970 has only 56 of its 64 ROPs and 1.75MB of its 2MB of L2 cache enabled. The memory controllers themselves remain unchanged, with all four controllers active and driving 4GB of VRAM over a combined 256-bit memory bus.

This revelation significantly alters how we perceive the performance of the GTX 970 in certain situations, and it is the missing piece of the puzzle in understanding the memory allocation issues that originally brought all of this to light. The ability to partially disable a ROP/memory controller partition is new to Maxwell, and we’ll fully explore how it works in a moment, but the important takeaway is that the ROP/MC layout on the GTX 970 is not fully enabled like the GTX 980’s, and as a result the card will not behave identically to the GTX 980. In light of this correction, all of the behavior we’ve seen from the GTX 970 now makes sense, and it is immediately clear that this is not a hardware or software bug in the GTX 970, but rather the planned/intentional behavior of the product.
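
As a quick sanity check, the corrected numbers are internally consistent. Assuming ROPs and L2 divide evenly across GM204's four ROP/memory controller partitions (an assumption on our part, though the totals imply it), half-disabling a single partition yields exactly the corrected figures:

```latex
% Assumption: 64 ROPs and 2MB of L2 split evenly across 4 partitions,
% with half of one partition disabled on the GTX 970.
\begin{align*}
\text{ROPs:} \quad & 64 - \tfrac{1}{2}\cdot\tfrac{64}{4} = 64 - 8 = 56\\
\text{L2:}   \quad & 2048\,\text{KB} - \tfrac{1}{2}\cdot\tfrac{2048\,\text{KB}}{4}
                     = 2048 - 256 = 1792\,\text{KB} = 1.75\,\text{MB}
\end{align*}
```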

The biggest and most painful question about all of this, then, is how did this happen? How did we get fed incorrect specifications? NVIDIA’s explanation, in a nutshell, is that this was completely accidental, and that all of it stems from assumptions made by NVIDIA’s technical marketing team.


[Image: GTX 970 Reviewer's Guide: Specifications]

In short, while NVIDIA does publish the specifications of their cards on the GeForce website and in other places (e.g. partner product pages), those are less-detailed, high-level specifications suitable for a wider audience, not unlike NVIDIA’s initial statement. The deeper technical specifications we get, such as the number of ROPs, transistor count, and L2 cache size, come from NVIDIA’s reviewer’s guide, which outlines for us the specifications of the card, the major new features that NVIDIA is focusing on, and some examples of the benchmark numbers they’re seeing. This document is compiled by NVIDIA’s technical marketing team (a group that is nonetheless extensively knowledgeable about GPUs) based on information provided to them by the product engineers and the product management team.

The error, as NVIDIA explains it, is that in creating the GTX 970 reviewer’s guide, the technical marketing team was unaware of Maxwell’s aforementioned new “partial disable” capability when they filled out the GTX 970 specification table. They were aware that the GTX 970 would have the full 256-bit memory bus, and, unaware of the ability to independently disable ROPs, they assumed that all 64 ROPs and the full 2MB of L2 cache were similarly available, writing the specification table accordingly. This error then made it into the final copy of the guide, not getting caught even after being shared around various groups at NVIDIA, and that information was finally disseminated by press such as ourselves.

Somewhat murkier is the intervening 4 months between the launch of the GTX 970 and the discovery of this error. As NVIDIA tells us, since the error wasn’t caught in the initial check of the reviewer’s guide, no further checks were made, especially since this information wasn’t also distributed via a public source (such as the GeForce website). As far as technical marketing knew the product had 64 ROPs, and as far as product management and engineering knew the technical marketing team had done its due diligence. As a result this issue was not caught and identified at NVIDIA until earlier this month, when NVIDIA technical marketing began investigating the reports and requests from the press about the memory allocation oddities. Only after running similar tests and encountering the same odd results as the press and users did NVIDIA properly identify the issue.


[Image: GTX 970 Memory Allocation (Image Courtesy error-id10t of Overclock.net Forums)]

Now, as NVIDIA is in full damage control mode at this point, consideration must be given as to whether NVIDIA’s story is true at all; NVIDIA would hardly be the first company to lie when painted into a corner by controversy. With that in mind, given the story that NVIDIA has provided, do we believe them? In short, yes we do.

To be blunt, if this were intentional then it would be an incredibly stupid plan, and NVIDIA as a company has not shown themselves to be that dumb. NVIDIA gains nothing by publishing an incorrect ROP count for the GTX 970, and if this information had been properly presented in the first place it would have been a footnote in an article extolling the virtues of the GTX 970, rather than the centerpiece of a full-on front page exposé. Furthermore, if not for this memory allocation issue, then other factors would have ultimately brought these incorrect specifications to light, so NVIDIA would never have been able to keep it under wraps for long had it been part of an intentional deception. Ultimately only NVIDIA can know the complete truth, but given what we’ve been presented we have no reason to doubt NVIDIA’s story.

In any case, going forward NVIDIA has made it clear that they intend to make sure this doesn’t happen again, with the specifications released to the press and public to be better checked so that nothing is inaccurate or assumed. NVIDIA rarely makes the same mistake twice, and they will certainly be under a great deal more scrutiny from here on out to ensure their published specifications are correct.

Meanwhile, as press we play a role in this as well, as in retrospect we should have seen this sooner. Nebulous metrics such as VRAM allocation aside, NVIDIA’s own tools and our own experiences pointed to something being off. The pre-GM204 version of NVIDIA’s DeviceQuery CUDA application, for example, correctly reports that the GTX 970 has only 1.75MB of L2 cache while the GTX 980 has 2MB, even if it doesn’t fully understand the makeup of the Maxwell 2 architecture.


[Image: NVIDIA CUDA DeviceQuery: GTX 970]
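
For anyone who wants to run the same check themselves, the relevant figure comes straight from the CUDA runtime; the deviceQuery sample is essentially a wrapper around cudaGetDeviceProperties, whose l2CacheSize field reports the enabled L2. A minimal sketch:

```cuda
// Minimal sketch of the DeviceQuery-style check: cudaGetDeviceProperties
// reports the enabled L2 size, so a GTX 970 shows 1835008 bytes (1.75MB)
// where a GTX 980 shows 2097152 bytes (2MB).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("%s: L2 cache = %d bytes (%.2f MB)\n",
               prop.name, prop.l2CacheSize,
               prop.l2CacheSize / (1024.0 * 1024.0));
    }
    return 0;
}
```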

Furthermore, our own initial incompatibility issues with the GTX 970 (and only the GTX 970) pointed to some kind of memory subsystem change that we should have looked into, even after NVIDIA corrected the problem in their drivers. Ultimately, other tools such as GPU-Z and AIDA64 weren’t able to catch the difference, since they too did not know about Maxwell’s ability to disable ROPs, so only a limited number of flags were raised. Nonetheless, this is something we admittedly should have caught had we noticed what was wrong.

Diving Deeper: The Maxwell 2 Memory Crossbar & ROP Partitions
398 Comments

  • Will Robinson - Wednesday, January 28, 2015 - link

    You're going to love this then...
    http://gamenab.net/2015/01/26/truth-about-the-g-sy...
  • Oxford Guy - Thursday, January 29, 2015 - link

    Fascinating link, for sure.
  • mudz78 - Wednesday, January 28, 2015 - link

    "we have also been working on cooking up potential corner cases for the GTX 970 and have so far come up empty"

    Riiight.

    "As part of our discussion with NVIDIA, they laid out the fact that the original published specifications for the GTX 970 were wrong, and as a result the “unusual” behavior that users had been seeing from the GTX 970 was in fact expected behavior for a card configured as the GTX 970 was."

    Nvidia has already admitted they had complaints about performance.

    If you want to come up with scenarios where the 970 shits its pants you should really try harder:

    http://www.overclock.net/t/1535502/gtx-970s-can-on...

    http://forums.guru3d.com/showthread.php?t=396064

    https://www.reddit.com/r/hardware/comments/2s333r/...

    http://www.reddit.com/r/pcgaming/comments/2s2968/g...

    All of those threads had been around for weeks before Nvidia's announcement.

    Who cares what Nvidia's take on the situation is? It was an accident? Oh, no worries, mate!

    They are a business that lied; there are consequences to that. Nobody cares that they didn't mean it.

    Refunds will start rolling out in coming weeks.
  • Yojimbo - Wednesday, January 28, 2015 - link

    Hey, can you link to the actual relevant part of those threads where someone is posting his methodology and results for creating a performance problem? The overclocker link seems to be a link to a 106-page thread whose first message is just a link to the other 3 threads you posted. The first message in the guru3d thread claims that the card can't use more than 3.5GB at all, which we now know to be completely false. It's like you're throwing us a cookbook and flour and saying "Here, there's a pie in here somewhere." If it's somewhere in there, and you have seen it before, could you please find and point to the methodology and claimed results so that people can try to repeat it rather than you just saying "you really should try harder"?
  • mudz78 - Wednesday, January 28, 2015 - link

    I think a more fitting analogy would be: somebody is complaining they can't spell and I am handing them a dictionary. I'm telling you the information is in there, so have a read and find it.

    Maybe if you bothered to read beyond the first post in each thread you would have some answers?

    " The first message in the guru3d thread claims that the card can't use more than 3.5GB at all,"

    No it doesn't.

    "I think (maybe) is here a little problem with GTX 970. If I run some games, for example Far Cry 4, GTX 970 allocate only around 3500MB video memory, but in same game and same scene GTX 980 allocate full 4000MB video memory.
    But if I change resolution to higher - 3840x2160, then all memory is allocated.
    Same problem exist in many other games like Crysis 3, Watch Dogs etc..

    Where is problem?? I really dont know..."
    http://forums.guru3d.com/showthread.php?t=396064

    "I didn't believe this at first, but I just decided to try and test it myself with texture modded Skyrim and my SLI 970s. I tried to push the 3.5 GBs barrier by downsampling it from 5120x2880 with the four following experimental conditions:

    1. No MSAA applied on top
    2. 2xMSAA applied on top
    3. 4xMSAA applied on top
    4. 8xMSAA applied on top

    Since MSAA is known to be VRAM heavy, it made sense. I also kept a close eye on GPU usage and FPS with the Rivatuner overlay as well as VRAM usage. All of this was done running around Whiterun to minimize GPU usage. My results were as follows.

    1. Skyrim peaked at about 3600 MBs in usage with occasional brief hitching while loading new textures in and out of VRAM. GPU usage remained well below 99% on each card.

    2. Skyrim once again peaked at about 3600 MBs with the mentioned hitching, this time somewhat more frequently. Once again, GPU usage remained well below 99%.

    3. Skyrim yet again peaked at about 3600 MBs and hitched much more prominently and frequently at the same time as VRAM usage dropped down 100-200 MBs. GPU usage was below 99% again with FPS still at 60 aside from those hitches.

    4. Now Skyrim was using the full 4 GB framebuffer with massive stuttering and hitching from a lack of VRAM. This time, I had to stare at the ground to keep GPU usage below 99% and retain 60 FPS. I ran around Whiterun just staring at the ground and it remained at 60 FPS except with those massive hitches where GPU usage and framerate temporarily plummeted. This last run merely indicated that Skyrim can indeed use more VRAM than it was with the previous 3 settings and so the issue seems to be with the 970s themselves rather than just the game in this example. The performance degradation aside from VRAM was severe, but that could just be 8xMSAA at 5K taking its calculative toll.

    So it seems to me that my 970s refuse to utilize above ~3600 MBs of VRAM unless they absolutely need it, but I've no idea why. Nvidia didn't gimp the memory bus in any overly obvious way from the full GM204 chip therefore the 970s should have no issue using the same VRAM amount as the 980s. I don't like what I see, it's like the situation with the GTX 660 that had 2 GBs but could only effectively use up 1.5 without reducing its bandwidth to a third, so it tried to avoid exceeding 1.5. The difference is that was predictable due to the GK106's 192-bit memory bus, there's nothing about the 970's explicit specifications that indicates the same situation should apply.

    A similar shortcoming was noticed sometime back regarding the 970's ROPs and how the cutting-down of 3 of GM204's 16 SMM units affected the effective pixel fillrate of the 970s despite retaining the full 64 ROPs. It's possible that Maxwell is more tightly-connected to shader clusters and severing them affects a lot about how the chip behaves, but that doesn't really make sense. If this is an issue, it's almost certainly software-related. I'm not happy regardless of the reason and I'll try more games later. Anecdotally, I have noticed recent demanding games peaking at about 3500-3600 MBs and can't actually recall anything going beyond that. I didn't pay attention to it or change any conditions to test it."
    http://www.overclock.net/t/1535502/gtx-970s-can-on...

    "I can reproduce this issue in Hitman: Absolution.
    Once more than 3.5GB get allocated, there is a huge frametime spike.
    The same scene can be tested to get reproducible results.
    In 4k, memory usage stays below 3.5GB and there is no extreme spike. But in 5k (4x DSR with 1440p), at the same scene, there is a huge fps drop once the game wants to allocate 2-300MB at once and burst the 3.5GB.
    It happens in the tutorial mission when encountering the tennis field.

    With older driver (344.11 instead of 347.09), memory usage is lower, but you can enable MSAA to get high VRAM usage and thus be able to reproduce by 100%.

    Could a GTX 980 owner test this?"
    http://www.overclock.net/t/1535502/gtx-970s-can-on...

    "Without AA or just FXAA, I have around 3.5GB used in AC: U and mostly no stuttering. With 2xMSAA it rises to ~3.6-3.7GB and performance is still ok. But when I enable 4xMSAA and it needs ~3.8GB, I often have severe stuttering.
    When I set resolution to 720p and enable 8xMSAA, VRAM usage is well below 3GB and there is no stuttering at all."
    http://forums.guru3d.com/showpost.php?p=4991141&am...

    "In Far Cry 4 @ 1440p
    No AA: 3320MB Max Vram, locked at 60 fps
    2x MSAA: 3405MB Max Vram, locked at 60fps
    4x MSAA: 3500MB Max Vram, 45-60fps
    8x MSAA, starts around 3700-3800MB @ 4-5fps, stabilizes at 3500MB @ 30-40fps."
    http://forums.guru3d.com/showpost.php?p=4991210&am...

    There's plenty more evidence supporting the acknowledged (by Nvidia) fact that the GTX 970 has performance issues with VRAM allocation above 3.5GB.

    And all those people posting "my games run fine at 1080p", you are clearly missing the point.
  • aoshiryaev - Wednesday, January 28, 2015 - link

    Why not just disable the slow 512mb of memory?
  • SkyBill40 - Wednesday, January 28, 2015 - link

    Why not just have the full 4GB at the rated speed as advertised?
  • Oxford Guy - Thursday, January 29, 2015 - link

    Ding ding ding.
  • MrWhtie - Wednesday, January 28, 2015 - link

    I can run 4 games at 100+ fps on 1080p simultaneously (MSI GTX 970). Power like this used to always cost $500+. I have no complaints; I didn't have $500 to spend on a GTX 980.

    I feel Nvidia is doing us a favor by significantly undercutting AMD.
  • mudz78 - Wednesday, January 28, 2015 - link

    Yeah, a huge favour. By lying about their product specs, undercutting the competition and cementing market share, they set themselves up to hike prices in the future.
