Segmented Memory Allocation in Software

So far we’ve talked about the hardware. Having finally explained the hardware basis of segmented memory, we can now look at the role software plays and how it allocates memory between the two segments.

From a low-level perspective, video memory management under Windows is handled jointly by the operating system and the video drivers. Strictly speaking, Windows controls video memory management (one of the big changes introduced with Windows Vista and the Windows Display Driver Model), while the video drivers provide significant input in the form of hints about how things should be laid out.

Meanwhile, from an application’s perspective all video memory and its address space are virtual. Applications write to their own private space, blissfully unaware of what else is in video memory and where it may be, or for that matter where in memory (or even which memory) they are writing. Because of this virtualization, it falls to the OS and video drivers to decide where in physical VRAM to place each memory request, and for the GTX 970 in particular, whether to put a request in the 3.5GB segment, the 512MB segment, or, in the worst-case scenario, system memory over PCIe.


Virtual Address Space (Image Courtesy Dysprosia)
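To make the virtualization point concrete, here is a toy page table in Python. It is a sketch under stated assumptions, not how WDDM or NVIDIA’s driver actually structure their tables: the 64KB page size, the Location names and the VirtualAddressSpace class are all hypothetical. What it shows is that two virtual pages an application sees as contiguous can be backed by different physical segments, and only the mapping layer knows which.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    """Hypothetical backing stores the OS/driver could place a page in."""
    FAST_SEGMENT = "3.5GB segment"
    SLOW_SEGMENT = "512MB segment"
    SYSTEM_MEMORY = "system memory over PCIe"


@dataclass(frozen=True)
class PhysicalPage:
    location: Location
    frame: int  # page frame index within that backing store


class VirtualAddressSpace:
    """One application's private, contiguous-looking view of video memory."""

    PAGE_SIZE = 64 * 1024  # assumed page size, for illustration only

    def __init__(self) -> None:
        self._page_table: dict[int, PhysicalPage] = {}

    def map(self, virtual_page: int, physical: PhysicalPage) -> None:
        # Called by the OS/driver side; the application never sees this.
        self._page_table[virtual_page] = physical

    def translate(self, virtual_address: int) -> tuple[PhysicalPage, int]:
        # Split the address into (page number, offset) and look the page up.
        page, offset = divmod(virtual_address, self.PAGE_SIZE)
        return self._page_table[page], offset


space = VirtualAddressSpace()
space.map(0, PhysicalPage(Location.FAST_SEGMENT, 42))
space.map(1, PhysicalPage(Location.SLOW_SEGMENT, 3))
# Adjacent virtual pages, backed by different physical segments.
print(space.translate(100))
print(space.translate(VirtualAddressSpace.PAGE_SIZE + 5))
```

The application only ever deals in virtual addresses; moving page 1 to a different segment would mean changing one page-table entry, with no change visible to the application.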

Without going so far as to rehash the entire theory of memory management and caching, the goal of memory management in the case of the GTX 970 is to allocate resources over the entire 4GB of VRAM such that high-priority items end up in the fast segment and low-priority items end up in the slow segment. To do this, NVIDIA directs the first 3.5GB of memory allocations to the faster 3.5GB segment, and only turns to the 512MB segment for allocations beyond 3.5GB, as there’s no benefit to using the slower segment so long as there’s available space in the faster segment.
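A minimal sketch of that fast-first ordering, in Python. Sizes are in MB and the SegmentedAllocator class is hypothetical: a real driver works in pages, can split and migrate allocations, and exposes nothing like this interface. Here each request goes whole into one place, which is a simplification. The point is only the order of preference: the slow segment is used once the fast one is full, and system memory once both are.

```python
FAST_SEGMENT_MB = 3584  # the 3.5GB segment
SLOW_SEGMENT_MB = 512   # the 512MB segment


class SegmentedAllocator:
    """Fast-first placement: fill the 3.5GB segment before using the 512MB one."""

    def __init__(self) -> None:
        self.free_mb = {"fast": FAST_SEGMENT_MB, "slow": SLOW_SEGMENT_MB}

    def allocate(self, size_mb: int) -> str:
        # Try the fast segment first; the slow one is only a fallback.
        for segment in ("fast", "slow"):
            if self.free_mb[segment] >= size_mb:
                self.free_mb[segment] -= size_mb
                return segment
        # Neither VRAM segment has room left: spill over PCIe.
        return "system"


allocator = SegmentedAllocator()
placements = [allocator.allocate(512) for _ in range(9)]
print(placements)  # seven 'fast', then 'slow', then 'system'
```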

The complex part of this process occurs once both memory segments are in use, at which point NVIDIA’s heuristics come into play to determine which resources belong in which segment. How NVIDIA does this is very much a “secret sauce” scenario for the company, but from a high level, the type of resource and when it was last used are good indicators of where to send it. Frame buffers, render targets, UAVs, and other intermediate buffers, for example, are the last thing you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached), and resources belonging to inactive applications would be great candidates for the slower segment. Based on how NVIDIA describes the process, we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.
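NVIDIA’s actual heuristics are not public, so the following is only a toy illustration of the signals named above: resource type, recency of use, and whether the owning application is active. Everything in it is hypothetical, including the Resource fields, the LATENCY_CRITICAL set, the sort order and the greedy fill; a real driver presumably weighs far more than this.

```python
from dataclasses import dataclass

# Resource types the article names as poor candidates for the slow segment.
LATENCY_CRITICAL = {"frame_buffer", "render_target", "uav", "intermediate_buffer"}


@dataclass
class Resource:
    name: str
    kind: str              # e.g. "texture", "render_target"
    size_mb: int
    last_used_frame: int   # frame number when the GPU last touched it
    app_active: bool = True


def slow_segment_candidates(resources: list[Resource], slow_capacity_mb: int) -> list[str]:
    """Choose resources to place in the slow segment (illustrative only)."""
    eligible = [r for r in resources if r.kind not in LATENCY_CRITICAL]
    # Resources of inactive applications first (False sorts before True),
    # then least recently used first.
    eligible.sort(key=lambda r: (r.app_active, r.last_used_frame))
    chosen, used_mb = [], 0
    for r in eligible:
        # Greedy fill: skip anything that no longer fits, keep trying smaller ones.
        if used_mb + r.size_mb <= slow_capacity_mb:
            chosen.append(r.name)
            used_mb += r.size_mb
    return chosen


resources = [
    Resource("backbuffer", "frame_buffer", 32, last_used_frame=100),
    Resource("gbuffer", "render_target", 128, last_used_frame=100),
    Resource("terrain", "texture", 256, last_used_frame=40),
    Resource("hud", "texture", 64, last_used_frame=100),
    Resource("other_app_cache", "texture", 200, last_used_frame=5, app_active=False),
]
print(slow_segment_candidates(resources, slow_capacity_mb=512))
# → ['other_app_cache', 'terrain']
```

The frame buffer and render target are never considered, the inactive application’s cache goes first, and the recently used HUD texture stays in the fast segment because the stale terrain texture already filled most of the 512MB.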

From an API perspective this applies to both graphics and compute, though it’s a safe bet that graphics is the more easily and accurately handled of the two thanks to the rigid nature of graphics rendering. Direct3D, OpenGL, CUDA, and OpenCL all see and have access to the full 4GB of memory on the GTX 970, and from the perspective of applications using these APIs the 4GB of memory looks uniform, with the segments abstracted away. This is also why applications that benchmark memory in a piecemeal fashion will not find the slow memory region until the end of their run: their earlier allocations land in the fast segment and only spill over to the slow segment once the fast segment is full.
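The arithmetic behind that last point is easy to sketch. Assume a benchmark that allocates fixed-size chunks one after another on an otherwise empty card, with strict fast-first placement and no chunk split across segments. These are simplifications: in practice the desktop and other applications already hold some VRAM, which would likely move the boundary earlier. The function name benchmark_chunk_map is hypothetical.

```python
FAST_SEGMENT_MB = 3584  # 3.5GB
SLOW_SEGMENT_MB = 512


def benchmark_chunk_map(chunk_mb: int) -> list[tuple[int, str]]:
    """Segment each chunk of a sequential, piecemeal VRAM benchmark lands in.

    Assumes strict fast-first placement, that no chunk is split across
    segments, and that nothing else is already resident in VRAM.
    """
    fast_chunks = FAST_SEGMENT_MB // chunk_mb
    slow_chunks = SLOW_SEGMENT_MB // chunk_mb
    return [(i, "fast" if i < fast_chunks else "slow")
            for i in range(fast_chunks + slow_chunks)]


chunks = benchmark_chunk_map(128)
slow = [i for i, segment in chunks if segment == "slow"]
print(len(chunks), slow)  # → 32 [28, 29, 30, 31]
```

With 128MB chunks, only the last 4 of 32 chunks (the final 512MB) touch the slow segment, which is exactly where such a benchmark would report a drop.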

GeForce GTX 970 Addressable VRAM
API         Addressable Memory
Direct3D    4GB
OpenGL      4GB
CUDA        4GB
OpenCL      4GB

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970 but reach 4GB on a GTX 980. Again, from a high-level perspective all of this segmentation is abstracted, so games should not be aware of what’s going on under the hood.

Overall, then, the role of software in memory allocation is relatively straightforward, since it’s layered on top of the segments. Applications have access to the full 4GB, and because application memory space is virtualized, the existence and usage of the memory segments are abstracted from the application, with physical memory allocation handled by the OS and driver. Only after 3.5GB is requested (enough to fill the entire 3.5GB segment) does the 512MB segment get used, at which point NVIDIA attempts to place the least sensitive data in the slower segment.

398 Comments

  • ol1bit - Monday, February 2, 2015

    They don't disable all this stuff; it's more about which chips pass which tests. In the past, they would have had to disable more of it, so the 970 would not be as fast as it is. Cheers!
  • Oderus Urungus - Saturday, February 7, 2015

    You can't simply "turn on" the ROP; it's lasered off, I believe, which makes that impossible.
  • xenol - Tuesday, January 27, 2015

    A class action lawsuit will take years to settle and in the end all you get is a $30 rebate (while the lawyer who represented the customers gets millions) and you forfeit your right to participate in another class action lawsuit, as the legalese tends to say.
  • 3ricss - Tuesday, January 27, 2015

    Yeah, no class action lawsuit is going to happen. The devil is in the details on this one, and I'd be surprised if less than 1% of the users out there have even read up on this.
  • Yojimbo - Tuesday, January 27, 2015

    Are you a lawyer? I wouldn't normally dare say it is or isn't a case for a class action suit, as I am not a lawyer and really have no idea what I am talking about, but since you took a gander at it but didn't back it up in any way, I will too, but with some explanations of my line of thinking. My first reaction is that NVIDIA would probably have to release false claims of utility and not just false numbers. The 4GB claim is real, not false. I wonder if ROPs and memory bandwidth might be a bit too abstract for the courts to rule that consumers were truly deceived. Game performance is the true benchmark as far as consumers are concerned. Secondly they never advertised the inaccurate information, they released it to review sites in press packets. There's still a responsibility there, but my guess is it's a step down from an active advertising campaign.
  • eanazag - Tuesday, January 27, 2015

    Get off your class action suit gravy train America - them lawyers got yous trained. This could easily have been a mistake. On top of that, what was illegal? Price fixing? No. Ruined the competitive landscape? Absolutely not.

    I don't own a 970. The price to performance has only helped consumers.
  • SkyBill40 - Wednesday, January 28, 2015

    A mistake? Unlikely. It seems pretty clear this was likely known prior to launch yet they made no mention of it. That's deliberately falsifying the specs of the card. While it's not really a huge deal, it IS a rather huge hit to Nvidia for the sake of trust in their product specs. Full disclosure is what it should be about from the beginning... not dropping a "mea culpa" afterwards and expecting everyone to just buy into that.

    You can buy into that if you wish, but not I. I guess it's good that I've waited to upgrade. I'll either pick up a 980 or one of the Ti variants should they be released.
  • rafaelluik - Wednesday, January 28, 2015

    Are you kidding, or are you out of your mind? Will you keep buying from NVIDIA from now on?!
  • Yojimbo - Wednesday, January 28, 2015

    NO! BURN THEIR CORPORATE HEADQUARTERS! (My legal counsel insists I point out that this was sarcasm.)
  • Yojimbo - Wednesday, January 28, 2015

    What exactly was known prior to launch, the way the card was designed? I hope so. The mistake was made AT launch. The mistake was the improper information being given to the reviewing press. The card works as designed. It doesn't seem worth it to try to fool consumers into thinking there's 64 ROPs instead of 56 ROPs or there's 224 GB/s of bandwidth instead of 196 GB/s of bandwidth, especially when there's seemingly so much fancy hardware and software engineering behind it to make it work. If the wrong information really was released on purpose, it seems like a stupid decision. Sure, consumers would have more peace of mind thinking there was that extra headroom, but on the other hand consumers would also be impressed with the innovations that make it work. Probably the net result is that the card would get more positive attention from the better specs, but I hardly think it could be considered enough to want to choose to lie about the specs, knowing the shit-storm that would be released if and when the truth is found. I mean, games which usually allocate the whole VRAM only are allocating 3.5GB in the 970, so, in hindsight at least, it's pretty obvious that something is noticeable.

    This card is out of my price range so I wouldn't have got one anyway, but I don't see any reason to avoid it. It's SM-bound, not ROP- nor memory bandwidth-bound, and the .5 GB of slower RAM hasn't been shown to create a problem, from what I've seen. NVIDIA just managed to make the whole thing fit together more efficiently from a manufacturing cost standpoint. Hence they can sell a card that works equally well as a 64 ROP and 224 GB/s card for less money than they would be able to sell a 64 ROP and 224GB/s card for.
