Segmented Memory Allocation in Software

So far we’ve talked about the hardware. Having explained the hardware basis of segmented memory, we can now turn to the role software plays, and how software allocates memory between the two segments.

From a low-level perspective, video memory management under Windows is handled jointly by the operating system and the video drivers. Strictly speaking Windows controls video memory management – this being one of the big changes of Windows Vista and the Windows Display Driver Model – while the video drivers provide a significant amount of input by hinting at how things should be laid out.

Meanwhile from an application’s perspective all video memory and its address space is virtual. This means that applications are writing to their own private space, blissfully unaware of what else is in video memory and where it may be, or for that matter where in memory (or even which memory) they are writing. As a result of this memory virtualization it falls to the OS and video drivers to decide where in physical VRAM to allocate memory requests, and for the GTX 970 in particular, whether to put a request in the 3.5GB segment, the 512MB segment, or in the worst case scenario system memory over PCIe.


Virtual Address Space (Image Courtesy Dysprosia)

Without going so far as to rehash the entire theory of memory management and caching, the goal of memory management in the case of the GTX 970 is to allocate resources over the entire 4GB of VRAM such that high-priority items end up in the fast segment and low-priority items end up in the slow segment. To do this NVIDIA directs the first 3.5GB of memory allocations to the faster 3.5GB segment, and only for memory allocations beyond 3.5GB do they turn to the 512MB segment, as there’s no benefit to using the slower segment so long as there’s available space in the faster segment.
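To make that fill order concrete, here is a minimal sketch of a “fast segment first” placement policy. The segment sizes match the GTX 970, but the data structures, function names, and the spill-to-system-memory fallback are our own illustrative assumptions, not NVIDIA’s actual driver code.

```cpp
// Hypothetical sketch of a "fast segment first" placement policy.
// Types, names, and the spill path are illustrative assumptions.
#include <cstddef>
#include <cstdio>

struct Segment {
    const char* name;
    size_t      capacity;   // bytes
    size_t      used;       // bytes already allocated
};

enum Placement { FastSegment, SlowSegment, SystemMemory };

// Fill the fast segment first, then the slow segment, and only spill
// to system memory over PCIe once both segments are exhausted.
Placement place(Segment& fast, Segment& slow, size_t request) {
    if (fast.used + request <= fast.capacity) { fast.used += request; return FastSegment; }
    if (slow.used + request <= slow.capacity) { slow.used += request; return SlowSegment; }
    return SystemMemory;
}

int main() {
    Segment fast = {"3.5GB segment", 3584ull << 20, 0};
    Segment slow = {"512MB segment",  512ull << 20, 0};
    const char* labels[] = {"fast", "slow", "system"};

    // Simulate a stream of 256MB allocations: the first 14 land in the
    // fast segment, the next 2 in the slow segment, the rest spill out.
    for (int i = 0; i < 18; ++i) {
        Placement p = place(fast, slow, 256ull << 20);
        printf("allocation %2d -> %s\n", i, labels[p]);
    }
    return 0;
}
```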

The complex part of this process occurs once both memory segments are in use, at which point NVIDIA’s heuristics come into play to try to best determine which resources to allocate to which segments. How NVIDIA does this is very much a “secret sauce” scenario for the company, but at a high level, identifying the type of resource and when it was last used are good ways to figure out where to send a resource. Frame buffers, render targets, UAVs, and other intermediate buffers, for example, are the last thing you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached), and resources belonging to inactive applications would be great candidates to send off to the slower segment. From the way NVIDIA describes the process we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.
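As a rough illustration of that kind of heuristic – and only an illustration, since the resource categories, score values, and thresholds below are invented placeholders rather than anything NVIDIA has disclosed – placement could be driven by a simple scoring function that favors frequently written, per-frame resources for the fast segment:

```cpp
// Illustrative placement heuristic; categories, scores, and thresholds
// are assumptions for this sketch, not NVIDIA's actual logic.
#include <cstdint>
#include <cstdio>

enum class ResourceType { RenderTarget, DepthBuffer, UAV, Texture, StagingBuffer };

// Higher scores stay in the fast 3.5GB segment; the lowest-scoring
// resources are the first candidates for the slow 512MB segment.
int placementScore(ResourceType type, uint64_t framesSinceLastUse, bool appIsActive) {
    int score = 0;
    switch (type) {
        case ResourceType::RenderTarget:
        case ResourceType::DepthBuffer:
        case ResourceType::UAV:           score = 100; break; // read/written every frame
        case ResourceType::Texture:       score = 50;  break; // mostly read, tolerates more latency
        case ResourceType::StagingBuffer: score = 20;  break; // transient copies
    }
    if (framesSinceLastUse > 60) score -= 30;  // likely just cached, not in active use
    if (!appIsActive)            score -= 50;  // belongs to an inactive application
    return score;
}

int main() {
    printf("active render target:    %d\n", placementScore(ResourceType::RenderTarget, 0, true));
    printf("idle background texture: %d\n", placementScore(ResourceType::Texture, 600, false));
    return 0;
}
```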

From an API perspective this is applicable to both graphics and compute, though it’s a safe bet that graphics is the more easily and accurately handled of the two thanks to the rigid nature of graphics rendering. Direct3D, OpenGL, CUDA, and OpenCL all see and have access to the full 4GB of memory available on the GTX 970, and from the perspective of the applications using these APIs the 4GB of memory is identical, with the segments abstracted away. This is also why applications attempting to benchmark the memory in a piecemeal fashion will not find slow memory areas until the end of their run, as their earlier allocations will be in the fast segment and only finally spill over to the slow segment once the fast segment is full.

GeForce GTX 970 Addressable VRAM
API Memory
Direct3D 4GB
OpenGL 4GB
CUDA 4GB
OpenCL 4GB
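The piecemeal benchmarking behavior described above can be sketched with the CUDA runtime API: allocate VRAM in fixed-size chunks and time a device-side fill of each one. This is only a rough sketch under our own assumptions (128MB chunks, cudaMemset as the workload, no accounting for memory already in use); the expectation on a GTX 970 is that only the last chunks – the ones that end up in the 512MB segment – show noticeably lower bandwidth.

```cpp
// Rough sketch of a piecemeal VRAM benchmark using the CUDA runtime API.
// Chunk size and the cudaMemset workload are arbitrary illustrative choices.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const size_t chunkBytes = 128ull << 20;   // 128MB per allocation
    std::vector<void*> chunks;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Keep allocating until the card runs out of VRAM, timing each chunk.
    for (int i = 0; ; ++i) {
        void* ptr = nullptr;
        if (cudaMalloc(&ptr, chunkBytes) != cudaSuccess) break;   // VRAM exhausted
        chunks.push_back(ptr);

        cudaEventRecord(start);
        cudaMemset(ptr, 0xAB, chunkBytes);    // device-side fill of this chunk
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double gbps = (chunkBytes / (ms / 1000.0)) / 1e9;
        printf("chunk %2d (%zu MB allocated): %.1f GB/s\n",
               i, (chunks.size() * chunkBytes) >> 20, gbps);
    }

    for (void* p : chunks) cudaFree(p);
    return 0;
}
```

In practice a benchmark would use a proper bandwidth kernel and repeat each measurement, but even this simple loop shows why the slow region only appears near the end of a run: the earlier allocations all land in the fast segment.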

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970, but go to 4GB on a GTX 980. Again from a high-level perspective all of this segmentation is abstracted, so games should not be aware of what’s going on under the hood.

Overall then, the role of software in memory allocation is relatively straightforward, since it’s layered on top of the segments. Applications have access to the full 4GB, and because application memory space is virtualized, the existence and usage of the memory segments are abstracted from the application, with the physical memory allocation handled by the OS and driver. Only after 3.5GB is requested – enough to fill the entire 3.5GB segment – does the 512MB segment get used, at which point NVIDIA attempts to place the least sensitive/important data in the slower segment.

Comments

  • Jon Tseng - Monday, January 26, 2015 - link

    If it's such a gimped card I'll buy yours off you for $200. After all if it's really going to lose performance and value so quickly it can't be worth much more than that.

    You can sue NVidia for the extra $150. I can finally get FSX running at 4K*. Everyone's happy! :-) :-)

    * Bonus point if you can spot the deliberate "Kessel Run in 12 Parsecs" logic here.
  • JarredWalton - Monday, January 26, 2015 - link

    Not likely. Most games target specific amounts of VRAM, with 1GB, 2GB, 3GB, 4GB, 6GB, and 8GB all being likely candidates, and usually the targets have some leeway. The reason is you target memory use based on textures and shadow maps, but you know that you also have to have frame buffers and other elements in VRAM (that aren't usually directly under the control of the game). So a game that targets 4GB VRAM will usually target more like 3-3.2GB VRAM, leaving the rest for the GPU to use on frame buffers, Z-buffers, etc.

    To that end, I've seen games where a GTX 970 runs great at Ultra QHD, but Ultra 4K really kills performance -- because where Ultra QHD might be just under the 3.5GB VRAM of the 970, Ultra 4K ends up going over the 4GB barrier. (The same performance drop occurs with the GTX 980 as well in my experience.) And do you know what most gamers will do if they hit the point where performance takes a big dive? They'll drop one or two settings to "fix" things.

    And that's where NVIDIA's GeForce Experience can help the majority: they go in, select their resolution, and let the game do the tricky part of selecting ideal settings. Maybe it won't be perfect, but for most gamers it's sufficient.

    TL;DR: Much ado about nothing.
  • Samus - Monday, January 26, 2015 - link

    And for those games, you'll need a higher end card. The realistic difference between 3.5GB and 4GB VRAM for texture cache means very little, even at 4K, where even 4GB is the ceiling NOW. Let's face it, with consoles having 8GB and high end cards having 6GB, 4GB cards just won't cut it in a few years let alone 3.5GB cards.
  • Mvoigt - Monday, January 26, 2015 - link

    You fail to understand that the consoles have a total of 8GB RAM... not all dedicated to graphics... the OS uses some, the game uses some, and the graphics use a portion of that... Then I could say consoles fail, since my graphics card has 4GB RAM and my machine has 32GB RAM, I have a combined 36GB RAM available vs 8GB on the consoles....
  • Kevin G - Monday, January 26, 2015 - link

    The thing is that the GPU and CPU have independent memory pools. If a game only uses 1 GB of that 32 GB main memory, you have 31 GB going to waste. Attempting to utilize that extra memory for a texture cache tends to make games crawl due to the latency bottleneck.

    On a console, what isn't used up by the host OS (512 MB last I checked), and core game logic can all go toward game assets. That can easily mean more than 6 GB of textures and models. With PC games typically running at higher resolution and using even higher resolution assets, it could easily equate to a demand for 6 GB and 8 GB graphics cards this year or next.
  • hermeslyre@hotmail.com - Monday, January 26, 2015 - link

    Last I checked both consoles reserve around 3.5GB for the OS, with the PS4 having 512MB of that reserved pool as flexible. Which leaves not a megabyte more than 5GB available to developers to do their thing. On the PS4 at least.
  • McC54u - Monday, January 26, 2015 - link

    You guys act like consoles are running on Titan Z's or something. They are running on a Radeon 7850 at best. They will never have a real graphics-intensive 4K game on a console; they can't even do 1080p on most of their launch titles, even with all that RAM. Unless they get on board with some serious streaming tech for new titles, we have seen almost the peak of what these consoles can do.
  • Galidou - Monday, January 26, 2015 - link

    What you don't realize is that the 8GB is usable for textures. Games tend to look very good on consoles even if they use an underpowered GPU. Take for example the modified GeForce GT 7800 in the PS3 – how far did it go? I think it did a LOT better than the GT 7800 on the PC side.

    Fallout 3 was designed as a console port for the PC. Computer graphics cards had more VRAM than consoles, and people developed texture packs for the game, making it a lot more beautiful but using much more VRAM, which was impossible for the PS3 and XBOX 360.

    The same will happen with console games of this gen: at some point, they won't have the vertex/shader-heavy capability of PC cards, but they will have the memory for really beautiful textures at 1080p. Take those console ports on PC with very beautiful textures but play them at 1440p or 4K... there you go, VRAM utilisation way beyond what has been seen in the past.
  • anandreader106 - Monday, January 26, 2015 - link

    Did you read the post by Mvoigt? You DO NOT have 8GB for textures! The OS eats up over a GB of RAM and the CPU will use RAM while processing whatever is going on in the game world. Maybe you can get to 5GB of RAM for graphics... maybe. But that means the world you're traversing is pretty static and boring.
  • Galidou - Monday, January 26, 2015 - link

    Yep, 4-5GB of RAM for graphics on consoles is possible, but then you didn't read what I said.

    I was focusing on the fact that the same game that can use up to 4-5GB of textures at 1080p, once ported to PC where users will play it at 1440p or 4K with graphical options not available on console, can easily make use of 6GB of VRAM.
