Segmented Memory Allocation in Software

So far we’ve talked about the hardware, and having finally explained the hardware basis of segmented memory, we can now look at the role software plays and how it allocates memory between the two segments.

From a low-level perspective, video memory management under Windows is handled jointly by the operating system and the video drivers. Strictly speaking, Windows controls video memory management – this being one of the big changes introduced with Windows Vista and the Windows Display Driver Model – while the video drivers still get a significant amount of input, hinting at how things should be laid out.

Meanwhile, from an application’s perspective all video memory and its address space is virtual. This means that applications write to their own private address space, blissfully unaware of what else is in video memory and where it may be, or for that matter where in memory (or even in which memory) they are writing. As a result of this memory virtualization it falls to the OS and video drivers to decide where in physical VRAM to place memory requests, and for the GTX 970 in particular, whether to put a request in the 3.5GB segment, the 512MB segment, or in the worst case in system memory over PCIe.


Virtual Address Space (Image Courtesy Dysprosia)
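
To put that in more concrete terms, here’s a minimal sketch from the application’s side of the fence, written against the CUDA runtime API (the calls themselves are real; the GTX 970 figures in the comments are simply what we’d expect to see). The important part is what the code can’t show: the application only ever sees a single pool of virtual addresses, with no indication of which segment – or even which memory – is backing any given allocation.

```cpp
// Minimal sketch: from the application's side there is only one ~4GB pool.
// cudaMemGetInfo() and cudaMalloc() report and hand out virtual GPU addresses;
// nothing here reveals whether an allocation landed in the 3.5GB segment, the
// 512MB segment, or (worst case) system memory over PCIe -- that placement is
// decided entirely by the OS and driver.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    // On a GTX 970 this reports roughly 4GB total, with no hint of segmentation.
    printf("Total VRAM: %zu MB, free: %zu MB\n",
           totalBytes >> 20, freeBytes >> 20);

    void* buf = nullptr;
    if (cudaMalloc(&buf, size_t(256) << 20) == cudaSuccess) {  // 256MB request
        // 'buf' is a virtual device address; the physical segment backing it
        // is invisible to the application.
        cudaFree(buf);
    }
    return 0;
}
```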

Without going so far as to rehash the entire theory of memory management and caching, the goal of memory management in the case of the GTX 970 is to allocate resources over the entire 4GB of VRAM such that high-priority items end up in the fast segment and low-priority items end up in the slow segment. To do this NVIDIA directs the first 3.5GB of memory allocations to the faster 3.5GB segment, and only once allocations exceed 3.5GB do they turn to the 512MB segment, as there’s no benefit to using the slower segment so long as there’s available space in the faster segment.
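
As a simple illustration of that policy – and to be clear, this is our own sketch rather than anything out of NVIDIA’s driver – the placement logic boils down to “fill the fast segment first, and only spill over once it’s full”:

```cpp
// Illustrative sketch (not NVIDIA's actual code) of the "fast segment first"
// placement policy. The segment sizes match the GTX 970; the allocator itself
// is hypothetical.
#include <cstdint>

enum class Segment { Fast, Slow, SystemMemory };

struct SegmentedPool {
    uint64_t fastFree = 3584ull << 20;  // 3.5GB segment
    uint64_t slowFree = 512ull  << 20;  // 512MB segment

    // Place a request in the fast segment while space remains; only spill to
    // the slow segment (and finally system memory over PCIe) once it is full.
    Segment place(uint64_t size) {
        if (size <= fastFree) { fastFree -= size; return Segment::Fast; }
        if (size <= slowFree) { slowFree -= size; return Segment::Slow; }
        return Segment::SystemMemory;
    }
};
```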

The complex part of this process occurs once both memory segments are in use, at which point NVIDIA’s heuristics come into play to try to best determine which resources to allocate to which segments. How NVIDIA does this is very much a “secret sauce” scenario for the company, but from a high level, identifying the type of resource and when it was last used are good ways to figure out where to send it. Frame buffers, render targets, UAVs, and other intermediate buffers, for example, are the last things you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached), and resources belonging to inactive applications would be great candidates to send off to the slower segment. Based on the way NVIDIA describes the process, we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.
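
While the real heuristics are NVIDIA’s secret sauce, the general shape of such a scheme is easy to sketch out: score each resource by its type, how recently it was used, and whether it belongs to the active application, then demote the lowest-scoring resources to the slow segment when space runs short. The resource types, weights, and thresholds below are purely our own illustrative assumptions:

```cpp
// Purely illustrative sketch of the kind of heuristic described above; NVIDIA's
// actual logic is not public. Resource type and recency of use combine into a
// priority, and the lowest-priority resources are the ones demoted to the slow
// 512MB segment (or evicted to system memory) under pressure.
#include <cstdint>

enum class ResourceType { RenderTarget, UAV, DepthBuffer, Texture, StagingBuffer };

struct Resource {
    ResourceType type;
    uint64_t lastUsedFrame;   // frame number when the GPU last touched it
    bool     ownerIsActive;   // does it belong to the foreground application?
};

// Higher score = more important to keep in the fast 3.5GB segment.
int placementPriority(const Resource& r, uint64_t currentFrame) {
    int score = 0;
    switch (r.type) {
        case ResourceType::RenderTarget:
        case ResourceType::UAV:
        case ResourceType::DepthBuffer:   score += 100; break; // touched every frame
        case ResourceType::Texture:       score += 50;  break;
        case ResourceType::StagingBuffer: score += 10;  break;
    }
    if (!r.ownerIsActive)                    score -= 75;  // backgrounded application
    if (currentFrame - r.lastUsedFrame > 60) score -= 50;  // effectively cached/idle
    return score;
}
```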

From an API perspective this is applicable to both graphics and compute, though it’s a safe bet that graphics is the more easily and accurately handled of the two thanks to the rigid nature of graphics rendering. Direct3D, OpenGL, CUDA, and OpenCL all see and have access to the full 4GB of memory available on the GTX 970, and from the perspective of the applications using these APIs the 4GB of memory is identical, the segments being abstracted away. This is also why applications attempting to benchmark the memory in a piecemeal fashion will not find slow memory areas until the end of their run, as their earlier allocations will land in the fast segment and only spill over to the slow segment once the fast segment is full (a rough sketch of such a test follows the table below).

GeForce GTX 970 Addressable VRAM
API        Memory
Direct3D   4GB
OpenGL     4GB
CUDA       4GB
OpenCL     4GB
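
For reference, this is roughly what such a piecemeal memory benchmark looks like when written against the CUDA runtime API: allocate fixed-size chunks until allocation fails, and measure copy bandwidth within each chunk as it’s allocated. The 256MB chunk size and the copy-based bandwidth test are arbitrary choices on our part; the point is that on a GTX 970 only the last chunk or two would show reduced bandwidth, since everything allocated before that lands in the fast segment.

```cpp
// Rough sketch of a piecemeal VRAM benchmark. Because the driver fills the
// fast 3.5GB segment first, only the chunks allocated near the end of the run
// land in the 512MB segment and show reduced bandwidth.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const size_t chunkBytes = 256ull << 20;   // 256MB per chunk (arbitrary)
    std::vector<void*> chunks;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int i = 0; ; ++i) {
        void* p = nullptr;
        if (cudaMalloc(&p, chunkBytes) != cudaSuccess) break;  // VRAM exhausted
        chunks.push_back(p);
        char* base = static_cast<char*>(p);

        // Copy one half of the chunk onto the other half and time it.
        cudaEventRecord(start);
        cudaMemcpy(base + chunkBytes / 2, base, chunkBytes / 2,
                   cudaMemcpyDeviceToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double gbMoved = (chunkBytes / 2) * 2 / 1e9;   // bytes read + written
        printf("Chunk %2d: %.1f GB/s\n", i, gbMoved / (ms / 1000.0));
    }

    for (void* p : chunks) cudaFree(p);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```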

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970, but go to 4GB on a GTX 980. Again from a high-level perspective all of this segmentation is abstracted, so games should not be aware of what’s going on under the hood.

Overall then, the role of software in memory allocation is relatively straightforward, since it’s layered on top of the segments. Applications have access to the full 4GB, and because application memory space is virtualized, the existence and usage of the memory segments is abstracted away from the application, with physical memory allocation handled by the OS and driver. Only after 3.5GB has been requested – enough to fill the entire 3.5GB segment – does the 512MB segment get used, at which point NVIDIA attempts to place the least sensitive/important data in the slower segment.

Comments

  • Galidou - Tuesday, January 27, 2015 - link

    The performance remains amazing for the price. They wouldn't have had to describe the specs to me; I would have bought it if I didn't already have a good enough card for what I do/play.

    What's the big deal to me: the performance-to-cost ratio, end of the line. I never cared about anything else.
  • Galidou - Tuesday, January 27, 2015 - link

    alacard, did you really read the article? It says about the memory bus that: ''Ultimately this link is what makes a partially disabled partition possible, and is also what makes it possible to have the full 256-bit memory bus present and active in spite of the lack of a ROP/L2 unit and its associated crossbar port.''

    If you have to be on the mad side of the community, at least know your subject.
  • alacard - Tuesday, January 27, 2015 - link

    Galidou, did you read my comment? It CAN'T be running at 256-bit bus width with the last 500MB DRAM module empty, which it will be most of the time. Please don't be dense with your replies about me not knowing my subject when you clearly don't know yours.

    "People buy performance, don't say a thing about memory bandwidth rops"

    Do you have a crystal ball? How do you know whether or not people buy things for specs? Are you clairvoyant? Can you see backward and forward in time? I buy tech based on specs all the time, in fact I don't know anyone personally who doesn't. A 256-bit bus vs a 224-bit bus would cause me to think more carefully about my decision. Maybe I have a program with extremely high bandwidth needs that would run faster with 256. Maybe I plan on 4K gaming so I want the extra ROPs just in case. Maybe 2MB of cache sounds better to my ears than 1K+.

    My guess is that the above - people like me - is why NVIDIA did nothing to correct specs they had to know were false. Now they're reaping what they sowed, and I hope it's a huge harvest. They've earned it.
  • Galidou - Saturday, January 31, 2015 - link

    Nothing can be done with you, you're the ultimate truth, next time, I will buy based on specs. Oh god forgive me for thinking I had to buy a GTX 970 because on paper it performs better than a R9 290 which has a better spec sheet.

    I will know better from now on, and buy the R9 290 because of the 512-bit memory bus and 2560 stream processors, EVEN if it gives me worse fps for the price. Oh almighty specs, forgive me for being so ignorant, I once thought performance was more important than specs, now that alacard has enlightened me, never shall I make the same mistake.

    P.S.: I bought a GTX 660 Ti even though peeps were going against the 192-bit memory bus, and it's still one of the best video cards I've bought, even two and a half years later.
  • Galidou - Saturday, January 31, 2015 - link

    Oh, another point, you want to buy a video card, look at benchmarks of the game you play at your resolution. Buy the video card that give you the best fps for the price, oh NOO I forgot again, no one buy BASED on performance, I HAS NO CRISTAL BALL OH NO! Fast, to the spec sheets: OH yeah I dunno what the performance is but THE SPEC OH THEY SPEAK TO ME!

    Darn, my friend plays the same games as me, bought a video card for a cheaper price based on benchmarks, and gets better performance than me, alacard screwed me!!!
  • Galidou - Saturday, January 31, 2015 - link

    We sure look at specs, but again, we buy based on performance and I know I'm right. If I wasn't, everyone would buy the R9 290 and R9 290X because of the memory bus and the quantity of stream processors, not considering that a 256-bit bus with superb compression and no loss in image quality will give you better performance.

    But nowhere will you see that NVIDIA's bandwidth surpasses the 512-bit bus of AMD. Not in any online retailer's spec sheet. So no, specs aren't everything; they don't say a thing about the optimization behind them.
  • spartaman64 - Monday, January 26, 2015 - link

    the 970 is still a great card but we should hold nvidia accountable
  • bigboxes - Tuesday, January 27, 2015 - link

    Good lord, you're shilling on HardOCP as well. You should be banned for this kind of crap. C'mon mods.
  • Ranger101 - Tuesday, January 27, 2015 - link

    What the hell do you know?
  • jackstar7 - Tuesday, January 27, 2015 - link

    So the life of your card was cut short (as games continue to use more and more VRAM) and you're not bothered by that? Is it that you would upgrade again before this became an issue and you don't mind losing value for potential resale? I just don't understand the mindset of someone who is okay with finding out their purchase was not made with all the correct information, in this case because the company specifically screwed up in providing it.
