Segmented Memory Allocation in Software

Having explained the hardware basis of segmented memory, we can now turn to the role software plays and how it allocates memory between the two segments.

From a low-level perspective, video memory management under Windows is handled jointly by the operating system and the video drivers. Strictly speaking Windows controls video memory management – this being one of the big changes of Windows Vista and the Windows Display Driver Model – while the video drivers get a significant amount of input in hinting at how things should be laid out.

Meanwhile from an application’s perspective all video memory and its address space is virtual. This means that applications are writing to their own private space, blissfully unaware of what else is in video memory and where it may be, or for that matter where in memory (or even which memory) they are writing. As a result of this memory virtualization it falls to the OS and video drivers to decide where in physical VRAM to allocate memory requests, and for the GTX 970 in particular, whether to put a request in the 3.5GB segment, the 512MB segment, or in the worst case scenario system memory over PCIe.


Virtual Address Space (Image Courtesy Dysprosia)

Without going quite so far as to rehash the entire theory of memory management and caching, the goal of memory management in the case of the GTX 970 is to allocate resources over the entire 4GB of VRAM such that high-priority items end up in the fast segment and low-priority items end up in the slow segment. To do this NVIDIA directs the first 3.5GB of memory allocations to the faster 3.5GB segment, and only for memory allocations beyond 3.5GB does it turn to the 512MB segment, as there’s no benefit to using the slower segment so long as there’s available space in the faster segment.
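The fill order described above can be sketched as a toy model. This is only an illustration of the fast-segment-first ordering, not NVIDIA's driver logic: the `Segment` class and `allocate` function are hypothetical names, sizes are in MB, and the model assumes requests are never split across segments.

```python
from dataclasses import dataclass

FAST_MB = 3584   # 3.5GB fast segment
SLOW_MB = 512    # 512MB slow segment


@dataclass
class Segment:
    name: str
    capacity_mb: int
    used_mb: int = 0

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


def allocate(size_mb, fast, slow):
    """Place a request in the fast segment if it fits, else the slow
    segment, else fall back to system memory over PCIe."""
    for seg in (fast, slow):
        if size_mb <= seg.free_mb():
            seg.used_mb += size_mb
            return seg.name
    return "system"


fast = Segment("fast", FAST_MB)
slow = Segment("slow", SLOW_MB)
placements = [allocate(1024, fast, slow) for _ in range(3)]  # 3GB so far
placements.append(allocate(512, fast, slow))   # fills the fast segment
placements.append(allocate(256, fast, slow))   # first spill to the slow segment
placements.append(allocate(512, fast, slow))   # no longer fits in VRAM
print(placements)
```

In this model the slow segment is untouched until the fast segment has no room left for a request, which is the behavior the paragraph above describes.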

The complex part of this process occurs once both memory segments are in use, at which point NVIDIA’s heuristics come into play to try to best determine which resources to allocate to which segments. How NVIDIA does this is very much a “secret sauce” scenario for the company, but from a high level identifying the type of resource and when it was last used are good ways to figure out where to send a resource. Frame buffers, render targets, UAVs, and other intermediate buffers for example are the last things you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached), and resources belonging to inactive applications would be great candidates to send off to the slower segment. From the way NVIDIA describes the process, we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.
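Since NVIDIA's actual heuristics are not public, the following is only a guess at what a placement policy built on those two signals (resource type and time since last use) might look like. Every name here (`Resource`, `demotion_key`, `pick_for_slow_segment`) and the greedy selection itself are assumptions made for illustration.

```python
from dataclasses import dataclass

# Resource kinds the article names as poor candidates for the slow segment.
LATENCY_SENSITIVE = {"frame_buffer", "render_target", "uav"}


@dataclass
class Resource:
    name: str
    kind: str
    size_mb: int
    last_used_frame: int
    app_active: bool = True


def demotion_key(res, current_frame):
    """Sort key: larger tuples are better candidates for the slow segment."""
    idle_frames = current_frame - res.last_used_frame
    # Resources of inactive applications rank first, then the longest idle.
    return (not res.app_active, idle_frames)


def pick_for_slow_segment(resources, current_frame, budget_mb=512):
    """Greedily fill the slow segment with the best demotion candidates."""
    candidates = [r for r in resources if r.kind not in LATENCY_SENSITIVE]
    ranked = sorted(candidates,
                    key=lambda r: demotion_key(r, current_frame),
                    reverse=True)
    chosen, used = [], 0
    for res in ranked:
        if used + res.size_mb <= budget_mb:
            chosen.append(res.name)
            used += res.size_mb
    return chosen


resources = [
    Resource("backbuffer", "frame_buffer", 64, last_used_frame=1000),
    Resource("shadow_map", "render_target", 128, last_used_frame=1000),
    Resource("level_textures", "texture", 256, last_used_frame=1000),
    Resource("cached_textures", "texture", 192, last_used_frame=400),
    Resource("browser_surface", "texture", 128, last_used_frame=990,
             app_active=False),
]
demoted = pick_for_slow_segment(resources, current_frame=1000)
print(demoted)
```

Here the inactive application's surface and the long-idle cached textures are sent to the slow segment, while the frame buffer, the render target, and the textures used in the current frame stay in the fast segment.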

From an API perspective this applies to both graphics and compute, though it’s a safe bet that graphics is the more easily and accurately handled of the two thanks to the rigid nature of graphics rendering. Direct3D, OpenGL, CUDA, and OpenCL all see and have access to the full 4GB of memory available on the GTX 970, and from the perspective of the applications using these APIs the 4GB of memory is identical, the segments being abstracted. This is also why applications attempting to benchmark the memory in a piecemeal fashion will not find slow memory areas until the end of their run, as their earlier allocations will be in the fast segment and only finally spill over to the slow segment once the fast segment is full.
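To see why a piecemeal benchmark only finds slow memory at the end of its run, consider a rough model in which the benchmark allocates fixed-size chunks one after another. The 128MB chunk size and the `segment_for_chunk` helper are hypothetical, and the model assumes nothing else occupies VRAM and that chunks line up with the segment boundary.

```python
FAST_MB, SLOW_MB = 3584, 512   # 3.5GB and 512MB segments
CHUNK_MB = 128                 # hypothetical benchmark chunk size


def segment_for_chunk(index, chunk_mb=CHUNK_MB):
    """Return which segment a chunk lands in under fast-first filling."""
    end_mb = (index + 1) * chunk_mb
    if end_mb <= FAST_MB:
        return "fast"
    if end_mb <= FAST_MB + SLOW_MB:
        return "slow"
    return "system"


total_chunks = (FAST_MB + SLOW_MB) // CHUNK_MB
layout = [segment_for_chunk(i) for i in range(total_chunks)]
first_slow = layout.index("slow")
print(f"{total_chunks} chunks, first slow chunk at index {first_slow}")
```

Under these assumptions the first 28 of 32 chunks land in the fast segment, and only the last four, covering the final 512MB, would show reduced performance.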

GeForce GTX 970 Addressable VRAM
API        Memory
Direct3D   4GB
OpenGL     4GB
CUDA       4GB
OpenCL     4GB

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970, but go to 4GB on a GTX 980. Again from a high-level perspective all of this segmentation is abstracted, so games should not be aware of what’s going on under the hood.

Overall then the role of software in memory allocation is relatively straightforward since it’s layered on top of the segments. Applications have access to the full 4GB, and due to the fact that application memory space is virtualized the existence and usage of the memory segments is abstracted from the application, with the physical memory allocation handled by the OS and driver. Only after 3.5GB is requested – enough to fill the entire 3.5GB segment – does the 512MB segment get used, at which point NVIDIA attempts to place the least sensitive/important data in the slower segment.


398 Comments


  • Mondozai - Monday, January 26, 2015 - link

    When a company intentionally lies to its consumers, that isn't a storm in a teacup. Ryan may believe them but I don't. I agree with him that it's incredibly stupid to do this kind of stuff, but the notion that they didn't know, even after all the manuals were passed around the company? Knowing the number of ROPs is basic stuff for technical marketing.

    And okay if this got missed a single round. But in successive rounds, over a period of almost half a year? C'mon. Nvidia knows that it wouldn't sell as well if they marketed it as "3.5 VRAM" and they tried to cover this shit up.

    I'm guessing Jonah Alben didn't have anything to do with this, and I'm guessing he's pissed as fuck. The big question is if Jen-Hsun knew or not. Their marketing team are not exactly people I'd trust (watch Tom Peterson in any stream and you'll know what I mean).

    Throwing the marketing guys under the bus is poetic justice. But also an easy move. Again, did the CEO know?
  • mapesdhs - Monday, January 26, 2015 - link


    "intentionally lies".. yeah right! So you're saying this is not acceptable, and yet it's ok for AMD
    (and indeed NVIDIA) to market dual-GPU cards by advertising the sum of the VRAM on both
    GPUs, even though an application can only see & access the individual amount? Look at
    *any* seller site spec list for an AMD 295x2, they all say 8GB (ditto the specs page on
    AMD's site), while Anandtech's own review shows quite clearly that it's just 2x4GB, so the
    real amount accessible by an application is 4GB, not 8GB. Surely this is far more of a
    deception than the mistake NVIDIA states they have made with the 970 specs.

    So I call out hypocrisy; your comment is just NVIDIA-bashing when there have been far
    more blatant deceptions in the past, from both sides. NVIDIA does the double-up VRAM
    nonsense as well, e.g. the sale ads for the Titan Z all state 12GB, as do the specs on the
    NVIDIA web site, but again it's just 6GB per GPU, so 6GB max visible to an application.
    Look back in time, you'll see the same mush published for cards like the GTX 295 and
    equivalent ATIs from back then.

    So quit moaning about what is merely a mistake which doesn't change the conclusions
    based on the initial 970 review performance results, and instead highlight the more blatant
    marketing fibs, especially on dual-GPU cards. Or of course feel free to cite in *any* dual-
    GPU review where you complained about the VRAM diddle.

    Sorry if I sound peeved, but your comment started by claiming something is true when
    it's just your opinion, based on what you'd like to believe is true.

    Ian.
  • alacard - Monday, January 26, 2015 - link

    "So you're saying this is not acceptable, and yet it's ok for AMD
    (and indeed NVIDIA) to market dual-GPU cards by advertising the sum of the VRAM on both
    GPUs, even though an application can only see & access the individual amount?"

    That's what's known as a straw-man, he never mentioned anything about dual GPUs. His point about ROPs is perfectly valid--and no Ian it's not ok to lie about that, nor about the amount of cache.

    "Sorry if I sound peeved, but your comment started by claiming something is true when
    it's just your opinion, based on what you'd like to believe is true."

    Why would you give Nvidia the benefit of the doubt here? If you really and truly believed no one brought this up before release or noticed it afterwards then you're a bigger fool than I could have ever guessed you are.

    Sorry if I sound peeved, but your comment started by claiming something is true when
    it's just your opinion, based on what you'd like to believe is true.
  • dragonsqrrl - Monday, January 26, 2015 - link

    "Why would you give Nvidia the benefit of the doubt here?"

    Why would Nvidia want to deceive the whole PC gaming world over something so minor? As Ryan stated in the article that would be genuinely stupid. Can you think of a reason why Nvidia would intentionally seed a slightly inaccurate spec sheet to the press? What would they gain from that? I don't think there's any reason to believe the initial spec sheet was anything other than a mistake by Nvidia, and neither does any credible tech journalist I know of.

    That being said I also highly doubt they weren't aware of the mistake until now. While I think their response to this incident has been good so far, I really think they should've come out with this information sooner (like last week when this started to really heat up). But I think that time was probably spent confirming what had happened and how to present it to the press.
  • alacard - Monday, January 26, 2015 - link

    " Can you think of a reason why Nvidia would intentionally seed a slightly inaccurate spec sheet to the press?"

    Is this a real question or some sort of a joke? You're asking why a company would knowingly inflate a spec sheet for a product they want to sell, and doing so with a straight face? Is that PT Barnum's johnson i see swinging from your asshole?
  • Galidou - Tuesday, January 27, 2015 - link

    People buy performance, don't say a thing about memory bandwidth rops and such install it in your computer. You paid it less than some video cards it outperforms, don't care about stats, you're on the good way.

    Companies lie to us when advertising all sorts of things on TV and so on. I've seen many LCD monitors advertised at X nits that don't fully deliver that amount, and no one ever sues them. If the monitor still averages better or the same image quality as the best monitors in its price class, who cares about the advertisement?

    Not saying that lying to improve sales numbers is right, but SO MANY companies do that. If it turns out to be a really bad product for the price you paid, then sue them. But don't whine when there's a SLIGHT difference and it still outperforms everything in its price class, uses less power, has good drivers and so on.

    The only reason Nvidia would have to do this intentionally would be to back up a medium video card performance, a kind of semi failure, which the GTX 970 SURELY isn't. Why would a company need to boost sales when they know it's gonna be sold out for the next month because of its price/performance ratio?
  • FlushedBubblyJock - Friday, January 30, 2015 - link

    Oh, so that's why AMD lied about the number of transistors in the Bulldozer core, claiming it was 2 billion, then months later correcting their lie to journalists and revising it downward quite a large leap to 1.2 billion, a full 40% drop.
    Yes, lying about a cruddy product that never met expectations by pumping up that core transistor count to give the impression of latent power just not yet utilized, by say, optimizations required for the Windows OS to use all the "8"/(4) cores better with improved threading...

    Hahahhaaa no it's not a joke...

    http://www.anandtech.com/show/5176/amd-revises-bul...
  • dragonsqrrl - Tuesday, January 27, 2015 - link

    Wow, disproportionately aggressive response to appropriate and logical questions. I can't tell if you're trying to intentionally mislead others or if you really have no clue what you're talking about. Yes, I'm asking why Nvidia would conspire to intentionally lie about something so minor in the initial spec sheet that would almost certainly be discovered soon after launch? I even tried to help you out a little: What would they gain from that?

    It just takes a simple risk assessment and a little bit of logic to pretty much rule this out as an intentional deception.
  • Galidou - Tuesday, January 27, 2015 - link

    Nvidia's way of thinking by the mad community: ''With the performance to cost ratio of that card when it's gonna be launched, it will be sold out for weeks to come even if we give the true spec sheets! Let's speak to marketing department and modify that so it can be SOLD OUT TIMES 2!! YEAH, now you speak, let's make the community so mad they have to wait for it! YEAH, we want the community to HATE US!''
  • alacard - Tuesday, January 27, 2015 - link

    Galidou, Dragonsqrrl: Can you explain how a 970 with one of the DRAM banks partitioned for low priority data is supposed to operate at 256 bits? Given that the last 512MB chunk is only being accessed as a last resort, and only after all the other RAM is occupied, the memory subsystem could only be operating at 224 bits for the majority of cases.

    I could be wrong but i just don't see it. Given that, we're not merely talking about diminished ROP and cache count, but also a shallower memory interface which NVIDIA marketed specifically as being exactly the same as the 980. Here is a direct quote from their reviewer's guide:

    "Equipped with 13 SMX units and 1664 CUDA Cores the GeForce GTX 970 also has the rending horsepower to tackle next generation gaming. And with its 256-bit memory interface, 4GB frame buffer, and 7Gbps memory the GTX 970 ships with the SAME MEMORY SUBSYSTEM as our flagship GEFORCE GTX 980"

    If it really is only operating at 224 bits, THIS IS A BIG DEAL. Even if it were an honest mistake, it's still a big deal. Giving them the benefit of the doubt and assuming their initial materials were wrong, the idea they didn't notice it after release... come on.

    BTW that PT Barnum comment was just a joke that popped in my head at the last second and i couldn't resist adding it.
