Original Link: http://www.anandtech.com/show/2272



From our Windows Vista performance guide:

Except in a few cases where 64-bit code is clearly faster, the primary purpose for Vista x64's existence is to resolve the problems of 32-bit addressing space, and we're just not at the point yet where even most enthusiasts are pushing that limit. Once applications begin to push the 2GB addressing space limitation of Win32 (something we expect to hit very soon with games) or total systems need more than 4GB of RAM, then Vista x64 in its current incarnation would be a good choice.

For some time now we have been mentioning the potential problems that are likely to result from the switchover from 32bit(x86) Windows to 64bit(x64) Windows. Due to a multitude of issues, including Windows' memory management, the basic design of the PC architecture, and consumer support issues, there is no easy path for mass migration from 32bit Windows to 64bit Windows. As a result we have been expecting problems as consumers begin to make the messy transition.

We published the above mentioned guide on February 1st, expecting the fall/winter 2007 games to be the ones to push the 2GB addressing space limitation of Windows, and it turns out we were wrong. It turns out that two weeks after we published the above article, THQ published Supreme Commander, a RTS with a massive appetite for resources. It can be simultaneously GPU limited and CPU limited, which is why it's a standard benchmark here for our performance articles, it's also memory limited in more than one way: it's hitting the 2GB barrier of 32bit Windows.

An artifact of the design of 32bit processors and the 32bit API for Windows, the 2GB barrier is a cap on how much addressing space (related to but not equivalent to memory usage) a single application can use. This isn't a bug but rather the result of how hardware and software was created so many years ago, and while everyone has known this barrier will inevitably be hit, as we'll see there are several reasons why it can't simply be moved or bypassed. Meanwhile hitting it involves affected applications crashing for what can appear to be no good reason, and understanding why the 2GB barrier exists and what can be done will be important for resolving those crashes.

On a personal note, I am a semi-casual player of real time strategy(RTS) games and I've been playing Supreme Commander lately. This is a different kind of article, it's a record and the result of my own efforts to resolve why I was having crashing issues with Supreme Commander. With no intended disrespect towards THQ or the game's developers (Gas Powered Games) we could have not possibly asked for a better example of the 2GB barrier in action. It's exactly the experience we believe many people will have as they hit the 2GB barrier, mainly those power users who use large monolithic applications such as games or multimedia tools. This is an article on what the problem with the 2GB barrier is, what kind of experiences a user may expect when hitting it, and what can be done to fix it.

But before we get too far ahead of ourselves, let's discuss memory management in Windows. Understanding the problem with Supreme Commander requires understanding what the 2GB barrier is, why it's there, and what makes it so problematic.




A Primer on Windows' Memory Management

In 1986, Intel released the 386 processor, which offered support for a new instruction set (IA32), that was an extension of the original x86 instruction set. Among the most notable features in IA32 was an increase in the amount of memory the CPU could address, moving from 20bit addressing(1MB) to 32bit addressing(4GB)(ed: this is not including the convoluted mess that was segmented addressing). All x86 CPUs released since then have supported the same instruction set, including the same 4GB limit. Only recently have 64bit x86 CPUs been released, and while they support more than 4GB of memory, still are limited to 4GB when operating in 32bit mode.

As we have mentioned in previous articles, most modern system running in 32bit x86 mode have trouble seeing and using more than roughly 3GB of memory. This is because part of the total 4GB of memory space (not the physical memory) is reserved for various functions, such as computer components transferring data between each other using memory-mapped input-output(MMIO). The textbook example of this is the CPU transferring data to the memory of a video card, where a chunk of the address space equal to the size of the memory of the video card is reserved by the video card, and any data sent to those addresses actually ends up going to the video card. This design has many technical merits, but it makes the consumed memory addresses unavailable for use with physical memory.

Things only get more complex as we start including the operating system (in this case Windows) in to the equation. The above is actually handled by a combination of Windows and the BIOS, meanwhile Windows also needs some address space so that programs can communicate with the Windows kernel, for storing buffers, for storing memory tables, etc; all of which means we have lost even more address space. All of the above besides preventing us from addressing 4GB of physical memory are also the cause of the actual 2GB barrier that is the problem.

Quickly, there is one more pre-requisite piece of information: virtual address space. For a 32bit Windows application(Win32), each application has a full 4GB worth of private addressing space that it can use, which again is 32bits and a result of how a 32bit processor works. How the above exactly works is beyond the scope of this article, but it's sufficient to say that at some point virtual addresses get translated in to other addresses for mapping data between the application and physical memory or the swap file. The important thing to take from this is that each application has its own 4GB virtual address space, regardless of the hardware the computer contains or what applications are running. Now we may begin to understand the 2GB barrier.

Due to design reasons outside the scope of this article, Windows takes a pre-determined portion of each application's virtual address space and reserves it for itself. Windows uses its portion of the virtual address range for all of the address needs listed above. At the end of the day, and what really matters, is that in designing Windows Microsoft opted to split up the virtual address space of an application in half; 2GB goes to Windows (kernel space) and 2GB goes to the application (user space). Under normal circumstances this 2GB of space is all a 32bit application has to work with, this is the 2GB barrier and as we'll see is the cause of the problems with Supreme Commander.

It is also worth noting at this point that virtual address space is not directly correlated to physical memory size. Windows and any applications running under it may use up to all of their allocated virtual address space, with Windows simply swapping data between the hard drive and physical memory if the amount of physical memory needed is in excess of what's available - this is virtual memory, not to be confused with virtual address space. The only meaningful relation between virtual address space and physical memory is that the amount of physical memory an application can use can never be greater than its virtual address space. The user space must take up a larger portion of the virtual address space if an application is to use more than 2GB of physical memory.


Removing the 2GB Barrier

As it turns out, it's possible and actually quite easy to move the 2GB barrier by increasing the size of the user space, but at the cost of reducing the size of the kernel space. Under Windows XP, this is the fabled "/3gb" switch for boot.ini, and for Windows Vista it's the "IncreaseUserVa" option in BCDedit. By using these options applications can use more than 2GB of virtual address space (generally up to 3GB), and ideally this would be the end of the article.

Unfortunately this is not the case as there are problems on both the application and kernel side of things. On the application side, a common poor programming practice has been to always assume that an application will only be dealing with 2GB of user space; code that makes this assumption will likely error if more than 2GB of user space is actually available. This is avoidable by following proper programming practices, but as a safety precaution even with additional virtual address space allocated to user space Windows still defaults to limiting an application to 2GB. Only finally, if an application indicates to Windows that it is capable of handling more than 2GB, via the "/LARGEADDRESSAWARE" flag, may it have access to any space above 2GB.

As for the kernel, having had up to half of its space taken away must now find a way to live in a smaller space. The (in)ability of any specific system/Windows configuration to deal with this is why the 3gb switch is considered dangerous, seldom recommended, and just generally a bad idea. The biggest culprit here is drivers that run in kernel space. Like applications, they may assume that there's an entire 2GB of address space to work with, except unlike applications this space gets smaller instead of bigger.

Windows' own memory needs can also cause problems with the reduced kernel space. As we mentioned before, space is required for the kernel to do a multitude of things, if a lot of space is required - video cards with a lot of memory are a particular offender here - then everything needing space may not fit in the kernel space. Because there are no strong safeguards against these conditions it may cause a failure to boot or system instability, especially if the culprit is a driver that is well enough behaved to boot. Many modern drivers from hardware vendors that deal with enterprise-level hardware are capable of handling this, many more consumer hardware drivers are not. Stability concerns are the number one reason that breaking the 2GB barrier on a 32bit version of Windows is not recommended.

There is also a second concern however: performance. While an individual application may benefit from more user space in which to work, the kernel now has less space to cache data (as non-obvious as this may seem given all the addresses are virtual) and this can in theory hurt performance. This condition is something we will take a look at in a bit.

A Case Study: Supreme Commander

As we stated earlier, the inspiration behind this article is Supreme Commander, which was crashing during gameplay due to what we discovered to be was the game exhausting its user space. Without taking extreme and convoluted measures, most applications can do little besides (gracefully) crash when they've exhausted their user space. Server applications, which on the whole tend to be the king of memory hogs in the first place, tend to have some sort of compensation mechanism, but for more generalized applications like games this is not the case.

In this case, Supreme Commander was crashing late in to big games on a system that otherwise is completely stable. No amount of testing could come up with a problem outside of Supreme Commander, and most telltale of all was that it was only a problem with large maps with large number of players with high quality settings; turning anything down made the crashing stop. Although not clearly a memory error it greatly reduces the list of possible problems to a few things. The entire process, we suspect, will be similar for everyone hitting the 2GB barrier: the affected application is crashing under heavy load. Thankfully the problem is possible to debug, albeit with moderate difficulty.

With all of that said, I did not come up with the diagnosis or the solution myself and while the problem in retrospect is quite obvious, at the time I did not even take the 2GB barrier in to consideration. As such, credit for the diagnosis and solution belongs to MadBoris of the Gas Powered Games forums who identified Supreme Commander as hitting the 2GB barrier and publishing a solution for it, which we will cover here.

Getting to the root of the problem requires additional tools beyond what comes with Windows XP or Vista. The task manger, the logical tool for this task, doesn't track the information we need to know. At best, the task manager can list how much data is in the various memory pools altogether, however this is not the metric we're looking for. What we actually need to know is how much of the application's share of the virtual address space has been allocated (the fundamental problem after all is when we run out of unused virtual address space to allocate) which due to a multitude of reasons can be much greater than the amount of memory in use by an application. For finding this we'll need Sysinternals' Process Explorer.

The above screenshot was taken as this paragraph was written, showcasing several different memory values for the running processes. The WS Private (Working Set Private) column lists how much physical memory the application is using, and this is the same column listed by the Windows Task Manager for memory usage. The Private Bytes column is the total amount of memory the application is using. Last but not least however is the Virtual Size column, which finally lists the amount of virtual address space allocated, the metric we're after.

We'll quickly focus on Microsoft Word as an example of the discrepancy between physical memory usage and private size. While Word is only using 17MB of physical memory, its virtual size is 170MB, 10 times the physical memory usage and 9 times the total memory usage. This is more or less a worse case scenario but it also points out just how misleading any memory usage - physical or total - is when trying to diagnose this kind of problem. Applications or games with high memory usage tend to not have nearly as a severe spread, but as we can see it's possible for an application to hit the 2GB barrier well before it needs 2GB of physical memory.

Getting to the meat of things however, the process readout for Supreme Commander basically says everything that needs to be said. In order to prove an exaggerated point, we started a 6 player game with 5 AI units on a maximum size map (81km x 81km) and set the game speed to maximum with maximum quality settings, while patching Supreme Commander and adjusting our system configuration to break the 2GB barrier. This actually isn't very playable for a variety of reasons (we'll overtax the CPU quickly due to all the AI units, slowing the game down dramatically) but represents not only what can happen in bad situations, but also is something that happens even in more manageable situations on smaller maps later in to the game.

Just at the start of the game, our virtual size was already in excess of 1.6GB, dangerously close to the 2GB barrier. 16 minutes in to the game, we have already hit a virtual size of 2.2GB, meaning had we not broken the 2GB barrier the game would have already crashed. At this point our total memory usage (private bytes) is only at 1.7GB, with 1.4GB of that in physical memory, nearly maximizing the RAM usage of our 2GB system. Our spread between memory usage and virtual size is .5GB, showcasing how deceptive memory usage is in identifying 2GB barrier issues.

Finally at 22 minutes in to the game, the game crashes as the virtual size has reached the 2.6GB barrier we have reconfigured this system for. Perhaps the most troubling thing at this point is that Supreme Commander is not aware that it ran out of user space in its virtual address pool, as we are kicked out of the game with a generic error message. Unfortunately Windows Vista reverted to a non-accelerated desktop at this point, preventing us from capturing a screenshot of the exact memory readouts or the error message.



A Case Study, Cont

Now that we've seen what can happen when we reach the 2GB barrier and how easy it can be to pass it, let's talk about what it took to remove it in the first place for Supreme Command. Supreme Commander is unfortunately not compiled as being large address aware and must be modified so that Windows thinks that it is. Microsoft supplies a tool as part of the Visual Studio suite, Editbin, that can do just this by rewriting the file header to report to Windows that it is in fact large address aware.

To the best of our knowledge Supreme Commander was programmed using proper programming practices and can handle the larger address space, and this is merely an issue of turning it on. However on a more pragmatic note this can break future patches, and disturbingly it doesn't set off any sort of multiplayer cheat detection in the game in spite of the fact that we have modified the executable in a very visible way. Out of the changes we need to make to deal with the 2GB barrier though, this is the safer of the two.

Update: Gas Powered Games contacted us and let us know that the modified executable not setting off any cheat detection is intentional. The game code is all in a DLL, and the executable is just a launcher; it's left unchecked because of the various Digital Rights Management systems used change the exectuable.

Changing Windows on the other hand to allocate more of the virtual address space to applications is in practice just as dangerous as we theorized earlier. We initially set our copy of Windows Vista to adjust the split to 1GB kernel mode, 3GB user mode, only for Vista to encounter a BSOD while booting. We had to settle for a 1.4GB/2.6GB split before Vista would boot, and even then Vista still periodically encounters a BSOD upon booting at that allocation or any other allocation other than 2GB/2GB. While what problems may occur and with what values is highly variable from system to system, this is why trying to move the barrier at all can be dangerous.

Having made the above changes, we also used the chance to take a look at system performance both in and outside of Supreme Commander, taking interest in to the effect of allocating more user mode space. As we theorized before, taking space away from the kernel may impact performance, and this is something that needs to be tested. For this we ran a cut-down version of our normal system test suite, with allocations of 1.4GB/2.6GB, 1.7GB/2.3GB, and the default 2GB/2GB.

Software Test Bed
Processor AMD Athlon 64 4600+
(2x2.4GHz/512KB Cache, S939)
RAM OCZ EL Platinum DDR-400 (4x512MB)
Motherboard ASUS A8N-SLI Premium (nForce 4 SLI)
System Platform Drivers NV 15.00
Hard Drive Maxtor MaXLine Pro 500GB SATA
Video Cards 1 x GeForce 8800GTX
Video Drivers NV ForceWare 158.45
Power Supply OCZ GameXStream 700W
Desktop Resolution 1600x1200
Operating System Windows Vista Ultimate 32-Bit
.

Starting with Supreme Commander, the only test here that can even utilize more than 2GB of user space, we do find a minor but consistent variation in performance. Increasing the user space improved Supreme Commander performance by about 1 frame per second, which at around a 3.5% performance improvement is right along the edge of either being significant or a normal variation. We repeated this test several times just to make sure that it wasn't a variation and the results remained consistent, so it doesn't appear to be a variation. With that said the instability caused by adjusting the user space size does not justify the extremely minor performance improvement.

Going down the list of benchmarks, we find that there is no notable change in performance in any of our benchmarks. Since none of these benchmarks are capable of using more user space we weren't expecting an increase, but this puts to rest the idea of a performance decrease. There does not appear to be a performance decrease in adjusting the user space size; if it boots it'll perform well.




Other Problems, Other Solutions

Although for this article we are focusing on Supreme Commander, there are other games and applications that are already known to encounter this exact problem. Company of Heroes developer Relic has warned that this problem can occur, especially in conjunction with using Direct3D 10 mode and STALKER developer GSC Game World has also seen this problem. Both have gone so far as to ship the latest versions of their games with the LARGEADDRESSAWARE flag.

Although not an exhaustive list, we have compiled a list of recent games and applications and if they are flagged or not. Ideally every possible game and application that can run in to the 2GB barrier will be flagged so that it can be worked around without modifying the application directly itself.

Large Address Aware Status
STALKER: Shadow of Chernobyl Yes
Lost Planet No
Company of Heroes Yes
Supreme Commander No
Enemy Territory: Quake Wars Yes
Battlefield 2142 No
Call of Juarez DX10 BM No
World of Warcraft No
The Elder Scrolls IV: Oblivion No
Photoshop CS3 Yes
Dreamweaver CS3 No
.

The importance of the above applications and games being flagged is not just to avoid the need to modify the executable, but also because there is another solution to the issue we haven't talked about so far. 64bit versions of Windows(i.e. XP and Vista) do not suffer from the traditional 2GB barrier, as all the kernel mode addressing is usually moved to well above the confines of the limited 32bit addressing area. As such, these versions of Windows don't need to have their space allocations adjusted for an application to gain access to more addressing space, bypassing the instability and any possible performance problems that occurs as a result of making this adjustment.

However in order to maintain compatibility with older applications, Windows still keeps the artificial 2GB barrier in place to keep from triggering any bugs that result from the extra space. So even if we use a 64bit version of Windows, the offending application must still either be flagged by the developer or modified by the user if we want to get past the 2GB barrier.

This in turn is an interesting proposition for developers who may be looking at ways to deal with the 2GB barrier, but don't want to make the jump to having to program and support 2 versions of their executables and associated compiled code. Although as a 32bit application there is still the 4GB hard limit, flagging executables lets a developer keep a single package and is perfectly safe on both stock 32bit and 64bit installations of Windows. They can effectively buy another 2 years or so before they need to actually offer 64bit executables, and can direct users to install a 64bit version of Windows if they are hitting the 2GB barrier, instead of directing them to make the much riskier user space modification.

Of course this is not all roses. As we covered in our Vista performance guide, there are still some issues with Windows Vista 64bit, and Windows XP 64bit is even worse as a result of having been orphaned quickly after its release. For prospective 64bit Vista users, they will still find that driver support is not as good as with the 32bit version of Vista, and 64bit drivers may not be as stable as the 32bit versions. There are also still lingering concerns over application compatibility and performance.

Things have gotten better since we published our February article, but we're not ready to write them off completely yet. Users getting along with 32bit versions of Windows right now will find themselves in between a rock and a hard place as more applications and games start to hit the barrier - they'll either need to make the user space modification or switch to a 64bit version of Windows when neither of these is a perfect solution. This leaves the less imperfect but also less fun option of simply turning down the settings on any affected games. Lower settings result in lower user space usage and reduced chances of crashing, but in spite of the highest stability and compatibility offered by this option we suspect few users will actually opt for it.




Final Words

As we wrap things up, we'll reserve a few words for game & application developers who are working on projects that will hit the barrier. Supreme Commander is extremely disappointing in how it handles running out of addressing space. Ideally we'd like for it not to crash, but realistically we'd settle for just an error message pointing out that it hit the 2GB barrier so that we could quickly reach a solution. Otherwise seemingly-random crashes tend to be one of the hardest problems to resolve as a user. Developers need to take care here to offer some kind of warning when the 2GB barrier is the problem; not everyone is or will be well read on the subject or have the time to diagnose it, when it's actually an easily solvable problem.

Getting back to the point at hand however, we feel that this is only going to be the tip of the iceberg. As games and applications continue to come out that push the boundaries of computer hardware and run afoul of the 2GB barrier, these problems will only pick up in pace. For many power users this experience will be a common occurrence, and for most it will be a frustrating experience.

We're at the front end of a messy transition, one that may not end for several years. Today, 32bit games will hit the 2GB barrier, and tomorrow games with support for large addressing will hit the 3GB/4GB barrier. Not until 64bit versions of games are ubiquitous will we be completely through this transition, and that will still be a few years away.

Until that point, stories such as these will continue to be told as users unknowingly hit the 2GB barrier. Developers need to be doing a better job at handling the issue via better crash reports, but our inner cynic says that they can't be solely in charge of the matter. So it will be up to users to diagnose their own problems and take the appropriate action, be it switching operating systems and/or modifying executables. A little knowledge in this regard can go a long way.

Log in

Don't have an account? Sign up now