Windows on x86 and 4GB of RAM

A few months ago I decided to get a shiny new gaming system from Dell. I eventually went with the XPS 720 with pretty much all the bells and whistles I thought were reasonable, one of which was 4GB of RAM. After all, just about everyone agrees that more RAM leads to better performance. The machine also came with Windows Vista Home Premium. I'm the early adopter type, so I saw no issue with this.


To my surprise, Windows only saw 3GB of RAM! Since I do a little hobby OS development, I immediately had an idea of what could cause this. I jumped to the conclusion that PAE was not properly enabled and started sifting through different sites to find the proper way to enable it. Of course, if this were the solution, I wouldn't really have much to say here... It turns out that PAE is in fact enabled by default on Windows Vista if your system supports it. After all, it has to be in order to make use of the no-execute bit on x86.

At this point, I should probably explain what PAE is and why it should make 4GB available without a problem. Basically, there are two ways to look at memory on an x86 system. There is "physical memory," which is the physical address space from start to finish in order. Note that this includes memory-mapped devices as well as RAM. This is how memory would look if you used an OS that does not use paging. Then there is "virtual memory." The idea here is that physical memory is viewed as blocks ("pages") that can be mapped to any location you wish in your virtual address space. For example, as the OS writer, I could ask the system to map the 4K of memory at physical address 0x11223000 to virtual address 0x00000000. In this case, any reads and writes that programs do in the first 4K of virtual memory will actually occur at the physical address mapped to it. Virtual memory is also what allows modern OSes to protect one application from another. The OS does this by switching the virtual memory layout on each task switch so that each process has its own view of what memory looks like.
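To make the mapping example concrete, here is a minimal sketch in Python. It assumes a toy single-level page table (a plain dict), which is far simpler than the multi-level tables real x86 hardware walks. The names `translate`, `process_a`, and `process_b`, and the second physical address, are made up for illustration.

```python
PAGE_SIZE = 4096  # 4K pages, as in the example above


def translate(page_table, virtual_addr):
    """Translate a virtual address with a toy single-level page table.

    page_table maps virtual page numbers to physical page numbers.
    """
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn not in page_table:
        # Real hardware would raise a page fault here.
        raise LookupError(f"page fault at {virtual_addr:#010x}")
    return page_table[vpn] * PAGE_SIZE + offset


# Map the 4K page at physical 0x11223000 to virtual 0x00000000.
process_a = {0x00000000 // PAGE_SIZE: 0x11223000 // PAGE_SIZE}

# A second process maps the same virtual page to different physical
# memory (hypothetical address), so the two cannot see each other's data.
process_b = {0x00000000 // PAGE_SIZE: 0x22334000 // PAGE_SIZE}

print(hex(translate(process_a, 0x00000010)))
print(hex(translate(process_b, 0x00000010)))
```

Switching between `process_a` and `process_b` stands in for what the OS does on a task switch: the same virtual address ends up at a different physical location.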

The problem is that memory-mapped devices occupy physical address space as well, so if you have a 512MB video card, that's half a gig of the 4GB physical address space that can't be RAM. Here's where PAE comes in. Before PAE, physical addresses were limited to 4GB (32 bits) and virtual addresses were limited to 4GB (per process). PAE changes this to 64GB (36 bits) of physical address space while keeping the 4GB of virtual addresses. Sure, no single process can use more than 4GB at a time, but all the RAM can be put to use. There is one catch, though: your memory-mapped devices must still be below the 4GB mark because they use 32-bit addressing when doing DMA. So the natural solution is to relocate the RAM that is displaced by devices to above the 4GB mark. Most modern motherboards support this.
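The arithmetic is easy to check. The sketch below assumes a simplified model where a single device hole sits directly below the 4GB mark; real chipsets have more complicated layouts. The function names are hypothetical.

```python
GB = 1 << 30
MB = 1 << 20


def address_space_bytes(address_bits):
    """Size of an address space with the given number of address bits."""
    return 1 << address_bits


def layout_ram(ram_bytes, mmio_bytes, remap=True):
    """Return (ram_below_4gb, ram_above_4gb) in a simplified model.

    Assumes one memory-mapped I/O hole directly below the 4GB mark.
    With remap=False the displaced RAM is simply lost.
    """
    four_gb = address_space_bytes(32)
    below = min(ram_bytes, four_gb - mmio_bytes)
    displaced = ram_bytes - below
    above = displaced if remap else 0
    return below, above


print(address_space_bytes(32) // GB, "GB without PAE")
print(address_space_bytes(36) // GB, "GB with PAE")

below, above = layout_ram(4 * GB, 512 * MB)
print(below // MB, "MB below 4GB,", above // MB, "MB relocated above 4GB")
```

With 4GB of RAM and a 512MB hole, 3584MB stays below the 4GB mark and the remaining 512MB is only reachable if both the motherboard relocates it and the OS is willing to use physical addresses above 4GB.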

It turns out that for "compatibility reasons" Microsoft has opted to simply ignore any RAM it sees above 4GB. Some people are convinced it is a hardware problem, saying:

To be perfectly clear, this isn't a Windows problem-- it's an x86 hardware problem. The memory hole is quite literally invisible to the CPU, no matter what 32-bit operating system you choose. The following diagram from Intel illustrates just where the memory hole is:

This simply isn't the case (sorry Jeff, I love your blog BTW). It is a design choice by the Windows engineers to take the easy way out. A perfectly viable solution is to divide your memory up into types, namely "suitable for DMA" and "not suitable for DMA." I know this works, because Linux does it. In fact, here's a screenshot of my shiny new Dell using all 4GB of my RAM.

[Screenshot: kinfocenter showing 4GB]

This isn't a 64-bit build; it's 32-bit (arch reports "i686", not "x86_64") built with the 64GB support config option (which basically means PAE is enabled). Plenty of people online say that all 32-bit operating systems have this problem. This isn't true.

Jeff gets it right with this statement though:

As far as 32-bit Vista is concerned, the world ends at 4,096 megabytes. That's it. That's all there is. No más.

Addressing more than 4 GB of memory is possible in a 32-bit operating system, but it takes nasty hardware hacks like 36-bit PAE extensions in the CPU, together with nasty software hacks like the AWE API. Unless the application is specifically coded to be take advantage of these hacks, it's confined to 4 GB. Well, actually, it's stuck with even less-- 2 GB or 3 GB of virtual address space, at least on Windows.

Except that PAE isn't a nasty hack by any stretch. In fact, Vista uses it already, as previously mentioned. User space software doesn't need to be specially programmed to take advantage of the extra memory, since it will only see 4GB at a time anyway (minus kernel land). Also, the AWE API is used to address the 4GB virtual memory limitation, not the physical limitation! What AWE does is allow an application to selectively map physical RAM locations to user space virtual locations. A program can thus access much more than the 2GB of user space that Windows gives it by default, just not all at the same time.
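The windowing idea can be sketched in a few lines of Python. This is only a toy model of the concept, not the real AWE API: the class `WindowedMemory` and its methods are invented for illustration, and the "physical pages" are just bytearrays.

```python
PAGE_SIZE = 4096


class WindowedMemory:
    """Toy model: a small virtual window onto a larger pool of pages."""

    def __init__(self, physical_pages, window_slots):
        # The pool is larger than the window, like RAM beyond what a
        # process can address at once.
        self.physical = [bytearray(PAGE_SIZE) for _ in range(physical_pages)]
        self.window = [None] * window_slots  # slot -> physical page index

    def map(self, slot, physical_index):
        """Point a window slot at a different physical page."""
        if not 0 <= physical_index < len(self.physical):
            raise IndexError("no such physical page")
        self.window[slot] = physical_index

    def _page(self, slot):
        index = self.window[slot]
        if index is None:
            raise LookupError("window slot is not mapped")
        return self.physical[index]

    def write(self, slot, offset, value):
        self._page(slot)[offset] = value

    def read(self, slot, offset):
        return self._page(slot)[offset]


mem = WindowedMemory(physical_pages=8, window_slots=2)
mem.map(0, 5)
mem.write(0, 0, 42)        # lands in physical page 5
mem.map(0, 6)
first = mem.read(0, 0)     # same slot now shows page 6, still zeroed
mem.map(0, 5)
second = mem.read(0, 0)    # page 5 kept its data while unmapped
print(first, second)
```

The program only ever sees two pages at a time, but by remapping the window it can work with all eight, which is the "just not all at the same time" part.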

Microsoft does of course support upwards of 4GB on its x86-64 builds. That's all fine and dandy, but my Dell didn't come with one. And to my knowledge Dell (as a company, not the hardware) doesn't officially support x86-64 yet. So maybe I'll take that up with them and demand a 64-bit copy.

All in all, it's a little lame that Windows doesn't support all the RAM that it could on 32-bit builds. It really wouldn't be hard, but it would likely require that driver writers start passing a flag to the allocator specifying that the memory must be OK for DMA. I can see why this is a problem, since there are simply tons of drivers. But at the least, the extra RAM could have been used for things where DMA is clearly not involved (pretty much all user space uses, since only drivers should be doing DMA). Microsoft could also have done something clever like adding a new flag to the driver's PE header: when absent, the allocator would only return addresses below 4GB, and when set, the driver could use a more robust allocator API.
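Here is a rough sketch of what such a flag-aware allocator could look like. It is a toy model in Python under my own assumptions, not how Windows or Linux actually implement it: `ZonedAllocator` and the `dma` flag are hypothetical names, and frames are just integers standing in for physical addresses.

```python
FOUR_GB = 1 << 32
PAGE_SIZE = 4096


class ZonedAllocator:
    """Toy page-frame allocator with a DMA-safe zone and a high zone."""

    def __init__(self, frames):
        # Split free frames by whether a 32-bit device could reach them.
        self.low = [f for f in frames if f < FOUR_GB]
        self.high = [f for f in frames if f >= FOUR_GB]

    def alloc(self, dma=False):
        """Return one free frame address.

        dma=True: only frames below 4GB are acceptable.
        dma=False: prefer high frames, keeping low ones free for DMA,
        and fall back to low frames when the high zone is empty.
        """
        if dma:
            pool = self.low
        else:
            pool = self.high if self.high else self.low
        if not pool:
            raise MemoryError("no suitable page frames left")
        return pool.pop()


allocator = ZonedAllocator(
    [0x1000, 0x2000, FOUR_GB, FOUR_GB + PAGE_SIZE]
)
driver_buffer = allocator.alloc(dma=True)   # must be below 4GB
user_page_1 = allocator.alloc()             # taken from above 4GB
user_page_2 = allocator.alloc()             # taken from above 4GB
user_page_3 = allocator.alloc()             # high zone empty, falls back
print(hex(driver_buffer), hex(user_page_1), hex(user_page_2), hex(user_page_3))
```

For the PE header variant, a legacy driver without the flag would simply have every allocation treated as `dma=True`, so it never sees an address it can't handle.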

I hope this sheds some light on the subject, because there is unfortunately a lot of misinformation out there.