Address Windowing Extension - c++

I have an 32bit application with very large memory requirements.
I noticed that there is something called Address Windowing Extension.
However I haven't found much information in regards to how to use it and also what disadvantages and problems one might run into while using this?

It shouldn't work on versions of Windows at 64bits (read here http://msdn.microsoft.com/en-us/library/aa366778.aspx Intel and AMD's specification of PAE does support the x86-64 architecture but the software layer of Microsoft's PAE (the API), called AWE, is not supported on 64 bit editions of Windows, so Windows Vista 64 bit cannot attribute more than 4 GiB of RAM for a 32 bit application.).
Even on Windows 32 bits there is a "license" limit on the amount of memory usable (same page shows all the limits).
And clearly it's complex to program :-) It's like using EMS on the old 8086.

Well the truth is that you can use AWE from a 32bits application running inside a Windows OS 64bit, and you don't need PAE. For example MS SQL Server (before 2012 version) can be configured in this mode.
But unless you have a very specific requirements, probably is far a better option to port to 64bits.
You have several disvantages:
Need to run with a user with SeLockMemoryPrivilege
The memory can not be shared with other process. It is allocated in physical memory. Leaving less memory to the OS and other applications (with AllocateUserPhysicalPages).
You need a virtual address in order to access such memory. So you can have a memory windows of 4GiB with LARGE_ADDRESS_AWARE flag.
If you want to access more thant 4GiB you have to map/unmap those physical pages (with MapUserPhysicalPages).
This article from 1999 explain how to use such API.

Related

Malloc fails to allocate memory of 1 GiB on Windows

I am trying to allocate memory of 1 GiB using malloc() on Windows and it fails. I know malloc's uncertainty. What is best solution to allocate memory of 1 GiB?
If you are using a 32-bit (x86) application, you are unlikely to be able to allocate a 1 GB continuous chunk of memory (and certainly can't allocate 2GB). As to why this happens, you should see the venerable presentation "Why Your Windows Game Won't Run In 2,147,352,576 Bytes" (Gamefest 2007) attached to this blog post.
You should build your application as an x64 native (x64) application instead.
You could enable /LARGEADDRESSAWARE and stick with a 32-bit application on Windows x64, but it has a number of quirks and may limit what kinds of 3rd party support libraries you can use. A better solution is to use x64 native if possible.
Use the /LARGEADDRESSAWARE flag to tell Windows that you're not doing funny things with addresses. This unlocks an extra 2GB of address space on Win64.

Application with LARGEADDRESSAWARE flag set getting less virtual memory

I have a 32-bit application consisting one EXE and multiple DLLs. The EXE has been built with /LARGEADDRESSAWARE flag set. So I expect on a 64-bit OS I should get 4 GB of user address space. But on some 64-bit Win 7 systems I am getting only 2 GB of user address space.
The physical memory is 8 GB if that matters. What could be reason for this behavior?
After browsing through MSDN, I found the following:
On http://msdn.microsoft.com/en-us/library/windows/desktop/aa366770(v=vs.85).aspx (the page for MEMORYSTATUSEX which is used by GlobalMemoryStatusEx (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366589(v=vs.85).aspx) ) the description for ullTotalVirtual is:
this value is approximately 2 GB for most 32-bit processes on an x86 processor and approximately 3 GB for 32-bit processes that are large address aware running on a system with 4-gigabyte tuning enabled.
The 4GB tuning page is: http://msdn.microsoft.com/en-us/library/windows/desktop/bb613473(v=vs.85).aspx and it says something like:
On 64-bit editions of Windows, 32-bit applications marked with the IMAGE_FILE_LARGE_ADDRESS_AWARE flag have 4 GB of address space available.
Itanium editions of Windows Server 2003: Prior to SP1, 32-bit processes have only 2 GB of address space available.
Also, the memory limits page (http://msdn.microsoft.com/en-us/library/aa366778.aspx#memory_limits) can come handy if you want to determine the total memory your system supports.
However the real useful information comes from Mark Russinowich's blog: http://blogs.technet.com/b/markrussinovich/archive/2008/07/21/3092070.aspx
While 4GB is the licensed limit for 32-bit client SKUs, the effective limit is actually lower and dependent on the system's chipset and connected devices. The reason is that the physical address map includes not only RAM, but device memory as well, and x86 and x64 systems map all device memory below the 4GB address boundary to remain compatible with 32-bit operating systems that don't know how to handle addresses larger than 4GB.
So the conclusion is that yes, this might depend on the configuration of the system. Maybe you can complete your question with a table, with the amount of the memory you get on each system and some important system configuration settings, and we might discover a pattern in this case.
The problem is that an Application has a whole has to be large Adress aware - so that pointers are to be treated as unsigned.
If however on "some" system some of your used DLL is not large adressaware this renders your whole program not large address aware.
http://blogs.msdn.com/b/oldnewthing/archive/2010/09/22/10065933.aspx

What to watch out for when writing 32-bit software on 64-bit machine?

I am purchasing a comfortable laptop for development but the only operating system available to me is 64-bit (Win7) , now I am basically aware that 64-bit has 8-byte integers and can utilize more RAM, that is about it.
My programming will vary (C++, sometimes PHP) but would like to know:
Can I build my C++ application to be 32-bit portable (run in a 32-bit computer without needing to build under a 32-bit virtual machine?)
What are the simple gotchas to watch out for when writing an application (casting, etc)
Processors have been 64 bit for some time. I'm perplexed about why people are afraid to make the move to a 64 bit operating system. A 32 bit OS can't address much more than 3Gb of RAM, so that's good enough reason to make the upgrade in my book!
When you're coding, the biggest difference I've encountered to look out for is the size of a pointer!
On a 32-bit compiled program, a pointer is normally 4 bytes. On a 64 bit compiled program, a pointer is normally 8 bytes.
Why does this matter?
Lets say your program uses sockets to communicate a data structure from one process to another. Maybe the server process is 32-bit and the client process is 64 bit.
While the struct might be defined identically in both 32 and 64 bit programs, the 64 bit exe will reserve 8 bytes per pointer (and structures normally contain pointers to other structs as in linked lists etc.).
This can lead to data misalignment when a 32 bit exe communicates a struct to a 64 bit exe.
In (almost?) all cases, communicating pointer values between processes is meaningless anyway, e.g. their data doesn't matter and can be omitted.
So you might think that communicating pointer values would be an uncommon practise - but the easy way to communicate a struct is to memcpy its contents over a socket, pointers and all!
This is the most significant snag I've found so far, when coding 64 bit clients, when our server software is 32-bit.
The simplest thing would be to simply build a 32-bit executable. With Visual Studio, just choose Win32 as the build type. With gcc, use the -m32 switch.
Specifically for windows, take a look at the msdn documentation of WOW64.
Windows 64 bit runs 32 bit applications in an emulator, so you can build an run 32 bit apps for your system. This has some (rather positive) effects on your app, e.g. virtual address space for your process increases, as the OS may use higher 64 bit addresses your 32 bit process isn't even aware of.
The MSVC++ data type sizes are also documented on MSDN. But if you are worried about casting, you should use bigger types that'll certainly fit your needs. The C++ standard doesn't define the exact size of types (afaik), only their relative sizes (short is shorter than int and so on). So, you cannot rely on the exact size of these types unless you use some int32 or __int32 which obviously won't change for 64 bit apps.
Cross compiling (for different platforms, pointer sizes, endianess, etc.) has been around for ages and as long as you use the right tools and flags to build your 32-bit executable, the build platform really shouldn't matter.
In case of the Microsoft compiler, it should as simple as firing up Visual Studio and compiling your program using the default "Win32 configuration. In case you prefer the command line, be sure to select the 32-bit tools by invoking the Visual Studio Command Prompt (as opposed the x64 version).
BTW, if you are running 64-bit Enterprise Server, you can turn on the Hypervisor Role, install 32-bit Win7 inside a virtual machine and actually test your built program. Finally, it's always a good idea to test the program on the actual target platform on which it wil execute :)...

Can you allocate a very large single chunk of memory ( > 4GB ) in c or c++?

With very large amounts of ram these days I was wondering, it is possible to allocate a single chunk of memory that is larger than 4GB? Or would I need to allocate a bunch of smaller chunks and handle switching between them?
Why???
I'm working on processing some openstreetmap xml data and these files are huge. I'm currently streaming them in since I can't load them all in one chunk but I just got curious about the upper limits on malloc or new.
Short answer: Not likely
In order for this to work, you absolutely would have to use a 64-bit processor.
Secondly, it would depend on the Operating System support for allocating more than 4G of RAM to a single process.
In theory, it would be possible, but you would have to read the documentation for the memory allocator. You would also be more susceptible to memory fragmentation issues.
There is good information on Windows memory management.
A Primer on physcal and virtual memory layouts
You would need a 64-bit CPU and O/S build and almost certainly enough memory to avoid thrashing your working set. A bit of background:
A 32 bit machine (by and large) has registers that can store one of 2^32 (4,294,967,296) unique values. This means that a 32-bit pointer can address any one of 2^32 unique memory locations, which is where the magic 4GB limit comes from.
Some 32 bit systems such as the SPARCV8 or Xeon have MMU's that pull a trick to allow more physical memory. This allows multiple processes to take up memory totalling more than 4GB in aggregate, but each process is limited to its own 32 bit virtual address space. For a single process looking at a virtual address space, only 2^32 distinct physical locations can be mapped by a 32 bit pointer.
I won't go into the details but This presentation (warning: powerpoint) describes how this works. Some operating systems have facilities (such as those described Here - thanks to FP above) to manipulate the MMU and swap different physical locations into the virtual address space under user level control.
The operating system and memory mapped I/O will take up some of the virtual address space, so not all of that 4GB is necessarily available to the process. As an example, Windows defaults to taking 2GB of this, but can be set to only take 1GB if the /3G switch is invoked on boot. This means that a single process on a 32 bit architecture of this sort can only build a contiguous data structure of somewhat less than 4GB in memory.
This means you would have to explicitly use the PAE facilities on Windows or Equivalent facilities on Linux to manually swap in the overlays. This is not necessarily that hard, but it will take some time to get working.
Alternatively you can get a 64-bit box with lots of memory and these problems more or less go away. A 64 bit architecture with 64 bit pointers can build a contiguous data structure with as many as 2^64 (18,446,744,073,709,551,616) unique addresses, at least in theory. This allows larger contiguous data structures to be built and managed.
The advantage of memory mapped files is that you can open a file much bigger than 4Gb (almost infinite on NTFS!) and have multiple <4Gb memory windows into it.
It's much more efficent than opening a file and reading it into memory,on most operating systems it uses the built-in paging support.
This shouldn't be a problem with a 64-bit OS (and a machine that has that much memory).
If malloc can't cope then the OS will certainly provide APIs that allow you to allocate memory directly. Under Windows you can use the VirtualAlloc API.
it depends on which C compiler you're using, and on what platform (of course) but there's no fundamental reason why you cannot allocate the largest chunk of contiguously available memory - which may be less than you need. And of course you may have to be using a 64-bit system to address than much RAM...
see Malloc for history and details
call HeapMax in alloc.h to get the largest available block size
Have you considered using memory mapped files? Since you are loading in really huge files, it would seem that this might be the best way to go.
It depends on whether the OS will give you virtual address space that allows addressing memory above 4GB and whether the compiler supports allocating it using new/malloc.
For 32-bit Windows you won't be able to get single chunk bigger than 4GB, as the pointer size is 32-bit, thus limiting your virtual address space to 4GB. (You could use Physical Address Extension to get more than 4GB memory; however, I believe you have to map that memory into the virtualaddress space of 4GB yourself)
For 64-bit Windows, the VC++ compiler supports 64-bit pointers with theoretical limit of the virtual address space to 8TB.
I suspect the same applies for Linux/gcc - 32-bit does not allow you, whereas 64-bit allows you.
As Rob pointed out, VirtualAlloc for Windows is a good option for this, as is an anonymouse file mapping. However, specifically with respect to your question, the answer to "if C or C++" can allocate, the answer is NO THIS IS NOT SUPPORTED EVEN ON WIN7 RC 64
In the PE/COFF specification for exe files, the field which specifies the HEAP reserve and HEAP commit, is a 32 bit quantity. This is in-line with the physical size limitations of the current heap implmentation in the windows CRT, which is just short of 4GB. So, there is no way to allocate more than 4GB from C/C++ (technicall the OS support facilities of CreateFileMapping and VirtualAlloc/VirtualAllocNuma etc... are not C or C++).
Also, BE AWARE that there are underlying x86 or amd64 ABI construct's known as the page table's. This WILL in effect do what you are concerened about, allocating smaller chunks for your larger request, even though this is happining in kernel memory, there is an effect on the overall system, these tables are finite.
If you are allocating memory in such grandious purportions, you would be well advised to allocate based on the allocation granularity (which VirtualAlloc enforces) and also to identify optional flags's or methods to enable larger pages.
4kb pages were the initial page size for the 386, subsaquently the pentium added 4MB. Today, the AMD64 (Software Optimization Guide for AMD Family 10h Processors) has a maximum page table entry size of 1GB. This mean's for your case here, let's say you just did 4GB, it would require only 4 unique entries in the kernel's directory to locate\assign and permission your process's memory.
Microsoft has also released this manual that articulates some of the finer points of application memory and it's use for the Vista/2008 platform and newer.
Contents
Introduction. 4
About the Memory Manager 4
Virtual Address Space. 5
Dynamic Allocation of Kernel Virtual
Address Space. 5
Details for x86 Architectures. 6
Details for 64-bit Architectures. 7
Kernel-Mode Stack Jumping in x86
Architectures. 7
Use of Excess Pool Memory. 8
Security: Address Space Layout
Randomization. 9
Effect of ASLR on Image Load
Addresses. 9
Benefits of ASLR.. 11
How to Create Dynamically Based
Images. 11
I/O Bandwidth. 11
Microsoft SuperFetch. 12
Page-File Writes. 12
Coordination of Memory Manager and
Cache Manager 13
Prefetch-Style Clustering. 14
Large File Management 15
Hibernate and Standby. 16
Advanced Video Model 16
NUMA Support 17
Resource Allocation. 17
Default Node and Affinity. 18
Interrupt Affinity. 19
NUMA-Aware System Functions for
Applications. 19
NUMA-Aware System Functions for
Drivers. 19
Paging. 20
Scalability. 20
Efficiency and Parallelism.. 20
Page-Frame Number and PFN Database. 20
Large Pages. 21
Cache-Aligned Pool Allocation. 21
Virtual Machines. 22
Load Balancing. 22
Additional Optimizations. 23
System Integrity. 23
Diagnosis of Hardware Errors. 23
Code Integrity and Driver Signing. 24
Data Preservation during Bug Checks. 24
What You Should Do. 24
For Hardware Manufacturers. 24
For Driver Developers. 24
For Application Developers. 25
For System Administrators. 25
Resources. 25
If size_t is greater than 32 bits on your system, you've cleared the first hurdle. But the C and C++ standards aren't responsible for determining whether any particular call to new or malloc succeeds (except malloc with a 0 size). That depends entirely on the OS and the current state of the heap.
Like everyone else said, getting a 64bit machine is the way to go. But even on a 32bit machine intel machine, you can address bigger than 4gb areas of memory if your OS and your CPU support PAE. Unfortunately, 32bit WinXP does not do this (does 32bit Vista?). Linux lets you do this by default, but you will be limited to 4gb areas, even with mmap() since pointers are still 32bit.
What you should do though, is let the operating system take care of the memory management for you. Get in an environment that can handle that much RAM, then read the XML file(s) into (a) data structure(s), and let it allocate the space for you. Then operate on the data structure in memory, instead of operating on the XML file itself.
Even in 64bit systems though, you're not going to have a lot of control over what portions of your program actually sit in RAM, in Cache, or are paged to disk, at least in most instances, since the OS and the MMU handle this themselves.

64bit Memory allocation

I've been asked to create a Delphi compatible dll in C++ to do simple 64bit memory management.
The background is that the system in Delphi needs to allocate a lots of chunks of memory that would go well outside 32bit addressable space. The Delphi developer explained to me that he could not allocate memory with the Delphi commands available to him. He says that he can hold a 64bit address, so he just wants to call a function I provide to allocate the memory and return a 64bit pointer to him. Then another function to free up the memory later.
Now, I only have VS 2008 at my disposal so firstly I'm not even sure I can create a Delphi compatible dll in the first place.
Any Delphi experts care to help me out. Maybe there is a way to achieve what he requires without re-inventing the wheel. Other developers must have come across this before in Delphi.
All comments appreciated.
Only 64 bit processes can address 64 bit memory. A 64 bit process can only load 64 bit dlls and 32 bits processes can only load 32 bits dlls. Delphi's compiler can only make 32 bits binaries.
So a 32 bits Delphi exe can not load your 64 bit c++ dll. It could load a 32 bit c++ dll, but then that dll wouldn't be able to address the 64 bit memory space. You are kind of stuck with this solution.
Delphi could, with the right compiler options and Windows switches address 3GB of memory without problems. Even more memory could be accessed by a 32 bits process if it uses Physical Address Extension. It then needs to swap memory pages in and out of the 32 bits memory through the use of Address Windowing Extensions.
Delphi pointers are 32-bit. Period. Your Delphi developer may be able to 'store' the 64-bit values you want to return to him, but he can't access the memory that they point to, so it's pretty futile.
Previously, I'd written:-
A 64-bit version of Delphi is on
Codegear/Embarcadero's road map
for "middle of 2009". Product quality
seems to be (at last!) taking
precedence over hitting ship dates
exactly, so don't hold your breath...
But, in August 2010, Embarcadero published a new roadmap here. This doesn't give specific dates, but mentions a 64-bit Compiler Preview, with Projected Availability, 1st Half of 2011.
You might take a look at Free Pascal as it includes a 64 bit version and is mostly Delphi compatible syntax.
In order to allocate memory shared by multiple process, you should use a memory mapped file.
The code available at http://www.delphifaq.com/faq/delphi_windows_API/f348.shtml can be used to communicate between a 32 bit and a 64 bit process.
Here are the steps:
Create a memory mapped file, either on disk, either on memory;
Create a mutex to notify file change;
One end write some data to the memory mapped file;
Then it flags the mutex;
Other end receive the mutex notification;
Then it reads the data from the memory mapped file.
It's up to you to create a custom binary layout in the memory mapped file, in order to share any data.
By design, memory mapped files are fast (it's a kernel-level / x86 CPU feature), and can handle huge memory (up to 1 GB for a 32 bit process, from my experiment).
This kind of communication is used by http://cc.embarcadero.com/Author/802978 to call any 64 bit dll from a 32 bit Delphi program.
You might also want to add a way to pin and unpin that 64-bit pointer to a 32-bit memory address. Since this is Delphi, I'm pretty sure it's Windows specific, so you might as well use Address Windowing Extensions. That way, you can support allocating, freeing, and pinning and unpinning memory to a 32-bit address range and still take advantage of a 64-bit memory allocation space. Assuming that the user will actually commit the memory such that it fits in the 32-bit virtual address space.