Out of memory (?) problem on Win32 (vs. Linux) - c++

I have the following problem:
A program run on a Windows machine (32-bit, 3.1 GB memory, both VC++2008 and MinGW compiled code) fails with a bad_alloc exception after allocating around 1.2 GB; the exception is thrown when trying to allocate a vector of 9 million doubles, i.e. around 75 MB, with plenty of RAM still available (at least according to Task Manager).
The same program run on Linux machines (32-bit, 4 GB memory; 32-bit, 2 GB memory) runs fine with a peak memory usage of around 1.6 GB. Interestingly, the Win32 code generated by MinGW, run on the 4 GB Linux machine under Wine, also fails with a bad_alloc, albeit at a different (later) place than when run under Windows...
What are the possible problems?
Heap fragmentation? (How would I know? How can this be solved?)
Heap corruption? (I have run the code with pageheap.exe enabled with no errors reported; implemented vector access with bounds checking --- again no errors; the code is essentially free of pointers, only std::vectors and std::lists are used. Running the program under Valgrind (memcheck) consumes too much memory and ends prematurely, but does not find any errors.)
Out of memory??? (There should be enough memory)
Moreover, what could be the reason that the Windows version fails while the Linux version works (even on machines with less memory)? (Also note that the /LARGEADDRESSAWARE linker flag is used with VC++2008, if that can have any effect.)
Any ideas would be much appreciated, I am at my wits end with this... :-(

It has nothing to do with how much RAM is in your system. You are running out of virtual address space. A 32-bit Windows process gets a 4GB virtual address space (irrespective of how much RAM you have), split into 2GB for user mode and 2GB for the kernel (or 3GB/1GB if the process is LARGEADDRESSAWARE and the system is booted with /3GB). When you try to allocate memory using new, the OS will try to find a contiguous block of virtual memory large enough to satisfy the request. If your virtual address space is badly fragmented, or you are asking for a huge block of memory, it will fail and throw a bad_alloc exception. Check how much virtual memory your process is using.
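A minimal sketch of such a check, assuming plain Win32 (GlobalMemoryStatusEx is a standard kernel32 call; its virtual fields describe the calling process's user-mode address space, not physical RAM):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        MEMORYSTATUSEX ms;
        ms.dwLength = sizeof(ms);
        if (GlobalMemoryStatusEx(&ms)) {
            // ullTotalVirtual / ullAvailVirtual are about this process's
            // user-mode virtual address space; physical RAM is separate.
            std::printf("address space: %llu MB total, %llu MB still free\n",
                        (unsigned long long)(ms.ullTotalVirtual / (1024 * 1024)),
                        (unsigned long long)(ms.ullAvailVirtual / (1024 * 1024)));
            std::printf("physical RAM:  %llu MB free\n",
                        (unsigned long long)(ms.ullAvailPhys / (1024 * 1024)));
        }
        return 0;
    }

If "still free" is large but a ~75 MB allocation still fails, fragmentation (no single 75 MB hole) is the likely culprit.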

With Windows XP x86 and the default settings, 1.2 GB is about all the address space you have left for your heap after system libraries, your code, the stack and other stuff get their share. Note that LARGEADDRESSAWARE requires you to boot with the /3GB flag to give your process up to 3GB. The /3GB flag causes instability on a lot of XP systems, which is why it's not enabled by default.
Server variants of Windows x86 give you more address space, both by using the 3GB/1GB split and by using PAE to allow the use of your full 4GB of RAM.
Linux x86 uses a 3GB/1GB split by default.
A 64 bit OS would give you more address space, even for a 32bit process.

Are you compiling in Debug mode? If so, each allocation carries a large amount of extra debugging data, which could produce a genuine out-of-memory and the error you have seen. Try Release mode to see if that solves the problem.
I have only experienced this with VC, not MinGW, but I haven't checked the latter either, so this could still explain the problem.

To elaborate more about the virtual memory:
Your application fails when it tries to allocate a single ~75MB array and there is no contiguous block of free virtual address space left where it can fit. You might be able to get a little farther if you switched to data structures that need less contiguous memory -- perhaps a class that approximates a huge array using a tree where all data is kept in 1MB (or so) leaf nodes. Also, in C++, when doing a huge number of allocations, it really helps if all the big allocations are of the same size; this makes memory reuse easier and keeps fragmentation much lower.
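A minimal sketch of that idea (ChunkedArray and its block size are my own, hypothetical choices, not anything from the question's code):

    #include <cstddef>
    #include <vector>

    // Approximates one huge array with fixed-size blocks, so no single
    // large contiguous region of address space is ever required.
    class ChunkedArray {
        static const std::size_t kChunk = 131072;   // 128K doubles = 1 MiB per leaf
        std::vector<std::vector<double> > chunks_;  // equal-sized blocks aid reuse
        std::size_t size_;
    public:
        explicit ChunkedArray(std::size_t n)
            : chunks_((n + kChunk - 1) / kChunk, std::vector<double>(kChunk)),
              size_(n) {}
        double& operator[](std::size_t i) { return chunks_[i / kChunk][i % kChunk]; }
        std::size_t size() const { return size_; }
    };

For 9 million doubles this turns one ~72 MB contiguous request into about 69 independent 1 MiB requests, which a fragmented address space can almost always satisfy; the price is one extra division and indirection per element access.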
However, the correct thing to do in the long run is simply to switch to a 64-bit system.

Related

32 bit process memory leak on x64 processor

I made a 32-bit C++ program which is always run on x64 machines. A client is saying that running 5 instances of this process causes all of their 24 GB of RAM to be used.
Immediately I would think there was a memory leak, but I am unable to reproduce this memory issue.
Doing a bit more research into memory allocations, I found Memory Limits for Windows. This tells me that a 32-bit process will not be allowed more than 2 GB of memory by the OS.
Is it at all possible for a 32-bit application on 64-bit Windows to leak more than 2 GB of memory?
P.S. Killing the process results in the memory being restored to normal operating levels (about 2 GB).
[EDIT] I have now seen that most of the memory being used is Kernel Memory: Nonpaged. Does this mean that it is some system resource which is being used and not a memory leak?
[UPDATE] The problem is not a driver or memory leak. It seems to be a process handle leak. Something is continuously opening new handles to a file. This was found using perfmon to monitor the process. As a rule of thumb, if a process has more than 2000 to 3000 handles you should investigate, especially if that number is increasing every few seconds.
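Besides perfmon, a process can also watch its own handle count. A minimal sketch, assuming XP SP1 or later (where GetProcessHandleCount exists):

    #define _WIN32_WINNT 0x0501  // GetProcessHandleCount needs XP SP1+
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        DWORD handles = 0;
        if (GetProcessHandleCount(GetCurrentProcess(), &handles)) {
            std::printf("open handles: %lu\n", handles);
            // Rule of thumb from above: several thousand handles, or a count
            // that climbs every few seconds, suggests a handle leak.
            if (handles > 3000)
                std::printf("suspiciously many handles\n");
        }
        return 0;
    }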
As stated in Memory Limits for Windows, the limit for a 32-bit process on a 64-bit system is 4 GB with IMAGE_FILE_LARGE_ADDRESS_AWARE set, so your 5 processes could consume 20 GB of memory in total. The flag can be set through the /LARGEADDRESSAWARE linker option, which expands the virtual address space.
It is obviously possible, as the client is experiencing it.
(Maybe you expected some ideas as to how? You don't provide much info or code, so very generally: the memory may not be allocated in the app itself directly. Perhaps the app itself takes only ~1-2 GiB, but it also prods the OS into doing something expensive on its behalf, such as memory-mapping a file of 4+ GiB, or holding device locks where the device driver then misbehaves, etc.)
You should profile the memory usage on the target system to get an idea of how much memory your code actually uses. Then you can try to account for the rest of it.
In general, using the /LARGEADDRESSAWARE:ON linker switch can allow a 32-bit application to use more than 2GB. Using the Address Windowing Extensions can also give access to more memory. But if you aren't using either of these techniques, your application should stay within the 2GB range. And since the upper 2GB of the address range is used for system resources, maybe you are leaking system resources?

Is it true that a 32-bit program can run out of memory if other programs use too much, on 64-bit Windows?

I am developing a 32-bit application and got an out-of-memory error.
I noticed that my Visual Studio and a plugin (other apps too) were using a lot of memory, around 4 or 5 GB.
So I suspected that these programs use up all the memory addresses in which my program is able to find free memory.
I suppose that 32-bit can only use the first 4 GB, and cannot use any other memory at all.
I don't know if I am correct about this; otherwise I will look for other answers, such as a bug in my code.
Your statement that 32-bit can only use the first 4 GB, and cannot use any other memory at all, is definitely incorrect. On a 64-bit OS, all applications can use all of the memory, regardless of their bitness, because the translation table mapping virtual to physical addresses is 64-bit.
Some really ancient hardware may not allow DMA to addresses above 4GB, but I really hope most of that is in the junk-yard by now.
If the system as a whole is running low on memory, it will affect all applications more or less equally.
However, a 32-bit application can by default only use the lower 2GB of its virtual address range (although those 2GB can be placed anywhere in physical memory, as described above, by means of the 64-bit translation table). You can extend this to nearly 4GB (3GB on a 32-bit OS, and subject to the /3GB boot flag in that case) by passing /LARGEADDRESSAWARE to the linker - this simply tells the OS that your application "understands" addresses with the high bit set (which look negative when treated as signed integers) and thus will operate correctly with addresses over 2GB.
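To make "addresses look negative" concrete, a tiny sketch (the address is made up; this assumes a 32-bit build, where pointers and int are the same width):

    #include <cstdio>

    int main()
    {
        // A hypothetical address just above the 2GB line.
        void* p = reinterpret_cast<void*>(0x80000001u);
        // Code that squeezes pointers through signed 32-bit integers
        // suddenly sees a "negative" address.
        int v = static_cast<int>(reinterpret_cast<unsigned int>(p));
        std::printf("pointer as signed int: %d\n", v);  // prints a negative value
        return 0;
    }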
Any system can be brought down by a too heavy load.
But in normal use in Windows and any other virtual memory OS, the memory consumption of other programs does not much affect any given program execution.
Getting an out of memory error is unusual, but it can happen if you make a large allocation or if you declare a large local automatic variable. It can also happen if you fail to properly deallocate memory that's no longer used, i.e. if the program is leaking memory. For a 32-bit program on a 64-bit machine it's then not memory itself that's used up, but available address space within the program.

Compilation hitting virtual memory limitation in g++ 4.7.1?

I'm compiling some code that makes heavy use of templates (it's based on the boost::msm framework). When compiled with g++ 4.7.1, the cc1plus process reaches about 2.4 GB of RAM and then fails with a "virtual memory exhausted: Cannot allocate memory" error.
I'm using a 32-bit compiler (switching to 64-bit is not an option ATM); the machine itself is 64-bit Ubuntu with 16 GB of RAM, and the compilation is performed under a 64-bit chroot of the Debian wheezy distribution. At the time of compilation there is plenty of RAM available, so if the compilation were failing for lack of physically available RAM it would have to reach 4 GB first. I tried playing with "ulimit -m", setting it to different values; smaller sizes cause the compiler to fail earlier, but when it is left "unlimited" the compiler fails at the above-mentioned 2.4 GB.
So I guess something else must be limiting me. Maybe someone encountered a similar issue and knows a way to change the limitation?
In a 32-bit application (compilers included), you typically get somewhere between 2 and 3GB of usermode virtual address space. This limit comes from a combination of address space reserved by the system, fragmentation (there is virtual memory available, just no contiguous chunk big enough to hold whatever block new or malloc is requesting), and "memory reservation", where the process has allocated a fairly large chunk of address space but is not actually using all of it, so it is not populated with physical memory.
Any particular reason you can't use a 64-bit GCC to generate 32-bit code, using -m32? That would be my solution.

allocate more than 1 GB memory on 32 bit XP

I've run into an odd problem: my process cannot allocate more than what seems to be slightly below 1 GiB. The Windows Task Manager "Mem Usage" column shows values close to 1 GiB when my software raises a bad_alloc exception. Yes, I've checked that the value passed to memory allocation is sensible (no race condition / corruption exists that would make this fail). Yes, I need all this memory and there is no way around it (it's a buffer for images, which cannot be compressed any further).
I'm not trying to allocate the whole 1 GiB in one go; there are a few allocations of around 300 MiB each. Would this cause problems? (I'll try to see if making more, smaller allocations works any better.) Is there some compiler switch or something else that I must set in order to get past 1 GiB? I've seen others complaining about the 2 GiB limit, which would be fine for me... I just need a little bit more :). I'm using VS 2005 with SP1, running on 32-bit XP, and it's in C++.
On a 32-bit OS, a process has a 4GB address space in total.
On Windows, half of this is off-limits, so your process has 2GB.
This is 2GB of contiguous address space, but it gets fragmented. Your executable is loaded in at one address, each DLL is loaded at another address, then there's the stack, heap allocations and so on. So while your process probably has enough free address space in total, there are no contiguous blocks large enough to fulfill your requests for memory. So making smaller allocations will probably solve it.
If your application is compiled with the LARGEADDRESSAWARE flag, it will be allowed to use as much of the remaining 2GB as Windows can spare. (And how much that is depends on your platform and environment:
for 32-bit code running on a 64-bit OS, you'll get a full 4-GB address space
for 32-bit code running on a 32-bit OS without the /3GB boot switch, the flag means nothing at all
for 32-bit code running on a 32-bit OS with the /3GB boot switch, you'll get 3GB of address space.
So really, setting the flag is always a good idea if your application can handle it (it's basically a capability flag: it tells Windows that the application can handle more memory, so if Windows can provide it, it should go ahead and give the process as large an address space as possible), but you probably can't rely on it having an effect. Unless you're on a 64-bit OS, it's unlikely to buy you much. (The /3GB boot switch is necessary, and it has been known to cause problems with drivers, especially video drivers.)
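For reference, the usual ways to set the flag (the MSVC forms are standard; the MinGW linker option is my assumption about GNU ld for PE targets):

    // MSVC: set it at link time, or patch an already-built binary:
    //   link /LARGEADDRESSAWARE ...
    //   editbin /LARGEADDRESSAWARE app.exe
    //
    // MinGW (assumed option name in GNU ld for PE targets):
    //   g++ -Wl,--large-address-aware ...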
Allocating big chunks of contiguous memory is always a problem.
You are much more likely to get the memory if you request it in smaller chunks.
You should redesign your memory structures.
You are right to suspect the larger 300MB allocations. Your process will be able to get close to 2GB (3 if you use the /3GB boot.ini switch and LARGEADDRESSAWARE link flag), but not as a large contiguous block.
Typical solutions for this are to break up the requests into tiles or strips of fixed size (say 256x256x4 bytes) and write an intermediate class to hide this representation detail.
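A minimal sketch of such an intermediate class (TiledImage and its layout are hypothetical, sized to the 256x256x4-byte tiles suggested above):

    #include <vector>

    // Presents a flat (x, y) pixel interface over fixed 256 KB tiles,
    // so no single 300 MB contiguous allocation is ever needed.
    class TiledImage {
        static const int kTile = 256;                // 256x256 pixels, 4 bytes each
        int width_, height_, tilesPerRow_;
        std::vector<std::vector<unsigned> > tiles_;  // one 256 KB buffer per tile
    public:
        TiledImage(int w, int h)
            : width_(w), height_(h), tilesPerRow_((w + kTile - 1) / kTile)
        {
            int tilesPerCol = (h + kTile - 1) / kTile;
            tiles_.resize(tilesPerRow_ * tilesPerCol,
                          std::vector<unsigned>(kTile * kTile));
        }
        unsigned& at(int x, int y) {
            int tile = (y / kTile) * tilesPerRow_ + (x / kTile);
            return tiles_[tile][(y % kTile) * kTile + (x % kTile)];
        }
    };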
You can quickly verify the fragmentation theory by writing a small allocation loop that allocates blocks of different sizes.
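For instance, a minimal sketch of such a probe (the starting size and output are my own choices):

    #include <cstddef>
    #include <cstdio>
    #include <new>

    int main()
    {
        // Halve the request until an allocation succeeds; the result is
        // roughly the largest contiguous block the heap can still hand out.
        std::size_t size = 2048u * 1024 * 1024;  // start at 2 GB
        while (size >= 1024 * 1024) {
            char* p = new (std::nothrow) char[size];
            if (p) {
                std::printf("largest single block: %lu MB\n",
                            (unsigned long)(size / (1024 * 1024)));
                delete[] p;
                break;
            }
            size /= 2;
        }
        return 0;
    }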
You could also check this function from MSDN. 1GB rings a bell from here:

This parameter must be greater than or equal to 13 pages (for example, 53,248 on systems with a 4K page size), and less than the system-wide maximum (number of available pages minus 512 pages). The default size is 345 pages (for example, this is 1,413,120 bytes on systems with a 4K page size).

Note, though, that the quoted default of 345 pages is 1,413,120 bytes at a 4K page size, i.e. about 1.4 MB, not anywhere near 1 GB.
When I have a few big allocs like that to do, I use the Windows function VirtualAlloc, to avoid stressing the default allocator.
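For illustration, a minimal sketch of one such allocation, sized like the question's ~300 MiB buffers (VirtualAlloc and VirtualFree are the standard Win32 calls; everything else here is filler):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        const SIZE_T size = 300u * 1024 * 1024;  // ~300 MB, as in the question
        // Goes straight to the OS page allocator, bypassing the CRT heap.
        void* buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        if (buf == NULL) {
            std::printf("VirtualAlloc failed, error %lu\n", GetLastError());
            return 1;
        }
        // ... use the buffer ...
        VirtualFree(buf, 0, MEM_RELEASE);  // with MEM_RELEASE the size must be 0
        return 0;
    }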
Another way forward might be to use nedmalloc in your project.

How to reserve bottom 4GB VM in an x64 C++ app

Working on porting a 32bit Windows C++ app to 64 bit. Unfortunately, the code uses frequent casting in both directions between DWORD and pointer values.
One of the ideas is to reserve the first 4GB of the virtual process space as early as possible during process startup, so that all subsequent memory reservations come from virtual addresses greater than 4 GB. This would cause an access violation on any unsafe cast from pointer to DWORD and back to pointer, and would help catch errors early.
When I look at the memory map of a very simple one-line C++ program, there are many libraries loaded within the bottom 4GB. Is there a way to make sure that all libraries, etc., get loaded only above 4GB?
Thanks
Compile your project with /Wp64 switch (Detect 64-bit Portability Issues) and fix all warnings.
As a programmer, what do I need to worry about when moving to 64-bit windows?
You could insert calls to VirtualAlloc() as early as possible in your application, to allocate memory in the lower 4GB. If you use the MEM_RESERVE parameter, then only virtual memory space is allocated and so this will only use a very small amount of actual RAM.
However, this will only help you for memory allocated from the heap - any static data in your program will have already been allocated before WinMain(), so you won't be able to change its location.
(As an aside, even if you could reserve memory before your main binary was loaded, I think that the main binary needs to be loaded at a specific address - unless it is a built as a position-independent executable.)
Bruce Dawson posted code for a technique to reserve the bottom 4 GB of VM:
https://randomascii.wordpress.com/2012/02/14/64-bit-made-easy/
It reserves most of the address space (not actual memory) using VirtualAlloc, then goes after the process heap with HeapAlloc, and finishes off the CRT heap with malloc. It is straightforward, fast, and works great. On my machine it does about 3.8 GB of virtual allocations and only 1 MB of actual allocations.
The first time I tried it, I immediately found a longstanding bug in the project I was working on. Highly recommended.
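A minimal sketch in the spirit of that technique, covering only the VirtualAlloc pass (Dawson's full version also mops up the process heap via HeapAlloc and the CRT heap via malloc, and the reservations are deliberately never freed):

    #include <windows.h>
    #include <cstdint>

    // Reserve (not commit) every free region below 4 GB so nothing can be
    // allocated there; pointers truncated to 32 bits then fault immediately.
    void ReserveBottom4GB()
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        const uint64_t gran = si.dwAllocationGranularity;  // usually 64 KB
        const uint64_t kTop = 0x100000000ULL;              // 4 GB

        uint64_t addr = gran;
        while (addr < kTop) {
            MEMORY_BASIC_INFORMATION mbi;
            if (VirtualQuery(reinterpret_cast<void*>(addr), &mbi, sizeof(mbi)) == 0)
                break;
            const uint64_t regionEnd =
                reinterpret_cast<uint64_t>(mbi.BaseAddress) + mbi.RegionSize;
            if (mbi.State == MEM_FREE) {
                // The base must sit on an allocation-granularity boundary.
                const uint64_t base = (addr + gran - 1) & ~(gran - 1);
                const uint64_t end  = regionEnd < kTop ? regionEnd : kTop;
                if (end > base)
                    VirtualAlloc(reinterpret_cast<void*>(base),
                                 static_cast<SIZE_T>(end - base),
                                 MEM_RESERVE, PAGE_NOACCESS);  // address space only, no RAM
            }
            addr = regionEnd;
        }
    }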
The best solution is to fix these casts ...
You may get away with truncating the pointer regardless (same as casting to a POINTER_32), because I believe Windows favours the lower 4GB for your application anyway. This is in no way guaranteed, though. You really are best off fixing these problems.
Search the code for "(DWORD)" and fix any you find. There is no better solution ...
What you are asking for is, essentially, to run 64-bit code in a 32-bit memory mode with AWE enabled (i.e. losing all the real advantages of 64-bit). I don't think Microsoft could be bothered providing this for so little gain... and who can blame them?