32 bit process memory leak on x64 processor - c++

I made a 32-bit C++ program which is always run on x64 machines. A client is reporting that running 5 instances of this process causes all of their 24 GB of RAM to be used.
Immediately I would think there was a memory leak but I am unable to reproduce this memory issue.
Doing a bit more research into memory allocations I found Memory Limits for Windows. This tells me that a 32 bit process will not be allowed more than 2 GB of memory by the OS.
Is it at all possible for a memory leak in a 32-bit application on 64-bit Windows to use more than 2 GB?
P.S. Killing the process results in the memory being restored to normal operating levels (about 2 GB).
[EDIT] I have now seen that most of the memory being used is Kernel Memory: Nonpaged. Does this mean that it is some system resource which is being used and not a memory leak?
[UPDATE] The problem is not a driver or memory leak. It seems to be a process handle leak: something is continuously opening new handles to a file. This was found using perfmon to monitor the process. As a rule of thumb, if a process has more than 2000 to 3000 handles you should investigate, especially if that number is increasing every few seconds.
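For anyone who wants to do the same check from code rather than perfmon, a minimal sketch using GetProcessHandleCount looks like this (the program just prints the count once; in practice you would log it periodically):
#include <windows.h>
#include <cstdio>

int main()
{
    // A handle count that keeps climbing every few seconds is the symptom described above.
    DWORD handleCount = 0;
    if (GetProcessHandleCount(GetCurrentProcess(), &handleCount))
        std::printf("open handles: %lu\n", static_cast<unsigned long>(handleCount));
    return 0;
}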

As stated in Memory Limits for Windows, the limit for a 32-bit process on a 64-bit system is 4 GB when IMAGE_FILE_LARGE_ADDRESS_AWARE is set, so your 5 processes could consume 20 GB of memory in total. The flag is set through the /LARGEADDRESSAWARE linker option, which expands the process's virtual address space.
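If you want to check whether a given build actually got the larger address space at runtime, a small diagnostic sketch like the following prints the highest user-mode address the process can use:
#include <windows.h>
#include <cstdio>

int main()
{
    // On a 64-bit OS, a 32-bit binary linked with /LARGEADDRESSAWARE reports a maximum
    // application address just under 4 GB; without the flag, just under 2 GB.
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    std::printf("highest user-mode address: %p\n", si.lpMaximumApplicationAddress);
    return 0;
}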

It is obviously possible, as the client is experiencing it.
(Maybe you expected some ideas as to how? You don't provide much info or code, so in a very general way I would suggest that the memory allocation may not be in the app itself directly. Maybe the app itself only takes ~1-2 GiB, but also drives the OS into doing something wasteful, like keeping a virtual memory mapping of a 4+ GiB file, or holding some device lock where the device driver does something wasteful, etc...)
You should profile the memory usage on the target system to get an idea of how much your code actually uses. Then you can search for the rest of it.

In general, using the /LARGEADDRESSAWARE:ON linker switch can allow a 32-bit application to use more than 2 GB. Using the Address Windowing Extensions can also allow access to more memory. But if you aren't using any of these techniques in your application, then it should be limited to the 2 GB range. However, since the upper 2 GB range is used for system resources, maybe you are leaking system resources?

Related

virtual memory exhausted: Cannot allocate memory with 8 gb ram

My code is 32-bit and I think the compiler is too. But when I am compiling my C++ code, the compilation takes more than 2 GB of memory. As I understand it, on a 32-bit system no process can take more than 2 GB.
Any suggestions on how I can achieve this? I found a lot of posts on this, but they are not helpful because they suggest adding swap. I already have 8 GB of RAM, so my problem is not available memory; it's the size of the compiling process, which cannot be more than 2 GB.
Even though I have 8 GB of RAM, I have tried adding swap and that is not working either.
On 32-bit Windows, the maximum amount of addressable memory is 4 GB. By default, this address space is separated into kernel memory and process memory, both being 2 GB large. Most programs don't need more than 2 GB of memory, but if you do, you can enlarge the process memory by specifying the /3GB switch, leaving less memory for the kernel.
Read here for more information: https://msdn.microsoft.com/en-us/library/windows/hardware/ff556232(v=vs.85).aspx
Edit: Keep in mind that if you want to make use of this additional memory, you also need to link your program with the /LARGEADDRESSAWARE switch. That sets a flag in the executable's header, making Windows aware that your program might need more than 2 GB of memory.
Since you stated you have 8GB of RAM, I am presuming your OS and CPU are actually 64-bit. So you are asking how to make a 32-bit program access more than 2GB of virtual address space, on a 64-bit OS, i.e. running under WOW64.
In that case, using the /LARGEADDRESSAWARE linker option in Visual Studio will give your app 4GB of virtual address space, under WOW64. You won't see any benefit in 32-bit Windows, unless you force your users to boot their OS with a certain flag.
I believe your app doesn't really need more than 2GB of RAM, but it's impossible to tell without knowing any details.
In any case, the one correct answer is: switch to a 64-bit app, which will get you 8TB of virtual address space. That's 8 terabytes.

allocate more than 1 GB memory on 32 bit XP

I've run into an odd problem: my process cannot allocate more than what seems to be slightly below 1 GiB. Windows Task Manager's "Mem Usage" column shows values close to 1 GiB when my software throws a bad_alloc exception. Yes, I've checked that the value passed to memory allocation is sensible (no race condition / corruption exists that would make this fail). Yes, I need all this memory and there is no way around it (it's a buffer for images, which cannot be compressed any further).
I'm not trying to allocate the whole 1 GiB of memory in one go; there are a few allocations of around 300 MiB each. Would this cause problems? (I'll try to see if making more, smaller allocations works any better.) Is there some compiler switch or something else that I must set in order to get past 1 GiB? I've seen others complaining about the 2 GiB limit, which would be fine for me... I just need a little bit more :). I'm using VS 2005 with SP1 and I'm running it on 32-bit XP, and it's in C++.
On a 32-bit OS, a process has a 4GB address space in total.
On Windows, half of this is off-limits, so your process has 2GB.
This is 2GB of contiguous memory. But it gets fragmented. Your executable is loaded in at one address, each DLL is loaded at another address, then there's the stack, and heap allocations and so on. So while your process probably has enough free address space, there are no contiguous blocks large enough to fulfill your requests for memory. So making smaller allocations will probably solve it.
If your application is compiled with the LARGEADDRESSAWARE flag, it will be allowed to use as much of the remaining 2GB as Windows can spare, and how much that is depends on your platform and environment:
for 32-bit code running on a 64-bit OS, you get a full 4 GB address space;
for 32-bit code running on a 32-bit OS without the /3GB boot switch, the flag means nothing at all;
for 32-bit code running on a 32-bit OS with the /3GB boot switch, you get 3 GB of address space.
So really, setting the flag is always a good idea if your application can handle it (it's basically a capability flag. It tells Windows that we can handle more memory, so if Windows can too, it should just go ahead and give us as large an address space as possible), but you probably can't rely on it having an effect. Unless you're on a 64-bit OS, it's unlikely to buy you much. (The /3GB boot switch is necessary, and it has been known to cause problems with drivers, especially video drivers)
Allocating big chunks of contiguous memory is always a problem.
You are much more likely to get the memory in smaller chunks.
You should redesign your memory structures.
You are right to suspect the larger 300MB allocations. Your process will be able to get close to 2GB (3 if you use the /3GB boot.ini switch and LARGEADDRESSAWARE link flag), but not as a large contiguous block.
Typical solutions for this are to break up the requests into tiles or strips of fixed size (say 256x256x4 bytes) and write an intermediate class to hide this representation detail.
You can quickly verify this by writing a small allocation loop that allocates blocks of different sizes.
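For example, a rough probe along these lines (the starting size and halving strategy are arbitrary) reports the largest single block operator new will currently hand out:
#include <cstdio>
#include <cstddef>
#include <new>

int main()
{
    // Start just under 2 GB and halve the request until an allocation succeeds.
    std::size_t size = 2047u * 1024 * 1024;
    while (size > 0)
    {
        char* p = new (std::nothrow) char[size];
        if (p)
        {
            std::printf("largest block obtained: about %lu MB\n",
                        static_cast<unsigned long>(size / (1024 * 1024)));
            delete[] p;
            break;
        }
        size /= 2;
    }
    return 0;
}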
You could also check this function from MSDN. 1GB rings a bell from here:
This parameter must be greater than or equal to 13 pages (for example, 53,248 on systems with a 4K page size), and less than the system-wide maximum (number of available pages minus 512 pages). The default size is 345 pages (for example, this is 1,413,120 bytes on systems with a 4K page size).
Here they mention that the default number of pages allowed for a process is 345, which at a 4K page size works out to about 1.4 MB.
When I have a few big allocs like that to do, I use the Windows function VirtualAlloc, to avoid stressing the default allocator.
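If it helps, a minimal sketch of that approach (the size is illustrative and error handling is trimmed):
#include <windows.h>

int main()
{
    // Reserve and commit a ~300 MB buffer straight from the OS, bypassing the
    // CRT heap. Memory from VirtualAlloc must be released with VirtualFree.
    const SIZE_T bufSize = 300u * 1024 * 1024;
    void* buf = VirtualAlloc(NULL, bufSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (buf != NULL)
    {
        // ... fill the buffer with image data ...
        VirtualFree(buf, 0, MEM_RELEASE);
    }
    return 0;
}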
Another way forward might be to use nedmalloc in your project.

Dealing with large amounts of data in c++

I have an application that sometimes will utilize a large amount of data. The user has the option to load in a number of files which are used in a graphical display. If the user selects more data than the OS can handle, the application crashes pretty hard. On my test system, that number is about the 2 gigs of physical RAM.
What is a good way to handle this situation? I get the "bad alloc" thrown from new and tried trapping that but I still run into a crash. I feel as if I'm treading in nasty waters loading this much data but it is a requirement of this application to handle this sort of large data load.
Edit: I'm testing under a 32 bit Windows system for now but the application will run on various flavors of Windows, Sun and Linux, mostly 64 bit but some 32.
The error handling is not strong: it simply wraps the main instantiation code with a try/catch block, with the catch looking for any exception, per another peer's complaint of not being able to trap the bad_alloc every time.
I think you guys are right: I need a memory management system that doesn't load all of this data into RAM; it just needs to seem like it does.
Edit2: Luther said it best. Thanks. For now, I just need a way to prevent a crash, which with proper exception handling should be possible. But down the road I'll be implementing that accepted solution.
There is the STXXL library, which offers STL-like containers for large datasets.
http://stxxl.sourceforge.net/
Change "large" into "huge": it is designed and optimized for multicore processing of data sets that only fit on terabyte disks. This might suffice for your problem, or the implementation could be a good starting point to tailor your own solution.
It is hard to say anything about your application crashing, because there are numerous hiccups involved when it comes to tight memory conditions: you could hit a hard address space limit (for example, by default 32-bit Windows only gives 2 GB of address space per user process; this can be changed, see http://www.fmepedia.com/index.php/Category:Windows_3GB_Switch_FAQ ), or be eaten alive by the OOM killer (not a mythical beast, see http://lwn.net/Articles/104179/ ).
What I'd suggest in any case is to think about a way to keep the data on disk and treat main memory as a kind of level-4 cache for it. For example, if you have, say, blobs of data, then wrap these in a class which can transparently load the blobs from disk when they are needed, and which registers with some kind of memory manager that can ask some of the blob holders to free up their memory before memory conditions become unbearable; a buffer cache, essentially.
The user has the option to load in a number of files which are used in a graphical display.
The usual trick is not to load the data into memory directly, but rather to use the memory mapping mechanism to make the files look like memory.
You need to make sure that the memory mapping is done in read-only mode to allow the OS to evict it from RAM if it is needed for something else.
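To make that concrete, here is a minimal read-only mapping sketch on Windows (error handling trimmed, and "data.bin" is just a placeholder file name):
#include <windows.h>

int main()
{
    // Map the file read-only so the OS can page it in and out on demand
    // instead of the application holding a private copy in RAM.
    HANDLE file = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    const unsigned char* data =
        static_cast<const unsigned char*>(MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));

    // ... read from data[] as if the whole file were in memory ...
    // (for very large files, map smaller views of the file one at a time)

    UnmapViewOfFile(data);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}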
If the user selects more data than the OS can handle, the application crashes pretty hard.
Depending on the OS, either the application is missing some memory allocation error handling, or you really are hitting the limit of available virtual memory.
Some OSs also have an administrative limit on how large an application's heap can grow.
On my test system, that number is about the 2 gigs of physical RAM.
It sounds like:
your application is 32-bit, and
your OS uses the 2GB/2GB virtual memory split.
To avoid hitting the limit, you need to:
upgrade your app and OS to 64-bit, or
tell the OS (IIRC a patch for Windows; most Linuxes already have it) to use a 3GB/1GB virtual memory split. Some 32-bit OSs use a 2GB/2GB memory split: 2GB of virtual memory for the kernel and 2 for the user application. A 3/1 split means 1GB of VM for the kernel and 3 for the user application.
How about maintaining a header table instead of loading all of the data, and loading the actual page only when the user requests it?
Also use some data compression algorithms (like 7zip, znet etc.) which reduce the file size. (In my project they reduced the size from 200MB to 2MB)
I mention this because it was only briefly mentioned above, but it seems a "file paging system" could be a solution. These systems read large data sets in "chunks" by breaking the files into pieces. Once written, they generally "just work" and you hopefully won't have to tinker with them anymore.
Reading Large Files
Variable Length Data in File--Paging
New Link below with very good answer.
Handling Files greater than 2 GB
Search term: "file paging lang:C++"; add "large" or "above 2GB" for more. HTH.
Not sure if you are hitting it or not, but if you are using Linux, malloc will typically not fail, and operator new will typically not throw bad_alloc. This is because Linux will overcommit, and instead kill your process when it decides the system doesn't have enough memory, possibly at a page fault.
See: Google search for "oom killer".
You can disable this behavior with:
echo 2 > /proc/sys/vm/overcommit_memory
Upgrade to a 64-bit CPU, 64-bit OS and 64-bit compiler, and make sure you have plenty of RAM.
A 32-bit app is restricted to 2GB of memory (regardless of how much physical RAM you have). This is because a 32-bit pointer can address 2^32 bytes == 4GB of virtual memory. 20 years ago this seemed like a huge amount of memory, so the original OS designers allocated 2GB to the running application and reserved 2GB for use by the OS. There are various tricks you can do to access more than 2GB, but they're complex. It's probably easier to upgrade to 64-bit.

Out of memory (?) problem on Win32 (vs. Linux)

I have the following problem:
A program run on a windows machine (32bit, 3.1Gb memory, both VC++2008 and mingw compiled code) fails with a bad_alloc exception thrown (after allocating around 1.2Gb; the exception is thrown when trying to allocate a vector of 9 million doubles, i.e. around 75Mb) with plenty of RAM still available (at least according to task manager).
The same program run on Linux machines (32bit, 4Gb memory; 32bit, 2Gb memory) runs fine with peak memory usage of around 1.6Gb. Interestingly, the win32 code generated by mingw, run on the 4Gb Linux machine under wine, also fails with a bad_alloc, albeit at a different (later) place than when run under Windows...
What are the possible problems?
Heap fragmentation? (How would I know? How can this be solved?)
Heap corruption? (I have run the code with pageheap.exe enabled with no errors reported; implemented vector access with bounds checking, again with no errors; the code is essentially free of pointers, only std::vectors and std::lists are used. Running the program under Valgrind (memcheck) consumes too much memory and ends prematurely, but does not find any errors.)
Out of memory??? (There should be enough memory)
Moreover, what could be the reason that the Windows version fails while the Linux version works (and even on machines with less memory)? (Also note that the /LARGEADDRESSAWARE linker flag is used with VC++2008, if that can have any effect.)
Any ideas would be much appreciated, I am at my wits end with this... :-(
It has nothing to do with how much RAM is in your system; you are running out of virtual address space. A 32-bit Windows process gets a 4 GB virtual address space (irrespective of how much RAM you have), of which 2 GB is for user mode (3 GB in the case of LARGEADDRESSAWARE) and 2 GB for the kernel. When you try to allocate memory using new, the OS will try to find a contiguous block of virtual memory large enough to satisfy the request. If your virtual address space is badly fragmented, or you are asking for a huge block of memory, it will fail and throw a bad_alloc exception. Check how much virtual memory your process is using.
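One quick way to check this from inside the process (a rough diagnostic sketch, nothing more) is GlobalMemoryStatusEx, which reports the calling process's total and free virtual address space:
#include <windows.h>
#include <cstdio>

int main()
{
    // ullTotalVirtual/ullAvailVirtual describe the process's virtual address
    // space, which is the number that matters here, not physical RAM.
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof(ms);
    if (GlobalMemoryStatusEx(&ms))
    {
        std::printf("virtual address space: %lu MB total, %lu MB free\n",
                    static_cast<unsigned long>(ms.ullTotalVirtual / (1024 * 1024)),
                    static_cast<unsigned long>(ms.ullAvailVirtual / (1024 * 1024)));
    }
    return 0;
}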
With Windows XP x86 and the default settings, 1.2 GB is about all the address space you have left for your heap after system libraries, your code, the stack and other stuff get their share. Note that largeaddressaware requires you to boot with the /3GB boot flag to try to give your process up to 3GB. The /3GB flag causes instability on a lot of XP systems, which is why it's not enabled by default.
Server variants of Windows x86 give you more address space, both by using the 3GB/1GB split and by using PAE to allow the use of your full 4GB of RAM.
Linux x86 uses a 3GB/1GB split by default.
A 64 bit OS would give you more address space, even for a 32bit process.
Are you compiling in Debug mode? If so, the allocation will generate a huge amount of debugging data which might generate the error you have seen, with a genuine out-of-memory. Try in Release to see if that solves the problem.
I have only experienced this with VC, not MinGW, but then I haven't checked either; this could still explain the problem.
To elaborate more about the virtual memory:
Your application fails when it tries to allocate a single block of around 75 MB, and there is no contiguous space of virtual memory left where this can fit. You might be able to get a little farther if you switched to data structures that use less memory: perhaps some class that approximates a huge array by using a tree where all data is kept in 1 MB (or so) leaf nodes. Also, in C++, when doing a huge number of allocations it really helps if all the big allocations are of the same size; this helps reuse memory and keeps fragmentation much lower.
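A minimal sketch of that kind of chunked container (the 1 MB chunk size and the class name are arbitrary choices) could look like this:
#include <vector>
#include <cstddef>

// A "huge array" built from fixed-size chunks so that no single allocation
// has to be contiguous.
class ChunkedBuffer
{
public:
    explicit ChunkedBuffer(std::size_t size) : size_(size)
    {
        std::size_t nChunks = (size + kChunkSize - 1) / kChunkSize;
        chunks_.resize(nChunks);
        for (std::size_t i = 0; i < nChunks; ++i)
            chunks_[i].resize(kChunkSize);
    }
    unsigned char& operator[](std::size_t i)
    {
        return chunks_[i / kChunkSize][i % kChunkSize];
    }
    std::size_t size() const { return size_; }
private:
    static const std::size_t kChunkSize = 1024 * 1024;
    std::vector<std::vector<unsigned char> > chunks_;
    std::size_t size_;
};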
However, the correct thing to do in the long run is simply to switch to a 64-bit system.

How much memory should you be able to allocate?

Background: I am writing a C++ program working with large amounts of geodata, and wish to load large chunks to process at a single go. I am constrained to working with an app compiled for 32 bit machines. The machine I am testing on is running a 64 bit OS (Windows 7) and has 6 gig of ram. Using MS VS 2008.
I have the following code:
byte* pTempBuffer2[3];
try
{
    //size_t nBufSize = nBandBytes*m_nBandCount;
    pTempBuffer2[0] = new byte[nBandBytes];
    pTempBuffer2[1] = new byte[nBandBytes];
    pTempBuffer2[2] = new byte[nBandBytes];
}
catch (const std::bad_alloc&)
{
    // If we didn't get the memory, just don't buffer and we will get data
    // one piece at a time.
    return;
}
I was hoping that I would be able to allocate memory until the app reached the 4 gigabyte limit of 32-bit addressing. However, when nBandBytes is 466,560,000 the new throws std::bad_alloc on the second try. At this stage, the working set (memory) value for the process is 665,232 K, so I don't seem to be able to get even a gig of memory allocated.
There has been some mention of a 2 gig limit for applications in 32 bit Windows which may be extended to 3 gig with the /3GB switch for win32. This is good advice under that environment, but not relevant to this case.
How much memory should you be able to allocate under the 64 bit OS with a 32 bit application?
As much as the OS wants to give you. By default, Windows lets a 32-bit process have 2GB of address space. And this is split into several chunks. One area is set aside for the stack, others for each executable and dll that is loaded. Whatever is left can be dynamically allocated, but there's no guarantee that it'll be one big contiguous chunk. It might be several smaller chunks of a couple of hundred MB each.
If you link with the LargeAddressAware flag, 64-bit Windows will let you use the full 4GB address space, which should help a bit, but in general:
you shouldn't assume that the available memory is contiguous, so work with multiple smaller allocations rather than a few big ones, and
you should compile it as a 64-bit application if you need a lot of memory.
On 32-bit Windows, a normal process can take at most 2 GB, but with the /3GB switch it can reach 3 GB (for Windows 2003).
In your case, though, I think you are allocating contiguous memory, and so the exception occurred.
You can allocate as much memory as your page file will let you - even without the /3GB switch, you can allocate 4GB of memory without much difficulty.
Read this article for a good overview of how to think about physical memory, virtual memory, and address space (all three are different things). In a nutshell, you have exactly as much physical memory as you have RAM, but your app really has no interaction with that physical memory at all; it's just a convenient place to store the data that is in your virtual memory. Your virtual memory is limited by the size of your pagefile, and the amount your app can use is limited by how much other apps are using (although you can allocate more, provided you don't actually use it). Your address space in the 32-bit world is 4GB. Of that, 2 GB is allocated to the kernel (or 1 GB if you use the /3GB switch). Of the 2 GB that is left, some is going to be used up by your stack, some by the program you are currently running (and all the DLLs, etc.). It's going to get fragmented, and you are only going to be able to get so much contiguous space; this is where your allocation is failing. But since that address space is just a convenient way to access the virtual memory you have allocated, it's possible to allocate much more memory and bring chunks of it into your address space a few at a time.
Raymond Chen has an example of how to allocate 4GB of memory and map part of it into a section of your address space.
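The general idea (not his exact code) is to create a pagefile-backed section bigger than what the 32-bit process can map at once, and then map smaller views of it as needed. A rough sketch, with sizes chosen arbitrarily and assuming the pagefile can back the whole section:
#include <windows.h>

int main()
{
    // Create a 4 GB pagefile-backed section. A 32-bit process cannot map it
    // all at once, but it can map (and remap) smaller views into it.
    const unsigned long long sectionSize = 4ULL * 1024 * 1024 * 1024;
    HANDLE section = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                        (DWORD)(sectionSize >> 32), (DWORD)sectionSize, NULL);
    if (section == NULL)
        return 1;

    // Map a 256 MB window of the section at offset 1 GB, use it, then unmap it.
    void* view = MapViewOfFile(section, FILE_MAP_ALL_ACCESS,
                               0, 1024u * 1024 * 1024, 256u * 1024 * 1024);
    if (view != NULL)
    {
        // ... read and write through view ...
        UnmapViewOfFile(view);
    }
    CloseHandle(section);
    return 0;
}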
The maximum allocatable is 16 TB under 32-bit Windows and 256 TB under 64-bit Windows.
And if you're really into how memory management works in Windows, read this article.
During the Elephants Dream project, the Blender Foundation had similar problems with Blender 3D (though on Mac). I can't include the link, but Google "blender3d memory allocation problem" and it will be the first item.
The solution involved File Mapping. Haven't tried it myself but you can read up on it here: http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx
With nBandBytes at 466,560,000, you are trying to allocate 1.4 GB. A 32-bit app typically only has access to 2 GB of memory (more if you boot with /3GB and the executable is marked as large address aware). You may be hard pressed to find blocks of contiguous address space that large for your chunks of memory.
If you want to allocate gigabytes of memory on a 64-bit OS, use a 64-bit process.
You should be able to allocate a total of about 2GB per process. This article (PDF) explains the details. However, you probably won't be able to get a single, contiguous block that is even close to that large.
Even if you allocate in smaller chunks, you might not be able to get the memory you need, especially if the surrounding program has unpredictable memory behavior, or if you need to run on different operating systems. In my experience, the heap space of a 32-bit process caps out at around 1.2 GB.
At this amount of memory, I would recommend manually writing to disk. Wrap your arrays in a class that manages the memory and writes to temporary files when necessary. Hopefully the characteristics of your program are such that you could effectively cache parts of that data without hitting the disk too much.
Sysinternals VMMap is great for investigating virtual address space fragmentation, which is probably limiting how much contiguous memory you can allocate. I recommend setting it to display free space, then sorting by size to find the largest free areas, then sorting by address to see what is separating the largest free areas (probably rebased DLLs, shared memory regions, or other heaps).
Avoiding extremely large contiguous allocations is probably for the best, as others have suggested.
Setting LARGE_ADDRESS_AWARE=YES (as jalf suggested) is good, as long as the libraries that your application depends on are compatible with it. If you do so, you should test your code with the AllocationPreference registry key set to enable top-down virtual address allocation.