I'm currently looking into memory consumption issues of a C++ application that I have written (a rendering engine using OpenGL) and have stumbled upon a rather unusual problem:
I'm using my own allocators basically everywhere in the system, which all obtain their memory from a default allocator which is using malloc()/free() for the actual memory.
It turns out that my application is always reserving at least 4096 bytes (the page size on my system) for every allocation through malloc(), even if the size is significantly smaller.
malloc(8) or even malloc(1) both result in an increase of memory of 4096 bytes. I'm tracking the used memory size through GetProcessMemoryInfo() directly before and after the allocation, as well as through the TaskManager (which basically shows the same values). Interestingly, using _msize(ptr) returns the correct size of the pointer.
I can only reproduce this behaviour within my own application, testing it with a new VS2012 C++ project did not yield the same results. This behaviour also seems independent of the current reserved size of the application, even with more than 10GB of free RAM it always reserves at least 4K per allocation.
I have no deep knowledge of the innards of the Windows operating system (if it is at all related to the OS), so if anyone has an idea what could cause this behaviour I would be greatful!
Check this, it's from 1993 :-)
http://msdn.microsoft.com/en-us/library/ms810603.aspx
This does not mean that the smallest amount of memory that can be allocated in a heap is 4096 bytes; rather, the heap manager commits pages of memory as needed to satisfy specific allocation requests. If, for example, an application allocates 100 bytes via a call to GlobalAlloc, the heap manager allocates a 100-byte chunk of memory within its committed region for this request. If there is not enough committed memory available at the time of the request, the heap manager simply commits another page to make the memory available.
You might be running with "full page heap"... a diagnostic mode to help more quickly catch memory access errors in your code.
Related
My question is a bit naive. I'm willing to have an overview as simple as possible and couldn't find any resource that made it clear to me. I am a developer and I want to understand what exactly is the memory displayed in the "memory" column by default in Windows Task Manager:
To make things a bit simpler, let's forget about the memory the process shares with other processes, and imagine the shared memory is negligible. Also I'm focussed on the big picture and mainly care for things at GB level.
As far as I know, the memory reserved by the process called "virtual memory", is partly stored in the main memory (RAM), partly on the disk. The system decides what goes where. The system basically keeps in RAM the parts of the virtual memory that is accessed sufficiently frequently by the process. A process can reserve more virtual memory than RAM available in the computer.
From a developer point of view, the virtual memory may only be partially allocated by the program through its own memory manager (with malloc() or new X() for example). I guess the system has no awareness of what part of the virtual memory is allocated since this is handled by the process in a "private" way and depends on the language, runtime, compiler... Q: Is this correct?
My hypothesis is that the memory displayed by the task manager is essentially the part of the virtual memory being stored in RAM by the system. Q: Is it correct? And is there a simple way to know the total virtual memory reserved by the process?
Memory on windows is... extremely complicated and asking 'how much memory does my process use' is effectively a nonsensical question. TO answer your questions lets get a little background first.
Memory on windows is allocated via ptr = VirtualAlloc(..., MEM_RESERVE, ...) and committed later with VirtualAlloc(ptr+n, MEM_COMMIT, ...).
Any reserved memory just uses up address space and so isn't interesting. Windows will let you MEM_RESERVE terabytes of memory just fine. Committing the memory does use up resources but not in the way you'd think. When you call commit windows does a few sums and basically works out (total physical ram + total swap - current commit) and lets you allocate memory if there's enough free. BUT the windows memory manager doesn't actually give you physical ram until you actually use it.
Later, however, if windows is tight for physical RAM it'll swap some of your RAM out to disk (it may compress it and also throw away unused pages, throw away anything directly mapped from a file and other optimisations). This means your total commit and total physical ram usage for your program may be wildly different. Both numbers are useful depending on what you're measuring.
There's one last large caveat - memory that is shared. When you load DLLs the code, the read-only memory [and even maybe the read/write section but this is COW'd] can be shared with other programs. This means that your app requires that memory but you cannot count that memory against just your app - after all it can be shared and so doesn't take up as much physical memory as a naive count would think.
(If you are writing a game or similar you also need to count GPU memory but I'm no expert here)
All of the above goodness is normally wrapped up by the heap the application uses and you see none of this - you ask for and use memory. And its just as optimal as possible.
You can see this by going to the details tab and looking at the various options - commit-size and working-set are really useful. If you just look at the main window in task-manager and it has a single value I'd hope you understand now that a single value for memory used has to be some kind of compromise as its not a question that makes sense.
Now to answer your questions
Firstly the OS knows exactly how much memory your app has reserved and how much it has committed. What it doesn't know is if the heap implementation you (or more likely the CRT) are using has kept some freed memory about which it hasn't released back to the operation system. Heaps often do this as an optimisation - asking for memory from the OS and freeing it back to the OS is a fairly expensive operation (and can only be done in large chunks known as pages) and so most of them keep some around.
Second question: Dont use that value, go to details and use the values there as only you know what you actually want to ask.
EDIT:
For your comment, yes, but this depends on the size of the allocation. If you allocate a large block of memory (say >= 1MB) then the heap in the CRT generally directly defers the allocation to the operating system and so freeing individual ones will actually free them. For small allocations the heap in the CRT asks for pages of memory from the operating system and then subdivides that to give out in allocations. And so if you then free every other one of those you'll be left with holes - and the heap cannot give those holes back to the OS as the OS generally only works in whole pages. So anything you see in task manager will show that all the memory is still used. Remember this memory isn't lost or leaked, its just effectively pooled and will be used again if allocations ask for that size. If you care about this memory you can use the crt heap statistics famliy of functions to keep an eye on those - specifically _CrtMemDumpStatistics
An object tries to allocate more memory then the allowed virtual address space (2Gb on win32). The std::bad_alloc is caught and the the object released. Process memory usage drops and the process is supposed to continue; however, any subsequent memory allocation fails with another std::bad_alloc. Checking the memory usage with VMMap showed that the heap memory appears to be released but it is actually marked as private, leaving no free space. The only thing to do seems to quit and restart. I would understand a fragmentation problem but why can't the process have the memory back after the release?
The object is a QList of QLists. The application is multithreaded. I could make a small reproducer but I could reproduce the problem only once, while most of the times the reproduces can use again the memory that was freed.
Is Qt doing something sneaky? Or maybe is it win32 delaying the release?
As I understand your problem, you are allocating large amounts of memory from heap which fails at some point. Releasing the memory back to the process heap does not necesarily mean that the heap manager actually frees the virtual pages that contain only free blocks of the heap (due to performance reasons). So, if you try to allocate a virtual memory directly (VirtualAlloc or VirtualAllocEx), the attempt fails since nearly all memory is consumed by the heap manager that has no chance of knowing about your direct allocation attempt.
Well, what you can possibly do with this. You can create your own heap (HeapCreate) and limit its maximum size. That may be quite tricky, since you need to persuade Qt to use this heap.
When allocating large amounts of memory, I recommend using VirtualAlloc rather than heap functions. If the requested size is >= 512 KB, the heap mamanger actually uses VirtualAlloc to satisfy your request. However, I don't know if it actually releases the pages when you free the region, or whether it starts using it for satisfying other heap allocation requests.
The answer by Martin Drab put me on the right path. Investigating about the heap allocations I found this old message that clarifies what is going on:
The issue here is that the blocks over 512k are direct calls to
VirtualAlloc, and everything else smaller than this are allocated out
of the heap segments. The bad news is that the segments are never
released (entirely or partially) so ones you take the entire address
space with small blocks you cannot use them for other heaps or blocks
over 512 K.
The problem is not Qt-related but Windows-related; I could finally reproduce it with a plain std::vector of char arrays. The default heap allocator leaves the address space segments unaltered even after the correspondent allocation was explicitly released. The ratio is that the process might ask again buffers of a similar size and the heap manager will save time reusing existent address segments instead of compacting older ones to create new ones.
Please note this has nothing to do with the amount of physical nor virtual memory available. It's only the address space that remains segmented, even though those segments are free. This is a serious problem on 32 bit architectures, where the address space is only 2Gb large (can be 3).
This is why the memory was marked as "private", even after being released, and apparently not usable by the same process for average-sized mallocs even though the committed memory was very low.
To reproduce the problem, just create a huge vector of chunks smaller than 512Kb (they must be allocated with new or malloc). After the memory is filled and then released (no matter if the limit is reached and an exception caught or the memory is just filled with no error), the process won't be able to allocate anything bigger than 512Kb. The memory is free, it's assigned to the same process ("private") but all the buckets are too small.
But there are worse news: there is apparently no way to force a compaction of the heap segments. I tried with this and this but had no luck; there is no exact equivalent of POSIX fork() (see here and here). The only solution is to do something more low level, like creating a private heap and destroying it after the small allocations (as suggested in the message cited above) or implementing a custom allocator (there might be some commercial solution out there). Both quite infeasible for large, existent software, where the easiest solution is to close the process and restart it.
So I have a native C++ application, and it needs to keep track of lots of things over long periods of time. It's running out of memory when task manager says that the process reaches somewhere between 800 and 1200 MB of memory, when the limit should be about 2GB.
I finally got a clue as to what's going on when I ran VMMap against my process, but that just gave me more questions. What I discovered:
The total size (type: total, column: size) is much larger than what task manager/process explorer were reporting
The total size seems to actually be the value that can't exceed 2GB before my program runs out of memory
The memory usage discrepency is almost entirely caused by "Private Data" - there is much more "size" than there is "committed". I have seen cases where there were around 800MB of committed private data, but a "Size" of around 1700MB.
The largest blocks of "Private Data" mainly consist of a pattern of pairs of one small sub-block (between 4K and 16K, generally) that has "Read/Write" protection and is fully committed, and one larger sub-block (between 90K and 400K) that has the "Reserved" protection and is not committed. This seems like a huge waste of resources. And there's usually one large (many megabytes) sub-block at the end that is "Reserved" and not committed.
The small part of the pair generally has strings that I recognize, while the larger block has no strings at all.
An example of these sub-block pairs: (not my application, but the idea is the same)
http://www.flickr.com/photos/95123032#N00/5280550393
It seems as though when one block of private data gets fully committed, a new block (usually the same or double the size of the previous largest block) gets allocated. Sounds fair. However, I have seen 3 blocks, all more than 100MB each, with less than 30MB committed. My application shouldn't behave in such a way (i.e. use up 400MB then shrink by 300MB in a matter of a few hours) that that would be possible.
As far as I can tell, the "Size" is the actual amount of virtual memory address space that has been allocated. "Committed" is the amount of "Size" that is actually being used (i.e. through calls to new/malloc). If that is indeed the case, then why is there such a huge discrepency between Size and Commited? And why is it allocating blocks that are multiple hundreds of megabytes in size?
The somewhat strange thing is that the behavior is entirely different when running on Windows 7. Whereas on 2003 Server, the application uses Private Data, on Windows 7, the application uses Heap. So...why? Why does VMMap show primarily private data usage on 2003, but primarily heap usage on 7? What's the difference? Well one difference is that I can't use the "Heap Allocations..." button in VMMap to see where all of that Private Data is being allocated.
I was beginning to wonder if excessive use of std::string was causing this problem since the strings that I recognized in the pairs (mentioned above) primarily consisted of strings stored in std::string that were frequently being created and destroyed (implying lots of memory allocation/deallocation). I converted all I could to use character arrays or using memory from a memory pool, but that seems to have had no effect. All of my other objects that are new/deleted frequently already have their own memory pools.
I also found out about the low fragmentation heap, so I tried enabling that, but it also didn't make a difference. I'm thinking it's because windows 2003 is not actually using the heap proper. VMMap shows that the low fragmentation heap is enabled, but since it's not actually used (i.e. it's using Private Data instead), it doesn't actually make a difference.
What actually seems to be happening is that those sub-block pairs are fragmenting the large Private Data blocks, which is causing the OS to allocate new blocks. Eventually, the fragmentation gets so bad that even though there's lots of uncommitted space, none of it seems to be usable and the process runs out of memory.
So my questions are:
Why is Windows Server 2003 using Private Data instead of Heap? Does it matter?
Is there a way to make Windows Server 2003 use Heap memory instead?
If so, would that improve my situation at all?
Is there any way to control how Private Data is allocated by the OS's memory allocator?
Is it possible to create my own custom heap and allocate off of that (without changing the majority of my codebase), and could that improve my situation? I know it's possible to make custom heaps, but as far as I can tell, you need to explicitly allocate from the custom heap instead of just calling new or just using STL containers normally.
Is there anything I'm missing or would be worth trying?
Private data is just a classification for all the memory that is not shared between two or more processes. Heap, relocated dll pages, stacks of all the threads in a process, unshared memory mapped files etc. fall in to the category of private data.
A request for memory from a process (via VirtualAlloc) would be failed by OS when one of the condition is true,
Contiguous virtual address space (not memory) is not available to hold the size requested.
The commit charge - the total memory committed memory of all the process and the operating system - has reached it's upper limit (that being RAM + page file size)
Apart from this Heap allocations may fail for their own reasons like, during expansion they would actually try to acquire more memory that the size of the allocation request that triggered the expansion - and if that fails they might just fail - though the actual requested size might be available through VirtualAlloc.
Few things that tend to accumulate memory are,
Having many heaps - they would hog memory - because they keep more in reserve. Many heaps means a lot of reserved space probably going unused. Heap compaction might help.
STL containers like vector and map might not shrink after elements are removed from them. Compacting them might help too.
Libraries like COM do some caching and thus accumulate memory - might help to investigate individual libraries to know about their memory hogging habits.
when task manager says that the process reaches somewhere between 800 and 1200 MB of memory, when the limit should be about 2GB
Probably you are looking at "Working Set" in Task Manager whereas the 2GB limit is on Virtual Memory. Task Manager doesn't show the amount of VM reserved; it will show the amount committed.
"Committed" is the amount of "Size" that is actually being used (i.e. through calls to new/malloc).
No, Committed means you actually touched the page (i.e. went to the address and did a load or store operation).
1.Why is Windows Server 2003 using Private Data instead of Heap?
According to "Windows Sysinternals Administrator's Reference" by Mark Russinovich and Aaron Margosis:
Private Data memory is memory that is allocated by VirtualAlloc and
that is not further handled by the Heap Manager or by the .Net runtime
So either your program is managing its memory differently on the two OS's, or VMmap is unable to detect the way in which this memory is being managed as a heap on Windows Server 2003.
4.Is there anything I'm missing or would be worth trying?
You can run with a 3GB limit on 32-bit OS and a 4GB limit for 32-bit processes on 64-bit OS. Google for "/3G" and "/4G".
A great source of information on this kind of stuff is the book "Windows Internals 6th Edition" by Mark Russinovich, David Solomon and Alex Ionescu.
I'm encountering the same issue.
In windows 2003, my application causes out of memory exception in a C++/CLI module when trying to allocate a 22MB array using gcnew. The same process works fine in windows 7.
VMMap shows the "private data" entry is almost 2 GB in win2003. After I enable /3GB flag, this entry also increased to almost 3GB. The "heap" entry is about 14 MB and the "managed heap" is nothing!
In windows 7, the "private data" is only 62 MB, the "heap" is 316MB and "managed heap" is 397MB. The entire memory usage is much less than win2003.
I'm using OS X 10.5.6. I have a C++ application with a GUI made with Qt. When I start my application it uses 30 MB of memory (reported by OS X Activity Monitor RSIZE).
I use this application to read in text files to memory, parse the data and finally visualize it. If I open (read to memory, parse, visualize) a 9 MB text file Activity Monitor reports that my application grows from the initial 30 MB of memory used to 103 MB.
Now if the file is closed and the parsed and visualized data is deleted, the size of the application stays at 103 MB. This sounds like a memory leak to me. But if I open the file again, reading it to memory, parsing it and visualizing it the application stays at 103 MB. No matter how many times I open the file (or another file of the same size) my applications memory use stays more or less unchanged. Does this mean that it's not a memory leak? If it was a leak the memory usage should keep growing each time the file is opened should it not? The only time it grows is if I open a larger file than the previous one.
Is this normal? Is this platform or library dependent? Is this some sort of caching done by the OS or libraries?
This seems relatively normal, but all OS are slightly different.
In the usual application life cycle the application requests memory from the OS and is given memory in huge chunks that it manages (via the C/C++ standard libraries). As the application acquires/releases memory this is all done internally within the application without recourse to the OS until the application has non left then a call is made to the OS for another huge chunk.
Memory is not usually returned to the OS until the application quits (though most OS do provide the mechanisms to do this if required and some C/C++ standard libraries will use this facility). Instead of returning memory to the OS the application uses everything it has been given and does its own memory management.
Though note: just because an application has memory does not mean that this is currently taking up RAM on a chip. Memory that is sporadically used or has not been used in a while will be temporarily saved onto secondary/tertiary storage.
Activity Monitor: Is not a very useful tool for checking memory usage, as you have discovered it only displays the total actually allocated to the application. It does not display any information about how the application has internal allocated this memory (most of which could be deallocated). Check the folder where XCode lives, there are a broad set of tools for examining how an application works provided with the development environment.
NB: I have avoided using terms like page etc as these are nothing to-do with C/C++/Objective C and are all OS/hardware specific.
This sounds like a memory fragmentation problem to me. Memory is acquired from OS in pages. Pages are usually several kB large, e.g. 4 kB. Now if you allocate, let's say, 100 MB of RAM for your objects, your memory allocator (new / malloc) asks OS for many free memory pages and allocates your objects on them. When your application finishes computations and deletes some, even most of, but not all of the previously allocated objects, the objects that were not deleted hold pages and disallow to return them back to the OS. A page can be returned only if all its memory is freed. So in extreme cases, an 8B object can prevent a full 4kB page from being returned.
The OS reports memory consumption by calculating the number of pages committed to your application, not by counting how much space your objects take on these pages. So if your memory is fragmented, the pages remain committed, and reported memory consumption stays the same.
The memory consumption does not grow on the second run, because on the second run the allocator reuses, previously acquired, mostly free pages.
The solution for fragmentation problems is usually preallocating a larger block of memory and using a custom memory allocator to allocate objects with similar lifetime from this larger block. Then, when you're done with objects, delete the whole block.
Another solution is switching to a fully garbage collected environment like Java or .NET - they have compacting garbage collectors that prevent such problems.
My Windows/C++ application allocates ~1Gb of data in memory with the operator new and processes this data. After processing the data is deleted.
I noticed that if I run the processing again without exiting the application, the second call to the operatornew to allocate ~1Gb of data fails.
I would expect Windows to deliver the memory back. Could this be managed in a better way with some other Win32 calls etc.?
I don't think this is a Windows problem. Check if you used delete or delete[] correctly. Perhaps it would help if you post the code that is allocating/freeing the memory.
In most runtime environments memory allocated to an application from the operating system remains in the application, and is seldom returned back to the operating system. Freeing a memory block allows you to reuse the block from within the application, but does not free it to the operating system to make it available to other applications.
Microsoft's C runtime library tries to return memory back to the operating system by having _heapmin_region call _heap_free_region or _free_partial_region which call VirtualFree to release data to the operating system. However, if whole pages in the corresponding region are not empty, then they will not be freed. A common cause of this is the bookkeeping information and storage caching of C++ containers.
This could be due to memory fragmentation (in reality, address space fragmentation), where various factors have contributed to your program address space not having a 1gb contiguous hole available. In reality, I suspect a bug in your memory management (sorry) - have you run your code through leak detection?
Since you are using very large memory blocks, you should consider using VirtualAlloc() and VirtualFree(), as they allow you to allocate and free pages directly, without the overhead (in memory and time) of interacting with a heap manager.
Since you are using C++, it is worth noting that you can construct C++ objects in the memory you allocate this way by using placement new.
This problem is almost certainly memory fragmentation. On 32 bit Windows the largest contiguous region you can allocate is about 1.1GB (because various DLLs in your EXE preventa larger contiguous range being available). If after deallocating a memory allocation (or a DLL load, or a memory mapped file) ends up in the middle of your previous 1GB region then there will no longer be a 1GB region available for your next call to new to allocate 1GB. Thus it will fail.
You can visualize this process using VM Validator.