So I have a native C++ application that needs to keep track of lots of things over long periods of time. It runs out of memory when Task Manager says the process has reached somewhere between 800 and 1200 MB, even though the limit should be about 2GB.
I finally got a clue as to what's going on when I ran VMMap against my process, but that just gave me more questions. What I discovered:
The total size (type: total, column: size) is much larger than what task manager/process explorer were reporting
The total size seems to actually be the value that can't exceed 2GB before my program runs out of memory
The memory usage discrepancy is almost entirely caused by "Private Data" - there is much more "Size" than there is "Committed". I have seen cases where there were around 800MB of committed private data, but a "Size" of around 1700MB.
The largest blocks of "Private Data" mainly consist of a pattern of pairs of one small sub-block (between 4K and 16K, generally) that has "Read/Write" protection and is fully committed, and one larger sub-block (between 90K and 400K) that has the "Reserved" protection and is not committed. This seems like a huge waste of resources. And there's usually one large (many megabytes) sub-block at the end that is "Reserved" and not committed.
The small part of the pair generally has strings that I recognize, while the larger block has no strings at all.
An example of these sub-block pairs: (not my application, but the idea is the same)
http://www.flickr.com/photos/95123032#N00/5280550393
It seems as though when one block of private data gets fully committed, a new block (usually the same or double the size of the previous largest block) gets allocated. Sounds fair. However, I have seen 3 blocks, all more than 100MB each, with less than 30MB committed. My application doesn't behave in a way (i.e. using up 400MB and then shrinking by 300MB within a few hours) that would make that possible.
As far as I can tell, the "Size" is the actual amount of virtual memory address space that has been allocated. "Committed" is the amount of "Size" that is actually being used (i.e. through calls to new/malloc). If that is indeed the case, then why is there such a huge discrepancy between Size and Committed? And why is it allocating blocks that are multiple hundreds of megabytes in size?
The somewhat strange thing is that the behavior is entirely different when running on Windows 7. Whereas on 2003 Server the application uses Private Data, on Windows 7 it uses Heap. So... why? Why does VMMap show primarily Private Data usage on 2003, but primarily Heap usage on 7? What's the difference? Well, one difference is that I can't use the "Heap Allocations..." button in VMMap to see where all of that Private Data is being allocated.
I was beginning to wonder if excessive use of std::string was causing this problem, since the strings that I recognized in the pairs (mentioned above) were primarily std::strings that were frequently being created and destroyed (implying lots of memory allocation/deallocation). I converted all I could to use character arrays or memory from a memory pool, but that seems to have had no effect. All of my other objects that are new/deleted frequently already have their own memory pools.
I also found out about the low fragmentation heap, so I tried enabling that, but it also didn't make a difference. I'm thinking it's because Windows 2003 is not actually using the heap proper. VMMap shows that the low fragmentation heap is enabled, but since the heap isn't actually being used (i.e. the application is using Private Data instead), it doesn't actually make a difference.
What actually seems to be happening is that those sub-block pairs are fragmenting the large Private Data blocks, which is causing the OS to allocate new blocks. Eventually, the fragmentation gets so bad that even though there's lots of uncommitted space, none of it seems to be usable and the process runs out of memory.
So my questions are:
Why is Windows Server 2003 using Private Data instead of Heap? Does it matter?
Is there a way to make Windows Server 2003 use Heap memory instead?
If so, would that improve my situation at all?
Is there any way to control how Private Data is allocated by the OS's memory allocator?
Is it possible to create my own custom heap and allocate off of that (without changing the majority of my codebase), and could that improve my situation? I know it's possible to make custom heaps, but as far as I can tell, you need to explicitly allocate from the custom heap instead of just calling new or just using STL containers normally (a rough sketch of the kind of thing I mean follows this list).
Is there anything I'm missing or would be worth trying?
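For illustration, the only route I can see that wouldn't touch every call site is replacing global operator new/delete with something that pulls from a HeapCreate'd heap. This is just a rough sketch of what I mean, not something I have actually tried yet:

```cpp
#include <windows.h>
#include <new>

// Sketch: route all 'new'/'delete' (and therefore the default STL
// allocators) through one private heap created with HeapCreate.
namespace {
    HANDLE PrivateHeap()
    {
        static HANDLE heap = HeapCreate(0 /*options*/, 0 /*initial size*/, 0 /*growable*/);
        return heap;
    }
}

void* operator new(std::size_t size)
{
    void* p = HeapAlloc(PrivateHeap(), 0, size ? size : 1);
    if (!p)
        throw std::bad_alloc();
    return p;
}

void operator delete(void* p) throw()
{
    if (p)
        HeapFree(PrivateHeap(), 0, p);
}

// operator new[]/delete[] and the std::nothrow variants would need the same
// treatment, and anything allocated by DLLs with their own CRT would not go
// through this heap.
```

Whether doing this would make VMMap report the memory as "Heap" instead of "Private Data" on 2003 is exactly the part I don't know.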
Private data is just a classification for all the memory that is not shared between two or more processes. Heap, relocated DLL pages, the stacks of all the threads in a process, unshared memory-mapped files, etc. fall into the category of private data.
A request for memory from a process (via VirtualAlloc) will be failed by the OS when one of the following conditions is true:
Contiguous virtual address space (not memory) is not available to hold the size requested.
The commit charge - the total committed memory of all processes and the operating system - has reached its upper limit (that being RAM + page file size)
Apart from this, heap allocations may fail for their own reasons: during expansion the heap will actually try to acquire more memory than the size of the allocation request that triggered the expansion, and if that fails the allocation may simply fail, even though the originally requested size might still be available through VirtualAlloc.
A few things that tend to accumulate memory are:
Having many heaps - they hog memory because each heap keeps extra space in reserve, so many heaps means a lot of reserved space probably going unused. Heap compaction might help.
STL containers like vector and map might not shrink after elements are removed from them. Compacting them might help too (a sketch of both of these ideas follows this list).
Libraries like COM do some caching and thus accumulate memory - it might help to investigate individual libraries to learn about their memory-hogging habits.
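As a hedged sketch of those first two ideas (the function names here are mine):

```cpp
#include <windows.h>
#include <vector>

// Ask every heap in the process to coalesce its free blocks and decommit
// what it can of the unused space it is holding on to.
void CompactAllHeaps()
{
    HANDLE heaps[256];
    DWORD  count = GetProcessHeaps(256, heaps);
    if (count > 256)
        count = 256;                 // the process has more heaps than our buffer
    for (DWORD i = 0; i < count; ++i)
        HeapCompact(heaps[i], 0);
}

// The classic C++03 swap trick to drop a vector's spare capacity
// (C++11 adds shrink_to_fit for the same purpose).
template <typename T>
void ShrinkToFit(std::vector<T>& v)
{
    std::vector<T>(v).swap(v);
}
```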
when task manager says that the process reaches somewhere between 800 and 1200 MB of memory, when the limit should be about 2GB
Probably you are looking at "Working Set" in Task Manager whereas the 2GB limit is on Virtual Memory. Task Manager doesn't show the amount of VM reserved; it will show the amount committed.
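If you want to see both numbers from inside the process, here is a small sketch (assuming you can link against psapi.lib); note that neither of them is the reserved-but-uncommitted address space that VMMap's "Size" column adds on top:

```cpp
#include <windows.h>
#include <psapi.h>    // link with psapi.lib
#include <cstdio>

// Prints the two values that are easy to confuse: the working set (roughly
// what Task Manager's default column shows) and the private committed bytes
// (what counts against the commit limit).
void PrintMemoryCounters()
{
    PROCESS_MEMORY_COUNTERS_EX pmc = { sizeof(pmc) };
    if (GetProcessMemoryInfo(GetCurrentProcess(),
                             reinterpret_cast<PROCESS_MEMORY_COUNTERS*>(&pmc),
                             sizeof(pmc)))
    {
        std::printf("Working set:   %lu KB\n",
                    static_cast<unsigned long>(pmc.WorkingSetSize / 1024));
        std::printf("Private bytes: %lu KB\n",
                    static_cast<unsigned long>(pmc.PrivateUsage / 1024));
    }
}
```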
"Committed" is the amount of "Size" that is actually being used (i.e. through calls to new/malloc).
No. Committed means the pages have actually been committed (MEM_COMMIT), i.e. the system has charged backing store (RAM or page file) for them; reserved pages only consume address space until they are committed. Whether a committed page has actually been touched (i.e. someone went to the address and did a load or store operation) is a separate matter - that determines whether it is in the working set.
1. Why is Windows Server 2003 using Private Data instead of Heap?
According to "Windows Sysinternals Administrator's Reference" by Mark Russinovich and Aaron Margosis:
Private Data memory is memory that is allocated by VirtualAlloc and that is not further handled by the Heap Manager or by the .Net runtime
So either your program is managing its memory differently on the two OS's, or VMmap is unable to detect the way in which this memory is being managed as a heap on Windows Server 2003.
4. Is there anything I'm missing or would be worth trying?
You can run with a 3GB user address space on a 32-bit OS, and 32-bit processes can get a 4GB address space on a 64-bit OS. Google for the "/3GB" boot option and the "/LARGEADDRESSAWARE" linker flag.
A great source of information on this kind of stuff is the book "Windows Internals 6th Edition" by Mark Russinovich, David Solomon and Alex Ionescu.
I'm encountering the same issue.
In Windows 2003, my application throws an out-of-memory exception in a C++/CLI module when trying to allocate a 22MB array using gcnew. The same process works fine in Windows 7.
VMMap shows the "private data" entry is almost 2 GB in Windows 2003. After I enable the /3GB flag, this entry also increases to almost 3GB. The "heap" entry is about 14 MB and the "managed heap" is nothing!
In Windows 7, the "private data" is only 62 MB, the "heap" is 316MB and the "managed heap" is 397MB. Overall memory usage is much lower than on Windows 2003.
My question is a bit naive. I'd like an overview that is as simple as possible, and I couldn't find any resource that made it clear to me. I am a developer and I want to understand what exactly the memory displayed by default in the "Memory" column of Windows Task Manager is:
To make things a bit simpler, let's forget about the memory the process shares with other processes, and imagine the shared memory is negligible. Also, I'm focused on the big picture and mainly care about things at the GB level.
As far as I know, the memory reserved by the process, called "virtual memory", is partly stored in main memory (RAM) and partly on disk. The system decides what goes where. The system basically keeps in RAM the parts of the virtual memory that are accessed sufficiently frequently by the process. A process can reserve more virtual memory than there is RAM available in the computer.
From a developer's point of view, the virtual memory may only be partially allocated by the program through its own memory manager (with malloc() or new X() for example). I guess the system has no awareness of what part of the virtual memory is allocated, since this is handled by the process in a "private" way and depends on the language, runtime, compiler... Q: Is this correct?
My hypothesis is that the memory displayed by Task Manager is essentially the part of the virtual memory being stored in RAM by the system. Q: Is that correct? And is there a simple way to know the total virtual memory reserved by the process?
Memory on Windows is... extremely complicated, and asking 'how much memory does my process use?' is effectively a nonsensical question. To answer your questions, let's get a little background first.
Memory on windows is allocated via ptr = VirtualAlloc(..., MEM_RESERVE, ...) and committed later with VirtualAlloc(ptr+n, MEM_COMMIT, ...).
Any reserved memory just uses up address space and so isn't interesting. Windows will let you MEM_RESERVE terabytes of memory just fine. Committing the memory does use up resources, but not in the way you'd think. When you call commit, Windows does a few sums and basically works out (total physical RAM + total swap - current commit) and lets you allocate memory if there's enough free. BUT the Windows memory manager doesn't actually give you physical RAM until you actually use it.
Later, however, if Windows is tight on physical RAM it'll swap some of your RAM out to disk (it may compress it, throw away unused pages, throw away anything directly mapped from a file, and other optimisations). This means your total commit and your total physical RAM usage may be wildly different. Both numbers are useful depending on what you're measuring.
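As a minimal sketch of that reserve/commit/touch sequence (error handling mostly elided):

```cpp
#include <windows.h>

int main()
{
    // Reserve 1 GB of address space: consumes address space only,
    // no commit charge and no physical RAM.
    char* base = static_cast<char*>(
        VirtualAlloc(NULL, 1u << 30, MEM_RESERVE, PAGE_NOACCESS));
    if (!base)
        return 1;

    // Commit the first 1 MB: now counted against the commit limit
    // (physical RAM + swap), but still no physical pages are handed out.
    VirtualAlloc(base, 1u << 20, MEM_COMMIT, PAGE_READWRITE);

    // Touch one byte: only now does a physical page get assigned and the
    // working set grow.
    base[0] = 42;

    VirtualFree(base, 0, MEM_RELEASE);
    return 0;
}
```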
There's one last large caveat - memory that is shared. When you load DLLs, the code and the read-only data [and maybe even the read/write sections, but those are copy-on-write] can be shared with other programs. This means that your app requires that memory, but you cannot count that memory against just your app - after all it can be shared, and so doesn't take up as much physical memory as a naive count would suggest.
(If you are writing a game or similar you also need to count GPU memory but I'm no expert here)
All of the above goodness is normally wrapped up by the heap the application uses and you see none of it - you just ask for and use memory. And it's about as optimal as it can be.
You can see this by going to the Details tab and looking at the various columns - Commit size and Working set are really useful. If you just look at the main window in Task Manager with its single value, I hope you understand now that a single value for 'memory used' has to be some kind of compromise, as it's not a question that really makes sense.
Now to answer your questions
Firstly, the OS knows exactly how much memory your app has reserved and how much it has committed. What it doesn't know is whether the heap implementation you (or more likely the CRT) are using has kept some freed memory around that it hasn't released back to the operating system. Heaps often do this as an optimisation - asking for memory from the OS and freeing it back to the OS is a fairly expensive operation (and can only be done in large chunks known as pages), so most of them keep some around.
Second question: don't use that value; go to the Details tab and use the values there, as only you know what you actually want to ask.
EDIT:
For your comment, yes, but this depends on the size of the allocation. If you allocate a large block of memory (say >= 1MB) then the heap in the CRT generally defers the allocation directly to the operating system, so freeing individual blocks will actually free them. For small allocations the heap in the CRT asks for pages of memory from the operating system and then subdivides them to hand out as allocations. So if you then free every other one of those you'll be left with holes - and the heap cannot give those holes back to the OS, as the OS generally only works in whole pages. So anything you see in Task Manager will show that all the memory is still used. Remember this memory isn't lost or leaked, it's just effectively pooled and will be used again if allocations ask for that size. If you care about this memory you can use the CRT heap statistics family of functions to keep an eye on it - specifically _CrtMemDumpStatistics.
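For instance, here is a rough sketch of watching that pooled-but-not-leaked memory with the CRT heap statistics (debug CRT builds only):

```cpp
#include <crtdbg.h>

// Call this at two interesting points in a debug (_DEBUG) build; it dumps
// how the CRT heap changed in between (blocks in use, bytes held, etc.).
void DumpCrtHeapDelta()
{
    static _CrtMemState previous;
    static bool havePrevious = false;

    _CrtMemState current;
    _CrtMemCheckpoint(&current);

    if (havePrevious)
    {
        _CrtMemState delta;
        if (_CrtMemDifference(&delta, &previous, &current))
            _CrtMemDumpStatistics(&delta);
    }

    previous = current;
    havePrevious = true;
}
```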
I'm currently looking into memory consumption issues of a C++ application that I have written (a rendering engine using OpenGL) and have stumbled upon a rather unusual problem:
I'm using my own allocators basically everywhere in the system, which all obtain their memory from a default allocator which is using malloc()/free() for the actual memory.
It turns out that my application is always reserving at least 4096 bytes (the page size on my system) for every allocation through malloc(), even if the size is significantly smaller.
malloc(8) or even malloc(1) both result in an increase of memory of 4096 bytes. I'm tracking the used memory size through GetProcessMemoryInfo() directly before and after the allocation, as well as through the TaskManager (which basically shows the same values). Interestingly, using _msize(ptr) returns the correct size of the pointer.
I can only reproduce this behaviour within my own application; testing it with a new VS2012 C++ project did not yield the same results. This behaviour also seems independent of the current reserved size of the application; even with more than 10GB of free RAM it always reserves at least 4K per allocation.
I have no deep knowledge of the innards of the Windows operating system (if it is at all related to the OS), so if anyone has an idea what could cause this behaviour I would be grateful!
Check this, it's from 1993 :-)
http://msdn.microsoft.com/en-us/library/ms810603.aspx
This does not mean that the smallest amount of memory that can be allocated in a heap is 4096 bytes; rather, the heap manager commits pages of memory as needed to satisfy specific allocation requests. If, for example, an application allocates 100 bytes via a call to GlobalAlloc, the heap manager allocates a 100-byte chunk of memory within its committed region for this request. If there is not enough committed memory available at the time of the request, the heap manager simply commits another page to make the memory available.
You might be running with "full page heap"... a diagnostic mode (enabled via GFlags or Application Verifier) that helps catch memory access errors in your code more quickly.
Whenever new/malloc is used, the OS creates (or reuses) a heap memory segment, aligned to the page size, and returns it to the calling process. All these allocations contribute to the process's virtual memory. In 32-bit computing, any process can only scale up to 4 GB. The more heap allocation there is, the faster the process's memory grows. Though there are lots of memory managers / memory pools available, all of these utilities end up creating a heap and reusing it efficiently.
mmap (memory mapping), on the other hand, provides the ability to view a file as a memory stream and enables the program to use pointer manipulations directly on the file. But here again, mmap actually allocates that range of addresses in the process space. So if we mmap a 3GB file in its entirety and take a pmap of the process, we can see that the total memory consumed by the process is >= 3GB.
My question is: is it possible to have a file-based memory pool [just like mmapping a file] that does not count against the process's memory space? I envision something like an in-memory DB that is backed by a file, is fast for read/write, supports pointer manipulation [i.e. get a pointer to a record and store anything, just as we do using new/malloc], and can grow on disk without touching the process's virtual 4GB limit.
Is it possible? If so, what are some pointers for me to start working from?
I am not asking for a ready-made solution / links, but to conceptually understand how it can be achieved.
It is generally possible but very complicated. You would have to re-map if you wanted to access different 3GB segments of your file, which would probably kill performance in the case of scattered access. Pointers also get much more difficult to work with, as remapping changes the data but leaves the addresses the same.
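To make the re-mapping idea concrete, here is a hedged sketch (the names are mine, error handling is mostly elided) of keeping only a fixed-size window of the file mapped at any one time. Every time the window moves, pointers into the old window no longer refer to the data they used to, which is exactly where this approach gets painful:

```cpp
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

// Keep only a fixed-size window of a huge file mapped at any moment, so the
// process's address-space usage stays at kWindow instead of the file size.
class FileWindow {
public:
    static const std::size_t kWindow = 256u * 1024 * 1024;   // 256 MB window

    explicit FileWindow(const char* path)
        : fd_(open(path, O_RDWR)), view_(0), offset_(0) {}

    ~FileWindow()
    {
        unmap();
        if (fd_ >= 0) close(fd_);
    }

    // Pointer to byte `offset` of the file, re-mapping the window if needed.
    char* at(off_t offset)
    {
        if (!view_ || offset < offset_ ||
            offset >= offset_ + static_cast<off_t>(kWindow))
        {
            const long page = sysconf(_SC_PAGESIZE);
            remap(offset - (offset % page));    // mmap offsets must be page-aligned
        }
        return static_cast<char*>(view_) + (offset - offset_);
    }

private:
    void unmap()
    {
        if (view_) munmap(view_, kWindow);
        view_ = 0;
    }

    void remap(off_t newOffset)
    {
        unmap();
        void* v = mmap(0, kWindow, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd_, newOffset);
        view_   = (v == MAP_FAILED) ? 0 : v;
        offset_ = newOffset;
    }

    int    fd_;
    void*  view_;
    off_t  offset_;
};
```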
I have seen STXXL project that might be interesting to you; or it might not. I have never used it so I cannot give you any other advice about it.
What you are looking for is, in principle, a memory-backed file cache. There are many such things in, for example, database implementations (where the whole database is way larger than the memory of the machine, and the application developer probably wants to have a bit of memory left for application stuff). This will involve having some sort of indirection - an index, hash or some such - to indicate what area of the file you want to access, and using that indirection to determine if the data is in memory or on disk. You would essentially have to replicate what the virtual memory handling of the OS and the processor does, by having tables that indicate where in physical memory your "virtual heap" is, and if it's not present in physical memory, read it in (and if the cache is full, get rid of some - and if it's been written, write it back again).
However, it's most likely that in today's world you have a machine capable of 64-bit addressing, and thus it would be much easier to recompile the application as a 64-bit application and use mmap or similar to access the large file. In this case, even if RAM isn't sufficient, you can access the memory of the file via the virtual memory system, and it takes care of all the mapping back and forth between disk and RAM (physical memory).
In order to find buffer overflows more easily, I am changing our custom memory allocator so that it allocates a full 4KB page instead of only the wanted number of bytes. Then I change the page protection and size so that if the caller writes before or after its allocated piece of memory, the application immediately crashes.
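Roughly, the allocator does the following (a simplified sketch, not the actual code - the real one has more bookkeeping):

```cpp
#include <windows.h>
#include <cstddef>

// Guard-page allocator sketch: each allocation gets its own data page(s)
// followed by one PAGE_NOACCESS page, and the returned pointer is placed so
// that the block ends exactly at the guard page. Writing one byte past the
// end faults immediately. (A leading guard page would be needed to also
// catch underruns.)
void* GuardAlloc(std::size_t size)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    const std::size_t page = si.dwPageSize;                 // typically 4096

    const std::size_t dataPages = (size + page - 1) / page;
    const std::size_t total     = (dataPages + 1) * page;   // + guard page

    char* base = static_cast<char*>(
        VirtualAlloc(NULL, total, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (!base)
        return NULL;

    // Make the last page inaccessible.
    DWORD oldProtect;
    VirtualProtect(base + dataPages * page, page, PAGE_NOACCESS, &oldProtect);

    // Right-align the user block against the guard page.
    return base + dataPages * page - size;
}

void GuardFree(void* p)
{
    if (!p)
        return;
    // Find the base of the reservation that contains p, then release it all.
    MEMORY_BASIC_INFORMATION mbi;
    VirtualQuery(p, &mbi, sizeof(mbi));
    VirtualFree(mbi.AllocationBase, 0, MEM_RELEASE);
}
```

One caveat with right-aligning like this is that sizes that aren't a multiple of the platform alignment produce misaligned pointers, so the requested size has to be rounded up slightly (which also means small overruns within that rounding go undetected).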
Problem is that although I have enough memory, the application never starts up completely because it runs out of memory. This has two causes:
since every allocation needs 4 KB, we probably reach the 2 GB limit very soon. This problem could be solved by making a 64-bit executable (I haven't tried it yet).
even when I only need a few hundred megabytes, the allocations fail at a certain point.
The second problem is the biggest one, and I think it's related to the maximum number of PTE's (page table entries, which store information on how Virtual Memory is mapped to physical memory, and whether pages should be read-only or not) you can have in a process.
My questions (or a cry-for-tips):
Where can I find information about the maximum number of PTE's in a process?
Is this different (higher) for 64-bit systems/applications or not?
Can the number of PTE's be configured in the application or in Windows?
Thanks,
Patrick
PS. note for those who will try to argument that you shouldn't write your own memory manager:
My application is rather specific so I really want full control over memory management (can't give any more details)
Last week we had a memory overwrite which we couldn't find using the standard C++ allocator and the debugging functionality of the C/C++ run time (it only said "block corrupt" minutes after the actual corruption)
We also tried standard Windows utilities (like GFLAGS, ...) but they slowed down the application by a factor of 100, and couldn't find the exact position of the overwrite either
We also tried the "Full Page Heap" functionality of Application Verifier, but then the application doesn't start up either (probably also running out of PTE's)
There is what I thought was a great series of blog posts by Mark Russinovich on TechNet called "Pushing the Limits of Windows..."
http://blogs.technet.com/markrussinovich/archive/2008/07/21/3092070.aspx
It has a few articles on virtual memory, paged and nonpaged memory, physical memory and others.
He mentions little utilities he uses to take measurements of a system's resources.
Hopefully you will find your answers there.
A shotgun approach is to allocate those isolated 4KB entries at random. This means that you will need to rerun the same tests, with the same input repeatedly. Sometimes it will catch the error, if you're lucky.
A slightly smarter approach is to use another algorithm than just random - e.g. make it dependent on the call stack whether an allocation is isolated. Do you trust std::string users, for instance, and suspect raw malloc use?
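A sketch of the random variant, calling the page-isolating allocator from the question GuardAlloc/GuardFree here (hypothetical names) and remembering which allocations were isolated so the free path knows where to return them (not thread-safe as written):

```cpp
#include <cstddef>
#include <cstdlib>
#include <set>

// Provided elsewhere by the page-isolating allocator (hypothetical names).
void* GuardAlloc(std::size_t size);
void  GuardFree(void* p);

static std::set<void*> g_guarded;      // which pointers got their own page(s)

void* MaybeGuardedAlloc(std::size_t size)
{
    if (std::rand() % 100 == 0)        // isolate roughly 1 allocation in 100
    {
        if (void* p = GuardAlloc(size))
        {
            g_guarded.insert(p);
            return p;
        }
    }
    return std::malloc(size);          // everything else takes the normal path
}

void MaybeGuardedFree(void* p)
{
    std::set<void*>::iterator it = g_guarded.find(p);
    if (it != g_guarded.end())
    {
        g_guarded.erase(it);
        GuardFree(p);
    }
    else
    {
        std::free(p);
    }
}
```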
Take a look at the implementation of OpenBSD malloc. Much of the same ideas (and more) implemented by very skilled folk.
In order to find buffer overflows more easily, I am changing our custom memory allocator so that it allocates a full 4KB page instead of only the wanted number of bytes.
This has already been done. Application Verifier with PageHeap.
Info on PTEs and the Memory architecture can be found in Windows Internals, 5th Ed. and the Intel Manuals.
Is this different (higher) for 64-bit systems/applications or not?
Of course. 64bit Windows has a much larger address space, so clearly more PTEs are needed to map it.
Where can I find information about the maximum number of PTE's in a process?
This is not so important as the maximum amount of user address space available in a process. (The number of PTEs is this number divided by the page size.)
This is 2GB on 32 bit Windows and much bigger on x64 Windows. (The actual number varies, but it's "big enough").
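For a rough sense of scale (my arithmetic, assuming the usual 4 KB page size):

```cpp
// 2 GB of 32-bit user address space divided into 4 KB pages:
//   2 * 1024 * 1024 * 1024 / 4096 = 524,288 pages,
// so on the order of half a million PTEs are enough to map the entire user
// space of one 32-bit process.
const unsigned long long kUserSpaceBytes = 2ull * 1024 * 1024 * 1024;
const unsigned long long kPageBytes      = 4 * 1024;
const unsigned long long kMaxPtes        = kUserSpaceBytes / kPageBytes;   // 524288
```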
Problem is that although I have enough memory, the application never starts up completely because it runs out of memory.
Are you a) leaking memory? b) using horribly inefficient algorithms?
I'm using OS X 10.5.6. I have a C++ application with a GUI made with Qt. When I start my application it uses 30 MB of memory (reported by OS X Activity Monitor RSIZE).
I use this application to read in text files to memory, parse the data and finally visualize it. If I open (read to memory, parse, visualize) a 9 MB text file Activity Monitor reports that my application grows from the initial 30 MB of memory used to 103 MB.
Now if the file is closed and the parsed and visualized data is deleted, the size of the application stays at 103 MB. This sounds like a memory leak to me. But if I open the file again, reading it to memory, parsing it and visualizing it, the application stays at 103 MB. No matter how many times I open the file (or another file of the same size), my application's memory use stays more or less unchanged. Does this mean that it's not a memory leak? If it were a leak, the memory usage should keep growing each time the file is opened, should it not? The only time it grows is if I open a larger file than the previous one.
Is this normal? Is this platform or library dependent? Is this some sort of caching done by the OS or libraries?
This seems relatively normal, but all OS are slightly different.
In the usual application life cycle the application requests memory from the OS and is given memory in huge chunks that it manages (via the C/C++ standard libraries). As the application acquires/releases memory, this is all done internally within the application, without recourse to the OS, until the application has none left, at which point a call is made to the OS for another huge chunk.
Memory is not usually returned to the OS until the application quits (though most OSes do provide mechanisms to do this if required, and some C/C++ standard libraries will use this facility). Instead of returning memory to the OS, the application uses everything it has been given and does its own memory management.
Though note: just because an application has memory does not mean that this is currently taking up RAM on a chip. Memory that is sporadically used or has not been used in a while will be temporarily saved onto secondary/tertiary storage.
Activity Monitor is not a very useful tool for checking memory usage; as you have discovered, it only displays the total actually allocated to the application. It does not display any information about how the application has internally allocated this memory (most of which could be deallocated). Check the folder where Xcode lives; a broad set of tools for examining how an application works is provided with the development environment.
NB: I have avoided using terms like page etc. as these have nothing to do with C/C++/Objective-C and are all OS/hardware specific.
This sounds like a memory fragmentation problem to me. Memory is acquired from the OS in pages. Pages are usually several kB large, e.g. 4 kB. Now if you allocate, let's say, 100 MB of RAM for your objects, your memory allocator (new/malloc) asks the OS for many free memory pages and allocates your objects on them. When your application finishes its computations and deletes some, even most, but not all of the previously allocated objects, the objects that were not deleted hold onto their pages and prevent them from being returned to the OS. A page can be returned only if all of its memory is freed. So in extreme cases, an 8B object can prevent a full 4kB page from being returned.
The OS reports memory consumption by calculating the number of pages committed to your application, not by counting how much space your objects take on these pages. So if your memory is fragmented, the pages remain committed, and reported memory consumption stays the same.
The memory consumption does not grow on the second run, because on the second run the allocator reuses the previously acquired, mostly free, pages.
The solution for fragmentation problems is usually preallocating a larger block of memory and using a custom memory allocator to allocate objects with similar lifetime from this larger block. Then, when you're done with objects, delete the whole block.
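As a minimal sketch of that 'one big block, then throw the whole thing away' idea (illustrative only; objects with non-trivial destructors still need their destructors run before the block is dropped):

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Objects with the same lifetime are carved out of one big block, and the
// whole block is returned to the system in one go, so nothing is left
// behind to pin partially used pages.
class Arena {
public:
    explicit Arena(std::size_t bytes)
        : base_(static_cast<char*>(std::malloc(bytes))),
          size_(bytes), used_(0) {}

    ~Arena() { std::free(base_); }      // everything goes back at once

    void* allocate(std::size_t bytes)
    {
        const std::size_t aligned = (bytes + 15) & ~std::size_t(15);
        if (!base_ || used_ + aligned > size_)
            throw std::bad_alloc();
        void* p = base_ + used_;
        used_ += aligned;
        return p;
    }

private:
    char*       base_;
    std::size_t size_;
    std::size_t used_;
};

// Usage (MyRecord is a placeholder type):
//   Arena arena(100 * 1024 * 1024);
//   MyRecord* r = new (arena.allocate(sizeof(MyRecord))) MyRecord();
//   ... when the whole data set is discarded, just destroy the arena.
```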
Another solution is switching to a fully garbage collected environment like Java or .NET - they have compacting garbage collectors that prevent such problems.