To find buffer overflows more easily, I am changing our custom memory allocator so that it allocates a full 4KB page instead of only the requested number of bytes. I then adjust the size and change the page protection so that the application crashes immediately if the caller writes before or after its allocated piece of memory.
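(For readers new to this debugging trick, here is a minimal sketch of the guard-page idea under my own assumptions; guardedAlloc is a made-up name, freeing and alignment are ignored, and this is not the poster's actual allocator.)

#include <Windows.h>
#include <cstddef>

// Commit the data pages and put a PAGE_NOACCESS page right behind them,
// so an overrun faults immediately.
void* guardedAlloc(std::size_t bytes)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    const std::size_t page = si.dwPageSize;                  // usually 4 KB
    const std::size_t dataPages = (bytes + page - 1) / page;

    // Reserve and commit the data pages plus one trailing guard page.
    BYTE* base = static_cast<BYTE*>(VirtualAlloc(
        nullptr, (dataPages + 1) * page, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (!base)
        return nullptr;

    // Make the last page inaccessible; any write past the block traps here.
    DWORD oldProtect = 0;
    VirtualProtect(base + dataPages * page, page, PAGE_NOACCESS, &oldProtect);

    // Push the user block up against the guard page to catch overruns exactly.
    return base + dataPages * page - bytes;
}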
The problem is that although I have enough memory, the application never starts up completely because it runs out of memory. This has two causes:
since every allocation needs 4 KB, we probably reach the 2 GB limit very soon. This could be solved by building a 64-bit executable (I haven't tried that yet).
even when I only need a few hundred megabytes, allocations start failing at a certain point.
The second problem is the bigger one, and I think it's related to the maximum number of PTEs (page table entries, which store how virtual memory is mapped to physical memory and whether pages are read-only) a process can have.
My questions (or a cry for tips):
Where can I find information about the maximum number of PTEs in a process?
Is this different (higher) for 64-bit systems/applications or not?
Can the number of PTEs be configured in the application or in Windows?
Thanks,
Patrick
PS. A note for those who will try to argue that you shouldn't write your own memory manager:
My application is rather specific so I really want full control over memory management (can't give any more details)
Last week we had a memory overwrite which we couldn't find using the standard C++ allocator and the debugging functionality of the C/C++ runtime (it only said "block corrupt" minutes after the actual corruption)
We also tried standard Windows utilities (like GFLAGS, ...) but they slowed down the application by a factor of 100, and couldn't find the exact position of the overwrite either
We also tried the "Full Page Heap" functionality of Application Verifier, but then the application doesn't start up either (probably also running out of PTE's)
There is what I thought was a great series of blog posts by Mark Russinovich on TechNet called "Pushing the Limits of Windows...":
http://blogs.technet.com/markrussinovich/archive/2008/07/21/3092070.aspx
It has a few articles on virtual memory, paged and nonpaged memory, physical memory and others.
He mentions small utilities he uses to take measurements of a system's resources.
Hopefully you will find your answers there.
A shotgun approach is to allocate those isolated 4KB entries at random. This means that you will need to rerun the same tests, with the same input repeatedly. Sometimes it will catch the error, if you're lucky.
A slightly smarter approach is to use another algorithm than just random - e.g. make it dependent on the call stack whether an allocation is isolated. Do you trust std::string users, for instance, and suspect raw malloc use?
Take a look at the implementation of OpenBSD malloc. Much of the same ideas (and more) implemented by very skilled folk.
To find buffer overflows more easily, I am changing our custom memory allocator so that it allocates a full 4KB page instead of only the requested number of bytes.
This has already been done. Application Verifier with PageHeap.
Info on PTEs and the Memory architecture can be found in Windows Internals, 5th Ed. and the Intel Manuals.
Is this different (higher) for 64-bit systems/applications or not?
Of course. 64-bit Windows has a much larger address space, so clearly more PTEs are needed to map it.
Where can I find information about the maximum number of PTEs in a process?
This is not as important as the maximum amount of user address space available in a process. (The number of PTEs is that number divided by the page size.)
This is 2GB on 32 bit Windows and much bigger on x64 Windows. (The actual number varies, but it's "big enough").
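As a rough back-of-the-envelope check (my arithmetic, not part of the original answer): 2 GB of user address space divided into 4 KB pages is 2^31 / 2^12 = 524,288 pages, so a fully mapped 32-bit process needs on the order of half a million PTEs.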
The problem is that although I have enough memory, the application never starts up completely because it runs out of memory.
Are you a) leaking memory? b) using horribly inefficient algorithms?
See, I wanted to measure memory usage of my C++ program. From inside the program, without profilers or process viewers, etc.
Why from inside the program?
Measurements will be done thousands of times—must be automated; therefore, having an eye on Task Manager, top, whatever, will not do
Measurements are to be done during production runs—performance degradation, which may be caused by profilers, is not acceptable since the run times are non-negligible already (several hours for large problem instances)
note. Why measure at all? The only reason to measure used memory (as reported by the OS), as opposed to calculating “expected” usage in advance, is the fact that I cannot directly, analytically “sizeof” how much my principal data structure uses. The structure itself is
unordered_map<bitset, map<uint16_t, int64_t> >
these are packed into a vector for all I care (a list would actually suffice as well, I only ever need to access the “neighbouring” structures; without details on memory usage, I can hardly decide which to choose)
vector< unordered_map<bitset, map<uint16_t, int64_t> > >
so if anybody knows how to “sizeof” the memory occupied by such a structure, that would also solve the issue (though I'd probably have to fork the question or something).
Environment: It may be assumed that the program runs all alone on the given machine (along with the OS, etc. of course; either a PC or a supercomputer's node); it is certain to be the only one program requiring large (say > 512 MiB) amounts of memory—computational experiment environment. The program is either run on my home PC (16GiB RAM; Windows 7 or Linux Mint 18.1) or the institution supercomputer's node (circa 100GiB RAM, CentOS 7), and the program may want to consume all that RAM. Note that the supercomputer effectively prohibits disk swapping of user processes, and my home PC has a smallish page file.
Memory usage pattern. The program can be said to sequentially fill a sort of table, each row wherein is the vector<...> as specified above. Say the prime data structure is called supp. Then, for each integer k, to fill supp[k], the data from supp[k-1] is required. As supp[k] is filled it is used to initialize supp[k+1]. Thus, at each time, this, prev, and next “table rows” must be readily accessible. After the table is filled, the program does a relatively quick (compared with “initializing” and filling the table), non-exhaustive search in the table, through which a solution is obtained. Note that the memory is only allocated through the STL containers, I never explicitly new() or malloc() myself.
Questions. Wishful thinking.
What is the appropriate way to measure total memory usage (including swapped to disk) of a process from inside its source code (one for Windows, one for Linux)?
Should probably be another question, or rather a good googling session, but still---what is the proper (or just easy) way to explicitly control (say, encourage or discourage) swapping to disk? A pointer to an authoritative book on the subject would be very welcome. Again, forgive my ignorance, I'd like a means to say something along the lines of “NEVER swap supp” or
“swap supp[10]”; then, when I need it, “unswap supp[10]”—all from the program's code. I thought I'd have to resort to serializing the data structures and explicitly storing them as a binary file, then reversing the transformation.
On Linux, it appeared easiest to just catch the heap pointers through sbrk(0), cast them to 64-bit unsigned integers, and compute the difference after the memory gets allocated; this approach produced plausible results (I have not done more rigorous tests yet).
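For what it's worth, here is a minimal sketch of that sbrk(0) trick (my own illustration, not the poster's code); note that it only sees growth of the program break, so large allocations that glibc serves via mmap() are invisible to it:

#include <unistd.h>   // sbrk (POSIX/Linux)
#include <cstdint>
#include <cstdio>
#include <vector>

int main()
{
    std::uintptr_t before = reinterpret_cast<std::uintptr_t>(sbrk(0));

    // ... build the data structures of interest here; many small allocations
    // typically come from the program break, while very large ones may not ...
    std::vector<std::vector<int>> v(1000, std::vector<int>(1000, 42));

    std::uintptr_t after = reinterpret_cast<std::uintptr_t>(sbrk(0));
    std::printf("program break grew by roughly %llu bytes\n",
                static_cast<unsigned long long>(after - before));
    return 0;
}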
edit 5. Removed reference to HeapAlloc wrangling—irrelevant.
edit 4. Windows solution
This bit of code reports the working set that matches the one in Task Manager; that's about all I wanted—tested on Windows 10 x64 (tested by allocations like new uint8_t[1024*1024], or rather, new uint8_t[1ULL << howMuch], not in my “production” yet ).
On Linux, I'd try getrusage or something to get the equivalent.
The principal element is GetProcessMemoryInfo, as suggested by @IInspectable and @conio
#include <Windows.h>
#include <Psapi.h>   // link against Psapi.lib

// Wrapped in a helper function here so the snippet compiles stand-alone;
// returns this process' working set in bytes, or 0 on failure.
size_t workingSetBytes()
{
    //get the handle to this process
    auto myHandle = GetCurrentProcess();

    //to fill in the process' memory usage details
    PROCESS_MEMORY_COUNTERS pmc;

    //return the usage (bytes), if I may
    if (GetProcessMemoryInfo(myHandle, &pmc, sizeof(pmc)))
        return pmc.WorkingSetSize;
    else
        return 0;
}
edit 5. Removed reference to GetProcessWorkingSetSize as irrelevant. Thanks @conio.
On Windows, the GlobalMemoryStatusEx function gives you useful information both about your process and the whole system.
Based on this table you might want to look at MEMORYSTATUSEX.ullAvailPhys to answer "Am I getting close to hitting swapping overhead?", and at changes in (MEMORYSTATUSEX.ullTotalVirtual - MEMORYSTATUSEX.ullAvailVirtual) to answer "How much RAM is my process allocating?"
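A minimal sketch of reading those two values (closeToSwapping and thresholdBytes are my names, not the answer's; treat it as an illustration rather than a drop-in implementation):

#include <Windows.h>

// Rough check: is free physical memory below a caller-chosen threshold?
bool closeToSwapping(unsigned long long thresholdBytes)
{
    MEMORYSTATUSEX ms = {};
    ms.dwLength = sizeof(ms);                 // must be set before the call
    if (!GlobalMemoryStatusEx(&ms))
        return false;                         // query failed; assume we are fine

    // Address space this process has used so far (reserved + committed).
    unsigned long long virtualUsed = ms.ullTotalVirtual - ms.ullAvailVirtual;
    (void)virtualUsed;                        // track its change over time if useful

    return ms.ullAvailPhys < thresholdBytes;  // "am I close to hitting swapping?"
}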
To know how much physical memory your process takes you need to query the process working set or, more likely, the private working set. The working set is (more or less) the amount of physical pages in RAM your process uses. Private working set excludes shared memory.
See
What is private bytes, virtual bytes, working set?
How to interpret Windows Task Manager?
https://blogs.msdn.microsoft.com/tims/2010/10/29/pdc10-mysteries-of-windows-memory-management-revealed-part-two/
for terminology and a bit more detail.
There are performance counters for both metrics.
(You can also use QueryWorkingSet(Ex) and calculate that on your own, but that's just nasty in my opinion. You can get the (non-private) working set with GetProcessMemoryInfo.)
But the more interesting question is whether or not this helps your program to make useful decisions. If nobody's asking for memory or using it, the mere fact that you're using most of the physical memory is of no interest. Or are you worried about your program alone using too much memory?
You haven't said anything about the algorithms it employs or its memory usage patterns. If it uses lots of memory, but does so mostly sequentially, and comes back to old memory relatively rarely, it might not be a problem. Windows writes "old" pages to disk eagerly, before paging out resident pages becomes strictly necessary to supply the demand for physical memory. If everything goes well, reusing these already-written-to-disk pages for something else is really cheap.
If your real concern is memory thrashing ("virtual memory will be of no use due to swapping overhead"), then this is what you should be looking for, rather than trying to infer (or guess...) it from the amount of physical memory used. A more useful metric would be page faults per unit of time. It just so happens that there are performance counters for this too. See, for example, Evaluating Memory and Cache Usage.
I suspect this to be a better metric to base your decision on.
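If it helps, here is a minimal sketch of sampling the process-wide fault count via GetProcessMemoryInfo (my illustration; note this counter lumps soft and hard faults together, whereas the performance counters can separate them):

#include <Windows.h>
#include <Psapi.h>   // link against Psapi.lib

// Returns the cumulative page-fault count of this process (0 on failure).
DWORD pageFaultsSoFar()
{
    PROCESS_MEMORY_COUNTERS pmc;
    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
        return pmc.PageFaultCount;
    return 0;
}
// Sample it periodically; a persistently large difference between samples is a
// much better hint of thrashing than the absolute amount of memory in use.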
So I have a native C++ application, and it needs to keep track of lots of things over long periods of time. It's running out of memory when task manager says that the process reaches somewhere between 800 and 1200 MB of memory, when the limit should be about 2GB.
I finally got a clue as to what's going on when I ran VMMap against my process, but that just gave me more questions. What I discovered:
The total size (type: total, column: size) is much larger than what task manager/process explorer were reporting
The total size seems to actually be the value that can't exceed 2GB before my program runs out of memory
The memory usage discrepancy is almost entirely caused by "Private Data" - there is much more "Size" than there is "Committed". I have seen cases where there were around 800MB of committed private data, but a "Size" of around 1700MB.
The largest blocks of "Private Data" mainly consist of a pattern of pairs of one small sub-block (between 4K and 16K, generally) that has "Read/Write" protection and is fully committed, and one larger sub-block (between 90K and 400K) that has the "Reserved" protection and is not committed. This seems like a huge waste of resources. And there's usually one large (many megabytes) sub-block at the end that is "Reserved" and not committed.
The small part of the pair generally has strings that I recognize, while the larger block has no strings at all.
An example of these sub-block pairs: (not my application, but the idea is the same)
http://www.flickr.com/photos/95123032#N00/5280550393
It seems as though when one block of private data gets fully committed, a new block (usually the same size as or double the size of the previous largest block) gets allocated. Sounds fair. However, I have seen 3 blocks, all more than 100MB each, with less than 30MB committed. My application shouldn't behave in a way (i.e. use up 400MB, then shrink by 300MB in a matter of a few hours) that would make that possible.
As far as I can tell, the "Size" is the actual amount of virtual memory address space that has been allocated. "Committed" is the amount of "Size" that is actually being used (i.e. through calls to new/malloc). If that is indeed the case, then why is there such a huge discrepancy between Size and Committed? And why is it allocating blocks that are multiple hundreds of megabytes in size?
The somewhat strange thing is that the behavior is entirely different when running on Windows 7. Whereas on 2003 Server, the application uses Private Data, on Windows 7, the application uses Heap. So...why? Why does VMMap show primarily private data usage on 2003, but primarily heap usage on 7? What's the difference? Well one difference is that I can't use the "Heap Allocations..." button in VMMap to see where all of that Private Data is being allocated.
I was beginning to wonder if excessive use of std::string was causing this problem since the strings that I recognized in the pairs (mentioned above) primarily consisted of strings stored in std::string that were frequently being created and destroyed (implying lots of memory allocation/deallocation). I converted all I could to use character arrays or using memory from a memory pool, but that seems to have had no effect. All of my other objects that are new/deleted frequently already have their own memory pools.
I also found out about the low-fragmentation heap, so I tried enabling that, but it also didn't make a difference. I'm thinking it's because Windows 2003 is not actually using the heap proper. VMMap shows that the low-fragmentation heap is enabled, but since it's not actually used (i.e. it's using Private Data instead), it doesn't actually make a difference.
What actually seems to be happening is that those sub-block pairs are fragmenting the large Private Data blocks, which is causing the OS to allocate new blocks. Eventually, the fragmentation gets so bad that even though there's lots of uncommitted space, none of it seems to be usable and the process runs out of memory.
So my questions are:
Why is Windows Server 2003 using Private Data instead of Heap? Does it matter?
Is there a way to make Windows Server 2003 use Heap memory instead?
If so, would that improve my situation at all?
Is there any way to control how Private Data is allocated by the OS's memory allocator?
Is it possible to create my own custom heap and allocate off of that (without changing the majority of my codebase), and could that improve my situation? I know it's possible to make custom heaps, but as far as I can tell, you need to explicitly allocate from the custom heap instead of just calling new or just using STL containers normally.
Is there anything I'm missing or would be worth trying?
Private data is just a classification for all the memory that is not shared between two or more processes. Heap, relocated DLL pages, stacks of all the threads in a process, unshared memory-mapped files, etc. fall into the category of private data.
A request for memory from a process (via VirtualAlloc) will be failed by the OS when one of these conditions is true:
Contiguous virtual address space (not memory) is not available to hold the size requested.
The commit charge - the total committed memory of all the processes and the operating system - has reached its upper limit (that being RAM + page file size)
Apart from this, heap allocations may fail for their own reasons: during expansion, the heap will actually try to acquire more memory than the size of the allocation request that triggered the expansion - and if that fails, the allocation might just fail, even though the actual requested size might be available through VirtualAlloc.
A few things that tend to accumulate memory are:
Having many heaps - they hog memory because they keep more in reserve. Many heaps mean a lot of reserved space probably going unused. Heap compaction might help.
STL containers like vector and map might not shrink after elements are removed from them. Compacting them might help too (see the sketch after this list).
Libraries like COM do some caching and thus accumulate memory - might help to investigate individual libraries to know about their memory hogging habits.
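To illustrate the container point above (a minimal sketch, not something from the original answer), this is how a vector's spare capacity can be released after shrinking:

#include <vector>

void compactExample()
{
    std::vector<int> records(1000000, 42);

    records.resize(10);        // logical size shrinks, capacity usually does not
    records.shrink_to_fit();   // C++11: non-binding request to drop the spare capacity

    // Pre-C++11 equivalent, the "swap trick": copy into a right-sized temporary
    // and swap the guts back.
    // std::vector<int>(records).swap(records);
}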
when task manager says that the process reaches somewhere between 800 and 1200 MB of memory, when the limit should be about 2GB
Probably you are looking at "Working Set" in Task Manager whereas the 2GB limit is on Virtual Memory. Task Manager doesn't show the amount of VM reserved; it will show the amount committed.
"Committed" is the amount of "Size" that is actually being used (i.e. through calls to new/malloc).
No, Committed means you actually touched the page (i.e. went to the address and did a load or store operation).
1. Why is Windows Server 2003 using Private Data instead of Heap?
According to "Windows Sysinternals Administrator's Reference" by Mark Russinovich and Aaron Margosis:
Private Data memory is memory that is allocated by VirtualAlloc and that is not further handled by the Heap Manager or by the .Net runtime
So either your program is managing its memory differently on the two OS's, or VMmap is unable to detect the way in which this memory is being managed as a heap on Windows Server 2003.
4. Is there anything I'm missing or would be worth trying?
You can run with a 3GB limit on a 32-bit OS and a 4GB limit for 32-bit processes on a 64-bit OS. Google for the "/3GB" boot switch and the "LARGEADDRESSAWARE" flag.
A great source of information on this kind of stuff is the book "Windows Internals 6th Edition" by Mark Russinovich, David Solomon and Alex Ionescu.
I'm encountering the same issue.
In Windows 2003, my application hits an out-of-memory exception in a C++/CLI module when trying to allocate a 22MB array using gcnew. The same process works fine in Windows 7.
VMMap shows the "private data" entry is almost 2 GB in Win2003. After I enable the /3GB flag, this entry also increases to almost 3GB. The "heap" entry is about 14 MB and the "managed heap" is nothing!
In Windows 7, the "private data" is only 62 MB, the "heap" is 316MB and the "managed heap" is 397MB. Overall memory usage is much lower than on Win2003.
My C++ program caches lots of objects, and at the beginning of each major API call I want to ensure that there is at least 500 MB available for that call. I may either be running out of RAM+swap space (consider a system with 1 GB RAM + a 1 GB swap file), or I may be running out of virtual address space in my process (I may already be using 3.7 GB out of the total 4GB address space). It's not easy for me to approximate how much data I have cached, but I can purge some of it if it is becoming an issue, and do so iteratively until I have 500 MB available in the system or in the address space (whichever is becoming the bottleneck). So my requirements are to find, in C++ on 32-bit Linux:
A) Find how much RAM + SWAP space is free.
B) How much user space address space is available to my process.
C) How much Virtual Memory the process is already using. Consider it similar to 'Commit Size' or 'Working Set Size' of a process on Windows.
Any answers would be greatly appreciated.
Look at /proc/vmstat; there is a lot of information there about the system-wide memory.
/proc/<pid>/maps will give you a lot of information about your process' memory layout.
Note that if you check the memory before running a long job, another process may eat all the available memory in the meantime and your program may crash anyway!
I do not know anything about your cached classes, but if these objects are quite small you have probably overridden the new/delete operators; that makes it quite easy to keep track of memory consumption (at least by counting objects), along the lines of the sketch below.
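A minimal sketch of what such counting operators could look like (CountedObject and g_cachedBytes are names I made up; this is only one possible arrangement):

#include <atomic>
#include <cstdlib>
#include <new>

// Running total of bytes currently allocated for counted objects.
static std::atomic<std::size_t> g_cachedBytes{0};

struct CountedObject {
    static void* operator new(std::size_t size)
    {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        g_cachedBytes += size;
        return p;
    }
    static void operator delete(void* p, std::size_t size) noexcept
    {
        if (p) { g_cachedBytes -= size; std::free(p); }
    }
};

// Any cached class that derives from CountedObject is now accounted for in
// g_cachedBytes, which can be logged or compared against a budget.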
Why not change your cache policy and flush old, unused objects?
Another ugly way is to try to allocate several chunks of memory, see whether the program can allocate them, and release them right after. On 32 bits it may fail because the heap may be fragmented, but if it works you can be sure that you have enough memory at that moment.
Take a look at the source of vmstat, and search for the domem() function, which gathers all the information about memory (occupied and free).
I have an application that sometimes will utilize a large amount of data. The user has the option to load in a number of files which are used in a graphical display. If the user selects more data than the OS can handle, the application crashes pretty hard. On my test system, that number is about the 2 gigs of physical RAM.
What is a good way to handle this situation? I get the "bad alloc" thrown from new and tried trapping that but I still run into a crash. I feel as if I'm treading in nasty waters loading this much data but it is a requirement of this application to handle this sort of large data load.
Edit: I'm testing under a 32 bit Windows system for now but the application will run on various flavors of Windows, Sun and Linux, mostly 64 bit but some 32.
The error handling is not strong: it simply wraps the main instantiation code in a try/catch block, the catch looking for any exception, per another peer's complaint of not being able to trap the bad_alloc every time.
I think you guys are right: I need a memory management system that doesn't load all of this data into RAM, which is what seems to be happening now.
Edit2: Luther said it best. Thanks, guy. For now, I just need a way to prevent a crash, which should be possible with proper exception handling. But down the road I'll be implementing that accepted solution.
There is the STXXL library, which offers STL-like containers for large datasets.
http://stxxl.sourceforge.net/
Change "large" into "huge". It is designed and optimized for multicore processing of data sets that fit on terabyte-disks only. This might suffice for your problem, or the implementation could be a good starting point to tailor your own solution.
It is hard to say anything about your application crashing, because there are numerous hiccups involved when it comes to tight memory conditions: you could hit a hard address-space limit (for example, by default 32-bit Windows only has 2GB of address space per user process; this can be changed, see http://www.fmepedia.com/index.php/Category:Windows_3GB_Switch_FAQ ), or be eaten alive by the OOM killer (not a mythical beast, see http://lwn.net/Articles/104179/ ).
What I'd suggest in any case is to think about a way to keep the data on disk and treat main memory as a kind of level-4 cache for the data. For example, if you have, say, blobs of data, then wrap these in a class which can transparently load the blobs from disk when they are needed and which registers with some kind of memory manager that can ask some of the blob holders to free up their memory before the memory conditions become unbearable. A buffer cache, thus.
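A minimal sketch of that blob-holder idea under my own assumptions (BlobHolder, evict() and the eviction policy are hypothetical, not taken from the answer):

#include <fstream>
#include <iterator>
#include <string>
#include <vector>

class BlobHolder {
public:
    explicit BlobHolder(std::string path) : path_(std::move(path)) {}

    // Transparently (re)load the blob from disk on first access.
    const std::vector<char>& data()
    {
        if (bytes_.empty())
            bytes_ = readWholeFile(path_);
        return bytes_;
    }

    // Give the memory back; the blob can be reloaded later from its file.
    void evict() { std::vector<char>().swap(bytes_); }

private:
    static std::vector<char> readWholeFile(const std::string& path)
    {
        std::ifstream in(path, std::ios::binary);
        return std::vector<char>(std::istreambuf_iterator<char>(in),
                                 std::istreambuf_iterator<char>());
    }

    std::string path_;
    std::vector<char> bytes_;
};

// A central manager would keep all BlobHolders in, say, an LRU list and call
// evict() on the oldest ones when memory gets tight: a buffer cache in effect.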
The user has the option to load in a number of files which are used in a graphical display.
The usual trick is not to load the data into memory directly, but rather to use the memory-mapping mechanism to make the files look like memory.
You need to make sure that the memory mapping is done in read-only mode to allow the OS to evict it from RAM if it is needed for something else.
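Since most of this thread is Windows-centric, here is a minimal sketch of such a read-only mapping with the Win32 API (names and the missing error reporting are my simplifications; on POSIX systems, mmap with PROT_READ plays the same role):

#include <Windows.h>

// Maps 'path' read-only and returns a pointer to its contents (nullptr on failure).
// The caller later calls UnmapViewOfFile on the view and CloseHandle on both handles.
const void* mapFileReadOnly(const wchar_t* path, HANDLE& fileOut, HANDLE& mappingOut)
{
    fileOut = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                          OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (fileOut == INVALID_HANDLE_VALUE)
        return nullptr;

    mappingOut = CreateFileMappingW(fileOut, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!mappingOut) {
        CloseHandle(fileOut);
        return nullptr;
    }

    // Read-only view: the OS can simply discard these pages under memory pressure
    // and re-read them from the file, instead of writing them to the pagefile.
    return MapViewOfFile(mappingOut, FILE_MAP_READ, 0, 0, 0);   // 0 = whole file
}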
If the user selects more data than the OS can handle, the application crashes pretty hard.
Depending on the OS, it is either: the application is missing some memory allocation error handling, or you are really hitting the limit of available virtual memory.
Some OSs also have an administrative limit on how large the heap of an application can grow.
On my test system, that number is about the 2 gigs of physical RAM.
It sounds like:
your application is 32-bits and
your OS uses the 2GB/2GB virtual memory split.
To avoid hitting the limit, you need to:
upgrade your app and OS to 64-bit or
tell the OS (IIRC a patch for Windows; most Linuxes already have it) to use a 3GB/1GB virtual memory split. Some 32-bit OSs use a 2GB/2GB memory split: 2GB of virtual memory for the kernel and 2GB for the user application. A 3/1 split means 1GB of VM for the kernel and 3GB for the user application.
How about maintaining a header table instead of loading all the data, and loading the actual page only when the user requests it?
Also use some data compression algorithms (like 7zip, znet etc.) which reduce the file size. (In my project they reduced the size from 200MB to 2MB)
I mention this because it was only briefly mentioned above, but it seems a "file paging system" could be a solution. These systems read large data sets in "chunks" by breaking the files into pieces. Once written, they generally "just work" and you hopefully won't have to tinker with them anymore.
Reading Large Files
Variable Length Data in File--Paging
New Link below with very good answer.
Handling Files greater than 2 GB
Search term: "file paging lang:C++" add large or above 2GB for more. HTH
Not sure if you are hitting it or not, but if you are using Linux, malloc will typically not fail, and operator new will typically not throw bad_alloc. This is because Linux will overcommit, and instead kill your process when it decides the system doesn't have enough memory, possibly at a page fault.
See: Google search for "oom killer".
You can disable this behavior with:
echo 2 > /proc/sys/vm/overcommit_memory
Upgrade to a 64-bit CPU, 64-bit OS and 64-bit compiler, and make sure you have plenty of RAM.
A 32-bit app is restricted to 2GB of memory (regardless of how much physical RAM you have). This is because a 32-bit pointer can address 2^32 bytes == 4GB of virtual memory. 20 years ago this seemed like a huge amount of memory, so the original OS designers allocated 2GB to the running application and reserved 2GB for use by the OS. There are various tricks you can do to access more than 2GB, but they're complex. It's probably easier to upgrade to 64-bit.
How can I measure runtime memory requirements of an application on windows platform?
Perfmon.exe will monitor the usage of a process.
Run perfmon.exe, right-click Add counters, pick Process for the Performance Object, then choose things like Virtual Bytes, Working Set, and Page File.
I'll assume you mean memory use at a particular point in time, not how much it could potentially ever need.
You can get information about how much a process is consuming through the Windows API, for example GetProcessMemoryInfo. Windows allocates memory in blocks, so it may be more accurate than just checking how much memory or heap space is used.
See more details from MSDN
"Memory requirements" are not very well defined, in the first place. WHen you start, your executabel will be linked to many DLLs. Together with the first stack, this forms your initial process. Then, your proces might start extra threads, allocate more memory, and/or memory map some files.
Now Wwindows won't give you real RAM for all these needs. Many DLLs are already loaded for other reasons, so you'll share that RAM. Extra RAM for stacks is allocated when you get soft stackoverflows. Memory mapped files get RAM allocated when those pages fault.
So, one of the important questions is what you really want. You have to answer that first.