Checking number of actually allocated pages using anonymous mmap() - c++

I'm allocating some memory using mmap() with MAP_ANONYMOUS flag. Because of the problem's features, often I need to write some data to one page ( for example, located in the middle ) of large memory chunk, and leave the other part untouched. So, I wouldn't this rather big part of unused memory to allocate physically.
In my view, such mmap() call just give me a pointer to some 0-filled virtual pages, but due to to copy-on-write and demand paging mechanisms, no one page ( besides first maybe ) isn't actually allocated in RAM until first write attempt to it's memory.
The problem is: one moment I've got MAP_FAILED from mmap() ( there were a lot of successful calls with fewer allocation queries before ), when tried to allocate a memory chunk larger than my physical RAM size. So, it seems there are much more pages actually allocated, even if there wasn't write access to them.
I need your help guys with two questions, first:
Is my views to anonymous memory allocation correct, and (if not) , what are the inaccuracies?
And the second:
How could I measure number of actually allocated pages after anonymous mmap() is done? I've tried use mincore(), and it's results show me that almost all pages are "resident" in memory ( i.e., physically allocated? ). So, it seems mincore() results are wrong, or I'm totally stuck :(
Upd.
It seems memory overcommiting mentioned by #Art indeed could influence this. But when I'm trying to disable it ( setting /proc/sys/vm/overcommit_memory to 1 mode or using mmap() with MAP_NORESERVE flag, my machine is heavily freezing, and hard reset is the only thing that helps.

"It's complicated." (caveat: I have much more experience of VM systems from not Linux, but I've picked up a few details from Linux over the years)
Generally your assumption is true. Many operating systems don't actually do much on mmap other than record "there is an allocation of X pages of object Y at V". Access to those pages causes a page fault which leads to an actual allocation. Some (or maybe all?) versions of Linux map a pre-zeroed read-only page to such allocations (not sure if it has changed or if it's still done) and the rest is handled like copy-on-write (for me it feels like a bad idea because it should generate more TLB flushes for a dubious optimization of reading zeroed memory which should happen rarely, but I guess Linux people have benchmarked it and found it good).
There are a few caveats. Just because you don't use the RAM doesn't mean that you'll be allowed to overcommit so much memory. Certain Linux distributions started default to a setting that disallows overcommit or disallows more than X% overcommit. Centos/RedHat was one of those (this was probably a decade ago, I don't know the state today). This means that even when you're just using 5% of your actual physical memory, if you have created anonymous mappings of 100+X% of your RAM, mmap will fail. There is a sysctl for it, look it up. It's something like /proc/sys/vm/overcommit_memory or overcommit_ratio or both, or probably even more under there.
Then you have to be aware of resource limits. Check with ulimit if you're allowed to create mappings that big (ulimit -v), the problem could be as simple as that. Also (I'm not sure how Linux does it here, but you can create simple test programs to try it), resource limits for data size (ulimit -d) can be accounted as potential pages you can use rather than actual pages you currently use (in other words no overcommit in resource limits).
Then let's get back to faulting in only the pages you use. There has been research into detecting access patterns and predicting future faults (so that mmap:ed files could do read-ahead), but I don't know the state of it in Linux. It feels like it would be dumb to apply this to anonymous mappings, but you never know, maybe someone is trying to be clever. To make sure, use madvise(..., MADV_RANDOM) on the mmap:ed block you get.
Finally mincore. My experience from it on Linux is that it's garbage, last time I tried it it didn't work for anonymous mappings (or was it private?). In your case it could be as simple as it reporting that the read-only copy-on-write zeroed page for anonymous mappings is considered to be "in core".

Related

C++: Measuring memory usage from within the program, Windows and Linux [duplicate]

This question already has answers here:
How to determine CPU and memory consumption from inside a process
(10 answers)
Closed 6 years ago.
See, I wanted to measure memory usage of my C++ program. From inside the program, without profilers or process viewers, etc.
Why from inside the program?
Measurements will be done thousands of times—must be automated; therefore, having an eye on Task Manager, top, whatever, will not do
Measurements are to be done during production runs—performance degradation, which may be caused by profilers, is not acceptable since the run times are non-negligible already (several hours for large problem instances)
note. Why measure at all? The only reason to measure used memory (as reported by the OS) as opposed to calculating “expected” usage in advance is the fact that I can not directly, analytically “sizeof” how much does my principal data structure use. The structure itself is
unordered_map<bitset, map<uint16_t, int64_t> >
these are packed into a vector for all I care (a list would actually suffice as well, I only ever need to access the “neighbouring” structures; without details on memory usage, I can hardly decide which to choose)
vector< unordered_map<bitset, map<uint16_t, int64_t> > >
so if anybody knows how to “sizeof” the memory occupied by such a structure, that would also solve the issue (though I'd probably have to fork the question or something).
Environment: It may be assumed that the program runs all alone on the given machine (along with the OS, etc. of course; either a PC or a supercomputer's node); it is certain to be the only one program requiring large (say > 512 MiB) amounts of memory—computational experiment environment. The program is either run on my home PC (16GiB RAM; Windows 7 or Linux Mint 18.1) or the institution supercomputer's node (circa 100GiB RAM, CentOS 7), and the program may want to consume all that RAM. Note that the supercomputer effectively prohibits disk swapping of user processes, and my home PC has a smallish page file.
Memory usage pattern. The program can be said to sequentially fill a sort of table, each row wherein is the vector<...> as specified above. Say the prime data structure is called supp. Then, for each integer k, to fill supp[k], the data from supp[k-1] is required. As supp[k] is filled it is used to initialize supp[k+1]. Thus, at each time, this, prev, and next “table rows” must be readily accessible. After the table is filled, the program does a relatively quick (compared with “initializing” and filling the table), non-exhaustive search in the table, through which a solution is obtained. Note that the memory is only allocated through the STL containers, I never explicitly new() or malloc() myself.
Questions. Wishful thinking.
What is the appropriate way to measure total memory usage (including swapped to disk) of a process from inside its source code (one for Windows, one for Linux)?
Should probably be another question, or rather a good googling session, but still---what is the proper (or just easy) way to explicitly control (say encourage or discourage) swapping to disk? A pointer to an authoritative book on the subject would be very welcome. Again, forgive my ignorance, I'd like a means to say something on the lines of “NEVER swap supp” or
“swap supp[10]”; then, when I need it, “unswap supp[10]”—all from the program's code. I thought I'd have to resolve to serialize the data structures and explicitly store them as a binary file, then reverse the transformation.
On Linux, it appeared the easiest to just catch the heap pointers through sbrk(0), cast them as 64-bit unsigned integers, and compute the difference after the memory gets allocated, and this approach produced plausible results (did not do more rigorous tests yet).
edit 5. Removed reference to HeapAlloc wrangling—irrelevant.
edit 4. Windows solution
This bit of code reports the working set that matches the one in Task Manager; that's about all I wanted—tested on Windows 10 x64 (tested by allocations like new uint8_t[1024*1024], or rather, new uint8_t[1ULL << howMuch], not in my “production” yet ).
On Linux, I'd try getrusage or something to get the equivalent.
The principal element is GetProcessMemoryInfo, as suggested by #IInspectable and #conio
#include<Windows.h>
#include<Psapi.h>
//get the handle to this process
auto myHandle = GetCurrentProcess();
//to fill in the process' memory usage details
PROCESS_MEMORY_COUNTERS pmc;
//return the usage (bytes), if I may
if (GetProcessMemoryInfo(myHandle, &pmc, sizeof(pmc)))
return(pmc.WorkingSetSize);
else
return 0;
edit 5. Removed reference to GetProcessWorkingSetSize as irrelevant. Thanks #conio.
On Windows, the GlobalMemoryStatusEx function gives you useful information both about your process and the whole system.
Based on this table you might want to look at MEMORYSTATUSEX.ullAvailPhys to answer "Am I getting close to hitting swapping overhead?" and changes in (MEMORYSTATUSEX.ullTotalVirtual – MEMORYSTATUSEX.ullAvailVirtual) to answer "How much RAM is my process allocating?"
To know how much physical memory your process takes you need to query the process working set or, more likely, the private working set. The working set is (more or less) the amount of physical pages in RAM your process uses. Private working set excludes shared memory.
See
What is private bytes, virtual bytes, working set?
How to interpret Windows Task Manager?
https://blogs.msdn.microsoft.com/tims/2010/10/29/pdc10-mysteries-of-windows-memory-management-revealed-part-two/
for terminology and a little bit more details.
There are performance counters for both metrics.
(You can also use QueryWorkingSet(Ex) and calculate that on your own, but that's just nasty in my opinion. You can get the (non-private) working set with GetProcessMemoryInfo.)
But the more interesting question is whether or not this helps your program to make useful decisions. If nobody's asking for memory or using it, the mere fact that you're using most of the physical memory is of no interest. Or are you worried about your program alone using too much memory?
You haven't said anything about the algorithms it employs or its memory usage patterns. If it uses lots of memory, but does this mostly sequentially, and comes back to old memory relatively rarely it might not be a problem. Windows writes "old" pages to disk eagerly, before paging out resident pages is completely necessary to supply demand for physical memory. If everything goes well, reusing these already written to disk pages for something else is really cheap.
If your real concern is memory thrashing ("virtual memory will be of no use due to swapping overhead"), then this is what you should be looking for, rather than trying to infer (or guess...) that from the amount of physical memory used. A more useful metric would be page faults per unit of time. It just so happens that there are performance counters for this too. See, for example Evaluating Memory and Cache Usage.
I suspect this to be a better metric to base your decision on.

Allocating memory that can be freed by the OS if needed

I'm writing a program that generates thumbnails for every page in a large document. For performance reasons I would like to keep the thumbnails in memory for as long as possible, but I would like the OS to be able to reclaim that memory if it decides there is another more important use for it (e.g. the user has started running a different application.)
I can always regenerate the thumbnail later if the memory has gone away.
Is there any cross-platform method for flagging memory as can-be-removed-if-needed? The program is written in C++.
EDIT: Just to clarify, rather than being notified when memory is low or regularly monitoring the system's amount of memory, I'm thinking more along the lines of allocating memory and then "unlocking" it when it's not in use. The OS can then steal unlocked memory if needed (even for disk buffers if it thinks that would be a better use of the memory) and all I have to do as a programmer is just "lock" the memory again before I intend to use it. If the lock fails I know the memory has been reused for something else so I need to regenerate the thumbnail again, and if the lock succeeds I can just keep using the data from before.
The reason is I might be displaying maybe 20 pages of a document on the screen, but I may as well keep thumbnails of the other 200 or so pages in case the user scrolls around a bit. But if they go do something else for a while, that memory might be better used as a disk cache or for storing web pages or something, so I'd like to be able to tell the OS that it can reuse some of my memory if it wants to.
Having to monitor the amount of free system-wide memory may not achieve the goal (my memory will never be reclaimed to improve disk caching), and getting low-memory notifications will only help in emergencies. I was hoping that by having a lock/unlock method, this could be achieved in more of a lightweight way and benefit the performance of the system in a non-emergency situation.
Is there any cross-platform method for flagging memory as can-be-removed-if-needed? The program is written in C++
For Windows, at least, you can register for a memory resource notification.
HANDLE WINAPI CreateMemoryResourceNotification(
_In_ MEMORY_RESOURCE_NOTIFICATION_TYPE NotificationType
);
NotificationType
LowMemoryResourceNotification Available physical memory is running low.
HighMemoryResourceNotification Available physical memory is high.
Just be careful responding to both events. You might create a feedback loop (memory is low, release the thumbnails! and then memory is high, make all the thumbnails!).
In AIX, there is a signal SIGDANGER that is send to applications when available memory is low. You may handle this signal and free some memory.
There is a discussion among Linux people to implement this feature into Linux. But AFAIK it is not yet implemented in Linux. Maybe they think that application should not care about low level memory management, and it could be transparently handled in OS via swapping.
In posix standard there is a function posix_madvise might be used to mark an area of memory as less important. There is an advice POSIX_MADV_DONTNEED specifies that the application expects that it will not access the specified range in the near future.
But unfortunately, current Linux implementation will immediately free the memory range when posix_madvise is called with this advice.
So there's no portable solution to your question.
However, on almost every OS you are able to read the current available memory via some OS interface. So you can routinely read such value and manually free memory when available memory in OS is low.
There's nothing special you need to do. The OS will remove things from memory if they haven't been used recently automatically. Some OSes have platform-specific ways to improve this, but generally, nothing special is needed.
This question is very similar and has answers that cover things not covered here.
Allocating "temporary" memory (in Linux)
This shouldn't be too hard to do because this is exactly what the page cache does, using unused memory to cache the hard disk. In theory, someone could write a filesystem such that when you read from a certain file, it calculated something, and the page cache would cache it automatically.
All the basics of automatically freed cache space are already there in any OS with a disk cache, and It's hard to imagine there not being an API for something that would make a huge difference especially in things like mobile web browsers.

Keeping memory usage within available amount

I'm writing a program (a theorem prover as it happens) whose memory requirement is "as much as possible, please"; that is, it can always do better by using more memory, for practical purposes without upper bound, so what it actually needs to do is use just as much memory as is available, no more and no less. I can figure out how to prioritize data to delete the lowest value stuff when memory runs short; the problem I'm trying to solve is how to tell when this is happening.
Ideally I would like a system call that returns "how much memory is left" or "are we out of memory yet?"; as far as I can tell, no such thing exists?
Of course, malloc can signal out of memory by returning 0 and new can call a handler; these aren't ideal signals, but would be better than nothing. A problem, however, is that I really want to know when physical memory is running out, so I can avoid going deep into swap and thereby making everything grind to a halt; I don't suppose there's any way to ask "are we having to swap yet?" or tell the operating system "don't swap on my account, just fail my requests if it comes to that"?
Another approach would be to find out how much RAM is in the machine, and monitor how much memory the program is using at the moment. As far as I know, there is generally no way to tell the former? I also get the impression there is no reliable way to tell the latter except by wrapping malloc/free with a bookkeeper function (which is then more problematic in C++).
Are there any approaches I'm missing?
The ideal would be a portable solution, but I suspect that's not going to happen. Failing that, a solution that works on Windows and another one that works on Unix would be nice. Failing that, I could get by with a solution that works on Windows and another one that works on Linux.
I think the most useful and flexible way to use all the memory available is to let the user specify how much memory to use.
Let the user write it in a config file or through an interface, then create an allocator (or something similar) that will not provide more than this memory.
That way, you don't have to find statistics about the current computer as this will allways be biased by the fact that the OS could also run other programs as well. Don't even talk about the way the OS will manage cache, the differences between 32 and 64 bit making adress space limit your allocations etc.
In the end, human intelligence (assuming the user know about the context of use) is cheaper to implement when provided by the user.
To find out how much system memory is still unused, under Linux you can parse the file /proc/meminfo and look for a line starting with "MemFree:". Under Windows you can use GlobalMemoryStatusEx http://msdn.microsoft.com/en-us/library/aa366589%28VS.85%29.aspx
Relying on malloc to return 0 when no memory is available might cause problems on Linux, because Linux overcommits memory allocations. malloc will usually return a valid pointer (unless the process is out of virtual address space), but accessing the memory it points to may trigger the "OOM killer", a mechanism that kills your process or another process on the system. The system administrator can tune this behavior.
The best solution I can think of might be to query how many page faults have occurred within, say, the last second. If there's a lot of swapping going on, you should probably release some memory, and if not, you can try allocating more memory.
On Windows, WMI can probably give you some statistics you can use.
But it's a tough problem, since there is no hard limit you can ask the OS for and then stay below. You can keep allocating memory far beyond the point where you've run out of physical memory, which just means you'll cripple your process with excessive swapping.
So the best you can really do is some kind of approximation.
You can keep allocating memory beyond the point where it is useful to do so - i.e. that which requires the OS to swap, or page out important things. The trouble is, it is not necessarily easy to tell where this is.
Also, if your task does any (significant) IO, you will need to have some left for the OS buffers.
I recommend just examining how much there is in the machine, then allocating an amount as a function of that (proportion, or leave some free etc).

What's the graceful way of handling out of memory situations in C/C++?

I'm writing an caching app that consumes large amounts of memory.
Hopefully, I'll manage my memory well enough, but I'm just thinking about what
to do if I do run out of memory.
If a call to allocate even a simple object fails, is it likely that even a syslog call
will also fail?
EDIT: Ok perhaps I should clarify the question. If malloc or new returns a NULL or 0L value then it essentially means the call failed and it can't give you the memory for some reason. So, what would be the sensible thing to do in that case?
EDIT2: I've just realised that a call to "new" can throw an exception. This could be caught at a higher level so I can perhaps gracefully exit further up. At that point, it may even be possible to recover depending on how much memory is freed. In the least I should by that point hopefully be able to log something. So while I have seen code that checks the value of a pointer after new, it is unnecessary. While in C, you should check the return value for malloc.
Well, if you are in a case where there is a failure to allocate memory, you're going to get a std::bad_alloc exception. The exception causes the stack of your program to be unwound. In all likelihood, the inner loops of your application logic are not going to be handling out of memory conditions, only higher levels of your application should be doing that. Because the stack is getting unwound, a significant chunk of memory is going to be free'd -- which in fact should be almost all the memory used by your program.
The one exception to this is when you ask for a very large (several hundred MB, for example) chunk of memory which cannot be satisfied. When this happens though, there's usually enough smaller chunks of memory remaining which will allow you to gracefully handle the failure.
Stack unwinding is your friend ;)
EDIT: Just realized that the question was also tagged with C -- if that is the case, then you should be having your functions free their internal structures manually when out of memory conditions are found; not to do so is a memory leak.
EDIT2: Example:
#include <iostream>
#include <vector>
void DoStuff()
{
std::vector<int> data;
//insert a whole crapload of stuff into data here.
//Assume std::vector::push_back does the actual throwing
//i.e. data.resize(SOME_LARGE_VALUE_HERE);
}
int main()
{
try
{
DoStuff();
return 0;
}
catch (const std::bad_alloc& ex)
{ //Observe that the local variable `data` no longer exists here.
std::cerr << "Oops. Looks like you need to use a 64 bit system (or "
"get a bigger hard disk) for that calculation!";
return -1;
}
}
EDIT3: Okay, according to commenters there are systems out there which do not follow the standard in this regard. On the other hand, on such systems, you're going to be SOL in any case, so I don't see why they merit discussion. But if you are on such a platform, it is something to keep in mind.
Doesn't this question make assumptions regarding overcommitted memory?
I.e., an out of memory situation might not be recoverable! Even if you have no memory left, calls to malloc and other allocators may still succeed until the program attempts to use the memory. Then, BAM!, some process gets killed by the kernel in order to satisfy memory load.
I don't have any specific experience on Linux, but I spent a lot of time working in video games on games consoles, where running out of memory is verboten, and on Windows-based tools.
On a modern OS, you're most likely to run out of address space. Running out of memory, as such, is basically impossible. So just allocate a large buffer, or buffers, on startup, in order to hold all the data you'll ever need, whilst leaving a small amount for the OS. Writing random junk to these regions would probably be a good idea in order to force the OS to actually assign the memory to your process. If your process survives this attempt to use every byte it's asked for, there's some kind of backing now reserved for all of this stuff, so now you're golden.
Write/steal your own memory manager, and direct it to allocate from these buffers. Then use it, consistently, in your app, or take advantage of gcc's --wrap option to forward calls from malloc and friends appropriately. If you use any libraries that can't be directed to call into your memory manager, junk them, because they'll just get in your way. Lack of overridable memory management calls is evidence of deeper-seated issues; you're better of without this particular component. (Note: even if you're using --wrap, trust me, this is still evidence of a problem! Life is too short to use those libraries that don't let you overload their memory management!)
Once you run out of memory, OK, you're screwed, but you've still got that space you left free before, so if freeing up some of the memory you've asked for is too difficult you can (with care) call system calls to write a message to the system log and then terminate, or whatever. Just make sure to avoid calls to the C library, because they'll probably try to allocate some memory when you least expect it -- programmers who work with systems that have virtualised address spaces are notorious for this kind of thing -- and that's the very thing that has caused the problem in the first place.
This approach might sound like a pain in the arse. Well... it is. But it's straightforward, and it's worth putting in a bit of effort for that. I think there's a Kernighan-and/or-Ritche quote about this.
If your application is likely to allocate large blocks of memory and risks hitting the per-process or VM limits, waiting until an allocation actually fails is a difficult situation from which to recover. By the time malloc returns NULL or new throws std::bad_alloc, things may be too far gone to reliably recover. Depending on your recovery strategy, many operations may still require heap allocations themselves, so you have to be extremely careful on which routines you can rely.
Another strategy you may wish to consider is to query the OS and monitor the available memory, proactively managing your allocations. This way you can avoid allocating a large block if you know it is likely to fail, and will thus have a better chance of recovery.
Also, depending on your memory usage patterns, using a custom allocator may give you better results than the standard built-in malloc. For example, certain allocation patterns can actually lead to memory fragmentation over time, so even though you have free memory, the available blocks in the heap arena may not have an available block of the right size. A good example of this is Firefox, which switched to dmalloc and saw a great increase in memory efficiency.
I don't think that capturing the failure of malloc or new will gain you much in your situation. linux allocates large chunks of virtual pages in malloc by means of mmap. By this you may find yourself in a situation where you allocate much more virtual memory than you have (real + swap).
The program then will only fail much later with a segfault (SIGSEGV) when you write to the first page for which there isn't any place in swap. In theory you could test for such situations by writing a signal handler and then dirtying all pages that you allocate.
But usually this will not help much either, since your application will be in a very bad state long before that: constantly swapping, computing mechanically with your harddisk...
It's possible for writes to the syslog to fail in low memory conditions: there's no way to know that for every platform without looking at the source for the relevant functions. They could need dynamic memory to format strings that are passed in, for instance.
Long before you run out of memory, however, you'll start paging stuff to disk. And when that happens, you can forget any performance advantages from caching.
Personally, I'm convinced by the design behind Varnish: the operating system offers services to solve a lot of the relevant problems, and it makes sense to use those services (minor editing):
So what happens with Squid's elaborate memory management is that it gets into fights with the kernel's elaborate memory management ...
Squid creates a HTTP object in RAM and it gets used some times rapidly after creation. Then after some time it get no more hits and the kernel notices this. Then somebody tries to get memory from the kernel for something and the kernel decides to push those unused pages of memory out to swap space and use the (cache-RAM) more sensibly for some data which is actually used by a program. This however, is done without Squid knowing about it. Squid still thinks that these http objects are in RAM, and they will be, the very second it tries to access them, but until then, the RAM is used for something productive. ...
After some time, Squid will also notice that these objects are unused, and it decides to move them to disk so the RAM can be used for more busy data. So Squid goes out, creates a file and then it writes the http objects to the file.
Here we switch to the high-speed camera: Squid calls write(2), the address it gives is a "virtual address" and the kernel has it marked as "not at home". ...
The kernel tries to find a free page, if there are none, it will take a little used page from somewhere, likely another little used Squid object, write it to the paging ... space on the disk (the "swap area") when that write completes, it will read from another place in the paging pool the data it "paged out" into the now unused RAM page, fix up the paging tables, and retry the instruction which failed. ...
So now Squid has the object in a page in RAM and written to the disk two places: one copy in the operating system's paging space and one copy in the filesystem. ...
Here is how Varnish does it:
Varnish allocate some virtual memory, it tells the operating system to back this memory with space from a disk file. When it needs to send the object to a client, it simply refers to that piece of virtual memory and leaves the rest to the kernel.
If/when the kernel decides it needs to use RAM for something else, the page will get written to the backing file and the RAM page reused elsewhere.
When Varnish next time refers to the virtual memory, the operating system will find a RAM page, possibly freeing one, and read the contents in from the backing file.
And that's it. Varnish doesn't really try to control what is cached in RAM and what is not, the kernel has code and hardware support to do a good job at that, and it does a good job.
You may not need to write caching code at all.
As has been stated, exhausting memory means that all bets are off. IMHO the best method of handling this situation is to fail gracefully (as opposed to simply crashing!). Your cache could allocate a reasonable amount of memory on instantiation. The size of this memory would equate to an amount that, when freed, will allow the program to terminate reasonably. When your cache detects that memory is becoming low then it should release this memory and instigate a graceful shutdown.
I'm writing an caching app that consumes large amounts of memory.
Hopefully, I'll manage my memory well enough, but I'm just thinking about what to do if I do run out of memory.
If you are writing deamon which should run 24/7/365, then you should not use dynamic memory management: preallocate all the memory in advance and manage it using some slab allocator/memory pool mechanism. That will also protect you again the heap fragmentation.
If a call to allocate even a simple object fails, is it likely that even a syslog call will also fail?
Should not. This is partially reason why syslog exists as a syscall: that application can report an error independent of its internal state.
If malloc or new returns a NULL or 0L value then it essentially means the call failed and it can't give you the memory for some reason. So, what would be the sensible thing to do in that case?
I generally try in the situations to properly handle the error condition, applying the general error handling rules. If error happens during initialization - terminate with error, probably configuration error. If error happens during request processing - fail the request with out-of-memory error.
For plain heap memory, malloc() returning 0 generally means:
that you have exhausted the heap and unless your application free some memory, further malloc()s wouldn't succeed.
the wrong allocation size: it is quite common coding error to mix signed and unsigned types when calculating block size. If the size ends up mistakenly negative, passed to malloc() where size_t is expected, it becomes very large number.
So in some sense it is also not wrong to abort() to produce the core file which can be analyzed later to see why the malloc() returned 0. Though I prefer to (1) include the attempted allocation size in the error message and (2) try to proceed further. If application would crash due to other memory problem down the road (*), it would produce core file anyway.
(*) From my experience of making software with dynamic memory management resilient to malloc() errors I see that often malloc() returns 0 not reliably. First attempts returning 0 are followed by a successful malloc() returning valid pointer. But first access to the pointed memory would crash the application. This is my experience on both Linux and HP-UX - and I have seen similar pattern on Solaris 10 too. The behavior isn't unique to Linux. To my knowledge the only way to make an application 100% resilient to memory problems is to preallocate all memory in advance. And that is mandatory for mission critical, safety, life support and carrier grade applications - they are not allowed dynamic memory management past initialization phase.
I don't know why many of the sensible answers are voted down. In most server environments, running out of memory means that you have a leak somewhere, and that it makes little sense to 'free some memory and try to go on'. The nature of C++ and especially the standard library is that it requires allocations all the time. If you are lucky, you might be able to free some memory and execute a clean shutdown, or at least emit a warning.
It is however far more likely that you won't be able to do a thing, unless the allocation that failed was a huge one, and there is still memory available for 'normal' things.
Dan Bernstein is one of the very few guys I know that can implement server software that operates in memory constrained situations.
For most of the rest of us, we should probably design our software that it leaves things in a useful state when it bails out because of an out of memory error.
Unless you are some kind of brain surgeon, there isn't a lot else to do.
Also, very often you won't even get an std::bad_alloc or something like that, you'll just get a pointer in return to your malloc/new, and only die when you actually try to touch all of that memory. This can be prevented by turning off overcommit in the operating system, but still.
Don't count on being able to deal with the SIGSEGV when you touch memory that the kernel hoped you wouldn't be.. I'm not quite sure how this works on the windows side of things, but I bet they do overcommit too.
All in all, this is not one of C++'s strong spots.

Information about PTE's (Page Table Entries) in Windows

In order to find more easily buffer overflows I am changing our custom memory allocator so that it allocates a full 4KB page instead of only the wanted number of bytes. Then I change the page protection and size so that if the caller writes before or after its allocated piece of memory, the application immediately crashes.
Problem is that although I have enough memory, the application never starts up completely because it runs out of memory. This has two causes:
since every allocation needs 4 KB, we probably reach the 2 GB limit very soon. This problem could be solved if I would make a 64-bit executable (didn't try it yet).
even when I only need a few hundreds of megabytes, the allocations fail at a certain moment.
The second problem is the biggest one, and I think it's related to the maximum number of PTE's (page table entries, which store information on how Virtual Memory is mapped to physical memory, and whether pages should be read-only or not) you can have in a process.
My questions (or a cry-for-tips):
Where can I find information about the maximum number of PTE's in a process?
Is this different (higher) for 64-bit systems/applications or not?
Can the number of PTE's be configured in the application or in Windows?
Thanks,
Patrick
PS. note for those who will try to argument that you shouldn't write your own memory manager:
My application is rather specific so I really want full control over memory management (can't give any more details)
Last week we had a memory overwrite which we couldn't find using the standard C++ allocator and the debugging functionality of the C/C++ run time (it only said "block corrupt" minutes after the actual corruption")
We also tried standard Windows utilities (like GFLAGS, ...) but they slowed down the application by a factor of 100, and couldn't find the exact position of the overwrite either
We also tried the "Full Page Heap" functionality of Application Verifier, but then the application doesn't start up either (probably also running out of PTE's)
There is what i thought was a great series of blog posts by Mark Russinovich on technet called "Pushing the limits of Windows..."
http://blogs.technet.com/markrussinovich/archive/2008/07/21/3092070.aspx
It has a few articles on virtual memory, paged nonpaged memory, physical memory and others.
He mentions little utilities he uses to take measurements about a systems resources.
Hopefully you will find your answers there.
A shotgun approach is to allocate those isolated 4KB entries at random. This means that you will need to rerun the same tests, with the same input repeatedly. Sometimes it will catch the error, if you're lucky.
A slightly smarter approach is to use another algorithm than just random - e.g. make it dependent on the call stack whether an allocation is isolated. Do you trust std::string users, for instance, and suspect raw malloc use?
Take a look at the implementation of OpenBSD malloc. Much of the same ideas (and more) implemented by very skilled folk.
In order to find more easily buffer
overflows I am changing our custom
memory allocator so that it allocates
a full 4KB page instead of only the
wanted number of bytes.
This has already been done. Application Verifier with PageHeap.
Info on PTEs and the Memory architecture can be found in Windows Internals, 5th Ed. and the Intel Manuals.
Is this different (higher) for 64-bit systems/applications or not?
Of course. 64bit Windows has a much larger address space, so clearly more PTEs are needed to map it.
Where can I find information about the
maximum number of PTE's in a process?
This is not so important as the maximum amount of user address space available in a process. (The number of PTEs is this number divided by the page size.)
This is 2GB on 32 bit Windows and much bigger on x64 Windows. (The actual number varies, but it's "big enough").
Problem is that although I have enough
memory, the application never starts
up completely because it runs out of
memory.
Are you a) leaking memory? b) using horribly inefficient algorithms?