Does it take time to deallocate memory?

Does it take time to deallocate memory? - c++

I have a C++ program which, during execution, will allocate about 3-8Gb of memory to store a hash table (I use tr1/unordered_map) and various other data structures.
However, at the end of execution, there will be a long pause before returning to shell.
For example, at the very end of my main function I have
std::cout << "End of execution" << endl;
But the execution of my program will go something like
$ ./program
do stuff...
End of execution
[long pause of maybe 2 min]
$ -- returns to shell
Is this expected behavior or am I doing something wrong?
I'm guessing that the program is deallocating the memory at the end. But, commercial applications which use large amounts of memory (such as photoshop) do not exhibit this pause when you close the application.
Please advise :)
Edit: The biggest data structure is an unordered_map keyed with a string and stores a list of integers.
I am using g++ -O2 on linux, the computer I am using has 128GB of memory (with most of that free). There are a few giant objects
Solution: I ended up getting rid of the hashtable since it was almost full anyways. This solved my problem.

If the data structures are sufficiently complicated when your program finishes, freeing them might actually take a long time.
If your program actually must create such complicated structures (do some memory profiling to make sure), there probably is no clean way around this.
You can short cut that freeing of memory by a dirty hack - at least on those operating systems where all memory allocated by a process is automatically freed when the process terminates.
You would do that by directly calling the libc's exit(3) function or the operating system's _exit(2). However, I would be very careful about verifying this does not short-circuit any other (important) cleanups some C++ destructor code might be doing. And what this does or does not do is highly system dependent (operating system, compiler, libc, the APIs you were using, ...).

Yes the deallocation of memory can take some time, and also possibly you have code executing like destructors being called. Photoshop does not use 3-8GB of memory.
Also you should perhaps add profiling to your application to confirm it is the deallocation of memory and not something else.

(I started this as a reply to ndim, but it got to long)
As ndim already posted, termination can take a long time.
Likely reasons are:
you have lots of allocations, and parts of the heap are swapped to disk.
long running destructors
other atexit routines
OS specific cleanup, such as notifying DLL's of thread & process termination on Windows (don't know what exactly happens on Linux.)
exit is not the worst workaround here, however, actual behavior is system dependent. e.g. exit on WIndows / MSVC CRT will run global destructors / atexit routines, then call ExitProcess which does close handles (but not necessarily flush them - at least it's not guaranteed).
Downsides: Destructors of heap allocated objects don't run - if you rely on them (e.g. to save state), you are toast. Also, tracking down real memory leaks gets much harder.
Find the cause You should first analyze what is happening.
e.g. by manually freeing the root objects that are still allocated, you can separate the deallocation time from other process cleanup. Memory is the likely cause accordign to your description, but it's not the only possible one. Some cleanup code deadlocking before it runs into a timeout is possible, too. Monitoring stats (such as CPU/swap activity/disk use) can give clues.
Check the release build - debug builds usually use extra data on the heap that can immensely increase cleanup cost.
Different allocators
Ifdeallocation is the problem, you might benefit a lot from using custom allocation mechanisms. Example: if your map only grows (items are never removed), an arena allocator can help a lot. If your lists of integers have many nodes, switch to a vector, or use a rope if you need random insertion.

Certainly it's possible.
About 7 years ago I had a similar problem on a project, there was much less memory but computers were slower too I suppose.
We had to look at the assembly languge for free in the end to work out why it was so slow and it seemed that it was essentially keeping the freed blocks in a linked list so they could be reallocated and was also scanning that list looking for blocks to combine. Scanning the list was an O(n) operation but freeing 'n' objects turned it into O(n^2)
Our test data took about 5 seconds to free the memory but some customers had about 10 times as much data as we every used and it was taking 5-10 minutes to shut down the program on their systems.
We fixed it, as has been suggested by just terminating the process instead and letting the operating system clear up the mess (which we knew was safe to do on our application).
Perhaps you have a more sensible free function that we had several years ago, but I just wanted to post that it's entirely possible if you have many objects to free and an O(n) free operation.

I can't imagine how you'd use enough memory for it to matter, but one way I sped up a program was to use boost::object_pool to allocate memory for a binary tree. The major benefit for me was that I could just put the object pool as a member variable of the tree, and when the tree went out of scope or was deleted, the object pool would be deleted all at once (letting me not have to use a recursive deconstructor for the nodes). object_pool does call all of its objects decontructors at exit though. I'm not sure if it handles empty decontructors in a special way or not.
If you don't need your allocator to call a constructor, you can also use boost::pool, which I think may deallocate faster because it doesn't have to call deconstructors at all and just deleted the chunk of memory in one free().

Freeing memory may well take time - data structures are being updated. How much time depends on the allocator being used.
Also there might be more than just memory deallocation going on - if destructors are being executed, there may be a lot more than that going on.
2 minutes does sound like a lot of time though - you might want to step through the clean up code in a debugger (or use a profiler if that's more convenient) to see what's actually taking all the time.

The time is probably not entirely wasted deallocating memory, but calling all the destructors. You can provide your own allocator that does not call the destructor (if the object in the map doesn't need to be destructed, but only deallocated).
Also take a look at this other question: C++ STL-conforming Allocators

Normally, deallocating memory as a process ends is not taken care of as part of the process, but rather as an operating system cleanup function. You might try something like valgrind to make sure your memory is being dealt with properly. However, the compiler also does certain things to set up and tear down your program, so some sort of performance profiling, or using a debugger to step through what is taking place at teardown time might be useful.

when your program exits the destructors of all the global objects are called.
if one of them takes a long time, you will see this behavior.
look for global objects and investigate their destructors.

Sorry, but this is a terrible question. You need to show the source code showing the specific algorithms and data structures that you are using.
It could be de-allocating, but that's just a wild guess. What are your destructors doing? Maybe is paging like crazy. Just because your application allocates X amount of memory, that doesn't mean it will get it. Most likely it will be paging off virtual memory. Depending on how the specifics of your application and OS, you might be doing a lot of page faults.
In such cases, it might help to run iostat and vmstat on the background to see what the heck is going on. If you see a lot of I/O that's a sure sign you are page faulting. I/O operations will always be more expensive that memory ops.
I would be very surprised if indeed all that lapsed time at the end is purely due to de-allocation.
Run vmstat and iostat as soon as you get the "ending" message, and look for any indications of I/O going bananas.

The objects in memory are organized in a heap. They are not deleted at once, they are deleted one by one, and the cost of deleting an object is O(log n). Freeing them takes loooong.
The answer is then, yes, it takes so much time.

You can avoid free being called on an object by using a destructor call my_object->~my_class() instead of delete my_object. You can avoid free on all objects of a class by overriding and nullifying operator delete( void * ) {} inside the class. Derived classes with virtual destructors will inherit that delete, otherwise you can copy-paste (or maybe using base::operator delete;).
This is much cleaner than calling exit. Just be sure you don't need that memory back!

I guess your unordered map is a global variable, whose constructor is called at process startup, and destructor is called at process exit.
How could you know if the map is guilty?
You can test if your unordered_map is responsible (and I guess it is) by allocating it with a new, and, well, ahem... forget to delete it.
If your process' exit goes faster, then you have your culprit.
Why this is so sloooooow?
Now, just by reading your post, for your unordered map, I see potential allocations for:
strings allocated buffer
list items (each one being a string + other things)
unordered map items + the bucket array
If you have 3-8 Gb of data in this unordered map, this means that each item above will need some kind of new and delete. And if you free every item, one by one, it could take time.
Other reasons?
Note that if you add items to your map item by item while your process executing, the new are not exactly perceptible... But the moment you want to clean all, all your allocated items must be destroyed at the same time, which could explain the perceived difference between construction/use and destruction...
Now, the destructors could take time for an additional reason.
For example, on Visual C++ 2008 in debug mode, for example, upon destruction of STL iterators, the destructor verifies the iterators are still correct. This caused quite a slowdown upon my object destruction (which was basically a tree of nodes, each node having list of child nodes, with iterators everywhere).
You are working on gcc, so perhaps they have their own debug testing, or perhaps your destructors are doing additional work (e.g. logging?)...

In my experience, the calls to free or delete should not take a significant amount of time. That said, I have seen plenty of cases where it does take non-trivial time to destruct objects because of destructors that did non-trivial things. If you can't tell what's taking time during the destruction, use a debugger and/or a profiler to determine what's going on. If the profiler shows you that it really is calls to free() that take a lot of time, then you should improve your memory allocation scheme, because you must be creating an extremely large number of small objects.
As you noted plenty of applications allocate large amounts of memory, and incur no significant memory during shutdown, so there's no reason your program can't do the same.

I would recommend (as some others have) a simple forced process termination, if you're certain that you've nothing left to do but free memory (for example, no file i/o and such left to do).
The thing is that when you free memory, typically, it's not actually returned to the OS - it's held in a list to be reallocated, and this is obviously slow. However, if you terminate process, the OS will lump reclaim all your memory at once, which should be substantially faster. However, as others have said, if you have any destructors that need to run, you should ensure that they are run before force calling exit() or ExitProcess or anysuch function.
What you should be aware of is that deallocating memory that is spread out (e.g., two nodes in a map) is much slower due to cache effects than deallocating memory in a vector, because the CPU needs to access the memory to free it and run any destructors. If you deallocated a very large amount of memory that's very fragmented, you could be falling afoul of this, and should consider changing to some more contiguous structures.
I actually had a problem where allocating memory was faster than de-allocating it, and after allocating memory and then de-allocating it, I had a memory leak. Eventually, I worked out that this is why.

I am currently facing a similar issue, with a CPU & memory intensive research program of mine. It runs until a specified time limit, prints a solutions and exits. The destructor call of a single object (containing up to 10⁶ relatively small objects) was what unexpectedly took time at the end of execution (about 10sec. to free 5Gb of data).
I was not satisfied by the answers advising to avoid executing every destructor, so here is the solution I came up with:
Original code:
void process() {
vector<unordered_map<State, int>> large_obj(100);
// Processing...
} // Takes a few seconds to exit (destructor calls)
Solution:
void process(bool free_mem = false) {
auto * large_obj_ = new vector<unordered_map<State, int>>(100);
auto &large_obj = *large_obj;
// Processing...
// (No changes required here, 'large_obj' can be used exactly as before)
if(free_mem)
delete large_obj_;
}
It has the advantage of being completely transparent apart from a few lines to insert, and it can even be parametrized to take some time to free the memory if needed. It is explicit which object will intentionally not be freed to avoid leaving things in an "unstable" state. Memory is cleaned up instantly by the OS on exit when free_mem = false.

Related

Heap out of memory, all retained memory held by noscript_shared_function_infos

My application creates a lot of functions with callbacks, this is done in the following way:
v8::Local<v8::Function> myFunc = v8::Function::New(
i->GetCurrentContext(),
FunctionInvokerCallback(),
this->WrapDelegate(callbackInvoke),
0,
v8::ConstructorBehavior::kThrow,
v8::SideEffectType::kHasSideEffect).ToLocalChecked();
//persistentObject is a weak Global reference with a callback to clean up native resources
this->objectHandle->persistentObject->Get(i)->Set(i->GetCurrentContext(), functionName, myFunc).FromJust();
The memory grows and eventually there is an OOM error and a crash. Upon looking at the heap snapshot, I find that most of the retained memory is held by noscript_shared_function_infos in (strong roots).
My guess is either the sharedFunctionInfos aren't cleaned up (and grow and grow and grow), or worse my actual functions aren't cleaned up (when no longer in use).
How do I delete the infos / or the actual functions after I'm done?

Have you tried while(!V8::IdleNotification()) {};?
Shouldn't noscript_shared_function_infos be handled by v8's GC?
I have same issue but with Script::Run in new context. Seems like every time it invoked with script that has function ...() {} it appends some bytes to noscript_shared_function_infos WeakRef Array then never clean it.
I've found only some questions on different platforms (like that) and zero answers from someone who knows v8 well. It's sad.

Thread with and without a memory leak

I'm stuck with a c++ assignment where I should make a simple thread and another thread that has the same logic but also have a memory leak.
This should just be an easy thread example, even not doing anything useful in itself. So I guess my question is, what is the easiest thread that can be made in c++ and if I have understood correctly that to make it leak memory, I should make a variable, that is never deleted?
Also should this "leak" be placed in a loop or made to repeat in some other fashion...because for me just leaving one variable undeleted doesn't seem like a major leak.

This would be enough for a leak:
new char;
You can place it in a loop if you want more, but be careful -
while( true ) {
new char;
}
brings most systems to a halt quite quickly - they start swapping and become barely usable. IMO you should stick to leaking a couple of objects unless you have other specific requirements.

You could always allocate a large object (such as a big buffer) and never free it; that way a single allocation would be a substantial memory leak.
As well, if you had a thread that was designed as some kind of frequently called worker thread and had a small memory leak there, over the runtime of your program you could easily run into memory problems through "death of a thousand cuts" style leaks.

There is a Boost Thread library, which is probably your easiest option for threads in C++. Yes, a memory leak is just an undeleted variable. If you don't want a one-variable memory leak, just allocate an array of whatever size you deem necessary. new char[x], where x is how many bytes of memory leakage you want

Quick successful exit from C++ with lots of objects allocated

I'm looking for a way to quickly exit a C++ that has allocated a lot of structures in memory using C++ classes. The program finishes correctly, but after the final "return" in the program, all of the auto-destructors kick in. The problem is the program has allocated about 15GB of memory through lots of C++ class structures, and this auto-destruct process takes about 1 more hour itself to complete as it walks through all of the structures - even though I don't care about the results. The program only took 1 hour to complete the task up to this point. I would like to just return to the OS and let it do its normal wholesale process allocation deletion - which is very quick. I've been doing this by manually killing the process during the cleanup stage - but am looking for a better programic solution.
I would like to return a success to the OS, but don't care to keep any of the memory content. The program does perform a lot of dynamic allocation/deallocation during the normal processing, so it's not just simple heap management.
Any opinions?

In Standard C++ you only have abort(), but that has the process return failure to the OS.
On many platforms (Unix, MS Windows) you can use _exit() to exit the program without running cleanup and destructors.

C++0x std::quick_exit is what you are looking for if your compiler already supports it (g++-4.4.5 does).

If the 15 GB of memory is being allocated to a reasonably small number of classes, you could override operator delete for those classes. Just pass the call to the standard delete, but set up a global flag that, if set, will make the call to delete a no-op. Or, if the logic of your program is such that these objects are not deleted in the normal course of building your data structures, you could simply ignore delete in all cases for these classes.

As Naveen says, this can't be a matter of memory deallocation. I've written neural network simulations with evolutionary algorithms that where allocating and freed lots of memory in small and large chunks and this was never a major issue.

If you have a C99 compiler, you can use the _Exit function to end immediately without having global object destructors or any functions registered with atexit to be called; whether or not unwritten buffered file data is flushed, open streams are closed, or temporary files are removed is implementation-defined (C99 §7.20.4.4).
If you're on Windows, you can also use ExitProcess to achieve the same effect.
But, as others have said, your destructors should really not be taking an hour to run unless you're doing a fair amount of I/O (writing log files, etc.). I strongly, strongly recommend you profile your program to see where the time is spent.

The possible strategies depend on the number of objects that are directly visible in main through which you access the 15GB of data and if these are local to main or statically allocated.
If all access to the 15GB of data is through local objects in main, then you can simply replace the return 0; at the end of main with exit(0);.
exit will terminate your application and trigger cleanup of statically allocated variables, but not of local variables.
If the data is accessed through a handful of statically allocated variables, you could turn them into pointers (or references) to dynamically allocated memory and deliberately leak that.

Can the C++ `new` operator ever throw an exception in real life?

Can the new operator throw an exception in real life?
And if so, do I have any options for handling such an exception apart from killing my application?
Update:
Do any real-world, new-heavy applications check for failure and recover when there is no memory?
See also:
How often do you check for an exception in a C++ new instruction?
Is it useful to test the return of “new” in C++?
Will new return NULL in any case?

Yes, new can and will throw if allocation fails. This can happen if you run out of memory or you try to allocate a block of memory too large.
You can catch the std::bad_alloc exception and handle it appropriately. Sometimes this makes sense, other times (read: most of the time) it doesn't. If, for example, you were trying to allocate a huge buffer but could work with less space, you could try allocating successively smaller blocks.

The new operator, and new[] operator should throw std::bad_alloc, but this is not always the case as the behavior can be sometimes overridden.
One can use std::set_new_handler and suddenly something entirely different can happen than throwing std::bad_alloc. Although the standard requires that the user either make memory available, abort, or throw std::bad_alloc. But of course this may not be the case.
Disclaimer: I am not suggesting to do this.

If you are running on a typical embedded processor running Linux without virtual memory it is quite likely your process will be terminated by the operating system before new fails if you allocate too much memory.
If you are running your program on a machine with less physical memory than the maximum of virtual memory (2 GB on standard Windows) you will find that once you have allocated an amount of memory approximately equal to the available physical memory, further allocations will succeed but will cause paging to disk. This will bog your program down and you might not actually be able to get to the point of exhausting virtual memory. So you might not get an exception thrown.
If you have more physical memory than the virtual memory, and you simply keep allocating memory, you will get an exception when you have exhausted virtual memory to the point where you can not allocate the block size you are requesting.
If you have a long-running program that allocates and frees in many different block sizes, including small blocks, with a wide variety of lifetimes, the virtual memory may become fragmented to the point where new will be unable to find a large enough block to satisfy a request. Then new will throw an exception. If you happen to have a memory leak that leaks the occasional small block in a random location that will eventually fragment memory to the point where an arbitrarily small block allocation will fail, and an exception will be thrown.
If you have a program error that accidentally passes a huge array size to new[], new will fail and throw an exception. This can happen for example if the array size is actually some sort of random byte pattern, perhaps derived from uninitialized memory or a corrupted communication stream.
All the above is for the default global new. However, you can replace global new and you can provide class-specific new. These too can throw, and the meaning of that situation depends on how you programmed it. it is usual for new to include a loop that attempts all possible avenues for getting the requested memory. It throws when all those are exhausted. What you do then is up to you.
You can catch an exception from new and use the opportunity it provides to document the program state around the time of the exception. You can "dump core". If you have a circular instrumentation buffer allocated at program startup, you can dump it to disk before you terminate the program. The program termination can be graceful, which is an advantage over simply not handling the exception.
I have not personally seen an example where additional memory could be obtained after the exception. One possibility however, is the following: Suppose you have a memory allocator that is highly efficient but not good at reclaiming free space. For example, it might be prone to free space fragmentation, in which free blocks are adjacent but not coalesced. You could use an exception from new, caught in a new_handler, to run a compaction procedure for free space before retrying.
Serious programs should treat memory as a potentially scarce resource, control its allocation as much as possible, monitor its availability and react appropriately if something seems to have gone dramatically wrong. For example, you could make a case that in any real program there is quite a small upper bound on the size parameter passed to the memory allocator, and anything larger than this should cause some kind of error handling, whether or not the request can be satisfied. You could argue that the rate of memory increase of a long-running program should be monitored, and if it can be reasonably predicted that the program will exhaust available memory in the near future, an orderly restart of the process should be begun.

In Unix systems, it's customary to run long-running processes with memory limits (using ulimit) so that it doesn't eat up all of a system's memory. If your program hits that limit, you will get std::bad_alloc.
Update for OP's edit: the most typical case of programs recovering from an out-of-memory condition is in garbage-collected systems, which then performs a GC and continues. Though, this sort of on-demand GC is really for last-ditch efforts only; usually, good programs try to GC periodically to reduce stress on the collector.
It's less usual for non-GC programs to recover from out-of-memory issues, but for Internet-facing servers, one way to recover is to simply reject the request that's causing the memory to run out with a "temporary" error. ("First in, first served" strategy.)

osgx said:
Does any real-world applications
checks a lot number of news and can
recover when there is no memory?
I have answered this previously in my answer to this question, which is quoted below:
It is very difficult to handle this
sort of situation. You may want to
return a meaningful error to the user
of your application, but if it's a
problem caused by lack of memory, you
may not even be able to afford the
memory to allocate the error message.
It's a bit of a catch-22 situation
really.
There is a defensive programming
technique (sometimes called a memory
parachute or rainy day fund) where you
allocate a chunk of memory when your
application starts. When you then
handle the bad_alloc exception, you
free this memory up, and use the
available memory to close down the
application gracefully, including
displaying a meaningful error to the
user. This is much better than
crashing :)

You don't need to handle the exception in every single new :) Exceptions can propagate. Design your code so that there are certain points in each "module" where that error is handled.

It depends on the compiler/runtime and on the operator new that you are using (e.g. certain versions of Visual Studio will not throw out of the box, but would rather return a NULL pointer a la malloc instead.)
You can always catch a std::bad_alloc exception, or explicitly use nothrow new to return NULL instead of throwing. (Also see past StackOverflow posts revolving around the subject.)
Note that operator new, like malloc, will fail when you have run out of memory, out of address space (e.g. 2-3GB in a 32-bit process depending on the OS), out of quota (ulimit was already mentioned) or out of contiguous address space (e.g. fragmented heap.)

Yes, new can throw std::bad_alloc (a subclass of std::exception), which you may catch.
If you absolutely want to avoid this exception, and instead are ready to test the result of new for a null pointer, you may add a nothrow argument:
T* p = new (nothrow) T(...);
if (p == 0)
{
// Do something about the bad allocation!
}
else
{
// Here you may use p.
}

Yes new will throw an exception if there is no more memory available, but that doesn't mean you should wrap every new in a try ... catch. Only catch the exception if your program can actually do something about it.
If the program cannot do anything to handle that exceptional situation, what is often the case if you run out of memory, there is no use in catching the exception. If the only thing you could reasonably do is to abort the program you can as well just let the exception bubble up to top level, where it will terminate the program as well.

In many cases there's no reasonable recovery for an out of memory situation, in which case it's probably perfectly reasonable to let the application terminate. You might want to catch the exception at a high level to display a nicer error message than the compiler might give by default, but you might have to play some tricks to get even that to work (since the process is likely to be very low on resources at that point).
Unless you have a special situation that can be handled and recovered, there's probably no reason to spend a lot of effort trying to handle the exception.

Note that in Windows, very large new/mallocs will just allocate from virtual memory. In practice, your machine will crash before you see that exception.
char *pCrashMyMachine = new char[TWO_GIGABYTES];
Try it if you dare!

I use Mac OS X, and I've never seen malloc return NULL (which would imply an exception from new in C++). The machine bogs down, does its best to allocate dwindling memory to processes, and finally sends SIGSTOP and invites the user to kill processes rather than have them deal with allocation failure.
However, that's just one platform. CERTAINLY there are platforms where the default allocator does throw. And, as Chris says, ulimit may introduce an artificial constraint so that an exception would be the expected behavior.
Also, there are allocators besides the default one/malloc. If a class overrides operator new, you use custom arguments to new(…), or you pass an allocator object into a container, it probably defines its own conditions to throw bad_alloc.

new operator will throw std::bad_alloc exception when there are not enough available memory in the pool to fulfill runtime request.
This can happen on bad design or when memory allocated are not freed correctly.
Handling of such exception is based on your design, one way will be pause and retry some time later, hoping more memory returned to the pool and the request may succeed.

Most realistically new will throw due to a decision to limit a resource. Say this class (which may be memory intensive) takes memory out of the physicals pool and if to many objects take from it (we need memory for other things like sound, textures etc) it may throw instead of crashing later on when something that should be able to allocate memory takes it. (looks like a weird side effect).
Overloading new can be useful in devices with restricted memory. Such as handhelds or on consoles when its too easy to go overboard with cool effects.

Yes, new can and will throw.
Since you are asking about 'real' programs: I've worked on various shrink-wrapped commercial software applications for over 20 years. 'Real' programs with millions of users. That you can go and buy off the shelf today. Yes, new can throw.
There are various ways to handle this.
First, write your own new_handler (this is called before new gives up and throws - see set_new_handler() function). When your new_handler is called, see if you can free some things you don't really need. Also warn the user that they are running low on memory. (yes, it can be hard to warn the user about anything if you are really low).
One thing is to have pre-allocated, at the start of your program some 'extra' memory. When you run out of memory, use this extra memory to help save a copy of the user's document to disk. Then warn, and maybe exit gracefully.
Etc. This is just a overview, obviously there is more to it.
Handling low memory is not easy.

The new-handler function is the function called by allocation functions whenever new attempt to allocate the memory fails. We can have our own logging or some special action, eg,g arranging for more memory etc.
Its intended purpose is one of three things:
1) make more memory available
2) terminate the program (e.g. by calling std::terminate)
3) throw exception of type std::bad_alloc or derived from std::bad_alloc.
The default implementation throws std::bad_alloc. The user can have his own new-handler, which may offer behavior different than the default one. THis should be use only when you really need. See the example for more clarification and default behaviour,
#include <iostream>
#include <new>
void handler()
{
std::cout << "Memory allocation failed, terminating\n";
std::set_new_handler(nullptr);
}
int main()
{
std::set_new_handler(handler);
try {
while (true) {
new int[100000000ul];
}
} catch (const std::bad_alloc& e) {
std::cout << e.what() << '\n';
}
}

It's good to check/catch this exception when you are allocating memory based from something given from outside (from user space, network e.g.), because it could mean an attempt to compromise your application/service/system and you shouldn't allow this to happen.

new operator will throw std::bad_alloc exception when you run out of the memory ( virtual memory to be precise).
If new throws an exception then it is a serious error:
More than available VM is getting allocated ( it fails eventually). You can try reducing the amount of memory than exiting the program by catching std::bad_alloc exception.

What can modify the frame pointer?

I have a very strange bug cropping up right now in a fairly massive C++ application at work (massive in terms of CPU and RAM usage as well as code length - in excess of 100,000 lines). This is running on a dual-core Sun Solaris 10 machine. The program subscribes to stock price feeds and displays them on "pages" configured by the user (a page is a window construct customized by the user - the program allows the user to configure such pages). This program used to work without issue until one of the underlying libraries became multi-threaded. The parts of the program affected by this have been changed accordingly. On to my problem.
Roughly once in every three executions the program will segfault on startup. This is not necessarily a hard rule - sometimes it'll crash three times in a row then work five times in a row. It's the segfault that's interesting (read: painful). It may manifest itself in a number of ways, but most commonly what will happen is function A calls function B and upon entering function B the frame pointer will suddenly be set to 0x000002. Function A:
result_type emit(typename type_trait<T_arg1>::take _A_a1) const
{ return emitter_type::emit(impl_, _A_a1); }
This is a simple signal implementation. impl_ and _A_a1 are well-defined within their frame at the crash. On actual execution of that instruction, we end up at program counter 0x000002.
This doesn't always happen on that function. In fact it happens in quite a few places, but this is one of the simpler cases that doesn't leave that much room for error. Sometimes what will happen is a stack-allocated variable will suddenly be sitting on junk memory (always on 0x000002) for no reason whatsoever. Other times, that same code will run just fine. So, my question is, what can mangle the stack so badly? What can actually change the value of the frame pointer? I've certainly never heard of such a thing. About the only thing I can think of is writing out of bounds on an array, but I've built it with a stack protector which should come up with any instances of that happening. I'm also well within the bounds of my stack here. I also don't see how another thread could overwrite the variable on the stack of the first thread since each thread has it's own stack (this is all pthreads). I've tried building this on a linux machine and while I don't get segfaults there, roughly one out of three times it will freeze up on me.

Stack corruption, 99.9% definitely.
The smells you should be looking carefully for are:-
Use of 'C' arrays
Use of 'C' strcpy-style functions
memcpy
malloc and free
thread-safety of anything using pointers
Uninitialised POD variables.
Pointer Arithmetic
Functions trying to return local variables by reference

I had that exact problem today and was knee-deep in gdb mud and debugging for a straight hour before occurred to me that I simply wrote over array boundaries (where I didn't expect it the least) of a C array.
So, if possible, use vectors instead because any decend STL implementation will give good compiler messages if you try that in debug mode (whereas C arrays punish you with segfaults).

I'm not sure what you're calling a "frame pointer", as you say:
On actual execution of that
instruction, we end up at program
counter 0x000002
Which makes it sound like the return address is being corrupted. The frame pointer is a pointer that points to the location on the stack of the current function call's context. It may well point to the return address (this is an implementation detail), but the frame pointer itself is not the return address.
I don't think there's enough information here to really give you a good answer, but some things that might be culprits are:
incorrect calling convention. If you're calling a function using a calling convention different from how the function was compiled, the stack may become corrupted.
RAM hit. Anything writing through a bad pointer can cause garbage to end up on the stack. I'm not familiar with Solaris, but most thread implementations have the threads in the same process address space, so any thread can access any other thread's stack. One way a thread can get a pointer into another thread's stack is if the address of a local variable is passed to an API that ultimately deals with the pointer on a different thread. unless you synchronize things properly, this will end up with the pointer accessing invalid data. Given that you're dealing with a "simple signal implementation", it seems like it's possible that one thread is sending a signal to another. Maybe one of the parameters in that signal has a pointer to a local?

There's some confusion here between stack overflow and stack corruption.
Stack Overflow is a very specific issue cause by try to use using more stack than the operating system has allocated to your thread. The three normal causes are like this.
void foo()
{
foo(); // endless recursion - whoops!
}
void foo2()
{
char myBuffer[A_VERY_BIG_NUMBER]; // The stack can't hold that much.
}
class bigObj
{
char myBuffer[A_VERY_BIG_NUMBER];
}
void foo2( bigObj big1) // pass by value of a big object - whoops!
{
}
In embedded systems, thread stack size may be measured in bytes and even a simple calling sequence can cause problems. By default on windows, each thread gets 1 Meg of stack, so causing stack overflow is much less of a common problem. Unless you have endless recursion, stack overflows can always be mitigated by increasing the stack size, even though this usually is NOT the best answer.
Stack Corruption simply means writing outside the bounds of the current stack frame, thus potentially corrupting other data - or return addresses on the stack.
At it's simplest:-
void foo()
{
char message[10];
message[10] = '!'; // whoops! beyond end of array
}

That sounds like a stack overflow problem - something is writing beyond the bounds of an array and trampling over the stack frame (and probably the return address too) on the stack. There's a large literature on the subject. "The Shell Programmer's Guide" (2nd Edition) has SPARC examples that may help you.

With C++ unitialized variables and race conditions are likely suspects for intermittent crashes.

Is it possible to run the thing through Valgrind? Perhaps Sun provides a similar tool. Intel VTune (Actually I was thinking of Thread Checker) also has some very nice tools for thread debugging and such.
If your employer can spring for the cost of the more expensive tools, they can really make these sorts of problems a lot easier to solve.

It's not hard to mangle the frame pointer - if you look at the disassembly of a routine you will see that it is pushed at the start of a routine and pulled at the end - so if anything overwrites the stack it can get lost. The stack pointer is where the stack is currently at - and the frame pointer is where it started at (for the current routine).
Firstly I would verify that all of the libraries and related objects have been rebuilt clean and all of the compiler options are consistent - I've had a similar problem before (Solaris 2.5) that was caused by an object file that hadn't been rebuilt.
It sounds exactly like an overwrite - and putting guard blocks around memory isn't going to help if it is simply a bad offset.
After each core dump examine the core file to learn as much as you can about the similarities between the faults. Then try to identify what is getting overwritten. As I remember the frame pointer is the last stack pointer - so anything logically before the frame pointer shouldn't be modified in the current stack frame - so maybe record this and copy it elsewhere and compare upon return.

Is something meaning to assign a value of 2 to a variable but instead is assigning its address to 2?
The other details are lost on me but "2" is the recurring theme in your problem description. ;)

I would second that this definitely sounds like a stack corruption due to out of bound array or buffer writing. Stack protector would be good as long as the writing is sequential, not random.

I second the notion that it is likely stack corruption. I'll add that the switch to a multi-threaded library makes me suspicious that what has happened is a lurking bug has been exposed. Possibly the sequencing the buffer overflow was occurring on unused memory. Now it's hitting another thread's stack. There are many other possible scenarios.
Sorry if that doesn't give much of a hint at how to find it.

I tried Valgrind on it, but unfortunately it doesn't detect stack errors:
"In addition to the performance penalty an important limitation of Valgrind is its inability to detect bounds errors in the use of static or stack allocated data."
I tend to agree that this is a stack overflow problem. The tricky thing is tracking it down. Like I said, there's over 100,000 lines of code to this thing (including custom libraries developed in-house - some of it going as far back as 1992) so if anyone has any good tricks for catching that sort of thing, I'd be grateful. There's arrays being worked on all over the place and the app uses OI for its GUI (if you haven't heard of OI, be grateful) so just looking for a logical fallacy is a mammoth task and my time is short.
Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
No one asked this, but I built with gcc-4.2. Also, I can guarantee ABI safety here so that's also not the issue. As for the "garbage at the end of the stack" on the RAM hit, the fact that it is universally 2 (though in different places in the code) makes me doubt that as garbage tends to be random.

It is impossible to know, but here are some hints that I can come up with.
In pthreads you must allocate the stack and pass it to the thread. Did you allocate enough? There is no automatic stack growth like in a single threaded process.
If you are sure that you don't corrupt the stack by writing past stack allocated data check for rouge pointers (mostly uninitialized pointers).
One of the threads could overwrite some data that others depend on (check your data synchronisation).
Debugging is usually not very helpful here. I would try to create lots of log output (traces for entry and exit of every function/method call) and then analyze the log.
The fact that the error manifest itself differently on Linux may help. What thread mapping are you using on Solaris? Make sure you map every thread to it's own LWP to ease the debugging.

Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
If you pass anything on the stack by reference or by address, this would most certainly happen if another thread tried to use it after the first thread returned from a function.
You might be able to repro this by forcing the app onto a single processor. I don't know how you do that with Sparc.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js