How to free allocated memory in a recursive function in C++

How to free allocated memory in a recursive function in C++ - c++

I have a recursive method in a class that computes some stages (doesn't matter what this is). If it notice that the probability of success for a stage is to low, the stage will be delayed by storing it in a queue and the program will look for delayed stage later. It grabs a stage, copies the data for working and deletes the stage.
The program runs fine, but I got a memory problem. Since this program is randomized it could happen that it delays up to 10 million stages which results in something like 8 to 12 GB memory usage (or more but it crashes before that happens). It seems the program never frees the memory for a deleted stage before reaching the end of the call stack of the recursive function.
class StageWorker
{
private:
queue<Stage*> delayedStages;
void perform(Stage** start)
{
// [long value_from_stage = (**start).value; ]
delete *start;
// Do something here
// Delay a stage if necessary like this
this->delayedStages.push(new Stage(value));
// Do something more here
// Look for delayed stages like this
Stage* front = this->delayedStages.front();
this->delayedStages.pop();
this->perform(&front);
}
}
I use pointer to pointer as I thought the memory is not freed, because there is a pointer (front*) that points to the stage. So I used a pointer that point to the pointer, and so I can delete it. But it seems not to work. If I watch the memory usage on Performance monitor (on Windows) it like like this:
I marked the end of the recursive calls. This is also just a example plot. real data, but in a very small scenario.
Any ideas how to free the memory of not longer used sages before reaching the end of the call stack?
Edit
I followed some advice and removed the pointers. So now it looks like this:
class StageWorker
{
private:
queue<Stage> delayedStages;
void perform(Stage& start)
{
// [long value_from_stage = start.value; ]
// Do something here
// Delay a stage if necessary like this
this->delayedStages.push(Stage(value));
// Do something more here
// Look for delayed stages like this
this->perform(this->delayedStages.front());
this->delayedStages.pop();
}
}
But this changed nothing, memory usage is the same as before.
As mentioned it maybe is a problem of monitoring. Is there a better way to check the exact memory usage?

Just to close this question as answered:
Mats Petersson (thanks a lot) mentiond in the comments that it could be a problem of monitoring the memory usage.
In fact this exactly was the problem. The code I provided in the edit (after replacing the pointers by references) frees the space correctly, but it is not given back to the OS (in this case Windows). So from the monitor it looks like the space is not freed.
But then I made some experiments and saw that freed memory is reused by the application, so it does not have to ask the OS for more memory.
So what the monitor shows is the maximum memory used bevore getting back the whole memory the recursive function required, after the first call of this function ends.

Related

Debugging Memory Leak on C/C++ Application running on Embedded Linux Device

I have an application running on an ARM Cortex-A9. When I enter a certain portion of the code, I can see in the Linux tasks view 'top' that the application grows in memory usage until it gets Killed due to running out of physical memory.
Now, I have done some research on this and tried to implement mtrace, but it didn't give me very concise results. Basically I get something like this
Memory not freed:
-----------------
Address Size Caller
0x03aafe18 0x38 at 0x76e73c18
0x53a004a8 0x38 at 0x76e73c18
And I do not even think this is the big problem (maybe another smaller issue).
I also cannot use Valgrind (which would probably work great) because there is not enough space on the device to install it and a compiler...
So I fear that I just have to go through the code and look for something that could be causing growing memory usage. Is there a guide for this somewhere? In the code, "malloc" or "new" is almost never used.
I do have access to use gdb, if that can help.
One thing I am not clear on is if the following is a problem:
while(someloop){
...
double *someptr;
...
}
or like
while(someloop){
...
int32 someArray[100] = {0};
...
}
Of which there is a lot of in the code. When that loop comes around, and instantiates those variables or pointers, does it just keep using free space, or use the spaces from the last iteration?

If it is alocated on the stack, the memory is reused. However by alocating on heap you need to delete.
Also if you alocate with double * ptr; ... ptr = new double [5], you need to delete by delete [] ptr.
In C++ you can overwrite the new and delete operators to print some message for debuging.
Best would be to debug using gdb and see what object is created and not deleted.
It is possible that you use a class in your code that does not delete something internal.
Tip: for small objects alocating on stack is both faster and safer.

Unexpected Behaviour from tcmalloc

I have been using tcmalloc for a few months in a large project, and so far I must say that I am pretty happy about it, most of all for its HeapProfiling features which allowed to track memory leaks and remove them.
In the past couple of weeks though we experienced random crashes in our application, and we could not find the source of the random crash. In a very particular situation, When the application crashed, we found ourselves with a completely corrupted stack for one of the application threads. Several times instead I found that threads were stuck in tcmalloc::PageHeap::AllocLarge(), but since I dont have debug symbols of tcmalloc linked, I could not understand what the issue was.
After nearly one week of investigation, today I tried the most simple of things: removed tcmalloc from linkage to avoid using it, just to see what happened. Well... I finally found out what the problem was, and the offending code looks very much like this:
void AllocatingFunction()
{
Object object_on_stack;
ProcessObject(&object_on_stack);
}
void ProcessObject(Object* object)
{
...
// Do Whatever
...
delete object;
}
Using libc the application still crashed but I finally saw that I was calling delete on an object which was allocated on the stack.
What I still cant figure out is why tcmalloc instead kept the application running regardless of this very risky (if not utterly wrong) object deallocation, and the double deallocation when object_on_stack goes out of scope when AllocatingFunction ends. The fact is that the offending code could be called repeatedly without any hint of the underlying abomination.
I know that memory deallocation is one of those "undefined behaviour" when not used properly, but my surprise is such a different behaviour between "standard" libc and tcmalloc.
Does anyone have some sort of explanation of insight, on why tcmalloc keeps the application running?
Thanks in advance :)
Have a Nice Day

very risky (if not utterly wrong) object deallocation
well, I disagree here, it is utterly wrong, and since you invoke UB, anything can happen.
It very much depends on what the tcmalloc code acutally does on deallocation, and how it uses the (possibly garbage) data around the stack at that location.
I have seen tcmalloc crash on such occasions too, as well as glibc going into an infinite loop. What you see is just coincidence.

Firstly, there was no double free in your case. When object_on_stack goes out of scope there is no free call, just the stack pointer is decreased (or rather increased, as stack grows down...).
Secondly, during delete TcMalloc should be able to recognize that the address from a stack does not belong to the program heap. Here is a part of free(ptr) implementation:
const PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
Span* span = NULL;
size_t cl = Static::pageheap()->GetSizeClassIfCached(p);
if (cl == 0) {
span = Static::pageheap()->GetDescriptor(p);
if (!span) {
// span can be NULL because the pointer passed in is invalid
// (not something returned by malloc or friends), or because the
// pointer was allocated with some other allocator besides
// tcmalloc. The latter can happen if tcmalloc is linked in via
// a dynamic library, but is not listed last on the link line.
// In that case, libraries after it on the link line will
// allocate with libc malloc, but free with tcmalloc's free.
(*invalid_free_fn)(ptr); // Decide how to handle the bad free request
return;
}
Call to invalid_free_fn crashes.

Subtle Memory Leak, and is this common practice?

I think I might be creating a memory leak here:
void commandoptions(){
cout<< "You have the following options: \n 1). Buy Something.\n 2).Check you balance. \n3). See what you have bought.\n4.) Leave the store.\n\n Enter a number to make your choice:";
int input;
cin>>input;
if (input==1) buy();
//Continue the list of options.....
else
commandoptions(); //MEMORY LEAK IF YOU DELETE THE ELSE STATEMENTS!
}
inline void buy(){
//buy something
commandoptions();
}
Let's say commandoptions has just exectued for the first time the program has been run. The user selects '1', meaning the buy() subroutine is executed by the commandoptions() subroutine.
After buy() executes, it calls commandoptions() again.
Does the first commandoptions() ever return? Or did I just make a memory leak?
If I make a subroutine that does nothing but call itself, it will cause a stackoverflow because the other 'cycles' of that subroutine never exit. Am I doing/close to doing that here?
Note that I used the inline keyword on buy... does that make any difference?
I'd happily ask my professor, he just doesn't seem available. :/
EDIT: I can't believe it didn't occur to me to use a loop, but thanks, I learned something new about my terminology!

A memory leak is where you have allocated some memory using new like so:
char* memory = new char[100]; //allocate 100 bytes
and then you forget, after using this memory to delete the memory
delete[] memory; //return used memory back to system.
If you forget to delete then you are leaving this memory as in-use while your program is running and cannot be reused for something else. Seeing that memory is a limited resource, doing this millions of times for example, without the program terminating, would end you with no memory left to use.
This is why we clean up after ourselves.
In C++ you'd use an idiom like RAII to prevent memory leaks.
class RAII
{
public:
RAII() { memory = new char[100]; }
~RAII() { delete[] memory }
//other functions doing stuff
private:
char* memory;
};
Now you can use this RAII class, as so
{ // some scope
RAII r; // allocate some memory
//do stuff with r
} // end of scope destroys r and calls destructor, deleting memory
Your code doesn't show any memory allocations, therefore has no visible leak.
Your code does seem to have endless recursion, without a base case that will terminate the recursion.

Inline keyword won't cause a memory leak.
If this is all the code you have, there shouldn't be a memory leak. It does look like you have infinite recursion though. If the user types '1' then commandoptions() gets called again inside of buy(). Suppose they type '1' in that one. Repeat ad infinum, you then eventually crash because the stack got too deep.
Even if the user doesn't type '1', you still call commandoptions() again inside of commandoptions() at the else, which will have the exact same result -- a crash because of infinite recursion.
I don't see a memory leak with the exact code given however.

This is basically a recursion without a base case. So, the recursion will never end (until you run out of stack space that is).
For what you're trying to do, you're better off using a loop, rather than recursion.
And to answer your specific questions :
No, commandoptions never returns.
If you use a very broad definition of a memory leak, then this is a memory leak, since you're creating stack frames without ever removing them again. Most people wouldn't label it as such though (including me).
Yes, you are indeed gonna cause a stack overflow eventually.
The inline keyword won't make a difference in this.

This is not about memory leak, you are making infinite calls to commandoptions function no matter what the value of input is, which will result in stack crash. You need some exit point in your commandoptions function.

There is no memory leak here. What does happen (at least it looks that way in that butchered code snippet of yours) is that you get into an infinite loop. You might run out of stack space if tail call optimization doesn't kick in or isn't supported by your compiler (it's a bit hard to see whether or not your calls actually are in tail position though).

Does it take time to deallocate memory?

I have a C++ program which, during execution, will allocate about 3-8Gb of memory to store a hash table (I use tr1/unordered_map) and various other data structures.
However, at the end of execution, there will be a long pause before returning to shell.
For example, at the very end of my main function I have
std::cout << "End of execution" << endl;
But the execution of my program will go something like
$ ./program
do stuff...
End of execution
[long pause of maybe 2 min]
$ -- returns to shell
Is this expected behavior or am I doing something wrong?
I'm guessing that the program is deallocating the memory at the end. But, commercial applications which use large amounts of memory (such as photoshop) do not exhibit this pause when you close the application.
Please advise :)
Edit: The biggest data structure is an unordered_map keyed with a string and stores a list of integers.
I am using g++ -O2 on linux, the computer I am using has 128GB of memory (with most of that free). There are a few giant objects
Solution: I ended up getting rid of the hashtable since it was almost full anyways. This solved my problem.

If the data structures are sufficiently complicated when your program finishes, freeing them might actually take a long time.
If your program actually must create such complicated structures (do some memory profiling to make sure), there probably is no clean way around this.
You can short cut that freeing of memory by a dirty hack - at least on those operating systems where all memory allocated by a process is automatically freed when the process terminates.
You would do that by directly calling the libc's exit(3) function or the operating system's _exit(2). However, I would be very careful about verifying this does not short-circuit any other (important) cleanups some C++ destructor code might be doing. And what this does or does not do is highly system dependent (operating system, compiler, libc, the APIs you were using, ...).

Yes the deallocation of memory can take some time, and also possibly you have code executing like destructors being called. Photoshop does not use 3-8GB of memory.
Also you should perhaps add profiling to your application to confirm it is the deallocation of memory and not something else.

(I started this as a reply to ndim, but it got to long)
As ndim already posted, termination can take a long time.
Likely reasons are:
you have lots of allocations, and parts of the heap are swapped to disk.
long running destructors
other atexit routines
OS specific cleanup, such as notifying DLL's of thread & process termination on Windows (don't know what exactly happens on Linux.)
exit is not the worst workaround here, however, actual behavior is system dependent. e.g. exit on WIndows / MSVC CRT will run global destructors / atexit routines, then call ExitProcess which does close handles (but not necessarily flush them - at least it's not guaranteed).
Downsides: Destructors of heap allocated objects don't run - if you rely on them (e.g. to save state), you are toast. Also, tracking down real memory leaks gets much harder.
Find the cause You should first analyze what is happening.
e.g. by manually freeing the root objects that are still allocated, you can separate the deallocation time from other process cleanup. Memory is the likely cause accordign to your description, but it's not the only possible one. Some cleanup code deadlocking before it runs into a timeout is possible, too. Monitoring stats (such as CPU/swap activity/disk use) can give clues.
Check the release build - debug builds usually use extra data on the heap that can immensely increase cleanup cost.
Different allocators
Ifdeallocation is the problem, you might benefit a lot from using custom allocation mechanisms. Example: if your map only grows (items are never removed), an arena allocator can help a lot. If your lists of integers have many nodes, switch to a vector, or use a rope if you need random insertion.

Certainly it's possible.
About 7 years ago I had a similar problem on a project, there was much less memory but computers were slower too I suppose.
We had to look at the assembly languge for free in the end to work out why it was so slow and it seemed that it was essentially keeping the freed blocks in a linked list so they could be reallocated and was also scanning that list looking for blocks to combine. Scanning the list was an O(n) operation but freeing 'n' objects turned it into O(n^2)
Our test data took about 5 seconds to free the memory but some customers had about 10 times as much data as we every used and it was taking 5-10 minutes to shut down the program on their systems.
We fixed it, as has been suggested by just terminating the process instead and letting the operating system clear up the mess (which we knew was safe to do on our application).
Perhaps you have a more sensible free function that we had several years ago, but I just wanted to post that it's entirely possible if you have many objects to free and an O(n) free operation.

I can't imagine how you'd use enough memory for it to matter, but one way I sped up a program was to use boost::object_pool to allocate memory for a binary tree. The major benefit for me was that I could just put the object pool as a member variable of the tree, and when the tree went out of scope or was deleted, the object pool would be deleted all at once (letting me not have to use a recursive deconstructor for the nodes). object_pool does call all of its objects decontructors at exit though. I'm not sure if it handles empty decontructors in a special way or not.
If you don't need your allocator to call a constructor, you can also use boost::pool, which I think may deallocate faster because it doesn't have to call deconstructors at all and just deleted the chunk of memory in one free().

Freeing memory may well take time - data structures are being updated. How much time depends on the allocator being used.
Also there might be more than just memory deallocation going on - if destructors are being executed, there may be a lot more than that going on.
2 minutes does sound like a lot of time though - you might want to step through the clean up code in a debugger (or use a profiler if that's more convenient) to see what's actually taking all the time.

The time is probably not entirely wasted deallocating memory, but calling all the destructors. You can provide your own allocator that does not call the destructor (if the object in the map doesn't need to be destructed, but only deallocated).
Also take a look at this other question: C++ STL-conforming Allocators

Normally, deallocating memory as a process ends is not taken care of as part of the process, but rather as an operating system cleanup function. You might try something like valgrind to make sure your memory is being dealt with properly. However, the compiler also does certain things to set up and tear down your program, so some sort of performance profiling, or using a debugger to step through what is taking place at teardown time might be useful.

when your program exits the destructors of all the global objects are called.
if one of them takes a long time, you will see this behavior.
look for global objects and investigate their destructors.

Sorry, but this is a terrible question. You need to show the source code showing the specific algorithms and data structures that you are using.
It could be de-allocating, but that's just a wild guess. What are your destructors doing? Maybe is paging like crazy. Just because your application allocates X amount of memory, that doesn't mean it will get it. Most likely it will be paging off virtual memory. Depending on how the specifics of your application and OS, you might be doing a lot of page faults.
In such cases, it might help to run iostat and vmstat on the background to see what the heck is going on. If you see a lot of I/O that's a sure sign you are page faulting. I/O operations will always be more expensive that memory ops.
I would be very surprised if indeed all that lapsed time at the end is purely due to de-allocation.
Run vmstat and iostat as soon as you get the "ending" message, and look for any indications of I/O going bananas.

The objects in memory are organized in a heap. They are not deleted at once, they are deleted one by one, and the cost of deleting an object is O(log n). Freeing them takes loooong.
The answer is then, yes, it takes so much time.

You can avoid free being called on an object by using a destructor call my_object->~my_class() instead of delete my_object. You can avoid free on all objects of a class by overriding and nullifying operator delete( void * ) {} inside the class. Derived classes with virtual destructors will inherit that delete, otherwise you can copy-paste (or maybe using base::operator delete;).
This is much cleaner than calling exit. Just be sure you don't need that memory back!

I guess your unordered map is a global variable, whose constructor is called at process startup, and destructor is called at process exit.
How could you know if the map is guilty?
You can test if your unordered_map is responsible (and I guess it is) by allocating it with a new, and, well, ahem... forget to delete it.
If your process' exit goes faster, then you have your culprit.
Why this is so sloooooow?
Now, just by reading your post, for your unordered map, I see potential allocations for:
strings allocated buffer
list items (each one being a string + other things)
unordered map items + the bucket array
If you have 3-8 Gb of data in this unordered map, this means that each item above will need some kind of new and delete. And if you free every item, one by one, it could take time.
Other reasons?
Note that if you add items to your map item by item while your process executing, the new are not exactly perceptible... But the moment you want to clean all, all your allocated items must be destroyed at the same time, which could explain the perceived difference between construction/use and destruction...
Now, the destructors could take time for an additional reason.
For example, on Visual C++ 2008 in debug mode, for example, upon destruction of STL iterators, the destructor verifies the iterators are still correct. This caused quite a slowdown upon my object destruction (which was basically a tree of nodes, each node having list of child nodes, with iterators everywhere).
You are working on gcc, so perhaps they have their own debug testing, or perhaps your destructors are doing additional work (e.g. logging?)...

In my experience, the calls to free or delete should not take a significant amount of time. That said, I have seen plenty of cases where it does take non-trivial time to destruct objects because of destructors that did non-trivial things. If you can't tell what's taking time during the destruction, use a debugger and/or a profiler to determine what's going on. If the profiler shows you that it really is calls to free() that take a lot of time, then you should improve your memory allocation scheme, because you must be creating an extremely large number of small objects.
As you noted plenty of applications allocate large amounts of memory, and incur no significant memory during shutdown, so there's no reason your program can't do the same.

I would recommend (as some others have) a simple forced process termination, if you're certain that you've nothing left to do but free memory (for example, no file i/o and such left to do).
The thing is that when you free memory, typically, it's not actually returned to the OS - it's held in a list to be reallocated, and this is obviously slow. However, if you terminate process, the OS will lump reclaim all your memory at once, which should be substantially faster. However, as others have said, if you have any destructors that need to run, you should ensure that they are run before force calling exit() or ExitProcess or anysuch function.
What you should be aware of is that deallocating memory that is spread out (e.g., two nodes in a map) is much slower due to cache effects than deallocating memory in a vector, because the CPU needs to access the memory to free it and run any destructors. If you deallocated a very large amount of memory that's very fragmented, you could be falling afoul of this, and should consider changing to some more contiguous structures.
I actually had a problem where allocating memory was faster than de-allocating it, and after allocating memory and then de-allocating it, I had a memory leak. Eventually, I worked out that this is why.

I am currently facing a similar issue, with a CPU & memory intensive research program of mine. It runs until a specified time limit, prints a solutions and exits. The destructor call of a single object (containing up to 10⁶ relatively small objects) was what unexpectedly took time at the end of execution (about 10sec. to free 5Gb of data).
I was not satisfied by the answers advising to avoid executing every destructor, so here is the solution I came up with:
Original code:
void process() {
vector<unordered_map<State, int>> large_obj(100);
// Processing...
} // Takes a few seconds to exit (destructor calls)
Solution:
void process(bool free_mem = false) {
auto * large_obj_ = new vector<unordered_map<State, int>>(100);
auto &large_obj = *large_obj;
// Processing...
// (No changes required here, 'large_obj' can be used exactly as before)
if(free_mem)
delete large_obj_;
}
It has the advantage of being completely transparent apart from a few lines to insert, and it can even be parametrized to take some time to free the memory if needed. It is explicit which object will intentionally not be freed to avoid leaving things in an "unstable" state. Memory is cleaned up instantly by the OS on exit when free_mem = false.

What can modify the frame pointer?

I have a very strange bug cropping up right now in a fairly massive C++ application at work (massive in terms of CPU and RAM usage as well as code length - in excess of 100,000 lines). This is running on a dual-core Sun Solaris 10 machine. The program subscribes to stock price feeds and displays them on "pages" configured by the user (a page is a window construct customized by the user - the program allows the user to configure such pages). This program used to work without issue until one of the underlying libraries became multi-threaded. The parts of the program affected by this have been changed accordingly. On to my problem.
Roughly once in every three executions the program will segfault on startup. This is not necessarily a hard rule - sometimes it'll crash three times in a row then work five times in a row. It's the segfault that's interesting (read: painful). It may manifest itself in a number of ways, but most commonly what will happen is function A calls function B and upon entering function B the frame pointer will suddenly be set to 0x000002. Function A:
result_type emit(typename type_trait<T_arg1>::take _A_a1) const
{ return emitter_type::emit(impl_, _A_a1); }
This is a simple signal implementation. impl_ and _A_a1 are well-defined within their frame at the crash. On actual execution of that instruction, we end up at program counter 0x000002.
This doesn't always happen on that function. In fact it happens in quite a few places, but this is one of the simpler cases that doesn't leave that much room for error. Sometimes what will happen is a stack-allocated variable will suddenly be sitting on junk memory (always on 0x000002) for no reason whatsoever. Other times, that same code will run just fine. So, my question is, what can mangle the stack so badly? What can actually change the value of the frame pointer? I've certainly never heard of such a thing. About the only thing I can think of is writing out of bounds on an array, but I've built it with a stack protector which should come up with any instances of that happening. I'm also well within the bounds of my stack here. I also don't see how another thread could overwrite the variable on the stack of the first thread since each thread has it's own stack (this is all pthreads). I've tried building this on a linux machine and while I don't get segfaults there, roughly one out of three times it will freeze up on me.

Stack corruption, 99.9% definitely.
The smells you should be looking carefully for are:-
Use of 'C' arrays
Use of 'C' strcpy-style functions
memcpy
malloc and free
thread-safety of anything using pointers
Uninitialised POD variables.
Pointer Arithmetic
Functions trying to return local variables by reference

I had that exact problem today and was knee-deep in gdb mud and debugging for a straight hour before occurred to me that I simply wrote over array boundaries (where I didn't expect it the least) of a C array.
So, if possible, use vectors instead because any decend STL implementation will give good compiler messages if you try that in debug mode (whereas C arrays punish you with segfaults).

I'm not sure what you're calling a "frame pointer", as you say:
On actual execution of that
instruction, we end up at program
counter 0x000002
Which makes it sound like the return address is being corrupted. The frame pointer is a pointer that points to the location on the stack of the current function call's context. It may well point to the return address (this is an implementation detail), but the frame pointer itself is not the return address.
I don't think there's enough information here to really give you a good answer, but some things that might be culprits are:
incorrect calling convention. If you're calling a function using a calling convention different from how the function was compiled, the stack may become corrupted.
RAM hit. Anything writing through a bad pointer can cause garbage to end up on the stack. I'm not familiar with Solaris, but most thread implementations have the threads in the same process address space, so any thread can access any other thread's stack. One way a thread can get a pointer into another thread's stack is if the address of a local variable is passed to an API that ultimately deals with the pointer on a different thread. unless you synchronize things properly, this will end up with the pointer accessing invalid data. Given that you're dealing with a "simple signal implementation", it seems like it's possible that one thread is sending a signal to another. Maybe one of the parameters in that signal has a pointer to a local?

There's some confusion here between stack overflow and stack corruption.
Stack Overflow is a very specific issue cause by try to use using more stack than the operating system has allocated to your thread. The three normal causes are like this.
void foo()
{
foo(); // endless recursion - whoops!
}
void foo2()
{
char myBuffer[A_VERY_BIG_NUMBER]; // The stack can't hold that much.
}
class bigObj
{
char myBuffer[A_VERY_BIG_NUMBER];
}
void foo2( bigObj big1) // pass by value of a big object - whoops!
{
}
In embedded systems, thread stack size may be measured in bytes and even a simple calling sequence can cause problems. By default on windows, each thread gets 1 Meg of stack, so causing stack overflow is much less of a common problem. Unless you have endless recursion, stack overflows can always be mitigated by increasing the stack size, even though this usually is NOT the best answer.
Stack Corruption simply means writing outside the bounds of the current stack frame, thus potentially corrupting other data - or return addresses on the stack.
At it's simplest:-
void foo()
{
char message[10];
message[10] = '!'; // whoops! beyond end of array
}

That sounds like a stack overflow problem - something is writing beyond the bounds of an array and trampling over the stack frame (and probably the return address too) on the stack. There's a large literature on the subject. "The Shell Programmer's Guide" (2nd Edition) has SPARC examples that may help you.

With C++ unitialized variables and race conditions are likely suspects for intermittent crashes.

Is it possible to run the thing through Valgrind? Perhaps Sun provides a similar tool. Intel VTune (Actually I was thinking of Thread Checker) also has some very nice tools for thread debugging and such.
If your employer can spring for the cost of the more expensive tools, they can really make these sorts of problems a lot easier to solve.

It's not hard to mangle the frame pointer - if you look at the disassembly of a routine you will see that it is pushed at the start of a routine and pulled at the end - so if anything overwrites the stack it can get lost. The stack pointer is where the stack is currently at - and the frame pointer is where it started at (for the current routine).
Firstly I would verify that all of the libraries and related objects have been rebuilt clean and all of the compiler options are consistent - I've had a similar problem before (Solaris 2.5) that was caused by an object file that hadn't been rebuilt.
It sounds exactly like an overwrite - and putting guard blocks around memory isn't going to help if it is simply a bad offset.
After each core dump examine the core file to learn as much as you can about the similarities between the faults. Then try to identify what is getting overwritten. As I remember the frame pointer is the last stack pointer - so anything logically before the frame pointer shouldn't be modified in the current stack frame - so maybe record this and copy it elsewhere and compare upon return.

Is something meaning to assign a value of 2 to a variable but instead is assigning its address to 2?
The other details are lost on me but "2" is the recurring theme in your problem description. ;)

I would second that this definitely sounds like a stack corruption due to out of bound array or buffer writing. Stack protector would be good as long as the writing is sequential, not random.

I second the notion that it is likely stack corruption. I'll add that the switch to a multi-threaded library makes me suspicious that what has happened is a lurking bug has been exposed. Possibly the sequencing the buffer overflow was occurring on unused memory. Now it's hitting another thread's stack. There are many other possible scenarios.
Sorry if that doesn't give much of a hint at how to find it.

I tried Valgrind on it, but unfortunately it doesn't detect stack errors:
"In addition to the performance penalty an important limitation of Valgrind is its inability to detect bounds errors in the use of static or stack allocated data."
I tend to agree that this is a stack overflow problem. The tricky thing is tracking it down. Like I said, there's over 100,000 lines of code to this thing (including custom libraries developed in-house - some of it going as far back as 1992) so if anyone has any good tricks for catching that sort of thing, I'd be grateful. There's arrays being worked on all over the place and the app uses OI for its GUI (if you haven't heard of OI, be grateful) so just looking for a logical fallacy is a mammoth task and my time is short.
Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
No one asked this, but I built with gcc-4.2. Also, I can guarantee ABI safety here so that's also not the issue. As for the "garbage at the end of the stack" on the RAM hit, the fact that it is universally 2 (though in different places in the code) makes me doubt that as garbage tends to be random.

It is impossible to know, but here are some hints that I can come up with.
In pthreads you must allocate the stack and pass it to the thread. Did you allocate enough? There is no automatic stack growth like in a single threaded process.
If you are sure that you don't corrupt the stack by writing past stack allocated data check for rouge pointers (mostly uninitialized pointers).
One of the threads could overwrite some data that others depend on (check your data synchronisation).
Debugging is usually not very helpful here. I would try to create lots of log output (traces for entry and exit of every function/method call) and then analyze the log.
The fact that the error manifest itself differently on Linux may help. What thread mapping are you using on Solaris? Make sure you map every thread to it's own LWP to ease the debugging.

Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
If you pass anything on the stack by reference or by address, this would most certainly happen if another thread tried to use it after the first thread returned from a function.
You might be able to repro this by forcing the app onto a single processor. I don't know how you do that with Sparc.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js