How to avoid stack overflow of a recursive function? - c++

For example, if we are traversing a rather big tree with the following function, it is possible that we get a stack overflow.
void inorder(node* n)
{
    if (n == nullptr) return;
    inorder(n->l);
    n->print();
    inorder(n->r);
}
How to add a condition or something into the function to prevent such overflow from happening?

Consider iteration over recursion, if that is really a concern.
http://en.wikipedia.org/wiki/Tree_traversal
See the pseudo code there for iteration:
iterativeInorder
iterativePreorder
iterativePostorder
Basically, use your own list as a stack data structure in a while loop; you can effectively replace function recursion.
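For reference, a minimal sketch of the in-order version in C++ (assuming the node layout from the question, with members l, r and print(); this is just one way to write it, not the exact pseudocode from the article):

#include <stack>

void inorderIterative(node* root)
{
    std::stack<node*> s;               // explicit stack replaces the call stack
    node* n = root;
    while (n != nullptr || !s.empty()) {
        while (n != nullptr) {         // walk down the left spine, remembering the path
            s.push(n);
            n = n->l;
        }
        n = s.top();                   // visit the deepest unvisited node
        s.pop();
        n->print();
        n = n->r;                      // then continue with its right subtree
    }
}

The stack now lives on the heap, so the traversal depth is limited by available memory rather than by the thread's stack size.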

There's no portable solution other than by replacing recursion
with explicit management of the stack (using
std::vector<Node*>). Non-portably, you can keep track of the
depth using a static variable; if you know the maximum stack
size, and how much stack each recursion takes, then you can
check that the depth doesn't exceed that.
On a lot of systems, like Linux and Solaris, you can't know the
maximum stack size up front, since the stack is allocated
dynamically. At least under Linux and Solaris, however, once
memory has been allocated to the stack, it will remain allocated
and remain assigned to the stack. So you can recurse fairly
deeply at the start of the program (possibly crashing, but
before having done anything), and then check against this value
later:
#include <new>          // for std::bad_alloc

static char const* lowerBound = nullptr;

// At start-up...
void
preallocateStack( int currentCount )
{
    char dummyToTakeSpace[1000];    // each frame reserves roughly 1000 bytes
    if ( currentCount <= 0 ) {
        lowerBound = dummyToTakeSpace;
    } else {
        preallocateStack( currentCount - 1 );
    }
}

void
checkStack()
{
    char dummyForAddress;
    if ( &dummyForAddress < lowerBound ) {
        throw std::bad_alloc();     // Or something more specific.
    }
}
You'll note that there are a couple of cases of
undefined/unspecified behavior floating around in that code, but
I've used it successfully on a couple of occasions (under
Solaris on Sparc, but Linux on PC works exactly the same in this
regard). It will, in fact, work on almost any system where:
- the stack grows down, and
- local variables are allocated on the stack.
Thus, it will also work on Windows, but if it fails to
allocate the initial stack, you'll have to relink rather than
just run the program at a moment when there's less activity on
the box or change ulimits, since the stack size on Windows is
fixed at link time.
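For illustration, a hedged sketch of how these guards might be wired into the traversal from the question (the pre-allocation depth of 2000, roughly 2 MB with the 1000-byte dummy per frame, is an assumption you would tune to your system):

void inorderChecked(node* n)
{
    if (n == nullptr) return;
    checkStack();                  // throws std::bad_alloc once we are below the pre-allocated lower bound
    inorderChecked(n->l);
    n->print();
    inorderChecked(n->r);
}

int main()
{
    preallocateStack(2000);        // assumed depth: crash here, before doing any real work
    // ... build the tree, then call inorderChecked(root) inside a try/catch.
}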
EDIT:
One comment concerning the use of an explicit stack: some
systems (including Linux, by default) overcommit, which means
that you cannot reliably get an out of memory error when
extending an std::vector<>; the system will tell
std::vector<> that the memory is there, and then give the
program a segment violation when it attempts to access it.

The thing about recursion is that you can never guarantee that it will never overflow the stack, unless you can put some bounds on both the (minimum) size of memory and (maximum) size of the input. What you can do, however, is guarantee that it will overflow the stack if you have an infinite loop...
I see your "if() return;" terminating condition, so you should avoid infinite loops as long every branch of your tree ends in a null. So one possibility is malformed input where some branch of the tree never reaches a null. (This would occur, e.g., if you have a loop in your tree data structure.)
The only other possibility I see is that your tree data structure is simply too big for the amount of stack memory available. (N.B. this is virtual memory and swap space can be used, so it's not necessarily an issue of insufficient RAM.) If that's the case, you may need to come up with some other algorithmic approach that is not recursive. Your function has a small stack footprint per call, though, so unless you've omitted some additional processing that it does, your tree would really have to be REALLY DEEP for this to be an issue. (N.B. it's maximum depth that's an issue here, not the total number of nodes.)

You could increase the stack size for your OS. This is normally configured through ulimit if you're on a Unix-like environment.
E.g. on Linux you can do ulimit -s unlimited, which will set the size of the stack to 'unlimited', although IIRC there is a hard limit and you cannot dedicate your entire memory to one process (although one of the answers in the links below mentions an unlimited amount).
My suggestion would be to run ulimit -s, which will give you the current size of the stack, and if you're still getting a stack overflow, double that limit until you're happy.
Have a look here, here and here for a more detailed discussion on the size of the stack and how to update it.

If you have a very large tree, and you are running into issues with overrunning your stack using recursive traversals, the problem is likely that you do not have a balanced tree. The first suggestion then is to try a balanced binary tree, such as a red-black tree or AVL tree, or a tree with more than 2 children per node, such as a B+ tree. The C++ library provides std::map<> and std::set<>, which are typically implemented using a balanced tree.
However, one simple way to avoid recursive in-order traversals is to modify your tree to be threaded. That is, use the right pointer of leaf nodes to indicate the next (in-order) node. The traversal of such a tree would look something like this:
n = find_first(n);
while (!is_null(n)) {
    n->print();
    if (n->is_leaf()) n = n->r;        // a threaded right pointer goes straight to the in-order successor
    else              n = find_first(n->r);
}
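If find_first() is not already part of your tree, here is a minimal sketch under the same assumptions as the pseudocode above (left pointers are ordinary child pointers; only the right pointers of leaves are threads):

node* find_first(node* n)
{
    if (n == nullptr) return nullptr;
    while (n->l != nullptr)            // the leftmost node of this subtree comes first in order
        n = n->l;
    return n;
}

is_null() and is_leaf() are likewise assumed to be small helpers: a null check, and a test of whether the node's right pointer is a thread rather than a real child.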

You can add a static variable to keep track of the number of times the function is called. If it's getting close to what you think would crash your system, perform some routine to notify the user of the error.
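A hedged sketch of that idea applied to the traversal from the question (the limit of 10000 is an arbitrary assumption; note that this tracks the current depth rather than the total number of calls, since depth is what actually consumes stack):

#include <stdexcept>

void inorderGuarded(node* n)
{
    static int depth = 0;              // shared by all active calls on this thread
    if (n == nullptr) return;
    if (depth >= 10000)                // assumed safety limit
        throw std::runtime_error("tree too deep for recursive traversal");
    ++depth;
    inorderGuarded(n->l);
    n->print();
    inorderGuarded(n->r);
    --depth;                           // undo on the way back up

}

If an exception can propagate out mid-traversal, the decrement should really be done by a small RAII guard so depth is restored during stack unwinding.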

A small prototype of the alteration that can be made by associating another int variable with the recursive function. You could pass the variable as an argument, starting with the depth limit at the root and decrementing it as you go down the tree ...
Drawback: this solution comes at the cost of an extra int argument being passed on every recursive call.
void inorder(node* n, int counter)   // call as inorder(root, limit); limit specified as per system limit
{
    if (n == nullptr) return;
    if (counter > 0)
    {
        inorder(n->l, counter - 1);
        n->print();
        inorder(n->r, counter - 1);
    }
    else
        n->print();                  // depth budget exhausted: print this node but don't descend further
}
For further research: the problem may not be with the traversal itself, but with how the tree is created and updated; a very deep tree can often be avoided with better tree construction. Check the concept of balanced trees if not already considered.

Related

Fast synchronized access to shared array with changing base address (in C11)

I am currently designing a user space scheduler in C11 for a custom co-processor under Linux (user space, because the co-processor does not run its own OS, but is controlled by software running on the host CPU). It keeps track of all the tasks' states with an array. Task states are regular integers in this case. The array is dynamically allocated and each time a new task is submitted whose state does not fit into the array anymore, the array is reallocated to twice its current size. The scheduler uses multiple threads and thus needs to synchronize its data structures.
Now, the problem is that I very often need to read entries in that array, since I need to know the states of tasks for scheduling decisions and resource management. If the base address was guaranteed to always be the same after each reallocation, I would simply use C11 atomics for accessing it. Unfortunately, realloc obviously cannot give such a guarantee. So my current approach is wrapping each access (reads AND writes) with one big lock in the form of a pthread mutex. Obviously, this is really slow, since there is locking overhead for each read, and the read is really small, since it only consists of a single integer.
To clarify the problem, I give some code here showing the relevant passages:
Writing:
// pthread_mutex_t mut;
// size_t len_arr;
// int *array, idx, x;
pthread_mutex_lock(&mut);
if (idx >= len_arr) {
    len_arr *= 2;
    array = realloc(array, len_arr*sizeof(int));
    if (array == NULL)
        abort();
}
array[idx] = x;
pthread_mutex_unlock(&mut);
Reading:
// pthread_mutex_t mut;
// int *array, idx;
pthread_mutex_lock(&mut);
int x = array[idx];
pthread_mutex_unlock(&mut);
I have already used C11 atomics for efficient synchronization elsewhere in the implementation and would love to use them to solve this problem as well, but I could not find an efficient way to do so. In a perfect world, there would be an atomic accessor for arrays which performs address calculation and memory read/write in a single atomic operation. Unfortunately, I could not find such an operation. But maybe there is a similarly fast or even faster way of achieving synchronization in this situation?
EDIT:
I forgot to specify that I cannot reuse slots in the array when tasks terminate. Since I guarantee access to the state of every task ever submitted since the scheduler was started, I need to store the final state of each task until the application terminates. Thus, static allocation is not really an option.
Do you need to be so economical with virtual address space? Can't you just set a very big upper limit and allocate enough address space for it (maybe even a static array, or dynamic if you want the upper limit to be set at startup from command-line options).
Linux does lazy memory allocation so virtual pages that you never touch aren't actually using any physical memory. See Why is iterating though `std::vector` faster than iterating though `std::array`? that show by example that reading or writing an anonymous page for the first time causes a page fault. If it was a read access, it gets the kernel to CoW (copy-on-write) map it to a shared physical zero page. Only an initial write, or a write to a CoW page, triggers actual allocation of a physical page.
Leaving virtual pages completely untouched avoids even the overhead of wiring them into the hardware page tables.
If you're targeting a 64-bit ISA like x86-64, you have boatloads of virtual address space. Using up more virtual address space (as long as you aren't wasting physical pages) is basically fine.
Practical example of allocating more virtual address space than you can use:
If you allocate more memory than you could ever practically use (touching it all would definitely segfault or invoke the kernel's OOM killer), that will be as large or larger than you could ever grow via realloc.
To allocate this much, you may need to globally set /proc/sys/vm/overcommit_memory to 1 (no checking) instead of the default 0 (heuristic which makes extremely large allocations fail). Or use mmap(MAP_NORESERVE) to allocate it, making that one mapping just best-effort growth on page-faults.
The documentation says you might get a SIGSEGV on touching memory allocated with MAP_NORESERVE, which is different than invoking the OOM killer. But I think once you've already successfully touched memory, it is yours and won't get discarded. I think it's also not going to spuriously fail unless you're actually running out of RAM + swap space. IDK how you plan to detect that in your current design (which sounds pretty sketchy if you have no way to ever deallocate).
Test program:
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t sz = 1ULL << 46; // 2**46 = 64 TiB = max power of 2 for x86-64 with 48-bit virtual addresses
    // In practice 1ULL << 40 (1TiB) should be more than enough.
    // The smaller you pick, the less impact if multiple things use this trick in the same program.

    //int *p = aligned_alloc(64, sz); // doesn't use NORESERVE so it will be limited by overcommit settings
    int *p = mmap(NULL, sz, PROT_WRITE|PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);

    madvise(p, sz, MADV_HUGEPAGE); // for good measure to reduce page-faults and TLB misses,
                                   // since you're using large contiguous chunks of this array

    p[1000000000] = 1234; // or sz/sizeof(int) - 1 will also work; this is only touching 1 page somewhere in the array.
    printf("%p\n", p);
}
$ gcc -Og -g -Wall alloc.c
$ strace ./a.out
... process startup
mmap(NULL, 70368744177664, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x15c71ef7c000
madvise(0x15c71ef7c000, 70368744177664, MADV_HUGEPAGE) = 0
... stdio stuff
write(1, "0x15c71ef7c000\n", 15) = 15
0x15c71ef7c000
exit_group(0) = ?
+++ exited with 0 +++
My desktop has 16GiB of RAM (a lot of it in use by Chromium and some big files in /tmp) + 2GiB of swap. Yet this program allocated 64 TiB of virtual address space and touched 1 int of it nearly instantly. Not measurably slower than if it had only allocated 1MiB. (And future performance from actually using that memory should also be unaffected.)
The largest power-of-2 you can expect to work on current x86-64 hardware is 1ULL << 46. The total lower canonical range of the 48-bit virtual address space is 47 bits (user-space virtual address space on Linux), and some of that is already allocated for stack/code/data. Allocating a contiguous 64 TiB chunk of that still leaves plenty for other allocations.
(If you do actually have that much RAM + swap, you're probably waiting for a new CPU with 5-level page tables so you can use even more virtual address space.)
Speaking of page tables, the larger the array the more chance of putting some other future allocations very very far from existing blocks. This can have a minor cost in TLB-miss (page walk) time, if your actual in-use pages end up more scattered around your address space in more different sub-trees of the multi-level page tables. That's more page-table memory to keep in cache (including cached within the page-walk hardware).
The allocation size doesn't have to be a power of 2 but it might as well be. There's also no reason to make it that big. 1ULL << 40 (1TiB) should be fine on most systems. IDK if having more than half the available address space for a process allocated could slow future allocations; bookkeeping is I think based on extents (ptr + length) not bitmaps.
Keep in mind that if everyone starts doing this for random arrays in libraries, that could use up a lot of address space. This is great for the main array in a program that spends a lot of time using it. Keep it as small as you can while still being big enough to always be more than you need. (Optionally make it a config parameter if you want to avoid a "640kiB is enough for everyone" situation). Using up virtual address space is very low-cost, but it's probably better to use less.
Think of this as reserving space for future growth but not actually using it until you touch it. Even though by some ways of looking at it, the memory already is "allocated". But in Linux it really isn't. Linux defaults to allowing "overcommit": processes can have more total anonymous memory mapped than the system has physical RAM + swap. If too many processes try to use too much by actually touching all that allocated memory, the OOM killer has to kill something (because the "allocate" system calls like mmap have already returned success). See https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
(With MAP_NORESERVE, it's only reserving address space which is shared between threads, but not reserving any physical pages until you touch them.)
You probably want your array to be page-aligned: #include <stdalign.h> so you can use something like
alignas(4096) struct entry process_array[MAX_LEN];
Or for non-static, allocate it with C11 aligned_alloc().
Give back early parts of the array when you're sure all threads are done with it
Page alignment makes it easy to do the calculations to "give back" a memory page (4kiB on x86) if your array's logical size shrinks enough: madvise(addr, 4096*n, MADV_FREE); (Linux 4.5 and later). This is kind of like mmap(MAP_FIXED) to replace some pages with new untouched anonymous pages (that will read as zeroes), except it doesn't split up the logical mapping extents and create more bookkeeping for the kernel.
Don't bother with this unless you're returning multiple pages, and leave at least one page unfreed above the current top to avoid page faults if you grow again soon. Like maybe maintain a high-water mark of what you've ever touched (without giving back) and a current logical size. If high_water - logical_size > 16 pages, give back all pages from 4 pages past the logical size up to the high-water mark, as sketched below.
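A hedged sketch of that bookkeeping (the page size, the thresholds and the two counters come from the description above; the function name and counting in pages rather than bytes are assumptions for illustration):

#include <sys/mman.h>

static const size_t kPage = 4096;          // x86 base page size

// Assumed bookkeeping, both measured in pages:
static size_t high_water   = 0;            // highest page ever touched and not yet given back
static size_t logical_size = 0;            // current logical top of the array

static void maybe_give_back(char* base)
{
    if (high_water > logical_size + 16) {                    // only bother for several pages
        size_t keep = logical_size + 4;                      // leave slack above the logical top
        madvise(base + keep * kPage, (high_water - keep) * kPage,
                MADV_FREE);                                  // Linux 4.5+: kernel may reclaim these pages lazily
        high_water = keep;
    }
}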
If you will typically be actually using/touching at least 2MiB of your array, use madvise(MADV_HUGEPAGE) when you allocate it to get the kernel to prefer using transparent hugepages. This will reduce TLB misses.
(Use strace to see return values from your madvise system calls, and look at /proc/PID/smaps, to see if your calls are having the desired effect.)
If up-front allocation is unacceptable, RCU (read-copy-update) might be viable if it's read-mostly. https://en.wikipedia.org/wiki/Read-copy-update. But copying a gigantic array every time an element changes isn't going to work.
You'd want a different data-structure entirely where only small parts need to be copied. Or something other than RCU; like your answer, you might not need the read side being always wait-free. The choice will depend on acceptable worst-case latency and/or average throughput, and also how much contention there is for any kind of ref counter that has to bounce around between all threads.
Too bad there isn't a realloc variant that attempts to grow without copying so you could attempt that before bothering other threads. (e.g. have threads with idx>len spin-wait on len in case it increases without the array address changing.)
So, I came up with a solution:
Reading:
while (true) {
    cnt++;
    if (wait) {
        cnt--;
        yield();
    } else {
        break;
    }
}
int x = array[idx];
cnt--;
Writing:
if (idx == len) {
    wait = true;
    while (cnt > 0);           // busy wait to minimize latency of reallocation
    array = realloc(array, 2*len*sizeof(int));
    if (!array) abort();       // shit happens
    len *= 2;                  // must not be updated before reallocation completed
    wait = false;
}
// This is why len must be updated after realloc:
// it serves as synchronization with other writers
// exceeding the current length limit.
while (idx > len) { yield(); }
while (true) {
    cnt++;
    if (wait) {
        cnt--;
        yield();
    } else {
        break;
    }
}
array[idx] = x;
cnt--;
wait is an atomic bool initialized as false, cnt is an atomic int initialized as zero.
This only works because I know that task IDs are chosen ascendingly without gaps and that no task state is read before it is initialized by the write operation. So I can always rely on the one thread which pulls the ID which only exceeds current array length by 1. New tasks created concurrently will block their thread until the responsible thread performed reallocation. Hence the busy wait, since the reallocation should happen quickly so the other threads do not have to wait for too long.
This way, I eliminate the bottlenecking big lock. Array accesses can be made concurrently at the cost of two atomic additions. Since reallocation occurs seldom (due to exponential growth), array access is practically block-free.
EDIT:
After taking a second look, I noticed that one has to be careful about reordering of stores around the length update. Also, the whole thing only works if concurrent writes always use different indices. This is the case for my implementation, but might not generally be. Thus, this is not as elegant as I thought and the solution presented in the accepted answer should be preferred.

Binary Search Tree preorder traversal, recursion vs loop?

We can traverse a binary search tree through recursion like:
void traverse1(node t) {
    if (t != NULL) {
        visit(t);
        traverse1(t->left);
        traverse1(t->right);
    }
}
and also through a loop (with a stack) like:
void traverse2(node root) {
    stack.push(root);
    while (stack.notEmpty()) {
        node next = stack.pop();
        visit(next);
        if (next->right != NULL)
            stack.push(next->right);
        if (next->left != NULL)
            stack.push(next->left);
    }
}
Question
Which one is more efficient? Why?
I think the time complexity of both methods is O(n), so are all the differences in the space complexity, or ..?
It will depend on how you define efficiency. Is it runtime, amount of code, size of executable, how much memory/stack space is used, or how easy it is to understand the code?
Recursion is very easy to code, hopefully easy to understand, and is less code. Looping may be a bit more complex (depending on how you view complexity) and more code. Recursion may be easier to understand and will be smaller in the amount of code and in executable size. Recursion will use more stack space, assuming you have more than a few items to traverse.
Looping will have a larger amount of code (as your example above shows), and could possibly be considered a bit more complex. But the traversal is just one call frame placed on the stack, rather than several. So if you have a lot of items to traverse, looping will be faster, as you don't spend time pushing items onto the stack and then popping them off, which is what occurs when using recursion.
Apart from efficiency, if your tree is too deep or if your stack space is limited, you may run into overflow - stack overflow!!
With iterative approach, you can use the much larger heap space to place the allocated stack. With recursion, you don't have a choice as the stack frames are pushed and popped for you.
I know that such constrained stack environments may be a bit rare; nevertheless, one needs to be aware of it.
Both versions have the same space and time complexity.
The recursion implicitly uses the call stack (a memory region) for storing call context, while the second uses a stack abstract data type, effectively emulating the first version with an explicit stack.
The difference is that with the first version you risk stack overflow with deep, unbalanced trees; however, it's simpler conceptually (fewer opportunities for bugs). The second uses dynamic allocation for storing the pointers to parent nodes.
You will have to measure the difference to know for sure. I personally have a feeling that the recursive formulation will beat the one with an explicit stack in this particular instance.
What the non-recursive version has going for it is that it eliminates the calls. On the other hand - depending on the exact library implementation - the pushes and the pops might also resolve to function calls.
Any decent compiler will actually encode your recursive function in a way similar to the following pseudo-code:
void traverse1(node t) {
start:
    if (t != NULL) {
        visit(t);
        traverse1(t->left);
        t = t->right;
        goto start;
    }
}
Thus eliminating one of the recursive calls. This is known as tail call elimination.
The time and space complexity are the same. The only difference is that traverse2 doesn't call itself recursively. This should make it slightly faster, as pushing/popping from a stack is a cheaper operation than calling a function.
That said, I think the recursive version is "cleaner", so I'd personally use that, unless it turns out to be too slow in practice.

Stack overflow from local variables?

Let me start by saying my question is not about stack overflows but about ways to make one happen, without compile-time errors/warnings.
I know (first hand) you can overflow a stack with recursion:
void endlessRecursion()
{
    int x = 1;
    if (x) endlessRecursion(); // the 'if' is just to hush the compiler
}
My question is: is it possible to overflow the stack by declaring too many local variables?
The obvious way is just declare a huge array like so:
void myStackOverflow()
{
    char maxedArrSize[0x3FFFFFFF]; // < 1GB, compiler didn't yell
}
In practice even 0xFFFFF bytes causes a stack overflow on my machine.
So, I was wondering:
Since I haven't tried it, if I declare enough variables would the
stack overflow?
Is there a way to use the preprocessor or other compile-time "tools" (like C++ template meta-programming) to do the first thing, i.e. make it declare a lot of local variables, by somehow causing it to loop? if so, how?
This is theoretical - is there a way to know if a program's stack would overflow at compilation time? And if so, please explain.
Yes, allocating a large amount of memory will cause a stack overflow. It shouldn't matter whether you allocate one large variable or a lot of small ones; the total size is what's relevant.
You can't do a compile-time loop with the preprocessor, but you can implement some shortcuts that let you generate large amounts of code without typing it all. For example:
#define DECLARE1 { int i;
#define END1 }
#define DECLARE2 DECLARE1 DECLARE1
#define END2 END1 END1
#define DECLARE4 DECLARE2 DECLARE2
#define END4 END2 END2
and so on. This puts the multiple int i; declarations in nested blocks, ensuring that all the objects exist at the same time while avoiding name conflicts. (I couldn't think of a way to give all the variables distinct names.)
DECLARE4 END4
expands to:
{ int i; { int i; { int i; { int i; } } } }
This won't work if your compiler imposes a limit on the length of a line after preprocessing.
The lesson here is that the preprocessor isn't really designed for this kind of thing. It's much easier and more flexible to write a program in your favorite scripting language that generates the declarations. For example, in bash:
for i in {1..100} ; do
echo " int i$i;"
done
On question 3, I believe the answer is no. The compiler can know how much stack space each function uses. But the total stack space depends on your call sequences, which depend on logic evaluated at runtime. As long as there are no recursive calls, it seems possible to determine an upper bound for the stack space used. If that upper bound is smaller than the available stack space, you could be certain that the stack will not overflow. A lower bound seems possible as well. If that is higher than the stack size, you could be certain that the stack will overflow.
Except for very trivial programs, this would only give you boundaries, not an exact amount of stack space. And once recursion gets involved, I don't think there's an upper bound that you can statically determine in the general case. That almost starts sounding like the halting problem.
All of the very limited options above obviously assume that you have a given stack size. As other posters mentioned, the compiler can generally not know the stack size, because it's often part of the system configuration.
The closest I have seen are static analyzers. I seem to remember that some of them flag large stack variables. But I doubt that they try to analyze actual stack usage. It's probably just a simple heuristic that basically tells you that having large variables on the stack is a bad idea, and that you may want to avoid it.
Yes, too many variables will blow the stack.
Since I haven't tried it, if I declare enough variables would the stack overflow?
Yes; declaring a single array of large size and declaring multiple variables in the same scope have a similar effect.
Is there a way to use the preprocessor to do the first thing, i.e. make it declare a lot of local variables, by somehow causing it to loop?
I don't think so; memory is only allocated at run time, starting from main(). Whatever you declare using preprocessor commands is expanded in the preprocessing stage, and this stage doesn't involve any memory allocation.
This is theoretical - Is there a way to know if the a program's stack would overflow?
Yes; on a Linux system you can get the amount of stack memory allocated to your program, and anything more than that will lead to a stack overflow. You can read this link for details on how to know the stack size of any process.
As to #3
Is there a way to know if a program's stack would overflow at compilation time?
Yes. This is a standard feature of some PIC compilers for embedded processors, especially those using a Harvard architecture. But it comes at a cost: no recursion and no VLAs. Thus at compile time, an analysis of the code reports the maximum depth in the main processor code as well as the maximum depth while handling interrupts. But the analysis does not prove that the maximum combined depth of those two will occur.
Depending on processor type, an ample stack can be allocated at compile time preventing possible overflow.

What is the order of magnitude of the maximum number of recursive calls in C++?

I have a recursive function which calls itself a very large number of times given certain inputs - which is exactly what it should do. I know my function isn't infinitely looping - it just gets to a certain number of calls and overflows. I want to know if this is a problem with putting too much memory on the stack, or just a normal restriction in number of calls. Obviously it's very hard to say a specific number of calls which is the maximum, but can anybody give me a rough estimate of the order of magnitude? Is it in the thousands? Hundreds? Millions?
So, as you've guessed, the problem is the (eponymous) stack overflow. Each call requires setting up a new stack frame, pushing new information onto the stack; stack size is fixed, and eventually runs out.
What sets the stack size? That's a property of the compiler -- that is, it's fixed for a binary executable. In Microsoft's compiler (used in VS2010) it defaults to 1 megabyte, and you can override it with "/F " in compiler options (see here for an '03 example, but the syntax is the same).
It's very difficult to figure out how many calls that equates to in practice. A function's stack size is determined by its local variables, the size of the return address, and how parameters are passed (some may go on the stack), and much of that also depends on the architecture. Generally you may assume that the latter two are less than a hundred bytes (that's a gross estimate). The former depends on what you're doing in the function. If you assume the function takes, say, 256 bytes on the stack, then with a 1M stack you'd get 4096 function calls before overflowing -- but that doesn't take into account the overhead of the main function, etc.
You could try to reduce local variables and parameter overhead, but the real solution is Tail Call Optimization, in which the compiler releases the calling function's stack frame as it invokes the recursing function. You can read more about doing that in MSVC here. If you can't do tail calls, and you can't reduce your stack usage acceptably, then you can look at increasing the stack size with the "/F" parameter, or (the preferred solution) look at a redesign.
It completely depends on how much information you use on the stack. However, the default stack on Windows is 1MB and the default stack on Unix is 8MB. Simply making a call can involve pushing a few 32-bit registers and a return address, say, so you could be looking at maybe 20 bytes a call, which would put the maximum at about 50k calls on Windows and 400k on Unix, for an empty function.
Of course, as far as I'm aware, you can change the stack size.
One option for you may be to change/increase the default stack size. Here's one way: http://msdn.microsoft.com/en-us/library/tdkhxaks(v=vs.80).aspx
There are tools to measure stack usage. They fill the stack in advance with a certain byte pattern and afterwards look at up to what address it got changed. With those you can find out how close to the limit you get.
Maybe one of the valgrind tools can do that.
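Alternatively, if you just want a ballpark figure for the maximum depth on your own machine, a short test program will give one. This is a hedged sketch: it crashes by design, the last number printed before the crash is roughly the maximum depth for a frame of this size, and it should be built without optimization so the compiler doesn't turn the tail call into a loop.

#include <iostream>

void recurse(unsigned long depth)
{
    volatile char frame[64];               // give each frame a known, non-optimizable footprint
    frame[0] = 0;
    if (depth % 1000 == 0)
        std::cout << depth << std::endl;   // std::endl flushes, so the last value survives the crash
    recurse(depth + 1);
}

int main()
{
    recurse(0);
}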
The amount of stack space used by a recursive function depends on the depth of the recursion and the amount of memory space used by each call.
The depth of the recursion refers to the number of levels of call active at any given moment. For example, a binary tree might have, say, a million nodes, but if it's well balanced you can traverse it with no more that 20 simultaneous active calls. If it's not well balanced, the maximum depth might be much greater.
The amount of memory used by each call depends on the total size of the variables declared in your recursive function.
There's no fixed limit on the maximum depth of recursion; you'll get a stack overflow if your total usage exceeds the stack limit imposed by the system.
You might be able to reduce memory usage by somehow reducing the depth of your recursion, perhaps by restructuring whatever it is you're traversing (you didn't tell us much about that), or by reducing the total size of any local objects declared inside your recursive function (note that heap-allocated objects don't contribute to stack size), or some combination of the two.
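To illustrate that last point with a hedged sketch (the function and buffer names are made up; the idea is only that a large scratch buffer moved to the heap no longer counts against the stack):

#include <vector>

void process(node* n)
{
    if (n == nullptr) return;
    std::vector<char> buffer(65536);   // 64 KiB of scratch space now lives on the heap;
                                       // the stack frame only holds the small vector object itself
    // ... per-node work using buffer would go here ...
    process(n->l);
    process(n->r);
}

With a plain char buffer[65536] local instead, each level of recursion would cost at least 64 KiB of stack.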
And as others have said, you might be able to increase your allowed stack size, but that will probably be of only limited use -- and it's an extra thing you'll have to do before running your program. It could also consume resources and interfere with other processes on the system (limits are imposed for a reason).
Changing the algorithm to avoid recursion might be a possibility, but again, we don't have enough information to say much about that.

Why infinite recursion leads to seg fault

Why does infinite recursion lead to a seg fault?
Why does a stack overflow lead to a seg fault?
I am looking for a detailed explanation.
int f()
{
    f();
}

int main()
{
    f();
}
Every time you call f(), you increase the size of the stack - that's where the return address is stored so the program knows where to go to when f() completes. As you never exit f(), the stack is going to increase by at least one return address each call. Once the stack segment is full up, you get a segfault error. You'll get similar results in every OS.
Segmentation fault is a condition when your program tries to access a memory location that it is not allowed to access. Infinite recursion causes your stack to grow. And grow. And grow. Eventually it will grow to a point when it will spill into an area of memory that your program is forbidden to access by the operating system. That's when you get the segmentation fault.
Your system resources are finite. They are limited. Even if your system has the most memory and storage on the entire Earth, infinite is WAY BIGGER than what you have. Remember that now.
The only way to do something an "infinite number of times" is to "forget" previous information. That is, you have to "forget" what has been done before. Otherwise you have to remember what happened before and that takes storage of one form or another (cache, memory, disk space, writing things down on paper, ...)--this is inescapable. If you are storing things, you have a finite amount of space available. Recall, that infinite is WAY BIGGER than what you have. If you try to store an infinite amount of information, you WILL run out of storage space.
When you employ recursion, you are implicitly storing previous information with each recursive call. Thus, at some point you will exhaust your storage if you try to do this an infinite number of takes. Your storage space in this case is the stack. The stack is a piece of finite memory. When you use it all up and try to access beyond what you have, the system will generate an exception which may ultimately result in a seg fault if the memory it tried to access was write-protected. If it was not write-protected, it will keep on going, overwriting god-knows-what until such time as it either tries to write to memory that just does not exist, or it tries to write to some other piece of write protected memory, or until it corrupts your code (in memory).
It's still a stackoverflow ;-)
The thing is that the C runtime doesn't provide "instrumentation" like managed languages do (e.g. Java, Python, etc.), so writing outside the space designated for the stack, instead of causing a detailed exception, just raises a lower-level error that has the generic name of "segmentation fault".
This is for performance reasons, as those memory access watchdogs can be set with help of hardware support with little or none overhead; I cannot remember the exact details now, but it's usually done by marking the MMU page tables or with the mostly obsolete segment offsets registers.
AFAIK: The ends of the stack are protected by addresses that aren't accessible to the process. This prevents the stack from growing over allocated data-structures, and is more efficient than checking the stack size explicitly, since you have to check the memory protection anyway.
A program counter or instruction pointer is a register which contains the address of the next instruction to be executed.
In a function call, the current value of the program counter is pushed onto the stack, and then the program counter points to the first instruction of the function. The old value is popped after returning from that function and assigned to the program counter. In infinite recursion the value is pushed again and again, which leads to the stack overflow.
It's essentially the same principle as a buffer overflow; the OS allocates a fixed amount of memory for the stack, and when you run out (stack overflow) you get undefined behavior, which in this context means a SIGSEGV.
The basic idea:
int stack[A_LOT];
int rsp = 0;

void call(Func_p fn)
{
    stack[rsp++] = rip;   // push the current return address
    rip = fn;             // jump to the function
}

void retn()
{
    rip = stack[--rsp];   // pop the return address and jump back
}

/* recurse */
for (;;) { call(somefunc); }
At a "low" level, the stack is "maintained" through a pointer (the stack pointer), kept in a processor register. This register points to memory, since stack is memory after all. When you push values on the stack, its "value" is decremented (stack pointer moves from higher addresses to lower addresses). Each time you enter a function some space is "taken" from the stack (local variables); moreover, on many architectures the call to a subroutine pushes the return value on the stack (and if the processor has no a special register stack pointer, likely a "normal" register is used for the purpose, since stack is useful even where subroutines can be called with other mechanisms), so that the stack is at least diminuished by the size of a pointer (say, 4 or 8 bytes).
In an infinite recursion loop, in the best case only the return value causes the stack to be decremented... until it points to a memory that can't be accessed by the program. And you see the segmentation fault problem.
You may find interesting this page.