Heap vs stack allocation in a large project - c++

I have read numerous other answers on this topic but they don't quite answer what I'm looking for. Examples:
Class members and explicit stack/heap allocation
When should a class be allocated on the stack instead of the heap
Member function memory allocation stack or heap?
C++ stack vs heap allocation
These answers cover the mechanical differences between the two (automatic vs manual memory management, variable lifetimes, etc) but I am more interested in best practices, and how to write code that can scale.
Context
I am writing a class which processes a large stream of data, say, tens to hundreds of GB. Let's assume that the performance bottleneck is how fast my class can process the data, i.e., the source and destination of the data are both fast.
My class works by splitting the data into chunks of N bytes each and processing them. The optimal size N for maximal throughput depends on the processing performed, which is only known at runtime. N can range from tens of bytes up to thousands of bytes. If I did everything on the stack, for say N = 256, the total size of the class's member variables would be <1MB.
I also tried stack-allocating arrays of different sizes for a small set of values of N, using only one of them at any given time. This is the fastest implementation so far. Nevertheless, when comparing an all-stack implementation with an all-heap one, the performance difference is small enough that using the heap ends up being the simpler choice.
Questions
If I make the choice to use stack vs. heap now, how does that affect future users of my class? For example, in theory one could write a program that has 100's of instances of this class. If I used all stack, and the user put all my instances on the stack, it would blow up.
How is stack usage factored into the design of a large hierarchical system? I don't see that mentioned in online resources or books. Mostly the stack is mentioned in the context of excessive recursion, trying to outright declare a 100MB array, etc.
Generally, is the author of a class (whose underlying workings are abstracted away) supposed to give the end user some information about the stack footprint? Or some direction on when/whether heap allocation is required?
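One way a class author can make the stack footprint visible is through the type itself: sizeof(T) is exactly what an automatic (stack) instance costs, and it can be documented or checked with static_assert. A minimal sketch, assuming a chunk buffer of N bytes; InlineProcessor and HeapProcessor are hypothetical names, not taken from the question.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical processor that keeps its chunk buffer inline:
// every instance, including one declared as a local variable, carries N bytes.
template <std::size_t N>
class InlineProcessor {
public:
    std::size_t chunk_size() const { return buffer_.size(); }
private:
    std::array<unsigned char, N> buffer_{};
};

// Hypothetical processor whose buffer lives on the heap; the object itself
// is only a few pointers wide, wherever the caller puts it.
class HeapProcessor {
public:
    explicit HeapProcessor(std::size_t n) : buffer_(n) {}
    std::size_t chunk_size() const { return buffer_.size(); }
private:
    std::vector<unsigned char> buffer_;
};

// Footprint checks a library could ship alongside its headers.
static_assert(sizeof(InlineProcessor<4096>) >= 4096, "inline buffer is part of the object");
static_assert(sizeof(HeapProcessor) < 64, "heap-backed object stays small");
```

With the heap-backed version, hundreds of instances declared as locals cost only a few dozen bytes of stack each, and N can be chosen at runtime, which fits the constraint that the optimal N is only known at runtime.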

Related

How to find out what size array fits in stack memory?

It is my impression that when doing this
void stuff() {
    int arr[1000];
    // do some stuff
}
Then the array could be allocated either in heap or stack memory depending on its size and the limits of the CPU.
I'm optimizing a script that brute-force tests some calculations, and since the estimated run time is currently several weeks, I'd like to get that down as much as possible. I think fitting as much sample data as possible on the stack would help.
Is there a way to find out how big of an array I can declare?
The stack size on a typical modern machine varies, but it is usually a few megabytes or less (for example, the default is 1 MB on Windows and 8 MB for the main thread on most Linux distributions). On embedded platforms it might be a lot less. Some platforms allow you to specify a hint for the stack size of a created thread, but often those are only hints and are not strictly followed.
Also, how big object you put on the stack depends on the current stack depth, you could be in the beginning of the stack or near its end. You cannot really tell in advance, I mean at coding time, which is when and where the size of stack arrays is specified. You could assume you will be in the beginning of the stack in the case of a trivial example, but in production code function calls may be nested in an order and depth which are hard to predict, or even impossible if they depend on user input.
I am not aware if there is any portable way, but the platform specific way would be to put an object on the stack and measure the difference between its address and the end of the stack, this will give you at runtime the stack space you have left.
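The platform-specific approach just described can be sketched for POSIX systems as follows. This is an approximation under stated assumptions: it only covers the main thread, it treats the address of a local variable in main() as the top of the stack, and it assumes the stack grows downward (as on x86-64 and AArch64). getrlimit with RLIMIT_STACK is a real POSIX call; set_stack_top, stack_limit, and approx_stack_remaining are hypothetical helpers.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK, RLIM_INFINITY (POSIX)
#include <cstddef>
#include <cstdint>

// Address recorded near the top of the main thread's stack (hypothetical helper state).
static std::uintptr_t g_stack_top = 0;

// Call once at the start of main(), passing the address of a local variable there.
void set_stack_top(const void* local_in_main) {
    g_stack_top = reinterpret_cast<std::uintptr_t>(local_in_main);
}

// Soft stack limit in bytes, or 0 if it is unlimited or cannot be queried.
std::size_t stack_limit() {
    rlimit rl{};
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY) return 0;
    return static_cast<std::size_t>(rl.rlim_cur);
}

// Rough estimate of the stack still available at the caller's depth:
// the limit minus the distance between the recorded top and a local in this frame.
std::size_t approx_stack_remaining() {
    char marker = 0;
    auto here = reinterpret_cast<std::uintptr_t>(&marker);
    std::size_t used = g_stack_top > here ? g_stack_top - here : 0;  // downward-growing stack
    std::size_t limit = stack_limit();
    return limit > used ? limit - used : 0;
}
```

The result ignores the arguments and environment stored above main's frame, so treat it as an estimate rather than a guarantee.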
At any rate, if your intention is to put a big object on the stack, you don't gain much by avoiding heap allocation. The cost of the allocation itself will be negligible, and afterwards accessing the data will be as fast as if it were on the stack, with the added benefit that it does not depend on the limited stack size.
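For the brute-force scenario in the question, this usually means allocating the sample buffer on the heap once, outside the hot loop, and reusing it for every trial, so the allocation cost is paid only once. A minimal sketch; run_trials and the arithmetic inside it are hypothetical stand-ins for the real calculation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical brute-force driver: one heap allocation, reused across all trials.
std::uint64_t run_trials(std::size_t samples, std::size_t trials) {
    std::vector<std::uint64_t> buf(samples);  // may be millions of elements: fine on the heap
    std::uint64_t total = 0;
    for (std::size_t t = 0; t < trials; ++t) {
        // Stand-in for the real calculation that fills the sample buffer.
        for (std::size_t i = 0; i < samples; ++i) buf[i] = i * t;
        for (std::uint64_t v : buf) total += v;
    }
    return total;
}
```

Inside the loops, indexing buf costs the same as indexing a local array would.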
The purpose of the stack is not to store a program's bulk data; it is to keep track of the program's core structure so it can run. Never put bulk data on the stack. Because of that intent, the stack size is almost always no more than a few megabytes, even for applications which are intended to use gigabytes of RAM.

Why is stack memory size so limited?

When you allocate memory on the heap, the only limit is free RAM (or virtual memory), which amounts to gigabytes of memory.
So why is the stack size so limited (around 1 MB)? What technical reason prevents you from creating really big objects on the stack?
Update: My intent might not be clear, I do not want to allocate huge objects on the stack and I do not need a bigger stack. This question is just pure curiosity!
My intuition is the following. The stack is not as easy to manage as the heap. The stack needs to be stored in contiguous memory locations. This means that you cannot allocate the stack piecemeal as needed; you need to at least reserve virtual addresses for that purpose up front. The larger the reserved virtual address space per stack, the fewer threads you can create.
For example, a 32-bit application generally has a virtual address space of 2GB. This means that if the stack size is 2MB (a common pthreads default), then you can create a maximum of 1024 threads. This can be too few for applications such as web servers. Increasing the stack size to, say, 100MB (i.e., you reserve 100MB, but do not necessarily allocate 100MB to the stack immediately) would limit the number of threads to about 20, which can be limiting even for simple GUI applications.
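The arithmetic in this example is simply address space divided by the per-thread stack reservation. As a sketch, it can be checked at compile time; it deliberately ignores the address space taken by code, heap, and guard pages, so real limits are lower. max_threads is a hypothetical helper.

```cpp
#include <cstdint>

// Upper bound on thread count if stacks were the only thing in the address space.
constexpr std::uint64_t max_threads(std::uint64_t address_space, std::uint64_t stack_reserve) {
    return address_space / stack_reserve;
}

constexpr std::uint64_t MiB = 1024ull * 1024ull;
constexpr std::uint64_t GiB = 1024ull * MiB;

// The two cases from the answer, for a 2 GB user address space.
static_assert(max_threads(2 * GiB, 2 * MiB) == 1024, "2 MB stacks");
static_assert(max_threads(2 * GiB, 100 * MiB) == 20, "100 MB stacks: 2048 / 100, rounded down");
```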
An interesting question is why we still have this limit on 64-bit platforms. I do not know the answer, but I assume that people are already used to some "stack best practices": allocate huge objects on the heap and, if needed, manually increase the stack size. Therefore, nobody found it useful to add "huge" stack support on 64-bit platforms.
One aspect that nobody has mentioned yet:
A limited stack size is an error detection and containment mechanism.
Generally, the main job of the stack in C and C++ is to keep track of the call stack and local variables, and if the stack grows out of bounds, it is almost always an error in the design and/or the behaviour of the application.
If the stack were allowed to grow arbitrarily large, these errors (like infinite recursion) would be caught very late, only after the operating system's resources are exhausted. This is prevented by setting an arbitrary limit on the stack size. The actual size is not that important, as long as it is small enough to prevent system degradation.
It is just a default size. If you need more, you can get more - most often by telling the linker to allocate extra stack space.
The downside to having large stacks is that if you create many threads, each one needs its own stack. If every stack reserves several MB but doesn't use them, that space is wasted.
You have to find the proper balance for your program.
Some people, like @BJovke, believe that virtual memory is essentially free. It is true that you don't need physical memory backing all the virtual memory. You do, however, have to be able to at least give out addresses for that virtual memory.
However, on a typical 32-bit PC the virtual address space is limited to 4 GB, because we only have 32 bits for any address, virtual or not.
Because all threads in a process share the same address space, they have to divide it between them. And after the operating system has taken its part, there is "only" 2-3 GB left for an application. And that size is the limit for both the physical and the virtual memory, because there just aren't any more addresses.
For one thing, the stack is contiguous, so if you allocate 12MB, you must remove all 12MB before you can get at anything beneath it. Moving objects around also becomes much harder. Here is a real-world example that may make things easier to understand:
Say you are stacking boxes around a room. Which is easier to manage:
stacking boxes of any weight on top of each other, but when you need to get something from the bottom, you have to undo your entire pile. If you want to take an item out of the pile and give it to someone else, you must take off all of the boxes and move the box to the other person's pile (Stack only)
You put all of your boxes (except for really small boxes) over in a special area where you do not stack stuff on top of other stuff and write down where you put it on a piece of paper (a pointer) and put the paper on the pile. If you need to give the box to someone else you just hand them the slip of paper from your pile, or just give them a photocopy of the paper and leave the original where it was in your pile. (Stack + heap)
Those two examples are gross generalizations and there are some points that are blatantly wrong in the analogy but it is close enough that it hopefully will help you see the advantages in both cases.
Think of memory in order from near to far. Registers are close to the CPU (fast), the stack is a bit further (but still relatively close) and the heap is far away (slow access).
Of course, the stack lives in the same main memory as the heap, but since it's being used continuously, it probably never leaves the CPU cache(s), making it faster than the average heap access.
This is a reason to keep the stack reasonably sized; to keep it cached as much as possible. Allocating big stack objects (possibly automatically resizing the stack as you get overflows) goes against this principle.
So it's a good paradigm for performance, not just a left-over from old times.
Allocating large objects in a, say, 100MB stack would make it impossible on most machines to have them loaded at once into cache, which pretty much defeats the purpose of the stack.
The point of the stack is to have small objects that belong to the same scope (and are, therefore, usually needed together or close to each other) stored together in contiguous memory addresses, so that the program can have them all loaded into cache at the same time, minimizing cache misses and, in general, the time the CPU has to wait until it gets some missing piece of data from the slower RAM.
A 50MB object stored on the stack would not fit into the cache, so the CPU would still have to wait for RAM after every cache line while working through it. One would be clogging the call stack without getting any significant speed benefit compared to loading the data from the heap.
Many of the things you think you need a big stack for, can be done some other way.
Sedgewick's "Algorithms" has a couple of good examples of "removing" recursion from recursive algorithms such as Quicksort by replacing the recursion with iteration. In reality, the algorithm is still recursive, and there is still a stack, but you allocate the sorting stack on the heap rather than using the runtime stack.
(I favor the second edition, with algorithms given in Pascal. It can be had used for eight bucks.)
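The idea described above can be sketched in C++. This is not Sedgewick's code: it is an illustrative version using a Lomuto partition and a std::vector of half-open [lo, hi) ranges as the sorting stack. Pushing the larger part first means the smaller part is always processed next, which keeps the explicit stack at O(log n) entries.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Lomuto partition of the half-open range [lo, hi) around its last element.
// Returns the pivot's final index.
std::size_t lomuto_partition(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    int pivot = a[hi - 1];
    std::size_t i = lo;
    for (std::size_t j = lo; j + 1 < hi; ++j) {
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    }
    std::swap(a[i], a[hi - 1]);
    return i;
}

// Quicksort driven by an explicit stack of ranges held in a std::vector (on the heap)
// instead of recursive calls on the runtime stack.
void quicksort_iterative(std::vector<int>& a) {
    std::vector<std::pair<std::size_t, std::size_t>> pending;
    pending.emplace_back(0, a.size());
    while (!pending.empty()) {
        auto [lo, hi] = pending.back();
        pending.pop_back();
        if (hi - lo < 2) continue;  // 0 or 1 elements: already sorted
        std::size_t p = lomuto_partition(a, lo, hi);
        std::pair<std::size_t, std::size_t> left{lo, p}, right{p + 1, hi};
        // Make `left` the smaller part; push the larger part first so the smaller is popped next.
        if (left.second - left.first > right.second - right.first) std::swap(left, right);
        pending.push_back(right);
        pending.push_back(left);
    }
}
```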
Another way to look at it: if you think you need a big stack, your code is inefficient. There is a better way that uses less stack.
If you could have an infinite stack, then every virtual address could potentially be used by the stack. If the stack can use every address, then there is no place for the heap to go. Every address you picked for a heap variable could be overwritten by a growing stack.
To put it another way, variables on the stack and variables on the heap occupy the same virtual address space. We need some way of preventing the heap allocator from allocating data where the stack might grow into. A stack size is an easy way to do it. The heap allocator knows that the stack addresses are taken and so it uses something else.
I don't think there is any technical reason, but it would be a strange app that created just one huge super-object on the stack. Stack objects lack flexibility, which becomes more problematic with increasing size: you cannot return without destroying them and you cannot queue them to other threads.

questions about memory pool

I need some clarification on the concept and implementation of memory pools.
The Wikipedia article on memory pools says that it is
"also called fixed-size-blocks allocation, ... , as those implementations suffer from fragmentation because of variable block sizes, it can be impossible to use them in a real time system due to performance."
How does "variable block size causes fragmentation" actually happen? How can fixed-size allocation solve this? This Wikipedia description sounds a bit misleading to me. I think fragmentation is neither avoided by fixed-size allocation nor caused by variable sizes. In a memory pool context, fragmentation is avoided by memory allocators designed for a specific application, or reduced by strictly using an intended block of memory.
Also, judging by several implementation samples, e.g., Code Sample 1 and Code Sample 2, it seems to me that to use a memory pool, the developer has to know the data type very well, then cut, split, or organize the data into linked memory chunks (if the data is close to a linked list) or hierarchically linked chunks (if the data is organized more hierarchically, like files). Besides, it seems the developer has to predict in advance how much memory he needs.
Well, I could imagine this works well for an array of primitive data. What about C++ non-primitive data classes, in which the memory model is not that evident? Even for primitive data, should the developer consider the data type alignment?
Is there good memory pool library for C and C++?
Thanks for any comments!
Variable block size indeed causes fragmentation. Look at the picture that I am attaching:
The image (from here) shows a situation in which A, B, and C allocate variable-sized chunks of memory.
At some point, B frees all its chunks of memory, and suddenly you have fragmentation. For example, if C needed to allocate a large chunk of memory that would still fit into the total available memory, it could not do so, because the available memory is split into two blocks.
Now, if you think about the case where each chunk of memory would be of the same size, this situation would clearly not arise.
Memory pools, of course, have their own drawbacks, as you yourself point out. So you should not think that a memory pool is a magic wand. It has a cost, and it makes sense to pay it under specific circumstances (e.g., embedded systems with limited memory, real-time constraints, and so on).
As to which memory pool is good in C++, I would say that it depends. I have used one under VxWorks that was provided by the OS; in a sense, a good memory pool is effective when it is tightly integrated with the OS. Actually each RTOS offers an implementation of memory pools, I guess.
If you are looking for a generic memory pool implementation, look at this.
EDIT:
From your last comment, it seems to me that you may be thinking of memory pools as "the" solution to the problem of fragmentation. Unfortunately, this is not the case. If you like, fragmentation is the manifestation of entropy at the memory level, i.e., it is inevitable. On the other hand, memory pools are a way to manage memory that effectively reduces the impact of fragmentation (as I said, and as Wikipedia mentions, mostly on specific systems like real-time systems). This comes at a cost, since a memory pool can be less efficient than a "normal" memory allocation technique in that you have a minimum block size. In other words, the entropy reappears in disguise.
Furthermore, there are many parameters that affect the efficiency of a memory pool system, like block size, block allocation policy, or whether you have just one memory pool or several memory pools with different block sizes, different lifetimes, or different policies.
Memory management is a really complex matter, and memory pools are just one technique that, like any other, improves things in comparison to other techniques and exacts a cost of its own.
In a scenario where you always allocate fixed-size blocks, you either have enough space for one more block, or you don't. If you have, the block fits in the available space, because all free or used spaces are of the same size. Fragmentation is not a problem.
In a scenario with variable-size blocks, you can end up with multiple separate free blocks with varying sizes. A request for a block of a size that is less than the total memory that is free may be impossible to be satisfied, because there isn't one contiguous block big enough for it. For example, imagine you end up with two separate free blocks of 2KB, and need to satisfy a request for 3KB. Neither of these blocks will be enough to provide for that, even though there is enough memory available.
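The 2 KB + 2 KB versus 3 KB example can be reproduced with a toy model. This is a sketch: the arena is modeled as 1 KB units in a std::vector<bool> (true means in use), only the bookkeeping is simulated rather than real memory, and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Total number of free 1 KB units, wherever they are.
std::size_t total_free_units(const std::vector<bool>& used) {
    std::size_t n = 0;
    for (bool u : used) if (!u) ++n;
    return n;
}

// Longest run of consecutive free units: the biggest request a
// contiguous allocator could still satisfy.
std::size_t largest_free_run(const std::vector<bool>& used) {
    std::size_t best = 0, run = 0;
    for (bool u : used) {
        run = u ? std::size_t{0} : run + 1;
        if (run > best) best = run;
    }
    return best;
}

bool can_allocate(const std::vector<bool>& used, std::size_t units) {
    return largest_free_run(used) >= units;
}
```

With a 6 KB arena where the first and last 2 KB blocks were freed and the middle one is still in use ({false, false, true, true, false, false}), total_free_units reports 4 but largest_free_run reports 2, so a 3 KB request fails.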
Both fixed-size and variable-size memory pools will feature fragmentation, i.e., there will be some free memory chunks between used ones.
For variable size, this might cause problems, since there might not be a free chunk that is big enough for a certain requested size.
For fixed-size pools, on the other hand, this is not a problem, since only portions of the pre-defined size can be requested. If there is free space, it is guaranteed to be large enough for (a multiple of) one portion.
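A fixed-size pool is commonly built as one contiguous arena split into equal blocks, with the free blocks chained into a linked list, so allocation and deallocation are O(1) and any free block satisfies any request. The sketch below is one possible design under assumptions not taken from the answers: it hands out raw blocks (the caller constructs objects in them with placement new), it builds the arena from std::max_align_t elements so every block is suitably aligned for ordinary types (which also bears on the alignment question above), and it is not thread-safe. FixedBlockPool is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <new>
#include <vector>

class FixedBlockPool {
    struct FreeNode { FreeNode* next; };  // link stored inside each free block

public:
    FixedBlockPool(std::size_t block_size, std::size_t block_count)
        : stride_(round_up(std::max(block_size, sizeof(FreeNode)))),
          storage_(stride_ / sizeof(std::max_align_t) * block_count) {
        // Thread every block onto the free list, lowest address first.
        auto* base = reinterpret_cast<unsigned char*>(storage_.data());
        for (std::size_t i = block_count; i-- > 0;) {
            head_ = ::new (base + i * stride_) FreeNode{head_};
        }
    }
    FixedBlockPool(const FixedBlockPool&) = delete;
    FixedBlockPool& operator=(const FixedBlockPool&) = delete;

    // O(1): pop the first free block, or nullptr when the pool is exhausted.
    void* allocate() {
        if (head_ == nullptr) return nullptr;
        FreeNode* node = head_;
        head_ = node->next;
        return node;
    }

    // O(1): push the block back onto the free list. `p` must come from allocate().
    void deallocate(void* p) { head_ = ::new (p) FreeNode{head_}; }

private:
    // Block size rounded up so consecutive blocks keep max_align_t alignment.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = sizeof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    std::size_t stride_;
    std::vector<std::max_align_t> storage_;
    FreeNode* head_ = nullptr;
};
```

Note the trade-off mentioned elsewhere in this thread: a 24-byte request still consumes a whole rounded-up block, which is the minimum-block-size cost of this approach.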
If you build a hard real-time system, you might need to know in advance that you can allocate memory within the maximum time allowed. That can be "solved" with fixed-size memory pools.
I once worked on a military system, where we had to calculate the maximum possible number of memory blocks of each size that the system could ever possibly use. Then those numbers were added to a grand total, and the system was configured with that amount of memory.
Crazily expensive, but worked for the defence.
When you have several fixed size pools, you can get a secondary fragmentation where your pool is out of blocks even though there is plenty of space in some other pool. How do you share that?
With a memory pool, operations might work like this:
Store a global variable that is a list of available objects (initially empty).
To get a new object, try to return one from the global list of available objects. If there isn't one, then call operator new to allocate a new object on the heap. Allocation is extremely fast, which is important for some applications that might currently be spending a lot of CPU time on memory allocations.
To free an object, simply add it to the global list of available objects. You might place a cap on the number of items allowed in the global list; if the cap is reached then the object would be freed instead of returned to the list. The cap prevents the appearance of a massive memory leak.
Note that this is always done for a single data type of the same size; it doesn't work for larger ones and then you probably need to use the heap as usual.
It's very easy to implement; we use this strategy in our application. It causes a burst of memory allocations at the beginning of the program, but after that no further allocating or freeing, with its significant overhead, takes place.
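The free-list strategy just described might look like this in C++. A minimal sketch under assumptions: the pool is an object rather than a global variable, objects are handed out as std::unique_ptr, a released object is reused as-is (its previous state is not reset), and there is no locking, so each thread would need its own pool. ObjectPool is a hypothetical name.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

template <typename T>
class ObjectPool {
public:
    explicit ObjectPool(std::size_t max_cached) : max_cached_(max_cached) {
        free_.reserve(max_cached_);  // release() never has to grow the list
    }

    // Reuse a cached object if one is available; otherwise allocate a new one.
    std::unique_ptr<T> acquire() {
        if (!free_.empty()) {
            std::unique_ptr<T> obj = std::move(free_.back());
            free_.pop_back();
            return obj;
        }
        return std::make_unique<T>();
    }

    // Cache the object for reuse; beyond the cap it is destroyed instead,
    // which keeps the cache from turning into a large memory leak.
    void release(std::unique_ptr<T> obj) {
        if (obj && free_.size() < max_cached_) free_.push_back(std::move(obj));
        // Otherwise `obj` is deleted when this function returns.
    }

    std::size_t cached() const { return free_.size(); }

private:
    std::size_t max_cached_;
    std::vector<std::unique_ptr<T>> free_;
};
```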

C++ Some Stack questions

Let me start by saying that I have read this tutorial and have read this question. My questions are:
How big can the stack get? Is it processor/architecture/compiler dependent?
Is there a way to know exactly how much memory is available to my function/class stack and how much is currently being used in order to avoid overflows?
Using modern compilers (say gcc 4.5) on a modern computer (say 6 GB RAM), do I need to worry about stack overflows or is it a thing of the past?
Is the actual stack memory physically in RAM or in the CPU cache(s)?
How much faster is stack memory access and read compared to heap access and read? I realize that times are PC specific, so a ratio is enough.
I've read that it is not advisable to allocate big vars/objects on the stack. How much is too big? This question here is given an answer of 1MB for a thread in win32. How about a thread in Linux amd64?
I apologize if those questions have been asked and answered already, any link is welcome !
Yes, the limit on the stack size varies, but if you care you're probably doing something wrong.
Generally no you can't get information about how much memory is available to your program. Even if you could obtain such information, it would usually be stale before you could use it.
If you share access to data across threads, then yes you normally need to serialize access unless they're strictly read-only.
You can pass the address of a stack-allocated object to another thread, in which case you (again) have to serialize unless the access is strictly read-only.
You can certainly overflow the stack even on a modern machine with lots of memory. The stack is often limited to only a fairly small fraction of overall memory (e.g., 4 MB).
The stack is allocated as system memory, but usually used enough that at least the top page or two will typically be in the cache at any given time.
Being on the stack vs. the heap makes no direct difference to access speed: the two typically reside in identical memory chips, and often even at different addresses in the same memory chip. The main difference is that the stack is normally contiguous and heavily used, so the top few pages will almost always be in the cache. Heap-based memory is typically fragmented, so there's a much greater chance of needing data that's not in the cache.
Little has changed with respect to the maximum size of object you should allocate on the stack. Even if the stack can be larger, there's little reason to allocate huge objects there.
The primary way to avoid memory leaks in C++ is RAII (AKA SBRM, Scope-Bound Resource Management).
Smart pointers are a large subject in themselves, and Boost provides several kinds. In my experience, collections make a bigger difference, but the basic idea is largely the same either way: relieve the programmer of keeping track of every circumstance when a particular object can be used or should be freed.
1. How big can the stack get? Is it processor/architecture/compiler dependent?
The size of the stack is limited by the amount of memory on the platform and the amount of memory allocated to the process by the operating system.
2. Is there a way to know exactly how much memory is available to my function/class stack and how much is currently being used in order to avoid overflows?
There is no C or C++ facility for determining the amount of available memory. There may be platform specific functions for this. In general, most programs try to allocate memory, then come up with a solution for when the allocation fails.
3. Using modern compilers (say gcc 4.5) on a modern computer (say 6 GB RAM), do I need to worry about stack overflows or is it a thing of the past?
Stack overflows can happen depending on the design of the program. Recursion is a good example of depleting the stack, regardless of the amount of memory.
4. Is the actual stack memory physically in RAM or in the CPU cache(s)?
Platform dependent. Some CPUs can load up their cache with local variables on the stack. There is a wide variety of scenarios on this topic. Not defined in the language specification.
5. How much faster is stack memory access and read compared to heap access and read? I realize that times are PC specific, so a ratio is enough.
Usually there is no difference in speed. It depends on how the platform organizes its memory (physically) and how the executable's memory is laid out. The heap or stack could reside in a serial access memory chip (a slow method) or even on a Flash memory chip. Not specified in the language specification.
6. I've read that it is not advisable to allocate big vars/objects on the stack. How much is too big? This question here is given an answer of 1MB for a thread in win32. How about a thread in Linux amd64?
The best advice is to allocate small local variables as needed (a.k.a. via the stack). Huge items are either allocated from dynamic memory (a.k.a. the heap) or given some kind of static storage (static local to a function, local to a translation unit, or even a global variable). If the size is known at compile time, use the global type of allocation. Use dynamic memory when the size may change during run-time.
The stack also contains information about function addresses. This is one major reason to not allocate a lot of objects locally. Some compilers have smaller limits for stacks than for heap or global variables. The premise is that nested function calls require less memory than large data arrays or buffers.
Remember that when switching threads or tasks, the OS needs to save the state somewhere. The OS may have different rules for saving stack memory versus other types.
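The three options from answer 6 (automatic storage for small locals, static storage for large sizes known at compile time, dynamic storage for sizes known only at run time) look roughly like this. A sketch; the sizes and function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kTableSize = std::size_t{1} << 20;  // hypothetical: about 4 MB of int

// Small and short-lived: automatic (stack) storage is fine.
int small_local_sum() {
    std::array<int, 16> tmp{};
    for (std::size_t i = 0; i < tmp.size(); ++i) tmp[i] = static_cast<int>(i);
    int sum = 0;
    for (int v : tmp) sum += v;
    return sum;  // 0 + 1 + ... + 15
}

// Large, size fixed at compile time: static storage, zero-initialized, never on the stack.
int& table_entry(std::size_t i) {
    static std::array<int, kTableSize> table{};
    return table.at(i);
}

// Size known only at run time: dynamic (heap) storage.
std::vector<int> make_buffer(std::size_t n) {
    return std::vector<int>(n, 0);
}
```

One caveat with the static option: the table is shared by every caller and every thread, so concurrent writes to it need synchronization.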
1-2 : On some embedded CPUs the stack may be limited to a few kbytes; on some machines it may expand to gigabytes. There's no platform-independent way to know how big the stack can get, in some measure because some platforms are capable of expanding the stack when they reach the limit; the success of such an operation cannot always be predicted in advance.
3 : The effects of nearly-simultaneous writes, or of writes in one thread that occur nearly simultaneously with reads in another, are largely unpredictable in the absence of locks, mutexes, or other such devices. Certain things can be assumed (for example, if one thread reads a heap-stored 'int' while another thread changes it from 4 to 5, the first thread may see 4 or it may see 5; on most platforms, it would be guaranteed not to see 27).
4 : Some platforms share stack address space among threads; others do not. Passing pointers to things on the stack is usually a bad idea, though, since the foreign thread receiving the pointer will have no way of ensuring that the target is in scope and won't go out of scope.
5 : Generally one does not need to worry about stack space in any routine which is written to limit recursion to a reasonable level. One does, however, need to worry about the possibility of defective data structures causing infinite recursion, which would wipe out any stack no matter how large it might be. One should also be mindful of the possibility of nasty input which would cause a much greater stack depth than expected. For example, a compiler using a recursive-descent parser might choke if fed a file containing a billion repetitions of the sequence "1+(". Even if the machine has a gig of stack space, if each nested sub-expression uses 64 bytes of stack, the aforementioned three-gig file could kill it.
6 : Stack is stored generally in RAM and/or cache; the most-recently-accessed parts will generally be in cache, while the less-recently-accessed parts will be in main memory. The same is generally true of code, heap, and static storage areas as well.
7 : That is very system dependent; generally, "finding" something on the heap will take as much time as accessing a few things on the stack, but in many cases making multiple accesses to different parts of the same heap object can be as fast as accessing a stack object.
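Point 5 above warns about hostile input such as a long run of "1+(". One way to guard against it is to count nesting depth in the recursive-descent parser and report an error once a chosen limit is exceeded, instead of recursing until the stack overflows. A sketch under assumptions: a toy grammar of single digits, '+', and parentheses, and a hypothetical Parser class.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

class Parser {
public:
    Parser(std::string_view src, std::size_t max_depth) : src_(src), max_depth_(max_depth) {}

    // Returns the sum, or std::nullopt on a syntax error or when nesting exceeds max_depth.
    std::optional<long> parse() {
        auto v = expr(0);
        if (!v || pos_ != src_.size()) return std::nullopt;
        return v;
    }

private:
    // expr := term ('+' term)*
    std::optional<long> expr(std::size_t depth) {
        auto lhs = term(depth);
        while (lhs && pos_ < src_.size() && src_[pos_] == '+') {
            ++pos_;
            auto rhs = term(depth);
            if (!rhs) return std::nullopt;
            lhs = *lhs + *rhs;
        }
        return lhs;
    }

    // term := digit | '(' expr ')'; each '(' costs one level of recursion.
    std::optional<long> term(std::size_t depth) {
        if (pos_ >= src_.size()) return std::nullopt;
        char c = src_[pos_];
        if (c >= '0' && c <= '9') { ++pos_; return c - '0'; }
        if (c == '(') {
            if (depth + 1 > max_depth_) return std::nullopt;  // refuse to recurse further
            ++pos_;
            auto inner = expr(depth + 1);
            if (!inner || pos_ >= src_.size() || src_[pos_] != ')') return std::nullopt;
            ++pos_;
            return inner;
        }
        return std::nullopt;
    }

    std::string_view src_;
    std::size_t max_depth_;
    std::size_t pos_ = 0;
};
```

The depth limit bounds the stack used by the parser regardless of input size, so a three-gigabyte file of "1+(" produces an error rather than a crash.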

When do you worry about stack size?

When you are programming in a language that allows you to use automatic allocation for very large objects, when and how do you worry about stack size? Are there any rules of thumb for reasoning about stack size?
When you are programming in a language that allows you to use automatic allocation for very large objects ...
If I want to allocate a very large object, then instead of on the stack I might allocate it on the heap but wrapped in an auto_ptr (std::unique_ptr in modern C++, since auto_ptr was removed in C++17). In that case it will be deallocated when it goes out of scope, just like a stack-resident object, but without my having to worry about stack size.
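In current C++ the same idea uses std::unique_ptr. A minimal sketch; BigBuffer is a hypothetical type of about 8 MB, which would not fit in a default 1 MB stack.

```cpp
#include <array>
#include <memory>

// Hypothetical large object: 2^20 doubles, about 8 MB.
struct BigBuffer {
    std::array<double, 1 << 20> values;
};

double use_big_buffer() {
    // Heap-allocated but scope-bound: deleted automatically when `buf` goes out of
    // scope, just like a stack object, without consuming stack space.
    auto buf = std::make_unique<BigBuffer>();  // value-initialized: all zeros
    buf->values[0] = 3.5;
    return buf->values[0] + buf->values[1];    // 3.5 + 0.0
}
```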
... when and how do you worry about stack size?
I use the stack conservatively out of habit (e.g. any object bigger than about 512 bytes is allocated on the heap instead), and I know how big the stack is (e.g. about a megabyte by default), and therefore know that I don't need to worry about it.
Are there any rules of thumb for reasoning about stack size?
Very big objects can blow the stack
Very deep recursion can blow the stack
The default stack size might be too big (take too much total memory) if there are many threads and if you're running on a limited-memory embedded device, in which case you might want to use an O/S API or linker option to reduce the size of the stack per thread.
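On POSIX systems, one such O/S API is pthread_attr_setstacksize. A sketch under assumptions: POSIX threads are available (on older glibc versions you need to link with -pthread), and worker and run_with_stack are hypothetical names.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical worker: uses a small local buffer and reports back through `arg`.
void* worker(void* arg) {
    char scratch[1024] = {};  // comfortably within the requested stack size
    scratch[0] = 1;
    *static_cast<int*>(arg) = scratch[0];
    return nullptr;
}

// Runs `worker` on a new thread whose stack size is set explicitly.
// Returns 0 on success or the pthread error code (EINVAL if the size is
// below PTHREAD_STACK_MIN or otherwise unacceptable to the platform).
int run_with_stack(std::size_t stack_bytes, int* result) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    pthread_t tid;
    if (rc == 0) rc = pthread_create(&tid, &attr, worker, result);
    pthread_attr_destroy(&attr);  // safe once the thread has been created
    if (rc == 0) rc = pthread_join(tid, nullptr);
    return rc;
}
```

For example, run_with_stack(256 * 1024, &r) starts a thread with a 256 KB stack. Keeping the size a multiple of the page size avoids EINVAL on platforms that require it.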
You care about it on a microcontroller, where you often have to specify stack space explicitly (or you get whatever's left over after RAM gets used for static allocation + any RAM program space).
You start to worry about stack size when
someone on your team cunningly invents a recursive function that goes on and on and on...
you create a thread factory and suddenly need ten times the stack space you used to need (each thread needs a stack => the more threads you have, the less free space remains for a given stack size)
If you're writing for a tiny little embedded platform, you worry about it all the time, but you also know exactly how big it is, and probably have some useful tools available to find the high-water mark of the stack.
If you aren't, then don't worry until your program crashes :)
Unless you are allocating seriously huge objects (many tens of KB), it is never going to be a problem.
Note, however, that objects on the stack are, by definition, temporary. Constructing (and possibly destructing) large objects frequently may cause you a performance problem - so if you have a large object it probably should be persistent and heap-based for reasons other than stack size.
I never worry about it. If there is a stack overflow, I will soon know about it. Also, in C++ it is actually very hard to create very large objects on the stack. About the only way of doing it is:
struct S {
    char big[1000000];
};
but use of std::string or std::vector makes that problem go away.
Shouldn't you be avoiding using the stack for allocating large objects in the first place? Use the heap, no?
my experience:
when you use recursive functions, take care of the stack size!!
When do you worry about stack size?
Never.
If you have stack size problems it means you're doing something else wrong and should fix that instead of worrying about stack size.
For instance:
Allocating unreasonably large structures on the stack: don't do it. Allocate on the heap.
Having a ridiculously long recursion, something on the order of painting an image by iterating over the pixels recursively: find a better way to do it.
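For the image example, the usual fix is to keep the pending pixels in an explicit container instead of recursing once per pixel, so a large region costs heap memory rather than call-stack depth. A sketch assuming a row-major grid of int colors with 4-way connectivity; flood_fill is a hypothetical helper.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

void flood_fill(std::vector<int>& pixels, std::size_t width, std::size_t height,
                std::size_t x0, std::size_t y0, int new_color) {
    if (x0 >= width || y0 >= height) return;
    const int old_color = pixels[y0 * width + x0];
    if (old_color == new_color) return;  // nothing to do (and avoids looping forever)
    std::vector<std::pair<std::size_t, std::size_t>> todo;  // explicit work stack on the heap
    todo.emplace_back(x0, y0);
    while (!todo.empty()) {
        auto [x, y] = todo.back();
        todo.pop_back();
        int& p = pixels[y * width + x];
        if (p != old_color) continue;  // already filled, or not part of this region
        p = new_color;
        if (x > 0) todo.emplace_back(x - 1, y);
        if (x + 1 < width) todo.emplace_back(x + 1, y);
        if (y > 0) todo.emplace_back(x, y - 1);
        if (y + 1 < height) todo.emplace_back(x, y + 1);
    }
}
```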
I worry about stack size on embedded systems when the call stack goes very deep and each function allocates variables on the stack. Generally, panic ensues when the system crashes unexpectedly because variables on the stack get corrupted (the stack overflows).
Played this game a lot on Symbian: when to use TBuf (a string with storage on the stack), and when to use HBufC (which allocates the string storage on the heap, like std::string, so you have to cope with Leave, and your function needs a means of failing).
At the time (maybe still, I'm not sure), Symbian threads had 4k of stack by default. To manipulate filenames, you need to count on using up to 512 bytes (256 characters).
As you can imagine, the received wisdom was "never put a filename on the stack". But actually, it turned out that you could get away with it a lot more often than you'd think. When we started running real programs (TM), such as games, we found that we needed way more than the default stack size anyway, and it wasn't due to filenames or other specific large objects, it was due to the complexity of the game code.
If using stack makes your code simpler, and as long as you're testing properly, and as long as you don't go completely overboard (don't have multiple levels of file-handling functions which all put a filename on the stack), then I'd say just try it. Especially if the function would need to be able to fail anyway, whether you're using stack or heap. If it goes wrong, you either double the stack size and be more careful in future, or you add another failure case to your function. Neither is the end of the world.
In practice you rarely end up with large objects on the stack. Most large objects, such as std::vector or std::string, keep their data on the heap internally, so even if the object itself is 'on the stack', its contents are not. Even a class with many such data members usually occupies only a few dozen bytes on the stack, with the rest on the heap. These days the stack usually only becomes an issue when you have lots of threads and lots of recursion.
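A quick way to see this for yourself, assuming a hypothetical `Document` class whose bulk lives in standard containers: `sizeof` stays the same no matter how much data the members own. The exact number is implementation-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "large" object: its payload is owned by heap-backed members.
struct Document {
    std::string title;
    std::vector<char> payload;
};

// Bytes the Document object itself occupies (e.g. on the stack),
// independent of how much heap memory its members own.
std::size_t stack_footprint(const Document& d) {
    return sizeof d;
}
```

With a 1 MiB payload, the object on the stack is still just the string and vector bookkeeping (pointers and sizes), typically tens of bytes on common 64-bit standard libraries.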
The only times it really matters are when you are using threads and have to set the stack size yourself, when you are doing recursion, or when for some reason you are explicitly allocating large buffers on the stack. Otherwise the default stack size provided by your toolchain and OS is usually enough.
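For the threading case, `std::thread` has no portable way to choose a stack size, so on POSIX systems you typically go through a pthread attribute. A minimal sketch, assuming a POSIX platform; `worker` and `run_with_stack` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <pthread.h>

// Hypothetical worker that keeps a 2 MiB scratch buffer on its own stack.
void* worker(void* arg) {
    char scratch[2 * 1024 * 1024];
    std::memset(scratch, 1, sizeof scratch);
    *static_cast<int*>(arg) = scratch[sizeof scratch - 1];
    return nullptr;
}

// Runs `worker` on a new thread whose stack is `stack_bytes` large.
// Returns false if the attribute or the thread could not be created.
bool run_with_stack(std::size_t stack_bytes, int* result) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return false;
    if (pthread_attr_setstacksize(&attr, stack_bytes) != 0) {
        pthread_attr_destroy(&attr);
        return false;
    }
    pthread_t tid;
    const int rc = pthread_create(&tid, &attr, worker, result);
    pthread_attr_destroy(&attr);  // safe once the thread has been created
    if (rc != 0) return false;
    pthread_join(tid, nullptr);
    return true;
}
```

Choosing the size explicitly makes the requirement visible in code instead of depending on whatever default the platform happens to use.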
On Windows, CreateThread by default reserves only 0x100000 bytes (1 MB) for the stack.
When the code you've written for a PC suddenly is supposed to run on a mobile phone
When the code you've ported to run on a mobile phone suddenly is supposed to run on a DSP
(And yes, these are real-life snafus.)
When deciding whether to allocate objects on the stack vs. the heap, there are also performance issues to take into consideration. Allocating memory on the stack is very fast - it just involves moving the stack pointer - whereas dynamic allocation/deallocation using new/delete or malloc/free is fairly expensive, especially in multithreaded code that doesn't have a heap per thread. If you have a function that is called in a tight loop, you might well err on the side of putting larger objects on the stack, keeping in mind all of the multithreading caveats mentioned in other answers, even if that means increasing the stack space, which most linkers will let you do.
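To illustrate the trade-off, here are two versions of a hypothetical per-chunk routine that do identical work: one with a fixed-size stack buffer, one that heap-allocates a `std::vector` on every call. The chunk size of 256 and the XOR/sum "processing" are assumptions for illustration. Note that compilers are permitted to elide some heap allocations, so measure in your own loop rather than assuming a particular speedup.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kChunk = 256;  // hypothetical chunk size N

// Scratch buffer on the stack: reserving it is part of the frame setup.
std::uint32_t checksum_stack(const std::uint8_t* src) {
    std::array<std::uint8_t, kChunk> scratch;
    for (std::size_t i = 0; i < kChunk; ++i) scratch[i] = src[i] ^ 0x5A;
    std::uint32_t sum = 0;
    for (std::uint8_t b : scratch) sum += b;
    return sum;
}

// Same work, but std::vector calls operator new/delete on every invocation.
std::uint32_t checksum_heap(const std::uint8_t* src) {
    std::vector<std::uint8_t> scratch(kChunk);
    for (std::size_t i = 0; i < kChunk; ++i) scratch[i] = src[i] ^ 0x5A;
    std::uint32_t sum = 0;
    for (std::uint8_t b : scratch) sum += b;
    return sum;
}
```

Since N is only known at runtime in many cases, a fixed `std::array` like this only works if you pick a compile-time upper bound (or a few instantiations), which matches the "one stack array per candidate N" approach.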
In general, big allocations on the stack are bad for several reasons, not the least of which is that they can cause problems to remain well hidden for a long time.
The problem is that detecting stack overflow is not easy, and big allocations can subvert most of the commonly used methods.
If the processor has no memory management or memory protection unit, you have to be particularly careful. But even with some sort of MMU or MPU, the hardware can fail to detect a stack overflow. One common scheme, reserving a page below the stack to catch overflow, fails if the big stack object is bigger than a page. There just might be the stack of another thread sitting there, and oops! You just created a very nasty, hard-to-find bug.
Unlimited recursion is usually easy to catch because the stack growth is usually small and will trigger the hardware protection.
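On POSIX systems you can check how large that guard region is for threads created with default attributes, and compare it with the size of an object you are about to put on the stack. A minimal sketch; `default_guard_size` and `may_skip_guard` are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Default guard-region size for a new POSIX thread, or 0 if it can't be queried.
std::size_t default_guard_size() {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return 0;
    std::size_t guard = 0;
    if (pthread_attr_getguardsize(&attr, &guard) != 0) guard = 0;
    pthread_attr_destroy(&attr);
    return guard;
}

// A single stack object larger than the guard region can place its far end
// beyond the guard, so a write there may land in unrelated memory instead of
// faulting. Returns true (conservatively) if the guard size is unknown.
bool may_skip_guard(std::size_t object_bytes) {
    return object_bytes > default_guard_size();
}
```

Compilers can also insert stack probes that touch each page of a large frame in order (for example GCC's `-fstack-clash-protection`), which addresses exactly this "jump over the guard page" problem.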
I don't. Worrying about these things while writing normal programs is either a case of premature pessimization or premature optimization. It's pretty hard to blow things up on a modern computer anyway.
I once wrote a CSV parser, and while playing around trying to get the best performance, I was allocating hundreds of thousands of 1K buffers on the stack. The performance was stellar, but RAM usage went up, as I recall, to about 1 GB from the normal 30 MB. This was because each cell in the CSV file had a fixed-size 1K buffer.
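The arithmetic explains the jump: at 1 KiB per cell, every million cells costs about 1 GB regardless of what the cells contain. A minimal sketch of the alternative, assuming simple CSV without quoting or escaping, stores each field as a `std::string` sized to its contents; `FixedCell` and `split_csv_line` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The fixed-size layout described above: 1 KiB per cell, however short the text.
struct FixedCell {
    char text[1024];
};

// Alternative: each field is a std::string sized to its actual contents.
// Simplified: no support for quoted fields or embedded commas.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    for (char ch : line) {
        if (ch == ',') {
            fields.push_back(current);
            current.clear();
        } else {
            current += ch;
        }
    }
    fields.push_back(current);  // last field (possibly empty)
    return fields;
}
```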
As everyone is saying, unless you are doing recursion, you do not have to worry about it.
You worry about it when you write a callback that will be called from threads spawned by a runtime you don't control (for example, the MS RPC runtime), with the stack size at the discretion of that runtime. Something like this.
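One way to keep such a callback's stack use small, assuming it needs a sizable scratch buffer, is a `thread_local` `std::vector`. The vector object lives in thread-local storage and its buffer on the heap, so neither consumes the runtime's stack, and the buffer is reused across calls on the same thread. `on_rpc_request` is a hypothetical name, not part of any RPC API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical callback invoked on threads whose stack size we don't control.
// Returns the current scratch capacity in bytes (for illustration/testing).
std::size_t on_rpc_request(std::size_t request_bytes) {
    thread_local std::vector<char> scratch;  // one per thread, freed at thread exit
    if (scratch.size() < request_bytes) scratch.resize(request_bytes);
    // ... decode the request into scratch.data() ...
    return scratch.size();
}
```

The trade-off is that each thread keeps its peak-sized buffer until it exits, which is usually fine for a pool of long-lived runtime threads.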
I have had problems running out of stack space when:
A function accidentally calls itself
A function uses recursion to a deep level
A function allocates a large object on the stack, even though a heap is available.
A function uses complicated templates and the compiler crashes
Provided I:
Allocate large objects on the heap (e.g. using "auto foo = std::make_unique<Foo>();" instead of "Foo foo;"; the older std::auto_ptr was removed in C++17)
Use recursion judiciously.
I don't normally have any problems, so unfortunately don't know what good defaults should be.
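Applied to a class like the one in the question, the caller decides where large instances live. A minimal sketch, assuming a hypothetical `ChunkProcessor` that keeps a 256 KiB array inline: a single instance goes behind a `std::unique_ptr`, and many instances go into a `std::vector`, whose elements are stored on the heap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical class with large inline members (all "stack-style" arrays).
struct ChunkProcessor {
    std::array<unsigned char, 256 * 1024> buffer{};
    std::size_t processed = 0;
};

// Single large object: only the unique_ptr itself sits on the caller's stack.
std::unique_ptr<ChunkProcessor> make_processor() {
    return std::make_unique<ChunkProcessor>();
}

// Many instances: std::vector constructs its elements in heap storage.
std::vector<ChunkProcessor> make_pool(std::size_t count) {
    return std::vector<ChunkProcessor>(count);
}
```

So even if the class itself uses inline arrays, hundreds of instances need not touch the stack, as long as users don't declare them as local variables.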
You start to worry about stack size when:
when your program crashes - these bugs tend to look weird the first time you see them :)
you are running an algorithm that uses recursion and has user input as one of its parameters (you don't know how much stack your algorithm could use)
you are running on embedded platforms (or platforms where every resource matters). On these platforms the stack is usually allocated before the process/task is created, so a good estimate of the stack requirements must be made
you are creating objects on the stack depending on some parameters modifiable by user input (see the sample below)
when the code executed in a thread/process/task is very big and there are a lot of function calls that go deep into the stack and generate a huge call stack. This usually happens in big frameworks that combine a lot of triggers and event processing (a GUI framework, for example: receive_click -> find_clicked_window -> send_msg_to_window -> process_message -> process_click -> is_inside_region -> trigger_drawing -> write_to_file -> ...). In short, you should worry about the call stack in the case of complex code or unknown/binary third-party modules.
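For the recursion-with-user-input case, one common defence is to cap the recursion depth explicitly and reject input that exceeds it. A minimal sketch, assuming a hypothetical recursive-descent check for balanced parentheses; `parse_group`, `balanced` and the `max_depth` parameter are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Recursive-descent check that s[pos..] is a sequence of balanced groups.
// Nesting deeper than `max_depth` is rejected instead of risking a stack
// overflow on hostile or unexpectedly deep input.
bool parse_group(const std::string& s, std::size_t& pos, std::size_t depth,
                 std::size_t max_depth) {
    if (depth > max_depth) return false;
    while (pos < s.size() && s[pos] == '(') {
        ++pos;  // consume '('
        if (!parse_group(s, pos, depth + 1, max_depth)) return false;
        if (pos >= s.size() || s[pos] != ')') return false;
        ++pos;  // consume ')'
    }
    return true;
}

// True if the whole string is balanced and nested at most `max_depth` deep.
bool balanced(const std::string& s, std::size_t max_depth) {
    std::size_t pos = 0;
    return parse_group(s, pos, 0, max_depth) && pos == s.size();
}
```

The stack usage is then bounded by `max_depth` times the frame size, which you can budget for, instead of by the length of whatever the user sends.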
Sample for parameters modifiable by user input:
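int my_func(size_t input_param)
{
    char buffer[input_param]; // variable-length array: not standard C++ (a GCC/Clang extension), and its size comes from user input
    // or any other initialization of a big object on the stack
    // ....
    return 0;
}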
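A safer shape for the same function, assuming you still want the speed of the stack for small inputs: use a fixed-size stack buffer up to a cap and fall back to the heap above it (the same idea behind small-buffer optimizations). `fill_scratch` and `kStackLimit` are hypothetical names, and the 4096-byte cap is an arbitrary example value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kStackLimit = 4096;  // hypothetical cap on stack usage

// Uses a bounded stack buffer for small inputs and a heap buffer otherwise.
// Returns true if the heap fallback was used.
bool fill_scratch(std::size_t input_param, char fill) {
    std::array<char, kStackLimit> small;  // stack cost never exceeds kStackLimit
    std::vector<char> large;              // stays empty unless needed
    char* buffer = small.data();
    const bool on_heap = input_param > small.size();
    if (on_heap) {
        large.resize(input_param);
        buffer = large.data();
    }
    std::memset(buffer, fill, input_param);
    // ... work on buffer[0 .. input_param) ...
    return on_heap;
}
```

The user can no longer control how much stack the function consumes; the worst case is fixed at compile time.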
A piece of advice:
you should mark the stack with some magic numbers (in case you allocate it yourself) and check whether those magic numbers get modified (if they do, the stack is not big enough for the task/thread/process and should probably be increased)
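This is often called "stack painting". A minimal sketch on POSIX, assuming a downward-growing stack (true on x86 and ARM) and a hypothetical `task` to measure: allocate the thread's stack yourself, fill it with a magic byte, run the thread, then count how much of the pattern was overwritten to get a high-water mark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <pthread.h>
#include <unistd.h>

constexpr unsigned char kPaint = 0xAA;  // the "magic number" painted over the stack

// Hypothetical task whose stack usage we want to measure.
void* task(void*) {
    volatile char local[64 * 1024];
    for (std::size_t i = 0; i < sizeof local; ++i) local[i] = 1;
    return nullptr;
}

// Runs `task` on a painted, caller-allocated stack and returns the number of
// bytes that were overwritten (the high-water mark), or 0 on failure.
std::size_t measure_stack_use(std::size_t stack_bytes) {
    const long page = sysconf(_SC_PAGESIZE);
    const std::size_t align = page > 0 ? static_cast<std::size_t>(page) : 4096;
    void* stack = nullptr;
    if (posix_memalign(&stack, align, stack_bytes) != 0) return 0;
    std::memset(stack, kPaint, stack_bytes);

    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) { std::free(stack); return 0; }
    pthread_t tid;
    const bool started = pthread_attr_setstack(&attr, stack, stack_bytes) == 0 &&
                         pthread_create(&tid, &attr, task, nullptr) == 0;
    pthread_attr_destroy(&attr);
    if (!started) { std::free(stack); return 0; }
    pthread_join(tid, nullptr);

    // The stack grows down, so untouched paint survives at the low end.
    const auto* bytes = static_cast<const unsigned char*>(stack);
    std::size_t untouched = 0;
    while (untouched < stack_bytes && bytes[untouched] == kPaint) ++untouched;
    std::free(stack);
    return stack_bytes - untouched;
}
```

The result is an upper-bound estimate (the thread library may also keep its own bookkeeping inside a caller-supplied stack), and it only reflects the code paths the test actually exercised, so leave headroom when sizing the real stack.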