Why is stack memory size so limited? - c++

When you allocate memory on the heap, the only limit is free RAM (or virtual memory), which can amount to gigabytes.
So why is stack size so limited (around 1 MB)? What technical reason prevents you from creating really big objects on the stack?
Update: My intent might not be clear. I do not want to allocate huge objects on the stack and I do not need a bigger stack. This question is pure curiosity!

My intuition is the following. The stack is not as easy to manage as the heap. The stack needs to be stored in contiguous memory locations. This means that you cannot randomly allocate the stack as needed, but you need to at least reserve virtual addresses for that purpose. The larger the size of the reserved virtual address space, the fewer threads you can create.
For example, a 32-bit application generally has a virtual address space of 2GB. This means that if the stack size is 2MB (as default in pthreads), then you can create a maximum of 1024 threads. This can be small for applications such as web servers. Increasing the stack size to, say, 100MB (i.e., you reserve 100MB, but do not necessarily allocate 100MB to the stack immediately) would limit the number of threads to about 20, which can be limiting even for simple GUI applications.
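The arithmetic above can be checked with a few lines of C++. This is only a sketch: the name `max_thread_stacks` is made up for illustration, and it assumes the whole address space is available for stacks, which overstates the real limit because code, heap, and libraries need addresses too.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1024ull * 1024ull;
constexpr std::uint64_t GiB = 1024ull * MiB;

// Upper bound on the number of thread stacks that fit into a given amount of
// virtual address space (hypothetical helper; ignores everything else that
// also needs addresses).
constexpr std::uint64_t max_thread_stacks(std::uint64_t address_space_bytes,
                                          std::uint64_t stack_reserve_bytes) {
    return address_space_bytes / stack_reserve_bytes;
}

// The two cases from the text, verified at compile time.
static_assert(max_thread_stacks(2 * GiB, 2 * MiB) == 1024, "2 MB stacks");
static_assert(max_thread_stacks(2 * GiB, 100 * MiB) == 20, "100 MB stacks");
```

The same calculation with a 64-bit address space gives a far larger bound, which is why the argument is mainly about 32-bit processes.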
An interesting question is why we still have this limit on 64-bit platforms. I do not know the answer, but I assume that people are already used to some "stack best practices": be careful to allocate huge objects on the heap and, if needed, manually increase the stack size. Therefore, nobody found it useful to add "huge" stack support on 64-bit platforms.

One aspect that nobody has mentioned yet:
A limited stack size is an error detection and containment mechanism.
Generally, the main job of the stack in C and C++ is to keep track of the call stack and local variables, and if the stack grows out of bounds, it is almost always an error in the design and/or the behaviour of the application.
If the stack were allowed to grow arbitrarily large, these errors (like infinite recursion) would be caught very late, only after the operating system's resources are exhausted. This is prevented by setting an arbitrary limit on the stack size. The actual size is not that important, apart from it being small enough to prevent system degradation.

It is just a default size. If you need more, you can get more - most often by telling the linker to allocate extra stack space.
The downside to having large stacks is that if you create many threads, they will need one stack each. If every stack reserves several megabytes but does not use them, the space is wasted.
You have to find the proper balance for your program.
Some people, like @BJovke, believe that virtual memory is essentially free. It is true that you don't need to have physical memory backing all the virtual memory. You do have to be able to at least give out addresses to the virtual memory.
However, on a typical 32-bit PC the size of the virtual memory is the same as the size of the physical memory - because we only have 32 bits for any address, virtual or not.
Because all threads in a process share the same address space, they have to divide it between them. And after the operating system has taken its part, there is "only" 2-3 GB left for an application. And that size is the limit for both the physical and the virtual memory, because there just aren't any more addresses.

For one thing, the stack is contiguous, so if you allocate 12MB, you must remove 12MB when you want to go below whatever you created. Also, moving objects around becomes much harder. Here is a real-world example that may make things easier to understand:
Say you are stacking boxes around a room. Which is easier to manage:
Stacking boxes of any weight on top of each other, but when you need to get something on the bottom you have to undo your entire pile. If you want to take an item out of the pile and give it to someone else, you must take off all of the boxes and move the box to the other person's pile. (Stack only)
You put all of your boxes (except for really small boxes) over in a special area where you do not stack stuff on top of other stuff, write down where you put each one on a piece of paper (a pointer), and put the paper on the pile. If you need to give the box to someone else, you just hand them the slip of paper from your pile, or give them a photocopy of the paper and leave the original where it was in your pile. (Stack + heap)
Those two examples are gross generalizations and there are some points that are blatantly wrong in the analogy but it is close enough that it hopefully will help you see the advantages in both cases.

Think of the stack in the order of near to far. Registers are close to the CPU (fast), the stack is a bit further (but still relatively close) and the heap is far away (slow access).
The stack lives in the same main memory as the heap, of course, but since it's being used continuously, it probably never leaves the CPU cache(s), making it faster than the average heap access.
This is a reason to keep the stack reasonably sized; to keep it cached as much as possible. Allocating big stack objects (possibly automatically resizing the stack as you get overflows) goes against this principle.
So it's a good paradigm for performance, not just a left-over from old times.

Allocating large objects in a, say, 100MB stack would make it impossible on most machines to have them loaded at once into cache, which pretty much defeats the purpose of the stack.
The point of the stack is to have small objects that belong to the same scope (and are, therefore, usually needed together or close to each other) stored together in contiguous memory addresses, so that the program can have them all loaded into cache at the same time, minimizing cache misses and, in general, the time CPU has to wait until it gets some missing piece of data from the slower RAM.
A 50MB object stored in the stack would not fit into the cache, meaning after every cache line there would be a CPU waiting time until the next piece of data is brought from RAM, meaning one would be clogging the call stack and not getting any significant benefit (in terms of speed) as compared to loading from the heap.

Many of the things you think you need a big stack for, can be done some other way.
Sedgewick's "Algorithms" has a couple of good examples of "removing" recursion from recursive algorithms such as QuickSort, by replacing the recursion with iteration. In reality, the algorithm is still recursive, and there is still a stack, but you allocate the sorting stack on the heap, rather than using the runtime stack.
(I favor the second edition, with algorithms given in Pascal. It can be had used for eight bucks.)
Another way to look at it, is if you think you need a big stack, your code is inefficient. There is a better way that uses less stack.
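To make the recursion-removal idea concrete, here is a minimal sketch of a quicksort that keeps its pending sub-ranges in a `std::vector` (heap storage) instead of in call frames. It is not Sedgewick's code; the function name `quicksort_iterative` and the choice of a Lomuto partition around the last element are assumptions made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Quicksort without recursion: the "call stack" is an explicit vector of
// half-open ranges [lo, hi) that lives on the heap.
void quicksort_iterative(std::vector<int>& a) {
    std::vector<std::pair<std::size_t, std::size_t>> pending;
    pending.emplace_back(0, a.size());

    while (!pending.empty()) {
        const std::size_t lo = pending.back().first;
        const std::size_t hi = pending.back().second;
        pending.pop_back();
        assert(lo <= hi);
        if (hi - lo < 2) continue;  // zero or one element: already sorted

        // Lomuto partition around the last element of the range.
        const int pivot = a[hi - 1];
        std::size_t store = lo;
        for (std::size_t i = lo; i + 1 < hi; ++i) {
            if (a[i] < pivot) std::swap(a[i], a[store++]);
        }
        std::swap(a[store], a[hi - 1]);

        // Instead of two recursive calls, remember both halves for later.
        pending.emplace_back(lo, store);
        pending.emplace_back(store + 1, hi);
    }
}
```

With this simple push order the explicit stack can still grow to O(n) entries on bad input; always pushing the larger half first and handling the smaller half next keeps it at O(log n). Either way, running out of space now shows up as a failed heap allocation rather than a stack overflow.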

If you could have an infinite stack then every virtual address could potentially be used by the stack. If the stack can use every address, then there is no place for the heap to go. Every address you picked for a heap variable could be overwritten by a growing stack.
To put it another way, variables on the stack and variables on the heap occupy the same virtual address space. We need some way of preventing the heap allocator from allocating data where the stack might grow into. A stack size is an easy way to do it. The heap allocator knows that the stack addresses are taken and so it uses something else.

I don't think there is any technical reason, but it would be a strange app that created just one huge super-object on the stack. Stack objects lack flexibility, which becomes more problematic with increasing size: you cannot return without destroying them and you cannot queue them to other threads.

Related

How to find out what size array fits in stack memory?

It is my impression that when doing this
void stuff() {
    int arr[1000];
    // do some stuff
}
Then the array could be allocated either in heap or stack memory depending on its size and the limits of the CPU.
I'm optimizing a script that bruteforce tests some calculations and since the estimated time is currently several weeks, I'd like to get that down as much as possible, getting as much sample data as possible fitted in the stack would help I think.
Is there a way to find out how big of an array I can declare?
The stack on a typical modern machine varies, but is usually around or less than 5 MB. On embedded platforms it might be a lot less. Some platforms allow you to specify a hint for the stack size of a created thread, but often those are only hints and are not strictly followed.
Also, how big an object you can put on the stack depends on the current stack depth: you could be at the beginning of the stack or near its end. You cannot really tell in advance, I mean at coding time, which is when and where the size of stack arrays is specified. You could assume you will be at the beginning of the stack in the case of a trivial example, but in production code function calls may be nested in an order and depth which are hard to predict, or even impossible if they depend on user input.
I am not aware if there is any portable way, but the platform specific way would be to put an object on the stack and measure the difference between its address and the end of the stack, this will give you at runtime the stack space you have left.
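A rough sketch of that platform-specific measurement on a POSIX system follows. Everything here is an approximation under stated assumptions: the helper names are made up, the "start of the stack" is approximated by the address of a local captured early in `main`, and the limit comes from `getrlimit(RLIMIT_STACK)`, which describes the main thread only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK (POSIX)

// Address of a local variable, as an integer (hypothetical helper).
inline std::uintptr_t address_of(const volatile char& marker) {
    return reinterpret_cast<std::uintptr_t>(&marker);
}

// Distance between two stack addresses, whichever way the stack grows.
inline std::size_t stack_bytes_between(std::uintptr_t base, std::uintptr_t current) {
    return static_cast<std::size_t>(base > current ? base - current : current - base);
}

// Soft stack limit of the main thread, or 0 if unknown or unlimited.
inline std::size_t stack_soft_limit() {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_STACK, &lim) != 0 || lim.rlim_cur == RLIM_INFINITY) return 0;
    return static_cast<std::size_t>(lim.rlim_cur);
}

// Estimate of the stack space left at the point of the call. `base` should
// be address_of() applied to a local captured near the start of main().
inline std::size_t estimated_stack_left(std::uintptr_t base) {
    assert(base != 0);
    volatile char here = 0;
    const std::size_t used = stack_bytes_between(base, address_of(here));
    const std::size_t limit = stack_soft_limit();
    return limit > used ? limit - used : 0;
}
```

The estimate ignores whatever the runtime placed on the stack before `main` started, so treat it as a hint and leave a generous safety margin.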
At any rate, if your intention is to put a big object on the stack, you don't really gain much from not using heap memory allocation, the penalty from the memory allocation will be negligible, and afterwards, accessing data from it will be as fast as if it was on the stack, at the added benefit it will not depend on the limited stack size.
The purpose of the stack is not to store a program's bulk data; it is to keep track of the program's core structure so it can run. Do not put bulk data on the stack, ever. Because of that intent, the stack size is almost always no more than a few megabytes, even for applications which are intended to use gigabytes of RAM.

Max amount one should allocate on the stack

I've been searching Stack Overflow for a guideline on the max amount of memory one should allocate on the stack.
I see best practices for stack vs. heap allocation but nothing has numbers on a guideline on how much should be allocated on the stack and how much should be allocated on the heap.
Any ideas/numbers I can use as a guideline? When should I allocate on the stack vs. the heap and how much is too much?
In a typical case, the stack is limited to around 1-4 megabytes. To leave space for other parts of the code, you typically want to limit a single stack frame to no more than a few tens of kilobytes or so if possible. When/if recursion gets (or might get) involved, you typically want to limit it quite a bit more than that.
The answer here depends on the environment in which the code is running. On a small embedded system, the whole stack may be a few kilobytes. On a large system running on a desktop, the stack is typically in the megabytes.
For desktop/big embedded system, a few kilobytes is typically fine. For small embedded systems, that may not work well at all.
On the other hand, excessive use of the heap can lead to excessive overhead when calling new/delete frequently. So in a typical situation, you shouldn't use heap allocation for very small objects - unless necessary from other design criteria (e.g. you need a pointer to store permanently somewhere, and stack won't work for that as you are returning from the current function before the object has been finished with).
Of course, it's the overall design that matters. If you have a very simple application, with a few functions, none of which are recursive, it could be fine to allocate a few hundred kilobytes in main or a level above. On the other hand, if you are making a library for generic use, using more than a few kilobytes will probably not make you popular with the developers using the library. And if the library is being developed to run on low memory systems (in a washing machine, old style mobile phone, etc) then using more than a couple of hundred bytes is probably a bad idea.
Allocate on the stack as small as possible. Use the heap for datasets or else the stack allocation will carry through the scope's life, possibly thrashing the cache.

Is it a good idea to extend the size of the stack for large local data processing?

My environment is gcc, C++, Linux.
When my application does some data calculation, it may need a "large" (maybe a few MB) amount of memory to store data, calculation results and other things. I have some code that uses new and delete for this. Since there is no ownership outside some function scope, I think all of this memory could be allocated on the stack.
The problem is, the default stack size (8192 KB on my system) may not be enough. I may need to change the stack size for these stack allocations. Moreover, if the calculation needs more data in the future, I may need to extend the stack size again.
So is it an option to extend stack size? Since it cannot be allocated for specific functions, how will it impact on the whole app? Is it REALLY an improvement to allocate data on stack instead of on heap?
You bring up a controversial question that does not have a direct answer. There are pros and cons on each side. In particular:
Memory on the heap is easier to control: you can check the return value or allow throwing exceptions. When the stack overflows, your thread is simply terminated, with a good chance that the debugger will not show anything meaningful.
On the contrary, stack allocations happen automatically, and you do not need to do anything specific. I always favor this simplicity.
There is nothing fundamentally wrong in allocating large amounts of data on the stack. At the end of the day, any type of memory is still memory. This means that the total amount of required memory is what really matters. Where this memory is allocated is less important. When there is enough memory for your application to work, there is no difference where the memory is allocated. It can be static, for example.
Different systems have different rules of allocation. This means that final decision may depend on the actual system.
While it's true that stack allocations are more efficient (more apparent in multi-threaded programs), if the usage pattern in your case is "allocate a big chunk of memory, process the data, deallocate it", then there won't be much of an improvement.
Instead rewrite the code to use RAII, e.g. std::vector or std::unique_ptr so there won't be explicit buggy deletes.
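A minimal sketch of the "allocate, process, deallocate" pattern with RAII might look like this. The function `sum_of_squares` is a made-up stand-in for the real calculation; the point is only that the `std::vector` handle sits on the stack while its elements live on the heap and are released automatically at scope exit.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Fill a scoped, heap-backed buffer and reduce it. No explicit delete is
// needed: the vector frees its storage when `samples` goes out of scope,
// including when an exception propagates.
long long sum_of_squares(std::size_t count) {
    std::vector<long long> samples(count);  // may throw std::bad_alloc
    assert(samples.size() == count);
    for (std::size_t i = 0; i < count; ++i) {
        samples[i] = static_cast<long long>(i) * static_cast<long long>(i);
    }
    return std::accumulate(samples.begin(), samples.end(), 0LL);
}
```

Calling `sum_of_squares(1000000)` uses about 8 MB of buffer, which would be at or beyond the 8192 KB default stack mentioned in the question if it were a local array, but is unremarkable as a heap allocation.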
If you use Linux, you can change the stack size with the ulimit command. However, I think memory allocated from the heap would also work well for you.

C++ Some Stack questions

Let me start by saying that I have read this tutorial and have read this question. My questions are:
1. How big can the stack get? Is it processor/architecture/compiler dependent?
2. Is there a way to know exactly how much memory is available to my function/class stack and how much is currently being used in order to avoid overflows?
3. Using modern compilers (say gcc 4.5) on a modern computer (say 6 GB ram), do I need to worry for stack overflows or is it a thing of the past?
4. Is the actual stack memory physically on RAM or on CPU cache(s)?
5. How much faster is stack memory access and read compared to heap access and read? I realize that times are PC specific, so a ratio is enough.
6. I've read that it is not advisable to allocate big vars/objects on the stack. How much is too big? This question here is given an answer of 1MB for a thread in win32. How about a thread in Linux amd64?
I apologize if those questions have been asked and answered already, any link is welcome !
Yes, the limit on the stack size varies, but if you care you're probably doing something wrong.
Generally no you can't get information about how much memory is available to your program. Even if you could obtain such information, it would usually be stale before you could use it.
If you share access to data across threads, then yes you normally need to serialize access unless they're strictly read-only.
You can pass the address of a stack-allocated object to another thread, in which case you (again) have to serialize unless the access is strictly read-only.
You can certainly overflow the stack even on a modern machine with lots of memory. The stack is often limited to only a fairly small fraction of overall memory (e.g., 4 MB).
The stack is allocated as system memory, but usually used enough that at least the top page or two will typically be in the cache at any given time.
Being part of the stack vs. heap makes no direct difference to access speed -- the two typically reside in identical memory chips, and often even at different addresses in the same memory chip. The main difference is that the stack is normally contiguous and heavily used, so the top few pages will almost always be in the cache. Heap-based memory is typically fragmented, so there's a much greater chance of needing data that's not in the cache.
Little has changed with respect to the maximum size of object you should allocate on the stack. Even if the stack can be larger, there's little reason to allocate huge objects there.
The primary way to avoid memory leaks in C++ is RAII (AKA SBRM, Scope-Bound Resource Management).
Smart pointers are a large subject in themselves, and Boost provides several kinds. In my experience, collections make a bigger difference, but the basic idea is largely the same either way: relieve the programmer of keeping track of every circumstance when a particular object can be used or should be freed.
1. How big can the stack get? Is it processor/architecture/compiler dependent?
The size of the stack is limited by the amount of memory on the platform and the amount of memory allocated to the process by the operating system.
2. Is there a way to know exactly how much memory is available to my function/class stack and how much is currently being used in order to avoid overflows?
There is no C or C++ facility for determining the amount of available memory. There may be platform-specific functions for this. In general, most programs try to allocate memory, then come up with a solution for when the allocation fails.
3. Using modern compilers (say gcc 4.5) on a modern computer (say 6 GB ram), do I need to worry for stack overflows or is it a thing of the past?
Stack overflows can happen depending on the design of the program. Recursion is a good example of depleting the stack, regardless of the amount of memory.
4. Is the actual stack memory physically on RAM or on CPU cache(s)?
Platform dependent. Some CPUs can load up their cache with local variables on the stack. There is a wide variety of scenarios on this topic. It is not defined in the language specification.
5. How much faster is stack memory access and read compared to heap access and read? I realize that times are PC specific, so a ratio is enough.
Usually there is no difference in speed. It depends on how the platform organizes its memory (physically) and how the executable's memory is laid out. The heap or stack could reside in a serial access memory chip (a slow method) or even on a Flash memory chip. This is not specified in the language specification.
6. I've read that it is not advisable to allocate big vars/objects on the stack. How much is too big? This question here is given an answer of 1MB for a thread in win32. How about a thread in Linux amd64?
The best advice is to allocate small local variables as needed (a.k.a. via the stack). Huge items are either allocated from dynamic memory (a.k.a. heap), or some kind of global (static local to function or local to translation unit or even global variable). If the size is known at compile time, use the global type allocation. Use dynamic memory when the size may change during run-time.
The stack also contains information about function addresses. This is one major reason to not allocate a lot of objects locally. Some compilers have smaller limits for stacks than for heap or global variables. The premise is that nested function calls require less memory than large data arrays or buffers.
Remember that when switching threads or tasks, the OS needs to save the state somewhere. The OS may have different rules for saving stack memory versus other types.
1-2 : On some embedded CPUs the stack may be limited to a few kbytes; on some machines it may expand to gigabytes. There's no platform-independent way to know how big the stack can get, in some measure because some platforms are capable of expanding the stack when they reach the limit; the success of such an operation cannot always be predicted in advance.
3 : The effects of nearly-simultaneous writes, or of writes in one thread that occur nearly simultaneously with reads in another, are largely unpredictable in the absence of locks, mutexes, or other such devices. Certain things can be assumed (for example, if one thread reads a heap-stored 'int' while another thread changes it from 4 to 5, the first thread may see 4 or it may see 5; on most platforms, it would be guaranteed not to see 27).
4 : Some platforms share stack address space among threads; others do not. Passing pointers to things on the stack is usually a bad idea, though, since the foreign thread receiving the pointer will have no way of ensuring that the target is in scope and won't go out of scope.
5 : Generally one does not need to worry about stack space in any routine which is written to limit recursion to a reasonable level. One does, however, need to worry about the possibility of defective data structures causing infinite recursion, which would wipe out any stack no matter how large it might be. One should also be mindful of the possibility of nasty input which would cause a much greater stack depth than expected. For example, a compiler using a recursive-descent parser might choke if fed a file containing a billion repetitions of the sequence "1+(". Even if the machine has a gig of stack space, if each nested sub-expression uses 64 bytes of stack, the aforementioned three-gig file could kill it.
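One common defence against the "1+(" style of input is an explicit nesting limit in the parser. The sketch below is a toy: the grammar, the class name `Parser`, and the helper `try_evaluate` are all made up for illustration, and overflow of `long` on very long digit strings is ignored.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Toy grammar (hypothetical):
//   expr := term ('+' term)*
//   term := digits | '(' expr ')'
class Parser {
public:
    Parser(const std::string& text, std::size_t max_depth)
        : text_(text), max_depth_(max_depth) {}

    long parse() {
        const long value = expr(0);
        if (pos_ != text_.size()) throw std::runtime_error("trailing input");
        return value;
    }

private:
    long expr(std::size_t depth) {
        // Refuse to nest further instead of overflowing the real stack.
        if (depth > max_depth_) throw std::runtime_error("nesting too deep");
        long value = term(depth);
        while (pos_ < text_.size() && text_[pos_] == '+') {
            ++pos_;
            value += term(depth);
        }
        return value;
    }

    long term(std::size_t depth) {
        assert(pos_ <= text_.size());
        if (pos_ < text_.size() && text_[pos_] == '(') {
            ++pos_;
            const long value = expr(depth + 1);
            if (pos_ >= text_.size() || text_[pos_] != ')')
                throw std::runtime_error("expected ')'");
            ++pos_;
            return value;
        }
        if (pos_ >= text_.size() || !is_digit(text_[pos_]))
            throw std::runtime_error("expected a number");
        long value = 0;
        while (pos_ < text_.size() && is_digit(text_[pos_]))
            value = value * 10 + (text_[pos_++] - '0');
        return value;
    }

    static bool is_digit(char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }

    std::string text_;
    std::size_t max_depth_;
    std::size_t pos_ = 0;
};

// Returns false (leaving `out` untouched) on malformed or too deeply nested input.
bool try_evaluate(const std::string& text, std::size_t max_depth, long& out) {
    try {
        out = Parser(text, max_depth).parse();
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

With a limit of, say, 100 levels, hostile input is rejected after about a hundred nested calls, so the stack depth is bounded by the limit rather than by the size of the file.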
6 : Stack is stored generally in RAM and/or cache; the most-recently-accessed parts will generally be in cache, while the less-recently-accessed parts will be in main memory. The same is generally true of code, heap, and static storage areas as well.
7 : That is very system dependent; generally, "finding" something on the heap will take as much time as accessing a few things on the stack, but in many cases making multiple accesses to different parts of the same heap object can be as fast as accessing a stack object.

When do you worry about stack size?

When you are programming in a language that allows you to use automatic allocation for very large objects, when and how do you worry about stack size? Are there any rules of thumb for reasoning about stack size?
When you are programming in a language that allows you to use automatic allocation for very large objects ...
If I want to allocate a very large object, then instead of on the stack I might allocate it on the heap but wrapped in an auto_ptr (or, since C++11, a std::unique_ptr), in which case it will be deallocated when it goes out of scope, just like a stack-resident object, but without worrying about stack size.
... when and how do you worry about stack size?
I use the stack conservatively out of habit (e.g. any object bigger than about 512 bytes is allocated on the heap instead), and I know how big the stack is (e.g. about a megabyte by default), and therefore know that I don't need to worry about it.
Are there any rules of thumb for reasoning about stack size?
Very big objects can blow the stack
Very deep recursion can blow the stack
The default stack size might be too big (take too much total memory) if there are many threads and if you're running on a limited-memory embedded device, in which case you might want to use an O/S API or linker option to reduce the size of the stack per thread.
You care about it on a microcontroller, where you often have to specify stack space explicitly (or you get whatever's left over after RAM gets used for static allocation + any RAM program space).
You start to worry about stack size when
someone on your team cunningly invents a recursive function that goes on and on and on...
you create a thread factory and suddenly need a tenfold of the stack that you used to need (each thread needs a stack => the more threads you have, the less free space remains for a given stack size)
If you're writing for a tiny little embedded platform, you worry about it all the time, but you also know exactly how big it is, and probably have some useful tools available to find the high-water mark of the stack.
If you aren't, then don't worry until your program crashes :)
Unless you are allocating seriously huge objects (many tens of KB), then it is never going to be a problem.
Note, however, that objects on the stack are, by definition, temporary. Constructing (and possibly destructing) large objects frequently may cause you a performance problem - so if you have a large object it probably should be persistent and heap-based for reasons other than stack size.
I never worry about it. If there is a stack overflow, I will soon know about it. Also, in C++ it is actually very hard to create very large objects on the stack. About the only way of doing it is:
struct S {
    char big[1000000];
};
but use of std::string or std::vector makes that problem go away.
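The difference can be seen by comparing object sizes. In this sketch the type names `InlineBuffer` and `HeapBuffer` are made up; the exact size of a `std::vector` is implementation-defined, but it is typically a few pointers.

```cpp
#include <cstddef>
#include <vector>

struct InlineBuffer {
    char big[1000000];      // all 1,000,000 bytes are part of the object
};

struct HeapBuffer {
    std::vector<char> big;  // the object holds only a small handle
    HeapBuffer() : big(1000000) {}  // the bytes themselves go on the heap
};

// A local InlineBuffer costs about 1 MB of stack; a local HeapBuffer costs
// only the size of the vector handle.
static_assert(sizeof(InlineBuffer) >= 1000000, "array is stored inline");
static_assert(sizeof(HeapBuffer) == sizeof(std::vector<char>),
              "only the handle is stored inline");
```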
Shouldn't you be avoiding using the stack for allocating large objects in the first place? Use the heap, no?
my experience:
when you use recursive functions, take care of the stack size!!
When do you worry about stack size?
Never.
If you have stack size problems it means you're doing something else wrong and should fix that instead of worrying about stack size.
For instance:
Allocating unreasonably large structures on the stack - don't do it. Allocate on the heap.
Having a ridiculously long recursion, on the order of painting an image and iterating over the pixels using recursion - find a better way to do it.
I worry about stack size on embedded systems when the call stack goes very deep and each function allocates variables (on the stack). Generally, panic ensues when the system crashes unexpectedly due to variables changing on the stack (the stack overflows).
Played this game a lot on Symbian: when to use TBuf (a string with storage on the stack), and when to use HBufC (which allocates the string storage on the heap, like std::string, so you have to cope with Leave, and your function needs a means of failing).
At the time (maybe still, I'm not sure), Symbian threads had 4k of stack by default. To manipulate filenames, you need to count on using up to 512 bytes (256 characters).
As you can imagine, the received wisdom was "never put a filename on the stack". But actually, it turned out that you could get away with it a lot more often than you'd think. When we started running real programs (TM), such as games, we found that we needed way more than the default stack size anyway, and it wasn't due to filenames or other specific large objects, it was due to the complexity of the game code.
If using stack makes your code simpler, and as long as you're testing properly, and as long as you don't go completely overboard (don't have multiple levels of file-handling functions which all put a filename on the stack), then I'd say just try it. Especially if the function would need to be able to fail anyway, whether you're using stack or heap. If it goes wrong, you either double the stack size and be more careful in future, or you add another failure case to your function. Neither is the end of the world.
You usually can't really have large objects on the stack. They almost always use the heap internally so even if they are 'on the stack' their data members are not. Even an object with tons of data members will usually be under 64 bytes on the stack, the rest on the heap. The stack usually only becomes an issue these days when you have lots of threads and lots of recursion.
Only time really is when you are threading and have to define it yourself, when you are doing recursion or when for some reason you are allocating to the stack. Otherwise the compiler takes care of making sure you have enough stack space.
CreateThread by default only allocates 0x100000 bytes for the stack.
When the code you've written for a PC suddenly is supposed to run on a mobile phone
When the code you've ported to run on a mobile phone suddenly is supposed to run on a DSP
(And yes, these are real-life snafus.)
When deciding whether to allocate objects on the stack vs. the heap, there are also perf issues to be taken into consideration. Allocation of memory on the stack is very fast - it just involves moving the stack pointer, whereas dynamic allocation/deallocation using new/delete or malloc/free is fairly expensive, especially in multithreaded code that doesn't have a heap per thread. If you have a function that is being called in a tight loop, you might well err on the side of putting larger objects on the stack, keeping all of the multithreading caveats mentioned in other answers in mind, even if that means having to increase stack space, which most linkers will allow you to do.
In general, big allocations on the stack are bad for several reasons, not the least of which is that they can cause problems to remain well hidden for a long time.
The problem is that detecting stack overflow is not easy, and big allocations can subvert most of the commonly used methods.
If the processor has no memory management or memory protection unit, you have to be particularly careful. But even with some sort of MMU or MPU, the hardware can fail to detect a stack overflow. One common scheme, reserving a page below the stack to catch overflow, fails if the big stack object is bigger than a page. There just might be the stack of another thread sitting there and oops! you just created a very nasty, hard to find bug.
Unlimited recursion is usually easy to catch because the stack growth is usually small and will trigger the hardware protection.
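The guard-page failure mode can be illustrated with a toy address calculation. This is only a model under stated assumptions: the stack grows downward, a new frame occupies the addresses just below the stack pointer, and the function's first write goes to the lowest address of its frame. The names `GuardPage` and `first_write_hits_guard` are made up.

```cpp
#include <cstdint>

// A single protected page covering [low, low + size).
struct GuardPage {
    std::uint64_t low;
    std::uint64_t size;
};

// True if the first write of a new frame lands inside the guard page and
// would therefore trap. False means the write is either still inside the
// valid stack or has jumped clean over the guard page.
constexpr bool first_write_hits_guard(std::uint64_t sp, std::uint64_t frame_bytes,
                                      GuardPage g) {
    return sp - frame_bytes >= g.low && sp - frame_bytes < g.low + g.size;
}

// Guard page at [0x10000, 0x11000); only 0x100 bytes of stack remain.
// A 512-byte frame overflows into the guard page and is caught...
static_assert(first_write_hits_guard(0x11100, 512, GuardPage{0x10000, 4096}),
              "small overflow traps");
// ...but an 8 KB frame starts below the guard page and is not.
static_assert(!first_write_hits_guard(0x11100, 8192, GuardPage{0x10000, 4096}),
              "large frame skips the guard page");
```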
I don't. Worrying about these things while writing normal programs is either a case of premature pessimization or premature optimization. It's pretty hard to blow things up on a modern computer anyway.
I once wrote a CSV parser and, whilst playing around trying to get the best performance, I was allocating hundreds of thousands of 1K buffers on the stack. The performance was stellar but memory usage went up to about 1GB from the normal 30MB. This was because each cell in the CSV file had a fixed-size 1K buffer.
Like everyone is saying unless you are doing recursion you do not have to worry about it.
You worry about it when you write a callback that will be called from threads spawned by a runtime you don't control (for example, MS RPC runtime) with stack size at the discretion of that runtime. Somehow like this.
I have had problems running out of stack space when:
A function accidentally calls itself
A function uses recursion to a deep level
A function allocates a large object on the stack, and there is a heap.
A function uses complicated templates and the compiler crashes
Provided I:
Allocate large objects on the heap (e.g. using "auto_ptr<Foo> foo(new Foo)" instead of "Foo foo")
Use recursion judiciously.
I don't normally have any problems, so unfortunately don't know what good defaults should be.
You start to worry about stack size when:
when your program crashes - usually these bugs tend to be weird first time you see them :)
you are running an algorithm that uses recursion and has user input as one of its parameters (you don't know how much stack your algorithm could use)
you are running on embedded platforms (or platforms where each resource is important). Usually on these platforms stack is allocated before process is created - so a good estimation about stack requirements must be made
you are creating objects on the stack depending on some parameters modifiable by user input (see the sample below)
when the code executed in a thread/process/task is very big and there are a lot of function calls that go deep into the stack and generate a huge call-stack. This usually happens in big frameworks that combine a lot of triggers and event processing (a GUI framework; for example: receive_click -> find_clicked_window -> send_msg_to_window -> process_message -> process_click -> is_inside_region -> trigger_drawing -> write_to_file -> ... ). To put it short, you should worry about call-stack in case of complex code or unknown/binary 3rd party modules.
sample for modifiable input parameters:
void my_func(size_t input_param)
{
    // variable-length array: a compiler extension in C++, sized by user input
    char buffer[input_param];
    // or any other initialization of a big object on the stack
    // ...
}
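One way to defuse the sample above is to cap what goes on the stack and fall back to the heap for anything larger. This is a sketch, not a drop-in replacement: the threshold `kMaxStackBytes`, the function name `process`, and the `memset` standing in for real work are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kMaxStackBytes = 1024;  // hypothetical threshold

// Uses a fixed-size stack buffer for small requests and a heap buffer for
// large ones. Returns true if the stack buffer was used.
bool process(std::size_t input_param) {
    char small[kMaxStackBytes];
    std::vector<char> large;   // stays empty unless the request is big
    char* buffer = small;
    bool on_stack = true;

    if (input_param > kMaxStackBytes) {
        large.resize(input_param);  // may throw std::bad_alloc, which is recoverable
        buffer = large.data();
        on_stack = false;
    }

    assert(buffer != nullptr);
    std::memset(buffer, 0, input_param);  // stand-in for the real work
    return on_stack;
}
```

The stack cost of `process` is now fixed at roughly `kMaxStackBytes` regardless of user input, and an oversized request fails with an exception instead of a crash.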
A piece of advice:
you should mark the stack with some magic numbers (in case you allocate it) and check if those magic numbers will be modified (in that case the stack will not be enough for the task/thread/process and should probably be increased)
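The magic-number technique can be sketched on a simulated stack region. On a real system the region would be the memory you hand to the thread or task as its stack; here it is just a `std::vector`, and the names `kPaint`, `paint`, and `high_water_mark` are made up for illustration. The stack is assumed to grow downward from the end of the region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kPaint = 0xA5;  // hypothetical magic value

// Fill the whole region with the magic value before the task starts.
void paint(std::vector<std::uint8_t>& region) {
    for (auto& byte : region) byte = kPaint;
}

// Count how many bytes at the low end still hold the magic value; whatever
// lies above them has been touched at some point. Returns the deepest
// observed stack usage in bytes.
std::size_t high_water_mark(const std::vector<std::uint8_t>& region) {
    std::size_t untouched = 0;
    while (untouched < region.size() && region[untouched] == kPaint) ++untouched;
    assert(untouched <= region.size());
    return region.size() - untouched;
}
```

If the high-water mark ever equals the region size, the magic numbers at the far end were overwritten and the stack was too small. A task that happens to write the magic value itself can make the result read slightly low, so this is a diagnostic rather than a guarantee.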