How can I determine appropriate stack and heap sizes for ARM Cortex, using C++

The cortex M3 processor startup file allows you to specify the amount of RAM dedicated to the stack and the heap. For a c++ code base, is there a general rule of thumb or perhaps some more explicit way to determine the values for the stack and heap sizes? For example, would you count the number and size of unique objects, or maybe use the compiled code size?

The cortex M3 processor startup file
allows you to specify the amount of
RAM dedicated to the stack and the
heap.
That is not a feature of the Cortex-M3, but rather the start-up code provided by your development toolchain. It is the way the Keil ARM-MDK default start-up files for M3 work. It is slightly unusual; more commonly you would specify a stack size, and any remaining memory after stack and static memory allocation by the linker becomes the heap; this is arguably better since you do not end up with a pool of unusable memory. You could modify that and use an alternative scheme, but you'd need to know what you are doing.
If you are using Keil ARM-MDK, the linker options --info=stack and --callgraph add information to the map file that aids stack requirement analysis. These and other techniques are described here.
If you are using an RTOS or multi-tasking kernel, each task will have its own stack. The OS may provide stack analysis tools; Keil's RTX kernel viewer shows current stack usage but not peak stack usage, so it is mostly useless, and it only works correctly for tasks with default stack lengths.
If you have to implement stack checking tools yourself, the normal method is to fill the stack with a known value and then, starting from the far end of the stack (the lowest address on a descending stack such as the Cortex-M3's), inspect each value until you find the first one that is not the fill value. This gives the likely high tide mark of the stack. You can implement code to do this, or you can manually fill the memory from the debugger and then monitor stack usage in a debugger memory window.
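A minimal sketch of that fill-and-scan approach, assuming a descending stack whose region is visible as an array of 32-bit words. The names paint_stack, stack_high_water_bytes and kStackFill are made up for illustration, and on a real target you would only paint the part of the stack below the current stack pointer:

```cpp
#include <cstddef>
#include <cstdint>

// Fill pattern; an unlikely value is easier to tell apart from real data.
constexpr std::uint32_t kStackFill = 0xDEADBEEFu;

// Paint a stack region (lowest address first) with the fill pattern.
void paint_stack(std::uint32_t* lowest, std::size_t words) {
    for (std::size_t i = 0; i < words; ++i) {
        lowest[i] = kStackFill;
    }
}

// Scan upward from the lowest address; the first word that no longer holds
// the pattern marks the deepest point the (descending) stack has reached.
std::size_t stack_high_water_bytes(const std::uint32_t* lowest, std::size_t words) {
    std::size_t untouched = 0;
    while (untouched < words && lowest[untouched] == kStackFill) {
        ++untouched;
    }
    return (words - untouched) * sizeof(std::uint32_t);
}
```

The result is only a likely figure: a frame that reserves space without writing to it, or data that happens to equal the pattern, will make the scan under-report.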
Heap requirement will depend on the run-time behaviour of your code; you'll have to analyse that yourself. However, in ARM/Keil Realview, the MemManage Exception handler will be called when C++'s new throws an exception; I am not sure if malloc() does that or simply returns NULL. You can place a breakpoint in the exception handler or modify the handler to emit an error message to detect heap exhaustion during testing. There is also a __heapstats() function that can be used to output heap information. It has a somewhat cumbersome interface; I wrapped it thus:
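#include <cstdio>
#include <stdlib.h>   // __heapstats() (ARM C library extension)
// Print heap statistics to stdout.
void heapinfo()
{
    // __heapstats() expects an fprintf-like callback taking a void* stream.
    typedef int (*__heapprt)(void *, char const *, ...);
    __heapstats( (__heapprt)std::fprintf, stdout ) ;
}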
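If you would rather not depend on the toolchain's exception handler, standard C++ offers another hook for spotting heap exhaustion during testing: std::set_new_handler. This is a different technique from the Keil-specific one described above. A rough sketch, assuming exceptions are enabled; the counter and function names are hypothetical:

```cpp
#include <cstdio>
#include <new>

// Counts how often operator new ran out of memory (hypothetical diagnostic).
static int g_heap_exhausted_count = 0;

// Called by operator new when an allocation cannot be satisfied.
void on_heap_exhausted() {
    ++g_heap_exhausted_count;
    std::fputs("heap exhausted\n", stderr);
    // A new-handler must free memory, throw, or not return;
    // throwing keeps the usual std::bad_alloc behaviour.
    throw std::bad_alloc();
}

// Install the hook once, early in start-up.
void install_heap_exhaustion_hook() {
    std::set_new_handler(on_heap_exhausted);
}
```

A breakpoint on on_heap_exhausted() then catches failures of new; it does nothing for direct malloc() calls.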

The compiled code size will not help as the code does not run in the stack nor the heap. Cortex-M3 devices are typically implemented on microcontrollers with built in Flash and a relatively small amount of RAM. In this configuration, the code will typically run from Flash.
The heap is used for dynamic memory allocation. Counting the number of unique objects will give you a rough estimate but you also have to account for any other elements that use dynamic memory allocation (using the new keyword in C++). Generally, dynamic memory allocation is avoided in embedded systems for the precise reason that heap size is hard to manage.
The stack will be used for variable passing, local variables, and context saving during exception handling routines. It is generally hard to get a good idea of stack usage unless your code allocates a large block of local memory or large objects. One technique that may help is to allocate all of the available RAM you have for the stack. Fill the stack with a known pattern (0x00 or 0xff are not the best choices since these values occur frequently), run the system for a while, then examine the stack to see how much was used. Admittedly, this is not a very precise or scientific approach, but it is still helpful in many cases.

The latest version of the IAR Compiler has a feature that will determine what stack size you need, based on a static analysis of your code (assuming you don't have any recursion).
The general approach, if you don't have an exact number, is to make it as big as you can, and then when you start running out of memory, start trimming the stack down until your program crashes due to a stack overflow. I wish that was a joke, but that is the way it is usually done.

Reducing until it crashes is a quick ad-hoc way. You can also fill the stack with a known value, say, 0xCCCC, and then monitor maximum stack usage by scanning for the 0xCCCC.
It's imperfect, but much better than looking for a crash.
The rationale being, reducing stack size does not guarantee that stack overflow will munch something "visible".

Related

Why Stack and Heap Size are not defined in User Manual of microcontroller?

I am quite new to embedded programming, so maybe this is quite an easy question for you.
I have seen different linker script/linker configuration files from different SDKs (e.g. IAR EWARM, Tasking etc.) in which the sizes of the stack and heap are defined.
The size/range of RAM and flash for every microcontroller is also defined in the linker file. These are usually taken from the memory map in the user manual (the address ranges are provided in the user manual).
My question is: how are the sizes of the stack and heap calculated?
Can I select any value for the stack/heap size, or are there any criteria for that?
These are not defined in the microcontroller user manual because they are not hardware defined constraints. Rather they are application defined. It is a software dependent partitioning of memory, not hardware dependent.
Local, non-static variables, function arguments and call return addresses are generally stored on the stack; so the required stack size depends on the call depth and the number and size of local-variables and parameters for each function in a call-tree. The stack usage is dynamic, but there will be some worst-case path where the combination of variables and call-depth causes a peak usage.
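To make the "worst-case path" idea concrete, here is a small sketch of the calculation a static analysis tool performs: each function's own frame size plus the deepest of its callees. The frame sizes and the CallGraph type are invented for illustration, and the sketch assumes there is no recursion and ignores interrupts:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct FunctionInfo {
    std::size_t frame_bytes;            // stack used by the function itself
    std::vector<std::string> callees;   // functions it may call
};

using CallGraph = std::map<std::string, FunctionInfo>;

// Worst-case stack depth starting at `name`: own frame plus the deepest callee.
// Assumes the graph has no recursion (no cycles).
std::size_t worst_case_stack(const CallGraph& graph, const std::string& name) {
    const auto it = graph.find(name);
    if (it == graph.end()) {
        return 0;  // unknown function: no data, counted as zero
    }
    std::size_t deepest_callee = 0;
    for (const std::string& callee : it->second.callees) {
        deepest_callee = std::max(deepest_callee, worst_case_stack(graph, callee));
    }
    return it->second.frame_bytes + deepest_callee;
}
```

For example, with main (32 bytes) calling a (64) and b (16), and a calling c (128), the worst-case path is main, a, c at 224 bytes.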
On top of that, on many architectures you also have to account for interrupt handler stack usage, which is generally less deterministic, but still has a "worst-case" of interrupt nesting and call depth. For this reason ISRs should generally be short, deterministic and use few variables.
Further, if you have a multi-threaded environment such as an RTOS scheduler, each thread will have a separate stack. Typically these thread stacks are statically allocated arrays or dynamically (heap) allocated rather than defined by the linker script. The linker script normally defines only the system stack for the main() thread and interrupt/exception handlers.
Estimating the required stack usage is not always easy, but methods for doing so exist, using either static or dynamic analysis. Some examples (partly toolchain specific) at:
https://www.keil.com/support/man/docs/armclang_intro/armclang_intro_hla1474359990839.htm
https://www.keil.com/appnotes/docs/apnt_316.asp
Many default linker scripts automatically expand the heap to fill all remaining space available after static data and stack allocation. One notable exception is the Keil ARM-MDK toolchain, which requires you to explicitly set a heap size.
A linker script may reserve memory regions for other purposes, especially if the memory is not homogeneous. For example, on-chip MCU memory will typically be faster to access than external RAM, and may itself be subdivided across different buses; there might be a small segment useful for DMA on a separate bus, which avoids bus contention and yields more deterministic execution.
The use of dynamic memory (heap) allocation in embedded systems needs to be carefully considered (or even banned as @Lundin would suggest, but not all embedded systems are subject to the same constraints). There are a number of issues to consider, including:
Memory constraints - many embedded systems have very small memories, you have to consider the response, safety and functionality of the system in the event an allocation request cannot be satisfied.
Memory leaks - your own, your colleagues on a team and third party code may not be as high a quality as you would hope; you need to be certain that the entire code base is free of memory leaks (failing to deallocate/free memory appropriately).
Determinism - most heap allocators take a variable and non-deterministic length of time to allocate memory, and even freeing can be non-deterministic if it involves block consolidation.
Heap corruption - an owner of an allocated block can easily under/overrun an allocation and corrupt adjacent memory. Typically such memory contains the heap-management meta-data for the block or other blocks, and the actual data for other allocations. Corrupting this data has non-deterministic effects on other code, most often unrelated to the code that caused the error, such that it is common for failure to occur some time after, and in code unrelated to, the event that caused the error. Such bugs are hard to spot and resolve. If the heap meta-data is corrupted, often the error is detected when further heap operations (alloc/free) fail.
Efficiency - Heap allocations made by malloc() et al. are normally 8 byte aligned and have a block of pre-pended meta-data. Some implementations may add some "buffer" region to help detect overruns (especially in debug builds). As such, making numerous allocations of very small blocks can be a remarkably inefficient use of a scarce resource.
Common strategies in embedded systems to deal with these issues include:
Disallowing any dynamic memory allocations. This is common in safety critical and MISRA compliant applications for example.
Allowing dynamic memory allocation only during initialisation, and disallowing free(). This may seem counterintuitive, but can be useful where an application itself is "dynamic" and perhaps in some configurations not all tasks or device drivers etc. are started, where static allocation might lead to a great deal of unused/unusable memory.
Replacing the default heap with a deterministic memory allocation means such as a fixed-block allocator. Often these have a separate API rather than overriding malloc/free, so they are not strictly a replacement; just a different solution.
Disallowing dynamic memory allocation in hard-real-time critical code. This addresses only the determinism issue, but in systems with large memories, carefully designed code, and perhaps MMU protection of allocations, there may be mitigations for the other issues.
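As an illustration of the fixed-block allocator strategy mentioned in the list above, here is a minimal sketch. The class name and interface are hypothetical, it is not thread-safe, and it does no checking of pointers passed to release(); both operations take constant time:

```cpp
#include <cstddef>

// Fixed-block pool: BlockCount blocks of BlockSize bytes each, held inside the object.
template <std::size_t BlockSize, std::size_t BlockCount>
class FixedBlockPool {
public:
    FixedBlockPool() {
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < BlockCount; ++i) {
            blocks_[i].next = free_list_;
            free_list_ = &blocks_[i];
        }
    }

    // O(1): pop a block from the free list, or nullptr when exhausted.
    void* allocate() {
        if (free_list_ == nullptr) return nullptr;
        Block* block = free_list_;
        free_list_ = block->next;
        return block;
    }

    // O(1): push the block back. `p` must have come from allocate().
    void release(void* p) {
        Block* block = static_cast<Block*>(p);
        block->next = free_list_;
        free_list_ = block;
    }

private:
    union Block {
        Block* next;                                              // valid while free
        alignas(std::max_align_t) unsigned char data[BlockSize];  // valid while in use
    };
    Block blocks_[BlockCount];
    Block* free_list_ = nullptr;
};
```

Declaring the pool as a static object keeps its memory out of both the stack and the default heap, and its worst-case footprint is known at link time.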
Basically the stack size is picked depending on expected program size. For larger and more complex programs, you will want more stack size. It also depends on architecture, 32 bitters will generally consume slightly more memory than 8 and 16 bitters. The exact value is picked based on experience, though once you know exactly how much RAM your program actually uses, you can increase the stack size to use most of the unused memory.
It's also customary to map the stack so that it grows into a harmless area upon overflow, such as non-mapped memory or flash, ideally so that you get a hardware exception, "software interrupt" or similar when stack overflow happens. You should never map it so that it grows into .data/.bss and overwrites other variables there.
As for the heap, the size is almost always picked to 0 and the segment is removed completely from the linker script. Heap allocation is banned in almost every microcontroller application.
Stack and heap are part of your program itself. Their sizes depend on how your program is structured and written and how much memory it takes up. The remaining free memory will work as stack or heap depending on how you set it up.
In the linker script you can define these values.

Using whole stack memory

Hello I heard that in c++ stack memory is being used for "normal" variables. How do I make stack full? I tried to use ton of arrays but it didnt help. How big is stack and where is it located?
The C++ language doesn't specify such thing as "stack". It is an implementation detail, and as such it doesn't make sense deliberating about unless we are discussing a particular implementation of C++.
But yes, in a typical C++ implementation, automatic variables are stored on the execution stack.
How do I make stack full?
Step 1: Use a language implementation that has limited stack size. This is quite common.
Step 2: Create an automatic variable that exceeds the limit. Or nest too many non-tail-recursive function calls. If you're lucky, the program may crash.
You wouldn't want stack to be exhausted in production use.
How big is stack
Depends on language implementation. It may even be configurable. The default is one to a few megabytes on common desktop/server systems. Less on embedded systems.
and where is it located?
Somewhere in memory where the language implementation has chosen.
The most important thing to take out of this is that the memory available for automatic variables is typically limited. As such:
Don't use large automatic variables.
Don't use recursion when asymptotic growth of depth is linear or worse.
Don't let user input affect the amount or size of automatic variables or depth of recursion without constraint.
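One common way to follow the recursion advice is to replace the call stack with an explicit container. The sketch below counts the nodes of a binary tree iteratively; the Node type and the make_chain helper are made up for the example:

```cpp
#include <cstddef>
#include <vector>

struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

// Counts nodes without recursion: pending nodes live in a heap-backed vector,
// so a deep or degenerate tree cannot exhaust the call stack.
std::size_t count_nodes(const Node* root) {
    std::size_t count = 0;
    std::vector<const Node*> pending;
    if (root != nullptr) pending.push_back(root);
    while (!pending.empty()) {
        const Node* node = pending.back();
        pending.pop_back();
        ++count;
        if (node->left != nullptr) pending.push_back(node->left);
        if (node->right != nullptr) pending.push_back(node->right);
    }
    return count;
}

// Builds a worst-case "tree": a chain of n nodes linked through `left`.
std::vector<Node> make_chain(std::size_t n) {
    std::vector<Node> nodes(n);
    for (std::size_t i = 0; i + 1 < n; ++i) {
        nodes[i].left = &nodes[i + 1];
    }
    return nodes;
}
```

A recursive version would need one stack frame per node on such a chain; here the depth only affects how large the vector grows.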
Hello I heard that in c++ stack memory is being used for "normal" variables.
Local (automatic) variables declared in a function or main are allocated memory mostly on stack (or register) and are deallocated when the execution is done.
How do I make stack full? I tried to use ton of arrays but it didnt help.
Using a ton of arrays, making many recursive calls, and passing large structs that contain arrays as parameters are all ways to do it. Another way might be to reduce the stack size: -Wl,--stack,number (for gcc targeting Windows, e.g. MinGW)
How big is stack and where is it located?
It depends on the platform, operating system and so on. The standard does not specify any stack size. Its location is determined by the OS before the program starts; the OS allocates memory for the stack from virtual memory.

How to calculate remaining size of the stack? [duplicate]

I'm architecting a small software engine and I'd like to make expensive use of the stack for rapid iterations of large number sets. But then it occurred to me that this might be a bad idea since the stack isn't as large a memory store as the heap. But I am attracted to the stack's speed and lack of dynamic allocation coding practices.
Is there a way to find out how far I can push the stack on a given platform? I am looking mainly at mobile devices but the issue could come up on any platform.
On *nix, use getrlimit:
RLIMIT_STACK
The maximum size of the process stack, in bytes. Upon
reaching this limit, a SIGSEGV signal is generated. To handle
this signal, a process must employ an alternate signal stack
(sigaltstack(2)).
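A short sketch of the getrlimit call on a POSIX system. The wrapper name stack_soft_limit is made up, and the figure it reports is the limit for the process's main stack, not how much of it is still free:

```cpp
#include <sys/resource.h>

// Writes the soft stack limit (bytes) to `out`; returns false if the query fails.
// The value may be RLIM_INFINITY when the stack is unlimited.
bool stack_soft_limit(rlim_t& out) {
    struct rlimit limit;
    if (getrlimit(RLIMIT_STACK, &limit) != 0) {
        return false;
    }
    out = limit.rlim_cur;
    return true;
}
```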
On Windows, use VirtualQuery:
For the first call, pass it the address of any value on the stack to
get the base address and size, in bytes, of the committed stack space.
On an x86 machine where the stack grows downwards, subtract the size
from the base address and VirtualQuery again: this will give you the
size of the space reserved for the stack (assuming you're not
precisely on the limit of stack size at the time). Summing the two
naturally gives you the total stack size.
There is no platform-independent method since stack size is left to the implementation and host system; on an embedded mini-SOC there are fewer resources to distribute than on a 128GB RAM server. You can however influence the stack size of a specific thread on all OSes with API-specific calls.
A possible portable solution is to write an allocator yourself.
You do not have to make use of the process stack, just simulate it in the heap.
Allocate a large amount of memory in the beginning, and write a stack allocator on top of it to use it while allocating.
Google 'Allocator Requirements' for information on how to achieve it in C++.
I'm not sure if the term 'Stack Allocator' is canonical, but I mean that you have to put stack like restrictions on where the allocation or deallocation has to happen.
Since you said that your algorithm is suited to this pattern, I think it'd be easy.
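Something along these lines is what I have in mind: a rough sketch of a LIFO arena that grabs one block from the heap up front and hands it out in stack order. The class and method names are hypothetical, and it is not a conforming standard Allocator, just the underlying idea:

```cpp
#include <cstddef>
#include <vector>

// LIFO arena: one up-front heap allocation, handed out in stack order.
class StackArena {
public:
    explicit StackArena(std::size_t bytes) : storage_(bytes), top_(0) {}

    // Bump-allocate `bytes`, rounded up to max_align_t; nullptr when full.
    // Assumes the vector's buffer itself is suitably aligned (typical for operator new).
    void* push(std::size_t bytes) {
        if (bytes > storage_.size() - top_) return nullptr;
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + align - 1) / align * align;
        if (rounded > storage_.size() - top_) return nullptr;
        void* p = storage_.data() + top_;
        top_ += rounded;
        return p;
    }

    // Remember the current top so that later allocations can be dropped together.
    std::size_t mark() const { return top_; }

    // Release everything allocated since `marker` (stack discipline).
    void pop_to(std::size_t marker) { top_ = marker; }

private:
    std::vector<unsigned char> storage_;
    std::size_t top_;
};
```

Unlike the process stack, running out here is an ordinary nullptr that the caller can check for, and the capacity is whatever you choose at construction.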
In standard C++, definitely not. In a portable way, probably not. In a particular OS, sometimes. If nothing else, you could open your own executable file and inspect its headers to see its stack size. [The next problem is of course "how much of the stack was used before this bit of code" - which can be difficult to determine].
If you run the code in a separate thread, many of the (low level) thread interfaces allow you to specify a stack (or stack size), e.g. POSIX threads pthread_attr_setstacksize or MS _beginthread. Again, you don't know EXACTLY how much space has been used up before it gets to the actual thread code - but it's probably not a huge amount.
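For the POSIX case, a sketch of running a function on a thread with an explicitly chosen stack size might look like this. The helper name run_with_stack_size is made up, and on some systems you need to build with -pthread:

```cpp
#include <pthread.h>
#include <cstddef>

// Runs `entry(arg)` on a new thread with the requested stack size and waits for it.
// Returns true if the thread was created and joined successfully.
bool run_with_stack_size(std::size_t stack_bytes, void* (*entry)(void*), void* arg) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return false;

    bool ok = false;
    // Fails (EINVAL) if stack_bytes is below PTHREAD_STACK_MIN.
    if (pthread_attr_setstacksize(&attr, stack_bytes) == 0) {
        pthread_t thread;
        if (pthread_create(&thread, &attr, entry, arg) == 0) {
            ok = (pthread_join(thread, nullptr) == 0);
        }
    }
    pthread_attr_destroy(&attr);
    return ok;
}
```

Because you pick the size yourself, you at least know the total the thread has to work with, even if not exactly how much is consumed before your own code starts.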
Of course, in an embedded system (e.g. mobile phone), the stacksize is typically quite small, 4K, 12K or 64KB is very much normal - sometimes even a lot smaller than that in some systems.
Another potential problem is that you can't really know how much space is ACTUALLY used on the stack - you can measure after the fact in a compiled system, and of course, if you have a stack local array of int array[25];, we can know it takes up at least 25 * sizeof(int) - but there may be padding, the compiler saves registers on the stack, etc, etc.
Edit, as an afterthought:
I also don't really see much benefit in having two code-paths:
if (enough_stack_space_for_something)
use_stack_based_algorithm();
else
use_heap_based_algorithm();
This would add a fair amount of extra overhead, and more code is generally not a good plan in an embedded/mobile system.
Edit2: Also, if allocating memory is a major part of the runtime, perhaps looking at why that is, for example block-creation of objects would help?
To expand on the answers already given about why there is no portable way to do this, the entire concept of an actual stack is not part of the standard. You could write a C or C++ runtime that doesn't use a stack at all other than the function call records (which might internally be a linked list or something else).
The stack is an implementation detail of a particular machine/OS/compiler. Hence any technique to access stack metrics will be specific to machine/OS/compiler.
While not an actual answer to your specific question (Niels covered that quite well) but as advice to your problem domain: just allocate a large chunk of memory in the heap. There's no reason aside from convenience that the "real" stack is any different. Highly recursive (non-tail-recursive) algorithms often need to do this to ensure that they have a virtually unbounded "stack." Scripting languages that want to ensure they give a runtime error/exception rather than crashing the host application also often do this. To be efficient about things, you can either implement a "split stack" (like a std::deque would give you) or you can just be sure to preallocate a stack big enough for your needs.
There's no standard way to do it from within the language. I'm not even aware of a documented extension that is able to query it.
However, some compilers have options to set the stack size. The platform may also specify what it does when launching a process, and/or provide ways to set the stack size of a new thread, or maybe even manipulate an existing one.
For small platforms it's usual to know the whole memory size, have all the data segments on one end, a set size arena for the heap (may be 0), and the rest is stack, approaching from the other side.

When do you worry about stack size?

When you are programming in a language that allows you to use automatic allocation for very large objects, when and how do you worry about stack size? Are there any rules of thumb for reasoning about stack size?
When you are programming in a language that allows you to use automatic allocation for very large objects ...
If I want to allocate a very large object, then instead of on the stack I might allocate it on the heap but wrapped in an auto_ptr (std::unique_ptr in C++11 and later; auto_ptr was removed in C++17), in which case it will be deallocated when it goes out of scope, just like a stack-resident object, but without worrying about stack size.
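A minimal sketch of that pattern using std::unique_ptr, the modern counterpart of auto_ptr. BigBuffer and make_big_buffer are hypothetical names for the example:

```cpp
#include <cstddef>
#include <memory>

// Hypothetical large object: too big to place on a small stack comfortably.
struct BigBuffer {
    unsigned char bytes[1024 * 1024];
};

// The object lives on the heap; only the pointer occupies stack space.
// It is freed automatically when the returned unique_ptr goes out of scope.
std::unique_ptr<BigBuffer> make_big_buffer() {
    return std::unique_ptr<BigBuffer>(new BigBuffer());
}
```

The caller writes something like auto buf = make_big_buffer(); and uses buf->bytes as it would a local object, with no explicit delete.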
... when and how do you worry about stack size?
I use the stack conservatively out of habit (e.g. any object bigger than about 512 bytes is allocated on the heap instead), and I know how big the stack is (e.g. about a megabyte by default), and therefore know that I don't need to worry about it.
Are there any rules of thumb for reasoning about stack size?
Very big objects can blow the stack
Very deep recursion can blow the stack
The default stack size might be too big (take too much total memory) if there are many threads and if you're running on a limited-memory embedded device, in which case you might want to use an O/S API or linker option to reduce the size of the stack per thread.
You care about it on a microcontroller, where you often have to specify stack space explicitly (or you get whatever's left over after RAM gets used for static allocation + any RAM program space).
You start to worry about stack size when
someone on your team cunningly invents a recursive function that goes on and on and on...
you create a thread factory and suddenly need ten times the stack that you used to need (each thread needs a stack => the more threads you have, the less free space remains for a given stack size)
If you're writing for a tiny little embedded platform, you worry about it all the time, but you also know exactly how big it is, and probably have some useful tools available to find the high-water mark of the stack.
If you aren't, then don't worry until your program crashes :)
Unless you are allocating seriously huge objects (many tens of KB), then it is never going to be a problem.
Note, however, that objects on the stack are, by definition, temporary. Constructing (and possibly destructing) large objects frequently may cause you a performance problem - so if you have a large object it probably should be persistent and heap-based for reasons other than stack size.
I never worry about it. If there is a stack overflow, I will soon know about it. Also, in C++ it is actually very hard to create very large objects on the stack. About the only way of doing it is:
struct S {
char big[1000000];
};
but use of std::string or std::vector makes that problem go away.
Shouldn't you be avoiding using the stack for allocating large objects in the first place? Use the heap, no?
my experience:
when you use recursive functions, take care of the stack size!!
When do you worry about stack size?
Never.
If you have stack size problems it means you're doing something else wrong and should fix that instead of worrying about stack size.
For instance:
Allocating unreasonably large structures on the stack - don't do it. allocate on the heap.
Having a ridiculously long recursion. I mean in the order of painting an image and iterating over the pixels using recursion. - find a better way to do it.
I worry about stack size on embedded systems when the call stack goes very deep and each function allocates variables (on the stack). Generally, panic ensues when the system crashes unexpectedly due to variables changing on the stack (the stack overflows).
Played this game a lot on Symbian: when to use TBuf (a string with storage on the stack), and when to use HBufC (which allocate the string storage on the heap, like std::string, so you have to cope with Leave, and your function needs a means of failing).
At the time (maybe still, I'm not sure), Symbian threads had 4k of stack by default. To manipulate filenames, you need to count on using up to 512 bytes (256 characters).
As you can imagine, the received wisdom was "never put a filename on the stack". But actually, it turned out that you could get away with it a lot more often than you'd think. When we started running real programs (TM), such as games, we found that we needed way more than the default stack size anyway, and it wasn't due to filenames or other specific large objects, it was due to the complexity of the game code.
If using stack makes your code simpler, and as long as you're testing properly, and as long as you don't go completely overboard (don't have multiple levels of file-handling functions which all put a filename on the stack), then I'd say just try it. Especially if the function would need to be able to fail anyway, whether you're using stack or heap. If it goes wrong, you either double the stack size and be more careful in future, or you add another failure case to your function. Neither is the end of the world.
You usually can't really have large objects on the stack. They almost always use the heap internally so even if they are 'on the stack' their data members are not. Even an object with tons of data members will usually be under 64 bytes on the stack, the rest on the heap. The stack usually only becomes an issue these days when you have lots of threads and lots of recursion.
Only time really is when you are threading and have to define it yourself, when you are doing recursion or when for some reason you are allocating to the stack. Otherwise the compiler takes care of making sure you have enough stack space.
CreateThread by default only allocates 0x100000 bytes for the stack.
When the code you've written for a PC suddenly is supposed to run on a mobile phone
When the code you've ported to run on a mobile phone suddenly is supposed to run on a DSP
(And yes, these are real-life snafus.)
When deciding whether to allocate objects on the stack vs. the heap, there are also perf issues to be taken into consideration. Allocation of memory on the stack is very fast - it just involves moving the stack pointer, whereas dynamic allocation/deallocation using new/delete or malloc/free is fairly expensive, especially in multithreaded code that doesn't have a heap per thread. If you have a function that is being called in a tight loop, you might well err on the side of putting larger objects on the stack, keeping all of the multithreading caveats mentioned in other answers in mind, even if that means having to increase stack space, which most linkers will allow you to do.
In general, big allocations on the stack are bad for several reasons, not the least of which is that they can cause problems to remain well hidden for a long time.
The problem is that detecting stack overflow is not easy, and big allocations can subvert most of the commonly used methods.
If the processor has no memory management or memory protection unit, you have to be particularly careful. But even with some sort of MMU or MPU, the hardware can fail to detect a stack overflow. One common scheme, reserving a page below the stack to catch overflow, fails if the big stack object is bigger than a page. There just might be the stack of another thread sitting there and oops! you just created a very nasty, hard to find bug.
Unlimited recursion is usually easy to catch because the stack growth is usually small and will trigger the hardware protection.
I don't. Worrying about these things whilst writing normal programs is either a case of premature pessimization or premature optimization. It's pretty hard to blow things up on a modern computer anyway.
I once wrote a CSV parser and whilst playing around with trying to get the best performance I was allocating hundreds of thousands of 1K buffers on the stack. The performance was stellar but the RAM usage went up to about 1GB, from memory, from the normal 30MB. This was because each cell in the CSV file had a fixed-size 1K buffer.
Like everyone is saying unless you are doing recursion you do not have to worry about it.
You worry about it when you write a callback that will be called from threads spawned by a runtime you don't control (for example, MS RPC runtime) with stack size at the discretion of that runtime. Somehow like this.
I have had problems running out of stack space when:
A function accidentally calls itself
A function uses recursion to a deep level
A function allocates a large object on the stack, and there is a heap.
A function uses complicated templates and the compiler crashes
Provided I:
Allocate large objects on the heap (e.g. using "auto_ptr<Foo> foo(new Foo)" instead of "Foo foo")
Use recursion judiciously.
I don't normally have any problems, so unfortunately don't know what good defaults should be.
You start to worry about stack size when:
when your program crashes - usually these bugs tend to be weird first time you see them :)
you are running an algorithm that uses recursion and has user input as one of its parameters (you don't know how much stack your algorithm could use)
you are running on embedded platforms (or platforms where each resource is important). Usually on these platforms stack is allocated before process is created - so a good estimation about stack requirements must be made
you are creating objects on the stack depending on some parameters modifiable by user input (see the sample below)
when the code executed in a thread/process/task is very big and there are a lot of function calls that go deep into the stack and generate a huge call-stack. This usually happens in big frameworks that combine a lot of triggers and event processing (a GUI framework; for example: receive_click -> find_clicked_window -> send_msg_to_window -> process_message -> process_click -> is_inside_region -> trigger_drawing -> write_to_file -> ... ). To put it short, you should worry about call-stack in case of complex code or unknown/binary 3rd party modules.
sample for modifiable input parameters:
int my_func(size_t input_param)
{
    // variable-length array: a compiler extension in C++, sized by user input
    char buffer[input_param];
    // or any other initialization of a big object on the stack
    ....
}
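One possible way to defuse this, sketched under the assumption that a fixed cap is acceptable: keep a bounded buffer on the stack and send anything larger to the heap. The cap kMaxStackBuffer and the function name are invented for the example, and the loop merely stands in for real work:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cap on how much this function may place on the stack.
constexpr std::size_t kMaxStackBuffer = 256;

// Small requests use a fixed-size stack buffer; larger ones go to the heap,
// so the stack frame size no longer depends on user input.
std::size_t my_func_bounded(std::size_t input_param) {
    char small[kMaxStackBuffer];
    std::vector<char> large;
    char* buffer = small;
    if (input_param > kMaxStackBuffer) {
        large.resize(input_param);
        buffer = large.data();
    }
    std::size_t sum = 0;
    for (std::size_t i = 0; i < input_param; ++i) {
        buffer[i] = 1;  // stand-in for real work on the buffer
        sum += static_cast<std::size_t>(buffer[i]);
    }
    return sum;
}
```

The frame is now always about kMaxStackBuffer bytes regardless of the input, which makes the function's stack cost something you can reason about.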
A piece of advice:
you should mark the stack with some magic numbers (in case you allocate it) and check whether those magic numbers have been modified (if they have, the stack is not big enough for the task/thread/process and should probably be increased)