Allocating on heap vs allocating on stack in recursive functions - c++

When I'm defining a recursive function, is it better/safer to allocate local variables on the heap, and then clean them up before the function returns, than to allocate them on the stack? The stack size on embedded systems is very limited, and there is a danger of stack overflow when the recursion runs too deep.

The answer depends on your application domain and the specifics of your platform. "Embedded system" is a fuzzy term. A big oscilloscope running MS Windows or Linux is on one end of the spectrum. Most of it can be programmed like a normal PC. They should not fail, but if they do, one simply reboots them.
On the other end of the spectrum are controllers in safety-critical fields. Consider these Siemens switches which must react under all circumstances within milliseconds. Failure is not an option. They probably do not run Windows. Available resources are limited. Programming rules and procedures are much different here.
Now let's examine the options you have, recursion (or not!) with dynamic or automatic memory allocation.
Performance-wise the stack is much faster, so it is preferred. Dynamic allocation also carries some memory overhead for bookkeeping, which is significant if the data units are small. And there can be issues with memory fragmentation, which cannot occur with automatic memory (although the scenarios which lead to fragmentation -- different lifetimes of objects -- are probably not solvable without dynamic allocation anyway).
But it is true that on some systems the stack size is (much) smaller than the heap size; you must read about the memory layout on your system. The oscilloscope will have lots of memory and big stacks; the power switch will not.
If you are concerned about running out of memory, I would follow Christian's advice to avoid recursion altogether and use a loop instead. Iteration may keep the memory usage flat. Besides, recursion always uses stack space, e.g. for the return address and return value. The idea of allocating "local" variables dynamically would only make sense for larger data structures, because you would still have to hold pointers to the data in automatic variables, which use up stack space too (and would increase the overall memory footprint).
In general, on systems with limited resources it is important to bound the maximum resource use of your program. Time is also a resource; dynamic allocation makes real-time guarantees nearly impossible.
The application domain dictates the safety requirements. In safety-critical fields (pacemakers!) the program must not fail at all. That ideal is impossible to achieve unless the program is trivial, but great efforts are made to come close. In other fields a program may fail in a defined way under circumstances it cannot handle, but it must not fail silently or in undefined ways (for example, by overwriting data). For example, instead of allocating unknown amounts of data dynamically one would use a pre-defined array of fixed size and use its elements, with bounds checks.
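A minimal sketch of that last idea -- a fixed-capacity buffer with bounds checking; the container and its API are illustrative, not taken from the answer:

    #include <cstddef>

    // Fixed-capacity buffer: all storage is reserved up front, so the
    // program's worst-case memory use is known at compile time.
    template <typename T, std::size_t Capacity>
    class FixedBuffer {
    public:
        // Returns false instead of failing silently when the buffer is full.
        bool push(const T& value) {
            if (count_ == Capacity) return false;  // defined failure, no overflow
            data_[count_++] = value;
            return true;
        }
        std::size_t size() const { return count_; }
    private:
        T data_[Capacity];
        std::size_t count_ = 0;
    };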

When I'm defining a recursive function, is it better/safer to allocate local variables on heap, and then clean them up before the function returns, than to allocate them on stack.
You have both the C and C++ tags. It's a valid question for both, but I can only comment on C++.
In C++, it's better to use the heap even though it's slightly less efficient. That's because new can fail if you run out of heap memory, and in that case it throws an exception which you can handle. Running out of stack space, by contrast, does not cause an exception, and it's one of the reasons alloca is frowned upon in C++.
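To make the contrast concrete, a minimal sketch (the allocation size is illustrative): heap exhaustion surfaces as a catchable std::bad_alloc, while stack exhaustion cannot be caught at all.

    #include <cstdio>
    #include <memory>
    #include <new>

    void recurse(int depth) {
        try {
            // Heap failure is reportable: new throws std::bad_alloc.
            auto buffer = std::make_unique<char[]>(64 * 1024);
            if (depth > 0) recurse(depth - 1);
        } catch (const std::bad_alloc&) {
            std::puts("out of heap memory -- unwinding cleanly");
        }
        // A 64 KB local array here instead would consume stack on every
        // call, and exhausting the stack is not catchable at all.
    }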

I think generally speaking you should avoid recursion in embedded systems altogether, as every time the function is called the return address is pushed onto the stack. This could cause an unexpected overflow. Try switching to a loop.
Back to your question though: mallocing will be slower but safer. If there is no heap space left, malloc will return NULL and you can handle the failure and clean up safely. This comes at a large cost in speed, as malloc can be quite slow.
If you know how many iterations to expect ahead of time, you have the option of mallocing an array of the variables that you need. That way you will only malloc once, which saves time, and you won't risk unexpectedly filling up the heap or stack. Also you will only be left with one pointer to free.
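A hedged sketch of that approach; the element type and the work done per iteration are illustrative:

    #include <cstdio>
    #include <cstdlib>

    int process(std::size_t iterations) {
        // One allocation for all iterations instead of one local per call.
        double* work = static_cast<double*>(std::malloc(iterations * sizeof *work));
        if (work == nullptr) {
            std::puts("allocation failed -- bail out cleanly");
            return -1;
        }
        for (std::size_t i = 0; i < iterations; ++i) {
            work[i] = static_cast<double>(i) * 0.5;  // stand-in for real work
        }
        std::free(work);  // a single pointer to free
        return 0;
    }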

Related

C++ Should I allocate large structs on the heap or the stack?

At what struct size should I consider allocating on the heap / free store using new keyword (or any other method of dynamic allocation) instead of on the stack?
10 bytes? 20 bytes? 200 bytes? 2KB? 2MB? Never?
Even if I wanted to pass it around by pointer, I could still take a reference from the stack variable. I understand that a stack variable will disappear at the end of the scope and dynamically allocated variables will not. I can deal with that either way, but so far I've not found any guidance for when to allocate dynamically. Sure, avoid stack overflow by not putting too much on the stack... but how much is too much?
Any guidance would be appreciated.
To actually answer the question, you'll need to know:
How big the stack is. This is often configurable at compile time, but may be capped by the target platform.
What is already on the stack. This can be determined either statically, from a deterministic call graph, or actively at run time, based on the current value of the stack pointer.
Without all of the above, any passive decision is a gamble. Which also means that the default is a gamble -- indeed, in most cases we have to trust the compiler developers' idea of how much stack space a "typical" program needs, and hope that our programs are "typical" enough.
In the long term, just like with any optimization problem, put your bets on measuring overall performance and on testing the edge cases that may cause a stack overflow.
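A hedged sketch of the stack-pointer approach mentioned above; comparing addresses of locals from different frames is technically outside the standard, so treat it as a platform-dependent estimate (a downward-growing stack is assumed):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    static std::uintptr_t stack_base;  // recorded near the top of main's frame

    std::size_t approx_stack_used() {
        char marker;  // its address approximates the current stack pointer
        return stack_base - reinterpret_cast<std::uintptr_t>(&marker);
    }

    void deep(int n) {
        volatile char frame[1024];  // some local data, kept alive by volatile
        frame[0] = static_cast<char>(n);
        std::printf("depth %d: ~%zu bytes of stack used\n", n, approx_stack_used());
        if (n > 0) deep(n - 1);
    }

    int main() {
        char base;
        stack_base = reinterpret_cast<std::uintptr_t>(&base);
        deep(3);
    }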
(Note. I probably should have searched before answering, but this question is essentially a duplicate of How much stack usage is too much?, nevertheless, here is my opinion on it.)
If you intend to keep a large buffer around for an extended period of time, then you should allocate it on the heap.
If you are in a recursive function, then allocating large buffers on the stack can quickly lead to problems.
Personally, I would keep buffers below ~4KiB on the stack and allocate larger buffers on the heap, unless you have a good overview of your program, and more specifically, how and where your functions are called.
That being said, if you constantly create and destroy buffers, consider putting them on the stack.
(If you are working on an embedded system, then that's a very different story.)

Max amount one should allocate on the stack

I've been searching Stack Overflow for a guideline on the max amount of memory one should allocate on the stack.
I see best practices for stack vs. heap allocation but nothing has numbers on a guideline on how much should be allocated on the stack and how much should be allocated on the heap.
Any ideas/numbers I can use as a guideline? When should I allocate on the stack vs. the heap and how much is too much?
In a typical case, the stack is limited to around 1-4 megabytes. To leave space for the rest of the program, you typically want to limit a single stack frame to no more than a few tens of kilobytes if possible. When recursion is (or might get) involved, you typically want to limit it quite a bit more than that.
The answer here depends on the environment in which the code is running. On a small embedded system, the whole stack may be a few kilobytes. On a large system running on a desktop, the stack is typically in the megabytes.
For desktop/big embedded system, a few kilobytes is typically fine. For small embedded systems, that may not work well at all.
On the other hand, excessive use of the heap can lead to excessive overhead when new/delete are called frequently. So in a typical situation you shouldn't use heap allocation for very small objects -- unless other design criteria require it (e.g. you need a pointer you can store permanently somewhere, and a stack object won't work because you return from the current function before you are finished with the object).
Of course, it's the overall design that matters. If you have a very simple application, with a few functions, none of which are recursive, it could be fine to allocate a few hundred kilobytes in main or a level above. On the other hand, if you are making a library for generic use, using more than a few kilobytes will probably not make you popular with the developers using the library. And if the library is being developed to run on low memory systems (in a washing machine, old style mobile phone, etc) then using more than a couple of hundred bytes is probably a bad idea.
Keep stack allocations as small as possible. Use the heap for large datasets; a big stack allocation lives for the whole of its scope and can thrash the cache.

Is it a good idea to extend the size of the stack for large local data processing?

My environment is gcc, C++, Linux.
When my application does some data calculation, it may need a "large" amount of memory (maybe a few MB) to store data, calculation results and other things. The existing code uses new and delete to manage this. Since there is no ownership outside some function scope, I think all this memory could be allocated on the stack.
The problem is that the default stack size (8192 KB on my system) may not be enough, so I may need to change the stack size for these allocations. Moreover, if the calculation needs more data in the future, I may need to extend the stack size again.
So is extending the stack size an option? Since the stack size cannot be set for specific functions, how will it impact the whole app? Is it REALLY an improvement to allocate data on the stack instead of on the heap?
You bring up a controversial question that does not have a direct answer. There are pros and cons on each side. In particular:
Memory on the heap is easier to control: you can check the return value or catch exceptions. When the stack overflows, your thread is simply killed, with a good chance that the debugger will not show anything meaningful.
On the contrary, stack allocations happen automatically, and you do not need to do anything specific. I always favor this simplicity.
There is nothing fundamentally wrong with allocating large amounts of data on the stack. At the end of the day, memory is memory. This means that the total amount of required memory is what really matters; where this memory is allocated is less important. When there is enough memory for your application to work, it makes little difference where the memory is allocated. It could be static, for example.
Different systems have different rules of allocation. This means that final decision may depend on the actual system.
It's true that stack allocations are more efficient (the difference is more apparent in multi-threaded programs), but if the usage pattern in your case is "allocate a big chunk of memory, process the data, deallocate it", then there won't be much of an improvement.
Instead, rewrite the code to use RAII, e.g. std::vector or std::unique_ptr, so there won't be explicit, bug-prone deletes.
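A minimal sketch of that suggestion; the buffer size and the processing are illustrative:

    #include <cstddef>
    #include <numeric>
    #include <vector>

    double calculate(std::size_t n) {
        std::vector<double> results(n);                  // heap storage, owned by the vector
        std::iota(results.begin(), results.end(), 0.0);  // stand-in for real work
        return std::accumulate(results.begin(), results.end(), 0.0);
    }  // memory is freed here automatically -- no delete, no leak on exceptions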
If you use Linux, you can change the stack size with the ulimit command (for example, ulimit -s 16384 sets a 16 MB limit). However, I think memory allocated from the heap would also serve you well here.

When do you worry about stack size?

When you are programming in a language that allows you to use automatic allocation for very large objects, when and how do you worry about stack size? Are there any rules of thumb for reasoning about stack size?
When you are programming in a language that allows you to use automatic allocation for very large objects ...
If I want to allocate a very large object, then instead of on the stack I might allocate it on the heap but wrapped in an auto_ptr (in which case it will be deallocated when it goes out of scope, just like a stack-resident object, but without worrying about stack size).
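A hedged sketch of that idiom in modern terms -- std::unique_ptr has since replaced the deprecated-and-removed std::auto_ptr, but the lifetime behaviour is the same; the type and size are illustrative:

    #include <memory>

    struct Big { char data[8 * 1024 * 1024]; };  // far too big for the stack

    void f() {
        auto big = std::make_unique<Big>();  // lives on the heap
        big->data[0] = 1;
        // ... use *big ...
    }  // deallocated here, just like a stack object would be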
... when and how do you worry about stack size?
I use the stack conservatively out of habit (e.g. any object bigger than about 512 bytes is allocated on the heap instead), and I know how big the stack is (e.g. about a megabyte by default), and therefore know that I don't need to worry about it.
Are there any rules of thumb for reasoning about stack size?
Very big objects can blow the stack
Very deep recursion can blow the stack
The default stack size might be too big (take too much total memory) if there are many threads and if you're running on a limited-memory embedded device, in which case you might want to use an O/S API or linker option to reduce the size of the stack per thread.
You care about it on a microcontroller, where you often have to specify the stack space explicitly (or you get whatever RAM is left over after static allocation and any program data in RAM).
You start to worry about stack size when
someone on your team cunningly invents a recursive function that goes on and on and on...
you create a thread factory and suddenly need ten times the stack space you used to (each thread needs its own stack => the more threads you have, the more of your total memory goes to stacks)
If you're writing for a tiny little embedded platform, you worry about it all the time, but you also know exactly how big it is, and probably have some useful tools available to find the high-water mark of the stack.
If you aren't, then don't worry until your program crashes :)
Unless you are allocating seriously huge objects (many tens of KB), then it is never going to be a problem.
Note, however, that objects on the stack are, by definition, temporary. Constructing (and possibly destructing) large objects frequently may cause you a performance problem - so if you have a large object it probably should be persistent and heap-based for reasons other than stack size.
I never worry about it. If there is a stack overflow, I will soon know about it. Also, in C++ it is actually very hard to create very large objects on the stack. About the only way of doing it is:
    struct S {
        char big[1000000];
    };
but use of std::string or std::vector makes that problem go away.
Shouldn't you be avoiding using the stack for allocating large objects in the first place? Use the heap, no?
My experience: when you use recursive functions, take care of the stack size!
When do you worry about stack size?
Never.
If you have stack size problems it means you're doing something else wrong and should fix that instead of worrying about stack size.
For instance:
Allocating unreasonably large structures on the stack - don't do it; allocate on the heap.
Having a ridiculously deep recursion - I mean on the order of painting an image by iterating over its pixels recursively - find a better way to do it.
I worry about stack size on embedded systems when the call stack goes very deep and each function allocates variables on the stack. Generally, panic ensues when the system crashes unexpectedly because variables on the stack were overwritten (the stack overflowed).
Played this game a lot on Symbian: when to use TBuf (a string with storage on the stack), and when to use HBufC (which allocates the string storage on the heap, like std::string, so you have to cope with Leave, and your function needs a means of failing).
At the time (maybe still, I'm not sure), Symbian threads had 4k of stack by default. To manipulate filenames, you need to count on using up to 512 bytes (256 characters).
As you can imagine, the received wisdom was "never put a filename on the stack". But actually, it turned out that you could get away with it a lot more often than you'd think. When we started running real programs (TM), such as games, we found that we needed way more than the default stack size anyway, and it wasn't due to filenames or other specific large objects, it was due to the complexity of the game code.
If using stack makes your code simpler, and as long as you're testing properly, and as long as you don't go completely overboard (don't have multiple levels of file-handling functions which all put a filename on the stack), then I'd say just try it. Especially if the function would need to be able to fail anyway, whether you're using stack or heap. If it goes wrong, you either double the stack size and be more careful in future, or you add another failure case to your function. Neither is the end of the world.
You usually can't really have large objects on the stack. They almost always use the heap internally so even if they are 'on the stack' their data members are not. Even an object with tons of data members will usually be under 64 bytes on the stack, the rest on the heap. The stack usually only becomes an issue these days when you have lots of threads and lots of recursion.
Really, the only times are when you are threading and have to define the stack size yourself, when you are using recursion, or when for some reason you are allocating large objects on the stack. Otherwise the defaults make sure you have enough stack space.
CreateThread by default only allocates 0x100000 bytes (1 MB) for the stack.
When the code you've written for a PC suddenly is supposed to run on a mobile phone
When the code you've ported to run on a mobile phone suddenly is supposed to run on a DSP
(And yes, these are real-life snafus.)
When deciding whether to allocate objects on the stack vs. the heap, there are also performance issues to take into consideration. Allocating memory on the stack is very fast -- it just involves moving the stack pointer -- whereas dynamic allocation/deallocation using new/delete or malloc/free is fairly expensive, especially in multithreaded code that doesn't have a heap per thread. If you have a function that is being called in a tight loop, you might well err on the side of putting larger objects on the stack, keeping all of the multithreading caveats mentioned in other answers in mind, even if that means having to increase the stack space, which most linkers will allow you to do.
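A hedged sketch of the trade-off (the buffer size is illustrative): called from a tight loop, f_stack costs almost nothing per call, while f_heap pays the allocator round-trip on every call unless the buffer is hoisted out of the loop.

    #include <cstring>
    #include <vector>

    void f_stack() {
        char buf[8192];               // allocation is just a stack-pointer bump
        std::memset(buf, 0, sizeof buf);
    }

    void f_heap() {
        std::vector<char> buf(8192);  // allocator call on every invocation
        std::memset(buf.data(), 0, buf.size());
    }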
In general, big allocations on the stack are bad for several reasons, not the least of which is that they can cause problems to remain well hidden for a long time.
The problem is that detecting stack overflow is not easy, and big allocations can subvert most of the commonly used methods.
If the processor has no memory management or memory protection unit, you have to be particularly careful. But even with some sort of MMU or MPU, the hardware can fail to detect a stack overflow. One common scheme -- reserving a guard page below the stack to catch overflow -- fails if the big stack object is bigger than a page. There might just be the stack of another thread sitting there, and oops! you have created a very nasty, hard-to-find bug.
Unlimited recursion is usually easy to catch because the stack growth is usually small and will trigger the hardware protection.
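A minimal sketch of the failure mode described above; the sizes are illustrative, and whether this is caught depends entirely on the platform's guard-page setup. (Some compilers can insert stack probes, e.g. GCC/Clang's -fstack-clash-protection, to defeat exactly this.)

    void f() {
        char big[2 * 1024 * 1024];  // frame may start well past a 4 KB guard page
        big[0] = 1;                 // first touch can land in foreign memory --
                                    // e.g. another thread's stack -- undetected
    }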
I don't. Worrying about these things whilst programming normal things is a case of either premature pessimization or premature optimization. It's pretty hard to blow things up on a modern computer anyway.
I once wrote a CSV parser, and whilst playing around trying to get the best performance I was allocating hundreds of thousands of 1K buffers on the stack. The performance was stellar, but, from memory, the RAM usage went up to about 1 GB from the normal 30 MB. This was because each cell in the CSV file had a fixed-size 1K buffer.
As everyone is saying, unless you are using recursion you do not have to worry about it.
You worry about it when you write a callback that will be called from threads spawned by a runtime you don't control (for example, the MS RPC runtime), with the stack size at the discretion of that runtime.
I have had problems running out of stack space when:
A function accidentally calls itself
A function uses recursion to a deep level
A function allocates a large object on the stack even though a heap is available.
A function uses complicated templates and the compiler crashes
Provided I:
Allocate large objects on the heap (e.g. using "auto_ptr<Foo> foo(new Foo)" instead of "Foo foo")
Use recursion judiciously.
I don't normally have any problems, so unfortunately don't know what good defaults should be.
You start to worry about stack size when:
when your program crashes - usually these bugs tend to be weird the first time you see them :)
you are running an algorithm that uses recursion and has user input as one of its parameters (you don't know how much stack your algorithm could use)
you are running on embedded platforms (or platforms where each resource is important). Usually on these platforms stack is allocated before process is created - so a good estimation about stack requirements must be made
you are creating objects on the stack depending on some parameters modifiable by user input (see the sample below)
when the code executed in a thread/process/task is very big and there are a lot of function calls that go deep into the stack and generate a huge call stack. This usually happens in big frameworks that combine a lot of triggers and event processing (a GUI framework, for example: receive_click -> find_clicked_window -> send_msg_to_window -> process_message -> process_click -> is_inside_region -> trigger_drawing -> write_to_file -> ...). In short, you should worry about the call stack in the case of complex code or unknown/binary 3rd-party modules.
Sample for modifiable input parameters:

    void my_func(size_t input_param)
    {
        char buffer[input_param];  // variable-length array (a C99/GNU extension):
                                   // the frame size now depends on user input
        // ... or any other initialization of a big object on the stack ...
    }
One piece of advice:
if you allocate the stack yourself, mark it with magic numbers and check whether those magic numbers get modified (in that case the stack is not big enough for the task/thread/process and should probably be increased).
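A hedged sketch of that technique, often called "stack painting"; the magic value and the sizes are illustrative:

    #include <cstddef>
    #include <cstdint>

    constexpr std::uint32_t kMagic = 0xDEADBEEF;

    // Fill the task's stack area with a known pattern before starting it.
    void paint_stack(std::uint32_t* stack, std::size_t words) {
        for (std::size_t i = 0; i < words; ++i)
            stack[i] = kMagic;
    }

    // Later, scan from the unused end (assuming a downward-growing stack):
    // the first overwritten word marks the high-water mark.
    std::size_t stack_bytes_used(const std::uint32_t* stack, std::size_t words) {
        std::size_t untouched = 0;
        while (untouched < words && stack[untouched] == kMagic)
            ++untouched;
        return (words - untouched) * sizeof(std::uint32_t);
    }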

Any reason to overload global new and delete?

Unless you're programming parts of an OS or an embedded system, are there any reasons to do so? I can imagine that for some particular classes that are created and destroyed frequently overloading memory management functions or introducing a pool of objects might lower the overhead, but doing these things globally?
Addition
I've just found a bug in an overloaded delete function - memory wasn't always freed. And that was in a not-so-memory-critical application. Also, disabling these overloads decreases performance by only ~0.5%.
We overload the global new and delete operators where I work for many reasons:
pooling all small allocations -- decreases overhead, decreases fragmentation, can increase performance for small-alloc-heavy apps
framing allocations with a known lifetime -- ignore all the frees until the very end of this period, then free all of them together (admittedly we do this more with local operator overloads than global)
alignment adjustment -- to cacheline boundaries, etc
alloc fill -- helping to expose usage of uninitialized variables
free fill -- helping to expose usage of previously deleted memory
delayed free -- increasing the effectiveness of free fill, occasionally increasing performance
sentinels or fenceposts -- helping to expose buffer overruns, underruns, and the occasional wild pointer
redirecting allocations -- to account for NUMA, special memory areas, or even to keep separate systems separate in memory (for e.g. embedded scripting languages or DSLs)
garbage collection or cleanup -- again useful for those embedded scripting languages
heap verification -- you can walk through the heap data structure every N allocs/frees to make sure everything looks ok
accounting, including leak tracking and usage snapshots/statistics (stacks, allocation ages, etc)
The idea of new/delete accounting is really flexible and powerful: you can, for example, record the entire callstack for the active thread whenever an alloc occurs, and aggregate statistics about that. You could ship the stack info over the network if you don't have space to keep it locally for whatever reason. The types of info you can gather here are only limited by your imagination (and performance, of course).
We use global overloads because it's convenient to hang lots of common debugging functionality there, as well as make sweeping improvements across the entire app, based on the statistics we gather from those same overloads.
We still do use custom allocators for individual types too; in many cases the speedup or capabilities you can get by providing custom allocators for e.g. a single point-of-use of an STL data structure far exceeds the general speedup you can get from the global overloads.
Take a look at some of the allocators and debugging systems that are out there for C/C++ and you'll rapidly come up with these and other ideas:
valgrind
electricfence
dmalloc
dlmalloc
Application Verifier
Insure++
BoundsChecker
...and many others... (the gamedev industry is a great place to look)
(One old but seminal book is Writing Solid Code, which discusses many of the reasons you might want to provide custom allocators in C, most of which are still very relevant.)
Obviously if you can use any of these fine tools you will want to do so rather than rolling your own.
But there are situations in which rolling your own is faster, easier, less of a business/legal hassle, more instructive, or the only choice because nothing is available for your platform yet: dig in and write a global overload.
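If you do, here is a hedged, minimal sketch of the accounting flavour described above. Real overloads also need the array, nothrow and aligned forms; this one only counts live allocations:

    #include <atomic>
    #include <cstdlib>
    #include <new>

    static std::atomic<std::size_t> live_allocations{0};

    void* operator new(std::size_t size) {
        // operator new must return a unique pointer even for size 0.
        if (size == 0) size = 1;
        if (void* p = std::malloc(size)) {
            live_allocations.fetch_add(1, std::memory_order_relaxed);
            return p;
        }
        throw std::bad_alloc{};
    }

    void operator delete(void* p) noexcept {
        if (p) {
            live_allocations.fetch_sub(1, std::memory_order_relaxed);
            std::free(p);
        }
    }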
The most common reason to overload new and delete are simply to check for memory leaks, and memory usage stats. Note that "memory leak" is usually generalized to memory errors. You can check for things such as double deletes and buffer overruns.
The uses after that are usually memory-allocation schemes, such as garbage collection, and pooling.
All other cases are just specific things, mentioned in other answers (logging to disk, kernel use).
In addition to the other important uses mentioned here, like memory tagging, it's also the only way to force all allocations in your app to go through fixed-block allocation, which has enormous implications for performance and fragmentation.
For example, you may have a series of memory pools with fixed block sizes. Overriding global new lets you direct all 61-byte allocations to, say, the pool with 64-byte blocks, all 768-1024 byte allocations to the 1024-byte-block pool, all those above that to the 2048-byte-block pool, and anything larger than 8 KB to the general ragged heap.
Because fixed-block allocators are much faster and less prone to fragmentation than allocating willy-nilly from the heap, this lets you force even crappy 3rd-party code to allocate from your pools and not spray allocations all over the address space.
This is done often in systems which are time- and space-critical, such as games. 280Z28, Meeh, and Dan Olson have described why.
UnrealEngine3 overloads global new and delete as part of its core memory management system. There are multiple allocators that provide different features (profiling, performance, etc.) and they need all allocations to go through it.
Edit: For my own code, I would only ever do it as a last resort. And by that I mean I would almost positively never use it. But my personal projects are obviously much smaller/very different requirements.
Some realtime systems overload them to prevent them from being used after initialization.
Overloading new & delete makes it possible to add a tag to your memory allocations. I tag allocations per system or control or by middleware. I can view, at runtime, how much each uses. Maybe I want to see the usage of a parser separated from the UI or how much a piece of middleware is really using!
You can also use it to put guard bands around the allocated memory. If/when your app crashes you can take a look at the address. If you see the contents as "0xABCDABCD" (or whatever you choose as guard) you are accessing memory you don't own.
Perhaps after calling delete you can fill this space with a similarly recognizable pattern.
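A hedged sketch of guard bands and free-fill together; the 0xABCDABCD guard follows this answer, while everything else (header size, fill byte, function names) is illustrative:

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>
    #include <cstring>
    #include <new>

    constexpr std::uint32_t kGuard = 0xABCDABCD;
    constexpr std::size_t kHeader = alignof(std::max_align_t);  // keeps user data aligned

    void* guarded_alloc(std::size_t size) {
        char* raw = static_cast<char*>(std::malloc(kHeader + size + sizeof(kGuard)));
        if (!raw) throw std::bad_alloc{};
        std::memcpy(raw, &kGuard, sizeof(kGuard));                   // front guard
        std::memcpy(raw + kHeader + size, &kGuard, sizeof(kGuard));  // back guard
        return raw + kHeader;
    }

    void guarded_free(void* p, std::size_t size) {
        char* raw = static_cast<char*>(p) - kHeader;
        std::uint32_t front, back;
        std::memcpy(&front, raw, sizeof(front));
        std::memcpy(&back, raw + kHeader + size, sizeof(back));
        assert(front == kGuard && back == kGuard);  // corrupt guard => over/underrun
        std::memset(p, 0xDD, size);  // recognizable "freed memory" pattern
        std::free(raw);
    }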
I believe VisualStudio does something similar in debug. Doesn't it fill uninitialized memory with 0xCDCDCDCD?
Finally, if you have fragmentation issues you could use it to redirect to a block allocator? I am not sure how often this is really a problem.
You need to overload them when the call to new and delete doesn't work in your environment.
For example, in kernel programming, the default new and delete don't work because they rely on a user-mode library to allocate memory.
From a practical standpoint it may just be better to override malloc on a system library level, since operator new will probably be calling it anyway.
On Linux, you can put your own version of malloc in place of the system one, as in this example:
http://developers.sun.com/solaris/articles/lib_interposers.html
In that article, they are trying to collect performance statistics. But you may also detect memory leaks if you also override free.
Since you are doing this in a shared library with LD_PRELOAD, you don't even need to recompile your application.
I've seen it done in a system that for 'security'* reasons was required to write over all memory it used on de-allocation. The approach was to allocate an extra few bytes at the start of each block of memory to hold the size of the overall block, which would then be overwritten with zeros on delete.
This had a number of problems as you can probably imagine but it did work (mostly) and saved the team from reviewing every single memory allocation in a reasonably large, existing application.
Certainly not saying that it is a good use but it is probably one of the more imaginative ones out there...
* sadly it wasn't so much about actual security as the appearance of security...
Photoshop plugins written in C++ should override operator new so that they obtain memory via Photoshop.
I've done it with memory mapped files so that data written to the memory is automatically also saved to disk.
It's also used to return memory at a specific physical address if you have memory mapped IO devices, or sometimes if you need to allocate a certain block of contiguous memory.
But 99% of the time it's done as a debugging feature to log how often, where, when memory is being allocated and released.
It's actually pretty common for games to allocate one huge chunk of memory from the system and then provide custom allocators via overloaded new and delete. One big reason is that consoles have a fixed memory size, making both leaks and fragmentation large problems.
Usually (at least on a closed platform) the default heap operations come with a lack of control and a lack of introspection. For many applications this doesn't matter, but for games to run stably in fixed-memory situations the added control and introspection are both extremely important.
It can be a nice trick for your application to be able to respond to low-memory conditions with something other than a random crash. To do this, your new can be a simple proxy to the default new that catches its failures, frees up some stuff, and tries again.
The simplest technique is to reserve a blank block of memory at start-up time for that very purpose. You may also have some cache you can tap into - the idea is the same.
When the first allocation failure kicks in, you still have time to warn your user about the low memory conditions ("I'll be able to survive a little longer, but you may want to save your work and close some other applications"), save your state to disk, switch to survival mode, or whatever else makes sense in your context.
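The standard library already has a hook for exactly this retry loop: operator new calls the installed new-handler repeatedly until the allocation succeeds. A hedged sketch of the reserve-block idea on top of it (sizes and messages are illustrative):

    #include <cstdio>
    #include <new>

    static char* emergency_reserve = nullptr;

    void low_memory_handler() {
        if (emergency_reserve) {
            delete[] emergency_reserve;    // release the rainy-day block;
            emergency_reserve = nullptr;   // operator new will then retry
            std::puts("Low memory: please save your work and close other apps");
        } else {
            std::set_new_handler(nullptr); // out of options: next failure throws
        }
    }

    int main() {
        emergency_reserve = new char[1024 * 1024];  // reserve at start-up
        std::set_new_handler(low_memory_handler);
        // ... run the application; the handler fires on allocation failure ...
    }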
The most common use case is probably leak checking.
Another use case is when you have specific requirements for memory allocation in your environment which are not satisfied by the standard library you are using, like, for instance, you need to guarantee that memory allocation is lock free in a multi threaded environment.
As many have already stated this is usually done in performance critical applications, or to be able to control memory alignment or track your memory. Games frequently use custom memory managers, especially when targeting specific platforms/consoles.
Here is a pretty good blog post about one way of doing this and some reasoning.
An overloaded new operator also enables programmers to squeeze some extra performance out of their programs. For example, in a class, to speed up the allocation of new nodes, a list of deleted nodes can be maintained so that their memory is reused when new nodes are allocated. In this case, the overloaded delete operator adds nodes to the list of deleted nodes, and the overloaded new operator allocates memory from this list rather than from the heap to speed up allocation. Memory from the heap is used only when the list of deleted nodes is empty.
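A hedged sketch of that free-list scheme using class-level overloads; the Node type and its members are illustrative:

    #include <cstddef>
    #include <new>

    class Node {
    public:
        int value = 0;
        Node* next = nullptr;

        static void* operator new(std::size_t size) {
            if (size == sizeof(Node) && free_list) {
                void* mem = free_list;            // reuse a previously deleted node
                free_list = free_list->next_free;
                return mem;
            }
            return ::operator new(size);          // free list empty: fall back to the heap
        }

        static void operator delete(void* mem, std::size_t size) noexcept {
            if (size == sizeof(Node) && mem) {
                auto* f = static_cast<FreeNode*>(mem);
                f->next_free = free_list;         // push onto the free list for reuse
                free_list = f;
            } else {
                ::operator delete(mem);
            }
        }

    private:
        struct FreeNode { FreeNode* next_free; };
        static FreeNode* free_list;
    };

    Node::FreeNode* Node::free_list = nullptr;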