I need to instantiate a char[16384] buffer before calling a c function. After the function returns I will read some parts of it and discard it.
Is it ok to allocate it on the stack or should I use the heap?
EDIT: I'll add some information. The code will run on several platforms, from PC to iPhone, where I guess the stack space will not be so big, but I have no idea about that.
It's hard to give a definitive yes or no to this question because the answer is highly dependent on your environment and at what point in the program the function which allocates the memory is invoked.
Personally though if I saw this in a code review I'd raise a red flag. That's a lot of memory to be using for a stack based buffer. It may work today in the very specific place you're using it but what about tomorrow when you're called with a much bigger stack below you? Or when the customer hits a scenario you didn't consider?
But like I said it's scenario dependent and it may be just fine for your specific scenario. There's simply not enough detail in your question to say yes or no
Unless you're programming for embedded systems, code which might be running from a thread other than the main thread, or code which is called recursively, I would say 16k is well within the reasonable size you can allocate on the stack.
As for threads, if you're using POSIX threads and want your program to be portable, you can use the pthread_attr_setstacksize interface to specify the amount of stack space your thread needs, and then as long as you know the calling patterns and over-estimate by a good margin in choosing the size, you can be sure it will be safe.
Depends entirely on your OS and process definitions. Better allocate it from the heap by malloc and check the result (which may fail). Allocating failure on stack may cause stack corruption, which you won't be able to catch at run time.
If you're using C++ (since the question has that tag) use a vector<char> buffer(16384) - that way you get automatic deallocation, but the large buffer gets allocated on the heap.
The only potential drawback is that the buffer will be default initialized. There's a small chance that that might be something you can't afford (though it'll probably be of no consequence).
I'd say it depends upon the intended lifetime of the buffer.
If the intention is for the buffer to exist only in the scope of the creating function and the functions it calls, then stack-based is an excellent mechanism to avoid memory leaks.
If the intention is for the buffer to be long-lived, outliving the scope of the creating function, then I would malloc(3) the buffer.
My pthread_attr_setstacksize(3) says to look in pthread_create(3) for details on default stack size; sadly, all I have on my system is the POSIX-provided pthread_create(3posix) manpage, which lacks these details; but my recollection is that default stack size is so large that most people who want to know how to set their stack size want to shrink it, so they can run more threads in a given amount of memory. :)
if your code is not used by multiple threads AND it is not re-entrant... then I would just do one malloc at program initialization for that buffer. You will have less of a worry about architecture issues surrounding stack size. You definitely don't want to do a malloc/free per call.
Related
At what struct size should I consider allocating on the heap / free store using new keyword (or any other method of dynamic allocation) instead of on the stack?
10 bytes? 20 bytes? 200 bytes? 2KB? 2MB? Never?
Even if I wanted to pass it around by pointer, I could still take a reference from the stack variable. I understand that a stack variable will disappear at the end of the scope and dynamically allocated variables will not. I can deal with that either way, but so far I've not found any guidance for when to allocate dynamically. Sure, avoid stack overflow by not putting too much on the stack... but how much is too much?
Any guidance would be appreciated.
To actually answer the question, you'll need to know:
How big the stack is. This is often configurable at a compile-time, but may be capped by the target platform.
What is on the stack already. This knowledge is obtainable either by using deterministic call graph or by making decision actively, based on the current value of the stack pointer.
Without all of the above, any passive decision would be a gamble. Which also means that it's a gamble by default — indeed, in most cases we have to trust the compiler developers to understand how much of a stack space a "typical" program would need, and that our views on "typical" programs do align well and often.
In the long term, just like with any optimization problem, put your bets on measuring the overall performance and testing edge cases that may cause the stack overflow.
(Note. I probably should have searched before answering, but this question is essentially a duplicate of How much stack usage is too much?, nevertheless, here is my opinion on it.)
If you intend to keep a large buffer around for an extended period of time, then you should allocate it on the heap.
If you are in a recursive function, then allocating large buffers on the stack can quickly lead to problems.
Personally, I would keep buffers below ~4KiB on the stack and allocate larger buffers on the heap, unless you have a good overview of your program, and more specifically, how and where your functions are called.
That being said, if you constantly create and destroy buffers, consider putting them on the stack.
(If you are working on an embedded system, then that's a very different story.)
I'm architecting a small software engine and I'd like to make expensive use of the stack for rapid iterations of large number sets. But then it occurred to me that this might be a bad idea since the stack isn't as large a memory store as the heap. But I am attracted to the stack's speed and lack of dynamic allocation coding practices.
Is there a way to find out how far I can push the stack on a given platform? I am looking mainly at mobile devices but the issue could come up on any platform.
On *nix, use getrlimit:
RLIMIT_STACK
The maximum size of the process stack, in bytes. Upon
reaching this limit, a SIGSEGV signal is generated. To handle
this signal, a process must employ an alternate signal stack
(sigaltstack(2)).
On Windows, use VirtualQuery:
For the first call, pass it the address of any value on the stack to
get the base address and size, in bytes, of the committed stack space.
On an x86 machine where the stack grows downwards, subtract the size
from the base address and VirtualQuery again: this will give you the
size of the space reserved for the stack (assuming you're not
precisely on the limit of stack size at the time). Summing the two
naturally gives you the total stack size.
There is no platform-independent method since stack size is left to the implementation and host system logically - on an embedded mini-SOC there are less resources to distribute than on a 128GB RAM server. You can however influence the stack size of a specific thread on all OS'es as well with API-specific calls.
A possible portable solution is to write an allocator yourself.
You do not have to make use of the process stack, just simulate it in the heap.
Allocate a large amount of memory in the beginning, and write a stack allocator on top of it to use it while allocating.
Google 'Allocator Requirements' for information on how to achieve it in C++.
I'm not sure if the term 'Stack Allocator' is canonical, but I mean that you have to put stack like restrictions on where the allocation or deallocation has to happen.
Since you said that your algorithm is suited to this pattern, I think it'd be easy.
In standard C++, definitely not. In a portable way, probably not. In a particular OS, sometimes. If nothing else, you could open your own executable size and inspect the headers of the executable file to see it's stacksize. [The next problem is of course "how much of the stack was used before this bit of code" - which can be difficult to determine].
If you run the code in a separate thread, many of the (low level) thread interfaces allow you to specify a stack (or stacksize), E.g Posix threads pthread_set_stacksize or MS _beginthread. Again, you don't know EXACTLY how much space has been used up before it gets to the actual thread code - but it's probably not a huge amount.
Of course, in an embedded system (e.g. mobile phone), the stacksize is typically quite small, 4K, 12K or 64KB is very much normal - sometimes even a lot smaller than that in some systems.
Another potential problem is that you can't really know how much space is ACTUALLY used on the stack - you can measure after the fact in a compiled system, and of course, if you have a stack local array of int array[25];, we can know it takes up at least 25 * sizeof(int) - but there may be padding, the compiler saves registers on the stack, etc, etc.
Edit, as an afterthought:
I also don't really see much benefit in having two code-paths:
if (enough_stack_space_for_something)
use_stack_based_algorithm();
else
use_heap_based_algorithm();
This would add a fair amount of extra overhead, and more code is generally not a good plan in an embedded/mobile system.
Edit2: Also, if allocating memory is a major part of the runtime, perhaps looking at why that is, for example block-creation of objects would help?
To expand on the answers already given about why there is no portable way to do this, the entire concept of an actual stack is not part of the standard. You could write a C or C++ runtime that doesn't use a stack at all other than the function call records (which might internally be a linked list or something else).
The stack is an implementation detail of a particular machine/OS/compiler. Hence any technique to access stack metrics will be specific to machine/OS/compiler.
While not an actual answer to your specific question (Niels covered that quite well) but as advice to your problem domain: just allocate a large chunk of memory in the heap. There's no reason aside from convenience that the "real" stack is any different. Highly recursive (non-tail-recursive) algorithms often need to do this to ensure that they have a virtually unbounded "stack." Scripting languages that want to ensure they give a runtime error/exception rather than crashing the host application also often do this. To be efficient about things, you can either implement a "split stack" (like a std::deque would give you) or you can just be sure to preallocate a stack big enough for your needs.
There's no standard way to do it from within the language. I'm not even aware of a documented extension that is able to query.
However some compilers have options to set the stack size. And platform may specify what it does when launching a process, and/or provide ways to set stack size of a new thread, maybe even manipulate existing one.
For small platforms it's usual to know the whole memory size, have all the data segments on one end, a set size arena for the heap (may be 0), and the rest is stack, approaching from the other side.
I'm architecting a small software engine and I'd like to make expensive use of the stack for rapid iterations of large number sets. But then it occurred to me that this might be a bad idea since the stack isn't as large a memory store as the heap. But I am attracted to the stack's speed and lack of dynamic allocation coding practices.
Is there a way to find out how far I can push the stack on a given platform? I am looking mainly at mobile devices but the issue could come up on any platform.
On *nix, use getrlimit:
RLIMIT_STACK
The maximum size of the process stack, in bytes. Upon
reaching this limit, a SIGSEGV signal is generated. To handle
this signal, a process must employ an alternate signal stack
(sigaltstack(2)).
On Windows, use VirtualQuery:
For the first call, pass it the address of any value on the stack to
get the base address and size, in bytes, of the committed stack space.
On an x86 machine where the stack grows downwards, subtract the size
from the base address and VirtualQuery again: this will give you the
size of the space reserved for the stack (assuming you're not
precisely on the limit of stack size at the time). Summing the two
naturally gives you the total stack size.
There is no platform-independent method since stack size is left to the implementation and host system logically - on an embedded mini-SOC there are less resources to distribute than on a 128GB RAM server. You can however influence the stack size of a specific thread on all OS'es as well with API-specific calls.
A possible portable solution is to write an allocator yourself.
You do not have to make use of the process stack, just simulate it in the heap.
Allocate a large amount of memory in the beginning, and write a stack allocator on top of it to use it while allocating.
Google 'Allocator Requirements' for information on how to achieve it in C++.
I'm not sure if the term 'Stack Allocator' is canonical, but I mean that you have to put stack like restrictions on where the allocation or deallocation has to happen.
Since you said that your algorithm is suited to this pattern, I think it'd be easy.
In standard C++, definitely not. In a portable way, probably not. In a particular OS, sometimes. If nothing else, you could open your own executable size and inspect the headers of the executable file to see it's stacksize. [The next problem is of course "how much of the stack was used before this bit of code" - which can be difficult to determine].
If you run the code in a separate thread, many of the (low level) thread interfaces allow you to specify a stack (or stacksize), E.g Posix threads pthread_set_stacksize or MS _beginthread. Again, you don't know EXACTLY how much space has been used up before it gets to the actual thread code - but it's probably not a huge amount.
Of course, in an embedded system (e.g. mobile phone), the stacksize is typically quite small, 4K, 12K or 64KB is very much normal - sometimes even a lot smaller than that in some systems.
Another potential problem is that you can't really know how much space is ACTUALLY used on the stack - you can measure after the fact in a compiled system, and of course, if you have a stack local array of int array[25];, we can know it takes up at least 25 * sizeof(int) - but there may be padding, the compiler saves registers on the stack, etc, etc.
Edit, as an afterthought:
I also don't really see much benefit in having two code-paths:
if (enough_stack_space_for_something)
use_stack_based_algorithm();
else
use_heap_based_algorithm();
This would add a fair amount of extra overhead, and more code is generally not a good plan in an embedded/mobile system.
Edit2: Also, if allocating memory is a major part of the runtime, perhaps looking at why that is, for example block-creation of objects would help?
To expand on the answers already given about why there is no portable way to do this, the entire concept of an actual stack is not part of the standard. You could write a C or C++ runtime that doesn't use a stack at all other than the function call records (which might internally be a linked list or something else).
The stack is an implementation detail of a particular machine/OS/compiler. Hence any technique to access stack metrics will be specific to machine/OS/compiler.
While not an actual answer to your specific question (Niels covered that quite well) but as advice to your problem domain: just allocate a large chunk of memory in the heap. There's no reason aside from convenience that the "real" stack is any different. Highly recursive (non-tail-recursive) algorithms often need to do this to ensure that they have a virtually unbounded "stack." Scripting languages that want to ensure they give a runtime error/exception rather than crashing the host application also often do this. To be efficient about things, you can either implement a "split stack" (like a std::deque would give you) or you can just be sure to preallocate a stack big enough for your needs.
There's no standard way to do it from within the language. I'm not even aware of a documented extension that is able to query.
However some compilers have options to set the stack size. And platform may specify what it does when launching a process, and/or provide ways to set stack size of a new thread, maybe even manipulate existing one.
For small platforms it's usual to know the whole memory size, have all the data segments on one end, a set size arena for the heap (may be 0), and the rest is stack, approaching from the other side.
My environment is gcc, C++, Linux.
When my application does some data calculation, it may need a "large" (may be a few MBs) number of memory to store data, calculation results and other things. I got some code using kind of new, delete to finish this. Since there is no ownership outside some function scope, i think all these memory can be allocated in stack.
The problem is, the default stack size(8192Kb in my system) may not be enough. I may need to change stack size for these stack allocation. Morever, if the calculation needs more data in future, i may need to extend stack size again.
So is it an option to extend stack size? Since it cannot be allocated for specific functions, how will it impact on the whole app? Is it REALLY an improvement to allocate data on stack instead of on heap?
You bring up a controversial question that does not have a direct answer. There are pros and cons on each side. In particular:
Memory on the heap is more easy to control: you can check the return value or allow throwing exceptions. When the stack overflows, your thread is simply unloaded with a good change that debugger will not show anything meaningful.
On the contrary, stack allocations happen automatically, and you do not need to do anything specific. I always favor this simplicity.
There is nothing fundamentally wrong in allocating large amounts of data on the stack. At the end of the day any type of memory is finally a memory. This is means that the total amount of required memory is what really matters. Where this memory is allocated is less important. When there is enough memory for your application to work, there is no difference where the memory is allocated. It can be static for example.
Different systems have different rules of allocation. This means that final decision may depend on the actual system.
While it's true that stack allocations are more efficient (more apparent in multi-threaded programs), but if the usage pattern in your case is "allocate a big chunk of memory, process the data, deallocate it", then there won't be much of improvement.
Instead rewrite the code to use RAII, e.g. std::vector or std::unique_ptr so there won't be explicit buggy deletes.
If you use Linux, you can change the stack size with the ulimit command. However, I think the memory which allocated from the heap also is good for you.
When you are programming in a language that allows you to use automatic allocation for very large objects, when and how do you worry about stack size? Are there any rules of thumb for reasoning about stack size?
When you are programming in a language that allows you to use automatic allocation for very large objects ...
If I want to allocate a very large object, then instead of on the stack I might allocate it on the heap but wrapped in an auto_ptr (in which case it will be deallocated when it goes out of scope, just like a stack-resident object, but without worrying about stack size).
... when and how do you worry about stack size?
I use the stack conservatively out of habit (e.g. any object bigger than about 512 bytes is allocated on the heap instead), and I know how big the stack is (e.g. about a megabyte by default), and therefore know that I don't need to worry about it.
Are there any rules of thumb for reasoning about stack size?
Very big objects can blow the stack
Very deep recursion can blow the stack
The default stack size might be too big (take too much total memory) if there are many threads and if you're running on a limited-memory embedded device, in which case you might want to use an O/S API or linker option to reduce the size of the stack per thread.
You care about it on a microcontroller, where you often have to specify stack space explicitly (or you get whatever's left over after RAM gets used for static allocation + any RAM program space).
You start to worry about stack size when
someone on your team cunningly invents a recursive function that goes on and on and on...
you create a thread factory and suddenly need a tenfold of the stack that you used to need (each thread needs a stack => the more threads you have, the less free space remains for a given stack size)
If you're writing for a tiny little embedded platform, you worry about it all the time, but you also know exactly how big it is, and probably have some useful tools available to find the high-water mark of the stack.
If you aren't, then don't worry until your program crashes :)
Unless you are allocating seriously huge objects (many tens of KB), then it is never going to be a problem.
Note, however, that objects on the stack are, by definition, temporary. Constructing (and possibly destructing) large objects frequently may cause you a performance problem - so if you have a large object it probably should be persistent and heap-based for reasons other than stack size.
I never worry about it. If there is a stack overflow, I will soon know about it. Also, in C++ it is actually very hard to create very large objects on the stack. About the only way of doing it is:
struct S {
char big[1000000];
};
but use of std::string or std::vector makes that problem go away.
Shouldn't you be avoiding using the stack for allocating large objects in the first place? Use the heap, no?
my experience:
when you use recursive functions, take care of the stack size!!
When do you worry about stack size?
Never.
If you have stack size problems it means you're doing something else wrong and should fix that instead of worrying about stack size.
For instace:
Allocating unreasonably large structures on the stack - don't do it. allocate on the heap.
Having a ridiculously long recursion. I mean in the order of painting an image and iterating over the pixels using recursion. - find a better way to do it.
I worry about stack size on embedded systems when call stack goes very deep and each function allocates variables (on the stack). Generally, panic evolves when the system crashes unexpectedly due to variables changing on the stack (the stack overflows).
Played this game a lot on Symbian: when to use TBuf (a string with storage on the stack), and when to use HBufC (which allocate the string storage on the heap, like std::string, so you have to cope with Leave, and your function needs a means of failing).
At the time (maybe still, I'm not sure), Symbian threads had 4k of stack by default. To manipulate filenames, you need to count on using up to 512 bytes (256 characters).
As you can imagine, the received wisdom was "never put a filename on the stack". But actually, it turned out that you could get away with it a lot more often than you'd think. When we started running real programs (TM), such as games, we found that we needed way more than the default stack size anyway, and it wasn't due to filenames or other specific large objects, it was due to the complexity of the game code.
If using stack makes your code simpler, and as long as you're testing properly, and as long as you don't go completely overboard (don't have multiple levels of file-handling functions which all put a filename on the stack), then I'd say just try it. Especially if the function would need to be able to fail anyway, whether you're using stack or heap. If it goes wrong, you either double the stack size and be more careful in future, or you add another failure case to your function. Neither is the end of the world.
You usually can't really have large objects on the stack. They almost always use the heap internally so even if they are 'on the stack' their data members are not. Even an object with tons of data members will usually be under 64 bytes on the stack, the rest on the heap. The stack usually only becomes an issue these days when you have lots of threads and lots of recursion.
Only time really is when you are threading and have to define it yourself, when you are doing recursion or when for some reason you are allocating to the stack. Otherwise the compiler takes care of making sure you have enough stack space.
CreateThread by default only allocates 0x100000 bytes for the stack.
When the code you've written for a PC suddenly is supposed to run on a mobile phone
When the code you've ported to run on a mobile phone suddenly is supposed to run on a DSP
(And yes, these are real-life snafus.)
When deciding whether to allocate objects on the stack vs. the heap, there are also perf issues to be taken into consideration. Allocation of memory on the stack is very fast - it just involves moving the stack pointer, whereas dynamic allocation/deallocation using new/delete or malloc/free is fairly expensive, especially in multithreaded code that doesn't have a heap per thread. If you have a function that is being called in a tight loop, you might well err on the side of putting larger objects on the stack, keeping all of the multithreading caveats mentioned in other answers in mind, even if that means having to increase stack space, which most linkers will allow you to do.
In general, big allocations on the stack are bad for several reasons, not the least of which is that they can cause problems to remain well hidden for a long time.
The problem is that detecting stack overflow is not easy, and big allocations can subvert most of the commonly used methods.
If the processor has no memory management or memory protection unit, you have to be particularly careful. But event with some sort of MMU or MPU, the hardware can fail to detect a stack overflow. One common scheme, reserving a page below the stack to catch overflow, fails if the big stack object is bigger than a page. There just might be the stack of another thread sitting there and oops! you just created a very nasty, hard to find bug.
Unlimited recursion is usually easy to catch because the stack growth is usually small and will trigger the hardware protection.
I don't. Worrying about this things whilst writing programming normal things is either a case of premature pessimization or premature optimization. It's pretty hard to blow things up on a modern computer anyway.
I once wrote a CSV parser and whilst playing around with trying to get the best performance I was allocating hundereds of thousands of 1K buffers on the stack. The performance was stellar but the RAM went up to about 1GB from memory from normal 30MB. This was due to each cell in the CSV file had a fixed size 1K buffer.
Like everyone is saying unless you are doing recursion you do not have to worry about it.
You worry about it when you write a callback that will be called from threads spawned by a runtime you don't control (for example, MS RPC runtime) with stack size at the discretion of that runtime. Somehow like this.
I have had problems running out of stack space when:
A function accidentally calls itself
A function uses recursion to a deep level
A function allocates a large object on the stack, and there is a heap.
A function uses complicated templates and the compiler crashes
Provided I:
Allocate large objects on the heap (eg. using "auto_ptr foo = new Foo" instead of "Foo foo")
Use recursion judiciously.
I don't normally have any problems, so unfortunately don't know what good defaults should be.
You start to worry about stack size when:
when your program crashes - usually these bugs tend to be weird first time you see them :)
you are running an algorithm that uses recursion and has user input as one of its parameters (you don't know how much stack your algorithm could use)
you are running on embedded platforms (or platforms where each resource is important). Usually on these platforms stack is allocated before process is created - so a good estimation about stack requirements must be made
you are creating objects on the stack depending on some parameters modifiable by user input (see the sample below)
when the code executed in a thread/process/task is very big and there are a lot of function calls that go deep into the stack and generate a huge call-stack. This usually happens in big frameworks that combine a lot of triggers and event processing (a GUI framework; for example: receive_click-> find_clicked_window->
send_msg_to_window->
process_message->
process_click->
is_inside_region->
trigger_drawing->
write_to_file-> ... ). To put it short, you should worry about call-stack in case of complex code or unknown/binary 3rd party modules.
sample for modifiable input parameters:
in my_func(size_t input_param)
{
char buffer[input_param];
// or any other initialization of a big object on the stack
....
}
An advice:
you should mark the stack with some magic numbers (in case you allocate it) and check if those magic numbers will be modified (in that case the stack will not be enough for the task/thread/process and should probably be increased)