How to detect the amount of stack space available to my program? - c++

My Win32 C++ application acts as an RPC server - it has a set of functions for processing requests and RPC runtime creates a separate thread and invokes one of my functions in that thread.
In my function I have an std::auto_ptr which is used to control a heap-allocated char[] array of size known at compile time. It accidentally works when compiled with VC++ but it's undefined behaviour according to the C++ standard and I'd like to get rid of it.
I have two options: std::vector or a stack-allocated array. Since I have no idea why there's a heap-allocated array I would like to consider replacing it with a stack-allocated one. The array is 10k elements and I can hypothetically face a stack overflow if the RPC runtime spawns a thread with a very small stack.
I would like to detect how much stack space is typically allocated to the thread and how much of it is available to my function (its callees certainly consume some of the allocated space). How could I do that?

I don't know of any way of figuring out the stack size directly using the API if you don't have access to the CreateThread call or, if it's the main thread, looking into the EXE's default thread size in the PE header.
In your situation, I would allocate on the heap to be safe, even though a 10K array of small data is unlikely to max out the stack in non-recursive scenarios.
However, you can probe for the stack limit, if done carefully. The stack gets committed in 4K pages as you touch them (via guard pages) until you hit the limit, whereupon Windows will throw a stack overflow exception. There is still one page of stack left when the exception gets dispatched, so that the exception dispatching logic itself (including filter functions) can execute - but Windows throws the exception because it couldn't allocate another guard page. That means that the next stack overflow, or probe, will not result in a stack overflow exception, but an access violation. So to make probing work reliably (and in particular, repeatably) you need to decommit the memory allocated by the probing and reinstate a guard page.
This article on KB describes how to decommit stack memory and reinstate the guard page. It probes using recursion and 10,000-byte increments; the compiler by default implements its own stack probing for stack allocations of locals >4KB, so that the stack growth mechanism works correctly.

On Windows, the default stack size is 1MB, so you are unlikely to overflow the stack with only a 10k array. That said, I think that allocating so much memory on the stack is bad practice, and you should try to favour allocating it dynamically, if you can. There is also Boost's scoped_array, which is well defined for automatically managing arrays - unlike the vector class, it is non-copyable.

I second 1800 INFORMATION:
Allocate your data on heap if you can. It's safer (e.g. buffer overflows are harder to exploit) and more flexible when (not if) you need to extend your design later.
Use std::vector, boost::scoped_array or boost::shared_array.
I know it's not answering your question on detecting stack size but I think it's a logical answer to your problem.
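As a minimal sketch of the std::vector option suggested above (the function name process_request and the constant kBufferSize are hypothetical, not taken from the question), the buffer lives on the heap while its lifetime is still tied to the function scope:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical size; the question only says "10k elements".
const std::size_t kBufferSize = 10000;

// Hypothetical request handler: copies the input into a zero-filled
// heap buffer and returns the number of bytes stored.
std::size_t process_request(const char* input)
{
    // Heap-allocated and zero-initialised; the storage is released
    // correctly when the vector goes out of scope, even if an
    // exception is thrown.
    std::vector<char> buffer(kBufferSize);

    // Truncate if necessary, leaving room for the terminating zero.
    std::size_t len = std::strlen(input);
    if (len >= buffer.size())
        len = buffer.size() - 1;
    std::memcpy(&buffer[0], input, len);

    // &buffer[0] can be handed to APIs that expect a plain char*.
    return std::strlen(&buffer[0]);
}
```

Because the vector owns its storage, no stack budget is needed beyond the vector object itself, whatever stack size the RPC runtime gives the thread.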

I'm not sure what you're after.
If you just want typical numbers, then go ahead and try! Create a function with nested scopes, each of which allocates some more stack space. Output in each scope. See how far the thing gets.
If you want concrete numbers in a concrete situation, ask yourself what you would want to do once you have them? Branch into different implementations? This sounds like a maintenance problem the use of which should be very well justified. What do you expect to gain? Is this really worth such a hassle?
I agree that 10k usually shouldn't be a problem. So if your code isn't mission critical, go ahead and use boost::array (or std::tr1::array, if your std lib comes with it). Otherwise just use std::vector or, if you feel you must, boost::scoped_array (TR1 has no equivalent of it).

"std::auto_ptr which is used to control a heap-allocated char[] ... it's undefined behaviour according to C++"
That is a wrong assumption!
STL's auto_ptr has a precise description of its behavior. If you are worried about losing control during a sophisticated assignment, consider using a reference-counting pattern to control destruction of the heap-allocated array.

Related

Is this a good place to allocate memory with alloca() [duplicate]

alloca() allocates memory on the stack rather than on the heap, as in the case of malloc(). So, when I return from the routine the memory is freed. So, actually this solves my problem of freeing up dynamically allocated memory. Freeing of memory allocated through malloc() is a major headache and if somehow missed leads to all sorts of memory problems.
Why is the use of alloca() discouraged in spite of the above features?
The answer is right there in the man page (at least on Linux):
RETURN VALUE
The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behaviour is undefined.
Which isn't to say it should never be used. One of the OSS projects I work on uses it extensively, and as long as you're not abusing it (alloca'ing huge values), it's fine. Once you go past the "few hundred bytes" mark, it's time to use malloc and friends, instead. You may still get allocation failures, but at least you'll have some indication of the failure instead of just blowing out the stack.
One of the most memorable bugs I had was to do with an inline function that used alloca. It manifested itself as a stack overflow (because it allocates on the stack) at random points of the program's execution.
In the header file:
inline void DoSomething() {
    wchar_t* pStr = static_cast<wchar_t*>(alloca(100));
    //......
}
In the implementation file:
void Process() {
    for (int i = 0; i < 1000000; i++) {
        DoSomething();
    }
}
So what happened was the compiler inlined DoSomething function and all the stack allocations were happening inside Process() function and thus blowing the stack up. In my defence (and I wasn't the one who found the issue; I had to go and cry to one of the senior developers when I couldn't fix it), it wasn't straight alloca, it was one of ATL string conversion macros.
So the lesson is - do not use alloca in functions that you think might be inlined.
Old question but nobody mentioned that it should be replaced by variable length arrays.
char arr[size];
instead of
char *arr=alloca(size);
It's in the standard C99 and existed as compiler extension in many compilers.
alloca() is very useful if you can't use a standard local variable because its size would need to be determined at runtime and you can absolutely guarantee that the pointer you get from alloca() will NEVER be used after this function returns.
You can be fairly safe if you
do not return the pointer, or anything that contains it.
do not store the pointer in any structure allocated on the heap
do not let any other thread use the pointer
The real danger comes from the chance that someone else will violate these conditions sometime later. With that in mind it's great for passing buffers to functions that format text into them :)
As noted in this newsgroup posting, there are a few reasons why using alloca can be considered difficult and dangerous:
Not all compilers support alloca.
Some compilers interpret the intended behaviour of alloca differently, so portability is not guaranteed even between compilers that support it.
Some implementations are buggy.
One issue is that it isn't standard, although it's widely supported. Other things being equal, I'd always use a standard function rather than a common compiler extension.
still alloca use is discouraged, why?
I don't perceive such a consensus. Lots of strong pros; a few cons:
C99 provides variable length arrays, which would often be used preferentially as the notation's more consistent with fixed-length arrays and intuitive overall
many systems have less overall memory/address-space available for the stack than they do for the heap, which makes the program slightly more susceptible to memory exhaustion (through stack overflow): this may be seen as a good or a bad thing - one of the reasons the stack doesn't automatically grow the way heap does is to prevent out-of-control programs from having as much adverse impact on the entire machine
when used in a more local scope (such as a while or for loop) or in several scopes, the memory accumulates per iteration/scope and is not released until the function exits: this contrasts with normal variables defined in the scope of a control structure (e.g. for (int i = 0; i < 2; ++i) { X } would accumulate alloca-ed memory requested at X, but memory for a fixed-sized array would be recycled per iteration).
modern compilers typically do not inline functions that call alloca, but if you force them then the alloca will happen in the callers' context (i.e. the stack won't be released until the caller returns)
a long time ago alloca transitioned from a non-portable feature/hack to a Standardised extension, but some negative perception may persist
the lifetime is bound to the function scope, which may or may not suit the programmer better than malloc's explicit control
having to use malloc encourages thinking about the deallocation - if that's managed through a wrapper function (e.g. WonderfulObject_DestructorFree(ptr)), then the function provides a point for implementation clean up operations (like closing file descriptors, freeing internal pointers or doing some logging) without explicit changes to client code: sometimes it's a nice model to adopt consistently
in this pseudo-OO style of programming, it's natural to want something like WonderfulObject* p = WonderfulObject_AllocConstructor(); - that's possible when the "constructor" is a function returning malloc-ed memory (as the memory remains allocated after the function returns the value to be stored in p), but not if the "constructor" uses alloca
a macro version of WonderfulObject_AllocConstructor could achieve this, but "macros are evil" in that they can conflict with each other and non-macro code and create unintended substitutions and consequent difficult-to-diagnose problems
missing free operations can be detected by Valgrind, Purify etc. but missing "destructor" calls can't always be detected at all - one very tenuous benefit in terms of enforcement of intended usage; some alloca() implementations (such as GCC's) use an inlined macro for alloca(), so runtime substitution of a memory-usage diagnostic library isn't possible the way it is for malloc/realloc/free (e.g. electric fence)
some implementations have subtle issues: for example, from the Linux manpage:
On many systems alloca() cannot be used inside the list of arguments of a function call, because the stack space reserved by alloca() would appear on the stack in the middle of the space for the function arguments.
I know this question is tagged C, but as a C++ programmer I thought I'd use C++ to illustrate the potential utility of alloca: the code below (and here at ideone) creates a vector tracking differently sized polymorphic types that are stack allocated (with lifetime tied to function return) rather than heap allocated.
#include <alloca.h>
#include <iostream>
#include <new>      // placement new
#include <vector>
struct Base
{
    virtual ~Base() { }
    virtual int to_int() const = 0;
};
struct Integer : Base
{
    Integer(int n) : n_(n) { }
    int to_int() const { return n_; }
    int n_;
};
struct Double : Base
{
    Double(double n) : n_(n) { }
    int to_int() const { return -n_; }
    double n_;
};
// Forced inlining makes each alloca() happen in the caller's (main's) frame.
inline Base* factory(double d) __attribute__((always_inline));
inline Base* factory(double d)
{
    if ((double)(int)d != d)
        return new (alloca(sizeof(Double))) Double(d);
    else
        return new (alloca(sizeof(Integer))) Integer(d);
}
int main()
{
    std::vector<Base*> numbers;
    numbers.push_back(factory(29.3));
    numbers.push_back(factory(29));
    numbers.push_back(factory(7.1));
    numbers.push_back(factory(2));
    numbers.push_back(factory(231.0));
    for (std::vector<Base*>::const_iterator i = numbers.begin();
         i != numbers.end(); ++i)
    {
        std::cout << *i << ' ' << (*i)->to_int() << '\n';
        (*i)->~Base(); // optionally / else Undefined Behaviour iff the
                       // program depends on side effects of destructor
    }
}
Lots of interesting answers to this "old" question, even some relatively new answers, but I didn't find any that mention this....
When used properly and with care, consistent use of alloca() (perhaps application-wide) to handle small variable-length allocations (or C99 VLAs, where available) can lead to lower overall stack growth than an otherwise equivalent implementation using oversized local arrays of fixed length. So alloca() may be good for your stack if you use it carefully.
I found that quote in.... OK, I made that quote up. But really, think about it....
@j_random_hacker is very right in his comments under other answers: Avoiding the use of alloca() in favor of oversized local arrays does not make your program safer from stack overflows (unless your compiler is old enough to allow inlining of functions that use alloca() in which case you should upgrade, or unless you use alloca() inside loops, in which case you should... not use alloca() inside loops).
I've worked on desktop/server environments and embedded systems. A lot of embedded systems don't use a heap at all (they don't even link in support for it), for reasons that include the perception that dynamically allocated memory is evil due to the risks of memory leaks on an application that never ever reboots for years at a time, or the more reasonable justification that dynamic memory is dangerous because it can't be known for certain that an application will never fragment its heap to the point of false memory exhaustion. So embedded programmers are left with few alternatives.
alloca() (or VLAs) may be just the right tool for the job.
I've seen time & time again where a programmer makes a stack-allocated buffer "big enough to handle any possible case". In a deeply nested call tree, repeated use of that (anti-?)pattern leads to exaggerated stack use. (Imagine a call tree 20 levels deep, where at each level for different reasons, the function blindly over-allocates a buffer of 1024 bytes "just to be safe" when generally it will only use 16 or less of them, and only in very rare cases may use more.) An alternative is to use alloca() or VLAs and allocate only as much stack space as your function needs, to avoid unnecessarily burdening the stack. Hopefully when one function in the call tree needs a larger-than-normal allocation, others in the call tree are still using their normal small allocations, and the overall application stack usage is significantly less than if every function blindly over-allocated a local buffer.
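The arithmetic behind that example can be written out as a small sketch. The figures (20 levels, 1024-byte buffers, 16 bytes typically used) come from the paragraph above; the function names are hypothetical, and the totals count only the buffers, not return addresses or other frame overhead.

```cpp
#include <cstddef>

// Figures taken from the paragraph above: 20 nested calls, each with a
// 1024-byte "just in case" buffer of which about 16 bytes are used.
const std::size_t kDepth = 20;
const std::size_t kFixedBuffer = 1024;
const std::size_t kTypicalNeed = 16;

// Stack consumed by the buffers alone when every level over-allocates.
std::size_t fixed_buffer_total()
{
    return kDepth * kFixedBuffer;
}

// Stack consumed when each level allocates only what it needs, except
// for `large_levels` levels (at most kDepth) that really need the full
// 1024 bytes.
std::size_t right_sized_total(std::size_t large_levels)
{
    return large_levels * kFixedBuffer
         + (kDepth - large_levels) * kTypicalNeed;
}
```

With these numbers the fixed-size approach spends 20480 bytes on buffers, while right-sizing spends 320 bytes in the common case and 1328 bytes when one level needs the full buffer.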
But if you choose to use alloca()...
Based on other answers on this page, it seems that VLAs should be safe (they don't compound stack allocations if called from within a loop), but if you're using alloca(), be careful not to use it inside a loop, and make sure your function can't be inlined if there's any chance it might be called within another function's loop.
All of the other answers are correct. However, if the thing you want to alloc using alloca() is reasonably small, I think that it's a good technique that's faster and more convenient than using malloc() or otherwise.
In other words, alloca( 0x00ffffff ) is dangerous and likely to cause overflow, exactly as much as char hugeArray[ 0x00ffffff ]; is. Be cautious and reasonable and you'll be fine.
I don't think anyone has mentioned this: Use of alloca in a function will hinder or disable some optimizations that could otherwise be applied in the function, since the compiler cannot know the size of the function's stack frame.
For instance, a common optimization by C compilers is to eliminate use of the frame pointer within a function; frame accesses are made relative to the stack pointer instead, so there's one more register for general use. But if alloca is called within the function, the difference between sp and fp will be unknown for part of the function, so this optimization cannot be done.
Given the rarity of its use, and its shady status as a standard function, compiler designers quite possibly disable any optimization that might cause trouble with alloca, if it would take more than a little effort to make it work with alloca.
UPDATE:
Since variable-length local arrays have been added to C, and since these present very similar code-generation issues to the compiler as alloca, I see that 'rarity of use and shady status' does not apply to the underlying mechanism; but I would still suspect that use of either alloca or VLA tends to compromise code generation within a function that uses them. I would welcome any feedback from compiler designers.
Everyone has already pointed out the big thing which is potential undefined behavior from a stack overflow but I should mention that the Windows environment has a great mechanism to catch this using structured exceptions (SEH) and guard pages. Since the stack only grows as needed, these guard pages reside in areas that are unallocated. If you allocate into them (by overflowing the stack) an exception is thrown.
You can catch this SEH exception and call _resetstkoflw to reset the stack and continue on your merry way. It's not ideal but it's another mechanism to at least know something has gone wrong when the stuff hits the fan. *nix might have something similar that I'm not aware of.
I recommend capping your max allocation size by wrapping alloca and tracking it internally. If you were really hardcore about it you could throw some scope sentries at the top of your function to track any alloca allocations in the function scope and sanity check this against the max amount allowed for your project.
Also, in addition to not allowing for memory leaks alloca does not cause memory fragmentation which is pretty important. I don't think alloca is bad practice if you use it intelligently, which is basically true for everything. :-)
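The capping idea can also be sketched without alloca at all. In the rough example below, the class name ScratchBuffer and the 256-byte threshold are assumptions made for illustration: requests up to the cap use storage inside the object, and anything larger goes to the heap instead of growing the stack.

```cpp
#include <cstddef>
#include <vector>

// Assumed cap on what a single buffer may take from the stack.
const std::size_t kInlineCapacity = 256;

// Hypothetical helper: small requests are served from an array inside
// the object (so on the stack when the object is a local variable);
// larger requests fall back to the heap.
class ScratchBuffer
{
public:
    explicit ScratchBuffer(std::size_t size)
        : size_(size),
          heap_(size > kInlineCapacity ? size : 0)
    {
    }

    char* data() { return on_heap() ? &heap_[0] : inline_; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return size_ > kInlineCapacity; }

private:
    ScratchBuffer(const ScratchBuffer&);             // non-copyable
    ScratchBuffer& operator=(const ScratchBuffer&);

    std::size_t size_;
    char inline_[kInlineCapacity];
    std::vector<char> heap_;
};
```

A local `ScratchBuffer buf(n);` then costs a fixed, known amount of stack regardless of n, which is the property the wrapper-with-a-cap suggestion is after.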
One pitfall with alloca is that longjmp rewinds it.
That is to say, if you save a context with setjmp, then alloca some memory, then longjmp to the context, you may lose the alloca memory. The stack pointer is back where it was and so the memory is no longer reserved; if you call a function or do another alloca, you will clobber the original alloca.
To clarify, what I'm specifically referring to here is a situation whereby longjmp does not return out of the function where the alloca took place! Rather, a function saves context with setjmp; then allocates memory with alloca and finally a longjmp takes place to that context. That function's alloca memory is not all freed; just all the memory that it allocated since the setjmp. Of course, I'm speaking about an observed behavior; no such requirement is documented of any alloca that I know.
The focus in the documentation is usually on the concept that alloca memory is associated with a function activation, not with any block; that multiple invocations of alloca just grab more stack memory which is all released when the function terminates. Not so; the memory is actually associated with the procedure context. When the context is restored with longjmp, so is the prior alloca state. It's a consequence of the stack pointer register itself being used for allocation, and also (necessarily) saved and restored in the jmp_buf.
Incidentally, this, if it works that way, provides a plausible mechanism for deliberately freeing memory that was allocated with alloca.
I have run into this as the root cause of a bug.
Here's why:
char x;
char *y=malloc(1);
char *z=alloca(&x-y);
*z = 1;
Not that anyone would write this code, but the size argument you're passing to alloca almost certainly comes from some sort of input, which could maliciously aim to get your program to alloca something huge like that. After all, if the size isn't based on input or doesn't have the possibility to be large, why didn't you just declare a small, fixed-size local buffer?
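A minimal sketch of the kind of check that snippet is missing follows. The function name and the 512-byte limit are assumptions, not part of any real API; the point is only that the size is validated as a signed value before it can turn into an enormous size_t.

```cpp
#include <cstddef>

// Assumed project-wide cap on a single stack allocation, in bytes.
const std::size_t kMaxStackRequest = 512;

// Returns true only if `requested` is a sane size for a stack
// allocation. Zero, negative or huge values (such as the pointer
// difference in the snippet above) are rejected before they can be
// converted to an enormous size_t.
bool is_safe_stack_request(std::ptrdiff_t requested)
{
    if (requested <= 0)
        return false;
    return static_cast<std::size_t>(requested) <= kMaxStackRequest;
}
```

A caller would fall back to malloc (or fail cleanly) whenever the check returns false.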
Virtually all code using alloca and/or C99 VLAs has serious bugs which will lead to crashes (if you're lucky) or privilege compromise (if you're not so lucky).
alloca() is nice and efficient... but it is also deeply broken.
broken scope behavior (function scope instead of block scope)
use inconsistent with malloc (an alloca()-ted pointer shouldn't be freed, hence you have to track where your pointers are coming from to free() only those you got with malloc())
bad behavior when you also use inlining (scope sometimes goes to the caller function depending if callee is inlined or not).
no stack boundary check
undefined behavior in case of failure (does not return NULL like malloc... and what does failure mean, as it does not check stack boundaries anyway...)
not ANSI standard
In most cases you can replace it using local variables and majorant size. If it's used for large objects, putting them on the heap is usually a safer idea.
If you really need it in C you can use VLAs (no VLAs in C++, too bad). They are much better than alloca() regarding scope behavior and consistency. As I see it, VLAs are a kind of alloca() made right.
Of course a local structure or array using a majorant of the needed space is still better, and if you don't have such a majorant, heap allocation using plain malloc() is probably sane.
I see no sane use case where you really really need either alloca() or VLA.
Processes only have a limited amount of stack space available - far less than the amount of memory available to malloc().
By using alloca() you dramatically increase your chances of getting a Stack Overflow error (if you're lucky, or an inexplicable crash if you're not).
A place where alloca() is especially more dangerous than malloc() is the kernel - the kernel of a typical operating system has a fixed-size stack space hard-coded into one of its headers; it is not as flexible as the stack of an application. Making a call to alloca() with an unwarranted size may cause the kernel to crash.
Certain compilers warn about usage of alloca() (and even VLAs for that matter) under certain options that ought to be turned on while compiling kernel code - here, it is better to allocate memory in the heap that is not fixed by a hard-coded limit.
alloca is not worse than a variable-length array (VLA), but it's riskier than allocating on the heap.
On x86 (and most often on ARM), the stack grows downwards, and that brings with it a certain amount of risk: if you accidentally write beyond the block allocated with alloca (due to a buffer overflow for example), then you will overwrite the return address of your function, because that one is located "above" on the stack, i.e. after your allocated block.
The consequence of this is two-fold:
The program will crash spectacularly and it will be impossible to tell why or where it crashed (stack will most likely unwind to a random address due to the overwritten frame pointer).
It makes buffer overflow many times more dangerous, since a malicious user can craft a special payload which would be put on the stack and can therefore end up executed.
In contrast, if you write beyond a block on the heap you "just" get heap corruption. The program will probably terminate unexpectedly but will unwind the stack properly, thereby reducing the chance of malicious code execution.
Sadly the truly awesome alloca() is missing from the almost awesome tcc. Gcc does have alloca().
It sows the seed of its own destruction. With return as the destructor.
Like malloc() it returns an invalid pointer on fail which will segfault on modern systems with a MMU (and hopefully restart those without).
Unlike auto variables you can specify the size at run time.
It works well with recursion. You can use static variables to achieve something similar to tail recursion and use just a few others to pass info to each iteration.
If you push too deep you are assured of a segfault (if you have an MMU).
Note that malloc() offers no more as it returns NULL (which will also segfault if assigned) when the system is out of memory. I.e. all you can do is bail or just try to assign it any way.
To use malloc() I use globals and assign them NULL. If the pointer is not NULL I free it before I use malloc().
You can also use realloc() as the general case if you want to copy any existing data. You need to check the pointer beforehand to work out whether you are going to copy or concatenate after the realloc().
3.2.5.2 Advantages of alloca
Actually, alloca is not guaranteed to use the stack.
Indeed, the gcc-2.95 implementation of alloca allocates memory from the heap using malloc itself. Also that implementation is buggy: it may lead to a memory leak and to some unexpected behavior if you call it inside a block with a further use of goto. Not to say that you should never use it, but sometimes alloca leads to more overhead than it relieves you from.
In my opinion, alloca(), where available, should be used only in a constrained manner. Very much like the use of "goto", quite a large number of otherwise reasonable people have strong aversion not just to the use of, but also the existence of, alloca().
For embedded use, where the stack size is known and limits can be imposed via convention and analysis on the size of the allocation, and where the compiler cannot be upgraded to support C99+, use of alloca() is fine, and I've been known to use it.
When available, VLAs may have some advantages over alloca(): The compiler can generate stack limit checks that will catch out-of-bounds access when array style access is used (I don't know if any compilers do this, but it can be done), and analysis of the code can determine whether the array access expressions are properly bounded. Note that, in some programming environments, such as automotive, medical equipment, and avionics, this analysis has to be done even for fixed size arrays, both automatic (on the stack) and static allocation (global or local).
On architectures that store both data and return addresses/frame pointers on the stack (from what I know, that's all of them), any stack allocated variable can be dangerous because the address of the variable can be taken, and unchecked input values might permit all sorts of mischief.
Portability is less of a concern in the embedded space, however it is a good argument against use of alloca() outside of carefully controlled circumstances.
Outside of the embedded space, I've used alloca() mostly inside logging and formatting functions for efficiency, and in a non-recursive lexical scanner, where temporary structures (allocated using alloca()) are created during tokenization and classification, then a persistent object (allocated via malloc()) is populated before the function returns. The use of alloca() for the smaller temporary structures greatly reduces fragmentation when the persistent object is allocated.
Why does no one mention this example introduced by the GNU documentation?
https://www.gnu.org/software/libc/manual/html_node/Advantages-of-Alloca.html
Nonlocal exits done with longjmp (see Non-Local Exits) automatically free the space allocated with alloca when they exit through the function that called alloca. This is the most important reason to use alloca.
Suggested reading order 1->2->3->1:
https://www.gnu.org/software/libc/manual/html_node/Advantages-of-Alloca.html
Intro and Details from Non-Local Exits
Alloca Example
I don't think that anybody has mentioned this, but alloca also has some serious security issues not necessarily present with malloc (though these issues also arise with any stack based arrays, dynamic or not). Since the memory is allocated on the stack, buffer overflows/underflows have much more serious consequences than with just malloc.
In particular, the return address for a function is stored on the stack. If this value gets corrupted, your code could be made to go to any executable region of memory. Compilers go to great lengths to make this difficult (in particular by randomizing address layout). However, this is clearly worse than just a stack overflow since the best case is a SEGFAULT if the return value is corrupted, but it could also start executing a random piece of memory or in the worst case some region of memory which compromises your program's security.
IMO the biggest risk with alloca and variable length arrays is it can fail in a very dangerous manner if the allocation size is unexpectedly large.
Allocations on the stack typically have no checking in user code.
Modern operating systems will generally put a guard page in place below* to detect stack overflow. When the stack overflows the kernel may either expand the stack or kill the process. Linux expanded this guard region in 2017 to be significantly larger than a page, but it's still finite in size.
So as a rule it's best to avoid allocating more than a page on the stack before making use of the previous allocations. With alloca or variable length arrays it's easy to end up allowing an attacker to make arbitrary size allocations on the stack and hence skip over any guard page and access arbitrary memory.
* on most widespread systems today the stack grows downwards.
Most answers here largely miss the point: there's a reason why using _alloca() is potentially worse than merely storing large objects in the stack.
The main difference between automatic storage and _alloca() is that the latter suffers from an additional (serious) problem: the allocated block is not controlled by the compiler, so there's no way for the compiler to optimize or recycle it.
Compare:
while (condition) {
char buffer[0x100]; // Chill.
/* ... */
}
with:
while (condition) {
char* buffer = _alloca(0x100); // Bad!
/* ... */
}
The problem with the latter should be obvious.
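To spell it out: every pass through the second loop takes another 0x100 bytes that is not given back until the enclosing function returns. One hedged way to get run-time sizing without that growth is a single reusable buffer declared outside the loop. The sketch below is C++ rather than C, and the function name and the use of std::vector are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical worker: each entry in `sizes` is the scratch space one
// item needs; returns the total number of scratch bytes touched.
std::size_t process_items(const std::vector<std::size_t>& sizes)
{
    std::vector<char> scratch;  // one buffer, reused by every iteration
    std::size_t total = 0;

    for (std::size_t i = 0; i < sizes.size(); ++i)
    {
        // resize() reuses the existing storage whenever it is already
        // large enough, so memory use is bounded by the largest request
        // rather than by the number of iterations.
        scratch.resize(sizes[i]);
        for (std::size_t j = 0; j < scratch.size(); ++j)
            scratch[j] = 0;
        total += scratch.size();
    }
    return total;
}
```

The trade-off is a heap allocation the first time the buffer grows, in exchange for stack use that no longer depends on the iteration count.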

Prefer heap over stack?

I recently dove into graphics programming and I noticed that many graphics engines (e.g. Ogre), and many coders overall, prefer to initialize class instances dynamically. Here's an example from Ogre Basic Tutorial 1
//...
Ogre::Entity* ogreHead = mSceneMgr->createEntity("Head", "ogrehead.mesh");
Ogre::SceneNode* headNode = mSceneMgr->getRootSceneNode()->createChildSceneNode("HeadNode");
//...
ogreHead and headNode data members and methods are then referred to as ogreHead->blabla.
Why mess around with object pointers instead of plain objects?
BTW, I've also read somewhere that heap memory allocation is much slower than stack memory allocation.
Heap allocation is inevitably much slower than stack allocation. More on "How much slower?" later. However, in many cases, the choice is "made for you", for several reasons:
Stack is limited. And if you run out, the application almost always gets terminated - there is no real good recovery, even printing an error message to say "I ran out of stack" may be hard...
Stack allocation "goes away" when you leave the function where the allocation was made.
Variability is much more well defined and easy to deal with. C++ does not cope with "variable length arrays" very well, and it's certainly not guaranteed to work in all compilers.
How much slower is heap over stack?
We'll get to "and does it matter" in a bit.
For a given allocation, stack allocation is simply a subtract operation [1], where at the very minimum new or malloc will be a function call, and probably even the most simple allocator will be several dozen instructions, in complex cases thousands [because memory has to be gotten from the OS, and cleared of it's previous content]. So anything from a 10x to "infinitely" slower, give or take. Exact numbers will depend on the exact system the code is running in, size of the allocation, and often "previous calls to the allocator" (e.g. a long list of "freed" allocations can make allocating a new object slower, because a good fit has to be searched for). And of course, unless you do the "ostrich" method of heap management, you also need to free the object and cope with "out of memory" which adds more code/time to the execution and complexity of the code.
With some reasonably clever programming, however, this can be mostly hidden - for example, allocating something that stays allocated for a long time, over the lifetime of the object, will be "nothing to worry about". Allocating objects from the heap for every pixel or every triangle in a 3D game would CLEARLY be a bad idea. But if the lifetime of the object is many frames or even the entire game, the time to allocate and free it will be nearly nothing.
Similarly, instead of doing 10000 individual object allocations, make one for 10000 objects. Object pool is one such concept.
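As a minimal sketch of the "one allocation for many objects" idea, the code below contrasts per-object allocation with a single block that holds every object. The Tri type and both function names are made up for illustration and are not taken from any particular engine.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical payload type, used only for illustration.
struct Tri {
    float x[3];
    float y[3];
};

// One heap allocation per object: n separate trips into the allocator
// (plus one for the vector that holds the pointers).
std::vector<std::unique_ptr<Tri>> make_individually(std::size_t n) {
    std::vector<std::unique_ptr<Tri>> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(std::unique_ptr<Tri>(new Tri()));
    }
    return out;
}

// One heap allocation for all objects: the vector owns a single
// contiguous block of n value-initialized Tri objects.
std::vector<Tri> make_in_one_block(std::size_t n) {
    return std::vector<Tri>(n);
}
```

Both functions hand back n objects; the second one simply asks the allocator once, and the objects end up next to each other in memory.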
Further, often the allocation time isn't where the time is spent. For example, reading a triangle list from a file on disk will take much longer than allocating the space for the same triangle list - even if you allocate each single one!
To me, the rule is:
Does it fit nicely on the stack? Typically a few kilobytes is fine, many kilobytes not so good, and megabytes definitely not ok.
Is the number (e.g. array of objects) known, and the maximum such that you can fit it on the stack?
Do you know what the object will be? In other words abstract/polymorphic classes will probably need to be allocated on the heap.
Is its lifetime the same as the scope it is in? If not, use the heap (or stack further down, and pass it up the stack).
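The checklist above might translate into code roughly like this. It is only a sketch: the function names, the sizes, and the Shape hierarchy are invented for the example, and the "about 1 KB" figure assumes a 4-byte int.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// Small, fixed size, lifetime equals the scope: a local array is fine.
int sum_small() {
    std::array<int, 256> scratch{};  // about 1 KB of stack
    std::iota(scratch.begin(), scratch.end(), 1);
    return std::accumulate(scratch.begin(), scratch.end(), 0);
}

// Count only known at run time and possibly large: use the heap via std::vector.
long long sum_large(std::size_t n) {
    std::vector<long long> data(n);
    std::iota(data.begin(), data.end(), 1LL);
    return std::accumulate(data.begin(), data.end(), 0LL);
}

// Concrete type chosen at run time: allocate on the heap and hold it
// through a base-class pointer.
struct Shape {
    virtual ~Shape() {}
    virtual int sides() const = 0;
};
struct Triangle : Shape { int sides() const override { return 3; } };
struct Square : Shape { int sides() const override { return 4; } };

std::unique_ptr<Shape> make_shape(bool triangle) {
    if (triangle) return std::unique_ptr<Shape>(new Triangle());
    return std::unique_ptr<Shape>(new Square());
}
```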
[1] Or an add if the stack "grows towards high addresses" - I don't know of a machine which has such an architecture, but it is conceivable and I think some have been made. C certainly makes no promises as to which way the stack grows, or anything else about how the runtime stack works.
The scope of the stack is limited: it only exists within a function. Now, modern user-interfacing programs are usually event driven, which means that a function of yours is invoked to handle an event, and then that function must return in order for the program to continue running. So, if your event handler function wishes to create an object which will remain in existence after the function has returned, clearly, that object cannot be allocated on the stack of that function, because it will cease to exist as soon as the function returns. That's the main reason why we allocate things on the heap.
There are other reasons, too.
Sometimes, the exact size of a class is not known during compilation time. If the exact size of a class is not known, it cannot be created on the stack, because the compiler needs to have precise knowledge of how much space it needs to allocate for each item on the stack.
Furthermore, factory methods like whatever::createEntity() are often used. If you have to invoke a separate method to create an object for you, then that object cannot be created on the stack, for the reason explained in the first paragraph of this answer.
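To make the lifetime argument concrete, here is a small sketch of a factory whose result outlives the call that created it. The Entity struct and create_entity are hypothetical stand-ins, not Ogre's actual classes, and returning std::unique_ptr is my choice for the example (Ogre itself hands out raw pointers owned by the scene manager).

```cpp
#include <memory>
#include <string>

// Hypothetical entity type; not Ogre's actual class.
struct Entity {
    std::string name;
    explicit Entity(const std::string& n) : name(n) {}
};

// Broken variant, shown only as a comment: returning the address of a
// local leaves the caller with a dangling pointer.
//   Entity* bad() { Entity e("Head"); return &e; }

// Factory: the object is created on the heap, so it survives the return.
std::unique_ptr<Entity> create_entity(const std::string& name) {
    return std::unique_ptr<Entity>(new Entity(name));
}
```

The caller can keep the returned pointer for as long as it likes, for instance across many event-handler invocations; the object is freed when the owning pointer is destroyed.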
Why pointers instead of objects?
Because pointers help make things fast. If you pass an object by value, to another function, for example
shoot(Ogre::Entity ogre)
instead of
shoot(Ogre::Entity* ogrePtr)
If ogre isn't a pointer, you are passing a copy of the whole object into the function, rather than a reference to it. If the compiler doesn't optimize the copy away, you are left with an inefficient program. There are other reasons too: with the pointer, you can modify the passed-in object (some argue references are better, but that's a different discussion). Otherwise you would spend too much time copying modified objects back and forth.
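The difference can be observed directly by counting copies. In this sketch, Entity is a hypothetical stand-in for a heavyweight object (it is not Ogre's class), and the three shoot_* functions are named only for the example.

```cpp
// Hypothetical stand-in for a heavyweight object; counts how often it is copied.
struct Entity {
    static int copies;
    int health;
    explicit Entity(int h) : health(h) {}
    Entity(const Entity& other) : health(other.health) { ++copies; }
};
int Entity::copies = 0;

// By value: the callee works on its own copy, so the caller's object is unchanged.
void shoot_by_value(Entity target) { target.health -= 10; }

// By pointer: no copy is made and the caller's object is modified.
void shoot_by_pointer(Entity* target) { target->health -= 10; }

// By reference: also no copy, and there is no null state to check.
void shoot_by_reference(Entity& target) { target.health -= 10; }
```

Calling shoot_by_value on an existing Entity bumps Entity::copies by one and leaves the original's health alone; the pointer and reference versions change the original and copy nothing.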
Why heap?
In some sense the heap is a safer type of memory to access and allows you to safely reset/recover. If you call new and there is no memory left, you can flag that as an error. If you are using the stack, there is actually no good way to know you have caused a stack overflow without some other supervising program, at which point you are already in the danger zone.
It depends on your application. The stack has local scope, so when the object goes out of scope, its memory is deallocated. If you need the object in some other function after that, there is no real way to keep it.
This applies more to the OS: the heap is comparatively much larger than the stack, especially in multi-threaded applications where each thread can have a limited stack size.
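The first point, that a failed heap allocation can be detected and handled, can be sketched as follows. The function names are invented for the example; std::nothrow is the standard way to make new return a null pointer on failure instead of throwing std::bad_alloc.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Returns an owning pointer to `bytes` chars, or an empty pointer when the
// allocation fails.
std::unique_ptr<char[]> try_allocate(std::size_t bytes) {
    return std::unique_ptr<char[]>(new (std::nothrow) char[bytes]);
}

// Caller-side check: running out of heap is an ordinary, reportable condition.
bool fill_buffer(std::size_t bytes, char value) {
    std::unique_ptr<char[]> buf = try_allocate(bytes);
    if (!buf) {
        return false;  // report the error; the program keeps running
    }
    for (std::size_t i = 0; i < bytes; ++i) buf[i] = value;
    return true;
}
```

There is no comparable portable check for a local array: if the stack is too small, the program typically dies at the point of use.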

Does a stack frame really get pushed onto the stack when a function is called?

The way I've been taught for quite some time is that when I run a program, the first thing that immediately goes on the stack is a stack frame for the main method. And if I call a function called foo() from within main, then a stack frame that is the size of the local variables (automatic objects) and the parameters gets pushed onto the stack as well.
However, I've run into a couple of things that contradict this. And I'm hoping someone can clear up my confusion or explain why there really aren't any contradictions.
First contradiction:
In the book, "The C++ Programming Language" 3rd edition by Bjarne Stroustrup, it says on page 244, "A named automatic object is created each time its declaration is encountered in the execution of the program." If that's not clear enough, on the next page it says, "The constructor for a local variable is executed each time the thread of control passes through the declaration of the local variable."
Does this mean that the total memory for a stack frame is not allocated all at once, but rather block by block as the variable declarations are encountered ?
Also, does this mean that a stack frame may not be the same size every time if a variable declaration was not encountered due to an if statement ?
Second contradiction:
I've done a little coding in assembly ( ARM to be specific ), and the way my class was taught was that when a function was called, we immediately used the registers and never pushed any of the local variables of the current function onto the stack unless the algorithm was not possible to perform with the limited amount of registers. And even then, we only pushed the leftover variables.
Does this mean when a function is called, a stack frame may not be created at all ?
Does this also imply that a stack frame may differ in size due to the use of registers ?
Regarding your first question:
The creation of the object has nothing to do with the allocation of the data itself. To be more specific: the fact that the object has its reserved space on the stack doesn't imply anything about when its constructor is called.
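A small experiment shows the distinction. In this sketch the Tracer type, the events log, and demo() are all invented for illustration; the point is only that constructors run when control reaches the declaration, whatever the compiler does about reserving space.

```cpp
#include <string>
#include <vector>

// Records the order of events so the timing is observable.
std::vector<std::string> events;

struct Tracer {
    explicit Tracer(const char* name) { events.push_back(std::string("ctor ") + name); }
    ~Tracer() { events.push_back("dtor"); }
};

void demo(bool take_branch) {
    events.push_back("enter");
    // Storage for a and b may already be reserved here (that is up to the
    // compiler), but neither constructor has run yet.
    Tracer a("a");
    if (take_branch) {
        Tracer b("b");  // constructed only if control reaches this line
    }
    events.push_back("leave");
}
```

With take_branch false the log reads enter, ctor a, leave, dtor; with it true, ctor b and its dtor appear between ctor a and leave.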
Does this mean that the total memory for a stack frame is not allocated all at once, but rather block by block as the variable declarations are encountered?
This question is really compiler specific. A stack pointer is just a pointer; how it is used by the binary is up to the compiler. Some compilers may reserve the whole activation record at once, some may reserve it little by little, others may reserve it dynamically according to the specific invocation, and so on. This is also tightly coupled with optimization, so the compiler is able to arrange things in the way it thinks is best.
Does this mean when a function is called, a stack frame may not be created at all ? Does this also imply that a stack frame may differ in size due to the use of registers ?
Again, there is no strict answer here. Usually compilers rely on register allocation algorithms that are able to allocate registers in a way that minimizes "spilled" (on stack) variables. Of course, if you are writing in assembly by hand, you can decide to assign specific registers to specific variables throughout your program just because you know by their content how you want to make it work.
A compiler can't guess this, but it can see when a variable starts to be used or is no longer needed and arrange things in a way that minimizes memory accesses (and so stack size). For example, it could implement a policy such that some registers should be saved by the caller and some others by the callee, and assign variables to registers accordingly.
Constructing a C++ object has very little to do with acquiring memory for the object. In fact, it would be more accurate to say "reserving memory", since in general, computers do not have little teams of RAM-builders which spring into action every time you ask for a new object. The memory is more or less permanent (although we could quibble about VM). Of course, the compiler has to arrange for its program to only use a particular range of memory for one thing at a time. That may (and probably does) require it to reserve a range of memory prior to the object's existence, and avoid using it for other objects until some time after the object's disappearance. For efficiency, the compiler may (even in the case of objects with dynamic storage duration) optimize reservations by reserving several blocks of memory at once, if it knows it will need them. In any event, when C++ talks about "constructing an object", it means just that: taking a range of memory with undefined contents, and doing what is necessary to create the representation of the object (and whatever else in the state of the world is implied by the creation of the object, which might not be limited to a particular hunk of memory.)
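The separation between reserving memory and constructing an object can be made explicit with placement new. This is a sketch only: Counter, Observed, and reserve_then_construct are hypothetical names, and the static counter exists just to make the object's lifetime visible.

```cpp
#include <new>

// Hypothetical type that tracks how many instances currently exist.
struct Counter {
    static int alive;
    int value;
    explicit Counter(int v) : value(v) { ++alive; }
    ~Counter() { --alive; }
};
int Counter::alive = 0;

struct Observed { int before; int during; int after; int value; };

Observed reserve_then_construct() {
    Observed o;
    // Step 1: reserve suitably aligned raw storage. No Counter exists yet.
    alignas(Counter) unsigned char storage[sizeof(Counter)];
    o.before = Counter::alive;

    // Step 2: construct an object in that storage.
    Counter* c = new (storage) Counter(42);
    o.during = Counter::alive;
    o.value = c->value;

    // Step 3: end the object's lifetime; the storage itself stays reserved
    // until the function returns.
    c->~Counter();
    o.after = Counter::alive;
    return o;
}
```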
There is no requirement for stack frames to exist. There is no requirement for a stack to exist. That's all up to the compiler. Most compilers do generate code which uses a stack, of course, and good compilers will figure out when it is possible to abbreviate or even omit a stack frame. So, yes, frames may vary in size.
You are absolutely right, a stack frame is not required. Stack frames are a quick and dirty solution to the problem of managing the local space, and they are easier to debug than tracking changes in the stack pointer during the course of the function. If there is a need for the stack within the function, it is easier to just adjust the stack pointer on entry and restore it on return.
This is also not black and white. Compilers are programs like any other program, and if you don't already know, you will come to realize that given any number of programmers you are going to get multiple solutions to the same problem. Even if the number of programmers is one, that one person may choose to solve the problem over and over again until they are satisfied, and/or for whatever reason may choose to release the various versions. The use of the stack is very common for local variables, it is really how you do it, but that does not mean you have to use a stack frame created on entry and restored on return.
And as you have learned in your classes, and as is very easy to see through experiments (compile some simple functions with various levels of optimization, from none to some), gcc for example won't use the stack unless it has to. We are talking about ARM here, right, where the normal calling convention is register based (there is nothing that says the compiler author(s) have to follow that convention; it is possible to use a stack-based convention on ARM if a compiler chooses to do that). On processors where the normal convention is stack based, the code is already dealing with the stack, so the compiler may choose to use a stack frame anyway. It is likely that in those cases the stack-based convention is used because the processor lacks general purpose registers and relies more on the stack than processors with more registers, which means that processor likely needs the stack often, not just for the calling convention but for most of the local storage.

No stack allocation whole program compilation?

If you write an app that is:
Single threaded
Has no cycles in call graph
Doesn't use alloca or VLAs
Can modern whole program optimizing compilers optimize away all stack allocation (e.g. GCC, MSVC, ICC)? It seems like in those circumstances it should be able to allocate all possible stack space statically. By 'whole program' I mean the compiler has access to /all/ source code (no possibility of dlopen'ing things at runtime, etc.).
If you can guarantee the conditions you stated, then yes: it would be possible to effectively have the stack be completely statically allocated. Each function would have a block of stack memory.
However, will actual compilers do it? No.
It gains absolutely nothing to do so. Indeed, it may gain less than nothing. Often times, much of the working stack is in the cache, so modifications to it are pretty cheap. If the stack were in static memory, then the only time any particular function's "stack" memory would be cached would be if you had recently called that function. Using a real stack, you're more likely to be working in the cache.
Furthermore, giving each function a block of stack memory can easily make your program's static memory usage much larger than it needs to be. The stack is a fixed-size construct; no matter how many functions you have, the stack takes up a certain size. If you have 100,000 functions, and each function takes up 64 bytes of space, then your static "stack" must take up ~6.4MB of space.
Why? You're never going to be using most of that memory at any one time. The program would run just fine in a 1MB or even 512KB stack; why take up 6x that memory for nothing?
So it is not a performance optimization, and it can bloat your program's memory.
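The memory argument can be put in numbers with a toy model. The CallGraph struct and both functions below are hypothetical and only model the idea: with static allocation every function's block is counted, whereas a real stack only ever needs the most expensive single call chain.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model of an acyclic call graph: function i needs frame[i]
// bytes of locals and may call every function listed in calls[i].
struct CallGraph {
    std::vector<std::size_t> frame;
    std::vector<std::vector<std::size_t>> calls;
};

// Static scheme: every function owns its block permanently, so the sizes add up.
std::size_t static_bytes(const CallGraph& g) {
    std::size_t total = 0;
    for (std::size_t f : g.frame) total += f;
    return total;
}

// Real stack: only one call chain is live at a time, so the requirement is
// the most expensive path starting at `fn`. Plain recursion without
// memoization, which is fine for a small illustrative graph.
std::size_t stack_bytes(const CallGraph& g, std::size_t fn) {
    std::size_t deepest = 0;
    for (std::size_t callee : g.calls[fn]) {
        deepest = std::max(deepest, stack_bytes(g, callee));
    }
    return g.frame[fn] + deepest;
}
```

For four 64-byte functions where function 0 calls 1 and 2, and 1 calls 3, the static scheme needs 256 bytes while the deepest chain (0, 1, 3) needs 192. The gap grows with the number of functions, which is the 100,000 x 64 bytes = 6.4 MB case described above.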
This is a comment that's too long to be a comment:
Note that while all stack allocations may be theoretically optimized away, more may be allocated than necessary. This is not what the OP was asking, but it could be interesting to consider. Finding the minimum-sized allocation required would be equivalent to solving the halting problem. Imagine a program structured as:
<do 'something'>
<call last thing which happens to require more
stack space than everything else in 'something'>
You only need the additional stack space if <do 'something'> "halts".
You can also imagine other variations where optimizing becomes arbitrarily hard. For example, your program could simply evaluate a 3SAT expression with user input and do something depending on that -- but that 3SAT expression may or may not have any value that results in true.
Perhaps there is a more trivial case: The user may simply never enter input that requires more stack space for processing.
It's possible for a compiler to do this, but it would be such a specific optimization that it probably wouldn't.
If you had a program that was completely inlined, you would take care of the overhead of setting up stack frames for function calls.
However, if you wanted to also get rid of stack allocations for local variables, the compiler has to transform those local variables into global variables. No compiler I know of does that, and on some platforms it takes extra instructions to reference a global variable compared to a local variable (as the address must be loaded with two instructions rather than one). Plus, since referencing a stack variable is such a common operation, it's usually encoded into a smaller instruction.
Unless you add "doesn't use any external libraries" to your list, any calls to external functions will require stack setup because they would have been compiled expecting the calling code to pass its parameters in a particular way, most likely on the stack. Additionally such libraries would almost certainly have to adjust the stack themselves for their own locals.
Additionally, depending on your precise application, even if you know that the stack could be allocated statically, it may be very hard for the compiler to know that there aren't any callbacks into your code, etc, that would cause a need for stack allocation.
I just can't see a case where a compiler would attempt this optimization, because allocating stack space is trivially fast already (a couple of register manipulations, I believe).

Calling the constructor of a large array of objects on a stack

I'm modifying some C++ source code and I've noticed the author really went out of their way to allocate everything on the stack. Most likely for the deallocation benefits (are there any performance benefits as well??).
I want to keep the same consistency but I need to create a large array of objects and something like:
Object os[1000] = {Object(arg), Object(arg), ....};
isn't going to cut it. Searching around it seems like a way around this is just:
vector<Object> os(1000, Object(arg));
This still allocates on the heap but deallocates like a stack (from what I've read in other posts). I'm just wondering are there any other options because this just seems like a syntax issue. Perhaps a clever #define people know.
The stack shouldn't be used for large blocks of memory. You simply have to pay the higher price of heap allocation in exchange for the benefit of accessing more memory. Another option is declaring an array with static storage duration, but that has other drawbacks (not re-entrant, not thread-safe). Everything is a tradeoff.
In any case, when allocating complex objects, the cost of calling 1000 constructors will dwarf the time spent in the allocator. Just use std::vector unless you have profiler data that shows a performance problem.
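For reference, the std::vector route looks like this as a self-contained sketch. Object here is a hypothetical stand-in with no default constructor, and make_objects is a name invented for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type with no default constructor, which is what
// rules out a plain `Object os[1000];`.
struct Object {
    int arg;
    explicit Object(int a) : arg(a) {}
};

// Builds n copies of Object(arg). The elements live on the heap, but the
// vector that owns them is an ordinary local in the caller, so they are
// destroyed automatically when it goes out of scope.
std::vector<Object> make_objects(std::size_t n, int arg) {
    return std::vector<Object>(n, Object(arg));
}
```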
Yes, there are other options. You can use something like alloca. This will get you stack allocation and automatic free, but not automatic construction or destruction. You would need to use placement new and explicit invocation of the destructors.
Yes, there may be a performance advantage, but you're also begging to blow the stack, and this pattern is not exception safe like the vector solution would be (that is, if the object you're allocating has a non-trivial destructor).
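A sketch of the placement-new pattern follows. Note the substitution: instead of alloca, which is non-standard, it uses a fixed-size aligned local buffer, which gives the same "raw stack storage, manual construction and destruction" situation in portable C++. Object and sum_on_stack are hypothetical names, and the try/catch is there to address the exception-safety concern mentioned above.

```cpp
#include <cstddef>
#include <new>

// Hypothetical element type that tracks how many instances are alive.
struct Object {
    static int alive;
    int arg;
    explicit Object(int a) : arg(a) { ++alive; }
    Object(const Object& o) : arg(o.arg) { ++alive; }
    ~Object() { --alive; }
};
int Object::alive = 0;

// Constructs N objects in a local buffer, sums their args, then destroys them.
template <std::size_t N>
int sum_on_stack(int arg) {
    alignas(Object) unsigned char raw[N * sizeof(Object)];  // raw stack storage
    Object* objs = reinterpret_cast<Object*>(raw);

    std::size_t built = 0;
    try {
        for (; built < N; ++built) new (objs + built) Object(arg);  // placement new
    } catch (...) {
        while (built > 0) objs[--built].~Object();  // undo the ones that succeeded
        throw;
    }

    int sum = 0;
    for (std::size_t i = 0; i < N; ++i) sum += objs[i].arg;

    for (std::size_t i = N; i > 0; --i) objs[i - 1].~Object();  // explicit destruction
    return sum;
}
```

All of the bookkeeping here is what std::vector does for you, which is a good argument for the vector unless a profiler says otherwise.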
Allocating large amounts of data on the stack is, generally speaking, a bad idea. The stack on most operating systems is a scratch space and fairly limited in size. Allocating a large amount of stack space for objects can quickly consume all your available stack space, resulting in a segfault or other exception when something attempts to allocate just one more thing on the stack (for instance, a return address for a function call).
As far as other options go, you have a few. std::vector, as you've already noticed, and boost::array are two such examples.
This ought to work:
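Object os[1000];
os[0] = Object(args);
std::fill(os + 1, os + 1000, os[0]);  // std::fill is declared in <algorithm>
This creates the array (Object must be default-constructible), assigns the first element, then copy-assigns it to every remaining element. std::fill is used rather than a self-overlapping std::copy, whose destination must not start inside the source range.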
Of course, you probably shouldn't use this. It seems like a bad idea even if it works, and even if Object os[1000] doesn't cause you problems.