Is this a good place to allocate memory with alloca() [duplicate]

alloca() allocates memory on the stack rather than on the heap, as in the case of malloc(). So, when I return from the routine the memory is freed. So, actually this solves my problem of freeing up dynamically allocated memory. Freeing of memory allocated through malloc() is a major headache and if somehow missed leads to all sorts of memory problems.
Why is the use of alloca() discouraged in spite of the above features?

The answer is right there in the man page (at least on Linux):
RETURN VALUE
The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behaviour is undefined.
Which isn't to say it should never be used. One of the OSS projects I work on uses it extensively, and as long as you're not abusing it (alloca'ing huge values), it's fine. Once you go past the "few hundred bytes" mark, it's time to use malloc and friends, instead. You may still get allocation failures, but at least you'll have some indication of the failure instead of just blowing out the stack.
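To make the "few hundred bytes" rule concrete, here is a minimal sketch of the usual pattern: take the scratch buffer from the stack when it is small and fall back to malloc when it is not. The cutoff SMALL_BUF_MAX and the function count_vowels are made up for illustration, and <alloca.h> assumes a glibc-style system (MSVC spells it _alloca in <malloc.h>).

```c
#include <alloca.h>   /* assumption: glibc-style header */
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_BUF_MAX 256   /* hypothetical cutoff for "a few hundred bytes" */

/* Counts vowels in src using a lower-cased scratch copy.
   Returns -1 if the heap fallback cannot allocate. */
int count_vowels(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src) + 1;
    int on_heap = (n > SMALL_BUF_MAX);
    char *buf;

    if (on_heap) {
        buf = malloc(n);
        if (buf == NULL)
            return -1;          /* malloc can report failure */
    } else {
        buf = alloca(n);        /* small: lives until this function returns */
    }

    for (size_t i = 0; i < n; i++)
        buf[i] = (char)tolower((unsigned char)src[i]);

    int count = 0;
    for (const char *p = buf; *p != '\0'; p++)
        if (strchr("aeiou", *p) != NULL)
            count++;

    if (on_heap)
        free(buf);              /* only the heap path needs freeing */
    return count;
}
```

The price is that you have to remember which path you took, which is exactly the bookkeeping alloca was supposed to save you.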

One of the most memorable bugs I had was to do with an inline function that used alloca. It manifested itself as a stack overflow (because it allocates on the stack) at random points of the program's execution.
In the header file:
inline void DoSomething() {
    wchar_t* pStr = (wchar_t*)alloca(100);  // 100 bytes of stack
    //......
}
In the implementation file:
void Process() {
    for (int i = 0; i < 1000000; i++) {
        DoSomething();
    }
}
So what happened was the compiler inlined DoSomething function and all the stack allocations were happening inside Process() function and thus blowing the stack up. In my defence (and I wasn't the one who found the issue; I had to go and cry to one of the senior developers when I couldn't fix it), it wasn't straight alloca, it was one of ATL string conversion macros.
So the lesson is - do not use alloca in functions that you think might be inlined.
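If you cannot avoid alloca in a small helper, one way to apply this lesson is to forbid inlining explicitly. The sketch below assumes GCC or Clang, where __attribute__((noinline)) is available as an extension; format_once and total_digits are hypothetical names.

```c
#include <alloca.h>   /* assumption: glibc-style header */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* noinline (a GCC/Clang extension) keeps the alloca in this function's
   own frame, so the 32 bytes are released on every return. */
__attribute__((noinline))
static size_t format_once(int value)
{
    char *buf = alloca(32);
    int written = snprintf(buf, 32, "%d", value);
    return written < 0 ? 0 : (size_t)written;
}

/* Calls the helper in a long loop; stack use stays flat because each
   call gives its buffer back before the next one starts. */
size_t total_digits(int iterations)
{
    assert(iterations >= 0);
    size_t total = 0;
    for (int i = 0; i < iterations; i++)
        total += format_once(i);
    return total;
}
```

Without the attribute, a compiler that inlined format_once into the loop would stack up 32 bytes per iteration, which is the bug described above.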

Old question but nobody mentioned that it should be replaced by variable length arrays.
char arr[size];
instead of
char *arr=alloca(size);
It's in the standard C99 and existed as compiler extension in many compilers.
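For what it's worth, a minimal sketch of the VLA form in a real function might look like the following. Note that VLAs became an optional feature in C11, a VLA must not have size zero, and is_palindrome is just an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the n ints in src read the same forwards and backwards. */
int is_palindrome(const int *src, size_t n)
{
    assert(src != NULL);
    if (n == 0)
        return 1;               /* a VLA must have a nonzero size */

    int reversed[n];            /* VLA: storage ends with the enclosing block */
    for (size_t i = 0; i < n; i++)
        reversed[i] = src[n - 1 - i];

    for (size_t i = 0; i < n; i++)
        if (reversed[i] != src[i])
            return 0;
    return 1;
}
```

Like alloca, this does no overflow checking, so n still has to be known to be small.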

alloca() is very useful if you can't use a standard local variable because its size would need to be determined at runtime and you can absolutely guarantee that the pointer you get from alloca() will NEVER be used after this function returns.
You can be fairly safe if you
do not return the pointer, or anything that contains it.
do not store the pointer in any structure allocated on the heap
do not let any other thread use the pointer
The real danger comes from the chance that someone else will violate these conditions sometime later. With that in mind it's great for passing buffers to functions that format text into them :)
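As a sketch of that text-formatting use: measure the output first, allocate exactly that much on the stack, format, and hand the text off before returning. The function log_pair and the 255-byte cap are assumptions for illustration; the point is that the pointer is never returned, stored, or shared.

```c
#include <alloca.h>   /* assumption: glibc-style header */
#include <assert.h>
#include <stdio.h>

/* Writes "key=value" plus a newline to out.
   Returns the length of the formatted text (without the newline), or -1. */
int log_pair(FILE *out, const char *key, int value)
{
    assert(out != NULL && key != NULL);

    int len = snprintf(NULL, 0, "%s=%d", key, value);   /* measure only */
    if (len < 0 || len > 255)       /* hypothetical cap on stack use */
        return -1;

    char *line = alloca((size_t)len + 1);   /* never escapes this function */
    snprintf(line, (size_t)len + 1, "%s=%d", key, value);
    fputs(line, out);
    fputc('\n', out);
    return len;
}
```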

As noted in this newsgroup posting, there are a few reasons why using alloca can be considered difficult and dangerous:
Not all compilers support alloca.
Some compilers interpret the intended behaviour of alloca differently, so portability is not guaranteed even between compilers that support it.
Some implementations are buggy.

One issue is that it isn't standard, although it's widely supported. Other things being equal, I'd always use a standard function rather than a common compiler extension.

still alloca use is discouraged, why?
I don't perceive such a consensus. Lots of strong pros; a few cons:
C99 provides variable length arrays, which would often be used preferentially as the notation's more consistent with fixed-length arrays and intuitive overall
many systems have less overall memory/address-space available for the stack than they do for the heap, which makes the program slightly more susceptible to memory exhaustion (through stack overflow): this may be seen as a good or a bad thing - one of the reasons the stack doesn't automatically grow the way heap does is to prevent out-of-control programs from having as much adverse impact on the entire machine
when used in a more local scope (such as a while or for loop) or in several scopes, the memory accumulates per iteration/scope and is not released until the function exits: this contrasts with normal variables defined in the scope of a control structure (e.g. for (int i = 0; i < 2; ++i) { X } would accumulate alloca-ed memory requested at X, but memory for a fixed-sized array would be recycled per iteration).
modern compilers typically do not inline functions that call alloca, but if you force them then the alloca will happen in the callers' context (i.e. the stack won't be released until the caller returns)
a long time ago alloca transitioned from a non-portable feature/hack to a Standardised extension, but some negative perception may persist
the lifetime is bound to the function scope, which may or may not suit the programmer better than malloc's explicit control
having to use malloc encourages thinking about the deallocation - if that's managed through a wrapper function (e.g. WonderfulObject_DestructorFree(ptr)), then the function provides a point for implementation clean up operations (like closing file descriptors, freeing internal pointers or doing some logging) without explicit changes to client code: sometimes it's a nice model to adopt consistently
in this pseudo-OO style of programming, it's natural to want something like WonderfulObject* p = WonderfulObject_AllocConstructor(); - that's possible when the "constructor" is a function returning malloc-ed memory (as the memory remains allocated after the function returns the value to be stored in p), but not if the "constructor" uses alloca
a macro version of WonderfulObject_AllocConstructor could achieve this, but "macros are evil" in that they can conflict with each other and non-macro code and create unintended substitutions and consequent difficult-to-diagnose problems
missing free operations can be detected by Valgrind, Purify etc. but missing "destructor" calls can't always be detected at all - one very tenuous benefit in terms of enforcement of intended usage; some alloca() implementations (such as GCC's) use an inlined macro for alloca(), so runtime substitution of a memory-usage diagnostic library isn't possible the way it is for malloc/realloc/free (e.g. Electric Fence)
some implementations have subtle issues: for example, from the Linux manpage:
On many systems alloca() cannot be used inside the list of arguments of a function call, because the stack space reserved by alloca() would appear on the stack in the middle of the space for the function arguments.
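A minimal sketch of how to stay clear of that manpage caveat: give the alloca call its own statement and pass the resulting pointer along afterwards. The name copy_len is hypothetical, and the sketch assumes src is a short string.

```c
#include <alloca.h>   /* assumption: glibc-style header */
#include <assert.h>
#include <string.h>

/* Returns the length of a stack copy of src (assumed to be short). */
size_t copy_len(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src) + 1;

    /* Risky on some systems, per the manpage:
           strlen(memcpy(alloca(n), src, n));
       because the allocation would land in the middle of the argument area. */

    char *tmp = alloca(n);      /* allocate in a statement of its own first */
    memcpy(tmp, src, n);        /* then pass the pointer as an argument */
    return strlen(tmp);
}
```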
I know this question is tagged C, but as a C++ programmer I thought I'd use C++ to illustrate the potential utility of alloca: the code below (and here at ideone) creates a vector tracking differently sized polymorphic types that are stack allocated (with lifetime tied to function return) rather than heap allocated.
#include <alloca.h>
#include <iostream>
#include <new>      // placement new
#include <vector>

struct Base
{
    virtual ~Base() { }
    virtual int to_int() const = 0;
};

struct Integer : Base
{
    Integer(int n) : n_(n) { }
    int to_int() const { return n_; }
    int n_;
};

struct Double : Base
{
    Double(double n) : n_(n) { }
    int to_int() const { return -n_; }
    double n_;
};

// Forced inlining makes each alloca happen in main()'s frame.
inline Base* factory(double d) __attribute__((always_inline));

inline Base* factory(double d)
{
    if ((double)(int)d != d)
        return new (alloca(sizeof(Double))) Double(d);
    else
        return new (alloca(sizeof(Integer))) Integer(d);
}

int main()
{
    std::vector<Base*> numbers;
    numbers.push_back(factory(29.3));
    numbers.push_back(factory(29));
    numbers.push_back(factory(7.1));
    numbers.push_back(factory(2));
    numbers.push_back(factory(231.0));
    for (std::vector<Base*>::const_iterator i = numbers.begin();
         i != numbers.end(); ++i)
    {
        std::cout << *i << ' ' << (*i)->to_int() << '\n';
        (*i)->~Base();   // optionally / else Undefined Behaviour iff the
                         // program depends on side effects of destructor
    }
}

Lots of interesting answers to this "old" question, even some relatively new answers, but I didn't find any that mention this....
When used properly and with care, consistent use of alloca()
(perhaps application-wide) to handle small variable-length allocations
(or C99 VLAs, where available) can lead to lower overall stack
growth than an otherwise equivalent implementation using oversized
local arrays of fixed length. So alloca() may be good for your stack if you use it carefully.
I found that quote in.... OK, I made that quote up. But really, think about it....
@j_random_hacker is very right in his comments under other answers: Avoiding the use of alloca() in favor of oversized local arrays does not make your program safer from stack overflows (unless your compiler is old enough to allow inlining of functions that use alloca() in which case you should upgrade, or unless you use alloca() inside loops, in which case you should... not use alloca() inside loops).
I've worked on desktop/server environments and embedded systems. A lot of embedded systems don't use a heap at all (they don't even link in support for it), for reasons that include the perception that dynamically allocated memory is evil due to the risks of memory leaks on an application that never ever reboots for years at a time, or the more reasonable justification that dynamic memory is dangerous because it can't be known for certain that an application will never fragment its heap to the point of false memory exhaustion. So embedded programmers are left with few alternatives.
alloca() (or VLAs) may be just the right tool for the job.
I've seen time & time again where a programmer makes a stack-allocated buffer "big enough to handle any possible case". In a deeply nested call tree, repeated use of that (anti-?)pattern leads to exaggerated stack use. (Imagine a call tree 20 levels deep, where at each level for different reasons, the function blindly over-allocates a buffer of 1024 bytes "just to be safe" when generally it will only use 16 or less of them, and only in very rare cases may use more.) An alternative is to use alloca() or VLAs and allocate only as much stack space as your function needs, to avoid unnecessarily burdening the stack. Hopefully when one function in the call tree needs a larger-than-normal allocation, others in the call tree are still using their normal small allocations, and the overall application stack usage is significantly less than if every function blindly over-allocated a local buffer.
But if you choose to use alloca()...
Based on other answers on this page, it seems that VLAs should be safe (they don't compound stack allocations if called from within a loop), but if you're using alloca(), be careful not to use it inside a loop, and make sure your function can't be inlined if there's any chance it might be called within another function's loop.

All of the other answers are correct. However, if the thing you want to alloc using alloca() is reasonably small, I think that it's a good technique that's faster and more convenient than using malloc() or otherwise.
In other words, alloca( 0x00ffffff ) is dangerous and likely to cause overflow, exactly as much as char hugeArray[ 0x00ffffff ]; is. Be cautious and reasonable and you'll be fine.

I don't think anyone has mentioned this: Use of alloca in a function will hinder or disable some optimizations that could otherwise be applied in the function, since the compiler cannot know the size of the function's stack frame.
For instance, a common optimization by C compilers is to eliminate use of the frame pointer within a function: frame accesses are made relative to the stack pointer instead, so there's one more register for general use. But if alloca is called within the function, the difference between sp and fp will be unknown for part of the function, so this optimization cannot be done.
Given the rarity of its use, and its shady status as a standard function, compiler designers quite possibly disable any optimization that might cause trouble with alloca, if it would take more than a little effort to make it work with alloca.
UPDATE:
Since variable-length local arrays have been added to C, and since these present very similar code-generation issues to the compiler as alloca, I see that 'rarity of use and shady status' does not apply to the underlying mechanism; but I would still suspect that use of either alloca or VLA tends to compromise code generation within a function that uses them. I would welcome any feedback from compiler designers.

Everyone has already pointed out the big thing which is potential undefined behavior from a stack overflow but I should mention that the Windows environment has a great mechanism to catch this using structured exceptions (SEH) and guard pages. Since the stack only grows as needed, these guard pages reside in areas that are unallocated. If you allocate into them (by overflowing the stack) an exception is thrown.
You can catch this SEH exception and call _resetstkoflw to reset the stack and continue on your merry way. It's not ideal but it's another mechanism to at least know something has gone wrong when the stuff hits the fan. *nix might have something similar that I'm not aware of.
I recommend capping your max allocation size by wrapping alloca and tracking it internally. If you were really hardcore about it you could throw some scope sentries at the top of your function to track any alloca allocations in the function scope and sanity check this against the max amount allowed for your project.
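Here is one way such a capped wrapper could look; treat it as a sketch, not a drop-in. It has to be a macro, because an ordinary wrapper function would release the memory as soon as the wrapper returned. STACK_BUDGET, BUDGET_ALLOCA and count_fitting_chunks are all invented names, and the header assumes a glibc-style system.

```c
#include <alloca.h>   /* assumption: glibc-style header */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_BUDGET ((size_t)1024)   /* hypothetical per-function cap, in bytes */

/* `used` is a size_t variable in the calling function that tracks what
   has been handed out so far. Yields NULL instead of going over budget.
   Note that `n` is evaluated more than once. */
#define BUDGET_ALLOCA(used, n) \
    (((n) <= STACK_BUDGET - (used)) ? ((used) += (n), alloca(n)) : NULL)

/* Tries `attempts` allocations of `chunk` bytes and returns how many fit.
   The loop is only there to show the cap kicking in. */
int count_fitting_chunks(size_t chunk, int attempts)
{
    assert(attempts >= 0);
    size_t used = 0;
    int ok = 0;
    for (int i = 0; i < attempts; i++) {
        char *p = BUDGET_ALLOCA(used, chunk);
        if (p == NULL)
            break;              /* over budget: caller must pick another strategy */
        memset(p, 0, chunk);
        ok++;
    }
    return ok;
}
```

With a 1024-byte budget, asking for six 256-byte chunks gets you four, and the remaining requests are refused instead of silently eating stack.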
Also, in addition to not allowing for memory leaks alloca does not cause memory fragmentation which is pretty important. I don't think alloca is bad practice if you use it intelligently, which is basically true for everything. :-)

One pitfall with alloca is that longjmp rewinds it.
That is to say, if you save a context with setjmp, then alloca some memory, then longjmp to the context, you may lose the alloca memory. The stack pointer is back where it was and so the memory is no longer reserved; if you call a function or do another alloca, you will clobber the original alloca.
To clarify, what I'm specifically referring to here is a situation whereby longjmp does not return out of the function where the alloca took place! Rather, a function saves context with setjmp; then allocates memory with alloca and finally a longjmp takes place to that context. That function's alloca memory is not all freed; just all the memory that it allocated since the setjmp. Of course, I'm speaking about an observed behavior; no such requirement is documented of any alloca that I know.
The focus in the documentation is usually on the concept that alloca memory is associated with a function activation, not with any block; that multiple invocations of alloca just grab more stack memory which is all released when the function terminates. Not so; the memory is actually associated with the procedure context. When the context is restored with longjmp, so is the prior alloca state. It's a consequence of the stack pointer register itself being used for allocation, and also (necessarily) saved and restored in the jmp_buf.
Incidentally, this, if it works that way, provides a plausible mechanism for deliberately freeing memory that was allocated with alloca.
I have run into this as the root cause of a bug.

Here's why:
char x;
char *y=malloc(1);
char *z=alloca(&x-y);
*z = 1;
Not that anyone would write this code, but the size argument you're passing to alloca almost certainly comes from some sort of input, which could maliciously aim to get your program to alloca something huge like that. After all, if the size isn't based on input or doesn't have the possibility to be large, why didn't you just declare a small, fixed-size local buffer?
Virtually all code using alloca and/or C99 VLAs has serious bugs which will lead to crashes (if you're lucky) or privilege compromise (if you're not so lucky).

alloca () is nice and efficient... but it is also deeply broken.
broken scope behavior (function scope instead of block scope)
use inconsistent with malloc (an alloca()-ted pointer shouldn't be freed, so you have to track where your pointers are coming from to free() only those you got with malloc())
bad behavior when you also use inlining (scope sometimes goes to the caller function depending on whether the callee is inlined or not).
no stack boundary check
undefined behavior in case of failure (does not return NULL like malloc... and what does failure mean as it does not check stack boundaries anyway...)
not ANSI standard
In most cases you can replace it using local variables and majorant size. If it's used for large objects, putting them on the heap is usually a safer idea.
If you really need it in C you can use a VLA (no VLAs in C++, too bad). They are much better than alloca() regarding scope behavior and consistency. As I see it VLAs are a kind of alloca() made right.
Of course a local structure or array using a majorant of the needed space is still better, and if you don't have such majorant heap allocation using plain malloc() is probably sane.
I see no sane use case where you really really need either alloca() or VLA.

Processes only have a limited amount of stack space available - far less than the amount of memory available to malloc().
By using alloca() you dramatically increase your chances of getting a Stack Overflow error (if you're lucky, or an inexplicable crash if you're not).

A place where alloca() is especially dangerous compared with malloc() is the kernel: the kernel of a typical operating system has a fixed-size stack hard-coded into one of its headers; it is not as flexible as the stack of an application. Making a call to alloca() with an unwarranted size may cause the kernel to crash.
Certain compilers warn about the use of alloca() (and even VLAs for that matter) under certain options that ought to be turned on while compiling kernel code - here, it is better to allocate memory in the heap, which is not fixed by a hard-coded limit.

alloca is not worse than a variable-length array (VLA), but it's riskier than allocating on the heap.
On x86 (and most often on ARM), the stack grows downwards, and that brings with it a certain amount of risk: if you accidentally write beyond the block allocated with alloca (due to a buffer overflow for example), then you will overwrite the return address of your function, because that one is located "above" on the stack, i.e. after your allocated block.
The consequence of this is two-fold:
The program will crash spectacularly and it will be impossible to tell why or where it crashed (stack will most likely unwind to a random address due to the overwritten frame pointer).
It makes buffer overflow many times more dangerous, since a malicious user can craft a special payload which would be put on the stack and can therefore end up executed.
In contrast, if you write beyond a block on the heap you "just" get heap corruption. The program will probably terminate unexpectedly but will unwind the stack properly, thereby reducing the chance of malicious code execution.

Sadly the truly awesome alloca() is missing from the almost awesome tcc. Gcc does have alloca().
It sows the seed of its own destruction. With return as the destructor.
Like malloc() it returns an invalid pointer on fail which will segfault on modern systems with a MMU (and hopefully restart those without).
Unlike auto variables you can specify the size at run time.
It works well with recursion. You can use static variables to achieve something similar to tail recursion and use just a few others to pass info to each iteration.
If you push too deep you are assured of a segfault (if you have an MMU).
Note that malloc() offers no more as it returns NULL (which will also segfault if assigned) when the system is out of memory. I.e. all you can do is bail or just try to assign it any way.
To use malloc() I use globals and assign them NULL. If the pointer is not NULL I free it before I use malloc().
You can also use realloc() as the general case if you want to copy any existing data. You need to check the pointer beforehand to work out whether you are going to copy or concatenate after the realloc().
3.2.5.2 Advantages of alloca

Actually, alloca is not guaranteed to use the stack.
Indeed, the gcc-2.95 implementation of alloca allocates memory from the heap using malloc itself. That implementation is also buggy: it may lead to a memory leak and to some unexpected behavior if you call it inside a block with a further use of goto. This is not to say that you should never use it, but sometimes alloca leads to more overhead than it relieves you from.

In my opinion, alloca(), where available, should be used only in a constrained manner. Very much like the use of "goto", quite a large number of otherwise reasonable people have strong aversion not just to the use of, but also the existence of, alloca().
For embedded use, where the stack size is known and limits can be imposed via convention and analysis on the size of the allocation, and where the compiler cannot be upgraded to support C99+, use of alloca() is fine, and I've been known to use it.
When available, VLAs may have some advantages over alloca(): The compiler can generate stack limit checks that will catch out-of-bounds access when array style access is used (I don't know if any compilers do this, but it can be done), and analysis of the code can determine whether the array access expressions are properly bounded. Note that, in some programming environments, such as automotive, medical equipment, and avionics, this analysis has to be done even for fixed size arrays, both automatic (on the stack) and static allocation (global or local).
On architectures that store both data and return addresses/frame pointers on the stack (from what I know, that's all of them), any stack allocated variable can be dangerous because the address of the variable can be taken, and unchecked input values might permit all sorts of mischief.
Portability is less of a concern in the embedded space, however it is a good argument against use of alloca() outside of carefully controlled circumstances.
Outside of the embedded space, I've used alloca() mostly inside logging and formatting functions for efficiency, and in a non-recursive lexical scanner, where temporary structures (allocated using alloca()) are created during tokenization and classification, then a persistent object (allocated via malloc()) is populated before the function returns. The use of alloca() for the smaller temporary structures greatly reduces fragmentation when the persistent object is allocated.

Why does no one mention this example from the GNU documentation?
https://www.gnu.org/software/libc/manual/html_node/Advantages-of-Alloca.html
Nonlocal exits done with longjmp (see Non-Local Exits) automatically
free the space allocated with alloca when they exit through the
function that called alloca. This is the most important reason to use
alloca
Suggest reading order 1->2->3->1:
https://www.gnu.org/software/libc/manual/html_node/Advantages-of-Alloca.html
Intro and Details from Non-Local Exits
Alloca Example
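A minimal sketch of the situation the manual describes, in my own words and with made-up names (recover_point, parse_or_bail, try_parse): the callee takes a scratch copy with alloca and then bails out with longjmp. Had the copy come from malloc, the jump would skip the free and leak it.

```c
#include <alloca.h>   /* assumption: glibc-style header */
#include <assert.h>
#include <setjmp.h>
#include <string.h>

static jmp_buf recover_point;   /* hypothetical error-recovery context */

/* Takes a scratch copy on the stack and jumps out if it finds a '!'.
   The jump leaves this function, so the alloca space goes with it. */
static void parse_or_bail(const char *text)
{
    size_t n = strlen(text) + 1;
    char *copy = alloca(n);
    memcpy(copy, text, n);
    if (strchr(copy, '!') != NULL)
        longjmp(recover_point, 1);   /* nothing to free on this path */
}

/* Returns 0 on success, -1 if parse_or_bail jumped out. */
int try_parse(const char *text)
{
    assert(text != NULL);
    if (setjmp(recover_point) != 0)
        return -1;
    parse_or_bail(text);
    return 0;
}
```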

I don't think that anybody has mentioned this, but alloca also has some serious security issues not necessarily present with malloc (though these issues also arise with any stack based arrays, dynamic or not). Since the memory is allocated on the stack, buffer overflows/underflows have much more serious consequences than with just malloc.
In particular, the return address for a function is stored on the stack. If this value gets corrupted, your code could be made to go to any executable region of memory. Compilers go to great lengths to make this difficult (in particular by randomizing address layout). However, this is clearly worse than just a stack overflow since the best case is a SEGFAULT if the return value is corrupted, but it could also start executing a random piece of memory or in the worst case some region of memory which compromises your program's security.

IMO the biggest risk with alloca and variable length arrays is it can fail in a very dangerous manner if the allocation size is unexpectedly large.
Allocations on the stack typically have no checking in user code.
Modern operating systems will generally put a guard page in place below* to detect stack overflow. When the stack overflows the kernel may either expand the stack or kill the process. Linux expanded this guard region in 2017 to be significantly larger than a page, but it's still finite in size.
So as a rule it's best to avoid allocating more than a page on the stack before making use of the previous allocations. With alloca or variable length arrays it's easy to end up allowing an attacker to make arbitrary size allocations on the stack and hence skip over any guard page and access arbitrary memory.
* on most widespread systems today the stack grows downwards.

Most answers here largely miss the point: there's a reason why using _alloca() is potentially worse than merely storing large objects in the stack.
The main difference between automatic storage and _alloca() is that the latter suffers from an additional (serious) problem: the allocated block is not controlled by the compiler, so there's no way for the compiler to optimize or recycle it.
Compare:
while (condition) {
    char buffer[0x100]; // Chill.
    /* ... */
}
with:
while (condition) {
    char* buffer = _alloca(0x100); // Bad!
    /* ... */
}
The problem with the latter should be obvious.
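For anyone to whom it isn't obvious, here is a small sketch that makes the accumulation visible. It uses alloca from <alloca.h> instead of MSVC's _alloca (an assumption about the platform), and distinct_alloca_blocks is a made-up name.

```c
#include <alloca.h>   /* assumption: glibc-style header */
#include <assert.h>
#include <stddef.h>

#define ROUNDS 8

/* Returns how many distinct blocks alloca handed out over ROUNDS iterations. */
int distinct_alloca_blocks(void)
{
    char *blocks[ROUNDS];

    for (int i = 0; i < ROUNDS; i++) {
        blocks[i] = alloca(0x100);   /* NOT released at the end of the iteration */
        blocks[i][0] = (char)i;
    }

    int distinct = 0;
    for (int i = 0; i < ROUNDS; i++) {
        int seen = 0;
        for (int j = 0; j < i; j++)
            if (blocks[j] == blocks[i])
                seen = 1;
        if (!seen)
            distinct++;
        assert(blocks[i][0] == (char)i);   /* every block is still live */
    }
    return distinct;
}
```

Every iteration gets a fresh 0x100 bytes, so the loop holds ROUNDS blocks at once. A block-scoped array is free to reuse the same slot each time around.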

Related

Is the compiler allowed to not retract the stack pointer when an object on the stack goes out of scope?

I'm using a Raspberry Pi Pico, which has two cores, both with a 4KB stack, with core0's on top of core1's so that core0 gets to have 8KB of stack in single-threaded apps.
The gist of the issue sparking this question is as follows:
// Do stuff
{
uint8_t buffer[4096];
// Use buffer (for flash IO)
}
MyObject myObject = buildMyObject();
multicore_launch_core1(core1_entry); // Will allocate on its stack
// Use myObject
Here we allocate 4KB on the stack "while we have 8KB of stack". Then we make it go out of scope. Then we allocate another object on the stack. We then launch core1.
At this point, the bottom 4KB of the stack still belong to core0, the top 4KB now belong to core1. Core1 starts using them. We then use the previously allocated object.
I expect myObject to be in the first 4KB, because I expect buffer going out of its explicit scope to increase the stack pointer by 4KB immediately with regards to control flow.
This isn't what happens on GCC 10.3.1 arm-none-eabi. The 4KB of stack taken by buffer stay there, never to be given back until the enclosing scope (same as myObject's) ends. Which of course, results in myObject being allocated in core1's stack-to-be. Chaos ensues.
This sounds counterintuitive to me and, in the context of embedded programming where we might not even have a heap, harmful.
Is this a compiler bug? Or does the standard allow this to happen?
Is the compiler allowed to not retract the stack pointer when an object on the stack goes out of scope?

Since this is tagged language-lawyer: Neither C nor C++ standard make any guarantees over layout and location of memory. They don't have any real concept of a stack either. (In C++ there is a concept of "stack unwinding" which however doesn't really require a stack as in the memory concept and in C++23 there is support for stacktraces, but it also has no concept of memory addresses.)
There is also no standard-approved way of actually depending on the memory location chosen for variables. It is fundamentally impossible to get from a pointer to one of them to a pointer to another (without taking the address of the latter with & first and storing the result somewhere in an object reachable from the former). The compiler can assume that individual variables are completely independent in terms of their memory location and that they cannot be messed with from anything external. It can (and does) for example reorder the location of variables on the stack in whatever way deemed suitable for optimization. It may also add padding, etc. It may decide arbitrarily to reuse storage of variables whose storage duration has ended, but it doesn't have to either.
Everything you are doing that allows you to do context switches or the like is completely outside the standard's specification and dependent on the C++ implementation, i.e. compiler, architecture, etc.
For your use case it seems that you likely want to write inline assembly (also a compiler-specific extension) so that you have control over where your data is located in memory. Alternatively there may be other compiler-specific extensions such as attributes to help with that.

Why do you need to allocate memory in c/c++?

Getting straight to the point: What is the reason for needing to allocate memory in c++?
I understand some programming languages do it automatically, but in C/C++: what is the reason for having to allocate memory. For example:
When declaring PROCESSENTRY32, why do we need to ZeroMemory() it? When making a buffer for a sockets program, why do we need to ZeroMemory() it? Why don't you need to allocate memory when you declare an int data type?

Your question doesn't really make sense. ZeroMemory doesn't allocate memory; it just, well, sets bytes to 0. You can easily ZeroMemory an int, if you want. It's just that i = 0; is shorter to write.
In all cases ZeroMemory only works on memory that already exists; i.e. something else must have allocated it before.
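To illustrate with portable code: the sketch below uses memset, which fills memory with a byte value just as ZeroMemory fills it with zeros. The struct entry is a hypothetical stand-in for a Windows-style structure with a size field, not the real PROCESSENTRY32.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a structure that carries its own size. */
struct entry {
    unsigned size;
    int pid;
    char name[16];
};

struct entry make_entry(void)
{
    struct entry e;             /* already allocated (automatic storage),
                                   but its contents are indeterminate */
    memset(&e, 0, sizeof e);    /* zero the bytes; nothing is allocated here */
    e.size = (unsigned)sizeof e;
    assert(e.pid == 0 && e.name[0] == '\0');
    return e;
}
```

The declaration is what provides the memory. The memset call only gives it a known starting state.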
As for actual allocation, C distinguishes three kinds of storage for objects:
Static storage. These objects are allocated when the program starts and live for as long as the program runs. Example: Global variables.
Automatic storage. These objects are allocated when execution reaches their scope and deallocated when execution leaves their containing scope. Example: Local variables.
Dynamic storage. This is what you manage manually by calling malloc / calloc / realloc / free.
The only case where you really have to allocate memory yourself is case #3. If your program only uses automatic storage, you don't have to do anything special.
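A short sketch showing all three kinds side by side (call_count and make_counter_snapshot are illustrative names):

```c
#include <assert.h>
#include <stdlib.h>

static int call_count;          /* static storage: lives for the whole program */

/* Returns a heap-allocated copy of the current call number, or NULL.
   The caller owns the result and must free() it. */
int *make_counter_snapshot(void)
{
    int local = ++call_count;   /* automatic storage: gone when we return */
    assert(local > 0);

    int *snapshot = malloc(sizeof *snapshot);   /* dynamic storage: manual */
    if (snapshot != NULL)
        *snapshot = local;
    return snapshot;
}
```

Only the malloc-ed int needs any action from the programmer. The other two are set up and torn down by the language.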
In languages like Java, you still have to allocate memory by calling new. Python doesn't have new, but e.g. whenever you execute something like [...] or {...}, it creates a new list/dictionary, which allocates memory.
The crucial part is really that you don't have to deallocate memory.
Languages like Java or Python include a garbage collector: You create objects, but the language takes care of cleaning up behind you. When an object is no longer needed [1], it is deallocated automatically.
C doesn't do that. The reasons lie in its history: C was invented as a replacement for assembler code, in order to make porting Unix to a new computer easier. Automatic garbage collection requires a runtime system, which adds complexity and can have performance issues (even modern garbage collectors sometimes pause the whole program in order to reclaim memory, which is undesirable, and C was created back in 1972).
Not having a garbage collector makes C
easier to implement
easier to predict
potentially more efficient
able to run on very limited hardware
C++ was meant to be a "better C", targeting the same kind of audience. That's why C++ kept nearly all of C's features, even those that are very unfriendly to automatic garbage collection.
[1] Not strictly true. Memory is reclaimed when it is no longer reachable. If the program can still reach an object somehow, it will be kept alive even if it's not really needed anymore (see also: Space leak).

C chooses to be relatively low-level language where language constructs more or less directly map to at most a few machine instructions.
Block level allocations such as in
int main()
{
int a,b,c; //a very cheap allocation on the stack
//... do something with a, b, and c
}
fall within this category, as all block-level allocations in a function will normally translate to just a single subtraction from the stack pointer.
The downside of these allocations is that they're very limited -- you shouldn't allocate big objects or many objects like this (or you risk stack overflow) -- and they're not very persistent either: they're effectively undone at the end of the scope.
As for generic allocations from main memory, the machine doesn't really offer you much apart from a big array of char (i.e., your RAM) and possibly some virtual memory mapping facilities (i.e., mapping real memory into smaller arrays of char). There are multiple ways for slicing up these arrays and for using and reusing the pieces, so C leaves this to the libraries. C++ takes after C.
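To make the contrast concrete, here is a minimal sketch (the function names sum_on_stack and sum_on_heap are made up for illustration). The first function relies purely on automatic storage; the second asks the library for memory with malloc() and has to give it back itself.

```cpp
#include <cstddef>
#include <cstdlib>

// Automatic storage: the array is part of the function's stack frame, so
// "allocating" it normally costs no more than adjusting the stack pointer.
int sum_on_stack()
{
    int values[4] = {1, 2, 3, 4};   // gone when the function returns
    int total = 0;
    for (std::size_t i = 0; i < 4; ++i)
        total += values[i];
    return total;
}

// Library allocation: malloc() hands out a piece of main memory and the
// caller is responsible for returning it with free().
int sum_on_heap(std::size_t count)
{
    int *values = static_cast<int *>(std::malloc(count * sizeof(int)));
    if (values == nullptr)          // failure (or count == 0) is reported here
        return -1;
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        values[i] = static_cast<int>(i) + 1;   // fill with 1, 2, 3, ...
        total += values[i];
    }
    std::free(values);
    return total;
}
```

Both functions compute the same sum for four elements; only the second can handle a size chosen at run time, and only the second can tell you that the allocation failed.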

No stack allocation whole program compilation?

If you write an app that is:
Single threaded
Has no cycles in call graph
Doesn't use alloca or VLAs
Can modern whole program optimizing compilers optimize away all stack allocation (e.g. GCC, MSVC, ICC)? It seems like in those circumstances it should be able to allocate all possible stack space statically. By 'whole program' I mean the compiler has access to /all/ source code (no possibility of dlopen'ing things at runtime, etc.).

If you can guarantee the conditions you stated, then yes: it would be possible to effectively have the stack be completely statically allocated. Each function would have a block of stack memory.
However, will actual compilers do it? No.
It gains absolutely nothing to do so. Indeed, it may gain less than nothing. Often times, much of the working stack is in the cache, so modifications to it are pretty cheap. If the stack were in static memory, then the only time any particular function's "stack" memory would be cached would be if you had recently called that function. Using a real stack, you're more likely to be working in the cache.
Furthermore, giving each function a block of stack memory can easily make your program's static memory usage much larger than it needs to be. The stack is a fixed-size construct; no matter how many functions you have, the stack takes up a certain size. If you have 100,000 functions, and each function takes up 64 bytes of space, then your static "stack" must take up ~6.4MB of space.
Why? You're never going to be using most of that memory at any one time. The program would run just fine in a 1MB or even 512KB stack; why take up 6x that memory for nothing?
So it is not a performance optimization, and it can bloat your program's memory.
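The arithmetic behind that estimate can be checked at compile time. This is only a sketch using the figures quoted in the answer (100,000 functions, 64 bytes each, a 1MB conventional stack); the constant and function names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical figures taken from the answer above.
constexpr std::size_t kFunctionCount = 100000;        // functions in the program
constexpr std::size_t kFrameBytes = 64;               // bytes reserved per function
constexpr std::size_t kRealStackBytes = 1024 * 1024;  // a conventional 1MB stack

// Total static memory if every function gets its own permanent frame.
constexpr std::size_t static_frames_bytes()
{
    return kFunctionCount * kFrameBytes;
}

static_assert(static_frames_bytes() == 6400000, "about 6.4MB in total");
static_assert(static_frames_bytes() / kRealStackBytes == 6,
              "roughly six times a 1MB stack");
```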

This is a comment that's too long to be a comment:
Note that while all stack allocations may be theoretically optimized away, more may be allocated than necessary. This is not what the OP was asking, but it could be interesting to consider. Finding the minimum-sized allocation required would be equivalent to solving the halting problem. Imagine a program structured as:
<do 'something'>
<call last thing which happens to require more
stack space than everything else in 'something'>
You only need the additional stack space if <do 'something'> "halts".
You can also imagine other variations where optimizing becomes arbitrarily hard. For example, your program could simply evaluate a 3SAT expression with user input and do something depending on that -- but that 3SAT expression may or may not have any value that results in true.
Perhaps there is a more trivial case: The user may simply never enter input that requires more stack space for processing.

It's possible for a compiler to do this, but it would be such a specific optimization that it probably wouldn't.
If you had a program that was completely inlined, you would eliminate the overhead of setting up stack frames for function calls.
However, if you wanted to also get rid of stack allocations for local variables, the compiler has to transform those local variables into global variables. No compiler I know of does that, and on some platforms it takes extra instructions to reference a global variable compared to a local variable (as the address must be loaded with two instructions rather than one). Plus, since referencing a stack variable is such a common operation, it's usually encoded into a smaller instruction.

Unless you add "doesn't use any external libraries" to your list, any calls to external functions will require stack setup because they would have been compiled expecting the calling code to pass its parameters in a particular way, most likely on the stack. Additionally, such libraries would almost certainly have to adjust the stack themselves for their own locals.
Additionally, depending on your precise application, even if you know that the stack could be allocated statically, it may be very hard for the compiler to know that there aren't any callbacks into your code, etc., that would cause a need for stack allocation.
I just can't see a case where a compiler would attempt this optimization, because allocating stack space is trivially fast already (a couple of register manipulations, I believe).

How to detect the amount of stack space available to my program?

My Win32 C++ application acts as an RPC server - it has a set of functions for processing requests and RPC runtime creates a separate thread and invokes one of my functions in that thread.
In my function I have an std::auto_ptr which is used to control a heap-allocated char[] array of size known at compile time. It accidentally works when compiled with VC++ but it's undefined behaviour according to the C++ standard and I'd like to get rid of it.
I have two options: std::vector or a stack-allocated array. Since I have no idea why there's a heap-allocated array I would like to consider replacing it with a stack-allocated one. The array is 10k elements and I can hypothetically face a stack overflow if the RPC runtime spawns a thread with a very small stack.
I would like to detect how much stack space is typically allocated to the thread and how much of it is available to my function (its callers certainly consume some of the allocated space). How could I do that?

I don't know of any way of figuring out the stack size directly using the API if you don't have access to the CreateThread call or, if it's the main thread, looking into the EXE's default thread size in the PE header.
In your situation, I would allocate on the heap to be safe, even though a 10K array of small data is unlikely to max out the stack in non-recursive scenarios.
However, you can probe for the stack limit, if done carefully. The stack gets committed in 4K pages as you touch them (via guard pages) until you hit the limit, whereupon Windows will throw a stack overflow exception. There is still one page of stack left when the exception gets dispatched, so that the exception dispatching logic itself (including filter functions) can execute - but Windows throws the exception because it couldn't allocate another guard page. That means that the next stack overflow, or probe, will not result in a stack overflow exception, but an access violation. So to make probing work reliably (and in particular, repeatably) you need to decommit the memory allocated by the probing and reinstate a guard page.
This article on KB describes how to decommit stack memory and reinstate the guard page. It probes using recursion and 10,000-byte increments; the compiler by default implements its own stack probing for stack allocations of locals >4KB, so that the stack growth mechanism works correctly.

On Windows, the default stack size is 1MB, so you are unlikely to overflow the stack with only a 10k array. That said, I think that allocating so much memory on the stack is bad practice, and you should favour allocating it dynamically if you can. There is also the Scoped Array, which is well defined for automatically managing arrays - unlike the vector class, it is non-copyable.

I second 1800 INFORMATION:
Allocate your data on heap if you can. It's safer (e.g. buffer overflows are harder to exploit) and more flexible when (not if) you need to extend your design later.
Use std::vector, boost::scoped_array or boost::shared_array.
I know it's not answering your question on detecting stack size but I think it's a logical answer to your problem.
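As a rough sketch of what the std::vector option might look like for the 10k buffer: fill_buffer below is a hypothetical stand-in for whatever code needs a raw char pointer, and handle_request is a made-up name for the RPC handler.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an API that writes into a raw char buffer.
// Returns the number of bytes written, or 0 if the buffer is too small.
std::size_t fill_buffer(char *dest, std::size_t capacity)
{
    const char message[] = "request payload";
    const std::size_t length = sizeof(message) - 1;   // without the trailing '\0'
    if (capacity < length)
        return 0;
    std::memcpy(dest, message, length);
    return length;
}

std::size_t handle_request()
{
    // 10,000 chars on the heap; the vector releases them correctly when
    // `buffer` goes out of scope, including when an exception is thrown.
    std::vector<char> buffer(10000);
    return fill_buffer(&buffer[0], buffer.size());
}
```

Only the small vector object itself sits on the thread's stack, so the size of the stack that the RPC runtime gives you stops mattering for this buffer.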

I'm not sure what you're after.
If you just want typical numbers, then go ahead and try! Create a function with nested scopes, each of which allocates some more stack space. Output in each scope. See how far the thing gets.
If you want concrete numbers in a concrete situation, ask yourself what you would want to do once you have them? Branch into different implementations? This sounds like a maintenance problem the use of which should be very well justified. What do you expect to gain? Is this really worth such a hassle?
I agree that 10k usually shouldn't be a problem. So if your code isn't mission critical, go ahead and use boost::array (or std::tr1::array, if your std lib comes with it). Otherwise just use std::vector or, if you feel you must, boost::scoped_array (TR1 does not provide a scoped_array).
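If you do want to run the experiment suggested above, something along these lines could serve as a starting point. It is a sketch only: it measures how much stack a known number of nested calls consumes, not how much is still available, and it assumes a conventional implementation where locals of nested calls sit at distinct stack addresses. The names StackSpan, recurse and measure_recursion are invented for this example.

```cpp
#include <cstdint>
#include <limits>

// Lowest and highest addresses of the locals observed during the recursion.
struct StackSpan {
    std::uintptr_t low;
    std::uintptr_t high;
};

void recurse(int depth, StackSpan &span)
{
    volatile char frame[256];              // makes every call occupy stack space
    frame[0] = static_cast<char>(depth);
    const std::uintptr_t here = reinterpret_cast<std::uintptr_t>(&frame[0]);
    if (here < span.low) span.low = here;
    if (here > span.high) span.high = here;
    if (depth > 0)
        recurse(depth - 1, span);
    frame[255] = frame[0];                 // work after the call: no tail call
}

// Runs depth + 1 nested calls and reports the address range they touched.
StackSpan measure_recursion(int depth)
{
    StackSpan span = { std::numeric_limits<std::uintptr_t>::max(), 0 };
    recurse(depth, span);
    return span;
}
```

Dividing span.high - span.low by the depth gives a rough per-call cost on your compiler and settings. Treat the number as an indication, nothing more.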

"std::auto_ptr which is used to control a heap-allocated char[] ... it's undefined behaviour according to C++"
That is a wrong assumption!
The behavior of the STL's auto_ptr is precisely specified. If you are worried about losing control during complicated assignments, consider using a reference-counting pattern to control the destruction of the heap-allocated array.

Proper stack and heap usage in C++?

I've been programming for a while but it's been mostly Java and C#. I've never actually had to manage memory on my own. I recently began programming in C++ and I'm a little confused as to when I should store things on the stack and when to store them on the heap.
My understanding is that variables which are accessed very frequently should be stored on the stack and objects, rarely used variables, and large data structures should all be stored on the heap. Is this correct or am I incorrect?

No, the difference between stack and heap isn't performance. It's lifespan: any local variable inside a function (anything you do not malloc() or new) lives on the stack. It goes away when you return from the function. If you want something to live longer than the function that declared it, you must allocate it on the heap.
class Thingy {};

Thingy* foo()
{
    int a; // this int lives on the stack
    Thingy B; // this Thingy lives on the stack and is destroyed when we return from foo
    Thingy *pointerToB = &B; // this points to an address on the stack
    Thingy *pointerToC = new Thingy(); // this makes a Thingy on the heap.
                                       // pointerToC contains its address.
    // this is safe: C lives on the heap and outlives foo().
    // Whoever you pass this to must remember to delete it!
    return pointerToC;
    // this would NOT be safe: B lives on the stack and is destroyed when foo() returns.
    // whoever uses this returned pointer will probably cause a crash!
    // return pointerToB;
}
For a clearer understanding of what the stack is, come at it from the other end -- rather than try to understand what the stack does in terms of a high level language, look up "call stack" and "calling convention" and see what the machine really does when you call a function. Computer memory is just a series of addresses; "heap" and "stack" are inventions of the compiler.
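If a C++11 compiler is available (an assumption; the answer above predates it), the "whoever you pass this to must remember to delete it" problem can be handed to std::unique_ptr. A minimal sketch, with make_thingy as a hypothetical factory and a Thingy that carries one field just so there is something to check:

```cpp
#include <memory>

struct Thingy {
    int value;
};

// The Thingy lives on the heap and outlives make_thingy(). The returned
// unique_ptr owns it and deletes it automatically when it goes out of scope.
std::unique_ptr<Thingy> make_thingy(int value)
{
    std::unique_ptr<Thingy> owner(new Thingy());
    owner->value = value;
    return owner;   // ownership moves to the caller
}
```

The caller writes std::unique_ptr<Thingy> t = make_thingy(7); and never calls delete. The lifetime rule is unchanged: heap objects outlive the function, stack objects do not.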

I would say:
Store it on the stack, if you CAN.
Store it on the heap, if you NEED TO.
Therefore, prefer the stack to the heap. Some possible reasons that you can't store something on the stack are:
It's too big - in multithreaded programs on a 32-bit OS, the stack has a small and fixed (at thread-creation time at least) size (typically just a few megs). This is so that you can create lots of threads without exhausting address space. For 64-bit programs, or single-threaded (Linux anyway) programs, this is not a major issue. Under 32-bit Linux, single-threaded programs usually use dynamic stacks which can keep growing until they reach the top of the heap.
You need to access it outside the scope of the original stack frame - this is really the main reason.
It is possible, with sensible compilers, to allocate non-fixed size objects on the heap (usually arrays whose size is not known at compile time).

It's more subtle than the other answers suggest. There is no absolute divide between data on the stack and data on the heap based on how you declare it. For example:
std::vector<int> v(10);
In the body of a function, that declares a vector (dynamic array) of ten integers on the stack. But the storage managed by the vector is not on the stack.
Ah, but (the other answers suggest) the lifetime of that storage is bounded by the lifetime of the vector itself, which here is stack-based, so it makes no difference how it's implemented - we can only treat it as a stack-based object with value semantics.
Not so. Suppose the function was:
void GetSomeNumbers(std::vector<int> &result)
{
std::vector<int> v(10);
// fill v with numbers
result.swap(v);
}
So anything with a swap function (and any complex value type should have one) can serve as a kind of rebindable reference to some heap data, under a system which guarantees a single owner of that data.
Therefore the modern C++ approach is to never store the address of heap data in naked local pointer variables. All heap allocations must be hidden inside classes.
If you do that, you can think of all variables in your program as if they were simple value types, and forget about the heap altogether (except when writing a new value-like wrapper class for some heap data, which ought to be unusual).
You merely have to retain one special bit of knowledge to help you optimise: where possible, instead of assigning one variable to another like this:
a = b;
swap them like this:
a.swap(b);
because it's much faster and it doesn't throw exceptions. The only requirement is that you don't need b to continue to hold the same value (it's going to get a's value instead, which would be trashed in a = b).
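A small sketch of why the swap is cheap for std::vector: the two vectors exchange their heap buffers instead of copying elements. The function name is made up, and data() assumes C++11.

```cpp
#include <vector>

// Returns true if swapping handed over the heap buffers instead of copying.
bool swap_exchanges_buffers()
{
    std::vector<int> a(3, 1);       // three elements, all 1
    std::vector<int> b(1000, 2);    // a thousand elements, all 2
    const int *a_storage = a.data();
    const int *b_storage = b.data();

    a.swap(b);                      // constant time: only internal pointers move

    // Each vector now manages the buffer the other one used to own.
    return a.data() == b_storage && b.data() == a_storage
        && a.size() == 1000 && b.size() == 3;
}
```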
The downside is that this approach forces you to return values from functions via output parameters instead of the actual return value. But they're fixing that in C++0x with rvalue references.
In the most complicated situations of all, you would take this idea to the general extreme and use a smart pointer class such as shared_ptr which is already in tr1. (Although I'd argue that if you seem to need it, you've possibly moved outside Standard C++'s sweet spot of applicability.)

You would also store an item on the heap if it needs to be used outside the scope of the function in which it is created. One idiom used with stack objects is called RAII: it uses a stack-based object as a wrapper for a resource, so that when the object is destroyed, the resource is cleaned up. Stack-based objects are easier to keep track of when you might be throwing exceptions, because you don't need to concern yourself with deleting a heap-based object in an exception handler. This is why raw pointers are not normally used in modern C++; instead you would use a smart pointer, which can be a stack-based wrapper for a raw pointer to a heap-based object.
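Here is a minimal RAII sketch to illustrate the point about exceptions. Everything in it is hypothetical: the "resource" is just a counter standing in for an open handle, and the = delete syntax assumes C++11.

```cpp
#include <stdexcept>

// Hypothetical resource: a counter stands in for the number of open handles.
int open_handles = 0;

class HandleGuard {
public:
    HandleGuard() { ++open_handles; }      // acquire in the constructor
    ~HandleGuard() { --open_handles; }     // release in the destructor
    HandleGuard(const HandleGuard &) = delete;
    HandleGuard &operator=(const HandleGuard &) = delete;
};

void work_that_fails()
{
    HandleGuard guard;                     // stack-based wrapper for the resource
    throw std::runtime_error("something went wrong");
}

// Returns the number of handles still open after the exception is handled.
int run()
{
    try {
        work_that_fails();
    } catch (const std::runtime_error &) {
        // no manual cleanup needed: guard was destroyed during stack unwinding
    }
    return open_handles;
}
```

run() returns 0: the destructor released the handle even though work_that_fails() never reached its end.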

To add to the other answers, it can also be about performance, at least a little bit. Not that you should worry about this unless it's relevant for you, but:
Allocating on the heap requires finding and tracking a block of memory, which is not a constant-time operation (and takes some cycles and overhead). This can get slower as memory becomes fragmented, and/or you're getting close to using 100% of your address space. On the other hand, stack allocations are constant-time, basically "free" operations.
Another thing to consider (again, really only important if it becomes an issue) is that typically the stack size is fixed, and can be much lower than the heap size. So if you're allocating large objects or many small objects, you probably want to use the heap; if you run out of stack space, the runtime will throw the site's titular exception (a stack overflow). Not usually a big deal, but another thing to consider.

The stack is more efficient, and makes scoped data easier to manage.
But the heap should be used for anything larger than a few KB (it's easy in C++, just create a boost::scoped_ptr on the stack to hold a pointer to the allocated memory).
Consider a recursive algorithm that keeps calling into itself. It's very hard to limit and/or guess the total stack usage! Whereas on the heap, the allocator (malloc() or new) can indicate out-of-memory by returning NULL or throwing.
Source: Linux Kernel whose stack is no larger than 8KB!
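To illustrate the "allocator can tell you it failed" point, here is a sketch using the nothrow form of new, which returns a null pointer instead of throwing std::bad_alloc. The function name can_allocate is made up. Whether a very large request actually fails depends on the platform (some systems overcommit memory), so take it as an illustration of the reporting mechanism only.

```cpp
#include <cstddef>
#include <new>

// Returns true if `bytes` could be obtained from the heap.
bool can_allocate(std::size_t bytes)
{
    char *block = new (std::nothrow) char[bytes];
    if (block == nullptr)
        return false;   // failure is reported to the caller, no crash
    delete[] block;
    return true;
}
```

There is no comparable check for a stack allocation: if the stack is exhausted, the program typically just dies.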

For completeness, you may read Miro Samek's article about the problems of using the heap in the context of embedded software.
A Heap of Problems

The choice of whether to allocate on the heap or on the stack is one that is made for you, depending on how your variable is allocated. If you allocate something dynamically, using a "new" call, you are allocating from the heap. If you declare something as a local variable, or as a parameter of a function, it is allocated on the stack.

In my opinion there are two deciding factors
1) Scope of variable
2) Performance.
I would prefer to use stack in most cases but if you need access to variable outside scope you can use heap.
To improve performance when using the heap, you can also allocate a single larger heap block and place your variables in it, rather than allocating each variable in a different memory location.

This has probably been answered quite well already. I would like to point you to the series of articles below for a deeper understanding of the low-level details. Alex Darby has a series of articles where he walks you through it with a debugger. Here is Part 3, about the stack.
http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/
