How does an optimizing C++ compiler determine when a stack slot of a function (part of that function's stack frame) is no longer needed by the function, so it can reuse its memory?
By stack slot I mean a part of a function's stack frame, not necessarily the whole frame. To clarify with an example: suppose we have a function with six integer variables defined in its scope. If the fifth variable becomes useless by the time the sixth comes into use, the compiler can use the same memory block for both.
Any information on this subject is appreciated.
EDIT: I interpreted the question to mean, "how does the compiler reuse a particular memory word in the stack?" Most of the following answers that question, and a note at the end answers the question, "how does the compiler reuse all the stack space needed by a function?".
Most compilers don't assign stack slots first. Rather, for each function body, they treat each update to a variable, together with all accesses to that variable that can see that particular assignment, as a so-called variable lifetime. A variable which is assigned multiple times will thus cause the compiler to create multiple lifetimes.
(There are complications with this idea that occur when multiple assignments can reach an access through different control paths; this is solved by using a clever enhancement to this idea called static single assignment, which I'm not going to discuss here).
At any point in the code, there is a set of variable lifetimes that are valid; as you choose different code points, you have different valid variable lifetimes. The compiler's actual problem is to assign different registers or stack slots to each of the lifetimes. One can think of this as a graph-coloring problem: each lifetime is a node, and if two lifetimes can overlap at a point in the code, there is an "interference" arc from that node to the node representing the other lifetime. You can color the graph (or equivalently use numbers instead of colors) such that no two nodes connected by an interference arc have the same color (number); you may have to use arbitrarily large numbers to do this, but for most functions the numbers don't have to be very large. If you do this, the colors (numbers) will tell you a safe stack slot to use for the assigned value of the particular variable lifetime. (This idea is normally used in roughly two phases: once to allocate registers, and once to allocate stack slots for those lifetimes that don't fit into the registers.)
By determining the largest number used as a color on the graph, the compiler knows how many slots are needed in the worst case, and can reserve that much storage at function entry time.
There are lots of complications - different values take different amounts of space, etc. - but the basic idea is here. Not all compilers use the graph-coloring technique, but almost all of them figure out how to assign stack slots and registers in a way that avoids the implied interference. And thus they know stack slot numbers and the size of the stack frame.
EDIT... while typing, it appears that the question has been interpreted as "when does the stack frame for a function vanish"? The answer is, at function exit. The compiler already knows how big it is. It has no need to push or pop onto the stack during function execution; it knows where to put everything based on the stack slot numbering determined by the graph coloring.
The easy part is: When a function exits, all local variables of that function are released. Thus, function exit indicates that the whole stack frame can be freed. That's a no-brainer, though, and you wouldn't have mentioned "optimizing compiler" if that's what you were after.
In theory, a compiler can do flow analysis on a function, find out which chunks of memory are used at what time, and perhaps even re-order stack allocation based on the order in which variables become available. Then, if new automatic variables are introduced somewhere in the middle of the function or other scope nested within the function (rather than at its beginning), those recently freed slots could be re-used.
In practice, this sounds like a lot of spinning gears, and I suspect the stack is simply allocated whenever variables come into scope and popped off en bloc by decrementing the stack pointer when the scope finishes. But I admit I'm no expert on this topic. Someone with more authoritative knowledge may come along and correct me.
If I understand the question correctly, this is about call chaining, i.e. invoking a function from a function without allocating a new stack frame.
This is possible when the call can be transformed into a tail call - the last operation before return. This way all local variables (stack) are already out of scope, so the space can be reused. The compiler then generates a jump instead of a call instruction. The original return address is still at the proper place on the stack.
I probably glossed over lots of details here, but that's the idea.
Related
I'm currently adapting some example Arduino code to fit my needs. The following snippet confuses me:
// Dont put this on the stack:
uint8_t buf[RH_RF95_MAX_MESSAGE_LEN];
What does it mean to put the buf variable on the stack? How can I avoid doing this? What bad things could happen if I did it?
The program stack has a limited size (even on desktop computers, it's typically capped in megabytes, and on an Arduino, it may be much smaller).
All function local variables for functions are stored there, in a LIFO manner; the variables of your main method are at the bottom of the stack, the variables of the functions called in main on top of that, and so on; space is (typically) reserved on entering a function, and not reclaimed until a function returns. If a function allocates a truly huge buffer (or multiple functions in a call chain allocate slightly smaller buffers) you can rapidly approach the stack limit, which will cause your program to crash.
It sounds like your array is being allocated outside of a function, putting it at global scope. The downside to this is there is only one shared buffer (so two functions can't use it simultaneously without coordinating access, while a stack buffer would be independently reserved for each function), but the upside is that it doesn't cost stack to use it; it's allocated from a separate section of program memory (a section that's usually unbounded, or at least has limits in the gigabyte, rather than megabyte range).
So to answer your questions:
What does it mean to put the buf variable on the stack?
It would be on the stack if it:
Is declared in function scope rather than global scope, and
Is not declared as static (or thread_local, though that's more complicated than you should care about right now); if it's declared static at function scope, it's basically global memory that can only be referenced directly in that specific function
How can I avoid doing this?
Don't declare huge non-static arrays at function scope.
What bad things could happen if I did it?
If the array is large enough, you could suffer a stack overflow from running out of available stack space, crashing your program.
I recently dove into graphics programming and I noticed that many graphics engines (e.g. Ogre), and many coders overall, prefer to initialize class instances dynamically. Here's an example from Ogre Basic Tutorial 1:
//...
Ogre::Entity* ogreHead = mSceneMgr->createEntity("Head", "ogrehead.mesh");
Ogre::SceneNode* headNode = mSceneMgr->getRootSceneNode()->createChildSceneNode("HeadNode");
//...
The data members and methods of ogreHead and headNode are then accessed as ogreHead->blabla.
Why mess around with object pointers instead of plain objects?
BTW, I've also read somewhere that heap memory allocation is much slower than stack memory allocation.
Heap allocation is, inevitably, much slower than stack allocation. More on "How much slower?" later. However, in many cases, the choice is "made for you", for several reasons:
Stack is limited. And if you run out, the application almost always gets terminated - there is no real good recovery, even printing an error message to say "I ran out of stack" may be hard...
Stack allocation "goes away" when you leave the function where the allocation was made.
Variable sizes are much better defined and easier to deal with on the heap. C++ does not cope with "variable length arrays" very well, and they are certainly not guaranteed to work in all compilers.
How much slower is heap over stack?
We'll get to "and does it matter" in a bit.
For a given allocation, stack allocation is simply a subtract operation [1], where at the very minimum new or malloc will be a function call, and probably even the simplest allocator will be several dozen instructions; in complex cases, thousands [because memory has to be gotten from the OS and cleared of its previous content]. So anything from 10x to "infinitely" slower, give or take. Exact numbers will depend on the exact system the code is running on, the size of the allocation, and often on "previous calls to the allocator" (e.g. a long list of "freed" allocations can make allocating a new object slower, because a good fit has to be searched for). And of course, unless you use the "ostrich" method of heap management, you also need to free the object and cope with "out of memory", which adds more code/time to the execution and complexity of the code.
With some reasonably clever programming, however, this can be mostly hidden - for example, allocating something that stays allocated for a long time, over the lifetime of the object, will be "nothing to worry about". Allocating objects from the heap for every pixel or every triangle in a 3D game would CLEARLY be a bad idea. But if the lifetime of the object is many frames or even the entire game, the time to allocate and free it will be nearly nothing.
Similarly, instead of doing 10000 individual object allocations, make one for 10000 objects. Object pool is one such concept.
Further, often the allocation time isn't where the time is spent. For example, reading a triangle list from a file from a disk will take much longer than allocating the space for the same triangle list - even if you allocate each single one!
To me, the rule is:
Does it fit nicely on the stack? Typically a few kilobytes is fine, many kilobytes not so good, and megabytes definitely not ok.
Is the number (e.g. array of objects) known, and the maximum such that you can fit it on the stack?
Do you know what the object will be? In other words abstract/polymorphic classes will probably need to be allocated on the heap.
Is its lifetime the same as the scope it is in? If not, use the heap (or stack further down, and pass it up the stack).
[1] Or an add, if the stack "grows towards high addresses" - I don't know of a machine with such an architecture, but it is conceivable and I think some have been made. C certainly makes no promises as to which way the stack grows, or anything else about how the runtime stack works.
The scope of the stack is limited: it only exists within a function. Now, modern user-interfacing programs are usually event driven, which means that a function of yours is invoked to handle an event, and then that function must return in order for the program to continue running. So, if your event handler function wishes to create an object which will remain in existence after the function has returned, clearly, that object cannot be allocated on the stack of that function, because it will cease to exist as soon as the function returns. That's the main reason why we allocate things on the heap.
There are other reasons, too.
Sometimes, the exact size of a class is not known at compile time. If the exact size of a class is not known, it cannot be created on the stack, because the compiler needs precise knowledge of how much space to allocate for each item on the stack.
Furthermore, factory methods like whatever::createEntity() are often used. If you have to invoke a separate method to create an object for you, then that object cannot be created on the stack, for the reason explained in the first paragraph of this answer.
Why pointers instead of objects?
Because pointers help make things fast. If you pass an object by value to another function, for example
shoot(Ogre::Entity ogre)
instead of
shoot(Ogre::Entity* ogrePtr)
then if ogre isn't a pointer, you are passing the whole object into the function rather than a reference; unless the compiler optimizes this away, you are left with an inefficient program. There are other reasons too: with the pointer, you can modify the passed-in object (some argue references are better, but that's a different discussion). Otherwise you would spend too much time copying modified objects back and forth.
Why heap?
In some sense the heap is a safer type of memory to access, and it allows you to safely recover. If you call new and there is no memory, you can flag that as an error. If you are using the stack, there is actually no good way to know you have caused a stack overflow without some other supervising program, at which point you are already in the danger zone.
Depends on your application. The stack has local scope, so when an object goes out of scope its memory is deallocated. If you need the object in some other function, there is no real way to do that.
This applies more to the OS: the heap is comparatively much larger than the stack, especially in multi-threaded applications where each thread can have a limited stack size.
The way I've been taught for quite some time is that when I run a program, the first thing that immediately goes on the stack is a stack frame for the main method. And if I call a function called foo() from within main, then a stack frame that is the size of the local variables (automatic objects) and the parameters gets pushed onto the stack as well.
However, I've run into a couple of things that contradict this. And I'm hoping someone can clear up my confusion or explain why there really aren't any contradictions.
First contradiction:
In the book, "The C++ Programming Language" 3rd edition by Bjarne Stroustrup, it says on page 244, "A named automatic object is created each time its declaration is encountered in the execution of the program." If that's not clear enough, on the next page it says, "The constructor for a local variable is executed each time the thread of control passes through the declaration of the local variable."
Does this mean that the total memory for a stack frame is not allocated all at once, but rather block by block as the variable declarations are encountered?
Also, does this mean that a stack frame may not be the same size every time if a variable declaration was not encountered due to an if statement?
Second contradiction:
I've done a little coding in assembly ( ARM to be specific ), and the way my class was taught was that when a function was called, we immediately used the registers and never pushed any of the local variables of the current function onto the stack unless the algorithm was not possible to perform with the limited amount of registers. And even then, we only pushed the leftover variables.
Does this mean when a function is called, a stack frame may not be created at all?
Does this also imply that a stack frame may differ in size due to the use of registers?
Regarding your first question:
The creation of the object has nothing to do with the allocation of the data itself. To be more specific: the fact that the object has its reserved space on the stack doesn't imply anything about when its constructor is called.
Does this mean that the total memory for a stack frame is not allocated all at once, but rather block by block as the variable declarations are encountered?
This question is really compiler specific. A stack pointer is just a pointer; how it is used by the binary is up to the compiler. Actually some compilers may reserve the whole activation record, some may reserve it little by little, others may reserve it dynamically according to the specific invocation, and so on. This is even tightly coupled with optimization, so that the compiler is able to arrange things in the way it thinks is better.
Does this mean when a function is called, a stack frame may not be created at all? Does this also imply that a stack frame may differ in size due to the use of registers?
Again, there is no strict answer here. Usually compilers rely on register allocation algorithms that are able to allocate registers in a way that minimizes "spilled" (on stack) variables. Of course, if you are writing in assembly by hand, you can decide to assign specific registers to specific variables throughout your program just because you know by their content how you want to make it work.
A compiler can't guess this, but it can see when a variable starts to be used or is no longer needed, and arrange things in a way that minimizes memory accesses (and so stack size). For example, it could implement a policy where some registers are saved by the caller and others by the callee, and assign registers accordingly.
Constructing a C++ object has very little to do with acquiring memory for the object. In fact, it would be more accurate to say "reserving memory", since in general, computers do not have little teams of RAM-builders which spring into action every time you ask for a new object. The memory is more or less permanent (although we could quibble about VM). Of course, the compiler has to arrange for its program to only use a particular range of memory for one thing at a time. That may (and probably does) require it to reserve a range of memory prior to the object's existence, and avoid using it for other objects until some time after the object's disappearance. For efficiency, the compiler may (even in the case of objects with dynamic storage duration) optimize reservations by reserving several blocks of memory at once, if it knows it will need them. In any event, when C++ talks about "constructing an object", it means just that: taking a range of memory with undefined contents, and doing what is necessary to create the representation of the object (and whatever else in the state of the world is implied by the creation of the object, which might not be limited to a particular hunk of memory.)
There is no requirement for stack frames to exist. There is no requirement for a stack to exist. That's all up to the compiler. Most compilers do generate code which uses a stack, of course, and good compilers will figure out when it is possible to abbreviate or even omit a stack frame. So, yes, frames may vary in size.
You are absolutely right, a stack frame is not required. Stack frames are a quick and dirty solution to the problem of managing the local space, easier to debug than to manage changes in the stack pointer during the course of the function. If there is a need for the stack within the function it is easier to just adjust the stack pointer on entry and restore it on return.
This is also not black and white. Compilers are programs like any other program, and if you don't already know it, you will come to realize that given any number of programmers you are going to get multiple solutions to the same problem. Even if the number of programmers is one, that one person may choose to solve the problem over and over again until they are satisfied, and/or for whatever reason may release the various versions. The use of the stack is very common for local variables - it is really how you do it - but that does not mean you have to use a stack frame created on entry and restored on return.
And as you have learned in your classes, and as is very easy to see through experiments (compile some simple functions with various levels of optimization, from none to some), gcc, for example, won't use the stack unless it has to. We are talking ARM here, where the normal calling convention is register based (nothing says the compiler authors have to follow that convention; a compiler could use a stack-based one on ARM if it chose to). On processors where the normal convention is stack based, since the code is already dealing with the stack, a compiler may choose to use a stack frame anyway. It is likely that in those cases the stack-based convention is used because the processor lacks general purpose registers and relies more on the stack than processors with more registers, which means that processor likely needs the stack often, not just for the calling convention but for most of its local storage.
I'm sorry if this has been asked before, but I didn't find anything...
For a "normal" x86 architecture:
When I call a large function in C++, is the memory then allocated immediately for all stack variables?
Or are there compilers which can (and do) modify the stack size even before the function has finished?
For example if a new scope starts:
int largeFunction(){
int a = 1;
int b = 2;
// .... long code ....
{ // new scope
int c = 5;
// .... code again ....
}
// .....
}
Could the call stack "grow" also for the variable c at the beginning of the separate scope and "shrink" at its end?
Or will current compilers always produce code which adjusts the stack pointer only at the entry and return of the function?
Thanks for your answer in advance.
1) How long a function is has nothing to do with the allocation of memory, independent of stack or heap.
2) When stack is "allocated" depends only on how the compiler generates the most efficient code. "Efficient" covers a wide range of requirements. All compilers have options to adjust the optimizer's goals for speed and size, and most compilers can also optimize for lower stack consumption and other parameters.
3) Automatic variables can go on the stack, but that is not a must. A lot of variables can be "allocated" to registers of your CPU. This speeds up the code a lot and saves stack. But this depends very much on the CPU platform.
4) When a compiler sets up a new stack frame is also a question of code optimization. Compilers can reorder operations if this saves resources or fits the architecture better, so the question of when a stack frame comes into use cannot be answered in general. A new scope (open brace) can be the point for allocating a new stack frame, but this is never a guarantee. Sometimes it is not efficient to recalculate all the stack-relative addresses of all functions called from the current scope.
5) Some compilers can also use heap memory for auto variables. This is often seen on embedded cores, if access via special instructions is faster than stack-relative addressing.
But normally it is not very important when a compiler does what. The only thing sometimes worth remembering is that you have to guarantee that your stack is large enough. System calls that create new threads often have parameters to set the stack size, so you have to know how much stack your implementation needs. In all other cases: don't worry about it. This job is done very well by your compiler's developers.
I don't know the answer (and I hope you only want to know because you're curious, as no valid program should be able to tell the difference), but you could test the behaviour of your compiler by calling a function like this before the new scope and again after the new scope:
#include <cstdint>

std::intptr_t stackaddr()
{
    int i;
    return reinterpret_cast<std::intptr_t>(&i);
}
If you get the same result then it means the stack was already adjusted in advance of creating c.
There was a change in G++ 4.7 which allows the compiler to re-use the stack space of c after its scope ends, where previously any new variables after that point would have increased the stack usage: "G++ now properly re-uses stack space allocated for temporary objects when their lifetime ends, which can significantly lower stack consumption for some C++ functions." But I think that only affects how much stack is reserved on entry to the function, not when/where it's reserved.
This is entirely dependent on the runtime conventions of the system you are using; however, the CPU architecture usually plays a big part in the decision, because the architecture defines what stack management can safely be used. On the old PowerPCs under MacOS X, for instance, stack frames were always of fixed size; one atomic store of the stack pointer at the low end of a new stack frame would allocate it, and dereferencing the stack pointer was equivalent to popping an entire stack frame.
Current systems like Linux and (correct me if I'm wrong) Windows on x86 have a more dynamic approach, with atomic push and pop instructions (there is no atomic pop on PowerPC), where the parameters to a function call are pushed onto the stack before each function call, effectively resizing the allocated stack frame each time.
So, yes, on many current systems the compiler can resize the stack frame, but on other systems such an operation is at least hard to accomplish (never impossible, though).
Max function stack size is limited and can be quickly exhausted if we use big stack variables or get careless with recursive functions.
But main's stack isn't really a stack. main is always called exactly once and never recursively. For all practical purposes, main's stack frame behaves like static storage: allocated at the very beginning and alive until the very end. Does that mean I can allocate big arrays in main's stack?
int main()
{
double a[5000000];
}
main is just a normal function. Stack size is system dependent.
Also remember your process shares only one stack for all function calls. Items are pushed onto and popped off the stack as functions are called by main.
It's implementation-defined (the language standard doesn't talk about stacks, AFAIK). But typically, main lives on the stack just like any other function.
It's 100% compiler and system dependent, like most of this kind of funny business. Heck, even the existence of the stack isn't mandated by the standard.
In practice, yes, it's on the stack, and no, you can't allocate things like that on the stack without running into trouble.
When you allocate an array in that manner, it is allocated on the stack. There is a platform-dependent maximum size the stack can grow to. And yes, you've exceeded it.
On second thought, I just remembered - it can be called recursively. Check out this obfuscated code:
http://en.wikipedia.org/wiki/Obfuscated_code
It calls main many times and works wonders :) It's a fun link anyway. So, it's definitely stack allocated, sorry about that!
The stack is something that is used by all functions - the way you've worded your question suggests that each function is given its own stack, which is not the case.
Stack usage grows with each function call - main() being the first. The allocation that you used in your example is just as bad as making a stack allocation in another function.
For most modern systems, there is no real reason the stack size needs to be limited. You can probably adjust an operating system parameter and that program will work fine. (As will any that allocates an equal amount of data on the stack, main or not.)
However, if you really want an object with a lifetime equal to the duration of the program, create a global variable instead of a local inside main. Most platforms do not artificially limit the size of global objects — they can usually be as large as the memory map allows.
By the way, main is not active for the duration of a C++ program. It may be preceded by construction of global objects and followed by destruction of same and atexit handlers.