I'm working with LLVM and somewhat new to it.
I'm having trouble figuring out what LLVM means by stack frame lowering. Can someone explain what it is?
Any help is appreciated
When a function runs, it gets some amount of space on the stack to store stuff like stack variables and callee saved registers (CSR). Stack frame lowering is the process of calculating the amount of space and the layout required for this, and then emitting the required machine instructions in the prologue and epilogue (beginning and end) of the function.
When variables on the stack are referred to before the prologue-epilogue insertion (PEI) step, they are addressed using "frame indexes", an arbitrary name for a location that will eventually resolve to a stack-pointer relative offset. Note that PEI happens fairly late (after register allocation).
Related
I am slightly confused on how a compiler stores variable on the stack. I read that c++ can store local variables on the stack, but if the stack is LIFO, how could it make sure to call the right variable when the variable is called in the program?
The run-time stack isn't simply a LIFO structure; it's a complex structure which supports random access.
Typically how it works on many platforms is that whe na function is entered, a new stack frame is pushed onto the stack (LIFO fashion). Local variables are stored within the frame and are accessed relative to a more or less stable pointer stored in a register.
When other functions are called, they push their own frames, but they restore everything before returning.
Run-time stacks also typically support the ad hoc pushing and popping of individual values, for the purpose of temporarily saving register values or for parameter passing.
A common implementation strategy for that is to allocate the frame first. For instance if a 512 byte stack frame is needed, the stack pointer is moved by 512 bytes. The stack pointer is then used freely for pushing and popping, provided it doesn't pop too far and start gobbling into the frame.
A separate frame pointer may be used which tracks the location of the frame. The frame is then accessed relative to that frame pointer, which allows the stack pointer to move without interfering with those accesses.
Compilers can generate code without the use of frame pointers also; if the stack pointer only moves in ways that a compiler knows about, it can adjust all the variable references. Over an area of the code where the compiler knows that the stack pointer has moved by four bytes due to something being pushed on it, it can adjust the stack-pointer-relative frame references by four bytes.
The basic principle is that when a given function is executing, then the stack is in the right state that is expected by that function (unless something has gone horribly wrong due to a bug causing corruption or something). The LIFO allocation strategy of the stack frames closely tracks the function calls and returns. Functions that are called must save and restore certain registers (the "callee-saved registers"), which helps to maintain the stable stack environment.
How do segmented stacks work? This question also applies to Boost.Coroutine so I am using the C++ tag as well here. The main doubt comes from this article It looks like what they do is keep some space at the bottom of the stack and check if it has gotten corrupted by registering some sort of signal handler with the memory allocated there (perhaps via mmap and mprotect?) And then when they detect that they have run out of space they continue by allocating more memory and then continuing from there. 3 questions about this
Isn't this construct a user space thing? How do they control where the new stack is allocated and how do the instructions the program is compiled down to get aware of that?
A push instruction is basically just adding a value to the stack pointer and then storing the value in a register on the stack, then how can the push instruction be aware of where the new stack starts and correspondingly how can the pop know when it has to move the stack pointer back to the old stack?
They also say
After we've got a new stack segment, we restart the goroutine by retrying the function that caused us to run out of stack
what does this mean? Do they restart the entire goroutine? Won't this possibly cause non deterministic behavior?
How do they detect that the program has overrun the stack? If they keep a canary-ish memory area at the bottom then what happens when the user program creates an array big enough that overflows that? Will that not cause a stack overflow and is a potential security vulnerability?
If the implementations are different for Go and Boost I would be happy to know how either of them deal with this situation 🙂
I'll give you a quick sketch of one possible implementation.
First, assume most stack frames are smaller than some size. For ones that are larger, we can use a longer instruction sequence at entry to make sure there is enough stack space. Let's assume we're on an architecture that that has 4k pages and we're choosing 4k - 1 as the maximum size stack frame handled by the fast path.
The stack is allocated with a single guard page at the bottom. That is, a page that is not mapped for write. At function entry, the stack pointer is decremented by the stack frame size, which is less than the size of a page, and then the program arranges to write a value at the lowest address in the newly allocated stack frame. If the end of the stack has been reached, this write will cause a processor exception and ultimately be turned into some sort of upcall from the OS to the user program -- e.g. a signal in UNIX family OSes.
The signal handler (or equivalent) has to be able to determine this is a stack extension fault from the address of the instruction that faulted and the address it was writing to. This is determinable as the instruction is in the prolog of a function and the address being written to is in the guard page of the stack for the current thread. The instruction being in the prolog can be recognized by requiring a very specific pattern of instructions at the start of functions, or possibly by maintaining metadata about functions. (Possibly using traceback tables.)
At this point the handler can allocate a new stack block, set the stack pointer to the top of the block, do something to handle unchaining the stack block, and then call the function that faulted again. This second call is safe because the fault is in the function prolog the compiler generated and no side effects are allowed before validating there is enough stack space. (The code may also need to fixup the return address for architectures that push it onto the stack automatically. If the return address is in a register, it just needs to be in the same register when the second call is made.)
Likely the easiest way to handle unchaining is to push a small stack frame onto the new extension block for a routine that when returned to unchains the new stack block and frees the allocated memory. It then returns the processor registers to the state they were in when the call was made that caused the stack to need to be extended.
The advantage of this design is that the function entry sequence is very few instructions and is very fast in the non-extending case. The disadvantage is that in the case where the stack does need to be extended, the processor incurs an exception, which may cost much much more than a function call.
Go doesn't actually use a guard page if I understand correctly. Rather the function prolog explicitly checks the stack limit and if the new stack frame won't fit it calls a function to extend the stack.
Go 1.3 changed its design to not use a linked list of stack blocks. This is to avoid the trap cost if the extension boundary is crossed in both directions many times in a certain calling pattern. They start with a small stack, and use a similar mechanism to detect the need for extension. But when a stack extension fault does occur, the entire stack is moved to a larger block. This removes the need for unchaining entirely.
There are quite a few details glossed over here. (E.g. one may not be able to do the stack extension in the signal handler itself. Rather the handler can arrange to have the thread suspended and hand it off to a manager thread. One likely has to use a dedicated signal stack to handle the signal as well.)
Another common pattern with this sort of thing is the runtime requiring there to be a certain amount of valid stack space below the current stack frame for either something like a signal handler or for calling special routines in the runtime. Go works this way and the stack limit test guarantees a certain fixed amount of stack space is available below the current frame. One can e.g. call plain C functions on the stack so long as one guarantees they do not consume more than the fixed stack reserve amount. (One can use this to call C library routines in theory, though most of these have no formal specification of how much stack they might use.)
Dynamic allocation in the stack frame, such as alloca or stack allocated variable length arrays, add some complexity to the implementation. If the routine can compute the entire final size of the frame in the prolog then it is fairly straightforward. Any increase in the frame size while the routine is running likely has to be modeled as a new call, though with Go's new architecture that allows moving the stack, it is possible the alloca point in the routine can be made such that all the state allows a stack move to happen there.
I recently used the /FAsu Visual C++ compiler option to output the source + assembly of a particularly long member function definition. In the assembly output, after the stack frame is set up, there is a single call to a mysterious _chkstk() function.
The MSDN page on _chkstk() does not explain the reason why this function is called. I have also seen the Stack Overflow question Allocating a buffer of more a page size on stack will corrupt memory?, but I do not understand what the OP and the accepted answer are talking about.
What is the purpose of the _chkstk() CRT function? What does it do?
Windows pages in extra stack for your thread as it is used. At the end of the stack, there is one guard page mapped as inaccessible memory -- if the program accesses it (because it is trying to use more stack than is currently mapped), there's an access violation. The OS catches the fault, maps in another page of stack at the same address as the old guard page, creates a new guard page just beyond the old one, and resumes from the instruction that caused the violation.
If a function has more than one page of local variables, then the first address it accesses might be more than one page beyond the current end of the stack. Hence it would miss the guard page and trigger an access violation that the OS doesn't realise is because more stack is needed. If the total stack required is particularly huge, it could perhaps even reach beyond the guard page, beyond the end of the virtual address space assigned to stack, and into memory that's actually in use for something else.
So, _chkstk ensures that there is enough space for the local variables. You can imagine that it does this by touching the memory for the local variables at page-sized intervals, in increasing order, to ensure that it doesn't miss the guard page (so-called "stack probes"). I don't know whether it actually does that, though, possibly it takes a more direct route and instructs the OS to map in a certain amount of stack. Either way, if the total required is greater than the virtual address space available for stack, then the OS can complain about it instead of doing something undefined.
I looked at the code for __chkstk and it does do the repeated stack probes at one-page intervals. So this way, it doesn't need to make any calls to the OS. The parameter in rax is size of data you want to add. It ensures that the target address (current rsp - rax) is accessible. If rax > rsp, it does this for address 0. As an interesting shortcut, it first compares the address with gs:[10h], which is the current lowest page that is mapped; if the target address >= this, then it does nothing.
By the way, for 64-bit code at least, it is spelled with two underscores: __chkstk__.
I have divided the whole question into smaller ones:
What kind of different algorithms GDB is capable to use to reconstruct stacktraces?
How each of the stacktrace reconstruction algorithm works at high level? Advantages and disadvantages?
What kind of meta-information compiler needs to provide in program for each stacktrace reconstruction algorithm to work?
And also corresponding g++ compiler switches that enable/disable particular algorithm?
Speaking Pseudocode, you could call the stack "an array of packed stack frames", where every stack frame is a data structure of variable size you could express like:
template struct stackframe<N> {
uintptr_t contents[N];
#ifndef OMIT_FRAME_POINTER
struct stackframe<> *nextfp;
#endif
void *retaddr;
};
Problem is that every function has a different <N> - frame sizes vary.
The compiler knows frame sizes, and if creating debugging information will usually emit these as part of that. All the debugger then needs to do is to locate the last program counter, look up the function in the symbol table, then use that name to look up the framesize in the debugging information. Add that to the stackpointer and you get to the beginning of the next frame.
If using this method you don't require frame linkage, and backtracing will work just fine even if you use -fomit-frame-pointer. On the other hand, if you have frame linkage, then iterating the stack is just following a linked list - because every framepointer in a new stackframe is initialized by the function prologue code to point to the previous one.
If you have neither frame size information nor framepointers, but still a symbol table, then you can also perform backtracing by a bit of reverse engineering to calculate the framesizes from the actual binary. Start with the program counter, look up the function it belongs to in the symbol table, and then disassemble the function from the start. Isolate all operations between the beginning of the function and the program counter that actually modify the stackpointer (write anything to the stack and/or allocate stackspace). That calculates the frame size for the current function, so subtract that from the stackpointer, and you should (on most architectures) find the last word written to the stack before the function was entered - which is usually the return address into the caller. Re-iterate as necessary.
Finally, you can perform a heuristic analysis of the contents of the stack - isolate all words in the stack that are within executably-mapped segments of the process address space (and thereby could be function offsets aka return addresses), and play a what-if game looking up the memory, disassembling the instruction there and see if it actually is a call instruction of sort, if so whether that really called the 'next' and if you can construct an uninterrupted call sequence from that. This works to a degree even if the binary is completely stripped (although all you could get in that case is a list of return addresses). I don't think GDB employs this technique, but some embedded lowlevel debuggers do. On x86, due to the varying instruction lengths, this is terribly difficult to do because you can't easily "step back" through an instruction stream, but on RISC, where instruction lengths are fixed, e.g. on ARM, this is much simpler.
There are some holes that make simple or even complex/exhaustive implementations of these algorithms fall over sometimes, like tail-recursive functions, inlined code, and so on. The gdb sourcecode might give you some more ideas:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/frame.c
GDB employs a variety of such techniques.
Why infinite recursion leads to seg fault ?
Why stack overflow leads to seg fault.
I am looking for detailed explanation.
int f()
{
f();
}
int main()
{
f();
}
Every time you call f(), you increase the size of the stack - that's where the return address is stored so the program knows where to go to when f() completes. As you never exit f(), the stack is going to increase by at least one return address each call. Once the stack segment is full up, you get a segfault error. You'll get similar results in every OS.
Segmentation fault is a condition when your program tries to access a memory location that it is not allowed to access. Infinite recursion causes your stack to grow. And grow. And grow. Eventually it will grow to a point when it will spill into an area of memory that your program is forbidden to access by the operating system. That's when you get the segmentation fault.
Your system resources are finite. They are limited. Even if your system has the most memory and storage on the entire Earth, infinite is WAY BIGGER than what you have. Remember that now.
The only way to do something an "infinite number of times" is to "forget" previous information. That is, you have to "forget" what has been done before. Otherwise you have to remember what happened before and that takes storage of one form or another (cache, memory, disk space, writing things down on paper, ...)--this is inescapable. If you are storing things, you have a finite amount of space available. Recall, that infinite is WAY BIGGER than what you have. If you try to store an infinite amount of information, you WILL run out of storage space.
When you employ recursion, you are implicitly storing previous information with each recursive call. Thus, at some point you will exhaust your storage if you try to do this an infinite number of takes. Your storage space in this case is the stack. The stack is a piece of finite memory. When you use it all up and try to access beyond what you have, the system will generate an exception which may ultimately result in a seg fault if the memory it tried to access was write-protected. If it was not write-protected, it will keep on going, overwriting god-knows-what until such time as it either tries to write to memory that just does not exist, or it tries to write to some other piece of write protected memory, or until it corrupts your code (in memory).
It's still a stackoverflow ;-)
The thing is that the C runtime doesn't provide "instrumentalisation" like other managed languages do (e.g. Java, Python, etc.), so writing outside the space designated for the stack instead of causing a detailed exception just raises a lower level error, that has the generic name of "segmentation fault".
This is for performance reasons, as those memory access watchdogs can be set with help of hardware support with little or none overhead; I cannot remember the exact details now, but it's usually done by marking the MMU page tables or with the mostly obsolete segment offsets registers.
AFAIK: The ends of the stack are protected by addresses that aren't accessible to the process. This prevents the stack from growing over allocated data-structures, and is more efficient than checking the stack size explicitly, since you have to check the memory protection anyway.
A program copunter or instruction pointer is a register which contains the value of next instruction to be executed.
In a function call, the current value of program counter pushed into the stack and then program counter points to first instruction of the function. The old value is poped after returning from that function and assigned to program counter. In infinite recursion the value is pushed again and again and leads to the stack overflow.
It's essentially the same principle as a buffer overflow; the OS allocates a fixed amount of memory for the stack, and when you run out (stack overflow) you get undefined behavior, which in this context means a SIGSEGV.
The basic idea:
int stack[A_LOT];
int rsp=0;
void call(Func_p fn)
{
stack[rsp++] = rip;
rip = fn;
}
void retn()
{
rip = stack[--rsp];
}
/*recurse*/
for(;;){call(somefunc);}
eventually rsp moves past the end of the stack and you try to put the next return address in unallocated storage and your program barfs. Obviously real systems are a lot more complicated than that, but that could (and has) take up several large books.
At a "low" level, the stack is "maintained" through a pointer (the stack pointer), kept in a processor register. This register points to memory, since stack is memory after all. When you push values on the stack, its "value" is decremented (stack pointer moves from higher addresses to lower addresses). Each time you enter a function some space is "taken" from the stack (local variables); moreover, on many architectures the call to a subroutine pushes the return value on the stack (and if the processor has no a special register stack pointer, likely a "normal" register is used for the purpose, since stack is useful even where subroutines can be called with other mechanisms), so that the stack is at least diminuished by the size of a pointer (say, 4 or 8 bytes).
In an infinite recursion loop, in the best case only the return value causes the stack to be decremented... until it points to a memory that can't be accessed by the program. And you see the segmentation fault problem.
You may find interesting this page.