I know that when a function is called, a stack frame is created for it, which contains the local variables, the return address, the saved frame pointer and so on, and is pushed onto the program stack.
We are able to use the passed arguments in any order.
void func(int a,int b,int c){
//a,b,c
//c,b,a
//a,c,b
}
In the above function the arguments can be used in any order. I know that the stack is LIFO (last in, first out); for now I just want to know: is the stack frame random access?
I ask because we are able to access the variables (local variables) in any order.
Yes. On all platforms I'm aware of that use a stack, all RAM is random access (that's what the 'RA' stands for, after all).
The stack is just a convention for managing ownership and organization of your program's memory so that different function calls don't try to use memory that other function calls are still using. Each function call pushes a stack frame onto the top of the stack to indicate what memory it needs to use. It can randomly access any of that memory (or any other memory; the CPU doesn't prevent functions from accessing other functions' stack frames) as it pleases. The stack frame is just a way of telling other function calls that the memory is in use.
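On an x86_64 machine using the System V calling convention (Linux, macOS and most other Unix-like systems), your values for a, b, and c would be passed in the lower 32 bits of the %rdi, %rsi, and %rdx registers (that is, %edi, %esi, and %edx) rather than on the stack, unless the compiler decides to spill them into the frame.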
The stack itself is just a location in memory, typically in DRAM, that you can access as you please, specifically allocated in frames, based on your function calls. Your function that you just called would have its own frame.
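As a small illustration of the answers above, here is a sketch (the function names are invented for this example) that reads its parameters in a different order from the one in which they were passed. LIFO only governs the order in which whole frames are created and destroyed; inside a frame, each value sits in a register or at a fixed offset and can be read at any time.

```cpp
#include <cassert>

// Hypothetical example function: touches its parameters in the order
// c, a, b. Nothing has to be "popped" to reach any of them.
int mixed_order(int a, int b, int c) {
    int result = 0;
    result += c * 100;  // read c first
    result += a;        // then a
    result += b * 10;   // then b
    return result;
}

// Self-check: the result depends only on the values, not on access order.
void check_mixed_order() {
    assert(mixed_order(1, 2, 3) == 321);  // 300 + 1 + 20
}
```

Calling `check_mixed_order()` from `main` is enough to exercise it.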
Previously I had seen assembly of many functions in C++. In gcc, all of them start with these instructions:
push rbp
mov rbp, rsp
sub rsp, <X> ; <X> is size of frame
I know that these instructions store the frame pointer of the previous function and then set up a frame for the current function. But here, the assembly is neither asking to map memory (like malloc) nor checking whether the memory pointed to by rbp is allocated to the process.
So it assumes that the startup code has mapped enough memory for the entire depth of the call stack. So exactly how much memory is allocated for the call stack? How can the startup code know the maximum depth of the call stack?
It also means that I can access an array far out of bounds, since although the address is not in the current frame, it is mapped to the process. So I wrote this code:
#include <cstdio>
int main() {
    int arr[3] = {};
    printf("%d", arr[900]); // out-of-bounds read, far past the 3 elements
}
This is exiting with SIGSEGV when index is 900. But surprisingly not when index is 901. Similarly, it is exiting with SIGSEGV for some random indices and not for some. This behavior was observed when compiled with gcc-x86-64-11.2 in compiler explorer.
How can the startup code know the maximum depth of the call stack?
It doesn't.
In most common implementations, the stack has a fixed maximum size.
If the program exceeds the constant sized stack, that is called a stack overflow. This is why you must avoid creating large objects (which are typically, but not necessarily, arrays) in automatic storage, and why you must avoid recursion with linear depth (such as recursive linked list algorithms).
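To make the advice about linear-depth recursion concrete, here is a minimal sketch (the `Node` type and both function names are invented for this example). The recursive version needs one stack frame per node, while the iterative version uses a constant amount of stack. An optimizer may turn the recursion into a loop, but the language does not guarantee that.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical singly linked list node.
struct Node {
    int value;
    Node* next;
};

// Recursive length: stack depth grows linearly with the list length,
// so a very long list can overflow the stack.
std::size_t length_recursive(const Node* n) {
    return n == nullptr ? 0 : 1 + length_recursive(n->next);
}

// Iterative length: one frame, no matter how long the list is.
std::size_t length_iterative(const Node* n) {
    std::size_t count = 0;
    for (; n != nullptr; n = n->next) {
        ++count;
    }
    return count;
}

// Self-check on a short list where both versions are safe.
void check_lengths() {
    Node c{3, nullptr};
    Node b{2, &c};
    Node a{1, &b};
    assert(length_recursive(&a) == length_iterative(&a));
}
```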
So exactly how much memory is allocated for call stack?
On most desktop/server systems it's configurable, and defaults to one to a few megabytes. It can be much less on embedded systems.
This is exiting with SIGSEGV when index is 900. But surprisingly not when index is 901.
In both cases, the behaviour of the program is undefined.
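If you want out-of-range indices to be caught rather than to produce undefined behaviour, one option is a checked accessor. The sketch below swaps the raw array for `std::array` and uses its `at()` member, which throws `std::out_of_range`; the helper name `read_is_rejected` is made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Returns true if reading element `index` of a 3-element array was
// rejected with std::out_of_range, false if the read succeeded.
bool read_is_rejected(std::size_t index) {
    std::array<int, 3> arr{};
    try {
        (void)arr.at(index);  // bounds-checked, unlike arr[index]
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Self-check: 900 and 901 are treated identically, unlike with the raw read.
void check_bounds() {
    assert(read_is_rejected(900) == read_is_rejected(901));
}
```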
Is it possible to know the allocated stack size?
Yes. You can read the documentation of the target system. If you intend to write a portable program, then you must assume the minimum of all target systems. For desktop/server systems, the 1 megabyte that I mentioned is a reasonable assumption.
There is no standard way to acquire the size within C++.
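Outside the standard, individual platforms do offer ways to ask. As a sketch for POSIX systems only (this will not compile on Windows), `getrlimit` with `RLIMIT_STACK` reports the stack size limit of the process; the wrapper names below are invented for this example. Threads other than the main one usually get their size from the thread attributes instead.

```cpp
#include <sys/resource.h>  // POSIX header, not part of standard C++

#include <cassert>
#include <cstdio>

// Stores the soft stack-size limit of the current process in `soft_limit`.
// Returns false if the getrlimit call fails.
bool query_stack_limit(rlim_t& soft_limit) {
    struct rlimit lim {};
    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return false;
    }
    soft_limit = lim.rlim_cur;
    return true;
}

// Prints the limit in bytes, or "unlimited" if no limit is set.
void print_stack_limit() {
    rlim_t soft = 0;
    const bool ok = query_stack_limit(soft);
    assert(ok && "getrlimit(RLIMIT_STACK) is expected to succeed");
    if (!ok) {
        return;
    }
    if (soft == RLIM_INFINITY) {
        std::puts("stack limit: unlimited");
    } else {
        std::printf("stack limit: %llu bytes\n",
                    static_cast<unsigned long long>(soft));
    }
}
```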
I am slightly confused about how a compiler stores variables on the stack. I read that C++ can store local variables on the stack, but if the stack is LIFO, how can it make sure to access the right variable when that variable is used in the program?
The run-time stack isn't simply a LIFO structure; it's a complex structure which supports random access.
Typically how it works on many platforms is that when a function is entered, a new stack frame is pushed onto the stack (LIFO fashion). Local variables are stored within the frame and are accessed relative to a more or less stable pointer stored in a register.
When other functions are called, they push their own frames, but they restore everything before returning.
Run-time stacks also typically support the ad hoc pushing and popping of individual values, for the purpose of temporarily saving register values or for parameter passing.
A common implementation strategy for that is to allocate the frame first. For instance if a 512 byte stack frame is needed, the stack pointer is moved by 512 bytes. The stack pointer is then used freely for pushing and popping, provided it doesn't pop too far and start gobbling into the frame.
A separate frame pointer may be used which tracks the location of the frame. The frame is then accessed relative to that frame pointer, which allows the stack pointer to move without interfering with those accesses.
Compilers can generate code without the use of frame pointers also; if the stack pointer only moves in ways that a compiler knows about, it can adjust all the variable references. Over an area of the code where the compiler knows that the stack pointer has moved by four bytes due to something being pushed on it, it can adjust the stack-pointer-relative frame references by four bytes.
The basic principle is that when a given function is executing, then the stack is in the right state that is expected by that function (unless something has gone horribly wrong due to a bug causing corruption or something). The LIFO allocation strategy of the stack frames closely tracks the function calls and returns. Functions that are called must save and restore certain registers (the "callee-saved registers"), which helps to maintain the stable stack environment.
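The mechanism described in this answer can be modelled in a few lines. The following is only a toy sketch: the class `ToyStack`, its slot layout, and its method names are invented for illustration and do not match any real ABI. It shows frames being pushed and popped in LIFO order while locals are reached by offset from a frame pointer, which stays put when temporaries are pushed and popped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a downward-growing run-time stack of int-sized slots.
class ToyStack {
public:
    explicit ToyStack(std::size_t slots)
        : mem_(slots, 0), sp_(slots), fp_(slots) {}

    // "Call": save the caller's frame pointer, then reserve `locals` slots.
    void enter(std::size_t locals) {
        assert(sp_ >= locals + 1 && "toy stack overflow");
        mem_[--sp_] = static_cast<int>(fp_);  // saved frame pointer
        fp_ = sp_;
        sp_ -= locals;
    }

    // "Return": drop the frame and restore the caller's frame pointer.
    void leave() {
        sp_ = fp_;
        fp_ = static_cast<std::size_t>(mem_[sp_++]);
    }

    // Random access to local number `i` of the current frame (unchecked).
    int& local(std::size_t i) { return mem_[fp_ - 1 - i]; }

    // Ad hoc push/pop of temporaries; the frame pointer does not move.
    void push(int v) { mem_[--sp_] = v; }
    int pop() { return mem_[sp_++]; }

private:
    std::vector<int> mem_;
    std::size_t sp_;  // index of the lowest slot in use
    std::size_t fp_;  // base of the current frame
};
```

For example, after `enter(2)` the two locals can be written in any order, a `push` leaves `local(0)` and `local(1)` where they were, and a nested `enter`/`leave` pair restores the outer frame exactly.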
How do segmented stacks work? This question also applies to Boost.Coroutine, so I am using the C++ tag here as well. The main doubt comes from this article. It looks like what they do is keep some space at the bottom of the stack and check whether it has been corrupted by registering some sort of signal handler with the memory allocated there (perhaps via mmap and mprotect?). When they detect that they have run out of space, they allocate more memory and continue from there. Three questions about this:
Isn't this construct a user space thing? How do they control where the new stack is allocated and how do the instructions the program is compiled down to get aware of that?
A push instruction basically just adjusts the stack pointer and then stores the value of a register on the stack. How, then, can the push instruction be aware of where the new stack starts, and correspondingly, how can the pop know when it has to move the stack pointer back to the old stack?
They also say
After we've got a new stack segment, we restart the goroutine by retrying the function that caused us to run out of stack
What does this mean? Do they restart the entire goroutine? Won't this possibly cause nondeterministic behavior?
How do they detect that the program has overrun the stack? If they keep a canary-like memory area at the bottom, then what happens when the user program creates an array big enough to jump over it? Will that not cause a stack overflow, and is that not a potential security vulnerability?
If the implementations are different for Go and Boost I would be happy to know how either of them deal with this situation 🙂
I'll give you a quick sketch of one possible implementation.
First, assume most stack frames are smaller than some size. For ones that are larger, we can use a longer instruction sequence at entry to make sure there is enough stack space. Let's assume we're on an architecture that has 4k pages and we're choosing 4k - 1 as the maximum size stack frame handled by the fast path.
The stack is allocated with a single guard page at the bottom. That is, a page that is not mapped for write. At function entry, the stack pointer is decremented by the stack frame size, which is less than the size of a page, and then the program arranges to write a value at the lowest address in the newly allocated stack frame. If the end of the stack has been reached, this write will cause a processor exception and ultimately be turned into some sort of upcall from the OS to the user program -- e.g. a signal in UNIX family OSes.
The signal handler (or equivalent) has to be able to determine this is a stack extension fault from the address of the instruction that faulted and the address it was writing to. This is determinable as the instruction is in the prolog of a function and the address being written to is in the guard page of the stack for the current thread. The instruction being in the prolog can be recognized by requiring a very specific pattern of instructions at the start of functions, or possibly by maintaining metadata about functions. (Possibly using traceback tables.)
At this point the handler can allocate a new stack block, set the stack pointer to the top of the block, do something to handle unchaining the stack block, and then call the function that faulted again. This second call is safe because the fault is in the function prolog the compiler generated and no side effects are allowed before validating there is enough stack space. (The code may also need to fix up the return address for architectures that push it onto the stack automatically. If the return address is in a register, it just needs to be in the same register when the second call is made.)
Likely the easiest way to handle unchaining is to push a small stack frame onto the new extension block for a routine that when returned to unchains the new stack block and frees the allocated memory. It then returns the processor registers to the state they were in when the call was made that caused the stack to need to be extended.
The advantage of this design is that the function entry sequence is very few instructions and is very fast in the non-extending case. The disadvantage is that in the case where the stack does need to be extended, the processor incurs an exception, which may cost much much more than a function call.
Go doesn't actually use a guard page if I understand correctly. Rather the function prolog explicitly checks the stack limit and if the new stack frame won't fit it calls a function to extend the stack.
Go 1.3 changed its design to not use a linked list of stack blocks. This is to avoid the trap cost if the extension boundary is crossed in both directions many times in a certain calling pattern. They start with a small stack, and use a similar mechanism to detect the need for extension. But when a stack extension fault does occur, the entire stack is moved to a larger block. This removes the need for unchaining entirely.
There are quite a few details glossed over here. (E.g. one may not be able to do the stack extension in the signal handler itself. Rather the handler can arrange to have the thread suspended and hand it off to a manager thread. One likely has to use a dedicated signal stack to handle the signal as well.)
Another common pattern with this sort of thing is the runtime requiring there to be a certain amount of valid stack space below the current stack frame for either something like a signal handler or for calling special routines in the runtime. Go works this way and the stack limit test guarantees a certain fixed amount of stack space is available below the current frame. One can e.g. call plain C functions on the stack so long as one guarantees they do not consume more than the fixed stack reserve amount. (One can use this to call C library routines in theory, though most of these have no formal specification of how much stack they might use.)
Dynamic allocation in the stack frame, such as alloca or stack-allocated variable-length arrays, adds some complexity to the implementation. If the routine can compute the entire final size of the frame in the prolog, then it is fairly straightforward. Any increase in the frame size while the routine is running likely has to be modeled as a new call, though with Go's new architecture that allows moving the stack, it is possible the alloca point in the routine can be made such that all the state allows a stack move to happen there.
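To make the "explicit check in the prolog, then move the whole stack" idea more tangible, here is a toy sketch. It is not how Go or Boost implement it: `GrowableStack` and its methods are invented names, the model grows upward for simplicity, and frames are addressed by index so that moving the storage cannot invalidate them. A real runtime has to find and rewrite every pointer into the old stack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a contiguous stack that is copied to a larger block
// when a new frame does not fit.
class GrowableStack {
public:
    explicit GrowableStack(std::size_t capacity) : mem_(capacity), used_(0) {}

    // Prolog check: ensure `frame_size` slots fit, growing first if needed.
    // Returns the index of the first slot of the new frame.
    std::size_t enter(std::size_t frame_size) {
        if (used_ + frame_size > mem_.size()) {
            std::size_t new_cap = mem_.empty() ? 1 : mem_.size();
            while (new_cap < used_ + frame_size) {
                new_cap *= 2;
            }
            std::vector<int> bigger(new_cap);
            for (std::size_t i = 0; i < used_; ++i) {
                bigger[i] = mem_[i];  // move the live frames
            }
            mem_.swap(bigger);
            ++grow_count_;
        }
        const std::size_t base = used_;
        used_ += frame_size;
        return base;
    }

    // Epilog: release the most recent frame of `frame_size` slots.
    void leave(std::size_t frame_size) {
        assert(frame_size <= used_);
        used_ -= frame_size;
    }

    int& slot(std::size_t index) { return mem_[index]; }
    std::size_t capacity() const { return mem_.size(); }
    std::size_t grow_count() const { return grow_count_; }

private:
    std::vector<int> mem_;
    std::size_t used_;
    std::size_t grow_count_ = 0;
};
```

Because the check happens before the frame is used, the function body never runs with too little stack, which is the same reason the retry in the guard-page design is safe.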
When I create a new variable in a C++ program, e.g. a char:
char c = 'a';
how does C++ then have access to this variable in memory? I would imagine that it would need to store the memory location of the variable, but then that would require a pointer variable, and this pointer would again need to be accessed.
See the docs:
When a variable is declared, the memory needed to store its value is
assigned a specific location in memory (its memory address).
Generally, C++ programs do not actively decide the exact memory
addresses where its variables are stored. Fortunately, that task is
left to the environment where the program is run - generally, an
operating system that decides the particular memory locations on
runtime. However, it may be useful for a program to be able to obtain
the address of a variable during runtime in order to access data cells
that are at a certain position relative to it.
You can also refer to this article on Variables and Memory:
The Stack
The stack is where local variables and function parameters reside. It
is called a stack because it follows the last-in, first-out principle.
As data is added or pushed to the stack, it grows, and when data is
removed or popped it shrinks. In reality, memory addresses are not
physically moved around every time data is pushed or popped from the
stack, instead the stack pointer, which as the name implies points to
the memory address at the top of the stack, moves up and down.
Everything below this address is considered to be on the stack and
usable, whereas everything above it is off the stack, and invalid.
This is all accomplished automatically by the operating system, and as
a result it is sometimes also called automatic memory. On the
extremely rare occasions that one needs to be able to explicitly
invoke this type of memory, the C++ key word auto can be used.
Normally, one declares variables on the stack like this:
void func () {
int i; float x[100];
...
}
Variables that are declared on the stack are only valid within the
scope of their declaration. That means when the function func() listed
above returns, i and x will no longer be accessible or valid.
There is another limitation to variables that are placed on the stack:
the operating system only allocates a certain amount of space to the
stack. As each part of a program that is being executed comes into
scope, the operating system allocates the appropriate amount of memory
that is required to hold all the local variables on the stack. If this
is greater than the amount of memory that the OS has allowed for the
total size of the stack, then the program will crash. While the
maximum size of the stack can sometimes be changed by compile time
parameters, it is usually fairly small, and nowhere near the total
amount of RAM available on a machine.
Assuming this is a local variable, it is allocated on the stack, i.e. in RAM. The compiler keeps track of the variable's offset within the stack frame. In the basic scenario, when a computation is performed with the variable, it is loaded into one of the processor's registers and the CPU performs the computation. Afterwards the result is written back to RAM. In practice, optimizing compilers often keep a local variable in a register for its whole lifetime, and the processor caches recently used stack memory, so it can get quite complex.
Please note that the name "c" is no longer mentioned in the binary (unless you have debugging symbols). The binary works only with memory locations. E.g. it would look like this (simple addition):
a = b + c
take the value at memory offset 2 and put it in register 1
take the value at memory offset 3 and put it in register 2
sum registers 1 and 2 and store the result in register 3
copy register 3 to memory location 1
The binary doesn't know "a", "b" or "c". The compiler just said "a is in memory 1, b is in memory 2, c is in memory 3". And the CPU just blindly executes the commands the compiler has generated.
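The same idea as a small sketch, assuming the slot assignment stated above (a in memory 1, b in memory 2, c in memory 3). The array stands in for RAM and the local `reg` variables stand in for registers; all names are invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Slot numbers chosen by an imaginary compiler for a, b and c.
constexpr std::size_t kSlotA = 1;
constexpr std::size_t kSlotB = 2;
constexpr std::size_t kSlotC = 3;

// Executes "a = b + c" the way generated code would: it only ever sees
// slot numbers and registers, never the names a, b or c.
void run_add(std::array<int, 4>& memory) {
    const int reg1 = memory[kSlotB];  // load memory 2 into register 1
    const int reg2 = memory[kSlotC];  // load memory 3 into register 2
    const int reg3 = reg1 + reg2;     // add registers 1 and 2
    memory[kSlotA] = reg3;            // store the result to memory 1
}

// Self-check with b = 5 and c = 7.
void check_run_add() {
    std::array<int, 4> memory{0, 0, 5, 7};
    run_add(memory);
    assert(memory[kSlotA] == 12);
}
```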
C++ itself (or, the compiler) would have access to this variable in terms of the program structure, represented as a data structure. Perhaps you're asking how other parts in the program would have access to it at run time.
The answer is that it varies. It can be stored either in a register, on the stack, on the heap, or in the data/bss sections (global/static variables), depending on its context and the platform it was compiled for: If you needed to pass it around by reference (or pointer) to other functions, then it would likely be stored on the stack. If you only need it in the context of your function, it would probably be handled in a register. If it's a member variable of an object on the heap, then it's on the heap, and you reference it by an offset into the object. If it's a global/static variable, then its address is determined once the program is fully loaded into memory.
C++ eventually compiles down to machine language, and often runs within the context of an operating system, so you might want to brush up a bit on Assembly basics, or even some OS principles, to better understand what's going on under the hood.
Let's say our program starts with a stack address of 4000000.
When you call a function, depending on how much stack you use, it will "allocate" it like this.
Let's say we have 2 ints (8 bytes):
int function()
{
    int a = 0;
    int b = 0;
    return 0; // a non-void function must return a value
}
then whats gonna happen in assembly is
MOV EBP,ESP //Here we store the original value of the stack address (4000000) in EBP, and we restore it at the end of the function back to 4000000
SUB ESP, 8 //here we "allocate" 8 bytes in the stack, which basically just decreases the ESP addr by 8
so our ESP address was changed from
4000000
to
3999992
that's how the program knows the stack addresses: the first int occupies 3999992 to 3999995 and the second int occupies 3999996 to 3999999
Even though this pretty much has nothing to do with the compiler, it's really important to know, because when you know how stack is "allocated", you realize how cheap it is to do things like
char my_array[20000];
since all it's doing is sub esp, 20000, which is a single assembly instruction
but if you actually use all those bytes, like memset(my_array, 0, 20000), that's a different story.
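The address arithmetic in this answer can be checked mechanically. This sketch assumes 4-byte ints and a stack pointer that starts at 4000000, both taken from the example; the struct and function names are invented, and which variable ends up in which slot is the compiler's choice.

```cpp
#include <cassert>
#include <cstdint>

// Addresses that result from "SUB ESP, 8" for two 4-byte ints.
struct FrameLayout {
    std::uint32_t esp_after;    // ESP after the subtraction
    std::uint32_t first_slot;   // lowest address of the first 4-byte slot
    std::uint32_t second_slot;  // lowest address of the second 4-byte slot
};

constexpr FrameLayout layout_two_ints(std::uint32_t esp_before) {
    return FrameLayout{esp_before - 8, esp_before - 8, esp_before - 4};
}

// Checked at compile time: 4000000 - 8 = 3999992.
static_assert(layout_two_ints(4000000).esp_after == 3999992,
              "ESP moves down by 8 bytes");

// Run-time self-check of the two slot addresses.
void check_layout() {
    const FrameLayout f = layout_two_ints(4000000);
    assert(f.first_slot == 3999992 && f.second_slot == 3999996);
}
```

Reserving 20000 bytes instead of 8 changes only the constant being subtracted, which is why the reservation itself is so cheap.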
how does C++ then have access to this variable in memory?
It doesn't!
Your computer does, and it is instructed on how to do that by loading the location of the variable in memory into a register. This is all handled by assembly language. I shan't go into the details here of how such languages work (you can look it up!) but this is rather the purpose of a C++ compiler: to turn an abstract, high-level set of "instructions" into actual technical instructions that a computer can understand and execute. You could sort of say that assembly programs contain a lot of pointers, though most of them are literals rather than "variables".
I was reading [1] about stack pointers and the need to know both ebp (start of the stack frame for the function) and esp (end). The article said that you need to know both because the stack can grow, but I don't see how this can be possible in C/C++. (I'm not talking about another function call, because to my mind this would make the stack grow, do some stuff, then be popped back to the state it was in before the call.)
I have done a little bit of research and only saw people saying that new allocates on the heap. But the pointer will be a local variable, right? And this is known at compile time and reserved on the stack at the time the function is called.
I started to think that maybe with loops you have an uncontrolled number of local variables
int a;
for (int i = 0; i < n; ++i)
int b = i + 3;
but no, this doesn't allocate n times b, and only 1 int is reserved just as it is for a.
So... any example?
[1]: http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
You can allocate memory on the stack with the alloca function (non-standard; declared in <alloca.h> on many Unix-like systems). I don't recommend using this function in production code. It's too easy to corrupt your stack or cause a stack overflow.
The use of EBP is more for convenience. It is possible to just use ESP. The problem with that is, as parameters to functions are pushed onto the stack, the address relative to ESP of all the variables and parameters changes.
By setting EBP to a fixed, known position on the stack, usually between the function parameters and the local variables, the address of all these elements remains constant relative to EBP throughout the lifetime of the function. It can also help with debugging, as the value of ESP at the end of the function should equal the value of EBP.
The only way I know of to grow the stack in an indeterminate way at compile time, is to use alloca repeatedly.
A pointer is indeed allocated on the stack, and its size is usually 4 or 8 bytes on 32-bit and 64-bit architectures respectively. So you know the size of a pointer statically, at compile time, and you can keep pointers on the stack if you choose to do so.
This pointer can point to the free store, and you can allocate memory for it dynamically, without having to know the size beforehand. Moreover, it is usually a good idea to keep stack frames small, and compilers will even "enforce" this with (adjustable) limits. MSVC's default stack size is 1 MB, if I recall correctly.
So no, there is no way in standard C++ to create a stack frame whose size is unknown at compile time. The stack frame for the code you posted will have room for 3 integers (a, b, i), and possibly some padding, shadow space, etc., which is not relevant here. (It is TECHNICALLY possible to extend the size of a stack frame at run time, but you almost never want to do that.)