Previously I had seen assembly of many functions in C++. In gcc, all of them start with these instructions:
push rbp
mov rbp, rsp
sub rsp, <X> ; <X> is size of frame
I know that these instructions store the frame pointer of previous function and then sets up a frame for current function. But here, assembly is neither asking for mapping memory (like malloc) and nor it is checking weather the memory pointed by rbp is allocated to the process.
So it assumes that startup code has mapped enough memory for the entire depth of call stack. So exactly how much memory is allocated for call stack? How does startup code can know the maximum depth of call stack?
It also means that, I can access array out of bound for a long distance since although it is not in current frame, it mapped to the process. So I wrote this code:
int main() {
int arr[3] = {};
printf("%d", arr[900]);
}
This is exiting with SIGSEGV when index is 900. But surprisingly not when index is 901. Similarly, it is exiting with SIGSEGV for some random indices and not for some. This behavior was observed when compiled with gcc-x86-64-11.2 in compiler explorer.
How does startup code can know the maximum depth of call stack?
It doesn't.
In most common implementation, the size of the stack is constant.
If the program exceeds the constant sized stack, that is called a stack overflow. This is why you must avoid creating large objects (which are typically, but not necessarily, arrays) in automatic storage, and why you must avoid recursion with linear depth (such as recursive linked list algorithms).
So exactly how much memory is allocated for call stack?
On most desktop/server systems it's configurable, and defaults to one to few megabytes. It can be much less on embedded systems.
This is exiting with SIGSEGV when index is 900. But surprisingly not when index is 901.
In both cases, the behaviour of the program is undefined.
Is it possible to know the allocated stack size?
Yes. You can read the documentation of the target system. If you intend to write a portable program, then you must assume the minimum of all target systems. For desktop/server, 1 megabyte that I mentioned is reasonable.
There is no standard way to acquire the size within C++.
Related
How do segmented stacks work? This question also applies to Boost.Coroutine so I am using the C++ tag as well here. The main doubt comes from this article It looks like what they do is keep some space at the bottom of the stack and check if it has gotten corrupted by registering some sort of signal handler with the memory allocated there (perhaps via mmap and mprotect?) And then when they detect that they have run out of space they continue by allocating more memory and then continuing from there. 3 questions about this
Isn't this construct a user space thing? How do they control where the new stack is allocated and how do the instructions the program is compiled down to get aware of that?
A push instruction is basically just adding a value to the stack pointer and then storing the value in a register on the stack, then how can the push instruction be aware of where the new stack starts and correspondingly how can the pop know when it has to move the stack pointer back to the old stack?
They also say
After we've got a new stack segment, we restart the goroutine by retrying the function that caused us to run out of stack
what does this mean? Do they restart the entire goroutine? Won't this possibly cause non deterministic behavior?
How do they detect that the program has overrun the stack? If they keep a canary-ish memory area at the bottom then what happens when the user program creates an array big enough that overflows that? Will that not cause a stack overflow and is a potential security vulnerability?
If the implementations are different for Go and Boost I would be happy to know how either of them deal with this situation 🙂
I'll give you a quick sketch of one possible implementation.
First, assume most stack frames are smaller than some size. For ones that are larger, we can use a longer instruction sequence at entry to make sure there is enough stack space. Let's assume we're on an architecture that that has 4k pages and we're choosing 4k - 1 as the maximum size stack frame handled by the fast path.
The stack is allocated with a single guard page at the bottom. That is, a page that is not mapped for write. At function entry, the stack pointer is decremented by the stack frame size, which is less than the size of a page, and then the program arranges to write a value at the lowest address in the newly allocated stack frame. If the end of the stack has been reached, this write will cause a processor exception and ultimately be turned into some sort of upcall from the OS to the user program -- e.g. a signal in UNIX family OSes.
The signal handler (or equivalent) has to be able to determine this is a stack extension fault from the address of the instruction that faulted and the address it was writing to. This is determinable as the instruction is in the prolog of a function and the address being written to is in the guard page of the stack for the current thread. The instruction being in the prolog can be recognized by requiring a very specific pattern of instructions at the start of functions, or possibly by maintaining metadata about functions. (Possibly using traceback tables.)
At this point the handler can allocate a new stack block, set the stack pointer to the top of the block, do something to handle unchaining the stack block, and then call the function that faulted again. This second call is safe because the fault is in the function prolog the compiler generated and no side effects are allowed before validating there is enough stack space. (The code may also need to fixup the return address for architectures that push it onto the stack automatically. If the return address is in a register, it just needs to be in the same register when the second call is made.)
Likely the easiest way to handle unchaining is to push a small stack frame onto the new extension block for a routine that when returned to unchains the new stack block and frees the allocated memory. It then returns the processor registers to the state they were in when the call was made that caused the stack to need to be extended.
The advantage of this design is that the function entry sequence is very few instructions and is very fast in the non-extending case. The disadvantage is that in the case where the stack does need to be extended, the processor incurs an exception, which may cost much much more than a function call.
Go doesn't actually use a guard page if I understand correctly. Rather the function prolog explicitly checks the stack limit and if the new stack frame won't fit it calls a function to extend the stack.
Go 1.3 changed its design to not use a linked list of stack blocks. This is to avoid the trap cost if the extension boundary is crossed in both directions many times in a certain calling pattern. They start with a small stack, and use a similar mechanism to detect the need for extension. But when a stack extension fault does occur, the entire stack is moved to a larger block. This removes the need for unchaining entirely.
There are quite a few details glossed over here. (E.g. one may not be able to do the stack extension in the signal handler itself. Rather the handler can arrange to have the thread suspended and hand it off to a manager thread. One likely has to use a dedicated signal stack to handle the signal as well.)
Another common pattern with this sort of thing is the runtime requiring there to be a certain amount of valid stack space below the current stack frame for either something like a signal handler or for calling special routines in the runtime. Go works this way and the stack limit test guarantees a certain fixed amount of stack space is available below the current frame. One can e.g. call plain C functions on the stack so long as one guarantees they do not consume more than the fixed stack reserve amount. (One can use this to call C library routines in theory, though most of these have no formal specification of how much stack they might use.)
Dynamic allocation in the stack frame, such as alloca or stack allocated variable length arrays, add some complexity to the implementation. If the routine can compute the entire final size of the frame in the prolog then it is fairly straightforward. Any increase in the frame size while the routine is running likely has to be modeled as a new call, though with Go's new architecture that allows moving the stack, it is possible the alloca point in the routine can be made such that all the state allows a stack move to happen there.
When I create a new variable in a C++ program, eg a char:
char c = 'a';
how does C++ then have access to this variable in memory? I would imagine that it would need to store the memory location of the variable, but then that would require a pointer variable, and this pointer would again need to be accessed.
See the docs:
When a variable is declared, the memory needed to store its value is
assigned a specific location in memory (its memory address).
Generally, C++ programs do not actively decide the exact memory
addresses where its variables are stored. Fortunately, that task is
left to the environment where the program is run - generally, an
operating system that decides the particular memory locations on
runtime. However, it may be useful for a program to be able to obtain
the address of a variable during runtime in order to access data cells
that are at a certain position relative to it.
You can also refer this article on Variables and Memory
The Stack
The stack is where local variables and function parameters reside. It
is called a stack because it follows the last-in, first-out principle.
As data is added or pushed to the stack, it grows, and when data is
removed or popped it shrinks. In reality, memory addresses are not
physically moved around every time data is pushed or popped from the
stack, instead the stack pointer, which as the name implies points to
the memory address at the top of the stack, moves up and down.
Everything below this address is considered to be on the stack and
usable, whereas everything above it is off the stack, and invalid.
This is all accomplished automatically by the operating system, and as
a result it is sometimes also called automatic memory. On the
extremely rare occasions that one needs to be able to explicitly
invoke this type of memory, the C++ key word auto can be used.
Normally, one declares variables on the stack like this:
void func () {
int i; float x[100];
...
}
Variables that are declared on the stack are only valid within the
scope of their declaration. That means when the function func() listed
above returns, i and x will no longer be accessible or valid.
There is another limitation to variables that are placed on the stack:
the operating system only allocates a certain amount of space to the
stack. As each part of a program that is being executed comes into
scope, the operating system allocates the appropriate amount of memory
that is required to hold all the local variables on the stack. If this
is greater than the amount of memory that the OS has allowed for the
total size of the stack, then the program will crash. While the
maximum size of the stack can sometimes be changed by compile time
parameters, it is usually fairly small, and nowhere near the total
amount of RAM available on a machine.
Assuming this is a local variable, then this variable is allocated on the stack - i.e. in the RAM. The compiler keeps track of the variable offset on the stack. In the basic scenario, in case any computation is then performed with the variable, it is moved to one of the processor's registers and the CPU performs the computation. Afterwards the result is returned back to the RAM. Modern processors keep whole stack frames in the registers and have multiple levels of registers, so it can get quite complex.
Please note the "c" name is no more mentioned in the binary (unless you have debugging symbols). The binary only then works with the memory locations. E.g. it would look like this (simple addition):
a = b + c
take value of memory offset 1 and put it in the register 1
take value of memory offset 2 and put in in the register 2
sum registers 1 and 2 and store the result in register 3
copy the register 3 to memory location 3
The binary doesn't know "a", "b" or "c". The compiler just said "a is in memory 1, b is in memory 2, c is in memory 3". And the CPU just blindly executes the commands the compiler has generated.
C++ itself (or, the compiler) would have access to this variable in terms of the program structure, represented as a data structure. Perhaps you're asking how other parts in the program would have access to it at run time.
The answer is that it varies. It can be stored either in a register, on the stack, on the heap, or in the data/bss sections (global/static variables), depending on its context and the platform it was compiled for: If you needed to pass it around by reference (or pointer) to other functions, then it would likely be stored on the stack. If you only need it in the context of your function, it would probably be handled in a register. If it's a member variable of an object on the heap, then it's on the heap, and you reference it by an offset into the object. If it's a global/static variable, then its address is determined once the program is fully loaded into memory.
C++ eventually compiles down to machine language, and often runs within the context of an operating system, so you might want to brush up a bit on Assembly basics, or even some OS principles, to better understand what's going on under the hood.
Lets say our program starts with a stack address of 4000000
When, you call a function, depending how much stack you use, it will "allocate it" like this
Let's say we have 2 ints (8bytes)
int function()
{
int a = 0;
int b = 0;
}
then whats gonna happen in assembly is
MOV EBP,ESP //Here we store the original value of the stack address (4000000) in EBP, and we restore it at the end of the function back to 4000000
SUB ESP, 8 //here we "allocate" 8 bytes in the stack, which basically just decreases the ESP addr by 8
so our ESP address was changed from
4000000
to
3999992
that's how the program knows knows the stack addresss for the first int is "3999992" and the second int is from 3999996 to 4000000
Even tho this pretty much has nothing to do with the compiler, it's really important to know because when you know how stack is "allocated", you realize how cheap it is to do things like
char my_array[20000];
since all it's doing is just doing sub esp, 20000 which is a single assembly instruction
but if u actually use all those bytes like memset(my_array,20000) that's a different history.
how does C++ then have access to this variable in memory?
It doesn't!
Your computer does, and it is instructed on how to do that by loading the location of the variable in memory into a register. This is all handled by assembly language. I shan't go into the details here of how such languages work (you can look it up!) but this is rather the purpose of a C++ compiler: to turn an abstract, high-level set of "instructions" into actual technical instructions that a computer can understand and execute. You could sort of say that assembly programs contain a lot of pointers, though most of them are literals rather than "variables".
I'm using C++ and Windows.h in my source code. I read the CreateThread API in MSDN, but I still don't understand the essence of specifying stack size. By default it is 1 MB. But what will happen if I specify 32 bytes?
What does stack size in a thread define?
Please provide a thorough explanation and I'll appreciate it. Thanks.
The stack is used to store local variables, pass parameters in function calls, store return addresses. A thread's stack has a fixed size which is determined when the thread is created. That is the value that you are referring too.
The stack size is determined when the thread is created since it needs to occupy contiguous address space. That means that the entire address space for the thread's stack has to be reserved at the point of creating the thread.
If the stack is too small then it can overflow. That's an error condition known as stack overflow, from which this website took its name. When you call a function some or all of the following happens:
Parameters are pushed onto the stack.
The return address is pushed onto the stack.
A stack frame containing space for the function's local variables is created.
All of this consumes space from the stack. When the function in turn calls another function, more stack space is consumed. As the call stack goes deeper, more stack space is required.
The consequence therefore of setting the stack size too low is that you can exhaust the stack and overflow it. That is a terminal condition from which you cannot recover. Certainly 32 bytes (rounded up to one page which is 4096 bytes) is too small for almost all threads.
If you have a program with a lot of threads, and you know that the thread's don't need to reserve 1MB of stack size then there can be benefits to using a smaller stack size. Doing so can avoid exhausting the available process address space.
On the other hand you might have a program with a single thread that has deep call stacks that consume large amounts of stack space. In this scenario you might reserve more than the default 1MB.
However, unless you have strong reason to do otherwise, it is likely best to stick to the default stack size.
Stack size is just tradeoff between ability to create many threads and possibility of stack overflow in one of them.
The more stack size is, the less number of threads you can create and the less possibility of stack overflow is. You should worry about stack size only if you are going to create many threads (you will have to lower stack size but remember about stack overflow). Otherwise default value is suffice.
But what will happen if I specify 32 bytes?
I have not read the Windows documentation, but if Windows allows this (to specify only 32 bytes), you will most likely get a stack overflow. According to their documentation the value is rounded up to the page size in anycase, therefore in reality you stack size will be at least the size of a page. The created thread assumes that there exists enough "stack space" for it to use (for allocating automatic variables, storing function addresses etc), and allocates space according to it's needs. When there is not enough stack space, the stack allocator might use invalid memory, overriding memory used elsewhere.
What does stack size in a thread define?
It defines how much memory will be allocated for use by that thread's stack.
There is a good description of what exactly a thread call stack is here
I was reading [1) about stack pointers and the need of knowing both ebp (start of the stack for the function) and esp (end). The article said that you need to know both because the stack can grow, but I don't see how this can be possible in c/c++. (Im not talking about another function call because to my mind this would make the stack grow, do some stuff, then recursively be popped and back to state before call)
I have done a little bit of research and only saw people saying that new allocates on the heap. But the pointer will be a local variable, right ? And this is known at compile time and reserved in the stack at the time the function is called.
I started to think that maybe with loops you have an uncontrolled number of local variables
int a;
for (int i = 0; i < n; ++i)
int b = i + 3;
but no, this doesn't allocate n times b, and only 1 int is reserved just as it is for a.
So... any example ?
[1): http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
You can allocate memory on the stack with alloca function from stdlib. I don't recommend to use this function in production code. It's to easy to corrupt your stack or get stack overflow.
The use of EBP is more for convenience. It is possible to just use ESP. The problem with that is, as parameters to functions are pushed onto the stack, the address relative to ESP of all the variables and parameters changes.
By setting EBP to a fixed, known position on the stack, usually between the function parameters and the local variables, the address of all these elements remains constant relative to EBP throughout the lifetime of the function. It can also help with debugging, as the value of ESP at the end of the function should equal the value of EBP.
The only way I know of to grow the stack in an indeterminate way at compile time, is to use alloca repeatedly.
A pointer is indeed allocated on the stack. And the size is usually 4 or 8 bytes on 32 and 64bit architectures respectively. So you know the size of a pointer statically, at compile time, and you can keep them on the stack if you choose to do so.
This pointer can point to free store, and you can allocate memory to it dynamically - without having to know the size beforehand. Moreover, it is usually a good idea to keep stack frames empty, and compilers will even "enforce" this with (adjustable) limits. MSVC has 1MB if I recall correctly.
So no, there is no way you can create a stack frame of size that is unknown at compile time. Your stack frame of the code you posted will have room for 3 integers (a, b, i). (And possibly some padding, shadow space, etc, not relevant.) (It is TECHNICALLY possible to extend the size of stackframes at run time but you just don't want to do that almost never.)
Why infinite recursion leads to seg fault ?
Why stack overflow leads to seg fault.
I am looking for detailed explanation.
int f()
{
f();
}
int main()
{
f();
}
Every time you call f(), you increase the size of the stack - that's where the return address is stored so the program knows where to go to when f() completes. As you never exit f(), the stack is going to increase by at least one return address each call. Once the stack segment is full up, you get a segfault error. You'll get similar results in every OS.
Segmentation fault is a condition when your program tries to access a memory location that it is not allowed to access. Infinite recursion causes your stack to grow. And grow. And grow. Eventually it will grow to a point when it will spill into an area of memory that your program is forbidden to access by the operating system. That's when you get the segmentation fault.
Your system resources are finite. They are limited. Even if your system has the most memory and storage on the entire Earth, infinite is WAY BIGGER than what you have. Remember that now.
The only way to do something an "infinite number of times" is to "forget" previous information. That is, you have to "forget" what has been done before. Otherwise you have to remember what happened before and that takes storage of one form or another (cache, memory, disk space, writing things down on paper, ...)--this is inescapable. If you are storing things, you have a finite amount of space available. Recall, that infinite is WAY BIGGER than what you have. If you try to store an infinite amount of information, you WILL run out of storage space.
When you employ recursion, you are implicitly storing previous information with each recursive call. Thus, at some point you will exhaust your storage if you try to do this an infinite number of takes. Your storage space in this case is the stack. The stack is a piece of finite memory. When you use it all up and try to access beyond what you have, the system will generate an exception which may ultimately result in a seg fault if the memory it tried to access was write-protected. If it was not write-protected, it will keep on going, overwriting god-knows-what until such time as it either tries to write to memory that just does not exist, or it tries to write to some other piece of write protected memory, or until it corrupts your code (in memory).
It's still a stackoverflow ;-)
The thing is that the C runtime doesn't provide "instrumentalisation" like other managed languages do (e.g. Java, Python, etc.), so writing outside the space designated for the stack instead of causing a detailed exception just raises a lower level error, that has the generic name of "segmentation fault".
This is for performance reasons, as those memory access watchdogs can be set with help of hardware support with little or none overhead; I cannot remember the exact details now, but it's usually done by marking the MMU page tables or with the mostly obsolete segment offsets registers.
AFAIK: The ends of the stack are protected by addresses that aren't accessible to the process. This prevents the stack from growing over allocated data-structures, and is more efficient than checking the stack size explicitly, since you have to check the memory protection anyway.
A program copunter or instruction pointer is a register which contains the value of next instruction to be executed.
In a function call, the current value of program counter pushed into the stack and then program counter points to first instruction of the function. The old value is poped after returning from that function and assigned to program counter. In infinite recursion the value is pushed again and again and leads to the stack overflow.
It's essentially the same principle as a buffer overflow; the OS allocates a fixed amount of memory for the stack, and when you run out (stack overflow) you get undefined behavior, which in this context means a SIGSEGV.
The basic idea:
int stack[A_LOT];
int rsp=0;
void call(Func_p fn)
{
stack[rsp++] = rip;
rip = fn;
}
void retn()
{
rip = stack[--rsp];
}
/*recurse*/
for(;;){call(somefunc);}
eventually rsp moves past the end of the stack and you try to put the next return address in unallocated storage and your program barfs. Obviously real systems are a lot more complicated than that, but that could (and has) take up several large books.
At a "low" level, the stack is "maintained" through a pointer (the stack pointer), kept in a processor register. This register points to memory, since stack is memory after all. When you push values on the stack, its "value" is decremented (stack pointer moves from higher addresses to lower addresses). Each time you enter a function some space is "taken" from the stack (local variables); moreover, on many architectures the call to a subroutine pushes the return value on the stack (and if the processor has no a special register stack pointer, likely a "normal" register is used for the purpose, since stack is useful even where subroutines can be called with other mechanisms), so that the stack is at least diminuished by the size of a pointer (say, 4 or 8 bytes).
In an infinite recursion loop, in the best case only the return value causes the stack to be decremented... until it points to a memory that can't be accessed by the program. And you see the segmentation fault problem.
You may find interesting this page.