I know the maximum stack size is usually fixed at link time (at least on Windows it is).
But I don't know when the program's actual stack usage (the size used, not the maximum) is made known to the OS. At compile time? Link time? Execution time?
Like this:
int main() { int a[10]; return 0; }
The program uses only 10 * sizeof(int) bytes of stack. So, is the stack size fixed?
Above all: does the heap size change on malloc or free?
The stack size is not explicitly provided to the OS when a program is loaded. Instead, the OS uses the page-fault mechanism (if it is supported by the MMU).
If you try to access memory that has not yet been granted by the operating system, the MMU generates a page fault, which is handled by the OS. The OS checks the faulting address and either expands the stack by mapping a new memory page or, if you have exhausted the stack limit, treats it as a stack overflow.
Consider the following program running on x86 Linux:
void foo(void) {
    volatile int a = 10;
    foo();
}

int main() {
    foo();
}
It faults because of infinite recursion and the resulting stack overflow: completing it would actually require an infinite stack. When the program is loaded, the OS allocates the initial stack and stores its top address in %rsp (the stack pointer). Let's look at the disassembly of foo():
push %rbp
mov %rsp,%rbp <--- Save stack pointer in %rbp
sub $0x10,%rsp <--- Advance stack pointer by 16 bytes
movl $0xa,-0x4(%rbp) <--- Write to memory at %rbp - 4
callq 0x400500 <foo>
leaveq
retq
After at most 4096 / 16 = 256 calls of foo(), you will cross a page boundary by writing to memory at address X - 4096, where X is the initial %rsp value (remember that the stack grows downward on x86). A page fault is then generated, and the OS provides a new memory page for the stack, allowing the program to use it.
After about 500k calls of foo() (given the default Linux stack ulimit of 8 MB), the OS will detect that the application is using too many stack pages and send SIGSEGV to it.
In an answer to a question I provided the following information:
The BSS/DATA segment contains all the global variables, initialized either to a specific value or to zero by default. This segment is part of the executable image. At load time, the heap segment is added to this; however, it is not a true "segment" but just the amount of extra data allocated as an extension of the loaded BSS/DATA segment. Likewise, the stack "segment" is not a true segment but is added after the BSS + heap. The stack grows down, whereas the heap grows up. If the two overlap (more heap used while the stack is still growing), an "out of memory" error occurs (heap) or a "stack overflow" (stack) - this may be detected using segment registers (Intel) to trigger a hardware exception, or using software checks.
This is the traditional way of laying out the segments. Think of older Intel chips where all program data had to fit in 64 KB. More modern chips often use the same layout with, say, a 32 MB address space, but only the physical memory actually required is used. The stack can thus be pretty big.
Related
I know that when a function gets called, a stack frame is created for it containing the local variables, return address, frame pointer, etc., and pushed onto the program stack.
We are able to use the passed arguments in any order.
void func(int a, int b, int c) {
    // a, b, c
    // c, b, a
    // a, c, b
}
In the above function the arguments can be used in any order. I know that the stack is LIFO (last in, first out); what I want to know is: is the stack frame random access?
I ask because we are able to access the local variables in any order.
Yes. On all platforms I'm aware of that use a stack, all RAM is random access (that's what the "RA" stands for, after all).
The stack is just a convention for managing ownership and organization of your program's memory so that different function calls don't try to use memory that other function calls are still using. Each function call pushes a stack frame onto the top of the stack to indicate what memory it needs to use. It can randomly access any of that memory (or any other memory; the CPU doesn't prevent functions from accessing other functions' stack frames) as it pleases. The stack frame is just a way of telling other function calls that the memory is in use.
On an x86_64 machine, your values for a, b, and c would typically be passed in the lower 32 bits of the %rdi, %rsi, and %rdx registers (i.e. %edi, %esi, and %edx), per the usual calling convention.
The stack itself is just a region of memory, typically in DRAM, that you can access as you please. It is organized into frames based on your function calls; the function you just called gets its own frame.
I have seen the assembly of many functions in C++. With gcc, all of them start with these instructions:
push rbp
mov rbp, rsp
sub rsp, <X> ; <X> is size of frame
I know that these instructions store the frame pointer of the previous function and then set up a frame for the current function. But here, the assembly is neither asking for memory to be mapped (like malloc does) nor checking whether the memory pointed to by rbp is allocated to the process.
So it assumes that the startup code has mapped enough memory for the entire depth of the call stack. Exactly how much memory is allocated for the call stack? How can the startup code know the maximum depth of the call stack?
It also means that I can access an array out of bounds by a long distance, since although it is not in the current frame, it is mapped to the process. So I wrote this code:
#include <stdio.h>

int main() {
    int arr[3] = {};
    printf("%d", arr[900]);
}
This exits with SIGSEGV when the index is 900, but surprisingly not when it is 901. Similarly, it exits with SIGSEGV for some seemingly random indices and not for others. This behavior was observed when compiled with gcc-x86-64-11.2 on Compiler Explorer.
How can the startup code know the maximum depth of the call stack?
It doesn't.
In the most common implementations, the size of the stack is constant.
If the program exceeds the constant-sized stack, that is called a stack overflow. This is why you must avoid creating large objects (typically, but not necessarily, arrays) in automatic storage, and why you must avoid recursion whose depth grows linearly with the input (such as recursive linked-list algorithms).
So exactly how much memory is allocated for the call stack?
On most desktop/server systems it's configurable, and defaults to one to a few megabytes. It can be much less on embedded systems.
This exits with SIGSEGV when the index is 900, but surprisingly not when it is 901.
In both cases, the behaviour of the program is undefined.
Is it possible to know the allocated stack size?
Yes. You can read the documentation of the target system. If you intend to write a portable program, then you must assume the minimum across all target systems. For desktop/server, the 1 megabyte I mentioned is a reasonable assumption.
There is no standard way to acquire the size within C++.
This is merely out of interest; I personally use C++Builder 2009.
Suppose I allocate wchar_t Buffer[32], or I allocate wchar_t Buffer[512].
The second allocates more memory, so you could argue that it is more expensive in terms of memory usage.
However, is anything else also possibly affected by allocating more memory this way? Are there more instructions involved? More CPU usage?
Just wondering.
However, is anything else also possibly affected by allocating more memory this way ?
There can be one related side effect: when you allocate more memory for your buffer, you increase the chance that the stack data the program needs will be spread across more cache lines, which may ultimately mean the CPU has to wait for a cache miss that otherwise wouldn't have happened. Note that there's no particular reason here to think you're using more of the buffer: the "problem" is that the CPU is likely to be asked for data located before and after the buffer, and all of that may now be split across more cache lines. The stack memory around the buffer is likely to be accessed often enough to stay in cache, but keeping it there may evict cache content that hasn't been used for a while, and if that content is needed later you get a cache miss. The size of a cache line (how many bytes it holds) also affects how this plays out.
This is usually totally insignificant, but you asked... ;-).
Are there more instructions involved ?
No more instructions are involved.
More CPU usage ?
Only insofar as time spent waiting for the cache counts as "usage".
This is "allocating" stack memory. All it requires is adjusting the stack pointer. If you write a function like:
void foo()
{
    char c[32];
    ...
}
The resulting assembly looks like (on a 64-bit machine):
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $48, %rsp // This is the actual "allocation" on the stack
If you change this to char c[512], the only thing that changes is:
subq $528, %rsp // Allocation for the 512-byte buffer (plus padding) on the stack
There is no difference in CPU instructions or the time this takes. The only difference is the second uses up more of the limited amount of stack memory.
There won't be any difference in the instructions when allocating a larger size.
The stack layout is known at compile time, and the compiler generates the required instructions.
For example:
int main()
{
    char Buffer[1024];
    char Buffer2[512];
    return 0;
}
00981530 push ebp
00981531 mov ebp,esp
00981533 sub esp,6DCh // 0x6DC = 1756: esp is simply adjusted to allocate the memory
00981539 push ebx
int main()
{
    char Buffer[32];
    char Buffer2[512];
    return 0;
}
00D51530 push ebp
00D51531 mov ebp,esp
00D51533 sub esp,2FCh // 0x2FC = 764: esp is adjusted by less, with no change in the instructions
00D51539 push ebx
Are there more instructions involved ?
No, as you can see in the example above. :)
More CPU usage ?
No, because the same number of instructions is executed.
More Memory usage ?
Yes, because more stack memory is allocated.
In operating systems, a memory-management technique called dynamic loading loads routines only when they are called, instead of loading all of a program's routines into main memory at once. When a routine is loaded, the addresses of its elements have to be entered into the page table for address translation. The content corresponding to a given address range is loaded in a unit called a page.
Page sizes are usually small, on the order of 2 KB or 4 KB. If the content exceeds the page size, it is split and occupies more than one page. When a page fault occurs, older content is written out to swap space according to the page-replacement policy, and the needed content is loaded in its place. When the replaced content is needed again, the MMU loads it back from swap.
Now think about what happens when larger content is involved in loading and swapping: it is a performance issue and costs CPU cycles.
As for your question, there is no real impact with wchar_t arrays of size 32 or 512. But a data structure of several megabytes, or an array of a few thousand such structures, will have some impact on memory and CPU. I suggest you have a look here.
I think if you would otherwise be calling the first one repeatedly to cover the same total amount of data, then it would be better to use wchar_t Buffer[512], simply because repeatedly starting a call, exiting, and starting another takes longer and, I believe, uses more resources. With the second you have one setup, and then it is tied to that task, which is fine as long as you don't need to do anything else for a while. Hope that helped.
When a C/C++ binary is executed under Linux:
How is the stack initialized for the process?
How does the stack grow, and up to what limit?
Using ulimit I can see the limit, and using setrlimit I can modify it, but up to what limit? How can I determine it?
Is the same stack size allocated for every executing process?
As you can see in the code below, I recursively call func() (push operations only), and the stack grows to approximately 8 MB before crashing (stack overflow!).
#include <stdio.h>

void func()
{
    static int i = 0;
    int arr[1024] = {0};
    printf("%d KB pushed on stack!\n", (int)(++i * sizeof(int)));
    func();
}

int main()
{
    func();
    return 0;
}
output snippet:
8108 KB pushed on stack!
8112 KB pushed on stack!
8116 KB pushed on stack!
8120 KB pushed on stack!
Segmentation fault (core dumped)
Where did these approximately 8 MB come from?
The stack is one of several memory regions associated with a process at startup; its size may vary during runtime. Others include text/code, heap, static/bss, etc.
Each time you call a function, the stack grows: a stack frame is added on top of it. A stack frame is what a given function needs in order to execute (parameters, return address, local variables). Each time you return from a function, the stack shrinks by the same amount it grew.
You can try to estimate how deep your function call tree will be (f calls g, which in turn calls h: the depth is 3 calls, hence 3 stack frames).
Yes, there is a default value that was chosen by the OS designers. That size is in general sufficient.
It is a default constant of your OS.
How is the stack initialized for the process?
It depends on the architecture, but in general, the kernel allocates some virtual memory in your process's VM, and sets the stack pointer register to point to the top of it.
How does the stack grow, and up to what limit?
Every function call reserves more space on the stack using an architecturally defined procedure. This is typically referred to as the "function prologue".
Using ulimit, I can see the limit, and using setrlimit, I can modify it, but up to what limit? How can I determine it?
ulimit -s will tell you the maximum stack size (in KB) for the current process (and all child processes, which inherit this value unless it is overridden).
Is the same stack size allocated for every executing process?
See previous answer.
Related:
Is there a limit of stack size of a process in linux
Suppose we have implemented a stack in a program. But who creates the stack? Is it the processor, the operating system, or the compiler?
Are you confusing the program's execution stack with the stack container?
You can't "implement" the execution stack. The OS gives your process virtual address space and points the stack pointer into it, so you just push and pop from it; you don't "create" it, it's there when you start.
If you mean the data structure: The processor executes the code. The code makes calls to the operating system to get the memory for the stack, and then manipulates it to form it into a stack. The compiler just turns the code you wrote into code the processor can understand.
If you mean the execution stack: The OS is responsible for loading a process into memory and setting up its memory space to form the stack.
Your program... it performs the required assembly. That assembly was inserted by the compiler in place of the function call, based on the calling convention being used.
Learning about calling conventions would probably be the most effective way to answer your question.
None of the above. YOU created it when you implemented it. The compiler only translates your thoughts (expressed in a programming language) into machine or assembly code. The processor only runs that program that you wrote. The operating system (assuming one exists), provides mechanisms to facilitate giving you an execution space and memory to do it, but YOUR PROGRAM determines what happens in that execution space and memory.
If you want a stack of your own to use, try std::stack<>. If you're talking about the stack that local variables live on, that's created by the C++ runtime system.
"Suppose in a program we have implemented a stack."
Then you implemented it on an underlying low-level data structure, for example an array. Your stack = an array + functions (push(), pop()) that operate on the array to provide stack behavior.
"But who creates the stack? Is it the processor, the operating system, or the compiler?"
And who creates the functions and the array? You write the functions; the compiler then translates them into machine instructions and stores that code in the executable. Additionally, it produces a set of instructions to allocate some space in memory for your array. So your program is a mix of instructions plus space for the array. The operating system then loads your program and feeds its instructions to the processor. The processor performs these instructions and reads/writes data in your array.
Say you have a test C program:
int square( int val ) {
    int result;
    result = val * val;
    return( result );
}

int main( void ) {
    int store;
    store = square( 3 );
    return( 0 );
}
then you can produce the assembler output produced by the compiler using the command gcc -S test.c -o test.s (if you're on a Linux platform).
Looking at the generated code for just the square() function we get:
square:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl 8(%ebp), %eax
imull 8(%ebp), %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
leave
ret
You can see that the compiler has generated code to move the stack pointer for the local variables in the routine.
The initialisation code for your program will have allocated a certain amount of memory for "the stack" by calling the operating system's memory-allocation functions. It is then up to the compiled program to choose how to utilise that area of memory.
Fortunately for you, all of this is effectively handled by the compiler without you having to think about it (unless, of course, you have local variables that are too big for a standard stack size, in which case you may have to instruct your compiler, or thread library, to allocate a bigger stack from the system).
Suppose in a program we have implemented a stack. But who creates the stack ?
Well if you implemented it, then by definition you created it. You need to be more specific w.r.t. context.
The standard runtime library or the linker-loader creates the stack. It is done in a small section of code that runs before your main. This code is inserted automatically by the linker and runs at startup, setting up various things before your main is called - for example, any statically initialized global variables. It usually sets up the stack too, although some OSes do this in OS code (the linker-loader) because they want to standardize the stack implementation/layout on their systems.
The stack pointer is kept in the processor's esp register, but the stack itself is just memory that the register points into. You need to learn a little win32 assembly programming in order to understand the stack.
A stack is a Last-In-First-Out (LIFO) list data structure. The execution stack is created as the program runs: variables are stored on it and removed from it as the program's execution requires.