Where are the machine instructions of a program stored during runtime? - c++

So far as I know, whenever we run any program, the machine instructions of the program are loaded into RAM. Again, there are two regions of memory: stack and heap.
My question is: in which region of memory are the machine instructions stored, stack or heap?
I learnt that the following program gives a runtime error even though there is no variable declared inside the function. The reason behind this is stack overflow. Should I then assume that the machine instructions of the function are stored on the stack?
int func()
{
return func();
}

Neither, as it is not dynamically allocated the way stack and heap are.
The executable loader loads the executable (.text) and any static data it contains, like the initial values of global variables (.data / .rodata), into an unused RAM area. It then sets up any zero-initialized memory the executable asked for (.bss).
Only then is the stack set up for main(). A stack frame is allocated when you enter another function, holding the return address, function arguments, and any locally declared variables, as well as any memory allocated via alloca().[1] That memory is released when you return from the function.
Heap memory is allocated by malloc(), calloc(), or realloc(). It gets released when you free() or realloc() it.
The RAM used for the executable and its static data does not get released until the process terminates.
Thus, stack and heap are, basically, under control of the application. The memory of the executable itself is under control of the executable loader / the operating system. In appropriately equipped operating systems, you don't even have write access to that memory.
Regarding your edited question, no. (Bad style, editing a question to give it a completely new angle.)
The executable code remains where it was loaded. Calling a function does not place machine instructions on the stack. The only thing your func() (a function taking no arguments) places on the stack is the return address, a pointer that indicates where execution should continue once the current function returns.
Since none of the calls ever returns, your program keeps adding return addresses to the stack until it cannot hold any more. This has nothing to do with machine code instructions at all.
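If you want to see the stack being consumed, here is a minimal sketch (an illustration of the point above, not guaranteed behavior): print the address of a local variable at increasing recursion depth. On most mainstream platforms the address decreases with each level, because every call adds a frame below the previous one; exact addresses vary per run (ASLR).
#include <cstdio>

// Sketch: watch the stack grow downward with recursion depth.
void probe(int depth)
{
    int local = depth;  // lives in this call's stack frame
    std::printf("depth %d: local at %p\n", depth, (void*)&local);
    if (depth < 5)
        probe(depth + 1);  // each level pushes a return address and a new frame
}

int main()
{
    probe(0);
}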
[1]: Note that none of this is actually part and parcel of the C language standard, but implementation-defined, and may differ -- I presented a simplified version of affairs. For example, function parameters might be passed in CPU registers instead of on the stack.

Neither one nor the other.
The image of your program contains the code and the static data (e.g. all your string constants, static arrays and structures, and so on). They will be loaded into different segments of RAM.
Stack and heap are dynamic structures for storing data; they are created at the start of your program. The stack is a hardware-supported mechanism, while the heap is a standard-library-supported one.
So, your code will be located in the code segment, your static data and heap will be located in the data segment, and the stack will be located in the stack segment.
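To make this concrete, here is a small sketch (not guaranteed by the C++ standard, but typical on mainstream platforms): print one address from each region and observe that they fall into widely separated ranges.
#include <cstdio>
#include <cstdlib>

int global_var = 42;    // data segment
void code_marker() {}   // code segment

int main()
{
    int local_var = 0;                               // stack
    int* heap_var = (int*)std::malloc(sizeof(int));  // heap

    // Casting a function pointer to void* is not strictly portable,
    // but works on common platforms and is good enough for a demo.
    std::printf("code : %p\n", (void*)&code_marker);
    std::printf("data : %p\n", (void*)&global_var);
    std::printf("heap : %p\n", (void*)heap_var);
    std::printf("stack: %p\n", (void*)&local_var);

    std::free(heap_var);
}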

the machine instructions of the program are loaded in RAM
Correct for hosted, "PC-like" systems. On embedded systems, code is most often executed directly from flash memory.
Again, there are two regions of memory: stack and heap.
No, that's an over-simplification that way too many, way too bad programming teachers teach.
There are lots of other regions too: .data and .bss where all variables with static storage go, .rodata where constants go, etc.
The segment where the program code is stored is usually called .text.

Again, there are two regions of memory: stack and heap.
It's not that simple. Normally, on mainstream operating systems there's more stuff going on:
there is one stack per running thread;
there are as many heaps as you decide to allocate (actually, as far as the memory manager is concerned, you ask for some memory pages to be "enabled" in your virtual address space; the "heap" notion derives from the fact that normally one uses some kind of heap-manager code to efficiently distribute these memory portions between allocations);
there can be memory mapped files and shared memory;
most importantly, the executable file (and the dynamic libraries) are mapped in the memory of the process, normally with the code zone (the so called "text" segment) mapped in read-only mode, and other zones (typically pertaining to initialized global and static variables and stuff fixed by the loader) in copy-on-write.
So, the code is stored in the relevant section of the executable, which is mapped in memory.

They are usually in a section called .text.
On Linux you can list the sections of an ELF object or executable with the size command from binutils, for example on a tst ELF executable:
$ size -Ax tst | grep "^\.text"
.text 0x1e8 0x4003b0
$
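If you drop the grep, the same command lists every section at once. The output shape looks like this (the non-.text values below are illustrative placeholders; they vary per binary):
$ size -Ax tst
tst  :
section     size      addr
.text      0x1e8  0x4003b0
.rodata      ...       ...
.data        ...       ...
.bss         ...       ...
$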

There are multiple memory segments in addition to the stack and heap. Here's an example of how a program may be laid out in memory:
+------------------------+
high address | Command line arguments |
| and environment vars |
+------------------------+
| stack |
| - - - - - - - - - - - |
| | |
| V |
| |
| ^ |
| | |
| - - - - - - - - - - - |
| heap |
+------------------------+
| global and read- |
| only data |
+------------------------+
| program text |
low address | (machine code) |
+------------------------+
Details will vary from platform to platform, but this layout is pretty common for x86-based systems. The machine code occupies its own memory segment (labeled .text in ELF format), global read-only data will be stored in another segment (.rdata or .rodata), uninitialized globals in yet another segment (.bss), etc. Some segments are read-only, some are writable.

Neither heap nor stack.
Usually, executable instructions are present in the code segment.
Quoting the wikipedia article
In computing, a code segment, also known as a text segment or simply as text, is a portion of an object file or the corresponding section of the program's virtual address space that contains executable instructions.
and
when the loader places a program into memory so that it may be executed, various memory regions are allocated (in particular, as pages)
At runtime, the code segment of an object file is loaded into a corresponding code segment in memory. In particular, it has nothing to do with stack or heap.
EDIT:
In your code snippet above, what you're experiencing is called infinite recursion.
Even if your function does not need any stack space for local variables, each call still needs to push the return address before calling itself. That stack space is claimed but never released, because none of the calls ever returns [pops the address off the stack], so the program eventually runs out of stack space, causing a stack overflow.

Related

Measure static, heap and stack memory ? (c++, Linux - Centos 7)

I would like to measure stack, heap and static memory separately because I have some constraints for each one.
To measure the heap memory I'm using valgrind->massif tool.
Massif should also be able to measure heap AND stack memory, but it shows strange results:
Last snapshot without --stacks=yes gives total(B)=0, useful-heap(B)=0, extra-heap(B)=0 (so all is fine)
Last snapshot with --stacks=yes gives total(B)=2,256, useful-heap(B)=1,040, extra-heap(B)=0, stacks(B)=1,208 (which suggests a memory leak, even though it is the same command and the same binary tested... dunno why...)
So finally I need a tool to measure stack and static memory used by a c++ binary, some help would be welcome :)
Thanks for your help !
----------- EDIT --------------
Further to the Basile Starynkevitch comment, to explain what I mean with static, stack and heap memory I took it from the Dmalloc library documentation :
Static data is the information whose storage space is compiled into the program.
/* global variables are allocated as static data */
int numbers[10];
main()
{
…
}
Stack data is data allocated at runtime to hold information used inside of functions. This data is managed by the system in the space called stack space.
void foo()
{
/* if they are, the parameters of the function are stored in the stack */
/* this local variable is stored on the stack */
float total;
…
}
main()
{
foo();
}
Heap data is also allocated at runtime and provides a programmer with dynamic memory capabilities.
main()
{
/* the address is stored on the stack */
char * string;
…
/*
* Allocate a string of 10 bytes on the heap. Store the
* address in string which is on the stack.
*/
string = (char *)malloc(10);
…
/* de-allocate the heap memory now that we're done with it */
(void)free(string);
…
}
I would like to measure stack, heap and static memory separately because I have some constraints for each one.
I cannot imagine why you have separate constraints for each one. They all sit in virtual memory! BTW you might use setrlimit(2) to set limits (perhaps from the invoking shell process, e.g. with bash ulimit builtin).
Your definitions are naive, if you consider the actual virtual address space of your process.
BTW, proc(5) enables you to query that space, e.g. using /proc/self/maps like here from inside your program (or /proc/1234/maps to query the process of pid 1234, perhaps from a terminal). You could also use /proc/self/status and /proc/self/statm (BTW try cat /proc/self/maps and cat /proc/$$/maps in a terminal). On Linux, you might also use mallinfo(3) and malloc_stats(3) to get information about memory allocation statistics.
Static data may be in the data segment (or BSS segment) of your program. But what about thread-local storage? And these data segments also contain data internal to various libraries, notably the C standard library libc.so (does that count?). Of course the stack segment is often bigger (since it is page-aligned) than the actual used stack (from the "bottom" to the current %esp register). And a multi-threaded process has several stacks (and stack segments), one per thread.
Stack data is of course in the call stack, which contains a lot of things other than just automatic variables (return addresses, gaps, spilled registers) -- and some automatic variables sit only in registers, or are optimized away by the compiler, without consuming any stack slot. Do they count? Also, startup code from crt0 (which calls your main) is likely to use a little bit of stack space (does it count?)...
Heap-allocated data (it could be allocated by various libraries, or even by the dynamic linker) contains not only what your program gets from malloc (and friends) but also the necessary overhead. Does that count? And what about memory-mapped files? How should they count?
I would recommend querying the actual virtual address space (e.g. by reading /proc/self/maps or using pmap(1)...), but then you get something different from what you asked for.
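For instance, a glibc-specific sketch using the mallinfo(3) call mentioned above (this is a glibc extension, not portable C++; newer glibc versions prefer mallinfo2()):
#include <malloc.h>   // mallinfo(): glibc-only
#include <cstdio>
#include <cstdlib>

int main()
{
    void* p = std::malloc(1000);

    struct mallinfo mi = mallinfo();  // snapshot of the allocator's statistics
    std::printf("arena (heap size)     : %d bytes\n", mi.arena);
    std::printf("uordblks (in use)     : %d bytes\n", mi.uordblks);
    std::printf("fordblks (free space) : %d bytes\n", mi.fordblks);

    std::free(p);
}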
Just BTW, I found it before your answer:
To measure the heap memory, use valgrind -> massif
To measure the static memory, use the size command on the binary
To measure the stack, it is possible to use stackusage
It gave me all the stats I wanted

C++ how are variables accessed in memory?

When I create a new variable in a C++ program, e.g. a char:
char c = 'a';
how does C++ then have access to this variable in memory? I would imagine that it would need to store the memory location of the variable, but then that would require a pointer variable, and this pointer would again need to be accessed.
See the docs:
When a variable is declared, the memory needed to store its value is
assigned a specific location in memory (its memory address).
Generally, C++ programs do not actively decide the exact memory
addresses where its variables are stored. Fortunately, that task is
left to the environment where the program is run - generally, an
operating system that decides the particular memory locations on
runtime. However, it may be useful for a program to be able to obtain
the address of a variable during runtime in order to access data cells
that are at a certain position relative to it.
You can also refer to this article on Variables and Memory
The Stack
The stack is where local variables and function parameters reside. It
is called a stack because it follows the last-in, first-out principle.
As data is added or pushed to the stack, it grows, and when data is
removed or popped it shrinks. In reality, memory addresses are not
physically moved around every time data is pushed or popped from the
stack, instead the stack pointer, which as the name implies points to
the memory address at the top of the stack, moves up and down.
Everything below this address is considered to be on the stack and
usable, whereas everything above it is off the stack, and invalid.
This is all accomplished automatically by the operating system, and as
a result it is sometimes also called automatic memory. On the
extremely rare occasions that one needs to be able to explicitly
invoke this type of memory, the C++ key word auto can be used.
Normally, one declares variables on the stack like this:
void func () {
int i; float x[100];
...
}
Variables that are declared on the stack are only valid within the
scope of their declaration. That means when the function func() listed
above returns, i and x will no longer be accessible or valid.
There is another limitation to variables that are placed on the stack:
the operating system only allocates a certain amount of space to the
stack. As each part of a program that is being executed comes into
scope, the operating system allocates the appropriate amount of memory
that is required to hold all the local variables on the stack. If this
is greater than the amount of memory that the OS has allowed for the
total size of the stack, then the program will crash. While the
maximum size of the stack can sometimes be changed by compile time
parameters, it is usually fairly small, and nowhere near the total
amount of RAM available on a machine.
Assuming this is a local variable, then this variable is allocated on the stack - i.e. in the RAM. The compiler keeps track of the variable offset on the stack. In the basic scenario, in case any computation is then performed with the variable, it is moved to one of the processor's registers and the CPU performs the computation. Afterwards the result is returned back to the RAM. Modern processors keep whole stack frames in the registers and have multiple levels of registers, so it can get quite complex.
Please note that the name "c" is no longer mentioned in the binary (unless you have debugging symbols). The binary then works only with memory locations. E.g. it would look like this (simple addition):
a = b + c
take value of memory offset 1 and put it in the register 1
take value of memory offset 2 and put it in the register 2
sum registers 1 and 2 and store the result in register 3
copy the register 3 to memory location 3
The binary doesn't know "a", "b" or "c". The compiler just said "b is at memory location 1, c is at memory location 2, a is at memory location 3". And the CPU just blindly executes the commands the compiler has generated.
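You can watch this name-erasure happen yourself. A minimal sketch: compile it with g++ -S -O0 sum.cpp and read the generated sum.s. The exact assembly varies by compiler and optimization level, but the variable names will appear only as frame offsets such as -4(%rbp):
// sum.cpp -- "a", "b", "c" exist only for the compiler; the assembly
// it emits refers to them as stack offsets and register names.
int sum()
{
    int b = 2;     // stored at some offset from the frame pointer
    int c = 3;     // stored at another offset
    int a = b + c; // loaded into registers, added, stored back
    return a;
}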
C++ itself (or, the compiler) would have access to this variable in terms of the program structure, represented as a data structure. Perhaps you're asking how other parts in the program would have access to it at run time.
The answer is that it varies. It can be stored either in a register, on the stack, on the heap, or in the data/bss sections (global/static variables), depending on its context and the platform it was compiled for: If you needed to pass it around by reference (or pointer) to other functions, then it would likely be stored on the stack. If you only need it in the context of your function, it would probably be handled in a register. If it's a member variable of an object on the heap, then it's on the heap, and you reference it by an offset into the object. If it's a global/static variable, then its address is determined once the program is fully loaded into memory.
C++ eventually compiles down to machine language, and often runs within the context of an operating system, so you might want to brush up a bit on Assembly basics, or even some OS principles, to better understand what's going on under the hood.
Let's say our program starts with a stack address of 4000000.
When you call a function, depending on how much stack it uses, the stack is "allocated" like this.
Let's say we have 2 ints (8 bytes):
void function()
{
int a = 0;
int b = 0;
}
then what's going to happen in assembly is
MOV EBP,ESP //Here we store the original value of the stack address (4000000) in EBP, and we restore it at the end of the function back to 4000000
SUB ESP, 8 //here we "allocate" 8 bytes in the stack, which basically just decreases the ESP addr by 8
so our ESP address was changed from
4000000
to
3999992
That's how the program knows the stack address for the first int is 3999992, and the second int occupies 3999996 to 4000000.
Even though this pretty much has nothing to do with the compiler, it's really important to know, because when you know how the stack is "allocated", you realize how cheap it is to do things like
char my_array[20000];
since all it's doing is sub esp, 20000, which is a single assembly instruction
but if you actually use all those bytes, like memset(my_array, 0, 20000), that's a different story.
how does C++ then have access to this variable in memory?
It doesn't!
Your computer does, and it is instructed on how to do that by loading the location of the variable in memory into a register. This is all handled by assembly language. I shan't go into the details here of how such languages work (you can look it up!) but this is rather the purpose of a C++ compiler: to turn an abstract, high-level set of "instructions" into actual technical instructions that a computer can understand and execute. You could sort of say that assembly programs contain a lot of pointers, though most of them are literals rather than "variables".

Memory hacking / memory allocation : why does this work and how?

What I know is: you look for the address of some variable in the software you're trying to manipulate. And once you find it, you then try to find the "base pointer" for it, so no matter where the process is, you can still access that variable. I mean, when the process is restarted the variables would have different addresses, but you still know where they are.
Now what i don't get is why is this possible ?
let's say I do this in C++:
int *x = (int*)malloc(sizeof(int));
isn't that dynamic? and shouldn't x be placed at a "random" address that was free at the time?
Basically what I'm asking about is where is x really when that code is executed?
EDIT: I mean, usually variables like health are stored at addresses like game.exe + 0000caff, and then you can always find them there.
The following is from the perspective of 32-bit x86 architecture, other architectures may do things differently. Also, I assumed that the single line of code was in a function.
First remember what the declaration int* x means, loosely it can be read as "the variable x will contain the address of an integer"
The variable x is a local (or automatic variable) which means its location is determined at compile time, and storage space for it is allocated when the function is activated (most likely on the stack).
The value contained in x is a dynamic address (and probably located on the heap).
So what you have is a binding between the variable x and some location in the activation frame of the function. Excusing the poor ASCII art, it looks something like this (addresses are notional; also remember that the stack grows down):
~ ~
| |
+------------+
0x04000 | |<---- ebp
+------------+
| |
+------------+
0x03ff8 | x |
+------------+
| |
+------------+
0x03ff0 | |<---- esp
+------------+
Now we have two register values, ebp (the extended base pointer) and esp (the extended stack pointer), that define the bottom and top of the activation frame on the stack. So in assembly, if you want to locate where x is stored, it is based on an offset from the base pointer.
Another way of thinking about it is that the variable name x is an alias for the memory location ebp-8. Because this is how the compiler lays out memory, the storage location for x will always be at the same offset from the base pointer.
Now, during multiple runs the value of the base pointer might change, but as long as I can find the base pointer for the activation frame, I can find the storage location for x and fiddle with it.
x is a variable which holds the address of an integer.
malloc allocates memory on the heap, but x itself is still located on the stack (if you declare it in a function).
However, your program has different parts of memory: one for global variables and constants, one for the code, some for resources, one for dynamic stuff (the heap), and some others.
Let's assume your x is located in the part for global variables and points to the heap. Then x has a static address.
This address can be found in the source code part by assembler instructions like mov eax , [0x123].
This assembler instruction also has an address, which is relative to the module where the instruction is located. These modules can be loaded dynamically (e.g. by LoadLibrary), but the offset from the module's base address to the instruction is fixed.
To get data where x points to:
base = getModuleBaseAddress("modulename")
addressOfX = base + offset
valueOfX = *addressOfX
Most objects are not located on the stack and can be referenced by a module's base address, a fixed offset to the instruction where the object's pointer is used, and some further offsets to get at the object's variables.
To find objects in memory you can also use a pattern scan to scan the memory.
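Here is a hypothetical Windows-flavored sketch of that base-plus-offset idea (the module name and the 0xCAFF offset are made up; real offsets come from reverse engineering the target). Reading another process's memory would additionally require OpenProcess/ReadProcessMemory; this version assumes the code runs inside the target process:
#include <windows.h>
#include <cstdio>

int main()
{
    // Base address of a module loaded in *this* process:
    HMODULE base = GetModuleHandleA("game.exe");  // hypothetical module name
    if (!base) return 1;

    // "game.exe + 0000caff" from the question: a fixed offset from the base.
    int* health = (int*)((char*)base + 0xCAFF);   // hypothetical offset
    std::printf("health would live at %p\n", (void*)health);
}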
"and shouldn't x be placed in a "random" address that was free at the time ?"
If you run a process in an operating system, the pointer values you actually see are virtual addresses: the operating system associates your process's virtual address space with concrete regions of RAM (and ROM) and manages that mapping for you.
So if you have the same dynamic storage duration de-/allocations in sequence, these are likely to receive the same (virtual) address with each run of the program. There's nothing random at this level.

What does "Memory allocated at compile time" really mean?

In programming languages like C and C++, people often refer to static and dynamic memory allocation. I understand the concept, but the phrase "All memory was allocated (reserved) during compile time" always confuses me.
Compilation, as I understand it, converts high-level C/C++ code to machine language and outputs an executable file. How is memory "allocated" in a compiled file? Isn't memory always allocated in RAM with all the virtual memory management stuff?
Isn't memory allocation by definition a runtime concept ?
If I make a 1KB statically allocated variable in my C/C++ code, will that increase the size of the executable by the same amount ?
This is one of the pages where the phrase is used under the heading "Static allocation".
Back To Basics: Memory allocation, a walk down the history
Memory allocated at compile-time means the compiler resolves at compile-time where certain things will be allocated inside the process memory map.
For example, consider a global array:
int array[100];
The compiler knows at compile-time the size of the array and the size of an int, so it knows the entire size of the array at compile-time. Also a global variable has static storage duration by default: it is allocated in the static memory area of the process memory space (.data/.bss section). Given that information, the compiler decides during compilation in what address of that static memory area the array will be.
Of course, those memory addresses are virtual addresses. The program assumes that it has its own entire memory space (from 0x00000000 to 0xFFFFFFFF, for example). That's why the compiler can make assumptions like "Okay, the array will be at address 0x00A33211". At runtime those addresses are translated to real/hardware addresses by the MMU and OS.
Value initialized static storage things are a bit different. For example:
int array[] = { 1 , 2 , 3 , 4 };
In our first example, the compiler only decided where the array will be allocated, storing that information in the executable.
In the case of value-initialized things, the compiler also injects the initial value of the array into the executable, and adds code which tells the program loader that after the array allocation at program start, the array should be filled with these values.
Here are two examples of the assembly generated by the compiler (GCC 4.8.1 with x86 target):
C++ code:
int a[4];
int b[] = { 1 , 2 , 3 , 4 };
int main()
{}
Output assembly:
a:
.zero 16
b:
.long 1
.long 2
.long 3
.long 4
main:
pushq %rbp
movq %rsp, %rbp
movl $0, %eax
popq %rbp
ret
As you can see, the values are directly injected into the assembly. In the array a, the compiler generates a zero initialization of 16 bytes, because the Standard says that static stored things should be initialized to zero by default:
8.5.9 (Initializers) [Note]:
Every object of static storage duration is zero-initialized at program startup before any other initialization takes place. In some cases, additional initialization is done later.
I always suggest people disassemble their code to see what the compiler really does with the C++ code. This applies from storage classes/durations (like this question) to advanced compiler optimizations. You could instruct your compiler to generate the assembly, but there are wonderful tools to do this on the Internet in a friendly manner. My favourite is GCC Explorer.
Memory allocated at compile time simply means there will be no further allocation at run time -- no calls to malloc, new, or other dynamic allocation methods. You'll have a fixed amount of memory usage even if you don't need all of that memory all of the time.
Isn't memory allocation by definition a runtime concept?
The memory is not in use prior to run time, but immediately prior to execution starting its allocation is handled by the system.
If I make a 1KB statically allocated variable in my C/C++ code, will that increase the size of the executable by the same amount?
Simply declaring the static will not increase the size of your executable more than a few bytes. Declaring it with an initial value that is non-zero will (in order to hold that initial value). Rather, the linker simply adds this 1KB amount to the memory requirement that the system's loader creates for you immediately prior to execution.
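You can verify this yourself with a small sketch (the file name is arbitrary). Build it once as written and once with only the zero-initialized array, then compare the binaries with size(1): the .bss number grows for the zeroed array, while the .data number (and the file itself) grows for the initialized one.
// bss_vs_data.cpp
static char zeroed[1024];        // zero-initialized: lands in .bss,
                                 // executable stores only its size
static char filled[1024] = {1};  // non-zero initializer: lands in .data,
                                 // executable stores all 1024 bytes

int main()
{
    return zeroed[0] + filled[0]; // keep both arrays referenced
}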
Memory allocated at compile time means that when you load the program, some part of the memory will be immediately allocated, and the size and (relative) position of this allocation are determined at compile time.
char a[32];
char b;
char c;
Those 3 variables are "allocated at compile time": it means that the compiler calculates their sizes (which are fixed) at compile time. The variable a will be at some offset in memory; let's say a points to address 0, b points to address 32 (right after a's 32 chars) and c to address 33 (supposing no alignment padding). So, allocating 1 kB of static data will not increase the size of your code, since it just changes an offset inside it. The actual space will be allocated at load time.
Real memory allocation always happens in run time, because the kernel needs to keep track of it and to update its internal data structures (how much memory is allocated for each process, pages and so on). The difference is that the compiler already knows the size of each data you are going to use and this is allocated as soon as your program is executed.
Remember also that we are talking about relative addresses. The real address where the variable will be located will be different. At load time the kernel will reserve some memory for the process, let's say at address x, and all the hard-coded addresses contained in the executable file will be incremented by x bytes, so that variable a in the example will be at address x, b at address x+32, and so on.
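A quick way to see those relative offsets (a sketch; the C++ standard does not guarantee any particular ordering of distinct globals, but this is what you will typically observe):
#include <cstdio>

char a[32];
char b;
char c;

int main()
{
    // Typically prints consecutive addresses: &b == a + 32, &c == a + 33
    // (modulo any alignment padding the compiler inserts).
    std::printf("a at %p\n", (void*)a);
    std::printf("b at %p\n", (void*)&b);
    std::printf("c at %p\n", (void*)&c);
}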
Adding variables on the stack that take up N bytes doesn't (necessarily) increase the bin's size by N bytes. It will, in fact, add but a few bytes most of the time.
Let's start off with an example of how adding 1000 chars to your code will increase the bin's size in a linear fashion.
If the 1k is a string of a thousand chars, declared like so
const char *c_string = "Here goes a thousand chars...999";//implicit \0 at end
and you then were to vim your_compiled_bin, you'd actually be able to see that string in the bin somewhere. In that case, yes: the executable will be 1 kB bigger, because it contains the string in full.
If, however, you allocate an array of ints, chars or longs on the stack and assign it in a loop, something along these lines
int big_arr[1000];
for (int i=0;i<1000;++i) big_arr[i] = some_computation_func(i);
then, no: it won't increase the bin... by 1000*sizeof(int)
Allocation at compile time means what you've now come to understand it means (based on your comments): the compiled bin contains information the system requires to know how much memory what function/block will need when it gets executed, along with information on the stack size your application requires. That's what the system will allocate when it executes your bin, and your program becomes a process (well, the executing of your bin is the process that... well, you get what I'm saying).
Of course, I'm not painting the full picture here: The bin contains information about how big a stack the bin will actually be needing. Based on this information (among other things), the system will reserve a chunk of memory, called the stack, that the program gets sort of free reign over. Stack memory still is allocated by the system, when the process (the result of your bin being executed) is initiated. The process then manages the stack memory for you. When a function or loop (any type of block) is invoked/gets executed, the variables local to that block are pushed to the stack, and they are removed (the stack memory is "freed" so to speak) to be used by other functions/blocks. So declaring int some_array[100] will only add a few bytes of additional information to the bin, that tells the system that function X will be requiring 100*sizeof(int) + some book-keeping space extra.
On many platforms, all of the global or static allocations within each module will be consolidated by the compiler into three or fewer consolidated allocations (one for uninitialized data (often called "bss"), one for initialized writable data (often called "data"), and one for constant data ("const")), and all of the global or static allocations of each type within a program will be consolidated by the linker into one global for each type. For example, assuming int is four bytes, a module has the following as its only static allocations:
int a;
const int b[6] = {1,2,3,4,5,6};
char c[200];
const int d = 23;
int e[4] = {1,2,3,4};
int f;
it would tell the linker that it needed 208 bytes for bss, 16 bytes for "data", and 28 bytes for "const". Further, any reference to a variable would be replaced with an area selector and offset, so a, b, c, d, e, and f would be replaced by bss+0, const+0, bss+4, const+24, data+0, or bss+204, respectively.
When a program is linked, all of the bss areas from all the modules are concatenated together; likewise the data and const areas. For each module, the address of any bss-relative variables will be increased by the sizes of all preceding modules' bss areas (again, likewise with data and const). Thus, when the linker is done, any program will have one bss allocation, one data allocation, and one const allocation.
When a program is loaded, one of four things will generally happen depending upon the platform:
The executable will indicate how many bytes it needs for each kind of data and--for the initialized data area, where the initial contents may be found. It will also include a list of all the instructions which use a bss-, data-, or const- relative address. The operating system or loader will allocate the appropriate amount of space for each area and then add the starting address of that area to each instruction which needs it.
The operating system will allocate a chunk of memory to hold all three kinds of data, and give the application a pointer to that chunk of memory. Any code which uses static or global data will dereference it relative to that pointer (in many cases, the pointer will be stored in a register for the lifetime of an application).
The operating system will initially not allocate any memory to the application, except for what holds its binary code, but the first thing the application does will be to request a suitable allocation from the operating system, which it will forevermore keep in a register.
The operating system will initially not allocate space for the application, but the application will request a suitable allocation on startup (as above). The application will include a list of instructions with addresses that need to be updated to reflect where memory was allocated (as with the first style), but rather than having the application patched by the OS loader, the application will include enough code to patch itself.
All four approaches have advantages and disadvantages. In every case, however, the compiler will consolidate an arbitrary number of static variables into a fixed small number of memory requests, and the linker will consolidate all of those into a small number of consolidated allocations. Even though an application will have to receive a chunk of memory from the operating system or loader, it is the compiler and linker which are responsible for allocating individual pieces out of that big chunk to all the individual variables that need it.
The core of your question is this: "How is memory "allocated" in a compiled file? Isn't memory always allocated in the RAM with all the virtual memory management stuff? Isn't memory allocation by definition a runtime concept?"
I think the problem is that there are two different concepts involved in memory allocation. At its basic, memory allocation is the process by which we say "this item of data is stored in this specific chunk of memory". In a modern computer system, this involves a two step process:
Some system is used to decide the virtual address at which the item will be stored
The virtual address is mapped to a physical address
The latter process is purely run time, but the former can be done at compile time, if the data have a known size and a fixed number of them is required. Here's basically how it works:
The compiler sees a source file containing a line that looks a bit like this:
int c;
It produces output for the assembler that instructs it to reserve memory for the variable 'c'. This might look like this:
global _c
section .bss
_c: resb 4
When the assembler runs, it keeps a counter that tracks the offset of each item from the start of a memory 'segment' (or 'section'). This is like the layout of a very large 'struct' that contains everything in the entire file; it doesn't have any actual memory allocated to it at this time, and could be anywhere. The assembler notes in a table that _c has a particular offset (say 510 bytes from the start of the segment) and then increments its counter by 4, so the next such variable will be at (e.g.) 514 bytes. For any code that needs the address of _c, it just puts 510 in the output file, and adds a note that the output needs the address of the segment that contains _c added to it later.
The linker takes all of the assembler's output files, and examines them. It determines an address for each segment so that they won't overlap, and adds the offsets necessary so that instructions still refer to the correct data items. In the case of uninitialized memory like that occupied by c (the assembler was told that the memory would be uninitialized by the fact that the compiler put it in the '.bss' segment, which is a name reserved for uninitialized memory), it includes a header field in its output that tells the operating system how much needs to be reserved. It may be relocated (and usually is) but is usually designed to be loaded more efficiently at one particular memory address, and the OS will try to load it at this address. At this point, we have a pretty good idea what the virtual address is that will be used by c.
The physical address will not actually be determined until the program is running. However, from the programmer's perspective the physical address is actually irrelevant—we'll never even find out what it is, because the OS doesn't usually bother telling anyone, it can change frequently (even while the program is running), and a main purpose of the OS is to abstract this away anyway.
An executable describes what space to allocate for static variables. This allocation is done by the system when you run the executable. So your 1kB static variable won't increase the size of the executable by 1kB:
static char buf[1024];
Unless of course you specify an initializer:
static char buf[1024] = { 1, 2, 3, 4, ... };
So, in addition to 'machine language' (i.e. CPU instructions), an executable contains a description of the required memory layout.
Memory can be allocated in many ways:
in application heap (whole heap is allocated for your app by OS when the program starts)
in operating system heap (so you can grab more and more)
in garbage collector controlled heap (same as both above)
on stack (so you can get a stack overflow)
reserved in code/data segment of your binary (executable)
in remote place (file, network - and you receive a handle not a pointer to that memory)
Now your question is: what is "memory allocated at compile time"? Definitely it is just an incorrectly phrased saying, which is supposed to refer to either binary segment allocation or stack allocation, or in some cases even to a heap allocation, but in that case the allocation is hidden from the programmer's eyes by an invisible constructor call. Or probably the person who said it just wanted to say that memory is not allocated on the heap, but did not know about stack or segment allocations. (Or did not want to go into that kind of detail.)
But in most cases person just wants to say that the amount of memory being allocated is known at compile time.
The binary size will only change when the memory is reserved in the code or data segment of your app.
You are right. Memory is actually allocated (paged) at load time, i.e. when the executable file is brought into (virtual) memory. Memory can also be initialized at that moment. The compiler just creates a memory map. [By the way, stack and heap spaces are also allocated at load time!]
I think you need to step back a bit. Memory allocated at compile time... What can that mean? Can it mean that memory on chips that have not yet been manufactured, for computers that have not yet been designed, is somehow being reserved? No. No time travel, no compilers that can manipulate the universe.
So, it must mean that the compiler generates instructions to allocate that memory somehow at runtime. But if you look at it from the right angle, the compiler generates all instructions, so what can be the difference? The difference is that the compiler decides, and at runtime, your code cannot change or modify its decisions. If it decided it needed 50 bytes at compile time, then at runtime, you can't make it decide to allocate 60 -- that decision has already been made.
If you learn assembly programming, you will see that you have to carve out segments for the data, the stack, and code, etc. The data segment is where your strings and numbers live. The code segment is where your code lives. These segments are built into the executable program. Of course the stack size is important as well... you wouldn't want a stack overflow!
So if your data segment is 500 bytes, your program has a 500 byte area. If you change the data segment to 1500 bytes, the size of the program will be 1000 bytes larger. The data is assembled into the actual program.
This is what is going on when you compile higher-level languages. The actual data area is allocated when it is compiled into an executable program, increasing the size of the program. The program can request memory on the fly as well, and this is dynamic memory. You can request memory from the OS at runtime, use it, and let go of it again; a garbage collector can release it back for you. It can even be swapped to a hard disk, if necessary, by a good memory manager. These features are what high-level languages provide you.
I would like to explain these concepts with the help of few diagrams.
This is true that memory cannot be allocated at compile time, for sure.
But, then what happens in fact at compile time.
Here comes the explanation.
Say, for example a program has four variables x,y,z and k.
Now, at compile time it simply makes a memory map, where the location of these variables with respect to each other is ascertained.
This diagram will illustrate it better.
Now imagine, no program is running in memory.
This I show by a big empty rectangle.
Next, the first instance of this program is executed.
You can visualize it as follows.
This is the time when actually memory is allocated.
When a second instance of this program is running, the memory will look as follows.
And the third ..
So on and so forth.
I hope this visualization explains this concept well.
There is a very nice explanation given in the accepted answer. Just in case, I will post this link which I have found useful.
https://www.tenouk.com/ModuleW.html
One among the many things a compiler does is create and maintain a SYMTAB (symbol table, under the section .symtab). It is created and maintained purely by the compiler, using some data structure (a list, a tree, etc.), and is not meant for the developer's eyes. Any access request made by the developer hits this table first.
Now about the Symbol Table,
We only need to know about the two columns Symbol Name and the Offset.
Symbol Name column will have the variable names and the offset column will have the offset value.
Let's see this with an example:
int a, b, c;
Now we all know that the register Stack_Pointer (sp) points to the top of the stack memory. Let that be sp = 1000.
Now the Symbol Name column will have three values in it: a, then b, and then c. Reminding you all that variable a will be at the top of the stack memory.
So a's equivalent offset value will be 0.
(Compile Time Offset_Value)
Then b, and its equivalent offset value will be 4, the size of an int. (Compile Time Offset_Value)
Then c, and its equivalent offset value will be 8. (Compile Time Offset_Value)
Now calculating a's physical address (or) runtime memory address = (sp - offset_value of a)
= (1000 - 0) = 1000
Now calculating b's physical address (or) runtime memory address = (sp - offset_value of b)
= (1000 - 4) = 996
Now calculating c's physical address (or) runtime memory address = (sp - offset_value of c)
= (1000 - 8) = 992
Therefore at the time of compilation we only have the offset values, and only during runtime are the actual physical addresses calculated.
Note:
The Stack_Pointer value will be assigned only after the program is loaded. Pointer arithmetic happens between the Stack_Pointer register and the variable's offset to calculate the variable's physical address.
"POINTERS AND POINTER ARITHMETIC, WAY OF THE PROGRAMMING WORLD"
Let me share what I learned about this question.
You can understand this issue in two steps:
First, the compilation step: the compiler generates the binary. On Linux, the binary is a file in ELF (Executable and Linkable Format) format. An ELF file contains several sections, including .bss and .data
.data
Initialized data, with read/write access rights
.bss
Uninitialized data, with read/write access rights (=WA)
.data and .bss just map to the segments of process's memory layout, which contains static variables.
Second, the loading step. When the binary file gets executed, the ELF file will be loaded into the process's memory. The loader can find the static variables' information in the ELF file.
Simply speaking, the compiler and the loader follow the same standard to communicate with each other, and that standard is the ELF format.

What happens when a computer program runs?

I know the general theory but I can't fit in the details.
I know that a program resides in the secondary memory of a computer. Once the program begins execution, it is entirely copied to the RAM. Then the processor retrieves a few instructions (depending on the size of the bus) at a time, puts them in registers, and executes them.
I also know that a computer program uses two kinds of memory: stack and heap, which are also part of the primary memory of the computer. The stack is used for non-dynamic memory, and the heap for dynamic memory (for example, everything related to the new operator in C++)
What I can't understand is how those two things connect. At what point is the stack used for the execution of the instructions? Instructions go from the RAM, to the stack, to the registers?
It really depends on the system, but modern OSes with virtual memory tend to load their process images and allocate memory something like this:
+---------+
| stack | function-local variables, return addresses, return values, etc.
| | often grows downward, commonly accessed via "push" and "pop" (but can be
| | accessed randomly, as well; disassemble a program to see)
+---------+
| shared | mapped shared libraries (C libraries, math libs, etc.)
| libs |
+---------+
| hole | unused memory allocated between the heap and stack "chunks", spans the
| | difference between your max and min memory, minus the other totals
+---------+
| heap | dynamic, random-access storage, allocated with 'malloc' and the like.
+---------+
| bss | Uninitialized global variables; must be in read-write memory area
+---------+
| data | data segment, for globals and static variables that are initialized
| | (can further be split up into read-only and read-write areas, with
| | read-only areas being stored elsewhere in ROM on some systems)
+---------+
| text | program code, this is the actual executable code that is running.
+---------+
This is the general process address space on many common virtual-memory systems. The "hole" is the size of your total memory, minus the space taken up by all the other areas; this gives a large amount of space for the heap to grow into. This is also "virtual", meaning it maps to your actual memory through a translation table, and may be actually stored at any location in actual memory. It is done this way to protect one process from accessing another process's memory, and to make each process think it's running on a complete system.
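On Linux you can watch this address space from the inside. A minimal, Linux-specific sketch: dump /proc/self/maps, where each line shows an address range, its permissions, and the file backing it, so the text segment, heap, stack, and shared libraries from the diagram above are all directly visible.
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream maps("/proc/self/maps");  // this process's own memory map
    std::string line;
    while (std::getline(maps, line))
        std::cout << line << '\n';
}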
Note that the positions of, e.g., the stack and heap may be in a different order on some systems (see Billy O'Neal's answer below for more details on Win32).
Other systems can be very different. DOS, for instance, ran in real mode, and its memory layout when running programs looked very different:
+-----------+ top of memory
| extended | above the high memory area, and up to your total memory; needed drivers to
| | be able to access it.
+-----------+ 0x110000
| high | just over 1MB->1MB+64KB, used by 286s and above.
+-----------+ 0x100000
| upper | upper memory area, from 640kb->1MB, had mapped memory for video devices, the
| | DOS "transient" area, etc. some was often free, and could be used for drivers
+-----------+ 0xA0000
| USER PROC | user process address space, from the end of DOS up to 640KB
+-----------+
|command.com| DOS command interpreter
+-----------+
| DOS | DOS permanent area, kept as small as possible, provided routines for display,
| kernel | *basic* hardware access, etc.
+-----------+ 0x600
| BIOS data | BIOS data area, contained simple hardware descriptions, etc.
+-----------+ 0x400
| interrupt | the interrupt vector table, starting from 0 and going to 1k, contained
| vector | the addresses of routines called when interrupts occurred. e.g.
| table | interrupt 0x21 checked the address at 0x21*4 and far-jumped to that
| | location to service the interrupt.
+-----------+ 0x0
You can see that DOS allowed direct access to the operating system memory, with no protection, which meant that user-space programs could generally directly access or overwrite anything they liked.
In the process address space, however, the programs tended to look similar, only they were described as code segment, data segment, heap, stack segment, etc., and it was mapped a little differently. But most of the general areas were still there.
Upon loading the program and necessary shared libs into memory, and distributing the parts of the program into the right areas, the OS begins executing your process wherever its main method is at, and your program takes over from there, making system calls as necessary when it needs them.
Different systems (embedded, whatever) may have very different architectures, such as stackless systems, Harvard architecture systems (with code and data being kept in separate physical memory), systems which actually keep the BSS in read-only memory (initially set by the programmer), etc. But this is the general gist.
You said:
I also know that a computer program uses two kinds of memory: stack and heap, which are also part of the primary memory of the computer.
"Stack" and "heap" are just abstract concepts, rather than (necessarily) physically distinct "kinds" of memory.
A stack is merely a last-in, first-out data structure. In the x86 architecture, it can actually be addressed randomly by using an offset from the end, but the most common functions are PUSH and POP to add and remove items from it, respectively. It is commonly used for function-local variables (so-called "automatic storage"), function arguments, return addresses, etc. (more below)
A "heap" is just a nickname for a chunk of memory that can be allocated on demand, and is addressed randomly (meaning, you can access any location in it directly). It is commonly used for data structures that you allocate at runtime (in C++, using new and delete, and malloc and friends in C, etc).
The stack and heap, on the x86 architecture, both physically reside in your system memory (RAM), and are mapped through virtual memory allocation into the process address space as described above.
The registers (still on x86), physically reside inside the processor (as opposed to RAM), and are loaded by the processor, from the TEXT area (and can also be loaded from elsewhere in memory or other places depending on the CPU instructions that are actually executed). They are essentially just very small, very fast on-chip memory locations that are used for a number of different purposes.
Register layout is highly dependent on the architecture (in fact, registers, the instruction set, and memory layout/design, are exactly what is meant by "architecture"), and so I won't expand upon it, but recommend you take an assembly language course to understand them better.
Your question:
At what point is the stack used for the execution of the instructions? Instructions go from the RAM, to the stack, to the registers?
The stack (in systems/languages that have and use them) is most often used like this:
int mul( int x, int y ) {
return x * y; // this stores the result of MULtiplying the two variables
// from the stack into the return value address previously
// allocated, then issues a RET, which resets the stack frame
// based on the arg list, and returns to the address set by
// the CALLer.
}
int main() {
int x = 2, y = 3; // these variables are stored on the stack
mul( x, y ); // this pushes y onto the stack, then x, then a return address,
// allocates space on the stack for a return value,
// then issues an assembly CALL instruction.
}
Write a simple program like this, and then compile it to assembly (gcc -S foo.c if you have access to GCC), and take a look. The assembly is pretty easy to follow. You can see that the stack is used for function local variables, and for calling functions, storing their arguments and return values. This is also why when you do something like:
f( g( h( i ) ) );
All of these get called in turn. It's literally building up a stack of function calls and their arguments, executing them, and then popping them off as it winds back down (or up ;). However, as mentioned above, the stack (on x86) actually resides in your process memory space (in virtual memory), and so it can be manipulated directly; it's not a separate step during execution (or at least is orthogonal to the process).
FYI, the above is the C calling convention, also used by C++. Other languages/systems may push arguments onto the stack in a different order, and some languages/platforms don't even use stacks, and go about it in different ways.
Also note, these aren't actual lines of C code executing. The compiler has converted them into machine language instructions in your executable. They are then (generally) copied from the TEXT area into the CPU pipeline, then into the CPU registers, and executed from there. [This was incorrect. See Ben Voigt's correction below.]
Sdaz has gotten a remarkable number of upvotes in a very short time, but sadly is perpetuating a misconception about how instructions move through the CPU.
The question asked:
Instructions go from the RAM, to the stack, to the registers?
Sdaz said:
Also note, these aren't actual lines of C code executing. The compiler has converted them into machine language instructions in your executable. They are then (generally) copied from the TEXT area into the CPU pipeline, then into the CPU registers, and executed from there.
But this is wrong. Except for the special case of self-modifying code, instructions never enter the datapath. And they are not, cannot be, executed from the datapath.
The x86 CPU registers are:
General registers
EAX EBX ECX EDX
Segment registers
CS DS ES FS GS SS
Index and pointers
ESI EDI EBP EIP ESP
Indicator
EFLAGS
There are also some floating-point and SIMD registers, but for the purposes of this discussion we'll classify those as part of the coprocessor and not the CPU. The memory-management unit inside the CPU also has some registers of its own, we'll again treat that as a separate processing unit.
None of these registers are used for executable code. EIP contains the address of the executing instruction, not the instruction itself.
Instructions go through a completely different path in the CPU from data (Harvard architecture). All current machines are Harvard architecture inside the CPU. Most these days are also Harvard architecture in the cache. x86 (your common desktop machine) is Von Neumann architecture in main memory, meaning data and code are intermingled in RAM. That's beside the point, since we're talking about what happens inside the CPU.
The classic sequence taught in computer architecture is fetch-decode-execute. The memory controller looks up the instruction stored at the address EIP. The bits of the instruction go through some combinational logic to create all the control signals for the different multiplexers in the processor. And after some cycles, the arithmetic logic unit arrives at a result, which is clocked into the destination. Then the next instruction is fetched.
On a modern processor, things work a little differently. Each incoming instruction is translated into a whole series of microcode instructions. This enables pipelining, because the resources used by the first microinstruction aren't needed later, so they can begin working on the first microinstruction from the next instruction.
To top it off, terminology is slightly confused because register is an electrical engineering term for a collection of D-flipflops. And instructions (or especially microinstructions) may very well be stored temporarily in such a collection of D-flipflops. But this is not what is meant when a computer scientist or software engineer or run-of-the-mill developer uses the term register. They mean the datapath registers as listed above, and these are not used for transporting code.
The names and number of datapath registers vary for other CPU architectures, such as ARM, MIPS, Alpha, PowerPC, but all of them execute instructions without passing them through the ALU.
The exact layout of the memory while a process is executing is completely dependent on the platform which you're using. Consider the following test program:
#include <stdlib.h>
#include <stdio.h>
int main()
{
int stackValue = 0;
int *addressOnStack = &stackValue;
int *addressOnHeap = malloc(sizeof(int));
if (addressOnStack > addressOnHeap)
{
puts("The stack is above the heap.");
}
else
{
puts("The heap is above the stack.");
}
}
On Windows NT (and its children), this program is generally going to produce:
The heap is above the stack.
On POSIX boxes, it's going to say:
The stack is above the heap.
The UNIX memory model is quite well explained here by #Sdaz MacSkibbons, so I won't reiterate that here. But that is not the only memory model. The reason POSIX requires this model is the sbrk system call. Basically, on a POSIX box, to get more memory, a process merely tells the Kernel to move the divider between the "hole" and the "heap" further into the "hole" region. There is no way to return memory to the operating system, and the operating system itself does not manage your heap. Your C runtime library has to provide that (via malloc).
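Here is a sketch of watching that divider (the "program break") move (Linux-specific, and sbrk is obsolete as an allocation interface -- it is used here only as a probe). Note that malloc may satisfy a request from pages it already owns, or via mmap for large sizes, in which case the break will not move at all:
#include <unistd.h>   // sbrk()
#include <cstdio>
#include <cstdlib>

int main()
{
    void* before = sbrk(0);             // current end of the heap segment
    void* p = std::malloc(100 * 1024);  // ask the C runtime for 100 KiB
    void* after = sbrk(0);

    std::printf("break before: %p\n", before);
    std::printf("break after : %p\n", after);
    std::free(p);
}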
This also has implications for the kind of code actually used in POSIX binaries. POSIX boxes (almost universally) use the ELF file format. In this format, the operating system is responsible for communications between libraries in different ELF files. Therefore, all the libraries use position-independent code (That is, the code itself can be loaded into different memory addresses and still operate), and all calls between libraries are passed through a lookup table to find out where control needs to jump for cross library function calls. This adds some overhead and can be exploited if one of the libraries changes the lookup table.
Windows' memory model is different because the kind of code it uses is different. Windows uses the PE file format, which leaves the code in position-dependent format. That is, the code depends on where exactly in virtual memory the code is loaded. There is a flag in the PE spec which tells the OS where exactly in memory the library or executable would like to be mapped when your program runs. If a program or library cannot be loaded at its preferred address, the Windows loader must rebase the library/executable -- basically, it moves the position-dependent code to point at the new positions -- which doesn't require lookup tables and cannot be exploited because there's no lookup table to overwrite. Unfortunately, this requires a very complicated implementation in the Windows loader, and does have considerable startup time overhead if an image needs to be rebased. Large commercial software packages often modify their libraries to start purposely at different addresses to avoid rebasing; Windows itself does this with its own libraries (e.g. ntdll.dll, kernel32.dll, psapi.dll, etc. -- all have different start addresses by default)
On Windows, virtual memory is obtained from the system via a call to VirtualAlloc, and it is returned to the system via VirtualFree (okay, technically VirtualAlloc farms out to NtAllocateVirtualMemory, but that's an implementation detail) (contrast this with POSIX, where memory cannot be reclaimed). This process is slow (and IIRC, requires that you allocate in physical-page-sized chunks; typically 4 kB or more). Windows also provides its own heap functions (HeapAlloc, HeapFree, etc.) as part of a library known as RtlHeap, which is included as a part of Windows itself, upon which the C runtime (that is, malloc and friends) is typically implemented.
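For reference, a minimal sketch of those two layers (error handling elided):
#include <windows.h>
#include <cstdio>

int main()
{
    // Layer 1: page-granular virtual memory straight from the system.
    void* pages = VirtualAlloc(nullptr, 4096,
                               MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    std::printf("VirtualAlloc gave us %p\n", pages);
    VirtualFree(pages, 0, MEM_RELEASE);

    // Layer 2: small allocations carved out of a heap (RtlHeap underneath);
    // malloc in the C runtime is typically built on this.
    HANDLE heap = GetProcessHeap();
    void* small = HeapAlloc(heap, 0, 64);
    std::printf("HeapAlloc gave us    %p\n", small);
    HeapFree(heap, 0, small);
}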
Windows also has quite a few legacy memory allocation APIs from the days when it had to deal with old 80386s, and these functions are now built on top of RtlHeap. For more information about the various APIs that control memory management in Windows, see this MSDN article: http://msdn.microsoft.com/en-us/library/ms810627 .
Note also that this means that on Windows a single process can (and usually does) have more than one heap. (Typically, each shared library creates its own heap.)
(Most of this information comes from "Secure Coding in C and C++" by Robert Seacord)
The stack
In the x86 architecture, the CPU executes operations with registers. The stack is only used for convenience. You can save the contents of your registers to the stack before calling a subroutine or a system function and then load them back to continue your operation where you left off. (You could do it manually without the stack, but it is a frequently used function, so it has CPU support.) But you can do pretty much anything without the stack on a PC.
For example an integer multiplication:
MUL BX
Multiplies the AX register with the BX register. (The result will be in DX and AX, with DX containing the higher bits.)
Stack based machines (like JAVA VM) use the stack for their basic operations. The above multiplication:
DMUL
This pops two values from the top of the stack and multiplies them, then pushes the result back onto the stack. The stack is essential for this kind of machine.
Some higher-level programming languages (like C and Pascal) use this latter method for passing parameters to functions: the parameters are pushed onto the stack in a defined order (right to left in the common C calling convention, left to right in Pascal's) and popped by the function body, and the return values are pushed back. (This is a choice that the compiler manufacturers make and kind of abuses the way the x86 uses the stack.)
The heap
The heap is another concept that exists only in the realm of the compilers. It takes away the pain of handling the memory behind your variables, but it is not a function of the CPU or the OS; it is just a choice for housekeeping the memory block which is given out by the OS. You could do this manually if you wanted.
Accessing system resources
The operating system has a public interface how you can access its functions. In DOS parameters are passed in registers of the CPU. Windows uses the stack for passing parameters for OS functions (the Windows API).