I have this piece of code in C:
int q = 10;
int s = 5;
int a[3];
printf("Address of a: %d\n", (int)a);
printf("Address of a[1]: %d\n", (int)&a[1]);
printf("Address of a[2]: %d\n", (int)&a[2]);
printf("Address of q: %d\n", (int)&q);
printf("Address of s: %d\n", (int)&s);
The output is:
Address of a: 2293584
Address of a[1]: 2293588
Address of a[2]: 2293592
Address of q: 2293612
Address of s: 2293608
So I see that from a to a[2], the memory addresses increase by 4 bytes each.
But from q to s, the memory addresses decrease by 4 bytes.
I wonder about two things:
Does the stack grow up or down? (It looks like both to me in this case.)
What happens between the a[2] and q memory addresses? Why is there such a big gap there (20 bytes)?
Note: This is not a homework question; I am curious about how the stack works. Thanks for any help.
The behavior of the stack (growing up or growing down) depends on the application binary interface (ABI) and how the call stack (aka activation record) is organized.
Throughout its lifetime a program is bound to communicate with other programs, like the OS. The ABI determines how a program can communicate with another program.
The stack for different architectures can grow either way, but for a given architecture it will be consistent. The stack's growth direction is decided by the ABI of that architecture.
For example, if you take the MIPS ABI, the call stack is defined as below.
Let us consider that function 'fn1' calls 'fn2'. Now the stack frame as seen by 'fn2' is as follows:
direction of | |
growth of +---------------------------------+
stack | Parameters passed by fn1(caller)|
from higher addr.| |
to lower addr. | Direction of growth is opposite |
| | to direction of stack growth |
| +---------------------------------+ <-- SP on entry to fn2
| | Return address from fn2(callee) |
V +---------------------------------+
| Callee saved registers being |
| used in the callee function |
+---------------------------------+
| Local variables of fn2 |
|(Direction of growth of frame is |
| same as direction of growth of |
| stack) |
+---------------------------------+
| Arguments to functions called |
| by fn2 |
+---------------------------------+ <- Current SP after stack
frame is allocated
Now you can see that the stack grows downward. So, when variables are allocated in the local frame of the function, the variables' addresses actually grow downward. The compiler can decide the order of variables for memory allocation (in your case either q or s could be the first to get stack memory), but generally the compiler does stack memory allocation in the order of the declaration of the variables.
But in the case of arrays, the memory must be contiguous and is referred to through a single base address, with the elements laid out at increasing offsets from it. So, though the stack grows downward, within an array the addresses grow upward.
This is actually two questions. One is about which way the stack grows when one function calls another (when a new frame is allocated), and the other is about how variables are laid out in a particular function's frame.
Neither is specified by the C standard, but the answers are a little different:
Which way does the stack grow when a new frame is allocated -- if function f() calls function g(), will f's frame pointer be greater or less than g's frame pointer? This can go either way -- it depends on the particular compiler and architecture (look up "calling convention"), but it is always consistent within a given platform (with a few bizarre exceptions, see the comments). Downwards is more common; it's the case in x86, PowerPC, MIPS, SPARC, EE, and the Cell SPUs.
How are a function's local variables laid out inside its stack frame? This is unspecified and completely unpredictable; the compiler is free to arrange its local variables however it likes to get the most efficient result.
The direction in which stacks grow is architecture-specific. That said, my understanding is that only a very few hardware architectures have stacks that grow up.
The direction a stack grows is independent of the layout of an individual object. So while the stack may grow down, arrays will not (i.e., &array[n] will always be less than &array[n+1]).
There's nothing in the standard that mandates how things are organized on the stack at all. In fact, you could build a conforming compiler that didn't store array elements at contiguous locations on the stack, provided it had the smarts to still do array element arithmetic properly (so that it knew, for example, that a[1] was 1K away from a[0] and could adjust for that).
The reason you may be getting different results is that, while the stack may grow down to add "objects" to it, the array is a single "object" and its elements still ascend in address within it. It's not safe to rely on this behaviour, since the direction can change and variables could be swapped around for a variety of reasons, including, but not limited to:
optimization.
alignment.
the whims of the person who wrote the stack-management part of the compiler.
See here for my excellent treatise on stack direction :-)
In answer to your specific questions:
Does the stack grow up or down?
It doesn't matter at all (in terms of the standard) but, since you asked, it can grow up or down in memory, depending on the implementation.
What happens between the a[2] and q memory addresses? Why is there a big memory difference there (20 bytes)?
It doesn't matter at all (in terms of the standard). See above for possible reasons.
On x86, the memory "allocation" of a stack frame consists simply of subtracting the necessary number of bytes from the stack pointer (I believe other architectures are similar). In this sense, I guess the stack grows "down", in that the addresses get progressively smaller as you call more deeply into the stack (but I always envision memory as starting with 0 in the top left and getting larger addresses as you move to the right and wrap down, so in my mental image the stack grows up...). The order in which the variables are declared may not have any bearing on their addresses: I believe the standard allows the compiler to reorder them, as long as that doesn't cause side effects (someone please correct me if I'm wrong). They're just stuck somewhere into the gap in the used addresses created when the necessary bytes are subtracted from the stack pointer.
The gap around the array may be some kind of padding, but it's mysterious to me.
First of all, it's 8 bytes of unused space in memory (not 12; remember the stack grows downward, so the unallocated space runs from 604 down to 597). And why? Every data type is stored starting at an address that satisfies its alignment requirement. Here the compiler left empty space until it reached a starting address it considered suitable for the 12-byte array of 3 ints, namely 596. (Strictly, C only requires an int array to be aligned for int, i.e. on a 4-byte boundary, so exactly how much padding appears is a compiler choice.) So the memory allocated to the array runs from 596 down to 584; but since the elements of an array are contiguous and ascend, the first element starts at address 584, not 596.
It grows downward on most architectures. (Note that this is a convention fixed by the platform's ABI; it has nothing to do with little-endian byte order, which governs the order of bytes within a datum, not the direction of the stack.)
One way you could look at it is that the stack DOES grow upward, if you draw memory with address 0 at the top and the maximum address at the bottom.
One reason for the stack growing downward is so that stack contents can be addressed at fixed offsets from the stack or base pointer.
Remember that indexing of any kind increases from the lowest to the highest address. Since the stack grows downward (highest to lowest address), this lets you treat the stack region like ordinary dynamic memory.
This is one reason why so many programming and scripting languages use a stack-based virtual machine rather than a register-based one.
It depends on the architecture. To check your own system, use this code from GeeksForGeeks:
// C program to check whether the stack grows
// downward or upward.
// (Caveat: comparing pointers into two different objects is formally
// undefined behavior in C; in practice this works on common platforms.)
#include <stdio.h>

void fun(int *main_local_addr)
{
    int fun_local;

    if (main_local_addr < &fun_local)
        printf("Stack grows upward\n");
    else
        printf("Stack grows downward\n");
}

int main()
{
    // fun's local variable
    int main_local;

    fun(&main_local);
    return 0;
}
The compiler is free to allocate local (auto) variables at any place in the local stack frame, so you cannot reliably infer the stack growth direction purely from that. You can infer the growth direction by comparing the addresses of nested stack frames, i.e., comparing the address of a local variable inside the stack frame of a function with that of its callee:
#include <stdio.h>

int f(int *x)
{
    int a;
    /* first call: recurse with this frame's local; second call: compare
       frame addresses (formally undefined behavior, works in practice) */
    return x == NULL ? f(&a) : (int)(&a - x);
}

int main(void)
{
    printf("stack grows %s!\n", f(NULL) < 0 ? "down" : "up");
    return 0;
}
I don't think it's deterministic like that. The a array only seems to "grow" because that memory must be allocated contiguously. Since q and s are not related to one another at all, the compiler just sticks each of them in an arbitrary free memory location within the stack frame, probably whichever fits an integer best.
What happened between a[2] and q is that the space around q's location wasn't large enough (i.e., wasn't at least 12 bytes) to hold a 3-integer array.
My stack appears to extend towards lower numbered addresses.
It may be different on another computer, or even on my own computer with a different compiler invocation. ... or the compiler might choose not to use a stack at all (inlining everything, including the variables, if I didn't take their addresses).
$ cat stack.c
#include <stdio.h>

int stack(int x) {
    printf("level %d: x is at %p\n", x, (void*)&x);
    if (x == 0) return 0;
    return stack(x - 1);
}

int main(void) {
    stack(4);
    return 0;
}
$ /usr/bin/gcc -Wall -Wextra -std=c89 -pedantic stack.c
$ ./a.out
level 4: x is at 0x7fff7781190c
level 3: x is at 0x7fff778118ec
level 2: x is at 0x7fff778118cc
level 1: x is at 0x7fff778118ac
level 0: x is at 0x7fff7781188c
The stack grows down (on x86). However, the stack is allocated in one block when the function loads, and you don't have a guarantee what order the items will be on the stack.
In this case, it allocated space for two ints and a three-int array on the stack. It also allocated an additional 12 bytes after the array, so it looks like this:
a [12 bytes]
padding(?) [12 bytes]
s [4 bytes]
q [4 bytes]
For whatever reason, your compiler decided that it needed to allocate 32 bytes for this function, possibly more. That's opaque to you as a C programmer; you don't get to know why.
If you want to know why, compile the code to assembly language (I believe it's -S on gcc; on Microsoft's compiler it's /FA). If you look at the opening instructions of that function, you'll see the old stack pointer being saved and then 32 (or something else!) being subtracted from it. From there, you can see how the code accesses that 32-byte block of memory and figure out what your compiler is doing. At the end of the function, you can see the stack pointer being restored.
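For illustration, here is a minimal program you could inspect this way (a sketch; the file name is arbitrary, and the exact byte count and instructions depend entirely on your compiler, flags, and target):

/* frame.c -- compile with "gcc -S -O0 frame.c" and look in frame.s for
   the prologue of frame(), typically something like "subq $32, %rsp". */
#include <stdio.h>

int frame(void)
{
    int q = 10, s = 5;
    int a[3] = {1, 2, 3};
    return q + s + a[0];   /* use the locals so they aren't optimized away */
}

int main(void)
{
    printf("%d\n", frame());
    return 0;
}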
It depends on your operating system and your compiler.
The stack does grow down. So if f() calls g(), and g() calls h(), then h's frame will start at a lower address than g's, and g's will be lower than f's. But variables within the stack have to follow the C specification:
http://c0x.coding-guidelines.com/6.5.8.html
1206 If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values.
&a[0] < &a[1] must always be true, regardless of how a is allocated.
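A minimal check of that guarantee (it must hold on any conforming implementation):

#include <assert.h>

int main(void)
{
    int a[3];
    /* within one array, addresses must increase with the subscript */
    assert(&a[0] < &a[1] && &a[1] < &a[2]);
    return 0;
}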
I just want to mention this for absolute beginners: the words "up" and "down" can themselves be confusing, because it depends on how you draw the memory layout. Normally we draw the smaller addresses at the bottom and the bigger addresses at the top, like the following:
fff0
ffe0
ffd0
...
0010
0000
Then growing down means the address becomes smaller.
It's a bit hard to formulate what I want to know in a single question, so I'll try to break it down.
For example purposes, let's say we have the following struct:
struct X {
    uint8_t a;
    uint16_t b;
    uint32_t c;
};
Is it true that the compiler is guaranteed to never rearrange the order of X's members, only add padding where necessary? In other words, is it always true that offsetof(X, a) < offsetof(X, c)?
Is it true that the compiler will pick the largest alignment among X's members and use it to align objects of type X (i.e. addresses of X instances will be divisible by the largest alignment among X's members)?
Since malloc does not know anything about the type of objects that we're going to store when we allocate a buffer, how does it choose an alignment for the returned address? Does it simply return an address that is divisible by the largest alignment possible (in which case, no matter what structure we put in the buffer, the memory accesses will always be aligned)?
Yes
No, the compiler will use its knowledge of the target host hardware to select the best alignment.
See question 2.
Since malloc does not know anything about the type of objects that we're going to store when we allocate a buffer, how does it choose an alignment for the returned address?
malloc(3) returns "memory that is suitably aligned for any kind of variable."
Does it simply return an address that is divisible by the largest alignment possible (in which case, no matter what structure we put in the buffer, the memory accesses will always be aligned)?
Yes, but watch your compliance with the strict aliasing rule.
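If you want to check the first two points on your own platform, here is a small sketch using C11's offsetof and alignof (typical output on a common target would be offsets 0, 2, 4 with size 8 and alignment 4, but the padding is the implementation's choice):

#include <stdio.h>
#include <stddef.h>     /* offsetof */
#include <stdint.h>
#include <stdalign.h>   /* alignof (C11) */

struct X {
    uint8_t  a;
    uint16_t b;
    uint32_t c;
};

int main(void)
{
    /* members stay in declaration order; padding may appear between them */
    printf("offsets: a=%zu b=%zu c=%zu  sizeof=%zu  alignof=%zu\n",
           offsetof(struct X, a), offsetof(struct X, b),
           offsetof(struct X, c), sizeof(struct X), alignof(struct X));
    return 0;
}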
The compiler will do whatever is most beneficial on that computer in the largest number of circumstances. On most platforms, loading bus-wide values at bus-width offsets is fastest.
That means that, generally, on 32-bit computers compilers will choose to align 32-bit numbers on 4-byte offsets; on 64-bit computers, 64-bit values are aligned on 8-byte offsets.
On most computers smaller values, like 8-bit and 16-bit values, are slower to load: probably all 4 or 8 bytes around them are loaded and the byte or two you need are masked off.
When you have special circumstances you can override the compiler by specifying the alignment and the padding. You might do this when you know fast loading is not important but you really want to pack the data tightly, or when you are playing very subtle tricks with casting and unions.
Memory allocation routines on almost any modern computer will always return memory that is aligned on at least the bus width of the platform (e.g. 4 or 8 bytes), or even more, like 16-byte alignment.
When you call "malloc" you are responsible for knowing the size of the structures you need. Luckily the compiler will tell you the size of any structure with "sizeof". That means that if you pack a structure to save memory, sizeof will return a smaller value than an unpacked structure. So you really will save memory - if you are allocating small structures in large arrays of them.
If you allocate small packed structures one at a time - then yes - if you pack them or not it won't make any difference. That is because when you allocate some odd small piece of memory - the allocator will actually use significantly more memory than that. It will allocate an convenient sized block of memory for you, and then an additional block of memory for itself to keep track of you allocation.
Which is why if you care about memory use and want to pack your structures - you definitely don't want to allocate them one at time.
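As a sketch of the size difference packing makes (using GCC/Clang's non-standard packed attribute; MSVC spells this #pragma pack instead):

#include <stdio.h>
#include <stdint.h>

struct Loose  { uint8_t a; uint32_t b; };                   /* padded */
struct Packed { uint8_t a; uint32_t b; } __attribute__((packed));

int main(void)
{
    /* typically prints 8 vs 5; packed members may be slower to access */
    printf("padded: %zu bytes, packed: %zu bytes\n",
           sizeof(struct Loose), sizeof(struct Packed));
    return 0;
}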
In programming languages like C and C++, people often refer to static and dynamic memory allocation. I understand the concept, but the phrase "all memory was allocated (reserved) during compile time" always confuses me.
Compilation, as I understand it, converts high-level C/C++ code to machine language and outputs an executable file. How is memory "allocated" in a compiled file? Isn't memory always allocated in RAM, with all the virtual memory management stuff?
Isn't memory allocation by definition a runtime concept?
If I make a 1 KB statically allocated variable in my C/C++ code, will that increase the size of the executable by the same amount?
This is one of the pages where the phrase is used under the heading "Static allocation".
Back To Basics: Memory allocation, a walk down the history
Memory allocated at compile-time means the compiler resolves at compile-time where certain things will be allocated inside the process memory map.
For example, consider a global array:
int array[100];
The compiler knows at compile-time the size of the array and the size of an int, so it knows the entire size of the array at compile-time. Also a global variable has static storage duration by default: it is allocated in the static memory area of the process memory space (.data/.bss section). Given that information, the compiler decides during compilation in what address of that static memory area the array will be.
Of course, those memory addresses are virtual addresses. The program assumes that it has its own entire memory space (from 0x00000000 to 0xFFFFFFFF, for example). That's why the compiler can make assumptions like "Okay, the array will be at address 0x00A33211". At runtime those addresses are translated to real/hardware addresses by the MMU and the OS.
Statically stored things with initializers are a bit different. For example:
int array[] = { 1 , 2 , 3 , 4 };
In our first example, the compiler only decided where the array will be allocated, storing that information in the executable.
In the case of value-initialized things, the compiler also injects the initial value of the array into the executable, and adds code which tells the program loader that after the array allocation at program start, the array should be filled with these values.
Here are two examples of the assembly generated by the compiler (GCC 4.8.1 targeting x86):
C++ code:
int a[4];
int b[] = { 1 , 2 , 3 , 4 };
int main()
{}
Output assembly:
a:
    .zero 16
b:
    .long 1
    .long 2
    .long 3
    .long 4
main:
    pushq %rbp
    movq %rsp, %rbp
    movl $0, %eax
    popq %rbp
    ret
As you can see, the values are directly injected into the assembly. In the array a, the compiler generates a zero-initialization of 16 bytes, because the Standard says that things with static storage should be zero-initialized by default:
8.5.9 (Initializers) [Note]:
Every object of static storage duration is zero-initialized at program startup before any other initialization takes place. In some cases, additional initialization is done later.
I always suggest that people disassemble their code to see what the compiler really does with the C++ code. This applies from storage classes/durations (like this question) to advanced compiler optimizations. You could instruct your compiler to generate the assembly, but there are wonderful tools to do this on the Internet in a friendly manner. My favourite is GCC Explorer.
Memory allocated at compile time simply means there will be no further allocation at run time -- no calls to malloc, new, or other dynamic allocation methods. You'll have a fixed amount of memory usage even if you don't need all of that memory all of the time.
Isn't memory allocation by definition a runtime concept?
The memory is not in use prior to run time, but its allocation is handled by the system immediately prior to execution starting.
If I make a 1KB statically allocated variable in my C/C++ code, will that increase the size of the executable by the same amount?
Simply declaring the static will not increase the size of your executable by more than a few bytes. Declaring it with a non-zero initial value will (in order to hold that initial value). Rather, the linker simply adds this 1 KB to the memory requirement that the system's loader sets up for you immediately prior to execution.
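You can watch this happen with the size(1) tool; a sketch (the file name here is arbitrary):

/* bss_vs_data.c -- build it ("gcc bss_vs_data.c") and run "size a.out":
   with the zero-initialized definition, buf is counted in the bss column
   and stores no bytes in the file; switch to the commented-out non-zero
   initializer and those 1024 bytes move to the data column, growing the
   executable accordingly. */
static char buf[1024];               /* zero-initialized: .bss */
/* static char buf[1024] = {1}; */   /* non-zero initializer: .data */

int main(void)
{
    return buf[0];   /* use buf so it isn't discarded */
}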
Memory allocated at compile time means that when you load the program, some part of the memory will be immediately allocated, and the size and (relative) position of this allocation were determined at compile time.
char a[32];
char b;
char c;
Those 3 variables are "allocated at compile time", meaning the compiler calculates their sizes (which are fixed) at compile time. The variable a will be an offset in memory, let's say at address 0; b will be at address 32 and c at address 33 (supposing no alignment optimization). So, allocating 1 KB of static data does not increase the size of your code, since it just changes an offset inside it. The actual space will be allocated at load time.
Real memory allocation always happens at run time, because the kernel needs to keep track of it and to update its internal data structures (how much memory is allocated for each process, pages, and so on). The difference is that the compiler already knows the size of each piece of data you are going to use, and that space is allocated as soon as your program is executed.
Remember also that we are talking about relative addresses. The real address where the variable will be located will be different. At load time the kernel will reserve some memory for the process, let's say at address x, and all the hard-coded addresses contained in the executable file will be incremented by x bytes, so that variable a in the example will be at address x, b at address x+32, and so on.
Adding variables on the stack that take up N bytes doesn't (necessarily) increase the bin's size by N bytes. In fact, it will add but a few bytes most of the time.
Let's start off with an example of how adding a thousand chars to your code will increase the bin's size in a linear fashion.
If the 1 KB is a string of a thousand chars, which is declared like so
const char *c_string = "Here goes a thousand chars...999";//implicit \0 at end
and you then were to vim your_compiled_bin, you'd actually be able to see that string in the bin somewhere. In that case, yes: the executable will be 1 KB bigger, because it contains the string in full.
If, however you allocate an array of ints, chars or longs on the stack and assign it in a loop, something along these lines
int big_arr[1000];
for (int i=0;i<1000;++i) big_arr[i] = some_computation_func(i);
then, no: it won't increase the bin... by 1000*sizeof(int)
Allocation at compile time means what you've now come to understand it means (based on your comments): the compiled bin contains information the system requires to know how much memory each function/block will need when it gets executed, along with information on the stack size your application requires. That's what the system allocates when it executes your bin, and your program becomes a process (the executing of your bin is that process).
Of course, I'm not painting the full picture here. The bin contains information about how big a stack it will actually need. Based on this information (among other things), the system reserves a chunk of memory, called the stack, that the program gets sort of free rein over. Stack memory is still allocated by the system, when the process (the result of your bin being executed) is initiated; the process then manages the stack memory for you. When a function or loop (any type of block) is invoked/gets executed, the variables local to that block are pushed onto the stack, and they are removed (the stack memory is "freed", so to speak) to be used by other functions/blocks. So declaring int some_array[100] will only add a few bytes of additional information to the bin, telling the system that function X will require 100*sizeof(int) plus some bookkeeping space extra.
On many platforms, all of the global or static allocations within each module will be consolidated by the compiler into three or fewer consolidated allocations (one for uninitialized data (often called "bss"), one for initialized writable data (often called "data"), and one for constant data ("const")), and all of the global or static allocations of each type within a program will be consolidated by the linker into one global for each type. For example, assuming int is four bytes, a module has the following as its only static allocations:
int a;
const int b[6] = {1,2,3,4,5,6};
char c[200];
const int d = 23;
int e[4] = {1,2,3,4};
int f;
it would tell the linker that it needed 208 bytes for bss, 16 bytes for "data", and 28 bytes for "const". Further, any reference to a variable would be replaced with an area selector and offset, so a, b, c, d, e, and f would be replaced by bss+0, const+0, bss+4, const+24, data+0, and bss+204, respectively.
When a program is linked, all of the bss areas from all the modules are concatenated together; likewise the data and const areas. For each module, the address of any bss-relative variables will be increased by the size of all preceding modules' bss areas (again, likewise with data and const). Thus, when the linker is done, any program will have one bss allocation, one data allocation, and one const allocation.
When a program is loaded, one of four things will generally happen depending upon the platform:
The executable will indicate how many bytes it needs for each kind of data and, for the initialized data area, where the initial contents may be found. It will also include a list of all the instructions which use a bss-, data-, or const-relative address. The operating system or loader will allocate the appropriate amount of space for each area and then add the starting address of that area to each instruction which needs it.
The operating system will allocate a chunk of memory to hold all three kinds of data, and give the application a pointer to that chunk of memory. Any code which uses static or global data will dereference it relative to that pointer (in many cases, the pointer will be stored in a register for the lifetime of an application).
The operating system will initially not allocate any memory to the application, except for what holds its binary code, but the first thing the application does will be to request a suitable allocation from the operating system, which it will forevermore keep in a register.
The operating system will initially not allocate space for the application, but the application will request a suitable allocation on startup (as above). The application will include a list of instructions with addresses that need to be updated to reflect where memory was allocated (as with the first style), but rather than having the application patched by the OS loader, the application will include enough code to patch itself.
All four approaches have advantages and disadvantages. In every case, however, the compiler will consolidate an arbitrary number of static variables into a fixed small number of memory requests, and the linker will consolidate all of those into a small number of consolidated allocations. Even though an application will have to receive a chunk of memory from the operating system or loader, it is the compiler and linker which are responsible for allocating individual pieces out of that big chunk to all the individual variables that need it.
The core of your question is this: "How is memory "allocated" in a compiled file? Isn't memory always allocated in the RAM with all the virtual memory management stuff? Isn't memory allocation by definition a runtime concept?"
I think the problem is that there are two different concepts involved in memory allocation. At its basic, memory allocation is the process by which we say "this item of data is stored in this specific chunk of memory". In a modern computer system, this involves a two step process:
Some system is used to decide the virtual address at which the item will be stored
The virtual address is mapped to a physical address
The latter process is purely run time, but the former can be done at compile time, if the data have a known size and a fixed number of them is required. Here's basically how it works:
The compiler sees a source file containing a line that looks a bit like this:
int c;
It produces output for the assembler that instructs it to reserve memory for the variable 'c'. This might look like this:
global _c
section .bss
_c: resb 4
When the assembler runs, it keeps a counter that tracks the offset of each item from the start of a memory 'segment' (or 'section'). Think of this as the layout of a very large 'struct' containing everything in the entire file; no actual memory is allocated for it at this time, and it could end up anywhere. The assembler notes in a table that _c has a particular offset (say 510 bytes from the start of the segment) and then increments its counter by 4, so the next such variable will be at (e.g.) 514 bytes. For any code that needs the address of _c, it just puts 510 in the output file, and adds a note that the output needs the address of the segment containing _c added to it later.
The linker takes all of the assembler's output files, and examines them. It determines an address for each segment so that they won't overlap, and adds the offsets necessary so that instructions still refer to the correct data items. In the case of uninitialized memory like that occupied by c (the assembler was told that the memory would be uninitialized by the fact that the compiler put it in the '.bss' segment, which is a name reserved for uninitialized memory), it includes a header field in its output that tells the operating system how much needs to be reserved. It may be relocated (and usually is) but is usually designed to be loaded more efficiently at one particular memory address, and the OS will try to load it at this address. At this point, we have a pretty good idea what the virtual address is that will be used by c.
The physical address will not actually be determined until the program is running. However, from the programmer's perspective the physical address is actually irrelevant—we'll never even find out what it is, because the OS doesn't usually bother telling anyone, it can change frequently (even while the program is running), and a main purpose of the OS is to abstract this away anyway.
An executable describes what space to allocate for static variables. This allocation is done by the system when you run the executable. So your 1 KB static variable won't increase the size of the executable by 1 KB:
static char buf[1024];
Unless, of course, you specify an initializer:
static char buf[1024] = { 1, 2, 3, 4, ... };
So, in addition to 'machine language' (i.e. CPU instructions), an executable contains a description of the required memory layout.
Memory can be allocated in many ways:
in application heap (whole heap is allocated for your app by OS when the program starts)
in operating system heap (so you can grab more and more)
in garbage collector controlled heap (same as both above)
on stack (so you can get a stack overflow)
reserved in code/data segment of your binary (executable)
in remote place (file, network - and you receive a handle not a pointer to that memory)
Now, your question is what "memory allocated at compile time" means. Definitely it is just an incorrectly phrased expression, which is supposed to refer to either binary-segment allocation or stack allocation, or in some cases even to heap allocation, where the allocation is hidden from the programmer's eyes by an invisible constructor call. Or perhaps the person who said it just wanted to say that memory is not allocated on the heap, but did not know about stack or segment allocations (or did not want to go into that kind of detail).
In most cases, though, the person just wants to say that the amount of memory being allocated is known at compile time.
The binary size will only change when the memory is reserved in the code or data segment of your app.
You are right. Memory is actually allocated (paged in) at load time, i.e. when the executable file is brought into (virtual) memory. Memory can also be initialized at that moment. The compiler just creates a memory map. [By the way, stack and heap space are also allocated at load time!]
I think you need to step back a bit. Memory allocated at compile time... what can that mean? Can it mean that memory on chips that have not yet been manufactured, for computers that have not yet been designed, is somehow being reserved? No. No time travel, no compilers that can manipulate the universe.
So it must mean that the compiler generates instructions to allocate that memory somehow at runtime. But if you look at it from the right angle, the compiler generates all instructions, so what's the difference? The difference is that the compiler decides, and at runtime your code cannot change or modify its decisions. If the compiler decided it needed 50 bytes at compile time, then at runtime you can't make it decide to allocate 60; that decision has already been made.
If you learn assembly programming, you will see that you have to carve out segments for the data, the stack, and code, etc. The data segment is where your strings and numbers live. The code segment is where your code lives. These segments are built into the executable program. Of course the stack size is important as well... you wouldn't want a stack overflow!
So if your data segment is 500 bytes, your program has a 500 byte area. If you change the data segment to 1500 bytes, the size of the program will be 1000 bytes larger. The data is assembled into the actual program.
This is what is going on when you compile higher-level languages. The actual data area is allocated when it is compiled into an executable program, increasing the size of the program. The program can also request memory on the fly; this is dynamic memory. You can request memory from the OS, use it, let go of it, and the memory manager will release it back to the OS. It can even be swapped to a hard disk, if necessary, by a good memory manager. These features are what high-level languages provide you.
I would like to explain these concepts with the help of a few diagrams.
It is true that memory cannot be allocated at compile time, for sure. But then what does in fact happen at compile time? Here comes the explanation.
Say, for example, a program has four variables x, y, z, and k. At compile time the compiler simply makes a memory map, in which the locations of these variables with respect to each other are ascertained. [Diagram: compile-time memory map of x, y, z, k.]
Now imagine no program is running in memory; picture it as one big empty rectangle. Next, the first instance of this program is executed; this is when memory is actually allocated. [Diagram: first instance laid out in memory.] When a second instance of this program runs, it gets its own copy of the same layout elsewhere in memory, and so on for a third. [Diagram: multiple instances in memory.]
I hope this visualization explains the concept well.
There is a very nice explanation given in the accepted answer. Just in case, I will post a link that I found useful:
https://www.tenouk.com/ModuleW.html
One of the many things a compiler does is create and maintain a symbol table (SYMTAB, under the section .symtab). This is created and maintained purely by the compiler, using some data structure (a list, a tree, etc.), and is not meant for the developer's eyes; every variable reference is resolved through it.
Now, about the symbol table:
we only need to know about two of its columns, Symbol Name and Offset.
The Symbol Name column holds the variable names, and the Offset column holds each one's offset value.
Let's see this with an example:
int a , b , c ;
Now we all know that the stack pointer register (sp) points to the top of the stack memory. Let's say sp = 1000.
The Symbol Name column will have three entries in it: a, then b, then c. Reminding you all that variable a will be at the top of the stack memory.
So a's equivalent offset value will be 0 (a compile-time offset value).
Then b, whose equivalent offset value will be 4 (a compile-time offset value, one int further down).
Then c, whose equivalent offset value will be 8 (a compile-time offset value).
Now, calculating a's run-time memory address = (sp - offset value of a) = (1000 - 0) = 1000.
Calculating b's run-time memory address = (sp - offset value of b) = (1000 - 4) = 996.
Calculating c's run-time memory address = (sp - offset value of c) = (1000 - 8) = 992.
Therefore, at compilation time we only have the offset values; only during run time are the actual addresses calculated.
Note:
The stack pointer gets its value only after the program is loaded. Pointer arithmetic between the stack pointer register and a variable's offset yields the variable's run-time address.
"POINTERS AND POINTER ARITHMETIC, WAY OF THE PROGRAMMING WORLD"
Let me share what I learned about this question.
You can understand the issue in two steps.
First, the compilation step: the compiler generates the binary. On a Linux system, a binary is a file in ELF (Executable and Linkable Format). An ELF file contains several sections, including .bss and .data:
.data: initialized data, with read/write access rights
.bss: uninitialized data, with read/write access rights (ELF section flags WA: writable, allocated)
.data and .bss map directly to segments of the process's memory layout, which contain the static variables.
Second, the loading step. When the binary file gets executed, it is loaded into the process's memory, and the loader finds the static variables' information in the ELF file.
Simply speaking, the compiler and the loader follow the same standard to communicate with each other, and that standard is the ELF format.
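You can inspect those sections with readelf; a sketch (the file name is arbitrary):

/* sections.c -- after "gcc sections.c", run "readelf -S a.out":
   .data (flags WA) holds 'initialized', whose bytes are stored in the
   file; .bss (also WA) merely records how many zero bytes the loader
   must reserve for 'uninitialized'. */
int initialized = 42;   /* lands in .data */
int uninitialized;      /* lands in .bss  */

int main(void)
{
    return initialized + uninitialized;
}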
Suppose I declare an array as int myarray[5],
or declare it as int *myarray = malloc(5 * sizeof(int)).
Will both declarations set aside an equal amount of memory, in number of bytes?
This is leaving aside that the former declaration is on the stack and the latter on the heap.
Thank you!
There's a fundamental difference, that may not be apparent in the way you use myarray:
int myarray[5]; declares an array of five integers, and the array is an automatic variable (and it is uninitialized).
int * myarray = malloc(5 * sizeof(int)); declares a variable that is a pointer to an int (also as an automatic variable), and that pointer is initialized with the result of a library call. That library call promises to make the resulting pointer point to a region of memory that's big enough to store five consecutive integers.
Because of pointer arithmetic, array-to-pointer decay and the convention that a[i] is the same as *(a + i), you can use both variables in the same way, namely as myarray[i]. This is of course by design.
If you're looking for a difference, then maybe the following helps: The array-of-five-ints is a single object, and it has a definite size. By contrast, the malloc library call doesn't create any objects. It just sets aside enough memory (and suitably aligned), but it could for example allocate a lot more memory.
(In C++ there's of course the additional distinction between memory and objects.)
Neither is guaranteed to allocate exactly 5*sizeof(int) bytes, though both will give you at least that much space (assuming no allocation failures or stack exhaustion).
In the first case, the stack variable may be surrounded by alignment padding, and/or stack canaries (depending on compile options). These could result in the stack pointer being adjusted by more than 5*sizeof(int) bytes.
In the second case, you allocate an int * on the stack (sizeof(int *) bytes), plus the space that malloc returns. malloc may allocate additional memory in the form of allocation-tracking structures, alignment padding, linked-list pointers, etc. Thus, in that case you are also not guaranteed to allocate exactly 5*sizeof(int) bytes.
If you want to be very precise about your memory usage, the mmap function allows you to request pages of virtual memory from the OS. The memory you request this way will be precisely the amount you request (ignoring the space taken up in the kernel to track those allocations).
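A minimal sketch of that mmap approach (POSIX; on some platforms MAP_ANONYMOUS is spelled MAP_ANON):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096;   /* one page on most systems */
    /* anonymous, private mapping: plain memory, not backed by a file */
    int *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    p[0] = 42;           /* use the memory */
    printf("%d\n", p[0]);
    munmap(p, len);      /* hand the page back to the OS */
    return 0;
}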
Short answer: no, the latter normally uses slightly more memory.
Long answer:
The memory manager will certainly use some extra memory to manage the returned pointer (to track it and free it at a later time), and you declare an extra pointer to point to that memory. So the actual cost is 5 * sizeof(int) + sizeof(int *) + malloc overhead, whereas in the first case you use exactly 5 ints (plus possibly alignment padding).
Dynamic allocation will require at least a few extra bytes: however many bytes are needed for the pointer variable, in addition to the 5 int-sized elements, plus potentially some extra bytes to track the size of the allocated region so that it can be freed properly.
Usually data is aligned at power of two addresses depending on its size.
How should I align a struct or class with size of 20 bytes or another non-power-of-two size?
I'm creating a custom stack allocator, so I guess the compiler won't align data for me, since I'm working with a contiguous block of memory.
Some more context:
I have an Allocator class that uses malloc() to allocate a large block of memory.
Then I use a void* allocate(U32 size_of_object) method to return a pointer to where I can store whatever objects I need to store.
This way all objects are stored in the same region of memory, which will hopefully stay in the cache, reducing cache misses.
C++11 has the alignof operator specifically for this purpose. Don't use any of the tricks mentioned in other posts, as they all have edge cases or may fail for certain compiler optimisations. The alignof operator is implemented by the compiler and knows the exact alignment being used.
See this description of C++11's new alignof operator.
Although the compiler (or interpreter) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition the data structure as a whole may be padded with a final unnamed member. This allows each member of an array of structures to be properly aligned. http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86
This says that the compiler takes care of it for you, 99.9% of the time. As for how to force an object to align a specific way, that is compiler-specific, and it only works in certain circumstances; in particular, both compilers below require the alignment to be a power of two, so for a 20-byte struct you would round up to 32:
MSVC: http://msdn.microsoft.com/en-us/library/83ythb65.aspx
__declspec(align(32))
struct S{ int a, b, c, d; };
GCC: http://gcc.gnu.org/onlinedocs/gcc-3.4.0/gcc/Type-Attributes.html
struct S{ int a, b, c, d; }
__attribute__ ((aligned (32)));
I don't know of a cross-platform way (including macros!) to do this, but there's probably a neat macro somewhere.
Unless you want to access memory directly, or squeeze the maximum data into a block of memory, you don't need to worry about alignment; the compiler takes care of that for you.
Due to the way processor data buses work, what you want to avoid is 'mis-aligned' access. Usually you can read a 32 bit value in a single access from addresses which are multiples of four; if you try to read it from an address that's not such a multiple, the CPU may have to grab it in two or more pieces. So if you're really worrying about things at this level of detail, what you need to be concerned about is not so much the overall struct, as the pieces within it. You'll find that compilers will frequently pad out structures with dummy bytes to ensure aligned access, unless you specifically force them not to with a pragma.
Since you've now added that you actually want to write your own allocator, the answer is straightforward: simply ensure that your allocator returns a pointer whose value is a multiple of the requested size. The object's size will already have been adjusted (via internal padding) so that all member objects are properly aligned, so if you request sizeof(T) bytes, all your allocator needs to do is return a pointer whose value is divisible by sizeof(T).
If your object does indeed have size 20 (as reported by sizeof), then you have nothing further to worry about. (On a 64-bit platform, the object would probably be padded to 24 bytes.)
Update: In fact, as I only now came to realize, strictly speaking you only need to ensure that the pointer is aligned, recursively, for the largest member of your type. That may be more efficient, but aligning to the size of the entire type is definitely not getting it wrong.
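Following that update, here is a sketch of the rounding step such an allocator typically performs, aligning to a power-of-two boundary such as alignof(T) rather than to the full object size (names here are illustrative):

#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>

/* Round p up to the next multiple of align; align must be a power of two. */
static void *align_up(void *p, size_t align)
{
    uintptr_t a = (uintptr_t)p;
    return (void *)((a + align - 1) & ~(uintptr_t)(align - 1));
}

int main(void)
{
    char buffer[64];
    /* deliberately start misaligned, then fix up to a double boundary */
    double *d = align_up(buffer + 1, alignof(double));
    *d = 3.14;
    printf("%p -> %p holds %g\n", (void *)(buffer + 1), (void *)d, *d);
    return 0;
}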
How should I align a struct or class with size of 20 bytes or another non-power-of-two size?
Alignment is CPU-specific, so there is no answer to this question without, at least, knowing the target CPU.
Generally speaking, alignment isn't something that you have to worry about; your compiler will have the rules implemented for you. It does come up once in a while, like when writing an allocator. The classic solution is discussed in The C Programming Language (K&R): use the worst possible alignment. malloc does this, although it's phrased as, "the pointer returned if the allocation succeeds shall be suitably aligned so that it may be assigned to a pointer to any type of object."
The way to do that is to use a union (the elements of a union are all allocated at the union's base address, and the union must therefore be aligned in such a way that each element could exist at that address; i.e., the union's alignment will be the same as the alignment of the element with the strictest rules):
typedef long Align;

union header {
    // the inner struct has the important bookkeeping info
    struct {
        unsigned size;
        union header *next;
    } s;
    // the align member only exists to make sure headers are always allocated
    // using the alignment of a long, which is probably the worst alignment
    // for the target architecture ("worst" == "strictest"; something that meets
    // the worst alignment will also meet all lesser alignment requirements)
    Align align;
};
Memory is allocated by creating an array (using something like sbrk()) of headers large enough to satisfy the request, plus one additional header element that actually contains the bookkeeping information. If the array is called arry, the bookkeeping information is at arry[0], while the pointer returned points at arry[1] (the next member is meant for walking the free list).
This works, but can lead to wasted space ("In Sun's HotSpot JVM, object storage is aligned to the nearest 64-bit boundary"). I'm aware of a better approach that tries to get a type-specific alignment instead of "the alignment that will work for anything."
Compilers also often have compiler-specific commands. They aren't standard, and they require that you know the correct alignment requirements for the types in question. I would avoid them.
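As a closing note, C11 (and C++11) gives the worst-case alignment a standard name, max_align_t, so you no longer have to guess at it with a union; a minimal sketch:

#include <stdio.h>
#include <stddef.h>     /* max_align_t */
#include <stdalign.h>   /* alignof (C11) */

int main(void)
{
    /* malloc's guarantee, by name: alignment suitable for any standard type */
    printf("worst-case alignment: %zu bytes\n", alignof(max_align_t));
    return 0;
}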