When do global variables allocate their memory? - C++

This has been bothering me for a while, but I haven't found a good resource on it. I have some global variables in my code. They are obviously initialized in some order, but is the memory needed for all of those objects reserved before any initialization takes place?
Here is a simple example of what might go wrong in my code and how I can use the answer:
I had a map<RTTI, object*> objectPool that holds a sample of every class in my code, which I used to load objects from a file. To create those samples I use some global variables whose only job is to introduce a class instance to objectPool. But sometimes those sample instances were initialized before objectPool itself, and that caused a runtime error.
To fix that error I used a delayed initializer, map<RTTI,object*>* lateInitializedObjectPool;. Now every instance first checks whether the object pool is initialized, initializes it if not, and then introduces itself to the object pool. It seems to work fine, but I'm worried that even the memory for the object pool pointer might not be reserved before other classes begin to introduce themselves, which could cause an access violation.

Variables declared at namespace scope (as opposed to in classes or functions) have the space for the objects themselves (sizeof(ObjectType)) allocated by the executable (or DLL) loader. If the object is a POD that uses aggregate initialization, then it typically gets its values set by having the linker write those values directly into the executable and the exe's loader simply blasting all of those into memory. Objects that don't use aggregate initialization get their values zeroed out initially.
After all of that, if any of these objects have constructors, then those constructors are executed before main is run. Thus, if any of those constructors dynamically allocate memory, that is when they do it. After the executable is loaded, but before main is run.
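For the object pool problem in the question, one common way to sidestep initialization order is to make the pool a local static inside a function, so it is constructed the first time anyone asks for it. This is only a sketch: Object, Circle, Registrar and the std::type_index key are stand-ins for the asker's own types, which the question does not show.

```cpp
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical base class standing in for the asker's "object".
struct Object {
    virtual ~Object() = default;
    virtual std::string name() const = 0;
};

using ObjectPool = std::map<std::type_index, Object*>;

// Construct-on-first-use: the map is built the first time this function is
// called, even if that call comes from another global's constructor.
ObjectPool& objectPool() {
    static ObjectPool pool;
    return pool;
}

// Hypothetical helper: a global of this type registers one sample object.
struct Registrar {
    Registrar(std::type_index key, Object* sample) { objectPool()[key] = sample; }
};

struct Circle : Object {
    std::string name() const override { return "Circle"; }
};

// Both globals are constructed before main(); the pool is created on demand,
// so it no longer matters which translation unit is initialized first.
Circle circleSample;
Registrar circleRegistrar(typeid(Circle), &circleSample);
```

By the time main starts, objectPool() already contains the Circle sample, and no global had to exist before any other one.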

There are usually separate memory areas for variables that the compiler:
has worked out initially contain all 0s, perhaps with a pre-main() constructor running to change their content
has predetermined to have a specific non-0 value, such that they can be written in pre-constructed form into the executable image and page-faulted in ready for use.
When I say a "separate memory area", I mean some memory the OS executable loader arranges for the process, just as per the stack or heap, but different in that these areas are of fixed pre-determined size. In UNIX, the all-0 memory region mentioned above is commonly known as the "BSS", the non-0 initialised area as "data" - see http://en.wikipedia.org/wiki/Data_segment for details.
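As a rough illustration of the two areas, here is a sketch of globals that a typical toolchain would place in each one. The names are made up for the example, and the section placement in the comments is what common compilers do, not something the C++ standard promises.

```cpp
#include <cstddef>

// No initializer: typically placed in the BSS, so the executable only
// records how many bytes are needed.
int zeroedTable[4096];

// Non-zero initializer: typically placed in "data", with these bytes
// stored in the executable image.
int presetTable[4] = {10, 20, 30, 40};

// Helper for checking that a range really starts out as all zeros.
bool allZero(const int* data, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (data[i] != 0) return false;
    }
    return true;
}
```

Calling allZero(zeroedTable, 4096) at the start of main returns true without any code in the program having written to the array.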

C++ has the notion of "static storage duration". This refers to all kinds of variables that will take up a fixed amount of space during the execution of a program. These include not just globals, but also static variables at namespace, class and function level.
Note that the memory allocation in all cases can be done before main, but that the actual initialization differs. Also, some of them are zero-initialized before they're normally initialized. Precisely how all this happens is unspecified: the compiler may add a hidden function call, or the OS just happens to zero the process space anyway, etc.
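To make the different kinds concrete, here is a small sketch with one variable of each kind. All names are invented for illustration. The storage for every one of them exists for the whole run, but the function-level static with a constructor is only initialized when control first reaches it.

```cpp
int globalCounter;            // namespace scope, zero-initialized

struct Widget {
    static int liveCount;     // static data member (declaration)
};
int Widget::liveCount = 7;    // definition, constant-initialized

bool loggerBuilt = false;     // flag used to observe when Logger is constructed

struct Logger {
    Logger() { loggerBuilt = true; }
};

// Function-level static: storage is reserved up front, but the constructor
// runs the first time logger() is called.
Logger& logger() {
    static Logger instance;
    return instance;
}
```

Before the first call to logger(), loggerBuilt is still false; after it, the flag is true, while globalCounter and Widget::liveCount had their values before main began.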

Related

When is static memory allocated in C/C++? At compile time or at the very beginning of when a program is run?

Different sources tell me different things: some StackOverflow answers say that it is allocated at compile time, others say it is "defined" at compile time and allocated at the very beginning of runtime ("load time" is what some called it). When exactly is static memory allocated in C/C++? (If it has to do with "defining" variables, can someone tell me what it means to "define" a variable at the memory level? That would be highly appreciated!)
Also, how would you during runtime set a pointer to the start of the allocated static memory?
In typical tools, memory with static storage duration is arranged in multiple steps:
The compiler generates data in object modules (likely passing through some form of assembly code) that describes needs for various kinds of memory: memory initialized to zero, memory initialized to particular values and is read-only thereafter, memory initialized to particular values and may be modified, memory that does not need to be initialized, and possibly others. The compiler also includes initial data as necessary, information about symbols that refer to various places in the required memory, and other information. At this point, the allocation of memory is in forms roughly like “8 bytes are needed in the constant data section, and a symbol called foo should be set to their address.”
The linker combines this information into similar information in an executable file. It also resolves some or all information about symbols. At this point, the allocation of memory is in forms like “The initialized non-constant data section requires 3048 bytes, and here is the initial data for it. When it is assigned a virtual address, the following symbols should be adjusted: bar is at offset 124 from the start of the section, baz is at offset 900…”
The program loader reads this information, allocates locations in the virtual address space for it, and may read some of the data from the executable file into memory or inform the operating system where the data is to be found when it is needed. At this point, the places in the code that refer to various symbols have been modified according to the final values of those symbols.
The operating system allocates physical memory for the virtual addresses. Often, this is done “on demand” in pieces (memory pages) when a process attempts to access the memory in a specific page, rather than being done at the time the program is initially loaded.
All-in-all, static memory is not allocated at any particular time. It is a combination of many activities. The effect on the program is largely that it occurs the same as if it were all allocated when the program started, but the physical memory might only be allocated just before an instruction actually executes. (The physical memory can even be taken away from the process and restored to it later.)
The C standard says only this:
C11 5.1.2p1
[...]All objects with static storage duration shall be initialized (set to their initial values) before program startup. The manner and timing of such initialization are otherwise unspecified.
and
C11 6.2.4p2-3
2 The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime. If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
3 An object whose identifier is declared without the storage-class specifier _Thread_local, and either with external or internal linkage or with the storage-class specifier static, has static storage duration. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup.
But... this is further complicated by the as-if rule: the actual implementation needs to do this only as far as observable side effects go.
In fact in Linux, for example, one could argue that the variables with static storage duration are initialized and allocated by the compiler and the linker when producing the executable file. When the program is run, the kernel's program loader and the dynamic linker (ld.so) prepare the program segments so that the initialized data is memory-mapped (mmap) from the executable image to RAM, and default (zero-initialized) data is mapped from zeroed pages.
While the virtual memory is allocated by the compiler, linker and dynamic linker, the actual writable RAM page frames are allocated only when you write to a variable on a page for the first time...
but you do not need to know about this in basic cases. It is as if the memory for variables with static storage duration were allocated and initialized just before main was entered, even though this is not actually the case.
Static memory is allocated in two steps.
Step 1 is carried out by the linker as it lays out the executable image and says where the static variables live in relative address space.
Step 2 is carried out by the loader when the process memory is actually allocated.
In C++, static objects are initialized before entering main. If you're not careful with your code you can see objects that are still zeros even though they have constructors that would always change that. (The compiler does as much constant evaluation as it can, so toy examples won't show it.)
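Here is a minimal sketch of how the "still zero" state can be observed. It uses plain ints rather than a class with a constructor, because reading members of a class object before its constructor has run is undefined behavior. The names are invented, and the volatile read is only there so the compiler cannot fold the value at compile time.

```cpp
// Constant-initialized, so it already holds 42 when dynamic initialization starts.
volatile int runtimeSeed = 42;

// The volatile read forces this to be evaluated at run time.
int readSeed() { return runtimeSeed; }

extern int level;             // declared here, defined further down

// Within one translation unit, dynamic initialization follows definition
// order, so this runs before level's initializer and sees the zero.
int levelSeenEarly = level;

int level = readSeed();       // dynamically initialized to 42
```

By the time main runs, level is 42 but levelSeenEarly is 0, because it was copied while level had only been zero-initialized.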

Why are local variables not set to zero?

Since global and static variables are initialized to 0 by default, why are local variables not initialized to 0 by default as well?
Because such zero-initializations take execution time. It would make your program significantly slower. Each time you call a function, the program would have to execute pointless overhead code, which sets the variables to zero.
Static variables persist for the whole lifetime of the program, so there you can afford the luxury of zero-initializing them, because they are initialized only once. Locals, by contrast, are initialized at run time, every time their scope is entered.
It is not uncommon in realtime systems to enable a compiler option which stops the zero initialization of static storage objects as well. Such an option makes the program non-standard, but also makes it start up faster.
This is because global and static variables live in different memory regions than local variables.
Uninitialized static and global variables live in the .bss segment, a memory region that is guaranteed to be initialized to zero on program startup, before the program enters main.
Explicitly initialized static and global variables are part of the actual application file; their values are determined at compile time and loaded into memory together with the application.
Local variables are created dynamically at runtime by growing the stack. If your stack grows over a memory region that holds garbage, then your uninitialized local variables will contain garbage (garbage in, garbage out).
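A short sketch of the practical consequence, with made-up function names: a static can rely on starting at zero, while a local has to be given its starting value explicitly, because reading it uninitialized is undefined behavior.

```cpp
// Static storage: calls starts at 0 without an initializer and keeps its
// value between calls.
int countCalls() {
    static int calls;
    return ++calls;
}

// Automatic storage: total lives on the stack and holds whatever was there
// before, so it is set to 0 explicitly before use.
int sumTo(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

The first call to countCalls() returns 1 and the second returns 2, while sumTo(4) returns 10 on every call because total is reset each time.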
Because that would take time, and it's not always the case that you need them to be zero.
The allocation of local variables (typically on the CPU's hardware stack) is very fast, much less than one instruction per variable and basically independent of the size of the variables.
So any initialization code (which generally would not be independent of the size of the variables) would add a relatively massive amount of overhead, compared to the allocation, and since you cannot be sure that the initialization is needed, it would be very disruptive when optimizing for performance.
Global/static variables are different, they generally live in a segment of the program's binary that is set to 0 by the program loader anyway, so you get that "for free".
Mainly historical. Back when C was being defined, the zero
initialization of static variables was handled automatically by
the OS, and would occur anyway, whereas the zero initialization
of local variables would require runtime. The first is still
true today on a lot of systems (including all Unix and Windows).
The second is far less an issue, however; most compilers would
detect a superfluous initialization in most cases, and skip it,
and in the cases where the compiler couldn't do so, the rest of
the code would be complicated enough that the time required for
the initialization wouldn't be measurable. You can still
construct special cases where this wouldn't be the case, but
they're certainly very rare. However, the original C was
specified like this, and none of the committees have reviewed
the issue since.
Global and static variables are stored in the data segment [data in the uninitialised data segment is initialized by the kernel to arithmetic 0 before the program starts executing], while local variables are stored on the call stack.
The global or static variables that are initialized by you explicitly will be stored in the .data segment (initialized data) and the uninitialized global or static variables are stored in the .bss (uninitialized data).
This .bss is not stored in the compiled .obj files because there is no data available for these variables (remember you have not initialized them with any certain data).
Now, when the OS loads an exe, it just looks at the size of the .bss segment, allocates that much memory, and zero-initializes it for you (exec). That is how the .bss segment ends up initialized to zero.
The local variables are not initialized because nothing requires it. They live on the stack, whose space is claimed simply by moving the stack pointer when a function is entered, so there is no loader step that could zero them for free. So why do extra initialization of local variables and make the program slower?
Suppose you need to call a function 100 times, and it has a local variable that would be
initialised to 0 on every call: that is extra overhead and wasted time.
Global variables, on the other hand, are initialised only once, so we can afford to default them to 0.

C++ Global and Scoped integers initial values

A Four bytes memory slot is reserved for every defined integer. An uninitialised variable keeps the old value of that slot; hence, its initial value is effectively random.
int x = 5; // definition with initialisation
As far as I know, this holds for scoped variables in most C++ compilers. But when it comes to global variables, a value of zero will be set.
int x; // uninitialised definition
Why does the C++ compiler behave differently regarding the initial value of global and scoped variables?
Is it fundamental?
The namespace-level variables (which means global) have static storage duration, and as per the Standard, all variables with static storage duration are zero-initialized before any other initialization, which for an int means it starts out as 0:
§3.6.2/2 from the C++ Standard (n3242) says,
Variables with static storage duration (3.7.1) or thread storage duration (3.7.2) shall be zero-initialized (8.5) before any other initialization takes place.
In case of local variables with automatic storage duration, the Standard imposes no such requirement on the compilers. So the automatic variables are usually left uninitialized for performance reason — almost all major compilers choose this approach, though there might be a compiler which initializes the automatic variables also.
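As a small sketch of what zero-initialization gives you for different types (the names are illustrative): each global below starts with the zero value of its own type, and a local only gets the same treatment if you ask for it, for example with empty braces.

```cpp
struct Point {
    int x;
    int y;
};

int    globalInt;        // starts as 0
double globalDouble;     // starts as 0.0
int*   globalPointer;    // starts as a null pointer
Point  globalPoint;      // both members start as 0

// For an automatic variable the zero has to be requested explicitly;
// empty braces value-initialize the int.
int zeroedLocal() {
    int local{};
    return local;
}
```

None of the globals has an initializer, yet all of them can be read safely at the start of main.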
"A Four bytes memory slot is reserved for every defined integer.".
No, it isn't. Disregarding the "4 bytes" size, the main problem with the statement is that modern compilers often find a new location for a variable each time it's assigned to. This can be a register or some place in memory. There's a lot of smartness involved.
An uninitialized variable isn't written to, so in general there's not even a place assigned for it. Trying to read "it" might not produce a value at all; the compiler can fail outright to generate code for that.
Now globals are another matter. Since they can be read and written from anywhere, the compiler can't just find new places for them on each write. They necessarily have to stick to one place, and it can't realistically be a register. Often they're all allocated together in one chunk of memory. Zeroing that chunk can typically be done very efficiently. That's why globals are different.
As you might expect there are efficiency driven reasons behind this behavior as well.
Stack space is generally "allocated" simply by adjusting the stack pointer.
If you have 32 bytes of simple variables in a function then the compiler emits an instruction equivalent to "sp = sp - 32"
Any initialization of those variables would take additional code and execution time - hence they end up being initialized to apparently random values.
Global variables are another beast entirely.
Simple variables are effectively allocated by the program loader and can be located in what is commonly called "BSS". These variables take almost no space at all in the executable file. All of them can be merged together into a single block - so the executable image needs only specify the size of the block. Since the OS must ensure that a new process doesn't get to see any left-over data in memory from some now dead process, the memory needs to be filled with something - and you might as well fill it with zeros.
Global variables that are initialized to non-zeros actually do take up space in the executable file, they appear as a block of data and just get loaded into memory - there is no code in the executable to initialize these.
C++ also allows global variables whose initialization requires code to be executed; C doesn't allow this.
For example, "int x = rand();"
gets initialized at run time by code in the executable.
Try adding this global variable
int x[1024 * 1024];
and see if it makes a difference to the executable size.
Now try:
int x[1024 * 1024] = {1,2,3};
And see what difference that makes.

C++: Global variable as pointer

I am new to C++ and have one question about global variables. I see in many examples that global variables are pointers holding addresses of heap memory. So the pointers are in the memory for global/static variables and the data behind the addresses is on the heap, right?
Instead of this, you can declare global (non-pointer) variables that store the data directly. Then the data is stored in the memory for global/static variables and not on the heap.
Does this solution have any disadvantages compared to the first solution with the pointers and the heap?
Edit:
First solution:
//global
Sport *sport;
//somewhere
sport = new Sport;
Second solution:
//global
Sport sport;
A disadvantage of storing your data in a global/static variable is that the size is fixed at compile time and can't be changed, as opposed to heap storage, where the size can be determined at runtime and can grow or shrink repeatedly over the run. The lifetime is also fixed as the complete run of the program from start to finish for global/static variables, as opposed to heap storage, which can be acquired and released (even repeatedly) throughout the runtime of the program. On the other hand, global and static storage management is all handled for you by the compiler, whereas heap storage has to be explicitly managed by your code. So in summary, global/static storage is easier but not as flexible as heap storage.
You are right in your hypothesis of where the objects are located. About usage,
It's horses for courses. There is no definite rule, it depends on the design & the type of functionality you want to implement. For example:
One may choose the pointer version to achieve lazy initialization or polymorphic behavior, neither of which is possible with global non pointer object approach.
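A sketch of both approaches side by side, under some assumptions: Sport is given a virtual function, Tennis is a hypothetical derived class, and std::unique_ptr replaces the raw pointer from the question so that the object is released automatically.

```cpp
#include <memory>
#include <string>

struct Sport {
    virtual ~Sport() = default;
    virtual std::string name() const { return "generic sport"; }
};

// Hypothetical derived class, only here to show polymorphic behavior.
struct Tennis : Sport {
    std::string name() const override { return "tennis"; }
};

// Second solution: the object itself has static storage duration.
// Its type and lifetime are fixed.
Sport fixedSport;

// First solution: only the pointer has static storage duration.
// It starts out null and the object is created on the heap later.
std::unique_ptr<Sport> sport;

// Lazy initialization: the object is created on first use, and the
// concrete type is chosen at run time.
Sport& currentSport() {
    if (!sport) {
        sport = std::make_unique<Tennis>();
    }
    return *sport;
}
```

Until currentSport() is called, sport stays null and no Tennis object exists, whereas fixedSport is always a plain Sport from program start to exit.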
Right. Declared variables go in the DataSegment. And they sit there for the life of the program. You cannot free them. You cannot reallocate them. In Windows, the DataSegment is a fixed size....if you put everything there you may run out of memory (at least it used to be this way).

How is the memory layout of a C/C++ program?

I know that there are sections like Stack, Heap, Code and Data. Stack/Heap do they use the same section of memory as they can grow independently?
What is this code section? When I have a function is it a part of the stack or the code section?
Also what is this initialized/uninitialized data segment?
Are there read only memory section available? When I have a const variable, what is actually happening is it that the compiler marks a memory section as read only or does it put into a read only memory section.
Where are static data kept?
Where are global data kept?
Any good references/articles for the same?
I thought the memory sections and layout are OS independent and it has more to do with compiler. Doesn't Stack, Heap, Code, Data [Initialized, Uninitialized] segment occur in all the OS? When there is a static data, what is happening the compiler has understood it is static, what next, what will it do? It is the compiler which is managing the program and it should know what to do right? All compilers shouldn't they follow common standards?
There's very little that's actually definitive about C++ memory layouts. However, most modern OS's use a somewhat similar system, and the segments are separated based on permissions.
Code has execute permission. The other segments don't. In a Windows application, you can't just put some native code on the stack and execute. Linux offers the same functionality- it's in the x86 architecture.
Data is data that's part of the result (.exe, etc) but can't be written to. This section is basically where literals go. Only read permission in this section.
Those two segments are part of the resulting file. Stack and Heap are runtime allocated, instead of mapped off the hard drive.
Stack is essentially one, large (1MB or so, many compilers offer a setting for it) heap allocation. The compiler manages it for you.
Heap memory is memory that the OS returns to you through some process. Normally, heap is a heap (the data structure) of pointers to free memory blocks and their sizes. When you request one, it's given to you. Both read and write permissions here, but no execute.
There is read-only memory(ROM). However, this is just the Data section. You can't alter it at runtime. When you make a const variable, nothing special happens to it in memory. All that happens is that the compiler will only create certain instructions on it. That's it. x86 has no knowledge or notion of const- it's all in the compiler.
AFAIK:
Stack/Heap
do they use the same section of memory
as they can grow independently?
They can grow independently.
What is this code section?
A read-only segment where code and const data are stored.
When I have a function is it a part of the stack or
the code section?
The definition (code) of the function will be in the CS. The arguments of each call are passed on the stack.
Also what is this
initialized/uninitialized data
segment?
The data segment is where globals/static variables are stored.
Are there read only memory section
available?
The code segment. I suppose some OS's might offer primitives for creating custom read-only segments.
When I have a const variable, what is actually happening
is it that the compiler marks a memory
section as read only or does it put
into a read only memory section.
It goes into the CS.
Where are static data kept? Where are
global data kept?
The data segment.
I was in the same dilemma when I was reading about memory layouts of C/C++. Here is the link which I followed to get the questions cleared.
http://www.geeksforgeeks.org/memory-layout-of-c-program/
I hope this helps 'the one' finding answers to similar question.
(Note: The following applies to Linux)
The stack and heap of a process both exist in the "same" part of a process's memory. The stack and heap grow towards each other (initially, when the process is started, the stack occupies the entire area that can be occupied by the combination of the stack and the heap; each memory allocation (malloc/free/new/delete) can push the boundary between the stack and the heap either up or down). The BSS section, also located on the same OS-allocated process space, is in its own section and contains global variables. Read-only data resides in the rodata section and contains such things as string literals. For example, if your code has the line:
char tmpStr[] = "hello";
Then the string literal "hello" will typically reside in the rodata section, while tmpStr itself is a separate, writable array that is initialized with a copy of it.
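A small sketch of the distinction between the array and the literal. The names literalView and arrayIsPrivateCopy are made up for the example.

```cpp
#include <cstring>

// Points at the string literal itself, which must not be modified.
const char* literalView = "hello";

bool arrayIsPrivateCopy() {
    char tmpStr[] = "hello";   // array of 6 chars initialized from the literal
    tmpStr[0] = 'j';           // fine: this changes the array, not the literal
    return std::strcmp(tmpStr, "jello") == 0 &&
           std::strcmp(literalView, "hello") == 0;
}
```

Writing through a pointer to the literal instead would be undefined behavior, and on systems that map rodata read-only it typically crashes.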
A good, thorough book on this is Randal E. Bryant and David R. O'Hallaron's Computer Systems: A Programmer's Perspective.
As an addendum to the answers, here is a quote from GotW that classifies some major memory areas (note the difference between free-store, which is what I would usually refer to as the heap, and the actual heap, which is the part managed through malloc/free). The article is a bit old so I don't know if it applies to modern C++; so far I haven't found a direct contradiction.
Const Data: The const data area stores string literals and
other data whose values are known at compile
time. No objects of class type can exist in
this area. All data in this area is available
during the entire lifetime of the program. Further, all
of this data is read-only, and the
results of trying to modify it are undefined.
This is in part because even the underlying
storage format is subject to arbitrary
optimization by the implementation. For
example, a particular compiler may store string
literals in overlapping objects if it wants to.
Stack: The stack stores automatic variables. Typically
allocation is much faster than for dynamic
storage (heap or free store) because a memory
allocation involves only pointer increment
rather than more complex management. Objects
are constructed immediately after memory is
allocated and destroyed immediately before
memory is deallocated, so there is no
opportunity for programmers to directly
manipulate allocated but uninitialized stack
space (barring willful tampering using explicit
dtors and placement new).
Free Store: The free store is one of the two dynamic memory
areas, allocated/freed by new/delete. Object
lifetime can be less than the time the storage
is allocated; that is, free store objects can
have memory allocated without being immediately
initialized, and can be destroyed without the
memory being immediately deallocated. During
the period when the storage is allocated but
outside the object's lifetime, the storage may
be accessed and manipulated through a void* but
none of the proto-object's nonstatic members or
member functions may be accessed, have their
addresses taken, or be otherwise manipulated.
Heap: The heap is the other dynamic memory area,
allocated/freed by malloc/free and their
variants. Note that while the default global
new and delete might be implemented in terms of
malloc and free by a particular compiler, the
heap is not the same as free store and memory
allocated in one area cannot be safely
deallocated in the other. Memory allocated from
the heap can be used for objects of class type
by placement-new construction and explicit
destruction. If so used, the notes about free
store object lifetime apply similarly here.
Global/Static: Global or static variables and objects have
their storage allocated at program startup, but
may not be initialized until after the program
has begun executing. For instance, a static
variable in a function is initialized only the
first time program execution passes through its
definition. The order of initialization of
global variables across translation units is not
defined, and special care is needed to manage
dependencies between global objects (including
class statics). As always, uninitialized proto-
objects' storage may be accessed and manipulated
through a void* but no nonstatic members or
member functions may be used or referenced
outside the object's actual lifetime.
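The Heap entry in the quote mentions using malloc'd memory for class objects through placement new and explicit destruction. A minimal sketch of that sequence follows; Label and roundTrip are hypothetical names chosen for the example.

```cpp
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

// Hypothetical class type with a non-trivial constructor and destructor.
struct Label {
    std::string text;
    explicit Label(std::string t) : text(std::move(t)) {}
};

std::string roundTrip(const std::string& text) {
    void* raw = std::malloc(sizeof(Label));   // storage only, no object yet
    if (raw == nullptr) {
        return {};
    }
    Label* label = new (raw) Label(text);     // placement new starts the lifetime
    std::string copy = label->text;
    label->~Label();                          // explicit destructor call ends it
    std::free(raw);                           // release through the same family
    return copy;
}
```

The storage comes from malloc and goes back through free, while the object's lifetime is bracketed separately by the placement new and the explicit destructor call.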