Why are local variables not set to zero? - c++

Since global and static variables are initialized to 0 by default, why are local variables not initialized to 0 by default as well?

Because such zero-initializations take execution time. It would make your program significantly slower. Each time you call a function, the program would have to execute pointless overhead code, which sets the variables to zero.
Static variables persist for the whole lifetime of the program, so you can afford the luxury of zero-initializing them, because they are initialized only once. Locals, by contrast, are initialized at run time, every time the function is entered.
It is not uncommon in realtime systems to enable a compiler option which stops the zero initialization of static storage objects as well. Such an option makes the program non-standard, but also makes it start up faster.

This is because global and static variables live in different memory regions than local variables.
Uninitialized static and global variables live in the .bss segment, which is a memory region that is guaranteed to be initialized to zero on program startup, before the program enters `main`.
Explicitly initialized static and global variables are part of the actual application file; their value is determined at compile time and loaded into memory together with the application.
Local variables are dynamically generated at runtime, by growing the stack. If your stack grows over a memory region that holds garbage, then your uninitialized local variables will contain garbage (garbage in, garbage out).
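As a rough illustration of the three cases above, here is a minimal sketch. All names are made up for the example, and the section names in the comments describe typical toolchains rather than anything the language requires.

```cpp
#include <cassert>

int zeroed_global;               // no initializer: typically lands in .bss, starts at 0
int preset_global = 7;           // explicit initializer: typically stored in .data
static int zeroed_file_static;   // internal linkage, still zero-initialized

int sum_of_zeroed_statics() {
    static int zeroed_local_static;  // static storage duration despite local scope
    return zeroed_global + zeroed_file_static + zeroed_local_static;
}

int initialized_local() {
    int value = 0;   // automatic storage: indeterminate unless initialized explicitly
    return value;
}

void check_storage_classes() {
    assert(sum_of_zeroed_statics() == 0);
    assert(preset_global == 7);
    assert(initialized_local() == 0);
}
```

Calling check_storage_classes() from main should pass on any conforming compiler. Dropping the "= 0" in initialized_local() would make reading the value undefined behaviour.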

Because that would take time, and it's not always the case that you need them to be zero.
The allocation of local variables (typically on the CPU's hardware stack) is very fast, much less than one instruction per variable and basically independent of the size of the variables.
So any initialization code (which generally would not be independent of the size of the variables) would add a relatively massive amount of overhead, compared to the allocation, and since you cannot be sure that the initialization is needed, it would be very disruptive when optimizing for performance.
Global/static variables are different, they generally live in a segment of the program's binary that is set to 0 by the program loader anyway, so you get that "for free".
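To make the size-dependent cost concrete, here is a small sketch with two hypothetical functions that do the same job. The first leaves its scratch buffer uninitialized and writes only the bytes it needs; the second asks for the whole buffer to be zeroed on every call. An optimizer may remove the difference in a toy like this, so treat it as an illustration of the semantics, not a benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies text into a local scratch buffer and returns the stored length.
std::size_t copy_without_clearing(const char* text) {
    assert(text != nullptr);
    char scratch[256];                       // reserved by moving the stack pointer; contents indeterminate
    std::size_t n = std::strlen(text);
    if (n >= sizeof scratch) n = sizeof scratch - 1;
    std::memcpy(scratch, text, n);           // only the bytes we need are written
    scratch[n] = '\0';
    return std::strlen(scratch);
}

// Same job, but the buffer is value-initialized first.
std::size_t copy_with_clearing(const char* text) {
    assert(text != nullptr);
    char scratch[256] = {};                  // asks for all 256 bytes to be zeroed on every call
    std::size_t n = std::strlen(text);
    if (n >= sizeof scratch) n = sizeof scratch - 1;
    std::memcpy(scratch, text, n);
    return std::strlen(scratch);             // terminator is already there from the zeroing
}
```

Both return the same result; the only difference is the extra zeroing work, which grows with the size of the buffer.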

Mainly historical. Back when C was being defined, the zero
initialization of static variables was handled automatically by
the OS, and would occur anyway, whereas the zero initialization
of local variables would require run time. The first is still
true today on a lot of systems (including all Unix and Windows).
The second is far less of an issue, however; most compilers would
detect a superfluous initialization in most cases, and skip it,
and in the cases where the compiler couldn't do so, the rest of
the code would be complicated enough that the time required for
the initialization wouldn't be measurable. You can still
construct special cases where this wouldn't be the case, but
they're certainly very rare. However, the original C was
specified like this, and none of the committees have reviewed
the issue since.

Global and static variables are stored in the data segment [data in the uninitialised data segment is initialized by the kernel to arithmetic 0 before the program starts executing], while local variables are stored on the call stack.

The global or static variables that are initialized by you explicitly will be stored in the .data segment (initialized data) and the uninitialized global or static variables are stored in the .bss (uninitialized data).
This .bss is not stored in the compiled .obj files because there is no data available for these variables (remember you have not initialized them with any certain data).
Now, when the OS loads an exe, it just looks at the size of the .bss segment, allocates that much memory, and zero-initializes it for you (exec). That is how the .bss segment comes to be zeroed.
Local variables are not initialized because there is no such need to initialize them. They are stored on the stack, so the memory they need is already set aside when the exe is loaded. So why do extra initialization of local variables and make our program slower?

Suppose you need to call a function 100 times, and it has a local variable. If that variable were
initialised to 0 on every call, there would be extra overhead and wasted time.
On the other hand, global variables are initialised only once, so we can afford to default-initialise them to 0.

Related

When is static memory allocated in C/C++? At compile time or at the very beginning of when a program is run?

Different sources tell me different things: some StackOverflow answers say that it is allocated at compile time, while others say it is "defined" at compile time and allocated at the very beginning of runtime ("load time" is what some called it). When exactly is static memory allocated in C/C++? (If it has to do with "defining" variables, can someone tell me what it means to "define" a variable on the memory level? That would be highly appreciated!)
Also, how would you during runtime set a pointer to the start of the allocated static memory?
In typical tools, memory with static storage duration is arranged in multiple steps:
The compiler generates data in object modules (likely passing through some form of assembly code) that describes needs for various kinds of memory: memory initialized to zero, memory initialized to particular values and is read-only thereafter, memory initialized to particular values and may be modified, memory that does not need to be initialized, and possibly others. The compiler also includes initial data as necessary, information about symbols that refer to various places in the required memory, and other information. At this point, the allocation of memory is in forms roughly like “8 bytes are needed in the constant data section, and a symbol called foo should be set to their address.”
The linker combines this information into similar information in an executable file. It also resolves some or all information about symbols. At this point, the allocation of memory is in forms like “The initialized non-constant data section requires 3048 bytes, and here is the initial data for it. When it is assigned a virtual address, the following symbols should be adjusted: bar is at offset 124 from the start of the section, baz is at offset 900…”
The program loader reads this information, allocates locations in the virtual address space for it, and may read some of the data from the executable file into memory or inform the operating system where the data is to be found when it is needed. At this point, the places in the code that refer to various symbols have been modified according to the final values of those symbols.
The operating system allocates physical memory for the virtual addresses. Often, this is done “on demand” in pieces (memory pages) when a process attempts to access the memory in a specific page, rather than being done at the time the program is initially loaded.
All-in-all, static memory is not allocated at any particular time. It is a combination of many activities. The effect on the program is largely that it occurs the same as if it were all allocated when the program started, but the physical memory might only be allocated just before an instruction actually executes. (The physical memory can even be taken away from the process and restored to it later.)
The C standard says only this:
C11 5.1.2p1
[...]All objects with static storage duration shall be initialized (set to their initial values) before program startup. The manner and timing of such initialization are otherwise unspecified.
and
C11 6.2.4p2-3
2 The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address,33) and retains its last-stored value throughout its lifetime.34) If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
3 An object whose identifier is declared without the storage-class specifier _Thread_local, and either with external or internal linkage or with the storage-class specifier static, has static storage duration. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup.
But this is further complicated by the as-if rule: the actual implementation needs to do this only as far as observable side effects go.
In fact in Linux for example, one could argue that the variables with static storage duration are initialized and allocated by the compiler and the linker when producing executable file. When the program is run, the dynamic linker (ld.so) then prepares the program segments so that the initialized data is memory-mapped (mmap) from the executable image to RAM, default (zero-initialized) data is mapped from zeroed pages.
While the virtual memory is allocated by the compiler, linker and dynamic linker, the actual writable RAM page frames are allocated only when you write to a variable on a page for the first time...
but you do not need to know about this in basic cases. It is as if the memory for variables with static storage duration were allocated and initialized just before main was entered, even though this is not actually the case.
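The guarantees quoted above (constant address, last-stored value retained, initialized once) can be observed from inside the program. Here is a small sketch with made-up function names; it says nothing about when the loader or OS actually provides the memory.

```cpp
#include <cassert>

// Returns the address of an object with static storage duration.
int* persistent_slot() {
    static int slot;     // zero-initialized once, not on every call
    return &slot;
}

// Counts calls; the counter survives between invocations.
int next_ticket() {
    static int ticket;   // retains its last-stored value
    return ++ticket;
}

void check_static_lifetime() {
    int* first = persistent_slot();
    *first = 5;
    int* second = persistent_slot();
    assert(first == second);   // same object, same address
    assert(*second == 5);      // value written earlier is still there
}
```

Whether the page behind slot was physically allocated at load time or on first write is invisible here, which is the point of the as-if rule.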
Static memory is allocated in two steps.
Step 1 is carried out by the linker as it lays out the executable image and says where the static variables live in relative address space.
Step 2 is carried out by the loader when the process memory is actually allocated.
In C++, static objects are initialized before entering main. If you're not careful with your code you can see objects that are still zeros even though they have constructors that would always change that. (The compiler does as much constant evaluation as it can, so toy examples won't show it.)
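Here is a sketch of how an object can be seen "still zero". Within one translation unit, dynamic initializers run in definition order, so a variable that reads a later one sees only its zero-initialized state. The names are invented for the example, and the volatile read is there only to stop the compiler from folding the initializer into a constant.

```cpp
#include <cassert>

volatile int runtime_seed = 42;     // volatile read keeps make_value() from being constant-folded
int make_value() { return runtime_seed; }

extern int later;                   // declared here, defined below
int earlier = later;                // runs first: sees `later` before its initializer has run
int later = make_value();           // dynamic initialization, runs second

void check_order() {
    assert(later == 42);            // by the time main runs, `later` is fully initialized
}
```

On mainstream compilers earlier ends up 0 here even though later is 42 once main starts. Across different translation units the order is not specified at all, which is where real bugs of this kind usually come from.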

Where are the addresses of local and other variable types stored?

I was asked this question and I am not quite sure about the answer. I know that the value (content) of local variables is on the stack and those allocated on heap (in C/C++ language). But:
1- Where are the addresses of those local variables stored? How does the program know where on the stack it should look for each of the local variables? Are these references (addresses of each variable) saved in the data segment? What about the addresses of other variable types (global, pointer, ...)?
2- Am I right that programs directly (not using pop/push) read/write to different addresses in stack segment when dealing with local variables?
The compiler will track where, relative to the top of the stack, each argument and local variable is located. And if possible, the compiler will use registers for "important" variables (such as loop counters) - it will use statistics of how many times each variable is used to see which ones are "hot" (used a lot) and which are "cold" (not used much).
Note that "addresses of local variables" doesn't always apply. Registers have no (direct) address [except in the TI TMS9900 processor and a few others, where registers and memory have slightly blurred lines].
The compiler will know where each of the things are - it's what compilers do - just like it knows WHICH variable has been stored where in the data section. Exactly how this is done is the subject of a small book. For now, just trust that the compiler does this.
Yes, nearly all processors today allow reads and writes from stack + offset (where offset is typically negative, so further down the stack, as the stack normally grows towards zero).
Although the stack sometimes counts as the "data segment", it's typically its own section of memory on modern machines - and if you have multiple threads, each thread will have its own stack.
First, for the record, let's just note that the answers to both questions depend on the compiler implementation and are not dictated by the language standard (neither C nor C++).
Where are the addresses of those local variables stored?
The symbols (names of functions and variables) are translated into addresses during compilation, i.e., they are not stored anywhere in the memory of the executed program:
Addresses of functions are in the code-segment of the executable image
They are constant throughout the execution of the program
Addresses of static and/or global variables are in the data-segment of the executable image
They are constant throughout the execution of the program
Addresses of non-static local variables are in the stack of the executable image
They may be different each time the function where these variables are declared is invoked
Am I right that programs directly (not using pop/push) read/write to different addresses in stack segment when dealing with local variables?
Depends on your platform (underlying HW architecture + designated compiler).
Re #2: "the stack" is of stack frames, not separate local variables. While individual values are sometimes pushed and popped (e.g., return addresses), local variables are normally created all at once by simply adjusting the stack pointer. That "creates" an area on the stack without storing any particular values there (which is why uninitialized variables might have any value), and then offsets from the stack pointer are used to find the individual variables.
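A small sketch of the point that locals belong to a particular invocation while statics have one fixed home. The function names are hypothetical. The recursion keeps the caller's local alive while the callee runs, so the two must occupy different storage.

```cpp
#include <cassert>

// Returns true if the innermost call's local lives at a different
// address than its caller's local.
bool nested_frames_differ(int depth, const int* caller_local = nullptr) {
    assert(depth >= 0);
    int local = depth;                  // belongs to this invocation's frame
    if (depth == 0) {
        return caller_local != &local;  // two live objects never share an address
    }
    return nested_frames_differ(depth - 1, &local);
}

// A static local has a single address for the whole run.
const int* static_address() {
    static int fixed = 0;
    return &fixed;
}
```

Nothing in the program stores a table of these addresses: the compiler simply emits loads and stores relative to the stack pointer (or frame pointer) for local, and a fixed address for fixed.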
See also a similar CS SE question.

Static and global variable storage clarification

As I was reviewing memory organisation and storage in C/C++ I came upon this:
"Initialized data segment, usually called simply the Data Segment. A data segment is a portion of virtual address space of a program, which contains the global variables and static variables that are initialized by the programmer.
Note that, data segment is not read-only, since the values of the variables can be altered at run time."
(found in http://www.geeksforgeeks.org/memory-layout-of-c-program/ )
I was under the impression that a static and/or global variable remained immutable throughout an application, I thought this was the point of their existence. Can they really be altered at run time?
Can they really be altered at run time?
Yes. Unless you declare them as const, of course.
I was under the impression that a static and/or global variable
remained immutable throughout an application, I thought this was the
point of their existence.
No, you're describing constants. Variables with so-called static storage duration have, as the name implies, a different lifetime. [basic.stc.static]:
All variables which do not have dynamic storage duration, do not have
thread storage duration, and are not local have static storage
duration. The storage for these entities shall last for the duration
of the program (3.6.2, 3.6.3).
Just think about cout, a global stream object that you modify by inserting data into it.
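A minimal sketch of the same idea with a plain integer; the names are made up for the example. The global is freely modified at run time, while the const one cannot be assigned to.

```cpp
#include <cassert>

int request_count = 0;         // static storage duration, lives in writable memory
const int max_requests = 3;    // const: assigning to it would not compile

bool try_request() {
    assert(request_count <= max_requests);
    if (request_count == max_requests) return false;
    ++request_count;           // modifying a global at run time is perfectly legal
    return true;
}
```

Three calls succeed and the fourth returns false, because the global kept the value stored by the earlier calls.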
You'll generally find better documentation on a site that more people take an interest in updating, for example - from Wikipedia:
In computing, a data segment (often denoted .data) is a portion of an object file or the corresponding virtual address space of a program that contains initialized static variables, that is, global variables and static local variables. The size of this segment is determined by the size of the values in the program's source code, and does not change at run time.
The data segment is read-write, since the values of variables can be altered at run time. This is in contrast to the read-only data segment (rodata segment or .rodata), which contains static constants rather than variables; it also contrasts to the code segment, also known as the text segment, which is read-only on many architectures. Uninitialized data, both variables and constants, is instead in the BSS segment.
So, it's just a matter of definition:
the data segment holds the read-write variables
the "read only" data segment holds the constants
On some old/hokey systems they might not bother with a read-only data segment and just lump it all together - the main thing with the read-only segment is that it means a few more bugs are reported more dramatically, rather than letting the program corrupt that data and potentially spew bogus results. That's probably why .data is general and sometime later - as OS/compiler writers had time and motivation to care - .rodata ended up being contrasted with it, but .data wasn't renamed to e.g. .rwdata. These names - .data, .rodata, text, BSS etc. - were and are often used in assembly languages to denote where variables should be located.
As far as things go... global variables and static variables are similar in that the [possibly virtual] memory address for them - and indeed their total size - can typically be calculated (at least relative to some supporting CPU "segment" register that's left at a convenient value most of the time) at compile time. That's in contrast to automatic (stack) and dynamic (heap) variables, where the memory's transient. Most systems only have control over write-access to memory on a per-page basis (e.g. 4k, 8k), so it's far less practical to keep granting and removing write access to put transient const automatic and heap-based variables into memory that seems read-only to the process, and it's impractical when you consider the race conditions in a threaded application. That's why this whole distinction between read-write and read-only memory's normally discussed in the context of global and static variables.

C++ Global and Scoped integers initial values

A four-byte memory slot is reserved for every defined integer. An uninitialised variable keeps the old value of that slot; hence, the initial value is somewhat random.
int x = 5; // definition with initialisation
As far as I know, this holds for scoped variables in most C++ compilers. But when it comes to global variables, a value of zero will be set.
int x; // uninitialised definition
Why does the C++ compiler behave differently regarding the initial value of global and scoped variables?
Is it fundamental?
Namespace-level variables (which includes globals) have static storage duration, and as per the Standard, all variables with static storage duration are zero-initialized before anything else happens to them:
§3.6.2/2 from the C++ Standard (n3242) says,
Variables with static storage duration (3.7.1) or thread storage duration (3.7.2) shall be zero-initialized (8.5) before any other initialization takes place.
In case of local variables with automatic storage duration, the Standard imposes no such requirement on the compilers. So the automatic variables are usually left uninitialized for performance reason — almost all major compilers choose this approach, though there might be a compiler which initializes the automatic variables also.
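If you do want zeros in automatic variables, you have to ask for them. Here is a short sketch, with names invented for the example, showing value-initialization of locals next to a thread_local that is zero-initialized without being asked.

```cpp
#include <cassert>

thread_local int per_thread_counter;   // thread storage duration: zero-initialized

int sum_value_initialized() {
    int scalar{};          // value-initialization: scalar is 0
    int table[4] = {};     // every element is 0
    assert(scalar == 0);
    int total = scalar;
    for (int v : table) total += v;
    return total;
}
```

Writing "int scalar;" instead would leave the value indeterminate, and reading it would be undefined behaviour.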
"A Four bytes memory slot is reserved for every defined integer.".
No, it isn't. Disregarding the "4 bytes" size, the main problem with the statement is that modern compilers often find a new location for a variable each time it's assigned to. This can be a register or some place in memory. There's a lot of smartness involved.
An uninitialized variable isn't written to, so in general there's not even a place assigned for it. Trying to read "it" might not produce a value at all; the compiler can fail outright to generate code for that.
Now globals are another matter. Since they can be read and written from anywhere, the compiler can't just find new places for them on each write. They necessarily have to stick to one place, and it can't realistically be a register. Often they're all allocated together in one chunk of memory. Zeroing that chunk can typically be done very efficiently. That's why globals are different.
As you might expect there are efficiency driven reasons behind this behavior as well.
Stack space is generally "allocated" simply by adjusting the stack pointer.
If you have 32 bytes of simple variables in a function then the compiler emits an instruction equivalent to "sp = sp - 32"
Any initialization of those variables would take additional code and execution time - hence they end up being initialized to apparently random values.
Global variables are another beast entirely.
Simple variables are effectively allocated by the program loader and can be located in what is commonly called "BSS". These variables take almost no space at all in the executable file. All of them can be merged together into a single block - so the executable image needs only specify the size of the block. Since the OS must ensure that a new process doesn't get to see any left-over data in memory from some now dead process, the memory needs to be filled with something - and you might as well fill it with zeros.
Global variables that are initialized to non-zeros actually do take up space in the executable file, they appear as a block of data and just get loaded into memory - there is no code in the executable to initialize these.
C++ also allows global variables whose initialization requires code to be executed; C doesn't allow this.
For example, "int x = rand();" gets initialized at run time by code in the executable.
Try adding this global variable
int x[1024 * 1024];
and see if it makes a difference to the executable size.
Now try:
int x[1024 * 1024] = {1,2,3};
And see what difference that makes.
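The three kinds of global described in this answer can be put side by side. This is only a sketch with made-up names and smaller arrays; the comments describe what typical toolchains do with each definition, and the values can be checked from code even though the effect on file size cannot.

```cpp
#include <cassert>
#include <cstdlib>

int zero_block[1024];                 // no initializer: only the size needs recording (BSS)
int preset_block[1024] = {1, 2, 3};   // non-zero initializer: the data is stored in the file
int runtime_value = std::rand();      // needs start-up code to run (valid C++, not valid C)

void check_initial_values() {
    assert(zero_block[0] == 0 && zero_block[1023] == 0);
    assert(preset_block[2] == 3);
    assert(preset_block[3] == 0);     // elements without an initializer are zero
    assert(runtime_value >= 0 && runtime_value <= RAND_MAX);
}
```

Note that preset_block costs the full array size in the file even though only three elements are non-zero.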

when do global variables allocate their memory?

It's been bothering me for a while but I didn't find any good resource about this matter. I have some global variables in my code. It's obvious that they are initialized in some order, but is the memory needed for all those objects reserved before any initialization takes place?
Here is a simple example of what might go wrong in my code and how I can use the answer:
I had a map<RTTI, object*> objectPool which holds samples of every class in my code, which I used to load objects from a file. To create those samples I use some global variables just to introduce class instances to objectPool. But sometimes those sample instances were initialized before objectPool itself, and that generated a runtime error.
To fix that error I used a delayed initializer, map<RTTI,object*>* lateInitializedObjectPool;. Now every instance first checks whether the objectPool is initialized, initializes it if not, and then introduces itself to the object pool. It seems to work fine, but I'm worried that even the memory needed for the object pool pointer might not be reserved before other classes begin to introduce themselves, and that this may cause an access violation.
Variables declared at namespace scope (as opposed to in classes or functions) have the space for the objects themselves (sizeof(ObjectType)) allocated by the executable (or DLL) loader. If the object is a POD that uses aggregate initialization, then it typically gets its values set by having the linker write those values directly into the executable and the exe's loader simply blasting all of those into memory. Objects that don't use aggregate initialization get their values zeroed out initially.
After all of that, if any of these objects have constructors, then those constructors are executed before main is run. Thus, if any of those constructors dynamically allocate memory, that is when they do it. After the executable is loaded, but before main is run.
There's usually separate memory areas for variables that the compiler:
worked out initially contain all 0s - perhaps with a pre-main() constructor running to change their content
predetermined have a specific non-0 value, such that they can be written in a pre-constructed form into the executable image and page faulted in ready for use.
When I say a "separate memory area", I mean some memory the OS executable loader arranges for the process, just as per the stack or heap, but different in that these areas are of fixed pre-determined size. In UNIX, the all-0 memory region mentioned above is commonly known as the "BSS", the non-0 initialised area as "data" - see http://en.wikipedia.org/wiki/Data_segment for details.
C++ has the notion of "static storage duration". This refers to all kinds of variables that will take up a fixed amount of space during the execution of a program. These include not just globals, but also static variables at namespace, class and function level.
Note that the memory allocation in all cases can be done before main, but that the actual initialization differs. Also, some of them are zero-initialized before they're normally initialized. Precisely how all this happens is unspecified: the compiler may add a hidden function call, or the OS just happens to zero the process space anyway, etc.
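One common way to sidestep the ordering problem from the question is to put the pool inside a function as a static local, so it is constructed the first time anyone asks for it. The sketch below is an assumption-laden stand-in: object_pool, Registrar, and the string/int types are invented here and replace the question's map<RTTI, object*>.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the question's objectPool.
std::map<std::string, int>& object_pool() {
    static std::map<std::string, int> pool;  // constructed on the first call
    return pool;
}

// Each global Registrar introduces one entry to the pool before main runs.
struct Registrar {
    Registrar(const std::string& name, int id) {
        assert(!name.empty());
        object_pool()[name] = id;  // safe: the pool exists by the time we touch it
    }
};

Registrar register_circle{"Circle", 1};
Registrar register_square{"Square", 2};
```

This works regardless of which global is initialized first, because the pool's construction is tied to first use and not to definition order. It is essentially the question's pointer workaround with the bookkeeping handed to the compiler.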