In C/C++, why are globals and static variables initialized to default values?
Why not leave it with just garbage values? Are there any special
reasons for this?
Security: leaving memory alone would leak information from other processes or the kernel.
Efficiency: the values are useless until initialized to something, and it's more efficient to zero them in a block with unrolled loops. The OS can even zero freelist pages when the system is otherwise idle, rather than when some client or user is waiting for the program to start.
Reproducibility: leaving the values alone would make program behavior non-repeatable, making bugs really hard to find.
Elegance: it's cleaner if programs can start from 0 without having to clutter the code with default initializers.
One might then wonder why the auto storage class does start as garbage. The answer is two-fold:
It doesn't, in a sense. The very first stack frame page at each level (i.e., every new page added to the stack) does receive zero values. The "garbage", or "uninitialized" values that subsequent function instances at the same stack level see are really the previous values left by other method instances of your own program and its library.
There might be a quadratic (or whatever) runtime performance penalty associated with initializing auto (function locals) to anything. A function might not use any or all of a large array, say, on any given call, and it could be invoked thousands or millions of times. The initialization of statics and globals, OTOH, only needs to happen once.
Because with the proper cooperation of the OS, 0 initializing statics and globals can be implemented with no runtime overhead.
Section 6.7.8 Initialization of C99 standard (n1256) answers this question:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
Think about it, in the static realm you can't tell always for sure something is indeed initialized, or that main has started. There's also a static init and a dynamic init phase, the static one first right after the dynamic one where order matters.
If you didn't have zeroing out of statics then you would be completely unable to tell in this phase for sure if anything was initialized AT ALL and in short the C++ world would fly apart and basic things like singletons (or any sort of dynamic static init) would simple cease to work.
The answer with the bulletpoints is enthusiastic but a bit silly. Those could all apply to nonstatic allocation but that isn't done (well, sometimes but not usually).
In C, statically-allocated objects without an explicit initializer are initialized to zero (for arithmetic types) or a null pointer (for pointer types). Implementations of C typically represent zero values and null pointer values using a bit pattern consisting solely of zero-valued bits (though this is not required by the C standard). Hence, the bss section typically includes all uninitialized variables declared at file scope (i.e., outside of any function) as well as uninitialized local variables declared with the static keyword.
Source: Wikipedia
Related
The book Object oriented programming in c++ by Robert Lafore says,
A static local variable has the visibility of an automatic local
variable (that is, inside the function containing it). However, its
lifetime is the same as that of a global variable, except that it
doesn’t come into existence until the first call to the function
containing it. Thereafter it remains in existence for the life of the
program
What does coming into existence after first call of function mean? The storage for static local is allocated at the time program is loaded in the memory.
The storage is allocated before main is entered, but (for example) if the static object has a ctor with side effects, those side effects might be delayed until just before the first time the function is called.
Note, however, that this is not necessarily the case. Constant initialization is only required to happen before that block is entered (not necessarily just as execution "crosses" that definition). Likewise, implementations are allowed to initialize other block-scope static variables earlier than required under some circumstances (if you want to get into the gory details of the circumstances, you can look at [basic.start.init] and [stmt.dcl], but it basically comes down to: as long as it doesn't affect the value with which it's initialized. For example, if you had something like:
int i;
std::cin >> i;
{
static int x = i;
...the implementation wouldn't be able to initialize x until the block was entered, because the value with which it was being initialized wouldn't be known until them. On the other hand, if you had:
{
static int i = 0;
...the implementation could carry out the initialization as early as it wished (and most would/will basically carry out such an initialization at compile time, so it won't involve executing any instructions at run-time at all). Even for less trivial cases, however, earlier initialization is allowed when logically possible (e.g., the value isn't coming from previous execution).
In C++ storage duration of an object (when raw memory gets allocated for it) and lifetime of an object are two separate concepts. The author was apparently referring to the latter one when he was talking about object's "coming into existence".
In general case it is not enough to allocate storage for an object to make it "come into existence". Lifetime of an object with non-trivial initialization begins once its initialization is complete. For example, an object of a class with a non-trivial constructor does not officially "live" until its constructor has completed execution.
Initialization of a static local object is performed when the control passes over the declaration for the very first time. Before that the object does not officially exist, even if the memory for it is already allocated.
Note that the author is not painstakingly precise in his description. It is not sufficient to just call the function containing the declaration. The control has to pass through the declaration of the object for it to begin its lifetime. If the function contains branching, this does not necessarily happen during the very first call to the function.
For object with trivial initialization (like int objects), there's no difference between storage duration and lifetime. For such objects allocating memory is all that needs to be done. But in general case allocating memory alone is not sufficient.
It means that the static variable inside a function doesn't get initialized (by the constructor or the assignment operator) until the first call for that function.
As soon as the function, which contains a static local variable, is called the static local variable is initialized.
My csapp book says that if global and static variables are initialized, than they are contained in .data section in ELF relocatable object file.
So my question is that if some foo.c code contains
int a;
int main()
{
a = 3;
}`
and example.c contains,
int b = 3;
int main()
{
...
}
is it only b that considered to be initialized? In other words, does initialization mean declaration and definition in same line?
It means exactly what it says. Initialized static storage duration objects will have their init values set before the main function is called. Not initialized will be zeroed. The second part of the statement is actually implementation dependant, and implementation has the full freedom of the way it will be archived.
When you declare the variable without the keyword extern you always define it as well
Both are considered initialized
They get zero initialized or constant initalized (in short: if the right hand side is a compile time constant expression).
If permitted, Constant initialization takes place first (see Constant
initialization for the list of those situations). In practice,
constant initialization is usually performed at compile time, and
pre-calculated object representations are stored as part of the
program image. If the compiler doesn't do that, it still has to
guarantee that this initialization happens before any dynamic
initialization.
For all other non-local static and thread-local variables, Zero
initialization takes place. In practice, variables that are going to
be zero-initialized are placed in the .bss segment of the program
image, which occupies no space on disk, and is zeroed out by the OS
when loading the program.
To sum up, if the implementation cannot constant initialize it, then it must first zero initialize and then initialize it before any dynamic initialization happends.
In the snippet:
int a;
int main()
{
a = 3;
}
a is not initialized; it is assigned. Assignment is a run-time execution of code. For example, should main be called multiple times (which is not, but any user function could), then a is set to 3 each time the function is called.
You second snippet is initializaion of the globalvariable b and it will be placed in the .data segment.
I will answer this question in general and complete way and not with respect to any programming language
There is a hell lot of confusion between declaration, definition, and initialization. Sometimes they all look similar and sometimes completely different.
Before understanding the differences, It is very important to be aware of two things:
The difference between declaration, definition, and initialization
varies from one programming language to other. Each programming has
its own way of doing these three things.The “thing” which you are
defining, declaring or initializing also affects the difference
between the three of them. That “thing” can be a variable, a class or
a function. All of them have different meanings of definitions,
declaration, and initialization. Once we are aware of the above two
things, most of the doubts get cleared and we stop seeking exact
differences because it’s not there.
In general terms ( irrespective of any language or “thing”)
The declaration means we are saying to a computer that this “thing”
(it can be a variable, a function or a class) exists but we don’t know
where. In the future, we may tell but right now it just exists
somewhere. In simple words, we don’t allocate memory while declaring.
We can declare that “thing” many times.
The definition means we are saying to the computer that this “thing” needs memory and it needs to be located somewhere. In simple
words, defining means we have allocated memory for it. We can define
something only once
The initialization means whatever our “thing “ is, we are giving it an initial value. That “thing” must be in some memory location and
if we keep that location empty, it may be a house for bugs and errors.
Initialization is not always necessary but it’s important.
Many people assume that declaration + definition = Initialization .
It's not wrong, but it’s not correct in all places. Its correct only for variables that too in a language like C ++ or maybe C.
In python, there is no concept of the declaration . We don’t need to declare anything in it.
The general meaning of the three is valid everywhere but the way that is performed varies from language to language and the “thing”.
Hope it helps :)
Variables with static storage duration that are initialized to zero end up in .bss.
Variables with static storage duration that are initialized with a non-zero value end up in .data.
NOTE: the C standard guarantees that if the programmer doesn't explicitly initialize a variable with static storage duration, such as static int a;, it is then initialized to zero implicitly1). Therefore a ends up in .bss.
Examples here.
1) C11 6.7.9
If an object that has static or thread storage duration is not initialized
explicitly, then:
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
In C++ I know static and global objects are constructed before the main function. But as you know, in C, there is no such kind initialization procedure before main.
For example, in my code:
int global_int1 = 5;
int global_int2;
static int static_int1 = 4;
static int static_int2;
When are these four variables initialized?
Where values for initialization like 5 and 4 are stored during compilation? How to manage them when initialization?
EDIT:
Clarification of 2nd question.
In my code I use 5 to initialize global_int1, so how can the compiler assign 5 to global_int? For example, maybe the compiler first store the 5 value at somewhere (i.e. a table), and get this value when initialization begins.
As to "How to manage them when initialization?", it is realy vague and I myself does not how to interpret yet. Sometimes, it is not easy to explain a question. Overlook it since I have not mastered the question fully yet.
By static and global objects, I presume you mean objects with
static lifetime defined at namespace scope. When such objects
are defined with local scope, the rules are slightly different.
Formally, C++ initializes such variables in three phases:
1. Zero initialization
2. Static initialization
3. Dynamic initialization
The language also distinguishes between variables which require
dynamic initialization, and those which require static
initialization: all static objects (objects with static
lifetime) are first zero initialized, then objects with static
initialization are initialized, and then dynamic initialization
occurs.
As a simple first approximation, dynamic initialization means
that some code must be executed; typically, static
initialization doesn't. Thus:
extern int f();
int g1 = 42; // static initialization
int g2 = f(); // dynamic initialization
Another approximization would be that static initialization is
what C supports (for variables with static lifetime), dynamic
everything else.
How the compiler does this depends, of course, on the
initialization, but on disk based systems, where the executable
is loaded into memory from disk, the values for static
initialization are part of the image on disk, and loaded
directly by the system from the disk. On a classical Unix
system, global variables would be divided into three "segments":
text:
The code, loaded into a write protected area. Static
variables with `const` types would also be placed here.
data:
Static variables with static initializers.
bss:
Static variables with no-initializer (C and C++) or with dynamic
initialization (C++). The executable contains no image for this
segment, and the system simply sets it all to `0` before
starting your code.
I suspect that a lot of modern systems still use something
similar.
EDIT:
One additional remark: the above refers to C++03. For existing
programs, C++11 probably doesn't change anything, but it does
add constexpr (which means that some user defined functions
can still be static initialization) and thread local variables,
which opens up a whole new can of worms.
Preface: The word "static" has a vast number of different meanings in C++. Don't get confused.
All your objects have static storage duration. That is because they are neither automatic nor dynamic. (Nor thread-local, though thread-local is a bit like static.)
In C++, Static objects are initialized in two phases: static initialization, and dynamic initialization.
Dynamic initialization requires actual code to execute, so this happens for objects that start with a constructor call, or where the initializer is an expression that can only be evaluated at runtime.
Static initialization is when the initializer is known statically and no constructor needs to run. (Static initialization is either zero-initialization or constant-initialization.) This is the case for your int variables with constant initializer, and you are guaranteed that those are indeed initialized in the static phase.
(Static-storage variables with dynamic initialization are also zero-initialzed statically before anything else happens.)
The crucial point is that the static initialization phase doens't "run" at all. The data is there right from the start. That means that there is no "ordering" or any other such dynamic property that concerns static initialization. The initial values are hard-coded into your program binary, if you will.
When are these four variables initialized?
As you say, this happens before program startup, i.e. before main begins. C does not specify it further; in C++, these happen during the static initialisation phase before objects with more complicated constructors or initialisers.
Where values for initialization like 5 and 4 are stored during compilation?
Typically, the non-zero values are stored in a data segment in the program file, while the zero values are in a bss segment which just reserves enough memory for the variables. When the program starts, the data segment is loaded into memory and the bss segment is set to zero. (Of course, the language standard doesn't specify this, so a compiler could do something else, like generate code to initialise each variables before running main).
Paraphrased from the standard:
All variables which do not have dynamic storage duration, do not have thread local storage duration, and are not local, have static storage duration. In other words, all globals have static storage duration.
Static objects with dynamic initialization are not necessarily created before the first statement in the main function. It is implementation defined as to whether these objects are created before the first statement in main, or before the first use of any function or variable defined in the same translation unit as the static variable to be initialized.
So, in your code, global_int1 and static_int1 are definitely initialized before the first statement in main because they are statically initialized. However, global_int2 and static_int2 are dynamically initialized, so their initialization is implementation defined according to the rule I mentioned above.
As for your second point, I'm not sure I understand what you mean. Could you clarify?
I am sorry to ask this trivial question but I could not find a definitive answer: if I have explicit static initialization to zero, is it zero initialization or initialization with constant expression? Say if I have
a.hpp:
class A { ... static int x; }
a.cpp;
int A::x = 0;
How many times will 0 be assigned to x? Once during zero initialization or twice during both zero initialization and initialization with constant expression?
The value of the variable will be 0 before any of your code is executed.
How it gets that way depends largely on the system; one typical approach
is to read an image of the date from disk, when loading the program.
Formally, you have zero initialization, followed by static
initialization, but there's no way a conforming implementation can tell,
and I've never heard of an implementation that separates the two.
Under Unix, at least in its older and more traditional versions,
uninitialized static variables were placed in the bs segment,
statically initialized variables in the data segment. The executable
file on the disk contained an image of the data segment, which was
copied into memory; all of the bytes in the bs segment were set to 0.
On a modern machine, with paged virtual memory, I would expect similar
behavior, with the difference that the initialization be deferred until
the page was first accessed.
I would be very surprised if Windows handled this differently (except for the names of the segments).
I've been reading up on C++ on the Internet, and here's one thing that I haven't been quite able to find an answer to.
I know that static variables used within functions are akin to globals, and that subsequent invocations of that function will have the static variable retain its value between calls.
However, if the function is never called, does the static variable get allocated?
Thanks
If the function is never called, it is likely that your linker will deadstrip both the function and the static variable, preventing it from entering .rodata, .data, or .bss segments (or your executable file format's equivalents).
However, there are various reasons why a linker might not deadstrip (flags telling it not to, an inability to determine what depends on the symbol, etc).
It's worth checking your linker map file (sometimes just a text file!), or using objdump, nm, or dumpbin utilities on the final executable to see if the symbol or related symbols (such as static initializer code) survived.
The C++ Standard, section 6.7 says:
The zero-initialization (8.5) of all
local objects with static storage
duration (3.7.1) is performed before
any other initialization takes place.
A local object of POD type (3.9) with
static storage duration initialized
with constant-expressions is
initialized before its block is first
entered. An implementation is
permitted to per- form early
initialization of other local objects
with static storage duration under the
same conditions that an implementation
is permitted to statically initialize
an object with static storage duration
in namespace scope (3.6.2). Otherwise
such an object is initialized the
first time control passes through its
declaration; such an object is
considered initialized upon the
completion of its initialization.
Which indicates that local static objects are normally initialised the first time the control flow encounters them. However, they may well be allocated before this - the standard is somewhat reticent on what static storage actually is, except with reference to static object lifetimes.
Every object in C++ has two nested time-periods associated with it: storage duration and lifetime. Storage duration is the period for which the raw memory occupied by the object is allocated. Lifetime is the period between construction and destruction of an actual object in that memory. (For objects of POD-types construction-destruction either doesn't matter or not applicable, so their lifetime matches their storage duration).
When someone says "allocated" they usually refer to storage duration. The language doesn't actually specify exactly when the object's storage duration begins. It is sufficient to require that shall begin at some point before the object's lifetime begins.
For this reason, in general case a static object defined inside a function might never begin its lifetime and, theoretically, it's storage duration does not have to begin either. So, in theory, in might not even get "allocated".
In practice though, all objects with static storage duration ("globals", local statics, etc.) are normally treated equally: they are assigned a specific amount of storage early, at the program's startup.
As an additional note, if a local object with static storage duration requires a non-trivial initialization, this initialization is carried out when the control passes over the definition for the very first time. So in this example
void foo() {
static int *p = new int[100];
}
the dynamic array will never be allocated if the function is never called. And it will be allocated only once if the function is called. This doesn't look like what you are asking about, but I mention this just in case.
Im sure that thats going to be up to the implementation. What MSVC does is - static objects are allocated in the automatic data segment of the EXE or DLL. However, the constructor is only executed the first time the function containing the static is executed.
Yes, actual allocation is compiler dependent, although I think that every compiler just reserves the space in the .static segment of the executable (or the equivalent in its executable file format).
The initialization, however takes place only the firs time that the execution flow encounters the static object, and that is required by the standard.
Beware that initialization of global static objects works in a different way, though.
You can get very good answers to almost every question at the C++ FAQ lite site.
I am also fond of Scott Meyers's "Effective C++".
Depends. If you mean, never called, as in, the function is literally never invoked, then your compiler will probably not allocate it, or even put in the function code. If, however, you made it dependent on, say, user input, and that user input just happened to never come up, then it will probably be pre-allocated. However, you're treading in a minefield here, and it's best just to assume that it is always created by the time control enters the function(s) that refer to it.
Static variables defined on classes (members) or functions are not allocated dynamically on stack during function call, like non static ones. They are allocated in another area of generated code reserved for global and static data. So, if you call the function or not, instantiate classes that contain static members or not, a space to their data will be reserved on program data area anyway.