static initialization order fiasco

I was reading about the static initialization order fiasco (SIOF) in a book, and it gave this example:
//file1.cpp
extern int y;
int x=y+1;
//file2.cpp
extern int x;
int y=x+1;
Now my question is: in the code above, will the following things happen?
While compiling file1.cpp, the compiler leaves y as it is, i.e. doesn't allocate storage for it.
The compiler allocates storage for x, but doesn't initialize it.
While compiling file2.cpp, the compiler leaves x as it is, i.e. doesn't allocate storage for it.
The compiler allocates storage for y, but doesn't initialize it.
While linking file1.o and file2.o, suppose file2.o is initialized first.
Does x then get an initial value of 0, or is it not initialized at all?

The initialization steps are given in 3.6.2 "Initialization of non-local objects" of the C++ standard:
Step 1: x and y are zero-initialized before any other initialization takes place.
Step 2: x or y is dynamically initialized - which one is unspecified by the standard. That variable will get the value 1 since the other variable will have been zero-initialized.
Step 3: the other variable will be dynamically initialized, getting the value 2.

SIOF is very much a runtime artifact; the compiler and linker don't have much to do with it. Consider the atexit() function, which registers functions to be called at program exit. Many CRT implementations have something similar for program initialization; let's call it atinit().
Initializing these global variables requires executing code, because the value cannot be determined by the compiler. So the compiler generates snippets of machine code that evaluate the expression and assign the value. These snippets need to be executed before main() runs.
That's where atinit() comes into play. A common CRT implementation walks a list of atinit function pointers and executes the initialization snippets, in order. The problem is the order in which the functions are registered in the atinit() list. While atexit() has a well-defined LIFO order, implicitly determined by the order in which the code calls atexit(), such is not the case for atinit functions. The language specification doesn't require an order across translation units, and there is nothing you can do in portable code to specify one. SIOF is the result.
One possible implementation is for the compiler to emit function pointers in a separate section. The linker merges them, producing the atinit list. If your compiler does that, then the initialization order will be determined by the order in which you link the object files. Look at the map file; you should see the atinit section if your compiler does this. It won't be called atinit, but a name containing "init" is likely. Taking a look at the CRT source code that calls main() should give insight as well.

It is compiler dependent and may be runtime dependent. A compiler may decide to lazily initialize static variables when the first variable in a file is accessed, or as each variable is accessed. Otherwise it will initialize all static variables file by file at launch time, with the order usually depending on the link order of the files. The file order could change based on dependencies or other compiler-dependent influences.
Static variables are zero-initialized before any dynamic initialization runs, unless they have a constant initializer. The C++ standard guarantees this, so that part is not compiler dependent. One of these variables will therefore be zero when the other is initialized; which one is the part that varies.
I think the most likely scenario would be:
Space is allocated for the variables, and both have the value 0.
One variable, say x, is initialized and set to the value 1.
The other, say y, is initialized and set to the value 2.
You could always run it and see, but the result only tells you what that particular toolchain and link order happen to do.

The whole point (and the reason it's called a "fiasco") is that it's impossible to say with any certainty what will happen in a case like this. Essentially, you're asking for something impossible (that two variables each be one greater than the other). Since they can't do that, what they will do is open to some question -- they might produce 0/1, or 1/0, or 1/2, or 2/1, or possibly (best case) just an error message.

Related

What does 'initialization' exactly mean?

My csapp book says that if global and static variables are initialized, then they are contained in the .data section of an ELF relocatable object file.
So my question is: if some foo.c code contains
int a;
int main()
{
a = 3;
}
and example.c contains,
int b = 3;
int main()
{
...
}
Is it only b that is considered to be initialized? In other words, does initialization mean declaration and definition on the same line?

It means exactly what it says. Objects with static storage duration that have an initializer get their initial values before the main function is called. Those without an initializer are zeroed. How this is achieved is implementation dependent, and the implementation has full freedom in choosing the mechanism.
When you declare a variable without the keyword extern, you always define it as well.

Both are considered initialized.
They get zero-initialized or constant-initialized (in short: constant-initialized if the right-hand side is a compile-time constant expression).
If permitted, Constant initialization takes place first (see Constant
initialization for the list of those situations). In practice,
constant initialization is usually performed at compile time, and
pre-calculated object representations are stored as part of the
program image. If the compiler doesn't do that, it still has to
guarantee that this initialization happens before any dynamic
initialization.
For all other non-local static and thread-local variables, Zero
initialization takes place. In practice, variables that are going to
be zero-initialized are placed in the .bss segment of the program
image, which occupies no space on disk, and is zeroed out by the OS
when loading the program.
To sum up: if a variable cannot be constant-initialized, the implementation must first zero-initialize it, and all of this static initialization must be complete before any dynamic initialization happens.

In the snippet:
int a;
int main()
{
a = 3;
}
a is not explicitly initialized; it is assigned. Assignment is run-time execution of code. For example, if main were called multiple times (it is not, but any user function could be), then a would be set to 3 each time the function is called.
Your second snippet is an initialization of the global variable b, which will be placed in the .data segment.

I will answer this question in a general and complete way, not with respect to any particular programming language.
There is a lot of confusion between declaration, definition, and initialization. Sometimes they all look similar and sometimes completely different.
Before understanding the differences, it is very important to be aware of two things:
The difference between declaration, definition, and initialization varies from one programming language to another. Each programming language has its own way of doing these three things.
The “thing” which you are defining, declaring, or initializing also affects the difference between the three of them. That “thing” can be a variable, a class, or a function. All of them have different meanings of definition, declaration, and initialization.
Once we are aware of the above two things, most of the doubts get cleared and we stop seeking exact differences, because there are none.
In general terms (irrespective of any language or “thing”):
The declaration means we are saying to the computer that this “thing” (it can be a variable, a function, or a class) exists, but we don’t know where. In the future we may tell, but right now it just exists somewhere. In simple words, we don’t allocate memory while declaring. We can declare that “thing” many times.
The definition means we are saying to the computer that this “thing” needs memory and needs to be located somewhere. In simple words, defining means we have allocated memory for it. We can define something only once.
The initialization means that, whatever our “thing” is, we are giving it an initial value. That “thing” must be in some memory location, and if we keep that location empty, it may be a home for bugs and errors. Initialization is not always necessary, but it’s important.
Many people assume that declaration + definition = initialization.
It's not wrong, but it’s not correct in all places. It's correct only for variables, and only in a language like C++ or maybe C.
In Python, there is no concept of declaration. We don’t need to declare anything in it.
The general meaning of the three is valid everywhere but the way that is performed varies from language to language and the “thing”.
Hope it helps :)

Variables with static storage duration that are initialized to zero end up in .bss.
Variables with static storage duration that are initialized with a non-zero value end up in .data.
NOTE: the C standard guarantees that if the programmer doesn't explicitly initialize a variable with static storage duration, such as static int a;, it is then initialized to zero implicitly (see note 1 below). Therefore a ends up in .bss.
Examples here.
1) C11 6.7.9
If an object that has static or thread storage duration is not initialized
explicitly, then:
if it has arithmetic type, it is initialized to (positive or unsigned) zero;

Global objects are inherently unsafe?

I know that the order of initialization of static variables defined in different translation units (e.g. different cpp/lib/dll/so files) is undefined. Does it mean that the behavior of following program is not well defined?
#include <vector>
std::vector<int> v;
int main()
{
v.push_back(1);
}
EDIT: Here I used an STL vector as an example, but it could be an object of any other "3rd party" class. As such, we wouldn't know whether that object is initialized via some other global variable. This means that in C++ it is not safe to create even a single global object with a nontrivial constructor. Right?

No, because when you use v in main, it is perfectly defined. The static initialization phase takes place before you use v in main ...
The problem arises if you use two globals in different translation units and there is a dependency between the two. See this C++ FAQ lite for an explanation. The next items in the FAQ explain how to avoid the 'fiasco'.
The problem of static initialization makes globals worse in C++ than in any other language. Good library writers know the problem and avoid the static initialization order fiasco. And even if not, if the library is widely used, someone will hit the problem and, I hope, fix it. But 3rd party libs are not always well written; they can be libraries written in your company by a programmer who is new to C++ and unaware of the problem ...
So, yes, it is unsafe, you're right. And in C++, avoid globals even more than in other languages!
Note: Columbo has pointed out that the standard does not exactly say that v is defined before entering main (see his answer). There is no practical difference in your instance.

It's specified in [basic.start.init]/4:
It is implementation-defined whether the dynamic initialization of a
non-local variable with static storage duration is done before the
first statement of main. If the initialization is deferred to some
point in time after the first statement of main, it shall occur before
the first odr-use (3.2) of any function or variable defined in the
same translation unit as the variable to be initialized.
It is therefore defined that v is initialized before its first use in any function of this translation unit, including main. That implies that in this particular program v is initialized before the first statement of main.
The static initialization order fiasco occurs when multiple variables in different translation units depend on their relative order of initialization; the initializations may be indeterminately sequenced with respect to each other, depending on their initialization.

Since there's only one global object being defined, there can be only one ordering of its initialization, and therefore there is no issue.

Why are global and static variables initialized to their default values?

In C/C++, why are globals and static variables initialized to default values?
Why not leave it with just garbage values? Are there any special
reasons for this?

Security: leaving memory alone would leak information from other processes or the kernel.
Efficiency: the values are useless until initialized to something, and it's more efficient to zero them in a block with unrolled loops. The OS can even zero freelist pages when the system is otherwise idle, rather than when some client or user is waiting for the program to start.
Reproducibility: leaving the values alone would make program behavior non-repeatable, making bugs really hard to find.
Elegance: it's cleaner if programs can start from 0 without having to clutter the code with default initializers.
One might then wonder why the auto storage class does start as garbage. The answer is two-fold:
It doesn't, in a sense. The very first stack frame page at each level (i.e., every new page added to the stack) does receive zero values. The "garbage", or "uninitialized" values that subsequent function instances at the same stack level see are really the previous values left by other method instances of your own program and its library.
There might be a quadratic (or whatever) runtime performance penalty associated with initializing auto (function locals) to anything. A function might not use any or all of a large array, say, on any given call, and it could be invoked thousands or millions of times. The initialization of statics and globals, OTOH, only needs to happen once.

Because with the proper cooperation of the OS, 0 initializing statics and globals can be implemented with no runtime overhead.

Section 6.7.8 Initialization of C99 standard (n1256) answers this question:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.

Think about it: in the static realm you can't always tell for sure that something is indeed initialized, or that main has started. There's also a static init phase and a dynamic init phase: the static one comes first, followed right after by the dynamic one, where order matters.
If you didn't have zeroing out of statics, then you would be completely unable to tell for sure in this phase whether anything was initialized AT ALL. In short, the C++ world would fly apart, and basic things like singletons (or any sort of dynamic static init) would simply cease to work.
The answer with the bullet points is enthusiastic but a bit silly. Those could all apply to nonstatic allocation, but that isn't done (well, sometimes, but not usually).

In C, statically-allocated objects without an explicit initializer are initialized to zero (for arithmetic types) or a null pointer (for pointer types). Implementations of C typically represent zero values and null pointer values using a bit pattern consisting solely of zero-valued bits (though this is not required by the C standard). Hence, the bss section typically includes all uninitialized variables declared at file scope (i.e., outside of any function) as well as uninitialized local variables declared with the static keyword.
Source: Wikipedia
