I read several posts on C++ initialization from Google, some of which directed me here to StackOverflow. The concepts I picked up from those posts are as follows:
The order of initialization of C++ is:
Zero Initialization;
Static Initialization;
Dynamic Initialization.
Static objects (variables included) are first Zero-initialized, and then Static-initialized.
I have several inquiries as to the initialization issue (storage class issue may be related as well):
Global objects (defined without static keyword) are also static objects, right?
Global objects are also initialized like static objects by two steps like above, right?
What is the Static Initialization? Does it refer to initializing static objects (defined with static keyword)?
I also read that objects defined within a block (i.e. in a function) with the static keyword are initialized when the execution thread first enters the block! This means that local static objects are not initialized before main function execution. This means they are not initialized in the two steps mentioned above, right?
Dynamic initialization refers to initialization of objects created by new operator, right? It might refer to initialization like myClass obj = myClass(100); or myClass obj = foo();
I have too many inquiries on the initialization and storage class specifier issues. I read the C++2003 Standard document, but cannot find a clear logic since they are scattered throughout the document.
I hope you give me an answer that logically explains the whole map of storage class specifier and initialization. Any reference is welcome!
Code that might explain my question:
class myClass{
public:
int i;
myClass(int j = 10): i(j){}
// other declarations
};
myClass obj1;//global scope
static myClass obj2(2);//file scope
void f() { //local scope (a block must live inside a function)
myClass obj3(3);
static myClass obj4(4);
}
EDIT:
If you think my question is rather tedious, you can help explain your ideas based on the code above.
I read several posts on C++ initialization from Google, some of which directed me here to StackOverflow. The concepts I picked up from those posts are as follows:
The order of initialization of C++ is:
Zero Initialization;
Static Initialization;
Dynamic Initialization.
Yes, indeed there are 3 phases (in the Standard). Let us clarify them before continuing:
Zero Initialization: the memory is filled with 0s at the byte level.
Constant Initialization: a pre-computed (compile-time) byte pattern is copied at the memory location of the object
Static Initialization: Zero Initialization followed by Constant Initialization
Dynamic Initialization: a function is executed to initialize the memory
A simple example:
int const i = 5; // constant initialization
int const j = foo(); // dynamic initialization
Static objects (variables included) are first Zero-initialized, and then Static-initialized.
Yes and no.
The Standard mandates that the objects be first zero-initialized and then they are:
constant initialized if possible
dynamically initialized otherwise (the compiler could not compute the memory content at compile-time)
Note: in the case of constant initialization, the compiler might skip the initial zero-initialization of the memory, under the as-if rule.
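The zero-then-dynamic sequence can be observed from inside a single translation unit. The following is a minimal sketch, not something taken from the Standard: all names (runtime_source, late_value, seen_before_init, check_phases) are made up for illustration, and the volatile variable is only there to keep the compiler from turning the dynamic initialization into a static one.

```cpp
#include <cassert>

extern int late_value;              // defined below; needs dynamic initialization

// Constant-initialized. Reading a volatile object must happen at run time,
// so an initializer that reads it cannot be precomputed by the compiler.
volatile int runtime_source = 42;

int peek_late_value() { return late_value; }
int load_runtime_source() { return runtime_source; }

// Within one translation unit, dynamic initializers run in definition order.
int seen_before_init = peek_late_value();  // runs first: late_value is still zero
int late_value = load_runtime_source();    // runs second: late_value becomes 42

// Hypothetical helper: call it from main to confirm the observed values.
void check_phases() {
    assert(seen_before_init == 0);   // saw the zero-initialized state
    assert(late_value == 42);        // dynamic initialization has completed
}
```

Calling check_phases() from main should pass, because both initializers have already run by then.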
I have several inquiries as to the initialization issue (storage class issue may be related as well):
Global objects (defined without static keyword) are also static objects, right?
Yes. At file scope the static keyword only affects the visibility (linkage) of the symbol. A global object can be referred to, by name, from another source file, whilst the name of a static object is completely local to the current source file.
The confusion stems from the reuse of the word static in many different situations :(
Global objects are also initialized like static objects by two steps like above, right?
Yes, as are local static objects in fact.
What is the Static Initialization? Does it refer to initializing static objects (defined with static keyword)?
No, as explained above it refers to initializing objects without executing a user-defined function but instead copying a pre-computed byte pattern over the object's memory. Note that in the case of objects that will later be dynamically initialized, this is just zero-ing the memory.
I also read that objects defined within a block (i.e. in a function) with the static keyword are initialized when the execution thread first enters the block! This means that local static objects are not initialized before main function execution. This means they are not initialized in the two steps mentioned above, right?
They are initialized with the same two-step process, though indeed only the first time execution passes through their definition. So the process is the same but the timing is subtly different.
In practice though, if their initialization is static (i.e., the memory pattern is a compile-time pattern) and their address is not taken, they might be optimized away.
Note that in case of dynamic initialization, if their initialization fails (an exception is thrown by the function supposed to initialize them) it will be re-attempted the next time flow-control passes through their definition.
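The retry behaviour is easy to check with a small experiment. This is only a sketch under assumed names (init_attempts, make_value, get_value and check_retry are all hypothetical): the initializer throws on its first run, so the local static stays uninitialized and the initializer is run again on the next pass.

```cpp
#include <cassert>
#include <stdexcept>

int init_attempts = 0;   // counts how often the initializer has been run

int make_value() {
    ++init_attempts;
    if (init_attempts == 1)
        throw std::runtime_error("first attempt fails");
    return 7;
}

int get_value() {
    // Dynamic initialization happens when control first passes this line.
    // If make_value() throws, value still counts as not initialized.
    static int value = make_value();
    return value;
}

// Hypothetical helper; meant to be called once, before any other get_value() call.
void check_retry() {
    assert(init_attempts == 0);          // nothing ran before the first call
    bool threw = false;
    try { get_value(); } catch (const std::runtime_error&) { threw = true; }
    assert(threw);                       // the first attempt failed
    assert(get_value() == 7);            // the second pass retries and succeeds
    assert(get_value() == 7);            // later calls skip the initializer
    assert(init_attempts == 2);
}
```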
Dynamic initialization refers to initialization of objects created by new operator, right? It might refer to initialization like myClass obj = myClass(100); or myClass obj = foo();
Not at all, it refers to initialization requiring the execution of a user defined function (note: std::string has a user-defined constructor as far as the C++ language is concerned).
EDIT: My thanks to Zach who pointed to me I erroneously called Static Initialization what the C++11 Standard calls Constant Initialization; this error should now be fixed.
I believe there are three different concepts: initializing the variable, the location of the variable in memory, the time the variable is initialized.
First: Initialization
When a variable is allocated in memory, typical processors leave the memory untouched, so the variable will have the same value that somebody else stored earlier. For security, some compilers add the extra code to initialize all variables they allocate to zero. I think this is what you mean by "Zero Initialization". It happens when you say:
int i; // not all compilers set this to zero
However if you say to the compiler:
int i = 10;
then the compiler instructs the processor to put 10 in the memory rather than leaving it with old values or setting it to zero. I think this is what you mean by "Static Initialization".
Finally, you could say this:
int i;
...
...
i = 11;
then the processor "zero initializes" (or leaves the old value) when executing int i; then when it reaches the line i = 11 it "dynamically initializes" the variable to 11 (which can happen very long after the first initialization).
Second: Location of the variable
There are: stack-based variables (sometimes called static variables), and memory-heap variables (sometimes called dynamic variables).
Variables can be created in the stack segment using this:
int i;
or the memory heap like this:
int *i = new int;
The difference is that the stack segment variable is lost after exiting the function call, while memory-heap variables are left until you say delete i;. You can read an Assembly-language book to understand the difference better.
Third: The time the variable is initialized
A stack-segment variable is "zero-initialized" or "statically-initialized" when you enter the function call it is defined within.
A memory-heap variable is "zero-initialized" or "statically-initialized" when it is first created by the new operator.
Final Remark
You can think about static int i; as a global variable with a scope limited to the function it is defined in. I think the confusion about static int i; comes because static here means another thing (it is not destroyed when you exit the routine, so it retains its value). I am not sure, but I think the trick used for static int i; is to put it in the stack of main() which means it is not destroyed until you exit the whole program (so it retains the first initialization), or it could be that it is stored in the data segment of the application.
Related
The book Object oriented programming in c++ by Robert Lafore says,
A static local variable has the visibility of an automatic local
variable (that is, inside the function containing it). However, its
lifetime is the same as that of a global variable, except that it
doesn’t come into existence until the first call to the function
containing it. Thereafter it remains in existence for the life of the
program
What does coming into existence after first call of function mean? The storage for static local is allocated at the time program is loaded in the memory.
The storage is allocated before main is entered, but (for example) if the static object has a ctor with side effects, those side effects might be delayed until just before the first time the function is called.
Note, however, that this is not necessarily the case. Constant initialization is only required to happen before that block is entered (not necessarily just as execution "crosses" that definition). Likewise, implementations are allowed to initialize other block-scope static variables earlier than required under some circumstances. If you want to get into the gory details of the circumstances, you can look at [basic.start.init] and [stmt.dcl], but it basically comes down to: as long as it doesn't affect the value with which it's initialized. For example, if you had something like:
int i;
std::cin >> i;
{
static int x = i;
...the implementation wouldn't be able to initialize x until the block was entered, because the value with which it was being initialized wouldn't be known until then. On the other hand, if you had:
{
static int i = 0;
...the implementation could carry out the initialization as early as it wished (and most would/will basically carry out such an initialization at compile time, so it won't involve executing any instructions at run-time at all). Even for less trivial cases, however, earlier initialization is allowed when logically possible (e.g., the value isn't coming from previous execution).
In C++ storage duration of an object (when raw memory gets allocated for it) and lifetime of an object are two separate concepts. The author was apparently referring to the latter one when he was talking about object's "coming into existence".
In the general case it is not enough to allocate storage for an object to make it "come into existence". The lifetime of an object with non-trivial initialization begins once its initialization is complete. For example, an object of a class with a non-trivial constructor does not officially "live" until its constructor has completed execution.
Initialization of a static local object is performed when the control passes over the declaration for the very first time. Before that the object does not officially exist, even if the memory for it is already allocated.
Note that the author is not painstakingly precise in his description. It is not sufficient to just call the function containing the declaration. The control has to pass through the declaration of the object for it to begin its lifetime. If the function contains branching, this does not necessarily happen during the very first call to the function.
For objects with trivial initialization (like int objects), there's no difference between storage duration and lifetime. For such objects allocating memory is all that needs to be done. But in the general case allocating memory alone is not sufficient.
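The point about branching can be made concrete with a counter. A small sketch, with invented names (constructed, Tracker, maybe_create, check_lifetime): the function is called once without reaching the declaration, and the object is only constructed on the first call that actually passes through it.

```cpp
#include <cassert>

int constructed = 0;   // number of Tracker objects constructed so far

struct Tracker {
    Tracker() { ++constructed; }   // non-trivial constructor with a visible side effect
};

void maybe_create(bool take_branch) {
    if (take_branch) {
        static Tracker tracker;  // lifetime begins when control first reaches here
        (void)tracker;           // silence unused-variable warnings
    }
}

// Hypothetical helper; call it from main.
void check_lifetime() {
    maybe_create(false);
    assert(constructed == 0);   // function was called, declaration was not reached
    maybe_create(true);
    assert(constructed == 1);   // first pass over the declaration constructs it
    maybe_create(true);
    assert(constructed == 1);   // later passes do not construct it again
}
```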
It means that the static variable inside a function doesn't get initialized (by its constructor or its initializer) until the first call to that function.
As soon as the function, which contains a static local variable, is called the static local variable is initialized.
Given this class in the header file:
class ClassA
{
public:
ClassA(){}
};
Then in file.cpp
#include "file.h"
ClassA* GlobalPointerToClassAType = new ClassA();
a. Is it allowed, and is it good practice to use the keyword 'new' to allocate memory for an object in the heap(?) in lines of file-scope?
b. If it is allowed, then when exactly is the constructor ClassA() actually called?
c. How does it differ if I wrote instead this line:
ClassA GlobalInstanceOfClassAType = ClassA();
in terms of the time of calling the constructor, in terms of memory efficiency, and in terms of good practice?
a. Is it allowed, and is it good practice to use the keyword 'new' to allocate memory for an object in the heap(?) in lines of file-scope?
It is allowed. Whether it is good practice to use new here is opinion-based, and I predict that most people will answer no.
b. If it is allowed, then when exactly is the constructor ClassA() actually called?
Let's start from some concepts.
In C++, all objects in a program have one of the following storage durations:
automatic
static
thread (since C++11)
dynamic
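As a quick illustration of the four storage durations listed above, here is a minimal sketch. The names (static_total, calls_on_thread, accumulate_once, check_durations) are invented for this example, and thread_local assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <memory>

int static_total = 0;                   // static storage duration
thread_local int calls_on_thread = 0;   // thread storage duration (C++11)

int accumulate_once() {
    int step = 1;                              // automatic storage duration
    std::unique_ptr<int> boxed(new int(2));    // the int it owns has dynamic storage duration
    static_total += step + *boxed;
    ++calls_on_thread;
    return static_total;
}   // step is gone here; boxed's destructor frees the dynamically allocated int

// Hypothetical helper; call it from main.
void check_durations() {
    assert(accumulate_once() == 3);
    assert(accumulate_once() == 6);   // static_total persisted across calls
    assert(calls_on_thread == 2);     // counted on the calling thread only
}
```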
And if you check cppreference, it says:
static storage duration. The storage for the object is allocated when the program begins and deallocated when the program ends. Only one instance of the object exists. All objects declared at namespace scope (including global namespace) have this storage duration, plus those declared with static or extern. See Non-local variables and Static local variables for details on initialization of objects with this storage duration.
So, GlobalPointerToClassAType has static storage duration; it fits the statement that "All objects declared at namespace scope (including global namespace) have this storage duration...".
And if you get deeper into the link of the above section, you will find:
All non-local variables with static storage duration are initialized as part of program startup, before the execution of the main function begins (unless deferred, see below). All non-local variables with thread-local storage duration are initialized as part of thread launch, sequenced-before the execution of the thread function begins. For both of these classes of variables, initialization occurs in two distinct stages:
There's more detail on the same site, and you can go deeper if you want, but for this question let's focus only on the initialization time. According to the reference, the constructor ClassA() might be called before the execution of the main function begins (unless deferred).
What is "deferred"? The answer is in the sections below:
It is implementation-defined whether dynamic initialization happens-before the first statement of the main function (for statics) or the initial function of the thread (for thread-locals), or deferred to happen after.
If the initialization of a non-inline variable (since C++17) is deferred to happen after the first statement of main/thread function, it happens before the first odr-use of any variable with static/thread storage duration defined in the same translation unit as the variable to be initialized. If no variable or function is odr-used from a given translation unit, the non-local variables defined in that translation unit may never be initialized (this models the behavior of an on-demand dynamic library). However, as long as anything from a translation unit is odr-used, all non-local variables whose initialization or destruction has side effects will be initialized even if they are not used in the program.
Let's see a tiny example, from godbolt. I use clang and directly copy your code, except that ClassA and main are defined in the same translation unit. You can see that clang generates a section like __cxx_global_var_init, where the class ctor is called.
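The same thing can be checked without reading assembly. The sketch below reuses the names from the question and adds a hypothetical counter (ctor_calls) and helper (check_constructed). It assumes the helper is called from main in the same translation unit, so that even a deferred initialization must already have happened by the time the variables are used.

```cpp
#include <cassert>

int ctor_calls = 0;   // constant-initialized, so it is ready before any constructor runs

class ClassA {
public:
    ClassA() { ++ctor_calls; }
};

// Both variables need dynamic initialization: a constructor has to run.
ClassA* GlobalPointerToClassAType = new ClassA();
ClassA GlobalInstanceOfClassAType = ClassA();

// Hypothetical helper; call it from main in this same translation unit.
void check_constructed() {
    assert(ctor_calls == 2);                       // both constructors already ran
    assert(GlobalPointerToClassAType != nullptr);  // the heap object exists
}
```

The object behind the pointer is never deleted here, which is one of the practical differences from the plain global instance.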
My csapp book says that if global and static variables are initialized, then they are contained in the .data section of an ELF relocatable object file.
So my question is that if some foo.c code contains
int a;
int main()
{
a = 3;
}
and example.c contains,
int b = 3;
int main()
{
...
}
is it only b that is considered to be initialized? In other words, does initialization mean declaration and definition on the same line?
It means exactly what it says. Initialized static storage duration objects will have their initial values set before the main function is called. Objects that are not explicitly initialized will be zeroed. The second part of the statement is actually implementation-dependent, and the implementation has full freedom in how this is achieved.
When you declare the variable without the keyword extern you always define it as well
Both are considered initialized
They get zero initialized or constant initialized (in short: the latter happens if the right-hand side is a compile-time constant expression).
If permitted, Constant initialization takes place first (see Constant
initialization for the list of those situations). In practice,
constant initialization is usually performed at compile time, and
pre-calculated object representations are stored as part of the
program image. If the compiler doesn't do that, it still has to
guarantee that this initialization happens before any dynamic
initialization.
For all other non-local static and thread-local variables, Zero
initialization takes place. In practice, variables that are going to
be zero-initialized are placed in the .bss segment of the program
image, which occupies no space on disk, and is zeroed out by the OS
when loading the program.
To sum up, if the implementation cannot constant initialize it, then it must first zero initialize it, and this must be done before any dynamic initialization happens.
In the snippet:
int a;
int main()
{
a = 3;
}
a is not initialized with 3; it is zero-initialized and later assigned. Assignment is a run-time execution of code. For example, if main were called multiple times (it is not, but any user function could be), then a would be set to 3 each time the function is called.
Your second snippet is an initialization of the global variable b, and it will be placed in the .data segment.
I will answer this question in a general and complete way, and not with respect to any particular programming language
There is a hell of a lot of confusion between declaration, definition, and initialization. Sometimes they all look similar and sometimes completely different.
Before understanding the differences, it is very important to be aware of two things:
The difference between declaration, definition, and initialization varies from one programming language to another. Each programming language has its own way of doing these three things.
The “thing” which you are defining, declaring or initializing also affects the difference between the three of them. That “thing” can be a variable, a class or a function. All of them have different meanings of definition, declaration, and initialization.
Once we are aware of the above two things, most of the doubts get cleared and we stop seeking exact differences, because they are not there.
In general terms ( irrespective of any language or “thing”)
The declaration means we are saying to a computer that this “thing”
(it can be a variable, a function or a class) exists but we don’t know
where. In the future, we may tell but right now it just exists
somewhere. In simple words, we don’t allocate memory while declaring.
We can declare that “thing” many times.
The definition means we are saying to the computer that this “thing” needs memory and it needs to be located somewhere. In simple
words, defining means we have allocated memory for it. We can define
something only once
The initialization means whatever our “thing” is, we are giving it an initial value. That “thing” must be in some memory location and
if we keep that location empty, it may be a house for bugs and errors.
Initialization is not always necessary but it’s important.
Many people assume that declaration + definition = initialization.
It's not wrong, but it’s not correct in all places. It's correct only for variables, and only in a language like C++ or maybe C.
In Python, there is no concept of declaration. We don’t need to declare anything in it.
The general meaning of the three is valid everywhere but the way that is performed varies from language to language and the “thing”.
Hope it helps :)
Variables with static storage duration that are initialized to zero end up in .bss.
Variables with static storage duration that are initialized with a non-zero value end up in .data.
NOTE: the C standard guarantees that if the programmer doesn't explicitly initialize a variable with static storage duration, such as static int a;, it is then initialized to zero implicitly 1). Therefore a ends up in .bss.
Examples here.
1) C11 6.7.9
If an object that has static or thread storage duration is not initialized
explicitly, then:
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
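The implicit-zero rule quoted above can be checked at run time, even though section placement itself cannot be seen from inside the program. A small sketch with made-up names; the .bss/.data comments describe typical toolchain behaviour and are not something the language guarantees. The code is written as C++ to match the rest of the page, but the declarations are equally valid C.

```cpp
#include <cassert>

int no_initializer;          // implicitly zero; typically placed in .bss
static double file_local;    // same rule, with internal linkage
int* pointer_no_init;        // implicitly a null pointer
int with_initializer = 3;    // non-zero initializer; typically placed in .data

// Hypothetical helper; call it from main.
void check_defaults() {
    assert(no_initializer == 0);
    assert(file_local == 0.0);
    assert(pointer_no_init == nullptr);
    assert(with_initializer == 3);
}
```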
In C++ I know static and global objects are constructed before the main function. But as you know, in C, there is no such kind of initialization procedure before main.
For example, in my code:
int global_int1 = 5;
int global_int2;
static int static_int1 = 4;
static int static_int2;
When are these four variables initialized?
Where are values for initialization like 5 and 4 stored during compilation? How to manage them when initialization?
EDIT:
Clarification of 2nd question.
In my code I use 5 to initialize global_int1, so how can the compiler assign 5 to global_int1? For example, maybe the compiler first stores the value 5 somewhere (e.g. in a table), and gets this value when initialization begins.
As to "How to manage them when initialization?", it is really vague and I myself do not know how to interpret it yet. Sometimes it is not easy to explain a question. Overlook it, since I have not mastered the question fully yet.
By static and global objects, I presume you mean objects with
static lifetime defined at namespace scope. When such objects
are defined with local scope, the rules are slightly different.
Formally, C++ initializes such variables in three phases:
1. Zero initialization
2. Static initialization
3. Dynamic initialization
The language also distinguishes between variables which require
dynamic initialization, and those which require static
initialization: all static objects (objects with static
lifetime) are first zero initialized, then objects with static
initialization are initialized, and then dynamic initialization
occurs.
As a simple first approximation, dynamic initialization means
that some code must be executed; typically, static
initialization doesn't. Thus:
extern int f();
int g1 = 42; // static initialization
int g2 = f(); // dynamic initialization
Another approximation would be that static initialization is
what C supports (for variables with static lifetime), dynamic
everything else.
How the compiler does this depends, of course, on the
initialization, but on disk based systems, where the executable
is loaded into memory from disk, the values for static
initialization are part of the image on disk, and loaded
directly by the system from the disk. On a classical Unix
system, global variables would be divided into three "segments":
text:
The code, loaded into a write protected area. Static
variables with `const` types would also be placed here.
data:
Static variables with static initializers.
bss:
Static variables with no-initializer (C and C++) or with dynamic
initialization (C++). The executable contains no image for this
segment, and the system simply sets it all to `0` before
starting your code.
I suspect that a lot of modern systems still use something
similar.
EDIT:
One additional remark: the above refers to C++03. For existing
programs, C++11 probably doesn't change anything, but it does
add constexpr (which means that some user defined functions
can still be static initialization) and thread local variables,
which opens up a whole new can of worms.
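To illustrate the constexpr remark, here is a minimal C++11 sketch in which a user-defined constructor and a user-defined function both take part in static initialization. Point, square, origin, corner and check_corner are names invented for this example.

```cpp
#include <cassert>

struct Point {
    int x;
    int y;
    constexpr Point(int x_, int y_) : x(x_), y(y_) {}   // usable at compile time
};

constexpr int square(int v) { return v * v; }

// Every initializer below is a constant expression, so these objects can be
// constant-initialized: no constructor code has to run at program startup.
constexpr Point origin(0, 0);
Point corner(square(3), 4);   // not const, but still constant-initialized

static_assert(origin.x == 0 && origin.y == 0, "origin is a compile-time constant");
static_assert(square(3) == 9, "square is usable in constant expressions");

// Hypothetical helper; call it from main.
void check_corner() {
    assert(corner.x == 9);
    assert(corner.y == 4);
}
```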
Preface: The word "static" has a vast number of different meanings in C++. Don't get confused.
All your objects have static storage duration. That is because they are neither automatic nor dynamic. (Nor thread-local, though thread-local is a bit like static.)
In C++, Static objects are initialized in two phases: static initialization, and dynamic initialization.
Dynamic initialization requires actual code to execute, so this happens for objects that start with a constructor call, or where the initializer is an expression that can only be evaluated at runtime.
Static initialization is when the initializer is known statically and no constructor needs to run. (Static initialization is either zero-initialization or constant-initialization.) This is the case for your int variables with constant initializer, and you are guaranteed that those are indeed initialized in the static phase.
(Static-storage variables with dynamic initialization are also zero-initialized statically before anything else happens.)
The crucial point is that the static initialization phase doesn't "run" at all. The data is there right from the start. That means that there is no "ordering" or any other such dynamic property that concerns static initialization. The initial values are hard-coded into your program binary, if you will.
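One way to see that static initialization has no ordering is to read a constant-initialized variable from a dynamic initializer that appears earlier in the file. A minimal sketch with hypothetical names (answer, read_answer, early_reader, check_no_ordering):

```cpp
#include <cassert>

extern int answer;                    // defined at the bottom of the file

int read_answer() { return answer; }

// Needs a function call, so in general this is dynamic initialization.
int early_reader = read_answer();

// Constant initialization: the value is in place before any dynamic
// initializer runs, even though this definition comes later in the file.
int answer = 42;

// Hypothetical helper; call it from main.
void check_no_ordering() {
    assert(early_reader == 42);
}
```

If the compiler chooses to precompute early_reader as well, the observed value is still 42.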
When are these four variables initialized?
As you say, this happens before program startup, i.e. before main begins. C does not specify it further; in C++, these happen during the static initialisation phase before objects with more complicated constructors or initialisers.
Where are values for initialization like 5 and 4 stored during compilation?
Typically, the non-zero values are stored in a data segment in the program file, while the zero values are in a bss segment which just reserves enough memory for the variables. When the program starts, the data segment is loaded into memory and the bss segment is set to zero. (Of course, the language standard doesn't specify this, so a compiler could do something else, like generate code to initialise each variable before running main.)
Paraphrased from the standard:
All variables which do not have dynamic storage duration, do not have thread local storage duration, and are not local, have static storage duration. In other words, all globals have static storage duration.
Static objects with dynamic initialization are not necessarily created before the first statement in the main function. It is implementation defined as to whether these objects are created before the first statement in main, or before the first use of any function or variable defined in the same translation unit as the static variable to be initialized.
So, in your code, global_int1 and static_int1 are definitely initialized before the first statement in main because they are statically initialized (constant initialization). global_int2 and static_int2 have no initializer, so they are zero-initialized, which is also part of static initialization. The implementation-defined timing I mentioned above only applies to variables whose initializer has to be evaluated at run time.
As for your second point, I'm not sure I understand what you mean. Could you clarify?
I have C++ code which declares static-lifetime variables which are initialized by function calls. The called function constructs a vector instance and calls its push_back method. Is the code risking doom via the C++ static initialization order fiasco? If not, why not?
Supplementary information:
What's the "static initialization order fiasco"?
It's explained in C++ FAQ 10.14
Why would I think use of vector could trigger the fiasco?
It's possible that the vector constructor makes use of the value of another static-lifetime variable initialized dynamically. If so, then there is nothing to ensure that vector's variable is initialized before I use vector in my code. Initializing result (see code below) could end up calling the vector constructor before vector's dependencies are fully initialized, leading to access to uninitialized memory.
What does this code look like anyway?
struct QueryEngine {
QueryEngine(const char *query, string *result_ptr)
: query(query), result_ptr(result_ptr) { }
static void AddQuery(const char *query, string *result_ptr) {
if (pending == NULL)
pending = new vector<QueryEngine>;
pending->push_back(QueryEngine(query, result_ptr));
}
const char *query;
string *result_ptr;
static vector<QueryEngine> *pending;
};
vector<QueryEngine> *QueryEngine::pending = NULL;
string Register(const char *query, string *result_ptr) {
QueryEngine::AddQuery(query, result_ptr);
return string(); // Register must return a value for the initializer below to compile
}
string result = Register("query", &result);
Fortunately, static objects are zero-initialised even before any other initialisation is performed (even before the "true" initialisation of the same objects), so you know that the NULL will be set on that pointer long before Register is first invoked.1
Now, in terms of operating on your vector, it appears that (technically) you could run into such a problem:
[C++11: 17.6.5.9/3]: A C++ standard library function shall not directly or indirectly modify objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function’s non-const arguments, including this.
[C++11: 17.6.5.9/4]: [Note: This means, for example, that implementations can’t use a static object for internal purposes without synchronization because it could cause a data race even in programs that do not explicitly share objects between threads. —end note]
Notice that, although synchronisation is being required in this note, that's been mentioned within a passage that ultimately acknowledges that static implementation details are otherwise allowed.
That being said, it seems like the standard should further state that user code should avoid operating on standard containers during static initialisation, if the intent were that the semantics of such code could not be guaranteed; I'd consider this a defect in the standard, either way. It should be clearer.
1 And it is a NULL pointer, whatever the bit-wise representation of that may be, rather than a blit to all-zero-bits.
vector doesn't depend on anything preventing its use in dynamic initialisation of statics. The only issue with your code is a lack of thread safety - no particular reason to think you should care about that, unless you have statics whose construction spawns threads....
Initializing result (see code below) could end up calling the vector constructor before that class is fully initialized, leading to access to uninitialized memory.
No... initialising result calls AddQuery which checks if (pending == NULL) - the initialisation to NULL will certainly have been done before any dynamic initialisation, per 3.6.2/2:
Constant initialization is performed:
...
— if an object with static or thread storage duration is not initialized by a constructor call and if either the object is value-initialized or every full-expression that appears in its initializer is a constant expression
So even if the result assignment is in a different translation unit it's safe. See 3.6.2/2:
Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization. Static initialization shall be performed before any dynamic initialization takes place.
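The guarantee can be exercised with a stripped-down version of the registration pattern. This is a sketch only: Registry, Add, first_count, second_count and check_registry are hypothetical stand-ins for the question's QueryEngine code. The pointer is deliberately defined after the variables that use it, to show that its constant initialization to null still comes first and does not wipe the vector afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Registry {
    static std::vector<std::string>* pending;   // defined at the bottom

    static std::size_t Add(const char* name) {
        if (pending == nullptr)
            pending = new std::vector<std::string>;  // intentionally never deleted
        pending->push_back(name);
        return pending->size();
    }
};

// Dynamic initialization: these calls run during program startup.
std::size_t first_count = Registry::Add("first");
std::size_t second_count = Registry::Add("second");

// Constant initialization: done before any of the dynamic initializers above,
// so it cannot reset the pointer after the vector has been created.
std::vector<std::string>* Registry::pending = nullptr;

// Hypothetical helper; call it from main.
void check_registry() {
    assert(first_count == 1);
    assert(second_count == 2);
    assert(Registry::pending != nullptr);
    assert(Registry::pending->size() == 2);
}
```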