I know that the order of initialization of static variables defined in different translation units (e.g. different cpp/lib/dll/so files) is undefined. Does it mean that the behavior of the following program is not well defined?
#include <vector>
std::vector<int> v;
int main()
{
v.push_back(1);
}
EDIT: Here I used an STL vector as an example, but it could be an object of any other "3rd party" class. As such, we wouldn't know whether that object is initialized via some other global variable. This means that in C++ it is not safe to create even a single global object with a nontrivial constructor. Right?
No, because when you use v in main, it is perfectly defined. The static initialization phase takes place before you use v in main ...
The problem arises if you use 2 globals in different translation units and there is a dependency between the two. See this C++ FAQ lite for an explanation. The next items in the FAQ explain how to avoid the 'fiasco'.
The static initialization problem makes globals worse in C++ than in most other languages. Good library writers know the problem and avoid the static initialization order fiasco. Even if they do not, in a widely used library someone will hit the problem and, I hope, fix it. But 3rd party libraries are not always well written; they can be libraries written in your company by a programmer who is new to C++ and unaware of the issue ...
So, yes, it is unsafe, you're right. And in C++, avoid globals even more than in other languages!
Note: Columbo has pointed out that the standard does not exactly say that v is initialized before entering main (see his answer). There is no practical difference in your case.
It's specified in [basic.start.init]/4:
It is implementation-defined whether the dynamic initialization of a
non-local variable with static storage duration is done before the
first statement of main. If the initialization is deferred to some
point in time after the first statement of main, it shall occur before
the first odr-use (3.2) of any function or variable defined in the
same translation unit as the variable to be initialized.
It is therefore defined that v is initialized before its first use in any function of this translation unit, including main. That implies that in this particular program v is initialized before the first statement of main.
The static initialization order fiasco occurs when multiple variables in different translation units depend on their relative order of initialization; the initializations may be indeterminately sequenced with respect to each other, depending on the kind of initialization.
Since there's only one global object being defined, there can be only one ordering of its initialization, and therefore there is no issue.
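To make that concrete, here is a minimal sketch. Tracer and push_and_count are hypothetical names I made up for illustration (Tracer stands in for any "3rd party" class with a non-trivial constructor). Because the function is defined in the same translation unit as the globals, both globals must be initialized before it is first used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not from the question): its constructor
// records that it ran, standing in for any "3rd party" class.
struct Tracer {
    bool constructed = false;
    Tracer() { constructed = true; }
};

Tracer tracer;        // global with a non-trivial constructor
std::vector<int> v;   // the global from the question

// Defined in the same translation unit as the globals, so both must be
// initialized before this function is first used.
std::size_t push_and_count() {
    assert(tracer.constructed);  // the constructor has already run
    v.push_back(1);
    return v.size();
}
```

Calling push_and_count() from main in the same file is safe for the same reason that v.push_back(1) in the question is safe.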
Related
My CS:APP book says that if global and static variables are initialized, then they are contained in the .data section of the ELF relocatable object file.
So my question is that if some foo.c code contains
int a;
int main()
{
a = 3;
}
and example.c contains,
int b = 3;
int main()
{
...
}
is it only b that is considered to be initialized? In other words, does initialization mean declaration and definition on the same line?
It means exactly what it says. Objects with static storage duration that have an initializer will have their initial values set before the main function is called. Those without an initializer will be zeroed. How this is done is implementation-dependent, and the implementation has full freedom in how it is achieved.
When you declare the variable without the keyword extern you always define it as well
Both are considered initialized
They get zero-initialized or constant-initialized (in short: constant-initialized if the right-hand side is a compile-time constant expression).
If permitted, Constant initialization takes place first (see Constant
initialization for the list of those situations). In practice,
constant initialization is usually performed at compile time, and
pre-calculated object representations are stored as part of the
program image. If the compiler doesn't do that, it still has to
guarantee that this initialization happens before any dynamic
initialization.
For all other non-local static and thread-local variables, Zero
initialization takes place. In practice, variables that are going to
be zero-initialized are placed in the .bss segment of the program
image, which occupies no space on disk, and is zeroed out by the OS
when loading the program.
To sum up, if the implementation cannot constant-initialize the variable, then it must zero-initialize it, and that must happen before any dynamic initialization.
In the snippet:
int a;
int main()
{
a = 3;
}
a is not explicitly initialized; it is assigned. Assignment is code executed at run time. For example, if main were called multiple times (it is not, but any user function could be), a would be set to 3 each time the function is called.
Your second snippet is an initialization of the global variable b, and it will be placed in the .data segment.
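A small sketch of the difference, written in C++ (the same holds for C). The function name assign_a is hypothetical. The variable a has no initializer, so it starts out as zero, and the value 3 only arrives when the assignment actually executes.

```cpp
#include <cassert>

int a;      // no initializer: zero-initialized (typically placed in .bss)
int b = 3;  // constant initializer: typically stored in .data

// Returns the value `a` held before the assignment.
int assign_a() {
    assert(b == 3);  // b already has its value before any code runs
    int before = a;
    a = 3;           // assignment: ordinary code that runs on every call
    return before;
}
```

The first call to assign_a() returns 0, because a was only zero-initialized at that point; later calls return 3.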
I will answer this question in a general and complete way, not with respect to any particular programming language.
There is a lot of confusion between declaration, definition, and initialization. Sometimes they all look similar and sometimes completely different.
Before understanding the differences, it is very important to be aware of two things:
The difference between declaration, definition, and initialization varies from one programming language to another. Each programming language has its own way of doing these three things.
The “thing” which you are defining, declaring or initializing also affects the difference between the three. That “thing” can be a variable, a class or a function. Each of them gives a different meaning to definition, declaration, and initialization.
Once we are aware of these two things, most of the doubts get cleared and we stop seeking exact differences, because there are none.
In general terms (irrespective of any language or “thing”):
The declaration means we are telling the computer that this “thing” (it can be a variable, a function or a class) exists, but we don’t say where. In the future we may tell, but right now it just exists somewhere. In simple words, we don’t allocate memory while declaring. We can declare that “thing” many times.
The definition means we are telling the computer that this “thing” needs memory and needs to be located somewhere. In simple words, defining means we have allocated memory for it. We can define something only once.
The initialization means that, whatever our “thing” is, we are giving it an initial value. That “thing” must be in some memory location, and if we leave that location empty, it may become a home for bugs and errors. Initialization is not always necessary, but it’s important.
Many people assume that declaration + definition = initialization.
It's not wrong, but it’s not correct everywhere. It is correct only for variables, and only in a language like C++ or maybe C.
In Python, there is no concept of declaration. We don’t need to declare anything in it.
The general meaning of the three is valid everywhere, but the way each is performed varies from language to language and from “thing” to “thing”.
Hope it helps :)
Variables with static storage duration that are initialized to zero end up in .bss.
Variables with static storage duration that are initialized with a non-zero value end up in .data.
NOTE: the C standard guarantees that if the programmer doesn't explicitly initialize a variable with static storage duration, such as static int a;, it is then initialized to zero implicitly 1). Therefore a ends up in .bss.
Examples here.
1) C11 6.7.9
If an object that has static or thread storage duration is not initialized
explicitly, then:
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
In C++98/03, the construction of static objects (in files, in classes, in functions) has no specified sequence: one static object cannot assume it is constructed before or after another static object, and the order seems to be decided by the linker.
My question is, does C++ 11/14 specify any rules for the construction sequences of static objects and global objects?
The rules have not changed. However all global/static objects are constructed in the order they appear in the translation unit. It is just the order of initialization of multiple translation units that is unspecified.
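As a rough illustration of the in-order guarantee within one translation unit, here is a sketch. Named, order_log and construction_order are names I invented for the example. The log itself is a function-local static so that it is valid no matter which global touches it first.

```cpp
#include <cassert>
#include <string>

// Construct-on-first-use log, so it exists before any global below uses it.
std::string& order_log() {
    static std::string log;
    return log;
}

// Hypothetical type whose constructor appends its id to the log.
struct Named {
    explicit Named(char id) { order_log() += id; }
};

// Three globals in one translation unit: constructed in order of definition.
Named first('A');
Named second('B');
Named third('C');

const std::string& construction_order() {
    assert(!order_log().empty());  // the globals above are already constructed
    return order_log();
}
```

Within this single file, construction_order() yields "ABC". If the three objects lived in three different translation units, no such order could be relied on.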
Do note that function-local static objects are constructed in a specified manner. They are constructed the first time their declaration is reached and live until the end of the program. This behavior changed in C++11: before C++11 that initialization was not thread safe, whereas C++11 and above specify that the initialization of function-local statics is thread safe.
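A minimal sketch of that first-use behavior (single-threaded, so it only shows the "constructed once, on first call" part, not the thread-safety guarantee). Service, service and the counter are hypothetical names.

```cpp
#include <cassert>

int constructions = 0;  // constant-initialized to 0 before any code runs

// Hypothetical type that counts how many times it is constructed.
struct Service {
    Service() { ++constructions; }
};

Service& service() {
    // Constructed the first time control passes through this declaration;
    // since C++11 this initialization is also thread safe.
    static Service instance;
    return instance;
}

int constructions_after_two_calls() {
    Service& s1 = service();
    Service& s2 = service();
    assert(&s1 == &s2);  // both calls return the same object
    return constructions;
}
```

Before the first call to service() the counter is still 0; after any number of calls it is 1.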
Yes: global objects will be constructed in order within a compilation unit. And no: nothing changed with C++11/14.
If I were to load up some symbols using something like dlopen in C++ while other classes in that translation unit had static member variables, what exactly is the behavior of those static member variables? Do they get initialized, or not, because the library isn't really loaded, just the symbols that you looked up? (I think the latter is not true, because if the symbol you looked up needs them, they need to be loaded too.)
In short, there's no guarantee that static variables that cannot be initialized at compile time will be initialized before an externally visible function or variable in the same translation unit is referred to. This is true even for static linking. As for trying to get static variables in dynamically loaded libraries to initialize upon loading, my experience is that often you'll get lucky, especially for small programs, but fundamentally this is undefined behavior and should not be relied on. The resulting bugs are unpredictable, difficult to reproduce, and highly system specific.
First, some standardese and an explanation of why this is undefined behavior and then some workarounds.
The word static is unfortunately overloaded in the Standard so bear with me. The Standard makes reference to both static storage duration and static initialization. The types of storage duration defined by the Standard are static, thread, automatic, and dynamic. They are as they sound. Static storage duration means that the lifetime of such a variable is the entire duration of the program.
Static initialization is a distinct concept. Although a variable may be stored only once per program execution, the value it will be initialized with may not be known when the program starts. At the start of the program, all variables with static storage duration will be zero initialized and those that can be will then be constant initialized. The fine points are in §3.6.2 but roughly, a static variable will be constant initialized if its initialization relies only on constant expressions. Together, zero initialization and constant initialization are termed static initialization. The counterpart is dynamic initialization. These are the interesting ones but unfortunately there's no portable way to force dynamic initialization to take place before main() first executes, in the case of dynamic linking, or before dlopen() returns, in the case of dynamic loading. C++ simply does not demand such.
The key part of the C++11 Standard is in §3.6.2:
It is implementation-defined whether the dynamic initialization of a
non-local variable with static storage duration is done before the
first statement of main. If the initialization is deferred to some
point in time after the first statement of main, it shall occur before
the first odr-use (3.2) of any function or variable defined in the
same translation unit as the variable to be initialized.
Nonetheless, if you've experimented, you've noticed that sometimes this does work. Sometimes you can get arbitrary code to run upon library loading by stuffing it in the constructors of static variables. Whether this happens is simply up to the compiler (not the linker). The manpage for dlopen explains.
If a dynamic library exports a routine named _init(), then that code is executed after the loading, before dlopen() returns
Inspecting the asm output of a small shared object written in standard C++, I can see that clang 3.4 and g++ 4.8 both add an _init section, however they are not required to do so.
As for workarounds, a gcc extension that has become commonplace does allow control of this behavior. By adding a constructor attribute to functions, we can insist that they be run upon library initialization. The linked manpage for dlopen suggests using this method.
See the GCC documentation on function attributes and this SO question which has an example usage. This extension is supported by gcc, clang, IBM XL, and my guess is that icc supports it too. MSVC does not support this but I understand there's something similar.
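For reference, a minimal sketch of the constructor attribute, assuming GCC or Clang (this is a compiler extension, not standard C++). The names library_ready, on_load, is_ready and use_library are hypothetical.

```cpp
#include <cassert>

int library_ready = 0;  // constant-initialized, so valid before on_load runs

// GCC/Clang extension (not standard C++): the function is run when the
// executable or shared object is loaded, before main() or before dlopen()
// returns.
__attribute__((constructor))
static void on_load() {
    library_ready = 1;
}

bool is_ready() { return library_ready == 1; }

void use_library() {
    assert(is_ready());  // on_load has already run by the time we get here
}
```

The same pattern works in a shared object: on_load then runs as part of dlopen(), which is what the manpage suggests relying on instead of _init.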
A truly portable solution is elusive. As the Standard says, if you can somehow cause an odr usage in the same translation unit as the static variable, then the static variable must be initialized. Calling a function, even a dummy function just for this purpose, would work.
The C++ standard section 3.6.2 paragraph 3 states that it is implementation-defined whether dynamic initialization of non-local objects occurs after the first statement of main().
Does anyone know what the rationale for this is, and which compilers postpone non-local object initialization this way? I am most familiar with g++, which performs these initializations before main() has been entered.
This question is related: Dynamic initialization phase of static variables
But I'm specifically asking what compilers are known to behave this way.
It may be that the only rationale for this paragraph is to support dynamic libraries loaded at runtime, but I do not think that the standard takes dynamic loading issues into consideration.
One of the reasons may be the following:
static char data[1000000000];   // a very large static array
int main(int argc, char* argv[])
{
    if (argc > 0)
        data[0] = 0;
}
It might be reasonable to allocate and initialize this static array only when it turns out that it is really needed. It might be that some applications ran into something similar and had enough voice to convince the committee. In my own experience with C#, I came across a situation where static members of a class were not allocated right after jitting the class. They were allocated one by one, on first use. In that case there was absolutely no justification for doing that. It was a plain disaster. Maybe they have fixed this now.
Other reasons are possible also.
From the C++11 draft:
It is implementation-defined whether the dynamic initialization of a non-local variable with static storage
duration is done before the first statement of main. If the initialization is deferred to some point in time after the first statement of main, it shall occur before the first odr-use (3.2) of any function or variable defined in the same translation unit as the variable to be initialized. [emphasis mine]
That is, the static variable has to be initialized before any use of anything defined in the same translation unit.
It looks to me that it is done this way to allow dynamic libraries (DLLs or SOs) to be loaded and initialized lazily, or even dynamically (calling dlopen or LoadLibrary or whatever).
It is obvious that a variable defined in a DLL cannot be initialized before the DLL itself is loaded.
Naturally, C++ knows nothing about DLLs, so there is no direct mention of them in the standard. But the people on the committee do know about real environments and compilers, and certainly know about DLLs. Without this clause, lazy loading a DLL would technically violate the C++ specification. (Not that it would prevent implementers from doing it anyway, but it is better if we all try to go along with each other.)
And as for which systems support this: as far as I know, at least the MS Visual C++ compiler supports lazy dynamic linking (the DLL will not even be loaded until first use). And most modern platforms support dynamically loading a DLL.
I'm working on some C++ code and I've run into a question which has been nagging me for a while... Assuming I'm compiling with GCC on a Linux host for an ELF target, where are global static constructors and destructors called?
I've heard there's a function _init in crtbegin.o, and a function _fini in crtend.o. Are these called by crt0.o? Or does the dynamic linker actually detect their presence in the loaded binary and call them? If so, when does it actually call them?
I'm mainly interested to know so I can understand what's happening behind the scenes as my code is loaded, executed, and then unloaded at runtime.
Thanks in advance!
Update: I'm basically trying to figure out the general time at which the constructors are called. I don't want to make assumptions in my code based on this information, it's more or less to get a better understanding of what's happening at the lower levels when my program loads. I understand this is quite OS-specific, but I have tried to narrow it down a little in this question.
When talking about non-local static objects there are not many guarantees. As you already know (and it's also been mentioned here), you should not write code that depends on that. The static initialization order fiasco...
Static objects go through a two-phase initialization: static initialization and dynamic initialization. The former happens first and performs zero-initialization or initialization by constant expressions. The latter happens after all static initialization is done. This is when constructors are called, for example.
In general, this initialization happens at some time before main(). However, contrary to what many people think, even that is not guaranteed by the C++ standard. What is in fact guaranteed is that the initialization is done before the use of any function or object defined in the same translation unit as the object being initialized. Notice that this is not OS specific. These are C++ rules. Here's a quote from the Standard:
It is implementation-defined whether or not the dynamic initialization (8.5, 9.4, 12.1, 12.6.1) of an object of
namespace scope is done before the first statement of main. If the initialization is deferred to some point
in time after the first statement of main, it shall occur before the first use of any function or object defined
in the same translation unit as the object to be initialized
This depends heavily on the compiler and runtime. It's not a good idea to make any assumptions about the time global objects are constructed.
This is especially a problem if you have a static object which depends on another one being already constructed.
This is called the "static initialization order fiasco". Even if that's not the case in your code, the C++ FAQ Lite articles on that topic are worth a read.
This is not OS specific; rather, it's compiler specific.
You have given the answer: initialization is done in _init.
For the second part, in gcc you can guarantee the order of initialization with a __attribute__((init_priority(PRIORITY))) attached to a variable definition, where PRIORITY is some relative value, with lower numbers initialized first.
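A sketch of what that can look like, assuming GCC (Clang accepts the same attribute on most targets). The attribute only applies to objects of class type at namespace scope, and user priorities start at 101. Stage, init_log and init_order are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Construct-on-first-use log so it is valid whichever global runs first.
std::string& init_log() {
    static std::string log;
    return log;
}

// Hypothetical class type; each instance appends its id when constructed.
template <char Id>
struct Stage {
    Stage() { init_log() += Id; }
};

// GCC extension: lower numbers are initialized first, so `early` is
// constructed before `late` even though `late` is defined first.
Stage<'L'> late __attribute__((init_priority(2000)));
Stage<'E'> early __attribute__((init_priority(1000)));

const std::string& init_order() {
    assert(init_log().size() == 2);  // both globals are constructed by now
    return init_log();
}
```

Without the attributes the log would read "LE" (definition order); with them it reads "EL".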
The guarantees you have:
All static non-local objects in the global namespace are constructed before main()
All static non-local objects in another namespace are constructed before any functions/methods in that namespace are used (Thus allowing the compiler to potentially lazy evaluate them [but don't count on this behavior]).
All static non-local objects in a translation unit are constructed in the order of declaration.
Nothing is defined about the order between translation units.
All static non-local objects are destroyed in the reverse order of creation. (This includes static function variables, which are lazily created on first use.)
If you have globals that have dependencies on each other you have two options:
Put them in the same translation unit.
Transform them into static function variables retrieved and constructed on first use.
Example 1: Global A's constructor uses Global log
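LogType log;   // defined before A in the same translation unit, so constructed first
class AType
{
public:
    AType() { log.report("A Constructed"); }
};
AType A;
// Or
LogType& getLog()
{
    static LogType log;   // constructed on the first call to getLog()
    return log;
}
class AType
{
public:
    AType() { getLog().report("A Constructed"); }
};
// Define A anywhere;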
Example 2: Global B's destructor uses Global log
Here you have to guarantee that the object log is not destroyed before the object B. This means that log must be fully constructed before B (as the reverse order of destruction rule will then apply). Again the same techniques can be used. Either put them in the same translation unit or use a function to get log.
LogType log;
class BType
{
public:
    ~BType() { log.report("B Destroyed"); }
};
BType B; // B constructed after log (so B will be destroyed first)
// Or
LogType& getLog()
{
    static LogType log;
    return log;
}
class BType
{
public:
    BType() { getLog(); }
    /*
     * If log is used in the destructor then it must not be destroyed before B
     * This means it must be constructed before B
     * (reverse order destruction guarantees that it will then be destroyed after B)
     *
     * To achieve this just call the getLog() function in the constructor.
     * This means that 'log' will be fully constructed before this object.
     * This means it will be destroyed after and thus safe to use in the destructor.
     */
    ~BType() { getLog().report("B Destroyed"); }
};
// Define B anywhere;
According to the C++ standard they are called before any function or object of their translation unit is used. Note that for objects in the global namespace this would mean they are initialized before main() is called. (See ltcmelo's and Martin's answers for more details and a discussion of this.)