Static member variable for class that is dynamically loaded - c++

If I load some symbols using something like dlopen in C++, and other classes in that translation unit have static member variables, what exactly is the behavior of those static member variables? Do they get initialized, or not, because the library isn't really loaded, just the symbols that you looked up? (I suspect the latter is not true, because if the symbol you looked up needs them, they have to be loaded too.)

In short, there's no guarantee that static variables that cannot be initialized at compile time will be initialized any earlier than the first use of an externally visible function or variable in the same translation unit. This is true even for static linking. As for trying to get static variables in dynamically loaded libraries to initialize upon loading, my experience is that often you'll get lucky, especially for small programs, but fundamentally the Standard does not guarantee this behavior and it should not be relied on. The resulting bugs are unpredictable, difficult to reproduce, and highly system specific.
First, some standardese and an explanation of why this is not guaranteed, and then some workarounds.
The word static is unfortunately overloaded in the Standard so bear with me. The Standard makes reference to both static storage duration and static initialization. The types of storage duration defined by the Standard are static, thread, automatic, and dynamic. They are as they sound. Static storage duration means that the lifetime of such a variable is the entire duration of the program.
Static initialization is a distinct concept. Although a variable may be stored only once per program execution, the value it will be initialized with may not be known when the program starts. At the start of the program, all variables with static storage duration will be zero initialized and those that can be will then be constant initialized. The fine points are in §3.6.2 but roughly, a static variable will be constant initialized if its initialization relies only on constant expressions. Together, zero initialization and constant initialization are termed static initialization. The counterpart is dynamic initialization. These are the interesting ones, but unfortunately there's no portable way to force dynamic initialization to take place before main() first executes, in the case of dynamic linking, or before dlopen() returns, in the case of dynamic loading. C++ simply does not demand such.
The key part of the C++11 Standard is in §3.6.2:
It is implementation-defined whether the dynamic initialization of a
non-local variable with static storage duration is done before the
first statement of main. If the initialization is deferred to some
point in time after the first statement of main, it shall occur before
the first odr-use (3.2) of any function or variable defined in the
same translation unit as the variable to be initialized.
Nonetheless, if you've experimented, you've noticed that sometimes this does work. Sometimes you can get arbitrary code to run upon library loading by stuffing it in the constructors of static variables. Whether this happens is simply up to the compiler (not the linker). The manpage for dlopen explains.
If a dynamic library exports a routine named _init(), then that code is executed after the loading, before dlopen() returns
Inspecting the asm output of a small shared object written in standard C++, I can see that clang 3.4 and g++ 4.8 both add an _init section, however they are not required to do so.
As for workarounds, a gcc extension that has become commonplace does allow control of this behavior. By adding a constructor attribute to functions, we can insist that they be run upon library initialization. The linked manpage for dlopen suggests using this method.
See the GCC documentation on function attributes and this SO question which has an example usage. This extension is supported by gcc, clang, IBM XL, and my guess is that icc supports it too. MSVC does not support this but I understand there's something similar.
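As a minimal sketch of the constructor attribute described above, assuming gcc or clang: the names plugin_init, plugin_ready and g_plugin_ready are hypothetical and only serve the illustration.

```cpp
// Assumes gcc or clang: __attribute__((constructor)) is a compiler
// extension, not standard C++. All names here are hypothetical.
namespace {
// Constant initialized, so the value is in place before any
// constructor function runs.
bool g_plugin_ready = false;
}

// Runs when the shared object (or executable) that contains it is loaded:
// before dlopen() returns, or before main() starts.
__attribute__((constructor)) static void plugin_init() {
    g_plugin_ready = true;
}

// Exported query so a host can check that initialization happened.
extern "C" bool plugin_ready() { return g_plugin_ready; }
```

A host that loads this library could look up plugin_ready with dlsym and expect it to return true, since the constructor function has already run by then.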
A truly portable solution is elusive. As the Standard says, if you can somehow cause an odr usage in the same translation unit as the static variable, then the static variable must be initialized. Calling a function, even a dummy function just for this purpose, would work.
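The odr-use approach could look roughly like the following sketch of the library side. Config, g_config and plugin_name are hypothetical names; the only point is that the exported function lives in the same translation unit as the static variable.

```cpp
#include <string>

namespace {
// Hypothetical type whose construction requires running code,
// so g_config needs dynamic initialization.
struct Config {
    std::string name;
    Config() : name("example-plugin") {}
};

Config g_config;  // static storage duration, dynamically initialized
}

// Defined in the same translation unit as g_config. Per the rule quoted
// above, g_config must be initialized before the first odr-use of this
// function, so calling it through dlsym is safe on a conforming implementation.
extern "C" const char* plugin_name() {
    return g_config.name.c_str();
}
```

The host only ever reaches g_config through plugin_name, which is what makes the guarantee apply.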

Related

Class instantiated after definition lifetime [duplicate]

In C++ I know static and global objects are constructed before the main function. But as you know, in C, there is no such kind of initialization procedure before main.
For example, in my code:
int global_int1 = 5;
int global_int2;
static int static_int1 = 4;
static int static_int2;
When are these four variables initialized?
Where values for initialization like 5 and 4 are stored during compilation? How to manage them when initialization?
EDIT:
Clarification of 2nd question.
In my code I use 5 to initialize global_int1, so how can the compiler assign 5 to global_int1? For example, maybe the compiler first stores the value 5 somewhere (i.e. in a table), and fetches this value when initialization begins.
As to "How to manage them when initialization?", it is really vague and I myself do not know how to interpret it yet. Sometimes it is not easy to explain a question. Overlook it, since I have not fully mastered the question yet.
By static and global objects, I presume you mean objects with
static lifetime defined at namespace scope. When such objects
are defined with local scope, the rules are slightly different.
Formally, C++ initializes such variables in three phases:
1. Zero initialization
2. Static initialization
3. Dynamic initialization
The language also distinguishes between variables which require
dynamic initialization, and those which require static
initialization: all static objects (objects with static
lifetime) are first zero initialized, then objects with static
initialization are initialized, and then dynamic initialization
occurs.
As a simple first approximation, dynamic initialization means
that some code must be executed; typically, static
initialization doesn't. Thus:
extern int f();
int g1 = 42; // static initialization
int g2 = f(); // dynamic initialization
Another approximation would be that static initialization is
what C supports (for variables with static lifetime), dynamic
everything else.
How the compiler does this depends, of course, on the
initialization, but on disk based systems, where the executable
is loaded into memory from disk, the values for static
initialization are part of the image on disk, and loaded
directly by the system from the disk. On a classical Unix
system, global variables would be divided into three "segments":
text:
The code, loaded into a write protected area. Static
variables with `const` types would also be placed here.
data:
Static variables with static initializers.
bss:
Static variables with no-initializer (C and C++) or with dynamic
initialization (C++). The executable contains no image for this
segment, and the system simply sets it all to `0` before
starting your code.
I suspect that a lot of modern systems still use something
similar.
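A rough illustration of the layout just described. The variable names are hypothetical, and the segment named in each comment is the typical placement on Unix-like toolchains, not something the language guarantees.

```cpp
// Typical (not guaranteed) placement of objects with static storage duration.
const int kPrimes[] = {2, 3, 5, 7};  // const with constant initializer: read-only area (text/.rodata)
int g_answer = 42;                   // non-zero constant initializer: data segment, value stored in the file
int g_counter;                       // no initializer: bss, zeroed before your code starts
static char g_buffer[4096];          // also bss: occupies no space in the file image

// Touches every object so that none of them is optimized away.
int segment_demo_sum() {
    return kPrimes[3] + g_answer + g_counter + g_buffer[0];
}
```

Running nm or objdump on the compiled object file should show where a particular toolchain actually put each symbol.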
EDIT:
One additional remark: the above refers to C++03. For existing
programs, C++11 probably doesn't change anything, but it does
add constexpr (which means that some user defined functions
can still be static initialization) and thread local variables,
which opens up a whole new can of worms.
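A small sketch of the constexpr point, assuming C++11 or later. The names square, from_environment, g_static and g_dynamic are made up for the example.

```cpp
#include <cstdlib>

// A user-defined function that is usable in constant expressions.
constexpr int square(int x) { return x * x; }

// The initializer is a constant expression, so this is static
// (constant) initialization even though it calls a function.
int g_static = square(7);

// Depends on the runtime environment, so it cannot be evaluated at compile time.
int from_environment() { return std::getenv("PATH") ? 1 : 0; }

// Requires code to run at startup: dynamic initialization.
int g_dynamic = from_environment();
```

Without constexpr on square, the initialization of g_static would formally be dynamic, although compilers are allowed to perform it statically as an optimization.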
Preface: The word "static" has a vast number of different meanings in C++. Don't get confused.
All your objects have static storage duration. That is because they are neither automatic nor dynamic. (Nor thread-local, though thread-local is a bit like static.)
In C++, static objects are initialized in two phases: static initialization, and dynamic initialization.
Dynamic initialization requires actual code to execute, so this happens for objects that start with a constructor call, or where the initializer is an expression that can only be evaluated at runtime.
Static initialization is when the initializer is known statically and no constructor needs to run. (Static initialization is either zero-initialization or constant-initialization.) This is the case for your int variables with constant initializer, and you are guaranteed that those are indeed initialized in the static phase.
(Static-storage variables with dynamic initialization are also zero-initialized statically before anything else happens.)
The crucial point is that the static initialization phase doesn't "run" at all. The data is there right from the start. That means that there is no "ordering" or any other such dynamic property that concerns static initialization. The initial values are hard-coded into your program binary, if you will.
When are these four variables initialized?
As you say, this happens before program startup, i.e. before main begins. C does not specify it further; in C++, these happen during the static initialisation phase before objects with more complicated constructors or initialisers.
Where values for initialization like 5 and 4 are stored during compilation?
Typically, the non-zero values are stored in a data segment in the program file, while the zero values are in a bss segment which just reserves enough memory for the variables. When the program starts, the data segment is loaded into memory and the bss segment is set to zero. (Of course, the language standard doesn't specify this, so a compiler could do something else, like generate code to initialise each variable before running main.)
Paraphrased from the standard:
All variables which do not have dynamic storage duration, do not have thread local storage duration, and are not local, have static storage duration. In other words, all globals have static storage duration.
Static objects with dynamic initialization are not necessarily created before the first statement in the main function. It is implementation defined as to whether these objects are created before the first statement in main, or before the first use of any function or variable defined in the same translation unit as the static variable to be initialized.
So, in your code, all four variables are initialized before the first statement in main because they are statically initialized: global_int1 and static_int1 are constant initialized, while global_int2 and static_int2 have no initializer and are therefore zero initialized. The implementation-defined rule I mentioned above applies only to variables that require dynamic initialization.
As for your second point, I'm not sure I understand what you mean. Could you clarify?

Global objects are inherently unsafe?

I know that the order of initialization of static variables defined in different translation units (e.g. different cpp/lib/dll/so files) is undefined. Does it mean that the behavior of following program is not well defined?
#include <vector>
std::vector<int> v;
int main()
{
    v.push_back(1);
}
EDIT: Here I used STL vector as an example. But it could be an object of any other "3rd party" class. As such we wouldn't know whether that object is initialized via some other global variable. This means that in C++ it is not safe to create even a single global object with a nontrivial constructor. Right?
No, because when you use v in main, it is perfectly defined. The static initialization phase takes place before you use v in main ...
The problem arises if you use two globals in different translation units and there is a dependency between the two. See this C++ FAQ lite for an explanation. The next items in the FAQ explain how to avoid the 'fiasco'.
The problem of static initialization made globals worse in C++ than in any other language. Good library writers know the problem and avoid the static initialization order fiasco. And even if not, if the library is widely used, someone will hit the problem and, I hope, fix it. But 3rd party libs are not always well written; they can be libraries written in your company by a programmer who is new to C++ ...
So, yes, it is unsafe, you're right. And in C++ avoid globals even more than in other languages!
Note: Columbo has pointed out that the standard does not exactly say that v is defined before entering main (see his answer). No practical difference in your instance.
It's specified in [basic.start.init]/4:
It is implementation-defined whether the dynamic initialization of a
non-local variable with static storage duration is done before the
first statement of main. If the initialization is deferred to some
point in time after the first statement of main, it shall occur before
the first odr-use (3.2) of any function or variable defined in the
same translation unit as the variable to be initialized.
It is therefore defined that v is initialized before its first use in any function of this translation unit, including main. That implies that in this particular program v is initialized before the first statement of main.
The static initialization order fiasco occurs when multiple variables in different translation units depend on their relative order of initialization; the initializations may be indeterminately sequenced with respect to each other, depending on the kind of initialization involved.
Since there's only one global object being defined, there can be only one ordering of its initialization, and therefore there is no issue.
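For the multi-variable case, one common way to sidestep the ordering problem is the construct-on-first-use idiom. Below is a minimal sketch with hypothetical names (plugin_names, Registrar); in a real program the Registrar objects would live in different translation units.

```cpp
#include <string>
#include <vector>

// The registry is a function-local static: it is constructed the first
// time plugin_names() is called, regardless of which global asks first.
std::vector<std::string>& plugin_names() {
    static std::vector<std::string> names;
    return names;
}

// Hypothetical helper whose constructor depends on the registry.
struct Registrar {
    explicit Registrar(const char* name) { plugin_names().push_back(name); }
};

// Globals that depend on the registry during their own dynamic initialization.
static Registrar g_alpha("alpha");
static Registrar g_beta("beta");
```

Had the registry been a plain global in another translation unit, g_alpha could have run its constructor before the vector existed.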

Dynamic Initialization

C++03 Standard [basic.start.init] point 3 states:
It is implementation-defined whether or not the dynamic
initialization (8.5, 9.4, 12.1, 12.6.1) of an object of namespace
scope is done before the first statement of main. If the
initialization is deferred to some point in time after the first
statement of main, it shall occur before the first use of any
function or object defined in the same translation unit as the
object to be initialized.
Microsoft Compilers, according to Additional Startup Considerations, perform the initialization prior to main().
I have been unable to obtain documentation stating the behaviour for GNU and Sun Forte compilers.
Can anyone:
Point me in the direction of documentation that describes the behaviour of the GNU and Forte compilers with respect to dynamic initialization (I have checked the GCC manual and found nothing relating to dynamic initialization).
Comment on the thread-safety of deferred dynamic initialization (if two threads attempt to invoke a function from the same translation unit that contains a non-local object).
FWIW, I observed the behaviour of GNU's g++ and SUN's CC and both performed the initialization prior to main, though I don't accept this as a definitive answer. (I can post the very simple code I used to observe this if anyone is interested, but I felt the question was long enough.)
The definitive answer is that all compilers do static initialization
before main, unless the objects are in a DLL which is loaded later.
In practice, it's (almost) impossible to meet the requirements in the
text you cite otherwise. (Think of what happens if there is a cycle.)
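A probe along the following lines is one way to observe when a given compiler performs the initialization. This is only a sketch, not the asker's original code, and Probe, g_probe and probe_constructed are hypothetical names.

```cpp
#include <cstdio>

// The constructor has a visible side effect, so the object
// needs dynamic initialization.
struct Probe {
    bool constructed;
    Probe() : constructed(true) { std::puts("Probe constructed"); }
};

Probe g_probe;  // non-local, static storage duration

// Defined in the same translation unit as g_probe, so calling it
// forces the initialization to have happened by then in any case.
bool probe_constructed() { return g_probe.constructed; }
```

Printing a line as the very first statement of main and comparing the order of the two messages shows whether the implementation initialized g_probe before main or deferred it.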

C++ compilers implementing dynamic initialization after main

The C++ standard section 3.6.2 paragraph 3 states that it is implementation-defined whether dynamic initialization of non-local objects occurs after the first statement of main().
Does anyone know what the rationale for this is, and which compilers postpone non-local object initialization this way? I am most familiar with g++, which performs these initializations before main() has been entered.
This question is related: Dynamic initialization phase of static variables
But I'm specifically asking what compilers are known to behave this way.
It may be that the only rationale for this paragraph is to support dynamic libraries loaded at runtime, but I do not think that the standard takes dynamic loading issues into consideration.
One of the reasons may be the following:
static char data[1000000000]; // deliberately huge
int main(int argc, char**)
{
    if (argc > 0)
        data[0] = 0;
}
It might be reasonable to allocate and initialize this static array only when it turns out that it is really needed. It might be that some applications ran into something similar and had enough of a voice to convince the committee. In my own experience with C#, I came across a situation where static members of a class were not allocated right after jitting the class. They were allocated one by one, on first use. In that case there was absolutely no justification for doing that. It was a plain disaster. Maybe they have fixed this now.
Other reasons are possible also.
From the C++11 draft:
It is implementation-defined whether the dynamic initialization of a non-local variable with static storage
duration is done before the first statement of main. If the initialization is deferred to some point in time after the first statement of main, it shall occur before the first odr-use (3.2) of any function or variable defined in the same translation unit as the variable to be initialized. [emphasis mine]
That is, the static variable has to be initialized before any use of anything defined in the same translation unit.
It looks to me that it is done this way to allow dynamic libraries (DLLs or SOs) to be loaded and initialized lazily, or even dynamically (calling dlopen or LoadLibrary or whatever).
It is obvious that a variable defined in a DLL cannot be initialized before the DLL itself is loaded.
Naturally, C++ knows nothing about DLLs so there is no direct mention of them in the standard. But the people on the committee do know about real environments and compilers, and certainly know about DLLs. Without this clause, lazy loading a DLL would technically violate the C++ specification. (Not that it would prevent implementers from doing it anyway, but it is better if we all try to go along with each other.)
And as for which systems support this: as far as I know, at least the MS Visual C++ compiler supports lazy dynamic linking (the DLL will not even be loaded until first use). And most modern platforms support dynamically loading a DLL.
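On POSIX systems, the host side of dynamic loading might look like this sketch. It assumes dlopen and dlsym from <dlfcn.h> and a hypothetical exported function named plugin_name in the library; on some older systems the program also has to be linked with -ldl.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Signature of the hypothetical function exported by the library.
using name_fn = const char* (*)();

// Loads the library at `path` and calls its plugin_name function.
// Returns nullptr if the library or the symbol cannot be found.
const char* load_and_query(const char* path) {
    void* handle = dlopen(path, RTLD_NOW);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return nullptr;
    }
    name_fn fn = reinterpret_cast<name_fn>(dlsym(handle, "plugin_name"));
    if (!fn) {
        dlclose(handle);
        return nullptr;
    }
    // The handle is deliberately left open so that the returned pointer
    // into the library's data stays valid.
    return fn();
}
```

Variables defined in the library cannot be initialized before dlopen runs, which is exactly the situation the deferred-initialization clause accommodates.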

Static variables within functions in C++ - allocated even if function doesn't run?

I've been reading up on C++ on the Internet, and here's one thing that I haven't been quite able to find an answer to.
I know that static variables used within functions are akin to globals, and that subsequent invocations of that function will have the static variable retain its value between calls.
However, if the function is never called, does the static variable get allocated?
Thanks
If the function is never called, it is likely that your linker will deadstrip both the function and the static variable, preventing it from entering .rodata, .data, or .bss segments (or your executable file format's equivalents).
However, there are various reasons why a linker might not deadstrip (flags telling it not to, an inability to determine what depends on the symbol, etc).
It's worth checking your linker map file (sometimes just a text file!), or using objdump, nm, or dumpbin utilities on the final executable to see if the symbol or related symbols (such as static initializer code) survived.
The C++ Standard, section 6.7 says:
The zero-initialization (8.5) of all
local objects with static storage
duration (3.7.1) is performed before
any other initialization takes place.
A local object of POD type (3.9) with
static storage duration initialized
with constant-expressions is
initialized before its block is first
entered. An implementation is
permitted to perform early
initialization of other local objects
with static storage duration under the
same conditions that an implementation
is permitted to statically initialize
an object with static storage duration
in namespace scope (3.6.2). Otherwise
such an object is initialized the
first time control passes through its
declaration; such an object is
considered initialized upon the
completion of its initialization.
Which indicates that local static objects are normally initialised the first time the control flow encounters them. However, they may well be allocated before this - the standard is somewhat reticent on what static storage actually is, except with reference to static object lifetimes.
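The first-pass rule can be made visible with a small counter. This is a sketch with hypothetical names (Tracker, use_tracker, g_constructions).

```cpp
// Counts how many times a Tracker has been constructed.
int g_constructions = 0;

struct Tracker {
    Tracker() { ++g_constructions; }
};

void use_tracker() {
    // Constructed the first time control passes through this declaration,
    // and never again on later calls.
    static Tracker tracker;
    (void)tracker;
}
```

Before the first call to use_tracker the counter is 0; after any number of calls it is 1. If the function is never called, the constructor never runs, even though storage for the object may well have been reserved.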
Every object in C++ has two nested time-periods associated with it: storage duration and lifetime. Storage duration is the period for which the raw memory occupied by the object is allocated. Lifetime is the period between construction and destruction of an actual object in that memory. (For objects of POD types, construction and destruction either don't matter or are not applicable, so their lifetime matches their storage duration.)
When someone says "allocated" they usually refer to storage duration. The language doesn't actually specify exactly when the object's storage duration begins. It is sufficient to require that it shall begin at some point before the object's lifetime begins.
For this reason, in the general case a static object defined inside a function might never begin its lifetime and, theoretically, its storage duration does not have to begin either. So, in theory, it might not even get "allocated".
In practice though, all objects with static storage duration ("globals", local statics, etc.) are normally treated equally: they are assigned a specific amount of storage early, at the program's startup.
As an additional note, if a local object with static storage duration requires a non-trivial initialization, this initialization is carried out when the control passes over the definition for the very first time. So in this example
void foo() {
    static int *p = new int[100];
}
the dynamic array will never be allocated if the function is never called. And it will be allocated only once if the function is called. This doesn't look like what you are asking about, but I mention this just in case.
I'm sure that that's going to be up to the implementation. What MSVC does is this: static objects are allocated in the automatic data segment of the EXE or DLL. However, the constructor is only executed the first time the function containing the static is executed.
Yes, actual allocation is compiler dependent, although I think that every compiler just reserves the space in the .static segment of the executable (or the equivalent in its executable file format).
The initialization, however, takes place only the first time that the execution flow encounters the static object, and that is required by the standard.
Beware that initialization of global static objects works in a different way, though.
You can get very good answers to almost every question at the C++ FAQ lite site.
I am also fond of Scott Meyers's "Effective C++".
Depends. If you mean, never called, as in, the function is literally never invoked, then your compiler will probably not allocate it, or even put in the function code. If, however, you made it dependent on, say, user input, and that user input just happened to never come up, then it will probably be pre-allocated. However, you're treading in a minefield here, and it's best just to assume that it is always created by the time control enters the function(s) that refer to it.
Static variables defined in classes (members) or functions are not allocated dynamically on the stack during a function call, like non-static ones. They are allocated in another area of the generated program reserved for global and static data. So, whether or not you call the function, and whether or not you instantiate classes that contain static members, space for their data will be reserved in the program's data area anyway.