Static variables initialisation order - c++

C++ guarantees that variables in a compilation unit (.cpp file) are initialised in order of declaration. With several compilation units this rule holds for each one separately (I mean static variables outside of classes).
But the order of initialization of variables is undefined across different compilation units.
Where can I find an explanation of this order for gcc and MSVC? (I know that relying on it is a very bad idea; I just want to understand the problems we may have with legacy code when moving to a new GCC major version and a different OS.)

As you say, the order is undefined across different compilation units.
Within the same compilation unit the order is well defined: the same order as definition.
This is because the cross-unit order is resolved not at the language level but at the linker level. So you really need to check the linker documentation, though I really doubt it will help in any useful way.
For gcc: check out ld.
I have found that even changing the order of the object files being linked can change the initialization order. So it is not just your linker that you need to worry about, but also how the linker is invoked by your build system. Even trying to solve the problem this way is practically a non-starter.
This is generally only a problem when initializing globals that reference each other during their own initialization (so it only affects objects with constructors).
There are techniques to get around the problem:
Lazy initialization.
Schwarz counter.
Put all complex global variables inside the same compilation unit.
Note 1: globals:
Used loosely to refer to static storage duration variables that are potentially initialized before main().
Note 2: Potentially
In the general case we expect static storage duration variables to be initialized before main, but the compiler is allowed to defer initialization in some situations (the rules are complex; see the standard for details).
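To make the Schwarz counter (also called the nifty counter) from the list above more concrete, here is a minimal single-file sketch. The names Logger, LoggerInitializer, logger() and Client are hypothetical, and the comments mark which parts would normally live in a header and which in separate .cpp files. It is an illustration under those assumptions, not a drop-in implementation.

```cpp
#include <new>
#include <string>

// ---- Logger.h (hypothetical header) ----
class Logger {
public:
    void write(const std::string& message) { text_ += message; }
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

Logger& logger();  // access to the single shared Logger

// Each translation unit that includes the header gets its own initializer
// object. It is defined before any later global in that unit, so its
// constructor runs before those globals are constructed.
static struct LoggerInitializer {
    LoggerInitializer();
    ~LoggerInitializer();
} loggerInitializer;

// ---- Logger.cpp (hypothetical source file) ----
namespace {
// These three are zero-initialized at load time, before any constructor runs.
int useCount;
Logger* sharedLogger;
alignas(Logger) unsigned char loggerStorage[sizeof(Logger)];
}

LoggerInitializer::LoggerInitializer() {
    if (useCount++ == 0) {
        sharedLogger = new (loggerStorage) Logger();  // first user constructs
    }
}

LoggerInitializer::~LoggerInitializer() {
    if (--useCount == 0) {
        sharedLogger->~Logger();  // last user destroys
    }
}

Logger& logger() { return *sharedLogger; }

// ---- Client.cpp (hypothetical; includes Logger.h) ----
struct Client {
    Client() { logger().write("client constructed"); }
};
static Client client;  // safe: loggerInitializer was defined earlier in this unit
```

The counter means the shared object is constructed by whichever translation unit happens to initialize first and destroyed by whichever finishes last, so the link order no longer matters for this one object.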

I expect the constructor order between modules is mainly a function of what order you pass the objects to the linker.
However, GCC does let you use init_priority to explicitly specify the ordering for global ctors:
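#include <cstdio>
class Thingy
{
public:
    // String literals are const, and the text is printed as data rather than as a format string.
    Thingy(const char* p) { std::printf("%s", p); }
};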
Thingy a("A");
Thingy b("B");
Thingy c("C");
outputs 'ABC' as you'd expect, but
Thingy a __attribute__((init_priority(300))) ("A");
Thingy b __attribute__((init_priority(200))) ("B");
Thingy c __attribute__((init_priority(400))) ("C");
outputs 'BAC'.

Since you already know that you shouldn't rely on this information unless absolutely necessary, here it comes. My general observation across various toolchains (MSVC, gcc/ld, clang/llvm, etc) is that the order in which your object files are passed to the linker is the order in which they will be initialized.
There are exceptions to this, and I do not claim to know all of them, but here are the ones I ran into myself:
1) GCC versions prior to 4.7 actually initialize in the reverse order of the link line. This ticket in GCC is when the change happened, and it broke a lot of programs that depended on initialization order (including mine!).
2) In GCC and Clang, usage of constructor function priority can alter the initialization order. Note that this only applies to functions that are declared to be "constructors" (i.e. they should be run just like a global object constructor would be). I have tried using priorities like this and found that even with highest priority on a constructor function, all constructors without priority (e.g. normal global objects, constructor functions without priority) will be initialized first. In other words, the priority is only relative to other functions with priorities, but the real first class citizens are those without priority. To make it worse, this rule is effectively the opposite in GCC prior to 4.7 due to point (1) above.
3) On Windows, there is a very neat and useful shared-library (DLL) entry-point function called DllMain(), which, if defined, will run with the parameter "fdwReason" equal to DLL_PROCESS_ATTACH directly after all global data has been initialized and before the consuming application has a chance to call any functions in the DLL. This is extremely useful in some cases, and there is absolutely no analogous behavior on other platforms with GCC or Clang in C or C++. The closest you will find is a constructor function with priority (see point (2) above), which is absolutely not the same thing and won't work for many of the use cases that DllMain() works for.
4) If you are using CMake to generate your build systems, which I often do, I have found that the order of the input source files is the order in which their resultant object files are given to the linker. However, your application/DLL is often also linking in other libraries, in which case those libraries will be on the link line after your input source files. If you want one of your global objects to be the very first one to initialize, then you are in luck: you can put the source file containing that object first in the list of source files. However, if you want one to be the very last one to initialize (which can effectively replicate DllMain() behavior!), then you can call add_library() with that one source file to produce a static library, and add the resulting static library as the very last link dependency in the target_link_libraries() call for your application/DLL. Be aware that your global object may get optimized out in this case; you can use the --whole-archive flag to force the linker to keep unused symbols for that specific tiny archive file.
Closing Tip
To know for certain the resulting initialization order of your linked application/shared library, pass --print-map to the ld linker and grep for .init_array (or, with GCC prior to 4.7, grep for .ctors). Every global constructor will be printed in the order in which it will be initialized; remember that the order is the opposite in GCC prior to 4.7 (see point (1) above).
The motivating factor for writing this answer is that I needed to know this information, had no other choice but to rely on initialization order, and found only sparse bits of this information throughout other SO posts and internet forums. Most of it was learned through much experimentation, and I hope that this saves some people the time of doing that!
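For readers who have not seen the constructor functions from point (2), here is a small sketch of how priorities are attached to them in GCC and Clang. The function names and the log buffer are made up for illustration. The sketch only compares two prioritized functions with each other and deliberately says nothing about how they are ordered relative to constructors without a priority.

```cpp
#include <cstddef>
#include <string>

// Plain arrays and integers with static storage duration are zero-initialized
// at load time, so they are safe to use from any constructor function.
static char initLog[8];
static std::size_t initLogSize;

static void record(char tag) {
    if (initLogSize < sizeof initLog) {
        initLog[initLogSize++] = tag;
    }
}

// GCC/Clang extension. Priorities 0-100 are reserved for the implementation;
// among prioritized constructor functions, a smaller number runs earlier.
__attribute__((constructor(102))) void setupSecond() { record('B'); }
__attribute__((constructor(101))) void setupFirst() { record('A'); }

// Returns the tags in the order the constructor functions actually ran.
std::string initOrder() { return std::string(initLog, initLogSize); }
```

Even though setupSecond() is defined first, initOrder() reports "AB" once main() is running, because the priority rather than the source order decides between the two.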

http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.12 - this link moves around. This one is more stable, but you will have to look around for it.
edit: osgx supplied a better link.

A robust solution is to use a getter function that returns a reference to a static variable. A simple example is shown below; a more complex variant is used in our SDG Controller middleware.
// Foo.h
#include <vector>
class Foo {
public:
    Foo() {}
    static bool insertIntoBar(int number);
private:
    // Getter that owns the function-local static vector.
    static std::vector<int>& getBar();
};
// Foo.cpp
#include "Foo.h"
std::vector<int>& Foo::getBar() {
    static std::vector<int> bar;  // constructed on the first call
    return bar;
}
bool Foo::insertIntoBar(int number) {
    getBar().push_back(number);
    return true;
}
// A.h
class A {
public:
    A() {}
private:
    static bool a1;
};
// A.cpp
#include "A.h"
#include "Foo.h"
bool A::a1 = Foo::insertIntoBar(22);
The initialization would begin with the only static member variable, bool A::a1. This would call Foo::insertIntoBar(22), which in turn would call Foo::getBar(), in which the static std::vector<int> variable is initialized before a reference to the initialized object is returned.
If the static std::vector<int> bar were placed directly as a member variable of the Foo class, there would be a possibility, depending on the name ordering of the source files, that bar would be initialized only after insertIntoBar() had been called, which could crash the program.
If multiple static member variables called insertIntoBar() during their initialization, the order of the insertions would still depend on the names of the source files, i.e., it would be effectively random, but the std::vector<int> would be guaranteed to be initialized before any value is inserted into it.

In addition to Martin's comments, coming from a C background, I always think of static variables as part of the program executable, incorporated and allocated space in the data segment. Thus static variables can be thought of as being initialised as the program loads, prior to any code being executed. The exact order in which this happens can be ascertained by looking at the data segment of the map file output by the linker, but for most intents and purposes the initialisation is simultaneous.
Edit: Depending on construction order of static objects is liable to be non-portable and should probably be avoided.

If you really want to know the final order, I would recommend creating a class whose constructor logs the current timestamp, and creating several static instances of that class in each of your cpp files, so that you can see the final order of initialization. Make sure to put some small time-consuming operation in the constructor so that you don't get the same timestamp for each file.
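A possible shape for such a probe class is sketched below. Note that it swaps the timestamp for the position in a shared sequence, which avoids the problem of identical timestamps without needing an artificial delay. InitProbe, initSequence() and the labels are hypothetical names, and both probes sit in one file here only to keep the sketch self-contained.

```cpp
#include <string>
#include <vector>

// The log is a function-local static (construct on first use), so it is
// always ready no matter which probe is constructed first.
std::vector<std::string>& initSequence() {
    static std::vector<std::string> sequence;
    return sequence;
}

// Each probe records its label at the moment it is constructed.
struct InitProbe {
    explicit InitProbe(const char* label) { initSequence().push_back(label); }
};

// In a real project every .cpp file would define its own probe, for example:
//     static InitProbe probe(__FILE__);
static InitProbe firstProbe("first.cpp");
static InitProbe secondProbe("second.cpp");
```

Printing initSequence() at the start of main() then shows the order that this particular build and link produced; within a single file it simply follows the order of definition.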

Related

C++ Give compiler error for setting a (static) const global variable to another static const variable

Is it possible for clang to give a compiler error if you accidentally set a const global variable to another static const variable in C++ (in a different translation unit)?
Since the behaviour is pretty much undefined, it would be very useful to detect when this is done accidentally.
EDIT: My question is different from the one linked above, since I'm looking for a compiler warning/error message to force me NOT to assign any static global variable to another static variable. I basically want to be forced by the compiler to avoid the whole fiasco. I'm wondering if that's possible.
I don't think you can throw an error automatically.
I've had issues like that, and here's what I've done to fix them. You can add some global bool globalStaticsDone variable that's set to false at compile time, and on entry to main you need to set that variable to true.
Then, if you have some code anywhere that you suspect gets called from global ctors, you spice it up with assert(globalStaticsDone) (or a C++ throw if you prefer) to catch unexpected uses of these objects. Then you go and fix these uses.
In general, this is a common problem in complex projects, where some non-trivial objects are created as global statics and end up using other globals that might not have been initialized yet. The problem gets worse if your project is cross-platform and the order of compilation and linking differs between your target platforms. For example, iOS and Android builds might have these differences: in such cases the same code might be undefined behavior on one build and fine on another, leading to some mysterious errors.
As an alternative, some compilers might offer uninitialized read checks in debug builds.
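A sketch of the globalStaticsDone idea follows. It is a variation on the assert described above: instead of aborting on the first offender, it counts and logs early uses so that all of them show up in one run. The names checkNotInStaticInit(), lookupSetting() and cachedSetting are hypothetical.

```cpp
#include <cstdio>

// Both variables are constant-initialized, so they are already valid while
// global constructors and dynamic initializers are running.
bool globalStaticsDone = false;
int earlyUseCount = 0;

// Call this at the top of code that must not run during static initialization.
void checkNotInStaticInit(const char* where) {
    if (!globalStaticsDone) {
        ++earlyUseCount;
        std::fprintf(stderr, "used before main(): %s\n", where);
    }
}

// Hypothetical service that other globals might touch too early.
int lookupSetting() {
    checkNotInStaticInit("lookupSetting");
    return 42;
}

// A global whose initializer (perhaps accidentally) calls the service.
static int cachedSetting = lookupSetting();

// The first statement of main() is assumed to be:
//     globalStaticsDone = true;
```

With this in place, the initializer of cachedSetting is reported once on stderr before main() starts, while calls made after the flag has been set pass silently.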

Using constructor of static data to perform work before main()

Our system has a plugin-based architecture with each module effectively having a 'main' function. I need to have a small piece of code run before a module's main() is invoked. I've had success putting the code in the constructor of a dummy class, then declaring one static variable of that class, eg:
namespace {
    class Dummy {
    public:
        Dummy() { /* do work here */ }
    };
    Dummy theDummy;
}
void main() {...}
This seems to work well, but is it a valid solution in terms of the compiler guaranteeing the code will run? Is there any chance it could detect that theDummy is not referenced anywhere else in the system and compile/link it away completely, or will it realise that the constructor needs to run? Thanks
This seems to work well, but is it a valid solution in terms of the compiler guaranteeing the code will run? Is there any chance it could detect that theDummy is not referenced anywhere else in the system and compile/link it away completely, or will it realise that the constructor needs to run?
See n3797 S3.7.1/2:
If a variable with static storage duration has initialization or a destructor with side effects, it shall not be eliminated even if it appears to be unused,
Yes, the initialisation has to run. It cannot be simply omitted.
See S3.6.2/4:
It is implementation-defined whether the dynamic initialization of a non-local variable with static storage duration is done before the first statement of main. If the initialization is deferred to some point in time after the first statement of main, it shall occur before the first odr-use (3.2) of any function or variable defined in the same translation unit as the variable to be initialized.
Yes, the initialisation has to be completed before any code runs in the same translation unit.
The use of an entry point called main() in your plugin is of no particular importance.
You're good to go.
As per a comment, you do need to make sure that your Dummy constructor and your main function are in the same translation unit for this to work. If they were compiled separately and only linked together this guarantee would not apply.
Don't call your function main() unless it is a program entry point. If it is, then you are guaranteed that static object constructors will be called before main().
Once main has started, the constructor is guaranteed to run before any function or variable in the same translation unit is used.
So if, as here, it's in the same translation unit as main, then it's guaranteed to run before main. If it's in another translation unit, then it's implementation-defined whether it will be run before main. In the worst case, if the program doesn't use anything from that translation unit, it might not run at all.
In general, a compiler is allowed to optimize something out only if it can be sure that the semantics are the same. So if you call any function that it can't see into, for example, then it must assume that the function has side effects, and won't optimize the code out.
Note that you may have initialization order issues between translation units, however, since the initialization order of static objects between TUs is in general not guaranteed. It is, however, guaranteed that the constructor will be called before the "main" for your module is entered (assuming same TU). See Section 3.6.2 of the C++11 standard for full details.
If a platform-specific mechanism will work for you, look into using a function attribute, which is supported by g++ and clang++.

What is the "munch"? How it was used with cfront

What was the munch library (or program?) from the cfront package?
What was it used for?
Munch was used to scan the nm output and look for static constructors/destructors.
See the code (with comments) at SoftwarePreservation.com.
"Munching" is not specific to cfront.
In C++, your constructors and destructors are called implicitly based on their lifetime.
Global static objects have a lifetime of the whole program.
If you have multiple global static objects across various translation units,
who gets constructed first?
That is up to the compiler, and it is often dubbed The Static Initialization Order Fiasco.
Normally you wouldn't care about construction order, but if your objects have lifetime dependencies on each other, it can end up biting you. See the following FAQ on idiomatically avoiding SIOF.
With all that being said, the "muncher" is a C code generator.
You give all of your C++ object files to nm, which lists their symbol names,
and forward nm's output into the muncher.
*.o -> nm -> muncher -> *.c
Typical implementations will generate two function tables
(an array of function pointers to the constructors and destructors)
+ a bit of assembly to make sure they get called.
There are times when you want to tell the linker specifically what order to initialize,
and that's where the muncher comes in.
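The two function tables mentioned above could look roughly like the sketch below. All names are invented, and the functions take a log string only so that the call order can be observed; the real per-file constructor and destructor functions found in the nm output take no arguments, and the real glue was generated C plus a little assembly.

```cpp
#include <string>

typedef void (*InitFn)(std::string&);

// Stand-ins for the per-file static constructor and destructor functions that
// a muncher would discover by scanning the nm output.
static void ctorFileA(std::string& log) { log += "a"; }
static void ctorFileB(std::string& log) { log += "b"; }
static void dtorFileA(std::string& log) { log += "A"; }
static void dtorFileB(std::string& log) { log += "B"; }

// The generated tables: null-terminated arrays of function pointers.
// The destructor table is listed in the reverse of the constructor order.
static InitFn ctorTable[] = { ctorFileA, ctorFileB, nullptr };
static InitFn dtorTable[] = { dtorFileB, dtorFileA, nullptr };

static void runTable(InitFn* table, std::string& log) {
    for (InitFn* fn = table; *fn != nullptr; ++fn) {
        (*fn)(log);
    }
}

// Rough shape of program startup and shutdown around main().
std::string simulateProgram() {
    std::string log;
    runTable(ctorTable, log);  // before main
    log += "[main]";
    runTable(dtorTable, log);  // after main returns
    return log;
}
```

Because the tables are ordinary generated source, whoever generates them decides the order of the entries, which is the control over initialization order referred to above.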
Apparently it extracts initializers and finalizers.

C++ static global non-POD: theory and practice

I was reading the Qt coding conventions docs and came upon the following paragraph:
Anything that has a constructor or needs to run code to be initialized cannot be used as global object in library code, since it is undefined when that constructor/code will be run (on first usage, on library load, before main() or not at all). Even if the execution time of the initializer is defined for shared libraries, you’ll get into trouble when moving that code in a plugin or if the library is compiled statically.
I know what the theory says, but I don't understand the "not at all" part. Sometimes I use non-POD global const statics (e.g. QString) and it never occurred to me that they might not be initialized... Is this specific to shared objects / DLLs? Does this happen for broken compilers only?
What do you think about this rule?
The "not at all" part simply says that the C++ standard is silent about this issue. It doesn't know about shared libraries and thus doesn't say anything about the interaction of certain C++ features with them.
In practice, I have seen non-POD static globals used on Windows, OSX, and many versions of Linux and other Unices, both in GUI and command line programs, as plugins and as standalone applications. At least one project (which used non-POD static globals) had versions for the full set of all combinations of these. The only problem I have ever seen was that some very old GCC version generated code that called the dtors of such objects in dynamic libraries when the executable stopped, not when the library was unloaded. Of course, that was fatal (the library code was called when the library was already gone), but that was almost a decade ago.
But of course, this still doesn't guarantee anything.
If the static object is defined in an object that does not get referenced, the linker can prune the object completely, including the static initializer code. It will do so regularly for libs (that's how the libc does not get completely linked in when using parts of it under gnu, e.g.).
Interestingly, I don't think this is specific to libraries. It can probably happen for objects even in the main build.
I see no problem with having global objects with constructors.
They should just not have any dependency on other global objects in their constructor (or destructor).
But if they do have dependencies, then the object they depend on must either be in the same compilation unit or be lazily evaluated, so that you can force it to be evaluated before you use it.
The code in the constructor should also not depend on when it is executed (this is related to dependencies but not quite the same). But you are safe to assume that the object will be constructed at the very latest just before one of its methods is called, and C++ guarantees that the destruction order is the reverse of construction.
It's not so difficult to stick to these rules.
I don't think static object constructors can be elided. There is probably a confusion with the fact that a static library is often just a bunch of object files which are taken into the executable only if they are referenced. Some static objects are designed so that they aren't referenced outside their containing object file, and so that object file is put in the executable only if there is another dependency on it. That is a problem for some patterns (a static object which registers itself, for instance).
C++ doesn't define the order that static initializers execute for objects in different compilations units (the ordering is well defined within a compilation unit).
Consider the situation where you have 2 static objects A and B defined in different compilation units. Let's say that object B actually uses object A in its initialization.
In this scenario it's possible that B will be initialized first and make a call against an uninitialized A object. This might be one thing that is meant by "not at all": an object is being used when it hasn't had an opportunity to initialize itself first (even if it might be initialized later).
I suppose that dynamic linking might add complexities that I haven't thought of that cause an object to never be initialized. Either way, the bottom line is that static initialization introduces enough potential issues that it should be avoided where possible and very carefully handled where you have to use it.