What is the "munch"? How it was used with cfront - c++

What was the munch library (or program?) from cfront package?
What is was used for?

Munch was used to scan the nm output and look for static constructors/destructors.
See the code (with comments) at SoftwarePreservation.com.

"Munching" is not specific to cfront.
In C++, your constructors and destructors are called implicitly based on their lifetime.
Global static objects have a lifetime of the whole program.
If you have multiple global static objects across various translation units,
who gets constructed first?
That is up to the compiler, and it is often dubbed The Static Initialization Order Fiasco.
Normally you wouldn't care about construction order, but if your objects have lifetime dependencies on each other, it can end up biting you. See the following FAQ on idiomatically avoiding SIOF.
With all that being said, the "muncher" is a c code generator.
You give all of your c++ object files to the name mangler,
and forward the name mangler's output into the muncher.
*.o -> nm -> muncher -> *.c
Typical implementations will generate two function tables
(an array of function pointers to the constructors and destructors)
+ a bit of assembly to make sure they get called.
There are times when you want to tell the linker specifically what order to initialize,
and that's where the muncher comes in.

Apparently it extracts initializers and finalizers.

Related

Is it a fixed order that all global variables are initialized prior to main()? [duplicate]

C++ guarantees that variables in a compilation unit (.cpp file) are initialised in order of declaration. For number of compilation units this rule works for each one separately (I mean static variables outside of classes).
But, the order of initialization of variables, is undefined across different compilation units.
Where can I see some explanations about this order for gcc and MSVC (I know that relying on that is a very bad idea - it is just to understand the problems that we may have with legacy code when moving to new GCC major and different OS)?
As you say the order is undefined across different compilation units.
Within the same compilation unit the order is well defined: The same order as definition.
This is because this is not resolved at the language level but at the linker level. So you really need to check out the linker documentation. Though I really doubt this will help in any useful way.
For gcc: Check out ld
I have found that even changing the order of objects files being linked can change the initialization order. So it is not just your linker that you need to worry about, but how the linker is invoked by your build system. Even try to solve the problem is practically a non starter.
This is generally only a problem when initializing globals that reference each other during their own initialization (so only affects objects with constructors).
There are techniques to get around the problem.
Lazy initialization.
Schwarz Counter
Put all complex global variables inside the same compilation unit.
Note 1: globals:
Used loosely to refer to static storage duration variables that are potentially initialized before main().
Note 2: Potentially
In the general case we expect static storage duration variables to be initialized before main, but the compiler is allowed to defer initialization in some situations (the rules are complex see standard for details).
I expect the constructor order between modules is mainly a function of what order you pass the objects to the linker.
However, GCC does let you use init_priority to explicitly specify the ordering for global ctors:
class Thingy
{
public:
Thingy(char*p) {printf(p);}
};
Thingy a("A");
Thingy b("B");
Thingy c("C");
outputs 'ABC' as you'd expect, but
Thingy a __attribute__((init_priority(300))) ("A");
Thingy b __attribute__((init_priority(200))) ("B");
Thingy c __attribute__((init_priority(400))) ("C");
outputs 'BAC'.
Since you already know that you shouldn't rely on this information unless absolutely necessary, here it comes. My general observation across various toolchains (MSVC, gcc/ld, clang/llvm, etc) is that the order in which your object files are passed to the linker is the order in which they will be initialized.
There are exceptions to this, and I do not claim to all of them, but here are the ones I ran into myself:
1) GCC versions prior to 4.7 actually initialize in the reverse order of the link line. This ticket in GCC is when the change happened, and it broke a lot of programs that depended on initialization order (including mine!).
2) In GCC and Clang, usage of constructor function priority can alter the initialization order. Note that this only applies to functions that are declared to be "constructors" (i.e. they should be run just like a global object constructor would be). I have tried using priorities like this and found that even with highest priority on a constructor function, all constructors without priority (e.g. normal global objects, constructor functions without priority) will be initialized first. In other words, the priority is only relative to other functions with priorities, but the real first class citizens are those without priority. To make it worse, this rule is effectively the opposite in GCC prior to 4.7 due to point (1) above.
3) On Windows, there is a very neat and useful shared-library (DLL) entry-point function called DllMain(), which if defined, will run with parameter "fdwReason" equal to DLL_PROCESS_ATTACH directly after all global data has been initialized and before the consuming application has a chance to call any functions on the DLL. This is extremely useful in some cases, and there absolutely is not analogous behavior to this on other platforms with GCC or Clang with C or C++. The closest you will find is making a constructor function with priority (see above point (2)), which absolutely is not the same thing and won't work for many of the use cases that DllMain() works for.
4) If you are using CMake to generate your build systems, which I often do, I have found that the order of the input source files will be the order of their resultant object files given to the linker. However, often times your application/DLL is also linking in other libraries, in which case those libraries will be on the link line after your input source files. If you are looking to have one of your global objects be the very first one to initialize, then you are in luck and your can put the source file containing that object to be the first in the list of source files. However, if you are looking to have one be the very last one to initialize (which can effectively replicate DllMain() behavior!) then you can make a call to add_library() with that one source file to produce a static library, and add the resulting static library as the very last link dependency in your target_link_libraries() call for your application/DLL. Be wary that your global object may get optimized out in this case and you can use the --whole-archive flag to force the linker not to remove unused symbols for that specific tiny archive file.
Closing Tip
To absolutely know the resulting initialization order of your linked application/shared-library, pass --print-map to ld linker and grep for .init_array (or in GCC prior to 4.7, grep for .ctors). Every global constructor will be printed in the order that it will get initialized, and remember that the order is opposite in GCC prior to 4.7 (see point (1) above).
The motivating factor for writing this answer is that I needed to know this information, had no other choice but to rely on initialization order, and found only sparse bits of this information throughout other SO posts and internet forums. Most of it was learned through much experimentation, and I hope that this saves some people the time of doing that!
http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.12 - this link moves around. this one is more stable but you will have to look around for it.
edit: osgx supplied a better link.
A robust solution is to use a getter function that returns a reference to an static variable. A simple example is shown below, a complex variant in our SDG Controller middleware.
// Foo.h
class Foo {
public:
Foo() {}
static bool insertIntoBar(int number);
private:
static std::vector<int>& getBar();
};
// Foo.cpp
std::vector<int>& Foo::getBar() {
static std::vector<int> bar;
return bar;
}
bool Foo::insertIntoBar(int number) {
getBar().push_back(number);
return true;
}
// A.h
class A {
public:
A() {}
private:
static bool a1;
};
// A.cpp
bool A::a1 = Foo::insertIntoBar(22);
The initialization would being with the only static member variable bool A::a1. This would then call Foo::insertIntoBar(22). This would then call Foo::getBar() in which the initialization of the static std::vector<int> variable would occur before returning a reference to the initialized object.
If the static std::vector<int> bar were placed directly as a member variable of the Foo class, there would be a possibility, depending on the naming ordering of the source files, that bar would be initialized after insertIntoBar() were called, thereby crashing the program.
If multiple static member variables would call insertIntoBar() during their initialization, the order would not be dependent on the names of the source files, i.e., random, but the std::vector<int> would be guaranteed to be initialized before any values be inserted into it.
In addition to Martin's comments, coming from a C background, I always think of static variables as part of the program executable, incorporated and allocated space in the data segment. Thus static variables can be thought of as being initialised as the program loads, prior to any code being executed. The exact order in which this happens can be ascertained by looking at the data segment of map file output by the linker, but for most intents and purposes the initialisation is simultaeneous.
Edit: Depending on construction order of static objects is liable to be non-portable and should probably be avoided.
If you really want to know the final order I would recommend you to create a class whose constructor logs the current timestamp and create several static instances of the class in each of your cpp files so that you could know the final order of initialization. Make sure to put some little time consuming operation in the constructor just so you don't get the same time stamp for each file.

How function declared __declspec(naked) stores local variables?

__declspec(naked) void printfive() {
int i = 5;
printf("%i\n", i);
}
for some reason this code works, but I do not understand where the i is stored? In the frame of the calling function? It becomes global variable? If it is stored in the caller's frame, then how compiler knows the displacement, because you can call printfive() from different functions with different frame size and local variables. If it is global, or something like static maybe, I have tried to go recursive and I can see that variable is not changed, it is not truly local indeed. But that's obvious, there is no entry code (prolog). Ok, I understand, no prolog, no frame, no register change, but this is values, but what happens to the scope? Is the behaviour of this specifier defined in any reference? Is this part of C++ standard anyway? This sort of functions are great if you mostly use asm {} inside them, (or use asm to call them and want to be sure the function is not over optimized), but you can mix with C++. But this is sort of brain twister.
I know this topic is more than several years old, Here are my own answers.
Since no references are made to Microsoft documentation regarding this topic, for those who care to learn more about what is needed or not as stated by Keltar, here is the Microsoft documentation that explains most of what Keltar didn't explain here.
According to Microsoft documentation, and should be avoided.
The following rules and limitations apply to naked functions:
The return statement is not permitted.
Structured Exception Handling and C++ Exception Handling constructs are not permitted because they must unwind across the stack frame.
For the same reason, any form of setjmp is prohibited
Use of the _alloca function is prohibited.
To ensure that no initialization code for local variables appears before the prolog sequence, initialized local variables are not
permitted at function scope. In particular, the declaration of C++
objects are not permitted at function scope. There may, however, be
initialized data in a nested scope.
Frame pointer optimization (the /Oy compiler option) is not recommended, but it is automatically suppressed for a naked function.
You cannot declare C++ class objects at the function lexical scope. You can, however, declare objects in a nested block.
From gcc manual:
Use this attribute ... to indicate that the specified function does
not need prologue/epilogue sequences generated by the compiler. It is
up to the programmer to provide these sequences. The only statements
that can be safely included in naked functions are asm statements that
do not have operands. All other statements, including declarations of
local variables, if statements, and so forth, should be avoided. Naked
functions should be used to implement the body of an assembly
function, while allowing the compiler to construct the requisite
function declaration for the assembler.
And it isn't standard (as well as any __declspec or __attribute__)
When entering or exiting a function, the compiler adds code to help with the passing or parameters. When a function is declared naked, non of the parameter variables assignment code is generated, if you want to get at any of the parameters, you will need to directly access the relevant registers or stack (depending on the ABI defined calling convention).
In your case, you are not passing parameters to the function, therefore your code works even though the function is declared naked. Take a look at the dis-assembler if you want to see the difference.

Is it possible to strip type names from executable while keeping RTTI enabled?

I recently disabled RTTI on my compiler (MSVC10) and the executable size decreased significantly. By comparing the produced executables using a text editor, I found that the RTTI-less version contains much less symbol names, explaining the saved space.
AFAIK, those symbol names are only used to fill the type_info structure associated with each the polymorphic type, and one can programmatically access them calling type_info::name().
According to the standard, the format of the string returned by type_info::name() is unspecified. That is, no one can rely one it to do serious things portably. So, it should be possible for an implementation to always return an empty string without breaking anything, thus reducing the executable size without disabling RTTI support (so we can still use the typeid operator & compare type_info's objects safely).
But... is it possible ? I'm using MSVC10 and I've not found any option to do that. I can either disable completely RTTI (/GR-), or enable it with full type names (/GR). Does any compiler provide such an option?
So, it should be possible for an implementation to always return an empty string without breaking anything, thus reducing the executable size without disabling RTTI support (so we can still use the typeid operator & compare type_info's objects safely).
You are misreading the standard. The intent of making the return value from type_info::name() unspecified (other than a null-terminated binary string) was to give the implementers of the compiler/library/run-time environment free reign to implement the RTTI requirements as they see best. You, the programmer, have no say in how the Application Binary Interface (if there is one) is designed or implemented.
You're asking three different questions here.
The initial question asks whether there's any way to get MSVC to not generate names, or whether it's possible with other compilers, or, failing that, whether there's any way to strip the names out of the generated type_info without breaking things.
Then you want to know whether it would be possible to modify the MS ABI (presumably not too radically) so that it would be possible to strip the names.
Finally, you want to know whether it would be possible to design an ABI that didn't have names.
Question #1 is itself a complex question. As far as I know, there's no way to get MSVC to not generate names. And most other compilers are aimed at ABIs that specifically define what typeid(foo).name() must return, so they also can't be made to not generate names.
The more interesting question is, what happens if you strip out the names. For MSVC, I don't know the answer. The best thing to do here is probably to try it—go into your DLLs and change the first character of each name to \0 and see if it breaks dynamic_cast, etc. (I know that you can do this with Mac and linux x86_64 executables generated by g++ 4.2 and it works, but let's put that aside for now.)
On to question #2, assuming blanking the names doesn't work, it wouldn't be that hard to modify a name-based system to no longer require names. One trivial solution is to use hashes of the names, or even ROT13-encoded names (remember that the original goal here is "I don't want casual users to see the embarrassing names of my classes"). But I'm not sure that would count for what you're looking for. A slightly more complex solution is as follows:
For "dllexport"ed classes, generate a UUID, put that in the typeinfo, and also put it in the .LIB import library that gets generated along with the DLL.
For "dllimport"ed classes, read the UUID out of the .LIB and use that instead.
So, if you manage to get the dllexport/dllimport right, it will work, because your exe will be using the same UUID as the dll. But what if you don't? What if you "accidentally" specify identical classes (e.g., an instantiation of the same template with the same parameters) in your DLL and your EXE, without marking one as dllexport and one as dllimport? RTTI won't see them as the same type.
Is this a problem? Well, the C++ standard doesn't say it is. And neither does any MS documentation. In fact, the documentation explicitly says that you're not allowed to do this. You cannot use the same class or function in two different modules unless you explicitly export it from one module and import it into another. The fact that this is very hard to do with class templates is a problem, and it's a problem they don't try to solve.
Let's take a realistic example: Create a node-based linkedlist class template with a global static sentinel, where every list's last node points to that sentinel, and the end() function just returns a pointer to it. (Microsoft's own implementation of std::map used to do exactly this; I'm not sure if that's still true.) New up a linkedlist<int> in your exe, and pass it by reference to a function in your dll that tries to iterate from l.begin() to l.end(). It will never finish, because none of the nodes created by the exe will point to the copy of the sentinel in the dll. Of course if you pass l.begin() and l.end() into the DLL, instead of passing l itself, you won't have this problem. You can usually get away with passing a std::string or various other types by reference, just because they don't depend on anything that breaks. But you're not actually allowed to do so, you're just getting lucky. So, while replacing the names with UUIDs that have to be looked up at link time means types can't be matched up at link-loader time, the fact that types already can't be matched up at link-loader time means this is irrelevant.
It would be possible to build a name-based system that didn't have these problems. The ARM C++ ABI (and the iOS and Android ABIs based on it) restricts what programmers can get away with much less than MS, and has very specific requirements on how the link-loader has to make it work (3.2.5). This one couldn't be modified to not be name-based because it was an explicit choice in the design that:
• type_info::operator== and type_info::operator!= compare the strings returned by type_info::name(), not just the pointers to the RTTI objects and their names.
• No reliance is placed on the address returned by type_info::name(). (That is, t1.name() != t2.name() does not imply that t1 != t2).
The first condition effectively requires that these operators (and type_info::before()) must be called out of line, and that the execution environment must provide appropriate implementations of them.
But it's also possible to build an ABI that doesn't have this problem and that doesn't use names. Which segues nicely to #3.
The Itanium ABI (used by, among other things, both OS X and recent linux on x86_64 and i386) does guarantee that a linkedlist<int> generated in one object and a linkedlist<int> generated from the same header in another object can be linked together at runtime and will be the same type, which means they must have equal type_info objects. From 2.9.1:
It is intended that two type_info pointers point to equivalent type descriptions if and only if the pointers are equal. An implementation must satisfy this constraint, e.g. by using symbol preemption, COMDAT sections, or other mechanisms.
The compiler, linker, and link-loader must work together to make sure that a linkedlist<int> created in your executable points to the exact same type_info object that a linkedlist<int> created in your shared object would.
So, if you just took out all the names, it wouldn't make any difference at all. (And this is pretty easily tested and verified.)
But how could you possibly implement this ABI spec? j_kubik effectively argues that it's impossible because you'd have to preserve some link-time information in the .so files. Which points to the obvious answer: preserve some link-time information in the .so files. In fact, you already have to do that to handle, e.g., load-time relocations; this just extends what you need to preserve. And in fact, both Apple and GNU/linux/g++/ELF do exactly that. (This is part of the reason everyone building complex linux systems had to learn about symbol visibility and vague linkage a few years ago.)
There's an even more obvious way to solve the problem: Write a C++-based link loader, instead of trying to make the C++ compiler and linker work together to trick a C-based link loader. But as far as I know, nobody's tried that since Be.
Requirements for type-descriptor:
Works correctly in multi compilation-unit and shared-library environment;
Works correctly for different versions of shared libraries;
Works correctly although different compilation units don't share any information about type, except it's name: usually one header is used for all compilation units to define same type, but it's not required; even if, it doesn't affect resulting object file.
Work correctly despite fact that template instantiations must be fully-defined (so including type_info data) in every library that uses them, and yet behave like one type if several such libs are used together.
The fourth rule essentially bans all non-name based type-descriptors like UUIDs (unless specifically mentioned in type definition, but that is just name-replacement at best, and probably requires standard-alterations).
Stroing thuse UUIDs in separate files like suggeste .LIB files also causes trouble: different library versions implementing new types would cause trouble.
Compilation units should be able to share the same type (and its type_info) without the need to involve linker - because it should stay free of any language-specifics.
So type-name can be only unique type descriptor without completely re-modeling compilation and linking (also dynamic). I could imagine it working, but not under current scheme.

C++ static global non-POD: theory and practice

I was reading the Qt coding conventions docs and came upon the following paragraph:
Anything that has a constructor or needs to run code to be initialized cannot be used as global object in library code, since it is undefined when that constructor/code will be run (on first usage, on library load, before main() or not at all). Even if the execution time of the initializer is defined for shared libraries, you’ll get into trouble when moving that code in a plugin or if the library is compiled statically.
I know what the theory says, but I don't understand the "not at all" part. Sometimes I use non-POD global const statics (e.g: QString) and it never occured to me that they might not be initialized... Is this specific to shared objects / DLLs? Does this happen for broken compilers only?
What do you think about this rule?
The "not at all" part simply says that the C++ standard is silent about this issue. It doesn't know about shared libraries and thus doesn't says anything about the interaction of certain C++ features with these.
In practice, I have seen global non-POD static globals used on Windows, OSX, and many versions of Linux and other Unices, both in GUI and command line programs, as plugins and as standalone applications. At least one project (which used non-POD static globals) had versions for the full set of all combinations of these. The only problem I have ever seen was that some very old GCC version generated code that called the dtors of such objects in dynamic libraries when the executable stopped, not when the library was unloaded. Of course, that was fatal (the library code was called when the library was already gone), but that has been almost a decade ago.
But of course, this still doesn't guarantee anything.
If the static object is defined in an object that does not get referenced, the linker can prune the object completely, including the static initializer code. It will do so regularly for libs (that's how the libc does not get completely linked in when using parts of it under gnu, e.g.).
Interestingly, I don't think this is specific to libraries. It can probably happen for objects even in the main build.
I see no problem with having global objects with constructors.
They should just not have any dependency on other global objects in their constructor (or destructor).
But if they do have dependencies then the dependent object must either by in the same compilation unit or be lazily evaluated so you can force it to be evaluated before you use it.
The code in the constructor should also not be dependent on when (this is related to dependencies but not quite the same) it is executed, But you are safe to assume that it will get constructed at the very least (just before a method is called) and C++ guarantees the destruction order is the reverse of instantiation.
Its not so difficult to stick to these rules.
I don't think static objects constructors can be elided. There is probably a confusion with the fact that a static library is often just a bunch of objects which are token in the executable if they are referenced. Some static objects are designed so that they aren't referenced outside their containing object, and so the object file is put in the executable only if there is another dependency on them. This is not the case in some patterns (using a static object which register itself for instance).
C++ doesn't define the order that static initializers execute for objects in different compilations units (the ordering is well defined within a compilation unit).
Consider the situation where you have 2 static objects A and B defined in different compilation units. Let's say that object B actually uses object A in it's initialization.
In this scenario it's possible that B will be initialized first and make a call against an uninitialized A object. This might be one thing that is meant by "not at all" - an object is being used when it hasn't had an opportunity to initialize it self first (even if it might be initialized later).
I suppose that dynamic linking might add complexities that I haven't thought of that cuase an object to never be initialized. Either way, the bottom line is that static initializatino introduces enough potential issues that it should be avoided where possible and very carefully handled where you have to use it.

Static variables initialisation order

C++ guarantees that variables in a compilation unit (.cpp file) are initialised in order of declaration. For number of compilation units this rule works for each one separately (I mean static variables outside of classes).
But, the order of initialization of variables, is undefined across different compilation units.
Where can I see some explanations about this order for gcc and MSVC (I know that relying on that is a very bad idea - it is just to understand the problems that we may have with legacy code when moving to new GCC major and different OS)?
As you say the order is undefined across different compilation units.
Within the same compilation unit the order is well defined: The same order as definition.
This is because this is not resolved at the language level but at the linker level. So you really need to check out the linker documentation. Though I really doubt this will help in any useful way.
For gcc: Check out ld
I have found that even changing the order of objects files being linked can change the initialization order. So it is not just your linker that you need to worry about, but how the linker is invoked by your build system. Even try to solve the problem is practically a non starter.
This is generally only a problem when initializing globals that reference each other during their own initialization (so only affects objects with constructors).
There are techniques to get around the problem.
Lazy initialization.
Schwarz Counter
Put all complex global variables inside the same compilation unit.
Note 1: globals:
Used loosely to refer to static storage duration variables that are potentially initialized before main().
Note 2: Potentially
In the general case we expect static storage duration variables to be initialized before main, but the compiler is allowed to defer initialization in some situations (the rules are complex see standard for details).
I expect the constructor order between modules is mainly a function of what order you pass the objects to the linker.
However, GCC does let you use init_priority to explicitly specify the ordering for global ctors:
class Thingy
{
public:
Thingy(char*p) {printf(p);}
};
Thingy a("A");
Thingy b("B");
Thingy c("C");
outputs 'ABC' as you'd expect, but
Thingy a __attribute__((init_priority(300))) ("A");
Thingy b __attribute__((init_priority(200))) ("B");
Thingy c __attribute__((init_priority(400))) ("C");
outputs 'BAC'.
Since you already know that you shouldn't rely on this information unless absolutely necessary, here it comes. My general observation across various toolchains (MSVC, gcc/ld, clang/llvm, etc) is that the order in which your object files are passed to the linker is the order in which they will be initialized.
There are exceptions to this, and I do not claim to all of them, but here are the ones I ran into myself:
1) GCC versions prior to 4.7 actually initialize in the reverse order of the link line. This ticket in GCC is when the change happened, and it broke a lot of programs that depended on initialization order (including mine!).
2) In GCC and Clang, usage of constructor function priority can alter the initialization order. Note that this only applies to functions that are declared to be "constructors" (i.e. they should be run just like a global object constructor would be). I have tried using priorities like this and found that even with highest priority on a constructor function, all constructors without priority (e.g. normal global objects, constructor functions without priority) will be initialized first. In other words, the priority is only relative to other functions with priorities, but the real first class citizens are those without priority. To make it worse, this rule is effectively the opposite in GCC prior to 4.7 due to point (1) above.
3) On Windows, there is a very neat and useful shared-library (DLL) entry-point function called DllMain(), which if defined, will run with parameter "fdwReason" equal to DLL_PROCESS_ATTACH directly after all global data has been initialized and before the consuming application has a chance to call any functions on the DLL. This is extremely useful in some cases, and there absolutely is not analogous behavior to this on other platforms with GCC or Clang with C or C++. The closest you will find is making a constructor function with priority (see above point (2)), which absolutely is not the same thing and won't work for many of the use cases that DllMain() works for.
4) If you are using CMake to generate your build systems, which I often do, I have found that the order of the input source files will be the order of their resultant object files given to the linker. However, often times your application/DLL is also linking in other libraries, in which case those libraries will be on the link line after your input source files. If you are looking to have one of your global objects be the very first one to initialize, then you are in luck and your can put the source file containing that object to be the first in the list of source files. However, if you are looking to have one be the very last one to initialize (which can effectively replicate DllMain() behavior!) then you can make a call to add_library() with that one source file to produce a static library, and add the resulting static library as the very last link dependency in your target_link_libraries() call for your application/DLL. Be wary that your global object may get optimized out in this case and you can use the --whole-archive flag to force the linker not to remove unused symbols for that specific tiny archive file.
Closing Tip
To absolutely know the resulting initialization order of your linked application/shared-library, pass --print-map to ld linker and grep for .init_array (or in GCC prior to 4.7, grep for .ctors). Every global constructor will be printed in the order that it will get initialized, and remember that the order is opposite in GCC prior to 4.7 (see point (1) above).
The motivating factor for writing this answer is that I needed to know this information, had no other choice but to rely on initialization order, and found only sparse bits of this information throughout other SO posts and internet forums. Most of it was learned through much experimentation, and I hope that this saves some people the time of doing that!
http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.12 - this link moves around. this one is more stable but you will have to look around for it.
edit: osgx supplied a better link.
A robust solution is to use a getter function that returns a reference to an static variable. A simple example is shown below, a complex variant in our SDG Controller middleware.
// Foo.h
class Foo {
public:
Foo() {}
static bool insertIntoBar(int number);
private:
static std::vector<int>& getBar();
};
// Foo.cpp
std::vector<int>& Foo::getBar() {
static std::vector<int> bar;
return bar;
}
bool Foo::insertIntoBar(int number) {
getBar().push_back(number);
return true;
}
// A.h
class A {
public:
A() {}
private:
static bool a1;
};
// A.cpp
bool A::a1 = Foo::insertIntoBar(22);
The initialization would being with the only static member variable bool A::a1. This would then call Foo::insertIntoBar(22). This would then call Foo::getBar() in which the initialization of the static std::vector<int> variable would occur before returning a reference to the initialized object.
If the static std::vector<int> bar were placed directly as a member variable of the Foo class, there would be a possibility, depending on the naming ordering of the source files, that bar would be initialized after insertIntoBar() were called, thereby crashing the program.
If multiple static member variables would call insertIntoBar() during their initialization, the order would not be dependent on the names of the source files, i.e., random, but the std::vector<int> would be guaranteed to be initialized before any values be inserted into it.
In addition to Martin's comments, coming from a C background, I always think of static variables as part of the program executable, incorporated and allocated space in the data segment. Thus static variables can be thought of as being initialised as the program loads, prior to any code being executed. The exact order in which this happens can be ascertained by looking at the data segment of map file output by the linker, but for most intents and purposes the initialisation is simultaeneous.
Edit: Depending on construction order of static objects is liable to be non-portable and should probably be avoided.
If you really want to know the final order I would recommend you to create a class whose constructor logs the current timestamp and create several static instances of the class in each of your cpp files so that you could know the final order of initialization. Make sure to put some little time consuming operation in the constructor just so you don't get the same time stamp for each file.