C++ compilers implementing dynamic initialization after main

The C++ standard section 3.6.2 paragraph 3 states that it is implementation-defined whether dynamic initialization of non-local objects occurs after the first statement of main().
Does anyone know what the rationale for this is, and which compilers postpone non-local object initialization this way? I am most familiar with g++, which performs these initializations before main() has been entered.
This question is related: Dynamic initialization phase of static variables
But I'm specifically asking what compilers are known to behave this way.
It may be that the only rationale for this paragraph is to support dynamic libraries loaded at runtime, but I do not think that the standard takes dynamic loading issues into consideration.

One of the reasons may be the following:
static char data[1000000000];   // a very large static array (illustrative size)
int main(int argc, char* argv[])
{
    if (argc > 0)
        data[0] = 0;   // the array is only touched on this path
}
It might be reasonable to allocate and initialize this static array only when it turns out that it is really needed. Perhaps some application ran into something similar and had enough of a voice to convince the committee. In my own experience with C#, I came across a situation where the static members of a class were not allocated right after the class was JIT-compiled; instead, they were allocated one by one, on first use. In that case there was absolutely no justification for doing that. It was a plain disaster. Perhaps this has been fixed since.
Other reasons are possible also.

From the C++11 draft:
It is implementation-defined whether the dynamic initialization of a non-local variable with static storage duration is done before the first statement of main. If the initialization is deferred to some point in time after the first statement of main, it shall occur before the first odr-use (3.2) of any function or variable defined in the same translation unit as the variable to be initialized. [emphasis mine]
That is, the static variable has to be initialized before any use of anything defined in the same translation unit.
It looks to me like it is done this way to allow dynamic libraries (DLLs or SOs) to be loaded and initialized lazily, or even dynamically (by calling dlopen, LoadLibrary, or the like).
It is obvious that a variable defined in a DLL cannot be initialized before the DLL itself is loaded.
Naturally, C++ knows nothing about DLLs, so there is no direct mention of them in the standard. But the people on the committee do know about real environments and compilers, and certainly know about DLLs. Without this clause, lazily loading a DLL would technically violate the C++ specification. (Not that this would stop implementers from doing it anyway, but it is better if we all try to go along with each other.)
As for which systems support this: as far as I know, at least the MS Visual C++ compiler supports lazy dynamic linking (the DLL is not even loaded until first use), and most modern platforms support loading a DLL dynamically.

Related

The 'cppcoreguidelines-interfaces-global-init' goes wrong just in a specific scenario

Given the cppcoreguidelines-interfaces-global-init, specifically "initializing non-local variable with non-const expression depending on uninitialized non-local variable", which is exemplified here, I have the following scenario:
My team consists of 4 devs
We all have the same environment: VS2015
Everybody has the same VS project options
Our hardware is slightly different.
Then I found a file-local static like the one below, where the warning above does point at a bad initialization.
static int GlobalScopeBadInit1 = ExternGlobal;
So far, so good - this is a bad init which might go wrong and we need to fix it.
The problem is: why does it go wrong only on my machine? No matter what we try, DEBUG or RELEASE, it only happens on my machine. We already cleaned up and deleted the build files on the other devs' machines, and the code above goes wrong 100% of the time on my machine and 0% of the time on another dev's.
It doesn't happen on build machine either.
Does anybody know what could explain that behavior?
Thanks.
The issue that you are facing is caused by the static initialization order fiasco, which says in short that the order of construction and destruction of static and global variables across different compilation units is unspecified.
The fact that in your case the problem occurs (i.e. the order of initialization causes a problem) only in a certain environment and not in another is exactly what unspecified order means: it can be affected by the order in which objects are compiled and linked, by compiler optimizations, and by other considerations that may be related to the environment. Initializations may even run concurrently in different threads (see: C++ spec [stmt.dcl] and a reason for why and when that may happen at the proper section in the original working doc dealing with that issue).
Is there a solution for the static initialization order fiasco?
Yes, there are several possible solutions.
The first solution may be a redesign.
Redesign option 1 - change the code so you would not have more than a single global object.
You may handle all the other "globals" inside the single actual global object. There are libraries that handle singletons this way, managing all singletons under the hood inside a single global SingletonManager. But since such a change may require quite a lot of code changes, with the accompanying risks, you may need to consider the other options.
Redesign option 2 - use static or global functions instead of global variables. Once your globals are retrieved from functions like the ones below, the order of initialization is solved:
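A minimal sketch of option 1, assuming two hypothetical former globals (Config and Server) where Server depends on Config. Non-static data members are always constructed in declaration order, so inside the single global the dependency is well defined:

```cpp
#include <string>

// Hypothetical "former globals": Server depends on Config.
struct Config {
    std::string host = "localhost";
    int port = 8080;
};

struct Server {
    explicit Server(const Config& cfg)
        : address(cfg.host + ":" + std::to_string(cfg.port)) {}
    std::string address;
};

// The only global object. Members are initialized in declaration order,
// so config is fully constructed before server reads it.
struct Globals {
    Config config;
    Server server{config};
};

Globals globals;
```

Code elsewhere then reaches the former globals as globals.config and globals.server; the cross-unit ordering problem is reduced to a single object.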
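// Placeholder types, assumed for illustration.
struct Foo { explicit Foo(int v) : value(v) {} int value; };
struct Boo { explicit Boo(Foo& f) : foo(f) {} Foo& foo; };

Foo& get_global_foo() {
    static Foo f(42);                 // constructed on the first call
    return f;
}

Boo& get_global_boo() {
    static Boo b(get_global_foo());   // Foo is guaranteed to exist by now
    return b;
}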
You may compare this code example facing the initialization order fiasco to one which solves the issue with static methods. This approach is sometimes called the "Meyers Singleton", after Scott Meyers, who discussed it in his book "More Effective C++".
More on that approach can be found here and here.
Redesign option 3 - a simpler redesign approach would be to move all global variables into a single compilation unit. This approach requires fewer changes to your code and is probably less risky, but it is not always possible.
Another solution is to use compiler-specific options to set the order of initialization. In Visual Studio, the order of static and global initializations can be controlled manually with init_seg, a Visual Studio-specific pragma. See: MSDN documentation and this blog post.
GCC also has its own attribute for that purpose: init_priority.
The last option is the best but also the most complicated: you may follow the way the C++ library does the trick for std::cout, which is a global object yet is guaranteed to be initialized whenever you need it, even in a global context. This is done with the nifty counter idiom. You may read more about this idiom in the C++ Idioms wiki and also at this SO question. Note that the nifty counter idiom does not cover every use of std::cout in a global context, and in quite rare cases it needs "help" to work correctly, with a statement like: static std::ios_base::Init force_init; see this SO post on that.
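A small sketch of why option 3 works, with hypothetical variable names: within one translation unit, ordered dynamic initialization follows the order of definition, so each global may safely use the ones defined above it.

```cpp
#include <string>

// All interdependent globals (hypothetical names) live in this one
// translation unit. Within a TU, ordered dynamic initialization follows
// the order of definition, so each line may use the ones above it.
std::string scheme = "https://";
std::string host = scheme + "example.com";
std::string api_url = host + "/api";
```

Moving api_url into another .cpp file would bring back the fiasco, because its initializer would then read host from a unit whose initialization order is unspecified relative to its own.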
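A minimal GCC/Clang sketch of init_priority, assuming a hypothetical Tracker type that records construction order. The attribute applies to namespace-scope objects of class type; user priorities run from 101 to 65535, and lower numbers are constructed earlier, regardless of definition order:

```cpp
// GCC/Clang-specific. Objects with a lower init_priority value are
// constructed before objects with a higher one.
int construction_counter = 0;  // constant-initialized: ready before any constructor

struct Tracker {
    int order;
    Tracker() : order(++construction_counter) {}
};

// Defined first, but with the larger number, so it is constructed second.
Tracker constructed_second __attribute__((init_priority(2000)));

// Defined second, but with the smaller number, so it is constructed first.
Tracker constructed_first __attribute__((init_priority(1000)));
```

Because the priorities are handled by the linker, the same numbers can order objects across translation units, which is what makes the attribute useful against the fiasco; the trade-off is that the code is no longer portable to MSVC.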
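A minimal single-file sketch of the nifty counter idiom, guarding a hypothetical shared vector of strings. The "header" and "source" parts are marked in comments, and the names registry, RegistryInit and nifty_counter are assumptions for illustration. To stay within standard C++17, it exposes the object through an accessor (using std::launder) rather than as a named global object the way the standard library does for std::cout:

```cpp
#include <memory>
#include <new>
#include <string>
#include <vector>

// ---- would normally live in a header such as "registry.h" ----
using Registry = std::vector<std::string>;
Registry& registry();  // accessor for the shared object

struct RegistryInit {  // the nifty (Schwarz) counter helper
    RegistryInit();
    ~RegistryInit();
};
// Every TU that includes the header gets its own instance, defined before
// that TU's own globals, so the registry exists before any of them use it.
static RegistryInit registry_init;

// ---- would normally live in "registry.cpp" ----
namespace {
int nifty_counter;  // zero-initialized before any dynamic initialization
alignas(Registry) unsigned char registry_storage[sizeof(Registry)];  // raw memory only
}  // namespace

Registry& registry() {
    return *std::launder(reinterpret_cast<Registry*>(registry_storage));
}

RegistryInit::RegistryInit() {
    if (nifty_counter++ == 0) {
        ::new (static_cast<void*>(registry_storage)) Registry();  // first user constructs
    }
}

RegistryInit::~RegistryInit() {
    if (--nifty_counter == 0) {
        std::destroy_at(&registry());  // last user destroys
    }
}
```

The storage is a plain byte array with no constructor, so nothing overwrites the object the counter built; the counter is not thread-safe, which is acceptable only as long as static initialization is single-threaded.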
For an additional discussion of the issue see also: Static variables initialisation order
The static initialization order fiasco, from which you are suffering, occurs when an object in one translation unit may rely on the data from another unit that has not yet been, or is in the process of being, initialized.
Dynamic initializations of non-local static variables in different translation units are indeterminately sequenced with respect to each other. There is no guarantee about the order in which they are initialized, or even that they are initialized on the same thread.
There are a few possible conditions I can think of that may result in you experiencing it while others are not. Factors like hardware, link-order, timing, etc. may all play a role. Some possible cases:
Since dynamic initialization is indeterminately sequenced and not even guaranteed to happen on a single thread, if initialization is multi-threaded on startup, then differing hardware may introduce different initialization timing (a race condition).
It could just be that there is a difference in your development environment on your system. Something as small as having a library found in a different location may disrupt the library load-order, which may affect the order in which things are initialized.
Assuming your system is performing parallel-compilation, it's possible that your hardware is compiling objects in a different order, which has affected the order these objects get linked in. The link-order may change initialization order -- which may cause you to experience an issue where someone else doesn't.
Ultimately, why you are experiencing this when others are not doesn't actually matter. Formally, you are experiencing undefined behavior, and how it behaves and who it discriminates against cannot be reasoned about.
Note: Without a lot more information about your system setup, the best that can be provided is guesswork as to why this is occurring.

Global objects are inherently unsafe?

I know that the order of initialization of static variables defined in different translation units (e.g. different cpp/lib/dll/so files) is undefined. Does it mean that the behavior of following program is not well defined?
#include <vector>
std::vector<int> v;
int main()
{
v.push_back(1);
}
EDIT: Here I used an STL vector as an example, but it could be an object of any other "3rd party" class. As such, we wouldn't know whether that object is initialized via some other global variable. This means that in C++ it is not safe to create even a single global object with a nontrivial constructor. Right?
No, because when you use v in main, it is perfectly defined. The static initialization phase takes place before you use v in main ...
The problem arises if you use 2 globals in different translation units and there is a dependency between the two. See this C++ FAQ lite for an explanation. The next items in the FAQ explain how to avoid the 'fiasco'.
The problem of static initialization makes globals worse in C++ than in any other language. Good library writers know the problem and avoid the static initialization order fiasco. And even if not, if the library is widely used, someone will hit the problem and, I hope, fix it. But 3rd party libs are not always well written; they can be libraries written in your company by a programmer new to C++ ...
So, yes, it is unsafe, you're right. And in C++, avoid globals even more than in other languages!
Note: Columbo has pointed out that the standard does not exactly say that v is initialized before entering main (see his answer). There is no practical difference in your case.
It's specified in [basic.start.init]/4:
It is implementation-defined whether the dynamic initialization of a
non-local variable with static storage duration is done before the
first statement of main. If the initialization is deferred to some
point in time after the first statement of main, it shall occur before
the first odr-use (3.2) of any function or variable defined in the
same translation unit as the variable to be initialized.
It is therefore defined that v is initialized before its first use in any function of this translation unit, including main. That implies that in this particular program v is initialized before the first statement of main.
The static initialization order fiasco occurs when multiple variables in different translation units depend on their relative order of initialization; the initializations may be indeterminately sequenced with respect to each other, depending on how they are initialized.
Since there's only one global object being defined, there can be only one ordering of its initialization, and therefore there is no issue.

Static member variable for class that is dynamically loaded

If I were to load up some symbols using something like dlopen in C++ while other classes in that translation unit had static member variables what exactly is the behavior of those static member variables. Do they get initialized or no because the library isn't really loaded just the symbols that you looked up (which I'm thinking the latter is not true because if the symbol you looked up needs those they need to be loaded too)?
In short, there's no guarantee that static variables that cannot be initialized at compile time will be initialized any earlier than the first reference to an externally visible function or variable in the same translation unit. This is true even for static linking. As for trying to get static variables in dynamically loaded libraries to initialize upon loading, my experience is that you'll often get lucky, especially for small programs, but fundamentally this is undefined behavior and should not be relied on. The resulting bugs are unpredictable, difficult to reproduce, and highly system-specific.
First, some standardese and an explanation of why this is undefined behavior and then some workarounds.
The word static is unfortunately overloaded in the Standard so bear with me. The Standard makes reference to both static storage duration and static initialization. The types of storage duration defined by the Standard are static, thread, automatic, and dynamic. They are as they sound. Static storage duration means that the lifetime of such a variable is the entire duration of the program.
Static initialization is a distinct concept. Although a variable may be stored only once per program execution, the value it will be initialized with may not be known when the program starts. At the start of the program, all variables with static storage duration are zero-initialized, and those that can be are then constant-initialized. The fine points are in §3.6.2 but, roughly, a static variable will be constant-initialized if its initialization relies only on constant expressions. Together, zero initialization and constant initialization are termed static initialization. The counterpart is dynamic initialization. These are the interesting ones, but unfortunately there's no portable way to force dynamic initialization to take place before main() first executes, in the case of dynamic linking, or before dlopen() returns, in the case of dynamic loading. C++ simply does not demand it.
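The three kinds of initialization described above can be told apart with a small example (the function names are hypothetical). Only the last variable needs dynamic initialization, because its initializer calls a function that is not constexpr:

```cpp
#include <cstdlib>

constexpr int square(int x) { return x * x; }
int parse_seven() { return std::atoi("7"); }  // not constexpr

int zero_initialized;                         // zero initialization only
int constant_initialized = square(7);         // constant initialization: square(7) is a constant expression
int dynamic_initialized = parse_seven() * 2;  // dynamic initialization (an implementation may still do it early)
```

Only variables like dynamic_initialized are subject to the implementation-defined timing quoted next; the first two hold their values before any code runs.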
The key part of the C++11 Standard is in §3.6.2:
It is implementation-defined whether the dynamic initialization of a
non-local variable with static storage duration is done before the
first statement of main. If the initialization is deferred to some
point in time after the first statement of main, it shall occur before
the first odr-use (3.2) of any function or variable defined in the
same translation unit as the variable to be initialized.
Nonetheless, if you've experimented, you've noticed that sometimes this does work. Sometimes you can get arbitrary code to run upon library loading by stuffing it in the constructors of static variables. Whether this happens is simply up to the compiler (not the linker). The manpage for dlopen explains.
If a dynamic library exports a routine named _init(), then that code is executed after the loading, before dlopen() returns
Inspecting the asm output of a small shared object written in standard C++, I can see that clang 3.4 and g++ 4.8 both add an _init section, however they are not required to do so.
As for workarounds, a gcc extension that has become commonplace does allow control of this behavior. By adding a constructor attribute to functions, we can insist that they be run upon library initialization. The linked manpage for dlopen suggests using this method.
See the GCC documentation on function attributes and this SO question which has an example usage. This extension is supported by gcc, clang, IBM XL, and my guess is that icc supports it too. MSVC does not support this but I understand there's something similar.
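A minimal sketch of the constructor attribute (a GCC/Clang extension, not standard C++). The flag and function names are hypothetical; the function runs when the image containing it is loaded, that is before main() for an executable, or before dlopen() returns for a shared object:

```cpp
// GCC/Clang extension: a function marked constructor runs at load time
// of the executable or shared object that contains it.
bool plugin_ready = false;  // constant-initialized

__attribute__((constructor))
static void plugin_on_load() {
    plugin_ready = true;  // hypothetical setup work goes here
}
```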
A truly portable solution is elusive. As the Standard says, if you can somehow cause an odr usage in the same translation unit as the static variable, then the static variable must be initialized. Calling a function, even a dummy function just for this purpose, would work.
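A sketch of the dummy-function workaround, with hypothetical names. The library's translation unit exports a do-nothing function; because the Standard requires dynamic initialization to happen before the first odr-use of anything defined in the same translation unit, calling it (for example after looking it up with dlsym) forces the TU's statics to be initialized. extern "C" is used here only to keep the symbol name unmangled for lookup:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// --- would live in the library's translation unit (hypothetical names) ---
std::vector<std::string> handlers = {"default"};  // needs dynamic initialization

// Dummy function whose only purpose is to be odr-used by the host program.
// Calling it guarantees that handlers has been initialized.
extern "C" void plugin_touch() {}

extern "C" std::size_t plugin_handler_count() { return handlers.size(); }
```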

Schwarz Counter for a vector

I am looking at a case where I would have a global static std::vector that I would need to guarantee is initialized (constructed) before certain static objects in an assortment of translation units.
When I look up how to handle this I come across two proposed solutions:
Have a static object in a global function that is used in place of a global static object.
Schwarz Counters
My concern with using a Schwarz counter is that the std::vector will be initialized twice. From this link I get "A useful technique for ensuring that a global object is initialized only once and before its first use is to maintain a count of the number of translation units using it."
How does it work that the global is only initialized once? By my reasoning it would be initialized twice. Once in the normal course of static initialization and once when the first instance of the Schwarz counter is initialized.
On a related note, what would the initialization code look like in the Schwarz counter constructor? I can only think of using a placement new.
I can only say how I've implemented it in the past: I design
a special "no-op" constructor, which does nothing, and use
placement new in the Schwarz counter. Something like:
class ForUseAsStatic
{
public:
    enum MakeCtorNoop { makeCtorNoop };
    ForUseAsStatic();               // normal ctor, called (via placement new)
                                    // by the Schwarz counter.
    ForUseAsStatic( MakeCtorNoop ); // no-op ctor, used when defining
                                    // the variable.
};
Formally, this isn't guaranteed: the compiler is allowed to
set the memory to 0 again just before calling the constructor,
but I've never heard of a compiler which does.
It would also be possible to put some sort of flag in the class
itself, tested by the constructor. This would only be valid for
static objects, of course (since it needs zero initialization in
order to work).
Another possible technique (which I've seen used in some
libraries) is to declare the memory for the objects in
assembler, or as a byte array if the compiler has some means of
forcing alignment. Data names are not normally mangled, so this
will generally work, even if it is formally undefined behavior.
(For the standard library, of course, the library authors can
request extensions in the compiler to help them with the
problem.)
Finally: today, the singleton idiom, or something similar, is
generally preferred to such work-arounds. It does mean that you
have to write myobj().xxx, rather than just myobj.xxx, but
this is not generally felt to be an issue.
I don't think there is a right answer to the static initialisation problem: the behaviour is undefined, and the choice of solution depends on the specific circumstances. It depends on:
Your compiler and its implementation
  most compilers offer guarantees within a single compilation unit
  order can be determined by the linker and sometimes influenced with #pragma
  initialisation can occur at any point before main begins executing
  whether, and when, destructors are called for static variables inside global functions
The application architecture
  whether a .DLL is used: certain environments have reduced support for managing static construction/destruction in DLLs, particularly when demand-loaded
  whether the application is threaded, which can affect how global functions with static variables get called
Possibly the best advice is to try to avoid this fiasco by designing your system in another way, although this might not be practical. It does sound like your application does not need to take some of the concerns about portability into account and targets a specific environment with a specific compiler.
#pragma / Compiler options
One option you might not have considered is whether there is some compiler support for what you need.
For windows see:
http://support.microsoft.com/kb/104248
For g++:
Enable the init-priority and use __attribute__ ((init_priority (n))).
Getting Schwarz counters to work
What some examples omit is that no object is actually defined: space for the object is merely reserved, suitably aligned. This avoids one of the two constructions you mention. For example, GNU's libstdc++ uses:
typedef char fake_istream[sizeof(istream)] __attribute__ ((aligned(__alignof__(istream))));
...
fake_istream cin;
to allocate the space for the object. All code outside that compilation unit refers to this area as extern istream cin (via the headers), and the object is constructed there with placement new. Some care should be taken to make sure that the nifty counter is thread-safe (atomic).
You could use another level of indirection: have your global variable be a pointer to the vector (which will be zero-initialized), and have your counter new the vector and store the result in the pointer.
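A minimal sketch of that indirection, with hypothetical names. The pointer is defined without an initializer, so it only receives zero initialization and no later dynamic initializer can overwrite it; the vector is therefore constructed exactly once, by whichever counter instance runs first, which also addresses the "initialized twice" concern in the question:

```cpp
#include <vector>

// ---- header part (hypothetical names) ----
extern std::vector<int>* global_vec;      // set by the counter below
struct GlobalVecInit {
    GlobalVecInit();
    ~GlobalVecInit();
};
static GlobalVecInit global_vec_init;     // one instance per including TU

// ---- source part ----
std::vector<int>* global_vec;             // no initializer: zero-initialized (nullptr)
static int global_vec_count;              // zero-initialized

GlobalVecInit::GlobalVecInit() {
    if (global_vec_count++ == 0) global_vec = new std::vector<int>();  // first user creates
}

GlobalVecInit::~GlobalVecInit() {
    if (--global_vec_count == 0) {        // last user destroys
        delete global_vec;
        global_vec = nullptr;
    }
}
```

Users then write global_vec->push_back(...) instead of global_vec.push_back(...); as with any nifty counter, the counter itself is not thread-safe.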

Dynamic Initialization

C++03 Standard [basic.start.init] point 3 states:
It is implementation-defined whether or not the dynamic
initialization (8.5, 9.4, 12.1, 12.6.1) of an object of namespace
scope is done before the first statement of main. If the
initialization is deferred to some point in time after the first
statement of main, it shall occur before the first use of any
function or object defined in the same translation unit as the
object to be initialized.
Microsoft Compilers, according to Additional Startup Considerations, perform the initialization prior to main().
I have been unable to obtain documentation stating the behaviour for GNU and Sun Forte compilers.
Can anyone:
Point me in the direction of documentation that describes the behaviour of the GNU and Forte compilers with respect to dynamic initialization (I have checked the GCC manual and found nothing relating to dynamic initialization).
Comment on the thread-safety of deferred dynamic initialization (if two threads attempt to invoke a function from the same translation unit that contains a non-local object).
FWIW, I observed the behaviour of GNU's g++ and Sun's CC, and both performed the initialization prior to main, though I don't accept this as a definitive answer. (I can post the very simple code I used to observe this if anyone is interested, but I felt the question was long enough.)
The definitive answer is that all compilers do static initialization
before main, unless the objects are in a DLL which is loaded later.
In practice, it's (almost) impossible to meet the requirements in the
text you cite otherwise. (Think of what happens if there is a cycle.)