C++03 Standard [basic.start.init] point 3 states:
It is implementation-defined whether or not the dynamic
initialization (8.5, 9.4, 12.1, 12.6.1) of an object of namespace
scope is done before the first statement of main. If the
initialization is deferred to some point in time after the first
statement of main, it shall occur before the first use of any
function or object defined in the same translation unit as the
object to be initialized.
Microsoft Compilers, according to Additional Startup Considerations, perform the initialization prior to main().
I have been unable to obtain documentation stating the behaviour for GNU and Sun Forte compilers.
Can anyone:
Point me in the direction of documentation that describes the behaviour of the GNU and Forte compilers with respect to dynamic initialization (I have checked the GCC manual and found nothing relating to dynamic initialization).
Comment on the thread-safety of deferred dynamic initialization (if two threads attempt to invoke a function from the same translation unit that contains a non-local object).
FWIW, I observed the behaviour of GNU's g++ and Sun's CC, and both performed the initialization prior to main, though I don't accept this as a definitive answer. (I can post the very simple code I used to observe this if anyone is interested, but I felt the question was long enough.)
The definitive answer is that all compilers do static initialization
before main, unless the objects are in a DLL which is loaded later.
In practice, it's (almost) impossible to meet the requirements in the
text you cite otherwise. (Think of what happens if there is a cycle.)
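A minimal sketch of how one might observe this, under stated assumptions: event_log, Tracer, global_tracer and initialized_before are hypothetical names, not taken from the question. Note that the check cannot distinguish "before main" from "deferred", because the quoted text guarantees the object is initialized before the first use of any function in the same translation unit either way. It only confirms that guarantee.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: an event log built on first use, so it is safe to
// call from the constructor of a namespace-scope object.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical type whose constructor records when it runs.
struct Tracer {
    explicit Tracer(const char* name) {
        event_log().push_back(std::string("init ") + name);
    }
};

// Namespace-scope object with dynamic initialization.
Tracer global_tracer("global_tracer");

// Intended to be called as the first statement of main(). Records a marker
// and reports whether global_tracer's constructor ran before the marker.
bool initialized_before(const std::string& marker) {
    event_log().push_back(marker);
    const std::vector<std::string>& log = event_log();
    assert(log.size() >= 2);  // the init event plus at least one marker
    return log.front() == "init global_tracer";
}
```

Calling initialized_before("main entered") as the first statement of main should return true on any conforming implementation. To tell eager from deferred initialization you would have to put main in a different translation unit and inspect the log without touching anything defined alongside global_tracer.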
Related
Consider this simple code:
void g();
void foo()
{
volatile bool x = false;
if (x)
g();
}
https://godbolt.org/z/I2kBY7
You can see that neither gcc nor clang optimize out the potential call to g. This is correct in my understanding: The abstract machine is to assume that volatile variables may change at any moment (due to being e.g. hardware-mapped), so constant-folding the false initialization into the if check would be wrong.
But MSVC eliminates the call to g entirely (keeping the reads and writes to the volatile though!). Is this standard-compliant behavior?
Background: I occasionally use this kind of construct to be able to turn on/off debugging output on-the-fly: The compiler has to always read the value from memory, so changing that variable/memory during debugging should modify the control flow accordingly. The MSVC output does re-read the value but ignores it (presumably due to constant folding and/or dead code elimination), which of course defeats my intentions here.
Edits:
The elimination of the reads and writes to volatile is discussed here: Is it allowed for a compiler to optimize away a local volatile variable? (thanks Nathan!). I think the standard is abundantly clear that those reads and writes must happen. But that discussion does not cover whether it is legal for the compiler to take the results of those reads for granted and optimize based on that. I suppose this is under-/unspecified in the standard, but I'd be happy if someone proved me wrong.
I can of course make x a non-local variable to side-step the issue. This question is more out of curiosity.
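A minimal sketch of the non-local variant mentioned above. The names g_debug_enabled, g_call_count and toggle_demo are hypothetical, and g is given a trivial body only so the example is self-contained. Whether a debugger write to the flag really changes control flow still depends on how the implementation treats volatile accesses, so treat this as an illustration rather than a guarantee.

```cpp
#include <cassert>

// Hypothetical debug switch at namespace scope instead of a local variable.
volatile bool g_debug_enabled = false;

int g_call_count = 0;          // counts calls so the effect is observable
void g() { ++g_call_count; }

void foo() {
    if (g_debug_enabled)       // volatile read on every call
        g();
}

// Small self-check: the assignment stands in for a debugger changing the flag.
void toggle_demo() {
    foo();
    assert(g_call_count == 0);
    g_debug_enabled = true;
    foo();
    assert(g_call_count == 1);
}
```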
I think [intro.execution] (paragraph numbers vary) could be used to explain the MSVC behavior:
An instance of each object with automatic storage duration is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended...
The standard does not permit elimination of a read through a volatile glvalue, but the paragraph above could be interpreted as allowing the compiler to predict the value false.
BTW, the C Standard (N1570 6.2.4/2) says that
An object exists, has a constant address, and retains its last-stored value throughout its lifetime.34
34) In the case of a volatile object, the last store need not be explicit in the program.
It is unclear if there could be a non-explicit store into an object with automatic storage duration in C memory/object model.
TL;DR The compiler can do whatever it wants on each volatile access. But the documentation has to tell you: "The semantics of an access through a volatile glvalue are implementation-defined."
The standard defines for a program permitted sequences of "volatile accesses" & other "observable behavior" (achieved via "side-effects") that an implementation must respect per "the 'as-if' rule".
But the standard says (my boldface emphasis):
Working Draft, Standard for Programming Language C++
Document Number: N4659
Date: 2017-03-21
§ 10.1.7.1 The cv-qualifiers
5 The semantics of an access through a volatile glvalue are implementation-defined. […]
Similarly for interactive devices (my boldface emphasis):
§ 4.6 Program execution
5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. [...]
7 The least requirements on a conforming implementation are:
(7.1) — Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.
(7.2) — At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
(7.3) — The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program. [...]
(Anyway what specific code is generated for a program is not specified by the standard.)
So although the standard says that volatile accesses can't be elided from the abstract sequences of abstract machine side effects & consequent observable behaviors that some code (maybe) defines, you can't expect anything to be reflected in object code or real-world behaviour unless your compiler documentation tells you what constitutes a volatile access. Ditto for interactive devices.
If you are interested in volatile vis a vis the abstract sequences of abstract machine side effects and/or consequent observable behaviors that some code (maybe) defines then say so. But if you are interested in what corresponding object code is generated then you must interpret that in the context of your compiler & compilation.
Chronically people wrongly believe that for volatile accesses an abstract machine evaluation/read causes an implemented read & an abstract machine assignment/write causes an implemented write. There is no basis for this belief absent implementation documentation saying so. When/iff the implementation says that it actually does something upon a "volatile access", people are justified in expecting that something--maybe, the generation of certain object code.
I believe it is legal to skip the check.
The paragraph that everyone likes to quote
34) In the case of a volatile object, the last store need not be explicit in the program
does not imply that an implementation must assume such stores are possible at any time, or for any volatile variable. An implementation knows which stores are possible. For instance, it is entirely reasonable to assume that such implicit writes only happen for volatile variables that are mapped to device registers, and that such mapping is only possible for variables with external linkage. Or an implementation may assume that such writes only happen to word-sized, word-aligned memory locations.
Having said that, I think MSVC behaviour is a bug. There is no real-world reason to optimise away the call. Such optimisation may be compliant, but it is needlessly evil.
In the Fortran book by Nyhoff, on p. 51, it is stated that a variable can be initialized by a "constant expression". However, Nyhoff doesn't seem to say what exactly a constant expression is.
Question 1: Can a variable be initialized by a user-defined function? Ex: real :: myreal=myrealfunc(4.0) (It is possible for a variable to be initialized by an intrinsic function.)
Question 2: This is not directly related to Question 1. Consider a case where the main program does not use a specific module, but one of the subprograms uses this module. If a variable is part of this module, does it get initialized at the beginning of the execution of the main program, or only when the subprogram (that uses the module) is invoked?
References, if available, would also help.
Answer 1: No, a variable cannot be initialised by a user-defined function. R505 (of the Fortran 2008 draft standard I have in front of me) says that a variable can only be initialised by a constant-expr. Para 7.1.12.1 of the same document defines constant-expr and includes the words It is an expression in which each operation is intrinsic.
Answer 2: The use-d variable is only accessible inside the scope(s) in which it is use-d. When it is actually initialised is a moot point. I suspect that it is processor-dependent (as that term is defined at clause 1.3.114 of the standard).
I also suspect that compilers will initialise at program start up. Although the standard doesn't require it, scarcely even hints at it, initialisation expressions are probably, in practice, computable by the compiler. I guess you would be able to figure out the behaviour of your processor by examining the memory consumption of a program which initialises a large variable. You won't be able to simply print or otherwise examine a variable during execution when the variable is not in scope.
Of course variables which are initialised acquire the save attribute which means that their values are saved across invocations of their enclosing scope.
A reference in the standard to validate this answer escapes me right now.
For the first question I'll take the same starting point as High Performance Mark's answer, paragraph 7.1.12.1, but interpret it slightly differently. I'll come to the same conclusion.
Yes, each operation in a constant expression must be intrinsic, but a reference to a user-defined function as in the question isn't an operation. It is, instead, a primary and a list of allowed primaries follows in that paragraph.
That said, from the allowed cases for a primary in the sub-paragraphs that follow (4--8), each function reference must be to an intrinsic function.
I know that the order of initialization of static variables defined in different translation units (e.g. different cpp/lib/dll/so files) is undefined. Does it mean that the behavior of following program is not well defined?
#include <vector>
std::vector<int> v;
int main()
{
v.push_back(1);
}
EDIT: Here I used the STL vector as an example, but it could be an object of any other "3rd party" class. As such, we wouldn't know whether that object is initialized via some other global variable. This means that in C++ it is not safe to create even a single global object with a nontrivial constructor. Right?
No, because when you use v in main, it is perfectly defined. The static initialization phase takes place before you use v in main ...
The problem arises if you use 2 globals in different translation units and there is a dependency between the two. See this C++ FAQ lite for an explanation. The next items in the FAQ explain how to avoid the 'fiasco'.
The problem of static initialization makes globals worse in C++ than in any other language. Good library writers know the problem and avoid the static initialization order fiasco. And even if not, if the library is widely used, someone will hit the problem and, I hope, fix it. But 3rd party libs are not always well written; they can be libraries written in your company by a programmer who is new to C++ and unaware of the problem ...
So, yes, it is unsafe, you're right. And in C++, avoid globals even more than in other languages!
Note: Columbo has pointed out that the standard does not exactly say that v is defined before entering main (see his answer). No practical difference in your instance.
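One common way to avoid the fiasco is the "construct on first use" idiom. The following is a minimal sketch, not the FAQ's exact code: registry, Registrar and check_registry are hypothetical names. The shared object lives in a function-local static, so it is constructed the first time anything asks for it, regardless of the order in which namespace-scope objects are initialized.

```cpp
#include <cassert>
#include <vector>

// Construct on first use: the vector is created the first time registry()
// is called, even if that call comes from another global's constructor.
std::vector<int>& registry() {
    static std::vector<int> instance;
    return instance;
}

// Hypothetical global-object type that depends on the shared registry.
struct Registrar {
    explicit Registrar(int id) { registry().push_back(id); }
};

// These could live in different translation units; the idiom keeps them
// safe because neither touches a possibly uninitialized global directly.
Registrar first_registrar(1);
Registrar second_registrar(2);

// Self-check. Within one translation unit, namespace-scope objects are
// initialized in order of definition, so the ids arrive as 1 then 2.
void check_registry() {
    assert(registry().size() == 2);
    assert(registry()[0] == 1);
    assert(registry()[1] == 2);
}
```

Since C++11 the initialization of a function-local static is also thread-safe, which the C++03 text quoted elsewhere on this page did not promise.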
It's specified in [basic.start.init]/4:
It is implementation-defined whether the dynamic initialization of a
non-local variable with static storage duration is done before the
first statement of main. If the initialization is deferred to some
point in time after the first statement of main, it shall occur before
the first odr-use (3.2) of any function or variable defined in the
same translation unit as the variable to be initialized.
It is therefore defined that v is initialized before its first use in any function of this translation unit, including main. That implies that in this particular program v is initialized before the first statement of main.
The static initialization order fiasco occurs when multiple variables in different translation units depend on their relative order of initialization; the initializations may be indeterminately sequenced with respect to each other, depending on the kind of initialization.
Since there's only one global object being defined, there can be only one ordering of its initialization, and therefore there is no issue.
If I were to load some symbols using something like dlopen in C++, while other classes in that translation unit have static member variables, what exactly is the behavior of those static member variables? Do they get initialized, or not, because the library isn't really loaded, just the symbols that you looked up? (I suspect the latter is not true, because if the symbol you looked up needs them, they have to be loaded too.)
In short, there's no guarantee that static variables that cannot be initialized at compile time will be initialized before an externally visible function or variable in the same translation unit is referred to. This is true even for static linking. As for trying to get static variables in dynamically loaded libraries to initialize upon loading, my experience is that often you'll get lucky, especially for small programs, but fundamentally this is undefined behavior and should not be relied on. The resulting bugs are unpredictable, difficult to reproduce, and highly system specific.
First, some standardese and an explanation of why this is undefined behavior and then some workarounds.
The word static is unfortunately overloaded in the Standard so bear with me. The Standard makes reference to both static storage duration and static initialization. The types of storage duration defined by the Standard are static, thread, automatic, and dynamic. They are as they sound. Static storage duration means that the lifetime of such a variable is the entire duration of the program.
Static initialization is a distinct concept. Although a variable may be stored only once per program execution, the value it will be initialized with may not be known when the program starts. At the start of the program, all variables with static storage duration will be zero initialized and those that can be will then be constant initialized. The fine points are in §3.6.2 but roughly, a static variable will be constant initialized if its initialization relies only on constant expressions. Together, zero initialization and constant initialization are termed static initialization. The counterpart is dynamic initialization. These are the interesting ones but unfortunately there's no portable way to force dynamic initialization to take place before main() first executes, in the case of dynamic linking, or before dlopen() returns, in the case of dynamic loading. C++ simply does not demand such.
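To make the three kinds of initialization concrete, here is a minimal sketch. All names are hypothetical, and compute_at_runtime is deliberately not constexpr so that its use counts as dynamic initialization. An implementation is permitted to perform such an initialization statically when it can prove the result is the same, so the comments describe the categories in the Standard, not necessarily what a given compiler emits.

```cpp
#include <cassert>

// Hypothetical function that is not a constant expression.
int compute_at_runtime() { return 42; }

int zero_initialized;                                 // static: zero initialization
int constant_initialized = 7;                         // static: constant initialization
int dynamically_initialized = compute_at_runtime();   // dynamic initialization

// Self-check. Using this function odr-uses something in this translation
// unit, so all three variables are initialized by the time it runs.
void check_values() {
    assert(zero_initialized == 0);
    assert(constant_initialized == 7);
    assert(dynamically_initialized == 42);
}
```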
The key part of the C++11 Standard is in §3.6.2:
It is implementation-defined whether the dynamic initialization of a
non-local variable with static storage duration is done before the
first statement of main. If the initialization is deferred to some
point in time after the first statement of main, it shall occur before
the first odr-use (3.2) of any function or variable defined in the
same translation unit as the variable to be initialized.
Nonetheless, if you've experimented, you've noticed that sometimes this does work. Sometimes you can get arbitrary code to run upon library loading by stuffing it in the constructors of static variables. Whether this happens is simply up to the compiler (not the linker). The manpage for dlopen explains.
If a dynamic library exports a routine named _init(), then that code is executed after the loading, before dlopen() returns
Inspecting the asm output of a small shared object written in standard C++, I can see that clang 3.4 and g++ 4.8 both add an _init section, however they are not required to do so.
As for workarounds, a gcc extension that has become commonplace does allow control of this behavior. By adding a constructor attribute to functions, we can insist that they be run upon library initialization. The linked manpage for dlopen suggests using this method.
See the GCC documentation on function attributes and this SO question which has an example usage. This extension is supported by gcc, clang, IBM XL, and my guess is that icc supports it too. MSVC does not support this but I understand there's something similar.
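A minimal sketch of the constructor attribute, assuming gcc or clang (MSVC will not compile it). The names g_load_hook_runs, on_library_load and check_hook_ran are hypothetical. In a shared object the hook runs during loading, before dlopen() returns; in an ordinary executable it runs before main.

```cpp
#include <cassert>

// Constant-initialized, so it is already zero when the hook below runs and
// no later dynamic initialization can reset it.
int g_load_hook_runs = 0;

// GCC/clang extension: run this function when the image is loaded.
__attribute__((constructor))
static void on_library_load() {
    ++g_load_hook_runs;
}

// Self-check: the hook should have run exactly once by the time any
// ordinary code executes.
void check_hook_ran() {
    assert(g_load_hook_runs == 1);
}
```

Keeping the hook's state in constant-initialized variables avoids depending on the relative order of constructor functions and dynamic initializers.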
A truly portable solution is elusive. As the Standard says, if you can somehow cause an odr usage in the same translation unit as the static variable, then the static variable must be initialized. Calling a function, even a dummy function just for this purpose, would work.
The C++ standard section 3.6.2 paragraph 3 states that it is implementation-defined whether dynamic initialization of non-local objects occurs after the first statement of main().
Does anyone know what the rationale for this is, and which compilers postpone non-local object initialization this way? I am most familiar with g++, which performs these initializations before main() has been entered.
This question is related: Dynamic initialization phase of static variables
But I'm specifically asking what compilers are known to behave this way.
It may be that the only rationale for this paragraph is to support dynamic libraries loaded at runtime, but I do not think that the standard takes dynamic loading issues into consideration.
One of the reasons may be the following:
static char data[1000000000];  // a deliberately huge static array
int main(int argc, char*[])
{
    if (argc > 0)
        data[0] = 0;  // the array is only needed on this path
}
It might be reasonable to allocate and initialize this static array only when it turns out that it is really needed. It might be that some applications ran into something similar and had enough influence to convince the committee. In my own experience with C#, I came across a situation where static members of a class were not allocated right after jitting the class. They were allocated one by one, on first use. In that case there was absolutely no justification for doing that. It was a plain disaster. Maybe they have fixed this now.
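Standard C++ already lets a program opt into "initialize only when needed" explicitly, via a function-local static. A minimal sketch, with hypothetical names (g_buffer_constructions, make_buffer, lazy_buffer, check_lazy) and an arbitrary buffer size; it illustrates the idea of deferral, not what any compiler does with namespace-scope objects.

```cpp
#include <cassert>
#include <vector>

int g_buffer_constructions = 0;  // counts how often the buffer is built

// Hypothetical expensive initializer.
std::vector<char> make_buffer() {
    ++g_buffer_constructions;
    return std::vector<char>(1000000);
}

// The buffer is built on the first call only, not at program start.
std::vector<char>& lazy_buffer() {
    static std::vector<char> buffer = make_buffer();
    return buffer;
}

// Self-check: nothing is built until first use, and only once after that.
void check_lazy() {
    assert(g_buffer_constructions == 0);
    lazy_buffer()[0] = 0;
    assert(g_buffer_constructions == 1);
    lazy_buffer()[1] = 0;
    assert(g_buffer_constructions == 1);
}
```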
Other reasons are possible also.
From the C++11 draft:
It is implementation-defined whether the dynamic initialization of a non-local variable with static storage
duration is done before the first statement of main. If the initialization is deferred to some point in time after the first statement of main, it shall occur before the first odr-use (3.2) of any function or variable defined in the same translation unit as the variable to be initialized. [emphasis mine]
That is, the static variable has to be initialized before any use of anything defined in the same translation unit.
It looks to me that it is done this way to allow dynamic libraries (DLLs or SOs) to be loaded and initialized lazily, or even dynamically (calling dlopen or LoadLibrary or whatever).
It is obvious that a variable defined in a DLL cannot be initialized before the DLL itself is loaded.
Naturally, C++ knows nothing about DLLs, so there is no direct mention of them in the standard. But the people on the committee do know about real environments and compilers, and certainly know about DLLs. Without this clause, lazy loading a DLL would technically violate the C++ specification. (Not that it would prevent implementers from doing it anyway, but it is better if we all try to go along with each other.)
As for which systems support this: as far as I know, at least the MS Visual C++ compiler supports lazy dynamic linking (the DLL will not even be loaded until first use). And most modern platforms support dynamically loading a DLL.