When you have a static global variable in a C++ header file, each translation unit that includes the header file ends up with its own copy of the variable.
However, if I declare a class in that same header file, and create a member function of that class, implemented inline within the class declaration, that uses the static global variable, for example:
#include <iostream>
static int n = 10;
class Foo {
public:
void print() { std::cout << n << std::endl; }
};
then I see slightly odd behavior under gcc 4.4:
If I compile without optimization, all uses of the member function use the copy of the variable from one of the translation units (the first one mentioned on the g++ command line).
If I compile with -O2, each use of the member function uses the copy of the variable from the translation unit in which the call is made.
Obviously this is really bad design, so this question is just out of curiosity. But my question, nonetheless, is what does the C++ standard say about this case? Is g++ behaving correctly by giving different behavior with and without optimization enabled?
The standard says (3.2/5):
There can be more than one definition of a class type (clause 9), ... provided the definitions satisfy the following requirements ... in each definition of D, corresponding names, looked up according to 3.4, shall refer to an entity defined within the definition of D, or shall refer to the same entity
This is where your code loses. The uses of n in the different definitions of Foo do not refer to the same object. Game over, undefined behavior, so yes gcc is entitled to do different things at different optimization levels.
3.2/5 continues:
except that a name can refer to a const object with internal or no linkage if the object has the same integral or enumeration type in all definitions of D, and the object is initialized with a constant expression (5.19), and the value (but not the address) of the object is used, and the object has the same value in all definitions of D
So in your example code you could make n into a static const int and all would be lovely. It's not a coincidence that this clause describes conditions under which it makes no difference whether the different TUs "refer to" the same object or different objects - all they use is a compile-time constant value, and they all use the same one.
Related
Let's assume I have a library somelib.a that is distributed as a binary by the package manager, and this library makes use of the header-only library anotherlib.hpp.
If I now link my program against somelib.a, and also use anotherlib.hpp but with a different version, then this can result in UB, if somelib.a uses parts of the anotherlib.hpp in its include headers.
But what happens if somelib.a uses anotherlib.hpp only in its .cpp files (so that I don't even know it uses it)? Will the link step between my application and somelib.a ensure that somelib.a and my application each use their own version of anotherlib.hpp?
The reason I ask is that when I link the individual compilation units of my program into the final program, the linker removes duplicate symbols (depending on whether they have internal linkage or not). So a header-only library is normally written in a way that allows duplicate symbols to be removed.
A minimal example
somelib.a is built on a system with nlohmann/json.hpp version 3.2
somelib/somelib.h
#pragma once
#include <string>
namespace somelib {
struct config {
// some members
};
config read_configuration(const std::string &path);
}
somelib.cpp
#include <fstream>
#include <string>
#include <nlohmann/json.hpp>
#include <somelib/somelib.h>
namespace somelib {
config read_configuration(const std::string &path)
{
nlohmann::json j;
std::ifstream i(path);
i >> j;
config c;
// populate c based on j
return c;
}
}
The application is built on another system with nlohmann/json.hpp version 3.5 (3.2 and 3.5 are not compatible), and is then linked against the somelib.a that was built on the system with version 3.2.
application.cpp
#include <somelib/somelib.h>
#include <nlohmann/json.hpp>
#include <fstream>
int main() {
auto c = somelib::read_configuration("config.json");
nlohmann::json j;
std::ifstream i("another.json");
i >> j;
return 0;
}
It hardly makes any difference that you are using a static library.
The C++ standard states that if a program contains multiple definitions of an inline function (or class template, inline variable, etc.) and the definitions are not all the same, then you have UB.
Practically, this means that unless the changes between the two versions of the header-only library are very limited, you will have UB.
For instance, if the only changes are whitespace changes, comments, or adding new symbols, then you will not have undefined behavior. However, if a single line of code in an existing function was changed, then it is UB.
From the C++17 final working draft (n4659.pdf):
6.2 One-definition rule
[...]
There can be more than one definition of a class type (Clause 12), enumeration type (10.2), inline function with external linkage (10.1.6), inline variable with external linkage (10.1.6), class template (Clause 17), non-static function template (17.5.6), static data member of a class template (17.5.1.3), member function of a class template (17.5.1.1), or template specialization for which some template parameters are not specified in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then

(6.1) each definition of D shall consist of the same sequence of tokens; and
(6.2) in each definition of D, corresponding names, looked up according to 6.4, shall refer to an entity defined within the definition of D, or shall refer to the same entity, after overload resolution (16.3) and after matching of partial template specialization (17.8.3), except that a name can refer to
  (6.2.1) a non-volatile const object with internal or no linkage if the object
    (6.2.1.1) has the same literal type in all definitions of D,
    (6.2.1.2) is initialized with a constant expression (8.20),
    (6.2.1.3) is not odr-used in any definition of D, and
    (6.2.1.4) has the same value in all definitions of D,
  or
  (6.2.2) a reference with internal or no linkage initialized with a constant expression such that the reference refers to the same entity in all definitions of D; and
(6.3) in each definition of D, corresponding entities shall have the same language linkage; and
(6.4) in each definition of D, the overloaded operators referred to, the implicit calls to conversion functions, constructors, operator new functions and operator delete functions, shall refer to the same function, or to a function defined within the definition of D; and
(6.5) in each definition of D, a default argument used by an (implicit or explicit) function call is treated as if its token sequence were present in the definition of D; that is, the default argument is subject to the requirements described in this paragraph (and, if the default argument has subexpressions with default arguments, this requirement applies recursively).
(6.6) if D is a class with an implicitly-declared constructor (15.1), it is as if the constructor was implicitly defined in every translation unit where it is odr-used, and the implicit definition in every translation unit shall call the same constructor for a subobject of D.

If D is a template and is defined in more than one translation unit, then the preceding requirements shall apply both to names from the template’s enclosing scope used in the template definition (17.6.3), and also to dependent names at the point of instantiation (17.6.2). If the definitions of D satisfy all these requirements, then the behavior is as if there were a single definition of D. If the definitions of D do not satisfy these requirements, then the behavior is undefined.
C++17 allows static member variables to be defined thus:
class X {
public:
static inline int i = 8;
};
What is the rationale behind requiring the inline specification? Why not simply allow programmers to write
static int i = 8;
in the class?
Without inline, it's explicitly stated as only a declaration. As specified in [class.static.data]/2
The declaration of a non-inline static data member in its class definition is not a definition and may be of an incomplete type other than cv void. The definition for a static data member that is not defined inline in the class definition shall appear in a namespace scope enclosing the member's class definition.
The rationale is most probably to keep legacy code intact and valid. Recall that we could initialize integral constants in the class definition itself since about forever. But odr-using them still required an out-of-class definition in some translation unit.
So making such variables implicitly inline could be problematic in existing codebases. The committee always thinks about backwards compatibility when core language features are added.
For instance, consider this valid C++03 class definition:
struct foo {
static const int n = 3;
double bar[n];
};
n can be used as a constant expression to define the extent of bar, and that is not considered an odr-use. Nowadays we'd write it as constexpr[1], but the above is still valid. There may be cases where n would have to be odr-used, though (imagine its address being taken, or a reference bound to it, etc.). They are probably few and not common, but certain APIs have crazy requirements that would end up necessitating this
const int foo::n;
to appear in some translation unit.
Now, if static const int n = 3; were suddenly implicitly inline, the definition above (which is in an existing code base) would be an ODR violation, and previously well-formed code would become ill-formed. So it's best to let only an explicit inline take effect here, since only new code will actually have it.
[1] One could argue that static constexpr variables may have the same issue (and yet they are implicitly inline). But IIRC their original wording allowed this change without potentially breaking existing code. They were essentially already "inline" in everything but name.
struct A
{
int a = 5; //OK
const int b = 5; //OK
static const int c = 5; //OK
static int d = 5; //Error!
};
error: ISO C++ forbids in-class initialization of non-const static member 'A::d'
Why is it so? Can someone explain to me the reasoning behind this?
It has to do with where the data is stored. Here's a breakdown:
int: member variable, stored wherever the class instance is stored
const int: same as int
static const int: doesn't need to be stored, it can simply be "inlined" where used
static int: this must have a single storage location in the program...where?
Since the static int is mutable, it must be stored in an actual location somewhere, so that one part of the program can modify it and another part can see that modification. But it can't be stored in a class instance, so it must be more like a global variable. So why not just make it a global variable? Well, class declarations are usually in header files, and a header file may be #included in multiple translation units (.cpp files). So effectively the header file says "there is an int...somewhere." But the storage needs to be put into the corresponding .cpp file (like a global variable).
In the end, this is not really about initialization, but rather the storage. You could leave off the initializer and you'd still not have a valid program until you add this to your .cpp file:
int A::d; // initialize if you want to, default is zero
Without this, references to the static int will be undefined and linking will fail.
Initialization of static const member variables is available for integral and enum types. This feature existed in C++ since the first language standard (C++98). It is needed to facilitate usage of static const members in integral constant expressions (i.e. as compile-time constants), which is an important feature of the language. The reason integral and enum types were singled out and treated in this exceptional fashion is that integral constants are often used in compile-time contexts, which require no storage (no definition) for the constant.
The ability to supply initializers for non-static members is a new (for C++11) feature. It is a completely different feature, even though it looks similar at syntax level. Such initializers are used as construction-time initializers for those class members that were not explicitly initialized by the user.
In other words, it is not correct to lump these two features (initializers for static and non-static members) together. These two features are completely different. They are based on completely unrelated internal mechanics. Your question essentially applies the first feature: how come non-const static members cannot be initialized in-class? It is basically a C++98 question and the most likely answer to it is that there was never any reason to treat non-const static members in such an exceptional way. Non-const static members are treated in accordance with the general rules: they require a separate definition and the initializer should be provided at the point of definition.
If I were to do this
class Gone
{
public:
static const int a = 3;
};
it works, but if I do
class Gone
{
public:
static int a = 3;
};
it gives a compile error. Now I know why the second one doesn't work, I just don't know why the first one does.
This trick works only for constant compile-time expressions. Consider the following simple example:
#include <iostream>
class Foo {
public:
static const int bar = 0;
};
int main()
{
std::cout << Foo::bar << std::endl;
}
It works just fine, because the compiler knows that Foo::bar is 0 and never changes. Thus, it optimizes the whole thing away.
However, the whole thing breaks once you take the address of that variable like this:
int main()
{
std::cout << Foo::bar << " (" << &Foo::bar << ")" << std::endl;
}
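The linker then reports an undefined reference to Foo::bar: taking the address odr-uses the member, and an odr-used static data member needs a definition outside the class, which this program does not provide. The in-class initializer only supplies a compile-time value; it does not allocate storage.
Now, the second case in your example doesn't work simply because a non-constant variable cannot be a constant compile-time expression. Thus, you have to define it somewhere outside the class and cannot give it an initializer inside the class.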
C++11, by the way, has constexpr. You can check Generalized constant expressions wiki (or C++11 standard :-)) for more info.
Also, be careful - with some toolchains you will never be able to link program as listed in your first example when optimizations are turned off, even if you never take an address of those variables. I think there is a BOOST_STATIC_CONSTANT macro in Boost to work around this problem (not sure if it works though because I reckon seeing linkage failures with some old gcc even with that macro).
The static const int declaration is legal because you're declaring a constant, not a variable. a doesn't exist as a variable - the compiler is free to optimize it out, replacing it with the declared value 3 anywhere a reference to Gone::a appears. C++ allows the static initialization in this restricted case where it's an integer constant.
You can find more details, including an ISO C++ standard citation here.
Initialization of variables has to be done at the point of definition, not the point of declaration in the general case. Inside the class brackets you only have a declaration and you need to provide a definition in a single translation unit*:
// can be in multiple translation units (i.e. a header included in different .cpp's)
struct test {
static int x; // declaration
static double d; // declaration
};
// in a single translation unit in your program (i.e. a single .cpp file)
int test::x = 5; // definition, can have initialization
double test::d = 5.0; // definition
That being said, there is an exception for static integral constants (and only integral constants) where you can provide the value of the constant in the declaration. The reason for the exception is that it can be used as a compile-time constant (i.e. to define the size of an array), and that is only possible if the compiler sees the value of the constant in all translation units where it is needed.
struct test {
static const int x = 5; // declaration with initialization
};
const int test::x; // definition, cannot have initialization
Going back to the original question:
Why is it not allowed for non-const integers?
because initialization happens in the definition and not declaration.
Why is it allowed for integral constants?
so that it can be used as a compile-time constant in all translation units
* The actual rules require the definition whenever the static data member is used in the program. The definition of "used" is a bit tricky in C++03 and not all that intuitive: for example, using the constant as an rvalue does not constitute use according to the standard. In C++11 the term "used" has been replaced with "odr-used" in an attempt to avoid confusion.
A static const member is initialized in the class definition because everybody that uses the code needs to know the value at compile time, not link time. An ordinary static member is only declared in the class definition and is defined once, in one translation unit.
I seem to recall that originally (ARM) it was not allowed, and we used to use enum to define constants in class declarations.
The const case was explicitly introduced so as to support availability of the value in headers for use in constant expressions, such as array sizes.
I think (and please comment if I have this wrong) that strictly you still need to define the value:
const int Gone::a;
to comply with the One Definition Rule. However, in practice, you might find that the compiler optimises away the need for an address for Gone::a and you get away without it.
If you take:
const int* b = &Gone::a;
then you might find you do need the definition.
See the standard, §9.4.2:
ISO 1998:
"4 If a static data member is of const integral or const enumeration type, its declaration in the class definition can specify a constant-initializer which shall be an integral constant expression (5.19). In that case, the member can appear in integral constant expressions within its scope. The member shall still be defined in a namespace scope if it is used in the program and the namespace scope definition shall not contain an initializer."
Draft for c++11:
"3 If a static data member is of const effective literal type, its declaration in the class definition can specify a constant-initializer brace-or-equal-initializer with an initializer-clause that is an integral constant expression. A static data member of effective literal type can be declared in the class definition with the constexpr specifier; if so, its declaration shall specify a constant-initializer brace-or-equal-initializer with an initializer-clause that is an integral constant expression. In both these cases, the member may appear in integral constant expressions. The member shall still be defined in a namespace scope if it is used in the program and the namespace scope definition shall not contain an initializer."
I am not entirely sure what this covers, but I think it means that we can now use the same idiom for floating point and possibly string literals.
What is the difference between these three statements?
static const int foo = 42;
const int foo = 42;
#define foo 42
2) const int foo = 42;
This is an int variable whose value you can't change.
1) static const int foo = 42;
This is the same as 2), but it's only visible in the source file that it's in. So you can't use it from another .cpp file, for example, if you compile them separately and then link them together. By using static with variables and functions you allow the compiler to optimize them better, because the compiler can rely on knowing all the places where that variable or function is used. The word static has different meanings in different situations, but this is its behavior at global level. If you use it inside a function, it has a different meaning: the variable is initialized only once and stays in memory no matter how many times code execution passes its definition. This matters more if you don't also use const, because then you can change the value of the variable and it will "remember" that value even when you leave the part of the code where that variable is visible (called its "scope") and re-enter it.
3) #define foo 42
This is a preprocessor macro. The preprocessor substitutes every "foo" with the number 42 before handing the code to the actual compiler. Some years ago people used this approach because it was faster than const variables, but nowadays they are equally fast.
static const int foo = 42;
What this does depends on where it is found and what language you are using:
If this is a declaration at namespace scope (in C++) or file scope (in C), then it declares and defines a const-qualified object named foo that has internal linkage (this means that the name foo only refers to this object in the current translation unit, not in other translation units).
If this is a declaration at function scope (in C or C++), then it declares and defines a const-qualified object named foo that has no linkage (locally declared variables don't have linkage) and the lifetime of that object is the duration of the program (this means that in every call to the function, foo refers to the same object).
If this is C++ and this is a declaration inside of a class, then it declares but does not define a const-qualified static data member named foo that has external linkage (this means that the name foo (in this case, when qualified with the class name) refers to the same object when used in any translation unit).
const int foo = 42;
What this does depends on what language you are using and where the declaration appears. If the declaration is at namespace or file scope, then
In C++ this declares and defines a const-qualified object named foo that has internal linkage (the const implies internal linkage in C++).
In C this declares and defines a const-qualified object named foo that has external linkage (the const does not imply internal linkage in C).
In both C++ and C, if this declaration is at function scope, then this declares and defines a const-qualified local variable named foo that has no linkage.
#define foo 42
This does not define an object; it defines a macro named foo that is replaced by the token sequence consisting of a single token, 42.
The const modifier is used to specify that foo is a constant. i.e. After initialisation, its value may not be changed.
The static keyword is used to tell the compiler that the value of the variable you're declaring as static should be retained even when it goes out of scope. This means that if you declare a variable static inside a function, its value will be remembered even after the function returns (unlike automatic variables). It's also used to tell the compiler that a variable is visible only within the current unit of compilation. This means that if you declare a top-level variable or function static, it's not visible from other files that you're compiling along with this one.
#define is a preprocessor directive that performs textual substitution on all occurrences of (in this case) foo before actual compilation takes place. The compiler doesn't even see foo. It only sees 42.
First two declare a variable called foo, while the third one doesn't declare any variable, it's just an alias of 42.
In C, the first one is a file-scope variable with internal linkage, while the second one has external linkage, which means it can be referred to from another translation unit!
But in C++, the first and second are the same, because in C++ a namespace-scope const object has internal linkage by default, as if it were declared static. Both have internal linkage!
static const int foo = 42;
A constant static variable. Once initialized, its value can't be changed as long as the variable exists. Being static (at namespace scope), the variable has internal linkage: it can't be referred to from other files, and every file that includes its declaration gets its own separate copy. If this is a member of a class, then only one copy of the variable exists for all the instances of the class.
const int foo = 42;
A constant variable whose value, once initialized, stays and can't be changed.
#define foo 42
Not a variable but a symbolic constant, so operations like &foo are not allowed. foo only serves as an alias for 42. Unlike the others, this is processed by the preprocessor.
What does static mean? hint: fgbentr fcrpvsvre
What does const mean? hint: ernq-bayl inevnoyr
What does #define do? In what stage of compilation does it take place? What's that mean for generated code?