In the class:
class foo
{
public:
static int bar; //declaration of static data member
};
int foo::bar = 0; //definition of data member
We have to explicitly define the static variable, otherwise it will result in an
undefined reference to 'foo::bar'
My question is:
Why do we have to give an explicit definition of a static variable?
Please note that this is NOT a duplicate of previously asked undefined reference to static variable questions. This question intends to ask the reason behind explicit definition of a static variable.
From the very beginning, the C++ language, just like C, was built on the principle of independent translation. Each translation unit is compiled by the compiler proper independently, without any knowledge of other translation units. The whole program only comes together later, at the linking stage. Linking is the earliest stage at which the entire program is seen, and the linker sees it as a collection of object files prepared by the compiler proper.
In order to support this principle of independent translation, each entity with external linkage has to be defined in one translation unit, and in only one translation unit. The user is responsible for distributing such entities between different translation units. It is considered a part of user intent, i.e. the user is supposed to decide which translation unit (and object file) will contain each definition.
The same applies to static members of the class. Static members of the class are entities with external linkage. The compiler expects you to define that entity in some translation unit. The whole purpose of this feature is to give you the opportunity to choose that translation unit. The compiler cannot choose it for you. It is, again, a part of your intent, something you have to tell the compiler.
This is no longer as critical as it used to be, since the language is now designed to deal with (and eliminate) large numbers of identical definitions (templates, inline functions, etc.), but the One Definition Rule is still rooted in the principle of independent translation.
In addition to the above, in C++ the point at which you define your variable determines the order of its initialization relative to other variables defined in the same translation unit. This is also a part of user intent, i.e. something the compiler cannot decide without your help.
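A minimal sketch of that ordering guarantee, assuming everything lives in one translation unit. The names Log and record are hypothetical, chosen only for illustration. Because the definitions appear in the order entries, first, second, their initializers run in that order:

```cpp
#include <string>
#include <vector>

// Hypothetical class used only to observe initialization order.
struct Log {
    static std::vector<std::string> entries;
    static int first;
    static int second;
};

// Appends a name and returns how many names have been recorded so far.
int record(const char* name) {
    Log::entries.push_back(name);
    return static_cast<int>(Log::entries.size());
}

// Definitions in this translation unit are initialized top to bottom,
// so entries is ready before first and second use it.
std::vector<std::string> Log::entries;
int Log::first = record("first");
int Log::second = record("second");
```

Swapping the last two definitions would swap the recorded order, which is the kind of intent the compiler cannot guess.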
Starting from C++17 you can declare your static members as inline. This eliminates the need for a separate definition. By declaring them in that fashion you effectively tell the compiler that you don't care where this member is physically defined and, consequently, don't care about its initialization order.
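As a small illustration, assuming a C++17 compiler, an inline static data member is declared and defined in one place inside the class. Counter, total, label and hit_twice are hypothetical names for this sketch:

```cpp
#include <string>

// Requires C++17: inline variables.
struct Counter {
    inline static int total = 0;                      // declaration and definition
    inline static const std::string label = "hits";   // non-integral types work too
    static void hit() { ++total; }
};

// Bumps the shared counter twice and returns its value.
int hit_twice() {
    Counter::hit();
    Counter::hit();
    return Counter::total;
}
```

No out-of-class `int Counter::total;` line is needed, and the header containing Counter can be included from any number of source files.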
Early C++ allowed static data members to be defined inside the class, which certainly violates the idea that a class is only a blueprint and does not set memory aside. This has since been dropped.
Putting the definition of a static member outside the class emphasizes that memory is allocated only once for the static data member (at compile time). Objects of that class do not each have their own copy.
static is a storage class. When you declare the variable you are telling the compiler "this will be in the data section somewhere", and when you subsequently use it, the compiler emits code that loads a value from a to-be-determined address.
In some contexts, the compiler can deduce that a static is really a compile-time constant and replace it with such, for example
static const int meaning = 42;
Inside a function that never takes the address of the value.
When dealing with class members, however, the compiler can't guess where this value should be created. It might be in a library you will link against, or a dll, or you might be providing a library where the value must be provided by the library consumer.
Usually, when someone asks this, though, it is because they are misusing static members.
If all you want is a constant value, e.g.
static int MaxEntries;
...
int Foo::MaxEntries = 10;
You would be better off with one or other of the following
static const int MaxEntries = 10;
// or
enum { MaxEntries = 10 };
The static const version requires no separate definition until something tries to take the address of, or form a reference to, the variable; the enum version never does.
Inside the class you are only declaring the variable, i.e. you tell the compiler that there is something with this name.
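A short sketch of both alternatives used purely as compile-time constants. Table, MaxSlots and total_capacity are hypothetical names; only the values are read, and no address or reference is taken:

```cpp
// Hypothetical class showing both constant styles side by side.
struct Table {
    static const int MaxEntries = 10;   // in-class initializer, integral type only
    enum { MaxSlots = 10 };             // enumerator: never needs storage

    int entries[MaxEntries];            // both are usable as array bounds
    int slots[MaxSlots];
};

// Computes the element counts from the array types themselves.
int total_capacity() {
    int n_entries = static_cast<int>(sizeof(Table::entries) / sizeof(int));
    int n_slots   = static_cast<int>(sizeof(Table::slots) / sizeof(int));
    return n_entries + n_slots;
}
```

Binding `Table::MaxEntries` to a `const int&` parameter would, before C++17, bring back the need for an out-of-class definition; the enumerator has no such caveat.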
However, a static variable must get some memory space to live in, and this must be inside one translation unit. The compiler reserves this space only when you DEFINE the variable.
A structure is not a variable, but its instance is. Hence we can include the same structure declaration in multiple modules, but we cannot have the same instance name defined globally in multiple modules.
A static variable of a structure is essentially a global variable. If it were defined in the structure declaration itself, we would not be able to use that declaration in multiple modules, because the same global name (the static variable) would then be defined in multiple modules, causing the linker error "Multiple definitions of same symbol".
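The usual layout can be sketched as follows. The file names in the comments (widget.h, widget.cpp) and the Widget class are hypothetical; the three parts are shown in a single listing only so that the sketch compiles on its own:

```cpp
// --- widget.h (hypothetical header, included by many .cpp files) ---
struct Widget {
    static int live_count;       // declaration only: no storage reserved here
    Widget()  { ++live_count; }
    ~Widget() { --live_count; }
};

// --- widget.cpp (the one translation unit chosen to own the storage) ---
int Widget::live_count = 0;      // definition: exactly one in the whole program

// --- any other .cpp file that includes widget.h ---
int count_while_two_alive() {
    Widget a, b;                 // both objects share the single live_count
    return Widget::live_count;
}
```

If the definition line were moved into the header, every source file including it would emit its own `Widget::live_count`, and the linker would report a multiple-definition error.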
Related
I am trying to understand the difference between the declaration and definition of static and non-static data members. Apologies if I have fundamentally misunderstood the concepts. Your explanations are highly appreciated.
The code I am trying to understand:
class A
{
public:
int ns; // declare non-static data member.
static int s; // declare static data member.
void foo();
};
int A::s; // define static data member.
// int A::ns; // This gives an error: non-static members cannot be defined out of class.
void A::foo()
{
ns = 10;
s = 5; // if s is not defined this gives an error 'undefined reference'
}
When you declare something, you're telling the compiler that the name being declared exists and what kind of name it is (type, variable, function, etc.) The definition could be with the declaration (as with your class A) or be elsewhere—the compiler and linker will have to connect the two later.
The key point of a variable or function definition is that it tells the compiler and linker where this variable/function will live. If you have a variable, there needs to be a place in memory for it. If you have a function, there needs to be a place in the binary containing the function's instructions.
For non-static data members, the declaration is also the definition. That is, you're giving them a place to live¹. This place is within each instance of the class. Every time you make a new A object, it comes with an ns as part of it.
Static data members, on the other hand, have no associated object. Without a definition, you've got a situation where you have N instances of A all sharing the same s, but nowhere to put s. Therefore, C++ makes you choose one translation unit for it via a definition, most often the source file that accompanies that header.
You could argue that the compiler should just pick one instance for it, but this won't work for various reasons, one being that you can use static data members before ever creating an instance, after the last instance is gone, or without having instances at all.
Now you might wonder why the compiler and linker still can't just figure it out on their own, and... that's actually pretty much what happens if you slap an inline on the variable or function. You can end up with multiple definitions, but only one will be chosen.
1: Giving them a place to live is a little beside the point here. All the compiler needs to know when it creates an object of that class is how much space to give it and which parts of that space are which data members. You could think of it as the compiler doing the definition part for you since there's only one place that data member could possibly live.
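To make the contrast concrete, here is a small sketch with hypothetical types WithStatic and Plain. On typical implementations the static member contributes nothing to the object size, while each object carries its own ns:

```cpp
// Hypothetical types for comparing object layout.
struct Plain      { int ns; };
struct WithStatic { int ns; static int s; };

int WithStatic::s = 0;   // the single home for s

// Holds on common compilers: s does not live inside any object.
static_assert(sizeof(WithStatic) == sizeof(Plain),
              "static member should not add to object size");

// Writes through one object and reads through another.
int shared_after_write() {
    WithStatic a, b;
    a.ns = 1;            // a's own copy
    b.ns = 2;            // b's own copy
    a.s = 7;             // same variable as b.s and WithStatic::s
    return b.s;
}
```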
static members are essentially global variables with a special name and access rules tied to the class. Hence, they inherit all the problems for usual global variables. Namely, in the whole C++ program (which is the union of all translation units aka .cpp files) there should be exactly one definition of each global variable, no more.
You can think of "variable definition" as "the place which will allocate memory for the variable".
However, classes are typically defined in a header file (.h/.hpp/etc.) which is included in multiple translation units. So it's up to the programmer to specify which translation unit actually defines the variable. Note that since C++17 we have the inline keyword, which shifts this burden to the compiler and linker; look for "inline variables". The naming is weird for historical reasons.
However, non-static members do not really exist until you create an instance of the class, i.e. an object. And it's the object lifetime and storage duration which define how each individual member is created/stored/destroyed. So there is no need to actually define them anywhere outside of the class.
static variables belong to the class definition. Non-static variables belong to the instances created from the class definition.
int main()
{
A::s = 5; // this is ok
A a;
a.ns = 5; // this is also ok
}
I have the following working code:
#include <string>
#include <iostream>
class A {
public:
const std::string test = "42";
//static const std::string test = "42"; // fails
};
int main(void){
A a;
std::cout << a.test << '\n';
}
Is there a good reason why it is not possible to make test a static const? I understand that prior to C++11 it was constrained by the standard. I thought that C++11 introduced in-class initialization to make this a little friendlier. I also note that such semantics have been available for integral types for quite some time.
Of course it works with the out-of class initialization in form of const std::string A::test = "42";
I guess that, if you can make it non-static, then the problem lies in one of two things. The first is initializing it outside class scope (normally consts are created during the instantiation of the object). But I do not think this is the problem if you are creating an object independent of any other members of the class. The second is having multiple definitions for the static member. E.g. if it were included in several .cpp files, landing in several object files, then the linker would have trouble linking those objects together (e.g. into one executable), as they would contain copies of the same symbol. To my understanding, this is exactly the situation when one provides the out-of-class definition right under the class declaration in the header, and then includes this common header in more than one place. As I recall, this leads to linker errors.
However, now the responsibility of handling this is moved onto the user/programmer. If one wants to have a library with a static member, they need to provide an out-of-class definition, compile it into a separate object file, and then link all other objects against this one, therefore having only one copy of the binary definition of the symbol.
I read the answers in Do we still need to separately define static members, even if they are initialised inside the class definition? and Why can't I initialize non-const static member or static array in class?.
I still would like to know:
Is it only a standard thing, or there is deeper reasoning behind it?
Can this be worked around with constexpr and user-defined literals? Both clang and g++ say the variable cannot have a non-literal type. Maybe I can make one. (Maybe for some reason it's also a bad idea.)
Is it really such a big issue for the linker to include only one copy of the symbol? Since it is static const, all copies should be binary-exact and immutable.
Please also comment if I am missing or misunderstanding something.
Your question sort of has two parts. What does the standard say? And why is it so?
For a static member of type const std::string, it is required to be defined outside the class specifier and have one definition in one of the translation units. This is part of the One Definition Rule, and is specified in clause 3 of the C++ standard.
But why?
The problem is that an object with static storage duration needs unique static storage in the final program image, so it needs to be linked from one particular translation unit. The class specifier doesn't have a home in one translation unit, it just defines the type (which is required to be identically defined in all translation units where it is used).
The reason a constant integral doesn't need storage, is that it is used by the compiler as a constant expression and inlined at point of use. It never makes it to the program image.
However, an object of a complex type like std::string with static storage duration needs storage, even if it is const. This is because it may need to be dynamically initialized (have its constructor called before the entry to main).
You could argue that the compiler should store information about objects with static storage duration in each translation unit where they are used, and then the linker should merge these definitions at link-time into one object in the program image. My guess for why this isn't done, is that it would require too much intelligence from the linker.
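For reference, a minimal sketch of the two forms that do compile for the class in the question. The member raw and the function joined are hypothetical additions; the constexpr member works because `const char*` is a literal type, whereas std::string was not one when this was asked:

```cpp
#include <string>

class A {
public:
    static const std::string test;             // declaration only
    static constexpr const char* raw = "42";   // literal type: in-class initializer allowed
};

// The single out-of-class definition; in a real project this line would
// live in exactly one .cpp file.
const std::string A::test = "42";

// Uses both members by value.
std::string joined() {
    return A::test + A::raw;
}
```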
Here are two variables declared with the keyword static:
void fcn() {
static int x = 2;
}
class cls {
static int y;
};
We all know that in order for cls to link properly, int cls::y needs to be explicitly defined by the programmer exactly once.
Based on the answers to "static variables in an inlined function", it seems that even though no out-of-class definition is required for fcn::x, it is guaranteed that even inlined versions of fcn from different compilation units will reference the same fcn::x. If this is true, then the linker has to be smart enough to reach between compilation units and connect multiple instances of "the same" variable to ensure that static function variables perform as expected.
If this is possible for static function variables, it seems to me that it should also be possible for static class members... so why does the standard require a single out-of-class definition of static class members?
Yes, the linker will indeed have to merge different instances of fcn::x. In other words, even though formally the language says that fcn::x has no linkage, physically it will have to be exposed as an external symbol in all object files that contain it. This is how it is typically implemented in practice: your compiler will expose fcn::x as some sort of heavily mangled external name ##$%^&_fcn_x or such (to ensure it can never clash with "real" external names). This is what the linker will use to merge all instances of fcn::x into one.
As for class members... Firstly, it is not really about what is "possible". It is about the language-level concepts of declarations and definitions. It is about One Definition Rule, which is a higher-level concept than what is "possible" based on raw linker features. According to that rule, objects with external linkage shall be defined by the user and shall have one and only one definition. Static data members of the class are objects with external linkage. The rest follows.
Secondly, and more practically, there's another serious issue with static data members: their order of initialization. Static data members are [guaranteed to be] initialized no later than when the first function from the containing translation unit is called (that is, the translation unit that contains the data member definition). And static objects defined in a single translation unit are initialized in the order of their definition, top to bottom. This is an important property of the static data member initialization process. Allowing static data members to be defined "automatically" would defy this part of the specification and would require massive changes to this part of the language.
In other words, when you provide a dedicated definition for a static data member of the class, you are not just doing it for ODR compliance, you are actually expressing your desired initialization order for that object.
Meanwhile, static variables inside functions are objects with no linkage. Hence they receive a different treatment at the conceptual level. And they have well-defined order-of-initialization semantics that is completely unaffected by the need to merge multiple definitions into one.
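A small sketch of the function-local case, with the hypothetical name next_id. The variable is initialized the first time control passes through its declaration, so no definition order is involved. A single file can only show that the value persists across calls; in a multi-file program, every translation unit that includes this inline function would share the same counter:

```cpp
// Hypothetical inline function that could live in a header.
inline int next_id() {
    static int counter = 0;   // initialized on first call, no separate definition
    return ++counter;         // state persists between calls
}
```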