C++ singleton lazy initialization implementation and linkage seem to conflict - c++

I came across the question "C++ Singleton design pattern" and learned that there are two ways to implement the singleton pattern in C++:
1) allocate the single instance on the heap and return it from the instance() call
2) return a static instance from the instance() call; this is also known as the lazy initialization implementation.
But I think the second, the lazy initialization implementation, is wrong for the following reason.
The static object returned from the instance() call has internal linkage and will have a separate copy in each translation unit. So if a user modifies the singleton, the change will not be reflected in any other translation unit.
But there are many statements that the second implementation is correct. Am I missing something?

In the context of a method, the static keyword is not about linkage. It only affects the storage class of the variable being defined. For static local variables in member functions, the standard states explicitly:
9.3/6 A static local variable in a member function always refers to the same object, whether or not the member function is inline.
So it doesn't matter at all whether you put the code in a header or a .cpp file.
Note that for free (non-member) functions it does indeed depend on the linkage of the function, as KerrekSB pointed out.

The linkage of the name of the implementation object does not matter. What matters is the linkage of the name of the function you use to access the object, and that name has, of course, external linkage:
thing.h:
Thing & TheThing(); // external linkage
thing.cpp:
#include "thing.h"
Thing & TheThing() { static Thing impl; return impl; }
Every use of the name TheThing in the program refers to the same entity, namely the function defined (uniquely) in thing.cpp.
Remember, linkage is a property of names, not of objects.

You are wrong, because the singleton is defined in one single translation unit: the one that contains the definition of the function that returns it. That means that all translation units that want to use the singleton request it from the one that actually defines it, and in the end they all use the same object (as expected for a singleton pattern :-) ).
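A minimal single-file sketch of the function-local static approach described in these answers. The class Settings and its name member are hypothetical, and the two free functions merely stand in for callers that would normally live in different translation units:

```cpp
#include <string>

// Hypothetical class, used only for illustration.
class Settings {
public:
    // Function-local static: constructed on the first call, shared afterwards.
    static Settings& instance() {
        static Settings impl;
        return impl;
    }

    std::string name = "default";

private:
    Settings() = default;                           // only instance() may construct
    Settings(const Settings&) = delete;             // no copies
    Settings& operator=(const Settings&) = delete;  // no assignment
};

// Two independent callers, standing in for code in different translation units.
void rename_from_caller_a() { Settings::instance().name = "changed"; }
std::string read_from_caller_b() { return Settings::instance().name; }
```

Calling rename_from_caller_a() and then read_from_caller_b() should yield "changed", because both go through the same function and therefore reach the same object.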

Related

Multiple static const int class variables in DLL [duplicate]

In the class:
class foo
{
public:
static int bar; //declaration of static data member
};
int foo::bar = 0; //definition of data member
We have to explicitly define the static variable, otherwise it will result in a
undefined reference to 'foo::bar'
My question is:
Why do we have to give an explicit definition of a static variable?
Please note that this is NOT a duplicate of previously asked undefined reference to static variable questions. This question intends to ask the reason behind explicit definition of a static variable.
From the beginning of time, the C++ language, just like C, was built on the principle of independent translation. Each translation unit is compiled by the compiler proper independently, without any knowledge of other translation units. The whole program only comes together later, at the linking stage. The linking stage is the earliest stage at which the entire program is seen by the linker (it is seen as a collection of object files prepared by the compiler proper).
In order to support this principle of independent translation, each entity with external linkage has to be defined in one translation unit, and in only one translation unit. The user is responsible for distributing such entities between different translation units. It is considered a part of user intent, i.e. the user is supposed to decide which translation unit (and object file) will contain each definition.
The same applies to static members of the class. Static members of the class are entities with external linkage. The compiler expects you to define that entity in some translation unit. The whole purpose of this feature is to give you the opportunity to choose that translation unit. The compiler cannot choose it for you. It is, again, a part of your intent, something you have to tell the compiler.
This is no longer as critical as it used to be, since the language is now designed to deal with (and eliminate) large numbers of identical definitions (templates, inline functions, etc.), but the One Definition Rule is still rooted in the principle of independent translation.
In addition to the above, in the C++ language the point at which you define your variable determines the order of its initialization with regard to other variables defined in the same translation unit. This is also a part of user intent, i.e. something the compiler cannot decide without your help.
Starting from C++17 you can declare your static members as inline. This eliminates the need for a separate definition. By declaring them in that fashion you effectively tell the compiler that you don't care where this member is physically defined and, consequently, don't care about its initialization order.
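As a small sketch of the two styles side by side (this needs a C++17 compiler; the member baz is an addition for illustration and is not part of the question's class):

```cpp
// Requires C++17 or later because of the inline static member.
class foo
{
public:
    static int bar;             // declaration only; needs the definition below
    inline static int baz = 7;  // C++17: declaration and definition in one place
};

// Out-of-class definition of bar. In a multi-file project this line
// must appear in exactly one translation unit.
int foo::bar = 0;
```

With baz the header can be included anywhere and the linker is left to fold the copies into one object; with bar the programmer still picks the translation unit.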
In early C++ it was allowed to define static data members inside the class, which certainly violates the idea that a class is only a blueprint and does not set memory aside. This has been dropped now.
Putting the definition of a static member outside the class emphasizes that memory is allocated only once for a static data member (at compile time). Each object of that class doesn't have its own copy.
static is a storage class: when you declare the variable you are telling the compiler "this will be in the data section somewhere", and when you subsequently use it, the compiler emits code that loads a value from a to-be-determined address.
In some contexts, the compiler can deduce that a static is really a compile-time constant and replace it with such, for example
static const int meaning = 42;
Inside a function that never takes the address of the value.
When dealing with class members, however, the compiler can't guess where this value should be created. It might be in a library you will link against, or a dll, or you might be providing a library where the value must be provided by the library consumer.
Usually, when someone asks this, though, it is because they are misusing static members.
If all you want is a constant value, e.g.
static int MaxEntries;
...
int Foo::MaxEntries = 10;
you would be better off with one or the other of the following:
static const int MaxEntries = 10;
// or
enum { MaxEntries = 10 };
The static const version requires no separate definition until something takes the address of, or forms a reference to, the variable; the enum version never does.
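A short sketch of both alternatives in one hypothetical class (MaxSlots, entries and slots are names invented here). The array bounds use the values purely as constants; the out-of-class definition at the end is only there for code that takes the address of MaxEntries:

```cpp
struct Foo {
    static const int MaxEntries = 10;  // in-class initializer, usable as a constant
    enum { MaxSlots = 10 };            // enumerator: has no address at all

    int entries[MaxEntries];           // used as an array bound, no definition needed
    int slots[MaxSlots];
};

// Needed only if something takes the address of, or binds a reference to,
// Foo::MaxEntries. Note that the initializer stays inside the class.
const int Foo::MaxEntries;
```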
Inside the class you are only declaring the variable, i.e. you tell the compiler that there is something with this name.
However, a static variable must get some memory space to live in, and this must be inside one translation unit. The compiler reserves this space only when you DEFINE the variable.
A structure is not a variable, but an instance of it is. Hence we can include the same structure declaration in multiple modules, but we cannot have the same instance name defined globally in multiple modules.
A static member variable of a structure is essentially a global variable. If we defined it in the structure declaration itself, we could not use that declaration in multiple modules, because the same global instance name (of the static variable) would then be defined in multiple modules, causing the linker error "Multiple definitions of same symbol".

Why can't we declare static variables in the class definition? [duplicate]

I have the following working code:
#include <string>
#include <iostream>
class A {
public:
const std::string test = "42";
//static const std::string test = "42"; // fails
};
int main(void){
A a;
std::cout << a.test << '\n';
}
Is there a good reason why it is not possible to make test a static const? I understand that prior to C++11 it was constrained by the standard. I thought that C++11 introduced in-class initialization to make this a little friendlier. I also note that such semantics have been available for integral types for quite some time.
Of course it works with the out-of-class initialization in the form of const std::string A::test = "42";
I guess that, since you can make it non-static, the problem lies in one of two things. The first is initializing it outside class scope (normally consts are created during the instantiation of the object). But I do not think this is the problem if you are creating an object independent of any other members of the class. The second is having multiple definitions for the static member, e.g. if the header were included in several .cpp files, landing in several object files, the linker would have trouble linking those objects together (e.g. into one executable), as they would contain copies of the same symbol. To my understanding, this is exactly the situation where one provides the out-of-class definition right under the class declaration in the header, and then includes this common header in more than one place. As I recall, this leads to linker errors.
However, now the responsibility for handling this is moved onto the user/programmer. If one wants to have a library with a static member, one needs to provide an out-of-class definition, compile it into a separate object file, and then link all other objects against this one, so that there is only one copy of the binary definition of the symbol.
I read the answers in "Do we still need to separately define static members, even if they are initialised inside the class definition?" and "Why can't I initialize non-const static member or static array in class?".
I still would like to know:
1) Is it only a standard thing, or is there deeper reasoning behind it?
2) Can this be worked around with the constexpr and user-defined literals mechanisms? Both clang and g++ say the variable cannot have a non-literal type. Maybe I can make one. (Maybe for some reason it is also a bad idea.)
3) Is it really such a big issue for the linker to include only one copy of the symbol? Since it is static const, all copies should be binary-exact and immutable.
Please also comment if I am missing or misunderstanding something.
Your question sort of has two parts. What does the standard say? And why is it so?
For a static member of type const std::string, it is required to be defined outside the class specifier and have one definition in one of the translation units. This is part of the One Definition Rule, and is specified in clause 3 of the C++ standard.
But why?
The problem is that an object with static storage duration needs unique static storage in the final program image, so it needs to be linked from one particular translation unit. The class specifier doesn't have a home in one translation unit, it just defines the type (which is required to be identically defined in all translation units where it is used).
The reason an integral constant doesn't need storage is that it is used by the compiler as a constant expression and inlined at the point of use. It never makes it to the program image.
However, an object of a complex type, like std::string, with static storage duration needs storage even if it is const. This is because it may need to be dynamically initialized (have its constructor called before the entry to main).
You could argue that the compiler should store information about objects with static storage duration in each translation unit where they are used, and then the linker should merge these definitions at link time into one object in the program image. My guess for why this isn't done is that it would require too much intelligence from the linker.
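For illustration, here is a sketch of two ways to get a class-wide const std::string under the rules described above. The first is the out-of-class definition from the question. The second, lazy_test(), is a hypothetical accessor added here that wraps a function-local static; it is one possible workaround, not something the answer itself proposes:

```cpp
#include <string>

class A {
public:
    static const std::string test;  // declaration only, no storage yet

    // Alternative: a function-local static, constructed on the first call.
    static const std::string& lazy_test() {
        static const std::string value = "42";
        return value;
    }
};

// The single definition. In a multi-file project this line belongs in
// exactly one .cpp file, which also decides where the storage lives.
const std::string A::test = "42";
```

The accessor form can live entirely in a header, at the price of a function call and of initialization being deferred until first use.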

C++: Static Members can't be Defined at Declaration, but Static Function Variables can?

Here are two variables declared with the keyword static:
void fcn() {
static int x = 2;
}
class cls {
static int y;
};
We all know that in order for cls to link properly, int cls::y needs to be explicitly defined by the programmer exactly once.
Based on the answers to "static variables in an inlined function", it seems that even though no out-of-class definition is required for fcn::x, it is guaranteed that even inlined versions of fcn from different compilation units will reference the same fcn::x. If this is true, then the linker has to be smart enough to reach between compilation units and connect multiple instances of "the same" variable to ensure that static function variables perform as expected.
If this is possible for static function variables, it seems to me that it should also be possible for static class members... so why does the standard require a single out-of-class definition of static class members?
Yes, the linker will indeed have to merge different instances of fcn::x. In other words, even though formally the language says that fcn::x has no linkage, physically it will have to be exposed as an external symbol in all object files that contain it. This is how it is typically implemented in practice: your compiler will expose fcn::x as some sort of heavily mangled external name ##$%^&_fcn_x or such (to ensure it can never clash with "real" external names). This is what the linker will use to merge all instances of fcn::x into one.
As for class members... Firstly, it is not really about what is "possible". It is about the language-level concepts of declarations and definitions. It is about One Definition Rule, which is a higher-level concept than what is "possible" based on raw linker features. According to that rule, objects with external linkage shall be defined by the user and shall have one and only one definition. Static data members of the class are objects with external linkage. The rest follows.
Secondly, and more practically, there's another serious issue with static data members: their order of initialization. Static data members are [guaranteed to be] initialized no later than when the first function from the containing translation unit is called (that is, the translation unit that contains the data member definition). And static objects defined in a single translation unit are initialized in the order of their definition, top to bottom. This is an important property of the static data member initialization process. Allowing static data members to be defined "automatically" would defy this part of the specification and would require massive changes to this part of the language.
In other words, when you provide a dedicated definition for a static data member of the class, you are not just doing it for ODR compliance, you are actually expressing your desired initialization order for that object.
Meanwhile, static variables inside functions are objects with no linkage. Hence they receive a different treatment at the conceptual level. And they have well-defined order-of-initialization semantics that is completely unaffected by the need to merge multiple definitions into one.
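A single-file sketch of the behavior discussed for function statics. The names next_id, first_caller and second_caller are hypothetical; in a real project next_id would sit in a header and the two callers in separate .cpp files, which is where the linker merging described above comes in:

```cpp
// An inline function may be defined in a header and included in many
// translation units; its static local still names a single object.
inline int next_id() {
    static int counter = 0;  // initialized once, before the first use
    return ++counter;
}

// Stand-ins for code in two different translation units.
int first_caller()  { return next_id(); }
int second_caller() { return next_id(); }
```

Both callers advance the same counter, so alternating calls keep producing increasing values rather than each restarting from 1.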

Static vs New/Malloc

I was wondering if people could shed some light on the uses of "static". I have never run into an issue where I have explicitly declared a variable or method as static. I understand that when declaring something as "static" it gets stuffed into the data segment of your program, similar to globals, and hence the variable is accessible for the run of your program. If this is the case, why not just make a static variable a global variable? Hell, why not just throw this variable on the heap using new or malloc? Both methods ensure the variable will be available to you throughout the run of your program.
static has multiple meanings in C, and C++ heaps on even more.
In a file scope declaration (what I think the question is about), static controls the visibility of an identifier.
Let's set aside C++ and use the C concepts.
File scope identifiers which name objects or functions have linkage. Linkage can be external (program-wide) or internal (within one translation unit).
static specifies internal linkage.
This is important because if a name with internal linkage appears in multiple units, those occurrences are not related. One module can have a static foo function and another one in the same program can have a different foo function. They both exist and are reachable by the name foo from their respective units.
This is not possible with external linkage: there must be one foo.
malloc creates an object which is potentially available everywhere in a program, as long as it is not freed, but in a different sense. The object is available if you have its pointer. A pointer is a kind of "run time name": an access key to get to the object. Linkage makes an object or function available if you know its name (at compile time) and if that object and function has the right kind of linkage relative to where you're trying to access it from.
In a dynamic operating system in which multiple programs come to life and terminate, the storage for a program's static data and functions (whether they have external or internal linkage) is in fact dynamically allocated. The system routine which loads a program has to do something similar to malloc to fetch memory for all of the fixed areas of the program.
Sometimes C programs use malloc even for "singleton" objects that are referenced globally via global pointers. These objects behave like de facto static variables since they have a lifetime which is almost that of the entire program, and are accessed through the pointer, which is accessed by name. This is useful if the objects have properties (such as size) that are not known until run time, or if their initialization is expensive and they are not always needed (only when certain cases occur in the program).
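The pattern might look roughly like the following sketch, written in C style but compiled as C++ to match the rest of this page. Table, g_table and get_table are hypothetical names, and the size is assumed to be known only at run time:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical type whose size is only known at run time.
struct Table {
    std::size_t size;
    int* cells;
};

static Table* g_table = nullptr;  // global pointer: the "name" of the object

// Allocates the table on first use and returns the same object afterwards.
// Returns nullptr if allocation fails.
Table* get_table(std::size_t size_if_new) {
    if (g_table == nullptr) {
        Table* t = static_cast<Table*>(std::malloc(sizeof(Table)));
        if (t == nullptr) return nullptr;
        t->cells = static_cast<int*>(std::calloc(size_if_new, sizeof(int)));
        if (t->cells == nullptr) {
            std::free(t);
            return nullptr;
        }
        t->size = size_if_new;
        g_table = t;
    }
    return g_table;
}
```

The object is never freed here, which matches the "lifetime of almost the entire program" described above.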
Supplemental factoids about static and extern:
In C, at file scope, extern ensures that the declaration of an object, where an initializer is omitted, is in fact a declaration. Without extern it is a tentative definition, but if an initializer is present, then it is a definition.
In C, at file scope extern doesn't mean "this declaration has external linkage", surprisingly. An extern declaration inherits linkage from a previous declaration of the same name.
A block-scope extern in C means "this name, which is being introduced into this scope, refers to the external definition with external linkage". The linkage is inherited from a previous file-scope declaration of the name, if it exists, otherwise it is external.
A block-scope static on an object controls not linkage, but storage duration. A static object is not instantiated each time on entry into the block; a single copy of it exists, and can be initialized prior to program startup. (In C++, non-constant expressions can initialize such an object or its members; in that case, initialization occurs on the first execution of the block.)
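A small C++ sketch of the parenthetical above, using a non-constant initializer so that the initialization visibly happens on the first execution of the block. The names init_calls, make_seed and bump are made up for this example:

```cpp
int init_calls = 0;  // counts how often the initializer runs

int make_seed() {
    ++init_calls;
    return 100;
}

int bump() {
    // Non-constant initializer: evaluated the first time control passes
    // through this declaration, and never again.
    static int value = make_seed();
    return ++value;
}
```

Before bump() is ever called, init_calls is still 0; after any number of calls it is 1, while value keeps its state from one call to the next.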
A block-scope static function declaration declares a function with internal linkage.
There is no way, in a block scope, to declare an external object name which has internal linkage. Paradoxically, the first extern declaration in the following snippet is correct, but the second, block-scope one, is erroneous!
static int name; /* external name with internal linkage */
extern int name; /* redundant redeclaration of the above */
void foo(void)
{
int name; /* local variable shadowing external one */
{
/* attempt to "punch through" shadow and reach external: */
extern int name; /* ERROR! */
}
}
Clearly, the word "external" has an ambiguous meaning between "outside of any function" and "program-wide linkage" and this ambiguity is embroiled in the extern keyword.
In C++, static takes on additional meanings. In a class declaration, it declares "static member functions" which belong to the class scope and have the same access to class instances as non-static member functions do, but are not invoked on objects (do not have the implicit this parameter). Class data members marked static have a single class-wide instance; they are not instantiated per-object. (Unfortunately, they don't participate properly in inheritance like true object-oriented class variables, which can be overridden in a derived class to be instance or vice versa.)
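To make the class-level meanings concrete, here is a sketch with a hypothetical Widget class that counts its live instances. The counter is a static data member (one object for the whole class) and the accessor is a static member function (no implicit this):

```cpp
class Widget {
public:
    Widget() { ++live_count; }
    Widget(const Widget&) { ++live_count; }
    ~Widget() { --live_count; }

    // Static member function: callable as Widget::count() without an object,
    // yet it can still read the private static member.
    static int count() { return live_count; }

private:
    static int live_count;  // one class-wide object, not one per Widget
};

int Widget::live_count = 0;  // the single out-of-class definition
```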
In C++, a privacy similar to internal linkage can be achieved using an unnamed namespace rather than static. Namespaces make the internal/external linkage concept mostly an obsolete mechanism for C compatibility.
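A brief sketch of the two spellings side by side; all names are hypothetical. A single file cannot show the effect on other translation units, so the comments state what would hold in a multi-file program:

```cpp
namespace {
    // Names in an unnamed namespace cannot be referred to from any other
    // translation unit, even if another file declares the same names.
    int helper_calls = 0;
    int helper() { return ++helper_calls; }
}

// The C-style spelling: internal linkage via static at namespace scope.
static int legacy_helper() { return 42; }

// External linkage: this is the only name other files could link against.
int public_entry() { return helper() + legacy_helper(); }
```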
C++ involves extern in the special extern "LANG" syntax (e.g. extern "C").
static_cast is unrelated to static; what they have in common is "static" meaning "prior to program run time": static storage is determined prior to run time, and the conversion performed by a static cast is also determined at compile time (without run-time type information).