Here is an example code:
enum Foo // or enum class whatever
{ BAR
, STUFF
};
inline const char* to_string( const Foo& foo )
{
static const char* const NAMES[] =
{ "BAR"
, "STUFF"
};
// let's assume I have some boundary checks here, it's not the point
return NAMES[foo];
};
This function is inline, is in a header used in several compilation units.
The goal here is to make the compiler do nothing if there is no use of this function.
Questions:
Does the C++ standard guarantee that NAMES will exists in only one object file, or is it left to the compiler to decide or does it guarantee that every object file will have it's copy?
If there will be multiple copies, will it be a linking problem (I'm assuming I can't test enough compilers to check that).
Will gcc, msvc and clang all optimize out this case by making the final binary have only one instance of NAMES?
Yes, the standard guarantees that there will be only one object. From C++03 §7.1.2/4:
[...] A static
local variable in an extern inline function always refers to the same object. A string literal in an
extern inline function is the same object in different translation units.
(Note that an extern inline function is an inline function with external linkage, i.e. an inline function not marked as static.)
Exactly which object file it appears in will depend on the compiler, but what I suspect happens is that each object file that uses it will get a copy, and the linker will arbitrarily choose one of the symbols and discard the rest.
The standard guarantees that only one copy will be used. It doesn't guarantee that there won't be unused copies taking up space in the code.
The linker is generally responsible for consolidating all the references to use the same instance.
Related
Several times I've seen code like this in a header
IMPORTOREXPORT std::string const& foo();
IMPORTOREXPORT std::string const& bar();
IMPORTOREXPORT std::string const& baz();
and the following corresponding code in a cpp file
std::string const& foo() {
static std::string const s{"foo"};
return s;
}
std::string const& bar() {
static std::string const s{"bar"};
return s;
}
std::string const& baz() {
static std::string const s{"baz"};
return s;
}
where IMPORTOREXPORT is a macro to deal with linkage, so yes, I'm talking of a multi-module code, where different translation units are compiled independently and then linked together.
Is there any reason why I should prefer the above solution to simply have these in the header?
inline std::string const foo = "foo";
inline std::string const bar = "bar";
inline std::string const baz = "baz";
I found this related question, but I'm not sure it entirely answers this. For instance, the accepted answer contains this (part of a) statement
for different modules would be different addresses
but this doesn't seem to be inline with what read on cppreference (emphasis mine):
An inline function or variable (since C++17) with external linkage (e.g. not declared static) has the following additional properties:
It has the same address in every translation unit.
As you note, the inline variable approach guarantees that the variable has a unique address, just like the static local variable approach.
The inline approach is certainly easier to write, but:
It doesn't work on pre-C++17 compilers.
It's more expensive to build:
Every translation unit that odr-uses the inline variable must emit a weak definition for that variable, and the linker must discard all but one of these symbols so that the variable has a unique address.
If the variable has dynamic initialization, then every translation unit that odr-uses it also needs to emit a piece of code to dynamically initialize it, and again, the linker must discard all but one copy.
Every translation unit that includes the header that defines the inline variable is forced to have a compile-time dependency on the header that defines the variable's type, even if there are no uses of the variable besides the definition itself.
There are two implications of using the inline keyword(§ 7.1.3/4):
It hints the compiler that substitution of function body at the point of call is preferable over the usual function call mechanism.
Even if the inline substitution is omitted, the other rules(especially w.r.t One Definition Rule) for inline are followed.
Usually any mainstream compiler will substitute function body at the point of call if needed, so marking function inline merely for #1 is not really needed.
Further w.r.t #2, As I understand when you declare a function as static inline function,
The static keyword on the function forces the inline function to have an internal linkage(inline functions have external linkage) Each instance of such a function is treated as a separate function(address of each function is different) and each instance of these functions have their own copies of static local variables & string literals(an inline function has only one copy of these)
Thus such a function acts like any other static function and the keyword inline has no importance anymore, it becomes redundant.
So, Practically marking a function static and inline both has no use at all. Either it should be static(not most preferred) or inline(most preferred),
So, Is using static and inline together on a function practically useless?
Your analysis is correct, but doesn't necessarily imply uselessness. Even if most compilers do automatically inline functions (reason #1), it's best to declare inline just to describe intent.
Disregarding interaction with inline, static functions should be used sparingly. The static modifier at namespace scope was formerly deprecated in favor of unnamed namespaces (C++03 §D.2). For some obscure reason that I can't recall it was removed from deprecation in C++11 but you should seldom need it.
So, Practically marking a function static and inline both has no use at all. Either it should be static(not most preferred) or inline(most preferred),
There's no notion of preference. static implies that different functions with the same signature may exist in different .cpp files (translation units). inline without static means that it's OK for different translation units to define the same function with identical definitions.
What is preferred is to use an unnamed namespace instead of static:
namespace {
inline void better(); // give the function a unique name
}
static inline void worse(); // kludge the linker to allowing duplicates
Static and inline are orthogonal (independent). Static means the function should not be visible outside of the translation unit, inline is a hint to the compiler the programmer would like to have this function inlined. Those two are not related.
Using static inline makes sense when the inlined function is not used outside of the translation unit. By using it you can prevent a situation of accidental violation of ODR rule by naming another inlined function in another tranlation unit with the same name.
Example:
source1.cpp:
inline int Foo()
{
return 1;
}
int Bar1()
{
return Foo();
}
source2.cpp:
inline int Foo()
{
return 2;
}
int Bar2()
{
return Foo();
}
Without using static on Foo (or without using an anonymous namespace, which is preferred way by most C++ programmers), this example violates ODR and the results are undefined. You can test with Visual Studio the result of Bar1/Bar2 will depend on compiler settings - in Debug configuration both Bar1 and Bar2 will return the same value (inlining not used, one implementation selected randomly by the linker), in Release configuration each of them will return the intended value.
I may not be completely right about this, but as far as I know declaring a function static inline is the only way to make (or allow) the compiler to generate a machine code where the function really is not defined in the compiled code at all, and all you have is a direct substitution of the function call into a sequence of instructions, like it were just a regular procedure body, with no trace in the machine code of a procedure call relative to that function definition from the source code.
That is, only with static inline you can really substitute the use of a macro, inline by itself is not enough.
A simple Google search for "static inline" will show you compiler documentation pages that talk about it. I guess this should be enough to answer your question, and say, "no, it is not practically useless". Here is one example of a site discussing the use of inline, and specifically of static inline http://www.greenend.org.uk/rjk/tech/inline.html
If you talk about free functions (namespace scope), then your assumption is correct. static inline functions indeed don't have much value. So static inline is simply a static function, which automatically satisfies ODR and inline is redundant for ODR purpose.
However when we talk about member methods (class scope), the static inline function does have the value.
Once you declare a class method as inline, it's full body has to be visible to all translation units which includes that class.
Remember that static keyword has a different meaning when it comes for a class.
Edit: As you may know that static function inside a class doesn't have internal linkage, in other words a class cannot have different copies of its static method depending on the translation (.cpp) units.
But a free static function at namespace/global scope does have different copies per every translation unit.
e.g.
// file.h
static void foo () {}
struct A {
static void foo () {}
};
// file1.cpp
#include"file.h"
void x1 ()
{
foo(); // different function exclusive to file1.cpp
A::foo(); // same function
}
// file2.cpp
#include"file.h"
void x2 ()
{
foo(); // different function exclusive to file2.cpp
A::foo(); // same function
}
I just read a man page for gcc and it specifically states the use of static inline with a compiler flag. In the case of the flag, it inlines the function and if it is also static and is inlined in every instance that it is called, then it gets rid of the function definition which will never be used in the created object file, thereby reducing the size of the generated code by that little bit.
For static member variables in C++ class - the initialization is done outside the class. I wonder why? Any logical reasoning/constraint for this? Or is it purely legacy implementation - which the standard does not want to correct?
I think having initialization in the class is more "intuitive" and less confusing.It also gives the sense of both static and global-ness of the variable. For example if you see the static const member.
Fundamentally this is because static members must be defined in exactly one translation unit, in order to not violate the One-Definition Rule. If the language were to allow something like:
struct Gizmo
{
static string name = "Foo";
};
then name would be defined in each translation unit that #includes this header file.
C++ does allow you to define integral static members within the declaration, but you still have to include a definition within a single translation unit, but this is just a shortcut, or syntactic sugar. So, this is allowed:
struct Gizmo
{
static const int count = 42;
};
So long as a) the expression is const integral or enumeration type, b) the expression can be evaluated at compile-time, and c) there is still a definition somewhere that doesn't violate the one definition rule:
file: gizmo.cpp
#include "gizmo.h"
const int Gizmo::count;
In C++ since the beginning of times the presence of an initializer was an exclusive attribute of object definition, i.e. a declaration with an initializer is always a definition (almost always).
As you must know, each external object used in C++ program has to be defined once and only once in only one translation unit. Allowing in-class initializers for static objects would immediately go against this convention: the initializers would go into header files (where class definitions usually reside) and thus generate multiple definitions of the same static object (one for each translation unit that includes the header file). This is, of course, unacceptable. For this reason, the declaration approach for static class members is left perfectly "traditional": you only declare it in the header file (i.e. no initializer allowed), and then you define it in a translation unit of your choice (possibly with an initializer).
One exception from this rule was made for const static class members of integral or enum types, because such entries can for Integral Constant Expressions (ICEs). The main idea of ICEs is that they are evaluated at compile time and thus do no depend on definitions of the objects involved. Which is why this exception was possible for integral or enum types. But for other types it would just contradict the basic declaration/definition principles of C++.
It's because of the way the code is compiled. If you were to initialize it in the class, which often is in the header, every time the header is included you'd get an instance of the static variable. This is definitely not the intent. Having it initialized outside the class gives you the possibility to initialize it in the cpp file.
Section 9.4.2, Static data members, of the C++ standard states:
If a static data member is of const integral or const enumeration type, its declaration in the class definition can specify a const-initializer which shall be an integral constant expression.
Therefore, it is possible for the value of a static data member to be included "within the class" (by which I presume that you mean within the declaration of the class). However, the type of the static data member must be a const integral or const enumeration type. The reason why the values of static data members of other types cannot be specified within the class declaration is that non-trivial initialization is likely required (that is, a constructor needs to run).
Imagine if the following were legal:
// my_class.hpp
#include <string>
class my_class
{
public:
static std::string str = "static std::string";
//...
Each object file corresponding to CPP files that include this header would not only have a copy of the storage space for my_class::str (consisting of sizeof(std::string) bytes), but also a "ctor section" that calls the std::string constructor taking a C-string. Each copy of the storage space for my_class::str would be identified by a common label, so a linker could theoretically merge all copies of the storage space into a single one. However, a linker would not be able to isolate all copies of the constructor code within the object files' ctor sections. It would be like asking the linker to remove all of the code to initialize str in the compilation of the following:
std::map<std::string, std::string> map;
std::vector<int> vec;
std::string str = "test";
int c = 99;
my_class mc;
std::string str2 = "test2";
EDIT It is instructive to look at the assembler output of g++ for the following code:
// SO4547660.cpp
#include <string>
class my_class
{
public:
static std::string str;
};
std::string my_class::str = "static std::string";
The assembly code can be obtained by executing:
g++ -S SO4547660.cpp
Looking through the SO4547660.s file that g++ generates, you can see that there is a lot of code for such a small source file.
__ZN8my_class3strE is the label of the storage space for my_class::str. There is also the assembly source of a __static_initialization_and_destruction_0(int, int) function, which has the label __Z41__static_initialization_and_destruction_0ii. That function is special to g++ but just know that g++ will make sure that it gets called before any non-initializer code gets executed. Notice that the implementation of this function calls __ZNSsC1EPKcRKSaIcE. This is the mangled symbol for std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&).
Going back to the hypothetical example above and using these details, each object file corresponding to a CPP file that includes my_class.hpp would have the label
__ZN8my_class3strE for sizeof(std::string) bytes as well as assembly code to call __ZNSsC1EPKcRKSaIcE within its implementation of the __static_initialization_and_destruction_0(int, int) function. The linker can easily merge all occurrences of __ZN8my_class3strE, but it cannot possibly isolate the code that calls __ZNSsC1EPKcRKSaIcE within the object file's implementation of __static_initialization_and_destruction_0(int, int).
I think the main reason to have initialization done outside the class block is to allow for initialization with return values of other class member functions. If you wanted to intialize a::var with b::some_static_fn() you'd need to make sure that every .cpp file that includes a.h includes b.h first. It'd be a mess, especially when (sooner or later) you run into a circular reference that you could only resolve with an otherwise unnecessary interface. The same issue is the main reason for having class member function implementations in a .cpp file instead of putting everything in your main class' .h.
At least with member functions you do have the option to implement them in the header. With variables you must do the initialization in a .cpp file. I don't quite agree with the limitation, and I don't think there's a good reason for it either.
For static member variables in C++ class - the initialization is done outside the class. I wonder why? Any logical reasoning/constraint for this? Or is it purely legacy implementation - which the standard does not want to correct?
I think having initialization in the class is more "intuitive" and less confusing.It also gives the sense of both static and global-ness of the variable. For example if you see the static const member.
Fundamentally this is because static members must be defined in exactly one translation unit, in order to not violate the One-Definition Rule. If the language were to allow something like:
struct Gizmo
{
static string name = "Foo";
};
then name would be defined in each translation unit that #includes this header file.
C++ does allow you to define integral static members within the declaration, but you still have to include a definition within a single translation unit, but this is just a shortcut, or syntactic sugar. So, this is allowed:
struct Gizmo
{
static const int count = 42;
};
So long as a) the expression is const integral or enumeration type, b) the expression can be evaluated at compile-time, and c) there is still a definition somewhere that doesn't violate the one definition rule:
file: gizmo.cpp
#include "gizmo.h"
const int Gizmo::count;
In C++ since the beginning of times the presence of an initializer was an exclusive attribute of object definition, i.e. a declaration with an initializer is always a definition (almost always).
As you must know, each external object used in C++ program has to be defined once and only once in only one translation unit. Allowing in-class initializers for static objects would immediately go against this convention: the initializers would go into header files (where class definitions usually reside) and thus generate multiple definitions of the same static object (one for each translation unit that includes the header file). This is, of course, unacceptable. For this reason, the declaration approach for static class members is left perfectly "traditional": you only declare it in the header file (i.e. no initializer allowed), and then you define it in a translation unit of your choice (possibly with an initializer).
One exception from this rule was made for const static class members of integral or enum types, because such entries can for Integral Constant Expressions (ICEs). The main idea of ICEs is that they are evaluated at compile time and thus do no depend on definitions of the objects involved. Which is why this exception was possible for integral or enum types. But for other types it would just contradict the basic declaration/definition principles of C++.
It's because of the way the code is compiled. If you were to initialize it in the class, which often is in the header, every time the header is included you'd get an instance of the static variable. This is definitely not the intent. Having it initialized outside the class gives you the possibility to initialize it in the cpp file.
Section 9.4.2, Static data members, of the C++ standard states:
If a static data member is of const integral or const enumeration type, its declaration in the class definition can specify a const-initializer which shall be an integral constant expression.
Therefore, it is possible for the value of a static data member to be included "within the class" (by which I presume that you mean within the declaration of the class). However, the type of the static data member must be a const integral or const enumeration type. The reason why the values of static data members of other types cannot be specified within the class declaration is that non-trivial initialization is likely required (that is, a constructor needs to run).
Imagine if the following were legal:
// my_class.hpp
#include <string>
class my_class
{
public:
static std::string str = "static std::string";
//...
Each object file corresponding to CPP files that include this header would not only have a copy of the storage space for my_class::str (consisting of sizeof(std::string) bytes), but also a "ctor section" that calls the std::string constructor taking a C-string. Each copy of the storage space for my_class::str would be identified by a common label, so a linker could theoretically merge all copies of the storage space into a single one. However, a linker would not be able to isolate all copies of the constructor code within the object files' ctor sections. It would be like asking the linker to remove all of the code to initialize str in the compilation of the following:
std::map<std::string, std::string> map;
std::vector<int> vec;
std::string str = "test";
int c = 99;
my_class mc;
std::string str2 = "test2";
EDIT It is instructive to look at the assembler output of g++ for the following code:
// SO4547660.cpp
#include <string>
class my_class
{
public:
static std::string str;
};
std::string my_class::str = "static std::string";
The assembly code can be obtained by executing:
g++ -S SO4547660.cpp
Looking through the SO4547660.s file that g++ generates, you can see that there is a lot of code for such a small source file.
__ZN8my_class3strE is the label of the storage space for my_class::str. There is also the assembly source of a __static_initialization_and_destruction_0(int, int) function, which has the label __Z41__static_initialization_and_destruction_0ii. That function is special to g++ but just know that g++ will make sure that it gets called before any non-initializer code gets executed. Notice that the implementation of this function calls __ZNSsC1EPKcRKSaIcE. This is the mangled symbol for std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&).
Going back to the hypothetical example above and using these details, each object file corresponding to a CPP file that includes my_class.hpp would have the label
__ZN8my_class3strE for sizeof(std::string) bytes as well as assembly code to call __ZNSsC1EPKcRKSaIcE within its implementation of the __static_initialization_and_destruction_0(int, int) function. The linker can easily merge all occurrences of __ZN8my_class3strE, but it cannot possibly isolate the code that calls __ZNSsC1EPKcRKSaIcE within the object file's implementation of __static_initialization_and_destruction_0(int, int).
I think the main reason to have initialization done outside the class block is to allow for initialization with return values of other class member functions. If you wanted to intialize a::var with b::some_static_fn() you'd need to make sure that every .cpp file that includes a.h includes b.h first. It'd be a mess, especially when (sooner or later) you run into a circular reference that you could only resolve with an otherwise unnecessary interface. The same issue is the main reason for having class member function implementations in a .cpp file instead of putting everything in your main class' .h.
At least with member functions you do have the option to implement them in the header. With variables you must do the initialization in a .cpp file. I don't quite agree with the limitation, and I don't think there's a good reason for it either.
For static member variables in C++ class - the initialization is done outside the class. I wonder why? Any logical reasoning/constraint for this? Or is it purely legacy implementation - which the standard does not want to correct?
I think having initialization in the class is more "intuitive" and less confusing.It also gives the sense of both static and global-ness of the variable. For example if you see the static const member.
Fundamentally this is because static members must be defined in exactly one translation unit, in order to not violate the One-Definition Rule. If the language were to allow something like:
struct Gizmo
{
static string name = "Foo";
};
then name would be defined in each translation unit that #includes this header file.
C++ does allow you to define integral static members within the declaration, but you still have to include a definition within a single translation unit, but this is just a shortcut, or syntactic sugar. So, this is allowed:
struct Gizmo
{
static const int count = 42;
};
So long as a) the expression is const integral or enumeration type, b) the expression can be evaluated at compile-time, and c) there is still a definition somewhere that doesn't violate the one definition rule:
file: gizmo.cpp
#include "gizmo.h"
const int Gizmo::count;
In C++ since the beginning of times the presence of an initializer was an exclusive attribute of object definition, i.e. a declaration with an initializer is always a definition (almost always).
As you must know, each external object used in C++ program has to be defined once and only once in only one translation unit. Allowing in-class initializers for static objects would immediately go against this convention: the initializers would go into header files (where class definitions usually reside) and thus generate multiple definitions of the same static object (one for each translation unit that includes the header file). This is, of course, unacceptable. For this reason, the declaration approach for static class members is left perfectly "traditional": you only declare it in the header file (i.e. no initializer allowed), and then you define it in a translation unit of your choice (possibly with an initializer).
One exception from this rule was made for const static class members of integral or enum types, because such entries can for Integral Constant Expressions (ICEs). The main idea of ICEs is that they are evaluated at compile time and thus do no depend on definitions of the objects involved. Which is why this exception was possible for integral or enum types. But for other types it would just contradict the basic declaration/definition principles of C++.
It's because of the way the code is compiled. If you were to initialize it in the class, which often is in the header, every time the header is included you'd get an instance of the static variable. This is definitely not the intent. Having it initialized outside the class gives you the possibility to initialize it in the cpp file.
Section 9.4.2, Static data members, of the C++ standard states:
If a static data member is of const integral or const enumeration type, its declaration in the class definition can specify a const-initializer which shall be an integral constant expression.
Therefore, it is possible for the value of a static data member to be included "within the class" (by which I presume that you mean within the declaration of the class). However, the type of the static data member must be a const integral or const enumeration type. The reason why the values of static data members of other types cannot be specified within the class declaration is that non-trivial initialization is likely required (that is, a constructor needs to run).
Imagine if the following were legal:
// my_class.hpp
#include <string>
class my_class
{
public:
static std::string str = "static std::string";
//...
Each object file corresponding to CPP files that include this header would not only have a copy of the storage space for my_class::str (consisting of sizeof(std::string) bytes), but also a "ctor section" that calls the std::string constructor taking a C-string. Each copy of the storage space for my_class::str would be identified by a common label, so a linker could theoretically merge all copies of the storage space into a single one. However, a linker would not be able to isolate all copies of the constructor code within the object files' ctor sections. It would be like asking the linker to remove all of the code to initialize str in the compilation of the following:
std::map<std::string, std::string> map;
std::vector<int> vec;
std::string str = "test";
int c = 99;
my_class mc;
std::string str2 = "test2";
EDIT It is instructive to look at the assembler output of g++ for the following code:
// SO4547660.cpp
#include <string>
class my_class
{
public:
static std::string str;
};
std::string my_class::str = "static std::string";
The assembly code can be obtained by executing:
g++ -S SO4547660.cpp
Looking through the SO4547660.s file that g++ generates, you can see that there is a lot of code for such a small source file.
__ZN8my_class3strE is the label of the storage space for my_class::str. There is also the assembly source of a __static_initialization_and_destruction_0(int, int) function, which has the label __Z41__static_initialization_and_destruction_0ii. That function is special to g++ but just know that g++ will make sure that it gets called before any non-initializer code gets executed. Notice that the implementation of this function calls __ZNSsC1EPKcRKSaIcE. This is the mangled symbol for std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&).
Going back to the hypothetical example above and using these details, each object file corresponding to a CPP file that includes my_class.hpp would have the label
__ZN8my_class3strE for sizeof(std::string) bytes as well as assembly code to call __ZNSsC1EPKcRKSaIcE within its implementation of the __static_initialization_and_destruction_0(int, int) function. The linker can easily merge all occurrences of __ZN8my_class3strE, but it cannot possibly isolate the code that calls __ZNSsC1EPKcRKSaIcE within the object file's implementation of __static_initialization_and_destruction_0(int, int).
I think the main reason to have initialization done outside the class block is to allow for initialization with return values of other class member functions. If you wanted to intialize a::var with b::some_static_fn() you'd need to make sure that every .cpp file that includes a.h includes b.h first. It'd be a mess, especially when (sooner or later) you run into a circular reference that you could only resolve with an otherwise unnecessary interface. The same issue is the main reason for having class member function implementations in a .cpp file instead of putting everything in your main class' .h.
At least with member functions you do have the option to implement them in the header. With variables you must do the initialization in a .cpp file. I don't quite agree with the limitation, and I don't think there's a good reason for it either.