I have these three files
// foo.h
#pragma once
template <typename T> const T foo;
template <>
const int foo<int> = 1;
// a.cpp
#include "foo.h"
int main() {}
// b.cpp
#include "foo.h"
If I build these with GCC 7.5.0 on Linux it works fine:
$ g++ -std=c++17 a.cpp b.cpp -o out
$
However with Apple Clang 12.0.0 on Mac it gives this error:
$ clang++ -std=c++17 a.cpp b.cpp -o out
duplicate symbol 'foo<int>' in:
/var/folders/g5/8twmk1xj481_6btvppyw5j4h0000gp/T/a-62bdde.o
/var/folders/g5/8twmk1xj481_6btvppyw5j4h0000gp/T/b-ea4997.o
ld: 1 duplicate symbol for architecture x86_64
Should this be an error, or is it a bug in Clang?
RedFrog's answer is correct - this violates the One Definition Rule (ODR) but it doesn't really explain why it violates the ODR. In C++ global const variables have internal (static) linkage, which should result in a separate instance of the variable for each compilation unit, so ODR isn't violated. However it turns out that const does not cause template variables to have internal linkage:
The const qualifier used on a declaration of a non-local non-volatile non-template (since C++14) non-inline (since C++17) variable that is not declared extern gives it internal linkage.
There are a few ways to fix this.
Internal Linkage using static
You can use static on template variables to gives them internal linkage:
Any of the following names declared at namespace scope have internal linkage:
variables, variable templates (since C++14), functions, or function templates declared static;
Edit: I don't recommend using this technique. If you apply static to the declaration and the specialisations then it works fine in Clang but GCC (as of now at least) complains that
error: explicit template specialization cannot have a storage class
If you set static only on the declaration, then it compiles in GCC but you get duplicate symbol errors with Clang.
Internal Linkage using namespace
Putting template variables in an anonymous namespace also gives them internal linkage.
In addition, all names declared in unnamed namespace or a namespace within an unnamed namespace, even ones explicitly declared extern, have internal linkage.
Disable ODR using inline
This is a bit different. Template variables that are declared inline but not static still have external linkage, but are allowed to have more than one definition.
An inline function or variable (since C++17) with external linkage (e.g. not declared static) has the following additional properties:
There may be more than one definition of an inline function or variable (since C++17) in the program as long as each definition appears in a different translation unit and (for non-static inline functions and variables (since C++17)) all definitions are identical. For example, an inline function or an inline variable (since C++17) may be defined in a header file that is #include'd in multiple source files.
It must be declared inline in every translation unit.
It has the same address in every translation unit.
This means that inline for variables behaves like inline for functions.
If in doubt, inline is probably the best option to go for. In the very simple tests I've done they all produced identical output, even at -O0, but in theory inline guarantees that there is only one copy of the variable in your program because it still has external linkage, whereas the other methods don't. Maybe.
As given, this is indeed an ODR violation as stated in the other answers, but the point here should be that this is a GCC bug. [basic.link]/3.2 gives internal linkage to a “non-template variable of non-volatile const-qualified type”, but GCC is giving foo internal linkage despite it being a variable template. (Here, we could rationalize it as a consequence of there being no diagnostic required, but the bug can be observed by a well-formed program by comparing the address of an implicitly instantiated specialization in two different translation units.)
it violates ODR, and should be inline (since C++17).
why it violates ODR:
One and only one definition of every non-inline function or variable that is odr-used (see below) is required to appear in the entire program (including any standard and user-defined libraries). The compiler is not required to diagnose this violation, but the behavior of the program that violates it is undefined.
because foo.h is included by both a.cpp and b.cpp, the variable foo<int> is defined in each translation unit, a.cpp and b.cpp. so it has 2 definitions in this program, which violates the former rule, except if it's unused (it means, not ODR-used) or it's an inline variable.
you have 2 solutions:
put template<> int const foo<int> = 1; into another foo.cpp. and then it will have only one definition.
add inline (since C++17) for foo. an inline variable is allowed to have more than one same definitions among translation units.
Related
I have these three files
// foo.h
#pragma once
template <typename T> const T foo;
template <>
const int foo<int> = 1;
// a.cpp
#include "foo.h"
int main() {}
// b.cpp
#include "foo.h"
If I build these with GCC 7.5.0 on Linux it works fine:
$ g++ -std=c++17 a.cpp b.cpp -o out
$
However with Apple Clang 12.0.0 on Mac it gives this error:
$ clang++ -std=c++17 a.cpp b.cpp -o out
duplicate symbol 'foo<int>' in:
/var/folders/g5/8twmk1xj481_6btvppyw5j4h0000gp/T/a-62bdde.o
/var/folders/g5/8twmk1xj481_6btvppyw5j4h0000gp/T/b-ea4997.o
ld: 1 duplicate symbol for architecture x86_64
Should this be an error, or is it a bug in Clang?
RedFrog's answer is correct - this violates the One Definition Rule (ODR) but it doesn't really explain why it violates the ODR. In C++ global const variables have internal (static) linkage, which should result in a separate instance of the variable for each compilation unit, so ODR isn't violated. However it turns out that const does not cause template variables to have internal linkage:
The const qualifier used on a declaration of a non-local non-volatile non-template (since C++14) non-inline (since C++17) variable that is not declared extern gives it internal linkage.
There are a few ways to fix this.
Internal Linkage using static
You can use static on template variables to gives them internal linkage:
Any of the following names declared at namespace scope have internal linkage:
variables, variable templates (since C++14), functions, or function templates declared static;
Edit: I don't recommend using this technique. If you apply static to the declaration and the specialisations then it works fine in Clang but GCC (as of now at least) complains that
error: explicit template specialization cannot have a storage class
If you set static only on the declaration, then it compiles in GCC but you get duplicate symbol errors with Clang.
Internal Linkage using namespace
Putting template variables in an anonymous namespace also gives them internal linkage.
In addition, all names declared in unnamed namespace or a namespace within an unnamed namespace, even ones explicitly declared extern, have internal linkage.
Disable ODR using inline
This is a bit different. Template variables that are declared inline but not static still have external linkage, but are allowed to have more than one definition.
An inline function or variable (since C++17) with external linkage (e.g. not declared static) has the following additional properties:
There may be more than one definition of an inline function or variable (since C++17) in the program as long as each definition appears in a different translation unit and (for non-static inline functions and variables (since C++17)) all definitions are identical. For example, an inline function or an inline variable (since C++17) may be defined in a header file that is #include'd in multiple source files.
It must be declared inline in every translation unit.
It has the same address in every translation unit.
This means that inline for variables behaves like inline for functions.
If in doubt, inline is probably the best option to go for. In the very simple tests I've done they all produced identical output, even at -O0, but in theory inline guarantees that there is only one copy of the variable in your program because it still has external linkage, whereas the other methods don't. Maybe.
As given, this is indeed an ODR violation as stated in the other answers, but the point here should be that this is a GCC bug. [basic.link]/3.2 gives internal linkage to a “non-template variable of non-volatile const-qualified type”, but GCC is giving foo internal linkage despite it being a variable template. (Here, we could rationalize it as a consequence of there being no diagnostic required, but the bug can be observed by a well-formed program by comparing the address of an implicitly instantiated specialization in two different translation units.)
it violates ODR, and should be inline (since C++17).
why it violates ODR:
One and only one definition of every non-inline function or variable that is odr-used (see below) is required to appear in the entire program (including any standard and user-defined libraries). The compiler is not required to diagnose this violation, but the behavior of the program that violates it is undefined.
because foo.h is included by both a.cpp and b.cpp, the variable foo<int> is defined in each translation unit, a.cpp and b.cpp. so it has 2 definitions in this program, which violates the former rule, except if it's unused (it means, not ODR-used) or it's an inline variable.
you have 2 solutions:
put template<> int const foo<int> = 1; into another foo.cpp. and then it will have only one definition.
add inline (since C++17) for foo. an inline variable is allowed to have more than one same definitions among translation units.
When I declare a global variable in two different source files and only define it in one of the source files, I get different results compiling for C++ than for C. See the following example:
main.c
#include <stdio.h>
#include "func.h" // only contains declaration of void print();
int def_var = 10;
int main() {
printf("%d\n", def_var);
return 0;
}
func.c
#include <stdio.h>
#include "func.h"
/* extern */int def_var; // extern needed for C++ but not for C?
void print() {
printf("%d\n", def_var);
}
I compile with the following commands:
gcc/g++ -c main.c -o main.o
gcc/g++ -c func.c -o func.o
gcc/g++ main.o func.o -o main
g++/clang++ complain about multiple definition of def_var (this is the behaviour I expected, when not using extern).
gcc/clang compile just fine. (using gcc 7.3.1 and clang 5.0)
According to this link:
A tentative definition is a declaration that may or may not act as a definition. If an actual external definition is found earlier or later in the same translation unit, then the tentative definition just acts as a declaration.
So my variable def_var should be defined at the end of each translation unit and then result in multiple definitions (as it is done for C++). Why is that not the case when compiling with gcc/clang?
This isn't valid C either, strictly speaking. Says as much in
6.9 External definitions - p5
An external definition is an external declaration that is also a
definition of a function (other than an inline definition) or an
object. If an identifier declared with external linkage is used in an
expression (other than as part of the operand of a sizeof or _Alignof
operator whose result is an integer constant), somewhere in the entire
program there shall be exactly one external definition for the
identifier; otherwise, there shall be no more than one.
You have two definitions for an identifier with external linkage. You violate that requirement, the behavior is undefined. The program linking and working is not in opposition to that. It's not required to be diagnosed.
And it's worth noting that C++ is no different in that regard.
[basic.def.odr]/4
Every program shall contain exactly one definition of every non-inline
function or variable that is odr-used in that program outside of a
discarded statement; no diagnostic required. The definition can appear
explicitly in the program, it can be found in the standard or a
user-defined library, or (when appropriate) it is implicitly defined
(see [class.ctor], [class.dtor] and [class.copy]). An inline function
or variable shall be defined in every translation unit in which it is
odr-used outside of a discarded statement.
Again, a "shall" requirement, and it says explicitly that no diagnostic is required. As you may have noticed, there's quite a bit more machinery that this paragraph can apply to. So the front ends for GCC and Clang probably need to work harder, and as such are able to diagnose it, despite not being required to.
The program is ill-formed either way.
As M.M pointed out in a comment, the C standard has an informative section that mentions the very extension in zwol's answer.
J.5.11 Multiple external definitions
There may be more than one external definition for the identifier of
an object, with or without the explicit use of the keyword extern; if
the definitions disagree, or more than one is initialized, the
behavior is undefined (6.9.2).
I believe you are observing an extension to C known as "common symbols", implemented by most, but not all, Unix-lineage C compilers, originally (IIUC) for compatibility with FORTRAN. The extension generalizes the "tentative definitions" rule described in StoryTeller's answer to multiple translation units. All external object definitions with the same name and no initializer,
int foo; // at file scope
are collapsed into one, even if they appear in more than one TU, and if there exists an external definition with an initializer for that name,
int foo = 1; // different TU, also file scope
then all of the external definitions with no initializers are treated as external declarations. C++ compilers do not implement this extension, because (oversimplifying) nobody wanted to figure out what it should do in the presence of templates. For GCC and Clang, you can disable the extension with -fno-common, but other Unix C compilers may not have any way to turn it off.
I have tested the following code:
in file a.c/a.cpp
int a;
in file b.c/b.cpp
int a;
int main() { return 0; }
When I compile the source files with gcc *.c -o test, it succeeds.
But when I compile the source files with g++ *.c -o test, it fails:
ccIJdJPe.o:b.cpp:(.bss+0x0): multiple definition of 'a'
ccOSsV4n.o:a.cpp:(.bss+0x0): first defined here
collect2.exe: error: ld returned 1 exit status
I'm really confused about this. Is there any difference between the global variables in C and C++?
Here are the relevant parts of the standard. See my explanation below the standard text:
§6.9.2/2 External object definitions
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
ISO C99 §6.9/5 External definitions
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.
With the C version, the 'g' global variables are 'merged' into one, so you will only have one in the end of the day which is declared twice. This is OK due to the time when extern was not needed, or perhaps did not exits. Hence, this is for historical and compatibility reason to build old code. This is a gcc extension for this legacy feature.
It basically makes gcc allocate memory for a variable with the name 'a', so there can be more than one declarations, but only one definition. That is why the code below will not work even with gcc.
This is also called tentative definition. There is no such a thing with C++, and that is while it compiles. C++ has no concept of tentative declaration.
A tentative definition is any external data declaration that has no storage class specifier and no initializer. A tentative definition becomes a full definition if the end of the translation unit is reached and no definition has appeared with an initializer for the identifier. In this situation, the compiler reserves uninitialized space for the object defined.
Note however that the following code will not compile even with gcc because this is tentative definition/declaration anymore with values assigned:
in file "a.c/a.cpp"
int a = 1;
in file "b.c/b.cpp"
int a = 2;
int main() { return 0; }
Let us go even beyond this with further examples. The following statements show normal definitions and tentative definitions. Note, static would make it a bit difference since that is file scope, and would not be external anymore.
int i1 = 10; /* definition, external linkage */
static int i2 = 20; /* definition, internal linkage */
extern int i3 = 30; /* definition, external linkage */
int i4; /* tentative definition, external linkage */
static int i5; /* tentative definition, internal linkage */
int i1; /* valid tentative definition */
int i2; /* not legal, linkage disagreement with previous */
int i3; /* valid tentative definition */
int i4; /* valid tentative definition */
int i5; /* not legal, linkage disagreement with previous */
Further details can be on the following page:
http://c0x.coding-guidelines.com/6.9.2.html
See also this blog post for further details:
http://ninjalj.blogspot.co.uk/2011/10/tentative-definitions-in-c.html
gcc implements a legacy feature where uninitialized global variables are placed in a common block.
Although in each translation unit the definitions are tentative, in ISO C, at the end of the translation unit, tentative definitions are "upgraded" to full definitions if they haven't already been merged into a non-tentative definition.
In standard C, it is always incorrect to have the same variables with external linkage defined in more that one translation unit even if these definitions came from tentative definitions.
To get the same behaviour as C++, you can use the -fno-common switch with gcc and this will result in the same error. (If you are using the GNU linker and don't use -fno-common you might also want to consider using the --warn-common / -Wl,--warn-common option to highlight the link time behaviour on encountering multiple common and non-common symbols with the same name.)
From the gcc man page:
-fno-common
In C code, controls the placement of uninitialized global
variables. Unix C compilers have traditionally permitted multiple
definitions of such variables in different compilation units by
placing the variables in a common block. This is the behavior
specified by -fcommon, and is the default for GCC on most
targets. On the other hand, this behavior is not required by ISO
C, and on some targets may carry a speed or code size penalty on
variable references. The -fno-common option specifies that the
compiler should place uninitialized global variables in the data
section of the object file, rather than generating them as common
blocks. This has the effect that if the same variable is declared
(without extern) in two different compilations, you will get a
multiple-definition error when you link them. In this case, you
must compile with -fcommon instead. Compiling with
-fno-common is useful on targets for which it provides better
performance, or if you wish to verify that the program will work
on other systems which always treat uninitialized variable
declarations this way.
gcc's behaviour is a common one and it is described in Annex J of the standard (which is not normative) which describes commonly implemented extensions to the standard:
J.5.11 Multiple external definitions
There may be more than one external definition for the identifier of an object, with or
without the explicit use of the keyword extern; if the definitions disagree, or more than
one is initialized, the behavior is undefined (6.9.2).
gcc optimizes code when I pass it the -O2 flag, but I'm wondering how well it can actually do that if I compile all source files to object files and then link them afterwards.
Here's an example:
// in a.h
int foo(int n);
// in foo.cpp
int foo(int n) {
return n;
}
// in main.cpp
#include "a.h"
int main(void) {
return foo(5);
}
// code used to compile it all
gcc -c -O2 foo.cpp -o foo.o
gcc -c -O2 main.cpp -o main.o
gcc -O2 foo.o main.o -o executable
Normally, gcc should inline foo because it's a small function and -O2 enables -finline-small-functions, right? But here, gcc only sees the code of foo and main independently before it creates the object files, so there won't be any optimizations like that, right? So, does compiling like this really make code slower?
However, I could also compile it like this:
gcc -O2 foo.cpp main.cpp -o executable
Would that be faster? If not, would it be faster this way?
// in foo.cpp
int foo(int n) {
return n;
}
// in main.cpp
#include "foo.cpp"
int main(void) {
return foo(5);
}
Edit: I looked at objdump, and its disassembled code showed that only the #include "foo.cpp" thing worked.
It seems that you have rediscovered on your own the issue about the separate compilation model that C and C++ use. While it certainly eases memory requirements (which was important at the time of its creation), it does so by exposing only minimal information to the compiler, meaning that some optimizations (like this one) cannot be performed.
Newer languages, with their module systems can expose as much information as necessary, and we can hope to rip those benefits if modules get into the next version of C++...
In the mean time, the simplest thing to go for is called Link-Time Optimization. The idea is that you will perform as much optimization as possible on each TU (Translation Unit) to obtain an object file, but you will also enrich the traditional object file (which contain assembly) with IR (Intermediate Representation, used by compilers to optimize) for part of or all functions.
When the linker will be invoked to merge those object files together, instead of just merging the files together, it will merge the IR representations, rexeecute a number of optimization passes (constant propagation, inlining, ...) and then create assembly on its own. It means that instead of being just a linker, it is in fact a backend optimizer.
Of course, like all optimization passes this has a cost, so makes for longer compilation. Also, it means that both the compiler and the linker should be passed a special option to trigger this behavior, in the case of gcc, it would be -lto or -O4.
You may be looking for Link-Time Optimization (LTO), aka Whole Program Optimization.
Since you're using GCC, you can use the C99 inline function specifier mechanism. This is from ISO/IEC 9899:1999.
§ 6.7.4 Function specifiers
Syntax
¶1 function-specifier:
inline
Constraints
¶2 Function specifiers shall be used only in the declaration of an identifier for a function.
¶3 An inline definition of a function with external linkage shall not contain a definition of a
modifiable object with static storage duration, and shall not contain a reference to an
identifier with internal linkage.
¶4 In a hosted environment, the inline function specifier shall not appear in a declaration
of main.
Semantics
¶5 A function declared with an inline function specifier is an inline function. The
function specifier may appear more than once; the behavior is the same as if it appeared
only once. Making a function an inline function suggests that calls to the function be as
fast as possible.118) The extent to which such suggestions are effective is
implementation-defined.119)
¶6 Any function with internal linkage can be an inline function. For a function with external
linkage, the following restrictions apply: If a function is declared with an inline
function specifier, then it shall also be defined in the same translation unit. If all of the
file scope declarations for a function in a translation unit include the inline function
specifier without extern, then the definition in that translation unit is an inline
definition. An inline definition does not provide an external definition for the function,
and does not forbid an external definition in another translation unit. An inline definition
provides an alternative to an external definition, which a translator may use to implement
any call to the function in the same translation unit. It is unspecified whether a call to the
function uses the inline definition or the external definition.120)
¶7 EXAMPLE The declaration of an inline function with external linkage can result in either an external
definition, or a definition available for use only within the translation unit. A file scope declaration with
extern creates an external definition. The following example shows an entire translation unit.
inline double fahr(double t)
{
return (9.0 * t) / 5.0 + 32.0;
}
inline double cels(double t)
{
return (5.0 * (t - 32.0)) / 9.0;
}
extern double fahr(double); // creates an external definition
double convert(int is_fahr, double temp)
{
/* A translator may perform inline substitutions */
return is_fahr ? cels(temp) : fahr(temp);
}
¶8 Note that the definition of fahr is an external definition because fahr is also declared with extern, but
the definition of cels is an inline definition. Because cels has external linkage and is referenced, an
external definition has to appear in another translation unit (see 6.9); the inline definition and the external
definition are distinct and either may be used for the call.
118) By using, for example, an alternative to the usual function call mechanism, such as "inline
substitution". Inline substitution is not textual substitution, nor does it create a new function.
Therefore, for example, the expansion of a macro used within the body of the function uses the
definition it had at the point the function body appears, and not where the function is called; and
identifiers refer to the declarations in scope where the body occurs. Likewise, the function has a
single address, regardless of the number of inline definitions that occur in addition to the external
definition.
119) For example, an implementation might never perform inline substitution, or might only perform inline
substitutions to calls in the scope of an inline declaration.
120) Since an inline definition is distinct from the corresponding external definition and from any other
corresponding inline definitions in other translation units, all corresponding objects with static storage
duration are also distinct in each of the definitions.
Note that GCC also had inline functions in C before they were standardized. Read the GCC manual for details if you need that notation.
Let's assume that I have files a.cpp b.cpp and file c.h. Both of the cpp files include the c.h file. The header file contains a bunch of const int definitions and when I compile them I get no errors and yet I can access those const as if they were global variables. So the question, why don't I get any compilation errors if I have multiple const definitions as well as these const int's having global-like scope?
This is because a const declaration at namespace scope implies internal linkage. An object with internal linkage is only available within the translation unit in which it is defined. So in a sense, the one const object you have in c.h is actually two different objects, one internal to a.cpp and one internal to b.cpp.
In other words,
const int x = ...;
is equivalent to
static const int x = ...;
while
int x;
is similar to
extern int x;
because non-const declarations at namespace scope imply external linkage. (In this last case, they aren't actually equivalent. extern, as well as explicitly specifying external linkage, produces a declaration, not a definition, of an object.)
Note that this is specific to C++. In C, const doesn't change the implied linkage. The reason for this is that the C++ committee wanted you to be able to write
const int x = 5;
in a header. In C, that header included from multiple files would cause linker errors, because you'd be defining the same object multiple times.
From the current C++ standard...
7.1.1 Storage class specifiers
7) A name declared in a namespace scope without a storage-class-specifier has external linkage unless it has internal linkage because of a previous declaration and provided it is not declared const. Objects declared const and not explicitly declared extern have internal linkage.
3.5 Program and Linkage
2) When a name has internal linkage, the entity it denotes can be referred to by names from other scopes in the same translation unit.
The preprocessor causes stuff defined in headers to be included in the current translation unit.
When you do so, you create a separate const variable in each object file for every constant in the header. It's not a problem, since they are const.
Real reason: because #define is evil and needs to die.
Some usages of #define can be replaced with inline functions. Some - with const variable declarations. Since #define tends to be in header files, replacing those with consts in place better work. Thus, the "consts are static by default" rule.