extern, linkage and global variables - c++

In C++, say we have this header file:
myglobals.h
#ifndef my_globals_h
#define my_globals_h
int monthsInYear = 12;
#endif
and we include it in multiple implementation files, then we will get compilation errors since we end up with 'monthsInYear' defined multiple times, once in each file that monthsInYear is included in.
Fine. So if we modify our header and make our global const, like so:
const int monthsInYear = 12;
then our compilation errors go away. The explanation for this, as I understand it, and as given here for example, is that the const keyword here changes the linkage of monthsInYear to internal, which means that now each compilation unit that includes the header now has its own copy of the variable with internal linkage, hence we no longer have multiple definitions.
Now, an alternative would be to merely declare the variable in the header with extern, i.e.:
extern int monthsInYear;
and then define it in one of the implementation files that includes the header, i.e.:
#include "myglobals.h"
...
int monthsInYear = 12;
...
and then everywhere that includes it is dealing with one variable with external linkage.
This is fine, but the thing I'm a bit puzzled by is that using const gives it internal linkage, which fixes the problem, and using extern gives it external linkage, which also fixes the problem! It's like saying that any linkage will do, as long as we specify it. On top of that, when we just wrote:
int monthsInYear = 12;
wasn't the linkage already external? Which is why adding 'const' changed the linkage to internal?
It seems to me that the reason using 'extern' here actually fixes the problem isn't because it gives us external linkage (we already had that) but rather because it allows us to merely declare the variable without defining it, which we otherwise wouldn't be able to do, since:
int monthsInYear = 12;
both declares and defines it, and since including the header effectively copy-pastes the code into the implemenation file we end up with multiple definitions. Conversely, since extern allows us to merely declare it we end up with multiple declarations every time we include the header, which is fine since we're allowed to have multiple declarations, just not multiple definitions.
Is my understanding here correct? Sorry if this is massively obvious, but it seems that extern does at least three things:
specifies external linkage
allows declaration of objects without defining them
specifies static storage duration
but a lot of sources I look at don't make this clear, and when talking about using extern to stop 'multiple definition' errors with global variables don't make it clear that the key thing is separating the declaration from the definition.

For a non-const variable, extern has the effect of specifying that the variable has external linkage (which is the default), but it also converts the definition into a declaration if there's no initializer - i.e. extern int i; does not actually define i. (extern int i = 5; would, and would hopefully generate a compiler warning).
When you write extern int monthsInYear; in multiple source files (or #include it into them, which is equivalent), none of them define it. int monthsInYear = 12; defines the variable in that source file only.

The storage class specifiers are a part of the decl-specifier-seq of a declaration syntax. They control two independent properties of the names introduced by the declaration: their storage duration and their linkage.
extern - static or thread storage duration and external linkage
This is what cppreference says which matches your understanding perfectly.

Related

Any potential danger using constants in header files of C++ and asking their addresses in the program

I am defining constants in a header file and including in source files of my project. The C++ compiler normally does not create storage for these constants and keep their linkage as internal. If I ask for the address of any constant in my program, the compiler will be forced to create a storage for that constant variable. My question is if the compiler is creating storage for a constant variable, will the linkage of the variable also gets affected? Because if the linkage gets external, I would get linking errors while compilation. My test program does not give any linkage error when I point a pointer to a constant of the included header file (which needs the address of the constant). I will be grateful if any one can explain briefly the concepts of storage and linkage in C++ or direct me to some good explanation available somewhere. Thanks in advance.
//in constants.h
const double UNIT_LENGTH = 1e-10;
//in constants.cpp
#include "constants.h"
const double * temp = &UNIT_LENGTH;
//in main
#include "constants.h"
double A = UNIT_LENGTH; //why there is no linking error
Const-qualified variables have internal linkage. In the 2012 standard the wording in 3.5/3 is
A name having namespace scope (3.3.6) has internal linkage if it is the name of [...]
a variable that is explicitly declared const or constexpr and neither explicitly declared extern nor previously declared to have external linkage
"Namespace scope" includes the global namespace.
Whether you declare them in a header file or not is irrelevant, but be aware that in each translation unit the header file will define a different object. That will usually not matter because it's const, unless you want to compare addresses across translation units.
The linkage of global const variables in C++ (unlike in C) is defined to always be internal (aka static). So the problem you fear will not happen.
It is quite on the contrary. If you were to treat a global constant as you would a normal global variable, you would cause linker errors.
(But note that if those are char * variables, they need to be const char * const, not just const char *. Similar for other pointers, but chars are most often used.)

Aren't non-const variable considered external by default?

Learning from this: By default, non-const variables declared outside of a block are assumed to be external. However, const variables declared outside of a block are assumed to be internal.
But if I write this inside MyTools.h:
#ifndef _TOOLSIPLUG_
#define _TOOLSIPLUG_
typedef struct {
double LN20;
} S;
S tool;
#endif // !_TOOLSIPLUG_
and I include MyTools.h 3 times (from 3 different .cpp) it says that tool is already defined (at linker phase). Only if I change in:
extern S tool;
works. Isn't external default?
The declaration S tool; at namespace scope, does declare an extern linkage variable. It also defines it. And that's the problem, since you do that in three different translation units: you're saying to the linker that you have accidentally named three different global variables, of the same type, the same.
One way to achieve the effect you seem to desire, a single global shared variable declared in the header, is to do this:
inline auto tool_instance()
-> S&
{
static S the_tool; // One single instance shared in all units.
return the_tool;
}
static S& tool = tool_instance();
The function-accessing-a-local-static is called a Meyers' singleton, after Scott Meyers.
In C++17 and later you can also just declare an inline variable.
Note that global variables are considered Evil™. Quoting Wikipedia on that issue:
” The use of global variables makes software harder to read and understand. Since any code anywhere in the program can change the value of the variable at any time, understanding the use of the variable may entail understanding a large portion of the program. Global variables make separating code into reusable libraries more difficult. They can lead to problems of naming because a global variable defined in one file may conflict with the same name used for a global variable in another file (thus causing linking to fail). A local variable of the same name can shield the global variable from access, again leading to harder-to-understand code. The setting of a global variable can create side effects that are hard to locate and predict. The use of global variables makes it more difficult to isolate units of code for purposes of unit testing; thus they can directly contribute to lowering the quality of the code.
Disclaimer: code not touched by compiler's hands.
There's a difference between a declaration and a definition. extern int i; is a declaration: it says that there's a variable named i whose type is int, and extern implies that it will be defined somewhere else. int i;, on the other hand, is a definition of the variable i. When you write a definition, it tells the compiler to create that variable. If you have definitions of the same variable in more than one source file, you've got multiple definitions, and the compiler (well, in practice, the linker) should complain. That's why putting the definition S tool; in the header creates problems: each source file that #include's that header ends up defining tool, and the compiler rightly complains.
The difference between a const and a not-const definition is, as you say, that const int i = 3; defines a variable named i that is local to the file being compiled. There's no problem having the same definition in more than one source file, because those guys all have internal linkage, that is, they aren't visible outside the source file. When you don't have the const, for example, with int i = 3;, that also defines a variable named i, but it has external linkage, that it, it's visible outside the source file, and having the same definition in multiple files gives you that error. (technically, it's definitions of the same name that lead to problems; int i = 3; and double i = 3.0; in two different source files still are duplicate definitions).
By default, non-const variables declared outside of a block are assumed to be external. However, const variables declared outside of a block are assumed to be internal.
That statement is still correct in your case.
In your MyTools.h, tool is external.
S tool; // define tool, external scope
In your cpp file, however, the extern keyword merely means that S is defined else where.
extern S tool; // declare tool

Why is it that I can include a header file in multiple cpp files that contains const int and not have a compiler error?

Let's assume that I have files a.cpp b.cpp and file c.h. Both of the cpp files include the c.h file. The header file contains a bunch of const int definitions and when I compile them I get no errors and yet I can access those const as if they were global variables. So the question, why don't I get any compilation errors if I have multiple const definitions as well as these const int's having global-like scope?
This is because a const declaration at namespace scope implies internal linkage. An object with internal linkage is only available within the translation unit in which it is defined. So in a sense, the one const object you have in c.h is actually two different objects, one internal to a.cpp and one internal to b.cpp.
In other words,
const int x = ...;
is equivalent to
static const int x = ...;
while
int x;
is similar to
extern int x;
because non-const declarations at namespace scope imply external linkage. (In this last case, they aren't actually equivalent. extern, as well as explicitly specifying external linkage, produces a declaration, not a definition, of an object.)
Note that this is specific to C++. In C, const doesn't change the implied linkage. The reason for this is that the C++ committee wanted you to be able to write
const int x = 5;
in a header. In C, that header included from multiple files would cause linker errors, because you'd be defining the same object multiple times.
From the current C++ standard...
7.1.1 Storage class specifiers
7) A name declared in a namespace scope without a storage-class-specifier has external linkage unless it has internal linkage because of a previous declaration and provided it is not declared const. Objects declared const and not explicitly declared extern have internal linkage.
3.5 Program and Linkage
2) When a name has internal linkage, the entity it denotes can be referred to by names from other scopes in the same translation unit.
The preprocessor causes stuff defined in headers to be included in the current translation unit.
When you do so, you create a separate const variable in each object file for every constant in the header. It's not a problem, since they are const.
Real reason: because #define is evil and needs to die.
Some usages of #define can be replaced with inline functions. Some - with const variable declarations. Since #define tends to be in header files, replacing those with consts in place better work. Thus, the "consts are static by default" rule.

Is extern keyword really necessary?

...
#include "test1.h"
int main(..)
{
count << aaa <<endl;
}
aaa is defined in test1.h,and I didn't use extern keyword,but still can reference aaa.
So I doubt is extern really necessary?
extern has its uses. But it mainly involves "global variables" which are frowned upon. The main idea behind extern is to declare things with external linkage. As such it's kind of the opposite of static. But external linkage is in many cases the default linkage so you don't need extern in those cases. Another use of extern is: It can turn definitions into declarations. Examples:
extern int i; // Declaration of i with external linkage
// (only tells the compiler about the existence of i)
int i; // Definition of i with external linkage
// (actually reserves memory, should not be in a header file)
const int f = 3; // Definition of f with internal linkage (due to const)
// (This applies to C++ only, not C. In C f would have
// external linkage.) In C++ it's perfectly fine to put
// somethibng like this into a header file.
extern const int g; // Declaration of g with external linkage
// could be placed into a header file
extern const int g = 3; // Definition of g with external linkage
// Not supposed to be in a header file
static int t; // Definition of t with internal linkage.
// may appear anywhere. Every translation unit that
// has a line like this has its very own t object.
You see, it's rather complicated. There are two orthogonal concepts: Linkage (external vs internal) and the matter of declaration vs definition. The extern keyword can affect both. With respect to linkage it's the opposite of static. But the meaning of static is also overloaded and -- depending on the context -- does or does not control linkage. The other thing it does is to control the life-time of objects ("static life-time"). But at global scope all variables already have a static life-time and some people thought it would be a good idea to recycle the keyword for controlling linkage (this is me just guessing).
Linkage basically is a property of an object or function declared/defined at "namespace scope". If it has internal linkage, it won't be directly accessible by name from other translation units. If it has external linkage, there shall be only one definition across all translation units (with exceptions, see one-definition-rule).
I've found the best way to organise your data is to follow two simple rules:
Only declare things in header files.
Define things in C (or cpp, but I'll just use C here for simplicity) files.
By declare, I mean notify the compiler that things exist, but don't allocate storage for them. This includes typedef, struct, extern and so on.
By define, I generally mean "allocate space for", like int and so on.
If you have a line like:
int aaa;
in a header file, every compilation unit (basically defined as an input stream to the compiler - the C file along with everything it brings in with #include, recursively) will get its own copy. That's going to cause problems if you link two object files together that have the same symbol defined (except under certain limited circumstances like const).
A better way to do this is to define that aaa variable in one of your C files and then put:
extern int aaa;
in your header file.
Note that if your header file is only included in one C file, this isn't a problem. But, in that case, I probably wouldn't even have a header file. Header files are, in my opinion, only for sharing things between compilation units.
If your test1.h has the definition of aaa and you wanted to include the header file into more than one translation unit you will run into multiple definition error, unless aaa is constant.
Better you define the aaa in a cpp file and add extern definition in header file that could be added to other files as header.
Thumb rule for having variable and constant in header file
extern int a ;//Data declarations
const float pi = 3.141593 ;//Constant definitions
Since constant have internal linkage in c++ any constant that is defined in a translation unit will not be visible to other translation unit, but it is not the case for variable they have external linkage i.e., they are visible to other translation unit. Putting the definition of a variable in a header, that is shared in other translation unit would lead to multiple definition of a variable, leading to multiple definition error.
In that case, extern is not necessary. Extern is needed when the symbol is declared in another compilation unit.
When you use the #include preprocessing directive, the included file is copied out in place of the directive. In this case you don't need extern because the compiler already know aaa.
If aaa is not defined in another compilation unit you don't need extern, otherwise you do.

Use of "extern" storage class specifier in C

How does the following example usage of extern specifer behave.
We have a global variable int x both in files one.c and two.c
We want to use these in three.c so have declared this variable in three.c as
extern int x;
What would happen when we compile and link these files?
My answer is: compilation of all these files should succeed, however the linker should flag an error at linking, due to multiple declarations of x.
Would there be any difference in behavior in C++ ?
Is these any way to refer to int x (in three.c) simultaneously from both files, in C and C++.
In C++, I guess we can use namespaces to acheive this. Right?
By default, global variables have external linkage, which means that they can be used by other source files (or "translation units"). If you instead declare your global variables with the static keyword, they will have internal linkage, meaning they will not be usable by other source files.
For variables with external linkage, you can't have multiple variables with the same name, or the linker will complain. You can have two variables with the same name, though, as long as at least one has internal linkage, and of course you can't reference both of them in the same source file.
An extern declaration is just saying to the compiler "here is the name of some variable with external linkage defined in another translation unit," allowing you to refer to that variable.
C++ is exactly the same, except for the addition of namespaces. If global variables are put inside a namespace, then they can have the same name without linker errors, provided they are in different namespaces. Of course, all references to those variables then have to either refer to the full name namespace::var_name, or use a using declaration to establish a local namespace context.
C++ also has anonymous namespaces, which are entirely equivalent to using the static keyword for global variables in C: all variables and functions declared inside an anonymous namespace have internal linkage.
So, to answer your original question, you are right -- compilation would succeed, but linking would fail, due to multiple definitions of the variable x with external linkage (specifically, from the translation units one.c and two.c).
From three.c, there is no way to refer simultaneously to both variables x. You'll need to rename x in one or both modules, or switch to C++ and put at least one x inside a namespace.
In C, you could do this:
// one.c
static int x;
int *one_x = &x;
// two.c
static int x;
int *two_x = &x;
// three.c
extern int *one_x;
extern int *two_x;
Now you can refer unambiguously to the x in file one.c or the x in file two.c from the file three.c.
However, this might be a bit more effort than it's worth. Perhaps you should be coming up with more descriptive names for your global variables instead of toying around with theoretical ways to circumvent C's single global namespace.
To avoid generating duplicate symbols, you should declare extern int x; in a single header file (a .h file), and then have all .c files which will use x #include that header file, and define or initialize int x; in one of the .c files.
You might be interested by the answers to this question.
Summary: the linker may or may not fail to link the file. It depends on the initialization of the variable. It will definitely fail if the variable has different initializations in different files.
Remember that you can NOT extern a global static variable.. !!