External Linkage drawbacks - C++

Are there any drawbacks to having a symbol with external linkage (other than global namespace clutter/collision)? For instance, I would think that if I have a function which I never call, and it has internal linkage, the compiler can just discard it, but if it is external the compiler has to leave that code in because someone might link to it later. Is this correct? Are there any other drawbacks?
I am asking because I know unnamed namespaces are recommended instead of the static keyword, but since symbols in an unnamed namespace still have external linkage, they would suffer from the above mentioned drawback (if I am right about it), and so are not totally better than static functions like the standard says.

The fact that functions in unnamed namespaces have external linkage is almost entirely a technicality. Because they have a "secret" translation unit dependent unique identifier it is impossible to name them from a different translation unit. This means that the compiler can assume that they are never called by name from another translation unit. Most implementations that I know of turn functions in unnamed namespaces into local symbols and not global symbols, just like functions with true internal linkage.
A function in an unnamed namespace can be discarded without affecting the program if it is never called from the translation unit in which it is defined and its address is never taken and passed out of the translation unit, which might lead to it being called other than by a direct named function call.
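As a minimal sketch of that last condition (the names `unused_helper`, `helper` and `get_callback` are hypothetical, and what actually gets discarded depends on the compiler and its optimization settings), consider one translation unit in which one function is never referenced and another has its address handed out:

```cpp
// callbacks.cpp -- a single, hypothetical translation unit

namespace {

// Never referenced anywhere, so an implementation is free to drop it.
// [[maybe_unused]] (C++17) only silences the unused-function warning.
[[maybe_unused]] int unused_helper(int x) { return x * 2; }

// Cannot be named from another translation unit, but its address escapes
// through get_callback() below, so its code has to be kept.
int helper(int x) { return x + 1; }

}  // namespace

using Callback = int (*)(int);

// External linkage: other translation units can call this function and
// thereby reach helper() indirectly, without ever naming it.
Callback get_callback() { return &helper; }
```

Another translation unit could declare `int (*get_callback())(int);` and call the returned pointer, which is exactly the "called other than by a direct named function call" case.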

Related

Do C++ modules make unnamed namespaces redundant?

C++20 introduced modules. Any symbol that is not exported from a module has module-internal linkage, while unnamed namespaces provide a mechanism to give the definitions inside them file-internal linkage. Does this mean unnamed namespaces will become useless in the future, once modules become common practice in the C++ community?
No: since (many) compilers see just one translation unit at a time, it’s still useful for optimization to indicate that an entity cannot be used in any other. It also avoids the possibility of accidental collisions between module units (even if those should be less likely than with broader codebases).

C++ achieve internal linkage without using anonymous namespaces

I have been reading about declaring anonymous namespaces to achieve lower link times.
However, I have read that declaring anonymous namespaces in header files is truly not recommended:
When an unnamed namespace is defined in a header file, it can lead to surprising results. Due to default internal linkage, each translation unit will define its own unique instance of members of the unnamed namespace that are ODR-used within that translation unit. This can cause unexpected results, bloat the resulting executable, or inadvertently trigger undefined behavior due to one-definition rule (ODR) violations.
The above is a quote extracted from the link below, in which there are several examples of anonymous namespaces' unexpected behaviors:
https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL59-CPP.+Do+not+define+an+unnamed+namespace+in+a+header+file
So, my questions are:
The mentioned problems only apply to anonymous-namespace variables, not methods. Is that right?
Does the same problem appear when using the static keyword to force internal linkage for variables? If so, is there any other way to achieve this safely?
The mentioned problems only apply to anonymous-namespace variables, not methods. Is that right?
The mentioned problems happen to anything inside an anonymous namespace.
Does the same problem appear when using the static keyword to force internal linkage for variables?
The same happens.
If so, is there any other way to achieve this safely?
There is not.
An ODR violation will sooner or later happen if you put any entity with internal linkage (class, variable, member function, template, etc.) inside a header file that is included in different translation units. You will soon have a problem if any entity with external linkage uses one of these internal-linkage entities in its definition or declaration.
Entities declared inside an anonymous namespace, entities declared static, and non-extern const variables all have internal linkage.
There are 2 partial solutions to what you are, supposedly, looking for:
Inline variables and functions can have their definitions appear in multiple translation units, so it is safe to define them in header files.
If what you are looking for is to keep the names from being visible outside the library you are writing, define them in a private header and apply visibility attributes to them ([[gnu::visibility("hidden")]], or no __declspec(dllexport) for MSVC).
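The first of these partial solutions might look like the sketch below. It assumes a C++17 compiler (inline variables were added in C++17), and the header name and identifiers are made up for illustration:

```cpp
// widget_ids.h -- hypothetical header, meant to be included from many .cpp files
#ifndef WIDGET_IDS_H
#define WIDGET_IDS_H

// C++17 inline variable: a single object shared by the whole program, even
// though this definition appears in every translation unit that includes it.
inline int g_next_widget_id = 0;

// Inline function: likewise one entity program-wide. It only uses
// g_next_widget_id, which has external linkage, so every copy of this
// definition refers to the same object.
inline int next_widget_id() { return ++g_next_widget_id; }

#endif  // WIDGET_IDS_H
```

Unlike a static or unnamed-namespace variable in a header, every translation unit sees the same counter here, so successive calls to `next_widget_id()` keep counting up regardless of which file makes them.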

Should unnamed namespace functions be avoided to reduce symbol table sizes?

I've heard it asserted that the use of unnamed namespaces in C++ to define functions and ensure that they cannot be called from outside the compilation unit in which they are defined is not good in very large code environments because they cause the symbol table to grow unnecessarily large by including entries to these symbols in the automatically generated namespaces that the C++ compiler provides when unnamed.
namespace {
// This function can only be accessed from here to the end of
// any compilation unit that includes it.
void functionPerhapsInsertedIntoSymbolTable() {
return;
}
}
This is perhaps given that the above is supposed to be the same as doing the following:
namespace randomlyGenerateNameHereNotCollidingWithAnyExistingNames {
// This function can only be accessed from here to the end of
// any compilation unit that includes it.
void functionPerhapsInsertedIntoSymbolTable() {
return;
}
}
using namespace randomlyGenerateNameHereNotCollidingWithAnyExistingNames;
However, is it really that simple, is the compiler required to make a symbol table entry for symbols in the generated namespace name?
Instead, in such situations, I heard it suggested to use a static declaration:
// This function can only be accessed from here to the end of
// any compilation unit that includes it.
static void functionNotInsertedIntoSymbolTable() {
return;
}
Is it true that using a static declaration before a function instead of placing it in an unnamed namespace has the same effect of making the function inaccessible outside the compilation unit in which it is defined? Is there any difference between these two approaches other than potentially not causing the symbol table to grow?
Is the issue with symbol table bloat due to unnamed namespaces just a bug in some implementations of C++ or is the compiler somehow required by the standard to create entries for such functions? If this bloat is considered a bug, are there known compilers for which this is not an issue?
Is it true that using a static declaration before a function instead of placing it in an unnamed namespace has the same effect of making the function inaccessible outside the compilation unit in which it is defined?
Yes.
Namespace-static was deprecated in C++03 in favour of unnamed namespaces, but in fact un-deprecated for C++11 when everybody realised that they are just the same thing and there was no purpose to the deprecation.
Is there any difference between these two approaches
No, not really. There may be some minor subtleties with name lookup due to the use of a namespace, but I can't think of any right now.
other than potentially not causing the symbol table to grow?
Since this is a language-lawyer question with no evident practical problem to solve, I am obliged to point out that the C++ language has no concept of a symbol table, and thus no indication of this effect.
It's also not going to have any noticeable effect until you have tens of thousands of unnamed namespaces; do you?
The reason for the unnamed namespace and deprecation of namespace level static is the ill-fated export keyword.
Everything an exported template relies on has to be link-accessible at the points of instantiation, which are most likely in different source files. The unnamed namespace allowed the 'privatization' aspect of static while still preserving linkage for exported templates.
Now that export has been removed in C++11, I'm pretty sure the external linkage requirement for the unnamed namespace has been removed, and it now behaves exactly like namespace level static. Someone more familiar with the standard can confirm/refute this.

Does use of unnamed namespaces reduce link time?

Suppose I have a large system with many object files such that link time is a problem. Suppose also that I know that many of the classes and functions in my system are not used outside their translation unit.
Is it reasonable to assume that if I reduce the number of symbols with external linkage, my link-time will be reduced?
If so, will putting the entities (e.g., classes and functions) that are used in only a single TU into unnamed namespaces do me any good? Technically, the entities with external linkage will retain their external linkage in an unnamed namespace, but, as the C++11 standard notes,
Although entities in an unnamed namespace might have external linkage, they are effectively qualified by a name unique to their translation unit and therefore can never be seen from any other translation unit.
Do linker algorithms perform optimizations based on the knowledge that entities with external linkage in unnamed namespaces aren't visible outside their namespaces?
Yes, I think it does reduce link time. I found this on the Google Chromium site:
"Unnamed namespaces restrict these symbols to the compilation unit, improving function call cost and reducing the size of entry point tables."
I know this is about the Chromium project, but it should apply to other C++ projects.
I don't see how a linker could do such optimizations, because by the time the linker gets a hold of the symbol(s) in question they look like ordinary decorated external-linkage symbols. Unless the linker has specific information about how the compiler decorates names in an anonymous namespace I can't see any way that it could optimize its work.
Have you confirmed that your linker is in fact CPU bound and not I/O bound? If it's not CPU bound already it's probably not going to help to reorganize your code.

What is external linkage and internal linkage?

I want to understand the external linkage and internal linkage and their difference.
I also want to know the meaning of
const variables internally link by default unless otherwise declared as extern.
When you write an implementation file (.cpp, .cxx, etc) your compiler generates a translation unit. This is the source file from your implementation plus all the headers you #included in it.
Internal linkage refers to everything only in scope of a translation unit.
External linkage refers to things that exist beyond a particular translation unit. In other words, accessible through the whole program, which is the combination of all translation units (or object files).
As dudewat said external linkage means the symbol (function or global variable) is accessible throughout your program and internal linkage means that it is only accessible in one translation unit.
You can explicitly control the linkage of a symbol by using the extern and static keywords. If the linkage is not specified then the default linkage is extern (external linkage) for non-const symbols and static (internal linkage) for const symbols.
// In namespace scope or global scope.
int i; // extern by default
const int ci = 0; // static by default (a const object must be initialized)
extern const int eci; // explicitly extern
static int si; // explicitly static
// The same goes for functions (but there are no const functions).
int f(); // extern by default
static int sf(); // explicitly static
Note that instead of using static (internal linkage), it is better to use anonymous namespaces into which you can also put classes. Though they allow extern linkage, anonymous namespaces are unreachable from other translation units, making linkage effectively static.
namespace {
int i; // extern by default but unreachable from other translation units
class C; // extern by default but unreachable from other translation units
}
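As a small illustration of why the unnamed namespace is useful for classes, where `static` cannot be applied, here is a sketch of a file-local helper type used by an externally visible function. The names `Accumulator` and `sum_to` are hypothetical:

```cpp
// sums.cpp -- a hypothetical translation unit

namespace {

// A helper type local to this file. `static` cannot be applied to a class,
// so the unnamed namespace is what keeps it private to this translation unit.
class Accumulator {
public:
    void add(int v) { total_ += v; }
    int total() const { return total_; }

private:
    int total_ = 0;
};

}  // namespace

// External linkage: callable from other translation units, while
// Accumulator itself stays an implementation detail of this file.
int sum_to(int n) {
    Accumulator acc;
    for (int i = 1; i <= n; ++i) {
        acc.add(i);
    }
    return acc.total();
}
```

Another file could define its own, unrelated `Accumulator` in an unnamed namespace without the two definitions clashing.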
A global variable has external linkage by default. Its scope can be extended to files other than containing it by giving a matching extern declaration in the other file.
The scope of a global variable can be restricted to the file containing its declaration by prefixing the declaration with the keyword static. Such variables are said to have internal linkage.
Consider the following example:
1.cpp
void f(int i);
extern const int max = 10;
int n = 0;
int main()
{
int a = 0; // initialized so that f(a) does not read an indeterminate value
//...
f(a);
//...
f(a);
//...
}
The signature of function f declares f as a function with external linkage (default). Its definition must be provided later in this file or in another translation unit (given below).
max is defined as an integer constant. The default linkage for constants is internal. Its linkage is changed to external with the keyword extern. So now max can be accessed in other files.
n is defined as an integer variable. The default linkage for variables defined outside function bodies is external.
2.cpp
#include <iostream>
using namespace std;
extern const int max;
extern int n;
static float z = 0.0;
void f(int i)
{
static int nCall = 0;
int a;
//...
nCall++;
n++;
//...
a = max * z;
//...
cout << "f() called " << nCall << " times." << endl;
}
max is declared to have external linkage. A matching definition for max (with external linkage) must appear in some file. (As in 1.cpp)
n is declared to have external linkage.
z is defined as a global variable with internal linkage.
The definition of nCall specifies nCall to be a variable that retains its value across calls to function f(). Unlike local variables with the default auto storage class, nCall will be initialized only once at the first invocation of f(). The storage class specifier static affects the lifetime of the local variable and not its scope.
NB: The keyword static plays a double role. When used in the definitions of global variables, it specifies internal linkage. When used in the definitions of the local variables, it specifies that the lifetime of the variable is going to be the duration of the program instead of being the duration of the function.
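The two roles of static can be seen side by side in a compact sketch (the names `g_step` and `next_value` are made up for illustration):

```cpp
// File-scope static: internal linkage, so other translation units cannot
// refer to g_step even with a matching extern declaration.
static int g_step = 2;

int next_value() {
    // Function-local static: initialized once, on the first call, and it
    // keeps its value between calls. Its name is still only visible here.
    static int current = 0;
    current += g_step;
    return current;
}
```

Calling `next_value()` repeatedly yields 2, 4, 6, and so on, because `current` lives for the duration of the program rather than for a single call.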
In terms of 'C' (because the static keyword has a different meaning in 'C' and 'C++'):
Let's talk about the different scopes in 'C'.
SCOPE: Basically, how long and how far something can be seen.
Local variable : Scope is only inside a function. It resides in the STACK area of RAM.
Which means that every time a function gets called all the variables
that are the part of that function, including function arguments are
freshly created and are destroyed once the control goes out of the
function. (Because the stack is flushed every time function returns)
Static variable: Scope of this is the file. It is accessible everywhere in the file
in which it is declared. It resides in the DATA segment of RAM. Since
it can only be accessed inside one file, it has INTERNAL linkage. Other
files cannot see this variable. In fact, the static keyword is the
only way to introduce some level of data or function
hiding in 'C'.
Global variable: Scope of this is the entire application. It is accessible from
everywhere in the application. Global variables also reside in the DATA segment.
Since they can be accessed everywhere in the application, they have
EXTERNAL linkage.
By default all functions are global. If you need to
hide some functions in a file from the outside, you can prefix the
function with the static keyword. :-)
Before talking about the question, it is better to know the term translation unit, program and some basic concepts of C++ (actually linkage is one of them in general) precisely. You will also have to know what is a scope.
I will emphasize some key points, esp. those missing in previous answers.
Linkage is a property of a name, which is introduced by a declaration. Different names can denote the same entity (typically, an object or a function). So talking about the linkage of an entity is usually meaningless, unless you are sure that the entity will only be referred to by a unique name from some specific declarations (usually a single declaration).
Note that an object is an entity, but a variable is not. When we talk about the linkage of a variable, what is actually meant is the linkage of the name of the denoted entity (introduced by a specific declaration). The linkage of a name is one of three kinds: no linkage, internal linkage, or external linkage.
Different translation units can share the same declaration through header/source file inclusion (yes, that is the standard's wording). So you may refer to the same name in different translation units. If the declared name has external linkage, the identity of the entity the name refers to is also shared. If the declared name has internal linkage, the same name in different translation units denotes different entities, but you can refer to the entity from different scopes of the same translation unit. If the name has no linkage, you simply cannot refer to the entity from other scopes.
(Oops... I found what I have typed was somewhat just repeating the standard wording ...)
There are also some other confusing points which are not covered by the language specification.
Visibility (of a name). It is also a property of declared name, but with a meaning different to linkage.
Visibility (of a side effect). This is not related to this topic.
Visibility (of a symbol). This notion can be used by actual implementations. In such implementations, a symbol with a specific visibility in object (binary) code is usually what the definition of an entity is mapped to, where the entity's names have a corresponding linkage in the source (C++) code. However, the mapping is usually not guaranteed to be one-to-one. For example, a symbol in a dynamic library image can be specified, from source code (through extensions, typically __attribute__ or __declspec) or through compiler options, to be shared only internally within that image; the image is neither the whole program nor the object file translated from a single translation unit, so no standard concept describes it accurately. Since symbol is not a normative term in C++, it is only an implementation detail, even though the related dialect extensions may have been widely adopted.
Accessibility. In C++, this is usually about property of class members or base classes, which is again a different concept unrelated to the topic.
Global. In C++, "global" refers something of global namespace or global namespace scope. The latter is roughly equivalent to file scope in the C language. Both in C and C++, the linkage has nothing to do with scope, although scope (like linkage) is also tightly concerned with an identifier (in C) or a name (in C++) introduced by some declaration.
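To make the symbol-visibility point from the list above concrete, here is a hedged sketch using the GCC/Clang `__attribute__((visibility(...)))` extension. The macro and function names are hypothetical, and on other compilers the macros expand to nothing, so the code stays valid C++ but loses the effect:

```cpp
#if defined(__GNUC__) || defined(__clang__)
#  define EXAMPLE_HIDDEN __attribute__((visibility("hidden")))
#  define EXAMPLE_PUBLIC __attribute__((visibility("default")))
#else
#  define EXAMPLE_HIDDEN
#  define EXAMPLE_PUBLIC
#endif

// External linkage as far as C++ is concerned: any translation unit built
// into the same shared library may call it. With hidden visibility it is
// just not exported from that library's dynamic symbol table.
EXAMPLE_HIDDEN int scale_internal(int x) { return x * 3; }

// Exported entry point of the (hypothetical) library.
EXAMPLE_PUBLIC int scale_and_offset(int x) { return scale_internal(x) + 1; }
```

Both names have external linkage in the language sense, which is why linkage alone cannot express the difference between them.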
The linkage rule for namespace-scope const variables is special (and notably different from that of a const object declared at file scope in C, which also has the concept of linkage of identifiers). Since C++ enforces the ODR, there must be no more than one definition of the same variable or function in the whole program, except for inline functions. Without this special rule for const, the simplest declaration of a const variable with an initializer (e.g. = xxx) in a header or source file (often a "header file") included by multiple translation units (or included by one translation unit more than once, though that is rare) would violate the ODR, which would make it impossible to use const variables as replacements for some object-like macros.
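A rough sketch of the macro-replacement use case follows. The header part and the source part are shown in one block for brevity, and all names are hypothetical:

```cpp
// ---- limits_example.h: hypothetical header included by several .cpp files
#ifndef LIMITS_EXAMPLE_H
#define LIMITS_EXAMPLE_H

// Internal linkage by default: each including translation unit gets its own
// kMaxUsers, so defining it in a header does not violate the ODR.
const int kMaxUsers = 64;

#endif  // LIMITS_EXAMPLE_H

// ---- users.cpp: one of the translation units that includes the header

// kMaxUsers is a constant expression, so it can be used as an array bound,
// much like an object-like macro such as `#define MAX_USERS 64`.
int user_table[kMaxUsers];

int user_table_capacity() {
    return static_cast<int>(sizeof(user_table) / sizeof(user_table[0]));
}
```

Had kMaxUsers been a non-const `int` defined in the header, every including translation unit would define the same external-linkage variable, which is a multiple-definition error at link time.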
I think Internal and External Linkage in C++ gives a clear and concise explanation:
A translation unit refers to an implementation (.c/.cpp) file and all
header (.h/.hpp) files it includes. If an object or function inside
such a translation unit has internal linkage, then that specific
symbol is only visible to the linker within that translation unit. If
an object or function has external linkage, the linker can also see it
when processing other translation units. The static keyword, when used
in the global namespace, forces a symbol to have internal linkage. The
extern keyword results in a symbol having external linkage.
The compiler defaults the linkage of symbols such that:
Non-const global variables have external linkage by default
Const global variables have internal linkage by default
Functions have external linkage by default
Basically:
a variable with external linkage is visible in all files;
a variable with internal linkage is visible in a single file.
Explain: const variables internally link by default unless otherwise declared as extern
By default, a global variable has external linkage,
but a const global variable has internal linkage.
In addition, an extern const global variable has external linkage.
A pretty good material about linkage in C++
http://www.goldsborough.me/c/c++/linker/2016/03/30/19-34-25-internal_and_external_linkage_in_c++/
Linkage determines whether identifiers that have identical names refer to the same object, function, or other entity, even if those identifiers appear in different translation units. The linkage of an identifier depends on how it was declared.
There are three types of linkages:
Internal linkage : identifiers can only be seen within a translation unit.
External linkage : identifiers can be seen (and referred to) in other translation units.
No linkage : identifiers can only be seen in the scope in which they are defined.
Linkage does not affect scoping
C++ only : You can also have linkage between C++ and non-C++ code fragments, which is called language linkage.
Source :IBM Program Linkage
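Language linkage, mentioned in the list above, is usually seen as `extern "C"`. A minimal sketch with hypothetical function names:

```cpp
// C language linkage: the function is given a C-compatible name (typically
// unmangled), so C code or a dynamic-loader lookup can refer to "add_ints".
// Functions with C language linkage cannot be overloaded.
extern "C" int add_ints(int a, int b) { return a + b; }

// Default C++ language linkage: overloading is allowed, and implementations
// typically encode the parameter types into the symbol name.
int add_values(int a, int b) { return a + b; }
double add_values(double a, double b) { return a + b; }
```

Language linkage is orthogonal to internal versus external linkage: all three functions here have external linkage, and only the naming and calling conventions used for them differ.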
In C++
Any variable at file scope that is not nested inside a class or function is visible throughout all translation units in a program. This is called external linkage because at link time the name is visible to the linker everywhere, external to that translation unit.
Global variables and ordinary functions have external linkage.
A static object or function name at file scope is local to its translation unit. This is called internal linkage.
Linkage refers only to elements that have addresses at link/load time; thus, class declarations and local variables have no linkage.