hiding internal symbols in a library - c++

I am working with a library. The library has one file containing the interface that I actually want exposed to other programs, contained in foo.h and foo.cpp. It also contains a bunch of helper classes and utility functions, in files bar1.h, bar2.h, bar1.cpp, bar2.cpp, etc.
If I compile all of these files and stick them in a .lib, the problem I run into is that some of the symbols in the bar files have very common names that clash with those in other external libraries I need to link against.
If all of the code were in one single .cpp file, I know how to fix this: I can use static or namespace { } to stop the linker from exporting the internal symbols. But obviously I have to declare the stuff in bar extern if I want to access it in foo.
I can wrap all of the .cpp files in namespace baz { }. If I choose baz carefully, so that there is little chance of it conflicting with namespaces used in other libraries, that will substantially fix the problem. But ideally, nothing outside of the symbols in foo.h should get exported into my .lib. Is there a technique for doing this?

You can achieve this, but it comes at some cost:
In C++ you can have internal linkage. Anything inside an unnamed namespace has internal linkage* (see footnote), as do static free functions (you should prefer the anonymous namespace).
Update: here's the C++11 standard quote from §3.5/4:
An unnamed namespace or a namespace declared directly or indirectly within an unnamed namespace has
internal linkage. All other namespaces have external linkage. A name having namespace scope that has not
been given internal linkage above has the same linkage as the enclosing namespace if it is the name of
— a variable; or
— a function; or
— a named class (Clause 9), or an unnamed class defined in a typedef declaration in which the class has the typedef name for linkage purposes (7.1.3); or
— a named enumeration (7.2), or an unnamed enumeration defined in a typedef declaration in which the enumeration has the typedef name for linkage purposes (7.1.3); or
— an enumerator belonging to an enumeration with linkage; or
— a template.
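To make the rule concrete, here is a minimal sketch of a single .cpp file in which only one function keeps external linkage; the helpers get internal linkage either from the unnamed namespace or from static. All names are hypothetical.

```cpp
// Sketch only: hypothetical helper names, all in one translation unit.
namespace {
    // Unnamed namespace: both names below have internal linkage, so no
    // other .cpp file (and no other library) can see or clash with them.
    int helperCalls = 0;
    int helperTwice(int x) { return 2 * x; }
}

// 'static' free function: also internal linkage (the older, C-style spelling).
static int helperIncrement(int x) { return x + 1; }

// No 'static', not in an unnamed namespace: external linkage. This is the
// only name in this file that another translation unit could call.
int publicEntry(int x) {
    ++helperCalls;
    return helperTwice(helperIncrement(x));
}
```

Calling publicEntry(3) returns 8, and only publicEntry would show up as an exported symbol of this object file.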
However, internal linkage applies to translation units, not to static libraries. So if you use the usual approach of putting each class in its own translation unit (= .cpp file), you cannot define the classes inside anonymous namespaces, because the translation units could then not be linked together to build the library.
You can solve this dilemma by making the whole library one single translation unit: one header providing the library's public interface, one source with the function definitions, and anything else as headers, defined in anonymous namespaces:
mylib.hpp
class MyLib {
public:
int foo();
double bar(int i);
};
mylib.cpp
#include "mylib.hpp"
#include "mylibimpl.h"
int MyLib::foo() {
    return fooimpl();
}
double MyLib::bar(int i) {
    return BarImpl(i).compute();
}
mylibimpl.h
namespace {
    inline int fooimpl() { return 42; }
    class BarImpl {
        double d;
    public:
        BarImpl(int i) : d(i * 3.42) {}
        double compute() { return 2 * d; }
    };
}
You'll now have one translation unit (mylib.o / mylib.lib), and none of the *impl classes and functions can be seen from outside, because they have internal linkage.
The cost is that you have to reorganize the sources of your internal classes (e.g. to resolve circular dependencies) and that every simple change of the library's internal code will lead to one big recompilation of everything in the lib, just because there is only the single huge translation unit. So you should do this only when the library code itself is very stable or if the library is not too big.
The benefit, besides the complete hiding of internal symbols, is that the compiler is free to apply any optimization it wants, because no implementation details are hidden in other translation units.
*Footnote:
As Billy ONeal pointed out in a comment, in C++03 entities in anonymous namespaces do not necessarily have internal linkage. However, if they have external linkage, they have names unique to their translation unit and are effectively not accessible from outside that TU, so this procedure works in C++03 as well.

Related

Linkage of a variable defined in a namespace and used in multiple translation units

In C, to use a variable in multiple translation units, we need to
make sure the variable has external linkage.
Similarly, in C++, if I want a variable defined in a namespace to be used in
multiple translation units, does the variable have to have external
linkage? How shall I use namespace and linkage together properly?
What is the default linkage of a variable defined in a namespace?
Thanks.
It works the same as C, except with 'namespace xyz { }' around it, so in your header, you'd have:
namespace xyz { extern int myglobal; }
and in the source file where you define it (that is, allocate its storage and initialize it), you would have
namespace xyz { int myglobal = 0; }
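A minimal sketch of that split, compiled here as one file for convenience; the header/source labels in the comments are assumptions, not files from the question. It also answers the last question: a non-const variable declared in a named namespace has external linkage by default, exactly like a C global.

```cpp
// Header part (hypothetical "xyz.h"): a declaration only, no storage.
namespace xyz {
    extern int myglobal;
}

// Source part (exactly one .cpp file): the single definition.
namespace xyz {
    int myglobal = 0;
}

// Any file that includes the header refers to this same object.
void bumpGlobal() { ++xyz::myglobal; }
```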
As a tangential style comment, I'll say that using globals in this way is really not very common in C++; there are usually better ways of creating shared global state, like static class members (which let you restrict access by making them protected/private and exposing them only through static member functions).
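A sketch of the static-class-member alternative; Settings, verbosity and setVerbosity are hypothetical names chosen for illustration.

```cpp
// Hypothetical class: the shared state is a private static data member,
// reachable only through static member functions.
class Settings {
public:
    static int verbosity() { return verbosity_; }
    static void setVerbosity(int v) {
        if (v >= 0) verbosity_ = v;   // the class can enforce its own invariants
    }
private:
    static int verbosity_;            // declaration (lives in the header)
};

// Definition: must appear in exactly one .cpp file (before C++17 inline variables).
int Settings::verbosity_ = 1;
```

Unlike a bare namespace global, nothing outside the class can assign a negative value directly.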

Is it proper to have a template function inside an anonymous namespace of a cpp file?

I wanted to have a template function inside an anonymous namespace of a cpp file, purely as a helper function for std::array types of different sizes. This function is not to be used anywhere outside this translation unit.
Quite surprisingly to me, this worked out right away when I tried it in MSVC 14.1 (simplified code):
#include <array>
#include <cstddef>
#include <cstdint>
namespace
{
    template<std::size_t SIZE>
    bool isPushed(std::uint32_t id, std::array<std::uint32_t, SIZE>& states)
    {
        if(id >= states.size())
        {
            return false;
        }
        return ((states[id] & 32U) > 0U);
    }
}
Does this conform to the C++ standard?
From what I had known, templates always need to be declared (and often also implemented) in a header, why not in this case?
Does this conform to the C++ standard?
Absolutely.
From what I had known, templates always need to be declared (and often also implemented) in a header, why not in this case?
That is mostly true only if the template is used in multiple translation units (read .cpp files). There are ways to implement templates in .cpp files using extern template. See https://msdn.microsoft.com/en-us/library/by56e477.aspx.
However, when it is used only in one .cpp file, it is perfectly fine to define it in the .cpp file.
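For the multi-file case, the usual pairing is an explicit instantiation definition in the .cpp file, optionally announced to other files with extern template. A minimal sketch, with both "files" shown in one block; twice is a hypothetical function template.

```cpp
// Header part (hypothetical "twice.h"): declaration only, plus an explicit
// instantiation declaration telling every includer that twice<int> is
// instantiated in some other translation unit.
template <typename T> T twice(T x);
extern template int twice<int>(int);

// Source part (hypothetical "twice.cpp"): the definition, plus an explicit
// instantiation definition that emits twice<int> for the linker.
template <typename T> T twice(T x) { return x + x; }
template int twice<int>(int);
```

Files that include only the header can call twice<int>, but a call to, say, twice<double> from those files would fail to link, because no definition of that instantiation is emitted anywhere.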
Additional info, in response to OP's comment
From https://timsong-cpp.github.io/cppwp/n3337/temp#4
A template name has linkage.
From https://timsong-cpp.github.io/cppwp/n3337/basic.link#2.2
— When a name has internal linkage, the entity it denotes can be referred to by names from other scopes in the same translation unit.
From https://timsong-cpp.github.io/cppwp/n3337/basic.link#4
An unnamed namespace or a namespace declared directly or indirectly within an unnamed namespace has internal linkage. All other namespaces have external linkage. A name having namespace scope that has not been given internal linkage above has the same linkage as the enclosing namespace if it is the name of
...
— a template.
From the above, we can conclude that isPushed has internal linkage. It can be referred to only in the translation unit.
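A self-contained version of the OP's helper shows this in practice. countPushed is a hypothetical caller, and the helper takes its array by const reference here since it does not modify it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

namespace {
    // Internal linkage: every instantiation stays private to this .cpp file.
    template <std::size_t SIZE>
    bool isPushed(std::uint32_t id, const std::array<std::uint32_t, SIZE>& states)
    {
        if (id >= states.size()) {
            return false;
        }
        return (states[id] & 32U) != 0U;
    }
}

// Hypothetical caller in the same .cpp file: two array sizes produce two
// instantiations, isPushed<2> and isPushed<4>, neither of them exported.
int countPushed()
{
    const std::array<std::uint32_t, 2> small{32U, 0U};
    const std::array<std::uint32_t, 4> large{0U, 33U, 64U, 96U};
    int count = 0;
    for (std::uint32_t id = 0; id < 5; ++id) {
        count += isPushed(id, small) ? 1 : 0;
        count += isPushed(id, large) ? 1 : 0;
    }
    return count;
}
```

Here countPushed() returns 3: index 0 of small, and indices 1 and 3 of large, have bit 32 set; out-of-range ids return false.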

Why is it that I can include a header file in multiple cpp files that contains const int and not have a compiler error?

Let's assume that I have files a.cpp b.cpp and file c.h. Both of the cpp files include the c.h file. The header file contains a bunch of const int definitions and when I compile them I get no errors and yet I can access those const as if they were global variables. So the question, why don't I get any compilation errors if I have multiple const definitions as well as these const int's having global-like scope?
This is because a const declaration at namespace scope implies internal linkage. An object with internal linkage is only available within the translation unit in which it is defined. So in a sense, the one const object you have in c.h is actually two different objects, one internal to a.cpp and one internal to b.cpp.
In other words,
const int x = ...;
is equivalent to
static const int x = ...;
while
int x;
is similar to
extern int x;
because non-const declarations at namespace scope imply external linkage. (In this last case, they aren't actually equivalent: extern without an initializer produces only a declaration of the object, whereas int x; is a definition.)
Note that this is specific to C++. In C, const doesn't change the implied linkage. The reason for this is that the C++ committee wanted you to be able to write
const int x = 5;
in a header. In C, that header included from multiple files would cause linker errors, because you'd be defining the same object multiple times.
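A short sketch of both behaviours; kMaxUsers, kBuildNumber and usersLeft are hypothetical names, and the header/source split is only indicated by comments because everything is compiled as one file here.

```cpp
// Hypothetical header contents. In C++ (unlike C), a namespace-scope const
// object has internal linkage, so including this in many .cpp files gives
// each file its own private copy and no multiple-definition error.
const int kMaxUsers = 64;            // same as: static const int kMaxUsers = 64;

// To share ONE const object instead, ask for external linkage explicitly.
extern const int kBuildNumber;       // declaration (header)
extern const int kBuildNumber = 7;   // definition (exactly one .cpp file)

int usersLeft(int used) { return kMaxUsers - used; }
```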
From the C++03 standard...
7.1.1 Storage class specifiers
7) A name declared in a namespace scope without a storage-class-specifier has external linkage unless it has internal linkage because of a previous declaration and provided it is not declared const. Objects declared const and not explicitly declared extern have internal linkage.
3.5 Program and Linkage
2) When a name has internal linkage, the entity it denotes can be referred to by names from other scopes in the same translation unit.
The preprocessor causes stuff defined in headers to be included in the current translation unit.
When you do so, you create a separate const variable in each object file for every constant in the header. It's not a problem, since they are const.
Real reason: because #define is evil and needs to die.
Some usages of #define can be replaced with inline functions, and some with const variable declarations. Since #defines tend to live in header files, the consts that replace them in place had better work there as well. Hence the "consts are static by default" rule.

In a C++ namespace does the `static` qualifier have any effect when prefixing non-member subroutines declared in the header?

Consider:
namespace JohnsLib {
static bool foobar();
bool bar();
}
What implications does static have here?
It changes the linkage from external to internal, making the function invisible to the linker and unusable from other compilation units. (Well, if other compilation units also include the header, they get their own separate copy.)
static at namespace scope means that the name is local to a translation unit (i.e. source file). If you define the function in the header file and include this header into multiple C++ files, you won't get redefinition errors, because each file gets its own copy of the function (more precisely, the functions have internal linkage). The same effect can be achieved by means of anonymous namespaces, for example
namespace JohnsLib
{
namespace
{
bool foobar() { /* definition here; won't cause redefinition errors */ return true; }
}
bool bar();
}
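A sketch putting the two forms side by side, assuming a hypothetical second helper foobaz() alongside the question's foobar() and bar(); header and source are compiled as one file here.

```cpp
// Hypothetical "JohnsLib.h" contents.
namespace JohnsLib {
    // Internal linkage via 'static': every .cpp including the header gets
    // its own private copy, so defining it in the header is not an error.
    static bool foobar() { return true; }

    // Same effect via an unnamed namespace nested inside JohnsLib.
    namespace {
        bool foobaz() { return false; }
    }

    // External linkage: declared here, defined exactly once in one .cpp file.
    bool bar();
}

// Hypothetical "JohnsLib.cpp": the single definition of bar().
bool JohnsLib::bar() { return foobar() && !foobaz(); }
```

Note that each including file pays for its own copy of foobar() and foobaz(), which is why the coding-standard advice quoted below discourages this in headers.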
The result of the static keyword at namespace scope (the global namespace or a user-defined one) is that the object it defines does not have external linkage; that is, it is not available from other translation units, and in C++03 it cannot be used as a non-type (reference or pointer) template argument. C++11 relaxed that last restriction to allow objects with internal linkage as well.
In the C++ programming Language Bjarne states In C and C++ programs,
the keyword static is (confusingly) used to mean "use internal
linkage". Don't use static except inside functions and classes.
In Sutter/Alexandrescu C++ Coding Standards Item 61 is "Don't define entities with linkage in a header file."

Do classes have external linkage?

I have 2 files A.cpp and B.cpp which look something like
A.cpp
----------
class w
{
public:
w();
};
B.cpp
-----------
class w
{
public:
w();
};
Now I read somewhere (https://en.cppreference.com/w/cpp/language/static) that classes have external linkage. So while building I was expecting a multiple definition error, but on the contrary it worked like a charm. However, when I defined class w in A.cpp, I got the redefinition error, which makes me believe that classes have internal linkage.
Am I missing something here?
The correct answer is yes, the name of a class may have external linkage. The previous answers are wrong and misleading. The code you show is legal and common.
The name of a class in C++03 can either have external linkage or no linkage. In C++11 the name of a class may additionally have internal linkage.
C++03
§3.5 [basic.link]
A name is said to have linkage when it might denote the same object,
reference, function, type, template, namespace or value as a name
introduced by a declaration in another scope
Class names can have external linkage.
A name having namespace scope has external linkage if it is the name
of
[...]
— a named class (clause 9), or an unnamed class defined in a typedef declaration in which the class has the typedef name for linkage
purposes (7.1.3)
Class names can have no linkage.
Names not covered by these rules have no linkage. Moreover, except as
noted, a name declared in a local scope (3.3.2) has no linkage. A name
with no linkage (notably, the name of a class or enumeration declared
in a local scope (3.3.2)) shall not be used to declare an entity with
linkage.
In C++11 the first quote changes and class names at namespace scope may now have external or internal linkage.
An unnamed namespace or a namespace declared directly or indirectly
within an unnamed namespace has internal linkage. All other namespaces
have external linkage. A name having namespace scope that has not been
given internal linkage above [class names were not] has the same linkage
as the enclosing namespace if it is the name of
[...]
— a named class (Clause 9), or an unnamed class defined in a typedef
declaration in which the class has the typedef name for linkage
purposes (7.1.3);
The second quote also changes but the conclusion is the same, class names may have no linkage.
Names not covered by these rules have no linkage. Moreover, except as
noted, a name declared at block scope (3.3.3) has no linkage. A type
is said to have linkage if and only if:
— it is a class or enumeration type that is named (or has a name for
linkage purposes (7.1.3)) and the name has linkage; or
— it is an unnamed class or enumeration member of a class with linkage;
Some of the answers here conflate the abstract notion of linkage in the C++ Standard with the computer program known as a linker. The C++ Standard does not give special meaning to the word symbol. A symbol is what a linker resolves when combining object files into an executable. Formally, this is irrelevant to the notion of linkage in the C++ Standard. The document only ever addresses linkers in a footnote regarding character encoding.
Finally, your example is legal C++ and is not an ODR violation. Consider the following.
C.h
----------
class w
{
public:
w();
};
A.cpp
-----------
#include "C.h"
B.cpp
-----------
#include "C.h"
Perhaps this looks familiar. After preprocessor directives are evaluated we are left with the original example. The Wikipedia link provided by Alok Save even states this as an exception.
Some things, like types, templates, and extern inline functions, can
be defined in more than one translation unit. For a given entity, each
definition must be the same.
The ODR rule takes content into consideration. What you show is in fact required in order for a translation unit to use a class as a complete type.
§3.2 [basic.def.odr]
Exactly one definition of a class is required in a translation unit if
the class is used in a way that requires the class type to be
complete.
edit - The second half of James Kanze's answer got this right.
Technically, as Maxim points out, linkage applies to symbols, not to the
entities they denote. But the linkage of a symbol is partially
determined by what it denotes: symbols which name classes defined at
namespace scope have external linkage, and w denotes the same entity
in both A.cpp and B.cpp.
C++ has two different sets of rules concerning the definition of
entities: some entities, like functions or variables, may only be
defined once in the entire program. Defining them more than once will
result in undefined behavior; most implementations will (most of the
time, anyway) give a multiple definition error, but this is not required
or guaranteed. Other entities, such as classes or templates, are
required to be defined in each translation unit which uses them, with
the further requirement that every definition be identical: same
sequence of tokens, and all symbols binding to the same entity, with a
very limited exception for symbols in constant expressions, provided the
address is never taken. Violating these requirements is also undefined
behavior, but in this case, most systems will not even warn.
The class definition
class w
{
public:
w();
};
does not produce any code or symbols, so there is nothing that could be linked and have "linkage". However, when your constructor w() is defined ...
w::w()
{
// object initialization goes here
}
it will have external linkage. If you define it in both A.cpp and B.cpp, there will be a name collision; what happens then depends on your linker. MSVC linkers, e.g., will terminate with error LNK2005 "function already defined" and/or LNK1169 "one or more multiply defined symbols found". The GNU toolchain (g++ with ld) behaves similarly. (For duplicate template methods, they will instead eliminate all but one instance; the GCC docs call this the "Borland model".)
There are four ways to resolve this problem:
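1. If both classes are identical, put the definitions into only one .cpp file.
2. If you need two different, externally linked implementations of class w, put them into different namespaces.
3. Avoid external linkage by putting the whole class, together with its member definitions, into an anonymous namespace:
namespace
{
    class w
    {
    public:
        w();
    };
    w::w()
    {
        // object initialization goes here
    }
}
Everything in an anonymous namespace has internal linkage, so you may also use it as a replacement for static declarations (which are not possible for class methods). A member can only be defined in a namespace that encloses its class, so the class definition itself must move into the anonymous namespace too; wrapping only the w::w() definition does not compile.
4. Avoid multiple-definition errors by defining the methods inline; the linker then treats the copies of an inline function from different translation units as one:
inline w::w()
{
    // object initialization goes here
}
No. 4 only covers member functions: a static field (class variable) still needs exactly one out-of-line definition, unless you use a C++17 inline variable. Whether the body of an inline method is actually expanded at each call site is up to the compiler.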
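A sketch of option 2, assuming the two w classes really are different; the origin member and its values are hypothetical, only there to tell them apart. Both are shown in one file, with comments marking where each would live.

```cpp
// Each w lives in its own named namespace, so the constructors get distinct
// mangled names (a_detail::w::w vs b_detail::w::w) and nothing collides.
namespace a_detail {          // would live in A.cpp
    class w {
    public:
        w();
        int origin() const { return origin_; }
    private:
        int origin_;
    };
    w::w() : origin_(1) {}
}

namespace b_detail {          // would live in B.cpp
    class w {
    public:
        w();
        int origin() const { return origin_; }
    private:
        int origin_;
    };
    w::w() : origin_(2) {}
}
```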
External linkage means the symbol (function or global variable) is accessible throughout your program, and internal linkage means that it's only accessible in one translation unit. You explicitly control the linkage of a symbol by using the extern and static keywords, and at namespace scope the default linkage is external for non-const symbols and internal (static) for const symbols.
A name with external linkage denotes an entity that can be referenced via names declared in the same scope or in other scopes of the same translation unit (just as with internal linkage), or additionally in other translation units.
The program actually violates the One Definition Rule, but it is hard for the compiler to detect the error, because the definitions are in different compilation units. Even the linker does not seem to detect it as an error.
C++ allows a workaround to bypass the One Definition Rule by making use of namespaces.
[UPDATE] From C++03 Standard
§ 3.2 One definition rule, section 5 states:
There can be more than one definition of a class type ... in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then each definition of D shall consist of the same sequence of tokens.
To be pedantic, classes have no linkage.
Linkage only applies to symbols, that is, functions and variables, or code and data.
Seeing as you can't use static on a class, the only way to give a 'class' static linkage is to define the type in an anonymous namespace. Otherwise, it will have extern linkage. I put class in quotation marks because a class, which is a type, does not have linkage; the phrase really refers to the linkage of the symbols defined in the class scope (but not the linkage of an object made using the class). This includes static members, static methods and non-static methods, but not non-static members, as they are only part of the class type definition and do not additionally declare or define actual symbols.
A 'class' having static linkage means that the members and methods that would have had external linkage or external comdat linkage now have only static linkage: they are now local symbols, although the effect of inline at the compiler level still applies (i.e. no symbol is emitted if it is not referenced in the translation unit). It's just no longer an external comdat symbol at the assembler level; it's a local symbol. This is the case even if the member or method of the class is defined out of line and outside of an anonymous namespace: it will still have static linkage.
If you declare the class type in an anonymous namespace, you will not be able to define the type outside of an anonymous namespace; it will not compile. You need to define it in the same anonymous namespace or in a different anonymous namespace in the same translation unit (which one doesn't matter, because they are all combined into the same unnamed namespace, which GCC names _GLOBAL__N_1).
This is the only way to change the linkage of members or methods of a class / struct: static makes a member a static member and does not change its linkage, static is not allowed on out-of-line definitions, and extern is not allowed on class members / functions.
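A minimal sketch of the out-of-line case described above, with Counter as a hypothetical class: the class is defined in an unnamed namespace, while its constructor, member function and static data member are defined at global scope. This is allowed because the global namespace encloses the unnamed one, and those members still have internal linkage.

```cpp
namespace {
    class Counter {
    public:
        Counter();
        int next();
        static int instances;   // static data member: internal linkage too
    private:
        int value_;
    };
}

// Out-of-line definitions at global scope, outside the unnamed namespace.
Counter::Counter() : value_(0) { ++instances; }
int Counter::next() { return ++value_; }
int Counter::instances = 0;
```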