C++: achieve internal linkage without using anonymous namespaces

I have been reading about declaring anonymous namespaces to achieve shorter link times.
However, I have read that declaring anonymous namespaces in header files is truly not recommended:
When an unnamed namespace is defined in a header file, it can lead to surprising results. Due to default internal linkage, each translation unit will define its own unique instance of members of the unnamed namespace that are ODR-used within that translation unit. This can cause unexpected results, bloat the resulting executable, or inadvertently trigger undefined behavior due to one-definition rule (ODR) violations.
The quote above is from the page linked below, which gives several examples of unexpected behavior caused by anonymous namespaces:
https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL59-CPP.+Do+not+define+an+unnamed+namespace+in+a+header+file
So, my questions are:
The mentioned problems only apply to anonymous-namespace variables, not to methods. Is that right?
Does the same problem appear when using the static keyword to force internal linkage on variables? If so, is there any other way to achieve this safely?

The mentioned problems only apply to anonymous-namespace variables, not to methods. Is that right?
The mentioned problem applies to anything inside an anonymous namespace.
Does the same problem appear when using the static keyword to force internal linkage on variables?
The same happens.
If so, is there any other way to achieve this safely?
There is not.
An ODR violation will sooner or later happen if you put any entity with internal linkage (class, variable, member function, template, etc.) inside a header file that is included in different translation units. You will soon have a problem if any entity with external linkage uses one of these internal-linkage entities in its definition or declaration.
Entities declared inside an anonymous namespace, entities declared static, and non-extern, non-inline const variables at namespace scope all have internal linkage.
There are two partial solutions to what you are presumably looking for:
Inline variables (C++17) and inline functions can have their definitions appear in multiple translation units, so it is safe to define them in header files, provided every definition is identical.
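A minimal sketch of the first option, assuming C++17 for the inline variable; the header name `limits.h` and both identifiers are hypothetical, and the header is shown inline in one file:

```cpp
#include <string_view>

// ---- limits.h (hypothetical header, included from many .cpp files) ----

// C++17 inline variable: every TU that includes the header refers to the
// same single object; the linker keeps one definition.
inline int max_connections = 16;

// Inline function: it may be defined in every TU that includes the header,
// as long as all of those definitions are identical.
inline std::string_view library_name() {
    return "example-lib";
}
```

Unlike `static` or an anonymous namespace, this gives one shared entity for the whole program rather than one copy per translation unit.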
If what you are looking for is to keep the names invisible outside the library you are writing, define them in a private header and apply visibility attributes to them ([[gnu::visibility("hidden")]] on GCC/Clang, or simply no __declspec(dllexport) on MSVC).
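A sketch of the second option. It uses the equivalent `__attribute__((visibility("hidden")))` spelling behind a macro; the macro `LIB_HIDDEN`, the file names and `internal_checksum` are hypothetical. The effect only matters when the code is built into a shared library on a platform that supports symbol visibility (for example ELF):

```cpp
// ---- detail.h (hypothetical private header of a shared library) ----
#if defined(__GNUC__)  // GCC and Clang
#define LIB_HIDDEN __attribute__((visibility("hidden")))
#else
#define LIB_HIDDEN  // MSVC: nothing to add, just never mark it __declspec(dllexport)
#endif

// External linkage inside the library, so every .cpp of the library can call
// it, but it is not exported from the resulting shared object.
LIB_HIDDEN int internal_checksum(int x);

// ---- detail.cpp ----
LIB_HIDDEN int internal_checksum(int x) {
    return x * 31 + 7;
}
```

Because the function keeps external linkage within the library, there is exactly one definition, so the per-TU duplication problem of `static` does not arise.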

Related

Are static or unnamed namespace still useful when header and implementation are separated?

As answered in this question, I learnt that applying the static keyword to a function means it can only be seen by the functions in that file. I think an unnamed namespace can be used for the same purpose.
However, usually, implementation and header files are separated. So, it seems to me that a programmer can hide all the "private stuff" inside the implementation file by not writing the declaration of such private stuff in the header file.
Considering the above, when are static and unnamed namespaces helpful? The only case I can come up with is where multiple implementation files correspond to a single header file.
Keeping a definition in an implementation file does not make it private in any sense. Any other header or implementation file can declare that function and use it. This is not always a bad thing: I have used parts of libraries' private implementations when I really needed to (but I do not recommend doing that).
The worst part of having such a not-so-private implementation is its potential for One Definition Rule violations. The ODR states that every* function or variable must have exactly one definition in the whole program. If there is more than one definition, behaviour is undefined**.
This means that when you have your not-so-private implementation in your file and nobody knows about it, they can unknowingly write a function with the same name and arguments and get an ODR violation.
It would be a good practice to use static or anonymous namespace for all free functions that should be limited to a single file. Functions that need to be used from other files cannot use this strategy, so to limit the risk of ODR violations you should use descriptive names and perhaps (named) namespaces. Just make sure you don't overuse namespaces.
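A small sketch of that advice for a single implementation file; `parser.cpp`, the `parser` namespace and all function names are hypothetical:

```cpp
#include <cctype>
#include <string>

// ---- parser.cpp (hypothetical implementation file) ----

namespace {
// Visible only in this translation unit: another .cpp file may define its
// own trim_left() without causing an ODR violation.
std::string trim_left(const std::string& s) {
    std::string::size_type i = 0;
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) {
        ++i;
    }
    return s.substr(i);
}
}  // namespace

// The equivalent using the static keyword (also internal linkage).
static bool is_comment(const std::string& line) {
    return !line.empty() && line[0] == '#';
}

// Declared in a public header and meant to be called from other files, so it
// keeps external linkage and gets a descriptive, namespaced name instead.
namespace parser {
bool is_blank_or_comment(const std::string& line) {
    const std::string t = trim_left(line);
    return t.empty() || is_comment(t);
}
}  // namespace parser
```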
Note: using anonymous namespaces doesn't make sense in header files. An anonymous namespace limits the scope of its contents to the translation unit in which it appears, but header files are copied and pasted into (potentially) multiple TUs. One legitimate use of anonymous namespaces in headers is in header-only libraries, as described in this question: it allows creating global objects in a header file without ODR violations (at the cost of each TU having its own copy of the variable).
*Except for function templates, inline functions, functions defined inside a class definition, and a few more. Even then, all definitions must be exactly the same.
**When I encountered this once, the linker used an arbitrary definition, whichever it happened to see. Hilarity and long debugging sessions ensued.

Two objects with the same name in the same namespace in different cpp files

I've used MATCHER_P from gmock to create some matchers in different *.cpp test files. They happened to be defined in the same namespace and had the same name. When running the tests, I got a segfault, because a test from fooTest.cpp used a matcher from barTest.cpp, even though a matcher with the same name was declared in fooTest.cpp and barTest.cpp was obviously not included.
What is going on here? Why does the test from fooTest.cpp even see the matcher declared in barTest.cpp? Shouldn't it be confined to the scope of the file that it's declared in? And if not why am I not getting a compilation error that says something about an "ambiguous call?"
The compiler is not required to diagnose violations of the One Definition Rule, so your program exhibits undefined behavior.
From [basic.def.odr]
Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in
that program outside of a discarded statement; no diagnostic required.
IIRC some linkers can warn about these things, but again there's no requirement.
Best to simply move one of those functions into a different namespace.
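A sketch of that fix with named namespaces. Plain structs stand in for the gmock-generated matchers here (this is not what `MATCHER_P` expands to), and `foo_test`/`bar_test` are hypothetical names:

```cpp
#include <string>

// Without the namespaces, both files would define ::Matcher with different
// bodies, which is an ODR violation that no tool is required to diagnose.

// ---- fooTest.cpp ----
namespace foo_test {
struct Matcher {
    int expected;
    bool operator()(int v) const { return v == expected; }
};
}  // namespace foo_test

// ---- barTest.cpp ----
namespace bar_test {
struct Matcher {
    std::string expected;
    bool operator()(const std::string& v) const { return v == expected; }
};
}  // namespace bar_test
```

The two classes now have different qualified names, so their implicitly inline member functions produce different symbols.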
Using my crystal ball, your problem is that you have two identically named inline methods or functions. Functions defined in the body of a class are implicitly inline.
When you have two different inline functions with the same name, the linker silently discards all but one of them. For member functions, that name includes the name of the class they belong to.
If the implementation of one or the other is different, this leads to problems. One classic problem is inlined non-trivial zero argument constructors for objects of different size; one of them is discarded, and now you are clearing the wrong amount of memory for the wrong class.
This could be laid at the feet of a one definition rule violation: the very existence of your two classes is an ODR violation and makes your program ill-formed, no diagnostic required.
But knowing the step that actually caused your program to keel over is useful.
The fix for this is to always define everything in a .cpp file within an anonymous namespace. Anything that defines a symbol declared in a header file should fully qualify it. Then no definition in one .cpp file can accidentally collide with a definition in another.
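A sketch of that rule for one hypothetical pair of files; `widgets`, `Accumulator` and `total_size` are illustrative names:

```cpp
#include <vector>

// ---- widget.h (hypothetical header) ----
namespace widgets {
int total_size(const std::vector<int>& sizes);
}  // namespace widgets

// ---- widget.cpp ----
namespace {
// File-local helper class. A class with the same name in another .cpp file
// is a distinct class, so the linker cannot merge their inline members.
struct Accumulator {
    int sum = 0;
    void add(int v) { sum += v; }
};
}  // namespace

// The header-declared function is defined with its fully qualified name, so a
// signature mismatch is a compile error instead of a silent new function.
int widgets::total_size(const std::vector<int>& sizes) {
    Accumulator acc;
    for (int s : sizes) acc.add(s);
    return acc.sum;
}
```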

How does this header-only library guard against linker problems?

After reading this question I thought I understood everything, but then I saw this file from a popular header-only library.
The library uses an #ifndef include guard, but the SO question points out that this is NOT adequate protection against multiple definition errors in multiple TUs.
So one of the following must be true:
It is possible to avoid multiple definition linker errors in ways other than those described in the SO question. Perhaps the library is using techniques not mentioned in the other SO question that are worthy of additional explanation.
The library assumes you won't include its header files in more than one translation unit. This seems fragile, since a robust library shouldn't impose this assumption on its users.
I'd appreciate having some light shed on this seemingly simple curiosity.
A header that causes linking problems when included in multiple translation units is one that will (attempt to) define some object (not just, for an obvious example, a type) in each source file where it's included.
For example, if you had something like: int f = 0; in a header, then each source file into which it was included would attempt to define f, and when you tried to link the object files together you'd get a complaint about multiple definitions of f.
The "technique" used in this header is simple: it doesn't attempt to define any actual objects. Rather, it includes some typedefs and the definition of one fairly large class, but not any instances of that class or of anything else. That class includes a number of member functions, but they're all defined inside the class definition, which makes them implicitly inline (so defining them separately in each translation unit in which they're used is not only allowed, but required).
In short, the header only defines types, not objects, so there's nothing there to cause linker collisions when it's included in multiple source files that are linked together.
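A sketch of such a "types only" header; `geometry.h`, `Vec2` and `clamp_to_unit` are hypothetical names, not taken from the library in question:

```cpp
#include <cmath>

// ---- geometry.h (hypothetical header-only library) ----
namespace geometry {

// A class definition may appear in every TU. Member functions defined inside
// the class body are implicitly inline, so repeated definitions are allowed.
class Vec2 {
public:
    Vec2(double x, double y) : x_(x), y_(y) {}
    double length() const { return std::sqrt(x_ * x_ + y_ * y_); }
    double x() const { return x_; }
    double y() const { return y_; }

private:
    double x_, y_;
};

// A function template may also be defined in every TU.
template <typename T>
T clamp_to_unit(T v) {
    return v < T(0) ? T(0) : (v > T(1) ? T(1) : v);
}

// By contrast, this line would define an object with external linkage in
// every including TU and cause a "multiple definition" link error:
// int f = 0;

}  // namespace geometry
```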
If the header defines items, as opposed to just declaring them, then it's possible to include it in more than one translation unit (i.e. cpp file) and have multiple definitions and hence linker errors.
I've used Boost's unit test framework, which is header-only. I include a specified header in only one of my own cpp files to get my project to compile. But I include other unit test headers in other cpp files, which presumably use the items that are defined in the specified header.
Header-only libraries such as parts of the Boost C++ Libraries mostly use stand-alone templates, which are instantiated at compile time and don't require linking against binary libraries (which would need separate compilation). One designed never to need linking is the great Catch.
Templates are a special case in C++ regarding multiple definitions: they may be defined more than once, as long as the definitions are the same. See the "One Definition Rule" section of the C++ standard:
There can be more than one definition of a class type (Clause 9),
enumeration type (7.2), inline function with external linkage (7.1.2),
class template (Clause 14), non-static function template (14.5.6),
static data member of a class template (14.5.1.3), member function of
a class template (14.5.1.1), or template specialization for which some
template parameters are not specified (14.7, 14.5.5) in a program
provided that each definition appears in a different translation unit,
and provided the definitions satisfy the following requirements. ....
This is then followed by a list of conditions that make sure the template definitions are identical across translation units.
This specific quote is from the 2014 working draft, section 3.2 ("One Definition Rule"), subsection 6.
This header file can indeed be included in different source files without causing "multiple symbol definition" errors.
This happens because it is fine to have multiple identically named symbols in different object files as long as these symbols are either weak or local.
Let's take a closer look at the header file. It (potentially) defines several objects like this helper:
static int const helper[] = {0,7,8,13};
Each translation unit that includes this header file will have this helper in it. However, there will be no "multiple symbol definition" errors, since helper is static and thus has internal linkage. The symbols created for helper will be local, and the linker will just happily put them all in the resulting executable.
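A sketch contrasting that line with a C++17 alternative. `helper_shared` is a hypothetical name, and using it instead is an assumption about what you might want, not what the library does:

```cpp
// ---- hypothetical header, modeled on the helper line above ----

// static: internal linkage, so each TU that includes the header gets its own
// private copy of the array and its own local symbol.
static int const helper[] = {0, 7, 8, 13};

// C++17 alternative: one shared definition with external linkage that the
// linker merges across TUs, much like an inline function.
inline constexpr int helper_shared[] = {0, 7, 8, 13};
```

One caveat with the static form: an inline function defined in the same header that odr-uses `helper` (for example, loops over it) would technically refer to a different array in each TU, which is the kind of ODR problem described in the first answer.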
The header file also defines a class template connection. But it is also okay. Class templates can be defined multiple times in different translation units.
In fact, even regular class types can be defined multiple times (I've noticed that you've asked about this in the comments). Symbols created for member functions are usually weak symbols. Once again, weak symbols don't cause "multiple symbol definition" errors, because they can be redefined. The linker will keep discarding weak symbols with names it has already seen until just one symbol per member function is left.
There are also other cases where certain things (like inline functions and enumerations) can be defined several times in different translation units (see §3.2). And the mechanisms for achieving this can differ (see class templates and inline functions). But the general rule is not to place anything with external linkage in header files. As long as you follow this rule, you're really unlikely to stumble upon multiple symbol definition problems.
And yes, include guards have nothing to do with this.

External linkage drawbacks

Are there any drawbacks to having a symbol with external linkage (other than global namespace clutter/collision)? For instance, I would think that if I have a function which I never call and it has internal linkage, the compiler can just discard it, but if it is external the compiler has to leave that code in because someone might link to it later. Is this correct? Are there any other drawbacks?
I am asking because I know unnamed namespaces are recommended instead of the static keyword, but since symbols in an unnamed namespace still have external linkage, they would suffer from the above mentioned drawback (if I am right about it), and so are not totally better than static functions like the standard says.
The fact that functions in unnamed namespaces have external linkage is almost entirely a technicality. Because they have a "secret" translation unit dependent unique identifier it is impossible to name them from a different translation unit. This means that compiler can assume that they are never called by name from another translation unit. Most implementations that I know of turn functions in unnamed namespaces into local symbols and not global symbols, just like functions with true internal linkage.
A function in an unnamed namespace can be discarded without affecting a program if it is never called from the translation unit in which it is defined and its address is never taken and passed out of the translation unit, which might lead to it being called other than by a direct named call.
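A sketch of the two situations described above; `registry.cpp`, `double_it`, `never_used`, `IntFn` and `get_doubler` are hypothetical:

```cpp
// ---- registry.cpp (hypothetical) ----
namespace {
// Its address escapes below, so the compiler must keep a real copy.
int double_it(int x) { return 2 * x; }

// Never called and its address is never taken: the compiler may drop it.
// ([[maybe_unused]] only silences the unused-function warning.)
[[maybe_unused]] int never_used(int x) { return x + 1; }
}  // namespace

// External-linkage accessor: other TUs can call double_it through the
// returned pointer even though they can never name it.
using IntFn = int (*)(int);
IntFn get_doubler() { return &double_it; }
```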

When is it appropriate to use static (over unnamed namespaces) in C++?

I have been reading articles about unnamed namespaces the whole day; most articles explained when you should use unnamed namespaces over the static keyword. But I am still left with one big question: when is it appropriate to use static? After all, it is not completely deprecated. What about header files with static functions: should I put them into unnamed namespaces now?
#ifndef HEADER_H
#define HEADER_H
static int func() {
...
}
// versus:
namespace {
int func() {
...
}
}  // namespace
#endif // HEADER_H
Or what about static member functions?
The precise wording of the standard is:
The use of the static keyword is deprecated when declaring objects in namespace scope.
Functions in a header file should be inline rather than static or in an unnamed namespace. inline means you will only end up with at most one copy of the function in your program, while the other methods will give you a separate copy from each file that includes the header. As well as bloat, this could give incorrect behaviour if the function contains function-static data. (EDIT: unless the function is supposed to have different definitions in different compilation units, perhaps due to different preprocessor macros that are defined before including the header file. In that case the best approach is not to include it at all, but rather to bury it in an unmarked grave with a stake through its unholy heart.)
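A sketch of the function-static data issue; `ids.h`, `next_id` and `next_id_per_tu` are hypothetical names:

```cpp
// ---- ids.h (hypothetical header) ----

// inline: every TU shares one next_id() and therefore one counter.
inline int next_id() {
    static int counter = 0;  // a single object in the whole program
    return ++counter;
}

// static: each TU that includes the header gets its own copy of the function
// and its own counter, so two TUs can both hand out id 1.
static int next_id_per_tu() {
    static int counter = 0;  // one object per including TU
    return ++counter;
}
```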
Data objects, apart from constants, usually shouldn't be defined in header files at all, only declared extern.
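A sketch of the declare-in-header, define-once pattern; `settings.h`, `verbosity` and `kMaxRetries` are hypothetical:

```cpp
// ---- settings.h (hypothetical header) ----
extern int verbosity;           // declaration only: no storage is allocated here
constexpr int kMaxRetries = 3;  // const at namespace scope: internal linkage, fine in a header

// ---- settings.cpp (exactly one TU) ----
int verbosity = 1;              // the single definition in the program
```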
Static member functions are a different kettle of fish, and you have to use static there as there is no other way to declare them. That usage isn't deprecated, since it isn't in namespace scope.
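A sketch of a static member function, which is unrelated to linkage; `Logger` and `format` are hypothetical names:

```cpp
#include <string>

// ---- logger.h (hypothetical header) ----
class Logger {
public:
    // static here means "no implicit this pointer", not internal linkage:
    // the function still has external linkage and is defined in one .cpp.
    static std::string format(const std::string& level, const std::string& msg);
};

// ---- logger.cpp ----
std::string Logger::format(const std::string& level, const std::string& msg) {
    return "[" + level + "] " + msg;
}
```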
UPDATE: C++11 has removed the deprecation, so there's no longer any particular reason to prefer unnamed namespaces over static. But you still shouldn't use either in a header file unless you're doing something weird.
There is no advantage of static at namespace scope over unnamed namespaces that I know of. Use unnamed namespaces, as in the example above. Actually, in the example above I can't see why static or an unnamed namespace is necessary. Maybe inline?
And static member functions have nothing to do with static at namespace scope. Static member functions (and nonfunction members) are still valid.
In a header file there's usually no point in specifying internal linkage, or using an anonymous namespace.
In a separately compiled implementation file you can use static or an anonymous namespace to avoid linkage-level name collisions or client code relying on implementation details. An anonymous namespace lets you have external linkage, which C++03 required for objects used as template arguments, and it also supports class definitions. But in the end it's just a question of practicality and personal preference, on a case-by-case basis.
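A sketch of the two things an anonymous namespace supports that static does not cover well; `report.cpp`, `Entry`, `kTitle`, `Header` and `render` are hypothetical. Since C++11, members of an unnamed namespace formally have internal linkage, and internal linkage is accepted for template arguments, so the external-linkage requirement is the older C++03 rule:

```cpp
#include <string>
#include <vector>

// ---- report.cpp (hypothetical implementation file) ----
namespace {

// static cannot be applied to a class definition; the unnamed namespace is
// what keeps this class private to the translation unit.
struct Entry {
    std::string name;
    int count;
};

// An object used as a pointer template argument. C++03 required external
// linkage for this; since C++11 internal linkage is accepted as well.
constexpr char kTitle[] = "Report";

template <const char* Title>
struct Header {
    std::string text() const { return std::string(Title) + ":"; }
};

}  // namespace

std::string render(const std::vector<std::string>& names) {
    std::vector<Entry> entries;  // TU-local type used as a template argument
    for (const auto& n : names) entries.push_back({n, 1});
    std::string out = Header<kTitle>{}.text();
    for (const auto& e : entries) out += " " + e.name;
    return out;
}
```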
static for a member function has nothing to do with linkage specification. A static member function has external linkage anyway.
static can be preferable if you are using some tools. It also behaves a bit better with auto indent functionality in most text editors. It's a bit sad, when you have to avoid using something useful because it doesn't work with tools that should really support it, but I promise you'll get over it.
An example question that gives some hint of potential pain in the debugging department:
Viewing namespaced global variables in Visual Studio debugger?
You probably won't have to look very hard to find more problems, but the debugger issues were enough to make me entirely give up on namespaces, to the maximum extent possible, so I've never looked any further.
My personal recommendation is that for code that will be around for less than forever, one might as well go with static. It does effectively the same thing as an unnamed namespace but, averaged out over all tools, it is better supported. In theory, one day it will disappear completely, but I'm happy to publicly admit that I'm certain that day will never actually come to pass. And in the meantime, you are saving yourself some pain.