Suppose I have a large system with many object files such that link time is a problem. Suppose also that I know that many of the classes and functions in my system are not used outside their translation unit.
Is it reasonable to assume that if I reduce the number of symbols with external linkage, my link-time will be reduced?
If so, will putting the entities (e.g., classes and functions) that are used in only a single TU into unnamed namespaces do me any good? Technically, the entities with external linkage will retain their external linkage in an unnamed namespace, but, as the C++11 standard notes,
Although entities in an unnamed namespace might have external linkage, they are effectively qualified by a name unique to their translation unit and therefore can never be seen from any other translation unit.
Do linker algorithms perform optimizations based on the knowledge that entities with external linkage in unnamed namespaces aren't visible outside their namespaces?
Yes, I think it does reduce the link time. I found this on the Google Chromium site:
"Unnamed namespaces restrict these symbols to the compilation unit, improving function call cost and reducing the size of entry point tables."
I know this is about the Chromium project, but it should apply to other C++ projects.
I don't see how a linker could do such optimizations, because by the time the linker gets a hold of the symbol(s) in question they look like ordinary decorated external-linkage symbols. Unless the linker has specific information about how the compiler decorates names in an anonymous namespace I can't see any way that it could optimize its work.
Have you confirmed that your linker is in fact CPU bound and not I/O bound? If it's not CPU bound already it's probably not going to help to reorganize your code.
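One way to check what your own toolchain does is to look at the symbol table of a small object file. The sketch below is only an illustration: the file name and the function names are made up, and the nm behaviour described in the comments is what I would expect from GCC or Clang with GNU binutils, not a guarantee for every platform.

```cpp
// tu_local.cpp (hypothetical file name)
// Inspect with, for example:
//   g++ -c tu_local.cpp && nm -C tu_local.o
// GNU nm prints a lowercase 't' for local text symbols and an
// uppercase 'T' for global ones. Only global symbols take part in
// cross-object symbol resolution at link time.

namespace {
// Unnamed namespace: cannot be named from any other translation unit.
int scale(int x) { return 2 * x; }
}  // namespace

// Classic internal linkage via static.
static int offset(int x) { return x + 1; }

// Ordinary external linkage: other translation units may call this.
int transform(int x) { return offset(scale(x)); }
```

If scale and offset both show up as local symbols (or disappear entirely after inlining) while transform is global, the linker has fewer global symbols to match up. Whether that is measurable in link time is something you would have to time on your own build.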
Related
C++20 introduced modules. Any symbol that is not exported from a module has module-internal linkage, while unnamed namespaces provide a mechanism to give the definitions inside them file-internal linkage. Does this mean unnamed namespaces will become useless once modules become common practice in the C++ community?
No: since (many) compilers see just one translation unit at a time, it’s still useful for optimization to indicate that an entity cannot be used in any other. It also avoids the possibility of accidental collisions between module units (even if those should be less likely than with broader codebases).
From what I read in other SO answers, like this and this, the compiler converts the source code into an object file, and the object file might contain references to functions like printf that need to be resolved by the linker.
What I don't understand is, when both the declaration and the definition exist in the same file, as in the following case, does the compiler or the linker resolve the reference to return1?
Or is this just a part of the compiler optimization?
int return1();
int return2() {
    int b = return1();
    return b + 1;
}
int return1() {
    return 1;
}
int main() {
    int b = return2();
}
I have ensured that preprocessing has nothing to do with this by running g++ -E main.cpp.
Update 2020-7-22
The answers are all helpful! Thanks!
From the answers below, it seems to me that the compiler may or may not resolve the reference to return1. However, I'm still unclear on one point: if there's only one translation unit, as in my example, and the compiler did not resolve the reference, does this mean that the linker must resolve it?
It seems to me that the linker normally links several object files together, so if there's only one translation unit (one object file), the linker needs to link the object file with itself. Am I right?
And is there any way to know for sure which one is the case on my computer?
It depends. Both options are possible, and so are options that you didn't mention, such as the compiler or the linker rearranging the code so that none of the functions exist any more. It's fine to think about compilers emitting references to functions and linkers resolving those references as a way of understanding C++, but bear in mind that all the compiler and linker have to do is produce a working program, and there are many different ways to do that.
One thing the compiler and linker must do however, is make sure that any calls to standard library functions happen (like printf as you mentioned), and happen in the order that the C++ source specifies. Apart from that (and some other similar concerns) they can more or less do as they wish.
[lex.phases]/1.9, covering the final phase of translation, states [emphasis mine]:
All external entity references are resolved. Library components are linked to satisfy external references to entities not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.
It is, however, up to the compiler to decide whether a library component is a single translation unit or a combination of them; as governed by [lex.separate]/2 [emphasis mine]:
[ Note: Previously translated translation units and instantiation units can be preserved individually or in libraries. The separate translation units of a program communicate ([basic.link]) by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units can be separately translated and then later linked to produce an executable program. — end note ]
OP: [...] does compiler or linker resolve the reference to return1?
Thus, even if return1 has external linkage, as it is defined in the translation unit where it is referred to (in return2), the linker should not need to resolve the reference to it, as its definition exists in the current translation. The standard passage is, however, (likely intentionally) a bit vague regarding when linking to satisfy external references needs to occur, and I would not consider an implementation non-compliant if it defers resolving the reference to return1 in return2 until the linking phase.
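As for finding out what happens on a particular machine, one option is to compile to an object file and look at the relocations next to the call instructions. The following is a rough sketch under assumptions: a GCC/binutils toolchain, a made-up file name, and two extra functions (local1 and return3) that are not in the question and exist only for contrast.

```cpp
// resolve.cpp (hypothetical file name)
// On a GCC/binutils toolchain, try:
//   g++ -O0 -c resolve.cpp && objdump -dr resolve.o
// A relocation entry printed under a call instruction means the
// address is still to be filled in at link time. No relocation means
// the compiler/assembler already fixed the target. With optimization
// enabled the calls may be inlined away and not appear at all.

int return1();  // external linkage, defined further down

int return2() {
    int b = return1();  // call to an external-linkage function
    return b + 1;
}

int return1() { return 1; }

// For contrast (not part of the question): internal linkage.
static int local1() { return 1; }

int return3() { return local1() + 1; }
```

On the setups I would expect this to be tried on, the call to return1 may still carry a relocation at -O0 even though the definition is in the same file, while the call to local1 usually does not. Treat that as something to verify with the commands above rather than as a rule.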
Practically speaking, the problem is with the following code:
static int return1();
int return2() {
    int b = return1();
    return b + 1;
}
int return1() {
    return 1;
}
The problem for the linker is that each translation unit can now contain its own return1, so the linker would have a problem choosing the right return1. There are tricks around this, e.g. adding the translation unit name to the function name. Most ABIs do not do that, but the C++ standard would allow it. For anonymous namespaces, however, i.e. namespace { int function1(); }, ABIs will use such tricks.
I have been reading about declaring anonymous namespaces for achieving a lower linking-time.
However, I have read that declaring anonymous namespaces in header files is really not recommended:
When an unnamed namespace is defined in a header file, it can lead to surprising results. Due to default internal linkage, each translation unit will define its own unique instance of members of the unnamed namespace that are ODR-used within that translation unit. This can cause unexpected results, bloat the resulting executable, or inadvertently trigger undefined behavior due to one-definition rule (ODR) violations.
The above is a quote extracted from the link below, in which there are several examples of anonymous namespaces' unexpected behaviors:
https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL59-CPP.+Do+not+define+an+unnamed+namespace+in+a+header+file
So, my questions are:
The mentioned problems only apply to anonymous-namespace variables, not methods. Is that right?
Does the same problem appear when using the static keyword to force internal linkage on variables? If so, is there any other way to achieve this safely?
The mentioned problems only apply to anonymous-namespace variables, not methods. Is that right?
The mentioned problem happens to anything inside an anonymous namespace.
Does the same problem appear when using the static keyword to force internal linkage on variables?
The same happens.
If so, is there any other way to achieve this safely?
There is not.
The ODR violation will happen sooner or later if you put any entity with internal linkage (class, variable, member function, template, etc.) inside a header file that is included in different translation units. You will soon have a problem if any entity with external linkage uses one of these entities with internal linkage in its definition or declaration.
Any entity declared inside an anonymous namespace, any entity declared static, and any non-extern const variable has internal linkage.
There are 2 partial solutions to what you are, supposedly, looking for:
Inline variables and functions can have their definitions appear in multiple translation units, so it is safe to define them in header files.
If what you are looking for is to keep the names from being visible outside the library you are writing, define them in a private header and apply visibility attributes to them ([[gnu::visibility("hidden")]] for GCC and Clang, or simply no __declspec(dllexport) for MSVC).
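To make the two partial solutions concrete, here is a minimal sketch. It assumes C++17 (for inline variables), and every name in it (ids.h, id_counter, next_id, MYLIB_LOCAL, detail_twice) is invented for the example.

```cpp
// ---- ids.h (hypothetical header, safe to include from many TUs) ----
#ifndef IDS_H
#define IDS_H

// C++17 inline variable: all translation units share ONE object.
// With `namespace { int id_counter = 0; }` or `static int id_counter`
// every including translation unit would get its own separate copy.
inline int id_counter = 0;

// Inline function: may be defined in every TU that includes the header.
inline int next_id() { return ++id_counter; }

// Hidden visibility keeps a symbol out of a shared library's exported
// interface. The attribute is a GCC/Clang extension, so it is guarded.
#if defined(__GNUC__)
#define MYLIB_LOCAL [[gnu::visibility("hidden")]]
#else
#define MYLIB_LOCAL
#endif

MYLIB_LOCAL int detail_twice(int x);

#endif  // IDS_H

// ---- mylib_detail.cpp (hypothetical source file of the library) ----
int detail_twice(int x) { return 2 * x; }
```

Note that hidden visibility is not the same as internal linkage: detail_twice can still be called from other translation units inside the same library, it just is not exported from the shared object.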
Are there any drawbacks to having a symbol with external linkage (other than global namespace clutter/collision)? For instance, I would think that if I have a function which I never call and it has internal linkage, the compiler can just discard it, but if it has external linkage the compiler has to leave that code in because someone might link to it later. Is this correct? Are there any other drawbacks?
I am asking because I know unnamed namespaces are recommended instead of the static keyword, but since symbols in an unnamed namespace still have external linkage, they would suffer from the above mentioned drawback (if I am right about it), and so are not totally better than static functions like the standard says.
The fact that functions in unnamed namespaces have external linkage is almost entirely a technicality. Because they have a "secret" translation-unit-dependent unique identifier, it is impossible to name them from a different translation unit. This means that the compiler can assume that they are never called by name from another translation unit. Most implementations that I know of turn functions in unnamed namespaces into local symbols and not global symbols, just like functions with true internal linkage.
A function in an unnamed namespace can be discarded without affecting a program if it is never called from the translation unit in which it is defined and it never has its address taken and passed out of the translation unit, which might lead to it being called other than by a direct named function call.
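The address-taking caveat can be illustrated with a small sketch. The names (Callback, twice, never_used, get_callback) are hypothetical, and [[maybe_unused]] assumes C++17; it is there only to silence the unused-function warning.

```cpp
// A plain function-pointer type that other translation units can use.
using Callback = int (*)(int);

namespace {

// Cannot be named from another translation unit, but its address is
// handed out below, so the compiler has to keep a callable copy.
int twice(int x) { return 2 * x; }

// Never called and never has its address taken: the compiler is free
// to drop it from the object file.
[[maybe_unused]] int never_used(int x) { return x - 1; }

}  // namespace

// External linkage: this is how the address of `twice` escapes.
Callback get_callback() { return &twice; }
```

Another translation unit that declares Callback get_callback(); can call get_callback()(3) and will end up in twice, even though it could never refer to twice by name.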
In a .cpp file, is there any difference/preference either way?
// file scope outside any namespace
using X::SomeClass;
typedef SomeClass::Buffer MyBuf;
vs.
namespace { // anonymous
using X::SomeClass;
typedef SomeClass::Buffer MyBuf;
}
I would say that the second usage is rather uncommon, at least in the code that I've seen so far (and I've seen quite a lot of C++ code). Could you explain what the reasoning behind the second technique is?
You will normally use an anonymous namespace in a C++ implementation file to achieve the same thing that 'static' would do in C (or C++, but we'll gloss over that), namely restricting the visibility of the symbols to the current translation unit. The typedef doesn't actually produce symbols that are exported for the linker to see, as it doesn't create anything concrete that you could link against.
My recommendation? I'd go with the first notation. The second one adds an unnecessary complication and in my opinion, doesn't buy you anything.
There is not much point in placing typedefs in anonymous namespaces. The main use for anonymous namespaces is to avoid symbol collision between translation units by placing definitions with external linkage in them.
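A small sketch of the difference, with a stand-in X::SomeClass that I made up (the real class from the question is not shown), and two alias names chosen only so both forms can sit in one file:

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Stand-in for the class in the question; its real definition is unknown.
namespace X {
struct SomeClass {
    typedef std::vector<char> Buffer;
};
}  // namespace X

// Form 1: alias at file scope.
typedef X::SomeClass::Buffer MyBufFileScope;

// Form 2: alias inside an anonymous namespace.
namespace {
typedef X::SomeClass::Buffer MyBufAnon;
}  // namespace

// Both aliases name exactly the same type; neither produces a symbol.
static_assert(std::is_same<MyBufFileScope, MyBufAnon>::value,
              "both aliases denote the same type");

// Where the anonymous namespace does matter: a definition (function or
// variable) that should not collide with a same-named one elsewhere.
namespace {
MyBufAnon make_buffer(std::size_t n) { return MyBufAnon(n, 'x'); }
}  // namespace

std::size_t buffer_size(std::size_t n) { return make_buffer(n).size(); }
```

The typedefs behave identically either way; only make_buffer, which is an actual definition, benefits from being in the anonymous namespace.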