I'm having an argument with another developer that I'd like to settle here, over dynamic linking vs. static linking.
In Theory:
Say you have a library with 100 functions, each containing a significant amount of code:
int A();
int B();
int C();
...and so on
And your application only calls or depends on one of them.
You have two methods at your disposal.
Build the library as a dynamically linked library
Build the library as a statically linked library
My colleague claims that when we link the static library into our application, the linker will not add the code of the 99 unused functions to our executable. I claim it will. I claim that in this scenario the only advantage is having a single executable and not having to distribute the library with our application, and that there will be no significant size difference compared to the dynamically linked approach.
Who is correct?
It can depend on a combination of how the code is organized, and what compiler flags you use.
Following the classic, simple model of things, the linker links in whatever object files from the library are needed to satisfy the symbol references. So if your A(), B() and C() were each defined in a different object file, only the object file that contained the symbol you actually used would be linked into the program. If that object file in turn depended on one or more of the others, the linker would find object files to satisfy those references as well, recursively, until it either satisfied them all or found one it couldn't satisfy (at which point you'd get the standard "unresolved external XXX" error message).
More recently, most compilers can "package" functions into separate "modules" without your having to put them into separate source files to get separate object files. Details vary, but this can reduce (or eliminate) the need to keep each source file as tiny as possible just to keep what ends up in the final executable to a minimum.
So, bottom line: at least for the most part, he's right and you're wrong.
It depends :-)
If you put each function in its own source file, or use the /Gy compile option, each function will be packaged in a separate section of the static library.
The linker will then be able to pick them up as needed, and only include the functions that are actually called.
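For illustration, a minimal MSVC command sequence (file names hypothetical): /Gy gives each function its own COMDAT section, and /OPT:REF tells the linker to drop the unreferenced ones.

cl /c /Gy util.cpp
lib /OUT:util.lib util.obj
cl app.cpp util.lib /link /OPT:REF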
Background
First of all, I think this question goes beyond the C++ standard. The standard deals with multiple translation units (instantiation units) and thus multiple object modules, but does not seem to acknowledge the possibility of having multiple independently compiled and linked binary modules (i.e., .so files on Linux and .dll files on Windows). After all, the latter more or less enters the world of the application binary interface (ABI), which the standard currently leaves to implementations.
When only a single binary module is involved, the following code snippet illustrates an elegant and portable (standard-compliant) solution to make singletons.
inline T& get() {
    static T var{};
    return var;
}
There are two things to note about this solution. First, the inline specifier makes the function a candidate for inclusion in multiple translation units, which is very convenient. Note that the standard guarantees there is only a single instance of get() and of the local static variable var in the final binary module (see here).
The second thing to note is that since C++11, initialization of static local variables is properly synchronized (see the Static local variables section here). So concurrent invocations of get() are fine.
Now, I try to extend this solution to the case when multiple binary modules are involved. I find the following variant works with VC++ on Windows.
// dllexport is used in building the library module, and
// dllimport is used in using the library in an application module.
// Usually controlled by a macro switch.
__declspec(dllexport/dllimport) inline T& get() {
    static T var{};
    return var;
}
Note for non-Windows users: __declspec(dllexport) specifies that an entity (i.e., a function, a class or an object) is implemented (defined) in this module and is to be referenced by other modules. __declspec(dllimport), on the other hand, specifies that an entity is not implemented in this module and is to be found in some other module.
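The "macro switch" mentioned in the comments above is usually spelled out like this (BUILDING_MYLIB and MYLIB_API are illustrative names; BUILDING_MYLIB would be defined only when compiling the library itself, and T stands in for your type as in the snippets above):

#ifdef BUILDING_MYLIB
#define MYLIB_API __declspec(dllexport)
#else
#define MYLIB_API __declspec(dllimport)
#endif

MYLIB_API inline T& get() {
    static T var{};
    return var;
}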
Since VC++ supports exporting and importing template instantiations (see here), the above solution can even be templated. For example:
template <typename T>
inline T& get() {
    static T var{};
    return var;
}
// EXTERN is defined to be empty in building the library module, and
// to `extern` in using the library module in an application module.
// Again, this is usually controlled by a macro switch.
EXTERN template __declspec(dllexport/dllimport) int& get<int>();
As a side note, the inline specifier is not mandatory here. See this S.O. question.
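A sketch of how the EXTERN switch from the comments might be defined, alongside the export macro (names illustrative, reusing MYLIB_API from the earlier sketch):

#ifdef BUILDING_MYLIB
#define EXTERN            // building the library: instantiate and export
#else
#define EXTERN extern     // using the library: suppress instantiation, import
#endif

EXTERN template MYLIB_API int& get<int>();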
The Question
Since there are no __declspec(dllexport/import) equivalents in GCC and clang, is there a way to make a variant of the above solution that works on these two compilers?
Also, in Boost.Log, I noticed the BOOST_LOG_INLINE_GLOBAL_LOGGER_DEFAULT macro (see the Global logger objects section here). It is claimed to create singletons even if the application consists of multiple modules. If someone knows about the inner workings of this macro, explanations are welcome here.
Finally, if you know of any better solutions for making singletons, feel free to post them as an answer.
Since there are no __declspec(dllexport/import) equivalents in GCC and clang, is there a way to make a variant of the above solution that works on these two compilers?
First, this is not so much a compiler-related question as an underlying operating system one. GCC (and presumably clang) does support __declspec(dllexport/import) on Windows and does essentially the same thing as MSVC with the functions and objects marked this way. Basically, the marked symbol is placed in a table of symbols exported from the dll (the export table). This table can be used, for instance, when you query for a symbol in a dll at run time (see GetProcAddress).
Along with the dll there comes an associated lib file that contains auxiliary data for linking your application with the dll. When you link your application with the library, the linker uses the lib file to resolve references to the dll symbols and compose the import table in your application binary. When the application starts, the OS (or rather the runtime loader component of the OS) uses the import table to find out what dlls your application depends on and what symbols it imports from those dlls. It then uses export tables in the dlls to resolve addresses of the referenced symbols in the dlls and complete the linking process.
The important side effect of this process is that only imported symbols are dynamically resolved, and every symbol you dynamically link to is associated with a particular dll. You can have same-named symbols in multiple dlls and in the application itself, and these symbols will refer to distinct entities as long as they are not exported. If they are exported, the linking process will fail because of the ambiguity. This makes process-wide singletons difficult on Windows. It also breaks some C/C++ language rules, because taking the address of an object or function with external linkage (in language terms) can produce different addresses in different parts of the program. On the other hand, the dlls are more self-contained and depend on the loading context to a lesser degree.
Things are significantly different on Linux and other POSIX-like OSs. When linked, each shared object (which can be a .so library or the application executable) gets a table of symbols. It lists both the symbols this shared object implements and the symbols it is missing. Additionally, the linker may embed into the shared object a list of other shared objects (optionally, with search paths) that could be used to resolve the missing symbols. The runtime loader includes a linker that loads the shared objects sequentially and constructs a global table of symbols comprising symbols from all shared objects. As that table is constructed, duplicate symbols from multiple shared objects are resolved to a single implementation (since all implementations are considered equivalent, the first shared object in the load list that implements the symbol is used). Any missing symbols are also resolved as the subsequent shared objects in the link order are loaded.
The effect of this process is that each symbol with external linkage resolves to a single implementation in one of the shared objects, even if multiple shared objects implement it. This is more in line with the C/C++ language rules and makes it simpler to implement process-wide singletons. A simple function-local static variable, not marked in any special way, is enough.
Now, there are ways to influence the linking process, and in particular there are ways to limit the symbols that are exported from a shared object. The most common ways to do that are using symbol visibility and linker scripts. With these tools it is possible to achieve linking behavior very close to Windows, with all its pros and cons. Note that when you limit symbol visibility you do have to mark the symbols you intend to export from the shared object with the visibility attribute or pragma. There's no need to mark symbols for import though.
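A sketch of the GCC/clang equivalent, assuming the library is built with everything hidden by default (g++ -shared -fPIC -fvisibility=hidden ...); only the accessor marked with the visibility attribute is exported, and T again stands in for your type:

__attribute__((visibility("default")))
T& get() {
    static T var{};
    return var;
}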
Also, in Boost.Log, I noticed the BOOST_LOG_INLINE_GLOBAL_LOGGER_DEFAULT macro (see the Global logger objects section here). It is claimed to create singletons even if the application consists of multiple modules. If someone knows about the inner workings of this macro, explanations are welcome here.
Boost.Log must be built as a shared library when it is used from a multi-module application. This makes it possible for it to keep process-wide storage of references to the global loggers declared throughout the application (the storage is implemented within the Boost.Log dll/so). When you obtain a logger declared with the BOOST_LOG_INLINE_GLOBAL_LOGGER_DEFAULT or a similar macro, the storage is first searched for a reference to the logger. If one is not found, the logger is created and a reference to it is stored back into the internal storage. Otherwise the existing reference is used. Along with reference caching, this provides performance very close to that of a function-local static variable.
Finally, if you know of any better solutions for making singletons, feel free to post them as an answer.
Although this is not really an answer: you should generally avoid singletons. They are difficult to implement correctly and in a way that does not hamper performance. If you really do have to implement one, then a solution similar to Boost.Log's looks generic enough. Note, however, that with this solution it is generally not known which module created (and as such, 'owns') the singleton, so you can't unload any modules dynamically. There may be simpler, case-specific ways, like exporting a function that returns a reference to a local static object. If you want portability and want to support non-default symbol visibility, always explicitly export your symbols.
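A minimal sketch of that last suggestion, assuming a hypothetical Registry class and the usual export/import macro switch (here MYLIB_API, as sketched earlier):

// registry.h, shared by all modules
class Registry { /* ... */ };
MYLIB_API Registry& registry();

// registry.cpp, compiled into exactly one shared library
Registry& registry() {
    static Registry instance;   // the single process-wide instance
    return instance;
}

Every module that goes through the exported registry() function reaches the same object, because the local static lives in one module only.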
I'm far from fully understanding how the C++ linker works and I have a specific question about it.
Say I have the following:
Utils.h
namespace Utils
{
    void func1();
    void func2();
}
Utils.cpp
#include "some_huge_lib" // Needed only by func2()

namespace Utils
{
    void func1() { /* Do something */ }
    void func2() { /* Make use of some functions defined in some_huge_lib */ }
}
main.cpp
int main()
{
    Utils::func1();
}
My goal is to generate binary files that are as small as possible.
Will some_huge_lib be included in the output object file?
Including or linking against large libraries usually won't make a difference unless you actually use that stuff. Linkers should perform dead code elimination, ensuring that at build time you won't get large binaries with a lot of unused code (read your compiler/linker manual to find out more; this isn't enforced by the C++ standard).
Including lots of headers won't increase your binary size either (but it might substantially increase your compilation time; cf. precompiled headers). Exceptions are global objects and dynamic libraries (those can't be stripped). I also recommend reading this passage (gcc only) regarding separating code into multiple sections.
One last note about performance: if you use a lot of position-dependent code (i.e., code that can't just be mapped to any address with relative offsets but needs some 'hotpatching' via a relocation table or similar), then there will be a startup cost.
This depends a lot on what tools and switches you use in order to link and compile.
Firstly, if you link some_huge_lib as a shared library, all the code and dependencies will need to be resolved when linking the shared library. So yes, it'll get pulled in somewhere.
If you link some_huge_lib as an archive, then - it depends. It is good practice for the sanity of the reader to put func1 and func2 in separate source code files, in which case in general the linker will be able to disregard the unused object files and their dependencies.
If however you have both functions in the same file, you will, on some compilers, need to tell them to produce individual sections for each function. Some compilers do this automatically, some don't do it at all. If you don't have this option, pulling in func1 will pull in all the code for func2, and all the dependencies will need to be resolved.
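For the same-file case, a sketch with the GNU toolchain (flag names are real; file names from the question above): -ffunction-sections gives each function its own section, and --gc-sections tells the linker to discard the sections nothing references, so func2() and its dependencies can be dropped.

g++ -c -ffunction-sections -fdata-sections Utils.cpp
g++ main.cpp Utils.o -Wl,--gc-sections -o prog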
Think of each function as a node in a graph.
Each node is associated with a piece of binary code - the compiled binary of the node's function.
There is a link (directed edge) between 2 nodes if one node (function) depends on (calls) another.
A static library is primarily a list of such nodes (+ an index).
The program starting-node is the main() function.
The linker traverses the graph from main() and links into the executable all the nodes that are reachable from main(). That's why it is called a linker (the linking maps the function call addresses within the executable).
Unused functions do not have links from the nodes reachable from main().
Thus, such disconnected nodes are not reachable and are not included in the final executable.
The executable (as opposed to the static library) is primarily a list of all nodes reachable from main() (+ an index and startup code among other things).
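A toy sketch of this reachability idea (not a real linker, just the graph traversal described above):

#include <map>
#include <set>
#include <string>
#include <vector>

// Returns the set of functions (nodes) reachable from main(), given a
// call graph mapping each function to the functions it calls.
std::set<std::string> linked(
    const std::map<std::string, std::vector<std::string>>& calls) {
    std::set<std::string> keep;
    std::vector<std::string> todo{"main"};
    while (!todo.empty()) {
        std::string fn = todo.back();
        todo.pop_back();
        if (!keep.insert(fn).second) continue;  // already visited
        auto it = calls.find(fn);
        if (it != calls.end())
            for (const auto& callee : it->second) todo.push_back(callee);
    }
    return keep;  // with calls = {{"main", {"A"}}}, B and C never appear
}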
In addition to the other replies, it must be said that linkers normally work in terms of sections, not functions.
Compilers typically make it configurable whether they put all of your object code into one monolithic section or split it into a number of smaller ones. For example, the GCC options to switch on splitting are -ffunction-sections (for code) and -fdata-sections (for data); the MSVC option is /Gy (for both). -fno-function-sections, -fno-data-sections and /Gy-, respectively, put all code or data into one section.
You might 'play' with compiling your modules in both modes and then dumping them (objdump for GCC, dumpbin for MSVC) to see the generated object file structure.
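For example (using Utils.cpp from the earlier question; output details vary by toolchain version):

g++ -c -ffunction-sections Utils.cpp
objdump -h Utils.o           # expect one .text.<function> section per function

cl /c /Gy Utils.cpp
dumpbin /headers Utils.obj   # expect one COMDAT .text$mn section per function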
Once a section is formed by the compiler, it is a unit as far as the linker is concerned. Sections define symbols and refer to symbols defined in other sections. The linker builds a dependency graph between the sections (starting at a number of roots) and then either discards or keeps each of them entirely. So, if you have a used and an unused function in the same section, the unused function will be kept.
There are both benefits and drawbacks in either mode. Turning splitting on means smaller executable files, but larger object files and longer linking times.
It should also be noted that in C++, unlike C, there are certain situations where the One Definition Rule is relaxed, and multiple definitions of a function or data object are allowed (for example, in the case of inline functions). The rules are formulated in such a way that the linker is allowed to pick any definition.
From the point of view of sections, putting inline functions together with non-inline ones would mean that in a typical usage scenario the linker would be forced to keep virtually every definition of every inline function; that would mean excessive code bloat. Therefore, such functions and data are normally put into their own sections regardless of compiler command-line options.
UPDATE: As #janm correctly reminded me in his comment, the linker must also be instructed to get rid of unreferenced sections by specifying --gc-sections (GNU) or /opt:ref (MS).
I've created a simple static library, contained in a .a file. I might use it in a variety of projects, some of which simply will not need 90% of it. For example, if I want to use the neural networks that are part of my library on an AVR microcontroller, I probably won't need a tonne of other stuff, but will all of that be linked into my code, potentially generating a rather large file?
I intend to compile programs like this:
g++ myProg.cpp myLib.a -o prog
G++ will pull in only the object files it needs from your library, but this means that if one symbol from a single object file is used, everything in that object file gets added to your executable.
One source file becomes one object file, so it makes sense to logically group things together only when they are sure to be needed together.
This behavior varies by compiler (actually by linker). For example, the Microsoft linker will pick object files apart and include only the parts that are actually needed.
You could also try to break your library into independent smaller parts and only link the parts you are really going to need.
When you link to a static library the linker pulls in things that resolve names used in other parts of the code. In general, if the name isn't used it doesn't get linked in.
The GNU linker will pull in the stuff it needs from the libraries you have specified on an object file by object file basis. Object files are atomic units as far as the GNU linker is concerned. It doesn't split them apart. The linker will bring in an object file if that object file defines one or more unresolved external references. That object file may have external references. The linker will try to resolve these, but if it can't, the linker adds those to the set of references that need to be resolved.
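To see those atomic units for yourself, you can list an archive's member object files and their symbols (library name illustrative):

ar t libmylib.a     # the member object files
nm -C libmylib.a    # defined (T, D, ...) and undefined (U) symbols per member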
There are a couple of gotchas that can make for a much larger than needed executable. By larger than needed, I mean an executable that contains functions that will never be called, or global objects that will never be examined or modified, during the execution of the program. You will have binary code that is unreachable.
One of these gotchas results when an object file contains a large number of functions or global objects. Your program might only need one of these, but your executable gets all of them because object files are atomic units to the linker. Those extra functions will be unreachable because there's no call path from your main to these functions, but they're still in your executable. The only way to ensure that this doesn't happen is to use the "one function per source file" rule. I don't follow that rule myself, but I do understand the logic of it.
Another set of gotchas occur when you use polymorphic classes. A constructor contains auto-generated code as well as the body of the constructor itself. That auto-generated code calls the constructors for parent classes, inserts a pointer to the vtable for the class in the object, and initializes data members per the initializer list. These parent class constructors, the vtable, and the mechanisms to process the initializer list might be external references that the linker needs to resolve. If the parent class constructor is in a larger header file, you've just dragged all that stuff into your executable.
What about the vtable? The GNU compiler picks a key member function as the place to store the vtable. That key function is the first virtual member function in the class that does not have an inline definition. Even if you don't call that member function, you get the object file that contains it in your executable -- and you get everything that that object file drags in.
Keeping your source files down to a small size once again helps with this "look what the cat dragged in!" problem. It's a good idea to pay special attention to the file that contains that key member function. Keep that source file small, at least in terms of stuff the cat will drag in. I tend to put small, self-contained member functions in that source file. Functions that will inevitably drag in a bunch of other stuff shouldn't go there.
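A sketch of the key-function rule described above (per the Itanium C++ ABI that GCC follows; names illustrative):

// widget.h
struct Widget {
    virtual ~Widget();      // first non-inline virtual: the key function
    virtual void draw() {}  // inline, so not the key function
};

// widget.cpp -- the compiler emits Widget's vtable in this translation
// unit, so any use of Widget drags widget.o (and whatever it references)
// into the executable.
Widget::~Widget() {}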
Another issue with the vtable is that it contains pointers to all of the virtual functions for a class. Those pointers need to point to something real. Your executable will contain the object files that define each and every virtual function defined for a class, including the ones you never call. And you're going to get everything that those virtual functions drag in as well.
One solution to this problem is to avoid making big huge classes. They tend to drag in everything. God classes in particular are problematic in this regard. Another solution is to think hard about whether a function really does need to be virtual. Don't just make a function virtual because you think someday someone will need to override it. That's speculative generality, and with virtual functions, speculative generality comes with a high cost.
I am currently working on a project that has a number of COM objects written in C++ with ATL.
Currently, they are all defined in .cpp and .idl files that are directly compiled into the COM DLL.
To allow unit tests to be written easier, I am planning on moving the implementation of the COM objects out into a separate static library. That library can then be linked in to the main DLL, and the separate unit test project.
I am assuming that there's nothing particularly special about the code generated by ATL, and that this will work much like all other C++ code when it comes to linking with static libraries. However, I don't have too much actual knowledge of ATL myself so don't know if this is really the case.
Will this work as I'm expecting? Or are there pitfalls that I should look out for?
There are gotchas since LIBs are pulled in only if they are referenced, as opposed to OBJs which are explicitly included.
Larry Osterman discussed some of the subtleties a few years ago:
When I moved my code into a library, what happened to my ATL COM
objects?
A caveat: This post discusses details of how ATL7 works. For other
versions of ATL, YMMV. The general principles apply for all
versions, but the details are likely to be different.
My group’s recently been working on reducing the number of DLLs
that make up the feature we’re working on (going from somewhere
around 8 to 4). As a part of this, I’ve spent the past couple of
weeks consolidating a bunch of ATL COM DLL’s.
To do this, I first changed the DLLs to build libraries, and then
linked the libraries together with a dummy DllInit routine (which
basically just called CComDllModule::DllInit()) to make the DLL.
So far so good. Everything linked, and I got ready to test the new
DLL.
For some reason, when I attempted to register the DLL, the
registration didn’t actually register the COM objects. At that
point, I started kicking my self for forgetting one of the
fundamental differences between linking objects together to make an
executable and linking libraries together to make an executable.
To explain, I’ve got to go into a bit of how the linker works. When
you link an executable (of any kind), the linker loads all the
sections in the object files that make up the executable. For each
extdef symbol in the object files, it starts looking for a public
symbol that matches the symbol.
Once all of the symbols are matched, the linker then makes a second
pass combining all the .code sections that have identical contents
(this has the effect of collapsing template methods that expand into
the same code (this happens a lot with CComPtr)).
Then a third pass is run. The third pass discards all of the
sections that have not yet been referenced. Since the sections
aren’t referenced, they’re not going to be used in the resulting
executable, so to include them would just bloat the executable.
Ok, so why didn’t my ATL based COM objects get registered? Well,
it’s time to play detective.
Well, it turns out that you’ve got to dig a bit into the ATL code to
figure it out.
The ATL COM registration logic gets picked up in the CComModule
object. Within that object, there’s a method
RegisterClassObjects, which redirects to
AtlComModuleRegisterClassObjects. This function walks a list of
_ATL_OBJMAP_ENTRY structures and calls the RegisterClassObject
on each structure. The list is retrieved from the
m_ppAutoObjMapFirst member of the CComModule (ok, it’s really a
member of the _ATL_COM_MODULE70, which is a base class for the
CComModule). So where did that field come from?
It’s initialized in the constructor of the CAtlComModule, which
gets it from the __pobjMapEntryFirst global variable. So where does
the __pobjMapEntryFirst field come from?
Well, there are actually two fields of relevance,
__pobjMapEntryFirst and __pobjMapEntryLast.
Here’s the definition for the __pobjMapEntryFirst:
__declspec(selectany) __declspec(allocate("ATL$__a")) _ATL_OBJMAP_ENTRY* __pobjMapEntryFirst = NULL;
And here’s the definition for __pobjMapEntryLast:
__declspec(selectany) __declspec(allocate("ATL$__z")) _ATL_OBJMAP_ENTRY* __pobjMapEntryLast = NULL;
Let’s break this one down:
__declspec(selectany): __declspec(selectany) is a directive to
the linker to pick any of the similarly named items from the section
– in other words, if a __declspec(selectany) item is found
in multiple object files, just pick one, don’t complain about it
being multiply defined.
__declspec(allocate("ATL$__a")): This one’s the one that makes
the magic work. This is a declaration to the compiler, it tells the
compiler to put the variable in a section named "ATL$__a" (or
"ATL$__z").
Ok, that’s nice, but how does it work?
Well, to get my ATL based COM object declared, I included the
following line in my header file:
OBJECT_ENTRY_AUTO(<my classid>, <my class>)
OBJECT_ENTRY_AUTO expands into:
#define OBJECT_ENTRY_AUTO(clsid, class) \
__declspec(selectany) ATL::_ATL_OBJMAP_ENTRY __objMap_##class = {&clsid, class::UpdateRegistry, class::_ClassFactoryCreatorClass::CreateInstance, class::_CreatorClass::CreateInstance, NULL, 0, class::GetObjectDescription, class::GetCategoryMap, class::ObjectMain }; \
extern "C" __declspec(allocate("ATL$__m")) __declspec(selectany) ATL::_ATL_OBJMAP_ENTRY* const __pobjMap_##class = &__objMap_##class; \
OBJECT_ENTRY_PRAGMA(class)
Notice the declaration of __pobjMap_##class above – there’s
that declspec(allocate("ATL$__m")) thingy again. And that’s where
the magic lies. When the linker’s laying out the code, it sorts
these sections alphabetically – so variables in the ATL$__a
section will occur before the variables in the ATL$__z section.
So what’s happening under the covers is that ATL’s asking the linker
to place all the __pobjMap_<class name> variables in the
executable between __pobjMapEntryFirst and __pobjMapEntryLast.
And that’s the crux of the problem. Remember my comment above about
how the linker works resolving symbols? It first loads all the items
(code and data) from the OBJ files passed in, and resolves all the
external definitions for them. But none of the files in the wrapper
directory (which are the ones that are explicitly linked) reference
any of the code in the DLL (remember, the wrapper doesn’t do much more
than simply calling into ATL’s wrapper functions – it doesn’t
reference any of the code in the other files).
So how did I fix the problem? Simple. I knew that as soon as the
linker pulled in the module that contained my COM class definition,
it'd start resolving all the items in that module. Including the
__objMap_<class>, which would then be added in the right location so that ATL would be able to pick it up. I put a dummy function
called ForceLoad<MyClass> inside the module in the library, and
then added a function called CallForceLoad<MyClass> to my DLL
entry point file (note: I just added the function – I didn’t
call it from any code).
And voila, the code was loaded, and the class factories for my COM
objects were now auto-registered.
What was even cooler about this was that since no live code called
the two dummy functions that were used to pull in the library, pass
three of the linker discarded the code!
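A minimal sketch of that force-load trick (names hypothetical):

// MyClass.cpp -- in the static library, in the same object file as the
// OBJECT_ENTRY_AUTO for the class:
void ForceLoadMyClass() {}

// DllEntry.cpp -- explicitly linked into the DLL; merely defining this
// function creates a reference that pulls MyClass.obj into the link.
// Nothing ever has to call it.
void ForceLoadMyClass();
void CallForceLoadMyClass() { ForceLoadMyClass(); }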
I've got an application that's using a static library I made. One .cpp file in the library has a static variable declaration, whose ctor calls a function on a singleton that does something- e.g. adds a string.
Now when I use that library from the application, my singleton doesn't seem to contain any traces of the string that was supposed to be added.
I'm definitely missing something but I don't know what..
If you have an object in a static library that is not EXPLICITLY used in the application, then the linker will not pull that object from the lib into the application.
There is a big difference between static and dynamic libraries.
Dynamic Library:
At compile time nothing is pulled in from the dynamic library. Extra code is added to explicitly load and resolve the symbols at run time. At run time the whole library is loaded, and thus object initializers are called (though exactly when is an implementation detail).
Static libraries are handled very differently:
When you link against a static library, it pulls into the application all the items that are defined in the library and not already defined in the application. This is repeated until there are no more dependencies that the library can resolve. The side effect is that objects/functions not explicitly used are not pulled from the library (thus global variables that are not directly accessed will not be pulled in).
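A sketch of the failure mode from the question (Registry is a hypothetical singleton): nothing in the application references this translation unit, so the linker never pulls it in, and the constructor never runs.

// registrar.cpp -- compiled into the static library
#include "registry.h"  // hypothetical singleton header

namespace {
struct Registrar {
    Registrar() { Registry::instance().add("my string"); }  // intended side effect
} registrar;  // no symbol here is referenced elsewhere, so this object file is skipped
}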
My memory of this is a bit hazy, but you might be getting hit by an initialization-order problem. There are no guarantees about the order in which static variable initializers in different files get called, so if your singleton isn't initialized yet when the static variable in the library is being initialized, that might produce the effect you're seeing.
The way I've gotten around these problems is to have some sort of explicit init function that does this stuff and that I call at the start of main or something. You might be able to fiddle with the order in which you give the object file and library arguments to the compiler (or linker, actually), because that's also worked for me, but that solution is a bit fragile: it depends not only on the specific linker but probably also on the specific version.
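A sketch of that explicit-init workaround (names illustrative):

// in the library
void initLibrary() {
    Registry::instance().add("my string");  // runs at a deterministic point
}

// in the application
int main() {
    initLibrary();  // before anything else touches the singleton
    // ...
}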
Refactor the classes doing static initialization so they do not depend on any other such classes. That is, make each class's initialization independent and self-sufficient.