Does the linker prevent duplicated linkage? (C++)

After some research and tests, I've found that, when linking to a library, my project needs to have the same linking option for the runtime library (MT, MD, etc) as the library I'm linking to.
What I'm wondering is: if I use a static library (which is usually statically linked to the runtime library), am I not linking the runtime library twice, since it's statically linked into both my library and my application? Or does the linker prevent this?

Usually the static library will not be linked to the runtime library. Instead, all references to the runtime will be left unresolved, i.e. your static library file will simply contain the object files of your code, but not the object files of the runtime library.
Only when you build an actual executable using that library do you link to the runtime library, which will resolve the open references from the static library.
This is usually the default behavior when compiling static linked libraries.
That being said, with most compilers you can force the linker to resolve external references when the static library itself is created. This may be beneficial if your library has dependencies that you don't want to expose to the build of the final executable.
However, if that leads to duplicate symbols, as is likely when forcing early linking of the runtime, it may break the link. If you're lucky you'll just get a warning about duplicate symbols, but it may just as well not link at all, depending on the implementation of your linker.

Related

Static linking third party libraries together with my C++ wrapper code

I am working on a little project to better understand the compiler and linker toolchain.
Suppose I have the libraries libfoo.a and libbar.a. I want to create a library libmy.a that acts as a wrapper or top-level API for both libraries. The goal is that only libmy.a should be required to build an executable that uses my wrapper functions. I created a cmake project and set up the library:
cmake_minimum_required(VERSION 3.14)
project(Wrapper)
set(CMAKE_CXX_STANDARD 11)
add_library(my STATIC ${SOME_SRC_FILES})
#set up the lib/inc paths and libs to link
target_include_directories(my PUBLIC /path/to/Foo/inc/ /path/to/Bar/inc/)
target_link_directories(my PUBLIC /path/to/Foo/lib/ /path/to/Bar/lib)
target_link_libraries(my PUBLIC foo bar)
That works fine and there is no problem in compilation. However, if I try to reference the object from an external project, I get undefined references to the functions in libfoo.a and libbar.a. As far as I understand the problem, the linker only records an undefined symbol in libmy.a, without including its definition from the external library. I checked this with nm libmy.a, which lists the functions used from the external libraries as undefined symbols.
I came across one solution that used ar to combine multiple library files. However, I would like to avoid such methods, because if it is not a single library but a bunch of, say, 10 libraries, it is impractical to search each library for a definition and copy it into libmy.a. Just throwing all the libraries together isn't a solution either, because the file would get too big.
It may be important to note that one of these library packages is CUDA.
I am sure there is a solution, but I was not able to find one. Any help would be appreciated.
The target is, that only libmy.a should be required to build an executable
This is already an unconventional goal for static libraries.
Static libraries normally only contain the object code built from the source code for that library. Users of that library must also link to the libraries that your library requires, because the definitions have not been copied in to your library when it was built.
Tools like ar can be used to combine multiple static libraries together, since they are just archives of object code. The tool cannot predict which object code the end user will use, though, so it will bundle the entire libraries. Otherwise, the end user might be looking for a definition that you left out and would then need to link in a second copy of the dependency library anyway.
If you want to provide a library that has everything the end user needs, cut down to what your wrapper actually uses, you can build a shared library. Shared libraries are fully linked, like executables, so the linker knows that any unreferenced object code is not going to be used, and it will not be included in the shared library.
You can force the entire static libraries to be included in shared libraries though.
On GCC you can use the linker argument --whole-archive (and --no-whole-archive to end its effect) to ensure that all of the object code from the following libraries is included.
On MSVC, you can use the /WHOLEARCHIVE:<library file name> argument to do the same.
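A minimal sketch of the difference on Linux (made-up names; GNU toolchain assumed). Without --whole-archive, unreferenced archive members are dropped from the shared library; with it, everything is pulled in:

```shell
mkdir -p /tmp/whole && cd /tmp/whole

printf 'int helper() { return 7; }\n' > helper.cpp
g++ -c -fPIC helper.cpp && ar rcs libhelper.a helper.o

printf 'int stub() { return 0; }\n' > stub.cpp
g++ -c -fPIC stub.cpp

# Without --whole-archive, nothing references helper(), so its object is dropped:
g++ -shared -o libthin.so stub.o -L. -lhelper
nm -D libthin.so | grep helper || echo "helper not exported"

# With --whole-archive, every object in libhelper.a is pulled in and exported:
g++ -shared -o libfat.so stub.o -Wl,--whole-archive -lhelper -Wl,--no-whole-archive
nm -D libfat.so | grep helper
```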

Why can dynamic libraries link to other libraries, but static ones can't?

Consider the following code structure:
main.cpp -> depends on libone.a -> depends on libtwo.a
Assume that in main.cpp only functions from libone.a are used. So realistically the programmer writing main.cpp really only cares about libone.a. At this point they don't even know libone.a has a dependency on libtwo.a.
They attempt to compile their code as follows and get linker errors:
g++ -o main main.cpp -lone
-- Error! Undefined symbols!
This becomes an issue because since libone.a depends on libtwo.a, anyone who uses libone.a must know about this dependency... As you can imagine this problem can occur with FAR more dependencies than a single library and can quickly become a linking nightmare.
Attempt 1 at solving this issue:
A first thought to solve this issue was: "It's simple, I'll just link libone.a with libtwo.a when I compile libone.a!"
It turns out it isn't as simple as I had hoped... When compiling libone.a there is no way to link in libtwo.a. Static libraries don't link to anything when you compile them; instead, all of the dependencies must be linked when the libraries are built into an executable.
For example, to compile main.cpp, which depends on a static library that in turn depends on another static library, you must link both libraries. ALWAYS.
g++ -o main main.cpp -lone -ltwo
Attempt 2 at solving this issue:
Another thought was to try and compile libone as a dynamic library that links to libtwo.a.
Oddly enough this just worked! After compiling and linking libone.so the main program only needs to care about libone.so and doesn't need to know about libtwo.a anymore.
g++ -o main main.cpp -lone
Success!
After going through this exercise one piece is still missing. I just can't seem to figure out any reason why static libraries can't link in other libraries, but dynamic ones can. As a matter of fact, the dynamic library, libone.so would not compile at all until I linked libtwo.a. That's fine though, because as the author of libone.so I would know about its dependency on libtwo.a - The author of main.cpp, however would not know. And realistically they should not have to know.
So down to the real question... Why can dynamic libraries link to other libraries like this while static ones cannot? This seems to be an obvious advantage dynamic libraries have over static ones, but I've never seen it mentioned anywhere!
A static library is just an archive of object files; there is no concept of dependency, because it was never linked.
Shared libraries are linked, solving symbols, and they can have, as such, dependencies.
Since your question refers to gcc and .so/.a files, I’ll assume you’re using some flavor of Unix that uses ELF files for object code.
After going through this exercise one piece is still missing. I just can't seem to figure out any reason why static libraries can't link in other libraries, but dynamic ones can.
Static libraries are not linked, as was mentioned in another answer. They are just an archive of compiled object files. Shared libraries are in fact linked, which means the linker actually resolves all the symbols reachable by any exported symbol. Think of exported symbols as the library’s API. A fully linked shared library contains either the definition of each symbol, or the dependency information necessary to tell the OS (specifically the dynamic loader) what other shared libraries are needed to have access to the symbol. The linker assembles all that into a special file format called an ELF shared object (dynamic library).
As a matter of fact, the dynamic library, libone.so would not compile at all until I linked libtwo.a. That's fine though, because as the author of libone.so I would know about its dependency on libtwo.a - The author of main.cpp, however would not know. And realistically they should not have to know.
libone.so probably compiles fine, but won’t link without libtwo due to unresolved symbols. Because the linker must resolve all reachable symbols when linking a shared library, it will fail if any can’t be found. Since libone.so uses symbols in libtwo, the linker needs to know about libtwo.a to find them. When you link a static library into a shared library, the symbols are resolved by copying the definitions directly into the output shared object file, so at this point, users of libone.so can be none the wiser about its usage of libtwo, since its symbols are simply inside libone.so.
The other option is to link shared libraries into other shared libraries. If you are linking libtwo.so into libone.so (note the .so suffix), then the linker resolves the symbols needed by libone by adding a special section to the output shared object file that says it needs libtwo.so at runtime. Later, when the OS loads libone.so, it knows it also needs to load libtwo.so. And, if your application only uses libone directly, that’s all you need to tell the linker at build time, since it’ll link in libone, see that it needs libtwo, and recursively resolve until everything is good.
Now, all of that loading the OS has to do at runtime incurs a performance cost, and there are some gotchas with global static variables that exist in multiple shared objects if you aren’t careful. There are some other potential performance benefits of linking statically that I won’t go into here, but suffice it to say that using dynamic libraries isn’t quite as performant on average, though the difference is negligible in most real-world situations.

Should I link a C++ application to shared libraries which are used indirectly

Let's say you compile a C++ shared library libBeta.so which makes use of pre-existing C++ shared libraries libAlpha1.so, libAlpha2.so, libAlpha3.so, etc. If I then write a C++ application which uses libBeta.so directly (and therefore indirectly uses the other libraries), should I link my application to libBeta.so only, or should I link my application to all libraries?
My intuition tells me that I should only link to libBeta.so, because linking to all the libraries seems redundant, as libBeta.so is already linked to the other libraries. However, "undefined reference to" errors are proving my intuition wrong.
Could someone explain me why my intuition might be wrong in particular cases?
p.s.:
OS: Linux
Compiler: g++
EDIT
As it turns out, the tool I was using for compiling has different behaviour when compiling an executable versus compiling a shared library. Linkage to the sub-libraries was being omitted when compiling a shared library :(
Shared libraries are fully linked entities, and you don't need to explicitly link to their dependencies.
This is unlike static libraries, which are only collections of object files. When you use a static library you must also link to its dependencies. But for shared libraries, you don't need to do that.
If you get undefined references, they are not caused by the dependencies of the shared libraries you link to. Either you are missing a link to some of your own code, or you are actually linking with a static library.
You only need to link with your direct dependency, libBeta.so.
Actually, a few years ago on some Linux distributions you could get away with having indirect dependencies in your executable -- in this case, say, on libAlpha1.so -- and as long as the dependency gets loaded at runtime, directly or indirectly, the dependency would get satisfied.
This is no longer the case.

Linking a shared object into other shared object C++ project

I am working on a very big C++ project to create a big shared object. We are using an external SDK which has several header files and several shared libraries that belong together. This means that the declarations of the SDK classes are in the header files but their definitions are in the shared objects.
I understand that because of the declarations in header files I can compile this code.
But what I do not understand exactly is when do I have to specify the used shared objects for the linker explicitly?
Namely, if I specify it (e.g. in cmake with the target_link_libraries command), then the linker can check whether a symbol will be in the shared library or not. But what happens if I do not specify it (i.e. there are no -l[shared_object_name] flags in the linkage)? My experience (which surprised me) is that it works properly (i.e. the whole build process finishes). How is that possible?
In POSIX shared libraries, you can have undefined symbols in a shared library, and all will link just fine. As long as the executable is fully linked, there will be no linker errors.
That's done this way because dynamic libraries mimic the behaviour of static libraries, and static libraries can have undefined symbols (static libraries are not linked, to begin with).
If you come from a Windows background, then it will surprise you, because Windows DLLs cannot have undefined symbols.
If you are worried about this, you can check the linker options --no-undefined and --no-allow-shlib-undefined.
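A sketch of both behaviors with GNU ld (macOS's linker, by contrast, rejects undefined symbols by default); the missing() function is a made-up placeholder:

```shell
mkdir -p /tmp/undef && cd /tmp/undef

printf 'int missing(); int f() { return missing(); }\n' > f.cpp

# By default, GNU ld lets a shared library keep unresolved symbols:
g++ -shared -fPIC -o libf.so f.cpp && echo "linked despite missing()"

# With --no-undefined, the same link becomes a hard error:
g++ -shared -fPIC -o libf2.so f.cpp -Wl,--no-undefined 2>&1 | grep -i 'undefined' || true
```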
My experience (which surprised me) is that it works properly (i.e. the whole build process finishes).
That seems unlikely.
In fact…
How is that possible?
It's not.
The only explanation is that you weren't using symbols defined inside those library files. They were either in header-only parts of the third-party code, or they weren't part of the third-party code at all.
Or you were building a shared library of your own. The ultimate executable would still need the third-party libraries linked in, though.

Why does the C++ linker require the library files during a build, even though I am dynamically linking?

I have a C++ executable and I'm dynamically linking against several libraries (Boost, Xerces-c and custom libs).
I understand why I would require the .lib/.a files if I choose to statically link against these libraries (relevant SO question here). However, why do I need to provide the corresponding .lib/.so library files when linking my executable if I'm dynamically linking against these external libraries?
The compiler isn't aware of dynamic linking; it just knows that a function exists via its prototype. The linker needs the lib files to resolve the symbols. The lib for a DLL contains additional information, like which DLL the functions live in and how they are exported (by name, by ordinal, etc.). The lib files for DLLs contain much less information than lib files that contain the full object code - libcmt.lib on my system is 19.2 MB, but msvcrt.lib is "only" 2.6 MB.
Note that this compile/link model is nearly 40 years old at this point, and predates dynamic linking on most platforms. If it were designed today, dynamic linking would be a first class citizen (for instance, in .NET, each assembly has rich metadata describing exactly what it exports, so you don't need separate headers and libs.)
Raymond Chen wrote a couple blog entries about this specific to Windows. Start with The classical model for linking and then follow-up with Why do we have import libraries anyway?.
To summarize, history has defined the compiler as the component that knows about detailed type information, whereas the linker only knows about symbol names. So the linker ends up creating the .DLL without type information, and therefore programs that want to link with it need some sort of metadata to tell it about how the functions are exported and what parameter types they take and return.
The reason .DLLs don't have all the information you need to link with them directly is historic, not a technical limitation.
For one thing, the linker records the versions of the libraries that exist at link time, so that you have some chance of your program working if library versions are updated. Multiple versions of shared libraries can exist on a system.
The linker has the job of validating that all your undefined symbols are accounted for, either with static content or dynamic content.
By default, then, it insists on all your symbols being present.
However, that's just the default. See -z, and --allow-shlib-undefined, and friends.
Perhaps this dynamic linking is done via import libraries (the function has __declspec(dllimport) before its declaration).
If that is the case, the compiler expects a __imp_-prefixed symbol to be declared; this stub is responsible for forwarding the call to the right dynamically loaded library.
Those stubs are generated during linkage for symbols marked with the __declspec(dllimport) keyword.
Here is a very SIMPLIFIED description that may help. Static linking puts all of the code needed to run your program into the executable so everything is found. Dynamic linking means some of the required code does not get put into the executable and will be found at runtime. Where do I find it? Is function x() there? How do I make a call to function x()? That is what the library tells the linker when you are dynamically linking.