How does linker know which symbols should be resolved at runtime? - c++

How does the linker know which symbols should be resolved at runtime? In particular, I'm interested in what information shared object files carry that instructs the linker to resolve symbols at runtime. How does dynamic symbol resolution work at runtime, i.e. what does the executable do to find a symbol, and if multiple symbols with the same name are defined, which one is found?
What happens if the file was linked only statically, but then it's linked dynamically at run-time as part of a shared library? Which symbol will be used by the executable? In other words, is it possible to override symbols in an executable by linking those symbols into a shared library?
The platform in question is SunOS.

Try the below link. I hope it answers your question
http://www.linuxjournal.com/article/6463

Check out this article from Linux Journal. For more information, perhaps specifically related to Windows, AIX, OS X, etc., I would recommend the Wikipedia article on Linker (computing) and the references therein.

If a file is statically linked there is no run-time resolution to speak of. If a shared object links to that same library, either dynamically or statically, the version linked into the shared object will only affect code executed in that shared object. This can cause problems if you link to two different, incompatible versions of the same library and pass data back and forth between them.

Related

Linking a shared object into other shared object C++ project

I am working on a very big C++ project that builds a big shared object. We are using an external SDK which has several header files and several shared libraries that belong together. This means that the declarations of the SDK classes are in the header files but their definitions are in the shared objects.
I understand that because of the declarations in the header files I can compile this code.
But what I do not understand exactly is when I have to specify the used shared objects to the linker explicitly.
Namely, if I specify them (e.g. in cmake with the target_link_libraries command) then the linker can check whether a symbol will be in the shared library or not. But what happens if I do not specify them (i.e. there are no -l[shared_object_name] flags on the link line)? My experience (which surprised me) is that it works properly (i.e. the whole build process finishes). How is that possible?
In POSIX shared libraries, you can have undefined symbols in a shared library, and all will link just fine. As long as the executable is fully linked, there will be no linker errors.
That's done this way because dynamic libraries mimic the behaviour of static libraries, and static libraries can have undefined symbols (static libraries are not linked, to begin with).
If you come from a Windows background, then it will surprise you, because Windows DLLs cannot have undefined symbols.
If you are worried about this, you can check the linker options --no-undefined and --no-allow-shlib-undefined.
My experience (which surprised me) is that it works properly (i.e. the whole build process finishes).
That seems unlikely.
In fact…
How is that possible?
It's not.
The only explanation is that you weren't using symbols defined inside those library files. They were either in header-only parts of the third-party code, or they weren't part of the third-party code at all.
Or you were building a shared library of your own. The ultimate executable would still need the third-party libraries linked in, though.

Dynamic linking: is it possible to disable automatic loading of non used shared objects?

I have a limited knowledge of dynamic libraries and I usually have problems related to libraries that I do not understand.
I recently learned of libraries from google search and especially from the following links:
Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?.
http://www.ibm.com/developerworks/library/l-dynamic-libraries/. That article was very useful in understanding the dynamic libraries and their usage:
If I understood well (correct me if I am wrong), there are two possible usages of shared objects:
dynamic linking: the shared object is automatically loaded by the dynamic linker when the program starts.
dynamic loading: the shared object is loaded and used under program control at runtime through the dynamic loading API (dlopen, dlerror, dlsym and dlclose). That option is useful for plugins.
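The dynamic loading case listed above can be sketched roughly as follows. This is only a minimal illustration, not taken from the linked article: the helper name cos_via_dlopen and the candidate library file names are assumptions (glibc-style names first, then a macOS-style name), and on older glibc versions the program may need to be linked with -ldl.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlerror, dlclose

// Hypothetical helper: loads the C math library at run time and calls cos()
// through a pointer obtained from dlsym(). Returns false if any step fails.
bool cos_via_dlopen(double x, double& result) {
    // Candidate file names are assumptions and differ between platforms.
    const char* candidates[] = {"libm.so.6", "libm.so", "libm.dylib"};
    void* handle = nullptr;
    for (const char* name : candidates) {
        handle = dlopen(name, RTLD_LAZY);
        if (handle != nullptr) break;
    }
    if (handle == nullptr) return false;

    dlerror();  // clear any stale error state before calling dlsym
    using cos_fn = double (*)(double);
    auto fn = reinterpret_cast<cos_fn>(dlsym(handle, "cos"));
    if (dlerror() != nullptr || fn == nullptr) {
        dlclose(handle);
        return false;
    }

    result = fn(x);   // call into the library through the looked-up pointer
    dlclose(handle);  // drop our reference to the library
    return true;
}
```

Nothing in this code refers to cos at link time; the name is only a string handed to dlsym, which is why the linker cannot verify it up front.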
If I got everything right, in the case of dynamic linking, all the symbols are verified at compilation time. This allows the compiler/linker to know exactly which shared object is effectively used by the program and which one is not used.
Now, it happens that the dynamic linker always tries to load a shared object at runtime even if that shared object is not used. This can be verified by linking an empty program against libraries that are not in locations searchable at runtime: the execution will fail. Linking a program against a library that is not actually used can happen when the program is updated and the library is no longer necessary. It also happens when one isolates a part of the program for debugging and links it against all the libraries of the main program.
My question is: is there an option to ask the compiler/linker not to include references to shared objects that do not have symbols referred to in the program?
Is there any issue that prevents the compiler from doing that?
The following posts share some similarities with the present question, but none of them has an accepted answer, nor an answer that satisfies my curiosity:
https://stackoverflow.com/questions/22617744/how-to-disable-the-runtime-checking-of-shared-object-if-they-are-not-used
Delay-Load equivalent in unix based systems
If you happen to use g++/ld there are a few suggestions spelled out on How to remove unused C/C++ symbols with GCC and ld?
For example:
gcc -Os -fdata-sections -ffunction-sections test.cpp -o test.o -Wl,--gc-sections
On OS X the corresponding linker options are -dead_strip and -dead_strip_dylibs.
However I'm actually not sure it's possible for the compiler to do this in the general case. Consider a dependent shared library that has a weak reference to the library that you want to remove from your link line: How would the compiler know that it's safe to remove the library and/or symbols at that point?
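To make the weak reference case a bit more concrete, here is a small sketch. It assumes GCC or Clang on an ELF platform (the __attribute__((weak)) syntax is a compiler extension), and optional_hook is a made-up name used only for illustration.

```cpp
// Weak declaration: the link succeeds even if no object file or library
// defines this function. optional_hook is a hypothetical name.
extern "C" void optional_hook() __attribute__((weak));

// Returns true only if some object or shared library supplied a definition.
// An unresolved weak function has a null address at run time.
bool hook_available() {
    return &optional_hook != nullptr;
}

// Calls the hook when it is present; otherwise does nothing.
void call_hook_if_present() {
    if (hook_available()) {
        optional_hook();
    }
}
```

Code like this links and runs whether or not the defining library is on the link line, so the absence of a link error tells the linker nothing about whether that library is actually wanted.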

G++ Dynamic Library Linking Issues

I am trying to link a number of dynamic libraries into an application and running into problems with g++.
Consider:
libA.so
libB.so depends on libA.so
libC.so depends on libB.so
Application D depends directly on libC.so
If I try to link application D against just libC.so, I get unresolved symbols for the symbols in A and B. I feel as if the compiler should be able to figure it out, and when I use the Intel compiler, it does. G++, however, can't figure out the linking. I would like my libraries and executables to only have to link to the things they directly need, not try to anticipate what the libraries they are using need.
I have also had problems when libA.so links to a static library, and when I try to compile the executable I get unresolved symbols from the static library that libA.so was supposed to be using.
I have seen a number of other people ask this and similar questions and get a variety of answers (Linking with dynamic library with dependencies), but the answers are all rather vague, often conflicting, and very much along the lines of "keep on trucking and RTFM".
I get the impression that link order matters. How so, and how do I know what order to link in?
Update
I believe that what is happening is something along the lines of libA.so contains two functions (AA and AB). libB.so needs AA and libC.so needs AB. When libB.so gets linked, g++ gets libA.so, sees that only AA is used, and drops AB. Then when libC.so is linked in, g++ sees that libA.so was already linked and doesn't revisit it, resulting in AB being undefined. I have seen documentation indicating that static libraries work this way, but would the compiler treat dynamic libraries the same way? If so, is there a way to work around it?
(You haven't shown the actual linker error, or provided nearly enough information about the problem, so what follows is partly guesswork...)
If I try to link application D just to libC.so, I get unresolved symbols for the symbols in A and B.
When linking an executable the GNU linker checks that all symbols are available. You can turn that off with --allow-shlib-undefined (to tell GCC to pass that to the linker, use -Wl,--allow-shlib-undefined).
It is better not to use that option, but in that case the linker needs to know where to find libA.so and libB.so so it can check that the symbols needed by libC.so will be found. You can do that with the -rpath-link linker option, which the linker documentation describes as follows:
When using ELF or SunOS, one shared library may require another. This happens when an "ld -shared" link includes a shared library as one of the input files.
When the linker encounters such a dependency when doing a non-shared, non-relocatable link, it will automatically try to locate the required shared library and include it in the link, if it is not included explicitly.
So you should be able to fix the problem by using -Wl,-rpath-link,. to tell the linker to look in the current directory (.) for the libraries libC.so depends on.
I get the impression that link order matters. How so, and how do I know what order to link in?
Yes, link order matters. You should link in the obvious order ;-) If a file foo.cc depends on a library then put the library later in the linker line, so it will be found after processing foo.cc, and if that library depends on another library put that even later, so it will be processed after the earlier library that needs it. If you put a library at the start of the link line then the linker doesn't have any unresolved symbols to look up, so doesn't need to link to that library.
You need to explicitly specify all libraries that you directly use.
During static linking, the dependencies of the loaded .so are not used; when linking the main program, all symbols have to be found in either the main program itself, in a static library specified on the command line, or in a shared library specified on the command line.
This is where you get an error.
When the program is executed, the dependencies of dynamic libraries are loaded so that references from within other shared libraries can be resolved.
By the time the program runs, it might actually be linked (dynamically) against a different version of the shared library. This different version might have different dependencies, so the main program MUST NOT rely on the set of additional libraries that get loaded as dependencies.
This is why the static linker stops you early.

How does the linker locate code in stripped dynamic libraries?

It's common practice to strip a symbol table from a dynamic library (.dll on Windows, .dylib on OSX, and .so on Linux/Solaris/BSD). This makes sense because it drastically reduces the file size of the library, often more than 75 percent.
However, this one question's been bugging me: A stripped library has no symbol table. If I write an executable that references a function in this library, how does the operating system's dynamic linker know where to locate the section of code in the stripped library when there's no symbol table to provide this information?
This question comprises both the situation where the library was stripped before the executable was linked at compile-time and the situation where the library was stripped after the executable was linked at compile-time.
The symbols that are stripped when you run strip are the debugging symbols, not the names of actual exported symbols.
The dynamic symbols, the ones that the dynamic linker searches for, are still there, and can be listed with nm by using its -D (list dynamic symbols) argument.
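A quick way to see the same thing from inside a program is to ask the dynamic linker for a symbol by name. The sketch below is an assumption-laden illustration: has_dynamic_symbol is a hypothetical helper, and RTLD_DEFAULT is only visible on glibc when _GNU_SOURCE is defined (g++ defines it by default).

```cpp
#include <dlfcn.h>   // dlsym, RTLD_DEFAULT

// Hypothetical helper: asks the dynamic linker whether any object already
// loaded into the process exports `name` in its dynamic symbol table.
bool has_dynamic_symbol(const char* name) {
    // RTLD_DEFAULT searches the loaded objects in the default lookup order.
    return dlsym(RTLD_DEFAULT, name) != nullptr;
}
```

For example, has_dynamic_symbol("getpid") should succeed in a normally linked program even when the system C library has been stripped, because the lookup goes through the dynamic symbol table that strip leaves in place.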

Why does the C++ linker require the library files during a build, even though I am dynamically linking?

I have a C++ executable and I'm dynamically linking against several libraries (Boost, Xerces-c and custom libs).
I understand why I would require the .lib/.a files if I choose to statically link against these libraries (relevant SO question here). However, why do I need to provide the corresponding .lib/.so library files when linking my executable if I'm dynamically linking against these external libraries?
The compiler isn't aware of dynamic linking, it just knows that a function exists via its prototype. The linker needs the lib files to resolve the symbol. The lib for a DLL contains additional information like what DLL the functions live in and how they are exported (by name, by ordinal, etc.) The lib files for DLL's contain much less information than lib files that contain the full object code - libcmmt.lib on my system is 19.2 MB, but msvcrt.lib is "only" 2.6 MB.
Note that this compile/link model is nearly 40 years old at this point, and predates dynamic linking on most platforms. If it were designed today, dynamic linking would be a first class citizen (for instance, in .NET, each assembly has rich metadata describing exactly what it exports, so you don't need separate headers and libs.)
Raymond Chen wrote a couple blog entries about this specific to Windows. Start with The classical model for linking and then follow-up with Why do we have import libraries anyway?.
To summarize, history has defined the compiler as the component that knows about detailed type information, whereas the linker only knows about symbol names. So the linker ends up creating the .DLL without type information, and therefore programs that want to link with it need some sort of metadata that tells them how the functions are exported and what parameter types they take and return.
The reason .DLLs don't have all the information you need to link with them directly is historic, and not a technical limitation.
For one thing, the linker inserts the versions of the libraries that exist at link time so that you have some chance of your program working if library versions are updated. Multiple versions of shared libraries can exist on a system.
The linker has the job of validating that all your undefined symbols are accounted for, either with static content or dynamic content.
By default, then, it insists on all your symbols being present.
However, that's just the default. See -z, and --allow-shlib-undefined, and friends.
Perhaps this dynamic linking is done via import libraries (the function has __declspec(dllimport) before its declaration).
If that is the case, then the compiler expects an __imp_symbol entry to exist, and that entry is responsible for forwarding the call to the right dynamically loaded library.
Those entries are generated when symbols declared with __declspec(dllimport) are linked.
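The forwarding idea can be imitated in plain C++ with a function pointer slot. This is only a toy model and not how any real linker or loader is implemented: every name in the toy namespace is invented, and the "library" function lives in the same file so that the sketch stays self-contained.

```cpp
namespace toy {

int resolve_count = 0;   // how many times the slow lookup path has run

// Stands in for the function that actually lives in the shared library.
int real_x() { return 42; }

int resolver_stub();     // forward declaration so the slot can point at it

// The slot every caller goes through, playing the role of an import
// table entry. It initially points at the resolver, not the real function.
int (*x_slot)() = resolver_stub;

// On the first call, "find" the real function, patch the slot, then forward.
int resolver_stub() {
    ++resolve_count;
    x_slot = real_x;
    return x_slot();
}

// What a call to x() in the program turns into: an indirect call via the slot.
int call_x() { return x_slot(); }

}  // namespace toy
```

After the first call to toy::call_x() the slot points straight at real_x, so later calls skip the resolver entirely.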
Here is a very SIMPLIFIED description that may help. Static linking puts all of the code needed to run your program into the executable so everything is found. Dynamic linking means some of the required code does not get put into the executable and will be found at runtime. Where do I find it? Is function x() there? How do I make a call to function x()? That is what the library tells the linker when you are dynamically linking.