Linking and Loading of static library - c++

My question is how exactly the linker works.
I am linking an executable with multiple third-party static libraries, only a few of which are actually used by the executable. In that case, does the linker link only the libraries whose functions are referenced in the executable?
If a static library has multiple object files and only one is used by the executable, does the linker link only that object file? Or does it link the whole static library but load only the object file that is used?

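For your first question: if no symbols from a given library are used, it will usually not be included in the final product.
Regarding object files: by default the linker pulls in each object file (archive member) that defines a referenced symbol, and it pulls that member in whole. Some toolchains can go further and drop unreferenced functions inside a member (for example GCC's -ffunction-sections combined with GNU ld's --gc-sections), and your linker may also have flags that do the opposite and cause the entire library to be included (for example GNU ld's --whole-archive).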

... how exactly the linker works.
a) ... does linker links only to the libraries whose functions are referenced
in the executable?
b) ... static library has multiple object files and only one is used by executable, does it only links to that object file ?
It depends ... on Linux there are two kinds of libraries ... ".so", and the .a (archive).
example:
/usr/lib/x86_64-linux-gnu/libgmpxx.a
/usr/lib/x86_64-linux-gnu/libgmpxx.so
If you specify the .a in the link portion of your build command, only the contained object files referenced by your app will be linked (not the whole library). This executable is 'stand-alone', and every copy running has its own copy of any functions it uses.
If you specify the .so in the link portion of your build command, and your app is the first to use a particular ".so" lib, I believe your app will be briefly suspended during its start-up while the WHOLE ".so" lib is loaded.
If you specify the .so in the link portion of your build command, and your app is not-the-first to use this particular .so, then the loader will add to your app a mapping to the already-loaded-'.so' in system memory. (a much faster connection)
Executables using .so's rely on the system to have loaded the .so libraries into memory, to memory-map the library into the app's memory, and to complete the links from the app to the required functions.
I believe your 'static library' corresponds to the use of ".a" (archive) library.
a) yes - the linker (sometimes linking-loader) 'finishes' when there are no more unresolved references (to objects or functions).
b) yes - see a)

Related

How does a linker produce a library? What are the contents of that library?

Referring to this answer: https://stackoverflow.com/a/6264256/5324086,
I found that a linker has even more functionality than just managing absolute addresses for object file symbols.
What does the library produced by linker contain? Is it something other than ... say a C Standard library?
Why does the linker even need to produce a library?
The exact details depend on the type of library (you can search for shared library formats) but the basic components will include the compiled code, plus a symbol table that tells the linker which address corresponds to each name. Note that this is very similar to an object file. Static libraries are basically archives of object files, and the linker links them in a similar way. With dynamic libraries, the OS can look this up whenever it loads a program, and link the symbols then. They won't generally have the same absolute addresses in every program's address space, so these addresses will be relative to where the OS loads the library.
The C standard library (MSVC runtime on Windows) is an example of a library.
Static libraries are just a collection of object files. You can think of them as a tar file containing all the relevant .o files (or, on Windows, as a zip file containing obj files). The linking part of the linker is not involved here (in fact, static libraries on Unix systems are traditionally created with the ar utility, which is somewhat related to tar). They are completely resolved at build time, and they are simply used as a way to avoid constantly rebuilding code that takes long to build or has complex build procedures.
Dynamic libraries are a different beast. They are fully fledged executables that can be loaded by other processes, so the regular linker is needed for the same reasons it is used in normal executables. Instead of providing just a single entrypoint, they export a full symbols table that is used by the loader (or "runtime linker") to allow the host program to locate the required procedures. Generally they also contain relocation information to allow loading at any address in the target address space (or they are compiled in position independent code for this same reason).
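To make the "addresses relative to where the OS loads the library" point concrete, here is a minimal sketch in C++. It is only a model, not how any real loader is implemented: SharedLibraryImage, resolve, and the page-alignment check are names and assumptions invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of a shared library's exported symbol table: each name
// maps to an offset from the start of the library image, not to an absolute
// address.
struct SharedLibraryImage {
    std::map<std::string, std::uint64_t> exports;
};

// What a loader conceptually does when it resolves a name for one process:
// add the symbol's offset to the base address where this process mapped the
// library.
std::uint64_t resolve(const SharedLibraryImage& lib, std::uint64_t load_base,
                      const std::string& symbol) {
    // Assumption for this sketch: images are mapped at 4 KiB page boundaries.
    assert(load_base % 4096 == 0);
    auto it = lib.exports.find(symbol);
    if (it == lib.exports.end())
        throw std::runtime_error("undefined symbol: " + symbol);
    return load_base + it->second;
}
```

With an export at offset 0x1200, a process that mapped the library at 0x7f0000000000 would see the symbol at 0x7f0000001200, while another process with a different base gets a different absolute address for the same offset.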

What is the difference between .o, .a, and .so files?

I know .o are object files, .a are static libraries and .so are dynamic libraries? What is their physical significance? When can I use some and when not?
.a is an "archive". Although an archive can contain any type of file, in the context of the GNU toolchain, it is a library of object files (other toolchains especially on Windows use .lib for the same purpose, but the format of these is not typically a general purpose archive, and often specific to the toolchain). It is possible to extract individual object files from an archive which is essentially what the linker does when it uses the library.
.o is an object file. This is code that is compiled to machine code but not (typically) fully linked - it may have unresolved references to symbols defined in other object files (in a library or individually) generated by separate compilation. Object files contain meta-data to support linking with other modules, and optionally also for source-level symbolic debugging (in GDB for example). Other toolchains, again typically on Windows, use the extension .obj rather than .o.
.so is a shared object library (or just shared library). This is dynamically linked to an executable when a program is launched rather than statically linked at build time. It allows smaller executables, and a single object library instance to be used by multiple executables. Operating system APIs are typically shared libraries, and they are often used also in GNU for licensing reasons to separate LGPL code from closed-source proprietary code for example (I am not a lawyer - I am making no claims regarding the legitimacy of this approach in any particular situation). Unlike .o or .a files, .so files used by an application must be available on the runtime system. Other systems (again typically Windows) use .dll (dynamic link library) for the same purpose.
It is perhaps useful to understand that .o files are linked before object code in .a files such that if a symbol resolution is satisfied by a .o file, any library implementation will not be linked - allowing you to essentially replace library implementations with your own, and also for library implementations to call user-defined code - for example a GUI framework might call an application entry-point.
Static libraries are archives that contain the object code for the library. When they are linked into an application, that code is copied into the executable.
Shared libraries are different in that they aren't copied into the executable. Instead the dynamic linker searches some directories looking for the library(s) it needs, then loads them into memory. More than one executable can use the same shared library at the same time, thus reducing memory usage and executable size. However, there are then more files to distribute with the executable. You need to make sure that the library is installed on the user's system somewhere the linker can find it. Static linking eliminates this problem but results in a larger executable file.
.so are shared library files.
.a are static library files.
You can statically link to .a libraries and dynamically link and load at runtime .so files, provided you compile and link that way.
.o are object files (they get compiled from *.c files and can be linked to create executables, or .a or .so libraries).

How static libraries work? (C/C++)

I know how to use and create them, but I can't find a text about how they are implemented, how a function call happens, and so on. Can someone help me with that information? I want to understand them, not just know what they are and how to use them.
As you may know, when you compile a source file you get an object file. Depending on your platform its extension may be .o or .obj or anything else. A static library is basically a collection of object files, kind of like a .zip file but probably not compressed. The linker, when trying to generate an executable tries to resolve the referenced symbols, i.e. locate in which object file (be it in a library or otherwise) they are defined and links them together. So, a static library may also contain an index of defined symbols in order to facilitate this. Exact implementation depends on the specific linker and library file format but the basic architecture is as mentioned.
You may want to check the italicized keywords in Wikipedia or something for more info on them.
I think wikipedia explains it well:
In computer science, a static library or statically-linked library is
a set of routines, external functions and variables which are resolved
in a caller at compile-time and copied into a target application by a
compiler, linker, or binder, producing an object file and a
stand-alone executable. This executable and the process of compiling
it are both known as a static build of the program. Historically,
libraries could only be static. Static libraries are either merged
with other static libraries and object files during building/linking
to form a single executable, or they may be loaded at run-time into
the address space of the loaded executable at a static memory offset
determined at compile-time/link-time.
A static library is purely a collection of .o files, put together in an archive that's something like a zip file (with no compression). When you use it for linking, the linker will search the library for .o files that provide any of the missing symbols in the main program, and pull in those .o files for linking, as if they had been included on the command line like .o files in your main program. This process is applied recursively, so if any of the .o files pulled in from the library have unresolved symbols, the library is searched again for other .o files that provide the definitions.
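The recursive search described above can be sketched as a small simulation. This is a toy model under stated assumptions, not any real linker's algorithm: ObjectFile and select_members are hypothetical names, symbols are plain strings, and the archive is rescanned until nothing new is pulled in (some real linkers scan each archive only once, in command-line order).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one object file: the symbols it defines and the
// symbols it references but does not define.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> undefined;
};

// Returns the names of the archive members that get pulled in for the given
// program objects, in the order they are pulled.
std::vector<std::string> select_members(const std::vector<ObjectFile>& program,
                                        const std::vector<ObjectFile>& archive) {
    std::set<std::string> defined, undefined;
    // Merge one object file into the running symbol state.
    auto absorb = [&](const ObjectFile& obj) {
        for (const auto& s : obj.defined) { defined.insert(s); undefined.erase(s); }
        for (const auto& s : obj.undefined)
            if (!defined.count(s)) undefined.insert(s);
    };
    for (const auto& obj : program) absorb(obj);

    std::vector<bool> pulled(archive.size(), false);
    std::vector<std::string> result;
    bool progress = true;
    while (progress) {  // rescan until no member resolves anything new
        progress = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (pulled[i]) continue;
            bool needed = false;
            for (const auto& s : archive[i].defined)
                if (undefined.count(s)) { needed = true; break; }
            if (!needed) continue;
            absorb(archive[i]);  // the whole member is linked, not one symbol
            pulled[i] = true;
            result.push_back(archive[i].name);
            progress = true;
        }
    }
    assert(result.size() <= archive.size());
    return result;
}
```

For a program object that references foo, and an archive holding unused.o (defines baz), bar.o (defines bar), and foo.o (defines foo, references bar), the function returns foo.o followed by bar.o; unused.o is never linked.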

Should I create .a or .so when packaging my code as a library?

I have a software library and I used to create .a files, so that people can install them and link against them: g++ foo.o -L/path/to -llibrary
But now I often encounter third-party libraries where only .so files are available (instead of .a), and you just link against them without the -l switch, e.g. g++ foo.o /path/to/liblibrary.so.
What are the differences between these solutions? Should I prefer creating .so files for the users of my library?
Typically, libfoo.a is a static library, and libfoo.so is a shared library. You can use the same -L/-l linker options against either a static or a shared library. Or you can give the full path to the library file, whether static or shared. Often libraries are built both static and shared to give application developers the choice of which they want.
All the code needed from a static lib is part of the final executable. This obviously makes it bigger, but it also means it's self-contained. Once it is compiled, you can run your app without the lib.
Code from a shared lib is not part of the executable. There are just some hooks in place to make the executable aware of the name of the lib it needs. In order to run your app, the shared lib has to be present in the lib search path (e.g. $LD_LIBRARY_PATH).
If you have two apps that share the same code, they can each link against a shared lib to keep the binary size down. If you want to upgrade parts of the app without rebuilding the whole thing, shared libs are good for that too.
Good overview of static, shared dynamic and loadable libraries at
http://www.yolinux.com/TUTORIALS/LibraryArchives-StaticAndDynamic.html
Some features that aren't really called out from comments I've seen so far.
Static linkage (.a/.lib)
Sharing memory between these compilation units is generally ok because they should(?will) all be using the same runtime.
Static linkage means you avoid 'dll hell', but the cost is recompilation to make use of any change at all. Static linkage into shared libraries (.so) can lead to strange results if the final executable uses more than one such shared library: global variables may exist multiple times, and which copy is used and when each is initialised can cause an entirely different hell.
The library will be part of the shipped product but obfuscated and not directly usable.
Shared/Dynamic libraries (.so/.dll)
Sharing memory between these compilation units can be hazardous as they may choose to use different runtime. This can mean you provide different Shared/Dynamic libraries based on the debug/release or single/multi threaded or...
Shared libraries (.so) are less prone to 'dll hell' than dynamic libraries (.dll) as they include options for quite specific versioning.
Compiling against a .so will capture version information internal to the file (hard to fake) so that you get quite specific .so usage. Compiling against the .lib/.dll only gives a basic file name; any versioning is managed by the developer (using naming or manually loading the library and checking version details by hand).
The library will have to ship with the final product (somebody else can pick it up and use it)
But now I often encounter third-party libraries where only .so files are available [...] and you just link against them without the -l switch, e.g. g++ foo.o /path/to/liblibrary.so.
JFYI, if you link to a shared library which does not have a SONAME set (compare with readelf -a liblibrary.so), you will end up putting the specified path of liblibrary.so into your target object (executable or another shared library), which is usually undesired, because users have their own ideas of where to put a program and its associated files. The preferred way is to use -L/path/to -llibrary, perhaps together with -Wl,-rpath,/whatever/path/to if this is the final path (such pathing decisions are made by Linux distributions for example).
Should I prefer creating .so files for the users of my library?
If you distribute source code, the user will make the particular choice.

What is inside .lib file of Static library, Statically linked dynamic library and dynamically linked dynamic library?

What is inside of a .lib file of Static library, Statically linked dynamic library and dynamically linked dynamic library?
How come there is no need for a .lib file with a dynamically linked dynamic library? Also, in static linking, is the .lib file nothing but a .obj file with all the methods? Is that correct?
For a static library, the .lib file contains all the code and data for the library. The linker then identifies the bits it needs and puts them in the final executable.
For a dynamic library, the .lib file contains a list of the exported functions and data elements from the library, and information about which DLL they came from. When the linker builds the final executable then if any of the functions or data elements from the library are used then the linker adds a reference to the DLL (causing it to be automatically loaded by Windows), and adds entries to the executable's import table so that a call to the function is redirected into that DLL.
You don't need a .lib file to use a dynamic library, but without one you cannot treat functions from the DLL as normal functions in your code. Instead you must manually call LoadLibrary to load the DLL (and FreeLibrary when you're done), and GetProcAddress to obtain the address of the function or data item in the DLL. You must then cast the returned address to an appropriate pointer-to-function in order to use it.
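The "look up by name, then cast" step can be sketched in portable C++. This is deliberately a stand-in and does not call the Windows API: lookup_export plays the role that GetProcAddress plays in the text, and add_numbers, RawProc, AddFn, and call_add are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for code that would live inside the DLL.
extern "C" int add_numbers(int a, int b) { return a + b; }

// Generic function-pointer type: the lookup cannot know the real signature.
using RawProc = void (*)();

// Hypothetical stand-in for GetProcAddress: maps an exported name to an
// untyped function address, or nullptr if the name is not exported.
RawProc lookup_export(const std::string& name) {
    static const std::map<std::string, RawProc> exports = {
        {"add_numbers", reinterpret_cast<RawProc>(&add_numbers)},
    };
    auto it = exports.find(name);
    return it == exports.end() ? nullptr : it->second;
}

// The caller must know the real signature; nothing checks it.
using AddFn = int (*)(int, int);

// Looks the function up by name, casts the result to the expected
// pointer-to-function type, and calls it. Returns fallback if the export is
// missing.
int call_add(const std::string& export_name, int a, int b, int fallback) {
    RawProc raw = lookup_export(export_name);
    if (raw == nullptr) return fallback;
    AddFn fn = reinterpret_cast<AddFn>(raw);
    assert(fn != nullptr);
    return fn(a, b);
}
```

Here call_add("add_numbers", 2, 3, -1) returns 5, and a name that is not exported yields the fallback. In real code the cast is the risky part: if the pointer type does not match the function's actual signature, the compiler cannot warn you.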
I found the following answer from Hans also useful here. It clears the air: there can be two types of lib files.
A LIB file is used to build your program, it only exists on your build
machine and you don't ship it. There are two kinds. A static link
library is a bag of .obj files, collected into a single file. The
linker picks any chunks of code from the file when it needs to resolve
an external identifier.
But more relevant to DLLs, a LIB file can also be an import library.
It is then a simple small file that includes the name of the DLL and a
list of all the functions exported by the DLL. You'll need to provide
it to the linker when you build a program that uses the DLL so it
knows that an external identifier is actually a function exported by
the DLL. The linker uses the import library to add entries to the
import table for the EXE. Which is then in turn used by Windows at
runtime to figure out what DLLs need to be loaded to run the program.
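As a rough sketch of how import libraries turn into import-table entries, the model below records, for each unresolved name, which DLL exports it. It is an illustration only, not the actual PE import table layout: ImportLibrary, ImportTable, and build_import_table are hypothetical names, and "first library on the list wins" is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an import library: the DLL's name plus the names the
// DLL exports.
struct ImportLibrary {
    std::string dll_name;
    std::set<std::string> exports;
};

// Maps each DLL that is actually needed to the functions imported from it.
using ImportTable = std::map<std::string, std::set<std::string>>;

ImportTable build_import_table(const std::set<std::string>& unresolved,
                               const std::vector<ImportLibrary>& import_libs) {
    ImportTable table;
    for (const auto& symbol : unresolved) {
        for (const auto& lib : import_libs) {
            assert(!lib.dll_name.empty());
            if (lib.exports.count(symbol)) {
                table[lib.dll_name].insert(symbol);
                break;  // assumption: the first library exporting the name wins
            }
        }
    }
    return table;
}
```

Given import libraries for graphics.dll (exporting draw_line and draw_circle) and audio.dll (exporting play_sound), and a program whose only matching unresolved name is draw_line, the resulting table has a single entry for graphics.dll containing draw_line. audio.dll does not appear at all, so it would not be loaded for this program.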
In a static library, the lib file contains the actual object code for the functions provided by the library. In the shared version (what you referred to as statically linked dynamic library), there is just enough code to establish the dynamic linkage at runtime.
I'm not sure about "dynamically linked dynamic libraries" (loaded programmatically). Do you even link with a .lib in that case?
Edit:
A bit late in coming, but no, you don't link a .lib. Well, you link to the lib with LoadLibraryEx in it. But for the actual library you're using, you provide your own bindings via C function pointers, which you fill in (using GetProcAddress) once LoadLibrary has loaded the DLL.
Here's a summary:
Linking   ǁ Static         | DLL                  | LoadLibrary
==========ǁ================|======================|===================
API code  ǁ In your com-   | In the DLL           | In the DLL
lives     ǁ piled program  |                      |
----------ǁ----------------|----------------------|-------------------
Function  ǁ Direct, may    | Indirect via table   | Indirect via your
calls     ǁ be elided      | filled automatically | own function ptrs
----------ǁ----------------|----------------------|-------------------
Burden    ǁ Compiler       | Compiler/OS          | You/OS
A lib file is read by the linker and a dll file is used during execution. A lib file is essentially useless during execution and a linker is incapable of reading a dll file (except possibly in a manner irrelevant here).
The differences between the use of lib files for static and dynamic linking might be confusing but if you understand a little history then it becomes very clear.
Originally there were only static libraries. For a static library, the .lib file contains obj files. Each obj file is the output of one and only one compiler source code input file. A lib file is just a collection of related obj files, much like putting obj files in a directory. That is essentially what a lib file is, a library of obj files. For a static link, all of the obj files that an executable uses are combined into one file. Compare that to a dynamic link in which the executable is in a file separate from the other code it uses.
To implement dynamic linking, Microsoft modified the use of lib files such that they refer to a dll file instead of locations in an obj file. Other than that, all the information that is in a library for a static link is the same as for a dynamic link. They are all the same as far as the information in them except that a lib file for a dynamic link specifies the dll file.
DLLs contain the same kinds of "things" as an exe (any kind of data, imports, exports, and read/write/executable sections); the difference is that an exe file exports only the entry point (function), whereas DLLs export one or many functions.