How do static libraries work? (C/C++)

I know how to create and use them, but I can't find any text about how they are implemented, how a function call into one happens, and so on. Can someone point me to that information? I want to understand how they work, not just know what they are and how to use them.

As you may know, when you compile a source file you get an object file. Depending on your platform its extension may be .o, .obj, or something else. A static library is basically a collection of object files, kind of like a .zip file but usually not compressed. When generating an executable, the linker tries to resolve the referenced symbols, i.e. it locates the object file (in a library or otherwise) in which each symbol is defined, and links those files together. A static library may therefore also contain an index of defined symbols to facilitate this. The exact implementation depends on the specific linker and library file format, but the basic architecture is as described.
You may want to look up the key terms above (object file, static library, linker, symbol) on Wikipedia or elsewhere for more information.

I think Wikipedia explains it well:
In computer science, a static library or statically-linked library is
a set of routines, external functions and variables which are resolved
in a caller at compile-time and copied into a target application by a
compiler, linker, or binder, producing an object file and a
stand-alone executable. This executable and the process of compiling
it are both known as a static build of the program. Historically,
libraries could only be static. Static libraries are either merged
with other static libraries and object files during building/linking
to form a single executable, or they may be loaded at run-time into
the address space of the loaded executable at a static memory offset
determined at compile-time/link-time.

A static library is purely a collection of .o files, put together in an archive that's something like a zip file (with no compression). When you use it for linking, the linker will search the library for .o files that provide any of the missing symbols in the main program, and pull in those .o files for linking, as if they had been included on the command line like .o files in your main program. This process is applied recursively, so if any of the .o files pulled in from the library have unresolved symbols, the library is searched again for other .o files that provide the definitions.
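The search described above can be sketched as a toy model. This is not how any real linker is implemented; the ObjectFile struct, select_members and demo are hypothetical names made up for illustration, and the model assumes a single archive that is rescanned until no further member is needed.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model of an object file: the symbols it defines and the ones it needs.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> undefined;
};

// Returns the archive members that would be pulled in, in selection order.
std::vector<std::string> select_members(const std::vector<ObjectFile>& inputs,
                                        const std::vector<ObjectFile>& archive) {
    std::set<std::string> defined, needed;
    // Merge one object file into the link: record its definitions and
    // remember every reference that is still unresolved.
    auto absorb = [&](const ObjectFile& obj) {
        for (const auto& s : obj.defined) { defined.insert(s); needed.erase(s); }
        for (const auto& s : obj.undefined)
            if (!defined.count(s)) needed.insert(s);
    };
    for (const auto& obj : inputs) absorb(obj);  // .o files on the command line

    std::vector<std::string> pulled;
    std::vector<bool> used(archive.size(), false);
    bool progress = true;
    while (progress) {  // rescan until a full pass adds nothing new
        progress = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (used[i]) continue;
            bool provides = false;
            for (const auto& s : archive[i].defined)
                if (needed.count(s)) { provides = true; break; }
            if (!provides) continue;  // member defines nothing we are missing
            used[i] = true;
            pulled.push_back(archive[i].name);
            absorb(archive[i]);       // may introduce new unresolved symbols
            progress = true;
        }
    }
    return pulled;
}

// Usage sketch: main.o needs parse; parse.o in turn needs log_msg.
void demo() {
    std::vector<ObjectFile> inputs = {{"main.o", {"main"}, {"parse"}}};
    std::vector<ObjectFile> archive = {
        {"log.o", {"log_msg"}, {}},
        {"parse.o", {"parse"}, {"log_msg"}},
        {"unused.o", {"unused_fn"}, {}},
    };
    std::vector<std::string> pulled = select_members(inputs, archive);
    // parse.o is selected first, log.o on the rescan, unused.o never.
    assert((pulled == std::vector<std::string>{"parse.o", "log.o"}));
}
```

In this model unused.o never reaches the output, which matches the behaviour described above: only members that satisfy a missing symbol are pulled in.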

Related

Linking and Loading of static library

My question is how exactly the linker works.
I am linking an executable with multiple third-party static libraries. Only a few of those static libraries are actually used by the executable. In that case, does the linker link only the libraries whose functions are referenced in the executable?
If a static library has multiple object files and only one is used by the executable, does the linker link only that object file? Or does it link the whole static library but load only the object file that is used?
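For your first question: if no symbols from a given library are used, it will usually not be included in the final product.
Regarding object files, the linker normally pulls in only those object files from the library that define a symbol that is actually referenced. By default each selected object file is included whole, though your linker may have flags that change this behavior: for example, GNU ld's --gc-sections (together with the compiler's -ffunction-sections) can discard unreferenced sections, and --whole-archive causes the entire library to be included.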
... how exactly the linker works.
a) ... does the linker link only the libraries whose functions are referenced in the executable?
b) ... a static library has multiple object files and only one is used by the executable, does it link only that object file?
It depends. On Linux there are two kinds of libraries: shared objects (".so") and archives (".a").
example:
/usr/lib/x86_64-linux-gnu/libgmpxx.a
/usr/lib/x86_64-linux-gnu/libgmpxx.so
If you specify the .a in the link portion of your build command, only the contained object files referenced by your app will be linked (not the whole library). This executable is 'stand-alone', and every copy running has its own copy of any functions it uses.
If you specify the .so in the link portion of your build command, and your app is the first to use a particular ".so" lib, I believe your app's start-up will be briefly delayed while the dynamic loader maps the ".so" lib into memory (its pages are typically read from disk on demand rather than all at once).
If you specify the .so in the link portion of your build command, and your app is not the first to use this particular .so, then the loader will map the already-loaded code of the '.so' into your app's address space (a much faster connection).
Executables using .so's rely on the system to load the .so libraries into memory, to memory-map each library into the app's address space, and to complete the links from the app to the required functions.
I believe your 'static library' corresponds to the use of ".a" (archive) library.
a) yes - the linker (sometimes linking-loader) 'finishes' when there are no more unresolved references (to objects or functions).
b) yes - see a)

How does a linker produce a library? What are the contents of that library?

Referring to this answer: https://stackoverflow.com/a/6264256/5324086,
I found that a linker has even more functionality than just managing absolute addresses for object file symbols.
What does the library produced by linker contain? Is it something other than ... say a C Standard library?
Why does the linker even need to produce a library?
The exact details depend on the type of library (you can search for shared library formats), but the basic components will include the compiled code, plus a symbol table that tells the linker which address corresponds to each name. Note that this is very similar to an object file. Static libraries are basically archives of object files, and the linker links them in much the same way as individual object files. With dynamic libraries, the OS can look the symbols up whenever it loads a program, and link them then. They won't generally have the same absolute addresses in every program's address space, so these addresses will be relative to where the OS loads the library.
The C standard library (MSVC runtime on Windows) is an example of a library.
Static libraries are just a collection of object files. You can think of them as a tar file containing all the relevant .o files (or, on Windows, as a zip file containing .obj files). The linking part of the linker is not involved in creating them (in fact, static libraries on Unix systems are traditionally built with the ar utility, which is somewhat similar to tar). They are completely resolved at build time, and they are simply used as a way to avoid repeatedly rebuilding code that takes a long time to build or has a complex build procedure.
Dynamic libraries are a different beast. They are fully fledged executables that can be loaded by other processes, so the regular linker is needed for the same reasons it is used for normal executables. Instead of providing just a single entry point, they export a full symbol table that is used by the loader (or "runtime linker") to allow the host program to locate the required procedures. Generally they also contain relocation information to allow loading at any address in the target address space (or they are compiled as position-independent code for this same reason).
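The name-based lookup that the loader performs against an exported symbol table can also be triggered by hand. The following is a minimal sketch, assuming a POSIX system where the program is dynamically linked against the C library (older glibc versions may additionally need -ldl at link time). The helper name call_atoi_by_name is made up for illustration; dlopen, dlsym and dlclose are the standard POSIX calls.

```cpp
#include <cassert>
#include <dlfcn.h>  // POSIX: dlopen, dlsym, dlclose

// Finds the C library function atoi by its exported name at run time and
// calls it. Returns -1 if the lookup fails (an arbitrary choice for this
// sketch, so it cannot be told apart from a parsed -1).
int call_atoi_by_name(const char* text) {
    assert(text != nullptr);

    // A null file name yields a handle for the running program itself;
    // lookups through it also search the shared libraries it has loaded.
    void* handle = dlopen(nullptr, RTLD_LAZY);
    if (!handle) return -1;

    // dlsym consults the exported symbol tables and returns the address
    // that the name currently maps to in this process.
    using atoi_fn = int (*)(const char*);
    auto fn = reinterpret_cast<atoi_fn>(dlsym(handle, "atoi"));

    int result = fn ? fn(text) : -1;
    dlclose(handle);
    return result;
}
```

Calling call_atoi_by_name("42") goes through the same kind of symbol lookup the runtime linker does when it binds an ordinary call to a shared library, only here the lookup is explicit.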

What is the difference between .o, .a, and .so files?

I know .o files are object files, .a files are static libraries and .so files are dynamic libraries. What is their physical significance? When can I use each of them, and when not?
.a is an "archive". Although an archive can contain any type of file, in the context of the GNU toolchain, it is a library of object files (other toolchains especially on Windows use .lib for the same purpose, but the format of these is not typically a general purpose archive, and often specific to the toolchain). It is possible to extract individual object files from an archive which is essentially what the linker does when it uses the library.
.o is an object file. This is code that is compiled to machine code but not (typically) fully linked - it may have unresolved references to symbols defined in other object files (in a library or individually) generated by separate compilation. Object files contain meta-data to support linking with other modules, and optionally also for source-level symbolic debugging (in GDB for example). Other toolchains, again typically on Windows, use the extension .obj rather than .o.
.so is a shared object library (or just shared library). This is dynamically linked to an executable when a program is launched rather than statically linked at build time. It allows smaller executables, and a single object library instance to be used by multiple executables. Operating system APIs are typically shared libraries, and they are often used also in GNU for licensing reasons to separate LGPL code from closed-source proprietary code for example (I am not a lawyer - I am making no claims regarding the legitimacy of this approach in any particular situation). Unlike .o or .a files, .so files used by an application must be available on the runtime system. Other systems (again typically Windows) use .dll (dynamic link library) for the same purpose.
It is perhaps useful to understand that .o files are linked before object code in .a files such that if a symbol resolution is satisfied by a .o file, any library implementation will not be linked - allowing you to essentially replace library implementations with your own, and also for library implementations to call user-defined code - for example a GUI framework might call an application entry-point.
Static libraries are archives that contain the object code for the library. When one is linked into an application, the code that is needed is copied into the executable.
Shared libraries are different in that they aren't copied into the executable. Instead the dynamic linker searches some directories looking for the libraries it needs, then loads them into memory. More than one executable can use the same shared library at the same time, thus reducing memory usage and executable size. However, there are then more files to distribute with the executable, and you need to make sure that the library is installed on the user's system somewhere the linker can find it. Static linking eliminates this problem but results in a larger executable file.
.so are shared library files.
.a are static library files.
You can statically link to .a libraries and dynamically link and load at runtime .so files, provided you compile and link that way.
.o are object files (they get compiled from *.c files and can be linked to create executables, .a or .so libraries).

C/C++. Advantages of libraries over combined object files

While it is commonplace to combine multiple object files in a library, it is possible (at least in Linux) to combine multiple object files into another object file.
(See combine two GCC compiled .o object files into a third .o file)
There are downsides to using libraries instead of just combined object files:
1: It's easier to work with only one type of file (object) when linking, especially if all files do the same thing.
2: When linking (At least in GCC), libraries (by default) need to be ordered and can't handle cyclic dependencies.
I want to know what advantages there are to libraries (apart from the catch 22 that they're used lots).
After searching for a while, the only explanation I get seems to be that single libraries are better than multiple object files.
While it depends on the linker being used, object files are included in the final binary in their entirety. So, if you combine several object files into one object file, then the whole combined object file is included in the resulting binary.
In contrast, a library is just that, a library of object files. The linker will only pull from the library the object files it needs to resolve all the symbol references. If an object file in the library is not needed, then it is not included in the binary.
Generally a library is better, since the linker can leave out unused .o files in the library.
Combining the files has some advantages too:
If you combine all your source files into one, that can significantly increase compilation speed.
Sometimes in C++ the linker may leave out a .o file that you did not want left out, for example when you need the side effects of the constructors of objects defined there but not referenced from any other translation unit.
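The constructor side-effect case usually comes from a self-registration pattern like the one sketched here. This is a single-file illustration, so nothing is actually dropped; all names (plugin_registry, Registrar, png_registrar, is_registered) and the file name in the comment are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Registry shared by the whole program. A function-local static avoids
// depending on the initialization order of globals in different files.
std::vector<std::string>& plugin_registry() {
    static std::vector<std::string> names;
    return names;
}

// Constructing a Registrar has a side effect: it records a plugin name.
struct Registrar {
    explicit Registrar(const std::string& name) {
        assert(!name.empty());
        plugin_registry().push_back(name);
    }
};

// Imagine this definition living alone in png_plugin.cpp (hypothetical).
// Nothing else in the program refers to png_registrar by name, so the
// object file holding it defines no symbol that anyone is missing.
static Registrar png_registrar("png");

// Reports whether a plugin with the given name registered itself.
bool is_registered(const std::string& name) {
    for (const auto& n : plugin_registry())
        if (n == name) return true;
    return false;
}
```

In this single translation unit is_registered("png") is true. If png_registrar lived in its own object file inside a .a, no undefined symbol would cause the linker to pull that member in, so the constructor would never run; listing the .o file directly on the link line, or using a linker option such as GNU ld's --whole-archive, keeps it.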
If you use object files, then all the code in the object file is placed in your application. If you use libraries, then only the required code is.
One reason is that objects in a .a library will only be pulled in to satisfy undefined symbol references - so if you want to allow the possibility for the calling application to define a symbol, or use a default definition in the "library" code, it's possible to do this with a real library but not with a single .o file.

Can I have the gcc linker create a static library?

I have a library consisting of some 300 c++ files.
The program that consumes the library does not want to dynamically link to it. (For various reasons, but the best one is that some of the supported platforms do not support dynamic linking)
When I use g++ and ar to create a static library (.a), the resulting file contains all symbols of all those files, including ones that the library doesn't want to export.
I suspect linking the consuming program with this library takes an unnecessarily long time, as all the .o files inside the .a still need to have their references resolved, and the linker has more symbols to process.
When creating a dynamic library (.dylib / .so) you can actually use a linker, which can resolve all intra-lib symbols, and export only those that the library wants to export. The result however can only be "linked" into the consuming program at runtime.
I would like to somehow get the benefits of dynamic linking, but use a static library.
If my google searches are correct in thinking this is indeed not possible, I would love to understand why this is not possible, as it seems like something that many c and c++ programs could benefit from.
Static libraries are just archives (hence ".a"), a collection of .o files. Like a tar archive, just even more plain. Since ar is not a linker, there is no conglomeration (as "ld -r" would do) and thus no intralibrary symbol elimination.
That's why shared libraries were invented in the first place, and they are pretty common now, so people just ignore the drawbacks of static libraries. They simply go by "it compiles? ship it.".
I haven't tried or tested this, but it looks like ld's ability to perform incremental or partial linking might be what you're looking for. Check if the --relocatable option (you might also need to look at the -Ur option if dealing with C++) when applied to the object files that would go into the library will do what you want.
I think you should then be able to use the output of that operation as an object file (or have it in a static library itself) for your program's final link step.