What is the difference between .o, .a, and .so files? - c++

I know .o are object files, .a are static libraries and .so are dynamic libraries. What is their physical significance? When can I use each of them, and when not?

.a is an "archive". Although an archive can contain any type of file, in the context of the GNU toolchain, it is a library of object files (other toolchains especially on Windows use .lib for the same purpose, but the format of these is not typically a general purpose archive, and often specific to the toolchain). It is possible to extract individual object files from an archive which is essentially what the linker does when it uses the library.
.o is an object file. This is code that is compiled to machine code but not (typically) fully linked - it may have unresolved references to symbols defined in other object files (in a library or individually) generated by separate compilation. Object files contain meta-data to support linking with other modules, and optionally also for source-level symbolic debugging (in GDB for example). Other toolchains, again typically on Windows, use the extension .obj rather than .o.
.so is a shared object library (or just shared library). This is dynamically linked to an executable when a program is launched rather than statically linked at build time. It allows smaller executables, and a single object library instance to be used by multiple executables. Operating system APIs are typically shared libraries, and they are often used also in GNU for licensing reasons to separate LGPL code from closed-source proprietary code for example (I am not a lawyer - I am making no claims regarding the legitimacy of this approach in any particular situation). Unlike .o or .a files, .so files used by an application must be available on the runtime system. Other systems (again typically Windows) use .dll (dynamic link library) for the same purpose.
It is perhaps useful to understand that .o files are linked before object code in .a files such that if a symbol resolution is satisfied by a .o file, any library implementation will not be linked - allowing you to essentially replace library implementations with your own, and also for library implementations to call user-defined code - for example a GUI framework might call an application entry-point.
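The last point (library code calling into user-defined code) can be sketched in a single file. This is only an illustration under stated assumptions: in a real build the two halves would be separate translation units, and the names framework_run and app_entry are hypothetical, not part of any real framework.

```cpp
// Single-file sketch. In a real project the two parts would be compiled
// separately, for example with the GNU toolchain:
//   g++ -c framework.cpp && ar rcs libframework.a framework.o
//   g++ -c app.cpp && g++ app.o -L. -lframework -o app
#include <string>

// ---- "library" part (would live in framework.cpp, inside libframework.a) ----
// Declared here but defined by the application: inside framework.o this is
// an unresolved reference until the linker sees the application's .o file.
std::string app_entry();

// Hypothetical framework function that calls back into user code.
std::string framework_run() {
    return "framework started: " + app_entry();
}

// ---- "application" part (would live in app.cpp, compiled to app.o) ----
// The user-supplied definition that satisfies the library's reference.
std::string app_entry() {
    return "hello from the application";
}
```

Calling framework_run() returns the framework's prefix followed by whatever the application's app_entry() produced, which is the same shape as a GUI framework invoking an application entry point.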

Static libraries are archives that contain the object code for the library; when linked into an application, that code is copied into the executable.
Shared libraries are different in that they are not copied into the executable. Instead, the dynamic linker searches a set of directories for the libraries it needs, then loads them into memory. More than one executable can use the same shared library at the same time, which reduces memory usage and executable size. However, there are then more files to distribute with the executable, and you need to make sure that the library is installed on the user's system somewhere the dynamic linker can find it. Static linking eliminates this problem but results in a larger executable file.
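To see which shared libraries the dynamic linker actually mapped into a running process, something like the following can be used. It is a Linux-specific sketch: dl_iterate_phdr is an extension provided by glibc and musl (declared in <link.h>), not standard C++, and the helper name loaded_objects is made up for this example.

```cpp
// Linux-specific sketch: list the objects the dynamic linker has mapped
// into the current process.
#include <link.h>

#include <cstddef>
#include <string>
#include <vector>

namespace {
// Callback invoked once per loaded object; `data` points at our vector.
int collect_name(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    // The main executable is typically reported with an empty name.
    names->push_back(info->dlpi_name ? info->dlpi_name : "");
    return 0;  // a non-zero return value would stop the iteration
}
}  // namespace

// Hypothetical helper: returns the path of every object currently loaded.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_name, &names);
    return names;
}
```

Printing the returned names for a dynamically linked program should give roughly the same list that ldd reports for the executable, plus an entry for the executable itself.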

.so are shared library files.
.a are static library files.
You can statically link to .a libraries and dynamically link and load at runtime .so files, provided you compile and link that way.
.o are object files (they get compiled from *.c files and can be linked to create executables, .a or .so libraries).
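Loading a .so explicitly at run time, as opposed to letting the loader do it at start-up, might look like the sketch below. It assumes a glibc-based Linux system where the math library is available under the name libm.so.6; the function name cos_via_dlopen is hypothetical. dlopen, dlsym and dlclose are the POSIX calls from <dlfcn.h>.

```cpp
// Linux sketch: load a shared library by hand and call one function from it.
// Build with something like: g++ demo.cpp -ldl
// (the -ldl is only needed with glibc older than 2.34).
#include <dlfcn.h>

#include <cmath>  // NAN, used as the failure value

// Looks up `cos` in libm.so.6 and calls it; returns NaN if anything fails.
double cos_via_dlopen(double x) {
    // Assumption: the library name below is the glibc name of the math library.
    void* handle = dlopen("libm.so.6", RTLD_LAZY);
    if (!handle) {
        return NAN;  // library not found in the loader's search path
    }
    using cos_fn = double (*)(double);
    // dlsym returns void*; casting it to a function pointer is the usual
    // POSIX idiom.
    auto fn = reinterpret_cast<cos_fn>(dlsym(handle, "cos"));
    double result = fn ? fn(x) : NAN;
    dlclose(handle);
    return result;
}
```

Note that nothing about libm is recorded in the executable in this case: if the library is missing, the program still starts and only the lookup fails.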

Related

Linking and Loading of static library

My question is how exactly the linker works.
I am linking an executable with multiple third-party static libraries. Of those static libraries, only a few are used by the executable. In that case, does the linker link only to the libraries whose functions are referenced in the executable?
If a static library has multiple object files and only one is used by the executable, does the linker link only that object file? Or does it link the whole static library but load only the object file that is used?
For your first question if no symbols from a given library are used it will usually not be included in the final product.
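Regarding object files, the linker normally pulls in only those archive members (object files) that define a symbol that is actually referenced. Some linkers can go further and drop unreferenced functions inside an object file (for example GNU ld with --gc-sections when the code was compiled with -ffunction-sections), and your linker may also have flags that cause the entire library to be included (for example --whole-archive in GNU ld).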
... how exactly the linker works.
a) ... does linker links only to the libraries whose functions are referenced in the executable?
b) ... static library has multiple object files and only one is used by executable, does it only links to that object file ?
It depends ... on Linux there are two kinds of libraries ... ".so", and the .a (archive).
example:
/usr/lib/x86_64-linux-gnu/libgmpxx.a
/usr/lib/x86_64-linux-gnu/libgmpxx.so
If you specify the .a in the link portion of your build command, only the contained object files referenced by your app will be linked (not the whole library). This executable is 'stand-alone', and every copy running has its own copy of any functions it uses.
If you specify the .so in the link portion of your build command, and your app is the first to use a particular ".so" lib, I believe your app will be briefly suspended during its start-up while the WHOLE ".so" lib is loaded.
If you specify the .so in the link portion of your build command, and your app is not-the-first to use this particular .so, then the loader will add to your app a mapping to the already-loaded-'.so' in system memory. (a much faster connection)
Executables that use .so files rely on the system to load the .so libraries into memory, to memory-map each library into the app's address space, and to complete the links from the app to the required functions.
I believe your 'static library' corresponds to the use of ".a" (archive) library.
a) yes - the linker (sometimes linking-loader) 'finishes' when there are no more unresolved references (to objects or functions).
b) yes - see a)
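The way a linker picks members out of a .a can be modelled in a few lines. This is a deliberately simplified model, not how any real linker is implemented: symbols are plain strings, the ObjectFile type and select_members function are invented for the example, and a single archive is rescanned until no unresolved reference can be satisfied any more.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One object file: the symbols it defines and the symbols it needs.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> needs;
};

// Returns the names of the archive members that get pulled in, given the
// object files named directly on the command line (`inputs`).
std::vector<std::string> select_members(const std::vector<ObjectFile>& inputs,
                                        const std::vector<ObjectFile>& archive) {
    std::set<std::string> defined;
    std::set<std::string> undefined;

    // Record one object's definitions and its still-unresolved references.
    auto add = [&](const ObjectFile& obj) {
        for (const auto& sym : obj.defines) {
            defined.insert(sym);
            undefined.erase(sym);
        }
        for (const auto& sym : obj.needs) {
            if (!defined.count(sym)) undefined.insert(sym);
        }
    };
    for (const auto& obj : inputs) add(obj);  // .o files are always linked

    std::vector<std::string> pulled;
    std::vector<bool> used(archive.size(), false);
    bool progress = true;
    while (progress) {  // rescan until nothing new can be resolved
        progress = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (used[i]) continue;
            bool wanted = false;
            for (const auto& sym : archive[i].defines) {
                if (undefined.count(sym)) { wanted = true; break; }
            }
            if (wanted) {  // the whole member comes in, not just one symbol
                used[i] = true;
                add(archive[i]);
                pulled.push_back(archive[i].name);
                progress = true;
            }
        }
    }
    return pulled;
}
```

For instance, with main.o needing parse, and an archive holding parse.o (needs lex), lex.o and unused.o, the model selects parse.o and lex.o and leaves unused.o out, which matches answer a) above: linking stops once there are no more unresolved references.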

standalone static library (.a) with cmake

I need to provide an SDK with a static library. Let's call it "libsdk.a".
This library should hopefully be standalone, meaning a simple example "example.cpp" could link against it without any other library, except the system ones.
Here is my configuration:
CMake for all 10 of my dependency libraries. A static library (.a) is generated for each of my modules. These libraries contain only the object files (.o) of the given module. The dependency tree is not flat; some of them depend on others.
a simple example "example.cpp", with a CMake file, which compiles and works. At this level, CMake generates a very complex link command to deal with the deps tree.
external dependencies such as boost (some static libraries too)
At the moment, I tried this :
make an archive of the different .a files generated, but it doesn't work because linking against this lib tells me the archive has no index (even after ranlib). Yet I remembered I could add .a libraries inside .a files without problems.
extract all .o object (with ar -x) files from all *.a files and recreated a "libsdk.a" with all these object files. It doesn't work either (unresolved references). Moreover, it includes all objects even those which are not needed... My working example takes 3.7M. This library is around 35M.
make a .so shared library. It seems to work but I would prefer having a static library.
compile all statically but then linker complains it doesn't find -lgcc_s. Ok I want to compile in static but not that far, and just my own libs together !
So my final question : is there any way I can generate my static lib containing all my other libs, and not system ones ?
BTW, another interesting topic on that :
Combining static libraries
Thank you for any piece of advice to open my mind !
What you are trying to do by hand is the job of the linker. While it's feasible, you shouldn't bother with it.
When you compile libsdk.a, make sure that all of its dependencies are linked statically. If you do this, libsdk.a should be standalone. Static linking means copying the code to the right places in the final executable, so anything that is linked statically will not need to be provided in an external file.
See this post on the CMake mailing list. The libutils.cmake file attached to the message has a MERGE_STATIC_LIBS() macro that does the job. On Linux (and all other Unixes except OSX) it uses ar to pack and unpack object files.

Should I create .a or .so when packaging my code as a library?

I have a software library and I used to create .a files, so that people can install them and link against them: g++ foo.o -L/path/to -llibrary
But now I often encounter third-party libraries where only .so files are available (instead of .a), and you just link against them without the -l switch, e.g. g++ foo.o /path/to/liblibrary.so.
What are the differences between these solutions? Should I prefer creating .so files for the users of my library?
Typically, libfoo.a is a static library, and libfoo.so is a shared library. You can use the same -L/-l linker options against either a static or shared. Or you can name the full path to the lib with static or shared. Often libraries are built both static and shared to provide application developers the choice of which they want.
All the code needed from a static lib is part of the final executable. This obviously makes it bigger, but it also means it's self-contained. Once it is compiled, you can run your app without the lib.
Code from a shared lib is not part of the executable. There are just some hooks in place to make the executable aware of the name of the lib it needs. In order to run your app, the shared lib has to be present in the lib search path (e.g. $LD_LIBRARY_PATH).
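As a rough illustration of what "present in the lib search path" means, the sketch below expands a colon-separated list such as $LD_LIBRARY_PATH into the file names that would be tried for a given library. It is a simplification: the real loader also consults rpath/runpath entries, its cache and the default system directories, and the function names here are hypothetical.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Splits a colon-separated directory list such as the value of
// LD_LIBRARY_PATH. Empty entries are skipped here for simplicity.
std::vector<std::string> split_path_list(const std::string& list) {
    std::vector<std::string> dirs;
    std::istringstream in(list);
    std::string dir;
    while (std::getline(in, dir, ':')) {
        if (!dir.empty()) dirs.push_back(dir);
    }
    return dirs;
}

// Returns, in order, the file names that would be tried for `lib_name`
// in the directories of `list`.
std::vector<std::string> candidates(const std::string& lib_name,
                                    const std::string& list) {
    std::vector<std::string> out;
    for (const auto& dir : split_path_list(list)) {
        out.push_back(dir + "/" + lib_name);
    }
    return out;
}

// Convenience wrapper that reads the real environment variable.
std::vector<std::string> candidates_from_env(const std::string& lib_name) {
    const char* value = std::getenv("LD_LIBRARY_PATH");
    return candidates(lib_name, value ? value : "");
}
```

With the list "/opt/lib:/usr/local/lib" and the name libfoo.so.1, the candidates are /opt/lib/libfoo.so.1 followed by /usr/local/lib/libfoo.so.1; the first one that exists would win.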
If you have two apps that share the same code, they can each link against a shared lib to keep the binary size down. If you want to upgrade parts of the app without rebuilding the whole thing, shared libs are good for that too.
Good overview of static, shared dynamic and loadable libraries at
http://www.yolinux.com/TUTORIALS/LibraryArchives-StaticAndDynamic.html
Some features that aren't really called out from comments I've seen so far.
Static linkage (.a/.lib)
Sharing memory between these compilation units is generally ok because they should (and normally will) all be using the same runtime.
Static linkage means you avoid 'dll hell' but the cost is recompilation to make use of any change at all. static linkage into Shared libraries (.so) can lead to strange results if you have more than 1 such shared library used by the final executable - global variables may exist multiple times and which one is used and when they are initialised can cause an entirely different hell.
The library will be part of the shipped product but obfuscated and not directly usable.
Shared/Dynamic libraries (.so/.dll)
Sharing memory between these compilation units can be hazardous as they may choose to use different runtime. This can mean you provide different Shared/Dynamic libraries based on the debug/release or single/multi threaded or...
Shared libraries (.so) are less prone to 'dll hell' than Dynamic libraries (.dll) as they include options for quite specific versioning.
Compiling against a .so will capture version information internal to the file (hard to fake) so that you get quite specific .so usage. Compiling against the .lib/.dll only gives a basic file name; any versioning is managed by the developer (using naming or manually loading the library and checking version details by hand).
The library will have to ship with the final product (somebody else can pick it up and use it)
But now I often encounter third-party libraries where only .so files are available [...] and you just link against them without the -l switch, e.g. g++ foo.o /path/to/liblibrary.so.
JFYI, if you link to a shared library which does not have a SONAME set (check with readelf -a liblibrary.so), you will end up putting the specified path of liblibrary.so into your target object (executable or another shared library), which is usually undesired, because users have their own ideas of where to put a program and its associated files. The preferred way is to use -L/path/to -llibrary, perhaps together with -Wl,-rpath,/whatever/path/to if this is the final path (such pathing decisions are made by Linux distributions for example).
Should I prefer creating .so files for the users of my library?
If you distribute source code, the user will make the particular choice.

static library, but I still need headers?

I have a bunch of projects that all could share a "common" static library of classes.
What confuses me is if I make a static library out of these classes and link against it in my projects that I still need the headers of the classes in the static library in my main projects.
What is the benefit of the static library then?
How do companies like Adobe deal with this?
Static libraries allow you to create a library and use that library in many projects.
The need for header files:
Since the project using the library is programmed and compiled independent of the library, that program needs to know the declaration of the things you're using. Otherwise how would your compiler know you're writing valid code?
A compiler only takes source code as input and produces output. It does not deal with compiled object files or static libraries on input.
The need for linking in the library:
So having the headers allows you to write valid code in your project, but when it comes to link time you'll need to provide the definition which is contained inside the static library.
The linker takes all object files (compiled code) and also all static libraries and produces an executable or binary.
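The split between "headers for the compiler" and "library for the linker" can be shown in one file. In this sketch the three parts would normally live in a header, in your project, and inside the static library respectively; Rect, area and total_area are made-up names.

```cpp
// ---- what a header (e.g. a hypothetical common/geometry.h) provides ----
// Declarations only: enough for the compiler to type-check callers.
struct Rect {
    int width;
    int height;
};
int area(const Rect& r);

// ---- the project's own code, compiled against the header alone ----
int total_area(const Rect& a, const Rect& b) {
    // This compiles without the compiler ever seeing the body of area().
    return area(a) + area(b);
}

// ---- what the static library provides (would be inside libcommon.a) ----
// The definition the linker needs in order to produce the final executable.
int area(const Rect& r) {
    return r.width * r.height;
}
```

If the last part is missing at link time, compilation still succeeds and the failure shows up as an "undefined reference" error from the linker, which is exactly the division of labour described above.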
More info about static libraries (benefits, comparing dynamic, etc...):
Amongst other things, it is nice to separate your project into libraries so that you don't end up with 1 huge monolithic project.
You do not need to distribute the source code (typically in the .cpp files) this way.
If you were to simply include all the .cpp files in every project that used the common library then you would have to compile the .cpp files each time.
An advantage of static libraries over dynamic libraries is that you can always be sure that your programs will be self contained and that they are using the correct version of the library (since they are compiled into the executable itself). You will also have a slight speed advantage over dynamic linking.
Disadvantages of static libraries compared with dynamic libraries include that your file sizes will be bigger because each executable needs its own copy, and that you can't swap in a different version of the library since it's not dynamically loaded.
Re your question: How do companies deal with this:
A typical company will make use of both static and dynamic libraries extensively.
The typical way you make use of a static library is to have a target in your Makefile (or whatever build system you use) that installs the headers into an appropriate location at the same time that it installs the library.
So, your static library ends up in /usr/local/lib, and the headers go into /usr/local/include or wherever.
Also, when compared with linking against object files, linking against a static library may result in a smaller final executable. The reason for this is that if you don't call any of the functions from a particular object file (included in the static library), the linker will not include the code for those functions in your final executable. See Extraneous Library Linkage

Visual Studio: What exactly are lib files (used for)?

I'm learning C++ and came across those *.lib files that are obviously used by the linker. I had to set some additional dependencies for OpenGL.
What exactly are library files in this context used for?
What are their contents?
How are they generated?
Is there anything else worth knowing about them?
Or are they just nothing more than relocatable object code similar to *.obj files?
In simple terms, yes - .lib files are just a collection of .obj files.
There is a slight complication on Windows that you can have two classes of lib files.
Static lib files essentially contain a collection of .obj and are linked with your program to provide all the functions inside the .lib. They are mainly a convenience to save you having as many files to deal with.
There are also stub .lib files (import libraries), which contain no code of their own and only record which functions are provided by a .dll file.
The stub .lib file is used at link time to tell the linker where each function will come from, but the code itself is loaded at run time from the DLL.
.lib files are "libraries" and contain "collections" of compiled code so-to-speak. So it is a way to provide software components, without giving away the internal source-code for example. They can be generated as "output" of a "build" just like executables are.
The specific contents depend on your platform / development environment, but they will contain symbols for the linker to "hook up" function-calls provided by e.g. the header-file of the library.
Some libraries are "dynamic" (.DLL's on Windows), which means the "hooking" of function-calls is setup when the executable using the library is loaded, allowing the library implementation to be changed without rebuilding the executable.
One last thing. You say you're learning C++, and a common confusing point is, that "symbols" generated by C++ compilers are "mangled" (in order to allow e.g. function overloading), and this "mangling" is not standardized across different compilers, so libraries often resort to C for the "API" of the library (just like OpenGL), even though the library may be implemented in C++ internally.
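A small sketch of the difference between C++ and C linkage, assuming nothing beyond the language itself; describe and sdk_add are hypothetical names.

```cpp
#include <string>

// C++ overloads: both functions are called `describe`, so the compiler has
// to encode the parameter types into the symbol names. The exact encoding
// is compiler-specific.
std::string describe(int) { return "int"; }
std::string describe(double) { return "double"; }

// C linkage: the symbol is plain `sdk_add`. It cannot be overloaded, but it
// can be called from C or from code built with a different C++ compiler.
extern "C" int sdk_add(int a, int b) { return a + b; }
```

On GCC or Clang targeting the Itanium C++ ABI, running nm on the resulting object file should show something like _Z8describei and _Z8described for the two overloads and an unmangled sdk_add for the last function.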
I hope this sheds some light on .lib files. Happy OpenGL coding :-)
What exactly are library files in this context used for?
They contain compiled code, like your executable, but that code has not yet been linked into a runnable program. They're called static libraries, and other programs link against them at build time. In the case of OpenGL, you link to their libraries to build an executable that can run OpenGL code. Dynamic libraries (DLLs) are another type of library that executables link against, except at runtime.
What are their contents?
Static libs contain compiled object code that, unlike an exe, has not been linked yet. The *.obj files are the object code that the compiler generates for the linker.
How are they generated?
After the compiler creates the object files, a librarian tool (lib.exe in Visual Studio) packs them into the .lib. You can create them in your development environment, just like executables.
Is there anything else worth knowing about them?
Yeah, they're used everywhere, so it doesn't hurt to get used to them.