Benefits of splitting a project into an executable and libraries

I sometimes observe that big projects are split into dynamic libraries and an executable.
The libraries are ad hoc - they contain functionality that is only required by this executable. They also reside in the same repository and are built by the same build pipeline as the executable. From my point of view this approach creates additional trouble, since we need to deploy not only the executable but also the libraries. So the question is: why is it done this way? Why not just statically link everything and produce a single executable?

There are a few possible reasons:
If the project is sufficiently large, it may not be possible to link the code into a single executable on the x86_64 or i686 platforms (the default small memory model limits a single binary to 2 GiB of .text and .data).
Even if the binary links fine as a single static executable, it may be much faster to rebuild a shared library. If the ABI didn't change (e.g. a small fix to an internal implementation detail), then relinking the full executable is unnecessary if a shared library is used. This can greatly speed up the edit/build/test cycle.
This may also be solved by using a faster linker (e.g. Gold was significantly faster than BFD ld, and lld is faster still). But the project may have been split before Gold and lld became available, or it may use a platform to which faster linkers have not been ported.
Even when neither of the two reasons above applies, it may still be desirable to maintain API separation between a given library and its clients if the library is maintained by a different sub-team. The less of the implementation is exposed, the fewer chances there are to misuse the API or introduce unwanted dependencies on the current implementation, and shared libraries allow maintainers to hide much of the internals via symbol visibility.
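For the last point, here is a minimal sketch of how symbol visibility can enforce that API boundary, assuming GCC or Clang on an ELF platform; the file names (mylib_api.h, mylib_impl.cpp) and the functions process_input and internal_helper are hypothetical, purely for illustration:

    // ---- mylib_api.h : the only header shipped to client code ----
    #pragma once
    #include <string>

    // Re-export just the public entry point; everything else stays hidden
    // when the library is compiled with -fvisibility=hidden.
    #define MYLIB_API __attribute__((visibility("default")))

    MYLIB_API int process_input(const std::string& data);

    // ---- mylib_impl.cpp : compiled into libmylib.so, e.g. ----
    //   g++ -fPIC -fvisibility=hidden -shared mylib_impl.cpp -o libmylib.so
    #include "mylib_api.h"

    // Not marked MYLIB_API, so with -fvisibility=hidden it is not exported
    // from libmylib.so: clients cannot link against it, and no accidental
    // dependency on this implementation detail can form.
    int internal_helper(const std::string& data) {
        return static_cast<int>(data.size());
    }

    int process_input(const std::string& data) {
        return internal_helper(data);
    }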

Related

How to manage compilation of C++ header-only libraries across shared objects

I'm developing a large software package consisting of many packages which are compiled to shared objects. For performance reasons, I want to compile Eigen 3 (a header-only library) with vector instructions, but the templated methods are being compiled all over the place. How can I ensure that the Eigen functions are compiled into a specific object file?
This software consists of ~2000 individual packages. To keep development going at a reasonable pace, the recommended way of compiling the program is to sparsely check out some of the packages and compile them, after which the program can be executed using precompiled (by some CI system) shared libraries.
The problem is that part of my responsibility is to optimise the CPU time of the program. In order to do so, I wanted to compile the package I am working on (let's call it A.so) with the -march flag so Eigen can exploit modern SIMD processor extensions.
Unfortunately, because Eigen is a header-only library, the Eigen functions are compiled into many different shared objects. For example, one of the most CPU-intensive methods called in A.so is the matrix multiplication kernel, which is compiled in B.so. Many other Eigen functions are compiled into C.so, D.so, etc. Since these objects are compiled for older, more widely implemented instruction set extensions, they are not compiled with AVX, AVX2, etc.
Of course, one possible solution is to include packages B, C, D, etc. into my own sparse compilation but this negates the advantage of compiling only a part of the project. In addition, it leaves me including ever more and more packages if I really want to vectorise all linear algebra operations in the code of package A.
What I am looking for is a way to compile all the Eigen functions that package A uses into A.so, as if the Eigen functions were defined with the static keyword. Is this possible? Is there some mechanism in the compiler/linker that I can leverage to make this happen?
One obvious solution is to hide these symbols. This happens (if I understand the problem properly) because these functions are exported and can be used by other subsequently loaded libraries.
When you build your library and link it against the other libraries, the linker reuses whatever it can - including the symbols from the old packages. I hope you don't actually require those libraries for your own build?
So two options:
Force the loading of A before the other libraries (but if you need the other libraries, I don't think this is doable),
Tell the linker that these functions should not be visible to other libraries (i.e. make visibility=hidden the default).
I saw something similar happen with a badly compiled third-party library. It was built in debug mode and shipped in the product, and all of a sudden one of our libraries experienced a slowdown. The map files identified where the culprit debug function came from, as that library exported all its symbols by default.
An alternative way to change visibility without modifying the code is to filter symbols at the link stage using a version script (see https://sourceware.org/binutils/docs/ld/VERSION.html). You'll need something like:
{
  global: *;
  local:
    extern "C++"
    {
      Eigen::*;
      *Eigen::internal::*;
    };
};
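As a usage note (assuming GNU ld): the script is typically supplied when linking the shared object, e.g. by adding -Wl,--version-script=hide_eigen.map to the link line (the file name here is hypothetical). Symbols matched under local: should then no longer be exported from A.so, so A's own AVX-compiled Eigen instantiations can neither be interposed by, nor interpose, the copies living in B.so, C.so, etc.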

Linux: C/C++ standard library static vs dynamic linking [duplicate]

On probably any OS it is possible to link the C/C++ standard library statically or dynamically. On Windows I always prefer static builds, because it helps avoid the "DLL hell" problem of different versions of libraries being installed or not installed on a specific Windows version, edition, service pack, etc. Static linking makes software more portable and less dependent on what the end user has done to their operating system (I have even seen cases where an end user did SHIFT+DEL on some DLLs in system32 and couldn't explain why, or where users claimed my app contained a virus because it tried to download dynamically linked prerequisites from the official Microsoft website...). So, on Windows, static linking is usually better than dynamic linking in my experience.
However, I am new to Linux, so can anyone share their experience? My question is: which kind of linking (dynamic or static) is preferred on Linux, if we ignore the fact that dynamic linking saves memory and disk space, and if we plan to distribute the software with an automated installer? (Disk space and memory are cheap enough now that there is no reason to sacrifice hours of work creating a really good and portable installer just to save a few megabytes of RAM or disk space.) Are there any Linux-specific issues with dynamic/static linking?
On Linux you normally have a package manager that ensures you only have one version of each library installed. So there normally is no DLL hell and no problem with linking dynamically. Linking dynamically is the standard way on Linux.
I'd say the answer depends on how you distribute the software.
If you package the software for a specific Linux distribution and version dynamic linking is usually preferred. You know which libraries to find on the system and you can specify dependencies.
However, if you want to distribute the software as a Linux binary that runs on "any" system (such as various games or software like Matlab, for example), you will end up with the same dll (or .so) hell problem as on Windows. You don't know which versions of which libraries are on the system. Thus, you will have to provide your own .so files or link statically.
See, the whole point of using dynamic linking is to reduce the size of executables and memory usage. If you ignore that, there is not much left to talk about.
On the other hand, you mentioned saving memory and disk space. Saving disk space does matter: when you want to ship your app/program, you can't put a 2 GB application on the internet for download (for example, the OpenCV library is about 2.1 GB). The solution is to link it dynamically and load only those modules which are necessary to you. This also allows efficient sharing between running programs (just one copy of the module is created in memory and every program that needs it uses that same copy).
In particular:
For example, a media player application might originally be shipped with a codec that supports the mp3 file format. If the media player were statically linked it would not be possible to dynamically update it to support a different file format, without replacing the entire application. Dynamic linking means that a new version of the shared library containing a more up-to-date codec, which includes some enhancements and bug fixes, could be dynamically loaded by a dynamic linker into memory at run-time to replace the original shared library.
A shared library can also be shared by more than one application. For example, two different media players could both use the same shared library containing the same codec. This potentially means that the device running the application requires less physical memory, depending on the size of the dynamic linker.
Third, on Linux almost everything is dynamically linked, except for /bin/ash.static (which also has its dynamic version /bin/ash), but this shouldn't stop you from linking statically on Linux.
When using gcc, linking is dynamic by default. You should use the "-static" flag to link the libraries statically.
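A minimal sketch of the difference, assuming g++ on Linux with static versions of the system libraries installed (hello.cpp is a hypothetical file name):

    // hello.cpp - the same source built two ways:
    //
    //   g++ hello.cpp -o hello_dynamic          (default: dynamic linking)
    //   g++ -static hello.cpp -o hello_static   (libraries copied into the binary)
    //
    // "ldd hello_dynamic" lists the shared libraries needed at run time;
    // "ldd hello_static" typically reports that it is not a dynamic executable.
    #include <iostream>

    int main() {
        std::cout << "Hello, linking!\n";
        return 0;
    }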
@Vitaliy, good that you brought this up. The important thing to note here is that smart linking and the creation of shared (or dynamic) libraries are mutually exclusive; that is, if you turn on smart linking, then the creation of shared libraries is turned off.
Smart linking breaks the code into small code blocks, and their dependencies are loaded.
So if you are calling a dependency multiple times, it gets loaded multiple times.
This gives very good execution time but very high compilation time, especially for large units. So there is a certain trade-off.

Merge Mach-O executable with a static lib?

Suppose you have
a pre-built iOS executable app (for simulator or device).
a pre-built static archive library which, among other things, contains C++ static initializers.
Now it should be possible to merge the two built products to produce a new iOS executable which is like the old one, except that it is now also linked with the additional static library, and on execution will run the static library's static initializers.
Which tool (if any) could help solve this merge problem?
Edit: An acceptable solution is also to dynamically load the library using dlopen. The whole purpose of this is application testing, so the re-linked app will never see the App Store.
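A hedged sketch of the dlopen route mentioned in the edit, assuming the static archive has first been repackaged as a dynamic library the test app can load; the library name libextra.dylib and the symbol run_tests are made up for illustration:

    // Minimal dlopen/dlsym sketch. Loading the library with RTLD_NOW runs its
    // C++ static initializers as part of the load, which is the behaviour the
    // question is after. "libextra.dylib" and "run_tests" are hypothetical.
    #include <dlfcn.h>
    #include <cstdio>

    int main() {
        void* handle = dlopen("libextra.dylib", RTLD_NOW);
        if (!handle) {
            std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }

        // Optionally look up an entry point exported by the library.
        using entry_fn = void (*)();
        auto run_tests = reinterpret_cast<entry_fn>(dlsym(handle, "run_tests"));
        if (run_tests) {
            run_tests();
        }

        dlclose(handle);
        return 0;
    }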
How a compiler works (a simple explanation)
The most popular C++ compilers (like, say, GCC) work by translating all the C++ (and Obj-C, C, etc.) code to assembly.
Then they call the appropriate assembler for the target processor and create the object binaries.
Then they call the linker, which searches those binaries for the symbols that explain what links with what. A common optimisation that linkers can do is to strip from the final binary anything from the statically linked libraries that was not used; another common optimisation is to not even attempt to link unused libraries.
Finally, the linker removes the information that only it needed.
What this means in your case
You have a library, and the library has its linking symbols. You also have an executable whose linking symbols were stripped; in fact, depending on how it was optimised, the internal jumps might just be a couple of jmp instructions to arbitrary addresses in the code. No tool can do what you want automatically, because you don't have the needed information in the executable.
How to do it anyway
You need to disassemble the executable, figure out on your own where the function calls are, and then manually reassemble it with your library, changing those function calls to jump to addresses in your library instead.
This process is sometimes used by game modders to change the video drivers of old games (for example to update their OpenGL version, or to force Glide games to use some newer drivers, and so on).
So if you want to do that anyway (I warn you: it is absurdly crazy to do, though...), ask those guys :) I can't remember anyone to point you to right now, but they exist.
Analogy
When you are in the normal linking phase, the compiled object files are like source code that the machine understands, full of function calls as needed.
After it is compiled, all those function calls become gotos.
So if you were a linker tasked with doing what you want to do, imagine that you would be reading source code filled with gotos to random places in the code (sometimes even to inside loops), and that you have to somehow figure out which of those you want to change to jump to the new part you are trying to paste in.

C++ application - should I use static or dynamic linking for the libraries?

I am going to start a new C++ project that will rely on a series of libraries, including parts of the Boost libraries, log4cxx or the Google logging library - and, as the project evolves, other ones as well (which I cannot yet anticipate).
It will have to run on both 32 and 64 bit systems, most probably in a quite diverse Linux environment where I do not expect to have all the required libraries available nor su privileges.
My question is, should I build my application by dynamically or statically linking to all these libraries?
Notes:
(1) I am aware that static linking might be a pain during development (longer compile times, cross-compiling for both 32 and 64 bit, going down dependency chains to include all libraries, etc.), but it's a lot easier during testing - just move the file and run.
(2) On the other hand, dynamic linking seems easier during the development phase - short compile times (I don't really know how to handle dynamic linking to 64-bit libraries from my 32-bit dev environment), no hassle with dependency chains. Deployment of new versions, on the other hand, can be ugly - especially when new libraries are required (see the condition above of not having su rights on the targeted machines, nor these libraries available).
(3) I've read the related questions regarding this topic but couldn't really figure out which approach would best fit my scenario.
Conclusions:
Thank you all for your input!
I will probably go with static linking because:
Easier deployment
Predictable performance and more consistent results during perf. testing (look at this paper: http://www.inf.usi.ch/faculty/hauswirth/publications/CU-CS-1042-08.pdf)
As pointed out, the size and duration of compilation of static vs. dynamic does not seem to be such a huge difference
Easier and faster test cycles
I can keep all the dev. cycle on my dev. machine
Static linking has a bad rap. We have huge hard drives these days, and extraordinarily fat pipes. Many of the old arguments in favor of dynamic linking are way less important now.
Plus, there is one really good reason to prefer static linking on Linux: The plethora of platform configurations out there make it almost impossible to guarantee your executable will work across even a small fraction of them without static linking.
I suspect this will not be a popular opinion. Fine. But I have 11 years of experience deploying applications on Linux, and until something like LSB really takes off and really extends its reach, Linux will continue to be much more difficult to deploy applications on. Until then, statically link your application if you have to run across a wide range of platforms.
I would probably use dynamic linking during (most of) development, and then change over to static linking for the final phases of development and (all of) deployment. Fortunately, there's little need for extra testing when switching from dynamic to static linkage of the libraries.
This is another vote for static linking. I haven't noticed significantly longer linking times for our application. The app in question is a ~50K line console app, with multiple libraries, that is compiled for a bunch of out-of-the-ordinary machines, mostly supercomputers with 100-10,000 cores. With static linking, you know exactly what libraries you are going to be using and can easily test out new versions of them.
In general, this is the way that most Mac apps are built. It is what allows installation to be simply copying a directory onto the system.
Best is to leave that up to the packager and provide both options in the configure/make scripts. Usually dynamic linking would be preferred, since it then becomes easy to upgrade the libraries when necessary, i.e. when security vulnerabilities, etc. are discovered.
Note that if you do not have root privileges to install the libraries in the system directories, you can compile the program such that it will first look elsewhere for any needed dynamic libraries; this is accomplished by setting the runpath in the ELF binary. You can specify such a directory with the -rpath option of the linker ld.
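A minimal sketch of how that looks in practice, assuming g++ and GNU ld on Linux; libfoo, the libs/ directory and the function foo() are made-up names for illustration:

    // main.cpp, linked against a shared library shipped next to the binary:
    //
    //   g++ main.cpp -L./libs -lfoo -Wl,-rpath,'$ORIGIN/libs' -o app
    //
    // $ORIGIN expands at run time to the directory containing the executable,
    // so the dynamic loader finds libfoo.so in ./libs without root privileges
    // or changes to the system library path. "readelf -d app" shows the
    // embedded RUNPATH entry.
    #include <iostream>

    int foo();   // hypothetical function exported by libfoo.so

    int main() {
        std::cout << "foo() returned " << foo() << "\n";
        return 0;
    }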

Where should I be using a static library in C++

What are the use cases for using static libraries in C++? I have seen that people create DLLs instead, or some that use static libraries only. What's your recommendation?
I'm a big fan of static libraries pretty much everywhere. The one big thing that DLLs get you that static libs cannot do is the ability to dynamically load and unload library functionality. So if your application is going to support some sort of hot swapping plugins, you need to use dynamic libs. Otherwise you can probably use static libs.
Static libs open the door to a lot of optimizations that you can't do with dynamic libs, because they are performed at link time. In the Microsoft world, Link Time Code Generation (LTCG) gives you the ability to do whole-program optimization and dead-code stripping through not only your application but also your libraries (in gcc this is called Link Time Optimization [LTO]).
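A hedged sketch of that with gcc's LTO, assuming GCC on Linux; the file and function names are made up, and gcc-ar is used instead of plain ar so the LTO information in the archive is preserved:

    // ---- mathlib.cpp (built into the static library) ----
    //   g++ -O2 -flto -c mathlib.cpp
    //   gcc-ar rcs libmathlib.a mathlib.o
    int square(int x) { return x * x; }
    int unused_helper(int x) { return x + 42; }   // never called anywhere

    // ---- main.cpp (the application) ----
    //   g++ -O2 -flto main.cpp -L. -lmathlib -o app
    extern int square(int);

    int main() {
        // With -flto on both sides, the link step can inline square() here and
        // drop unused_helper() entirely - optimizations that cannot reach
        // across a DLL/shared-library boundary.
        return square(6);
    }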
Additionally static libs tend to make your program easier to distribute because you aren't forced to pass around a lot of library files, and you can completely avoid DLL-hell if you ever were to version your library.
You should use shared libraries (DLLs) if you have significant functionality that needs to be shared between applications, AND this functionality may be improved independently of the applications, with updates shipped separately.
The 'AND' part is the hardest to fulfill: usually you ship your application with any new functionality added and never update the library without updating the application at the same time (I am not saying that never happens), but usually the two ship in lockstep.
Otherwise it is easier to just build normal libs and ship the application.
An example of a good one (I use the term loosely, for example purposes) is DirectX. When a new version of DirectX is shipped (and the interface has not changed), you just need to update the DLL and all applications that use DirectX get the benefit of the new version of the library. In reality it is not quite that simple, but you get the idea.
In general, although there are always exceptions to the rule, I would say:
Advantages of DLLs
Less physical memory usage when running multiple instances of an application. (Copy on write optimisation of memory usage.)
Faster link times.
Smaller executables.
Better modularity.
Advantages of static libraries
Less virtual memory usage (and probably less physical memory usage) when running a single instance of an application.
Performance. Approximately 10% (more or less) improvement over DLLs, depending on your application.
Reliability. You tested your application against a specific version (or specific versions) of a library. An upgrade to a DLL could potentially break your application.
There is the advantage of not having to recompile your entire program if you make a change to a dynamically linked library. @Chris makes a good point about dll-hell, but if it's a minor bug fix that doesn't affect the API, this can save you the recompilation.
There is an SO post that talks about Windows not being able to apply updates to your program if you statically link its libraries (link to come). Although I think you are more talking about statically linking your own modules.
Use static version of your libraries where you can. Use dynamic libraries where you need to (license, availability or plugin system).
I use static libraries to implement UML's "package" concept. All modules belonging to a package get put into their own subdirectory, and I create an IDE subproject or makefile for that directory which builds a static library *.a file. Modern IDEs make it possible to work with your top-level package along with sub-packages within the same "workspace".
If a package (or a group of packages) can be deployed separately from the main executable, then I compile it into a shared library (*.so or *.dll) instead and consider it a "component" in UML jargon.
Well, a static library would also be useful for holding huge libraries and for what I like to call multi-OS code, so the result can be built to run on Linux, Windows, and so on.