How to manage compilation of C++ header-only libraries across shared objects

I'm developing a large software project consisting of many packages which are compiled to shared objects. For performance reasons, I want to compile Eigen 3 (a header-only library) with vector instructions, but the templated methods end up being compiled all over the place. How can I ensure that the Eigen functions are compiled into a specific object file?
This software consists of ~2000 individual packages. To keep development going at a reasonable pace, the recommended way of compiling the program is to sparsely check out some of the packages and compile them, after which the program can be executed using precompiled (by some CI system) shared libraries.
The problem is that part of my responsibility is to optimise the CPU time of the program. In order to do so, I wanted to compile the package I am working on (let's call it A.so) with the -march flag so Eigen can exploit modern SIMD processor extensions.
Unfortunately, because Eigen is a header-only library, the Eigen functions are compiled into many different shared objects. For example, one of the most CPU-intensive methods called in A.so is the matrix multiplication kernel, which is compiled into B.so. Many other Eigen functions are compiled into C.so, D.so, etc. Since these objects are compiled for older, more widely implemented instruction set extensions, they are not compiled with AVX, AVX2, etc.
Of course, one possible solution is to include packages B, C, D, etc. in my own sparse compilation, but this negates the advantage of compiling only a part of the project. In addition, it leaves me including ever more packages if I really want to vectorise all the linear algebra operations in the code of package A.
What I am looking for is a way to compile all the Eigen functions that package A uses into A.so, as if the Eigen functions were defined with the static keyword. Is this possible? Is there some mechanism in the compiler/linker that I can leverage to make this happen?

One obvious solution is to hide these symbols. The problem happens (if I understand it properly) because these functions are exported and can be used by other, subsequently loaded libraries.
When you build your library and link it against the other libraries, the linker reuses whatever it can, including what is in the old packages. I hope you don't require these libraries for your own build?
So two options:
Force the loading of A before the other libraries (but if you need the other libraries, I don't think this is doable),
Tell the linker that these functions should not be visible to other libraries (make hidden visibility the default, e.g. with -fvisibility=hidden).
I saw something similar happen with a badly compiled third-party library. It was built in debug mode, shipped in the product, and all of a sudden one of our libraries experienced a slowdown. The map files identified where the culprit debug function came from, as that library exported all its symbols by default.
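For the second option, a minimal sketch of what that could look like with GCC or Clang (the file name, build flags, and the multiply function below are made up for illustration). With everything hidden by default, the Eigen instantiations that A.so needs stay inside A.so and cannot be preempted by the copies exported from B.so, C.so, etc.:

// a.cpp, built into A.so with something like:
//   g++ -O2 -march=native -fvisibility=hidden -fvisibility-inlines-hidden \
//       -fPIC -shared a.cpp -o A.so
#include <Eigen/Dense>

// Only the package's own API is explicitly re-exported.
__attribute__((visibility("default")))
Eigen::MatrixXd multiply(const Eigen::MatrixXd& a, const Eigen::MatrixXd& b) {
    // The matrix multiplication kernel is instantiated here, inside A.so,
    // with whatever -march setting the package was compiled with.
    return a * b;
}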

An alternative way to change visibility without modifying the code is to filter symbols during the linking stage using a version script (https://sourceware.org/binutils/docs/ld/VERSION.html). You'll need something like:
{
  global: *;
  local:
    extern "C++"
    {
      Eigen::*;
      *Eigen::internal::*;
    };
};
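The script can then be supplied when linking A.so, for example via GCC/Clang with -Wl,--version-script=<script file>. Symbols matching the local: patterns are no longer exported from A.so, so references to them bind to the copies compiled into A.so itself.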

Related

How to distribute C++20 modules?

All the literature about modules is quite recent, and I am struggling with one core concept.
When I make my own modules, after the linking step, is there a conventional or accepted way to package those modules so they can be distributed as a library?
Broadly speaking, the products of building a module's interface (as distinct from the linker-products of compilation, like a static/shared library) are not sharable between compilers. At least not the way that compiled libraries for the same OS/platform are. Compiled module formats are compiler-specific and may not even be stable between versions of the same compiler.
As such, if you want to ship a pre-compiled library that was built using modules, then, just like with non-module builds, you will need to ship the textual files that are used to consume that module. Specifically, you need all of the interface units for any modules built into that library. Implementation units need not be given, as their products are all in the compiled form of the library (unless they are implementation partitions included by interface units).
Perhaps in the future, compilers for the same platform will standardize a compiled module format, or even across platforms. But until then, you're going to have to keep shipping text with your pre-compiled libraries.
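As a rough sketch of what that shipping split looks like (the module name and file names are invented for illustration): the interface unit is the textual file you distribute, while the implementation unit's output is already baked into the compiled library:

// mathlib.cppm, a module interface unit: shipped as text alongside the library.
export module mathlib;

export namespace mathlib {
    // Declaration only; consumers import the module and link against the library.
    int add(int a, int b);
}

// mathlib.cpp, an implementation unit: not shipped, since its object code is
// already inside the static/shared library you distribute.
module mathlib;

namespace mathlib {
    int add(int a, int b) { return a + b; }
}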

Benefits of splitting project into executable and libraries

I sometimes observe that big projects are split into dynamic libraries and an executable.
The libraries are ad hoc: they contain functionality that is only required by this executable. They also reside in the same repository and are built by the same build pipeline as the executable. From my point of view this approach creates additional trouble, since we need to deploy not only the executable but also the libraries. So the question is: why is it done this way? Why not just statically link everything and produce a single executable?
There are a few possible reasons:
If the project is sufficiently large, it may not be possible to link the code into a single executable on the x86_64 or i686 platform (the default small code model limits a single binary to 2GiB of .text and .data),
Even if the binary links fine as a single static executable, it may be much faster to rebuild a shared library. If the ABI didn't change (e.g. a small fix to an internal implementation detail), then relinking the full executable is unnecessary when a shared library is used. This can greatly speed up the edit/build/test cycle.
This may also be solved by using a faster linker (e.g. Gold was significantly faster than BFD ld, and lld is faster still). But the project may have been split before Gold and lld became available, or it may use a platform to which faster linkers have not been ported.
Even when neither of the two reasons above applies, it may still be desirable to maintain API separation between a given library and its clients if the library is maintained by a different sub-team. The less of the implementation is exposed, the fewer chances there are to misuse the API or introduce unwanted dependencies on the current implementation, and shared libraries allow maintainers to hide much of the internals via symbol visibility.

Runtime dependency and build dependency concepts

I have been hearing about build dependencies and runtime dependencies. They are quite self-explanatory terms. As far as I understand, a build dependency is for components required at compile time. For example, if A has a build dependency on B, A cannot be built without B. A runtime dependency, on the other hand, is dynamic. If A has a runtime dependency on B, A can be built without B but cannot run without B.
This information, however, is too shallow. I'd like to read about and understand these concepts better. I have been googling but could not find a good source; can you please provide me with a link or the right keywords to search for?
I'll try to keep it simple and theoretical only.
When you write code that calls a function "func", the compiler needs the function's declaration (e.g. "int func(char c);", usually available in .h files) to verify the correctness of the arguments, and the linker needs the function's implementation (where your actual code resides).
Operating systems provide a mechanism to separate function implementations into different compiled modules. It is usually required for:
Better code reuse (multiple applications can use the same code, with different data context)
More efficient compilation (you don't need to recompile all dependency libraries)
Partial upgrades
Distribution of compiled libraries, without disclosing the source code
To support such functionality, the compiler is provided with the function declarations (.h files) as usual, while the linker is provided with lib files containing function stubs. The operating system is responsible for loading the actual implementation file during the application loading procedure (if it is not yet loaded for a different application) and for mapping the actual functions into the memory of the new application.
Dynamic loading functionality extends to object-oriented languages as well (C++, C#, Java, etc.).
Practical implementations are OS-dependent: dynamic linking is implemented as DLL files on Windows and as SO files on Linux.
Special OS-dependent techniques can be used to share context (variables, objects) between different applications that use the same dynamic library.
Meir Tseitlin
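To make the build/runtime distinction concrete, a minimal sketch (the function, header, and library names are invented): the compiler needs the declaration, the linker needs the stub/import library, and the loader needs the actual shared library at run time:

// func.h, the declaration the *compiler* needs (build dependency).
int func(char c);

// main.cpp, linked against the stub/import library for the *linker* (build
// dependency); the real shared library implementing func must be present when
// the program starts (runtime dependency).
#include "func.h"

int main() {
    return func('x');   // loading fails at run time if the shared library is missing
}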

Merge Mach-O executable with a static lib?

Suppose you have
a pre-built iOS executable app (for simulator or device).
a pre-built static archive library which among other things contains C++ static initializers.
Now it should be possible to merge the two built products to produce a new iOS executable which is like the old one, except that it is now also linked with the additional static library, and on execution will run the static library's static initializers.
Which tool (if any) could help solve this merge problem?
Edit: An acceptable solution is also to dynamically load the library using dlopen. The whole purpose of this is application testing, so the re-linked app will never see the App Store.
How a compiler works (a simple explanation)
The most popular C++ compilers (like, say, GCC) work by translating all the C++ (and Obj-C, C, etc.) code to assembly.
The compiler then calls the appropriate assembler for the target processor and creates the object binaries.
Then it calls the linker, which searches those binaries for the symbols that describe what links with what. A common optimisation linkers can perform is to strip from the final binary anything in the statically linked libraries that was not used; another common optimisation is to not even attempt to link unused libraries.
Finally, the linker removes the things that only it needed.
What this means in your case
You have a library, and the library has its linking symbols. You also have an executable, which has had its linking symbols stripped; in fact, depending on how it was optimised, the internal jumps might be just a couple of jmp instructions to arbitrary addresses in the code. No machine can do what you want automatically, because you don't have the needed information in the executable.
How to do it anyway
You need to disassemble the executable, figure out on your own where the function calls are, and then manually reassemble it with your library, changing those function calls to jump to addresses in your library instead.
This process is sometimes used by game modders to change the video drivers of old games (for example to update their OpenGL version, or to force Glide games to use some newer drivers, and so on).
So if you want to do that anyway (I warn you: it is absurdly crazy to do...), ask those guys :) I can't remember anyone to point you to right now, but they exist.
Analogy
When you are in the normal linking phase, the compiled object files are like source code that the machine understands, full of function calls as needed.
After it is compiled, every function call becomes a goto.
So if you were a linker tasked with doing what you want to do, imagine reading source code filled with gotos to random places in the code (sometimes even into loops), and having to somehow figure out which of those you want to change to jump to the new part you are trying to paste in.
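For the dlopen route mentioned in the question's edit, a minimal sketch (the library path and the "run_tests" entry point are assumptions for illustration, and the static archive would need to be rebuilt as a dynamic library first); simply loading the library is what triggers its C++ static initializers:

#include <dlfcn.h>
#include <cstdio>

int main() {
    // Loading the library runs its C++ static initializers.
    void* handle = dlopen("libextra.dylib", RTLD_NOW);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Optionally call an entry point exported with C linkage.
    if (void* sym = dlsym(handle, "run_tests")) {
        reinterpret_cast<void (*)()>(sym)();
    }

    dlclose(handle);
    return 0;
}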

Where should I be using a static library in C++

What are the use cases for static libraries in C++? I have seen that some people create DLLs instead and some use static libraries only. What's your recommendation?
I'm a big fan of static libraries pretty much everywhere. The one big thing that DLLs get you that static libs cannot do is the ability to dynamically load and unload library functionality. So if your application is going to support some sort of hot swapping plugins, you need to use dynamic libs. Otherwise you can probably use static libs.
Static libs open the door to a lot of optimizations that you can't do with dynamic libs because they are performed at link time. In the Microsoft world, Link Time Code Generation (LTCG) gives you the ability to do whole-program optimization and dead-code stripping through not only your application but also your libraries (in GCC this is called Link Time Optimization [LTO]).
Additionally static libs tend to make your program easier to distribute because you aren't forced to pass around a lot of library files, and you can completely avoid DLL-hell if you ever were to version your library.
You should use shared libraries (DLLs) if you have significant functionality that needs to be shared between applications, AND this functionality may be improved independently of all the applications, with updates shipped separately.
The 'AND' part is the hardest to fulfill: usually you ship your application with any new functionality added and never update the library without updating the application at the same time (I am not saying that never happens) but usually the two ship in lockstep.
Otherwise it is easier to just build normal libs and ship the application.
An example of a good one (I use the term loosely, for example purposes) is DirectX. When a new version of DirectX is shipped (and the interface has not changed), you just need to update the DLL and all applications that use DirectX get the benefit of the new version of the library. In reality it is not quite that simple, but you get the idea.
In general, although there are always exceptions to the rule, I would say:
Advantages of DLLs
Less physical memory usage when running multiple instances of an application. (Copy on write optimisation of memory usage.)
Faster link times.
Smaller executables.
Better modularity.
Advantages of static libraries
Less virtual memory usage (and probably less physical memory usage) when running a single instance of an application.
Performance. Approximately 10% (more or less) improvement over DLLs, depending on your application.
Reliability. You tested your application against a specific version (or specific versions) of a library. An upgrade to a DLL could potentially break your application.
There is the advantage of not having to recompile your entire program if you make a change to a dynamically linked library. @Chris makes a good point about DLL hell, but if it's a minor bug fix that doesn't affect the API, this can save you the recompilation.
There is an SO post that talks about Windows not being able to apply updates to your program if you statically link their libraries (link to come). Although I think you are talking more about statically linking your own modules.
Use static version of your libraries where you can. Use dynamic libraries where you need to (license, availability or plugin system).
I use static libraries to implement UML's "package" concept. All modules belonging to a package get put into their own subdirectory, and I create an IDE subproject or makefile for that directory which builds a static library *.a file. Modern IDEs make it possible to work with your top-level package along with sub-packages within the same "workspace".
If a package (or a group of packages) can be deployed separately from the main executable, then I compile it into a shared library (*.so or *.dll) instead and consider it a "component" in UML jargon.
Well, a static library would be for holding huge libraries and also for what I like to call multi-OS code, so that it can be built to run on Linux, Windows, ...