Creating a Minimal Shared Library - C++

For background, I'm creating some C++ software that uses dynamically loaded shared library plugins for hardware output (the specifics of it aren't relevant here).
I'm building the executable by compiling everything into object files and then linking the ones needed, which is simple using an exclusion list. I can then build the shared library by specifying its primary object file (the one that's dynamically loaded and accessed at runtime) along with every other object file referenced by the primary one.
My question is this: Is there a way to provide the linker with the primary object file, and create a shared library containing only the objects it depends upon? All of the object files are in the same directory, I'm not using a Makefile (yet; if one could solve the problem, it's a valid answer), and compilation speed isn't an issue.
I've looked into the linker options --as-needed, --gc-sections, and --no-undefined, but I haven't been able to piece together a working build process.
Example: For source files main.cpp, a.cpp, b.cpp, a.h, and b.h, where main.cpp and a.cpp both include b.h:
gcc -fPIC -c *.cpp -I. builds object files main.o, a.o, and b.o.
gcc -o main.out *.o builds the final executable main.out from the object files... including a.o, which is unused. (--gc-sections should fix this.)
gcc -fPIC -shared -o a.so a.o -Wl,--as-needed !(a).o builds the final shared library a.so from all of the object files... including main.o, which is unused. How do I prevent main.o from being included in a.so?

Is there a way to provide the linker with the primary object file, and create a shared library containing only the objects it depends upon?
Yes: package all objects into an archive library liball.a, then link like this:
gcc -shared -o a.so a.o liball.a
The linker will then pull out from liball.a all objects that a.o depends on, and only these objects, as explained here.
Note: liball.a may contain a.o, there is no harm (as above link explains).
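For the example files above, a minimal sketch of the whole sequence (assuming the objects were already compiled with -fPIC, as in the question):

ar rcs liball.a *.o
gcc -shared -o a.so a.o liball.a

The linker pulls b.o out of liball.a because a.o references symbols from it, and leaves main.o in the archive untouched.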
Update:
Is there a way to do it without needing to create an archive first?
I don't know of any portable way to do that. The Gold linker has --start-lib and --end-lib command line flags that achieve exactly that.
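For instance, a sketch using the example files from the question (assumes the Gold linker is available; lld accepts the same flags):

gcc -fuse-ld=gold -shared -o a.so a.o -Wl,--start-lib b.o main.o -Wl,--end-lib

Objects between --start-lib and --end-lib are treated as if they were archive members: only those needed to resolve outstanding references are pulled in, so main.o stays out of a.so.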

Related

C++ Creating a dynamic library using source, static archives and other dynamic libraries

In my use case, I have YAML-CPP, SQLite3, and my 'data.cpp' file, all of which I want combined into the same dynamic library, 'libdata.so'.
I first compiled yaml-cpp (as an archive):
mkdir -p "build"
cd "build"
cmake ..
make -j5
to get 'libyaml-cpp.a'.
I then compile sqlite3:
gcc -c -o libsqlite3.a sqlite3.c -lpthread -ldl
to get 'libsqlite3.a'. I know that this is a C-based file, and there are differences between it and C++, but I've read that it shouldn't make too much difference here. I also know that I'm using -lpthread -ldl, which is for dynamic loading, but I'm not sure how to get around it.
My question is: Can I compile my 'data.cpp' file with YAML-CPP and SQLite3 such that they all exist in the same 'libdata.so' output file (where the linker will use the YAML-CPP and SQLite3 functions contained in 'libdata.so' when they're called by 'data.cpp')?
I have tried:
g++ -c -fPIC -o libdata.so \
-Wl,--whole-archive libsqlite3.a \
-Wl,--whole-archive libyaml-cpp.a \
-ldl -lpthread \
data.cpp
(for the sake of the snippet, all files reside in the same directory)
UPDATE
I added the suggestion from Botje to the command line and it helped in part. After more research, I found a few more pieces that progressed things further:
gcc -DSQLITE_OMIT_LOAD_EXTENSION -c -fPIC -lpthread -o libsqlite3.a sqlite3.c
mkdir -p "build"
cd "build"
env CFLAGS='-fPIC' CXXFLAGS='-fPIC' cmake ..
make -j$(CORES)
cd ..
cp "build/libyaml-cpp.a" ./
g++ -shared -fPIC -o libdata.so \
-L./ \
-Wl,-Bdynamic data.cpp \
-Wl,-Bstatic -lsqlite3 -lyaml-cpp \
-Wl,-Bdynamic -lpthread
g++ -L./ -ldata -o tester tester.cpp
The library now compiles; however, when I try to link against it with 'tester.cpp', I get the error:
/usr/bin/ld: libdata.so: undefined reference to YAML::detail...
I'm guessing this may be a flag ordering issue, but I'm not sure what order it should be then. Placing the flags for SQLite3 and YAML-CPP before the data.cpp argument fails to compile the shared library.
After some more research, here's the method that worked for me (with extra verbosity):
# Compile SQLite3:
# - Disable the plugin loader (removes the libdl dependency)
# - Compile only (-c)
# - Use Position Independent Code (-fPIC)
# - Add the PThread library
# - After compilation, archive object (for completeness)
gcc -DSQLITE_OMIT_LOAD_EXTENSION -c -fPIC -pthread -o sqlite3.o sqlite3.c
# Compile YAML-CPP
# - Create (and enter) a build directory
# - Run CMAKE with -fPIC enabled
# - Run MAKE
# - Exit and copy archive from build directory
mkdir -p "build"
cd "build"
env CFLAGS='-fPIC' CXXFLAGS='-fPIC' cmake ..
make -j$(CORES)
cd ..
cp build/libyaml-cpp.a libyaml-cpp.a
# Compile Shared Library
# - Ensure shared (-shared) (also prevents looking for a 'main')
# - Use Position Independent Code (-fPIC)
# - Use current directory for locating libraries
# - Set target CPP file
# - STATICALLY link from SQLite3 and YAML-CPP archives
# - DYNAMICALLY link from PThread library (used by SQLite3 for thread-safe access)
g++ -shared -fPIC -o libdata.so data.cpp \
-L./ \
-Wl,-Bstatic -l:sqlite3.o -lyaml-cpp \
-Wl,-Bdynamic -pthread
# Compile Test Program
# - Specify current directory for includes and libraries
# - Link dynamically to 'libdata.so' (placed after tester.cpp so its references resolve)
g++ -I./ -L./ -o tester tester.cpp -ldata
The last issue I encountered ended up being a missing include directory for YAML-CPP.
A couple of notes for credit:
@Botje: For pointing out that I need -shared and not -c when compiling a shared library (libdata.so).
@Maxim Egorushkin: For linking to a very useful document on the matter.
One thing to note as well is that when linking against a C library in a C++ program, you may need to use extern "C" (as elaborated in the linked page). This is especially important when using the SQLite3 library.
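As a hedged illustration (note that sqlite3.h actually ships with its own extern "C" guards, so this wrapper is only needed for C headers that lack them; the header name here is hypothetical):

// data.cpp
extern "C" {
#include "legacy_c_lib.h"  // declarations here get C linkage, matching the C object code
}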
Note that linking .a files into a .so is rather unusual. People do that, but for the wrong reasons.
When you link a .so, provide the individual .o files compiled with -fPIC. Don't pack those .o files into a .a first; that doesn't make much sense.
Why? Because a .a file is merely a bunch of .o files. There is no point in making a .a from a bunch of .o files just to then turn it into a .so.
To make a static library one builds .o files and packs them into a .a. In fact, "static library" is a misnomer: technically, a .a file is an archive (of .o files). Archives cannot link to the other libraries they need, because a .o file cannot carry dependency information, and neither can a .a file.
To make a shared library one builds .o files with the -fPIC option and links them into a .so, along with any required libraries (static or shared). It is the .so file that carries dependency information about other .so files; .a archives are simply linked in.
When you build a .a, you trade sharing code (in the form of a .so) for maximum execution efficiency (in the form of linking parts of the .a into your executable directly). That means you build the .o files without the -fPIC option (it introduces extra access overhead) and bundle them into a .a. Note that a .a file cannot refer to the other libraries it needs (unlike a .so); it is just a bunch of .o files. A static library .a is almost just a way of referring to multiple .o files at once. For local builds you should use thin archives, which don't copy the .o files into the .a but merely refer to them.
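With GNU ar, a thin archive is created with the T modifier; a minimal sketch:

ar rcsT libparts.a a.o b.o   # records references to a.o and b.o instead of copying them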
Also note that when you link against a .a archive, only those .o files from the archive that resolve currently unresolved symbols get linked into your executable (or shared library), unless you use --whole-archive. That means that if you have a global/namespace-scope object with a constructor and link its .o file into a .so, everything from the supplied object files is linked in and your global object's constructor runs as expected. However, if you link in a .a, the linker only pulls in the object files that resolve currently undefined symbols, so if your global object isn't referred to (possibly indirectly) from the file containing the main function, it won't be linked in and its constructor won't run.
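A hedged sketch of forcing every archive member in, which is useful precisely when such constructors must run (assumes the members were compiled with -fPIC):

g++ -shared -o libplugin.so -Wl,--whole-archive libparts.a -Wl,--no-whole-archive

The trailing --no-whole-archive restores the default behaviour for any libraries that follow it on the command line.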
For your purpose of building one .so from multiple 3rd-party libraries, you should compile those libraries' object files with -fPIC but not pack them into .a archives. Then link all those .o files into one .so, along with all the libraries required by the constituent .o files (either statically or dynamically).
With regard to -lpthread: this is sadly a very common misconception, perpetuated by the POSIX standard's wording being out of date.
In the old days there were two implementations of the Pthreads API on Linux (and probably other systems): LinuxThreads and NPTL. The POSIX standard merely says that if you want POSIX-compliant behaviour you must link NPTL, not LinuxThreads, and that is what the -lpthread linker option is for. The standard's authors neither explain this reasoning nor remove the sentence, even though it is woefully out of date.
Nowadays, modern Linux (and probably other systems) provides only the POSIX-compliant version. Hence the -lpthread flag is obsolete, serves no purpose, and isn't sufficient to build correct multi-threaded programs.
When you build multi-threaded programs you need to follow the documentation of your compiler: gcc and clang require using the -pthread flag for both compiling and linking.
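A minimal sketch (worker.cpp is a hypothetical source file; the point is that -pthread appears in both steps):

g++ -pthread -c worker.cpp
g++ -pthread -o app worker.o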

Link two libraries to each other

I have a function in libA.so that is used in libB.so,
And a function in libB.so that is used in libA.so!
So I definitely cannot compile either of these libraries.
How can I compile these two libraries?
Should I use a third library and move the dependencies into it?
I'm using Qt and C++.
Updated:
When compiling libA.so I get the error "cannot find libB.so", and when compiling libB.so I get "cannot find libA.so".
BIG FAT DISCLAIMER Only do this if absolutely necessary. The preferred way is to refactor your project structure such that it doesn't contain dependency cycles.
When producing a shared library, the linker in general does not need to know about other shared libraries. One can use them on the command line but this is optional. Example:
// libA.cpp
extern void funcB();
void funcA() {
    funcB();
}
Compile and link:
g++ -fPIC -c libA.cpp
g++ -shared -o libA.so libA.o
funcB is supposed to live in libB.so but we are not telling the linker where to find it. The symbol is simply left undefined in libA.so, and will be (hopefully) resolved at load time.
// libB.cpp
extern void funcA();
void funcB() {
    funcA();
}
Compile and link, now using libA.so explicitly (ignore the infinite recursion, it's just an example):
g++ -fPIC -c libB.cpp
g++ -shared -o libB.so libB.o -L/where/libA/is -lA
Now it is up to the executable to load libB.so before loading libA.so, otherwise libA.so cannot be loaded. It's easy to do so (just link the executable with only libB.so and not libA.so), but can be inconvenient at times. So one can re-link libA.so after building libB.so:
g++ -shared -o libA.so libA.o -L /where/libB/is -lB
Now one can link an executable to libA or libB and the other one will be picked up automatically.
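To see it work end to end, a hedged sketch with a hypothetical driver (not part of the original question; it will recurse until it crashes, as noted above):

// main.cpp
extern void funcB();
int main() {
    funcB();  // calls into libB.so, which calls back into libA.so
}

g++ -o main main.cpp -L/where/libs/are -lB
LD_LIBRARY_PATH=/where/libs/are ./main

Linking only -lB is enough: libB.so names libA.so as a needed library, so the dynamic linker loads both.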
This seems a bit problematic for future re-use; you might want to either separate your functions differently between those libraries or create a third one that contains all of the "tool" functions, so that libA and libB can function without one another.
I have a function in libA.so that is used in libB.so, And a function in libB.so that is used in libA.so!
This is a wrong design. A library cannot, even indirectly, depend upon itself. Such circularity is the symptom of something very wrong, and you are misunderstanding what a software library is (it is more than a random collection of functions, or of object files; it has to be somehow a "software module", it is related to modular programming, and it often defines and completely implements a collection of related abstract data types).
So throw both libA.so and libB.so away, and make a single libAB.so containing all the code that you have put in both of those shared objects (which are not genuine libraries).
The answer from n.m. gives a technical way to solve your problem, but at heart your design is wrong and you are abusing libraries (and you cannot call your libA or your libB a library, even if you built them as some shared object in ELF).
You could also design your code by adding some indirection with callbacks, or closures or function pointers held in some variable or data (and provide some way to set these callbacks, or initialize the closures or the function pointers at runtime). Since you use Qt, consider also defining appropriately your new Qt signals and slots (they are based on some callback machinery).
Read Program Library HowTo and Drepper's How to Write Shared libraries paper for more.
Finally I solve it.
As @n.m. said, we don't need to link libA.so and libB.so against each other at compile time, so I removed -lA and -lB when building them and didn't get any errors. In the app that wants to use libA.so or libB.so, I linked them with -lA or -lB. This works correctly.

Link only needed symbols when compiling an executable with a Shared Library

I'm working on a heavy project that has a lot of static libraries that are interdependent. Furthermore some symbols are redundant between some libraries, with different implementations. My goal is to make the project work with shared libraries.
I tried to compile an executable with one of my shared libs, and I get undefined-symbol errors on functions that my executable isn't using. After some research I understood that the dynamic linker works very differently from the static linker. If I understood correctly, when linking against a shared library, all symbols need to be resolved, as the whole library is loaded into memory.
A simple workaround would be to add all the dependencies of my libraries when compiling the executable. But they're so full of dependencies that this sometimes means adding 10+ libraries to the command line, and this would be for something like a hundred executables.
So far I tried using -Wl,--as-needed, -Wl,--unresolved-symbols=ignore-in-shared-libs, and opening the shared object with dlopen to get the function I want with dlsym. But all of these methods fail at one point or another.
My question is: Are you forced to resolve every undefined symbol of a dynamic library when linking it against an executable?
Details of dynamic linking and the kinds of objects involved vary across environments and toolchains. On Linux, where you say you are, and on Solaris, and several other UNIX-y platforms, you are looking at ELF objects and semantics.
So far I tried using -Wl,--as-needed,
-Wl,--unresolved-symbols=ignore-in-shared-libs,
These both have their full effect at (static) link time. The former tells the linker that the libraries following it on the command line should be linked in only if they resolve at least one as-yet undefined symbol. The latter tells the linker not to worry about resolving symbols in shared libraries included in the link. That has nothing to do with the behavior of the dynamic linker when you run the program.
and opening the shared object with dlopen to get the function I want with dlsym.
dlopen instructs the dynamic linker to link in a shared object at runtime that was not specified in the binary as a required shared library. Its behavior at that point can be modulated by the flags passed to dlopen, but the options available are not more than can be specified at link time. There is little reason to use dlopen when you actually know at link time what libraries you need.
Are you forced to resolve every undefined symbol of a dynamic library
when linking it against an executable?
Focusing on ELF and the GNU toolchain, no. -Wl,--unresolved-symbols=ignore-in-shared-libs serves precisely the purpose of avoiding that. But as you've discovered, that comes with caveats.
In the first place, in every shared object, every symbol referring to data needs to be resolved at runtime by the dynamic linker, no matter how you linked the various shared objects, including the main program. This is primarily an operational consideration -- the dynamic linker has no way to defer resolving symbols referring to objects because it has no good way to trap attempts to access them.
On the other hand, it is possible to defer resolution of symbols referring to functions until their first use. In fact, this is the GNU linker's default, but you can reaffirm this by passing -Wl,-z,lazy to gcc when linking. Note well, however, that this sets a property of the object being linked, so you should ensure that every shared object is built with that link option (but ordinarily they are because, again, that's the default).
Additionally, you should be aware that the dynamic linker's behavior can be influenced by environment variables. In particular, lazy binding will be disabled if the dynamic linker finds LD_BIND_NOW set to a nonempty string in the runtime environment.
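For example (a hedged sketch, using the demo program built below):

LD_BIND_NOW=1 ./main   # forces immediate binding; fails at startup if any function symbol cannot be resolved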
A simple workaround would be to add all the dependencies of my
libraries when compiling the executable. But they're so full of
dependencies that this sometimes means adding 10+ libraries to the
command line, and this would be for something like a hundred
executable.
And what's the big deal with that, really? Surely you have a well-factored Makefile (or several) to help you, so it shouldn't be a big deal to ensure that all the libraries are linked. Right?
But you should also consider refactoring your libraries, especially if "interdependent" means there are loops in the dependency graph. Dynamic linking is different from static linking, as you've discovered, and the differences are sometimes more subtle than those you're presently struggling with. Although it is not a hard rule, I urge you to avoid creating situations where the shared objects used by one process contain among them multiple definitions of the same external symbol, especially if that symbol is actually used.
Update
The above discussion focuses on linking shared libraries to an executable, but there is another important consideration: how the libraries themselves are linked. Each ELF object, whether executable or shared library, carries its own list of needed shared libraries. The dynamic linker will recursively include all of these in the list of shared libraries to be loaded (immediately) at program startup, notwithstanding its behavior with respect to lazy binding of symbols referring to functions.
Therefore, if you want an executable not to require a given shared library X, then not only that executable itself but also every shared library it does rely upon must avoid expressing a dependency on X. If some of the shared libs require X when used in conjunction with other programs, then that puts the onus on you to link in all the needed libraries when building those programs (otherwise, you can arrange to link only direct dependencies). You can tell the GNU linker to build shared libraries this way by passing it the --allow-shlib-undefined flag.
Here is a complete proof of concept:
main.c
int mul(int, int);
int main(void) {
    return mul(2, 3);
}
mul.c
int add(int, int);
int mul(int x, int y) {
    return x * y;
}
int mul2(int x, int y) {
    return add(x, y) * add(x, -y);
}
Makefile
CC = gcc
LD = gcc
CFLAGS = -g -O2 -fPIC -DPIC
LDFLAGS = -Wl,--unresolved-symbols=ignore-in-shared-libs
SHLIB_LDFLAGS = -shared -Wl,--allow-shlib-undefined
all: main
main: main.o libmul.so
	$(LD) $(CFLAGS) $(LDFLAGS) -o $@ $^
libmul.so: mul.o
	$(LD) $(CFLAGS) $(SHLIB_LDFLAGS) -o $@ $^
clean:
	rm -f main main.o libmul.so mul.o
Demo
$ make
gcc -g -O2 -fPIC -DPIC -c -o main.o main.c
gcc -g -O2 -fPIC -DPIC -c -o mul.o mul.c
gcc -g -O2 -fPIC -DPIC -shared -Wl,--allow-shlib-undefined -o libmul.so mul.o
gcc -g -O2 -fPIC -DPIC -Wl,--unresolved-symbols=ignore-in-shared-libs -o main main.o libmul.so
$ LD_LIBRARY_PATH=$(pwd) ./main
$ echo $?
6
$
Note that the -zlazy linker option discussed in comments is omitted, as it's the default.

Is it possible to un-link object files from an executable

Background: I am looking at developing a package manager similar to portage in Gentoo Linux (I may end up forking portage). For those that know little about Gentoo: it is a source-based distro, which means that all packages are compiled from source code. Currently it is possible to compile a program into object files and then into executables.
$ gcc -c a.c -o a.o
$ gcc -c b.c -o b.o
$ gcc a.o b.o -o executable
The improvements I would like to make to portage are the following.
Ability to only re-compile object files that have been updated (track changes using GIT or otherwise).
Decompile/Unlink executable to object files.
Re-compile/re-link object files replacing only the old object files with the updated object files (Changes tracked using GIT or otherwise).
Then the newly compiled package replaces the old package. (trivial task)
Reasoning: I am an Arch Linux user who loves the idea of a source-based distribution but cannot be bothered with the enormous task of keeping my system up to date. I also do most of my work on a laptop with a small hard drive, hence the idea of de-compiling/un-linking the executable into object files rather than just keeping the object files, which take up a large amount of space. It would also likely decrease the overall compile time of the system, as the need to re-compile most of the source code would be greatly reduced. It would also allow for an easy way to change the USE flags on a package without the need to completely re-compile it.
Question: Is it possible to compile object files into an executable and then to de-compile back into object files. An example of this is below.
$ gcc -c a.c -o a.o
$ gcc -c b.c -o b.o
$ gcc a.o b.o -o executable
and then
$ SomeCommand executable
output << a.o b.o
If this is not currently possible: would it be doable to modify a version of GNU's linker (ld) to log the changes it makes when linking object files, so as to intentionally make the program reverse-engineerable?
Edit: Another use for this would be to separate a single object file from an executable of a large project, swap the separated object file with a new one, and re-link again. This would reduce the overhead of re-linking large projects from many different files when only one is updated. It would allow for incremental compilation at the binary level.
No, this is not possible. A large amount of the linker's work is replacing symbolic references (valid for any combination of object files being linked together) with numeric offsets (valid only for the particular way the linker decided to lay out that particular combination of object files, that particular time). Once the references are "baked" in this way, they cannot be recovered.
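You can observe this baking with objdump (a hedged sketch; exact relocation types vary by architecture):

objdump -dr a.o        # calls carry relocation entries, e.g. R_X86_64_PLT32, that still name symbols
objdump -d executable  # the same calls are now fixed PC-relative offsets; the symbolic references are gone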
It might be doable if you alter/configure ld to keep the sections of each object file apart and also keep the relocation table for each object file in the executable. You would also have to make sure ld stores the object file names in the executable if you want to recover the original file names.
Basically, a linker could just join the object files together and then apply the relocations; if the relocations are invertible, you should be able to reverse the process.
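The closest existing mechanism is a partial (relocatable) link, which merges objects while preserving their relocations; it combines rather than splits, but it shows relocations can survive linking:

ld -r -o combined.o a.o b.o   # combined.o is still a relocatable object and can be linked again later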

symbol resolutions when creating (and linking) libraries

Suppose a.cc defines a function f_a() that uses a function f_b() defined in b.cc. From a.cc and b.cc I create a dynamic library libdynamic.so.
Suppose the file main.cc uses f_a, I'd compile it as follows:
g++ -o main main.cc -ldynamic
How does the dynamic linker bring the definition of f_a (and subsequently f_b) into the executable? Is the definition of f_a in libdynamic.so already resolved against f_b? Or will the dynamic linker also resolve this (internal) dependency at runtime?
Since you're using a shared library (*.so), the definition is not brought into the executable. It remains in the library itself and is resolved at run time, which is why if you remove the shared library the program will not function correctly.
On the other hand, all the internal symbols in the library (in your example, f_a and f_b) must be resolved when the library is built. This is evident from the compilation process:
g++ -fPIC -c a.cc
g++ -fPIC -c b.cc
g++ -shared -Wl,-soname,libdynamic.so -o libdynamic.so a.o b.o
In the last stage, g++ calls the linker (ld) to link a.o and b.o. In fact, you could (probably) call the linker directly instead:
ld -shared -soname=libdynamic.so -o libdynamic.so a.o b.o
If you're still curious about the whole process and all its gory details, here is a useful reference article: Linkers and Loaders, by Sandeep Grover.
Basically, dynamic libraries are linked with the executable at run time (that is, when you run ./main). The dynamic loader takes care of resolving the dependency at run time. You can check whether a symbol is resolved with the nm command. The default information that the nm command provides is:
Virtual address of the symbol
A character which depicts the symbol type. If the character is lowercase, the symbol is local; if it is uppercase, the symbol is external (global).
Name of the symbol
For more information, see the nm documentation.
After compiling your program, just execute nm <executable> (in your case, nm main).
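For example, a hedged sketch against the libdynamic.so example above (output format varies by platform; -D lists the dynamic symbol table):

$ nm -D main | grep f_a
                 U f_a

The U means f_a is undefined in main and will be resolved at run time from libdynamic.so.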