Link only needed symbols when compiling an executable with a Shared Library


I'm working on a heavy project that has a lot of static libraries that are interdependent. Furthermore some symbols are redundant between some libraries, with different implementations. My goal is to make the project work with shared libraries.
I tried to compile an executable with one of my shared libs, and I get undefined symbols errors on functions that my executable isn't using. After some research I understood that the dynamic linker works in very different ways than the static linker. If I understood right, when linking a shared library, all symbols need to be resolved as the whole library is loaded in the memory.
A simple workaround would be to add all the dependencies of my libraries when compiling the executable. But they're so full of dependencies that this sometimes means adding 10+ libraries to the command line, and this would be for something like a hundred executables.
So far I tried using -Wl,--as-needed, -Wl,--unresolved-symbols=ignore-in-shared-libs, and opening the shared object with dlopen to get the function I want with dlsym. But all of these methods fail at one point or another.
My question is: Are you forced to resolve every undefined symbol of a dynamic library when linking it against an executable ?

Details of dynamic linking and the kinds of objects involved vary across environments and toolchains. On Linux, where you say you are, and on Solaris, and several other UNIX-y platforms, you are looking at ELF objects and semantics.
So far I tried using -Wl,--as-needed,
-Wl,--unresolved-symbols=ignore-in-shared-libs,
These both have their full effect at (static) link time. The first tells the linker that the libraries following it on the command line should be linked in only if they resolve at least one as-yet undefined symbol. The latter tells the linker to not worry about resolving symbols in shared libraries included in the link. That has nothing to do with the behavior of the dynamic linker when you run the program.
and opening the shared object with dlopen to get the function I want with dlsym.
dlopen instructs the dynamic linker to link in a shared object at runtime that was not specified in the binary as a required shared library. Its behavior at that point can be modulated by the flags passed to dlopen, but the options available are not more than can be specified at link time. There is little reason to use dlopen when you actually know at link time what libraries you need.
Are you forced to resolve every undefined symbol of a dynamic library
when linking it against an executable ?
Focusing on ELF and the GNU toolchain, no. -Wl,--unresolved-symbols=ignore-in-shared-libs serves precisely the purpose of avoiding that. But as you've discovered, that comes with caveats.
In the first place, in every shared object, every symbol referring to data needs to be resolved at runtime by the dynamic linker, no matter how you linked the various shared objects, including the main program. This is primarily an operational consideration -- the dynamic linker has no way to defer resolving symbols referring to objects because it has no good way to trap attempts to access them.
On the other hand, it is possible to defer resolution of symbols referring to functions until their first use. In fact, this is the GNU linker's default, but you can reaffirm this by passing -Wl,-z,lazy to gcc when linking. Note well, however, that this sets a property of the object being linked, so you should ensure that every shared object is built with that link option (but ordinarily they are because, again, that's the default).
Additionally, you should be aware that the dynamic linker's behavior can be influenced by environment variables. In particular, lazy binding will be disabled if the dynamic linker finds LD_BIND_NOW set to a nonempty string in the runtime environment.
A simple workaround would be to add all the dependencies of my
libraries when compiling the executable. But they're so full of
dependencies that this sometimes means adding 10+ libraries to the
command line, and this would be for something like a hundred
executables.
And what's the big deal with that, really? Surely you have a well-factored Makefile (or several) to help you, so it shouldn't be a big deal to ensure that all the libraries are linked. Right?
But you should also consider refactoring your libraries, especially if "interdependent" means there are loops in the dependency graph. Dynamic linking is different from static linking, as you've discovered, and the differences are sometimes more subtle than those you're presently struggling with. Although it is not a hard rule, I urge you to avoid creating situations where the shared objects used by one process contain among them multiple definitions of the same external symbol, especially if that symbol is actually used.
Update
The above discussion focuses on linking shared libraries to an executable, but there is another important consideration: how the libraries themselves are linked. Each ELF object, whether executable or shared library, carries its own list of needed shared libraries. The dynamic linker will recursively include all of these in the list of shared libraries to be loaded (immediately) at program startup, notwithstanding its behavior with respect to lazy binding of symbols referring to functions.
Therefore, if you want an executable not to require a given shared library X, then not only that executable itself but also every shared library it does rely upon must avoid expressing a dependency on X. If some of the shared libs require X when used in conjunction with other programs, then that puts the onus on you to link in all the needed libraries when building those programs (otherwise, you can arrange to link only direct dependencies). You can tell the GNU linker to build shared libraries this way by passing it the --allow-shlib-undefined flag.
Here is a complete proof of concept:
main.c
int mul(int, int);
int main(void) {
    return mul(2, 3);
}
mul.c
int add(int, int);
int mul(int x, int y) {
    return x * y;
}
int mul2(int x, int y) {
    return add(x, y) * add(x, -y);
}
Makefile
CC = gcc
LD = gcc
CFLAGS = -g -O2 -fPIC -DPIC
LDFLAGS = -Wl,--unresolved-symbols=ignore-in-shared-libs
SHLIB_LDFLAGS = -shared -Wl,--allow-shlib-undefined
all: main
main: main.o libmul.so
	$(LD) $(CFLAGS) $(LDFLAGS) -o $@ $^
libmul.so: mul.o
	$(LD) $(CFLAGS) $(SHLIB_LDFLAGS) -o $@ $^
clean:
	rm -f main main.o libmul.so mul.o
Demo
$ make
gcc -g -O2 -fPIC -DPIC -c -o main.o main.c
gcc -g -O2 -fPIC -DPIC -c -o mul.o mul.c
gcc -g -O2 -fPIC -DPIC -shared -Wl,--allow-shlib-undefined -o libmul.so mul.o
gcc -g -O2 -fPIC -DPIC -Wl,--unresolved-symbols=ignore-in-shared-libs -o main main.o libmul.so
$ LD_LIBRARY_PATH=$(pwd) ./main
$ echo $?
6
$
Note that the -zlazy linker option discussed in comments is omitted, as it's the default.

Related

Link two libraries to each other

I have a function in libA.so that is used in libB.so,
And a function in libB.so that is used in libA.so!
So I definitely cannot build either of these libraries.
How can I build these two libraries?
Should I use a third library and move the dependencies into it?
I use Qt and C++.
Updated:
When building libA.so I get the error "cannot find libB.so", and when building libB.so I get the error "cannot find libA.so".

BIG FAT DISCLAIMER Only do this if absolutely necessary. The preferred way is to refactor your project structure such that it doesn't contain dependency cycles.
When producing a shared library, the linker in general does not need to know about other shared libraries. One can use them on the command line but this is optional. Example:
// libA.cpp
extern void funcB();
void funcA() {
funcB();
}
Compile and link:
g++ -fPIC -c libA.cpp
g++ -shared -o libA.so libA.o
funcB is supposed to live in libB.so but we are not telling the linker where to find it. The symbol is simply left undefined in libA.so, and will be (hopefully) resolved at load time.
// libB.cpp
extern void funcA();
void funcB() {
funcA();
}
Compile and link, now using libA.so explicitly (ignore the infinite recursion, it's just an example):
g++ -fPIC -c libB.cpp
g++ -shared -o libB.so libB.o -L/where/libA/is -lA
Now it is up to the executable to load libB.so before loading libA.so, otherwise libA.so cannot be loaded. It's easy to do so (just link the executable with only libB.so and not libA.so), but can be inconvenient at times. So one can re-link libA.so after building libB.so:
g++ -shared -o libA.so libA.o -L /where/libB/is -lB
Now one can link an executable to libA or libB and the other one will be picked up automatically.

This seems a bit problematic for future reuse. You might want to either split your functions differently between those libraries, or create a third one that contains all of the "tool" functions, so that libA and libB can work without one another.

I have a function in libA.so that is used in libB.so, And a function in libB.so that is used in libA.so!
This is wrong design. A library cannot, even indirectly, depend upon itself. Such a circularity is the symptom of something very wrong, and you are misunderstanding what a software library is (it is more than a random collection of functions, or of object files; it has to be somehow a "software module" and it is related to modular programming and often defines and implements completely a collection of related abstract data types).
So throw both libA.so and libB.so away. And make a single libAB.so containing all the code that you have put in both libA.so and libB.so shared objects (and not genuine libraries).
The answer from n.m. gives a technical way to solve your problem, but at heart your design is wrong and you are abusing libraries (and you cannot call your libA or your libB a library, even if you built them as some shared object in ELF).
You could also design your code by adding some indirection with callbacks, or closures or function pointers held in some variable or data (and provide some way to set these callbacks, or initialize the closures or the function pointers at runtime). Since you use Qt, consider also defining appropriately your new Qt signals and slots (they are based on some callback machinery).
Read Program Library HowTo and Drepper's How to Write Shared libraries paper for more.

I finally solved it.
As @n.m. said, we don't need to link libA.so and libB.so against each other at build time, so I removed -lA and -lB when building them and got no errors. In the app that uses libA.so or libB.so, I link them with -lA or -lB. This works correctly.

Creating a Minimal Shared Library

For background, I'm creating some C++ software that uses dynamically loaded shared library plugins for hardware output (the specifics of it aren't relevant here).
I'm building the executable by compiling everything into object files and then linking the ones needed, which is simple using an exclusion list. I can then build the shared library by specifying its primary object file (the one that's dynamically loaded and accessed at runtime) along with every other object file referenced by the primary one.
My question is this: Is there a way to provide the linker with the primary object file, and create a shared library containing only the objects it depends upon? All of the object files are in the same directory, I'm not using a Makefile (yet; if one could solve the problem, it's a valid answer), and compilation speed isn't an issue.
I've looked into the linker options --as-needed, --gc-sections, and --no-undefined, but I haven't been able to piece together a working build process.
Example: For source files main.cpp, a.cpp, b.cpp, a.h, and b.h, where main.cpp and a.cpp both include b.h:
gcc -fPIC -c *.cpp -I. builds object files main.o, a.o, and b.o.
gcc -o main.out *.o builds the final executable main.out from the object files... including a.o, which is unused. (--gc-sections should fix this.)
gcc -fPIC -shared -o a.so a.o -Wl,--as-needed !(a).o builds the final shared library a.so from all of the object files... including main.o, which is unused. How do I prevent main.o from being included in a.so?

Is there a way to provide the linker with the primary object file, and create a shared library containing only the objects it depends upon?
Yes: package all objects into an archive library liball.a, then link like this:
gcc -shared -o a.so a.o liball.a
The linker will then pull out from liball.a all objects that a.o depends on, and only these objects, as explained here.
Note: liball.a may contain a.o, there is no harm (as above link explains).
Update:
Is there a way to do it without needing to create an archive first?
I don't know of any portable way to do that. The Gold linker has --start-lib and --end-lib command line flags that achieve exactly that.

An executable and a shared library dependent on a same statically linked library

Suppose you're developing a shared library libshared.so.
And you have a static library libstatic.a with some internal classes and functionality you need. You'd like to link it to your .so like this:
g++ -o libshared.so -shared myObj.o -lstatic
You also have an executable, executable.sh, which will use your .so and open it dynamically at runtime:
dlopen("libshared.so", RTLD_NOW)
You know this executable was as well statically linked against libstatic.a (but you're not sure the version of the library is exactly the same as yours).
So the question is:
Is it safe and correct to statically link your libshared.so against libstatic.a when you know the same library is already used in executable.sh?

You should avoid linking a static library into a shared one, because a shared library should consist of position-independent code (otherwise the dynamic linker has to do too much relocation, and you lose the benefits of shared libraries), whereas a static library usually is not built as PIC.
Read Drepper's paper: How to write a shared library
You build your library with
g++ -Wall -O -fPIC mySrc.cc -c -o myObj.pic.o
g++ -o libshared.so -shared myObj.pic.o -lotherlib

C Shared Object, With C++ Archive, Static Ctors/Dtors, and dlopen

I have a C shared object I'm loading with dlopen. The C shared object includes another library as a static archive (fully specified path /usr/local/.../libsomelib.a). libsomelib.a is a C++ library and it has global and static locals.
On Ubuntu, the static initializers do not appear to run when opening the shared library with RTLD_GLOBAL and RTLD_GLOBAL | RTLD_LAZY. The symptom I am seeing is a program crash.
The behavior I am seeing seems similar to linking with -nostartfiles or -nostdlib (but I'm not using them). I found a similar thread at C++ Static Constructors and dlopen'd Shared Libraries, but it's for a NetBSD system.
If the EXE explicitly includes libsomelib.a and calls a function from it, the C++ library will initialize and the program no longer crashes when calling through the function pointer.
EDIT: here's how the shared object is being built (it's the simplest case that I've experienced, without mixing/matching C and C++). cryptopp-so-test.exe calls dlopen:
CXXFLAGS = -g -ggdb -fPIC -DDEBUG -O1 -Wall -Wextra -Wno-unused -DUSE_PRECOMPILED_HEADERS=1 -I. -I/usr/local/include/cryptopp
...
precompile:
	$(CXX) $(CXXFLAGS) pch.h -o pch.h.gch
cryptopp-so-test.exe: precompile $(EXEOBJECTS)
	$(CXX) $(CXXFLAGS) -o $@ $(EXESOURCES) -ldl -lpthread
dsotest: precompile $(DLLOBJECTS)
	$(CXX) $(CXXFLAGS) $(DLLSOURCES) -o dsotest-1.so -shared /usr/local/lib/libcryptopp.a
While the code above builds one EXE (cryptopp-so-test.exe) and one SO (dsotest-1.so), I actually build and load 4 shared objects (they are built identically).
What flags (or other methods) should I use to ensure the static initializers are run when a C shared object with C++ components is dlopen'd?

On Ubuntu, the static initializers do not appear to run when opening the shared library with RTLD_GLOBAL and RTLD_GLOBAL | RTLD_LAZY.
When you dlopen a shared library, the global constructors are called. You are likely jumping to the wrong conclusions.
The symptom I am seeing is a program crash.
That symptom could be caused by anything. You need to look at the crash in the debugger, and understand what is causing it, rather than blindly guess "static initializers".
One difference between linking libsomelib.a into the main executable and linking it into a shared library is that depending on which code calls which functions, you may end up with wildly different parts of libsomelib.a included into each (the linker will only pull in the parts of libsomelib.a that it can see are necessary).
You could try linking the entire libsomelib.a into the shared library as follows:
g++ $(OBJS) -o dsotest-1.so -shared \
-Wl,--whole-archive -lsomelib -Wl,--no-whole-archive

g++: In what order should static and dynamic libraries be linked?

Let's say we got a main executable called "my_app" and it uses several other libraries: 3 libraries are linked statically, and other 3 are linked dynamically.
In which order should they be linked against "my_app"?
But in which order should these be linked?
Let's say we got libSA (as in Static A) which depends on libSB, and libSC which depends on libSB:
libSA -> libSB -> libSC
and three dynamic libraries: libDA -> libDB -> libDC (libDA is the most basic, libDC is the highest)
In which order should these be linked? The basic one first or last?
g++ ... -g libSA libSB libSC -lDA -lDB -lDC -o my_app
seems like the correct order, but is that so? What if a dynamic library depends on a static one, or the other way around?

In the static case, it doesn't really matter, because you don't actually link static libraries - all you do is pack some object files together in one archive. All you have to do is compile your object files, and you can create static libraries right away.
The situation with dynamic libraries is more convoluted. There are two options:
1. A shared library can be built the same way as a static library (except for shared segments, if they are present): just link your shared library as soon as you have the object files. This means, for example, that symbols from libDA will appear as undefined in libDB.
2. You can specify the libraries to link to on the command line when linking shared objects. This has the same effect as 1., but also marks libDB as needing libDA.
The difference is that if you use the former way, you have to specify all three libraries (-lDA, -lDB, -lDC) on the command line when linking the executable. If you use the latter, you just specify -lDC and it will pull in the others automatically. Note that for shared libraries the final (dynamic) linking happens just before your program runs (which means you can get different versions of symbols, even from different libraries).
This all applies to UNIX; Windows DLL work quite differently.
Edit after clarification of the question:
Quote from the ld info manual.
The linker will search an archive only
once, at the location where it is
specified on the command line. If the
archive defines a symbol which was
undefined in some object which
appeared before the archive on the
command line, the linker will include
the appropriate file(s) from the
archive. However, an undefined symbol
in an object appearing later on the
command line will not cause the linker
to search the archive again.
See the `-(' option for a way to force
the linker to search archives multiple
times.
You may list the same archive multiple
times on the command line.
This type of archive searching is
standard for Unix linkers. However, if
you are using `ld' on AIX, note that
it is different from the behaviour of
the AIX linker.
That means:
Any static library or object that depends on another library should be placed before it on the command line. If static libraries depend on each other circularly, you can e.g. use the -( command line option, or place the libraries on the command line twice (-lDA -lDB -lDA). The order of dynamic libraries doesn't matter.

This is the sort of question that's best solved by a trivial example. Really! Take 2 minutes, code up a simple example, and try it out! You'll learn something, and it's faster than asking.
For example, given files:
a1.cc
#include <stdio.h>
void a1() { printf("a1\n"); }
a2.cc
#include <stdio.h>
extern void a1();
void a2() { printf("a2\n"); a1(); }
a3.cc
#include <stdio.h>
extern void a2();
void a3() { printf("a3\n"); a2(); }
aa.cc
extern void a3();
int main()
{
a3();
}
Running:
g++ -Wall -g -c a1.cc
g++ -Wall -g -c a2.cc
g++ -Wall -g -c a3.cc
ar -r liba1.a a1.o
ar -r liba2.a a2.o
ar -r liba3.a a3.o
g++ -Wall -g aa.cc -o aa -la1 -la2 -la3 -L.
Shows:
./liba3.a(a3.o)(.text+0x14): In function `a3()':
/tmp/z/a3.C:4: undefined reference to `a2()'
Whereas:
g++ -Wall -g -c a1.cc
g++ -Wall -g -c a2.cc
g++ -Wall -g -c a3.cc
ar -r liba1.a a1.o
ar -r liba2.a a2.o
ar -r liba3.a a3.o
g++ -Wall -g aa.cc -o aa -la3 -la2 -la1 -L.
Succeeds. (Just the -la3 -la2 -la1 parameter order is changed.)
PS:
nm --demangle liba*.a
liba1.a:
a1.o:
U __gxx_personality_v0
U printf
0000000000000000 T a1()
liba2.a:
a2.o:
U __gxx_personality_v0
U printf
U a1()
0000000000000000 T a2()
liba3.a:
a3.o:
U __gxx_personality_v0
U printf
U a2()
0000000000000000 T a3()
From man nm:
If lowercase, the symbol is local; if uppercase, the symbol is global (external).
"T" The symbol is in the text (code) section.
"U" The symbol is undefined.

I worked in a project with a bunch of internal libraries that unfortunately depended on each other (and it got worse over time). We ended up "solving" this by setting up SCons to specify all libs twice when linking:
g++ ... -la1 -la2 -la3 -la1 -la2 -la3 ...

The dependencies for linking a library or executable have to be present at link-time, so you cannot link libXC before libXB is present. It doesn't matter if statically or dynamically.
Start with the most basic one, which has no (or just outside of your project) dependencies.

It's good practice to keep libraries independent of each other to avoid link order issues.
