Why does ld need library that my executable depends on?

I'm trying to build my executable (that depends on library utils.so) using the following command
g++ -L/path/to/libutils -lutils -I/path/to/utils_headers executable.cpp -o executable
Actually I don't have utils.so - only the header files of utils library.
I'm getting the error:
ld: cannot find -lutils
Does the linker really need access to all the libraries my executable depends on in order to build it? If it does, I'd like to know why.
My executable is a shared library. I'm sure that the header files of the utils library are enough to build it (i.e. without having utils.so).

The linkage option -lutils by default directs the linker to search, first in the specified library search directories (-Ldir) and then in its default search directories, for either of the files libutils.so (shared library) or libutils.a (static library), preferring libutils.so if both of them are found in the same search directory.
If such a file is found, the linker stops searching and adds that file to the input files of the linkage, whether or not it resolves any references in the linkage. The linker cannot know whether the file resolves any references unless it inputs the file.
If no such file is found, the linker gives the error "cannot find -lutils", because you told it to find libutils.{so|a} and it could not.
You say: "My executable is a shared library."
But it isn't. Your compile-and-link command:
$ g++ -L/path/to/libutils -lutils -I/path/to/utils_headers executable.cpp -o executable
is not an attempt to link a shared library. It is an attempt to link a program. [1]
This would be an attempt to link a shared library:
$ g++ -shared -I/path/to/utils_headers -o libexecutable.so executable.cpp -L/path/to/libutils -lutils
You cannot link a program with unresolved references. But you can link a shared library
with unresolved references.
So, you could link a libexecutable.so like that, or you could link it simply like:
$ g++ -shared -I/path/to/utils_headers -o libexecutable.so executable.cpp
These are two different linkages: if they succeed they produce different output files.
In the first linkage, some symbols will (let's assume) be resolved to definitions provided in libutils.so or libutils.a (whichever one is found), and this will be reflected as follows:
If libutils.so is found: the .dynamic section of libexecutable.so contains a DT_NEEDED structure that expresses a runtime dependency on libutils.so. libutils.so will need to be included in any linkage that includes libexecutable.so, but the output file of such a linkage will itself contain a runtime dependency only on libexecutable.so.
If libutils.a is found: libexecutable.so itself contains the definitions for all the symbols it uses that are defined by object files in libutils.a. [2] libexecutable.so may be included in subsequent linkages with no need for libutils.{so|a}.
In the second linkage, the .dynamic section of libexecutable.so will not express a runtime dependency on libutils.so, nor will the file contain definitions of any symbols provided by libutils.{so|a}. libutils.so will (again) need to be included in any subsequent linkage that includes libexecutable.so, but the output file of such a linkage will acquire independent runtime dependencies on both libexecutable.so and libutils.so.
But, if you specify -lutils in the linkage - or any linkage - and the linker cannot find libutils.{so|a}
in any of its search directories, then you get the error you observe, because you told the linker
to input a file, whose effects on the linkage can only be determined and implemented if that file is found - and it cannot be found.
[1] An attempt that is likely to fail, because it consumes libraries before the object files that refer to them.
[2] See static-libraries to understand why.

In general, an ELF linker needs a sufficiently accurate representation of the shared object that is linked in. It does not have to be an actually working shared object, just a sufficiently close representation of it. A few things absolutely require data that is not available in the object itself:
When compiling C programs, a reference to a global data object of incomplete type does not contain size information. The linker cannot place the object into the data segment unless it obtains the size information from somewhere. By default (when compiling for executables, including PIE) the object needs to be allocated in the data segment on many targets because of the relocations the compiler uses for compiling accesses to global data objects.
Similarly, the link editor might get the alignment of global data objects wrong if it has insufficient information.
Many libraries use symbol versioning. Symbol version information is only available when the link editor can see the shared object. If that information is missing, the link editor will not emit a symbol version, which instructs the dynamic linker to bind the symbol to the base version at run time, leading to subtle bugs.
However, if you only use C function symbols (not data symbols, or the varieties of symbols that C++ requires) and the target library does not use symbol versioning, you can use a stub library for linking. This is a library that defines all the functions you need and has the appropriate soname, but the functions are just dummies which do not actually do anything.
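A stub library along these lines could look like the sketch below. The function names utils_init and utils_process are hypothetical stand-ins for whatever the real libutils exports, and the build command in the comment is one possible invocation on an ELF platform, not the only one.

```cpp
// stub_utils.cpp: a stand-in for the real libutils.so, used only at link time.
// The function names are hypothetical; a real stub must export exactly the
// symbols that the real library exports.
//
// One possible build command (GCC on an ELF platform):
//   g++ -shared -fPIC -Wl,-soname,libutils.so -o libutils.so stub_utils.cpp

// extern "C" keeps the exported names unmangled, as in a C library.
extern "C" {

// Dummy: the real function would initialise the library.
int utils_init(void) { return 0; }

// Dummy: the real function would process `len` bytes starting at `data`.
int utils_process(const char *data, unsigned long len) {
    (void)data;  // unused in the stub
    (void)len;   // unused in the stub
    return 0;
}

}  // extern "C"
```

The stub only has to satisfy the link editor. At run time the dynamic linker must find the real library under the same soname, so the stub itself is never shipped.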

Related

Is it possible to artificially induce object file extraction for a given static library?

I was recently reading this answer and noticed that it seems inconvenient for users to have to link static libraries in the correct order.
Is there some flag or #pragma I can pass to gcc when compiling my library so that my library's object files will always be included?
To be more specific, I want it to be the case that the user can link my static library before another library that depends on it, and not wind up with unresolved symbols, in the same way as with object files that are explicitly specified on the linker line.
In particular, I am looking for solutions where the user of the library does not need to do anything special, and merely needs to pass -lMylibrary the way they would link any other library.

Is there some flag or #pragma I can pass to gcc
No.
I want it to be the case that the user can link my static library before another library that depends on it, and not wind up with unresolved symbols, in the same way as with object files that are explicitly specified on the linker line.
Ship your "library" as a single object file. In other words, instead of:
ar ru libMyLibrary.a ${OBJS}
use:
ld -r -o libMyLibrary.a ${OBJS}
In particular, I am looking for solutions where the user of the library does not need to do anything special, and merely needs to pass -lMylibrary the way they would link any other library.
You can name your object file libMyLibrary.a. I believe the linker will search for it using the usual rules, but when it finds it, it will discover that this is an object file, and treat it as such, despite it being "misnamed". This should work at least on Linux and other ELF platforms. I am not sure whether it will work on Windows.

Get dlopen to ignore undefined symbols

I am compiling a dynamically generated C++ file as a shared object which contains references to symbols available only in its full build.
g++ -o tmp_form.so -fPIC -shared -lsomelib -std=gnu99 tmp_form.cc
I don't need the missing symbols for my current program, only those from the linked library. But dlopen does require them to be available or fails otherwise. The missing symbols are all variables which are being referenced in structs.
One option would be to add the weak reference attribute to the missing symbols in the generated code. But I would like to avoid making changes to the code generator if possible.
Any advice is appreciated.

Your link command is incorrect:
... -lsomelib ... tmp_form.cc
should be
... tmp_form.cc -lsomelib
The order of sources/objects and libraries on the link line does matter.
If you are using an ELF platform and a very recent build of the Gold linker, you can "downgrade" unresolved symbols to weak with the --weak-unresolved-symbols option, without modifying the source.
Otherwise, you'll have to modify the sources; there is no other way.
P.S. Function references would not have a problem with RTLD_LAZY, thanks to lazy binding. For data references, weak unresolved symbols are your only choice, because lazy binding is not possible for them.
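For reference, the weak reference attribute mentioned in the question looks roughly like this when written by hand. This is a minimal sketch assuming GCC or Clang on an ELF platform; optional_config is a hypothetical symbol name that is deliberately not defined anywhere.

```cpp
// A weak *reference* to a data symbol that may be absent at load time.
// `optional_config` is a hypothetical name; nothing in this file defines it.
extern "C" int optional_config __attribute__((weak));

// Returns the symbol's value if some loaded module defines it, or `fallback`
// if the reference stayed unresolved (its address is then null).
int read_optional_config(int fallback) {
    int *p = &optional_config;
    return p ? *p : fallback;
}
```

Code that uses such a symbol has to check the address before dereferencing it, which is why this approach means touching the generated sources.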

Try dlopen("/path/to/the/library", RTLD_LAZY);

How to make gcc/ld iterate over many '-l library' when using -static?

I want to compile pdf2svg statically so that I can use the newest version on stable Debian. Its ./configure doesn't offer an --enable-static option, so I manually added the -static option for the linker in the Makefile.
Unfortunately the result wasn't quite what I expected. The link step produced an enormous number of undefined reference errors. After some googling I figured out that the problem is caused by the wrong order of the -lsome_lib options. The GCC linker tries to statically link each library only once, when it first sees it (see the Stack Overflow question: Why does the order in which libraries are linked sometimes cause errors in GCC?).
Is there a way to make the linker take multiple passes through the list of libraries?

Maybe this is what you are looking for (from the GNU ld man page):
-( archives -)
--start-group archives --end-group
The archives should be a list of archive files. They may be either explicit file names, or -l options.
The specified archives are searched repeatedly until no new undefined references are created. Normally, an archive is searched only once in the order that it is specified on the command line. If a symbol in that archive is needed to resolve an undefined symbol referred to by an object in an archive that appears later on the command line, the linker would not be able to resolve that reference. By grouping the archives, they will all be searched repeatedly until all possible references are resolved.
Using this option has a significant performance cost. It is best to use it only when there are unavoidable circular references between two or more archives.

A trick is, whenever possible, to add a static reference to an object of the class (or to the function) that was not linked, from another .cpp file of the same library (or from another library that is already used).
I have this situation:
library A with class clsA in clsA.cpp that gives the error
library A with foo.cpp that gives no reference errors
library B that uses class clsA
Application uses both libraries and uses classes/functions from foo.cpp
I get the unresolved reference in Application while using the object in library B that uses the clsA class.
Linking Application with libraries A and B gives me the error. Since I use CodeLite, it's hard to change the library order. I simply put a static object in foo.cpp:
#include "clsA.h"
clsA objA;
The linker now sees that clsA is referenced within library A (from foo.cpp) and links it correctly into the application, because foo.cpp was already linked.
The trick works even if the object is created in a dummy function that is never called, so the object is never actually allocated:
// foo.cpp
#include "clsA.h"
void dummyf()
{
    clsA objA;
}

g++ linking order dependency when linking c code to c++ code

Prior to today I had always believed that the order that objects and libraries were passed to g++ during the linking stage was unimportant. Then, today, I tried to link from c++ code to c code. I wrapped all the C headers in an extern "C" block but the linker still had difficulties finding symbols which I knew were in the C object archives.
Perplexed, I created a relatively simple example to isolate the linking error but much to my surprise, the simpler example linked without any problems.
After a little trial and error, I found that by emulating the linking pattern used in the simple example, I could get the main code to link OK. The pattern was object code first, object archives second, e.g.:
g++ -o serverCpp serverCpp.o algoC.o libcrypto.a
Can anyone shed some light on why this might be so? I've never seen this problem when linking ordinary C++ code.

The order in which you specify object files and libraries is VERY important in GCC - if you haven't been bitten by this before, you have led a charmed life. The linker searches for symbols in the order that they appear, so if you have a source file that contains a call to a library function, you need to put it before the library, or the linker won't know that it has to resolve it. Complex use of libraries can mean that you have to specify a library more than once, which is a royal pain to get right.

The library order passed to gcc/g++ does actually matter. If A depends on B, A must be listed first. The reason is that the linker leaves out symbols that aren't referenced, so if it sees library B first, and nothing has referenced it at that point, then it won't link in anything from it at all.

A static library is a collection of object files grouped into an archive. When linking against it, the linker only chooses the objects it needs to resolve any currently undefined symbols. Since the objects are linked in order given on the command line, objects from the library will only be included if the library comes after all the objects that depend on it.
So the link order is very important; if you're going to use static libraries, then you need to be careful to keep track of dependencies, and don't introduce cyclic dependencies between libraries.

You can use --start-group archives --end-group
and write the 2 dependent libraries instead of archives:
gcc main.o -L. -Wl,--start-group -lobj_A -lobj_b -Wl,--end-group
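The answers above can be illustrated with a toy model of how archives are searched. This is only a sketch of the behaviour described here, not the actual ld implementation: Object, Archive, Input and simulate_link are made-up names, and symbols are reduced to plain strings. Explicit object files are always linked, archive members are extracted only if they define a symbol that is undefined at that moment, and the rescan_archives flag loosely mimics wrapping every archive in --start-group ... --end-group.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy object file: the symbols it defines and the symbols it refers to.
struct Object {
    std::set<std::string> defines;
    std::set<std::string> needs;
};

// A static library is an ordered collection of object files.
using Archive = std::vector<Object>;

// One command-line item: explicit object file(s) or an archive.
struct Input {
    bool is_archive;  // false: explicit objects, always linked in
    Archive members;
};

struct LinkState {
    std::set<std::string> defined;
    std::set<std::string> undefined;

    // Link an object: record its definitions, then its unresolved references.
    void add(const Object &obj) {
        for (const auto &s : obj.defines) {
            defined.insert(s);
            undefined.erase(s);
        }
        for (const auto &s : obj.needs)
            if (!defined.count(s)) undefined.insert(s);
    }
};

// Extract only the members that define a currently undefined symbol.
// Returns true if at least one member was extracted.
bool scan_archive(LinkState &st, const Archive &ar) {
    bool extracted = false;
    for (bool progress = true; progress;) {
        progress = false;
        for (const auto &member : ar) {
            bool wanted = false;
            for (const auto &s : member.defines)
                if (st.undefined.count(s)) wanted = true;
            if (wanted) {
                st.add(member);
                progress = extracted = true;
            }
        }
    }
    return extracted;
}

// Process the inputs left to right and return the symbols left undefined.
// With rescan_archives set, archives are searched again until a full pass
// extracts nothing new.
std::set<std::string> simulate_link(const std::vector<Input> &cmdline,
                                    bool rescan_archives = false) {
    LinkState st;
    bool first_pass = true;
    for (bool progress = true; progress; first_pass = false) {
        progress = false;
        for (const auto &in : cmdline) {
            if (in.is_archive) {
                if (scan_archive(st, in.members)) progress = true;
            } else if (first_pass) {
                for (const auto &obj : in.members) st.add(obj);
            }
        }
        if (!rescan_archives) break;
    }
    return st.undefined;
}
```

With a main object that needs a_func, a library A whose member defines a_func and needs b_func, and a library B that defines b_func, the model leaves nothing undefined for the order main, A, B. It leaves b_func undefined for main, B, A unless rescanning is enabled, and a_func undefined when both libraries precede the main object.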

Includes with the Linux GCC Linker

I don't understand how GCC works under Linux. In a source file, when I do a:
#include <math.h>
Does the compiler extract the appropriate binary code and insert it into the compiled executable OR does the compiler insert a reference to an external binary file (a-la Windows DLL?)
I guess a generic version of this question is: Is there an equivalent concept to Windows DLLs under *nix?

Well. When you include math.h the compiler will read the file that contains declarations of the functions and macros that can be used. If you call a function declared in that file (header), then the compiler inserts a call instruction into that place in your object file that will be made from the file you compile (let's call it test.c and the object file created test.o). It also adds an entry into the relocation table of that object-file:
Relocation section '.rel.text' at offset 0x308 contains 1 entries:
Offset Info Type Sym.Value Sym. Name
0000001c 00000902 R_386_PC32 00000000 bar
This would be a relocation entry for a function bar. An entry in the symbol table will be made noting the function is yet undefined:
9: 00000000 0 NOTYPE GLOBAL DEFAULT UND bar
When you link the test.o object file into a program, you need to link against the math library called libm.so. The .so extension is similar to the .dll extension on Windows: it means the file is a shared object. The linker will fix up all the places listed in the relocation table of test.o, replacing its entries with the proper address of the bar function. Depending on whether you use the static version of the library (it's called libm.a then) or the shared one, that fix-up happens at link time or later, at runtime when you actually start your program. In the shared case, the linker also adds an entry to the table of shared libraries needed by the program (it can be shown with readelf -d ./test):
Dynamic section at offset 0x498 contains 22 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libm.so.6]
0x00000001 (NEEDED) Shared library: [libc.so.6]
... ... ...
Now, if you start your program, the dynamic linker will lookup that library, and will link that library to your executable image. In Linux, the program doing this is called ld.so. Static libraries don't have a place in the dynamic section, as they are just linked to the other object files and then they are forgotten about; they are part of the executable from then on.
In reality it is actually much more complex, and I don't understand all of it in detail either. That's the rough plan, though.
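As a small illustration of the split between header and library, here is a minimal sketch. The function name unit_circle_x is made up for this example, and the comments describe what typically happens with GCC on Linux rather than a guarantee.

```cpp
#include <cmath>  // declarations only: tells the compiler what std::cos looks like

// The call below typically ends up as an undefined reference to `cos` in the
// object file. The definition comes from the math library (libm), either
// copied in at link time (static) or bound by the dynamic linker (shared).
double unit_circle_x(double angle_radians) {
    return std::cos(angle_radians);
}
```

If this is linked against the shared math library, readelf -d on the resulting program should list it among the NEEDED entries.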

There are several aspects involved here.
First, header files. The compiler simply includes the content of the file at the location where it was included, nothing more. As far as I know, GCC doesn't even treat standard header files differently (but I might be wrong there).
However, header files might not actually contain the implementation, only its declarations. If the implementation is located somewhere else, you've got to tell the compiler/linker that. By default, you do this by simply passing the appropriate library files to the compiler, or by passing a library name. For example, the following two are equivalent (provided that libcurl.a resides in a directory where it can be found by the linker, and that no libcurl.so sits next to it, since the linker prefers the shared library when both are present):
gcc codefile.c -lcurl
gcc codefile.c /path/to/libcurl.a
This tells the link editor (“linker”) to link your code file against the implementation of the static library libcurl.a (the compiler gcc actually ignores these arguments because it doesn't know what to do with them, and simply passes them on to the linker). However, this is called static linking. There's also dynamic linking, which takes place at startup of your program, and which happens with .dlls under Windows (whereas static libraries correspond to .lib files on Windows). Dynamic library files under Linux usually have the file extension .so.
The best way to learn more about these files is to familiarize yourself with the GCC linker, ld, as well as the excellent toolset binutils, with which you can edit/view library files effortlessly (any binary code files, really).

Is there an equivalent concept to Windows DLLs under *nix?
Yes, they are called "Shared Objects" or .so files. They are dynamically linked into your binary at runtime. On Linux you can use the "ldd" command on your executable to see which shared objects your binary is linked to. You can use ListDLLs from Sysinternals to accomplish the same thing on Windows.

The compiler is allowed to do whatever it pleases, as long as, in effect, it acts as if you'd included the file. (All the compilers I know of, including GCC, simply include a file called math.h.)
And no, it doesn't usually contain the function definitions itself. Those live in libm.so, a "shared object", similar to Windows DLLs. It should be on every system, as it is a companion of libc.so, the C runtime.
Edit: And that's why you have to pass -lm to the linker if you use math functions - it instructs it to link against libm.so.

There is. The include does a textual include of the header file (which is standard C/C++ behavior). What you're looking for is the linker. The -l argument to gcc/g++ tells the linker which libraries to add in. For math (libm.so), you'd use -lm. The common pattern is:
source file: #include <foo.h>
gcc/g++ command line: -lfoo
shared library: libfoo.so
math.h is a slight variation on this theme.
