Includes with the Linux GCC Linker - c++

I don't understand how GCC works under Linux. In a source file, when I do a:
#include <math.h>
Does the compiler extract the appropriate binary code and insert it into the compiled executable, OR does it insert a reference to an external binary file (like a Windows DLL)?
I guess a generic version of this question is: Is there an equivalent concept to Windows DLLs under *nix?

Well. When you include math.h, the compiler reads that file, which contains declarations of the functions and macros you can use. If you call a function declared in that header, the compiler inserts a call instruction at that point in the object file it produces from your source (let's call the source test.c and the object file test.o). It also adds an entry to the relocation table of that object file:
Relocation section '.rel.text' at offset 0x308 contains 1 entries:
Offset Info Type Sym.Value Sym. Name
0000001c 00000902 R_386_PC32 00000000 bar
This is a relocation entry for a function bar. The symbol table also gets an entry noting that the function is not yet defined (UND):
9: 00000000 0 NOTYPE GLOBAL DEFAULT UND bar
When you link the test.o object file into a program, you need to link against the math library, libm.so. The .so extension is similar to the .dll extension on Windows: it marks a shared object file. When linking, the linker fixes up all the places that appear in the relocation table of test.o, replacing them with the proper address of the bar function. If you use the static version of the library (then called libm.a), that fix-up is completed at link time; if you use the shared version, the final binding is done later, at run time, when you actually start your program. In the shared case, the linker also adds an entry to the table of shared libraries needed by the program (it can be shown with readelf -d ./test):
Dynamic section at offset 0x498 contains 22 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libm.so.6]
0x00000001 (NEEDED) Shared library: [libc.so.6]
... ... ...
Now, when you start your program, the dynamic linker looks up that library and links it into your executable image. On Linux, the program doing this is called ld.so. Static libraries don't have an entry in the dynamic section: their object files are linked together with the other object files and then forgotten about; they are part of the executable from then on.
In reality it is much more complex, and I don't understand it in full detail either. That's the rough plan, though.

There are several aspects involved here.
First, header files. The compiler simply includes the content of the file at the location where it was included, nothing more. As far as I know, GCC doesn't even treat standard header files differently (but I might be wrong there).
However, header files usually contain only declarations, not the implementation. If the implementation is located somewhere else, you have to tell the compiler/linker where. You do this by passing the appropriate library files to the compiler, or by passing a library name. For example, the following two are equivalent (provided that libcurl.a is the file the linker finds for -lcurl; if a libcurl.so sits in the same directory, -lcurl picks that one instead):
gcc codefile.c -lcurl
gcc codefile.c /path/to/libcurl.a
This tells the link editor (“linker”) to link your code file against the implementation in the static library libcurl.a (gcc itself does nothing with these arguments except pass them on to the linker). This is called static linking. There's also dynamic linking, which takes place when your program starts and is what happens with .dll files on Windows (whereas static libraries correspond to .lib files on Windows). Dynamic library files on Linux usually have the extension .so.
The best way to learn more about these files is to familiarize yourself with the GNU linker, ld, as well as the excellent binutils toolset, with which you can inspect and modify library files effortlessly (any binary code files, really).

Is there an equivalent concept to Windows DLLs under *nix?
Yes, they are called "shared objects", or .so files. They are linked into your binary dynamically, at run time. On Linux you can run the ldd command on your executable to see which shared objects it is linked against. On Windows, ListDLLs from Sysinternals does something similar.

The compiler is allowed to do whatever it pleases, as long as, in effect, it acts as if you'd included the file. (All the compilers I know of, including GCC, simply include a file called math.h.)
And no, it doesn't usually contain the function definitions themselves. Those are in libm.so, a "shared object", similar to Windows DLLs. It should be on every system, as it is a companion of libc.so, the C runtime.
Edit: And that's why you have to pass -lm to the linker if you use math functions - it instructs it to link against libm.so.

There is. #include does a textual include of the header file (which is standard C/C++ behavior). What you're looking for is the linker. The -l argument to gcc/g++ tells the linker which libraries to add. For math (libm.so), you'd use -lm. The common pattern is:
source file: #include <foo.h>
gcc/g++ command line: -lfoo
shared library: libfoo.so
math.h is a slight variation on this theme.

Related

In the compilation system, how does linker (ld) know who to link myprogram.o to?

I recently read CSAPP and had some doubts about its section on the compilation system.
Take the sample HelloWorld.c (it just prints hello world). The book says that in the preprocessor phase, the "#include <stdio.h>" line is replaced with the contents of that header file. But when I open stdio.h, I find only a declaration of printf() and no concrete implementation. So in the compilation system, when is the actual implementation of printf() brought in?
The book also says that in the linking phase, the linker (ld) links helloworld.o and printf.o. How does the linker know to link my object file with printf.o? And why does the compilation system declare this function in the first step (preprocessor phase) and link the concrete implementation in the last step (linking phase)?
Practically, over-simplified:
You can compile a function into a library (ex. .a or .so file on unix).
The library has a function body (machine instructions) and a function name. E.g. the library libc.so has a printf function that starts at byte 0xaabbccdd of the library file libc.so.
You want to compile your program.
You need to know what arguments printf takes. Does it take an int? A char *? A uint_least64_t? That is in the header file: int printf(const char *, ...);. The header tells the compiler how to call the function (what parameters the function takes and what type it returns). Note that each .c file is compiled separately.
The function declaration (what arguments the function takes and what it returns) is not stored in the library file. It is stored only in the header. The library has only the function name (just printf) and the compiled function body. The header has int printf(const char *, ...); without a function body.
You compile your program. The compiler generates code that pushes arguments of the proper size onto the stack and then takes the value returned by the function from the stack. Now your program is compiled into assembly that looks like: push pointer to "%d\n" on the stack; push some int on the stack; call printf; pop the returned "int" from the stack; rest of the instructions;.
The linker searches through your compiled program and sees call printf. It then says: "Oh, there is no printf body in your code." So it searches the libraries for printf to see where it is. The linker goes through all the libraries you link your program with and finds printf in the standard library: it's in libc.so at address 0xaabbccdd. So the linker replaces call printf with a "go to address 0xaabbccdd in libc.so" kind of instruction.
After all "symbols" (i.e. function names, variable names) are "resolved" (the linker has found them somewhere), you can run your program. The call printf will jump into libc.so at the specified location.
What I have written above is only for illustration purposes.
Why the linker knows to link my object file to printf.o
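Because the compiler notes this inside what it produces, typically called object files (.o): each call to an external function such as printf is recorded as an undefined symbol for the linker to resolve.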
why does it declare this function in the first step ...
To know about it.
... and link the concrete implementation in the last step
Because there is no need to do this earlier.
All the C and C++ standards tell you is that you need to #include a given header file in order to introduce some functionality (on some platforms that might not even be necessary although inclusion is a good idea since then you're writing portable code).
That affords compilers a lot of flexibility.
The linking, if any, will be done automatically. Note that some functions might even be hardcoded into the compiler itself.
By default, the library containing the implementation of printf (libc) is linked into every C program.
By including headers you just specify (for the time being) at compile time that the implementations of the declared functions (inside the header) are somewhere else. And later in the linking phase, those function implementations are 'added' in your code.
Why the linker knows to link my object file to printf.o?
At run time, it is the dynamic linker, ld.so, that knows how to search for and find shared libraries. You can see the search order with man ld.so:
If a shared object dependency does not contain a slash, then it is searched for in the following order:
1. Using the directories specified in the DT_RPATH dynamic section attribute of the binary if present and DT_RUNPATH attribute does not exist. Use of DT_RPATH is deprecated.
2. Using the environment variable LD_LIBRARY_PATH, unless the executable is being run in secure-execution mode (see below), in which case this variable is ignored.
3. Using the directories specified in the DT_RUNPATH dynamic section attribute of the binary if present. Such directories are searched only to find those objects required by DT_NEEDED (direct dependencies) entries and do not apply to those objects' children, which must themselves have their own DT_RUNPATH entries. This is unlike DT_RPATH, which is applied to searches for all children in the dependency tree.
4. From the cache file /etc/ld.so.cache, which contains a compiled list of candidate shared objects previously found in the augmented library path. If, however, the binary was linked with the -z nodeflib linker option, shared objects in the default paths are skipped. Shared objects installed in hardware capability directories (see below) are preferred to other shared objects.
5. In the default path /lib, and then /usr/lib. (On some 64-bit architectures, the default paths for 64-bit shared objects are /lib64, and then /usr/lib64.) If the binary was linked with the -z nodeflib linker option, this step is skipped.
In a compilation system, why does it declare this function in the first step(Pre-processor phase) and link the concrete implementation in the last step(linking phase)?
At the compilation stage, the compiler needs to know what you're going to link to and compile the calls accordingly, so it reads the .h files with the declarations. At the linking stage, only the .o files (and the libraries) are needed.

Why does ld need library that my executable depends on?

I'm trying to build my executable (which depends on the library utils.so) using the following command:
g++ -L/path/to/libutils -lutils -I/path/to/utils_headers executable.cpp -o executable
Actually, I don't have utils.so, only the header files of the utils library.
I'm getting the error:
ld: cannot find -lutils
Does the linker really need access to all the libraries my executable depends on in order to build it? If it does, I'd like to know why.
My executable is a shared library. I'm sure that header files of the utils lib are enough to build it (i.e without having utils.so).
The linkage option -lutils by default directs the linker to search, first in the specified library search directories (-Ldir) and then in its default search directories, for either of the files libutils.so (shared library) or libutils.a (static library), preferring libutils.so if both of them are found in the same search directory.
If such a file is found, the linker stops searching and adds that file to the input files of the linkage, whether or not it resolves any references in the linkage. The linker cannot know whether the file resolves any references if it does not input the file.
If no such file is found, the linker gives the error cannot find -lutils, because you told it to find libutils.{so|a} and it could not.
You say:
My executable is a shared library
But it isn't. Your compile-and-link command:
$ g++ -L/path/to/libutils -lutils -I/path/to/utils_headers executable.cpp -o executable
is not an attempt to link a shared library. It is an attempt to link a program.[1]
This would be an attempt to link a shared library:
$ g++ -shared -I/path/to/utils_headers -o libexecutable.so executable.cpp -L/path/to/libutils -lutils
You cannot link a program with unresolved references. But you can link a shared library with unresolved references.
So, you could link a libexecutable.so like that, or you could link it simply like:
$ g++ -shared -I/path/to/utils_headers -o libexecutable.so executable.cpp
These are two different linkages: if they succeed they produce different output files.
In the first linkage, some symbols will (let's assume) be resolved to definitions provided in libutils.so or libutils.a (whichever one is found), and this will be reflected as follows:
libutils.so is found: The .dynamic section of libexecutable.so contains a DT_NEEDED structure that expresses a runtime dependency on libutils.so. libutils.so will need to be included in any linkage that includes libexecutable.so, but the output file of such a linkage will itself contain a runtime dependency only on libexecutable.so.
libutils.a is found: libexecutable.so itself contains the definitions for all the symbols it uses that are defined by object files in libutils.a.[2] libexecutable.so may be included in subsequent linkages with no need for libutils.{so|a}.
In the second linkage, the .dynamic section of libexecutable.so will not express a runtime dependency on libutils.so, nor will the file contain definitions of any symbols provided by libutils.{so|a}. libutils.so will (again) need to be included in any subsequent linkage that includes libexecutable.so, but the output file of such a linkage will acquire independent runtime dependencies on both libexecutable.so and libutils.so.
But if you specify -lutils in the linkage (or in any linkage) and the linker cannot find libutils.{so|a} in any of its search directories, then you get the error you observe, because you told the linker to input a file whose effects on the linkage can only be determined and implemented if that file is found, and it cannot be found.
[1] An attempt that is likely to fail, because it consumes libraries before the object files that refer to them.
[2] See static-libraries to understand why.
In general, an ELF linker needs a sufficiently accurate representation of the shared object that is linked in. It does not have to be an actually working shared object, just a sufficiently close representation of it. A few things absolutely require data that is not available in the object files being linked:
When compiling C programs, a reference to a global data object of incomplete type does not contain size information. The linker cannot place the object into the data segment unless it obtains the size information from somewhere. By default (when compiling for executables, including PIE) the object needs to be allocated in the data segment on many targets because of the relocations the compiler uses for compiling accesses to global data objects.
Similarly, the link editor might get the alignment of global data objects wrong if it has insufficient information.
Many libraries use symbol versioning. Symbol version information is only available when the link editor can see the shared object. If that information is missing, the link editor will not emit a symbol version, which instructs the dynamic linker to bind the symbol to the base version at run time, leading to subtle bugs.
However, if you only use C function symbols (not data symbols, or the varieties of symbols that C++ requires) and the target library does not use symbol versioning, you can use a stub library for linking. This is a library that defines all the functions you need and has the appropriate soname, but the functions are just dummies which do not actually do anything.

static and dynamic linking using gcc

I've recently been reading about static and dynamic linking, and I understand the differences and how to create a static or dynamic library and link it into my project.
But a question came to my mind that I couldn't answer or find an answer to, as it's quite specific: when I compile this code on Linux,
#include <stdio.h>
int main()
{
printf("hello, world!\n");
}
compiling using this command
[root@host ~]# gcc helloworld.c -o helloworld
Which type of linking is this? Is stdio.h statically or dynamically linked to my project?
Libraries are mostly used as shared resources so, that several different programs can reuse the same pre-compiled code in some manner. Some libraries come as standard libraries which are delivered with the operating system and/or the compiler package. Some libraries come with other third party projects.
When you run just gcc as in your example, you are really running a compiler driver, which calls the different parts of the compilation process in turn and finally links your application with a few standard libraries. The type of library used is chosen based on the options you provide: by default it tries to find dynamic (shared) libraries and, if they are missing, falls back to static ones, unless you tell it to use static libraries only (-static).
When you link against project libraries, you tell gcc/g++ which libraries to use with -lname. It then does the same as with the standard libraries, looking for '.so' first and '.a' second, unless -static is used. You can also pass the full path of the library file, telling it exactly which library to use. There are several other options that control the linking process; see the man pages for g++ and ld.
A library contains real program code and data. It is linked to the main executable (and to other libraries) through symbol tables, which are part of the libraries. A symbol table contains entries for global functions and data.
There is a slight difference in structure between shared and static libraries. A shared library is actually a pre-linked object, similar to an executable image, with some extra information about symbols and relocations (such a library can be loaded at any address in memory and still work correctly). A static library is an archive of '.o' files, ready for a full-blown link.
The usual steps to create a library are to compile multiple parts of your program into '.o' files, which can then be linked into a shared library by 'ld' (or g++) or archived into a .a with 'ar'. Afterwards you can use them for linking as described above.
One object file (.o) is created per .cpp source file. The source file contains code and can include any number of header files, such as 'stdio.h' in your case (or cstdio). These files become part of the source, which is ensured by the cpp preprocessor. The preprocessor expands macros and flattens all the #include hierarchies, so that the compiler sees a single text stream, which it converts into a '.o'. In general, header files should not contain executable code, only declarations and macros, though that is not always true. It does not matter either way, since they become welded into the main source file.
Hope this explains it.
Which type of linking is this? Is stdio.h statically or dynamically linked to my project?
stdio.h is not linked; it is a header file and contains code/text, not compiled objects.
The normal link process prefers the '.so' library over the '.a' archive when both are found in the same directory. Your simple command is linking with the .so (if that is in the correct path) or the .a (if that is found in a path with no .so equivalent).
To achieve static linking, you have several choices, including:
1) copy the '.a' archive to a directory you create, then specify that directory (-L)
2) specify the path to the '.a' in the build command. Boost example:
$(CC) $(CC_FLAGS) $< /usr/local/lib/libboost_chrono.a -o $@ $(LIB_DIRs) $(LIB_NMs)
I have used both techniques, I find the first easier.
Note that archive code might refer to symbols in another archive. You can command the linker to search a library multiple times.
If you let the build link with the .so, this does not pull a copy of the entire .so into the build. Instead, the .so (the entire library) is loaded into memory (if not already there) at run time, when the program starts. For most applications this is considered a 'small' start-up performance hit, as the program's memory map is adjusted (automagically, behind the scenes). Note that the app itself can also control when to load a .so, which is called dynamic loading (dlopen).
Unrelated:
// If your C++ 'Hello World' has no class ... why bother?
#include <iostream>
class Hello_t {
public:
Hello_t() { std::cout << "\n Hello" << std::flush; }
~Hello_t() { std::cout << "World!" << std::endl; }
void operator() () { std::cout << " C++ "; }
};
int main(int, char**) { Hello_t()(); }

Is it possible to artificially induce object file extraction for a given static library?

I was recently reading this answer and noticed that it seems inconvenient for users to have to link static libraries in the correct order.
Is there some flag or #pragma I can pass to gcc when compiling my library so that my library's object files will always be included?
To be more specific, I want it to be the case that the user can link my static library before another library that depends on it, and not wind up with unresolved symbols, in the same way that object files which are explicitly specified on the linker line are.
In particular, I am looking for solutions where the user of the library does not need to do anything special, and merely needs to pass -lMylibrary the way they would link any other library.
Is there some flag or #pragma I can pass to gcc
No.
I want it to be the case that the user can link my static library before another library that depends on it, and not wind up with unresolved symbols, in the same way that object files which are explicitly specified on the linker line are.
Ship your "library" as a single object file. In other words, instead of:
ar ru libMyLibrary.a ${OBJS}
use:
ld -r -o libMyLibrary.a ${OBJS}
In particular, I am looking for solutions where the user of the library does not need to do anything special, and merely needs to pass -lMylibrary the way they would link any other library.
You can name your object file libMyLibrary.a. I believe the linker will search for it using the usual rules, but when it finds it, it will discover that this is an object file, and treat it as such, despite it being "misnamed". This should work at least on Linux and other ELF platforms. I am not sure whether it will work on Windows.

Shared libraries and .h files

I have some doubts about how programs use shared libraries.
When I build a shared library (with the -shared -fPIC switches), I make some functions available to external programs.
Usually I call dlopen() to load the library and then dlsym() to bind those functions to function pointers.
This approach does not involve including any .h file.
Is there a way to avoid dlopen() and dlsym() and just include the .h of the shared library?
I guess this may be how C++ programs use code stored in system shared libraries, i.e. just by including stdlib.h etc.
Nick, I think all the other answers are actually answering your question, which is how you link libraries, but the way you phrase your question suggests you have a misunderstanding of the difference between header files and libraries. They are not the same. You need both, and they are not doing the same thing.
Building an executable has two main phases, compilation (which turns your source into an intermediate form, containing executable binary instructions, but is not a runnable program), and linking (which combines these intermediate files into a single running executable or library).
When you do gcc -c program.c, you are compiling, and you generate program.o. This step is where headers matter. You need to #include <stdlib.h> in program.c to (for example) use malloc and free. (Similarly, you need #include <dlfcn.h> for dlopen and dlsym.) If you don't, the compiler will complain that it doesn't know what those names are. But when you do #include the header, the compiler does not insert the code of the functions you call into program.o; it merely inserts references to them. The reason is to avoid duplicating code: if your program has further files (module1.c, module2.c and so on) that all use malloc, you end up with many references to a single copy of malloc. That single copy lives in the standard library, in either its shared or static form (libc.so or libc.a), but these are not referenced in your source, and the compiler is not aware of them.
The linker is. In the linking phase you do gcc -o program program.o. The linker then searches all the libraries you pass it on the command line and finds the single definition of every function you've called that is not defined in your own code. That is what -l does (as the others have explained): it tells the linker which libraries to use. Their names often have little to do with the headers you used in the previous step. For example, to use dlsym you need libdl.so or libdl.a, so your command line would be gcc -o program program.o -ldl (since glibc 2.34 these functions are part of libc itself, and -ldl is no longer required). To use malloc or most of the functions in the std*.h headers you need libc, but because that library is used by every C program it is linked automatically (as if you had passed -lc).
Sorry if I'm going into a lot of detail but if you don't know the difference you will want to. It's very hard to make sense of how C compilation works if you don't.
One last thing: dlopen and dlsym are not the normal method of linking. They are used for special cases where you want to dynamically determine what behavior you want based on information that is, for whatever reason, only available at runtime. If you know what functions you want to call at compile time (true in 99% of the cases) you do not need to use the dl* functions.
You can link shared libraries just like static ones. They are then searched for when the program is launched. In fact, by default -lXXX prefers libXXX.so to libXXX.a.
You need to give the linker the proper instructions to link your shared library.
The shared library names are like libNAME.so, so for linking you should use -lNAME
Call it libmysharedlib.so and then link your main program as:
gcc -o myprogram myprogram.c -lmysharedlib
If you use CMake to build your project, you can use
TARGET_LINK_LIBRARIES(targetname libraryname)
As in:
TARGET_LINK_LIBRARIES(myprogram mylibrary)
To create the library "mylibrary", you can use
ADD_LIBRARY(targetname sourceslist)
As in:
ADD_LIBRARY(mylibrary ${mylibrary_SRCS})
Additionally, this method is cross-platform (whereas simply passing flags to gcc is not).
Shared libraries (.so) are object files in which the compiled code of the functions/classes/... is stored (in binary form).
Header files (.h) declare those functions/classes/... (their names and types), so that the compiler knows how to call them from the main code; the linker then finds their definitions in the .so.
Therefore, you need both of them.