static and dynamic linking using gcc - c++

I've been recently reading about static and dynamic linking and I understood the differences and how to create static and dynamic library and link it to my project
But, a question came to my mind that I couldn't answer or find answer for it as It's a specific question ... when I compile my code on linux using the line
#include <stdio.h>
int main()
{
printf("hello, world!\n");
}
compiling using this command
[root#host ~]# gcc helloworld.c -o helloworld
which type of linking is this??
so the stdio.h is statically or dynamically linked to my project???

Libraries are mostly used as shared resources so, that several different programs can reuse the same pre-compiled code in some manner. Some libraries come as standard libraries which are delivered with the operating system and/or the compiler package. Some libraries come with other third party projects.
When you run just gcc in the manner of your example, you really run a compiler driver which provides you with few compilation-related functions, calling different parts of the compilation process and finally linking your application with a few standard libraries. The type of the libraries is chosen based on the qualifiers you provide. By default it will try to find dynamic (shared) libraries and if missing will attempt for static. Unless you tell it to use static libs only (-static).
When you link to project libraries you tell the gcc/g++ which libraries to use in a manner (-lname). In such a way it will do the same as with the standard libraries, looking for '.so' first and '.a' second, unless -static is used. You can directly specify the path to the full library name as well, actually telling it which library to use. There are several other qualifiers which control the linking process, please look man for 'g++' and 'ld'.
A library must contain real program code and data. The way it is linked to the main executable (and other libraries) is through symbol tables which are parts of the libraries. A symbol table contains entries for global functions an data.
There is a slight difference in the structure of the shared and static libs. The former one is actually a pre-linked object, similar to an executable image with some extra info related to the symbols and relocation (such a library can be loaded at any address in the memory and still should work correctly). The static library is actually an archive of '.o' files, ready for a full-blown linking.
The usual steps to create a library is to compile multiple parts of your program into '.o' files which in turn could be linked in a shared library by 'ld' (or g++) or archived in .a with 'ar'. Afterwards you can use them for linking in a manner described above.
An object file (.o) is created one per a .cpp source file. The source file contains code and can include any number of header files, as 'stdio.h' in your case (or cstdio) or whatever. These files become a part of the source which is insured by the cpp preprocessor. The latter takes care of macros and flattening all the #include hierarchies so that the compiler sees only a single text stream which it converts into '.o'. In general header files should not contain executable code, but declarations and macros, though it is not always true. But it does not matter since they become welded with the main source file.
Hope this would explain it.

which type of linking is this?? so the stdio.h is statically or
dynamically linked to my project???
stdio.h is not linked, it is a header file, and contains code / text, no compiled objects.
The normal link process prefers the '.so' library over the '.a' archive when both are found in the same directory. Your simple command is linking with the .so (if that is in the correct path) or the .a (if that is found in a path with no .so equivalent).
To achieve static linking, you have several choices, including
1) copy the '.a' archive to a directory you create, then specify that
directory (-L)
2) specify the path to the '.a' in the build command. Boost example:
$(CC) $(CC_FLAGS) $< /usr/local/lib/libboost_chrono.a -o $# $(LIB_DIRs) $(LIB_NMs)
I have used both techniques, I find the first easier.
Note that archive code might refer to symbols in another archive. You can command the linker to search a library multiple times.
If you let the build link with the .so, this does not pull in a copy of the entire .so into the build. Instead, the .so (the entire lib) is loaded into memory (if not already there) at run-time, after the program starts. For most applications, this is considered a 'small' start-up performance hit as the program adjusts its memory map (auto-magically behind the scenes) Note that the app itself can control when to load the .so, called dynamic library.
Unrelated:
// If your C++ 'Hello World' has no class ... why bother?
#include <iostream>
class Hello_t {
public:
Hello_t() { std::cout << "\n Hello" << std::flush; }
~Hello_t() { std::cout << "World!" << std::endl; }
void operator() () { std::cout << " C++ "; }
};
int main(int, char**) { Hello_t()(); }

Related

How dynamic linking works, its usage and how and why you would make a dylib

I have read several posts on stack overflow and read about dynamic linking online. And this is what I have taken away from all those readings -
Dynamic linking is an optimization technique that was employed to take full advantage of the virtual memory of the system. One process can share its pages with other processes. For example the libc++ needs to be linked with all C++ programs but instead of copying over the executable to every process, it can be linked dynamically with many processes via shared virtual pages.
However this leads me to the following questions
When a C++ program is compiled. It needs to have references to the C++ library functions and code (say for example the code of the thread library). How does the compiler make the executable have these references? Does this not result in a circular dependency between the compiler and the operating system? Since the compiler has to make a reference to the dynamic library in the executable.
How and when would you use a dynamic library? How do you make one? What is the specific compiling command that is used to produce such a file from a standard *.cpp file?
Usually when I install a library, there is a lib/ directory with *.a files and *.dylib (on mac-OSX) files. How do I know which ones to link to statically as I would with a regular *.o file and which ones are supposed to be dynamically linked with? I am assuming the *.dylib files are dynamic libraries. Which compiler flag would one use to link to these?
What are the -L and -l flags for? What does it mean to specify for example a -lusb flag on the command line?
If you feel like this question is asking too many things at once, please let me know. I would be completely ok with splitting this question up into multiple ones. I just ask them together because I feel like the answer to one question leads to another.
When a C++ program is compiled. It needs to have references to the C++
library functions and code (say for example the code for the library).
Assume we have a hypothetical shared library called libdyno.so. You'll eventually be able to peek inside it using using objdump or nm.
objdump --syms libdyno.so
You can do this today on your system with any shared library. objdump on a MAC is called gobjdump and comes with brew in the binutils package. Try this on a mac...
gobjdump --syms /usr/lib/libz.dylib
You can now see that the symbols are contained in the shared object. When you link with the shared object you typically use something like
g++ -Wall -g -pedantic -ldyno DynoLib_main.cpp -o dyno_main
Note the -ldyno in that command. This is telling the compiler (really the linker ld) to look for a shared object file called libdyno.so wherever it normally looks for them. Once it finds that object it can then find the symbols it needs. There's no circular dependency because you the developer asked for the dynamic library to be loaded by specifying the -l flag.
How and when would you use a dynamic library? How do you make one? As in what
is the specific compiling command that is used to produce such a file from a
standard .cpp file
Create a file called DynoLib.cpp
#include "DynoLib.h"
DynamicLib::DynamicLib() {}
int DynamicLib::square(int a) {
return a * a;
}
Create a file called DynoLib.h
#ifndef DYNOLIB_H
#define DYNOLIB_H
class DynamicLib {
public:
DynamicLib();
int square(int a);
};
#endif
Compile them to be a shared library as follows. This is linux specific...
g++ -Wall -g -pedantic -shared -std=c++11 DynoLib.cpp -o libdyno.so
You can now inspect this object using the command I gave earlier ie
objdump --syms libdyno.so
Now create a file called DynoLib_main.cpp that will be linked with libdyno.so and use the function we just defined in it.
#include "DynoLib.h"
#include <iostream>
using namespace std;
int main(void) {
DynamicLib *lib = new DynamicLib();
std::cout << "Square " << lib->square(1729) << std::endl;
return 1;
}
Compile it as follows
g++ -Wall -g -pedantic -L. -ldyno DynoLib_main.cpp -o dyno_main
./dyno_main
Square 2989441
You can also have a look at the main binary using nm. In the following I'm seeing if there is anything with the string square in it ie is the symbol I need from libdyno.so in any way referenced in my binary.
nm dyno_runner |grep square
U _ZN10DynamicLib6squareEi
The answer is yes. The uppercase U means undefined but this is the symbol name for our square method in the DynamicLib Class that we created earlier. The odd looking name is due to name mangling which is it's own topic.
How do I know which ones to link to statically as I would with a regular
.o file and which ones are supposed to be dynamically linked with?
You don't need to know. You specify what you want to link with and let the compiler (and linker etc) do the work. Note the -l flag names the library and the -L tells it where to look. There's a decent write up on how the compiler finds thing here
gcc Linkage option -L: Alternative ways how to specify the path to the dynamic library
Or have a look at man ld.
What are the -L and -l flags for? What does it mean to specify
for example a -lusb flag on the command line?
See the above link. This is from man ld..
-L searchdir
Add path searchdir to the list of paths that ld will search for
archive libraries and ld control scripts. You may use this option any
number of times. The directories are searched in the order in which
they are specified on the command line. Directories specified on the
command line are searched before the default directories. All -L
options apply to all -l options, regardless of the order in which the
options appear. -L options do not affect how ld searches for a linker
script unless -T option is specified.`
If you managed to get here it pays dividends to learn about the linker ie ld. It plays an important job and is the source of a ton of confusion because most people start out dealing with a compiler and think that compiler == linker and this is not true.
The main difference is that you include static linked libraries with your app. They are linked when you build your app. Dynamic libraries are linked at run time, so you do not need to include them with your app. These days dynamic libraries are used to reduce the size of apps by having many dynamic libraries on everyone's computer.
Dynamic libraries also allow users to update libraries without re-building the client apps. If a bug is found in a library that you use in your app and it is statically linked, you will have to rebuild your app and re-issue it to all your users. If a bug is found in a dynamically linked library, all your users just need to update their libraries and your app does not need an update.

How library classes are instantiated

I'm going to ask how it's done in c++, but this idea can apply to multiple languages. If you know how to do it in objective-c as well, please provide any similarities between the two
Lets say I want to create an instance of an ofstream like
ofstream myfile;
I'm assuming all I have on my computer is the *.o file (in a library archive) and the *.h file for iostream class. If this part isn't true let me know. I am assuming this when all I have installed is the runtime and the devel packages, not the source files.
How does it connect the header file to the object file, is there a naming scheme. And where does it look and in what order.?
Why this is confusing me is normally when I want to create a class I link my implementation of the class with the program, so where does it now and how does it now to link the files?
One more, does it matter if it loaded statically or dynamically?
Thank you in advance, and sry if this is a silly question.
Computer Science 101:
Broadly speaking (VERY broadly!), there are two kinds of "programs":
a) Interpreted: you read the program source line-by-line every time you execute it
<= *nix shell scripts and DOS .bat files are "interpeted"
b) Compiled: you read the source once (to convert it into a "binary machine code"). You link the machine code "object files" to build an "executable program".
You're talking about "compiled programs"
The "ofstream" part is irrelevant once the program is "compiled"
The binary implementation for "ofstream" can be compiled directly into the executable, or it can be dynamically loaded from a shared library (.dll) at runtime.
A "compiler" users ".h" headers to process the source file.
A "linker" uses ".lib" libraries to match symbols and link static code at link type.
The "Operating System" recognizes dynamic links and loads the needed shared libraries (.dll's) at runtime.
Three different things, all independent of each other: Compiler/source code, Linker/machine object code, OS/executable programs
'Hope that helps .. a bit...
This is not standardized and it's up to the implementation. I don't know about *unix, but I assume it's fairly similar to Windows.
You can assume that .o files are similar to library files .lib.
The header does define the class definition, so that the linker knows what to look for in the library.
Say you have a header:
class A
{
public:
A();
void foo();
};
and a lib file A.lib.
You include that header and call:
A a;
a.foo();
The compiler finds the declarations for bot A() and A::foo(). Now it knows it has to search the library for these functions. Names in the library are decorated, and contain modifiers, but its specific to the compiler so the linker finds the functions if they are exported in the library. It then binds the functions to the specific entry point from the dll.
If by dynamic loading you mean using LoadModule() and GetProcAddress() instead of linking, than the concept is pretty similar.
If you do static linking all symbols with linkage are available in the .obj file. The linker binds the calls of the functions to the entry points of the functions. There is a name mangeling involved in this process so that the symbols can be resolved correctly.
Dynamic linking is a platform dependent issue and not part of the C or C++ standard as far as I know.

Shared libraries and .h files

I have some doubt about how do programs use shared library.
When I build a shared library ( with -shared -fPIC switches) I make some functions available from an external program.
Usually I do a dlopen() to load the library and then dlsym() to link the said functions to some function pointers.
This approach does not involve including any .h file.
Is there a way to avoid doing dlopen() & dlsym() and just including the .h of the shared library?
I guess this may be how c++ programs uses code stored in system shared library. ie just including stdlib.h etc.
Nick, I think all the other answers are actually answering your question, which is how you link libraries, but the way you phrase your question suggests you have a misunderstanding of the difference between headers files and libraries. They are not the same. You need both, and they are not doing the same thing.
Building an executable has two main phases, compilation (which turns your source into an intermediate form, containing executable binary instructions, but is not a runnable program), and linking (which combines these intermediate files into a single running executable or library).
When you do gcc -c program.c, you are compiling, and you generate program.o. This step is where headers matter. You need to #include <stdlib.h> in program.c to (for example) use malloc and free. (Similarly you need #include <dlfcn.h> for dlopen and dlsym.) If you don't do that the compiler will complain that it doesn't know what those names are, and halt with an error. But if you do #include the header the compiler does not insert the code for the function you call into program.o. It merely inserts a reference to them. The reason is to avoid duplication of code: The code is only going to need to be accessed once by every part of your program, so if you needed further files (module1.c, module2.c and so on), even if they all used malloc you would merely end up with many references to a single copy of malloc. That single copy is present in the standard library in either it's shared or static form (libc.so or libc.a) but these are not referenced in your source, and the compiler is not aware of them.
The linker is. In the linking phase you do gcc -o program program.o. The linker will then search all libraries you pass it on the command line and find the single definition of all functions you've called which are not defined in your own code. That is what the -l does (as the others have explained): tell the linker the list of libraries you need to use. Their names often have little to do with the headers you used in the previous step. For example to get use of dlsym you need libdl.so or libdl.a, so your command-line would be gcc -o program program.o -ldl. To use malloc or most of the functions in the std*.h headers you need libc, but because that library is used by every C program it is automatically linked (as if you had done -lc).
Sorry if I'm going into a lot of detail but if you don't know the difference you will want to. It's very hard to make sense of how C compilation works if you don't.
One last thing: dlopen and dlsym are not the normal method of linking. They are used for special cases where you want to dynamically determine what behavior you want based on information that is, for whatever reason, only available at runtime. If you know what functions you want to call at compile time (true in 99% of the cases) you do not need to use the dl* functions.
You can link shared libraries like static one. They are then searched for when launching the program. As a matter of fact, by default -lXXX will prefer libXXX.so to libXXX.a.
You need to give the linker the proper instructions to link your shared library.
The shared library names are like libNAME.so, so for linking you should use -lNAME
Call it libmysharedlib.so and then link your main program as:
gcc -o myprogram myprogram.c -lmysharedlib
If you use CMake to build your project, you can use
TARGET_LINK_LIBRARIES(targetname libraryname)
As in:
TARGET_LINK_LIBRARIES(myprogram mylibrary)
To create the library "mylibrary", you can use
ADD_LIBRARY(targetname sourceslist)
As in:
ADD_LIBRARY(mylibrary ${mylibrary_SRCS})
Additionally, this method is cross-platform (whereas simply passing flags to gcc is not).
Shared libraries (.so) are object files where the actual source code of function/class/... are stored (in binary)
Header files (.h) are files indicating (the reference) where the compiler can find function/class/... (in .so) that are required by the main code
Therefore, you need both of them.

static link library

I am writing a hello world c++ application, in the instruction #include help the compiler or linker to import the c++ library. My " cout << "hello world"; " use a cout in the library. The question is after compile and generated exe is about 96k in size, so what instructions are actually contained in this exe file, does this file also contains the iostream library?
Thanks
In the general case, the linker will only bring in what it needs. Once the compiler phase has turned your source code into an object file, it's treated much the same as all other object files. You have:
the C start-up code which prepares the execution environment (sets up argv, argv and so on) then calls your main or equivalent.
your code itself.
whatever object files need to be dragged in from libraries (dynamic linking is a special case of linking that happens at runtime and I won't cover that here since you asked specifically about static linking).
The linker will include all the object files you explicitly specify (unless it's a particularly smart linker and can tell you're not using the object file).
With libraries, it's a little different. Basically, you start with a list of unresolved symbols (like cout). The linker will search all the object files in all the libraries you specify and, when it finds an object file that satisfies that symbol, it will drag it in and fix up the symbol references.
This may, of course, add even more unresolved symbols if, for example, there was something in the object file that relies on the C printf function (unlikely but possible).
The linker continues like this until all symbols are satisfied (when it gives you an executable) or one cannot be satisfied (when it complains to you bitterly about your coding practices).
So as to what is in your executable, it may be the entire iostream library or it may just be the minimum required to do what you asked. It will usually depend on how many object files the iostream library was built into.
I've seen code where an entire subsystem went into one object file so, that if you wanted to just use one tiny bit, you still got the lot. Alternatively, you can put every single function into its own object file and the linker will probably create an executable as small as possible.
There are options to the linker which can produce a link map which will show you how things are organised. You probably won't generally see it if you're using the IDE but it'll be buried deep within the compile-time options dialogs under MSVC.
And, in terms of your added comment, the code:
cout << "hello";
will quite possibly bring in sizeable chunks of both the iostream and string processing code.
Use cl /EHsc hello.cpp -link /MAP. The .map file generated will give you a rough idea which pieces of the static library are present in the .exe.
Some of the space is used by C++ startup code, and the portions of the static library that you use.
In windows, the library or part of the libraries (which are used) are also usually included in the .exe, the case is different in case of Linux. However, there are optimization options.
I guess this Wiki link will be useful : Static Libraries

Includes with the Linux GCC Linker

I don't understand how GCC works under Linux. In a source file, when I do a:
#include <math.h>
Does the compiler extract the appropriate binary code and insert it into the compiled executable OR does the compiler insert a reference to an external binary file (a-la Windows DLL?)
I guess a generic version of this question is: Is there an equivalent concept to Windows DLLs under *nix?
Well. When you include math.h the compiler will read the file that contains declarations of the functions and macros that can be used. If you call a function declared in that file (header), then the compiler inserts a call instruction into that place in your object file that will be made from the file you compile (let's call it test.c and the object file created test.o). It also adds an entry into the relocation table of that object-file:
Relocation section '.rel.text' at offset 0x308 contains 1 entries:
Offset Info Type Sym.Value Sym. Name
0000001c 00000902 R_386_PC32 00000000 bar
This would be a relocation entry for a function bar. An entry in the symbol table will be made noting the function is yet undefined:
9: 00000000 0 NOTYPE GLOBAL DEFAULT UND bar
When you link the test.o object file into a program, you need to link against the math library called libm.so . The so extension is similar to the .dll extension for windows. It means it is a shared object file. The compiler, when linking, will fix-up all the places that appear in the relocation table of test.o, replacing its entries with the proper address of the bar function. Depending on whether you use the shared version of the library or the static one (it's called libm.a then), the compiler will do that fix-up after compiling, or later, at runtime when you actually start your program. When finished, it will inject an entry in the table of shared libraries needed for that program. (can be shown with readelf -d ./test):
Dynamic section at offset 0x498 contains 22 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libm.so.6]
0x00000001 (NEEDED) Shared library: [libc.so.6]
... ... ...
Now, if you start your program, the dynamic linker will lookup that library, and will link that library to your executable image. In Linux, the program doing this is called ld.so. Static libraries don't have a place in the dynamic section, as they are just linked to the other object files and then they are forgotten about; they are part of the executable from then on.
In reality it is actually much more complex and i also don't understand this in detail. That's the rough plan, though.
There are several aspects involved here.
First, header files. The compiler simply includes the content of the file at the location where it was included, nothing more. As far as I know, GCC doesn't even treat standard header files differently (but I might be wrong there).
However, header files might actually not contain the implementation, only its declaration. If the implementation is located somewhere else, you've got to tell the compiler/linker that. By default, you do this by simply passing the appropriate library files to the compiler, or by passing a library name. For example, the following two are equivalent (provided that libcurl.a resides in a directory where it can be found by the linker):
gcc codefile.c -lcurl
gcc codefile.c /path/to/libcurl.a
This tells the link editor (“linker”) to link your code file against the implementation of the static library libcurl.a (the compiler gcc actually ignores these arguments because it doesn't know what to do with them, and simply passes them on to the linker). However, this is called static linking. There's also dynamic linking, which takes place at startup of your program, and which happens with .dlls under Windows (whereas static libraries correspond to .lib files on Windows). Dynamic library files under Linux usually have the file extension .so.
The best way to learn more about these files is to familiarize yourself with the GCC linker, ld, as well as the excellent toolset binutils, with which you can edit/view library files effortlessly (any binary code files, really).
Is there an equivalent concept to Windows DLLs under *nix?
Yes they are called "Shared Objects" or .so files. They are dynamically linked into your binary at runtime. In linux you can use the "ldd" command on your executable to see which shared objects your binary is linked to. You can use ListDLLs from sysinternals to accomplish the same thing in windows.
The compiler is allowed to do whatever it pleases, as long as, in effect, it acts as if you'd included the file. (All the compilers I know of, including GCC, simply include a file called math.h.)
And no, it doesn't usually contain the function definitions itself. That's libm.so, a "shared object", similar to windows .DLLs. It should be on every system, as it is a companion of libc.so, the C runtime.
Edit: And that's why you have to pass -lm to the linker if you use math functions - it instructs it to link against libm.so.
There is. The include does a textual include of the header file (which is standard C/C++ behavior). What you're looking for is the linker . The -l argument to gcc/g++ tells the linker what library(ies) to add in. For math (libm.so), you'd use -lm. The common pattern is:
source file: #include <foo.h>
gcc/g++ command line: -lfoo
shared library: libfoo.so
math.h is a slight variation on this theme.