I commonly hear the term "to link against a library".
I'm new to compilers and thus linking, so I would like to understand this a bit more.
What does it mean to link against a library and when would not doing so cause a problem?
A library is an "archive" that contains already compiled code. Typically, you want to use a ready-made library to use some functionality that you don't want to implement on your own (e.g. decoding JPEGs, parsing XML, providing you with GUI widgets, you name it).
Typically, in C and C++, using a library goes like this: you #include the library's headers, which contain the function/class declarations, i.e. they tell the compiler that the symbols you need exist somewhere, without providing their code. Whenever you use one of them, the compiler places a placeholder in the object file, saying that the call is to be resolved at link time, when the rest of the object modules are available.
Then, at the moment of linking, you have to specify the actual library where the compiled code for the functions of the library is to be found; the linker then will link this compiled code with yours and produce the final executable (or, in the case of dynamic libraries, it will add the relevant information for the loader to perform the dynamic linking at runtime).
If you don't specify that the library is to be linked against, the linker will have unresolved references - i.e. it will see that some functions were declared, you used them in your code, but their implementation is nowhere to be found; this is the cause of the infamous "undefined reference errors".
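To make this concrete, here is a minimal sketch that squeezes the three pieces into one file so it builds on its own; the names mylib_add and client_code are hypothetical. In a real project the declaration would sit in the library's header, the definition in the library's compiled code, and the caller in your own .cpp. Compiling your .cpp (e.g. with g++ -c) needs only the declaration; the definition is needed only when linking.

```cpp
// Minimal sketch: three files collapsed into one so it builds standalone.

// --- would be the library's header (hypothetical mylib.h) ---
// Declaration only: tells the compiler the symbol exists somewhere.
// No machine code is generated for this line.
int mylib_add(int a, int b);

// --- would be your own .cpp file ---
// Compiles using only the declaration above; the object file records an
// unresolved reference to mylib_add for the linker to fill in.
int client_code() {
    return mylib_add(2, 3);
}

// --- would be the library's .cpp (compiled into, e.g., libmylib.a) ---
// The definition the linker uses to resolve that reference. If this part
// lived in a library you forgot to link against, compiling would still
// succeed, but linking a program that calls client_code would fail with an
// "undefined reference" (GNU toolchain) or "unresolved external symbol"
// (MSVC) error.
int mylib_add(int a, int b) {
    return a + b;
}
```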
Notice that this whole process is identical to what normally happens when you compile a project that is made of multiple .cpp files: each .cpp is compiled independently (knowing of the functions defined in the others only via prototypes, typically written in .h files), and at the end everything is linked together to produce the final executable.
Related
I'm new to C++ programming and I'm a little confused about how the compiler includes standard libraries in a C++ program. Say, for example, I want to use the sqrt() function. I know that I have to include the math.h header file in my source code, but the math library contains many functions other than sqrt(). So my question is: is the source code of all these functions added to the program, which is unnecessary, or just the function that I need?
I hope my question was clear and thanks in advance.
Functions that are NOT templates (and not so trivial that they are just one or two lines) are compiled separately, and then stored in a "library" (which is not the header file; the header just contains double sqrt(double); or some such).
The compiler will (given the right compile-time flags) link to the C library that contains those functions. The linker [called upon by the compiler] will then introduce the code that was compiled when the library was built. So, typically, the source is not compiled when you build your program - it was done some other time.
The linker understands what functions are needed by the code you are building, so will only add those functions to your program, not ALL of the functions [but it may pull in some other functions than the precise one that you asked for, for example there may be some helper functions and perhaps some generic error handling functions that are needed by sqrt].
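As a small illustration, the function below uses only std::sqrt from <cmath> (the name hypotenuse is just for the example). The header supplies the declaration; the compiled code comes from the C math library at link time. On glibc systems that is libm, which g++ normally links in automatically but which C programs built with gcc often have to request with -lm. Depending on optimization flags, the compiler may even replace the call with a single CPU instruction, in which case little or nothing is pulled from the library for it.

```cpp
#include <cmath>

// The only math-library function this code refers to is sqrt, so that is
// what the linker needs to resolve from the math library; cos, exp, log and
// the rest are never pulled in on behalf of this function.
double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}
```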
No, linking means that the linker figures out which symbols (functions and data objects) from your library are necessary to build your program, and then includes only those.
In fact, with dynamic linking, it wouldn't even include the function itself, but just the reference to the function and how to load the library containing it.
Generally, libraries that are linked with your executables aren't source code, but binary objects, which already have been translated to machine language ("compiled").
You have a confusion between libraries and header files. Libraries are the implementations. Header files contain the declarations.
You #include a library's header file so that the compiler knows the declarations (the names, parameter types and return types) of the functions you use.
All the declarations (unless blocked by preprocessor directives) are parsed by the compiler and stored in a dictionary. The only cost of declarations you don't use is that they take up room in the compiler's dictionary. Usually this is not an issue (modern compilers have large-capacity dictionaries).
As far as adding functions to your program, that is handled during the linking phase (usually by a separate linker application). This is compiler dependent. Fundamentally, only the functions used by your program (more precisely, the object modules that contain them) are pulled from the library (static libraries only) and placed into your executable. Some compilers may speed up the build process by including groups of popular functions that you may not use. This speeds up the build process but makes your executables larger.
Some library functions may use other library functions. This means that a library function may add a lot more code to your executable. One example is printf. The printf function requires a lot of support, more than puts, because of all the formatting specifiers. So printf may pull in other (internal) library functions.
EDIT: I know about include guards, but include files are not the issue here. I'm talking about actual compiled and already linked code that gets baked into the static library.
I'm creating a general-purpose utility library for myself in C++.
One of the functions I'm creating, printFile, requires string, cout and other such members of the standard library.
I'm worried that when the library is compiled, and then linked to another project that also uses string and cout, the code for string and cout will be duplicated: it will both be prelinked in the library binary the program is being linked with, and it will be again linked with the project that uses them itself.
The library is structured like this:
There is one libname.hpp file the programmer who uses the library is supposed to #include in his projects.
For every function fname declared in libname.hpp, there is a file fname.cpp implementing it.
All fname.cpp files also #include "libname.hpp".
The library itself compiles into libname.a which is copied to /usr/lib/.
Will this even happen?
If yes, is it a problem at all?
If yes, then how can I avoid this?
I'm worried that when the library is compiled, and then linked to another project that also uses string and cout, the code for string and cout will be duplicated
Don't worry: no modern compilation system will do that. The code for template functions is emitted into object files, but the linker discards duplicate entries.
The library definitions of the standard C++ library won't show up in your own static library unless you explicitly include them there (i.e., you extract object files from the standard C++ library and include them in your library). Static libraries are not linked at all; they simply contain undefined references to symbols from other libraries. A static library is merely a collection of object files defining the symbols provided by the library. The definitions that come from the headers, e.g., inline functions and template instantiations, will be defined in such a way that multiple definitions in multiple translation units won't conflict. Where the code isn't actually inlined, they are emitted as "weak" symbols, so duplicates are ignored or removed at link time.
The only real concern is that the libraries linked into an executable need to use compatible library definitions. With a substantial amount of code residing in header files, C++ headers, including the standard C++ library headers, change relatively frequently (compared with the C library headers, which contain a lot less code).
Yes, the code for standard library things will be duplicated. It can be a problem if for example you return a std::string or take one as a parameter in one of your methods. It may have a different layout in your standard library implementation than in the user's.
This is rarely a problem in practice.
For static functions and inline templated functions defined in header files, there's nothing to worry about: every compilation unit gets its own copy (e.g. within the .a library there may already be many anonymous copies). This is okay because these definitions aren't exported, so the linker doesn't need to worry about them.
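A sketch of what this means for code defined in headers (the header and function names are hypothetical). Picture this as one header included by several .cpp files that end up in the same program:

```cpp
// Imagine this is a header (hypothetical util.hpp) included by many .cpp files.
#include <string>

// static: internal linkage. Every translation unit gets its own private copy;
// the copies are invisible to the linker, so they never clash.
static int triple(int x) { return 3 * x; }

// inline: each translation unit that uses it may emit a copy, marked so that
// the linker keeps one and discards the rest (a weak/COMDAT symbol).
inline std::string greet(const std::string& name) { return "hello, " + name; }

// Templates: implicit instantiations are treated like inline functions.
template <typename T>
T twice(const T& x) { return x + x; }

// By contrast, a plain definition such as
//     int square(int x) { return x * x; }
// in a header included by two .cpp files gives a "multiple definition" error
// at link time, because both object files export the same symbol.
```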
For functions that are declared with external (non-static) linkage, whether you have an issue depends on how you link the .a library.
When you build the library, you typically will not link in the standard C++ library. The created library will contain undefined references to the standard C++ library. These must be resolved before building the final executable binary. This is normally done automatically when linking that final binary in the default way (depending on the compiler).
There are times when people do link in the standard C++ library into a static library. If you're linking against multiple static libraries that each embed another library (like the standard C++ library), then expect trouble if there are any differences in those embedded libraries. Fortunately, this is a rare problem, at least with the gcc toolchain. It's a more frequent problem with Microsoft's tools.
In some cases, a workaround is to make one or more conflicting static libraries into a dynamic library. This way each of these dynamic libraries can statically link its own copy of the problematic library. As long as the dynamic library doesn't export the symbols from the problematic library and there are no memory layout incompatibilities, there generally isn't any trouble.
According to the book I'm reading:
After examining a program syntax the C++ compiler creates .obj file. Next the compiler calls the linker that combines program statements inside your .obj files with some functions such as printf().
Are functions not part of the .obj file? Are they not statements?
Does the linker have a connection with the terms "static linking" and "dynamic linking"?
I know that dynamic linking is resolved at runtime, but according to the book the linker is called at compile time.
Functions that are defined in your .cpp are present in the corresponding .obj. Functions that are used but not defined there (such as standard library functions like printf) aren't part of it. The linker resolves those references using other .obj files and libraries.
static libraries are just a collection of .obj files; the linker takes the .obj files that provide the needed symbols and puts them in the executable;
dynamic libraries aren't put in the executable; the executable is marked as referencing them, and they are located and loaded when the executable starts. (At least in their main use; they may also be used for plugins, in which case they are searched for when the process asks for them.)
Well technically there's really no such thing as "dynamic linking" as something done by the linker. There's really only manually binding to a piece of code at run time, which really has nothing to do with the linker.
For example, under Windows there are a few ways of dealing with a DLL:
The lowest-level solution is to use LoadLibrary (or AfxLoadLibrary) to load the DLL manually and then GetProcAddress to access each function by name, casting the result to a function pointer of the appropriate type.
You can use an import lib. This allows the linker to resolve functions in other DLLs at link time, so you can call a function in the DLL directly (i.e. just by writing Foo() in client code). However, under the hood this relies on the same mechanism as the LoadLibrary method mentioned above: the DLL is loaded if it isn't already (at program start-up, or on first call when delay-loading is used), the function's address is looked up, and the call goes through that function pointer.
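On Linux and other POSIX systems, the counterparts of LoadLibrary/GetProcAddress/FreeLibrary are dlopen/dlsym/dlclose. The sketch below uses them in place of the Windows calls, assuming a glibc system where the math library is available as libm.so.6 (glibc older than 2.34 also needs -ldl when linking); the function name call_cos_dynamically is hypothetical. It shows the lowest-level approach: load the library at run time, look a function up by name, and cast the result to the right function-pointer type.

```cpp
// Sketch assuming Linux with glibc: dlopen/dlsym/dlclose are the POSIX
// counterparts of LoadLibrary/GetProcAddress/FreeLibrary on Windows.
#include <cstdio>
#include <dlfcn.h>

// Function-pointer type matching the C signature: double cos(double).
using CosFn = double (*)(double);

// Prints the most recent dl* error (dlerror may return null).
static void report(const char* what) {
    const char* err = dlerror();
    std::fprintf(stderr, "%s failed: %s\n", what, err ? err : "unknown error");
}

// Loads the math library at run time, looks up "cos" by name and calls it.
// Returns false if either the library or the symbol cannot be found.
bool call_cos_dynamically(double x, double* result) {
    void* handle = dlopen("libm.so.6", RTLD_NOW);  // ~ LoadLibrary
    if (handle == nullptr) {
        report("dlopen");
        return false;
    }
    // ~ GetProcAddress: look the function up by name, then cast the returned
    // address to the matching function-pointer type.
    auto fn = reinterpret_cast<CosFn>(dlsym(handle, "cos"));
    if (fn == nullptr) {
        report("dlsym");
        dlclose(handle);
        return false;
    }
    *result = fn(x);
    dlclose(handle);  // ~ FreeLibrary
    return true;
}
```

The linker never resolves cos for this call; the name is looked up only when call_cos_dynamically runs.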
Pretty much the title sums it up.
I'm not sure about the difference between the two if I'd like to use a library.
Thanks!
In general, you need both.
Include files contain declarations of types, prototypes of functions, inline functions, #defines, ..., in general every information about the library the compiler needs to be aware of when compiling your files.
Static libraries, instead, contain the actual object code of the functions of the library. If the headers contain the prototypes, the static libraries contain the (compiled) definitions of the functions, i.e. the object modules that the linker will link with yours.
If you only included the header file without linking against the static library, the linker would complain about missing definitions, because you would be using functions declared in the header, but not defined anywhere (i.e. with no implementation). On the other hand, if you only linked the static library without providing the header, the compiler would complain about unknown identifiers, since it wouldn't have a clue about the library symbols you're using.
The concept is very similar to when you compile a multi-file project: to access the definitions written in other .cpp you need to include just a header with their declarations, and the linker in the end links together the various object modules.
As far as DLLs are concerned, usually an import library is provided; import libraries are like static libraries, but, instead of containing all the code of the library, they contain small stubs that call the functions in the DLL. Every time a call to a library function is encountered in one of your object modules, the linker directs it to the stub, which in turn redirects it to the code in the DLL¹. All in all, when dealing with DLLs on Windows you usually have a .h (prototypes/...), a .lib (import library you link against, contains the stubs) and a .dll (dynamic-link library containing the actual code of the library).
By the way, some libraries are "header only" (you can find many in boost), which means that all their code is put into a header, so no static library is needed. Such libraries are often just made of inline code (functions/classes/...) and templates, for which no separate definition is needed.
Often this is done because static libraries are ugly beasts for several reasons:
you have to explicitly link against them;
since they are linked directly to your code, they have to use exactly your same C/C++ runtime library, which means that, at least on Windows, it's impractical to distribute static libraries (different compilers, different compiler versions, different configurations of the same compiler use different standard libraries, distributing a static library for every combination of these aspects would be impractical at least);
because of this, in general you have to first compile your own version of the static library, and only then link against it.
Compare all this with just including a header file... :)
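For comparison, a header-only library of the kind described above can be as small as this hypothetical minmax.hpp: everything in it is either a template or an inline function, so including it is all a client needs to do and there is nothing to link against.

```cpp
// Hypothetical header-only library "minmax.hpp": there is no .cpp to compile
// and no .a/.lib to link; including the header is all a client does.
#ifndef MINMAX_HPP
#define MINMAX_HPP

#include <cstddef>
#include <stdexcept>
#include <vector>

namespace minmax {

// Template: the compiler instantiates it inside each client translation
// unit for the element types actually used.
template <typename T>
T largest(const std::vector<T>& values) {
    if (values.empty()) {
        throw std::invalid_argument("largest: empty vector");
    }
    T best = values[0];
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (best < values[i]) {
            best = values[i];
        }
    }
    return best;
}

// Non-template function: must be inline, otherwise including the header
// from two .cpp files would define the same symbol twice.
inline int clamp_to_percent(int v) {
    if (v < 0) return 0;
    if (v > 100) return 100;
    return v;
}

}  // namespace minmax

#endif  // MINMAX_HPP
```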
¹ Actually, modern toolchains can recognize these stubs and avoid the extra indirection step. See this series by Raymond Chen for details.
The compiler needs to know the include directories, since it needs to include header (interface) files of libraries you want to use.
The linker needs to know the library directories, since it needs to link your executable to the (precompiled) implementation of the library.
See also What are the differences between a compiler and a linker?
Include directories are just for header files, which typically provide function/method signatures only. You need to link to a library to have access to its actual object code.
See this question.
I have a C++ executable and I'm dynamically linking against several libraries (Boost, Xerces-c and custom libs).
I understand why I would require the .lib/.a files if I choose to statically link against these libraries (relevant SO question here). However, why do I need to provide the corresponding .lib/.so library files when linking my executable if I'm dynamically linking against these external libraries?
The compiler isn't aware of dynamic linking; it just knows that a function exists via its prototype. The linker needs the lib files to resolve the symbols. The lib for a DLL contains additional information, such as which DLL the functions live in and how they are exported (by name, by ordinal, etc.). The lib files for DLLs contain much less information than lib files that contain the full object code: libcmmt.lib on my system is 19.2 MB, but msvcrt.lib is "only" 2.6 MB.
Note that this compile/link model is nearly 40 years old at this point, and predates dynamic linking on most platforms. If it were designed today, dynamic linking would be a first class citizen (for instance, in .NET, each assembly has rich metadata describing exactly what it exports, so you don't need separate headers and libs.)
Raymond Chen wrote a couple of blog entries about this specific to Windows. Start with "The classical model for linking" and then follow up with "Why do we have import libraries anyway?".
To summarize, history has defined the compiler as the component that knows about detailed type information, whereas the linker only knows about symbol names. So the linker ends up creating the .DLL without type information, and therefore programs that want to link with it need some sort of metadata to tell it about how the functions are exported and what parameter types they take and return.
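As a small illustration of the linker knowing only names: C++ compilers encode parameter types into symbol names (name mangling), but only as text, and extern "C" removes even that. The mangled names in the comments assume the Itanium C++ ABI used by GCC and Clang on Linux; MSVC uses a different scheme. The function names are hypothetical.

```cpp
// Symbol names in the comments follow the Itanium C++ ABI (GCC, Clang on
// Linux); MSVC mangles differently.

// C++ linkage: symbol "_Z5scaleii". The parameter types are encoded in the
// name, but the return type is not, and no other type details survive.
int scale(int value, int factor) { return value * factor; }

// Overload: a different symbol, "_Z5scaledd", so both can coexist.
double scale(double value, double factor) { return value * factor; }

// C linkage: the symbol is just "scale_c". The linker cannot tell what
// parameters it takes; only a matching declaration (from a header) can.
extern "C" int scale_c(int value, int factor) { return value * factor; }
```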
The reason .DLLs don't have all the information you need to link with them directly is historical, not a technical limitation.
For one thing, the linker inserts the versions of the libraries that exist at link time so that you have some chance of your program working if library versions are updated. Multiple versions of shared libraries can exist on a system.
The linker has the job of validating that all your undefined symbols are accounted for, either with static content or dynamic content.
By default, then, it insists on all your symbols being present.
However, that's just the default. See -z, and --allow-shlib-undefined, and friends.
Perhaps this dynamic linking is done via import libraries (the function has __declspec(dllimport) in its declaration).
If so, the compiler expects a corresponding __imp_symbol to exist: a pointer, filled in when the DLL is loaded, through which the call is forwarded to the right function in the dynamically loaded library.
These __imp_ symbols are supplied by the import library at link time for functions declared with the __declspec(dllimport) keyword.
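The forwarding can be imitated in ordinary C++ to show the idea. This is a simulation, not how Windows code is written: imp_add plays the role of __imp_add, simulate_loader stands in for the Windows loader, and add stands in for the call site (or import-library stub) that goes through the pointer. All of these names are hypothetical.

```cpp
// Simulation only: mimics how a call to a DLL function is routed through a
// pointer that the loader fills in. Names are hypothetical, not Windows APIs.

// The "real" function as it would live inside the DLL.
int dll_add_impl(int a, int b) { return a + b; }

// Stand-in for __imp_add: a slot holding the function's address, empty until
// the loader resolves it.
int (*imp_add)(int, int) = nullptr;

// Stand-in for the loader: it finds each imported function in the DLL and
// writes its address into the slot. For ordinary (non-delay-loaded) imports
// the real loader does this before main runs.
void simulate_loader() {
    imp_add = &dll_add_impl;
}

// What client code calls. With dllimport, the compiler emits the indirect
// call through the slot itself; without it, the import library supplies a
// small stub like this one that jumps through the slot. Calling it before
// the slot is filled would call through a null pointer.
int add(int a, int b) {
    return imp_add(a, b);
}
```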
Here is a very SIMPLIFIED description that may help. Static linking puts all of the code needed to run your program into the executable so everything is found. Dynamic linking means some of the required code does not get put into the executable and will be found at runtime. Where do I find it? Is function x() there? How do I make a call to function x()? That is what the library tells the linker when you are dynamically linking.