Are functions not part of .obj file? - c++

According to the book I'm reading:
After examining a program syntax the C++ compiler creates .obj file. Next the compiler calls the linker that combines program statements inside your .obj files with some functions such as printf().
Are functions not part of .obj file? Are they not statements?
Does the linker have a connection with the terms "static linking" and "dynamic linking"?
I know that dynamic linking is resolved at runtime, but according to the book the linker is called at compile time.

Functions that are defined in your .cpp are present in the corresponding .obj. Functions that are used but not defined (such as standard library functions like printf) aren't part of it. The linker resolves those references against other .obj files and libraries.
Static libraries are just a collection of .obj files; the linker takes the .obj files that provide the needed symbols and puts them in the executable.
Dynamic libraries aren't put in the executable; the executable is marked as referencing them, and they are located when the executable starts. (At least in their main use; they may also be used for plugins, in which case they are searched for when the process asks for them.)
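A minimal sketch of that distinction, assuming a typical toolchain (add and format_sum are hypothetical names, not from the book). The body of add is compiled into this file's .obj, while std::snprintf is only declared by <cstdio>, so the .obj carries a reference that the linker later resolves from the C library.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Defined in this translation unit: its machine code lands in this file's .obj.
int add(int a, int b) { return a + b; }

// Also defined here, but it calls std::snprintf, which <cstdio> only declares.
// The .obj records that call as an unresolved reference for the linker.
std::string format_sum(int a, int b) {
    char buffer[64];
    int written = std::snprintf(buffer, sizeof buffer, "%d + %d = %d", a, b, add(a, b));
    assert(written > 0 && written < static_cast<int>(sizeof buffer));
    return std::string(buffer, static_cast<std::size_t>(written));
}
```

Listing the symbols of the resulting object file (for example with nm or dumpbin) should typically show add and format_sum as defined and the snprintf call as an undefined reference.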

Well technically there's really no such thing as "dynamic linking" as something done by the linker. There's really only manually binding to a piece of code at run time, which really has nothing to do with the linker.
For example, under Windows there are a few ways of dealing with a DLL:
The lowest level solution is to use LoadLibrary or AfxLoadLibrary to load the DLL and then look up each function by name (with GetProcAddress), casting the result to a function pointer of the appropriate type.
You can use an import lib. This allows the linker to resolve functions in other DLLs at link time, so you can call a function in the DLL directly (i.e. just by saying Foo() in client code). However, those functions are simply wrappers for the LoadLibrary method mentioned above. They load the DLL if it is not already loaded, directly access a function pointer in that library, then execute that function.

Related

Confused about how the compiler includes standard libraries in c++ program

I'm new to C++ programming and I'm a little confused about how the compiler includes standard libraries in a C++ program. Say, for example, I want to use the sqrt() function. I know that I have to include the math.h header file in my source code, but the math library contains many functions other than sqrt(). So my question is: is the source code of all these functions added to the program, which is unnecessary, or just the function that I need?
I hope my question was clear and thanks in advance.
Functions that are NOT templates (and not so trivial that they are just one or two lines) are compiled separately, and then stored in a "library" (which is not the header file; the header just contains double sqrt(double); or some such).
The compiler will (given the right compile-time flags) link to the C library that contains those functions. The linker [called upon by the compiler] will then introduce the code that was compiled when the library was built. So, typically, the source is not compiled when you build your program - it was done some other time.
The linker understands what functions are needed by the code you are building, so will only add those functions to your program, not ALL of the functions [but it may pull in some other functions than the precise one that you asked for, for example there may be some helper functions and perhaps some generic error handling functions that are needed by sqrt].
No, linking means that the linker figures out which symbols (functions and data objects) from your library are necessary to build your program, and then only includes these that are.
In fact, with dynamic linking, it wouldn't even include the function itself, but just the reference to the function and how to load the library containing it.
Generally, libraries that are linked with your executables aren't source code, but binary objects, which already have been translated to machine language ("compiled").
You have a confusion between libraries and header files. Libraries are the implementations. Header files contain the declarations.
You use #include for a library's header file so that the compiler can find the syntax and semantics of the functions you use.
All the declarations (unless blocked by preprocessor directives), are parsed by the compiler and stored in a dictionary. The only issue about you not using a declaration is that it takes up room in the compiler's dictionary. Usually this is not an issue (modern compilers have large capacity dictionaries).
As far as adding functions to your program, that is handled during the linking phase (usually by a linker application). This is compiler dependent. Fundamentally, only functions that are used by your program are pulled from the library (static libraries only) and placed into your executable. Some compilers may speed up the build process by including groups of popular functions that you may not use. This speeds up the build process but makes your executables larger.
Some library functions may use other library functions. This means that a library function may add a lot more code into your executable. One example is printf. The printf function requires a lot of support, more than puts, because of all the formatting specifiers. So the printf may include other (internal) library functions.
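To make the "only what is needed, plus whatever that needs" rule concrete, here is a toy model of pulling members out of a static library. It is a sketch under stated assumptions, not how any particular linker is implemented: the Member structure, the archive contents, and names such as vformat and write_bytes are all made up for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model of one object file inside a static library (hypothetical structure).
struct Member {
    std::string name;               // e.g. "printf.o"
    std::set<std::string> defines;  // symbols this member provides
    std::set<std::string> needs;    // symbols this member references
};

// Returns the names of the members needed to satisfy `wanted`, following
// references between members until nothing new is required.
std::set<std::string> members_to_link(const std::vector<Member>& archive,
                                      std::set<std::string> wanted) {
    std::set<std::string> pulled;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Member& m : archive) {
            if (pulled.count(m.name)) continue;
            bool provides_wanted = false;
            for (const std::string& symbol : m.defines) {
                if (wanted.count(symbol)) { provides_wanted = true; break; }
            }
            if (!provides_wanted) continue;
            pulled.insert(m.name);
            wanted.insert(m.needs.begin(), m.needs.end());  // its needs become ours
            changed = true;
        }
    }
    return pulled;
}

// A made-up archive: printf needs a formatting helper, puts and sqrt do not.
std::vector<Member> toy_archive() {
    return {
        {"printf.o", {"printf"}, {"vformat"}},
        {"vformat.o", {"vformat"}, {"write_bytes"}},
        {"puts.o", {"puts"}, {"write_bytes"}},
        {"write.o", {"write_bytes"}, {}},
        {"sqrt.o", {"sqrt"}, {}},
    };
}
```

In this model, asking for printf pulls in three members (printf.o, vformat.o, write.o), asking for puts pulls in two, and asking for sqrt pulls in only sqrt.o, which mirrors the point that printf drags in more support code than puts.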

What does it mean to link against something?

I commonly hear the term "to link against a library".
I'm new to compilers and thus linking, so I would like to understand this a bit more.
What does it mean to link against a library and when would not doing so cause a problem?
A library is an "archive" that contains already compiled code. Typically, you want to use a ready-made library to use some functionality that you don't want to implement on your own (e.g. decoding JPEGs, parsing XML, providing you GUI widgets, you name it).
Typically in C and C++ using a library goes like this: you #include some headers of the library that contain the function/class declarations - i.e. they tell the compiler that the symbols you need do exist somewhere, without actually providing their code. Whenever you use them, the compiler will place in the object file a placeholder, which says that that function call is to be resolved at link time, when the rest of the object modules will be available.
Then, at the moment of linking, you have to specify the actual library where the compiled code for the functions of the library is to be found; the linker then will link this compiled code with yours and produce the final executable (or, in the case of dynamic libraries, it will add the relevant information for the loader to perform the dynamic linking at runtime).
If you don't specify that the library is to be linked against, the linker will have unresolved references - i.e. it will see that some functions were declared, you used them in your code, but their implementation is nowhere to be found; this is the cause of the infamous "undefined reference errors".
Notice that all this process is identical to what normally happens when you compile a project that is made of multiple .cpp files: each .cpp is compiled independently (knowing of the functions defined in the others only via prototypes, typically written in .h files), and at the end everything is linked together to produce the final executable.

C/C++ How Does Dynamic Linking Work On Different Platforms?

How does dynamic linking work generally?
On Windows (LoadLibrary), you need a .dll to call at runtime, but at link time, you need to provide a corresponding .lib file or the program won't link... What does the .lib file contain? A description of the .dll methods? Isn't that what the headers contain?
Relatedly, on *nix, you don't need a lib file... How does the compiler know that the methods described in the header will be available at runtime?
As a newbie, when you think about either one of the two schemes, then the other, neither of them makes sense...
To answer your questions one by one:
Dynamic linking defers part of the linking process to runtime.
It can be used in two ways: implicitly and explicitly.
Implicitly, the static linker will insert information into the
executable which will cause the library to load and resolve the
necessary symbols. Explicitly, you must call LoadLibrary or
dlopen manually, and then GetProcAddress/dlsym for each
symbol you need to use. Implicit loading is used for things
like the system library, where the implementation will depend on
the version of the system, but the interface is guaranteed.
Explicit loading is used for things like plug-ins, where the
library to be loaded will be determined at runtime.
The .lib file is only necessary for implicit loading. It
contains the information that the library actually provides this
symbol, so the linker won't complain that the symbol is
undefined, and it tells the linker in what library the symbols
are located, so it can insert the necessary information to cause
this library to automatically be loaded. All the header files
tell the compiler is that the symbols will exist, somewhere; the
linker needs the .lib to know where.
Under Unix, all of the information is extracted from the
.so. Why Windows requires two separate files, rather than
putting all of the information in one file, I don't know; it's
actually duplicating most of the information, since the
information needed in the .lib is also needed in the .dll.
(Perhaps licensing issues. You can distribute your program with
the .dll, but no one can link against the libraries unless
they have a .lib.)
The main thing to retain is that if you want implicit loading,
you have to provide the linker with the appropriate information,
either with a .lib or a .so file, so that it can insert that
information into the executable. And that if you want explicit
loading, you can't refer to any of the symbols in the library
directly; you have to call GetProcAddress/dlsym to get their
addresses yourself (and do some funny casting to use them).
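A minimal sketch of explicit loading on a POSIX system, including the "funny casting". The assumptions here are mine, not the answer's: the program is dynamically linked against a C library that exports atoi, call_by_name is a hypothetical helper, and on older glibc versions you may need to link with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>  // POSIX: dlopen, dlsym, dlclose

// Looks up a C function of type int(const char*) by name at run time and
// calls it. Returns false if the handle or the symbol cannot be obtained.
bool call_by_name(const char* symbol, const char* argument, int& result) {
    // A null filename asks for a handle to the running program together with
    // the libraries already loaded with it (here: the C library).
    void* handle = dlopen(nullptr, RTLD_LAZY);
    if (handle == nullptr) return false;

    bool found = false;
    void* address = dlsym(handle, symbol);
    if (address != nullptr) {
        // dlsym returns void*, so the caller has to supply the real type.
        using IntFromCString = int (*)(const char*);
        IntFromCString function = reinterpret_cast<IntFromCString>(address);
        result = function(argument);
        found = true;
    }
    dlclose(handle);
    return found;
}
```

For example, call_by_name("atoi", "42", value) should store 42 in value. Nothing in the source refers to atoi directly, which is exactly why the compiler cannot check the cast for you.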
The .lib file on Windows is not required for loading a dynamic library, it merely offers a convenient way of doing so.
In principle, you can use LoadLibrary for loading the dll and then use GetProcAddress for accessing functions provided by that dll. The compilation of the enclosing program does not need to access the dll in that case, it is only needed at runtime (ie. when LoadLibrary actually executes). MSDN has a code example.
The disadvantage here is that you need to manually write code for loading the functions from the dll. In case you compiled the dll yourself in the first place, this code simply duplicates knowledge that the compiler could have extracted from the dll source code automatically (like the names and signatures of exported functions).
This is what the .lib file does: It contains the GetProcAddress calls for the DLL's exported functions, generated by the compiler so you don't have to worry about it. In Windows terms, this is called load-time dynamic linking, since the DLL is loaded automatically by the code from the .lib file when your enclosing program is loaded (as opposed to the manual approach, referred to as run-time dynamic linking).
How does dynamic linking work generally?
The dynamic link library (aka shared object) file contains machine code instructions and data, along with a table of metadata saying which offsets in that code/data relate to which "symbols", the type of the symbol (e.g. function vs data), the number of bytes or words in the data, and a few other things. Different OS will tend to have different shared object file formats, and indeed the same OS may support several, but that's the gist of it.
So, imagine the shared library's a big chunk of bytes with an index like this:
SYMBOL        ADDRESS   TYPE       SIZE
my_function   1000      function   2893
my_number     4800      variable   4
In general, the exact type of the symbols need not be captured in the metadata table - it's expected that declarations in the library's header files contain all the missing information. C++ is a bit special - compared to say C - because overloading can mean there are several functions with the same name, and namespaces allow for further symbols that would otherwise be ambiguously named - for that reason name mangling is typically used to concatenate some representation of the namespace and function arguments to the function name, forming something that can be unique in the library object file.
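To see name mangling from the other direction, here is a small sketch that turns mangled names back into readable ones. It assumes GCC or Clang with the Itanium C++ ABI (the usual case on Linux and macOS), where <cxxabi.h> provides abi::__cxa_demangle; the demangle wrapper itself is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <string>

// Converts a mangled symbol name back into a readable declaration.
// Returns the input unchanged if the runtime cannot demangle it.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) return mangled;
    std::string result(readable);
    std::free(readable);  // the buffer is allocated with malloc by the runtime
    return result;
}
```

Under that ABI, _Z3addii demangles to add(int, int) and _ZN4math3addEii to math::add(int, int): the namespace and the argument types are encoded in the symbol name, which is what lets overloads coexist in one library. Other ABIs (for example MSVC's) use a different scheme.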
A program wanting to use the shared object can generally do one of two things:
have the OS load both itself and the shared object around the same time (before executing main()), with the OS Loader responsible for finding the symbols and examining metadata in the program file image about the use of those symbols, then patching in symbol addresses in the memory the program uses, such that the program can then just run and work functionally as if it'd known about the symbol addresses when it was first compiled (but perhaps a little slower)
or, explicitly in its own source code call dlopen sometime after main runs, then use dlsym or similar to get the symbol addresses, save them into (function/data) pointers based on the programmer's knowledge of the expected data types, then call them explicitly using the pointers.
On Windows (LoadLibrary), you need a .dll to call at runtime, but at link time, you need to provide a corresponding .lib file or the program won't link...
That doesn't sound right. Should be one or the other I'd think.
Wtf does the .lib file contain? A description of the .dll methods? Isn't that what the headers contain?
A lib file is - at this level of description - pretty much the same as a shared object file... the main difference is that the compiler's finding the symbol addresses before the program's shipped and run.
Modern *nix systems derive their dynamic linking process from Solaris. Linux, in particular, doesn't need a separate .lib file because all external dependencies are recorded in the ELF file itself. The .interp section of an ELF executable names the dynamic linker (the program interpreter) that resolves the executable's external symbols when it is loaded. That covers dynamic linking.
There is also a way to handle dynamic linking in user space, called dynamic loading: you call library functions such as dlopen and dlsym to get function pointers to functions in an external *.so.
More information can be found from this article http://www.ibm.com/developerworks/library/l-dynamic-libraries/.
Relatedly, on OS X (and I assume *nix... dlopen), you don't need a lib file... How does the compiler know that the methods described in the header will be available at runtime?
Compilers or linkers do not need such information. You, the programmer, need to handle the situation that the shared libraries you try to open by dlopen() may not exist.
You can use a DLL file in Windows in two ways: Either you link with it, and you're done, nothing more to do. Or you load it dynamically during run-time.
If you link with it, then the DLL's link library file is used. The link library contains information that the linker uses to know which DLL to load and where in the DLL the functions are, so the program can call them. When your program is loaded, the operating system also loads the DLL for you; basically, what it does is call LoadLibrary for you.
In other operating systems (like OS X and Linux) it works in a similar way. The difference is that on these systems the linker can look directly at the dynamic library (the .so/.dylib file) and figure out what's needed without a separate static library like on Windows.
To load a library dynamically, you don't need to link with anything related to the library you want to load.
Like others already said: what is included in a .lib file on Windows is included directly in the .so/.dylib on Linux/OS X. But the main question is... why?
Isn't *nix solution better?
I think it is, but the .lib has one advantage. The developer linking to the DLL doesn't actually need to have access to the DLL file itself.
Does a scenario like that happen often in the real world? Is it worth the effort of maintaining two files per DLL file? I don't know.
Edit: Ok, guys let's make things even more confusing! You can link directly to a DLL on Windows, using MinGW. So the whole import library problem is not directly related to Windows itself. Taken from sampleDLL article from MinGW wiki:
The import library created by the "--out-implib" linker option is
required iff (==if and only if) the DLL shall be interfaced from some
C/C++ compiler other than the MinGW toolchain. The MinGW toolchain is
perfectly happy to directly link against the created DLL. More details
can be found in the ld.exe info files that are part of the binutils
package (which is a part of the toolchain).
Linux also requires linking, but instead of linking against a .lib library it needs to link to the dynamic linker /lib/ld-linux.so.2. This usually happens behind the scenes when using GCC (however, if you are using an assembler you do need to specify it manually).
Both approaches, either the Windows .LIB approach or the Linux dynamic linker linking approach, are considered in reality as static linking. There is, however, a difference that in Windows part of the work is done at link time although it still has work at load time (I am not sure, but I think that the .LIB file is merely for the linker to know the physical library name, the symbols however are only resolved at load time), while in Linux everything besides linking to the dynamic linker happen at load time.
Dynamic linking in general refers to manually opening the DLL file at runtime (such as by using LoadLibrary()), in which case the burden is entirely on the programmer.
In a shared library, such as a .dll, .dylib or .so, there is some information about each symbol's name and address, like this:
------------------------------------
| symbol's name | symbol's address |
|----------------------------------|
| Foo | 0x12341234 |
| Bar | 0xabcdabcd |
------------------------------------
The load functions, such as LoadLibrary and dlopen, load a shared library and make it available for use.
GetProcAddress and dlsym find a symbol's address for you. For example:
#include <windows.h>
HMODULE shared_lib = LoadLibrary(TEXT("asdf.dll"));
FARPROC symbol = GetProcAddress(shared_lib, "Foo");  // takes the module handle and the symbol name
// symbol is 0x12341234
On Windows, there is a .lib file for using a .dll. When you link to this .lib file, you don't need to call LoadLibrary and GetProcAddress; you just use the shared library's functions as if they were "normal" functions. How can that work?
In fact, the .lib contains import information. It looks something like this:
void *Foo; // please put the address of Foo there
void *Bar; // please put the address of Bar there
When the operating system loads your program (strictly speaking, your module), it performs the equivalent of LoadLibrary and GetProcAddress automatically.
And if you write code such as Foo();, the compiler converts it into (*Foo)(); automatically. So you can use them as if they were "normal" functions.
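The slot-filling idea can be imitated in plain C++. This is only a simulation under my own assumptions: dll_Foo, dll_Bar, export_table and bind_imports are hypothetical stand-ins for the DLL's code, its export table and the OS loader, and the real mechanism lives in the executable format rather than in a std::map.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-ins for code that would live inside the DLL (hypothetical functions).
int dll_Foo(int x) { return x + 1; }
int dll_Bar(int x) { return x * 2; }

using ExportedFunction = int (*)(int);

// Stand-in for the DLL's export table: symbol name -> address.
std::map<std::string, ExportedFunction> export_table() {
    return {{"Foo", &dll_Foo}, {"Bar", &dll_Bar}};
}

// Stand-in for what the import information adds to the program:
// slots saying "please put the address of Foo here".
ExportedFunction Foo = nullptr;
ExportedFunction Bar = nullptr;

// Stand-in for the OS loader: fills every import slot from the export table.
// Returns false if a required symbol is missing, as a failed load would.
bool bind_imports() {
    const std::map<std::string, ExportedFunction> exports = export_table();
    struct Import { const char* name; ExportedFunction* slot; };
    const Import imports[] = {{"Foo", &Foo}, {"Bar", &Bar}};
    for (const Import& entry : imports) {
        auto found = exports.find(entry.name);
        if (found == exports.end()) return false;
        *entry.slot = found->second;  // patch the slot with the real address
    }
    return true;
}

// Client code: written like a plain call, executed as a call through the slot.
int client_code(int x) { return Foo(x) + (*Bar)(x); }
```

Once bind_imports() has run, client_code(10) goes through both slots and returns 31 (11 from Foo plus 20 from Bar). Before binding, the slots are null, which is roughly why a missing DLL is reported at program start rather than at the first call.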

Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?

I have been involved in some debate with respect to libraries in Linux, and would like to confirm some things.
It is to my understanding (please correct me if I am wrong and I will edit my post later), that there are two ways of using libraries when building an application:
Static libraries (.a files): At link time, a copy of the entire library is put into the final application so that the functions within the library are always available to the calling application
Shared objects (.so files): At link time, the object is just verified against its API via the corresponding header (.h) file. The library isn't actually used until runtime, where it is needed.
The obvious advantage of static libraries is that they allow the entire application to be self-contained, while the benefit of dynamic libraries is that the ".so" file can be replaced (ie: in case it needs to be updated due to a security bug) without requiring the base application to be recompiled.
I have heard some people make a distinction between shared objects and dynamic link libraries (DLL's), even though they are both ".so" files. Is there any distinction between shared objects and DLLs when it comes to C/C++ development on Linux or any other POSIX compliant OS (ie: MINIX, UNIX, QNX, etc)? I am told that one key difference (so far) is that shared objects are just used at runtime, while DLL's must be opened first using the dlopen() call within the application.
Finally, I have also heard some developers mention "shared archives", which, to my understanding, are also static libraries themselves, but are never used by an application directly. Instead, other static libraries will link against the "shared archives" to pull some (but not all) functions/resources from the shared archive into the static library being built.
Thank you all in advance for your assistance.
Update
In the context in which these terms were provided to me, they were effectively erroneous terms used by a team of Windows developers who had to learn Linux. I tried to correct them, but the (incorrect) language norms stuck.
Shared Object: A library that is automatically linked into a program when the program starts, and exists as a standalone file. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylib for a library file named mylib.so). The library must be present at compile time, and when the application starts.
Static Library: A library that is merged into the actual program itself at build time for a single (larger) application containing the application code and the library code that is automatically linked into a program when the program is built, and the final binary containing both the main program and the library itself exists as a single standalone binary file. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylib for a library file named mylib.a). The library must be present at compile time.
DLL: Essentially the same as a shared object, but rather than being included in the linking list at compile time, the library is loaded via dlopen()/dlsym() commands so that the library does not need to be present at build time for the program to compile. Also, the library does not need to be present (necessarily) at application startup or compile time, as it is only needed at the moment the dlopen/dlsym calls are made.
Shared Archive: Essentially the same as a static library, but is compiled with the "export-shared" and "-fPIC" flags. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylibS for a library file named mylibS.a). The distinction between the two is that this additional flag is required if a shared object or DLL wants to statically link the shared archive into its own code AND be able to make the functions in the shared object available to other programs, rather than just using them internal to the DLL. This is useful in the case when someone provides you with a static library, and you wish to repackage it as an SO. The library must be present at compile time.
Additional Update
The distinction between "DLL" and "shared library" was just a (lazy, inaccurate) colloquialism in the company I worked in at the time (Windows developers being forced to shift to Linux development, and the term stuck), adhering to the descriptions noted above.
Additionally, the trailing "S" literal after the library name, in the case of "shared archives" was just a convention used at that company, and not in the industry in general.
A static library (.a) is a library that can be linked directly into the final executable produced by the linker; it is contained in the executable, so the library does not need to be present on the system where the executable is deployed.
A shared library (.so) is a library that is linked but not embedded in the final executable, so it is loaded when the executable is launched and needs to be present on the system where the executable is deployed.
A dynamic link library on windows(.dll) is like a shared library(.so) on linux but there are some differences between the two implementations that are related to the OS (Windows vs Linux) :
A DLL can define two kinds of functions: exported and internal. The exported functions are intended to be called by other modules, as well as from within the DLL where they are defined. Internal functions are typically intended to be called only from within the DLL where they are defined.
An SO library on Linux doesn't need a special export statement to indicate exportable symbols, since by default all symbols are available to an interrogating process.
I've always thought that DLLs and shared objects are just different terms for the same thing - Windows calls them DLLs, while on UNIX systems they're shared objects, with the general term - dynamically linked library - covering both (even the function to open a .so on UNIX is called dlopen() after 'dynamic library').
They are indeed only linked at application startup, however your notion of verification against the header file is incorrect. The header file defines prototypes which are required in order to compile the code which uses the library, but at link time the linker looks inside the library itself to make sure the functions it needs are actually there. The linker has to find the function bodies somewhere at link time or it'll raise an error. It ALSO does that at runtime, because as you rightly point out the library itself might have changed since the program was compiled. This is why ABI stability is so important in platform libraries, as the ABI changing is what breaks existing programs compiled against older versions.
Static libraries are just bundles of object files straight out of the compiler, just like the ones that you are building yourself as part of your project's compilation, so they get pulled in and fed to the linker in exactly the same way, and unused bits are dropped in exactly the same way.
I can elaborate on the details of DLLs in Windows to help clarify those mysteries to my friends here in *NIX-land...
A DLL is like a Shared Object file. Both are images, ready to load into memory by the program loader of the respective OS. The images are accompanied by various bits of metadata to help linkers and loaders make the necessary associations and use the library of code.
Windows DLLs have an export table. The exports can be by name, or by table position (numeric). The latter method is considered "old school" and is much more fragile -- rebuilding the DLL and changing the position of a function in the table will end in disaster, whereas there is no real issue if linking of entry points is by name. So, forget that as an issue, but just be aware it's there if you work with "dinosaur" code such as 3rd-party vendor libs.
Windows DLLs are built by compiling and linking, just as you would for an EXE (executable application), but the DLL is meant to not stand alone, just like an SO is meant to be used by an application, either via dynamic loading, or by link-time binding (the reference to the SO is embedded in the application binary's metadata, and the OS program loader will auto-load the referenced SO's). DLLs can reference other DLLs, just as SOs can reference other SOs.
In Windows, DLLs will make available only specific entry points. These are called "exports". The developer can either use a special compiler keyword to make a symbol an externally-visible (to other linkers and the dynamic loader), or the exports can be listed in a module-definition file which is used at link time when the DLL itself is being created. The modern practice is to decorate the function definition with the keyword to export the symbol name. It is also possible to create header files with keywords which will declare that symbol as one to be imported from a DLL outside the current compilation unit. Look up the keywords __declspec(dllexport) and __declspec(dllimport) for more information.
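A common way to package those two keywords is a single macro that switches between export and import. The sketch below uses hypothetical names (MYLIB_API, MYLIB_BUILDING, mylib_add); the non-Windows branch assumes GCC or Clang and uses their visibility attribute so that the same header compiles everywhere.

```cpp
#include <cassert>

// Hypothetical macro names. MYLIB_BUILDING would normally be defined only by
// the build that produces the library itself, never by its clients.
#define MYLIB_BUILDING 1

#if defined(_WIN32)
#  if defined(MYLIB_BUILDING)
#    define MYLIB_API __declspec(dllexport)  // building the DLL: export the symbol
#  else
#    define MYLIB_API __declspec(dllimport)  // using the DLL: import the symbol
#  endif
#else
#  define MYLIB_API __attribute__((visibility("default")))  // GCC/Clang
#endif

// What the library's header would declare.
extern "C" MYLIB_API int mylib_add(int a, int b);

// What the library's source file would define.
extern "C" MYLIB_API int mylib_add(int a, int b) { return a + b; }
```

Clients include the same header without defining MYLIB_BUILDING, so on Windows they see the dllimport form while the library's own build sees dllexport. The extern "C" is optional; it is used here only to keep the exported name unmangled.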
One of the interesting features of DLLs is that they can declare a standard "upon load/unload" handler function. Whenever the DLL is loaded or unloaded, the DLL can perform some initialization or cleanup, as the case may be. This maps nicely into having a DLL as an object-oriented resource manager, such as a device driver or shared object interface.
When a developer wants to use an already-built DLL, she must either reference an "export library" (*.LIB) created by the DLL developer when she created the DLL, or she must explicitly load the DLL at run time and request the entry point address by name via the LoadLibrary() and GetProcAddress() mechanisms. Most of the time, linking against a LIB file (which simply contains the linker metadata for the DLL's exported entry points) is the way DLLs get used. Dynamic loading is reserved typically for implementing "polymorphism" or "runtime configurability" in program behaviors (accessing add-ons or later-defined functionality, aka "plugins").
The Windows way of doing things can cause some confusion at times; the system uses the .LIB extension to refer to both normal static libraries (archives, like POSIX *.a files) and to the "export stub" libraries needed to bind an application to a DLL at link time. So, one should always look to see if a *.LIB file has a same-named *.DLL file; if not, chances are good that *.LIB file is a static library archive, and not export binding metadata for a DLL.
You are correct in that static files are copied to the application at link-time, and that shared files are just verified at link time and loaded at runtime.
The dlopen call is not the only way to use shared objects. It is there for when the application wishes to load one itself at runtime; otherwise the shared objects are loaded automatically when the application starts. DLLs and .so files are the same thing. dlopen exists to give processes even more fine-grained dynamic loading abilities. You don't have to call dlopen yourself to open or use the DLLs; that also happens at application startup.
I suspect some kind of misunderstanding here, but header files, at least of the .h variety used for compiling source code, are most definitely NOT checked during link time.
.h, and for that matter, .c/.cpp files, are only involved during the compilation phase, which includes preprocessing. Once the object code has been created the header file is long gone well before the linker gets around to dealing with things.

Why does the C++ linker require the library files during a build, even though I am dynamically linking?

I have a C++ executable and I'm dynamically linking against several libraries (Boost, Xerces-c and custom libs).
I understand why I would require the .lib/.a files if I choose to statically link against these libraries (relevant SO question here). However, why do I need to provide the corresponding .lib/.so library files when linking my executable if I'm dynamically linking against these external libraries?
The compiler isn't aware of dynamic linking; it just knows that a function exists via its prototype. The linker needs the lib files to resolve the symbol. The lib for a DLL contains additional information, like which DLL the functions live in and how they are exported (by name, by ordinal, etc.). The lib files for DLLs contain much less information than lib files that contain the full object code: libcmt.lib on my system is 19.2 MB, but msvcrt.lib is "only" 2.6 MB.
Note that this compile/link model is nearly 40 years old at this point, and predates dynamic linking on most platforms. If it were designed today, dynamic linking would be a first class citizen (for instance, in .NET, each assembly has rich metadata describing exactly what it exports, so you don't need separate headers and libs.)
Raymond Chen wrote a couple blog entries about this specific to Windows. Start with The classical model for linking and then follow-up with Why do we have import libraries anyway?.
To summarize, history has defined the compiler as the component that knows about detailed type information, whereas the linker only knows about symbol names. So the linker ends up creating the .DLL without type information, and therefore programs that want to link with it need some sort of metadata to tell it about how the functions are exported and what parameter types they take and return.
The reason .DLLs don't have all the information you need to link with them directly is historic, and not a technical limitation.
For one thing, the linker inserts the versions of the libraries that exist at link time so that you have some chance of your program working if library versions are updated. Multiple versions of shared libraries can exist on a system.
The linker has the job of validating that all your undefined symbols are accounted for, either with static content or dynamic content.
By default, then, it insists on all your symbols being present.
However, that's just the default. See -z, and --allow-shlib-undefined, and friends.
Perhaps this dynamic linking is done via import libraries (the function has __declspec(dllimport) before its declaration).
If so, the compiler expects that there is an __imp_symbol function declared, and this function is responsible for forwarding the call to the right dynamically loaded library.
Those functions are generated when symbols marked with the __declspec(dllimport) keyword are linked.
Here is a very SIMPLIFIED description that may help. Static linking puts all of the code needed to run your program into the executable so everything is found. Dynamic linking means some of the required code does not get put into the executable and will be found at runtime. Where do I find it? Is function x() there? How do I make a call to function x()? That is what the library tells the linker when you are dynamically linking.