How does dynamic linking work generally?
On Windows (LoadLibrary), you need a .dll to call at runtime, but at link time, you need to provide a corresponding .lib file or the program won't link... What does the .lib file contain? A description of the .dll methods? Isn't that what the headers contain?
Relatedly, on *nix, you don't need a lib file... How does the compiler know that the methods described in the header will be available at runtime?
As a newbie, when you think about either one of the two schemes, then the other, neither of them makes sense...
To answer your questions one by one:
Dynamic linking defers part of the linking process to runtime.
It can be used in two ways: implicitly and explicitly.
Implicitly, the static linker will insert information into the
executable which will cause the library to load and resolve the
necessary symbols. Explicitly, you must call LoadLibrary or
dlopen manually, and then GetProcAddress/dlsym for each
symbol you need to use. Implicit loading is used for things
like the system library, where the implementation will depend on
the version of the system, but the interface is guaranteed.
Explicit loading is used for things like plug-ins, where the
library to be loaded will be determined at runtime.
The .lib file is only necessary for implicit loading. It
contains the information that the library actually provides this
symbol, so the linker won't complain that the symbol is
undefined, and it tells the linker in what library the symbols
are located, so it can insert the necessary information to cause
this library to automatically be loaded. All the header files
tell the compiler is that the symbols will exist, somewhere; the
linker needs the .lib to know where.
Under Unix, all of the information is extracted from the
.so. Why Windows requires two separate files, rather than
putting all of the information in one file, I don't know; it's
actually duplicating most of the information, since the
information needed in the .lib is also needed in the .dll.
(Perhaps licensing issues. You can distribute your program with
the .dll, but no one can link against the libraries unless
they have a .lib.)
The main thing to retain is that if you want implicit loading,
you have to provide the linker with the appropriate information,
either with a .lib or a .so file, so that it can insert that
information into the executable. And that if you want explicit
loading, you can't refer to any of the symbols in the library
directly; you have to call GetProcAddress/dlsym to get their
addresses yourself (and do some funny casting to use them).
The .lib file on Windows is not required for loading a dynamic library, it merely offers a convenient way of doing so.
In principle, you can use LoadLibrary for loading the dll and then use GetProcAddress for accessing functions provided by that dll. The compilation of the enclosing program does not need to access the dll in that case, it is only needed at runtime (ie. when LoadLibrary actually executes). MSDN has a code example.
The disadvantage here is that you need to manually write code for loading the functions from the dll. In case you compiled the dll yourself in the first place, this code simply duplicates knowledge that the compiler could have extracted from the dll source code automatically (like the names and signatures of exported functions).
This is what the .lib file does: it contains import stubs and import records for the DLL's exported functions, generated when the DLL is built, so the lookups you would otherwise code by hand with GetProcAddress are taken care of for you. In Windows terms, this is called load-time dynamic linking, since the DLL is loaded automatically, based on the import information the .lib contributed to your executable, when your enclosing program is loaded (as opposed to the manual approach, referred to as run-time dynamic linking).
How does dynamic linking work generally?
The dynamic link library (aka shared object) file contains machine code instructions and data, along with a table of metadata saying which offsets in that code/data relate to which "symbols", the type of the symbol (e.g. function vs data), the number of bytes or words in the data, and a few other things. Different OS will tend to have different shared object file formats, and indeed the same OS may support several, but that's the gist of it.
So, imagine the shared library's a big chunk of bytes with an index like this:
SYMBOL        ADDRESS   TYPE       SIZE
my_function   1000      function   2893
my_number     4800      variable   4
In general, the exact type of the symbols need not be captured in the metadata table - it's expected that declarations in the library's header files contain all the missing information. C++ is a bit special - compared to say C - because overloading can mean there are several functions with the same name, and namespaces allow for further symbols that would otherwise be ambiguously named - for that reason name mangling is typically used to concatenate some representation of the namespace and function arguments to the function name, forming something that can be unique in the library object file.
A program wanting to use the shared object can generally do one of two things:
have the OS load both itself and the shared object around the same time (before executing main()), with the OS Loader responsible for finding the symbols and examining metadata in the program file image about the use of those symbols, then patching in symbol addresses in the memory the program uses, such that the program can then just run and work functionally as if it'd known about the symbol addresses when it was first compiled (but perhaps a little slower)
or, explicitly in its own source code call dlopen sometime after main runs, then use dlsym or similar to get the symbol addresses, save them into (function/data) pointers based on the programmer's knowledge of the expected data types, then call them explicitly using the pointers.
On Windows (LoadLibrary), you need a .dll to call at runtime, but at link time, you need to provide a corresponding .lib file or the program won't link...
That doesn't sound right. Should be one or the other I'd think.
Wtf does the .lib file contain? A description of the .dll methods? Isn't that what the headers contain?
A lib file is - at this level of description - pretty much the same as a shared object file... the main difference is that the compiler's finding the symbol addresses before the program's shipped and run.
Modern *nix systems derive their dynamic linking process from Solaris. Linux in particular doesn't need a separate .lib file because all external dependencies are recorded in the ELF file itself. The .interp section of an ELF executable names the dynamic linker that has to be run to load it, and the dynamic section lists the shared libraries and symbols that still need to be resolved at load time. That is dynamic linking.
There is also a way to drive the process from within your own program. This is called dynamic loading: you call library functions such as dlopen and dlsym to get pointers to functions in an external *.so.
More information can be found from this article http://www.ibm.com/developerworks/library/l-dynamic-libraries/.
Relatedly, on OS X (and I assume *nix... dlopen), you don't need a lib file... How does the compiler know that the methods described in the header will be available at runtime?
Compilers or linkers do not need such information. You, the programmer, need to handle the situation that the shared libraries you try to open by dlopen() may not exist.
You can use a DLL file in Windows in two ways: Either you link with it, and you're done, nothing more to do. Or you load it dynamically during run-time.
If you link with it, then the DLL's import library (.lib) is used. The import library contains information that the linker uses to know which DLL to load and which of its exported functions are needed, so your program can call them. When your program is loaded, the operating system also loads the DLL for you; basically, it calls LoadLibrary on your behalf.
In other operating systems (like OS X and Linux) it works in a similar way. The difference is that on these systems the linker can look directly at the dynamic library (the .so/.dylib file) and figure out what's needed without a separate import library like on Windows.
To load a library dynamically, you don't need to link with anything related to the library you want to load.
Like others already said: what is included in a .lib file on Windows is included directly in the .so/.dylib on Linux/OS X. But the main question is... why?
Isn't *nix solution better?
I think it is, but the .lib has one advantage. The developer linking to the DLL doesn't actually need to have access to the DLL file itself.
Does a scenario like that happen often in the real world? Is it worth the effort of maintaining two files per DLL file? I don't know.
Edit: Ok, guys let's make things even more confusing! You can link directly to a DLL on Windows, using MinGW. So the whole import library problem is not directly related to Windows itself. Taken from sampleDLL article from MinGW wiki:
The import library created by the "--out-implib" linker option is
required iff (==if and only if) the DLL shall be interfaced from some
C/C++ compiler other than the MinGW toolchain. The MinGW toolchain is
perfectly happy to directly link against the created DLL. More details
can be found in the ld.exe info files that are part of the binutils
package (which is a part of the toolchain).
Linux also requires a link step, but instead of linking against a .lib file, the program is linked to the dynamic linker, /lib/ld-linux.so.2. This usually happens behind the scenes when using GCC (however, if you are working from an assembler, you do need to specify it manually).
Both approaches, the Windows .lib approach and the Linux dynamic linker approach, can in reality be regarded as static linking. There is, however, a difference: on Windows part of the work is done at link time, although there is still work left at load time (I am not sure, but I think the .lib file merely tells the linker the physical library name, while the symbols are only resolved at load time), whereas on Linux everything besides linking to the dynamic linker happens at load time.
Dynamic linking, in this sense, generally refers to opening the DLL file manually at runtime (such as with LoadLibrary()), in which case the burden is entirely on the programmer.
In a shared library, such as a .dll, .dylib or .so, there is some information about each symbol's name and address, like this:
------------------------------------
| symbol's name | symbol's address |
|----------------------------------|
| Foo | 0x12341234 |
| Bar | 0xabcdabcd |
------------------------------------
The load functions, such as LoadLibrary and dlopen, load the shared library and make it available for use.
GetProcAddress and dlsym find a symbol's address for you. For example:
HMODULE shared_lib = LoadLibrary("asdf.dll");
FARPROC symbol = GetProcAddress(shared_lib, "Foo"); // takes the module handle, then the name
// symbol is 0x12341234
On Windows, there is a .lib file for using a .dll. When you link to this .lib file, you don't need to call LoadLibrary and GetProcAddress, and you just use the shared library's functions as if they were "normal" functions. How can that work?
In fact, the .lib contains import information. It looks like this:
void *Foo; // please put the address of Foo there
void *Bar; // please put the address of Bar there
When the operating system loads your program (strictly speaking, your module), it performs the equivalent of LoadLibrary and GetProcAddress automatically.
And if you write code such as Foo();, the compiler converts it into (*Foo)(); automatically. So you can use them as if they were "normal" functions.
Related
Under Windows, when I compile C/C++ code in a DLL project in MSVC I am getting 2 files:
MyDll.dll
MyDll.lib
where as far as I understand MyDll.lib contains some kind of pointers table indicating functions locations in the dll. When using this dll, say in an exe file, MyDll.lib is embedded into the exe file during linkage so in runtime it "knows" where the functions are located in MyDll.dll and can use them.
But if I compile the same code under Linux I am getting only one file MySo.so without MySo.a (the equivalent to lib file in Linux) so how does an executable file under Linux know where the functions are located in MySo.so if nothing is embedded into it during linking?
The MSVC linker can link together object files (.obj) and object libraries (.lib) to produce an .EXE or a .DLL.
To link with a DLL, the process in MSVC is to use a so-called import library (.LIB) that acts as a glue between the C function names and the DLL's export table (in a DLL a function can be exported by name or by ordinal - the latter was often used for undocumented APIs).
However, in most cases the DLL export table has all the function names and thus the import library (.LIB) contains largely redundant information ("import function ABC -> exported function ABC", etc).
It is even possible to generate a .LIB from an existing .DLL.
Linkers on other platforms don't have this "feature" and can link with dynamic libraries directly.
On Linux, the linker (not the dynamic linker) searches through the shared libraries specified at link time and creates references to them inside the executable. When the dynamic linker loads these executables it loads the shared libraries they require into memory and resolves the symbols, which allows the binaries to be run.
MySo.a, if created, would actually include the symbols to be linked directly into the binary instead of the "symbol lookup tables" used on Windows.
rustyx's answer explains the process on Windows more thoroughly than I can; it's been a long time since I've used Windows.
The difference you are seeing is more of an implementation detail. Under the hood, both Linux and Windows work similarly: your code calls a stub function which is statically linked into your executable, and this stub then loads the DLL/shared library if necessary (in the case of delayed loading; otherwise the library is loaded when the program starts) and, on first call, resolves the symbol via GetProcAddress/dlsym.
The only difference is that on Linux these stub functions (which are called PLT stubs) are generated on the fly when you link your app with the dynamic library (the library contains enough information to generate them), whereas on Windows they are generated when the DLL itself is created, in a separate .lib file.
The two approaches are so similar that it's actually possible to mimic Windows import libraries on Linux (see Implib.so project).
On Linux, you pass MySo.so to the linker and it is able to extract only what is needed for the link phase, putting in a reference that MySo.so is needed at run time.
.dll or .so files are shared libs (linked at runtime), while .a and .lib files are static libraries (linked at build time). In this there is no difference between Windows and Linux.
The difference is in how they are handled. Note: the difference is only in the customs of how they are used. It wouldn't be too hard to make Linux builds the Windows way and vice versa, except that practically no one does this.
If we use a dll, or we call a function even from our own binary, there is a simple and clear way. For example, in C, we see that:
int example(int x) {
...do_something...
}
int ret = example(42);
However, at the asm level, there could be many differences. For example, on x86, a call opcode is executed, and the 42 is passed on the stack. Or in a register. Or anywhere else. No one knows, before writing the DLL, how it will be used, or how projects will want to use it, possibly written with a compiler (or in a language!) which doesn't even exist yet (or is unknown to the developers of the DLL).
For example, by default, both C and Pascal put the arguments on the stack (and get the return values from it), but they do it in a different order. You can also pass arguments between your functions in registers through some compiler-dependent optimization.
As you correctly see, the Windows custom is that when building a DLL, we also create a minimal .a/.lib with it. This minimal static library is only a wrapper; the symbols (functions) of the DLL are reached through it. It makes the required asm-level calling conversions.
Its advantage is compatibility. Its disadvantage is that if you have only a .dll, you can have a hard time figuring out how its functions want to be called. This makes using DLLs a hacking task if the developer of the DLL does not give you the .a. Thus, it mainly serves closedness, for example making it easier to charge extra for SDKs.
Another disadvantage is that even if you use a dynamic library, you need to link this little wrapper statically.
In Linux, the binary interface of shared libraries is standard and follows the C convention. Thus, no .a is required and there is binary compatibility between the shared libs; in exchange, we don't have the advantages of the Microsoft custom.
Can you help me to understand, why do we need .lib files when importing functions and data from dll?
I've heard, that it contains a list of the exported functions and data elements from the corresponding dll, but when I used CFF Explorer to explore my dll, I found out that dll already has addresses of exporting functions so I theoretically can link my program with .dll without any additional files.
Can you please explain in more detail what kind of data is stored in the .lib files?
And also, yes, I know that Visual Studio forces us to add .lib files to the additional dependencies section, but why does it really need them?
When your source code statically calls exported DLL functions, or statically accesses exported DLL variables, those references are compiled into your executable's intermediate object files as pointers, whose values get populated at run-time.
When the linker is combining the compiler-generated object files to make the final executable, it has to figure out what all of the compiler-generated references actually refer to. If it can't match a given reference to some piece of code in your executable, it needs to match it to an external DLL instead. So it needs to know which DLLs to even look at, and how those DLLs export things. A DLL may export a given function/variable by name OR by ordinal number, so the linker needs a way to map the identifiers used by your code references to specific entries in the EXPORTS tables of specific .dll files (especially in the case where things are exported by ordinals). Static-link .lib files provide the linker with that mapping information (ie FunctionA maps to Ordinal 123 in DLL XYZ.dll, FunctionB maps to name _FunctionB@4 in DLL ABC.dll, etc).
The linker can then populate the IMPORTS table of your executable with information about the appropriate EXPORTS entries needed, and then make the DLL references in your code point to the correct IMPORTS entries (if the linker can't resolve a compiler-generated reference to a piece of code in your executable, or to a specific DLL export, it aborts with an "unresolved external" error).
Then, when your executable is loaded at run-time, the OS Loader looks at the IMPORTS table to know which DLL exports are needed, so it can then load the appropriate DLLs into memory and update the entries in the IMPORTS table with real memory addresses that are based on each DLL's EXPORTS table (if a referenced DLL fails to load, or if a referenced export fails to be found, the OS Loader aborts loading your executable). That way, when your code calls DLL functions or accesses DLL variables, those accesses go to the right places.
Things are very different if your source code dynamically accesses DLL functions/variables via explicit calls to GetProcAddress() at run-time. In that case, static-link .lib files are not needed for those accesses, since your own code is handling the loading of DLLs into memory and locating the exports that it wants to use.
However, there is a 3rd option that blends the above scenarios together: you can write your code to access the DLL functions/variables statically but use your linker's delay-load feature (if it has one). In that case, you still need static-link .lib files for each delay-loaded DLL you access, but the linker populates a separate DELAYLOAD table in your executable with references to the DLL exports, instead of populating the IMPORTS table. It points the compiler-generated DLL references to stubs in your compiler's RTL that will replace the references with addresses from GetProcAddress() when the stubs are accessed for the first time at run-time, thus avoiding the need for the references to be populated by the OS Loader at load-time. This allows your executable to run normally even if the DLL exports are not present at load-time, and may not even need to load the DLLs at all if they are never used (of course, if your executable does try to access a DLL export dynamically and it fails to load, your code is likely to crash, but that is a separate issue).
I've heard, that it contains a list of the exported functions and data elements from the corresponding dll, but when I used CFF Explorer to explore my dll, I found out that dll already has addresses of exporting functions so I theoretically can link my program with .dll without any additional files.
As a trivial example of why this can't always work, consider an executable that accesses two DLLs, one for a Winsock filter and the other for an allocator. And say that on this particular machine, the Winsock filter DLL happens to also implement an allocator with the same API and the allocator DLL happens to also implement a Winsock filter with the same API. How could the compiler know which API functions to access from which DLL? The library file contains the intent in accessing the DLL, that is, the API and functions you want to access.
Importantly, there is no such thing as "The corresponding DLL". There might be different DLL files on different systems. What the linker needs to know is what the DLL is supposed to look like that it can rely on, not what the DLL that you might happen to use on some particular system might happen to be.
For example, suppose the DLL file contains an allocator. You might have one DLL file for an allocator with debugging, one for an allocator with optimizations for specific CPU versions, and one for an allocator that uses a new, experimental algorithm. What the linker needs to know is the API that all these DLL files implement, not the specific implementation in any one file.
You can produce a LIB file from a DLL file but you might wind up building an executable that doesn't work when using some other version of the DLL file. You would have to assume that whatever this particular DLL happens to do is precisely what every other DLL that implements the same API will happen to do.
Referring to this answer: https://stackoverflow.com/a/6264256/5324086,
I found that a linker has even more functionality than just managing absolute addresses for object file symbols.
What does the library produced by linker contain? Is it something other than ... say a C Standard library?
Why does the linker even need to produce a library?
The exact details depend on the type of library (you can search for shared library formats) but the basic components will include the compiled code, plus a symbol table that tells the linker which address corresponds to each name. Note that this is very similar to an object file. Static libraries are basically archives of object files and the linker links them in a similar way. With dynamic libraries, the OS can look this up whenever it loads a program, and link the symbols then. They won't generally have the same absolute addresses in every program's address space, so these addresses will be relative to where the OS loads the library.
The C standard library (MSVC runtime on Windows) is an example of a library.
Static libraries are just a collection of object files. You can think of them as a tar file containing all the relevant .o files (or, on Windows, as a zip file containing .obj files). The linking part of the linker is not involved here (in fact, static libraries on Unix systems are traditionally built with the ar utility, which is somewhat related to tar). They are completely resolved at build time, and they are simply used as a way to avoid constantly rebuilding things that take a long time to build or have complex build procedures.
Dynamic libraries are a different beast. They are fully fledged executables that can be loaded by other processes, so the regular linker is needed for the same reasons it is used in normal executables. Instead of providing just a single entrypoint, they export a full symbols table that is used by the loader (or "runtime linker") to allow the host program to locate the required procedures. Generally they also contain relocation information to allow loading at any address in the target address space (or they are compiled in position independent code for this same reason).
I have been involved in some debate with respect to libraries in Linux, and would like to confirm some things.
It is to my understanding (please correct me if I am wrong and I will edit my post later), that there are two ways of using libraries when building an application:
Static libraries (.a files): At link time, a copy of the entire library is put into the final application so that the functions within the library are always available to the calling application
Shared objects (.so files): At link time, the object is just verified against its API via the corresponding header (.h) file. The library isn't actually used until runtime, where it is needed.
The obvious advantage of static libraries is that they allow the entire application to be self-contained, while the benefit of dynamic libraries is that the ".so" file can be replaced (ie: in case it needs to be updated due to a security bug) without requiring the base application to be recompiled.
I have heard some people make a distinction between shared objects and dynamic link libraries (DLL's), even though they are both ".so" files. Is there any distinction between shared objects and DLLs when it comes to C/C++ development on Linux or any other POSIX compliant OS (ie: MINIX, UNIX, QNX, etc)? I am told that one key difference (so far) is that shared objects are just used at runtime, while DLL's must be opened first using the dlopen() call within the application.
Finally, I have also heard some developers mention "shared archives", which, to my understanding, are also static libraries themselves, but are never used by an application directly. Instead, other static libraries will link against the "shared archives" to pull some (but not all) functions/resources from the shared archive into the static library being built.
Thank you all in advance for your assistance.
Update
In the context in which these terms were provided to me, it was effectively erroneous terms used by a team of Windows developers that had to learn Linux. I tried to correct them, but the (incorrect) language norms stuck.
Shared Object: A library that is automatically linked into a program when the program starts, and exists as a standalone file. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylib for a library file named mylib.so). The library must be present at compile time, and when the application starts.
Static Library: A library that is merged into the actual program itself at build time for a single (larger) application containing the application code and the library code that is automatically linked into a program when the program is built, and the final binary containing both the main program and the library itself exists as a single standalone binary file. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylib for a library file named mylib.a). The library must be present at compile time.
DLL: Essentially the same as a shared object, but rather than being included in the linking list at compile time, the library is loaded via dlopen()/dlsym() commands so that the library does not need to be present at build time for the program to compile. Also, the library does not need to be present (necessarily) at application startup or compile time, as it is only needed at the moment the dlopen/dlsym calls are made.
Shared Archive: Essentially the same as a static library, but is compiled with the "export-shared" and "-fPIC" flags. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylibS for a library file named mylibS.a). The distinction between the two is that this additional flag is required if a shared object or DLL wants to statically link the shared archive into its own code AND be able to make the functions in the shared object available to other programs, rather than just using them internal to the DLL. This is useful in the case when someone provides you with a static library, and you wish to repackage it as an SO. The library must be present at compile time.
Additional Update
The distinction between "DLL" and "shared library" was just a (lazy, inaccurate) colloquialism in the company I worked in at the time (Windows developers being forced to shift to Linux development, and the term stuck), adhering to the descriptions noted above.
Additionally, the trailing "S" literal after the library name, in the case of "shared archives" was just a convention used at that company, and not in the industry in general.
A static library (.a) is a library that can be linked directly into the final executable produced by the linker; it is contained in it, and there is no need to have the library on the system where the executable will be deployed.
A shared library (.so) is a library that is linked but not embedded in the final executable, so it will be loaded when the executable is launched and needs to be present on the system where the executable is deployed.
A dynamic link library on Windows (.dll) is like a shared library (.so) on Linux, but there are some differences between the two implementations that are related to the OS (Windows vs Linux):
A DLL can define two kinds of functions: exported and internal. The exported functions are intended to be called by other modules, as well as from within the DLL where they are defined. Internal functions are typically intended to be called only from within the DLL where they are defined.
An SO library on Linux doesn't need special export statement to indicate exportable symbols, since all symbols are available to an interrogating process.
I've always thought that DLLs and shared objects are just different terms for the same thing - Windows calls them DLLs, while on UNIX systems they're shared objects, with the general term - dynamically linked library - covering both (even the function to open a .so on UNIX is called dlopen() after 'dynamic library').
They are indeed only linked at application startup, however your notion of verification against the header file is incorrect. The header file defines prototypes which are required in order to compile the code which uses the library, but at link time the linker looks inside the library itself to make sure the functions it needs are actually there. The linker has to find the function bodies somewhere at link time or it'll raise an error. It ALSO does that at runtime, because as you rightly point out the library itself might have changed since the program was compiled. This is why ABI stability is so important in platform libraries, as the ABI changing is what breaks existing programs compiled against older versions.
Static libraries are just bundles of object files straight out of the compiler, just like the ones that you are building yourself as part of your project's compilation, so they get pulled in and fed to the linker in exactly the same way, and unused bits are dropped in exactly the same way.
I can elaborate on the details of DLLs in Windows to help clarify those mysteries to my friends here in *NIX-land...
A DLL is like a Shared Object file. Both are images, ready to load into memory by the program loader of the respective OS. The images are accompanied by various bits of metadata to help linkers and loaders make the necessary associations and use the library of code.
Windows DLLs have an export table. The exports can be by name, or by table position (numeric). The latter method is considered "old school" and is much more fragile -- rebuilding the DLL and changing the position of a function in the table will end in disaster, whereas there is no real issue if linking of entry points is by name. So, forget that as an issue, but just be aware it's there if you work with "dinosaur" code such as 3rd-party vendor libs.
Windows DLLs are built by compiling and linking, just as you would for an EXE (executable application), but the DLL is not meant to stand alone, just as an SO is meant to be used by an application, either via dynamic loading, or by link-time binding (the reference to the SO is embedded in the application binary's metadata, and the OS program loader will auto-load the referenced SOs). DLLs can reference other DLLs, just as SOs can reference other SOs.
In Windows, DLLs will make available only specific entry points. These are called "exports". The developer can either use a special compiler keyword to make a symbol externally visible (to other linkers and the dynamic loader), or the exports can be listed in a module-definition file which is used at link time when the DLL itself is being created. The modern practice is to decorate the function definition with the keyword to export the symbol name. It is also possible to create header files with keywords which declare a symbol as one to be imported from a DLL outside the current compilation unit. Look up the keywords __declspec(dllexport) and __declspec(dllimport) for more information.
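The two keywords are usually wrapped in one macro so that the same header serves both the DLL and its clients. Here is a minimal sketch of that pattern. MYLIB_API, MYLIB_BUILD, mylib_add and add_impl are hypothetical names made up for illustration, the sketch assumes MYLIB_BUILD is defined only while the DLL itself is being compiled, and the non-Windows branch assumes GCC or Clang:

```c
/* --- mylib.h (hypothetical): shared by the DLL and by its clients --- */
#if defined(_WIN32)
#  if defined(MYLIB_BUILD)
#    define MYLIB_API __declspec(dllexport)  /* building the DLL: export */
#  else
#    define MYLIB_API __declspec(dllimport)  /* using the DLL: import */
#  endif
#else
   /* Assumption: GCC/Clang. Keeps the symbol visible even when the
      library is built with -fvisibility=hidden. */
#  define MYLIB_API __attribute__((visibility("default")))
#endif

MYLIB_API int mylib_add(int a, int b);

/* --- mylib.c (hypothetical): compiled with MYLIB_BUILD defined --- */

/* Internal function: 'static' keeps it private to this file, so it is
   never exported. */
static int add_impl(int a, int b)
{
    return a + b;
}

/* Exported function: callable from other modules and from inside the DLL. */
MYLIB_API int mylib_add(int a, int b)
{
    return add_impl(a, b);
}
```

A client simply includes the header and calls mylib_add(); on Windows it sees the dllimport form because it does not define MYLIB_BUILD.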
One of the interesting features of DLLs is that they can declare a standard "upon load/unload" handler function. Whenever the DLL is loaded or unloaded, the DLL can perform some initialization or cleanup, as the case may be. This maps nicely into having a DLL as an object-oriented resource manager, such as a device driver or shared object interface.
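As a rough illustration of such a handler: on Windows it is the DllMain entry point. The non-Windows branch below uses the GCC/Clang constructor and destructor attributes, which I am assuming here as the closest shared-object counterpart; that mapping is not something stated above. The names g_ready, on_load, on_unload and resource_ready are hypothetical.

```c
static int g_ready = 0;   /* nonzero while the library is loaded and initialized */

#ifdef _WIN32
#include <windows.h>

/* Called by the Windows loader when the DLL is attached to or detached
   from a process (and, optionally, its threads). */
BOOL WINAPI DllMain(HINSTANCE instance, DWORD reason, LPVOID reserved)
{
    (void)instance;
    (void)reserved;
    switch (reason) {
    case DLL_PROCESS_ATTACH: g_ready = 1; break;  /* initialization */
    case DLL_PROCESS_DETACH: g_ready = 0; break;  /* cleanup */
    default: break;                               /* thread attach/detach ignored */
    }
    return TRUE;  /* returning FALSE on process attach makes the load fail */
}
#else
/* Assumption: GCC/Clang. Runs when the shared object is loaded. */
__attribute__((constructor))
static void on_load(void)
{
    g_ready = 1;
}

/* Runs when the shared object is unloaded or the process exits. */
__attribute__((destructor))
static void on_unload(void)
{
    g_ready = 0;
}
#endif

/* Lets callers check that the load handler has run. */
int resource_ready(void)
{
    return g_ready;
}
```

Whatever the platform, it is wise to keep the work done in these handlers small, since they run while the loader is still in the middle of loading or unloading the module.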
When a developer wants to use an already-built DLL, she must either reference an "export library" (*.LIB) created by the DLL developer when she created the DLL, or she must explicitly load the DLL at run time and request the entry point address by name via the LoadLibrary() and GetProcAddress() mechanisms. Most of the time, linking against a LIB file (which simply contains the linker metadata for the DLL's exported entry points) is the way DLLs get used. Dynamic loading is reserved typically for implementing "polymorphism" or "runtime configurability" in program behaviors (accessing add-ons or later-defined functionality, aka "plugins").
The Windows way of doing things can cause some confusion at times; the system uses the .LIB extension to refer to both normal static libraries (archives, like POSIX *.a files) and to the "export stub" libraries needed to bind an application to a DLL at link time. So, one should always look to see if a *.LIB file has a same-named *.DLL file; if not, chances are good that *.LIB file is a static library archive, and not export binding metadata for a DLL.
You are correct in that static files are copied to the application at link-time, and that shared files are just verified at link time and loaded at runtime.
The dlopen call lets an application load a shared object itself at runtime if it wishes to; otherwise the shared objects are loaded automatically when the application starts. DLLs and .so files are the same thing. dlopen exists to give processes even more fine-grained dynamic loading abilities. You don't have to call dlopen yourself to open and use DLLs; that also happens at application startup.
I suspect some kind of misunderstanding here, but header files, at least of the .h variety used for compiling source code, are most definitely NOT checked during link time.
.h, and for that matter, .c/.cpp files, are only involved during the compilation phase, which includes preprocessing. Once the object code has been created the header file is long gone well before the linker gets around to dealing with things.
I know this may seem quite basic to geeks. But I want to make it crystal clear.
When I want to use a Win32 DLL, usually I just call the APIs like LoadLibrary() and GetProcAddress(). But recently, I am developing with DirectX9, and I need to add d3d9.lib, d3dx9.lib, etc. files.
I have heard enough that LIB is for static linking and DLL is for dynamic linking.
So my current understanding is that a LIB contains the implementation of the methods and is statically linked at link time as part of the final EXE file, while a DLL is dynamically loaded at runtime and is not part of the final EXE file.
But sometimes, there're some LIB files coming with the DLL files, so:
What are these LIB files for?
How do they achieve what they are meant for?
Are there any tools that can let me inspect the internals of these LIB files?
Update 1
After checking Wikipedia, I remember that these LIB files are called import libraries.
But I am wondering how it works with my main application and the DLLs to be dynamically loaded.
Update 2
Just as RBerteig said, there is some stub code in the LIB files that are built along with the DLLs. So the calling sequence should be like this:
My main application --> stub in the LIB --> real target DLL
So what information should be contained in these LIBs? I could think of the following:
The LIB file should contain the full path of the corresponding DLL, so that the DLL can be loaded by the runtime.
The relative address (or file offset?) of each DLL export method's entry point should be encoded in the stub, so that correct jumps/method calls can be made.
Am I right on this? Is there something more?
BTW: Is there any tool that can inspect an import library? If I can see it, there'll be no more doubts.
Linking to a DLL file can occur implicitly at link time, or explicitly at run time. Either way, the DLL ends up loaded into the process's memory space, and all of its exported entry points are available to the application.
If used explicitly at run time, you use LoadLibrary() and GetProcAddress() to manually load the DLL and get pointers to the functions you need to call.
If linked implicitly when the program is built, then stubs for each DLL export used by the program get linked in to the program from an import library, and those stubs get updated as the EXE and the DLL are loaded when the process launches. (Yes, I've simplified more than a little here...)
Those stubs need to come from somewhere, and in the Microsoft tool chain they come from a special form of .LIB file called an import library. The required .LIB is usually built at the same time as the DLL, and contains a stub for each function exported from the DLL.
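To make the stub idea concrete, here is a toy model in plain C. It is not the real mechanism, just the shape of it: import_slot_add stands in for an entry in the EXE's import table, add() is the stub the import library would contribute, and fake_loader_bind() plays the part of the OS loader. All of these names are made up for illustration, and the "DLL" function lives in the same file only so the sketch is self-contained.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*add_fn)(int, int);

/* Stands in for the function that really lives inside the DLL. */
static int dll_add(int a, int b)
{
    return a + b;
}

/* Import slot: empty in the EXE on disk, filled in when the process loads. */
static add_fn import_slot_add = NULL;

/* Stub: what the program's calls are linked against at build time.
   It does nothing except forward through the slot. */
int add(int a, int b)
{
    assert(import_slot_add != NULL);  /* the loader must have bound the slot */
    return import_slot_add(a, b);
}

/* Toy "loader": in reality the OS looks the export up by name (or ordinal)
   in the loaded DLL and writes its address into the slot. */
void fake_loader_bind(void)
{
    import_slot_add = dll_add;
}
```

The point of the indirection is that the EXE only has to record the DLL's file name and the names of the symbols it imports; the actual addresses are not known until load time, so they cannot be baked into the import library.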
Confusingly, a static version of the same library would also be shipped as a .LIB file. There is no trivial way to tell them apart, except that LIBs that are import libraries for DLLs will usually be smaller (often much smaller) than the matching static LIB would be.
If you use the GCC toolchain, incidentally, you don't actually need import libraries to match your DLLs. The version of the GNU linker ported to Windows understands DLLs directly, and can synthesize almost any required stubs on the fly.
Update
If you just can't resist knowing where all the nuts and bolts really are and what is really going on, there is always something at MSDN to help. Matt Pietrek's article An In-Depth Look into the Win32 Portable Executable File Format is a very complete overview of the format of the EXE file and how it gets loaded and run. It's even been updated to cover .NET and more since it originally appeared in MSDN Magazine ca. 2002.
Also, it can be helpful to know how to learn exactly what DLLs are used by a program. The tool for that is Dependency Walker, aka depends.exe. A version of it is included with Visual Studio, but the latest version is available from its author at http://www.dependencywalker.com/. It can identify all of the DLLs that were specified at link time (both early load and delay load) and it can also run the program and watch for any additional DLLs it loads at run time.
Update 2
I've reworded some of the earlier text to clarify it on re-reading, and to use the terms of art implicit and explicit linking for consistency with MSDN.
So, we have three ways that library functions might be made available to be used by a program. The obvious follow-up question is then: "How do I choose which way?"
Static linking is how the bulk of the program itself is linked. All of your object files are listed, and get collected together in to the EXE file by the linker. Along the way, the linker takes care of minor chores like fixing up references to global symbols so that your modules can call each other's functions. Libraries can also be statically linked. The object files that make up the library are collected together by a librarian in a .LIB file which the linker searches for modules containing symbols that are needed. One effect of static linking is that only those modules from the library that are used by the program are linked to it; other modules are ignored. For instance, the traditional C math library includes many trigonometry functions. But if you link against it and use cos(), you don't end up with a copy of the code for sin() or tan() unless you also called those functions. For large libraries with a rich set of features, this selective inclusion of modules is important. On many platforms such as embedded systems, the total size of code available for use in the library can be large compared to the space available to store an executable in the device. Without selective inclusion, it would be harder to manage the details of building programs for those platforms.
However, having a copy of the same library in every running program creates a burden on a system that normally runs lots of processes. With the right kind of virtual memory system, pages of memory that have identical content need only exist once in the system, but can be used by many processes. It therefore pays to make the pages containing code identical across as many running processes as possible. But if programs statically link to the runtime library, then each has a different mix of functions, each laid out at different locations in that process's memory map, and there aren't many sharable code pages unless it is a program that is itself run in more than one process. So the idea of a DLL gained another, major, advantage.
A DLL for a library contains all of its functions, ready for use by any client program. If many programs load that DLL, they can all share its code pages. Everybody wins. (Well, until you update a DLL with a new version, but that isn't part of this story. Google DLL Hell for that side of the tale.)
So the first big choice to make when planning a new project is between dynamic and static linkage. With static linkage, you have fewer files to install, and you are immune from third parties updating a DLL you use. However, your program is larger, and it isn't quite as good a citizen of the Windows ecosystem. With dynamic linkage, you have more files to install, you might have issues with a third party updating a DLL you use, but you are generally being friendlier to other processes on the system.
A big advantage of a DLL is that it can be loaded and used without recompiling or even relinking the main program. This can allow a third party library provider (think Microsoft and the C runtime, for example) to fix a bug in their library and distribute it. Once an end user installs the updated DLL, they immediately get the benefit of that bug fix in all programs that use that DLL. (Unless it breaks things. See DLL Hell.)
The other advantage comes from the distinction between implicit and explicit loading. If you go to the extra effort of explicit loading, then the DLL might not even have existed when the program was written and published. This allows for extension mechanisms that can discover and load plugins, for instance.
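A minimal sketch of explicit loading, written so that the same helper works with LoadLibrary()/GetProcAddress() on Windows and dlopen()/dlsym() elsewhere. The names lib_open, lib_find, lib_close and call_plugin are hypothetical, and the sketch assumes the plugin exports a function taking and returning a double. On older glibc systems you may also need to link with -ldl.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#  include <windows.h>
typedef HMODULE lib_handle;
#else
#  include <dlfcn.h>
typedef void *lib_handle;
#endif

/* Signature we expect the looked-up symbol to have. Nothing verifies this;
   calling through a mismatched type is undefined behaviour. */
typedef double (*unary_fn)(double);

/* Open a library by file name; returns NULL on failure. */
static lib_handle lib_open(const char *name)
{
#ifdef _WIN32
    return LoadLibraryA(name);
#else
    return dlopen(name, RTLD_NOW | RTLD_LOCAL);
#endif
}

/* Look up an exported symbol by name; returns NULL if it is not there. */
static unary_fn lib_find(lib_handle lib, const char *symbol)
{
#ifdef _WIN32
    return (unary_fn)GetProcAddress(lib, symbol);
#else
    return (unary_fn)dlsym(lib, symbol);
#endif
}

static void lib_close(lib_handle lib)
{
#ifdef _WIN32
    FreeLibrary(lib);
#else
    dlclose(lib);
#endif
}

/* Load lib_name, call symbol(x) and store the result in *out.
   Returns 0 on success, -1 if the library or the symbol is missing. */
int call_plugin(const char *lib_name, const char *symbol, double x, double *out)
{
    assert(lib_name != NULL && symbol != NULL && out != NULL);

    lib_handle lib = lib_open(lib_name);
    if (lib == NULL)
        return -1;

    unary_fn fn = lib_find(lib, symbol);
    int status = -1;
    if (fn != NULL) {
        *out = fn(x);
        status = 0;
    }
    lib_close(lib);
    return status;
}
```

For example, on a glibc-based Linux system call_plugin("libm.so.6", "cos", 0.0, &r) would load the C math library and call its cos(); the library file name is an assumption about the platform. Because both names are just strings, they could equally come from a configuration file or a directory scan, which is what makes plugin discovery possible.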
These .LIB import library files are listed in the project property Linker->Input->Additional Dependencies when building a set of DLLs that need additional link-time information, which the import library .LIB files supply. In the example below, to avoid linker errors I need to reference DLLs A, B, C, and D through their .lib files. (Note: for the linker to find these files you may need to add their deployment path to Linker->General->Additional Library Directories; otherwise you will get a build error about being unable to find the provided .lib files.)
If your solution is building all dynamic libraries you may have been able to avoid this explicit dependency specification by relying instead on the reference flags exposed under the Common Properties->Framework and References dialog. These flags appear to automatically do the linking on your behalf using the *.lib files.
However, as the name says, this is a Common Property, which is not configuration or platform specific. That matters if you need to support a mixed build scenario. In our application we had one build configuration that produced a static build and a special configuration that built a constrained subset of assemblies deployed as dynamic libraries. I had set the Use Library Dependency Inputs and Link Library Dependencies flags to true in various cases to get things to build, and later to simplify things, but when I introduced my code to the static builds it produced a ton of linker warnings and made the static builds incredibly slow. I wound up with a bunch of warnings like this:
warning LNK4006: "bool __cdecl XXX::YYY() already defined in CoreLibrary.lib(JSource.obj); second definition ignored D.lib(JSource.obj)
And I wound up using the manual specification of Additional Dependencies to satisfy the linker for the dynamic builds while keeping the static builders happy by not using a common property that slowed them down. When I deploy the dynamic subset build I only deploy the dll files as these lib files are only used at link time, not at runtime.
Here are some related MSDN topics to answer my question:
Linking an Executable to a DLL
Linking Implicitly
Determining Which Linking Method to Use
Building an Import Library and Export File
There are three kinds of libraries: static, shared and dynamically loaded libraries.
The static libraries are linked with the code at the linking phase, so they are actually in the executable, unlike the shared library, which has only stubs (symbols) to look for in the shared library file, which is loaded at run time before the main function gets called.
The dynamically loaded ones are much like the shared libraries, except they are loaded when and if the need arises by the code you've written.
In my mind, there are two methods to link a DLL to an EXE:
Use the DLL and its import library (.lib file) implicitly
Use functions like LoadLibrary() explicitly