Statically link an existing Windows binary - C++

I was wondering whether I can take an existing Windows DLL and statically link it, i.e. turn the dynamically linked files into statically linked ones.
I have seen a number of projects that do this for Linux/ELF:
http://magicermine.com/
http://statifier.sourceforge.net/
http://bitwagon.com/jumpstart/jumpstart.html
I imagine this is most likely not possible, but I am running into some issues in WinPE, and when I statically linked the DLLs everything started working great.
I don't have the source to the existing DLL.
I guess I could make a pass-through DLL that exposes all of the same functions and is statically linked?

There is no tool support for linking in the code of a DLL statically.
The problem is that a DLL is a full Windows PE executable, not a C or C++ “library” in any sense. The C++ standard has only one statement that vaguely supports DLL-like things (in the paragraph about dynamic initialization being deferred until after the first statement of main). You’re out of luck.
But if you had the source code (as e.g. with MFC), which you say you don’t, then you could just have created static libraries.
Do note that “linking statically” to a DLL already has a meaning, namely having it loaded and having its functions resolved automatically.
That is the usual way of using a DLL.
It is in contrast to explicitly loading the DLL at run time and using GetProcAddress to resolve its functions.
Regarding “when I statically linked the DLLs everything started working great”:
presumably you had earlier explicitly loaded the DLLs dynamically and used GetProcAddress, and something about that did not work perfectly.
One main problem with GetProcAddress is that it assumes that the provided function name is encoded as Windows ANSI (the machine-dependent encoding reported by GetACP), and then (apparently) translates that to UTF-8 for the function lookup.
One workaround could be to access the function by ordinal rather than name.
One way to find the ordinal with Microsoft's tools is to use dumpbin /exports.
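To make the name-versus-ordinal distinction concrete, here is a toy, portable C++ model of a DLL export table. It does not parse a real PE file; the ExportTable and NamedExport types are hypothetical, loosely mirroring the Base, AddressOfFunctions and AddressOfNames/AddressOfNameOrdinals fields of the real export directory. The point it shows is that an ordinal lookup is plain index arithmetic and never compares name strings, so the encoding of the name cannot affect it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy model of a PE export directory (hypothetical types, not a PE parser).
using Fn = int (*)(int);

int twice(int x) { return 2 * x; }
int square(int x) { return x * x; }

struct NamedExport {
    std::string name;
    std::uint16_t index;  // unbiased index into `functions`
};

struct ExportTable {
    std::uint32_t ordinal_base;      // like the Base field of the export directory
    std::vector<Fn> functions;       // like AddressOfFunctions
    std::vector<NamedExport> names;  // like AddressOfNames + AddressOfNameOrdinals

    // Lookup by name: a string comparison against every named export.
    Fn by_name(const std::string& name) const {
        for (const auto& e : names)
            if (e.name == name) return functions[e.index];
        return nullptr;
    }

    // Lookup by ordinal: pure arithmetic, no string comparison at all.
    Fn by_ordinal(std::uint32_t ordinal) const {
        if (ordinal < ordinal_base) return nullptr;
        std::uint32_t i = ordinal - ordinal_base;
        return i < functions.size() ? functions[i] : nullptr;
    }
};

ExportTable make_demo_table() {
    // Ordinal 1 -> twice, ordinal 2 -> square. Only `twice` is exported by
    // name, mimicking a DLL that exports some functions by ordinal only.
    return ExportTable{1, {twice, square}, {{"twice", 0}}};
}
```

On Windows itself, the ordinal reported by dumpbin /exports goes in the low word of the name argument, for example GetProcAddress(module, MAKEINTRESOURCEA(ordinal)). Ordinals are not guaranteed to stay the same across versions of a DLL, which is the usual trade-off of this workaround.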

Related

Dynamic linking - Linux Vs. Windows

Under Windows, when I compile C/C++ code in a DLL project in MSVC, I get two files:
MyDll.dll
MyDll.lib
As far as I understand, MyDll.lib contains some kind of pointer table indicating the functions' locations in the DLL. When using this DLL, say in an EXE, MyDll.lib is embedded into the EXE during linking, so at run time it "knows" where the functions are located in MyDll.dll and can use them.
But if I compile the same code under Linux, I get only one file, MySo.so, without MySo.a (the Linux equivalent of the .lib file). So how does an executable under Linux know where the functions are located in MySo.so if nothing is embedded into it during linking?
The MSVC linker can link together object files (.obj) and object libraries (.lib) to produce an .EXE or a .DLL.
To link with a DLL, the process in MSVC is to use a so-called import library (.LIB) that acts as glue between the C function names and the DLL's export table (in a DLL a function can be exported by name or by ordinal; the latter was often used for undocumented APIs).
However, in most cases the DLL export table has all the function names and thus the import library (.LIB) contains largely redundant information ("import function ABC -> exported function ABC", etc).
It is even possible to generate a .LIB from an existing .DLL.
Linkers on other platforms don't have this "feature" and can link with dynamic libraries directly.
On Linux, the linker (not the dynamic linker) searches through the shared libraries specified at link time and creates references to them inside the executable. When the dynamic linker loads these executables it loads the shared libraries they require into memory and resolves the symbols, which allows the binaries to be run.
MySo.a, if created, would actually include the symbols to be linked directly into the binary instead of the "symbol lookup tables" used on Windows.
rustyx's answer explains the process on Windows more thoroughly than I can; it's been a long time since I've used Windows.
The difference you are seeing is more of an implementation detail: under the hood both Linux and Windows work similarly. Your code calls a stub function which is statically linked into your executable, and this stub then loads the DLL/shared library if necessary (in the case of delayed loading; otherwise the library is loaded when the program starts) and, on the first call, resolves the symbol via GetProcAddress/dlsym.
The only difference is that on Linux these stub functions (which are called PLT stubs) are generated on the fly when you link your app with the dynamic library (the library contains enough information to generate them), whereas on Windows they are generated when the DLL itself is created, in a separate .lib file.
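The stub-plus-pointer mechanism described above can be sketched in ordinary C++ as a toy, single-process model of lazy binding. Everything here is hypothetical: add_slot plays the role of a GOT entry (or a delay-load import slot on Windows), add_stub plays the role of the PLT stub, and toy_lookup stands in for the dynamic linker's real symbol search. The first call goes through the stub, which resolves the symbol and patches the slot; later calls jump straight to the target.

```cpp
#include <cassert>
#include <cstring>

// The "library" function the call site ultimately wants (hypothetical).
int lib_add(int a, int b) { return a + b; }

using AddFn = int (*)(int, int);

int resolve_count = 0;  // how many times the (slow) lookup has run

// Hypothetical stand-in for the dynamic linker's symbol lookup.
AddFn toy_lookup(const char* name) {
    ++resolve_count;
    if (std::strcmp(name, "add") == 0) return lib_add;
    return nullptr;
}

int add_stub(int a, int b);

// The pointer slot (a GOT entry on ELF): starts out pointing at the stub.
AddFn add_slot = add_stub;

// Runs only on the first call: resolve the symbol, patch the slot, forward.
int add_stub(int a, int b) {
    AddFn real = toy_lookup("add");
    assert(real != nullptr);
    add_slot = real;  // every later call goes straight to lib_add
    return real(a, b);
}

// What the compiled call site amounts to: an indirect call through the slot.
int call_add(int a, int b) { return add_slot(a, b); }
```

In use, the first call_add(2, 3) triggers one lookup, and subsequent calls reuse the patched pointer without looking anything up again.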
The two approaches are so similar that it's actually possible to mimic Windows import libraries on Linux (see Implib.so project).
On Linux, you pass MySo.so to the linker and it is able to extract only what is needed for the link phase, putting in a reference that MySo.so is needed at run time.
.dll and .so files are shared libraries (linked at run time), while .a and .lib files are static libraries (linked at build time). There is no difference between Windows and Linux here.
The difference is in how they are handled. Note: the difference is only in convention, i.e. in how they are used. It wouldn't be too hard to make Linux builds work the Windows way and vice versa, except that practically no one does this.
Whether we use a DLL or call a function in our own binary, the source code looks simple and clear. For example, in C:
int example(int x) {
    /* ...do something with x... */
    return x;
}
int main(void) {
    int ret = example(42);
    return ret == 42 ? 0 : 1;
}
However, at the assembly level there can be many differences. For example, on x86 a call instruction is executed and the 42 is passed on the stack. Or in some registers. Or anywhere. When the DLL is written, no one knows how it will be used, or how projects will want to use it, possibly written with a compiler (or in a language!) that doesn't even exist yet (or is unknown to the DLL's developers).
For example, by default both C and Pascal pass their arguments on the stack, but they push them in different orders. Compilers can also pass arguments between your own functions in registers as a compiler-dependent optimization.
As you correctly observe, the Windows convention is that when building a DLL, we also create a minimal .a/.lib with it. This minimal static library is only a wrapper; the symbols (functions) of the DLL are reached through it. It takes care of the required asm-level calling conventions.
Its advantage is compatibility. Its disadvantage is that if you have only a .dll, you can have a hard time figuring out how its functions expect to be called. This makes using DLLs a hacking task if the DLL's developer does not give you the .a. Thus, it mainly serves to keep things closed, for example making it easier to charge extra for the SDKs.
Another disadvantage is that even if you use a dynamic library, you need to link this little wrapper statically.
On Linux, the binary interface of shared libraries is standard and follows the C convention. Thus, no .a is required and there is binary compatibility between shared libraries; in exchange, we don't have the advantages of the Microsoft convention.

Does DLL linking on Windows result in GetProcAddress calls at runtime?

I'm curious about how dynamic linking works on Windows. Since we cannot link to a DLL directly, Windows usually links your executable to a LIB file which contains stubs for the functions exported by the DLL. Does this type of linking result in LoadLibrary and GetProcAddress calls at runtime? If not, how does the linking work internally?
The answer is maybe.
The default method is to create an import table, which lists all required DLLs and the functions used from them. This table is parsed directly by the OS. It will probably reuse some of the same code behind LoadLibrary for that. It most likely will not use the code from GetProcAddress, but will instead do a single bulk lookup of all necessary functions.
However, there's an MSVC feature called delay-loading. With this feature, MSVC++ will not build such an import table, but will insert actual LoadLibrary and GetProcAddress calls. The benefit is that these calls are made at the latest possible moment: as long as you don't need a particular DLL, it isn't loaded. This can speed up program startup.

C/C++ How Does Dynamic Linking Work On Different Platforms?

How does dynamic linking work generally?
On Windows (LoadLibrary), you need a .dll to call at runtime, but at link time, you need to provide a corresponding .lib file or the program won't link... What does the .lib file contain? A description of the .dll methods? Isn't that what the headers contain?
Relatedly, on *nix, you don't need a lib file... How does the compiler know that the methods described in the header will be available at runtime?
As a newbie, when you think about either one of the two schemes, then the other, neither of them makes sense...
To answer your questions one by one:
Dynamic linking defers part of the linking process to runtime.
It can be used in two ways: implicitly and explicitly.
Implicitly, the static linker will insert information into the
executable which will cause the library to load and resolve the
necessary symbols. Explicitly, you must call LoadLibrary or
dlopen manually, and then GetProcAddress/dlsym for each
symbol you need to use. Implicit loading is used for things
like the system library, where the implementation will depend on
the version of the system, but the interface is guaranteed.
Explicit loading is used for things like plug-ins, where the
library to be loaded will be determined at runtime.
The .lib file is only necessary for implicit loading. It
contains the information that the library actually provides this
symbol, so the linker won't complain that the symbol is
undefined, and it tells the linker in what library the symbols
are located, so it can insert the necessary information to cause
this library to automatically be loaded. All the header files
tell the compiler is that the symbols will exist, somewhere; the
linker needs the .lib to know where.
Under Unix, all of the information is extracted from the
.so. Why Windows requires two separate files, rather than
putting all of the information in one file, I don't know; it's
actually duplicating most of the information, since the
information needed in the .lib is also needed in the .dll.
(Perhaps licensing issues. You can distribute your program with
the .dll, but no one can link against the libraries unless
they have a .lib.)
The main thing to retain is that if you want implicit loading,
you have to provide the linker with the appropriate information,
either with a .lib or a .so file, so that it can insert that
information into the executable. And that if you want explicit
loading, you can't refer to any of the symbols in the library
directly; you have to call GetProcAddress/dlsym to get their
addresses yourself (and do some funny casting to use them).
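Explicit loading, including the "funny casting", might look like this on a POSIX system. To keep the sketch self-contained it does not load a custom library: it assumes Linux with glibc 2.34 or later (older glibc needs -ldl at link time) and looks up the C library's own strlen through the handle that dlopen(nullptr, ...) returns for the running program. The function name strlen_via_dlsym is hypothetical. The Windows version has the same shape with LoadLibrary, GetProcAddress and FreeLibrary.

```cpp
#include <cassert>
#include <cstddef>
#include <dlfcn.h>

// The signature we *assume* the symbol has; nothing checks this for us.
using StrlenFn = std::size_t (*)(const char*);

// Returns the length computed by a dynamically looked-up strlen, or -1 on failure.
long strlen_via_dlsym(const char* s) {
    // dlopen(nullptr, ...) returns a handle for the main program; lookups through
    // it also search the libraries loaded at startup (such as the C library).
    void* handle = dlopen(nullptr, RTLD_NOW);
    if (handle == nullptr) return -1;

    void* sym = dlsym(handle, "strlen");
    if (sym == nullptr) {
        dlclose(handle);
        return -1;
    }
    // The "funny casting": dlsym returns void*, which POSIX lets us convert
    // to a function pointer of the type we believe the symbol has.
    StrlenFn fn = reinterpret_cast<StrlenFn>(sym);
    long result = static_cast<long>(fn(s));
    dlclose(handle);
    return result;
}
```

Note that the program never mentions strlen as a symbol at link time here; if the name or the assumed signature were wrong, the error would only show up at run time.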
The .lib file on Windows is not required for loading a dynamic library, it merely offers a convenient way of doing so.
In principle, you can use LoadLibrary for loading the DLL and then use GetProcAddress for accessing functions provided by that DLL. The compilation of the enclosing program does not need to access the DLL in that case; it is only needed at runtime (i.e. when LoadLibrary actually executes). MSDN has a code example.
The disadvantage here is that you need to manually write code for loading the functions from the dll. In case you compiled the dll yourself in the first place, this code simply duplicates knowledge that the compiler could have extracted from the dll source code automatically (like the names and signatures of exported functions).
This is what the .lib file does: it contains the GetProcAddress calls for the DLL's exported functions, generated by the compiler so you don't have to worry about it. In Windows terms, this is called load-time dynamic linking, since the DLL is loaded automatically by the code from the .lib file when your enclosing program is loaded (as opposed to the manual approach, referred to as run-time dynamic linking).
How does dynamic linking work generally?
The dynamic link library (aka shared object) file contains machine code instructions and data, along with a table of metadata saying which offsets in that code/data relate to which "symbols", the type of the symbol (e.g. function vs data), the number of bytes or words in the data, and a few other things. Different OS will tend to have different shared object file formats, and indeed the same OS may support several, but that's the gist of it.
So, imagine the shared library's a big chunk of bytes with an index like this:
SYMBOL       ADDRESS  TYPE      SIZE
my_function  1000     function  2893
my_number    4800     variable  4
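Here is that index as a toy data structure, together with the one step a loader conceptually adds: the table stores offsets, and the address a program actually uses is the load address plus that offset. The SymbolEntry type, the resolve function and the load addresses used with it are hypothetical, for illustration only; real formats (ELF symbol tables, PE export directories) are considerably richer.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

enum class SymbolType { Function, Variable };

// One row of the toy index; the values are copied from the table above.
struct SymbolEntry {
    std::string name;
    std::uint64_t offset;  // relative to wherever the library gets loaded
    SymbolType type;
    std::uint64_t size;    // in bytes
};

const std::vector<SymbolEntry> kSymbols = {
    {"my_function", 1000, SymbolType::Function, 2893},
    {"my_number", 4800, SymbolType::Variable, 4},
};

// What a loader conceptually does: find the row, then add the load address.
std::optional<std::uint64_t> resolve(const std::string& name, std::uint64_t load_base) {
    for (const auto& s : kSymbols)
        if (s.name == name) return load_base + s.offset;
    return std::nullopt;  // unresolved symbol
}
```

Because only offsets are stored, the same file can be mapped at a different address in each process and still be resolved correctly.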
In general, the exact type of the symbols need not be captured in the metadata table - it's expected that declarations in the library's header files contain all the missing information. C++ is a bit special - compared to say C - because overloading can mean there are several functions with the same name, and namespaces allow for further symbols that would otherwise be ambiguously named - for that reason name mangling is typically used to concatenate some representation of the namespace and function arguments to the function name, forming something that can be unique in the library object file.
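Name mangling can be observed directly with GCC or Clang, which follow the Itanium C++ ABI: foo(int) becomes _Z3fooi, foo(double) becomes _Z3food, and ns::foo(int) becomes _ZN2ns3fooEi. The sketch below uses these compilers' abi::__cxa_demangle to turn such symbols back into readable signatures; the demangle wrapper itself is a hypothetical helper. MSVC uses a different decoration scheme (its DbgHelp library offers UnDecorateSymbolName instead), so this code is GCC/Clang-only.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Turns an Itanium-ABI mangled name back into a readable signature.
// If the input is not a valid mangled name, it is returned unchanged.
std::string demangle(const char* mangled) {
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) return mangled;
    std::string result(readable);
    std::free(readable);  // __cxa_demangle allocates the result with malloc
    return result;
}
```

The two overloads of foo end up as two distinct symbols in the library's table, which is exactly what lets them coexist in one object file.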
A program wanting to use the shared object can generally do one of two things:
have the OS load both itself and the shared object around the same time (before executing main()), with the OS Loader responsible for finding the symbols and examining metadata in the program file image about the use of those symbols, then patching in symbol addresses in the memory the program uses, such that the program can then just run and work functionally as if it'd known about the symbol addresses when it was first compiled (but perhaps a little slower)
or, explicitly in its own source code call dlopen sometime after main runs, then use dlsym or similar to get the symbol addresses, save them into (function/data) pointers based on the programmer's knowledge of the expected data types, then call them explicitly using the pointers.
On Windows (LoadLibrary), you need a .dll to call at runtime, but at link time, you need to provide a corresponding .lib file or the program won't link...
That doesn't sound right. Should be one or the other I'd think.
What does the .lib file contain? A description of the .dll methods? Isn't that what the headers contain?
A lib file is - at this level of description - pretty much the same as a shared object file... the main difference is that the compiler's finding the symbol addresses before the program's shipped and run.
Modern *nix systems derive their dynamic linking process from Solaris. Linux in particular doesn't need a separate .lib file because all external dependencies are recorded in the ELF file itself. The .interp section of an ELF file names the program interpreter (the dynamic linker) that resolves this executable's external symbols at load time. This is what is meant by dynamic linking.
There is also a way to handle dynamic linking in user space, called dynamic loading. This is when you use library calls such as dlopen and dlsym to get function pointers to functions in an external *.so.
More information can be found from this article http://www.ibm.com/developerworks/library/l-dynamic-libraries/.
Relatedly, on OS X (and I assume *nix... dlopen), you don't need a lib file... How does the compiler know that the methods described in the header will be available at runtime?
Compilers or linkers do not need such information. You, the programmer, need to handle the situation that the shared libraries you try to open by dlopen() may not exist.
You can use a DLL file in Windows in two ways: Either you link with it, and you're done, nothing more to do. Or you load it dynamically during run-time.
If you link with it, then the DLL's import library (.lib) file is used. The import library contains information that the linker uses to know which DLL to load and where in the DLL the functions are, so it can call them. When your program is loaded, the operating system also loads the DLL for you; basically, it calls LoadLibrary for you.
In other operating systems (like OS X and Linux) it works in a similar way. The difference is that on these systems the linker can look directly at the dynamic library (the .so/.dylib file) and figure out what's needed without a separate static library like on Windows.
To load a library dynamically, you don't need to link with anything related to the library you want to load.
Like others already said: what is included in a .lib file on Windows is included directly in the .so/.dylib on Linux/OS X. But the main question is... why?
Isn't *nix solution better?
I think it is, but the .lib has one advantage. The developer linking to the DLL doesn't actually need to have access to the DLL file itself.
Does a scenario like that happen often in the real world? Is it worth the effort of maintaining two files per DLL file? I don't know.
Edit: OK, guys, let's make things even more confusing! You can link directly to a DLL on Windows using MinGW. So the whole import library problem is not directly related to Windows itself. Taken from the sampleDLL article on the MinGW wiki:
The import library created by the "--out-implib" linker option is
required iff (==if and only if) the DLL shall be interfaced from some
C/C++ compiler other than the MinGW toolchain. The MinGW toolchain is
perfectly happy to directly link against the created DLL. More details
can be found in the ld.exe info files that are part of the binutils
package (which is a part of the toolchain).
Linux also requires linking, but instead of linking against a .lib library, it needs to link against the dynamic linker /lib/ld-linux.so.2; this usually happens behind the scenes when using GCC (however, if using an assembler, you do need to specify it manually).
Both approaches, the Windows .LIB approach and the Linux dynamic linker approach, are in reality considered static linking. There is, however, a difference: on Windows part of the work is done at link time, although there is still work at load time (I am not sure, but I think the .LIB file merely tells the linker the physical library name, while the symbols are only resolved at load time), whereas on Linux everything besides linking to the dynamic linker happens at load time.
Dynamic linking in general refers to manually opening the DLL file at runtime (such as with LoadLibrary()), in which case the burden is entirely on the programmer.
In a shared library, such as a .dll, .dylib or .so, there is some information about each symbol's name and address, like this:
------------------------------------
| symbol's name | symbol's address |
|----------------------------------|
| Foo | 0x12341234 |
| Bar | 0xabcdabcd |
------------------------------------
Load functions such as LoadLibrary and dlopen load a shared library and make it available for use.
GetProcAddress and dlsym find a symbol's address. For example:
HMODULE shared_lib = LoadLibraryA("asdf.dll");
FARPROC symbol = GetProcAddress(shared_lib, "Foo");
// symbol is 0x12341234
On Windows, there is a .lib file for using a .dll. When you link to this .lib file, you don't need to call LoadLibrary and GetProcAddress; you just use the shared library's functions as if they were "normal" functions. How can that work?
In fact, the .lib contains import information. It looks like this:
void *Foo; // please put the address of Foo there
void *Bar; // please put the address of Bar there
When the operating system loads your program (strictly speaking, your module), it performs the equivalent of LoadLibrary and GetProcAddress automatically.
And if you write code such as Foo();, the compiler converts it into (*Foo)(); automatically, so you can use these functions as if they were "normal" functions.
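The import slots and the Foo() to (*Foo)() conversion can be modelled in portable C++. This is a toy: import_table, find_export, bind_imports and call_Foo are hypothetical names, and in reality the Windows loader does this work on the executable's import table. In real MSVC output the pointer for Foo is named __imp_Foo, and the call either goes through it directly (with __declspec(dllimport)) or through a small jump thunk supplied by the .lib.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical functions exported by the "DLL".
int Foo() { return 1; }
int Bar() { return 2; }

using Fn = int (*)();

// One pointer slot per imported function, empty until the loader fills it,
// like the `void *Foo; void *Bar;` lines above.
struct ImportSlot {
    const char* name;
    Fn address;
};

ImportSlot import_table[] = {{"Foo", nullptr}, {"Bar", nullptr}};

// Toy stand-in for looking a name up in the DLL's export table.
Fn find_export(const char* name) {
    if (std::strcmp(name, "Foo") == 0) return Foo;
    if (std::strcmp(name, "Bar") == 0) return Bar;
    return nullptr;
}

// What the OS loader does once, before your code runs: fill every slot in one pass.
bool bind_imports() {
    for (ImportSlot& slot : import_table) {
        slot.address = find_export(slot.name);
        if (slot.address == nullptr) return false;  // a real loader refuses to start
    }
    return true;
}

// What `Foo();` in your code compiles to: an indirect call through the slot.
int call_Foo() { return import_table[0].address(); }
```

The design point is that binding happens once, in bulk, before any of your code runs; afterwards each call costs one extra indirection.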

How to use dll's?

I know that if I have an .a or .so file and a header file for that library (for example for SystemC), I should
1. include header file
2. link the appropriate library.
But I can't manage with only a .dll file: I can link it as well, but I don't have a header file to include in order to use its functions. Can someone explain what kinds of DLLs exist and how they can be used? Is it possible to use any .dll file, or does it have to be a specific kind of .dll to integrate with my application?
A DLL is functionally equivalent to a .so file (also known as a 'shared object' or 'shared library'). You need a header to declare the functions that are available inside the DLL, and you need to link against a library which handles the business of loading and executing DLL calls (mostly delegated to the OS).
It is possible to use a DLL without any sort of header. You can directly call Win32 APIs which will dynamically load a DLL into your program's virtual address space, and call other APIs which will give you what are essentially function pointers. However, you need to know the signatures of the function pointers to use them properly, so what you're effectively doing in that case is declaring a tiny subsection of the actual DLL header for your use.
This Wikipedia article may help, especially the section on shared libraries.
Unlike Linux, Windows libraries are separated into two forms: DLL (for runtime linking) and LIB (for symbol declarations). link.exe (the Windows linker) expects .lib files to resolve, at build time, the symbols your program uses through its headers. More information here:
http://msdn.microsoft.com/en-us/library/ba1z7822(VS.71).aspx
Note that if you load a DLL compiled in C++, you have to avoid passing object pointers across the interface, as they are in general not portable. You have to keep to basic C calls and calling conventions, as that is what is defined by the Windows or Linux platform ABI.
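One common way to follow that advice is to export only C functions that deal in an opaque handle, keeping all C++ types behind the boundary. In the sketch below, Counter and the counter_* functions are hypothetical, both sides are compiled in one file for brevity, and the __declspec(dllexport)/dllimport annotations a real DLL would need are omitted. Note that the DLL both allocates and frees the object, since the caller and the DLL may use different C runtimes.

```cpp
#include <cassert>
#include <string>

// --- The exported surface: plain C, usable from any compiler or language ---
extern "C" {
typedef struct Counter Counter;  // opaque: callers never see the layout

Counter* counter_create(const char* name);
void counter_add(Counter* c, int delta);
int counter_value(const Counter* c);
void counter_destroy(Counter* c);
}

// --- Inside the DLL: free to use C++ types, they never cross the boundary ---
struct Counter {
    std::string name;
    int value = 0;
};

extern "C" Counter* counter_create(const char* name) { return new Counter{name}; }

extern "C" void counter_add(Counter* c, int delta) {
    assert(c != nullptr);
    c->value += delta;
}

extern "C" int counter_value(const Counter* c) { return c->value; }

// Freed by the same module that allocated it.
extern "C" void counter_destroy(Counter* c) { delete c; }
```

Only pointers to an incomplete type and built-in C types appear in the declarations a client would see, so the client never depends on the layout of std::string or of Counter.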

Can multiple versions of a same (Boost) DLL co-exist in same process?

My (C++, cross-platform) app is heavily using Boost libraries (say version 1.x), and I want to also link against a 3rd-party (vendor)'s SDK (no source), itself using Boost (but version 1.y).
So, we both link dynamically against our own version of Boost DLLs, CRT being identical. Consequently, at run-time my app would have to load both DLL of Boost 1.x & 1.y.
What are the potential issues & gotchas associated?
I can't change vendor's SDK, but I can change my app. Maybe I should try to link statically against my Boost 1.x?
PS: The names of Boost's DLLs include their version, so there is no name collision; both are identifiable. Not the usual DLL hell.
As far as using the DLLs for different versions there should be no problem. At least not on Windows.
This is true if the SDK is using Boost internally. If the SDK uses Boost constructs in its interface, for example a function that returns a boost::optional, then having multiple versions can cause problems. It could still work fine, depending on the changes between the versions, but that will definitely be a risk. I don't know of any good solution in that case. This is also true if you include an SDK header file that includes a Boost header file.
This is a big problem.
Do a search on DLL hell.
Basically, the DLLs (or shared libraries on Linux) are loaded, but not all names are resolved at load time. Resolution is lazy, so names are resolved on first use. The problem is that if two DLLs export the same name, the location that name resolves to depends on the order in which the DLLs are searched (which depends on load order).
If you statically link, you will not have problems with method calls, as yours will all be resolved at compile time and the third party's will be resolved at runtime from the DLL. But what about structures that are created by version-1 Boost? If you then pass these to the third-party library, which passes them on to version-x Boost, are the structures laid out in the same way?
This is a very tricky area, and when problems occur they are very hard to debug.
So try to use the same version.
If you write a function foo, and export it from F.dll, and another function foo exported from G.dll, would you expect problems?
When AF.exe is linked, the linker is told: put some code in there that loads the address of function foo from F.dll. Now BG.dll is linked to retrieve the foo address from G.dll. I still see no problem.
Now replace AF.exe with your app, BG.dll with your vendor's app, F.dll with your boost version, G.dll with the vendor's boost version.
Concluding: I see no problems if the dll names are different.
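The answers above describe two different lookup models, and this toy contrasts them using hypothetical module names. Windows records, for each import, the DLL it comes from, so two exports named foo in differently named DLLs coexist; the default ELF behaviour searches one global scope in load order, so the first definition wins. On Linux that default can be changed, for example with RTLD_LOCAL or the -Bsymbolic linker option. The Module type and the resolve_* functions are illustrative only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model: two loaded modules that both export a symbol called "foo".
struct Module {
    std::string name;
    std::map<std::string, int> exports;  // symbol name -> toy "address"
};

// Load order matters for the ELF-style search below.
const std::vector<Module> loaded = {
    {"boost_1_x.dll", {{"foo", 100}}},
    {"boost_1_y.dll", {{"foo", 200}}},
};

// Windows-style: each import names the module it must come from.
int resolve_qualified(const std::string& module, const std::string& symbol) {
    for (const Module& m : loaded)
        if (m.name == module) {
            auto it = m.exports.find(symbol);
            return it == m.exports.end() ? -1 : it->second;
        }
    return -1;  // module not loaded
}

// Default ELF-style: one global scope searched in load order, first match wins.
int resolve_global(const std::string& symbol) {
    for (const Module& m : loaded) {
        auto it = m.exports.find(symbol);
        if (it != m.exports.end()) return it->second;
    }
    return -1;  // unresolved
}
```

Under the qualified model your app and the vendor's SDK each reach their own foo; under the global model both would reach boost_1_x.dll's copy, which is the ordering problem described in the lazy-resolution answer. Neither model helps when Boost types cross the SDK's interface.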