Stub in an executable - c++

I have gone through SO question1 and SO question2 but they are far more descriptive for my simple problem and here it is:
I have an application which is dynamically linked to a shared object(.dll, .so or whatever!). I am aware that the tool chain leaves a stub in our application which will be filled by dynamic linker. Fare enough !!
What I didn't get:
1) What will a stub look like( I know it's an odd way to put it)? I can
guess that it is an entry point to our application but is it what we call a
backdoor?
2) Suppose the that we looking for the object code for a function printf() but
the dynamic library that we are linking to, say mylib.dll contains object
code for printf() but not restricted to that. When the linking happens are
linkers smart enough to copy the object code for printf() alone or will
it copy the entire dynamic library to the application?
Or am I totally confused?

When you link against a DLL, the linker just creates an entry in the Import Directory of the PE file. There is no copying of code since that will duplicate code unnecessarily. Instead, the linker will create an entry telling the PE loader what to load.
For example, if you use the function foo_bar from your foo.dll, the linker inserts import descriptor (IMAGE_IMPORT_DESCRIPTOR) that specifies the name of the dll to load (foo.dll) and function descriptor (IMAGE_THUNK_DATA) that specifies the name of the function (foo_bar). When your code that calls foo_bar is compiled, the compiler is actually generating an instruction that calls an address from the IMAGE_THUNK_DATA entry. So when your executable runs, the PE loader will check the import descriptors and load foo.dll and then check the function descriptors and gets the address of those functions from foo.dll and puts the address in the IMAGE_THUNK_DATA structure. After that, control is transferred to your application and the call to foo_bar will work since it's now pointing to the address of foo_bar.

The DLL is an entity that exist for itself. I will be loaded into your process at load time when you used the import library. The Windows API functions LoadLibrary and GetProcAddress also allows to load the DLL at run time. In any case the DLL will not be changed. If you call only a subset of the function it still provides all functions.
The Linker doesn't change the DLL. It justs adds the stub code to the program that uses the DLL function. The stub
loads the DLL to the process
ajusts the function pointer to the actual implementation in the DLL utilizing the import table of the DLL.

Related

difference between Load DLL and Direct Call

i think is very stupid, but I can't understand,
for example, I want use Windows API like GetWindowsDirectory, GetSystemInfo and etc... I can use Api directly or calling through GetProcAddress :
Method 1
here I can calling APIs with LoadLibrary and GetProcAddress :
#include <windows.h>
typedef UINT (WINAPI *GET_WIN_DIR)(LPWSTR lpBuffer, UINT size);
TCHAR infoBuffer[MAX_PATH + 1];
HINSTANSE dllLoad = LoadLibrary("Kernel32.dll");
GET_WIN_DIR function = (GET_WIN_DIR )GetProcAddress(dllLoad, "GetWindowsDirectoryW");
int result = function2(infoBuffer, MAX_PATH + 1);
Method 2
here I can calling directly APIs like GetWindowsDirectory :
#include <windows.h>
TCHAR infoBuffer[MAX_PATH + 1];
GetWindowsDirectory(infoBuffer, MAX_PATH);
I have 2 question :
What is the difference between the two methods above?
is it load Library impact on executable file?(.exe)(I did test, but it'snot changed)
Microsoft calls
Method 1 ... Explicit linking
Method 2 ... Implicit linking
From MSDN Linking an Executable to a DLL:
Implicit linking is sometimes referred to as static load or load-time dynamic linking. Explicit linking is sometimes referred to as dynamic load or run-time dynamic linking.
With implicit linking, the executable using the DLL links to an import library (.lib file) provided by the maker of the DLL. The operating system loads the DLL when the executable using it is loaded. The client executable calls the DLL's exported functions just as if the functions were contained within the executable.
With explicit linking, the executable using the DLL must make function calls to explicitly load and unload the DLL and to access the DLL's exported functions. The client executable must call the exported functions through a function pointer.
An executable can use the same DLL with either linking method. Furthermore, these mechanisms are not mutually exclusive, as one executable can implicitly link to a DLL and another can attach to it explicitly.
In our projects we use implicit linking in any common case.
We use the explicit linking exceptionally in two situations:
for plug-in DLLs which are loaded explicitly at run-time
in special cases where the implicit linked function is not the right one.
The 2nd case may happen if we use DLLs which themselves link to distinct versions of other DLLs (e.g. from Microsoft). This is, of course, a bit critical. Actually, we try to prevent the 2nd case.
No, I don't think it's stupid at all. If you don't understand, ask. That's what this site is for. Maybe you'll get downvoted, who knows, but not by me. Goes with the territory. No pain, no gain, ask me how I know.
Anyway, the main purpose of what #Scheff calls 'explicit linking' is twofold:
If you're not sure whether the the DLL you want to use to is going to be present on the machine at runtime (although you can also use /DELAYLOAD for this which is a lot more convenient).
If you're not sure if the function you want to call is present in (for example) all versions of Windows on which you want your application to run.
Regard point 1, an example of this might be reading or writing WMA files. Some older versions of Windows did not include WMA support by default (we're going back quite a long way here) and if you implicitly link to WMA.DLL then your application won't start up if it's not present. Using explicit linking (or /DELAYLOAD) lets you check for this at runtime and put up a polite message if it's missing while still allowing the rest of your app to function as normal.
As for point 2, you might, for example, want to make use of the LoadIconWithScaleDown() function because it generally produces a nicer scaled icon than LoadIcon(). However, if you just blindly call it then, again, your app wont run on XP because XP doesn't support it, so you would instead call it conditionally, via GetProcAddress(), if it's available and fall back to LoadIcon() if not.
Okay, so to round off, what's the deal with /DELAYLOAD? Well, this is a linker switch that lets you tell the linker which DLL's are optional for your app. Once you've done that, then you can do something like this:
if (LoadIconWithScaleDown)
LoadIconWithScaleDown (...);
else
LoadIcon (...);
And that is pretty neat.
So I hope you can now see that this question is really about the utility of explicit linking versus the inconvenience involved (all of which goes way anyway with /DELAYLOAD). What goes on under the covers is, for me, less interesting.
And yes, the end result, in terms of the way the program behaves, is the same. Explicit linking or delay loading might involve a small (read: tiny) performance overhead but I really wouldn't worry about that, and delay loading involves a few potential 'gotchas' (which won't affect most normal mortals) as detailed here.

MSVCR and CRT Initialization

Out of curiosity, what exactly happens when an application compiled with the MSVCR is loaded, resp. how does the loader of Windows actually initialize the CRT? For what I have gathered so far, when the program as well as all the imported libraries are loaded into memory and all relocations are done, the CRT startup code (_CRT_INIT()?) initializes all global initializers in the .CRT$XC* sections and calls the user defined main() function. I hope this is correct so far.
But let's assume, for the sake of explanation, a program that is not using the MSVCR (e.g. an application built with Cygwin GCC or other compilers) tries to load a library at runtime, requiring the CRT, using a custom loader/runtime linker, so no LoadLibrary() involved. How would the loader/linker has to handle CRT initialization? Would it have to manually initialize all "objects" in said sections, does it have to do something else to make the internal wiring of the library work properly, or would it have to just call _CRT_INIT() (which unpractically is defined in the runtime itself and not exported anywhere as far as I am aware). Would this mix-up even work in any way, assuming that the non-CRT application and the CRT-library would not pass any objects, exceptions and things the like between them?
I would be very interested in finding out, because I can't quite make out what the CRT has an effect on the actual loading process...
Any information is very appreciated, thanks!
The entrypoint for an executable image is selected with the /ENTRY linker option. The defaults it uses are documented well in the MSDN Library article. They are the CRT entrypoint.
If you want to replace the CRT then either pick the same name or use the /ENTRY option explicitly when you link. You'll also need /NODEFAULTLIB to prevent it from linking the regular .lib
Each library compiled against the C++ runtime is calling _DllMainCRTStartup when it's loaded. _DllMainCRTStartup calls _CRT_INIT, which initializes the C/C++ run-time library and invokes C++ constructors on static, non-local variables.
The PE format contains an optional header that has a slot called 'addressofentrypoint', this slot calls a function that will call _DllMainCRTStartup which fires the initialization chain.
after _DllMainCRTStartup finishes the initialization phase it will call your own implemented DllMain() function.
When you learn about programming, someone will tell you that "the first thing that happens is that the code runs in main. But that's a bit like when you learn about atoms in School, they are fairly well organized and operate acorrding to strict rules. If you later go to a Nuclear/Particle Physics class at university, those simple/strict rules are much more detailed and don't always apply, etc.
When you link a C or C++ program the CRT contains some code something like this:
start()
{
CRT_init();
...
Global_Object_Constructors();
...
exit(main());
}
So the initialization is done by the C runtime library itself, BEFORE it calls your main.
A DLL has a DllMain that is executed by LoadLibrary() - this is responsible for initializing/creating global objects in the DLL, and if you don't use LoadLibrary() [e.g. loading the DLL into memory yourself] then you would have to ensure that objects are created and initialized.

Win32 DLL importing issues (DllMain)

I have a native DLL that is a plug-in to a different application (one that I have essentially zero control of). Everything works just great until I link with an additional .lib file (links my DLL to another DLL named ABQSMABasCoreUtils.dll). This file contains some additional API from the parent application that I would like to utilize. I haven't even written any code to use any of the functions exported but just linking in this new DLL is causing problems. Specifically, I get the following error when I attempt to run the program:
The application failed to initialize properly (0xc0000025). Click on OK to terminate the application.
I believe I have read somewhere that this is typically due to a DllMain function returning FALSE. Also, the following message is written to the standard output:
ERROR: Memory allocation attempted before component initialization
I am almost 100% sure this error message is coming from the application and is not some type of Windows error.
Looking into this a little more (aka flailing around and flipping every switch I know of) I linked with /MAP turned on and found this in the resulting .map file:
0001:000af220 ??3#YAXPEAX#Z 00000001800b0220 f ABQSMABasCoreUtils_import:ABQSMABasCoreUtils.dll
0001:000af226 ??2#YAPEAX_K#Z 00000001800b0226 f ABQSMABasCoreUtils_import:ABQSMABasCoreUtils.dll
0001:000af22c ??_U#YAPEAX_K#Z 00000001800b022c f ABQSMABasCoreUtils_import:ABQSMABasCoreUtils.dll
0001:000af232 ??_V#YAXPEAX#Z 00000001800b0232 f ABQSMABasCoreUtils_import:ABQSMABasCoreUtils.dll
If I undecorate those names using "undname" they give the following (same order):
void __cdecl operator delete(void * __ptr64)
void * __ptr64 __cdecl operator new(unsigned __int64)
void * __ptr64 __cdecl operator new[](unsigned __int64)
void __cdecl operator delete[](void * __ptr64)
I am not sure I understand how anything from ABQSMABasCoreUtils.dll can exist within this .map file or why my DLL is even attempting to load ABQSMABasCoreUtils.dll if I don't have any code that references this DLL. Can anyone help me put this information together and find out why this isn't working? For what it's worth I have confirmed via "dumpbin" that the parent application imports ABQSMABasCoreUtils.dll, so it is being loaded no matter what. I have also tried delay loading this DLL in my DLL but that did not change the results.
EDIT
I have double checked and all files involved are 64 bit.
I just had exactly the same problem. This is an issue with the Abaqus API rather than with the loading of DLLS.
I think it is because the Abaqus API overrides the new and delete functions (as you seem to have noticed). If you call new or delete in your program before initializing the Abaqus API, such as by calling odb_initializeAPI(); then you get the
ERROR: Memory allocation attempted before component initialization
error message and the program crashes.
In my program, calling odb_initializeAPI(); before the first new resolved the problem.
Well, sure you'll reference the imports of that library. Hard to write a C++ program without using the new or delete operator. Dealing with 3rd party software that thinks it needs to override the CRT version of those operators is hard enough, impossible when it won't allow you to call them until it thinks the time is right. Abandon all hope or seek help from the vendor.
One of the possible reason of an error during loading of ABQSMABasCoreUtils.dll is that some dependency module (inclusive delayed load DLLs) could not be found. Use Dependency Walker (see http://www.dependencywalker.com/) to examine all dependencies of ABQSMABasCoreUtils.dll.
I have two suggestions:
Verify that you can load ABQSMABasCoreUtils.dll with respect of LoadLibrary. You don't need call any function from ABQSMABasCoreUtils.dll. Usage of LoadLibrary I don't see as the end solution. It' s only a diagnostic test. With the test you can verify either you have some general problem of loading ABQSMABasCoreUtils.dll in your program or you have some kind of process initialization problem.
If loading of ABQSMABasCoreUtils.dll with respect of LoadLibrary will failed, then use profiling feature of Dependency Walker to protocol of all calls done during loading of ABQSMABasCoreUtils.dll. One other way would be usage of Process Monitor (see http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx) to trace what file and registry operations will be done during loading of ABQSMABasCoreUtils.dll.
If LoadLibrary is not failed, then you have really an initialization problem of DLLs. Typically the problem exist if a DLL inside of DllMain try use a function from another DLL which is not yet initialized (not yet returns from DllMain). Before one start diagnostic of this problem, we should try to exclude a more simple problems with LoadLibrary.
The ABQSMABasCoreUtils.dll looks like it's importing 64-bit functions. Is your dll also 64-bit? If not, then that's the problem - you cannot mix DLLs compiled for different architectures in the same process.

Call function in c++ dll without header

I would like to call a method from an dll, but i don't have the source neither the header file. I tried to use the dumpbin /exports to see the name of the method, but i can found the methods signature?
Is there any way to call this method?
Thanks,
If the function is a C++ one, you may be able to derive the function signature from the mangled name. Dependency Walker is one tool that will do this for you. However, if the DLL was created with C linkage (Dependency Walker will tell you this), then you are out of luck.
The C++ language does not know anything about dlls.
Is this on Windows? One way would be to:
open the dll up in depends.exe shipped with (Visual Studio)
verify the signature of the function you want to call
use LoadLibrary() to get load this dll (be careful about the path)
use GetProcAddress() to get a pointer to the function you want to call
use this pointer-to-function to make a call with valid arguments
use FreeLibrary() to release the handle
BTW: This method is also commonly referred to as runtime dynamic linking as opposed to compile-time dynamic linking where you compile your sources with the associated lib file.
There exists some similar mechanism for *nixes with dlopen, but my memory starts to fail after that. Something called objdump or nm should get you started with inspecting the function(s).
As you have found, the exports list in a DLL only stores names, not signatures. If your DLL exports C functions, you will probably have to disassemble and reverse engineer the functions to determine method signatures. However, C++ encodes the method signature in the export name. This process of combining the method name and signature is called "name mangling". This Stackoverflow question has a reference for determining the method signature from the mangled export name.
Try the free "Dependency Walker" (a.k.a. "depends") utility. The "Undecorate C++ Functions" option should determine the signature of a C++ method.
It is possible to figure out a C function signature by analysing beginnig of its disassembly. The function arguments will be on the stack and the function will do some "pops" to read them in reverse order. You will not find the argument names, but you should be able to find out their number and the types. Things may get more difficult with return value - it may be via 'eax' register or via a special pointer passed to the function as the last pseudo-argument (on the top of the stack).
If you indeed know or strongly suspect the function is there, you can dynamically load the DLL with loadLibrary and get a pointer to the function with getProcAddress. See MSDN
Note that this is a manual, dynamic way to load the library; you'll still have to know the correct function signature to map to the function pointer in order to use it. AFAIK there is no way to use the dll in a load-time capability and use the functions without a header file.
Calling non-external functions is a great way to have your program break whenever the 3rd party DLL is updated.
That said, the undname utility may also be helpful.

Compiling a DLL with gcc

Sooooo I'm writing a script interpreter. And basically, I want some classes and functions stored in a DLL, but I want the DLL to look for functions within the programs that are linking to it, like,
program dll
----------------------------------------------------
send code to dll-----> parse code
|
v
code contains a function,
that isn't contained in the DLL
|
list of functions in <------/
program
|
v
corresponding function,
user-defined in the
program--process the
passed argument here
|
\--------------> return value sent back
to the parsing function
I was wondering basically, how do I compile a DLL with gcc? Well, I'm using a windows port of gcc. Once I compile a .dll containing my classes and functions, how do I link to it with my program? How do I use the classes and functions in the DLL? Can the DLL call functions from the program linking to it? If I make a class { ... } object; in the DLL, then when the DLL is loaded by the program, will object be available to the program? Thanks in advance, I really need to know how to work with DLLs in C++ before I can continue with this project.
"Can you add more detail as to why you want the DLL to call functions in the main program?"
I thought the diagram sort of explained it... the program using the DLL passes a piece of code to the DLL, which parses the code, and if function calls are found in said code then corresponding functions within the DLL are called... for example, if I passed "a = sqrt(100)" then the DLL parser function would find the function call to sqrt(), and within the DLL would be a corresponding sqrt() function which would calculate the square root of the argument passed to it, and then it would take the return value from that function and put it into variable a... just like any other program, but if a corresponding handler for the sqrt() function isn't found within the DLL (there would be a list of natively supported functions) then it would call a similar function which would reside within the program using the DLL to see if there are any user-defined functions by that name.
So, say you loaded the DLL into the program giving your program the ability to interpret scripts of this particular language, the program could call the DLLs to process single lines of code or hand it filenames of scripts to process... but if you want to add a command into the script which suits the purpose of your program, you could say set a boolean value in the DLL telling it that you are adding functions to its language and then create a function in your code which would list the functions you are adding (the DLL would call it with the name of the function it wants, if that function is a user-defined one contained within your code, the function would call the corresponding function with the argument passed to it by the DLL, the return the return value of the user-defined function back to the DLL, and if it didn't exist, it would return an error code or NULL or something). I'm starting to see that I'll have to find another way around this to make the function calls go one way only
This link explains how to do it in a basic way.
In a big picture view, when you make a dll, you are making a library which is loaded at runtime. It contains a number of symbols which are exported. These symbols are typically references to methods or functions, plus compiler/linker goo.
When you normally build a static library, there is a minimum of goo and the linker pulls in the code it needs and repackages it for you in your executable.
In a dll, you actually get two end products (three really- just wait): a dll and a stub library. The stub is a static library that looks exactly like your regular static library, except that instead of executing your code, each stub is typically a jump instruction to a common routine. The common routine loads your dll, gets the address of the routine that you want to call, then patches up the original jump instruction to go there so when you call it again, you end up in your dll.
The third end product is usually a header file that tells you all about the data types in your library.
So your steps are: create your headers and code, build a dll, build a stub library from the headers/code/some list of exported functions. End code will link to the stub library which will load up the dll and fix up the jump table.
Compiler/linker goo includes things like making sure the runtime libraries are where they're needed, making sure that static constructors are executed, making sure that static destructors are registered for later execution, etc, etc, etc.
Now as to your main problem: how do I write extensible code in a dll? There are a number of possible ways - a typical way is to define a pure abstract class (aka interface) that defines a behavior and either pass that in to a processing routine or to create a routine for registering interfaces to do work, then the processing routine asks the registrar for an object to handle a piece of work for it.
On the detail of what you plan to solve, perhaps you should look at an extendible parser like lua instead of building your own.
To your more specific focus.
A DLL is (typically?) meant to be complete in and of itself, or explicitly know what other libraries to use to complete itself.
What I mean by that is, you cannot have a method implicitly provided by the calling application to complete the DLLs functionality.
You could however make part of your API the provision of methods from a calling app, thus making the DLL fully contained and the passing of knowledge explicit.
How do I use the classes and functions in the DLL?
Include the headers in your code, when the module (exe or another dll) is linked the dlls are checked for completness.
Can the DLL call functions from the program linking to it?
Yes, but it has to be told about them at run time.
If I make a class { ... } object; in the DLL, then when the DLL is loaded by the program, will object be available to the program?
Yes it will be available, however there are some restrictions you need to be aware about. Such as in the area of memory management it is important to either:
Link all modules sharing memory with the same memory management dll (typically c runtime)
Ensure that the memory is allocated and dealloccated only in the same module.
allocate on the stack
Examples!
Here is a basic idea of passing functions to the dll, however in your case may not be most helpfull as you need to know up front what other functions you want provided.
// parser.h
struct functions {
void *fred (int );
};
parse( string, functions );
// program.cpp
parse( "a = sqrt(); fred(a);", functions );
What you need is a way of registering functions(and their details with the dll.)
The bigger problem here is the details bit. But skipping over that you might do something like wxWidgets does with class registration. When method_fred is contructed by your app it will call the constructor and register with the dll through usage off methodInfo. Parser can lookup methodInfo for methods available.
// parser.h
class method_base { };
class methodInfo {
static void register(factory);
static map<string,factory> m_methods;
}
// program.cpp
class method_fred : public method_base {
static method* factory(string args);
static methodInfo _methoinfo;
}
methodInfo method_fred::_methoinfo("fred",method_fred::factory);
This sounds like a job for data structures.
Create a struct containing your keywords and the function associated with each one.
struct keyword {
const char *keyword;
int (*f)(int arg);
};
struct keyword keywords[max_keywords] = {
"db_connect", &db_connect,
}
Then write a function in your DLL that you pass the address of this array to:
plugin_register(keywords);
Then inside the DLL it can do:
keywords[0].f = &plugin_db_connect;
With this method, the code to handle script keywords remains in the main program while the DLL manipulates the data structures to get its own functions called.
Taking it to C++, make the struct a class instead that contains a std::vector or std::map or whatever of keywords and some functions to manipulate them.
Winrawr, before you go on, read this first:
Any improvements on the GCC/Windows DLLs/C++ STL front?
Basically, you may run into problems when passing STL strings around your DLLs, and you may also have trouble with exceptions flying across DLL boundaries, although it's not something I have experienced (yet).
You could always load the dll at runtime with load library