Wrapping DLL to log certain functions

Wrapping DLL to log certain functions - c++

So if I have both a complete set of headers and a .lib file for a C++ dll, is it possible to create a second C++ dll that wraps the original and allows me to see when certain functions are called and then it just calls the original functions? Is there a simpler way of doing this? I also am only concerned about a couple of functions in a large dll

Of course it is possible. Why wouldn't you think it is? It is even possible to define an exported function to be an alias for an exported function in another DLL, to pass through the functions you are not interested in.
Where you might run into a problem is when software uses the original .lib file to static link to the original DLL. Since you likely wouldn't be able to recompile such software to use your .lib files, your DLL would need to have the same filename as the original DLL, and replicate the original DLL's exports exactly (names and ordinal).
Another problem would be if the original DLL exports a class that software uses. Those would be harder to replicate.
A different approach would be to not replace the original DLL at all, but instead to inject your DLL into the target process and then detour just the DLL exports that you are interested in.

Related

How to control C++ DLL exported functions "easily"

Say I have a bunch of functions exported in a C++ DLL.
A customer asks for some code implementation which uses only some of the functions in the DLL.
I'd like to give the customer a DLL with the relevant functions only (because full DLL is too big, I dont want to send code he doesnt need to see, etc..).
Is there any way to export only part of the functions easily (e.g. without adding/removing "__declspec dllexport" declarations everytime)?
Perhaps "hide" function from full DLL or re-compile slimmer DLL (but control which functions included in a clean way).
Thanks!

Do I need to distribute a header file and a lib file with a DLL?

I'm updating a DLL for a customer and, due to corporate politics - among other things - my company has decided to no longer share source code with the customer.
Previously. I assume they had all the source and imported it as a VC++6 project. Now they will have to link to the pre-compiled DLL. I would imagine, at minimum, that I'll need to distribute the *.lib file with the DLL so that DLL entry-points can be defined. However, do I also need to distribute the header file?
If I can get away with not distributing it, how would the customer go about importing the DLL into their code?

Yes, you will need to distribute the header along with your .lib and .dll
Why ?
At least two reasons:
because C++ needs to know the return type and arguments of the functions in the library (roughly said, most compilers use name mangling, to map the C++ function signature to the library entry point).
because if your library uses classes, the C++ compiler needs to know their layout to generate code in you the library client (e.g. how many bytes to put on the stack for parameter passing).
Additional note: If you're asking this question because you want to hide implementation details from the headers, you could consider the pimpl idiom. But this would require some refactoring of your code and could also have some consequences in terms of performance, so consider it carefully

However, do I also need to distribute the header file?
Yes. Otherwise, your customers will have to manually declare the functions themselves before they can use it. As you can imagine, that will be very error prone and a debugging nightmare.

In addition to what others explained about header/LIB file, here is different perspective.
The customer will anyway be able to reverse-engineer the DLL, with basic tools such as Dependency Walker to find out what system DLLs your DLL is using, what functions being used by your DLL (for example some function from AdvApi32.DLL).
If you want your DLL to be obscured, your DLL must:
Load all custom DLLs dynamically (and if not possible, do the next thing anyhow)
Call GetProcAddress on all functions you want to call (GetProcessToken from ADVAPI32.DLL for example
This way, at least dependency walker (without tracing) won't be able to find what functions (or DLLs) are being used. You can load the functions of system DLL by ordinal, and not by name so it becomes more difficult to reverse-engineer by text search in DLL.
Debuggers will still be able to debug your DLL (among other tools) and reverse engineer it. You need to find techniques to prevent debugging the DLL. For example, the very basic API is IsDebuggerPresent. Other advanced approaches are available.
Why did I say all this? Well, if you intend to not to deliver header/DLL, the customer will still be able to find exported functions and would use it. You, as a DLL provider, must also provide programming elements with it. If you must hide, then hide it totally.

One alternative you could use is to pass only the DLL and make the customer to load it dynamically using LoadLibrary() + GetProcAddress(). Although you still need to let your customer know the signature of the functions in the DLL.
More detailed sample here:
Dynamically load a function from a DLL

C++: Include vs LoadLibrary()

I am having some trouble understanding why both #include and LoadLibrary() is needed in C++. In C++ "#include" forces the pre-processor to replace the #include line with the contents of the file you are including (usually a header file containing declarations). As far as I understand, this enables me to use the routines I might want in the external libraries the headers belong to.
Why do I then need LoadLibrary()? Can't i just #include the library itself?
Just as a side note: In C#, which I am more familiar with, I just Add a Reference to a DLL if I want to use types or routines from that DLL in my program. I do not have to #include anything, as the .NET framework apparently automatically searches all the referenced assemblies for the routines I want to use (as specified by the namespace)
Thank you very much in advance.
Edit: Used the word "definitions", but meant "declarations". Now fixed.
Edit 2: Tough to pick one answer, many good replies. Thanks for all contributions.

C++ uses a full separate compilation model; you can even compile
against code which hasn't been written. (This often occurs in
large projects.) When you include a file, all you are doing is
telling the compiler that the functions, etc. exist. You do not
provide an implementation (except for inline functions and
templates). In order to execute the code, you have to provide
the implementation, by linking it into your application. This
can occur in several different ways:
You have the source files; you compile them along with your
sources, and link in the resulting objects.
You have a static library; you must link against it.
You have a dynamic library. Here, what you must do will
depend on the implemention: under Windows, you must link
against a .lib stub, and put the .dll somewhere where the
runtime will find it when you execute. (Putting it in the same
directory as your application is usually a good solution.)
I don't quite understand your need to call LoadLibrary. The
only time I've needed this is when I've intentionally avoided
using anything in the library directly, and want to load it
conditionally, use GetProcAddr to get the addresses of the
functions I need.
EDIT:
Since I was asked to clarify "linking": program translation
(from the source to an executable) takes place in a number of
steps. In traditional terms, each translation unit is
"compiled" into an object file, which contains an image of the
machine instructions, but with unfilled spaces for external
references. For example, if you have:
extern void function();
in your source (probably via inclusion of a header), and you
call function, the compiler will leave the address field of
the call instruction blank, since it doesn't know where the
function will be located. Linking is the process of taking all
of the object files, and filling in these blanks. One of the
object files will define function, and the linker will
establish the actual address in the memory image, and fill in
the blank referring to function with the address of function
in that image. The result is a complete memory image of the
executable. On the early systems I worked on: literally. The
OS would simply copy the executable file directly into memory,
and then jump into it. Things like virtual memory and shared,
write protected code segments make this a little more
complicated today, but for statically linked libraries or object
files (my first two cases above), the differences aren't that
great.
Modern system technologies have blurred the lines somewhat. For
example, most Java (and I think C#) compilers don't generate
classical object files, with machine code, but rather byte code,
and the compile and link phases, above, don't take place until
runtime. Some C++ compilers also only generate byte code, which
will be compiled when the code is "linked". This is done to
permit cross-module optimizations. And all modern systems
support dynamic linking: some of the blank addresses are left
blank until execution time. And dynamic linking can be implicit
or explicit: when it is implicit, the link phase will insert
information into the executable concerning the libraries it
needs, and where to find them, and the OS will link them,
implicitly, either when the executable is loaded, or on demand,
triggered by the code attempting to use one of the unfilled
address slots. When it is explicit, you normally don't have any
explicit referenced to the name in your code. In the case of
function, above, for example, you wouldn't have any code which
directly called function. Your code would, however, load the
dynamic library using LoadLibrary (or dlopen under Unix),
then request the address of a name, using GetProcAddr (or
dlsys), and call the function indirectly through the pointer
it received.

The #include directive is, like all preprocessor functionality, merely a text replacement. The text "#include " is replaced with the contents of that file.
Typically (but not necessarily), this is used to include a header file which declares the functions that you want to use, i.e. you tell the compiler (which runs after the preprocessor) how some functions that you intend to use are named, what parameters they take, and what the return type is. It does not define what the function is actually doing.
You then also need an implementation of these functions, too. Usually, if you do not implement them in your program, you leave this task to the link stage. You give a list of libraries that your program depends on to the linker, and the linker divines via some implementation-defined way (such as an "import library") what it needs to do to "make it work". The linker will produce some glue code and write some information into the executable that will make the loader automatically load the required libraries. Everything "just works" without you having to do something special.
In some cases, however, you want to postpone the linker stage and do the loading "fully dynamically" by hand rather than automatically. This is when you have to call LoadLibrary() and GetProcAddress. The former brings the DLL into memory and does some setup (e.g. relocation), the latter gives you the address of a function that you want to call.
The #include in your code is still necessary so the compiler knows what to do with that pointer. Otherwise, you could of course call the obtained function via its address, but it would not be possible to call the function in a meaningful way.
One reason why one would want to load a library manually (using LoadLibrary) is that it is more failsafe. If you link a program against a library and the library cannot be found (or a symbol cannot be found), then your application will not start up and the user will see a more or less obscure error message.
If LoadLibrary fails or GetProcAddress doesn't work, your program can in principle still run, albeit with reduced functionality.
Another example for using LoadLibrary might be to load an alternative version of a function from a different library (some programs implement "plugins" that way). The function "looks" the same to the compiler, as defined in the include file, but may behave differently, as by whatever is in the loaded binary.

#include brings in source code only: symbol declarations for the compiler. A library (or a DLL) is object code: Use either LoadLibrary or link to a lib file to bring in object code.

LoadLibrary() causes the code module to be loaded from disk into your applications memory space for execution. This allows for dynamically loading code at runtime. You would not use LoadLibrary(), for example, if the code you want to use is compiled into a statically linked library. In that case you would provide the name of the .lib file that contained the code to the linker and it gets resolved at link time - the code is linked in to your .exe and the .lib is not distributed with the .exe in order for it to execute.
LoadLibrary() creates a dependency on an external DLL which must be present on the path provided to the method call in order for the .exe to properly execute. If LoadLibrary() fails, you must ensure your code will handle it appropriately, by either exiting gracefully or providing some other execution alternative. You must provide a .lib file to the linker the same as you would for the static library above. This .lib file however does not contain code, just entry points for the actual code that resides in the .dll.
In both cases you must #include the headers for the code you wish to execute. This is required by the compiler in order to build function call signatures properly based on the type information provided by the header.
C# assemblies contain both type information and IL. A single reference is sufficient to satisfy the need for header information and binding to the code itself.

#include is static, the substitution is done at compile time. LoadLibrary() lets you load a DLL at runtime, for example based on user imput.

why do we need to export a class in C++?

I am a beginner, so, please bear with me, if this sounds too trivial.When i searched over the net for this, i got results showing how to do it. My question is why we do it in the first place ?

This is specific to the Windows platform, when you're developping C++ DLLs.
You have to use the __declspec(dllexport) modifier in order for your class and its methods to appear on the exported symbol list for your DLL.
This way, executables using your DLL can instanciate and call methods in these classes.
However you have to make sure that the executable and the DLL are compiled by the same version of the same compiler, because C++ symbols are exported using a relatively complex name mangling encoding (you can see that using depends.exe), which format varies from one compiler to another.

You don't need to export anything, unless you're creating a DLL. In that case, you can use the dllexport attribute as an alternative to the "old school" way of using .def files.

Technically you cannot export a class, only functions.
However you can designate on a class level that all functions be exported.
Exporting a function means that that function can be called from outside of the current executable.
This is needed when you are writing a dll for example which is a separate entity.

DLL monitoring

Is there an application which allows me to see what is being sent to a DLL from a process?
I have a process and I have a DLL and I would like to monitor the parameters that are being sent to the functions so that I can use the DLL myself.
The EXPORT of the DLL is.
??0CCPCompressor##AAE#XZ
??0CCPExpandor##AAE#XZ
??1CCPCompressor##AAE#XZ
??1CCPExpandor##AAE#XZ
?Clear#CCPCompressor##QAEHXZ
?Clear#CCPExpandor##QAEHXZ
..Compress#CCPCompressor..
..Delete#CCPCompressor..
..Delete#CCPExpandor..
..Expand#CCPExpandor..
..Free#CCPCompressor..
..Free#CCPExpandor..
..Init#CCPCompressor..
..Init#CCPExpandor..
..New#CCPCompressor..
..New#CCPExpandor..

In general, this is a bad idea. Even if you have some set of captured parameters, without deep analysis of the DLL code you don't know what to do with those parameters and what ranges of parameters are accepted by certain methods. Example: if I call a method DoMathOperation(Add, 1, 2), you can mimic this call, but you won't be able to do DoMathOperation(Multiply, 2, 2) as you don't know that this is possible.

The simplest approach has been to simply relocate the original dll, and create a new dll that you make yourself, with the same exports. This dll would LoadLibrary the old dll from the alternate location.
This doesn't quite apply here - the dll is exporting c++ class members which has two consequences: c++ classes have to be statically loaded as there is no c++ mechanism to 'glue' c++ function pointers (obtained via GetProcAddress) into a class instance.
This means your shim dll would be in the unfortunate place of having to both import, and export, and identical set of symbols.
The only way around this is to write your shim dll in two parts:
Shim1:
One part would get the name of the original dll, and would export the same class defintion the original dll exported:
class __decldpec(dllexport) CCPCompressor {
...
Depends can crack the name decoration, or Undname.exe is distributed with Visual Studio.
This part would LoadLibrary() using an explicit path to shimdll2.dll located in some other folder, along with the original dll. GetProcAddress() would be needed to import functions exported by shimdll2.dll
Shim2:
The other shim dll would be located in a folder with the dll you are trying to intercept. This dll would have to import the class from the original compressor dll:
class __declspec(dllimport) CCPCompressor {
...
You can use the dll import library made by the first dll to actually link the symbols.
Then its a case of exporting functions from shim2.dll that shim1.dll will call whenever a CCPCompressor method is called.
NB. Other things: your version of the CCPCompressor class will need to have, at least, a large dummy array as you can't know from the dll exports how big the application expects the class to be (unless you happen to have an actual header file describing the class).
To decompose the exported names to build a class definition:
Open up the Visual Studio 20XX Command Prompt from the Start > Programs > Visual Studio 20XX -> Tools menu.
c:\...\VC>undname ?Clear#CCPCompressor##QAEHXZ
Microsoft (R) C++ Name Undecorator
Undecoration of :- "?Clear#CCPCompressor##QAEHXZ"
is :- "public: int __thiscall CCPCompressor:Clear(void)"
c:\...\VC>_
Do that for each function exported from the original dll (undname accepts some kind of text file to speed this process up) to find out how to declare a matching class def.

Is using detours compatible with your requirements?
From the site:
Overview
Innovative systems research hinges on the ability to easily instrument and extend existing operating system and application functionality. With access to appropriate source code, it is often trivial to insert new instrumentation or extensions by rebuilding the OS or application. However, in today's world systems researchers seldom have access to all relevant source code.
Detours is a library for instrumenting arbitrary Win32 functions on x86, x64, and IA64 machines. Detours intercepts Win32 functions by re-writing the in-memory code for target functions. The Detours package also contains utilities to attach arbitrary DLLs and data segments (called payloads) to any Win32 binary.
Detours preserves the un-instrumented target function (callable through a trampoline) as a subroutine for use by the instrumentation. Our trampoline design enables a large class of innovative extensions to existing binary software.
We have used Detours to create an automatic distributed partitioning system, to instrument and analyze the DCOM protocol stack, and to create a thunking layer for a COM-based OS API. Detours is used widely within Microsoft and within the industry.

The only reliable way is to debug your program (using any debugger like OllyDBG) and set breakpoint on required export function. Then you can simply trace the stack parameters sent to the calling function. This is only the start, you need to fully analyze function instructions within a debugger or disassembler to see what each parameter is doing and its type.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Wrapping DLL to log certain functions - c++

Related

How to control C++ DLL exported functions "easily"

Do I need to distribute a header file and a lib file with a DLL?

C++: Include vs LoadLibrary()

why do we need to export a class in C++?

DLL monitoring

Categories

Resources