Calling unexported functions in Win32 C++

How would I go about calling an unexported function in Win32 C++?

Calling unexported functions that are defined in the same module (DLL/EXE) as your code is easy: just call them like any other C++ function. Obviously this isn't what you're asking about. If you want to call unexported functions in a different module, you need to find out their addresses somehow.
One way to do this is to have the first module call an exported function in the second module which returns a function pointer. (Or: a struct containing function pointers, a pointer to an instance of a class, etc.) Think factory pattern.
Another way is to export a registration function from the first module and have the second module's initialization code call it, passing it pointers to unexported functions along with some sort of identifying info. (Better also have a corresponding unregistration function which is called before the second module is unloaded.)
Yet another way is to grovel through the debug symbols using dbghelp.dll. This would not be recommended for a real-world application because it would require distributing debug symbols and would be extremely slow, not to mention overly complex.
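The first approach above (one exported function that hands out a table of function pointers) could look roughly like the sketch below. Everything is kept in one file so it can be compiled as is; in a real setup the helpers and GetModuleApi would live in the second module and only GetModuleApi would be exported (for example with __declspec(dllexport)). The names ModuleApi, GetModuleApi, add_impl and negate_impl are made up for illustration.

```cpp
// ---- "second module": the helpers have internal linkage, so they are
// ---- never exported and cannot be found by name from outside.
namespace {
int add_impl(int a, int b) { return a + b; }
int negate_impl(int x) { return -x; }
}  // namespace

// Table of function pointers handed to callers. Its layout is the
// contract between the two modules (hypothetical type).
struct ModuleApi {
    int (*add)(int, int);
    int (*negate)(int);
};

// The only symbol that needs to be exported (hypothetical name).
extern "C" const ModuleApi* GetModuleApi() {
    static const ModuleApi api = {add_impl, negate_impl};
    return &api;
}
```

The first module would fetch GetModuleApi once (via the import library or GetProcAddress) and then call GetModuleApi()->add(2, 3) like any other function pointer.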

In addition to bk1e's answer, there is yet another method (also not recommended).
Obtain the relative address of that function within the DLL (e.g. via disassembly). This has to be done manually, before compiling.
In the program, you then have to obtain the start address of the DLL in memory (for example from the address of an exported function and some calculation).
Now you can call that function directly using the relative address of the function + the start address of the DLL.
I don't recommend this though. It works only for one specific version of that DLL. After any recompile the address may change, or the function may no longer be needed and be deleted. There must be a reason why this function is NOT exported. In general, you are trying to achieve something the author of the library intentionally did not want you to do, and that's "evil" most of the time.
You mentioned the IDA name. This name includes the start address.
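The address arithmetic described above can be sketched as follows. This is only an illustration under assumptions: the signature TargetFn has to be recovered from the disassembly, and on Win32 the module base would typically be the HMODULE returned by GetModuleHandle or LoadLibrary. So that the sketch runs as a single program, two local functions stand in for "a known address in the module" and "the unexported function"; all names are hypothetical.

```cpp
#include <cstdint>

// Assumed signature of the unexported function (from the disassembly).
using TargetFn = int (*)(int);

// Turn "module base + relative address" into a callable pointer.
// The relative address is only valid for one exact build of the module.
inline TargetFn resolve_by_rva(std::uintptr_t module_base, std::uintptr_t rva) {
    return reinterpret_cast<TargetFn>(module_base + rva);
}

// Stand-ins: `anchor` plays the role of the module's start address,
// `hidden` plays the role of the unexported function.
static int anchor(int x) { return x; }
static int hidden(int x) { return x * 2; }

inline int demo_call(int arg) {
    const auto base = reinterpret_cast<std::uintptr_t>(&anchor);
    // In practice this offset would be read off the disassembly by hand.
    const std::uintptr_t rva = reinterpret_cast<std::uintptr_t>(&hidden) - base;
    return resolve_by_rva(base, rva)(arg);
}
```

If the calling convention or the signature is guessed wrong, the call will corrupt the stack or crash, which is one more reason to treat this as a last resort.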

No two ways about it, you'll have to study the disassembly to figure out what gets pushed on the stack, and how it's used to determine the types.

Related

Using tcmalloc - How to load the malloc extensions properly?

In file gperftools-2.2.1/src/gperftools/malloc_extension.h, it reads:
// Extra extensions exported by some malloc implementations. These
// extensions are accessed through a virtual base class so an
// application can link against a malloc that does not implement these
// extensions, and it will get default versions that do nothing.
//
// NOTE FOR C USERS: If you wish to use this functionality from within
// a C program, see malloc_extension_c.h.
My question is how exactly can I access these extensions through a virtual base class?
Usually to load a class from a dynamic library, I would need to write a base class which allows me to get an instance of the wanted class and its functions through polymorphism, as described here.
However to do so there must be some class factory functions available in the API, but there are no such functions in any tcmalloc files. Moreover I would also need to load the tcmalloc library with dlopen(), which is not recommended according to the install note:
...loading a malloc-replacement library via dlopen is
asking for trouble in any case: some data will be allocated with one malloc, some with another.
So clearly accessing the extensions through the typical way as mentioned above is not an option. I can get away with using the C versions as declared in malloc_extension_c.h but just wonder if there is any better solution.
I managed to load the malloc extensions via a 'hack', which is not as clean as I would prefer, but it gets the job done. Here is the (temporary) solution for those who are interested.
First, most of these malloc extension functions are similar to static functions in a way that they are mostly called on the current instance only, e.g. to call the GetMemoryReleaseRate() function on the current process you just call MallocExtension::instance()->GetMemoryReleaseRate(). Therefore we don't need to create a base class and get an instance of MallocExtension class to call these functions.
For the example above, I'd just create a standalone function getMemoryReleaseRate() which simply calls the required function when it gets called, as below:
extern "C" double getMemoryReleaseRate()
{
    // extern "C" keeps the symbol name unmangled so that dlsym() can find it
    return MallocExtension::instance()->GetMemoryReleaseRate();
}
This function can be inserted directly into a source file, e.g. tcmalloc.cc, or, if you prefer not to edit the tcmalloc source every time there is a new version, appended to the source file by your makefile when it is compiled.
Now in your code, you can look up the 'facade' function with dlsym() and call the MallocExtension function through it, e.g. as below:
typedef double (*getMemoryReleaseRate_t)();
double rate = ((getMemoryReleaseRate_t)dlsym(RTLD_DEFAULT, "getMemoryReleaseRate"))();
Simply including the malloc_extension.h header and calling MallocExtension::instance()->GetMemoryReleaseRate(); would work too. No need to modify tcmalloc for that.

Why do we have to export the function used by spawn?

In Erlang, when dealing with processes, you have to export the function passed to spawn.
-module(echo).
-export([start/0, loop/0]).
start() ->
spawn(echo, loop, []).
The reason given in the book "Programming Erlang, 2nd Edition", page 188, is:
"Note that we also have to export the argument of spawn from the module. This is a good practice because we will be able to change the internal details of the server without changing the client code.".
And in the book "Erlang Programming", page 121:
-module(frequency).
-export([start/0, stop/0, allocate/0, deallocate/1]).
-export([init/0]).
%% These are the start functions used to create and
%% initialize the server.
start() ->
register(frequency, spawn(frequency, init, [])).
init() ->
Frequencies = {get_frequencies(), []},
loop(Frequencies).
Remember that when spawning a process, you have to export the init/0 function as it is used by the spawn/3 BIF. We have put this function in a separate export clause to distinguish it from the client functions, which are supposed to be called from other modules.
Would you please explain to me the logic behind that reason?
The short answer is: spawn is not a language construct, it's a library function.
This means spawn lives in another module, which has no access to any functions in your module except the exported ones.
You have to give spawn some way to start your code. It can be a function value (i.e. spawn(fun() -> (any code you want, including calls to local functions) end)) or a module/exported function name/arguments triple, which is visible from other modules.
The logic is quite straightforward. Yet confusion can easily arise as:
export does not exactly match object-oriented encapsulation and especially public methods;
several common patterns require exporting functions that are not meant to be called by regular clients.
What export really does
Export has a very strict meaning: exported functions are the only functions that can be referred to by their fully qualified name, i.e. by module, function name and arity.
For example:
-module(m).
-export([f/0]).
f() -> foo.
f(_Arg) -> bar.
g() -> foobar.
You can call the first function with an expression such as m:f() but this wouldn't work for the other two functions. m:f(ok) and m:g() will fail with an error.
For this reason, the compiler will warn in the example above that f/1 and g/0 are not called and cannot be called (they are unused).
Functions can always be called from outside a module: functions are values and you can refer to a local function (within a module), and pass this value outside. For example, you can spawn a new process by using a non-exported function, using spawn/1. You could rewrite your example as follows:
start() ->
spawn(fun loop/0).
This doesn't require exporting loop. In other editions of Programming Erlang, Joe Armstrong explicitly suggests transforming the code as above to avoid exporting loop/0.
Common patterns requiring an export
Because exports are the only way to refer to a function by name from outside a module, there are two common patterns that require exported functions even if those functions are not part of a public API.
The example you mention arises whenever you want to call a library function that takes an MFA, i.e. a module, a function name and a list of arguments. These library functions will refer to the function by its fully qualified name. In addition to spawn/3, you might encounter timer:apply_after/4.
Likewise, you can write functions that take MFA arguments, and call the function using apply/3.
Sometimes, there are variants of these library functions that directly take a 0-arity function value. This is the case with spawn, as mentioned above. apply/1 doesn't make sense as you would simply write F().
The other common case is behavior callbacks, and especially OTP behaviors. In this case, you will need to export the callback functions which are of course referred to by name.
Good practice is to use separate export attributes for these functions to make it clear these functions are not part of the regular interface of the module.
Exports and code change
There is a third common case for using exports beyond a public API: code changes.
Imagine you are writing a loop (e.g. a server loop). You would typically implement this as follows:
-module(m).
-export([start/0]).
start() -> spawn(fun() -> loop(state) end).
loop(State) ->
NewState = receive ...
...
end,
loop(NewState). % not updatable !
This code cannot be updated, as the loop will never exit the module. The proper way would be to export loop/1 and perform a fully qualified call:
-module(m).
-export([start/0]).
-export([loop/1]).
start() -> spawn(fun() -> loop(state) end).
loop(State) ->
NewState = receive ...
...
end,
?MODULE:loop(NewState).
Indeed, when you refer to an exported function using its fully qualified name, the lookup is always performed against the latest version of the module. So this trick allows the process to jump to the newer version of the code at every iteration of the loop. Code updates are actually quite complex, and OTP, with its behaviors, gets it right for you. It typically uses the same construct.
Conversely, when you call a function passed as a value, this is always from the version of the module that created this value. Joe Armstrong argues this is an advantage of spawn/3 over spawn/1 in a dedicated section of his book (8.10, Spawning with MFAs). He writes:
Most programs we write use spawn(Fun) to create a new process. This is fine provided we don’t want to dynamically upgrade our code. Sometimes we want to write code that can be upgraded as we run it. If we want to make sure that our code can be dynamically upgraded, then we have to use a different form of spawn.
This is far-fetched: when you spawn a new process, it starts immediately, and an update is unlikely to occur between the moment the function value is created and the start of the new process. Besides, Armstrong's statement is partly untrue: spawn/1 works just as well for code that can be dynamically upgraded (cf. the example above). The trick is not to use spawn/3 but to perform a fully qualified call (Joe Armstrong describes this in another section). spawn/3 has other advantages over spawn/1.
Still, the difference between passing a function by value and by name explains why there is no version of timer:apply_after/4 that takes a function by value: there is a delay, and the function value might be old when the timer fires. Such a variant would actually be dangerous because at most two versions of a module can be loaded at once: the current one and the old one. If you reload a module more than once, processes trying to call even older versions of the code will be killed. For this reason, you would often prefer MFAs and their exports to function values.
When you do a spawn you create a completely new process with its own environment and thread of execution. This means that you are no longer executing "inside" the module where the spawn is called, so you must make an "outside" call into the module. The only functions in a module which can be called from the "outside" are exported functions, hence the spawned function must be exported.
It might seem a little strange seeing you are spawning a function in the same module but this is why.
I think it is important to remember that a module is just code and does not contain any deeper meaning than that, for example like a class in an OO language. So even if you have functions from the same module being executed in different processes, a very common occurrence, then there is no implicit connection between them. You still have to send messages between processes even if it is from/to functions in the same module.
EDIT:
About the last part of your question, with the quote about putting the export of init/0 in a separate export declaration: there is no need to do this and it has no semantic significance. You can use as many or as few export declarations as you wish. So you could put all the functions in one export declaration or have a separate one for each function; it makes no difference.
The reason to split them is purely visual and for documentation purposes. You typically group functions which go together into separate export declarations to make it easier to see that they are a group. You also typically put "internal" exported functions, functions which aren't meant for the user to directly call, in a separate export declaration. In this case init/0 has to be exported for the spawn but is not meant to be called directly outside the spawn.
Having the user call the start/0 function to start the server, rather than explicitly spawning the init/0 function, allows you to change the internals as you wish later on. The user only sees the start/0 function, which is what the first quote is trying to say.
If you're wondering why you have to export anything and not have everything visible by default, it's because it's clearer to the user which functions they should call if you hide all the ones they shouldn't. That way, if you change your mind on the implementation, people using your code won't notice. Otherwise, there may be someone who is using a function that you want to change or eliminate.
For example, say you have a module:
-module(somemod).
-export([useful/0]).
useful() ->
helper().
helper() ->
i_am_helping.
And you want to change it to:
-module(somemod).
-export([useful/0]).
useful() ->
betterhelper().
betterhelper() ->
i_am_helping_more.
If people should only be calling useful, you should be able to make this change. However, if everything was exported, people might be depending on helper when they shouldn't be. This change will break their code when it shouldn't.

(Dependency Walker) missing explicit type on function

Apologies in advance for noob mistakes. This is my first question here. First, some background:
I am trying to create a module for a program using Dependency Walker to find C++ functions in a .dll that I don't have the lib or any source code for. You can also assume that I can't get support from the original developer. Basically, I checked another file that accesses it to see what the minimum functions were to get it working. Here is an example of the undecorated names that are output:
void foo::bar::baz(float)
float foo::bar::qux(void)
foo::bar::bar(void)
class foo::bar & foo::bar::operator=(class foo::bar const &)
The top two functions obviously take float or void and return float or void. I got a similar function working using something like:
HINSTANCE THEDLL = LoadLibrary("C:\\dllFolder\\theDll.dll");
typedef float (*quxType)(void);
quxType qux = (quxType)GetProcAddress(THEDLL, "quxMangledName");
So those are not a problem.
Now, the third on the list looks like another function that takes void, but it doesn't have an explicit return type. Does this mean I should just use an implicit type for it, is it void, or is it not really a function? If not, what is it?
I have no idea what to do with the fourth one. Is it even possible to handle without the associated .h file?
I looked around, but I couldn't find any information on what to do when the function doesn't look like a normal function with an explicit return type. Despite using basically the same code that I used to get a function working in a similar .dll, I keep getting an access violation crash when I try to use function #2 here (I really just need function #2). So I am guessing that the .dll needs more information or needs something initialized first, which is why I am interested in the others on the list.
I realize this is a complicated problem, so there probably won't be a "Right answer" solution to get it working, but if I am making any obvious mistakes, or if there are any general suggestions for how to attack the problem (even alternatives to dependency walker), let me know.
The 3rd one is the default constructor of bar.
The 4th one is the copy assignment operator of bar.
I think you need to instantiate the class first in order to call the 2nd method. Otherwise the method is called with an invalid 'this', which causes the access violation.
The problem is how to instantiate it.
If you can find a factory function that returns a bar in the DLL, you can try to use it.
If you don't see a factory function and you don't have the lib file, you can refer to answers here on how to create a lib from a DLL: How to make a .lib file when have a .dll file and a header file
You also need to create a header file for the class, with the correct order and types of members. This way you don't have to use LoadLibrary and GetProcAddress; just use the class as normal.
You may still use LoadLibrary and GetProcAddress without the lib and header though. This blog shows how to manually allocate memory, call the constructor to get an object, and pass that object when calling a method: http://recxltd.blogspot.com/2012/02/working-with-c-dll-exports-without.html
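To illustrate why a plain float (*)(void) typedef is not the right type for foo::bar::qux, here is a small self-contained sketch. The class below is only a stand-in with an invented member (value_); the real layout of bar inside the DLL is unknown. The point is that a non-static member function is always called on an object, and that object is what arrives as 'this'.

```cpp
namespace foo {
// Stand-in for the DLL's class; the real members and layout are unknown.
class bar {
public:
    bar() : value_(1.5f) {}          // the constructor sets up the object's state
    float qux() { return value_; }   // reads a member, so it needs a valid `this`
private:
    float value_;
};
}  // namespace foo

// A non-static member function has the type float (foo::bar::*)(),
// not float (*)(void): every call supplies the object it operates on.
using QuxPtr = float (foo::bar::*)();

inline float call_qux(foo::bar& object) {
    QuxPtr qux = &foo::bar::qux;
    return (object.*qux)();
}
```

Calling the address from GetProcAddress through a free-function pointer passes no object at all, so whatever happens to be in the 'this' register or stack slot gets dereferenced.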

Is it possible to Invoke an exported "private" method in c++

Is it possible to invoke a private method of a class exported from a DLL?
Will it be hidden from people who would like to use it but are not supposed to?
thanks :)
Yes, it's possible, but you need to use dirty casting tricks or rely on semi-undefined behaviour. You can certainly call an exported function, no matter its private/public status.
The language does not provide security against malicious attackers. It will help everyone play by the rules, but it will not guard against those who try to break the system.
For instance:
use GetProcAddress() to get the function's address, cast it to the right member function type, and call it.
create a modified header file of the class, declaring everything as public (or just add a static function, void crowbar()), and compile against that. (Undefined behaviour, since you're violating the One Definition Rule, but it will probably work...)
Do not rely on C++ private keyword for security.
If it appears in the DLL's export table, it can be invoked by using GetProcAddress and calling the returned function pointer. There are some technical hurdles to get the right calling convention, but it is possible (most likely some assembly language will be required).
Strictly speaking, any function for which the compiler generates an out-of-line instance can be called by any native code. Being exported by a DLL just makes it far easier to find the address of the code for the function.

C/C++ Dynamic loading of functions with unknown prototype

I'm in the process of writing a kind of runtime system/interpreter, and one of the things that I need to be able to do is call C/C++ functions located in external libraries.
On Linux I'm using the dlfcn.h functions to open a library and call a function located within. The problem is that, when using dlsym(), the function pointer returned needs to be cast to an appropriate type before being called so that the function arguments and return type are known. However, if I’m calling some arbitrary function in a library then obviously I will not know this prototype at compile time.
So what I’m asking is, is there a way to call a dynamically loaded function and pass it arguments, and retrieve its return value without knowing its prototype?
So far I’ve come to the conclusion that there is no easy way to do this, but some workarounds that I’ve found are:
Ensure all the functions I want to load have the same prototype, and provide some sort of mechanism for these functions to retrieve parameters and return values. This is what I am doing currently.
Use inline asm to push the parameters onto the stack, and to read the return value. I really want to steer clear of doing this if possible!
If anyone has any ideas then it would be much appreciated.
Edit:
I have now found exactly what I was looking for:
http://sourceware.org/libffi/
"A Portable Foreign Function Interface Library"
(Although I’ll admit I could have been clearer in the original question!)
What you are asking for is if C/C++ supports reflection for functions (i.e. getting information about their type at runtime). Sadly the answer is no.
You will have to make the functions conform to a standard contract (as you said you were doing), or start implementing mechanics for trying to call functions at runtime without knowing their arguments.
Since having no knowledge of a function makes it impossible to call it, I assume your interpreter/"runtime system" at least has some user input or similar it can use to deduce that it's trying to call a function that will look like something taking those arguments and returning something not entirely unexpected. That lookup is hard to implement in itself, even with reflection and a decent runtime type system to work with. Mix in calling conventions, linkage styles, and platforms, and things get nasty real soon.
Stick to your plan, enforce a well-defined contract for the functions you load dynamically, and hopefully make do with that.
Can you add a dispatch function to the external libraries, e.g. one that takes a function name and N (optional) parameters of some sort of variant type and returns a variant? That way the dispatch function prototype is known. The dispatch function then does a lookup (or a switch) on the function name and calls the corresponding function.
Obviously it becomes a maintenance problem if there are a lot of functions.
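A dispatch function of this kind might look like the sketch below. It assumes C++17 for std::variant; the names Value, Args, dispatch, hypotenuse and shout are invented for the example, and a real library would expose dispatch with C linkage and a plainer argument type if it has to be loaded with dlsym.

```cpp
#include <cmath>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// The "variant type": every argument and return value is one of these.
using Value = std::variant<double, std::string>;
using Args = std::vector<Value>;

// Library functions with their natural prototypes (hypothetical examples).
static double hypotenuse(double a, double b) { return std::sqrt(a * a + b * b); }
static std::string shout(const std::string& s) { return s + "!"; }

// The single entry point whose prototype the interpreter knows in advance.
Value dispatch(const std::string& name, const Args& args) {
    using Handler = Value (*)(const Args&);
    // Each handler unpacks the variants and forwards to the real function.
    static const std::map<std::string, Handler> table = {
        {"hypotenuse", [](const Args& a) -> Value {
             return hypotenuse(std::get<double>(a.at(0)), std::get<double>(a.at(1)));
         }},
        {"shout", [](const Args& a) -> Value {
             return shout(std::get<std::string>(a.at(0)));
         }},
    };
    const auto it = table.find(name);
    if (it == table.end()) throw std::invalid_argument("unknown function: " + name);
    return it->second(args);
}
```

Wrong argument types or counts surface as exceptions (std::bad_variant_access, std::out_of_range) instead of stack corruption, but every new library function still needs its own table entry, which is the maintenance cost mentioned above.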
I believe the Ruby FFI library achieves what you are asking. It can call functions in external dynamically linked libraries without specifically linking them in.
http://wiki.github.com/ffi/ffi/
You probably can't use it directly in your scripting language, but perhaps the ideas are portable.
--
Brad Phelan
http://xtargets.heroku.com
I'm in the process of writing a kind of runtime system/interpreter, and one of the things that I need to be able to do is call C/C++ functions located in external libraries.
For examples, you can check how Tcl and Python do that. If you are familiar with Perl, you can also check Perl XS.
The general approach is to require an extra gateway library sitting between your interpreter and the target C library. From my experience with Perl XS, the main reasons are memory management/garbage collection and the C data types, which are hard or impossible to map directly onto the interpreter's language.
So what I’m asking is, is there a way to call a dynamically loaded function and pass it arguments, and retrieve its return value without knowing its prototype?
None known to me.
Ensure all the functions I want to load have the same prototype, and provide some sort of mechanism for these functions to retrieve parameters and return values. This is what I am doing currently.
This is what another team in my project is doing too. They have standardized the API for external plug-ins on something like this:
typedef std::list< std::string > string_list_t;
string_list_t func1(string_list_t stdin, string_list_t &stderr);
A common task for the plug-ins is to perform a transformation, mapping, or expansion of the input, often using an RDBMS.
Previous versions of the interface grew unmaintainable over time, causing problems for customers, product developers, and 3rd-party plug-in developers alike. The frivolous use of std::string is tolerable because the plug-ins are called relatively seldom (and the overhead is still peanuts compared to the SQL used all over the place). The argument stdin is populated with input depending on the plug-in type. A plug-in call is considered failed if any string in the output parameter stderr starts with 'E:' ('W:' is for warnings; the rest is silently ignored and can thus be used for plug-in development/debugging).
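The failure rule can be written down in a few lines. This is only a sketch of the convention as described here, not the team's actual code; plugin_failed is an invented name, and the parameter is deliberately not called stderr because that name may be defined as a macro once <cstdio> is included.

```cpp
#include <list>
#include <string>

typedef std::list<std::string> string_list_t;

// True if any message in the plug-in's error output is marked as an error.
// Messages starting with "W:" are warnings; everything else is ignored.
inline bool plugin_failed(const string_list_t& messages) {
    for (const std::string& message : messages) {
        if (message.compare(0, 2, "E:") == 0) return true;
    }
    return false;
}
```

The host would call this on the error list after each plug-in invocation and treat a true result as a failed call.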
dlsym is used only once, on a function with a predefined name, to fetch the array holding the function table (public function name, type, pointer, etc.) from the shared library.
My solution is that you can define a generic proxy function which will convert the dynamic function to a uniform prototype, something like this:
#include <string>
#include <functional>
using result = std::function<std::string(std::string)>;
template <class F>
result proxy(F func) {
// some type-traits technologies based on func type
}
In the user-defined file, you must add a definition to do the conversion:
double foo(double a) { /*...*/ }
auto local_foo = proxy(foo);
In your runtime system/interpreter, you can use dlsym to define a foo-function. It is the user-defined function foo's responsibility to do calculation.
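One possible way to fill in the body of proxy, shown here only for functions taking a single argument, is to convert through string streams. This is a sketch under the assumption that both the argument and the return type can be streamed with >> and <<; it is not necessarily what the type-traits version hinted at above would look like, and the body of foo is invented.

```cpp
#include <functional>
#include <sstream>
#include <string>

using result = std::function<std::string(std::string)>;

// Wrap a one-argument function so that it takes and returns text.
// Assumes both A and R can be streamed with >> and <<.
template <class R, class A>
result proxy(R (*func)(A)) {
    return [func](std::string input) {
        std::istringstream in(input);
        A arg{};
        in >> arg;             // parse the argument from its text form
        std::ostringstream out;
        out << func(arg);      // format the return value as text
        return out.str();
    };
}

// Example user-defined function (hypothetical body).
double foo(double a) { return a * 2; }
```

With this, auto local_foo = proxy(foo); gives the interpreter a callable with the uniform std::string to std::string prototype, whatever the numeric types of the wrapped function are.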