Can I programatically deduce the calling convention used by a C++ dll? - c++

Imagine you'd like to write a program that tests functions in a c++ dll file.
You should enable the user to select a dll (we assume we are talking about c++ dlls).
He should be able to obtain a list of all functions exported by the dll.
Then, the user should be able to select a function name from the list, manually input a list of arguments ( the arguments are all basic types, like int, double, bool or char arrays (e.g. c-type strings) ) and attempt to run the selected function with the specified arguments.
He'd like to know if the function runs with the specified arguments, or do they cause it to crash ( because they don't match the signature for example ).
The main problem is that C++, being a strongly typed language, requires you to know the number and type of the arguments for a function call at compile time.And in my case, I simply don't know what these arguments are, until the user selects them at runtime.
The only solution I came up with, was to use assembly to manually push the arguments on the call stack.
However, I've come to understand that if I want to mess with assembly, I'd better make damn sure that I know which calling convention are the functions in the dll using.
So (finally:) here's my question: can I deduce the calling convention programmaticaly? Dependency Walker won't help me, and I've no idea how to manually read PE format.

The answer is maybe.
If the functions names are C++ decorated, then you can determine the argument count and types from the name decoration, this is your best case scenario, and fairly likely if MSVC was used to write the code in the first place.
If the exported functions are stdcall calling convention (the default for windows api), you can determine the number of bytes to be pushed, but not the types of the arguments.
The bad news is that for C calling convention, there isn't any way to tell by looking at the symbol names. You would need to have access to the source code or the debug info.
http://en.wikipedia.org/wiki/X86_calling_conventions
The name that a function is given as an export is not required to have any relationship with the name that the linker sees, but most of the time, the exported name and the symbol name that the linker sees are the same.

You didn't specify whether you're talking 32-bit or 64-bit here, and the difficulties outlined by you and the other posters mainly apply to 32-bit code. On 64-bit Windows, there's essentially only one calling convention (it's in also in the wikipedia article linked by John Knoeller), which means that you do know the calling convention (of course with the exception of anybody cooking up their own).
Also, with the Microsoft x64 calling convention, not knowing the number of parameters of the function to be called does not stop you from calling it, providing as many parameters as you wish/the user wishes to. This is because you as a caller set aside stack space and clean it up afterwards. -- Of course, not providing the right [number of] parameters may still have the called function do silly things because you're providing invalid input, but that's another story.

The compiled code does not just say 'Here this function is a fastcall, and this one here is stdcall' unfortunately.
Not even modern disassemblers like IDA try to deduce call types by default (there might be a plugin or an option somewhere idk).
Basically if you are a human you cn look at the first few instructions and tell 90% of the time. If they are pop and push, its stdcall, if its passing params through the registers (especially ecx) then its cdecl. Fastcall also uses the registers but does something special.. dunno off the top of my head. But all this info is useless because your program obviously will not be a human.
If you are doing testing, dont you at least have the header files?? This is an awfully hard way to skin a cat..

If you want to know what calling convention a C++ function uses, your best hope is to study
The header that declares that function, and
The documentation for the compiler that compiled your particular DLL.
But this whole thing sounds like a bit of a mess, honestly. Why does your friend want to be able to do this, and why can't he get the information he needs by parsing a header that declares the relevant functions?

This page describes the way VC++6 encodes parameter and calling convention info into a symbol name: http://www.bottledlight.com/docs/mangle.html
I suspect that later versions of VC++ will be compatible but I haven't confirmed this.
There are also some tools that automate this which accompany the compiler: http://msdn.microsoft.com/en-us/library/5x49w699.aspx
The name mangling only applies for C++ functions; if a function is 'extern "C"' then this won't work.

Related

How to find a pointer to a function by string

I have a list of functions in a text file that I'd like to expose to LLVM for its execution engine at run time, I'm wondering if its possible to find pointers to the functions at runtime rather than hard code in all the GlobalMappings by hand as I'd probably like to add in more later. For example:
// File: InternalFunctions.txt
PushScreen
PopScreen
TopScreen
// File: ExposeEngine.cpp
// Somehow figure out the address of the function specified in a string
void* addy = magicAddress("PushScreen");
jit->addGlobalMapping(llvmfunction, addy);
If this is possible I love to know how to do it, as I am trying to write my game engine by jit-ing c++. I was able to create some results earlier, but I had to hard-code in the mappings. I noticed that Gtk uses something along the lines of what I'm asking. When you use glade and provide a signal handler, the program you build in c will automatically find the function in your executable referenced by the string provided in the glade file. If getting results requires me to look into this Gtk thing I'd be more than happy to, but I don't know what feature or part of the api deals with that - I've already tried to look it up. I'd love to hear suggestions or advice.
Yes, you can do this. Look at the man pages for dlopen() and dlsym(): these functions are standard on *nix systems and let you look up symbols (functions or variables) by name. There is one significant issue, which is that C++ function names are usually "mangled" to encode type information. A typical way around this is to define a set of wrapper functions in an extern "C" {} block: these will be non-member, C-style functions which can then call into your C++ code. Their names will not be mangled, making them easy to look up using dlsym().
This is a pretty standard way that some plugin architectures work. Or at least used to work, before everyone started using interpreted languages!

How to call indirectly a C function

Let's suppose I have the following function:
int func(int a, char* b, float c)
{
return 42;
}
I am curios if there is a possibility to call this function without:
explicitly calling it (func(1, "abc", 2.4))
creating a function pointer to it, and then calling it via the function pointer.
The function is written in C (or C++) and might be located either in a library (DLL on Windows) or somewhere compiled in the current application. For now let's assume there are no name mangling issues.
However, I know the following:
the name of the function.
the number and type of parameters as text based input (such as "int", "char*", "float").
its return type
I'm open to any suggestions, but I'm somewhat afraid of some lower level assembly hacks, since I'd like this to be as portable as possible.
I'd prefer a C solution, and I'd like to avoid boost::bind...
Edit - some clarifications ...
The one "calling" the "function" is a scripting language's compiled library (DLL). It loads the scripting language (source file) which has "bindings" to exteral "functions" (The ones I am trying to call) and when in the scripting language it encounters "call this external function" it tries to call that external function which might be in a DLL ... or the application which actually loaded the scripting language's DLL...
In order to be able to call functions with parameter types that are not clear at compiler time, I fear you won't come around said "lower level assembly hacks".
In cases where portability to architectures other than x86 or AMD64 isn't relevant, take a look at this wonderful library. It allows OS-unspecific ways of generating native bytecode at runtime and should be the easiest way to fulfil your wishes.
It's still beta, however I'm using it for a while now without encountering any problems.

calling kernel32.dll function without including windows.h

if kernel32.dll is guaranteed to loaded into a process virtual memory,why couldn't i call function such as Sleep without including windows.h?
the below is an excerpt quoting from vividmachine.com
5. So, what about windows? How do I find the addresses of my needed DLL functions? Don't these addresses change with every service pack upgrade?
There are multitudes of ways to find the addresses of the functions that you need to use in your shellcode. There are two methods for addressing functions; you can find the desired function at runtime or use hard coded addresses. This tutorial will mostly discuss the hard coded method. The only DLL that is guaranteed to be mapped into the shellcode's address space is kernel32.dll. This DLL will hold LoadLibrary and GetProcAddress, the two functions needed to obtain any functions address that can be mapped into the exploits process space. There is a problem with this method though, the address offsets will change with every new release of Windows (service packs, patches etc.). So, if you use this method your shellcode will ONLY work for a specific version of Windows. Further dynamic addressing will be referenced at the end of the paper in the Further Reading section.
The article you quoted focuses on getting the address of the function. You still need the function prototype of the function (which doesn't change across versions), in order to generate the code for calling the function - with appropriate handling of input and output arguments, register values, and stack.
The windows.h header provides the function prototype that you wish to call to the C/C++ compiler, so that the code for calling the function (the passing of arguments via register or stack, and getting the function's return value) can be generated.
After knowing the function prototype by reading windows.h, a skillful assembly programmer may also be able to write the assembly code to call the Sleep function. Together with the function's address, these are all you need to make the function call.
With some black magic you can ;). there have been many custom implementations of GetProcAddress, which would allow you to get away with not needing windows.h, this however isn't be all and end all and could probably end up with problems due to internal windows changes. Another method is using toolhlp to enumerate the modules in the process to get kernel.dll's base, then spelunks its PE for the EAT and grab the address of GetProcAddress. from there you just need function pointer prototypes to call the addresses correctly(and any structure defs needed), which isn't too hard meerly labour intensive(if you have many functions), infact under windows xp this is required to disable DEP due to service pack differencing, ofc you need windows.h as a reference to get this, you just don't need to include it.
You'd still need to declare the function in order to call it, and you'd need to link with kernel32.lib. The header file isn't anything magic, it's basically just a lot of function declarations.
I can do it with 1 line of assembly and then some helper functions to walk the PEB
file by hard coding the correct offsets to different members.
You'll have to start here:
static void*
JMIM_ASM_GetBaseAddr_PEB_x64()
{
void* base_address = 0;
unsigned long long var_out = 0;
__asm__(
" movq %%gs:0x60, %[sym_out] ; \n\t"
:[sym_out] "=r" (var_out) //:OUTPUTS
);
//: printf("[var_out]:%d\n", (int)var_out);
base_address=(void*)var_out;
return( base_address );
}
Then use windbg on an executable file to inspect the data structures on your machine.
A lot of the values you'll be needing are hard to find and only really documented by random hackers. You'll find yourself on a lot of malware writing sites digging for answers.
dt nt!_PEB -r #$peb
Was pretty useful in windbg to get information on the PEB file.
There is a full working implementation of this in my game engine.
Just look in: /DEP/PEB2020 for the code.
https://github.com/KanjiCoder/AAC2020
I don't include <windows.h> in my game engine. Yet I use "GetProcAddress"
and "LoadLibraryA". Might be in-advisable to do this. But my thought was the more
moving parts, the more that can go wrong. So figured I'd take the "#define WIN32_LEAN_AND_MEAN" to it's absurd conclusion and not include <windows.h> at all.

let the user use a function in c++ [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Dynamic source code in C++
is it possible to let the user type in a function and then run that function without using a lot of if's or a huge switch?
It is not possible to execute arbitrary c++ code in your program, since you than need a c++ compiler inside your program. But you could try to embed Python to your program. Boost python makes this relatively easy. The user can than write a python function that is executed and can interact with the classes and functions of your program. You need to make your functions explicitely visible to python.
What ever a user types in will be text, or a string. The only way I know to have it get mapped to a function is to use if/else or switch statements. That or the cringe inducing option of mapping each of your functions to a UI widget.
The end of the story, is it's your code. You have to write, and live with it. Just be careful, your program may be wildly successful, and you may not write code anymore, and then someone else will have to maintain your code. So be nice to the maintenance programmer who may follow you, and write code that isn't too tricky to figure out.
I assume you want something like eval from php.
You can try to play with command design pattern, but I doubt it will be an easy task. Basically you need to write simple C++ interpreter.
What type of function do you mean? A C++ function? If so, then you will have to either (1)interpret it or (2)compile and execute it. Interpretation would be the more likely choice here. I'm not sure if there are libraries out there already to do this but I'd assume there are.
If you don't like mega-if's or huge switches, you may be SoL on any solution for anything ever, but then again there is seldom one perfect way to do things. Consider looking in to various logic structures and algorithms to see how to do something that would normally be the job of a 23-case switch could be done another way. Like I said initially, however, sometimes you really do just need a million nested if's to do what you want to.
No, in C++ this is not possible. C++ is a compiled language. When the program runs, the compiler doesn't need to be accessible, or even installed on the machine that runs the program.
If you want to do this in C++, you need to write your own interpreter that parses whatever the user enters.
Here is my best idea, but it is a tad memory intensive.
First, create a class, lets call it MyFuncPtr to store a union of several different types of pointers to functions and an integer to tell which type it is. Overload the () operator to call the function stored with a variable length argument list. Make sure to include some sort of run-time argument checking.
Finally create a map of strings to MyFuncPtrs. Store your functions in this map along with their names. Then all you need to do is feed the name into the [] command to get a function that can be easily called. Templates could probably be used to aid in the making of MyFuncPtr instances.
This would be the easiest if it were plain C functions and no name mangling is performed on the symbols (use extern "C" { ... })
With some platform-specific code you can get the address of a function by its name. Then you cast the address as a function pointer which you can use to call the function.
On windows you must be using GetProcAddress and dlsym on Posix compliant platforms.

Function pointers and unknown number of arguments in C++

I came across the following weird chunk of code.Imagine you have the following typedef:
typedef int (*MyFunctionPointer)(int param_1, int param_2);
And then , in a function , we are trying to run a function from a DLL in the following way:
LPCWSTR DllFileName; //Path to the dll stored here
LPCSTR _FunctionName; // (mangled) name of the function I want to test
MyFunctionPointer functionPointer;
HINSTANCE hInstLibrary = LoadLibrary( DllFileName );
FARPROC functionAddress = GetProcAddress( hInstLibrary, _FunctionName );
functionPointer = (MyFunctionPointer) functionAddress;
//The values are arbitrary
int a = 5;
int b = 10;
int result = 0;
result = functionPointer( a, b ); //Possible error?
The problem is, that there isn't any way of knowing if the functon whose address we got with LoadLibrary takes two integer arguments.The dll name is provided by the user at runtime, then the names of the exported functions are listed and the user selects the one to test ( again, at runtime :S:S ).
So, by doing the function call in the last line, aren't we opening the door to possible stack corruption? I know that this compiles, but what sort of run-time error is going to occur in the case that we are passing wrong arguments to the function we are pointing to?
There are three errors I can think of if the expected and used number or type of parameters and calling convention differ:
if the calling convention is different, wrong parameter values will be read
if the function actually expects more parameters than given, random values will be used as parameters (I'll let you imagine the consequences if pointers are involved)
in any case, the return address will be complete garbage, so random code with random data will be run as soon as the function returns.
In two words: Undefined behavior
I'm afraid there is no way to know - the programmer is required to know the prototype beforehand when getting the function pointer and using it.
If you don't know the prototype beforehand then I guess you need to implement some sort of protocol with the DLL where you can enumerate any function names and their parameters by calling known functions in the DLL. Of course, the DLL needs to be written to comply with this protocol.
If it's a __stdcall function and they've left the name mangling intact (both big ifs, but certainly possible nonetheless) the name will have #nn at the end, where nn is a number. That number is the number of bytes the function expects as arguments, and will clear off the stack before it returns.
So, if it's a major concern, you can look at the raw name of the function and check that the amount of data you're putting onto the stack matches the amount of data it's going to clear off the stack.
Note that this is still only a protection against Murphy, not Machiavelli. When you're creating a DLL, you can use an export file to change the names of functions. This is frequently used to strip off the name mangling -- but I'm pretty sure it would also let you rename a function from xxx#12 to xxx#16 (or whatever) to mislead the reader about the parameters it expects.
Edit: (primarily in reply to msalters's comment): it's true that you can't apply __stdcall to something like a member function, but you can certainly use it on things like global functions, whether they're written in C or C++.
For things like member functions, the exported name of the function will be mangled. In that case, you can use UndecorateSymbolName to get its full signature. Using that is somewhat nontrivial, but not outrageously complex either.
I do not think so, it is a good question, the only provision is that you MUST know what the parameters are for the function pointer to work, if you don't and blindly stuff the parameters and call it, it will crash or jump off into the woods never to be seen again... It is up to the programmer to convey the message on what the function expects and the type of parameters, luckily you could disassemble it and find out from looking at the stack pointer and expected address by way of the 'stack pointer' (sp) to find out the type of parameters.
Using PE Explorer for instance, you can find out what functions are used and examine the disassembly dump...
Hope this helps,
Best regards,
Tom.
It will either crash in the DLL code (since it got passed corrupt data), or: I think Visual C++ adds code in debug builds to detect this type of problem. It will say something like: "The value of ESP was not saved across a function call", and will point to code near the call. It helps but isn't totally robust - I don't think it'll stop you passing in the wrong but same-sized argument (eg. int instead of a char* parameter on x86). As other answers say, you just have to know, really.
There is no general answer. The Standard mandates that certain exceptions be thrown in certain circumstances, but aside from that describes how a conforming program will be executed, and sometimes says that certain violations must result in a diagnostic. (There may be something more specific here or there, but I certainly don't remember one.)
What the code is doing there isn't according to the Standard, and since there is a cast the compiler is entitled to go ahead and do whatever stupid thing the programmer wants without complaint. This would therefore be an implementation issue.
You could check your implementation documentation, but it's probably not there either. You could experiment, or study how function calls are done on your implementation.
Unfortunately, the answer is very likely to be that it'll screw something up without being immediately obvious.
Generally if you are calling LoadLibrary and GetProcByAddrees you have documentation that tells you the prototype. Even more commonly like with all of the windows.dll you are provided a header file. While this will cause an error if wrong its usually very easy to observe and not the kind of error that will sneak into production.
Most C/C++ compilers have the caller set up the stack before the call, and readjust the stack pointer afterwards. If the called function does not use pointer or reference arguments, there will be no memory corruption, although the results will be worthless. And as rerun says, pointer/reference mistakes almost always show up with a modicum of testing.