I have a problem with detours. A detour, as you all know, only has 5 bytes of space to work with (i.e. a 'jmp' opcode and a 4-byte address). Because of this, the 'hook' function cannot be a method of a class: there is simply not enough space to supply the 'this' pointer (here's the problem more thoroughly explained). So I've been brainstorming all day for a solution, and now I want your thoughts on the subject so I don't begin a 3-5 day project without knowing whether it is possible or not.
I had 3 goals initially: I wanted the 'hook' functions to be class methods, I wanted the whole approach to be object-oriented (no static functions or global objects) and, the worst/hardest part, I wanted it to be completely dynamic. This is my (in theory) solution: with assembly one can modify functions at runtime (a perfect example is any detouring method). So since I can modify functions dynamically, shouldn't I also be able to create them dynamically? For example, I allocate memory for, let's say, ~30 bytes (through malloc/new). Wouldn't it be possible to just fill those bytes with the machine-code values corresponding to different assembly instructions (like 0xE9 for 'jmp') and then call the address directly (since it would contain a function)?
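For reference, the 5-byte patch mentioned above can be sketched as below. This is only an illustration of the encoding, assuming 32-bit x86 addresses; the function name make_jmp_patch is made up for the example, and nothing is actually written into a running program.

```cpp
#include <array>
#include <cstdint>

// Builds the 5-byte x86 near jump (opcode 0xE9 followed by a rel32
// displacement) that a detour writes over the start of the target function.
// make_jmp_patch is a hypothetical helper; addresses are plain 32-bit values.
std::array<std::uint8_t, 5> make_jmp_patch(std::uint32_t patch_addr,
                                           std::uint32_t hook_addr) {
    // The displacement is relative to the instruction AFTER the jump,
    // i.e. patch_addr + 5. Unsigned arithmetic wraps, so backward jumps work.
    const std::uint32_t rel = hook_addr - (patch_addr + 5u);

    std::array<std::uint8_t, 5> patch{};
    patch[0] = 0xE9;  // jmp rel32
    // x86 is little-endian: least significant byte first.
    for (int i = 0; i < 4; ++i) {
        patch[1 + i] = static_cast<std::uint8_t>((rel >> (8 * i)) & 0xFFu);
    }
    return patch;
}
```

All five bytes are taken up by the opcode and the displacement, which is why there is no room left to pass a 'this' pointer in the patch itself.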
NOTE: I know beforehand the return value and all the arguments of every function that I want to detour, and since I'm using GCC, the thiscall convention is practically identical to the _cdecl one.
So this is my thought/soon-to-be implementation: I create a 'Function' class. Its constructor takes a variable number of arguments; the first argument describes the return value of the target function.
Each remaining argument describes one of the arguments the hook will receive (its size, and whether it is a pointer or not). So let's say I want to create a Function object for an int * RandomClass::IntCheckNum(short arg1);. Then I would just have to do this: Function func(Type(4, true), Type(4, true), Type(2, false));. Here 'Type' is defined as Type(uint size, bool pointer). Then through assembly I could dynamically create the function (note: this would all be using the _cdecl calling convention), since I can calculate the number of arguments and their total size.
EDIT: In the example, the first Type(4, true) is the return value (int*), the second Type(4, true) is the RandomClass 'this' pointer, and Type(2, false) describes the first argument (short arg1).
With this implementation I could easily have class methods as callbacks, but it would require an extensive amount of assembly code (which I'm not even especially experienced at).
In the end, the only non-dynamic thing would be the methods in my callback class (which also would require pre and post callbacks).
So I wanted to know; is this possible? How much work would it require, and am I way over my head here?
EDIT: I'm sorry if I presented everything a bit fuzzy, but if there is something you want more thoroughly explained, do ask!
EDIT2: I'd also like to know, if I can find the hex values for all assembly operators somewhere? A list would help a ton! And/or if it is possible to somehow 'save' the asm(""); code at a memory address (which I highly doubt).
What you describe is usually called "thunking", and is quite commonly implemented. Historically, the most common purpose has been mapping between 16-bit and 32-bit code (by autogenerating a new 32-bit function that calls an existing 16-bit one or vice versa). I believe some C++ compilers generate similar functions to adjust base class pointers to subclass pointers in multiple inheritance, also.
It certainly seems like a viable solution to your problem, and I don't foresee any huge issues. Just make sure you allocate the memory with any flags needed in your operating system to make sure the memory is executable (most modern OSs give out non-executable memory by default).
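As a rough sketch of the "executable memory" point, the following assumes a POSIX system (mmap/mprotect) and an x86 or x86-64 CPU; on Windows the equivalent calls would be VirtualAlloc/VirtualProtect. The function name call_generated_function is invented for the example, and the generated code is just "mov eax, 42; ret" rather than a real thunk.

```cpp
#include <cstring>
#include <sys/mman.h>

// Writes a tiny function into freshly mapped memory, makes it executable and
// calls it. Returns the generated function's result (42), or -1 if the CPU
// is not x86 or the memory could not be set up.
int call_generated_function() {
#if defined(__x86_64__) || defined(__i386__)
    // Machine code for: mov eax, 42 ; ret
    const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};

    // Memory from malloc/new is normally NOT executable, so map our own page.
    void* mem = mmap(nullptr, sizeof code, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    std::memcpy(mem, code, sizeof code);

    // Flip the page from writable to executable before calling into it.
    if (mprotect(mem, sizeof code, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, sizeof code);
        return -1;
    }

    int (*fn)() = reinterpret_cast<int (*)()>(mem);
    const int result = fn();
    munmap(mem, sizeof code);
    return result;
#else
    return -1;  // the byte sequence above is x86-specific
#endif
}
```

A real thunk would instead emit code that loads the object pointer and jumps to the method, but the allocation and protection steps would look much the same.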
You may find this link helpful, particularly if working in Win32: http://www.codeproject.com/Articles/16785/Thunking-in-Win32-Simplifying-Callbacks-to-Non-sta
Regarding finding the hex values of assembly operations, the best reference I know of is the Appendix to the manual of the NASM assembler (and I don't just say that because I helped write it). There's a copy available here: http://www.posix.nl/linuxassembly/nasmdochtml/nasmdoca.html
In all of the create info structs (vk*CreateInfo) in the new Vulkan API, there is ALWAYS a .sType member. Why is this there if the value can only be one thing? Also, the Vulkan specification is very explicit that you can only use vk*CreateInfo structs as parameters for their corresponding vkCreate* function. It seems a little redundant. I can see that if the driver was passing this struct straight to the GPU, you might need to have it (I did notice it is always the first member). But making the app set it seems like a really bad idea: if the driver did it, apps would be much less error prone, and prepending an int to a struct doesn't seem like an especially computationally expensive operation. I just don't see why it exists.
TL;DR
Why do the vk*CreateInfo structs have the .sType member?
They have one so that the pNext field actually works.
Yes, the API takes a struct with a proper C type, so both the caller and the receiver agree on what type that struct is. But especially nowadays, many such structs have linked lists of structures that provide additional information to the implementation. These extension structures (though many are core in Vulkan 1.1/2) are just like all other structures, with their own sType field.
These fields are crucial because the linked lists are built with pNext pointers... which are void*s. They have no set type. The way the implementation determines what a non-NULL pNext pointer points to is by examining the first 4 bytes stored there. This is the sType field; it allows the implementation to know what type to cast the pointer to.
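To make the mechanism concrete, here is a minimal sketch of walking such a chain. The struct and enum names below are simplified stand-ins invented for this example, not the real Vulkan headers; the only property that matters is that every structure begins with sType followed by pNext.

```cpp
#include <cstdint>

// Hypothetical stand-ins for Vulkan's structure types (values are made up).
enum StructureType : std::int32_t {
    TYPE_DEVICE_CREATE_INFO = 1,
    TYPE_FEATURES_A = 2,
    TYPE_FEATURES_B = 3
};

// Common prefix shared by every chainable structure.
struct BaseHeader {
    StructureType sType;
    const void* pNext;
};

struct FeaturesA { StructureType sType; const void* pNext; int enableA; };
struct FeaturesB { StructureType sType; const void* pNext; float limitB; };
struct DeviceCreateInfo { StructureType sType; const void* pNext; int queueCount; };

// Walks a pNext chain and returns the first structure whose sType matches,
// or nullptr if there is none. The caller can then cast to the real type.
const void* find_in_chain(const void* pNext, StructureType wanted) {
    while (pNext != nullptr) {
        // Only the leading sType/pNext pair is read through this pointer.
        const BaseHeader* header = static_cast<const BaseHeader*>(pNext);
        if (header->sType == wanted) return pNext;
        pNext = header->pNext;
    }
    return nullptr;
}
```

Without the sType tag at a fixed offset, the loop would have no way to tell a FeaturesA from a FeaturesB behind the void pointer.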
Of course, the primary struct that an API takes doesn't strictly need an sType field, since its type is part of the API itself. However, there is a hypothetical reason to do so (it hasn't panned out in Vulkan releases).
A later version of Vulkan could expand on the creation of, for example, command buffer pools. But how would it do that? Well, they could add a whole new entrypoint: vkCreateCommandPool2. But this function would have almost the exact same signature as vkCreateCommandPool; the only difference is that they take different pCreateInfo structures.
So instead, all you have to do is declare a VkCommandPoolCreateInfo2 structure. And then declare that vkCreateCommandPool can take either one. How would the implementation tell which one you passed in?
Because the first 4 bytes of any such structure is sType. They can test that value. If the value is VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO, then it's the old structure. If it's VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO_2, then it's the new one.
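The hypothetical case can be sketched like this. Everything here is invented for illustration: the two structures, the constant values and the function create_info_version are not part of Vulkan, and as noted no "_2" create info of this kind actually exists.

```cpp
#include <cstdint>
#include <cstring>

// Made-up tag values standing in for the two VK_STRUCTURE_TYPE_* constants.
constexpr std::int32_t TYPE_POOL_CREATE_INFO = 100;
constexpr std::int32_t TYPE_POOL_CREATE_INFO_2 = 200;

struct PoolCreateInfo {
    std::int32_t sType;
    const void* pNext;
    std::uint32_t flags;
};

// Hypothetical successor with one extra field.
struct PoolCreateInfo2 {
    std::int32_t sType;
    const void* pNext;
    std::uint32_t flags;
    std::uint32_t initialCapacity;
};

// Inspects the first 4 bytes of whatever was passed in and reports which
// layout it is: 1 for the old struct, 2 for the new one, -1 if unknown.
int create_info_version(const void* pCreateInfo) {
    std::int32_t sType = 0;
    std::memcpy(&sType, pCreateInfo, sizeof sType);
    switch (sType) {
        case TYPE_POOL_CREATE_INFO:   return 1;
        case TYPE_POOL_CREATE_INFO_2: return 2;
        default:                      return -1;
    }
}
```

A single entry point could then branch on the result and cast the pointer to the matching structure.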
Of course, as previously stated, this hasn't panned out; post-1.0 Vulkan versions opted to incorporate extension structs rather than replacing existing ones. But the option is there.
I want a common function which can take any data type as an argument and return its result in that same data type. How can I implement this via a DLL?
It seems that you would like to export a templated function from the DLL without specifying its type.
You cannot do that because templates are resolved at compile time (so when the code is generated). As mentioned by #MSlaters you cannot have an infinitely big template.
If you have a predefined number of data types, you can force instantiate each of them in your dll code in order to have them exposed.
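Forced (explicit) instantiation could look roughly like this. The function twice and the MYLIB_API macro are placeholders made up for the example; on Windows the macro would typically expand to __declspec(dllexport) when building the DLL.

```cpp
// Hypothetical export macro: empty outside Windows so the sketch compiles
// anywhere.
#if defined(_WIN32)
#define MYLIB_API __declspec(dllexport)
#else
#define MYLIB_API
#endif

// The template itself. On its own this generates no code in the DLL.
template <typename T>
T twice(T value) {
    return value + value;
}

// Explicit instantiations: the compiler emits (and exports) code for exactly
// these types, so the DLL contains twice<int> and twice<double> and nothing
// else.
template MYLIB_API int twice<int>(int);
template MYLIB_API double twice<double>(double);
```

A client that needs twice<float> would still get a link error, which is the limitation described above.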
If you want to make the most generic thing possible, you can only have
void* getResult (void* inputParameter)
But unfortunately, you won't know how the memory is mapped for the object (so less of gain, more of a pain if you'd ask me).
No. A DLL contains compiled code, in particular the return statements. Since you would have to support an infinite number of types with an infinite number of return statements, the DLL would be infinitely big.
I'm in the process of writing a kind of runtime system/interpreter, and one of things that I need to be able to do is call c/c++ functions located in external libraries.
On Linux I'm using the dlfcn.h functions to open a library and call a function located within it. The problem is that, when using dlsym(), the function pointer returned needs to be cast to an appropriate type before being called so that the function arguments and return type are known. However, if I’m calling some arbitrary function in a library, then obviously I will not know this prototype at compile time.
So what I’m asking is, is there a way to call a dynamically loaded function, pass it arguments, and retrieve its return value without knowing its prototype?
So far I’ve come to the conclusion that there is no easy way to do this, but some workarounds that I’ve found are:
Ensure all the functions I want to load have the same prototype, and provide some sort of mechanism for these functions to retrieve parameters and return values. This is what I am doing currently.
Use inline asm to push the parameters onto the stack, and to read the return value. I really want to steer clear of doing this if possible!
If anyone has any ideas then it would be much appreciated.
Edit:
I have now found exactly what I was looking for:
http://sourceware.org/libffi/
"A Portable Foreign Function Interface Library"
(Although I’ll admit I could have been clearer in the original question!)
What you are asking for is if C/C++ supports reflection for functions (i.e. getting information about their type at runtime). Sadly the answer is no.
You will have to make the functions conform to a standard contract (as you said you were doing), or start implementing mechanics for trying to call functions at runtime without knowing their arguments.
Since having no knowledge of a function makes it impossible to call it, I assume your interpreter/"runtime system" at least has some user input or similar it can use to deduce that it's trying to call a function that will look like something taking those arguments and returning something not entirely unexpected. That lookup is hard to implement in itself, even with reflection and a decent runtime type system to work with. Mix in calling conventions, linkage styles, and platforms, and things get nasty real soon.
Stick to your plan, enforce a well-defined contract for the functions you load dynamically, and hopefully make do with that.
Can you add a dispatch function to the external libraries, e.g. one that takes a function name and N (optional) parameters of some sort of variant type and returns a variant? That way the dispatch function prototype is known. The dispatch function then does a lookup (or a switch) on the function name and calls the corresponding function.
Obviously it becomes a maintenance problem if there are a lot of functions.
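A minimal sketch of such a dispatch function is below. To keep it short it assumes every argument and result is a double (a std::vector<double> stands in for a real variant type), and the names dispatch, add and negate_value are made up for the example.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Args = std::vector<double>;          // stand-in for a variant list
using Handler = double (*)(const Args&);   // the one uniform prototype

// Two example library functions adapted to the uniform prototype.
static double add(const Args& a) { return a.at(0) + a.at(1); }
static double negate_value(const Args& a) { return -a.at(0); }

// The single entry point the interpreter would look up in the library.
// Throws std::invalid_argument if the name is not in the table.
double dispatch(const std::string& name, const Args& args) {
    static const std::map<std::string, Handler> table = {
        {"add", add},
        {"negate", negate_value},
    };
    const auto it = table.find(name);
    if (it == table.end()) {
        throw std::invalid_argument("unknown function: " + name);
    }
    return it->second(args);
}
```

Every new function means another table entry, which is where the maintenance cost comes from.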
I believe the ruby FFI library achieves what you are asking. It can call functions
in external dynamically linked libraries without specifically linking them in.
http://wiki.github.com/ffi/ffi/
You probably can't use it directly in your scripting language but perhaps the ideas are portable.
--
Brad Phelan
http://xtargets.heroku.com
I'm in the process of writing a kind of runtime system/interpreter, and one of things that I need to be able to do is call c/c++ functions located in external libraries.
You can probably check for examples how Tcl and Python do that. If you are familiar with Perl, you can also check the Perl XS.
General approach is to require extra gateway library sitting between your interpreter and the target C library. From my experience with Perl XS main reasons are the memory management/garbage collection and the C data types which are hard/impossible to map directly on to the interpreter's language.
So what I’m asking is, is there a way to call a dynamically loaded function, pass it arguments, and retrieve its return value without knowing its prototype?
None known to me.
Ensure all the functions I want to load have the same prototype, and provide some sort of mechanism for these functions to retrieve parameters and return values. This is what I am doing currently.
This is what another team on my project is doing too. They have standardized the API for external plug-ins on something like this:
#include <list>
#include <string>
typedef std::list< std::string > string_list_t;
string_list_t func1(string_list_t stdin, string_list_t &stderr);
A common task for the plug-ins is to transform, map or expand the input, often using an RDBMS.
Previous versions of the interface grew unmaintainable over time, causing problems for customers, product developers and 3rd-party plug-in developers alike. The liberal use of std::string is acceptable because the plug-ins are called relatively seldom (and even then the overhead is peanuts compared to the SQL used all over the place). The argument stdin is populated with input depending on the plug-in type. A plug-in call is considered failed if any string in the output parameter stderr starts with 'E:' ('W:' is for warnings; the rest is silently ignored and can thus be used for plug-in development/debugging).
dlsym is used only once, on a function with a predefined name, to fetch from the shared library an array holding the function table (public function name, type, pointer, etc.).
My solution is that you can define a generic proxy function which will convert the dynamic function to a uniform prototype, something like this:
#include <string>
#include <functional>
using result = std::function<std::string(std::string)>;
template <class F>
result proxy(F func) {
// some type-traits technologies based on func type
}
In the user-defined file, you must add a definition to do the conversion:
double foo(double a) { /*...*/ }
auto local_foo = proxy(foo);
In your runtime system/interpreter, you can then use dlsym to obtain the foo function. The user-defined function foo remains responsible for doing the actual calculation.
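One possible way to fill in the proxy body, as a sketch only: it assumes the wrapped function takes exactly one argument, and that both the argument and the result can be streamed with >> and <<. The original leaves the "type-traits" part open, so this is just one simple choice, not necessarily what was intended.

```cpp
#include <functional>
#include <sstream>
#include <string>

using result = std::function<std::string(std::string)>;

// Wraps a unary function R(A) in the uniform string -> string prototype:
// the argument is parsed from the input string and the return value is
// formatted back into a string.
template <class R, class A>
result proxy(R (*func)(A)) {
    return [func](std::string input) {
        std::istringstream in(input);
        A arg{};
        in >> arg;            // parse the single argument
        std::ostringstream out;
        out << func(arg);     // call the real function, format its result
        return out.str();
    };
}

// Example user-defined function (body invented for the illustration).
double foo(double a) { return a * 2.0; }
```

With this, auto local_foo = proxy(foo); yields a callable where local_foo("2.25") returns "4.5".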
In a lot of C++ APIs (COM-based ones spring to mind) that make something for you, the pointer to the object that is constructed is usually required as a ** pointer (and the function will construct and init it for you).
You usually see signatures like:
HRESULT createAnObject( int howbig, Object **objectYouWantMeToInitialize ) ;
-- but you seldom see the new object being passed as a return value.
Besides people wanting to see error codes, what is the reason for this? Is it better to use the ** pattern rather than a returned pointer for simpler operations such as:
wchar_t* getUnicode( const char* src ) ;
Or would this better be written as:
void getUnicode( const char* src, wchar_t** dst ) ;
The most important thing I can think of is to remember to free it, and the ** way, for some reason, tends to remind me that I have to deallocate it as well.
"Besides wanting error codes"?
What makes you think there is a besides. Error codes are pretty much the one and only reason. The function needs some way to indicate failure. C doesn't have exceptions, so it has to do that through either a pointer parameter, or the return value, and the return value is idiomatic, and easier to check when calling the function.
(By the way, there's no universal rule that ** means you have to free the object. That's not always the case, and it's probably a bad idea to use something that arbitrary to remind you of which objects to clean up.)
Two reasons come to my mind.
First, error codes actually. Unlike C++, C doesn't have exceptions, and COM is a C API. Also, many C++-based projects prefer not to use exceptions for various reasons.
There may be cases where a return value can't signal errors, e.g. if your function returns an integer, there may be no integer value that can represent an error code. While signalling errors with pointers is easy (NULL == error), some API designers prefer to signal errors in a consistent way across all functions.
Second, functions can have only one return value, but calling them may create multiple objects. Some Win32 API functions take multiple pointers to pointers that can be filled optionally, if you call these functions with non-NULL pointers. You cannot return two pointers, or rather this would be awkward to use, if the return value is some struct by value containing more than one pointer. Here too a consistent API is a sensible goal to achieve.
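Both reasons can be seen in a small sketch like the following. The Status enum, the Widget and Log types and the function createWidget are all invented for illustration; a real COM function would return HRESULT and follow COM's ownership rules instead of plain new/delete.

```cpp
// Hypothetical status codes standing in for HRESULT values.
enum Status { STATUS_OK = 0, STATUS_INVALID_ARG = 1 };

struct Widget { int size; };
struct Log { int entries; };

// Creates a Widget and, optionally, a Log for it.
// outWidget is required; outLog may be nullptr if the caller does not want it.
// The return value is reserved for the error code, so both objects come back
// through pointer-to-pointer parameters. The caller owns what is returned.
Status createWidget(int size, Widget** outWidget, Log** outLog) {
    if (outWidget == nullptr) return STATUS_INVALID_ARG;
    *outWidget = nullptr;                 // defined value on every failure path
    if (outLog != nullptr) *outLog = nullptr;
    if (size <= 0) return STATUS_INVALID_ARG;

    *outWidget = new Widget{size};
    if (outLog != nullptr) *outLog = new Log{0};
    return STATUS_OK;
}
```

Callers check the status in one uniform way, and passing nullptr for outLog skips the second object entirely.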
Passing new objects out through ** function arguments is better. It gives me the comfort of being able to change the return type later, for example from void to bool, to report whether the function succeeded or to return other information about how it went.
Answer in one line: This is much better for returning error codes.
Besides people wanting to see error codes, what is the reason for this?
There are some reasons for this. One of them is writing an interface that is usable in C (you see this in the WinAPI and Windows COM).
Backwards compatibility is another reason (i.e. the interface was written like that and breaking it now would break existing code).
I'd go with C compatibility for a design principle when using code like this. If you were to write in C++ you'd write
retval Myfunction(Result *& output);
instead of
retval Myfunction(Result ** output);
or (even better):
Result *Myfunction();
and have the function throw an exception on error.
I'm not sure I agree that's the best way to do it... this might be better:
Object * createAnObject(int howbig, HRESULT * optPlaceResultCodeHereIfNotNull = NULL);
That way there is no messing about with double-indirection (which can be a little bit tricky for people who aren't used to it), and the people who don't care about result codes don't have to worry about the second argument at all... they can just check to see if the return value is NULL or not.
Actually, since it's C++, you could make things easier still, using function overloading:
Object * createAnObject(int howbig);
Object * createAnObject(int howbig, HRESULT & returnResultCode);
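A sketch of how those two overloads might be implemented follows. To stay portable it uses a made-up ResultCode type with made-up values in place of the Windows HRESULT, and the Object struct is a placeholder.

```cpp
struct Object { int size; };

// Stand-in for HRESULT so the sketch builds without Windows headers.
typedef long ResultCode;
const ResultCode RESULT_OK = 0;
const ResultCode RESULT_INVALID_ARG = 1;

// Full version: returns the object (or nullptr) and reports why.
Object* createAnObject(int howbig, ResultCode& returnResultCode) {
    if (howbig <= 0) {
        returnResultCode = RESULT_INVALID_ARG;
        return nullptr;
    }
    returnResultCode = RESULT_OK;
    return new Object{howbig};
}

// Convenience version for callers who only care whether they got an object.
Object* createAnObject(int howbig) {
    ResultCode ignored = RESULT_OK;
    return createAnObject(howbig, ignored);
}
```

Callers who do not care about the code simply test the returned pointer against nullptr.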
Any method call in a COM call has to be HRESULT. The return codes get leveraged all over the framework and passing a double pointer is a well-known way to get the created object.
Not answering your question but a comment as your question brought out some thoughts I have about COM/DCOM programming using C++.
All these "pointer" and "pointer to pointer", memory management and reference counting are the reasons why I shy away from doing COM programming with C++. Even with ATL in place, I dislike it for the simple reason that it does not look natural enough. Having said that, I did do a few projects using ATL.
Back then the alternative was to use VB. VB code looks more natural for COM or DCOM programming.
Today, I would use C#.
In C++, I can easily create a function pointer by taking the address of a member function. However, is it possible to change the address of that member function?
I.e. say I have funcA() and funcB() in the same class, defined differently. I'm looking to change the address of funcA() to that of funcB(), such that at run time calling funcA() actually results in a call to funcB(). I know this is ugly, but I need to do this, thanks!
EDIT----------
Background on what I'm trying to do:
I'm hoping to implement unit tests for an existing code base, some of the methods in the base class which all of my modules are inheriting from are non-virtual. I'm not allowed to edit any production code. I can fiddle with the build process and substitute in a base class with the relevant methods set to virtual but I thought I'd rather use a hack like this (which I thought was possible).
Also, I'm interested in the topic out of technical curiosity, as through the process of trying to hack around this problem I'm learning quite a bit about how things such as code generation and function look-up work under the hood, which I haven't had a chance to learn in school, having just finished my 2nd year of university. I'm not sure whether I'll ever be taught such things in school, as I'm in a computer engineering program rather than CS.
Back on topic
The methods funcA() and funcB() do indeed have the same signature, so is the problem that I can only get the address of a function using the & operator? Would I be correct in saying that I can't change the address of the function, or swap out the contents at that address, without corrupting portions of memory? Would DLL injection be a good approach for a situation like this if the functions are exported to a DLL?
No. Functions are compiled into the executable, and their address is fixed throughout the life-time of the program.
The closest thing is virtual functions. Give us an example of what you're trying to accomplish, I promise there's a better way.
It cannot be done the way you describe it. The only way to change the target for a statically bound call is by modifying the actual executable code of your program. C++ language has no features that could accomplish that.
If you want function calls to be resolved at run-time you have to either use explicitly indirect calls (call through function pointers), or use language features that are based on run-time call resolution (like virtual functions), or you can use plain branching with good-old if or switch. Which is more appropriate in your case depends on your specific problem.
Technically it might be possible for virtual functions by modifying the vtable of the type, but you most certainly cannot do it without violating the standard (causing Undefined Behavior) and it would require knowledge of how your specific compiler handles vtables.
For other functions it is not possible because the addresses of the functions are directly written to program code, which is generally on a read-only memory area.
I am fairly sure this is impossible in pure C++. C++ is not a dynamic language.
What you want is a pointer to a function, you can point it to FuncA or FuncB assuming that they have the same signature.
You cannot do what you want to do directly. However, you can achieve a similar result with some slightly different criteria, using something you are already familiar with -- function pointers. Consider:
// This type could be whatever you need, including a member function pointer type.
typedef void (*FunctionPointer)();
struct T {
FunctionPointer Function;
};
Now you can set the Function member on any given T instance, and call it. This is about as close as you can reasonably get, and I presume that since you are already aware of function pointers you're already aware of this solution.
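For completeness, a small sketch of swapping the target at run time. It uses an int(int, int) signature instead of the void() one above so the effect is observable, and the bodies of funcA and funcB are invented for the example.

```cpp
// Same idea as above, with a signature that returns something visible.
typedef int (*FunctionPointer)(int, int);

int funcA(int a, int b) { return a + b; }
int funcB(int a, int b) { return a * b; }

struct T {
    FunctionPointer Function;  // every call goes through this pointer
};

// Calls whatever the instance currently points at.
int call(const T& t, int a, int b) {
    return t.Function(a, b);
}
```

Setting t.Function = funcB; on an instance that started with funcA redirects all later call(t, ...) invocations, without touching either function's code.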
Why don't you edit your question with a more complete description of the problem you're trying to solve? As it stands it really sounds like you're trying to do something horrible.
It's simple!
For
at run time calling funcA() actually results in a call to funcB().
write funcA() similar to following:
int funcA( int a, int b) {
return funcB( a, b );
}
:-)