Getting the "real" function from a method pointer - c++

Given a C++ object pointer and a compatible method pointer to a virtual method, is there any remotely robust/portable way to get a pointer to the actual concrete function that would be called?
The use case is that I want to run said pointer through the debug symbols to get the name of the type/function that would be called (without actually calling it).
If this is only possible via implementation specific solutions, then I'm primarily interested in supporting GCC/LLVM.

Both LLVM and GCC follow the Itanium C++ ABI, so you need to find a way to read the data structures as specified therein. I'll give a rough outline.
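A pointer to a virtual member function is represented by its byte offset into the virtual function table, plus 1. The added 1 sets the low bit, which distinguishes it from a pointer to a non-virtual member function (an ordinary, aligned function address).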
class A {
public:
virtual void f();
virtual void g();
};
void (A::*pAg)() = & A::g;
ptrdiff_t offset = *(ptrdiff_t*)(&pAg) - 1;
The pointer to the virtual table is typically located right at the beginning of an object:
A a;
void* vtable = *(void**)&a;
Then you look at the calculated offset within that virtual table and find your actual function pointer.
void* function = *(void**)((char*)vtable + offset);
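Putting the three steps together, a minimal sketch might look like the following. It assumes the Itanium C++ ABI as implemented by GCC and Clang (with the ARM variant handled by the preprocessor branch); `resolve_member`, `Base`, and `Derived` are names invented for this illustration, and reading the object representation this way is outside what the C++ standard guarantees. With multiple or virtual inheritance the slot may hold the address of a thunk rather than the function body.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical demo hierarchy.
struct Base {
    virtual int f() { return 1; }
    virtual int g() { return 2; }
};
struct Derived : Base {
    int g() override { return 20; }
};

// Returns the address of the function that (obj->*pmf)() would run,
// without calling it. Assumes Itanium-style member pointers: { ptr, adj }.
template <class T, class R, class... Args>
void* resolve_member(const T* obj, R (T::*pmf)(Args...)) {
    struct Raw { std::uintptr_t ptr; std::ptrdiff_t adj; } raw;
    static_assert(sizeof(pmf) == sizeof(Raw), "not an Itanium-style member pointer");
    assert(obj != nullptr);
    std::memcpy(&raw, &pmf, sizeof raw);
#if defined(__arm__) || defined(__aarch64__)
    // ARM variant: the "virtual" flag is the low bit of adj.
    const bool is_virtual = (raw.adj & 1) != 0;
    const std::ptrdiff_t this_adjust = raw.adj >> 1;
    const std::uintptr_t vtable_offset = raw.ptr;
#else
    // Generic Itanium ABI: ptr is 1 + byte offset into the vtable.
    const bool is_virtual = (raw.ptr & 1) != 0;
    const std::ptrdiff_t this_adjust = raw.adj;
    const std::uintptr_t vtable_offset = raw.ptr - 1;
#endif
    if (!is_virtual) return reinterpret_cast<void*>(raw.ptr);
    // Adjust this, read the vptr stored at the start of the subobject,
    // then read the slot at the given byte offset.
    const char* self = reinterpret_cast<const char*>(obj) + this_adjust;
    const char* vtable = nullptr;
    std::memcpy(&vtable, self, sizeof vtable);
    void* fn = nullptr;
    std::memcpy(&fn, vtable + vtable_offset, sizeof fn);
    return fn;
}
```

For a `Base*` that points at a `Derived`, `resolve_member(p, &Base::g)` yields the address found in the `Derived` vtable, which is the value a symbol lookup would need; no member function is called in the process.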

Related

pointer to access member function through virtual pointer

I came across articles which explain the vptr and the vtable.
I know that, for a class with virtual functions, the first pointer stored in an object is a vptr to the vtable, and that the vtable's array entries are pointers to the functions in the same sequence as they occur in the class (which I have verified with my test program).
But I am trying to understand what code the compiler must generate in order to call the appropriate function.
Example:
class Base
{
virtual void func1()
{
cout << "Called me" << endl;
}
};
int main()
{
Base obj;
Base *ptr;
ptr=&obj;
// void* is not needed. func1 can be accessed directly with obj or ptr using vptr/vtable
void* ptrVoid=ptr;
// I can call the first virtual function in the following way:
void (*firstfunc)()=(void (*)(void))(*(int*)*(int*)ptrVoid);
firstfunc();
}
Questions:
1. What I am really trying to understand is how the compiler replaces the call to ptr->func1() with a lookup through the vptr.
If I were to simulate the call, what should I do? Should I overload the -> operator? Even that would not help, as I would not know what the name func1 really refers to. Even if the compiler accesses the vtable through the vptr, how does it know that the entry for func1 is the first element of the array and the entry for func2 is the second? There must be some mapping from function names to array elements.
2. How can I simulate it? Can you provide the actual syntax that the compiler uses to call func1 (how does it replace ptr->func1())?
Don't think of a vtable as an array. It's only an array if you strip it of everything C++ knows about it other than the size of its members. Instead, think of it as a second struct whose members are all pointers to functions.
Suppose I have a class like this:
struct Foo {
virtual void bar();
virtual int baz(int qux);
int quz;
};
int callSomeFun(Foo* foo) {
foo->bar();
return foo->baz(2);
}
Breaking it down 1 step:
struct Foo;
// adding Foo* parameter to simulate the this pointer, which
// in the above would be a pointer to foo.
struct FooVtable {
void (*bar)(Foo* foo);
int (*baz)(Foo* foo, int qux);
};
struct Foo {
FooVtable* vptr;
int quz;
};
int callSomeFun(Foo* foo) {
foo->vptr->bar(foo);
return foo->vptr->baz(foo, 2);
}
I hope that's what you're looking for.
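To see the same idea carry through to an override, here is a small self-contained sketch in ordinary C++ with no real virtual functions. The names `FooChild`, `kFooVtable`, `make_foo`, and `make_child` are invented for illustration, and the bodies of bar and baz are arbitrary; a real compiler generates the tables and the hidden pointer itself.

```cpp
#include <cassert>

struct Foo;

// One table per class: a struct of function pointers, not a name lookup.
struct FooVtable {
    void (*bar)(Foo* self);
    int (*baz)(Foo* self, int qux);
};

struct Foo {
    const FooVtable* vptr;  // stands in for the compiler's hidden pointer
    int quz;
};

// "Foo::bar" and "Foo::baz" written as free functions with an explicit this.
void Foo_bar(Foo*) {}
int Foo_baz(Foo* self, int qux) { return self->quz + qux; }
const FooVtable kFooVtable = {Foo_bar, Foo_baz};

// A hand-written "derived class": same leading layout, one overridden slot.
struct FooChild {
    Foo base;
    int scale;
};
int FooChild_baz(Foo* self, int qux) {
    // Valid because base is the first member of a standard-layout struct.
    FooChild* child = reinterpret_cast<FooChild*>(self);
    return child->scale * qux;
}
const FooVtable kFooChildVtable = {Foo_bar, FooChild_baz};

Foo make_foo(int quz) { return Foo{&kFooVtable, quz}; }
FooChild make_child(int quz, int scale) {
    return FooChild{Foo{&kFooChildVtable, quz}, scale};
}

// The caller only knows member positions in FooVtable, fixed at compile time.
int callSomeFun(Foo* foo) {
    assert(foo != nullptr && foo->vptr != nullptr);
    foo->vptr->bar(foo);
    return foo->vptr->baz(foo, 2);
}
```

Calling `callSomeFun` on a `Foo` made with `make_foo(5)` goes through `Foo_baz`, while calling it on the `base` member of a `make_child(5, 10)` object goes through `FooChild_baz`, even though the call site is identical.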
The background:
After compilation (without debug info), C/C++ binaries contain no names, and names aren't required at runtime; there is only machine code.
You can think of a vtable entry as a classic C function pointer, in the sense that its type, argument list, etc. are known.
It isn't important at which positions func1, func2, etc. are placed; the only requirement is that the order is always the same (so all parts of a multi-file C++ program must be compiled the same way, with the same compiler settings, etc.). Let's imagine the positions follow declaration order: FIRST the parent class's virtuals, then the ones newly declared in the derived class, BUT reimplemented (overridden) virtuals stay at the lower positions they had in the parent.
This is only a mental picture. The implementation just has to dispatch overrides correctly, e.g. classAPointer->methodReimplementedInB().
C++ compilers usually have (or had; my knowledge dates from the 16/32-bit migration years) 2-4 options for optimizing vtables for speed, size, etc. The classic C sizeof() was easy to understand (size of the data plus any alignment padding); in C++ sizeof is bigger, but nobody can guarantee whether the extra is 2, 4, or 8 bytes.
A few conversion tools can convert "object" files, e.g. from the MS format to Borland's, but usually only classic C was possible/safe, because the machine-code implementation of the vtable is unknown.
It is hard to touch the vtable from high-level code; use analysers on the intermediate files (.obj, etc.) instead.
EDIT: The story at runtime is different from the story at compilation. My answer is about compiled code and runtime.
EDIT2: quasi-assembler code (from memory)
load ax, 2
call vt[ax]
vt:
0x123456
0x126785 // virtual parent func1()
derived:
vt:
0x123456
0x126999 // overridden func1()
0x456788 // new method
EDIT3: BTW I can't totally agree that C++ always has better speed than JVM/.NET because "these are interpreted". C++ has its own share of "interpretation", and that interpreted part is growing: real component/GUI frameworks have interpreted connections between them too (a map, for example). Outside the scope of our discussion: which memory model is better, C++ delete or GC?

Is virtual table necessary for C++?

I have recently had a question about the C++ virtual table.
Why does C++ use a virtual table?
=>Because C++ compiler does not know the actual function address
--->Why?
=>Because C++ compiler does not know the exact type(Cat? Dog? Animal?) of the object the pointer "panimal" points to
--->Why? Is there any way the compiler can figure out the object type?
=>Yes, I think the compiler can make it via tracking object type.
Let's consider the sources where an object pointer gets its value. 2 sources indeed.
another pointer
address of class instance
Where does "another pointer" get its value? Eventually, there's a pointer that gets its value from "class instance".
So, via tracking the assignment thread backwards to the original source object
  => the compiler is able to figure out the exact type of a pointer.
  =>the compiler knows the address of the exact function being called
  =>no virtual table is needed.
Object type tracking saves both the virtual table memory and the virtual table pointer in each class instance.
Where does object type tracking not work?
Library Linking.
If a library function returns a base-class pointer, there's no way for the compiler to track back to the original source object. The compiler can probably adapt to library code and non-library code. For library classes that are exported, use a virtual table. For other classes, just track their object type to save memory.
I am not sure whether there's some error in above statements, please kindly point it out if any. Thanks in advance~
In some cases, yes, the compiler can figure out the type a pointer points to at compile time. It is quite easy to construct a case where it cannot though.
int x;
cin >> x;
Animal* p;
if (x == 10)
p = new Cat();
else
p = new Dog();
If the compiler can, in all cases, prove the type of an object, it is free to eliminate virtual tables from its generated code, as per the as-if rule.
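As a rough illustration of the two situations, the sketch below contrasts a call where the static type already pins down the target with one where the target depends on runtime input. The hierarchy and the function names `speak_known` and `speak_unknown` are made up for this example, and whether a given compiler actually emits a direct call in the first case depends on its optimizer; the language only permits it under the as-if rule.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Animal {
    virtual ~Animal() = default;
    virtual std::string speak() const { return "..."; }
};
struct Cat final : Animal {  // final: nothing can override Cat::speak
    std::string speak() const override { return "meow"; }
};
struct Dog : Animal {
    std::string speak() const override { return "woof"; }
};

// The static type is a final class, so the dynamic type must be Cat too.
// A compiler is allowed to call Cat::speak directly here.
std::string speak_known(const Cat& cat) { return cat.speak(); }

// The dynamic type depends on x, which is only known at runtime,
// so this call has to go through the virtual dispatch mechanism.
std::string speak_unknown(int x) {
    std::unique_ptr<Animal> p;
    if (x == 10)
        p.reset(new Cat());
    else
        p.reset(new Dog());
    assert(p != nullptr);
    return p->speak();
}
```

Both functions return the same results whether or not the compiler devirtualizes, which is exactly what the as-if rule requires.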
the compiler is able to figure out the exact type of a pointer.
Yes, but how do you want it to call the right function at runtime? The compiler knows, but C++ has no virtual machine to tell it the type of the object being passed at runtime; hence the need for a vtable for virtual functions of inherited types.
Would you rather the compiler creates code for all the different code paths that lead to the execution of each virtual function so it calls the right function at runtime? That would lead to much much bigger binaries, if at all possible.
This example makes it clear that, whatever static code analysis the compiler performs, the actual method that gets called at ptrA->f(); can only be known at runtime.
#include <sys/time.h>
#include <iostream>
#include <stdlib.h>
struct A {
virtual int f()
{
std::cout<<"class A\n";
return 0; // a non-void function must return a value
}
};
struct B: public A {
int f()
{
std::cout<<"class B\n";
return 0;
}
};
int main()
{
A objA;
B objB;
A* ptrA;
timeval tv;
gettimeofday(&tv, NULL);
unsigned int seed = (unsigned int)tv.tv_sec;
int randVal = rand_r(&seed);
if( randVal < RAND_MAX/2)
{
ptrA=&objA;
}
else
{
ptrA=&objB;
}
ptrA->f();
return 0;
}

get the real address(or index in vTable) of virtual member function

In C++, is there any way to get the real address of a member function, or its index in the vtable?
Updated:
I don't know the INDEX in vTable and
I don't know the address
Here's why I want to know this:
I want to hook the function ID3DXFont->DrawText of DirectX. If I know the index of DrawText in the vtable, I can replace that entry to install the hook. But how do I get the index? If it's possible to get the real address, I can search for it in the vtable to get the index.
And not particularly ID3DXFont->DrawText, maybe some other functions in the future, so I'm trying to write a generic hook function.
Here's what I've tried so far:
#include <iostream>
#include <windows.h> // for DWORD
using namespace std;
struct cls {
virtual int fn1() {
cout << "fn1 called" << endl;
return 1;
}
virtual int fn2() {
cout << "fn2 called" << endl;
return 2;
}
};
template <typename fn_t>
DWORD fn_to_addr(fn_t fn) { // convert function to DWORD for printing
union U {
fn_t fn;
DWORD addr;
};
U u;
u.fn = fn;
return u.addr;
}
int main() {
cls c;
DWORD addr = fn_to_addr(&cls::fn2);
cout << hex << addr << endl;
}
In debug mode, the code above outputs the address of jump table.
And in release mode, the &cls::fn2 returns 0x00401058, which points to some optimized code:
00401058 . mov eax, dword ptr [ecx] // get vptr
0040105A . jmp dword ptr [eax+4] // jmp to the second function (fn2)
Neither is the real address. Is there any way to get it?
Thanks.
Don't give up so easily!
While the other answers are correct in saying that the C++ language doesn't allow you to do this in a portable way, there's an important factor in your particular case that may make this a more reasonable thing to do.
The key is that ID3DXFont is a COM interface and the exact binary details of how those work are specified separately from the language used to access them. So while C++ doesn't say what you'll find at the other end of that pointer, COM does say that there's a vtable there with an array of function pointers in a specified order and with a specified calling convention. This allows me to tell you that the index of the DrawText function is 14 (DrawTextA) or 15 (DrawTextW) and that this will still be true in Visual C++ 28.0 many years from now. Or in GCC 8.3.1 for that matter: since COM is a binary interface specification, all compilers are supposed to implement it the same way (if they claim to support COM).
Have a look at the second link below for a ready-made implementation of COM function hooking using two different methods. Approach#2 is the closest to what you're asking for but I think you may want to consider the first one instead because it involves less voodoo.
Sources:
[http://msdn.microsoft.com/en-us/library/ms680573(v=vs.85).aspx]
[http://www.codeproject.com/Articles/153096/Intercepting-Calls-to-COM-Interfaces]
[http://goodrender.googlecode.com/svn/trunk/include/d3dx9core.h]
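Once the slot index is known, the hook itself amounts to swapping one entry in the table. The sketch below shows only that step, on a mock object rather than a real ID3DXFont: `MockFont`, `MockFontVtbl`, `swap_vtable_slot`, and the slot number 3 are all invented for illustration. It also assumes the table is writable; a real vtable usually sits in read-only memory, so its page protection would have to be changed first (for example with VirtualProtect on Windows).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

using AnyFn = void (*)();  // generic function-pointer type used for slots

// Mock of a COM-style interface as seen from C: the object begins with a
// pointer to a table of function pointers in a fixed order.
struct MockFont;
struct MockFontVtbl {
    int (*QueryInterface)(MockFont*);
    int (*AddRef)(MockFont*);
    int (*Release)(MockFont*);
    int (*DrawTextLike)(MockFont*, const char*);  // slot 3 in this mock
};
struct MockFont {
    MockFontVtbl* vtbl;
    int draw_calls;
};

// Replaces slot `index` in the object's table and returns the old entry.
// The table must be writable when this is called.
AnyFn swap_vtable_slot(void* object, std::size_t index, AnyFn replacement) {
    assert(object != nullptr);
    unsigned char* table = nullptr;
    std::memcpy(&table, object, sizeof table);  // read the vptr
    unsigned char* slot = table + index * sizeof(AnyFn);
    AnyFn original = nullptr;
    std::memcpy(&original, slot, sizeof original);
    std::memcpy(slot, &replacement, sizeof replacement);
    return original;
}

int mock_unknown(MockFont*) { return 0; }
int mock_draw(MockFont* font, const char*) { ++font->draw_calls; return 1; }
MockFontVtbl g_mock_vtbl = {mock_unknown, mock_unknown, mock_unknown, mock_draw};

// The hook forwards to the saved original and tags the result.
AnyFn g_original_draw = nullptr;
int hooked_draw(MockFont* font, const char* text) {
    auto original =
        reinterpret_cast<int (*)(MockFont*, const char*)>(g_original_draw);
    return 100 + original(font, text);
}
```

After `g_original_draw = swap_vtable_slot(&font, 3, reinterpret_cast<AnyFn>(hooked_draw));`, every call made through slot 3 of that table reaches `hooked_draw` first. Because the table is shared, this affects all objects using it, not just `font`.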
There's nothing anywhere near portable. Your attempt using
&cls::fn2 can't work, since the results must work in cases
like (pCls->*fn)() even when pCls points to a derived class
which overrides the function. (Pointers to member functions are
complicated beasts, which identify whether the function is
virtual or not, and provide different information depending on
this. And if you're experimenting with MSC, be aware that you
have to specify /vmg for pointers to member functions to work
correctly.)
Even for a given implementation, you need an instance of the
correct type. Given that, if you know the class layout, and
the layout of the virtual function table, you can track it down.
Typically, the pointer to the virtual function table is the
first word in the class, although this is not guaranteed. And
usually, the functions will appear in the order they are
declared. Along with additional information, however, like
pointers to the RTTI, and possibly offset information required
to fix up the this pointer when calling the function (although
many compilers will use a trampoline for this). For 64 bit g++
under Windows (CygWin version):
struct C
{
virtual ~C() {}
virtual void fn1() const { std::cout << "In C::fn1\n"; }
virtual void fn2() const {}
};
void const*
fn1ToAddr( C const* pC )
{
void const* const* vPtr = *reinterpret_cast<void const* const* const*>(pC);
return vPtr[2];
}
fn1ToAddr returns the address of fn1 for the object passed
to it; if the object is of type C, it returns the address of
C::fn1, and if it is of a derived type which overrides fn1,
it returns the address of the overriding function.
Whether this works all of the time or not, I cannot say; I think
g++ uses trampolines in cases of multiple inheritance, for
example (in which case, the returned address would be the
address of the trampoline). And it might not work in the next
major release of g++. (For the version of MSC I have at hand,
replacing the index 2 with 1 seems to work. But again,
I only tried very simple cases. There are absolutely no
guarantees.)
Basically, you would never want to do anything like this in
production code. It can be useful, however, if you're trying to
understand how the compiler works.
EDIT:
Re your edit with the why? Just because you have the address
(maybe), it doesn't mean that you can call the function. You
cannot call a member function without an object, and depending
on any number of things, you may not be able to pass the
function the object. (With MSC, for example, the object will
usually be passed in ECX.)
as mentioned in this wiki page:
Whenever a class defines a virtual function (or method), most
compilers add a hidden member variable to the class which points to a
so-called virtual method table (VMT or Vtable). This VMT is basically
an array of pointers to (virtual) functions.
As far as I know, you don't have access to the vtable; the compiler doesn't even know the number of entries in the table.

How to get "direct" function pointer to a virtual member function?

I am working on an embedded platform which doesn't cope very well with dynamic code (no speculative / OOO execution at all).
On this platform I call a virtual member function on the same object quite often, however the compiler fails to optimize the vtable-lookup away, as it doesn't seem to recognize the lookup is only required for the first invocation.
Therefore I wonder: Is there a manual way to devirtualize a virtual member function of a C++ class in order to get a function-pointer which points directly to the resolved address?
I had a look at C++ function pointers, but since they seem to require a specified type, I guess this won't work out.
Thank you in advance
There's no general standard-C++-only way to find the address of a virtual function, given only a reference to a base class object. Furthermore there's no reasonable type for that, because the this pointer need not be passed as an ordinary argument following a general convention (e.g. it can be passed in a register, with the other args on the stack).
If you do not need portability, however, you can always do whatever works for your given compiler. E.g., with Microsoft's COM (I know, that's not your platform) there is a known memory layout with vtable pointers, so as to access the functionality from C.
If you do need portability then I suggest to design in the optimization. For example, instead of
class Foo_base
{
public:
virtual void bar() = 0;
};
do like
class Foo_base
{
public:
typedef void (*Bar_func)(Foo_base&);
virtual Bar_func bar_func() const = 0;
void bar() { bar_func()( *this ); }
};
supporting the same public interface as before, but now exposing the innards, so to speak, thus allowing manual optimization of repeated calls to bar.
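To make the payoff concrete, here is a small sketch of how a derived class and a hot loop might use such an interface. `Counter`, `bar_impl`, and `call_many` are hypothetical names added for the example, and a virtual destructor is included for safety; the point is only that the virtual lookup happens once, outside the loop.

```cpp
#include <cassert>

class Foo_base {
public:
    typedef void (*Bar_func)(Foo_base&);
    virtual ~Foo_base() {}
    virtual Bar_func bar_func() const = 0;  // the only virtual lookup
    void bar() { bar_func()(*this); }       // same public interface as before
};

class Counter : public Foo_base {
public:
    int count = 0;
    Bar_func bar_func() const override { return &Counter::bar_impl; }

private:
    // A static member function has an ordinary function-pointer type.
    static void bar_impl(Foo_base& self) {
        ++static_cast<Counter&>(self).count;
    }
};

// Resolve the target once, then call it directly n times.
void call_many(Foo_base& foo, int n) {
    assert(n >= 0);
    const Foo_base::Bar_func bar = foo.bar_func();
    for (int i = 0; i < n; ++i) bar(foo);
}
```

The cached pointer is only valid for the object it was obtained from (or another object of the same dynamic type), so it should not be reused across objects of unknown type.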
Regarding gcc, I have seen the following while debugging the compiled assembly code.
A generic method pointer holds two pieces of data:
a) a "pointer" to the method
b) an offset to be added to the class instance's starting address (the offset is used when multiple inheritance is involved, for methods of the second and further parent classes, whose data starts at a different point within the object).
The "pointer" to the method is interpreted as follows:
1) If the "pointer" is even, it is interpreted as a normal (non-virtual) function pointer.
2) If the "pointer" is odd, then 1 should be subtracted and the remaining value should be 0 or 4 or 8 or 12 (supposing a pointer size of 4 bytes).
This encoding obviously supposes that all normal methods start at even addresses (so the compiler must align them at even addresses).
That value is the offset into the vtable from which to fetch the address of the "real" non-virtual method pointer.
So the correct idea in order to devirtualize the call is to convert a virtual method pointer to a non-virtual method pointer and use it afterwards in order to apply it to the "subject", that is, our class instance.
The code below does what is described.
#include <stdio.h>
#include <string.h>
#include <typeinfo>
#include <typeindex>
#include <cstdint>
struct Animal{
int weight=0x11111111;
virtual int mm(){printf("Animal1 mm\n");return 0x77;};
virtual int nn(){printf("Animal1 nn\n");return 0x99;};
};
struct Tiger:Animal{
int weight=0x22222222,height=0x33333333;
virtual int mm(){printf("Tigerxx\n");return 0xCC;}
virtual int nn(){printf("Tigerxx\n");return 0x99;};
};
typedef int (Animal::*methodPointerT)();
typedef struct {
void** functionPtr;
size_t offset;
} MP;
void devirtualize(methodPointerT& mp0,const Animal& a){
MP& t=*(MP*)&mp0;
if((intptr_t)t.functionPtr & 1){
size_t index=(t.functionPtr-(void**)1); // there is obviously a more
void** vTable=(void**)(*(void**)&a); // efficient way. Just for clearness !
t.functionPtr=(void**)vTable[index];
}
};
int main()
{
int (Animal::*mp1)()=&Animal::nn;
MP& mp1MP=*(MP*)&mp1;
Animal x;Tiger y;
(x.*mp1)();(y.*mp1)();
devirtualize(mp1,x);
(x.*mp1)();(y.*mp1)();
}
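The encoding described in this answer can also be inspected without modifying anything. The following sketch decodes a member function pointer into its parts; `DecodedMemberPtr`, `decode_member_ptr`, and `Shape` are illustrative names, and the layout is assumed to be the one GCC and Clang use under the Itanium C++ ABI (the ARM variant, which keeps the flag in the adjustment field instead, is handled by the preprocessor branch).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct DecodedMemberPtr {
    bool is_virtual;
    std::size_t vtable_index;    // meaningful only when is_virtual is true
    std::ptrdiff_t this_adjust;  // bytes to add to the object address
};

template <class T, class R, class... Args>
DecodedMemberPtr decode_member_ptr(R (T::*pmf)(Args...)) {
    struct Raw { std::uintptr_t ptr; std::ptrdiff_t adj; } raw;
    static_assert(sizeof(pmf) == sizeof(Raw), "unexpected member pointer layout");
    assert(pmf != nullptr);  // a null member pointer has no target to decode
    std::memcpy(&raw, &pmf, sizeof raw);
    DecodedMemberPtr out;
#if defined(__arm__) || defined(__aarch64__)
    // ARM variant: low bit of adj is the flag, ptr is the plain byte offset.
    out.is_virtual = (raw.adj & 1) != 0;
    out.this_adjust = raw.adj >> 1;
    const std::uintptr_t byte_offset = raw.ptr;
#else
    // Generic variant: odd ptr means virtual, and ptr - 1 is the byte offset.
    out.is_virtual = (raw.ptr & 1) != 0;
    out.this_adjust = raw.adj;
    const std::uintptr_t byte_offset = raw.ptr - 1;
#endif
    out.vtable_index = out.is_virtual ? byte_offset / sizeof(void*) : 0;
    return out;
}

// Hypothetical class used to try the decoder.
struct Shape {
    virtual int area() { return 1; }
    virtual int sides() { return 2; }
    int id() { return 3; }  // non-virtual
};
```

With this class, `decode_member_ptr(&Shape::sides)` reports a virtual function in the second slot, while `decode_member_ptr(&Shape::id)` reports a non-virtual one.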
Yes, this is possible in a way that works at least with MSVC, GCC and Clang.
I was also looking for how to do this, and here is a blog post I found that explains it in detail: https://medium.com/#calebleak/fast-virtual-functions-hacking-the-vtable-for-fun-and-profit-25c36409c5e0
Taking the code from there, in short, this is what you need to do. This function works for all objects:
template <typename T>
void** GetVTable(T* obj) {
return *((void***)obj);
}
And then to get a direct function pointer to the first virtual function of the class, you do this:
typedef void(VoidMemberFn)(void*);
VoidMemberFn* fn = (VoidMemberFn*)GetVTable<BaseType>(my_obj_ptr)[0];
// ... sometime later
fn(my_obj_ptr);
So it's quite easy actually.

Call a member function with bare function pointer

What's the best way to call a member function if you have an object and a bare function pointer pointing to the member? Essentially I want to call the function pointer with thiscall calling convention.
Background: I'm looking up symbols in a shared library dynamically, obtaining a factory function pointer and a pointer to a certain member function I want to call. The member function itself is not virtual. I have no control over the shared library, I just have the binary.
Example:
typedef void * (*GenericFptr)();
GenericFptr lookup(const char *);
class CFoo;
GenericFptr factoryfn(lookup("CFoo factory function"));
CFoo *foo = reinterpret_cast<CFoo *>(factoryfn());
GenericFptr memberfn(lookup("CFoo member function"));
// now invoke memberfn on foo
Currently I'm using a union to convert the function pointer to a pointer to member function. It's ugly and creates dependencies on compiler implementation details:
class CFoo {
public:
void *dummy() { return 0; }
};
typedef void * (CFoo::*FooMemberPtr)();
union {
struct {
// compiler-specific layout for pointer-to-member
void *x, *y;
GenericFptr ptr;
} fnptr;
FooMemberPtr memberfn;
} f;
f.memberfn = &CFoo::dummy; // init pointer-to-member
f.fnptr.ptr = memberfn; // rewrite pointer
void *result = (foo->*f.memberfn)();
A pointer to member function can't be stored in a pointer to function because it needs more information (for instance in case of multiple inheritance an offset may have to be applied to this before the call). So you can't do without knowledge of implementation details.
If you want to be portable, the easiest is for your library to provide wrapper functions doing the member call.
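A sketch of what such wrappers might look like is given below, assuming the library author is willing to add them. The class body and the function names `CFoo_create`, `CFoo_poke`, `CFoo_pokes`, and `CFoo_destroy` are invented here; the idea is simply that each exported function takes the object as an explicit argument and performs the member call itself, so the client never needs a pointer to member.

```cpp
#include <cassert>

// Library side (hypothetical): the class stays hidden behind C functions.
class CFoo {
public:
    int pokes = 0;
    void* poke() { ++pokes; return this; }
};

extern "C" void* CFoo_create() { return new CFoo(); }

extern "C" void* CFoo_poke(void* self) {
    assert(self != nullptr);
    return static_cast<CFoo*>(self)->poke();  // the actual member call
}

extern "C" int CFoo_pokes(void* self) {
    assert(self != nullptr);
    return static_cast<CFoo*>(self)->pokes;
}

extern "C" void CFoo_destroy(void* self) { delete static_cast<CFoo*>(self); }

// Client side: plain function pointers, as a symbol lookup would return them.
typedef void* (*CreateFn)();
typedef void* (*PokeFn)(void*);
```

A client would look up `CFoo_create` and `CFoo_poke` by name, store them in `CreateFn` and `PokeFn` variables, and call them like any C function. Because the names are declared extern "C", they are not mangled, which also makes the lookup strings predictable.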
Unfortunately a member function pointer has more information than a standard function pointer, and when you get the standard function pointer, converting it to a member function pointer would effectively be trying to generate extra data out of thin air.
I don't think there's any portable way to do what you're attempting, although if the union appears to work you could probably get away with that. Again, you would need to know the representation and calling convention for these methods for each compiler you wish to use to build the code.
If you know the member function's name, why can't you just do foo->dummy() for example? Otherwise either the lookup function needs to provide a full member function pointer or the library would have to provide a C wrapper interface with normal functions to which a this pointer can be passed.
The following two links provide insight and possibly a solution. Note that calling a member function pointer with a this argument usually doesn't work, since you must take into account virtual methods, and multiple and virtual inheritance.
http://www.codeproject.com/KB/cpp/FastDelegate.aspx
http://www.codeproject.com/KB/cpp/ImpossiblyFastCppDelegate.aspx
According to the answer below it is doable in GCC, but not portable:
https://stackoverflow.com/a/5067992/705086