Member function pointer to integer? - c++

Is it possible to get the virtual address as an integer of a member function pointer?
I have tried.
void (AClass::*Test)();
Test = &AClass::TestFunc;
int num = *(int*)&Test;
But all that does is get me the virtual address of a jmp to the function. I need the actual function's virtual address.

I know this is old, but since there's no meaningful on-the-subject answer, here I go.
Some things need to be taken into account first.
On 32-bit x86 with Visual C++, the default calling convention for member functions is __thiscall. This convention is almost identical to __stdcall; the only significant difference is that, before the call is made, ECX is set to the this pointer of the object whose method is being called.
To illustrate this and answer your question at the same time, let's say that the class AClass has a member function declared like this: int AClass::myFunction(int a, int b) and that we have an instance of AClass called aClassObject.
Here's a rather hackish way to do what you initially asked for AND 'simulate' an AClass::myFunction call on the aClassObject once you obtain the raw pointer:
// declare a delegate, __stdcall convention, as stated above
typedef int (__stdcall *myFunctionDelegate)(int a, int b);
// here's the 'hackish' solution to your question
char myFunctionPtrString[16]; // room for a 32-bit value in decimal, a sign, and the terminating NUL
sprintf(myFunctionPtrString, "%d", &AClass::myFunction);
int myFunctionPtr = atoi(myFunctionPtrString);
// now let's call the method using our pointer and the aClassObject instance
myFunctionDelegate myFunction = (myFunctionDelegate)myFunctionPtr;
// before we make the call, we must put a pointer to aClassObject
// in ECX, to finally meet the __thiscall calling convention
int aClassObjectPtr = (int)&aClassObject;
__asm{
mov ecx, aClassObjectPtr
}
// make the call!
myFunction(2, 3);
And of course, the instance can be any instance of type AClass.

No, member function pointers can have a variety of sizes (from 4 to 16 bytes or more depending on platform, see the table in the article) and cannot reliably fit inside the space of an integer. This is because virtual functions and inheritance can cause the compiler to store several pieces of information in order to call the correct function, so in some cases there is not a simple address.
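To see what your own compiler does, here is a minimal sketch, assuming a class AClass with a non-virtual TestFunc as in the question. Instead of casting to int, it copies the pointer's bytes into an array of exactly the right size. The names TestFuncPtr, kMemberPtrSize and member_pointer_bytes are made up for illustration, and what the bytes mean is entirely implementation-specific.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

class AClass {
public:
    void TestFunc() {}
};

using TestFuncPtr = void (AClass::*)();

// Size of the member function pointer on this compiler. It is often
// larger than sizeof(void*), which is why a cast to int can lose data.
constexpr std::size_t kMemberPtrSize = sizeof(TestFuncPtr);

// Copy the raw bytes of the member function pointer without any
// integer cast. Interpreting the bytes is compiler-specific.
std::array<unsigned char, kMemberPtrSize> member_pointer_bytes() {
    TestFuncPtr p = &AClass::TestFunc;
    std::array<unsigned char, kMemberPtrSize> bytes{};
    std::memcpy(bytes.data(), &p, sizeof p);
    return bytes;
}
```

Comparing kMemberPtrSize with sizeof(int) and sizeof(void*) on your platform shows whether the cast in the question could even hold the whole value.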

While I can't say definitively whether there is a portable way to do this, I generally recommend making a static wrapper function to provide this type of external access to a class method. Otherwise even if you succeeded you would be creating very tight coupling of the application to that class implementation.
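A rough sketch of the static wrapper idea, assuming the AClass/TestFunc names from the question. TestFuncThunk, PlainFn, wrapper_address and call_through_address are hypothetical names. The conversion to std::uintptr_t assumes function pointers fit in that type, which holds on mainstream platforms but is not guaranteed everywhere.

```cpp
#include <cstdint>

class AClass {
public:
    int value = 0;
    void TestFunc() { ++value; }

    // Static wrapper: an ordinary function with an ordinary address.
    // It forwards to the member function on the given object.
    static void TestFuncThunk(AClass* self) { self->TestFunc(); }
};

using PlainFn = void (*)(AClass*);

// Address of the wrapper as an integer (assumption: the platform's
// function pointers fit in std::uintptr_t).
std::uintptr_t wrapper_address() {
    PlainFn fn = &AClass::TestFuncThunk;
    return reinterpret_cast<std::uintptr_t>(fn);
}

// Convert the integer back to a function pointer and call it.
int call_through_address(AClass& obj) {
    PlainFn fn = reinterpret_cast<PlainFn>(wrapper_address());
    fn(&obj);
    return obj.value;
}
```

The integer here is the address of the wrapper, not of TestFunc itself, but external code only ever needs the wrapper.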

If this is what I suspect it is, just switch off incremental linking. With incremental linking enabled, VC++ routes functions through a table of jmp instructions, so the address you see is that of the table entry. In the meantime, you're getting the right answer.
My other suspicion is that TestFunc may be virtual. For virtual functions whose address is taken, VC++ fakes up a little thunk that does the vtable lookup, and gives a pointer to that thunk as the address. (This ensures the correct derived function is found when the actual object is of a more-derived type. There are other ways of doing this, but this allows such pointers to be a single pointer and simplifies the calling code, at the cost of a double jump when they're called through.) Switch on assembly language output for your program, and look through the result; it should be clear enough what's going on.
Here, too, what you're getting is the correct answer, and there's no way to find out the address of the "actual" function. Indeed, a virtual function doesn't name one single actual function, it names whichever derived function is appropriate for the object in question. (If you don't like this behaviour, make the function non-virtual.)
If you REALLY need the genuine address of the actual function, you've got two options. The annoying one is to write some code to scan the thunk to find out the vtable index of the function. Then look in the vtable of the object in question to get the function.
(But note that taking the address of a non-virtual function will give you the address of the actual function to call, and not a thunk -- so you'd have to cater for this possibility as well. The types of pointers to virtual functions and pointers to non-virtual functions are the same.)
The easier one is to make a non-virtual function that contains the code for each virtual function, then have each virtual function call the non-virtual function. That gives you the same behaviour as before. And if you want to find out where the code is, take the address of the non-virtual function.
(In either case, this would be difficult to make work well, it would be annoying, and it would be rather VC++-specific -- but you could probably make it happen if you're willing to put in the effort.)
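The easier option might look something like this. It is only a sketch: Animal, Dog, Speak and SpeakImpl are invented names, and the return values exist only so the two paths can be told apart.

```cpp
class Animal {
public:
    virtual ~Animal() = default;
    // The virtual function only forwards to the non-virtual body.
    virtual int Speak() { return SpeakImpl(); }
    int SpeakImpl() { return 1; }
};

class Dog : public Animal {
public:
    int Speak() override { return SpeakImpl(); }
    // Non-virtual body; its address is the address of the real code.
    int SpeakImpl() { return 2; }
};

// Calling through a pointer to the virtual function still dispatches
// on the dynamic type of the object.
int call_virtual(Animal& a) {
    int (Animal::*pv)() = &Animal::Speak;
    return (a.*pv)();
}

// A pointer to the non-virtual body names exactly one function.
int call_dog_impl(Dog& d) {
    int (Dog::*pi)() = &Dog::SpeakImpl;
    return (d.*pi)();
}
```

Behaviour through Speak is unchanged, while &Dog::SpeakImpl refers to one specific function rather than a lookup thunk.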

Related

Eliminate pointer resolution on call of function pointer

I have a really strange question (I know, these types of optimizations are whacky and 99% of the time useless, but this is just an interesting case):
Suppose we have a struct with one method and one function pointer that is assigned at run time. If the two functions are identical, the call through the function pointer requires an additional pointer resolution and is therefore slightly slower (whereas a method call is just a static offset).
Can we somehow eliminate this pointer resolution? (given that our dynamically assigned function pointer will never change afterwards)
The only solution I've thought of was to declare this "function pointer" as a static array of bytes, copy the code there, set memory to be executable and call it. That way the call would also be identical to a "method" call.
Are there any other ways to achieve this strange run-time "linking"? (if you can call it this way :))

this is not a real pointer?

I am reading about virtual tables. When it comes to the pointer __vptr,
the author states:
Unlike the *this pointer, which is actually a function parameter used by the compiler to resolve self-references, *__vptr is a real pointer. Consequently, it makes each class object allocated bigger by the size of one pointer.
What does it mean to say that this is actually a function parameter? And is this not a real pointer?
Both pointers are real in the sense that they store an address of something else in memory. By "real" the author means "stored within the class", as opposed to the this pointer, which is passed to member functions without being stored in the object itself. Essentially, __vptr is part of the object, while the this pointer is not.
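You can observe the difference with sizeof. A small sketch, assuming a typical implementation that stores a hidden vtable pointer in each object (the standard does not require this); Plain, WithVirtual and vptr_overhead are illustrative names.

```cpp
#include <cstddef>

// No virtual functions: the object holds only its data member.
// The this pointer takes no space in the object.
struct Plain {
    int x;
    int get() const { return x; }
};

// One or more virtual functions: typical compilers add a hidden
// vtable pointer to every object of this type.
struct WithVirtual {
    int x;
    virtual int get() const { return x; }
    virtual ~WithVirtual() = default;
};

// Extra bytes per object caused by the hidden pointer plus any
// padding; the exact number depends on the platform.
std::size_t vptr_overhead() {
    return sizeof(WithVirtual) - sizeof(Plain);
}
```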
this is always a hidden implicit formal argument. Practically speaking, every non-static member function of a class gets an implicit first argument, which is this.
So in
class Foo {
int x; // a field, i.e. an instance variable
void bar(double x);
};
the Foo::bar function has two arguments, exactly as if it was the C (not C++) function
void Foo__bar(Foo* mythis, double x);
And actually, between name mangling and the compiler's own transformations, the first is turned into a very close equivalent of the second. (I am using mythis instead of this because this is a keyword in C++).
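To make the equivalence concrete, here is a small compilable variant of the Foo example. It is a sketch only: the member is renamed to x_ and bar returns a value so the two forms can be compared, and Foo_bar is a hand-written stand-in for what the compiler generates, not its actual mangled name.

```cpp
class Foo {
public:
    explicit Foo(int x) : x_(x) {}

    // Member function: this is passed implicitly.
    double bar(double y) const { return this->x_ + y; }

    int x_;  // a field, i.e. an instance variable
};

// Roughly what the member function looks like to the machine:
// an ordinary function whose first parameter is the object pointer.
double Foo_bar(const Foo* mythis, double y) {
    return mythis->x_ + y;
}
```

Calling f.bar(1.5) and Foo_bar(&f, 1.5) on the same object f gives the same result.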
In principle, the ABI of your implementation could mandate a different passing convention for this (e.g. use another machine register) and for other explicit arguments. In practice, it often does not. On my Linux system the x86-64 ABI (its figure 3.4 page 21) defines a calling convention that passes this (and first pointer formal argument to C function) in %rdi processor register.
Practically speaking, in C++, most -but not all- member functions are small (defined inside the class) and inlined by the optimizing compiler (and the latest C++11 and C++14 standards have been written with optimizing compilers in mind; see also this). In that case, the question of where is this stored becomes practically meaningless... (because of the inlining).
The pointer to the virtual method table (vtable) is generally an implicit first pointer field (or instance variable) of objects, but things can become more complex, e.g. with virtual multiple inheritance. The vtable data itself (the addresses of virtual functions) is generated by the compiler. See also this answer.
In theory, a C++ implementation could provide the dynamic method dispatching by another mechanism than vtable. In practice, I know no C++ implementation doing that.

How does a pointer to virtual function differ from a pointer to a non-virtual one?

And why is it required to use "&" before function in the next piece of code?
void (Mammal::*pFunc) () const=0;
pFunc=&Mammal::Move;
Move() is a virtual function in the base class and pFunc is a pointer to a virtual function in this class.
So why do we need to use "&"? Is it because of some special property of virtual functions?
Or is it simply syntax?
An ordinary function can be called with just its address, so a pointer to an ordinary function is just an address.
A non-virtual member function can also be called with just its address (plus, of course, the this pointer, passed by whatever mechanism the compiler uses), so a pointer to a non-virtual member function could be just an address.
A virtual function has to be looked up by a compiler-specific mechanism (usually a vtable, at a known offset in the object; the address of the function is found by indexing into the table), and a pointer to virtual function has to contain whatever information is needed to determine what the actual function to call is, depending on the actual type of the object.
But a pointer to member function has to be able to handle both virtual and non-virtual functions, so it will have enough room for the right mechanism for both, and the runtime call will check the stored data to figure out what to do.
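A short sketch of that: one pointer-to-member type holds either kind of function, and the call does the right thing for both. Mammal and Move come from the question; Legs, Dog, MammalFn and invoke are invented for the example, and Move returns an int here only so the result can be checked.

```cpp
class Mammal {
public:
    virtual ~Mammal() = default;
    virtual int Move() const { return 1; }  // virtual
    int Legs() const { return 4; }          // non-virtual
};

class Dog : public Mammal {
public:
    int Move() const override { return 2; }
};

// The same pointer type is used for virtual and non-virtual members.
using MammalFn = int (Mammal::*)() const;

// The call site is identical in both cases; for a virtual function
// the dynamic type of m decides which override runs.
int invoke(const Mammal& m, MammalFn fn) {
    return (m.*fn)();
}
```

Note that &Mammal::Move, with the & and the class name, is the only standard way to form these pointers.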
Some compilers provide an alternative, where if you promise that you will absolutely, positively, never store a pointer to virtual function in the pointer, the compiler will generate a smaller pointer representation, and you'll find that you're in trouble later when you break that promise.
As to why the & is required, it's required. Microsoft's early C++ compilers didn't require the & (and didn't require the class name; if you left it out you'd get a pointer to the member function of the class of the current object); they talked about proposing eliminating the rule, but it didn't get anywhere.
& is the address-of operator. While in C it is possible to simply specify the name of the function without this, in C++ you are supposed to use it with a fully qualified function name.
If I remember correctly, when you take the address of a virtual function, the compiler will actually generate an intermediate function, the address of which will then be used instead. When that intermediate function is called, all it will do is call the virtual function you specified.
In your first line, pFunc isn't a pointer to a virtual function, it's a pointer to a const method in Mammal that takes no arguments and returns void. You are then assigning 0 to that pointer.
The '&' operator is giving you the address of the Mammal::Move method in class scope, suitable for assigning to a member function pointer.
The standard requires the '&' operator, though I have used compilers that didn't require it.

-> operator on null objects

According to C++ Standard, it's perfectly acceptable to do this:
class P
{
void Method() {}
};
...
P* p = NULL;
p->Method();
However, a slight change to this:
class P
{
virtual void Method() {}
};
...
P* p = NULL;
p->Method();
produces an access violation when compiled with Visual Studio 2005.
As far as I understand, this is caused by some quirk in Microsoft's compiler implementation and not by my sheer incompetence for a change, so the questions are:
1) Does this behavior persist in more recent versions of VS?
2) Are there any, I don't know, compiler settings that prevent this access violation?
According to C++ Standard, it's perfectly acceptable to do this
No it is not!
Dereferencing a NULL pointer is Undefined Behavior as per the C++ Standard.[#1]
However, if you do not access any members inside a non-virtual member function, it will most likely work on every implementation. For a non-virtual member function, this only needs to be dereferenced to access members of the object, and since no members are accessed inside the function, nothing goes wrong in practice.
However, just because the observable behavior is okay does not mean the program is correct. It still has undefined behavior, so it is an invalid program nevertheless.
The second version crashes because calling a virtual member function requires dereferencing the this pointer just to find the appropriate function, even if no members are accessed within that function.
A good read:
What's the difference between how virtual and non-virtual member functions are called?
[#1]Reference:
C++03 Standard: §1.9/4
Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. ]
As said by AIs... I'll even explain why: in many C++ implementations the this pointer is simply passed as the first "hidden" parameter of the method. So what you see as
void Method() {}
is really
void Method(P* this) {}
But for virtual methods it's more complex. The runtime needs to access the pointer to find the "real" type of P* to be able to call the "right" virtual implementation of the method. So it's something like
p->virtualTable->Method(p);
so p is always used.
First of all, neither one will even compile, because you've defined Method as private.
Assuming you make Method public, you end up with undefined behavior in both cases. Based on the typical implementation, most compilers will allow the first to "work" (for a rather loose definition of work) while the second will essentially always fail.
This is because a non-virtual member function is basically a normal function that receives an extra parameter. Inside that function, the keyword this refers to that extra parameter, which is a pointer to the class instance for which the function was invoked. If you invoke the member function via a null pointer, it mostly means that inside that function this will be a null pointer. As long as nothing in the function attempts to dereference this, chances are pretty good that you won't see any noticeable side effects.
A virtual function, however, is basically a function called via a pointer. In a typical implementation, any class that has one or more virtual functions (whether defined directly in that class, or inherited from a base class) will have a vtable. Each instance of that class (i.e., each object) will contain a pointer to the vtable for its class. When you try to call a virtual function via a pointer, the compiler will generate code that:
1) Dereferences that pointer.
2) Gets the vtable pointer from the proper offset in that object.
3) Dereferences the vtable pointer to get the class's vtable.
4) Looks at the proper offset in the vtable to get a pointer to the function to invoke.
5) Invokes that function.
Given a null pointer, step one of that process is going to break.
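The steps above can be imitated by hand with plain structs and function pointers. This is only a model of what a typical compiler generates, not any compiler's actual layout; Object, VTable, call_method and the other names are made up.

```cpp
struct Object;

// Hand-written stand-in for a compiler-generated vtable.
struct VTable {
    int (*method)(const Object* self);
};

struct Object {
    const VTable* vptr;  // the hidden pointer a compiler would add
    int value;
};

int base_method(const Object* self) { return self->value; }
int derived_method(const Object* self) { return self->value * 10; }

// One table per "class", shared by all of its objects.
const VTable kBaseVTable{&base_method};
const VTable kDerivedVTable{&derived_method};

// Model of p->Method() for a virtual Method.
int call_method(const Object* p) {
    // Dereference p to read the vtable pointer. With p == nullptr
    // this read is what fails, before any function is even chosen.
    const VTable* table = p->vptr;
    // Look up the function pointer at its slot in the table.
    int (*fn)(const Object*) = table->method;
    // Invoke it, passing the object as the hidden this argument.
    return fn(p);
}
```

A non-virtual call would skip the first two reads entirely and call base_method(p) directly, which is why it tends to survive a null p as long as the body never touches self.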
I'd note for the record that this applies to virtually all C++ compilers. VC++ is far from unique in this regard. Quite the contrary -- while it's theoretically possible for a compiler to implement virtual functions (for one example) differently than this, the reality is that every compiler of which I'm aware works essentially identically for the kind of code you posted. Virtually all C++ compilers will show similar behavior given the same code -- major differences in implementation are mostly a theoretical possibility, not one you're at all likely to encounter in practice.

How does a C++ object access its member functions?

How does a C++ object know where its member function definitions are present? I am quite confused as the Object itself does not contain the function pointers.
sizeof on the Object proves this.
So how is the object to function mapping done by the Runtime environment? Where is a class's member function-pointer table maintained?
If you're calling non-virtual functions, there's no need for a function-pointer table; the compiler can resolve the function addresses at compile-time. So:
A a;
a.func();
translates to something along the lines of:
A a;
A_func(&a);
Calling a virtual function through a base-class pointer typically uses a vtable. So:
A *p_a = new B();
p_a->func();
translates to something along the lines of:
A *p_a = new B();
p_a->p_vtbl->func(p_a);
where p_vtbl is a compiler-implemented pointer to the vtable specific to the actual class of *p_a.
There are generally two ways that an object and its member functions are associated:
For a non-virtual function, the compiler determines the appropriate function at compile time. Non-static member functions are usually passed a hidden parameter that contains the this pointer, which takes care of the association of the object and the class member function.
For virtual functions, most compilers tend to use a lookup table that is usually referenced via the object's this pointer or a similar mechanism. This table, normally called the vtable, contains the function pointer for the virtual functions only.
As C++ is not a dynamic language, the compiler can do most of the object/function/symbol resolution at compile time with the exception of some virtual functions. In some cases, it's even possible for the compiler to determine exactly which instance of a virtual function gets called and skip the resolution via the vtable.
Member functions are not part of the object - they are defined statically, in one place, just like any other function. There is no magic look-up needed.
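A quick way to convince yourself, as a sketch with invented type names: adding non-virtual member functions does not change the size of the object on typical implementations.

```cpp
// Two ints and nothing else.
struct DataOnly {
    int a;
    int b;
};

// The same two ints plus several non-virtual member functions.
// The functions live in the program's code, not in each object.
struct DataWithMethods {
    int a;
    int b;
    int sum() const { return a + b; }
    int product() const { return a * b; }
    void reset() { a = 0; b = 0; }
};

// True on typical implementations: the member functions add no
// per-object storage.
bool same_size() {
    return sizeof(DataOnly) == sizeof(DataWithMethods);
}
```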
Virtual functions are different, but I don't think your question is about that...
For non-virtual functions, the set of functions is the same for all instances and known at compile time, so nothing about them is stored in each instance; the compiler simply emits direct calls.
For virtual functions, resolution is done at run time and the object will contain a pointer to a function table for them. Try that and look at your object again.