How to get every virtual function index just as the compiler does? - c++

Is there some plugin or tool which can read a .h file (or simply modify Intellisense itself) and spit out every function and its virtual function table index? There's a pattern, which I have yet to figure out, having to do with polymorphism, and it gets 5x harder when you have 5 or more classes deriving from each other. No matter what, though, the MSVC++ compiler always emits the correct virtual function table index when it compiles a virtual function call from C++ to assembly. There has to be a better way to get that index than loading the binary, setting a breakpoint, reading the offset, and rewriting the code, right?
Thanks!

Use the hidden Microsoft C/C++ compiler option "/d1 reportAllClassLayout". This will print out the memory layout and vtables of all your classes.

You could try writing a hack that determines it for you. This is implementation-defined, but you can usually find a pointer to the virtual function table at the beginning of the object's memory. If you follow that pointer to the table, you'll find a list of function pointers in memory (but you won't know how many). By searching that list for the functions you know about, you can identify their indices.

In MSVC you can't walk the vtable at runtime and compare its entries for equality with a given member function pointer, because the two are not the same: one is the real pointer, the other is a pointer to code that indirects to the real one.
However, with this compiler you can do it at runtime with another hack I discovered.
Create a class (named IndexFinder, for example) in which you declare as many instance methods as the maximum number of virtual methods you can have in a class. Each of these methods must return a unique integer value, from 0 up to your maximum.
Create a fake virtual method table and store your method pointers in it so that the integer each one returns matches the index at which it is stored (the method which returns 0 will be the first in your fake vtable).
When you want to find the index of any virtual method, you have to do a dirty cast of the member function pointer to an IndexFinder method pointer.
The principle is simple: for virtual methods the compiler generates code which indirects to the real method through the vtable, using the right index. As you have replaced the compiler-generated vtable with a fake one, the call jumps into yours and not into the expected one. As each of your methods returns the index at which it is stored, you just have to read the return value and you have your index.
Here is some code which is more explicit (I repeat: it's a compiler-dependent hack; those who don't like that, don't read it ^^). But I tried it, and it works perfectly, as it's just a redirection hack (I'm looking for a trick with GCC, but I haven't found one yet).
It's possible that it depends on the calling convention; I haven't tried it in all cases yet.
One advantage of this trick is that you don't need to construct an instance of your class to find the index of one of its virtual methods.
// In the header .h
class IndexFinder
{
    typedef int (IndexFinder::*method_pointer)();
public:
    // Static, so that no IndexFinder instance is needed.
    template<typename MethodPtr>
    static int getIndexOf(MethodPtr ptr) {
        // Treat the address of fake_vtable_ptr as an object whose first word is the
        // vtable pointer, reinterpret ptr as an IndexFinder method pointer, and call it.
        return (reinterpret_cast<IndexFinder*>(&fake_vtable_ptr)->**((IndexFinder::method_pointer*)(&ptr)))();
    }
protected:
    int method0() { return 0; }
    int method1() { return 1; }
    int method2() { return 2; }
    int method3() { return 3; }
protected:
    typedef method_pointer fake_vtable_t [4];
    static fake_vtable_t fake_vtable;
    static void* fake_vtable_ptr;
};
// In the cpp file
IndexFinder::fake_vtable_t IndexFinder::fake_vtable = {
    &IndexFinder::method0 ,
    &IndexFinder::method1 ,
    &IndexFinder::method2 ,
    &IndexFinder::method3
};
void* IndexFinder::fake_vtable_ptr = &IndexFinder::fake_vtable;
// to use it :
int index = IndexFinder::getIndexOf(&YourClass::yourVirtualMethod);

You can use the Developer Command Prompt:
cl /d1 reportSingleClassLayoutXXX filename
where XXX is the class name.
Example:
class CBase
{
public:
CBase() {
};
virtual void Walk() { cout << "CBase:Walk" << endl; }
virtual void Jump() { cout << "CBase:Jump" << endl; }
void Run(int speed) { cout << "CBase:Run:" << "Speed=" << speed << endl; }
};
class CDerivedA : public CBase
{
public:
CDerivedA() {
};
void Walk() { cout << "CDerivedA:Walk" << endl; }
void Jump() { cout << "CDerivedA:Jump" << endl; }
void Run(int speed) { cout << "CDerivedA:Run" << "Speed=" << speed << endl; }
};
Layout
D:\nicolas>cl /d1 reportSingleClassLayoutCDerivedA nicolas.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.10.25019 for x86
Copyright (C) Microsoft Corporation. All rights reserved.
class CDerivedA size(4):
+---
0 | +--- (base class CBase)
0 | | {vfptr}
| +---
+---
CDerivedA::$vftable#:
| &CDerivedA_meta
| 0
0 | &CDerivedA::Walk
1 | &CDerivedA::Jump
CDerivedA::Walk this adjustor: 0
CDerivedA::Jump this adjustor: 0

Related

Why does adding virtual method increase class size in C++?

#include <iostream>
using namespace std;
class a {
virtual int foo() {
return 0;
}
};
class b {
int foo() {
return 0;
}
};
int main() {
cout << sizeof(b) << endl;
cout << sizeof(a) << endl;
}
Output (with g++ 4.9, -O3):
1
8
I assume the increase in size is due to adding a vpointer. But I thought the compiler would see that a is not actually deriving or being derived from anything, hence there is no need to add the vpointer?
The vpointer is needed because the compiler cannot guarantee that an external (e.g. shared) library does not use a type derived from a. Whether a derived class exists is not known while a is being compiled; it is only resolved at runtime.
Run-time type information. Any polymorphic class creates extra metadata in the program to make things like typeid and dynamic_cast work. This metadata exists once per class and is in addition to the virtual function table.
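The two answers above can be checked with a small sketch. The class names here are hypothetical and chosen only for this illustration. The size comparison reflects what mainstream compilers (GCC, Clang, MSVC) do in practice; the C++ standard itself does not promise any particular size.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <typeinfo>

// Hypothetical classes, mirroring a and b from the question.
struct Plain {
    int foo() { return 0; }
};

struct WithVirtual {
    virtual int foo() { return 0; }
    virtual ~WithVirtual() {}
};

// A derived class the compiler could not have known about when it
// laid out WithVirtual.
struct Derived : WithVirtual {
    int foo() override { return 1; }
};

// The call is resolved through the object's vpointer at run time.
inline int call_foo(WithVirtual& w) { return w.foo(); }

// typeid on a reference to a polymorphic type reports the dynamic type.
inline bool is_really_derived(const WithVirtual& w) {
    return typeid(w) == typeid(Derived);
}
```

With these definitions, std::is_polymorphic<WithVirtual>::value is true, std::is_polymorphic<Plain>::value is false, and call_foo on a Derived object returns 1 even though the function only sees a WithVirtual&.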

vtables and this pointer

I was trying to learn some more about the inner workings of vtables and vpointers, so I decided to try to access the vtable directly using some tricks. I created two classes, Base and Derv, each having two virtual functions (Derv overriding those of Base).
class Base
{
int x;
int y;
public:
Base(int x_, int y_) : x(x_), y(y_) {}
virtual void foo() { cout << "Base::foo(): x = " << x << '\n'; }
virtual void bar() { cout << "Base::bar(): y = " << y << '\n'; }
};
class Derv: public Base
{
int x;
int y;
public:
Derv(int x_, int y_) : Base(x_, y_), x(x_), y(y_) {}
virtual void foo() { cout << "Derived::foo(): x = " << x << '\n'; }
virtual void bar() { cout << "Derived::bar(): y = " << y << '\n'; }
};
Now, the compiler adds a vtable pointer to each object, occupying the first 4 bytes (32 bits) in memory. I accessed this pointer by casting the address of an object to a size_t*, since the pointer points to another pointer of size sizeof(size_t). The virtual functions can now be accessed by indexing the vpointer and casting the result to a function pointer of the appropriate type. I encapsulated these steps in a function:
template <typename T>
void call(T *ptr, size_t num)
{
typedef void (*FunPtr)();
size_t *vptr = *reinterpret_cast<size_t**>(ptr);
FunPtr fun = reinterpret_cast<FunPtr>(vptr[num]);
//setThisPtr(ptr); added later, see below!
fun();
}
When one of the member functions is called this way, e.g. call(new Base(1, 2), 0) to call Base::foo(), it is hard to predict what will happen, since it is called without a this-pointer. I solved this by adding a little templatized function, knowing that g++ stores the this-pointer in the ecx register (this, however, forces me to compile with the -m32 compiler flag):
template <typename T>
void setThisPtr(T *ptr)
{
asm ("mov %0, %%ecx;" :: "r" (ptr) );
}
Uncommenting the setThisPtr(ptr) line in the snippet above now makes it a working program:
int main()
{
Base* base = new Base(1, 2);
Base* derv = new Derv(3, 4);
call(base, 0); // "Base::foo(): x = 1"
call(base, 1); // "Base::bar(): y = 2"
call(derv, 0); // "Derived::foo(): x = 3"
call(derv, 1); // "Derived::bar(): y = 4"
}
I decided to share this, since in the process of writing this little program I gained more insight in how vtables work and it might help others in understanding this material a little better.
However I still have some questions:
1. Which register is used (gcc 4.x) to store the this-pointer when compiling a 64-bit binary? I tried all 64-bit registers as documented here: http://developers.sun.com/solaris/articles/asmregs.html
2. When/how is the this-pointer set? I suspect that the compiler sets the this pointer on each function call through an object in a similar way as to how I just did it. Is this the way polymorphism actually works? (By setting the this-pointer first, then calling the virtual function from the vtable?).
On Linux x86_64, and I believe other UNIX-like OSes, function calls follow the System V ABI (AMD64), which itself follows the Itanium (IA-64) C++ ABI for C++. Depending on the method's return type, the this pointer is passed implicitly as either the first or the second argument (when the return value has a non-trivial copy constructor or destructor, it must live as a temporary on the stack, and the first argument is implicitly a pointer to that space). Otherwise, virtual method calls are identical to function calls in C (integer/pointer arguments in %rdi, %rsi, %rdx, %rcx, %r8, %r9, overflowing to the stack; integer/pointer return in %rax; floats in %xmm0-%xmm7; etc.), so in the common case this arrives in %rdi. Virtual method dispatch works by looking up a pointer in the vtable and then calling it just like a non-virtual method.
I'm less familiar with Windows x64 conventions, but I believe it to be similar in that C++ method calls follow the exact same structure as C function calls (which use different registers than on Linux), just with an implicit this argument first.
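To make the "look up a pointer in the vtable, then call it with this as an ordinary argument" description concrete, here is a hand-rolled emulation in portable C++. It is only a sketch of the idea: all names (Shape, ShapeVTable, call_area, and so on) are hypothetical, and a real compiler's table layout and calling convention are ABI details that this code does not reproduce.

```cpp
#include <cassert>
#include <string>

struct Shape;

// One table per "class": plain function pointers, one per virtual slot.
struct ShapeVTable {
    std::string (*name)(const Shape* self);  // slot 0
    int (*area)(const Shape* self);          // slot 1
};

struct Shape {
    const ShapeVTable* vptr;  // plays the role of the hidden vtable pointer
    int w;
    int h;
};

// "Member functions" are ordinary functions whose first argument is the object.
std::string rect_name(const Shape*) { return "rect"; }
int rect_area(const Shape* self) { return self->w * self->h; }
std::string tri_name(const Shape*) { return "triangle"; }
int tri_area(const Shape* self) { return self->w * self->h / 2; }

const ShapeVTable rect_vtable = { rect_name, rect_area };
const ShapeVTable tri_vtable = { tri_name, tri_area };

// A "virtual call": load the table from the object, pick the slot,
// and pass the object itself as the first argument.
int call_area(const Shape* s) { return s->vptr->area(s); }
std::string call_name(const Shape* s) { return s->vptr->name(s); }
```

A Shape initialized as { &rect_vtable, 4, 3 } yields 12 from call_area, while the same dimensions with &tri_vtable yield 6: the caller's code is identical, and only the table reached through the object differs.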

How does the Visual C++ compiler pass the this ptr to the called function?

I'm learning C++ using Eckel's "Thinking in C++". It states the following:
If a class contains virtual methods, a virtual function table is created for that class etc. The workings of the function table are explained roughly. (I know a vtable is not mandatory, but Visual C++ creates one.)
The calling object is passed to the called function as an argument. (This might not be true for Visual C++ (or any compiler).) I'm trying to find out how VC++ passes the calling object to the function.
To test both points in Visual C++, I've created the following class (using Visual Studio 2010, WinXP Home 32bit):
ByteExaminer.h:
#pragma once
class ByteExaminer
{
public:
short b[2];
ByteExaminer(void);
virtual void f() const;
virtual void g() const;
void bruteFG();
};
ByteExaminer.cpp:
#include "StdAfx.h"
#include "ByteExaminer.h"
using namespace std;
ByteExaminer::ByteExaminer(void)
{
b[0] = 25;
b[1] = 26;
}
void ByteExaminer::f(void) const
{
cout << "virtual f(); b[0]: " << hex << b[0] << endl;
}
void ByteExaminer::g(void) const
{
cout << "virtual g(); b[1]: " << hex << b[1] << endl;
}
void ByteExaminer::bruteFG(void)
{
int *mem = reinterpret_cast<int*>(this);
void (*fg[])(ByteExaminer*) = { (void (*)(ByteExaminer*))(*((int *)*mem)), (void (*)(ByteExaminer*))(*((int *)(*mem + 4))) };
fg[0](this);
fg[1](this);
}
The navigation through the vtable in bruteFG() works - when I call fg[0](this), f() is called. What does NOT work, however, is the passing of this to the function - meaning that this->b[0] is not printed correctly (garbage comes out instead. I'm actually lucky this doesn't produce a segfault).
So the actual output for
ByteExaminer be;
be.bruteFG();
is:
virtual f(); b[0]: 1307
virtual g(); b[1]: 0
So how should I proceed to get the correct result? How are the this pointers passed to functions in VC++?
(Nota bene: I'm NOT going to program this way seriously, ever. This is "for the lulz"; or for the learning experience. So don't try to convert me to proper C++ianity :))
Member functions in Visual Studio have a special calling convention, __thiscall, where this is passed in a dedicated register (ECX on x86; MSDN documents the details). You will have to go down to assembler if you want to call a function pointer which is in a vtable.
Of course, your code exhibits massively undefined behaviour- it's only OK to alias an object using a char or unsigned char pointer, and definitely not an int pointer- even ignoring the whole vtable assumptions thing.
OK using DeadMG's hint I've found a way without using assembler:
1) Remove the ByteExaminer* arg from the functions in the fg[] array
2) Add a function void callfunc(void (*)()); to ByteExaminer:
void ByteExaminer::callfunc(void (*func)())
{
func();
}
... this apparently works because func() is the first thing to be used in callfunc, so ecx is apparently not changed before. But this is a dirty trick (as you can see in the code above, I'm always on the hunt for clean code). I'm still looking for better ways.
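For comparison, here is a sketch of a well-defined way to get the same effect: pointers to member functions let the compiler supply this and perform the vtable lookup itself. The Examiner class is a hypothetical, simplified stand-in for ByteExaminer that writes to a caller-supplied stream so the result can be inspected; it is not the code from the question.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

class Examiner {
public:
    short b[2];
    Examiner() { b[0] = 25; b[1] = 26; }
    virtual ~Examiner() {}
    virtual void f(std::ostream& out) const { out << "f:" << b[0] << ' '; }
    virtual void g(std::ostream& out) const { out << "g:" << b[1] << ' '; }

    // Calls f and g through pointers to member functions. The compiler
    // passes `this` and does the virtual dispatch; no raw vtable access.
    void callFG(std::ostream& out) const {
        typedef void (Examiner::*Method)(std::ostream&) const;
        const Method fg[] = { &Examiner::f, &Examiner::g };
        for (Method m : fg) {
            (this->*m)(out);
        }
    }
};

// Helper for this example: runs callFG and returns what was written.
std::string run_examiner() {
    std::ostringstream out;
    Examiner be;
    be.callFG(out);
    return out.str();
}
```

This does not reveal how the compiler passes this, which was the point of the experiment, but it shows the result the raw vtable version was trying to reach.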

calling virtual functions through pointers with and without consulting the VM-table

I want to take the address of a member function of a c++ class, store it in a pointer, and call the virtual function later on.
I know some things about it, but do not know how to take the address of a certain implementation of a virtual function that is NOT the implementation of the most derived class (the actual class of the object).
Here is some sample code:
#include <iostream>
using namespace std;
class ca
{
public:
virtual void vfunc() {cout << "a::vfunc ";}
void mfunc() {cout << "a::mfunc ";}
};
class cb : public ca
{
public:
virtual void vfunc() {cout << "b::vfunc ";}
};
extern "C" int main(int, char **)
{
void (ca:: *ptr_to_vfunc)() = &ca::vfunc;
cout << sizeof(ptr_to_vfunc) << " ";
cb b;
(b.*ptr_to_vfunc)();
ca a;
(a.*ptr_to_vfunc)();
void (ca:: *ptr_to_mfunc)() = &ca::mfunc;
cout << sizeof(ptr_to_mfunc) << " ";
(a.*ptr_to_mfunc)();
}
The output is:
12 b::vfunc a::vfunc 12 a::mfunc
I am working in a win32 environment, and the size of member function pointers is three 32-bit values! I did not specify an object when I took the address of the member function, and yet my call invokes the most derived class's implementation of vfunc().
1) What is going on here? Why 12 bytes instead of 4?
2) How can I take the address of ca::vfunc() and call it on b, like I normally would with b.ca::vfunc()?
OK: it's doing exactly what it is supposed to do.
But to answer your questions:
1) What is going on here? Why 12 bytes instead of 4?
Why not.
The standard does not specify a size.
I am not sure why you expect a normal pointer to be 4.
If the question is "why is a method pointer larger than a normal pointer?"
Because the implementation needs the extra space to hold information about the call.
2) How can I take the address of ca::vfunc() and call it on b, like I normally would with b.ca::vfunc()?
You can't. A call through a pointer to a virtual member function is always dispatched virtually.
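One workaround, sketched here under the assumption that an ordinary function pointer is acceptable in place of a pointer to member function, is to wrap the qualified call in a free function and take the address of that wrapper. The ca and cb classes below are variants of the ones in the question that return a string instead of printing; call_ca_vfunc and call_via_member_pointer are hypothetical helper names.

```cpp
#include <cassert>
#include <string>

class ca {
public:
    virtual ~ca() {}
    virtual std::string vfunc() { return "a::vfunc"; }
};

class cb : public ca {
public:
    std::string vfunc() override { return "b::vfunc"; }
};

// The qualified name ca::vfunc suppresses virtual dispatch inside the wrapper.
std::string call_ca_vfunc(ca& obj) { return obj.ca::vfunc(); }

// For contrast: a call through a pointer to member function is dispatched
// virtually, so it reaches the most derived override.
std::string call_via_member_pointer(ca& obj) {
    std::string (ca::*ptr)() = &ca::vfunc;
    return (obj.*ptr)();
}
```

Storing &call_ca_vfunc in a std::string (*)(ca&) and calling it on a cb object returns "a::vfunc", whereas call_via_member_pointer on the same object returns "b::vfunc".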

Static ctor/dtor observer for arb. C++ classes

I have a series of classes A, B, ... which have many derived classes which are created inside a module I do not wish to change.
Additionally, I have at least one class Z, which has to be informed whenever an object of type A (or derived classes) is created or destroyed. In the future, there may be more classes, Y, X that want to observe different objects.
I am looking for a convenient way to solve this.
At first glance, the problem seemed trivial, but I'm kind of stuck right now.
What I came up with, is two base classes SpawnObserver and SpawnObservable which are supposed to do the job, but I am very unhappy with them for several reasons (see attached simplification of these classes).
1. When Z is notified, the actual object either does not exist yet or no longer exists, due to the order in which base classes are created/destroyed. Although the pointers can be compared when destroying an object (to remove them from some data structures in Z), this does not work when it is created, and it surely does not work when you have multiple inheritance.
2. If you want to observe only one class, say A, you are always notified of all (A, B, ...).
3. You have to explicitly if/else through all classes, so you have to know all classes that inherit from SpawnObservable, which is pretty bad.
Here are the classes, which I tried to trim down to the most basic functionality, which you need to know to understand my problem. In a nutshell: You simply inherit from SpawnObservable and the ctor/dtor does the job of notifying the observers (well, at least, this is what I want to have).
#include <list>
#include <iostream>
#include <string>
class SpawnObservable;
class SpawnObserver {
public:
virtual void ctord(SpawnObservable*) = 0;
virtual void dtord(SpawnObservable*) = 0;
};
class SpawnObservable {
public:
static std::list<SpawnObserver*> obs;
SpawnObservable() {
for (std::list<SpawnObserver*>::iterator it = obs.begin(), end = obs.end(); it != end; ++it) {
(*it)->ctord(this);
}
}
~SpawnObservable() {
for (std::list<SpawnObserver*>::iterator it = obs.begin(), end = obs.end(); it != end; ++it) {
(*it)->dtord(this);
}
}
virtual void foo() {} // XXX: very nasty dummy virtual function
};
std::list<SpawnObserver*> SpawnObservable::obs;
struct Dummy {
int i;
Dummy() : i(13) {}
};
class A : public SpawnObservable {
public:
Dummy d;
A() : SpawnObservable() {
d.i = 23;
}
A(int i) : SpawnObservable() {
d.i = i;
}
};
class B : public SpawnObservable {
public:
B() { std::cout << "making B" << std::endl;}
~B() { std::cout << "killing B" << std::endl;}
};
class PrintSO : public SpawnObserver { // <-- Z
void print(std::string prefix, SpawnObservable* so) {
if (dynamic_cast<A*>(so)) {
std::cout << prefix << so << " " << "A: " << (dynamic_cast<A*>(so))->d.i << std::endl;
} else if (dynamic_cast<B*>(so)) {
std::cout << prefix << so << " " << "B: " << std::endl;
} else {
std::cout << prefix << so << " " << "unknown" << std::endl;
}
}
virtual void ctord(SpawnObservable* so) {
print(std::string("[ctord] "),so);
}
virtual void dtord(SpawnObservable* so) {
print(std::string("[dtord] "),so);
}
};
int main(int argc, char** argv) {
PrintSO pso;
A::obs.push_back(&pso);
B* pb;
{
std::cout << "entering scope 1" << std::endl;
A a(33);
A a2(34);
B b;
std::cout << "adresses: " << &a << ", " << &a2 << ", " << &b << std::endl;
std::cout << "leaving scope 1" << std::endl;
}
{
std::cout << "entering scope 1" << std::endl;
A a;
A a2(35);
std::cout << "adresses: " << &a << ", " << &a2 << std::endl;
std::cout << "leaving scope 1" << std::endl;
}
return 1;
}
The output is:
entering scope 1
[ctord] 0x7fff1113c640 unknown
[ctord] 0x7fff1113c650 unknown
[ctord] 0x7fff1113c660 unknown
making B
adresses: 0x7fff1113c640, 0x7fff1113c650, 0x7fff1113c660
leaving scope 1
killing B
[dtord] 0x7fff1113c660 unknown
[dtord] 0x7fff1113c650 unknown
[dtord] 0x7fff1113c640 unknown
entering scope 1
[ctord] 0x7fff1113c650 unknown
[ctord] 0x7fff1113c640 unknown
adresses: 0x7fff1113c650, 0x7fff1113c640
leaving scope 1
[dtord] 0x7fff1113c640 unknown
[dtord] 0x7fff1113c650 unknown
I want to stress, that I am perfectly aware why my solution behaves the way it does. My question is whether you have a better approach of doing this.
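For readers who are less sure why every notification prints "unknown": while a base-class constructor runs, the object's dynamic type is still the base class, so a dynamic_cast to a derived type yields a null pointer. The following reduced sketch shows just that effect; Base, Widget, and is_widget are hypothetical names that do not appear in the code above.

```cpp
#include <cassert>

struct Widget;  // derived class, defined below

struct Base {
    bool seen_as_widget_in_ctor;
    Base();
    virtual ~Base() {}
};

struct Widget : Base {
    int value;
    Widget() : value(42) {}
};

// Defined after Widget so that dynamic_cast sees a complete type.
// Inside this constructor the object is treated as a plain Base,
// so the cast fails even when a Widget is being built.
Base::Base() {
    seen_as_widget_in_ctor = (dynamic_cast<Widget*>(this) != nullptr);
}

// The same cast applied after construction has finished.
bool is_widget(Base* b) { return dynamic_cast<Widget*>(b) != nullptr; }
```

For a fully constructed Widget, is_widget returns true, while the flag recorded during its Base constructor stays false.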
EDIT
As an extension to this question (and inspired by the comments below), I'd like to know:
Why do you think this is a terrible approach?
As an additional note: What I am trying to accomplish by this is to install a normal Observer in each and every created object.
EDIT 2
I will accept an answer that solves problem 1 (bold one in the enumeration above) or describes why the whole thing is a very bad idea.
Use the curiously recurring template pattern.
template<typename T> class watcher {
    typename std::list<watcher<T>*>::iterator it;
public:
    watcher();
    ~watcher();
    void ctord(T*);
    void dtord(T*);
};
template<typename T> class Observer {
public:
    typedef std::list<T*> ptr_list;
    static ptr_list ptrlist;
    typedef typename ptr_list::iterator it_type;
    it_type it;
    typedef std::list<watcher<T>*> watcher_list;
    static watcher_list watcherlist;
    typedef typename watcher_list::iterator watcher_it_type;
    Observer() {
        // The T part of the object is not constructed yet: store the pointer, don't dereference it.
        T* self = static_cast<T*>(this);
        ptrlist.push_back(self);
        it = ptrlist.end();
        --it;
        for(watcher_it_type w_it = watcherlist.begin(); w_it != watcherlist.end(); ++w_it)
            (*w_it)->ctord(self);
    }
    ~Observer() {
        ptrlist.erase(it);
        for(watcher_it_type w_it = watcherlist.begin(); w_it != watcherlist.end(); ++w_it)
            (*w_it)->dtord(static_cast<T*>(this));
    }
};
// Out-of-class definitions of the static lists.
template<typename T> typename Observer<T>::ptr_list Observer<T>::ptrlist;
template<typename T> typename Observer<T>::watcher_list Observer<T>::watcherlist;
class A : public Observer<A> {
};
class B : public Observer<B> {
};
class C : public A, public B, public Observer<C> {
    // No virtual inheritance required - all the Observers are a different type.
};
template<typename T> watcher<T>::watcher() {
    Observer<T>::watcherlist.push_back(this);
    it = Observer<T>::watcherlist.end();
    --it;
}
template<typename T> watcher<T>::~watcher() {
    Observer<T>::watcherlist.erase(it);
}
template<typename T> void watcher<T>::ctord(T* ptr) {
    // ptr points to an instance of T that is being constructed (its T part has not run yet)
}
template<typename T> void watcher<T>::dtord(T* ptr) {
    // ptr points to an instance of T that is being destructed (its T part is already gone)
}
Not just that, but you can inherit from Observer multiple times using this technique, as Observer<X> and Observer<Y> are different types and thus don't require diamond inheritance or anything like that. Plus, if you need different functionality for Observer<X> and Observer<Y>, you can specialize.
Edit # Comments:
class C DOES inherit from Observer<A> and Observer<B> through A and B, respectively. It doesn't need to know or care whether or not they're being observed. A C instance will end up on all three lists.
As for ctord and dtord, I don't actually see what function they perform. You can obtain a list of any specific type using Observer<T>::ptrlist.
Edit again: Oooooh, I see. Excuse me a moment while I edit some more. Man, this is some of the most hideous code I've ever written. You should seriously consider not needing it. Why not just have the objects that need to be informed about the others do their creation?
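The claim that a C instance ends up on all three lists can be tried out with a reduced version of the idea. This is a sketch only: Tracked and live_count are hypothetical names, the watcher notifications are left out, and copying is disabled because each object stores an iterator to its own list entry.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// CRTP base: keeps one list of live instances per type T.
template <typename T>
class Tracked {
public:
    static std::list<T*>& instances() {
        static std::list<T*> list;  // one list per instantiation of Tracked
        return list;
    }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
protected:
    Tracked() {
        // T is not fully constructed yet: store the pointer, do not use it.
        instances().push_back(static_cast<T*>(this));
        self_ = std::prev(instances().end());
    }
    ~Tracked() { instances().erase(self_); }
private:
    typename std::list<T*>::iterator self_;
};

class A : public Tracked<A> {};
class B : public Tracked<B> {};
// Tracked<A>, Tracked<B> and Tracked<C> are three unrelated base types,
// so no virtual inheritance is needed.
class C : public A, public B, public Tracked<C> {};

template <typename T>
std::size_t live_count() { return Tracked<T>::instances().size(); }
```

While one C object is alive, live_count<A>(), live_count<B>() and live_count<C>() each report 1, and all three drop back to 0 once it is destroyed.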
Issue 1 isn't easily solved (in fact I think it's impossible to fix). The curiously recurring template idea comes closest to solving it, because the base class encodes the derived type, but you'll have to add a base to every derived class, if you really insist on knowing the derived type when the base is being constructed.
If you don't mind performing your actual operations (other than the bookkeeping, I mean) or examining the list outside the constructor or destructor of each object, you could have it (re)build the minimal list only when the operation is about to be performed. This gives you a chance to use the fully-constructed object, and makes it easier to solve issue 2.
You'd do this by first having a list of objects that have been constructed, but aren't on the 'full' list. And the 'full' list would contain two pointers per constructed object. One is the pointer to the base class, which you'll store from the Observable constructor, possibly multiple times during the construction of a single object. The other is a void *, pointing to the most derived part of the object -- use dynamic_cast<void *> to retrieve this -- and is used to make sure that each object only appears once in the list.
When an object is destroyed, if it has multiple Observable bases, each will try to remove itself from the lists, and when it comes to the full list, only one will succeed -- but that's fine, because each is equally good as an arbitrary base of that object.
Some code follows.
Your full list of objects, iterable in as straightforward a fashion as std::map will allow. (Each void * and each Observable * is unique, but this uses the Observable * as the key, so that it's easy to remove the entry in the Observable destructor.)
typedef std::map<Observable *, void *> AllObjects;
AllObjects allObjects;
And your list of objects that have been constructed, but aren't yet added to allObjects:
std::set<Observable *> recentlyConstructedObjects;
In the Observable constructor, add the new object to the list of pending objects:
recentlyConstructedObjects.insert(this);
In the Observable destructor, remove the object:
// 'this' may not be a valid key, if the object is in 'allObjects'.
recentlyConstructedObjects.erase(this);
// 'this' may not be a valid key, if the object is in 'recentlyConstructedObjects',
// or this object has another Observable base object and that one got used instead.
allObjects.erase(this);
Before you're about to do your thing, update allObjects, if there've been any objects constructed since last time it was updated:
if(!recentlyConstructedObjects.empty()) {
std::map<void *, Observable *> newObjects;
for(std::set<Observable *>::const_iterator it = recentlyConstructedObjects.begin(); it != recentlyConstructedObjects.end(); ++it)
newObjects[dynamic_cast<void *>(*it)] = *it;
for(std::map<void *, Observable *>::const_iterator it = newObjects.begin(); it != newObjects.end(); ++it)
allObjects[it->second] = it->first;
recentlyConstructedObjects.clear();
}
And now you can visit each object exactly once:
for(std::map<Observable *,void *>::const_iterator it = allObjects.begin(); it != allObjects.end(); ++it) {
// you can dynamic_cast<whatever *>(it->first) to see what type of thing it is
//
// it->second is good as a key, uniquely identifying the object
}
Well... now that I've written all that, I'm not sure whether this solves your problem. It was interesting to consider nonetheless.
(This idea would solve one of the problems with the curiously recurring template, namely that you have lots of base objects per derived object and it's harder to disentangle because of that. Unfortunately, it offers no solution to the large number of base classes, sorry. Due to the use of dynamic_cast, of course, it's not much use if you call it during an object's construction, which is of course the advantage of the curiously recurring thing: you know the derived type during the base's construction.)
(So, if you're going with that style of solution, AND you are OK with performing your operations outside the construction/destruction stage, AND you don't mind the (multiple) base classes taking up space, you could perhaps have each base's constructor store some class-specific info -- using typeid, perhaps, or traits -- and merge these together when you build the larger list. This should be straightforward, since you'll know which base objects correspond to the same derived object. Depending on what you're trying to do, this might help you with issue 3.)
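The de-duplication step in this answer relies on dynamic_cast<void *> returning the address of the most derived object. A small sketch of that property, using a hypothetical hierarchy (Left, Right, Both) in which one object contains two Observable bases:

```cpp
#include <cassert>

struct Observable {
    virtual ~Observable() {}
};
struct Left : Observable { int l = 1; };
struct Right : Observable { int r = 2; };
// Non-virtual inheritance: a Both object holds two distinct Observable subobjects.
struct Both : Left, Right { int b = 3; };

// Address of the most derived object that contains this Observable subobject.
// This only gives the full object's address once construction has finished.
const void* most_derived(const Observable* o) {
    return dynamic_cast<const void*>(o);
}
```

The two Observable subobjects of a Both instance have different addresses, yet most_derived maps both of them to the address of the enclosing Both object, which is what makes the void * usable as a unique key.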
Take a look at Signals and Slots, especially Boost Signals and Slots.