I am working on an embedded platform which doesn't cope very well with dynamic code (no speculative / OOO execution at all).
On this platform I call a virtual member function on the same object quite often, however the compiler fails to optimize the vtable-lookup away, as it doesn't seem to recognize the lookup is only required for the first invocation.
Therefore I wonder: Is there a manual way to devirtualize a virtual member function of a C++ class in order to get a function-pointer which points directly to the resolved address?
I had a look at C++ function pointers, but since they seem to require a type to be specified, I guess this won't work out.
Thank you in advance
There's no general standard-C++-only way to find the address of a virtual function, given only a reference to a base class object. Furthermore, there's no reasonable type for such a pointer, because this need not be passed as an ordinary argument following a general convention (e.g. it can be passed in a register, with the other arguments on the stack).
If you do not need portability, however, you can always do whatever works for your given compiler. E.g., with Microsoft's COM (I know, that's not your platform) there is a known memory layout with vtable pointers, so as to access the functionality from C.
If you do need portability, then I suggest designing the optimization in. For example, instead of
class Foo_base
{
public:
virtual void bar() = 0;
};
do something like
class Foo_base
{
public:
typedef void (*Bar_func)(Foo_base&);
virtual Bar_func bar_func() const = 0;
void bar() { bar_func()( *this ); }
};
supporting the same public interface as before, but now exposing the innards, so to speak, thus allowing manual optimization of repeated calls to bar.
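As a minimal sketch of how that manual optimization might look in use: the function pointer is fetched once through the virtual bar_func() and then called directly in a loop. The Counter class, bar_impl and call_bar_n_times are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

class Foo_base
{
public:
    typedef void (*Bar_func)(Foo_base&);
    virtual ~Foo_base() {}
    virtual Bar_func bar_func() const = 0;   // one virtual call to fetch the target
    void bar() { bar_func()(*this); }        // same public interface as before
};

// Hypothetical derived class: the real work lives in a static function,
// so its address is an ordinary function pointer.
class Counter : public Foo_base
{
public:
    int count = 0;
    Bar_func bar_func() const override { return &Counter::bar_impl; }
private:
    static void bar_impl(Foo_base& self) { ++static_cast<Counter&>(self).count; }
};

// Resolve once, then call directly: no vtable lookup inside the loop.
void call_bar_n_times(Foo_base& obj, int n)
{
    const Foo_base::Bar_func f = obj.bar_func();
    for (int i = 0; i < n; ++i) f(obj);
}

void demo_cached_call()
{
    Counter c;
    call_bar_n_times(c, 5);
    assert(c.count == 5);
    c.bar();                 // the ordinary interface still works
    assert(c.count == 6);
}
```

The cached pointer is only valid for objects of the same dynamic type as the one it was fetched from, so it should be re-fetched whenever the object may change.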
Regarding GCC, I have seen the following while debugging the compiled assembly code.
A generic method pointer holds two pieces of data:
a) a "pointer" to the method
b) an offset to add to the starting address of the class instance (the offset is used when multiple inheritance is involved, for methods of the second and further parent classes, whose data starts at a different address within the object).
The "pointer" to the method is interpreted as follows:
1) If the "pointer" is even, it is interpreted as a normal (non-virtual) function pointer.
2) If the "pointer" is odd, then 1 should be subtracted, and the remaining value should be 0, 4, 8, 12 and so on (supposing a pointer size of 4 bytes).
This encoding obviously assumes that all normal methods start at even addresses (so the compiler should align them at even addresses).
In the odd case, the remaining value is the offset into the vtable from which to fetch the address of the "real" non-virtual method pointer.
So the idea, in order to devirtualize the call, is to convert a virtual method pointer to a non-virtual method pointer and use it afterwards by applying it to the "subject", that is, our class instance.
The code below does what is described.
#include <stdio.h>
#include <string.h>
#include <typeinfo>
#include <typeindex>
#include <cstdint>
struct Animal{
int weight=0x11111111;
virtual int mm(){printf("Animal1 mm\n");return 0x77;};
virtual int nn(){printf("Animal1 nn\n");return 0x99;};
};
struct Tiger:Animal{
int weight=0x22222222,height=0x33333333;
virtual int mm(){printf("Tigerxx\n");return 0xCC;}
virtual int nn(){printf("Tigerxx\n");return 0x99;};
};
typedef int (Animal::*methodPointerT)();
typedef struct {
void** functionPtr;
size_t offset;
} MP;
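// GCC-specific and not portable: rewrites a pointer to a virtual member
// function into a pointer to the concrete function for the dynamic type of a.
void devirtualize(methodPointerT& mp0,const Animal& a){
MP& t=*(MP*)&mp0;
if((intptr_t)t.functionPtr & 1){ // odd value: vtable byte offset + 1
size_t index=(size_t)((intptr_t)t.functionPtr-1)/sizeof(void*); // byte offset -> slot index
void** vTable=(void**)(*(void**)&a); // the vptr sits at the start of the object
t.functionPtr=(void**)vTable[index]; // store the resolved, non-virtual address
}
}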
int main()
{
int (Animal::*mp1)()=&Animal::nn;
MP& mp1MP=*(MP*)&mp1;
Animal x;Tiger y;
(x.*mp1)();(y.*mp1)();
devirtualize(mp1,x);
(x.*mp1)();(y.*mp1)();
}
Yes, this is possible in a way that works at least with MSVC, GCC and Clang.
I was also looking for how to do this, and here is a blog post I found that explains it in detail: https://medium.com/#calebleak/fast-virtual-functions-hacking-the-vtable-for-fun-and-profit-25c36409c5e0
Taking the code from there, in short, this is what you need to do. This function works for all objects:
template <typename T>
void** GetVTable(T* obj) {
return *((void***)obj);
}
And then to get a direct function pointer to the first virtual function of the class, you do this:
typedef void(VoidMemberFn)(void*);
VoidMemberFn* fn = (VoidMemberFn*)GetVTable<BaseType>(my_obj_ptr)[0];
// ... sometime later
fn(my_obj_ptr);
So it's quite easy actually.
Related
Why does C++ RTTI require the class to have a virtual methods table? While it seems reasonable to use the table as a means for polymorphic upcasting, it doesn't seem like it is strictly required from a design point of view. For instance, the class could contain a hash or a unique identifier that conveys the information.
For the C++ experts who consider this question overly trivial, it would help the poster of this question, who is a humble beginner at C++, to provide an explanation of why vtables are required from a design point of view for RTTI, as well as what are the other design approaches (instead of using vtables) to implement RTTI (and why they work/don't work as well as vtables).
From a language perspective, the answer is: it doesn't. Nowhere in the C++ standard does it say how virtual functions are to be implemented. The compiler is free to make sure the correct function is called however it sees fit.
So, what would be gained by replacing the vptr (not the vtable) with an id and dropping the vtable? (replacing the vtable with an id doesn't really help anything whatsoever, once you have resolved vptr, you already know the run-time type)
How does the runtime know which function to actually call?
Consider:
template <int I>
struct A {
virtual void foo() {}
virtual void bar() {}
virtual ~A() {}
};
template <int I>
struct B : A<I> {
virtual void foo() {}
};
Suppose your compiler gives A<0> the ... let's call it vid ... 0 and A<1> the vid 1. Note that A<0> and A<1> are completely unrelated classes at this point. What happens if you say a0.foo() where a0 is an A<0>? A non-virtual function would just result in a statically dispatched call. But for a virtual function, the address of the function to call must be determined at runtime.
If all you had was vid 0 you'd still have to encode which function you want. This would result in a forest of if-else branches, to figure out the correct function pointer.
if (vid == 0) {
if (fid == 0) {
call A<0>::foo();
} else if (fid == 1) {
call A<0>::bar();
} /* ... */
} else if (vid == 1) {
if (fid == 0) {
call A<1>::foo();
} else if (fid == 1) {
call A<1>::bar();
} /* ... */
} /* ... */
This would get out of hand. Hence, the table. Add an offset that identifies the foo() function to the base of A<0>'s vtable and you have the address of the actual function to call. If you have a B<0> object on your hands instead, add the offset to that class' table's base pointer.
In theory compilers could emit if-else code for this but it turns out a pointer addition is faster and the resulting code smaller.
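To make the comparison concrete, here is a minimal sketch of both dispatch strategies. The free functions, the Fn typedef and the two tables are hypothetical stand-ins for what a compiler would generate; they are not taken from any real implementation.

```cpp
#include <cassert>

// Hypothetical free functions standing in for the virtual functions above.
int a0_foo() { return 1; }
int a0_bar() { return 2; }
int a1_foo() { return 3; }
int a1_bar() { return 4; }

// Variant 1: nested branches on (vid, fid), as in the pseudocode.
int dispatch_branches(int vid, int fid)
{
    if (vid == 0) {
        if (fid == 0) return a0_foo();
        if (fid == 1) return a0_bar();
    } else if (vid == 1) {
        if (fid == 0) return a1_foo();
        if (fid == 1) return a1_bar();
    }
    return -1; // unknown class id or function id
}

// Variant 2: one table per class; a call is an indexed load plus an indirect call.
typedef int (*Fn)();
const Fn vtable_a0[] = { a0_foo, a0_bar };
const Fn vtable_a1[] = { a1_foo, a1_bar };

int dispatch_table(const Fn* vptr, int fid)
{
    return vptr[fid]();
}

void demo_dispatch()
{
    // Both strategies select the same function.
    assert(dispatch_branches(1, 0) == dispatch_table(vtable_a1, 0));
    assert(dispatch_branches(0, 1) == dispatch_table(vtable_a0, 1));
}
```

The branch version grows with every class and every function, while the table version stays a single indexed call no matter how many classes exist.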
Vtables are a very efficient way of providing virtual functions. For the price of a single pointer per object, every member of the class can share the same static vtable.
Adding a second bunch of static information per class would require a second pointer per object. It's much easier to make the existing vtable pointer do double duty.
In the end it’s all down to history and trade offs.
On one side you need to be compatible with C, specifically standard layout types must have the same layout as in C, which means no place for RTTI.
On the other hand adding RTTI to a vtable will result in no size cost for the instance.
The designers of C++ decided to combine these two facts to the current implementation: only polymorphic types have dynamic RTTI information.
You can still obtain the static RTTI information and make your own layout for a non polymorphic type:
template<typename T>
struct S
{
const std::type_info &type = typeid(T);
T value;
};
You can even pass around void pointers to value: they point to storage laid out exactly like a T, and you know that the enclosing S<T> stores a type info reference in front of it.
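A minimal sketch of how such a wrapper might be queried, assuming C++11 or later. The helper holds() is a hypothetical name added for illustration; only the struct S comes from the answer above.

```cpp
#include <cassert>
#include <typeinfo>

template<typename T>
struct S
{
    const std::type_info &type = typeid(T);  // static type, recorded per object
    T value;
};

// Hypothetical helper: true when the wrapper was instantiated for type T.
template<typename T, typename U>
bool holds(const S<U>& s)
{
    return s.type == typeid(T);
}

void demo_static_rtti()
{
    S<int> s;
    s.value = 42;
    assert(holds<int>(s));       // the recorded type matches
    assert(!holds<double>(s));   // and differs from unrelated types
}
```

Note that each S<T> object pays for the stored reference, which is exactly the per-instance cost that the vtable approach avoids for polymorphic types.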
I was thinking about mechanism of polymorphism in C++ and I can't understand one thing. Here I have very simple piece of code with one class:
#include <iostream>
using namespace std;
class A
{
public:
int x;
void fun1();
double fun2(int, char*);
void fun3(double, float[]);
};
int main()
{
cout << sizeof(A) << endl;
return 0;
}
The console will print the size of the int member (x), which is obvious. If I modify my class by adding the keyword virtual, the size will change because the compiler adds a pointer to an array (vtable) of virtual functions. But how is it possible that the size of my class doesn't change when I declare new virtual methods with completely different signatures? I mean that:
void (*(tab[100]) )(int, double, char*);
It's a definition of an array which is obliged to hold addresses of functions with the signature:
void fun(int, double, char*);
And only this type of function may be added to this array, so why, no matter the types of the virtual methods, does the class contain only one pointer to one virtual array? Where have I made a mistake in my logic?
It could be useful:
The virtual table is actually quite simple, though it's a little complex to describe in words. First, every class that uses virtual functions (or is derived from a class that uses virtual functions) is given its own virtual table. This table is simply a static array that the compiler sets up at compile time. A virtual table contains one entry for each virtual function that can be called by objects of the class. Each entry in this table is simply a function pointer that points to the most-derived function accessible by that class.
First things first: the standard doesn't say anything about virtual tables. It only talks about virtual functions and polymorphism. Every compiler is allowed to implement this feature in any way it likes.
Virtual tables are only a common implementation of virtual functions; they are not mandatory, and the implementation differs between compilers.
Lastly, on my Visual Studio 2015, this:
class A1 {
int x;
void doIT(){}
};
class A2 {
int x;
virtual void doIT(){}
};
constexpr int size = sizeof(A1);
constexpr int size2 = sizeof(A2);
makes size 4 bytes, but size2 12 bytes, which breaks your assumptions.
again, GCC, Clang and even C++/CLI may have different behaviour, and yield different size.
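The point the question asks about can be checked directly. The following is a minimal sketch, assuming an implementation that stores a single vptr per object (which the standard does not require); the struct names are hypothetical.

```cpp
#include <cassert>

struct Plain        { int x; void f() {} };
struct OneVirtual   { int x; virtual void f() {} };
struct ManyVirtuals {
    int x;
    virtual void f() {}
    virtual double g(int, char*) { return 0.0; }
    virtual void h(double, float[]) {}
};

// On implementations with one vptr per object, extra virtual functions
// grow the per-class table, not the object itself.
bool size_independent_of_virtual_count()
{
    return sizeof(OneVirtual) == sizeof(ManyVirtuals)
        && sizeof(OneVirtual) > sizeof(Plain);
}

void demo_sizes()
{
    assert(size_independent_of_virtual_count());
}
```

The differing signatures are not a problem for the compiler, because it knows the real type of every slot and does not need to express the table as a single C++ array type.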
I came across articles where in they explain about vptr and vtable.
I know that the first pointer in an object in case of a class with virtual functions stored, is a vptr to vtable and vtable's array entries are pointers to the function in the same sequence as they occur in class ( which I have verified with my test program).
But I am trying to understand what code the compiler must generate in order to call the appropriate function.
Example:
class Base
{
virtual void func1()
{
cout << "Called me" << endl;
}
};
int main()
{
Base obj;
Base *ptr;
ptr=&obj;
// void* is not needed. func1 can be accessed directly with obj or ptr using vptr/vtable
void* ptrVoid=ptr;
// I can call the first virtual function in the following way:
void (*firstfunc)()=(void (*)(void))(*(int*)*(int*)ptrVoid);
firstfunc();
}
Questions:
1. What I am really trying to understand is how the compiler replaces the call to ptr->func1() with a lookup through the vptr.
If I were to simulate the call, then what should I do? Should I overload the -> operator? But even that would not help, as I would not know what the name func1 really is. Even if they say that the compiler accesses the vtable through the vptr, how does it know that the entry of func1 is the first element and the entry of func2 is the second element in the array? There must be some mapping from the names of functions to the elements of the array.
2. How can I simulate it? Can you provide the actual syntax that the compiler uses to call the function func1 (how does it replace ptr->func1())?
Don't think of a vtable as an array. It's only an array if you strip it of everything C++ knows about it other than the size of its members. Instead, think of it as a second struct whose members are all pointers to functions.
Suppose I have a class like this:
struct Foo {
virtual void bar();
virtual int baz(int qux);
int quz;
};
int callSomeFun(Foo* foo) {
foo->bar();
return foo->baz(2);
}
Breaking it down 1 step:
struct Foo;
// adding Foo* parameter to simulate the this pointer, which
// in the above would be a pointer to foo.
struct FooVtable {
void (*bar)(Foo* foo);
int (*baz)(Foo* foo, int qux);
};
struct Foo {
FooVtable* vptr;
int quz;
};
int callSomeFun(Foo* foo) {
foo->vptr->bar(foo);
return foo->vptr->baz(foo, 2);
}
I hope that's what you're looking for.
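To see the breakdown actually run, here is a minimal sketch that fills in the tables by hand. The function bodies, the two table objects and the "derived" override are hypothetical additions; a real compiler generates the equivalent for you.

```cpp
#include <cassert>

struct Foo;

// One slot per virtual function; each slot has its own exact type.
struct FooVtable {
    void (*bar)(Foo* self);
    int  (*baz)(Foo* self, int qux);
};

struct Foo {
    const FooVtable* vptr;   // set once, when the object is created
    int quz;
};

// "Base class" implementations.
void Foo_bar(Foo* self) { self->quz += 1; }
int  Foo_baz(Foo* self, int qux) { return self->quz + qux; }
const FooVtable foo_vtable = { Foo_bar, Foo_baz };

// Hypothetical "derived class": overrides baz only, inherits bar.
int  Derived_baz(Foo* self, int qux) { return self->quz * qux; }
const FooVtable derived_vtable = { Foo_bar, Derived_baz };

// The compiler maps the *name* bar or baz to a fixed slot at compile time.
int callSomeFun(Foo* foo) {
    foo->vptr->bar(foo);
    return foo->vptr->baz(foo, 2);
}

void demo_manual_vtable()
{
    Foo base    = { &foo_vtable, 10 };
    Foo derived = { &derived_vtable, 10 };
    assert(callSomeFun(&base) == 13);     // (10 + 1) + 2
    assert(callSomeFun(&derived) == 22);  // (10 + 1) * 2
}
```

The same callSomeFun works for both objects because the name-to-slot mapping is fixed when the code is compiled; only the table the vptr points to differs.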
The background:
After compilation (without debug info), C/C++ binaries contain no names, and names aren't required at runtime; it's only machine code.
You can think of a vptr entry as a classic C function pointer, in the sense that the type, argument list, etc. are known.
It isn't important at which positions func1, func2, etc. are placed; the only requirement is that the order is always the same (so all parts of a multi-file C++ program must be compiled in the same way, with the same compiler settings, etc.). Let's imagine the position follows declaration order: FIRST the parent class's virtuals, then the ones newly declared in the derived class, BUT reimplemented virtuals keep the lower positions they have in the parent.
That is only a mental picture. The implementation must correctly dispatch overrides for calls such as classAPointer->methodReimplementedInB().
Usually a C++ compiler has/had (my knowledge is from the years of the 16/32-bit migration) 2-4 options to optimize vtables for speed, size, etc. Classic C sizeof() was quite easy to understand (size of the data plus possible alignment); in C++ sizeof is bigger for classes with virtual functions, typically by the pointer size (2, 4 or 8 bytes).
A few conversion tools can convert "object" files, e.g. from MS format to Borland, but usually only classic C was possible/safe, because of the unknown machine code implementations of vtables.
It is hard to touch the vtable from high-level code; use analysers for intermediate files (.obj, etc.) instead.
EDIT: the story at runtime is different from the one at compile time. My answer is about compiled code and runtime.
EDIT2: quasi assembler code (from my head)
load ax, 2
call vt[ax]
vt:
0x123456
0x126785 // virtual parent func1()
derived:
vt:
0x123456
0x126999 // overridden func1()
0x456788 // new method
EDIT3: BTW, I can't totally agree that C++ is always faster than the JVM/.NET because "these are interpreted". C++ has an "interpreted" part too, and that part is growing: real component/GUI frameworks have interpreted connections as well (a map, for example). Outside our discussion: which memory model is better, C++ delete or GC?
Given a C++ object pointer and a compatible method pointer to a virtual method, is there any remotely robust/portable way to get a pointer to the actual concrete function that would be called?
The use case is that I want to run said pointer through the debug symbols to get the name of the type/function that would be called (without actually calling it).
If this is only possible via implementation specific solutions, then I'm primarily interested in supporting GCC/LLVM.
Both LLVM and GCC follow the Itanium C++ ABI, so you need to find a way to read the data structures as specified therein. I'll give a rough outline.
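A pointer to a virtual member function is represented by the byte offset into the virtual function table, plus 1 (the set low bit distinguishes it from a pointer to a non-virtual member function, whose address is even).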
class A {
public:
virtual void f();
virtual void g();
};
void (A::*pAg)() = & A::g;
ptrdiff_t offset = *(ptrdiff_t*)(&pAg) - 1;
The pointer to the virtual table is typically located right at the beginning of an object:
A a;
void* vtable = *(void**)&a;
Then you look at the calculated offset within that virtual table and find your actual function pointer.
void* function = *(void**)((char*)vtable + offset);
This question is not about the C++ language itself (i.e. not about the Standard) but about how to get a compiler to implement alternative schemes for virtual functions.
The general scheme for implementing virtual functions is using a pointer to a table of pointers.
class Base {
private:
int m;
public:
virtual void metha();
};
equivalently in say C would be something like
struct Base {
void (**vtable)();
int m;
};
The first member is usually a pointer to a list of virtual functions (an area of memory the application has no control over). In most cases this costs the size of one pointer before considering the members, so around 4 bytes in a 32-bit addressing scheme. If you created a list of 40k polymorphic objects in your application, this is around 40k x 4 bytes = 160k bytes before any member variables. I also know this happens to be the fastest and most common implementation among C++ compilers.
I know this is complicated by multiple inheritance (especially with virtual classes in them, ie diamond struct, etc).
An alternative way to do the same is to have the first variable be an index id into a table of vptrs (equivalent to the following in C):
struct Base {
char classid; // the classid here is an index into an array of vtables
int m;
};
If the total number of classes in an application is less than 255(including all possible template instantiations, etc), then a char is good enough to hold an index thereby reducing the size of all polymorphic classes in the application(I am excluding alignment issues, etc).
My question is: is there any switch in GNU C++, LLVM, or any other compiler to do this, or to reduce the size of polymorphic objects?
Edit: I understand the alignment issues pointed out. A further point: if this was on a 64-bit system (assuming a 64-bit vptr) with each polymorphic object's members costing around 8 bytes, then the cost of the vptr is 50% of the memory. This mostly relates to small polymorphic objects created en masse, so I am wondering if this scheme is possible for at least specific virtual objects, if not the whole application.
Your suggestion is interesting, but it won't work if the executable is made of several modules, passing objects among them. Given they are compiled separately (say DLLs), if one module creates an object and passes it to another, and the other invokes a virtual method, how would it know which table the classid refers to? You won't be able to add another moduleid because the two modules might not know about each other when they are compiled. So unless you use pointers, I think it's a dead end...
A couple of observations:
Yes, a smaller value could be used to represent the class, but some processors require data to be aligned, so that saving in space may be lost by the requirement to align data values to e.g. 4-byte boundaries. Further, the class-id must be in a well-defined place for all members of a polymorphic inheritance tree, so it is likely to be ahead of other data, so alignment problems can't be avoided.
The cost of storing the pointer has been moved to the code, where every use of a polymorphic function requires code to translate the class-id to either a vtable pointer or some equivalent data structure. So it isn't free. Clearly the cost trade-off depends on the volume of code vs the number of objects.
If objects are allocated from the heap, there is usually space wasted in order to ensure objects are aligned to the worst-case boundary, so even if there is a small amount of code and a large number of polymorphic objects, the memory management overhead might be significantly bigger than the difference between a pointer and a char.
In order to allow programs to be independently compiled, the number of classes in the whole program, and hence the size of the class-id must be known at compile time, otherwise code can't be compiled to access it. This would be a significant overhead. It is simpler to fix it for the worst case, and simplify compilation and linking.
Please don't let me stop you trying, but there are quite a lot more issues to resolve using any technique which may use a variable size id to derive the function address.
I would strongly encourage you to look at Ian Piumarta's Cola (see also the Wikipedia article on Cola).
It actually takes a different approach, and uses the pointer in a much more flexible way, to build inheritance, or prototype-based, or any other mechanism the developer requires.
No, there is no such switch.
The LLVM/Clang codebase avoids virtual tables in classes that are allocated by the tens of thousands: this works well in a closed hierarchy, because a single enum can enumerate all possible classes and then each class is linked to a value of the enum. The hierarchy is closed precisely because of the enum.
Then, virtuality is implemented by a switch on the enum, and appropriate casting before calling the method. Once again, closed: the switch has to be modified for each new class.
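A minimal sketch of that enum-plus-switch technique, under the assumption of a small closed hierarchy. The Shape, Square and Rect classes are hypothetical and are not taken from the LLVM/Clang sources.

```cpp
#include <cassert>

// Hypothetical closed hierarchy: the enum lists every class that can exist.
enum class ShapeKind : unsigned char { Square, Rect };

struct Shape {
    ShapeKind kind;          // one byte instead of a vptr
protected:
    explicit Shape(ShapeKind k) : kind(k) {}
};

struct Square : Shape {
    int side;
    explicit Square(int s) : Shape(ShapeKind::Square), side(s) {}
    int area() const { return side * side; }
};

struct Rect : Shape {
    int w, h;
    Rect(int w_, int h_) : Shape(ShapeKind::Rect), w(w_), h(h_) {}
    int area() const { return w * h; }
};

// "Virtual" call: switch on the tag, cast, then call the concrete method.
// Every new class needs a new enumerator and a new case here.
int area(const Shape& s)
{
    switch (s.kind) {
    case ShapeKind::Square: return static_cast<const Square&>(s).area();
    case ShapeKind::Rect:   return static_cast<const Rect&>(s).area();
    }
    assert(false && "unknown ShapeKind");
    return 0;
}

void demo_closed_hierarchy()
{
    Square sq(3);
    Rect r(2, 5);
    assert(area(sq) == 9);
    assert(area(r) == 10);
}
```

Whether the one-byte tag actually saves space still depends on the alignment of the members that follow it.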
A first alternative: external vpointer.
If you find yourself in a situation where the vpointer tax is paid way too often, that is, most of the objects are of known type, then you can externalize it.
class Interface {
public:
virtual ~Interface() {}
virtual Interface* clone() const = 0; // might be worth it
virtual void updateCount(int) = 0;
protected:
Interface(Interface const&) {}
Interface& operator=(Interface const&) { return *this; }
};
template <typename T>
class InterfaceBridge: public Interface {
public:
InterfaceBridge(T& t): t(t) {}
virtual InterfaceBridge* clone() const { return new InterfaceBridge(*this); }
virtual void updateCount(int i) { t.updateCount(i); }
private:
T& t; // value or reference ? Choose...
};
template <typename T>
InterfaceBridge<T> interface(T& t) { return InterfaceBridge<T>(t); }
Then, imagining a simple class:
class Counter {
public:
int getCount() const { return c; }
void updateCount(int i) { c = i; }
private:
int c;
};
You can store the objects in an array:
static Counter array[5];
assert(sizeof(array) == sizeof(int)*5); // no v-pointer
And still use them with polymorphic functions:
void five(Interface& i) { i.updateCount(5); }
InterfaceBridge<Counter> ib(array[3]); // create *one* v-pointer
five(ib);
assert(array[3].getCount() == 5);
The value vs reference is actually a design tension. In general, if you need to clone you need to store by value, and you need to clone when you store by base class (boost::ptr_vector for example). It is possible to actually provide both interfaces (and bridges):
Interface  <---  ClonableInterface
    |                   |
InterfaceB       ClonableInterfaceB
It's just extra typing.
Another solution, much more involved.
A switch is implementable by a jump table. Such a table could perfectly be created at runtime, in a std::vector for example:
class Base {
public:
~Base() { VTables()[vpointer].dispose(*this); }
void updateCount(int i) {
VTables()[vpointer].updateCount(*this, i);
}
protected:
struct VTable {
typedef void (*Dispose)(Base&);
typedef void (*UpdateCount)(Base&, int);
Dispose dispose;
UpdateCount updateCount;
};
static void NoDispose(Base&) {}
static unsigned RegisterTable(VTable t) {
std::vector<VTable>& v = VTables();
v.push_back(t);
return v.size() - 1;
}
explicit Base(unsigned id): vpointer(id) {
assert(id < VTables().size());
}
private:
// Implement in .cpp or pay the cost of weak symbols.
static std::vector<VTable>& VTables() { static std::vector<VTable> VT; return VT; } // return by reference so registrations persist
unsigned vpointer;
};
And then, a Derived class:
class Derived: public Base {
public:
Derived(): Base(GetID()) {}
private:
static void UpdateCount(Base& b, int i) {
static_cast<Derived&>(b).count = i;
}
static unsigned GetID() {
static unsigned ID = RegisterTable(VTable({&NoDispose, &UpdateCount}));
return ID;
}
unsigned count;
};
Well, now you'll realize how great it is that the compiler does it for you, even at the cost of some overhead.
Oh, and because of alignment, as soon as a Derived class introduces a pointer, there is a risk that 4 bytes of padding are used between Base and the next attribute. You can use them by careful selecting the first few attributes in Derived to avoid padding...
The short answer is that no, I don't know of any switch to do this with any common C++ compiler.
The longer answer is that to do this, you'd just about have to build most of the intelligence into the linker, so it could coordinate distributing the IDs across all the object files getting linked together.
I'd also point out that it wouldn't generally do a whole lot of good. At least in a typical case, you want each element in a struct/class at a "natural" boundary, meaning its starting address is a multiple of its size. Using your example of a class containing a single int, the compiler would allocate one byte for the vtable index, followed immediately by three bytes of padding so the next int would land at an address that was a multiple of four. The end result would be that objects of the class would occupy precisely the same amount of storage as if we used a pointer.
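The padding argument can be checked with offsetof. This is a minimal sketch, assuming a C++11 compiler; the two struct names are hypothetical, and the exact byte counts depend on the target's alignment rules.

```cpp
#include <cassert>
#include <cstddef>

struct WithCharId {      // hypothetical layout using a one-byte class id
    char classid;
    int  m;
};

struct WithPointer {     // conventional layout with a vtable pointer
    void (**vtable)();
    int  m;
};

// Bytes of padding the compiler inserted between classid and m.
std::size_t padding_after_classid()
{
    return offsetof(WithCharId, m) - sizeof(char);
}

// m cannot start at offset 1; it is pushed up to int's alignment.
bool int_is_naturally_aligned()
{
    return offsetof(WithCharId, m) == alignof(int);
}

void demo_padding()
{
    assert(int_is_naturally_aligned());
    assert(padding_after_classid() == alignof(int) - 1);
}
```

On a typical 32-bit target both structs would come out at 8 bytes, which is the case described above; on a typical 64-bit target the pointer version grows to 16 bytes, so there the one-byte id would save space for this particular layout.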
I'd add that this is not a far-fetched exception either. For years, standard advice to minimize padding inserted into structs/classes has been to put the items expected to be largest at the beginning, and progress toward the smallest. That means in most code, you'd end up with those same three bytes of padding before the first explicitly defined member of the struct.
To get any good from this, you'd have to be aware of it, and have a struct with (for example) three bytes of data you could move where you wanted. Then you'd move those to be the first items explicitly defined in the struct. Unfortunately, that would also mean that if you turned this switch off so you have a vtable pointer, you'd end up with the compiler inserting padding that might otherwise be unnecessary.
To summarize: it's not implemented, and if it were, it wouldn't usually accomplish much.